Frappe Docker V13: Email Queue stuck

Emails are stuck in the queue unless they are manually triggered via “Send Now” Button.

I had set up a multi-bench Frappe Docker Production instance.
I have configured Email Domain, Email Account(Default Sending)
Any email sent from a document is stuck in the Email Queue.
However clicking manually using “Send Now” Button is able to send the mail

I have enabled scheduler and workers are also available.
Any idea to fix this issue.

Is this related?

There were no error logs found. I tried sending email with and without attachment.
However i could clear the queue only manually.
This site was perfectly working fine in VM. Recently i migrated the site to docker(same version of ERPNext and Frappe as that of VM) via backup restore.

Please find the compose file volume definition section. This is as per original frappe docker documentation. No modification done.

      volume: {}
networks:
  bench-network:
    name: lens-core-q
  mariadb-network:
    name: mariadb-network
    external: true
  traefik-public:
    name: traefik-public
    external: true
volumes:
  assets:
    name: lens-core-q_assets
  redis-data:
    name: lens-core-q_redis-data
  sites:
    name: lens-core-q_sites
x-backend-defaults:
  depends_on:
    configurator:
      condition: service_completed_successfully
  image: frappe/frappe-worker:v13.32.0
  volumes:
  - sites:/home/frappe/frappe-bench/sites
  - assets:/home/frappe/frappe-bench/sites/assets:ro
x-depends-on-configurator:
  depends_on:
    configurator:
      condition: service_completed_successfully
x-erpnext-backend-image:
  image: frappe/erpnext-worker:v13.33.0

compose file is incomplete

share one that was generated, like the one called gitops in example.

Below compose file was generated by executing

docker compose --project-name lens-core-q \
  --env-file ~/gitops/lens-core-q.env \
  -f compose.yaml \
  -f overrides/compose.erpnext.yaml \
  -f overrides/compose.redis.yaml \
  -f docs/compose/compose.multi-bench.yaml \
  -f docs/compose/compose.multi-bench-ssl.yaml config > ~/gitops/lens-core-q.yaml
name: lens-core-q
services:
  backend:
    depends_on:
      configurator:
        condition: service_completed_successfully
    image: frappe/erpnext-worker:v13.33.0
    networks:
      bench-network: null
      mariadb-network: null
    volumes:
    - type: volume
      source: sites
      target: /home/frappe/frappe-bench/sites
      volume: {}
    - type: volume
      source: assets
      target: /home/frappe/frappe-bench/sites/assets
      read_only: true
      volume: {}
  configurator:
    command:
    - configure.py
    depends_on:
      redis:
        condition: service_started
    environment:
      DB_HOST: mariadb-database
      DB_PORT: "3306"
      REDIS_CACHE: redis:6379/0
      REDIS_QUEUE: redis:6379/1
      REDIS_SOCKETIO: redis:6379/2
      SOCKETIO_PORT: "9000"
    image: frappe/erpnext-worker:v13.33.0
    networks:
      bench-network: null
      mariadb-network: null
    volumes:
    - type: volume
      source: sites
      target: /home/frappe/frappe-bench/sites
      volume: {}
    - type: volume
      source: assets
      target: /home/frappe/frappe-bench/sites/assets
      read_only: true
      volume: {}
  frontend:
    depends_on:
      backend:
        condition: service_started
      websocket:
        condition: service_started
    environment:
      BACKEND: backend:8000
      FRAPPE_SITE_NAME_HEADER: $$host
      SOCKETIO: websocket:9000
      UPSTREAM_REAL_IP_ADDRESS: 127.0.0.1
      UPSTREAM_REAL_IP_HEADER: X-Forwarded-For
      UPSTREAM_REAL_IP_RECURSIVE: "off"
    image: frappe/erpnext-nginx:v13.33.0
    labels:
      traefik.docker.network: traefik-public
      traefik.enable: "true"
      traefik.http.routers.lens-core-q-http.entrypoints: http
      traefik.http.routers.lens-core-q-http.middlewares: https-redirect
      traefik.http.routers.lens-core-q-http.rule: Host(`dev.nectar.lmnas.com`,`qnexindia.lmnas.com`,`qsgb.lmnas.com`)
      traefik.http.routers.lens-core-q-http.service: lens-core-q
      traefik.http.routers.lens-core-q-https.entrypoints: https
      traefik.http.routers.lens-core-q-https.rule: Host(`dev.nectar.lmnas.com`,`qnexindia.lmnas.com`,`qsgb.lmnas.com`)
      traefik.http.routers.lens-core-q-https.service: lens-core-q
      traefik.http.routers.lens-core-q-https.tls: "true"
      traefik.http.routers.lens-core-q-https.tls.certresolver: le
      traefik.http.services.lens-core-q.loadbalancer.server.port: "8080"
    networks:
      bench-network: null
      traefik-public: null
    volumes:
    - type: volume
      source: assets
      target: /usr/share/nginx/html/assets
      volume: {}
    - type: volume
      source: sites
      target: /usr/share/nginx/html/sites
      volume: {}
  queue-default:
    command:
    - bench
    - worker
    - --queue
    - default
    depends_on:
      configurator:
        condition: service_completed_successfully
    image: frappe/erpnext-worker:v13.33.0
    networks:
      bench-network: null
      mariadb-network: null
    volumes:
    - type: volume
      source: sites
      target: /home/frappe/frappe-bench/sites
      volume: {}
    - type: volume
      source: assets
      target: /home/frappe/frappe-bench/sites/assets
      read_only: true
      volume: {}
  queue-long:
    command:
    - bench
    - worker
    - --queue
    - long
    depends_on:
      configurator:
        condition: service_completed_successfully
    image: frappe/erpnext-worker:v13.33.0
    networks:
      bench-network: null
      mariadb-network: null
    volumes:
    - type: volume
      source: sites
      target: /home/frappe/frappe-bench/sites
      volume: {}
    - type: volume
      source: assets
      target: /home/frappe/frappe-bench/sites/assets
      read_only: true
      volume: {}
  queue-short:
    command:
    - bench
    - worker
    - --queue
    - short
    depends_on:
      configurator:
        condition: service_completed_successfully
    image: frappe/erpnext-worker:v13.33.0
    networks:
      bench-network: null
    volumes:
    - type: volume
      source: sites
      target: /home/frappe/frappe-bench/sites
      volume: {}
    - type: volume
      source: assets
      target: /home/frappe/frappe-bench/sites/assets
      read_only: true
      volume: {}
  redis:
    image: redis:6.2-alpine
    networks:
      bench-network: null
      mariadb-network: null
    volumes:
    - type: volume
      source: redis-data
      target: /data
      volume: {}
  scheduler:
    command:
    - bench
    - schedule
    depends_on:
      configurator:
        condition: service_completed_successfully
    image: frappe/erpnext-worker:v13.33.0
    networks:
      bench-network: null
      mariadb-network: null
    volumes:
    - type: volume
      source: sites
      target: /home/frappe/frappe-bench/sites
      volume: {}
    - type: volume
      source: assets
      target: /home/frappe/frappe-bench/sites/assets
      read_only: true
      volume: {}
  websocket:
    depends_on:
      configurator:
        condition: service_completed_successfully
    image: frappe/frappe-socketio:v13.32.0
    networks:
      bench-network: null
      mariadb-network: null
    volumes:
    - type: volume
      source: sites
      target: /home/frappe/frappe-bench/sites
      volume: {}
networks:
  bench-network:
    name: lens-core-q
  mariadb-network:
    name: mariadb-network
    external: true
  traefik-public:
    name: traefik-public
    external: true
volumes:
  assets:
    name: lens-core-q_assets
  redis-data:
    name: lens-core-q_redis-data
  sites:
    name: lens-core-q_sites
x-backend-defaults:
  depends_on:
    configurator:
      condition: service_completed_successfully
  image: frappe/frappe-worker:v13.32.0
  volumes:
  - sites:/home/frappe/frappe-bench/sites
  - assets:/home/frappe/frappe-bench/sites/assets:ro
x-depends-on-configurator:
  depends_on:
    configurator:
      condition: service_completed_successfully
x-erpnext-backend-image:
  image: frappe/erpnext-worker:v13.33.0

everything seems to be correct in yaml.

Try restarting the default queue container or the docker compose project.

Thanks @revant_one for taking a look at the yaml. I restarted the container, rebuilt the project. I cleared all the queue manually.
Even after doing all these, a new email triggered is again stuck in the queue.
FYI… This site was restored from a VM. Email was working perfect in the VM. After migrating to Docker we are facing this issue. All other sites also have the same problem.

Could you please share a sample yaml you have constructed so that i will use to compare if something is missing from my side.

I use ERPNext on Docker Swarm · GitHub

and https://helm.erpnext.com

if email is sent with “Send Now” button then firewalls should be fine.

are other background jobs working?

if you’ve many background jobs in a queue, you can scale up the number of containers for workers, checkout docker compose cli help/docs for scale.

@revant_one Thanks.
I reviewed Error logs and scheduled job logs. Everything seemed to look fine.
based on your last reply I investigated the background jobs. On a first glance i could not get any log. But “Show Failed Jobs” took about 30 sec to pull down the list of errored jobs.

Maybe this is the culprit. pymysql.err.OperationalError: (2003, “Can’t connect to MySQL server on ‘mariadb-database’ ([Errno -3] Temporary failure in name resolution)”)

This is the only kind of background job error in the list. Other jobs are not failing.
From other posts on the following error, It could be an mariadb access issue for non-root user.
Can’t Connect to MySQL server

frappe.email.queue.send_one
Traceback (most recent call last):
  File "/home/frappe/frappe-bench/env/lib/python3.9/site-packages/pymysql/connections.py", line 613, in connect
    sock = socket.create_connection(
  File "/usr/local/lib/python3.9/socket.py", line 823, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/lib/python3.9/socket.py", line 954, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/background_jobs.py", line 134, in execute_job
    method(**kwargs)
  File "/home/frappe/frappe-bench/apps/frappe/frappe/email/queue.py", line 475, in send_one
    email = frappe.db.sql(
  File "/home/frappe/frappe-bench/apps/frappe/frappe/database/database.py", line 148, in sql
    self.connect()
  File "/home/frappe/frappe-bench/apps/frappe/frappe/database/database.py", line 84, in connect
    self._conn = self.get_connection()
  File "/home/frappe/frappe-bench/apps/frappe/frappe/database/mariadb/database.py", line 76, in get_connection
    conn = pymysql.connect(
  File "/home/frappe/frappe-bench/env/lib/python3.9/site-packages/pymysql/connections.py", line 353, in __init__
    self.connect()
  File "/home/frappe/frappe-bench/env/lib/python3.9/site-packages/pymysql/connections.py", line 664, in connect
    raise exc
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'mariadb-database' ([Errno -2] Name or service not known)")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/frappe/frappe-bench/env/lib/python3.9/site-packages/pymysql/connections.py", line 613, in connect
    sock = socket.create_connection(
  File "/usr/local/lib/python3.9/socket.py", line 823, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/lib/python3.9/socket.py", line 954, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/frappe/frappe-bench/env/lib/python3.9/site-packages/rq/worker.py", line 1013, in perform_job
    rv = job.perform()
  File "/home/frappe/frappe-bench/env/lib/python3.9/site-packages/rq/job.py", line 709, in perform
    self._result = self._execute()
  File "/home/frappe/frappe-bench/env/lib/python3.9/site-packages/rq/job.py", line 732, in _execute
    result = self.func(*self.args, **self.kwargs)
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/background_jobs.py", line 157, in execute_job
    frappe.db.rollback()
  File "/home/frappe/frappe-bench/apps/frappe/frappe/database/database.py", line 940, in rollback
    self.sql("rollback")
  File "/home/frappe/frappe-bench/apps/frappe/frappe/database/database.py", line 148, in sql
    self.connect()
  File "/home/frappe/frappe-bench/apps/frappe/frappe/database/database.py", line 84, in connect
    self._conn = self.get_connection()
  File "/home/frappe/frappe-bench/apps/frappe/frappe/database/mariadb/database.py", line 76, in get_connection
    conn = pymysql.connect(
  File "/home/frappe/frappe-bench/env/lib/python3.9/site-packages/pymysql/connections.py", line 353, in __init__
    self.connect()
  File "/home/frappe/frappe-bench/env/lib/python3.9/site-packages/pymysql/connections.py", line 664, in connect
    raise exc
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'mariadb-database' ([Errno -2] Name or service not known)")

And below is the log from mariadb container. (docker logs mariadb-database)

2022-09-25 02:32:47+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.9+maria~ubu2004 started.
2022-09-25 02:33:08+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2022-09-25 02:33:08+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.9+maria~ubu2004 started.
2022-09-25 02:33:08+00:00 [Note] [Entrypoint]: MariaDB upgrade not required
2022-09-25  2:33:08 0 [Note] mariadbd (server 10.6.9-MariaDB-1:10.6.9+maria~ubu2004) starting as process 1 ...
2022-09-25  2:33:08 0 [Warning] Setting lower_case_table_names=2 because file system for /var/lib/mysql/ is case insensitive
2022-09-25  2:33:08 0 [Note] mariadbd: Aria engine: starting recovery
recovered pages: 0% 13% 23% 36% 48% 67% 80% 100% (0.0 seconds); tables to flush: 2 1 0
 (0.0 seconds); 
2022-09-25  2:33:08 0 [Note] mariadbd: Aria engine: recovery done
2022-09-25  2:33:08 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-09-25  2:33:08 0 [Note] InnoDB: Number of pools: 1
2022-09-25  2:33:09 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-09-25  2:33:09 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)
2022-09-25  2:33:09 0 [Note] InnoDB: Using Linux native AIO
2022-09-25  2:33:09 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
2022-09-25  2:33:09 0 [Note] InnoDB: Completed initialization of buffer pool
2022-09-25  2:33:09 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=2880599249,2888050250
2022-09-25  2:33:11 0 [Note] InnoDB: Starting a batch to recover 7636 pages from redo log.
2022-09-25  2:33:19 0 [Note] InnoDB: Starting final batch to recover 1177 pages from redo log.
2022-09-25  2:33:20 0 [Note] InnoDB: 128 rollback segments are active.
2022-09-25  2:33:38 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
2022-09-25  2:33:38 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-09-25  2:33:38 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-09-25  2:33:38 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-09-25  2:33:38 0 [Note] InnoDB: 10.6.9 started; log sequence number 2935130095; transaction id 453902
2022-09-25  2:33:38 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-09-25  2:33:38 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2022-09-25  2:33:38 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work.
2022-09-25  2:33:38 0 [Note] Server socket created on IP: '0.0.0.0'.
2022-09-25  2:33:38 0 [Note] Server socket created on IP: '::'.
2022-09-25  2:33:38 0 [Note] InnoDB: Buffer pool(s) load completed at 220925  2:33:38
2022-09-25  2:33:39 0 [Note] mariadbd: ready for connections.
Version: '10.6.9-MariaDB-1:10.6.9+maria~ubu2004'  socket: '/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
2022-09-25  2:35:04 192 [ERROR] mariadbd: Table './_57ffdb70fef2935b/__global_search' is marked as crashed and should be repaired
2022-09-25  2:35:04 192 [Warning] Checking table:   './_57ffdb70fef2935b/__global_search'
2022-09-25  2:35:18 242 [ERROR] mariadbd: Table './_bb57f08da0c57814/taberror@0020log' is marked as crashed and should be repaired
2022-09-25  2:35:18 242 [Warning] Checking table:   './_bb57f08da0c57814/taberror@0020log'
2022-09-25  8:39:50 7935 [Warning] Aborted connection 7935 to db: '_57ffdb70fef2935b' user: '_57ffdb70fef2935b' host: '172.20.0.4' (Got an error reading communication packets)
2022-09-25  9:51:46 8690 [Warning] Aborted connection 8690 to db: '_bb57f08da0c57814' user: '_bb57f08da0c57814' host: '172.20.0.6' (Got an error reading communication packets)
2022-09-25 10:49:45 9064 [Warning] Aborted connection 9064 to db: '_bb57f08da0c57814' user: '_bb57f08da0c57814' host: '172.20.0.6' (Got an error reading communication packets)
2022-09-25 14:42:24 9202 [Warning] Aborted connection 9202 to db: '_bb57f08da0c57814' user: '_bb57f08da0c57814' host: '172.20.0.6' (Got an error reading communication packets)
2022-09-25 16:47:20 0 [Note] mariadbd (initiated by: unknown): Normal shutdown
2022-09-25 16:47:20 0 [Note] InnoDB: FTS optimize thread exiting.
2022-09-25 16:47:22 0 [Note] InnoDB: Starting shutdown...
2022-09-25 16:47:22 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool
2022-09-25 16:47:22 0 [Note] InnoDB: Restricted to 2028 pages due to innodb_buf_pool_dump_pct=25
2022-09-25 16:47:22 0 [Note] InnoDB: Buffer pool(s) dump completed at 220925 16:47:22
2022-09-25 16:47:25 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
2022-09-25 16:47:25 0 [Note] InnoDB: Shutdown completed; log sequence number 2941051785; transaction id 462436
2022-09-25 16:47:25 0 [Note] mariadbd: Shutdown complete

2022-09-25 16:47:27+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.9+maria~ubu2004 started.
2022-09-25 16:47:42+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2022-09-25 16:47:42+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.9+maria~ubu2004 started.
2022-09-25 16:47:42+00:00 [Note] [Entrypoint]: MariaDB upgrade not required
2022-09-25 16:47:42 0 [Note] mariadbd (server 10.6.9-MariaDB-1:10.6.9+maria~ubu2004) starting as process 1 ...
2022-09-25 16:47:42 0 [Warning] Setting lower_case_table_names=2 because file system for /var/lib/mysql/ is case insensitive
2022-09-25 16:47:42 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-09-25 16:47:42 0 [Note] InnoDB: Number of pools: 1
2022-09-25 16:47:42 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-09-25 16:47:42 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)
2022-09-25 16:47:42 0 [Note] InnoDB: Using Linux native AIO
2022-09-25 16:47:42 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
2022-09-25 16:47:42 0 [Note] InnoDB: Completed initialization of buffer pool
2022-09-25 16:47:43 0 [Note] InnoDB: 128 rollback segments are active.
2022-09-25 16:47:57 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-09-25 16:47:57 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-09-25 16:47:57 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-09-25 16:47:57 0 [Note] InnoDB: 10.6.9 started; log sequence number 2941051785; transaction id 462437
2022-09-25 16:47:57 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-09-25 16:47:57 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2022-09-25 16:47:57 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work.
2022-09-25 16:47:57 0 [Note] Server socket created on IP: '0.0.0.0'.
2022-09-25 16:47:57 0 [Note] Server socket created on IP: '::'.
2022-09-25 16:47:57 0 [Note] mariadbd: ready for connections.
Version: '10.6.9-MariaDB-1:10.6.9+maria~ubu2004'  socket: '/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
2022-09-25 16:47:59 0 [Note] InnoDB: Buffer pool(s) load completed at 220925 16:47:59
2022-09-25 16:58:38 1090 [Warning] Access denied for user 'root'@'localhost' (using password: NO)

From above logs do you infer anything. Or could you guide me on how to fix this.

Any help or clue. All my sites migrated from VM to Docker suffer from this issue. Your help on this will be much appreciated.

The problem is due to queue-short not having access to mariadb-network.
If you are generating project compose file using the official frappe docker single server setup

Then compose.multi-bench.yaml would be missing mariadb-network for queue-short, due to which queue-short would not be able to connect to the database and would result in the following error
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'mariadb-database' ([Errno -2] Name or service not known)")

Fix,
Add the mariadb-network to the networks for queue-short in compose.multi-bench.yaml so that queue-short can have access to mariadb.

Please refer to the PR queue short added with mariadb network

Thanks @revant_one for your references and directions. I could draw differences between your files and pwd.yml from frappe_docker to figure our the issue.

1 Like