Emails had been working fine for some time. Then, for no reason I can discern, messages started piling up in the queue with a status of ‘Not Sent’. If I open one of the messages and click ‘Send Now’, it sends immediately and successfully.
This is a Helm-based deployment (everything runs in Docker containers). Bench is not really an option. I’m not sure how the scheduler could fail in this environment, but anything’s possible.
To close this out: I’d recently moved my databases to a new location. I updated the database configuration in each site’s site_config.json, shut down the old database server, and was operating successfully on the new databases, or so I thought.
The culprit was common_site_config.json, a file I only learned about tonight. At least one of the worker containers was reading its database configuration from this file, which still pointed at the old database. Because the old database is offline, that pod was failing on startup. Overall, ERPNext was up and mostly working fine; the scheduler seems to have been the only collateral damage.
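If you hit the same problem, one way to spot the mismatch is to compare the `db_host` in common_site_config.json with each site's effective value. This is a minimal sketch, not part of Frappe: `audit_db_hosts` is a hypothetical helper, and it assumes the usual bench layout (`sites/common_site_config.json` plus `sites/<site>/site_config.json`) and that site-level keys override the common ones, which is how I understand Frappe merges the two files.

```python
import json
import sys
from pathlib import Path


def load_json(path: Path) -> dict:
    """Read a JSON config file, treating a missing file as empty."""
    return json.loads(path.read_text()) if path.exists() else {}


def audit_db_hosts(sites_dir: str) -> list[str]:
    """Warn for each site whose effective db_host differs from common_site_config.json."""
    sites = Path(sites_dir)
    common = load_json(sites / "common_site_config.json")
    common_host = common.get("db_host")
    warnings = []
    for site_cfg in sorted(sites.glob("*/site_config.json")):
        # Assumption: keys in site_config.json override those in common_site_config.json.
        effective = {**common, **load_json(site_cfg)}
        site_host = effective.get("db_host")
        if common_host is not None and site_host != common_host:
            warnings.append(
                f"{site_cfg.parent.name}: site uses db_host={site_host!r}, "
                f"but common_site_config.json still says {common_host!r}"
            )
    return warnings


if __name__ == "__main__":
    # Hypothetical usage: pass the bench's sites directory (defaults to ./sites).
    sites_dir = sys.argv[1] if len(sys.argv) > 1 else "sites"
    for line in audit_db_hosts(sites_dir):
        print("WARNING:", line)
```

Run it from the bench directory inside one of the containers, for example `python audit_db_hosts.py sites` (the file name is just an example). Any warning means a process that reads only common_site_config.json would connect to a different database than the sites themselves.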