Nginx shuts down at least once every month at midnight. Why?

adam26d · September 3, 2018, 7:14am

I can’t figure it out, but I guess the answer should be in a log somewhere. sudo service nginx status gives me the output below. I have to manually restart nginx for ERPNext to resume.

● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: failed (Result: timeout) since Mon 2018-09-03 00:00:06 CAT; 9h ago
  Process: 31655 ExecStop=/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.p
  Process: 15139 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUC
  Process: 15137 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, sta
 Main PID: 15142 (code=exited, status=0/SUCCESS)

Aug 31 10:15:34 erp.co.zm systemd[1]: nginx.service: Unit entered failed state.
Aug 31 10:15:34 erp.co.zm systemd[1]: nginx.service: Failed with result 'timeout'.
Aug 31 10:15:34 erp.co.zm systemd[1]: Starting A high performance web server and a reverse proxy serve
Aug 31 10:15:34 erp.co.zm systemd[1]: nginx.service: Failed to read PID from file /run/nginx.pid: Inva
Aug 31 10:15:34 erp.co.zm systemd[1]: Started A high performance web server and a reverse proxy server
Sep 03 00:00:01 erp.co.zm systemd[1]: Stopping A high performance web server and a reverse proxy serve
Sep 03 00:00:06 erp.co.zm systemd[1]: nginx.service: Stopping timed out. Terminating.
Sep 03 00:00:06 erp.co.zm systemd[1]: Stopped A high performance web server and a reverse proxy server
Sep 03 00:00:06 erp.co.zm systemd[1]: nginx.service: Unit entered failed state.
Sep 03 00:00:06 erp.co.zm systemd[1]: nginx.service: Failed with result 'timeout'.

bkm · July 3, 2019, 2:19pm

Did you ever find a solution to this issue?

BKM

trentmu · July 3, 2019, 2:38pm

Anything in the cron logs?

chabad360 · July 3, 2019, 5:51pm

Would you be able to post your nginx config (/etc/nginx/nginx.conf), and (if applicable) /etc/nginx/conf.d/frappe.conf?

VamYip · July 6, 2019, 11:24am

I face the same issue and suspect apt-daily, which might be running an update/upgrade and stopping nginx in the process.

sudo journalctl -xe might also provide logs of what happened before/after nginx stopped.
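For anyone following along, here is a minimal sketch of narrowing the journal to the minutes around the stop and listing systemd timers such as apt-daily. The function names inspect_stop and list_timers are made up for illustration, and the timestamps in the usage line come from the status output in the first post; adjust them to your own failure time.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative only).
# Assumes a systemd host; run as root or via sudo to see all journal entries.

# Show every journal entry between two timestamps, e.g. a few minutes
# either side of the moment nginx was stopped.
inspect_stop() {
    journalctl --no-pager --since "$1" --until "$2"
}

# List all systemd timers (including inactive ones) with their last and
# next trigger times, to see whether one lines up with the outage.
list_timers() {
    systemctl list-timers --all --no-pager
}
```

Usage would look like inspect_stop "2018-09-02 23:55:00" "2018-09-03 00:05:00" followed by list_timers. Whatever asked systemd to stop nginx at 00:00:01 may show up in that window, but that is not guaranteed.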

Haven’t been able to spend time to test this assumption or fix it.

Regards,
Vamyip

adam26d · July 7, 2019, 6:13am

Hi @bkm

For me it’s due to letsencrypt renewal failing to start nginx after attempting to renew certificates.

Check whether you have faulty domains by doing a certbot dry run (certbot renew --dry-run), then remove or fix them.

I recently discovered certbot’s post-hooks but haven’t tried the feature yet. A post-hook runs after any renewal attempt, so it could be used to start nginx again. This should resolve the issue too.
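A rough sketch of both suggestions, assuming certbot is installed, nginx is managed by systemd, and the commands are run as root. The function names are hypothetical, and this is untested against the setup described in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative only).
# Assumes certbot is on PATH and nginx runs as a systemd service.

# Simulate renewal of every configured certificate without saving anything.
# Domains that fail here are the "faulty domains" to remove or fix.
check_renewals() {
    certbot renew --dry-run
}

# Attempt a renewal and start nginx afterwards. The post-hook command runs
# after the renewal attempt, whether or not the renewal itself succeeded.
renew_and_restart() {
    certbot renew --post-hook "systemctl start nginx"
}
```

Run check_renewals first; if it reports failures, fixing those domains is probably the better cure, with the post-hook acting as a safety net.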

Please share your experience here afterwards.

bkm · July 7, 2019, 1:09pm

I only had this occur on one hosted instance and rather than constantly mess with trying to detect when the host was down, I created a new host in another environment and moved all of the data. The problem never reoccurred.

However, the fact that I could not actually identify the cause has bothered me for a long time and I keep checking threads about it to see if anyone has ever found a solution.

BKM

adam26d · July 7, 2019, 4:34pm

Right. Most likely some lingering Nginx configs from another site. My suspicions remain with letsencrypt.

Obinna_Ukwueze · July 7, 2019, 9:35pm

I had a similar issue and I discovered that “Scheduler” was using up far too many resources.

bkm · July 8, 2019, 1:12pm

That is an interesting observation. How much (on average) was it adding to memory and CPU usage?

BKM

chabad360 · July 19, 2019, 3:16am

Try running Let’s Encrypt (even if it isn’t renewal time) and seeing if nginx comes back up after it runs.

Maloba_Wanekeya · April 18, 2020, 3:25pm

This has been happening to me every 5 minutes. The culprit was a letsencrypt cron job (/usr/local/vesta/bin/v-update-sys-queue letsencrypt) that was running every 5 minutes. I will be digging through the logs to find out exactly why this happened; for now I have suspended this cron job.
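Since several replies point at scheduled jobs, here is a small sketch for spotting cron entries that touch certificates or nginx. The function name find_suspects and the search pattern are assumptions; extend the pattern to whatever your setup uses.

```shell
#!/bin/sh
# Hypothetical helper (name and pattern are illustrative only).

# Read cron entries on stdin and print only the lines that mention
# certbot, letsencrypt or nginx.
find_suspects() {
    grep -E 'certbot|letsencrypt|nginx'
}
```

Typical usage would be sudo crontab -l | find_suspects for root's crontab, or cat /etc/crontab /etc/cron.d/* | find_suspects for the system-wide entries. A matching job whose schedule lines up with the outage is a good candidate to investigate.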