After enabling HTTPS with Certbot, ERPNext is too slow

xickomesquita · August 20, 2019, 4:44pm

Hi,

I enabled HTTPS on my site with Certbot (following the tutorial on its website), and since then my site has become very slow.

I’m running an AWS EC2 host with Ubuntu 18, with ERPNext installed on Python 3.

I already checked the log files as described in this post, and they are small.

My fail2ban logrotate config:

/var/log/fail2ban.log {

    weekly
    rotate 4
    compress

    delaycompress
    missingok
    postrotate
        #fail2ban-client flushlogs 1>/dev/null
        fail2ban-client flushlogs 1>/dev/null || true
    endscript

    # If fail2ban runs as non-root it still needs to have write access
    # to logfiles.
    # create 640 fail2ban adm
    create 640 root adm
}

I’m not sure whether the logs are actually being rotated.
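One way to check whether rotation is happening is to look for the numbered copies logrotate leaves next to the live file. With the `rotate 4`, `compress` and `delaycompress` options above, a working setup typically shows `fail2ban.log.1` (left uncompressed by `delaycompress`) plus older `fail2ban.log.N.gz` files. A minimal Python sketch, assuming logrotate's default numeric suffixes (no `dateext` option) and the `/var/log/fail2ban.log` path from the config; `list_rotated_logs` is a hypothetical helper name:

```python
import glob
import os
import re


def list_rotated_logs(log_path="/var/log/fail2ban.log"):
    """Return rotated copies of log_path, sorted by rotation number.

    Assumes logrotate's default numeric suffixes (no `dateext`):
    fail2ban.log.1, fail2ban.log.2.gz, fail2ban.log.3.gz, ...
    """
    base = os.path.basename(log_path)
    pattern = re.compile(re.escape(base) + r"\.(\d+)(\.gz)?$")
    found = []
    for path in glob.glob(log_path + ".*"):
        match = pattern.match(os.path.basename(path))
        if match:  # skip unrelated files such as fail2ban.log.bak
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


if __name__ == "__main__":
    rotated = list_rotated_logs()
    if rotated:
        for path in rotated:
            print(path, os.path.getsize(path), "bytes")
    else:
        print("No rotated copies found; rotation may not have run yet.")
```

An empty result shortly after setup is normal for a `weekly` schedule. Assuming the config lives at `/etc/logrotate.d/fail2ban`, `sudo logrotate -d /etc/logrotate.d/fail2ban` does a debug dry run that shows what logrotate would do without changing anything.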

Now, when I try to restart nginx, I get this message:

Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xe" for details.

And with systemctl:

Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xe" for details.
frappe@ip-172-31-81-53:~/frappe-bench$ ^C
frappe@ip-172-31-81-53:~/frappe-bench$ sudo systemctl status nginx.service
● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2019-08-20 17:01:34 UTC; 42s ago
     Docs: man:nginx(8)
  Process: 2881 ExecStop=/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid (code=exited, status=2)
  Process: 4718 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=1/FAILURE)
  Process: 4711 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
 Main PID: 849 (code=exited, status=0/SUCCESS)

Aug 20 17:01:31 ip-172-31-81-53 systemd[1]: Starting A high performance web server and a reverse proxy server...
Aug 20 17:01:31 ip-172-31-81-53 nginx[4718]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Aug 20 17:01:32 ip-172-31-81-53 nginx[4718]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Aug 20 17:01:32 ip-172-31-81-53 nginx[4718]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Aug 20 17:01:33 ip-172-31-81-53 nginx[4718]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Aug 20 17:01:33 ip-172-31-81-53 nginx[4718]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Aug 20 17:01:34 ip-172-31-81-53 nginx[4718]: nginx: [emerg] still could not bind()
Aug 20 17:01:34 ip-172-31-81-53 systemd[1]: nginx.service: Control process exited, code=exited status=1
Aug 20 17:01:34 ip-172-31-81-53 systemd[1]: nginx.service: Failed with result 'exit-code'.
Aug 20 17:01:34 ip-172-31-81-53 systemd[1]: Failed to start A high performance web server and a reverse proxy server.
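The repeated `bind() to 0.0.0.0:80 failed (98: Address already in use)` lines mean another process was already listening on port 80 when systemd tried to start nginx, so nginx could not take the port. A quick way to confirm this is to try binding the port yourself. A minimal Python sketch; `port_is_free` is a hypothetical helper name, and binding port 80 normally requires root:

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if host:port can be bound, False if another process holds it.

    Ports below 1024 normally need root, so run this as root for port 80;
    a PermissionError is re-raised instead of being reported as "busy".
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # nginx also sets SO_REUSEADDR, so lingering TIME_WAIT connections
    # are not counted as "in use"; a live listener still is.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:  # errno 98 on Linux, as in the log
            return False
        raise
    finally:
        sock.close()
    return True
```

Run as root while nginx is stopped, `port_is_free(80)` returning `False` confirms that some other process owns the port; `sudo ss -ltnp` lists listening sockets together with the process that owns each one.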

Any hints?

bkm · August 20, 2019, 5:11pm

I am curious which tutorial you used. I thought the process was pretty simple using the bench setup commands.

I just did this on a v12 production server running Ubuntu 18.04.

I used these steps:

bench config dns_multitenant on
bench setup add-domain [my.sitename.com] --site site1.local
sudo bench setup lets-encrypt site1.local --custom-domain [my.sitename.com]
sudo service nginx restart

The only difference between versions is that for v12 installs (and some recent v11 installs) you have to make a correction in one of the .py files BEFORE you try to get the certificate.

Using nano editor:

nano ~/.bench/bench/config/nginx.py

Search for the word “unicode”, change it to “bytes”, then save the file.
(It occurs only once in the file.)

As long as you edit the nginx.py file before you run the bench setup command for lets encrypt, then everything will work just fine.
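The manual edit described above can also be scripted. A minimal Python sketch, assuming the file lives at `~/.bench/bench/config/nginx.py` as in this post and that, as stated, the whole word `unicode` appears exactly once; it keeps a `.bak` copy and refuses to touch the file if the count is anything other than one. `patch_unicode_to_bytes` is a hypothetical helper name:

```python
import os
import re
import shutil


def patch_unicode_to_bytes(path=os.path.expanduser("~/.bench/bench/config/nginx.py")):
    """Replace the single whole-word `unicode` in path with `bytes`.

    Writes a backup to <path>.bak first and returns the backup path.
    Raises ValueError (leaving the file untouched) unless exactly one
    occurrence is found, so running it twice is harmless.
    """
    with open(path, encoding="utf-8") as fh:
        source = fh.read()

    # \b keeps identifiers such as `unicode_literals` from being changed.
    patched, count = re.subn(r"\bunicode\b", "bytes", source)
    if count != 1:
        raise ValueError(f"expected exactly 1 'unicode' in {path}, found {count}")

    backup = path + ".bak"
    shutil.copy2(path, backup)  # keep the original in case you need to revert
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(patched)
    return backup
```

Run it as the frappe user before `sudo bench setup lets-encrypt`, matching the ordering given above.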

If there is another way to do it, then I am just unaware of it.

Not sure if this will help, but I thought I would tell you how I succeeded.

BKM

xickomesquita · August 20, 2019, 6:17pm

I used the Certbot site tutorial (https://certbot.eff.org/lets-encrypt/ubuntubionic-nginx).

I’ve run your steps:

bench config dns_multitenant on

ok

bench setup add-domain erpnext.galileobio.com --site erpnext.galileobio.com

ok

sudo bench setup lets-encrypt erpnext.galileobio.com --custom-domain erpnext.galileobio.com

Error:

frappe@ip-172-31-81-53:~/frappe-bench$ bench setup add-domain erpnext.galileobio.com --site erpnext.galileobio.com
frappe@ip-172-31-81-53:~/frappe-bench$ sudo bench setup lets-encrypt erpnext.galileobio.com --custom-domain erpnext.galileobio.com
Running this will stop the nginx service temporarily causing your sites to go offline
Do you want to continue? [y/N]: y
INFO:bench.utils:sudo systemctl stop nginx
INFO:bench.utils:/opt/certbot-auto  --config /etc/letsencrypt/configs/erpnext.galileobio.com.cfg certonly
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator standalone, Installer None
Cert not yet due for renewal

You have an existing certificate that has exactly the same domains or certificate name you requested and isn't close to expiry.
(ref: /etc/letsencrypt/renewal/erpnext.galileobio.com.conf)

What would you like to do?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1: Keep the existing certificate for now
2: Renew & replace the cert (limit ~5 per 7 days)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Select the appropriate number [1-2] then [enter] (press 'c' to cancel): 1
Keeping the existing certificate

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Certificate not yet due for renewal; no action taken.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
nginx.conf already exists and this will overwrite it. Do you want to continue? [y/N]: y
INFO:bench.utils:sudo systemctl start nginx
Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xe" for details.
Traceback (most recent call last):
  File "/usr/local/bin/bench", line 11, in <module>
    load_entry_point('bench', 'console_scripts', 'bench')()
  File "/home/frappe/.bench/bench/cli.py", line 40, in cli
    bench_command()
  File "/usr/lib/python3/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/frappe/.bench/bench/commands/setup.py", line 128, in setup_letsencrypt
    setup_letsencrypt(site, custom_domain, bench_path='.', interactive=not non_interactive)
  File "/home/frappe/.bench/bench/config/lets_encrypt.py", line 41, in setup_letsencrypt
    run_certbot_and_setup_ssl(site, custom_domain, bench_path, interactive)
  File "/home/frappe/.bench/bench/config/lets_encrypt.py", line 80, in run_certbot_and_setup_ssl
    service('nginx', 'start')
  File "/home/frappe/.bench/bench/config/production_setup.py", line 67, in service
    exec_cmd("sudo {service_manager} {option} {service}".format(service_manager='systemctl', option=option, service=service))
  File "/home/frappe/.bench/bench/utils.py", line 161, in exec_cmd
    raise CommandFailedError(cmd)
bench.utils.CommandFailedError: sudo systemctl start nginx
frappe@ip-172-31-81-53:~/frappe-bench$ sudo bench setup lets-encrypt erpnext.galileobio.com --custom-domain erpnext.galileobio.com

sudo service nginx restart

Error:

Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xe" for details.

Meanwhile, the system is working, but very slowly.

Thanks in advance!!

UPDATE: When I test Nginx:

sudo nginx -t

nginx: [warn] conflicting server name "erpnext.galileobio.com" on 0.0.0.0:443, ignored
nginx: [warn] conflicting server name "erpnext.galileobio.com" on 0.0.0.0:80, ignored
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

So I used ‘fuser -k’ and was able to complete all the steps.

I restarted my server and… it’s usable!! But still slow…

Thanks so much for your tip!!

rmeyer · August 20, 2019, 7:41pm

Strange, I didn’t notice any performance issues with letsencrypt.

bkm · August 20, 2019, 7:52pm

What operating system are you using?

xickomesquita · August 20, 2019, 8:08pm

I’m using Ubuntu 18, hosted on AWS EC2, with 1GB of RAM + 4GB of swap…

I suspected the logs at first, but all my logs are very small… I don’t know if there are any tricks for handling them…

And the strangest thing is that when I click any link for the first time, it takes a while, but when I go back and click it again, it’s fast…

Very strange!

EDIT: Could it be something to do with the database?

rmeyer · August 20, 2019, 11:46pm

That’s not strange at all. It’s called caching.

bkm · August 21, 2019, 11:44am

As @rmeyer indicated, this is expected. Once the system is started, the first time you use a portion of the database it creates a holding place in memory so it does not have to search out all the links again. It simply keeps them in an in-memory cache to speed things up. This would also happen if you restarted only the mysql service without restarting the whole system.

As long as it is working fast enough AFTER the first access of the database, your system is working properly. Remember, this means the first access of any table will be slow, and later accesses will be faster because the data is remembered.

So, open an Item in the Stock section
open a saved sales invoice and sales order
open a saved customer
open a saved purchase order

After that, most of your system will have been accessed at least once and it will run faster.

BKM

xickomesquita · August 21, 2019, 3:17pm

I wish, but… it’s not happening that way at all…

After every logout, it takes 30 seconds to open some sections, such as Purchase Order. After the first login and click, everything should return to normal, but that isn’t happening.

I have a 200–300 Mbps internet connection here, and this doesn’t seem like normal behavior…

It may have nothing to do with Let’s Encrypt… but, as I said, I looked at the logs, and they are small and rotated…

bkm · August 21, 2019, 3:27pm

Maybe it is not your installation that is the problem. Maybe it is the node where your server is being hosted.

Check out this post and try running its test on your server to see whether your server’s I/O is fast enough to support your work:

It is certainly worth some investigating.
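As a rough, hypothetical stand-in for such an I/O check (not the test from the referenced post), the Python sketch below times sequential writes and forces them to disk with `os.fsync`, so the page cache cannot hide a slow volume. Dedicated tools such as `fio` give far more thorough numbers; `write_throughput_mb_s` is a hypothetical helper name.

```python
import os
import tempfile
import time


def write_throughput_mb_s(size_mb=64, block_kb=1024, directory=None):
    """Write size_mb MiB in block_kb KiB chunks, fsync, and return MiB/s.

    os.fsync forces the data onto the device, so the figure reflects the
    disk rather than the page cache. The temporary file is removed afterwards.
    """
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as fh:
            start = time.perf_counter()
            for _ in range(blocks):
                fh.write(block)
            fh.flush()
            os.fsync(fh.fileno())  # wait until the data is really on disk
            elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    written_mb = blocks * block_kb / 1024
    return written_mb / max(elapsed, 1e-9)
```

For example, `print(f"{write_throughput_mb_s(directory='/home/frappe'):.1f} MiB/s")` tests the volume holding the bench directory. A single sequential test says little about random I/O, which is most of what a database does.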

BKM

brian_pond · August 21, 2019, 3:27pm

I would not think Let’s Encrypt is the culprit. There should only be a very slight overhead from HTTPS.
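To put that overhead in numbers: before the first request on a new connection, TCP needs one round trip, and a full TLS handshake adds roughly two more with TLS 1.2 or one with TLS 1.3. The cost is per new connection, not per byte, so it is negligible to a nearby server but grows with round-trip time. A back-of-the-envelope Python sketch; the RTT values are illustrative assumptions, not measurements from this thread:

```python
def connection_setup_ms(rtt_ms, tls_round_trips):
    """Milliseconds before the first request can be sent on a new connection.

    One round trip for the TCP handshake, plus tls_round_trips for TLS:
    about 2 for a full TLS 1.2 handshake, 1 for TLS 1.3, 0 for plain HTTP.
    """
    return rtt_ms * (1 + tls_round_trips)


# Illustrative round-trip times only, not measurements from this thread.
for label, rtt in (("nearby server", 10), ("distant server", 150)):
    print(f"{label}: HTTP {connection_setup_ms(rtt, 0)} ms, "
          f"HTTPS/TLS 1.2 {connection_setup_ms(rtt, 2)} ms")
```

This prints 10 ms vs 30 ms for the nearby case and 150 ms vs 450 ms for the distant one. Keep-alive connections and TLS session resumption avoid paying the full handshake on every request, which is why the overhead usually stays small.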

Unfortunately, these types of performance issues can be time-consuming to debug. Could be server (CPU, Memory, Disk). Network bandwidth/latency. SQL database response time. Cache settings. A problem with a dependency. The list goes on.

Luckily, this sounds consistent, and not intermittent. After every restart, you always have a 30 second wait to open a Purchase Order? If it’s consistent, then it can be solved. But it means reading logs, doing some fancy debugging and monitoring, etc.

xickomesquita · August 21, 2019, 4:51pm

Hi bkm,

I had read this before… You can see my results…

And my InnoDB config:

Clearly, I have an issue to solve… My point is, I never ran this test before Let’s Encrypt, but login was almost instantaneous back then…

xickomesquita · August 21, 2019, 5:00pm

Brian, I understand what you are saying.

I’m using AWS, and while learning ERPNext I had to uninstall/terminate a lot of instances… All of them had almost instantaneous login.

When I had issues updating the system, I realized I needed more memory. So, as you can see, I increased the swap to 4GB.

After that, everything was fast, until I enabled HTTPS on my instance.

I’ll keep looking for fixes to improve the speed.

Thanks!

brian_pond · August 21, 2019, 5:33pm

Well, there is an easy way to test if Let’s Encrypt is the problem.

First, make a backup copy of your current nginx.conf file.
Edit your nginx.conf. Comment-out (#) all the lines that implement TLS/SSL, certificates, etc.
Change your server block from 443 to 80.
Restart nginx.

That should be all that’s necessary. Nginx will continue to act as a reverse proxy to gunicorn. And continue serving static content. It will just do so normally, under HTTP via 80.

Then you can re-test the performance. And see if it’s better, worse, or no different.
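The steps above can be sketched in Python. This is a rough, hypothetical helper (`strip_tls`, `strip_tls_in_file`), not a bench command: it backs up the file, comments out every line whose directive is `ssl` or starts with `ssl_` (for example `ssl_certificate`), and rewrites `listen ... 443 ...;` to port 80 while dropping the `ssl` and `http2` flags. It assumes a simple one-directive-per-line config; if the file also contains a separate port-80 server block that redirects to HTTPS, that block must be commented out by hand, otherwise nginx warns about a conflicting server name. Review the result and run `sudo nginx -t` before restarting.

```python
import re
import shutil

# Lines whose directive is `ssl` or starts with `ssl_` (ssl_certificate, ssl_protocols, ...).
SSL_LINE = re.compile(r"^(\s*)(ssl\b|ssl_)")
# `listen 443 ...;` or `listen [::]:443 ...;`: capture the prefix and any extra parameters.
LISTEN_443 = re.compile(r"(listen\s+(?:\S*:)?)443\b([^;]*);")


def _listen_on_80(match):
    """Rewrite one listen directive to port 80, dropping the ssl/http2 flags."""
    prefix, params = match.group(1), match.group(2)
    kept = [p for p in params.split() if p not in ("ssl", "http2")]
    return prefix + "80" + "".join(" " + p for p in kept) + ";"


def strip_tls(conf_text):
    """Return conf_text with TLS directives commented out and 443 listeners moved to 80."""
    out = []
    for line in conf_text.splitlines(keepends=True):
        ssl = SSL_LINE.match(line)
        if ssl:
            indent = ssl.group(1)
            out.append(indent + "# " + line[len(indent):])
        else:
            out.append(LISTEN_443.sub(_listen_on_80, line))
    return "".join(out)


def strip_tls_in_file(path):
    """Back up path to path + '.tls-backup', then rewrite it in place."""
    shutil.copy2(path, path + ".tls-backup")
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(strip_tls(text))
```

To go back to HTTPS after the test, copy the `.tls-backup` file over the edited one and restart nginx.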

bkm · August 21, 2019, 8:04pm

Ok, what I see is a reversal in your memory allocations. According to the test results you displayed, you have a little less than 1GB of regular memory and 4GB of swap space.

By running your system in this manner you have essentially no room in memory for the database caching that needs to take place. You only have 1GB to work with and that is really small. I have never had any luck with servers running less than 2GB of regular memory and things don’t really start moving at reasonable speeds until I have 4GB or more of regular memory. The SWAP memory that I use is only 1GB and that is more than sufficient even on my largest servers (10GB of memory or more).

SWAP memory is actually disk drive space, and since you only have a 16.2GB disk and you allocated 4GB of that to SWAP memory, you essentially have no disk space to work with either! That is another choke point for your speed issues.

Since SWAP memory uses the disk drive to act like regular memory, it operates at only 10% of the speed (or even less) of regular memory. This is another potential source of your performance issues.

I have never run anything on AWS servers, but I have run on Google servers that are very similar, and I spent a bunch of time working through these kinds of performance issues until I figured out the right minimum combination of resources.

I think if you reversed your memory allocations to 4GB of real memory and 1GB of swap space you would have a tremendous improvement in performance.
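To check the RAM/swap split discussed here on a Linux server, you can read `/proc/meminfo`, the same source the `free` command uses. A minimal Python sketch; `parse_meminfo` and `memory_summary` are hypothetical helper names. Swap that is actually in use, rather than merely configured, is the sign that the server is short of RAM.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Return (RAM, available RAM, swap, swap in use), all in GiB."""
    kib_per_gib = 1024 * 1024  # /proc/meminfo "kB" values are KiB
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return (
        info.get("MemTotal", 0) / kib_per_gib,
        info.get("MemAvailable", info.get("MemFree", 0)) / kib_per_gib,
        info.get("SwapTotal", 0) / kib_per_gib,
        swap_used / kib_per_gib,
    )


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        ram, available, swap, swap_used = memory_summary(parse_meminfo(fh.read()))
    print(f"RAM {ram:.2f} GiB (available {available:.2f}), "
          f"swap {swap:.2f} GiB ({swap_used:.2f} in use)")
```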

Just my experience and opinion.

As always… Your mileage may vary…

BKM

xickomesquita · September 5, 2019, 1:54pm

Hi… After a few days, I switched from the free AWS tier to a Vultr VPS… Now I have 4GB of regular memory and 4GB of swap…

The speed is OK now, but the caching delay on first use is still noticeable… I’m now checking all the configs to make sure everything is in place after the server change… any hints?

Thanks

xickomesquita · June 24, 2020, 11:22pm

Update… The solution is Cloudflare!

bkm · June 25, 2020, 8:58pm

Can you elaborate a bit on how, or what, Cloudflare did to help in your quest for performance?

BKM

xickomesquita · June 26, 2020, 3:32am

Hi,

I hired Vultr, with 4GB of memory… set up a large swap, but even so, I had a big lag on every click… I did some research and realized that I’m in Brazil but had hired a service in the US (we don’t have good, cheap options over here, as far as I know), and my performance was awful…

After setting up Cloudflare, my instance was ‘copied’ to a São Paulo server by Cloudflare, and the system’s performance got much better…
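The distance effect described here can be checked by timing TCP connections from where the users are, since opening a TCP connection takes roughly one network round trip. A minimal Python sketch; `tcp_connect_ms` is a hypothetical helper name, and connect time is only a rough proxy for latency (DNS is resolved once up front and not timed):

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port=443, samples=5, timeout=5.0):
    """Median time in ms to open a TCP connection to host:port.

    A TCP connect takes roughly one network round trip, so this is a
    rough proxy for RTT. DNS is resolved once up front and not timed.
    """
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    timings = []
    for _ in range(samples):
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            start = time.perf_counter()
            sock.connect(addr)
            timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)
```

For example, running `tcp_connect_ms("erp.example.com")` (a placeholder hostname) from Brazil against a US-hosted server and against a server in São Paulo would be expected to show the difference directly. When Cloudflare’s proxy is enabled, the connection terminates at the nearest Cloudflare edge, so this measures the distance to that edge rather than to the origin server.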

I don’t know if this answers your question… If it doesn’t, please let me know!

bkm · June 26, 2020, 4:18pm

There is the answer I was hoping to see. Thank you!

I take a regional perspective in my search for servers all the time and no longer even consider servers that are not close to me. It was from that perspective that I made the mistake of assuming that others would naturally do the same.

The financial disparity between local servers and those at a greater distance had not crossed my mind. Again, thank you for the update.

BKM