What is the proper configuration for a server with 3000 users?

johnskywalker · August 2, 2020, 8:54am

We experienced downtime with our server when 2000+ users tried logging in simultaneously.

We upgraded our server

from
16 GB Memory
4 Core Processor
320 GB SSD Disk

to
32 GB Memory
8 Core Processor
640 GB SSD Disk

but it had no effect.

Next, we increased the workers to:
"gunicorn_workers": 62
"background_workers": 4

I would like to ask what experience other members have with this.

Edit: it went well for a couple of days and just today we had another crash.

SubhajitDey · August 2, 2020, 10:21am

The recommended number of gunicorn workers is 2 * num_cores + 1 (ERPNext Performance Tuning · frappe/erpnext Wiki · GitHub).

How did you come up with 62 gunicorn and 4 background workers?
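As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function name recommended_workers is made up for this example, and the result is only a starting point for tuning, not a guaranteed optimum.

```python
import os


def recommended_workers(num_cores: int) -> int:
    """Rule of thumb quoted above: 2 * num_cores + 1."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return 2 * num_cores + 1


# The two server sizes mentioned in this thread.
for cores in (4, 8):
    print(f"{cores} cores -> {recommended_workers(cores)} workers")

# The machine this script happens to run on (os.cpu_count() may return None).
detected = os.cpu_count() or 1
print(f"this machine: {detected} cores -> {recommended_workers(detected)} workers")
```

By this rule the upgraded 8-core server would start at 17 workers, well below 62.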

Regards,
Subhajit

johnskywalker · August 2, 2020, 4:09pm

Hello, we just used trial and error to arrive at 62 gunicorn workers. Is that too many?

SubhajitDey · August 3, 2020, 2:28am

Not sure. There are no definitive sizing guidelines so far, so trial and error is the only way.

Fred1 · August 3, 2020, 9:56am

@johnskywalker Is the database on a separate server, or is it on the same server with the configuration mentioned above? Also, is it a dedicated server or a shared VPS? I need this information to guide me in deploying something similar. Thanks.

trentmu · August 3, 2020, 12:58pm

Have you checked that your DB parameters are optimised?
Poor performance is often seen when the buffers on the DB system are insufficient, and also when OS disk writes are backed up with dirty cache buffers.

johnskywalker · September 11, 2020, 4:34pm

The database is on the same server, hosted on AWS.

kisg · September 11, 2020, 8:17pm

The first rule of optimization is that you need to measure the things you want to optimize. As others have already written, there are many places in an ERPNext deployment that can be optimized:

The Python servers (frontends and workers)
The MariaDB database servers
The Redis instances
The Node.js servers used for real-time communication

You should add monitoring for each of these services to find the bottlenecks in your system.

Also, if you are already running on Amazon (or any other cloud provider), you would probably be better off looking at their Kubernetes offering. With a Kubernetes-based deployment you get:

each of the above components can be managed and scaled independently
you can dynamically increase / reduce the number of running instances depending on the load on the system
you can easily add redundancy with automatic failover
if you configure Envoy or other similar L7 proxy / service mesh system to connect your ERPNext components, you get detailed statistics / monitoring / load balancing / TLS termination for “free”.

Joseph_Marie_Alba1 · September 13, 2020, 2:34am

Can you check whether your /tmp directory is disk-based ($ df /tmp)?
If it resides on a disk such as /dev/sda1, the gunicorn workers may hang for half a minute, according to Gunicorn's FAQ.

You may need to set --worker-tmp-dir /dev/shm as a gunicorn parameter in your supervisor.conf file.
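If you prefer to script the check, here is a small Python sketch that reads /proc/mounts (Linux only) and reports the filesystem type behind a directory. The helper name fs_type_for is made up for this example. A result of tmpfs means the directory is memory-backed; something like ext4 or xfs means it is on disk.

```python
import os


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the deepest mount point containing path."""
    path = os.path.abspath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same mount point shadow an earlier one.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


if os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as handle:
        mounts = handle.read()
    for candidate in ("/tmp", "/dev/shm"):
        print(candidate, "->", fs_type_for(candidate, mounts))
```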

johnskywalker · September 13, 2020, 3:29pm

Thank you for this. I will look into it.

Joseph_Marie_Alba1 · September 13, 2020, 3:39pm

You may also try adding the --threads 10 parameter to the gunicorn line in supervisor.conf and lowering the number of gunicorn workers.

For a 4 core processor, the gunicorn parameters can be: -w 9 --threads 10

This means that you have 9 workers, each with a pool of 10 threads.
Effectively, you have 90 request slots instead of your current 62, while using less machine capacity.

(Workers = 2 x num_cores + 1, and threads = 2 to 4 x num_cores. For 4 cores, this means you can set up to 16 threads per worker.)
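The arithmetic above can be sketched in a few lines of Python. The helper names (thread_range, gunicorn_flags) are invented for this example; -w and --threads are the gunicorn flags already used in this thread. Treat the numbers as starting points to measure against, not as fixed limits.

```python
def thread_range(num_cores: int) -> tuple:
    """Threads per worker suggested above: 2 to 4 times the core count."""
    return 2 * num_cores, 4 * num_cores


def gunicorn_flags(num_cores: int, threads: int) -> list:
    """Build the -w / --threads flags for the rule of thumb in this post."""
    low, high = thread_range(num_cores)
    if not low <= threads <= high:
        raise ValueError(f"threads should be between {low} and {high}")
    workers = 2 * num_cores + 1
    return ["-w", str(workers), "--threads", str(threads)]


flags = gunicorn_flags(num_cores=4, threads=10)
slots = int(flags[1]) * int(flags[3])  # workers x threads per worker
print(" ".join(flags), "->", slots, "concurrent request slots")
```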

To set this, you have to edit the supervisor.conf file.

johnskywalker · September 14, 2020, 11:22pm

Thanks! Right now it's -w 9 --threads 120, and so far it's working fine.

Joseph_Marie_Alba1 · September 15, 2020, 2:15am

The threads of a worker all share that worker's single Python GIL.
So, at any moment, if one thread holds the GIL, the other threads of that worker have to wait.
If you notice your system slowing down, try lowering the number of threads.
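To see the effect for yourself, here is a small self-contained Python sketch, assuming a standard CPython build with the GIL. It runs the same CPU-bound function serially and then in a thread pool. On such a build the threaded run is usually no faster, although exact timings vary by machine. Threads that are waiting on I/O (for example a database query) release the GIL, which is why extra threads can still help request handling.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy(n: int) -> int:
    """CPU-bound work that holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


N, TASKS = 200_000, 8

serial, t_serial = timed(lambda: [busy(N) for _ in range(TASKS)])

with ThreadPoolExecutor(max_workers=TASKS) as pool:
    threaded, t_threaded = timed(lambda: list(pool.map(busy, [N] * TASKS)))

print(f"serial:   {t_serial:.3f}s")
print(f"threaded: {t_threaded:.3f}s")
```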

Fred1 · September 15, 2020, 7:25am

@Joseph_Marie_Alba1 I tried etc/supervisord.conf, but there is nothing in it. I then tried etc/supervisor/supervisord.conf, which has content but nothing about workers or threads. I can adjust the workers from the common site config, but there is no reference to threads there either. Please guide me to the directory where I can adjust the threads, because I need this to boost the speed of one of my sites.

Joseph_Marie_Alba1 · September 15, 2020, 7:55am

Hehehe. You are looking for the conf in the wrong places.
Are you in production mode? Did you install using Easy Install?

Fred1 · September 15, 2020, 8:17am

I am in production. Yes, I installed using Easy Install. Thanks.

Joseph_Marie_Alba1 · September 15, 2020, 8:55am

Assuming that you installed your frappe setup in the bench directory:
look for the supervisor.conf file under the config directory (its siblings are env, apps, sites, etc.).
In the supervisor.conf file, look for the gunicorn line.
Let me know how it goes.
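When reading that gunicorn line, note that in gunicorn's own CLI -t is the short form of --timeout (in seconds), while the thread count is set only with --threads. Here is a minimal Python sketch that pulls those values out of a command string. The sample command below is hypothetical (path, port, and numbers are placeholders), so compare it against your own file.

```python
import shlex

# Map gunicorn CLI flags to readable names.
FLAG_NAMES = {
    "-w": "workers",
    "--workers": "workers",
    "--threads": "threads",
    "-t": "timeout",
    "--timeout": "timeout",
    "--worker-tmp-dir": "worker_tmp_dir",
}


def gunicorn_settings(command: str) -> dict:
    """Extract selected gunicorn settings from a command line string."""
    tokens = shlex.split(command)
    found = {}
    for index, token in enumerate(tokens):
        flag, has_equals, value = token.partition("=")
        if flag not in FLAG_NAMES:
            continue
        if has_equals:  # form: --threads=10
            found[FLAG_NAMES[flag]] = value
        elif index + 1 < len(tokens):  # form: --threads 10
            found[FLAG_NAMES[flag]] = tokens[index + 1]
    return found


# Hypothetical example line; not copied from any real installation.
sample = (
    "/path/to/bench/env/bin/gunicorn -b 127.0.0.1:8000 "
    "-w 9 --threads 10 -t 120 frappe.app:application --preload"
)
print(gunicorn_settings(sample))
```

In this sample, 120 belongs to the timeout, not to the threads.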

Fred1 · September 15, 2020, 9:46am

I found the gunicorn line from the env directory. By default, the threads value I found there was 120. I have changed it to 24 because I am using a 6-core processor. The gunicorn workers value is 12 by default, and I left it at 12.
I will watch for improvement. Thanks.

Joseph_Marie_Alba1 · September 15, 2020, 9:53am

Actually, shouldn't supervisor.conf be in the config directory, not env?

Fred1 · September 15, 2020, 10:07am

But I got it from env with this command: sudo nano /etc/supervisor/conf.d/*.conf
Or am I doing something wrong?