We experienced downtime with our server when 2000+ users tried logging in simultaneously.
We upgraded our server from:
- 16 GB memory
- 4-core processor
- 320 GB SSD disk
to:
- 32 GB memory
- 8-core processor
- 640 GB SSD disk
but it had no effect.
What we did next was increase the workers: 62 gunicorn workers and 4 background workers.
I would like to ask about other members' experiences in this matter.
Edit: it went well for a couple of days, and just today we had another crash.
The recommended number of gunicorn workers is 2 * num_cores + 1 (https://github.com/frappe/erpnext/wiki/ERPNext-Performance-Tuning).
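A quick sketch of that 2 * num_cores + 1 rule in shell. The core count is hard-coded here for illustration; on a real server you would substitute the output of `nproc`:

```shell
# Sketch of the 2 * num_cores + 1 rule from the ERPNext wiki.
# "cores" is hard-coded for illustration; use `nproc` on a real server.
cores=4                     # e.g. the original 4-core machine
workers=$((2 * cores + 1))
echo "recommended gunicorn workers: $workers"
# -> recommended gunicorn workers: 9
```

By this rule, even the upgraded 8-core server would call for only 17 workers, so 62 is well above the recommendation.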
How did you come up with the numbers of 62 gunicorn and 4 background workers?
Hello, we arrived at 62 gunicorn workers by trial and error. Is it too much?
Not sure. There are no definitive sizing guidelines so far, so trial and error is the only way.
@johnskywalker is the database separated onto another server, or are they on the same server configuration mentioned above? Also, is it a dedicated machine or just a shared VPS? I need this information to guide me in deploying something similar. Thanks.
Have you checked that your DB parameters are optimised?
Poor performance is often encountered when the buffers on the DB system are insufficient, and also when OS disk writes are backed up with dirty cache buffers.
The database server is on the same server, hosted on AWS.
The first rule of optimization: you need to measure the things you want to optimize. As others have already written, there are many places in an ERPNext deployment that can be optimized:
- The Python servers (frontends and workers)
- The MariaDB database servers
- The Redis instances
- The Nginx servers used for real-time communication
You should add monitoring for each of these services to find the bottlenecks in your system.
Also, if you are already running on Amazon (or any other cloud provider), you would probably be better off looking at their Kubernetes offering. With a Kubernetes-based deployment you get:
- each of the above components can be managed and scaled independently
- you can dynamically increase / reduce the number of running instances depending on the load on the system
- you can easily add redundancy with automatic failover
- if you configure Envoy or other similar L7 proxy / service mesh system to connect your ERPNext components, you get detailed statistics / monitoring / load balancing / TLS termination for “free”.
Can you check whether your /tmp directory is disk-based? ($ df /tmp)
If it resides on a disk partition like /dev/sda1, the gunicorn workers may tend to hang for up to half a minute, according to Gunicorn's FAQ.
You may need to set --worker-tmp-dir /dev/shm as a gunicorn parameter in your supervisor.conf file.
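A minimal check for the situation described above. Per Gunicorn's FAQ, its worker heartbeat file lives in a temp directory, and if that directory is disk-backed, heavy disk I/O can make workers appear to hang:

```shell
# Check what backs /tmp. If the "Filesystem" column shows a disk device
# (e.g. /dev/sda1) rather than tmpfs, gunicorn's heartbeat file is
# hitting the disk, which is the hang scenario from Gunicorn's FAQ.
df /tmp
```

If it is disk-backed, add `--worker-tmp-dir /dev/shm` to the gunicorn command line in supervisor.conf (as noted above) so the heartbeat file lives in shared memory, then restart supervisor.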
thank you for this. will look into this
You may also try passing the --threads 10 parameter on the gunicorn line in supervisor.conf, and lowering the number of gunicorn workers.
For a 4 core processor, the gunicorn parameters can be: -w 9 --threads 10
This means that you have 9 workers, but each worker can pool 10 threads.
Effectively, you have 90 request slots instead of your current 62, while using less machine capacity.
(Workers = 2 x num_cores + 1, and threads = 2 to 4 x num_cores -> this means for 4 cores, you can set up to 16 threads per worker.)
To set this, you have to edit the supervisor.conf file.
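Put together, the gunicorn entry in supervisor.conf might look like the fragment below. The program name and paths are assumptions based on a typical bench layout and will differ on your install; only the -w and --threads flags are the point here:

```ini
; Hypothetical supervisor.conf fragment -- program name and paths are
; illustrative; only the -w / --threads flags matter here.
[program:frappe-bench-frappe-web]
command=/home/frappe/frappe-bench/env/bin/gunicorn -b 127.0.0.1:8000 -w 9 --threads 10 frappe.app:application
directory=/home/frappe/frappe-bench/sites
```

After editing, reload supervisor, e.g. `sudo supervisorctl reread && sudo supervisorctl update` (or `bench restart` from the bench directory).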
Thanks! Right now it's -w 9 --threads 120, and so far it's working fine.
The threads of a worker share a common Python GIL within that worker.
So, at any moment, if one thread holds the GIL, the other threads of that worker have to wait.
If you notice your system slows down, try lowering the threads.
@Joseph_Marie_Alba1 I tried /etc/supervisord.conf, and there is no such information. I also tried /etc/supervisor/supervisord.conf; there is information there, but nothing regarding workers and threads. I could adjust the workers from the common site config, but there is no reference to threads there either. Please guide me to the directory where I can adjust this thread setting, because I need it to boost the speed of one of my sites.
Hehehe. You are looking for the conf in the wrong places.
Are you on production mode? Did you install using Easy Install?
I am on production. Yes, I installed using Easy Install. Thanks.
Assuming that you installed your frappe setup in the bench directory:
look for the supervisor.conf file under the conf directory (its siblings are env, apps, sites, etc.)
In the supervisor.conf file, look for the gunicorn line.
Update me how it goes.
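The steps above can be sketched as a quick grep for the gunicorn line. The paths are assumptions based on a standard bench layout (the bench-generated file sits alongside env/, apps/, and sites/, and Easy Install production setups typically also link it into /etc/supervisor/conf.d/); the demo below uses a throwaway copy so it runs anywhere:

```shell
# Sketch: locate the gunicorn line to edit in supervisor.conf.
# Real paths (assumptions, adjust to your install):
#   ~/frappe-bench/config/supervisor.conf   <- bench-generated file
#   /etc/supervisor/conf.d/                 <- where production links it
# Demo uses a throwaway copy so this runs anywhere:
mkdir -p /tmp/demo-bench/config
cat > /tmp/demo-bench/config/supervisor.conf <<'EOF'
[program:frappe-bench-frappe-web]
command=/home/frappe/frappe-bench/env/bin/gunicorn -b 127.0.0.1:8000 -w 9 --threads 10 frappe.app:application
EOF
line=$(grep -n "gunicorn" /tmp/demo-bench/config/supervisor.conf)
echo "$line"
```

On your real server, run the same grep against your bench's supervisor.conf instead of the demo copy; the line it prints is the one to edit.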
From the env directory I found the gunicorn line. But by default the threads were set to 120. I adjusted that to 24 because I am using a 6-core processor; the gunicorn workers were 12 by default, which I left at 12.
I will watch for improvement. Thanks.
Actually, shouldn't supervisor.conf be in the conf directory, not env?
But I got it from env with this: sudo nano /etc/supervisor/conf.d/*.conf. Or am I doing something wrong?