We experienced downtime with our server when 2000+ users tried logging in simultaneously.
We upgraded our server from:
- 16 GB memory
- 4-core processor
- 320 GB SSD disk
to:
- 32 GB memory
- 8-core processor
- 640 GB SSD disk
but it had no effect.
What we did next was increase the workers: 62 gunicorn workers and 4 background workers.
I would like to ask about other members' experiences in this matter.
Edit: it went well for a couple of days, and just today we had another crash.
The recommended number of gunicorn workers is 2 * num_cores + 1 (https://github.com/frappe/erpnext/wiki/ERPNext-Performance-Tuning).
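A quick sketch of that 2 * num_cores + 1 rule in shell. The core count is hard-coded here for illustration; on a real server you would substitute the output of `nproc`:

```shell
# Sketch of the 2 * num_cores + 1 rule from the ERPNext wiki.
# "cores" is hard-coded for illustration; use `nproc` on a real server.
cores=4                     # e.g. the original 4-core machine
workers=$((2 * cores + 1))
echo "recommended gunicorn workers: $workers"
# -> recommended gunicorn workers: 9
```

By this rule, even the upgraded 8-core server would call for only 17 workers, so 62 is well above the recommendation.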
How did you come up with the numbers of 62 gunicorn and 4 background workers?
Hello, we arrived at 62 gunicorn workers by trial and error. Is it too much?
Not sure. There are no definitive sizing guidelines so far, so trial and error is the only way.
@johnskywalker is the database separated onto another server, or are they on the same server configuration mentioned above? Also, is it a dedicated machine or just a shared VPS? I need this information to guide me in deploying something similar. Thanks.
Have you checked that your DB parameters are optimised?
Poor performance is often encountered when the buffers on the DB system are insufficient, and also when OS disk writes are backed up with dirty cache buffers.
The database server is on the same server, hosted on AWS.
The first rule of optimization: you need to measure the things you want to optimize. As others have already written, there are many places in an ERPNext deployment that can be optimized:
- The Python servers (frontends and workers)
- The MariaDB database servers
- The Redis instances
- The Nginx servers used for real-time communication
You should add monitoring for each of these services to find the bottlenecks in your system.
Also, if you are already running on Amazon (or any other cloud provider), you would probably be better off looking at their Kubernetes offering. With a Kubernetes-based deployment you get:
- each of the above components can be managed and scaled independently
- you can dynamically increase / reduce the number of running instances depending on the load on the system
- you can easily add redundancy with automatic failover
- if you configure Envoy or other similar L7 proxy / service mesh system to connect your ERPNext components, you get detailed statistics / monitoring / load balancing / TLS termination for “free”.
Can you check whether your /tmp directory is disk-based? ($ df /tmp)
If it resides on a disk partition like /dev/sda1, the gunicorn workers may tend to hang for up to half a minute, according to Gunicorn's FAQ.
You may need to set --worker-tmp-dir /dev/shm as a gunicorn parameter in your supervisor.conf file.
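A minimal check for the situation described above. Per Gunicorn's FAQ, its worker heartbeat file lives in a temp directory, and if that directory is disk-backed, heavy disk I/O can make workers appear to hang:

```shell
# Check what backs /tmp. If the "Filesystem" column shows a disk device
# (e.g. /dev/sda1) rather than tmpfs, gunicorn's heartbeat file is
# hitting the disk, which is the hang scenario from Gunicorn's FAQ.
df /tmp
```

If it is disk-backed, add `--worker-tmp-dir /dev/shm` to the gunicorn command line in supervisor.conf (as noted above) so the heartbeat file lives in shared memory, then restart supervisor.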
thank you for this. will look into this
You may also try passing the --threads 10 parameter on the gunicorn line in supervisor.conf, and lowering the number of gunicorn workers.
For a 4 core processor, the gunicorn parameters can be: -w 9 --threads 10
This means that you have 9 workers, but each worker can pool 10 threads.
Effectively, you have 90 request slots instead of your current 62, while using less machine capacity.
(Workers = 2 x num_cores + 1, and threads = 2 to 4 x num_cores -> this means for 4 cores, you can set up to 16 threads per worker.)
To set this, you have to edit the supervisor.conf file.
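Put together, the gunicorn entry in supervisor.conf might look like the fragment below. The program name and paths are assumptions based on a typical bench layout and will differ on your install; only the -w and --threads flags are the point here:

```ini
; Hypothetical supervisor.conf fragment -- program name and paths are
; illustrative; only the -w / --threads flags matter here.
[program:frappe-bench-frappe-web]
command=/home/frappe/frappe-bench/env/bin/gunicorn -b 127.0.0.1:8000 -w 9 --threads 10 frappe.app:application
directory=/home/frappe/frappe-bench/sites
```

After editing, reload supervisor, e.g. `sudo supervisorctl reread && sudo supervisorctl update` (or `bench restart` from the bench directory).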
Thanks! Right now it's -w 9 --threads 120, and so far it's working fine.
The threads of a worker share a common Python GIL within that worker.
So, at any moment, if one thread holds the GIL, the other threads of that worker have to wait.
If you notice your system slows down, try lowering the threads.
@Joseph_Marie_Alba1 I tried /etc/supervisord.conf, and there is no such information. I also tried /etc/supervisor/supervisord.conf; there is information there, but nothing regarding workers and threads. I could adjust the workers from the common site config, but there is no reference to threads there either. Please guide me to the directory where I can adjust this thread setting, because I need it to boost the speed of one of my sites.
Hehehe. You are looking for the conf in the wrong places.
Are you on production mode? Did you install using Easy Install?
I am on production. Yes, I installed using Easy Install. Thanks.
Assuming that you installed your frappe setup in the bench directory:
look for the supervisor.conf file under the conf directory (its siblings are env, apps, sites, etc.)
In the supervisor.conf file, look for the gunicorn line.
Update me how it goes.
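The steps above can be sketched as a quick grep for the gunicorn line. The paths are assumptions based on a standard bench layout (the bench-generated file sits alongside env/, apps/, and sites/, and Easy Install production setups typically also link it into /etc/supervisor/conf.d/); the demo below uses a throwaway copy so it runs anywhere:

```shell
# Sketch: locate the gunicorn line to edit in supervisor.conf.
# Real paths (assumptions, adjust to your install):
#   ~/frappe-bench/config/supervisor.conf   <- bench-generated file
#   /etc/supervisor/conf.d/                 <- where production links it
# Demo uses a throwaway copy so this runs anywhere:
mkdir -p /tmp/demo-bench/config
cat > /tmp/demo-bench/config/supervisor.conf <<'EOF'
[program:frappe-bench-frappe-web]
command=/home/frappe/frappe-bench/env/bin/gunicorn -b 127.0.0.1:8000 -w 9 --threads 10 frappe.app:application
EOF
line=$(grep -n "gunicorn" /tmp/demo-bench/config/supervisor.conf)
echo "$line"
```

On your real server, run the same grep against your bench's supervisor.conf instead of the demo copy; the line it prints is the one to edit.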
From the env directory I found the gunicorn line. But by default the threads were set to 120. I adjusted that to 24 because I am using a 6-core processor; the gunicorn workers were 12 by default, which I left at 12.
I will watch for improvement. Thanks.
Actually, shouldn't supervisor.conf be in the conf directory, not env?
But I got it from env with this: sudo nano /etc/supervisor/conf.d/*.conf. Or am I doing something wrong?