[performance] Splitting Reads and Writes for better scalability

Of late we started receiving screenshots like the one below from our ground operations team:

Request Timed Out
While submitting a sales order, sales invoice, purchase order, purchase receipt, stock entry, etc.

On further investigation we realised that this problem occurred whenever someone ran the Stock Balance report, and we were able to reproduce it consistently thereafter.

The output of `mysqladmin -u root -p -i 2 processlist` showed that the report query was taking a long time, and the MySQL threads were in the following states most of the time:

Copying to tmp table
Sending data

While the Stock Balance report was fetching the required data from the database, all other transaction requests were queued and eventually timed out.

As a first measure, it made sense to segregate READ (reporting) and WRITE (transactional) calls and direct them to different databases, so that transactional requests have a clear passage and do not time out. The idea was to set up MySQL replication and divert all reporting calls to a replica database instead of the primary database.

Introducing @frappe.read_only()

A couple of weeks back we had a code sprint with the Frappe team at our office. We implemented support to divert `read-only` calls to the replica database. A new function decorator, `@frappe.read_only`, was added to mark methods which should use the replica database.

For example, the following method in frappe/desk/query_report.py is now decorated as below:

@frappe.whitelist()
@frappe.read_only()
def run(report_name, filters=None, user=None):
    report = get_report_doc(report_name)
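
Any custom whitelisted endpoint that only reads data can opt in the same way. A small illustrative sketch (the method name below is made up, not part of frappe or ERPNext):

import frappe

@frappe.whitelist()
@frappe.read_only()
def get_stock_summary(warehouse=None):
    # pure read: safe to serve from the replica database
    filters = {"warehouse": warehouse} if warehouse else {}
    return frappe.get_all("Bin", filters=filters, fields=["item_code", "actual_qty"])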

Two new attributes, `slave_host` and `use_slave_for_read_only`, were added in site_config.json to enable and configure the replica database:

{
 "db_name": "c6b2c772b91fd3d8",
 "db_password": "agYCYGfvpfrBzfDe",
 "slave_host": "replica01.ntex.com",
 "use_slave_for_read_only": 1
}
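
A quick way to confirm that the site picked up the new keys is from `bench --site your-site console` (the site name is a placeholder):

import frappe

# both values come straight from site_config.json
print(frappe.conf.slave_host)                 # replica01.ntex.com
print(frappe.conf.use_slave_for_read_only)    # 1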

The `@frappe.read_only()` decorator wraps the method call and switches the database connection to the replica instead of the main database, as follows:

frappe/__init__.py

def read_only():
    def innfn(fn):
        def wrapper_fn(*args, **kwargs):
            if conf.use_slave_for_read_only:
                connect_read_only()  # connects to the replica instead of the primary
            return fn(*args, **kwargs)
        return wrapper_fn
    return innfn

With this change, our deployment architecture looked like below -

[Deployment architecture diagram - Page 1]
This helped us alleviate problems on the database side. However, we were still sharing the same Python environment for report and transaction calls. All HTTP requests were still piling up on a single gunicorn server.

We introduced another Python runtime and updated the deployment architecture as follows:

[Deployment architecture diagram - Page 2]

At the Nginx web server we did URL-based routing and diverted all report calls to the secondary ERPNext server. Nginx config snippet below:

location / {
    if ( $arg_cmd = "frappe.desk.reportview.export_query" ) {
        proxy_pass http://secondary.ntex.com;
        break;
    }
    if ( $arg_cmd = "frappe.desk.query_report.run" ) {
        proxy_pass http://secondary.ntex.com;
        break;
    }
    proxy_pass http://primary.ntex.com;
}
Since Frappe maintains user session information in a database table, it was replicated to the replica server as well, and users remained signed in no matter which ERPNext server instance they were routed to.

Ideally there should not be any transactional calls from the secondary ERPNext, but on every HTTP request Frappe updates the user's session in a database table. This update must go to the primary database itself, since we were doing only one-way replication.
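
Conceptually, the session write has to be redirected back to the primary connection. A rough sketch of that idea, assuming the read-only setup keeps the original connection around as frappe.local.primary_db (the attribute name and the SQL below are assumptions for illustration, not verified frappe internals):

import frappe

def touch_session_on_primary(sid):
    # if this request was served from the replica, switch back to the primary
    # connection so the session update never hits the read-only database
    primary = getattr(frappe.local, "primary_db", None)
    if primary is not None:
        frappe.local.db = primary

    frappe.db.sql("update `tabSessions` set lastupdate = NOW() where sid = %s", (sid,))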

We also locked down the database user's permissions on the replica DB to allow only read-only operations, to avoid any accidental updates to the replica database.

Hope the above commentary helps anyone looking to scale up their ERPNext deployments.

How many gunicorn workers were you running before this became an issue? Was that the source of the slow-running requests?

We are running 4 web workers and 6 background workers in the transactional system, and 4 web workers and 0 background workers in the reporting system.

One can also use this architecture to tune the reporting servers for heavy calculations, since reporting can indeed become a CPU-intensive task, while keeping a general-purpose server for the transactional system.

The next thing I was thinking was to split background job execution between two Python runtimes. Our main runtime is used almost 24 hours a day.

We can annotate or configure background jobs to be scheduled on different servers by introducing a doctype called Scheduler Config.

It could also be a static JSON file or an extension to site_config.json.

We could then add new scheduler config records at runtime, as follows:

{
  "job": "erpnext.stock.reorder_item.reorder_item",
  "server_id": "primary"
}

{
  "job": "erpnext.setup.doctype.email_digest.email_digest.send",
  "server_id": "secondary"
}

While scheduling a job, the scheduler would look up the config record, match the server_id against the SERVER_ID environment variable, and either schedule or skip the job. If no scheduler config is found, it would run the job on the primary server by default. A rough sketch of the idea is below.
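
This is a hypothetical sketch of the proposed lookup; SCHEDULER_CONFIG mirrors the JSON records above, and none of this exists in frappe today, it only illustrates the idea:

import os

# maps job path -> server that should run it (mirrors the proposed config records)
SCHEDULER_CONFIG = {
    "erpnext.stock.reorder_item.reorder_item": "primary",
    "erpnext.setup.doctype.email_digest.email_digest.send": "secondary",
}

def should_run_here(job_name):
    this_server = os.environ.get("SERVER_ID", "primary")
    target = SCHEDULER_CONFIG.get(job_name)
    if target is None:
        # no config record for this job: run it only on the primary server by default
        return this_server == "primary"
    return target == this_server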

This way we can split the background jobs between available servers.

While using the above implementation, we found out that both servers were using different caches, causing differences in results for functions that use the Redis cache (frappe.cache).

To fix it, we made a few tweaks to point the secondary server at the primary server's Redis. To do that:

  1. Changed bind_address in redis_cache and redis_socketio on the primary server to the LAN IP (in our case we used 0.0.0.0) instead of 127.0.0.1. (Make sure not to use 0.0.0.0 on a server with a public IP; use the LAN ethernet IP instead.)
  2. Changed the redis_cache and redis_socketio hosts in common_site_config.json on the secondary server to point to the master server:
"redis_cache": "redis://localhost:13000", 
"redis_socketio": "redis://localhost:12000", 

to

"redis_cache": "redis://master_server_ip:13000", 
"redis_socketio": "redis://master_server_ip:12000", 

This way, all servers use a single Redis instance, which resolved the stale-cache problem.
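
A quick way to check the shared cache from a bench console on each server (the key name below is made up for the test):

import frappe

# on the primary server
frappe.cache().set_value("replica_cache_check", "hello-from-primary")

# on the secondary server; should print "hello-from-primary" if both
# servers point at the same Redis instance
print(frappe.cache().get_value("replica_cache_check"))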

Taking this further, we made a few more tweaks to increase the number of workers and spread them across multiple servers as well. The following changes were made:

  1. Changed the bind address in the redis_queue config file as well, to the LAN IP or 0.0.0.0.
  2. Changed common_site_config.json on the secondary server to point to the master server's IP:
"redis_queue": "redis://localhost:11000", 

to

"redis_queue": "redis://master_server_ip:11000", 

Along with this, we had to disable the scheduler on the secondary server by removing the frappe-schedule entry from supervisor.conf. After that, reloading and restarting supervisor did the trick.

We removed the entry below from supervisor.conf on the secondary server:

[program:frappe-bench-frappe-schedule]
command=/usr/local/bin/bench schedule
priority=3
autostart=true
autorestart=true
stdout_logfile=/part2/workspace/pr/frappe-bench/logs/schedule.log
stderr_logfile=/part2/workspace/pr/frappe-bench/logs/schedule.error.log
user=himanshu
directory=/part2/workspace/pr/frappe-bench

Edit: After this, we came across a third issue: keeping both instances' site folders in sync with each other, so that changes to the site config, as well as any newly created files, would be propagated to the secondary.

For this, rsync was the solution. We set up a cron job to sync the sites folder using rsync, by adding the following entry to the crontab on the secondary server:

* * * * * rsync -rakv frappe@master_server_ip:~/frappe-bench/sites/my_site_folder ~/frappe-bench/sites/ >> ~/frappe-bench/logs/rsync.log

Now any changes made to the primary's site configs are reflected on the secondary within a minute (very acceptable for planned deployments).

Do I have to back up and restore the main database to the replica database, or will it replicate automatically after providing the necessary attributes?

Google "MySQL replication". Basically, we have set up real-time replication from the primary database server to the secondary.

Here is a good video on working with replication using MariaDB. It’s introductory, but it provides sufficient information on this topic. I hope it helps.

Dumb question. Can I configure this on v11 or v12, or is this a custom-made configuration?

Has anyone faced an issue where the system tries to update the session on the slave database?

V11 onwards

Excellent, thanks. One query: is the above architectural setup different from what Frappe now provides out of the box here: setup-read-from-secondary-db?

No. It is the same feature and design provided out of the box post v11.

Thank you Sachin_Mane for the quick response. Highly appreciate it.

Some parts not working as expected 2021-09-08

We tried this as well and it is quite inconsistent. If your code references ERPNext reports (which have @frappe.read_only()), and in your own code you try to do writes, they will fail because your code will attempt the writes against a MariaDB server with read-only set.
We tried using @frappe.write_only() without any success.