[FYI/"Workable"] Stock Entry (STE-) submit timeouts in Production

I previously created this post. This is a continuation, and possibly/hopefully an improvement - on that one.

I use…
ERPNext: v10.1.81 (898977b) (v10.x.x)
Frappe Framework: v10.1.71-8 (92dc1e0) (v10.x.x)
on a Debian stretch 9.x manually installed setup - i7 with 16GB RAM on an SSD
The DB parameters are “optimised” as much as possible within my knowledge for performance
The hostnames are in the /etc/hosts file to ensure it isn’t a DNS issue

The “user” PC is a debian LXDE i5 with 16GB RAM using chromium browser (behaviour is same in FF)

Problem:

  • When I submit certain (weirdly specific) STE requests, they would timeout. No error…just a timeout, and the doctype doesn’t get submitted.

Method:

  • I changed the timeouts, increased all sorts of parameters and retried lots of times.

Result:

  • A workable workaround which (for my scenario) is adequate.
    • Not ideal, but OK.

Observations:

  • Although I now have 8x gunicorn threads, only ONE seems to be doing any work.
  • The CPU is an i7 - only 1 of its cores appears to be working (~90% in top, but in the GUI it shows ~10-13%)
  • As I submit more STE-#'s, it does appear to get marginally quicker on each subsequent submit
    • possibly it still has cached copies of some of the code or something??
  • While the mem usage is relatively low, the response is also low.
    • eg. when the in use RAM is ~3GB for the overall system, there are a lot more timeouts.
    • When the overall system is >5GB, there are less timeouts, and the submits are (quite a lot) quicker?
    • If I set up several tabs on the browser, each “ready” to save/submit, and then submit the next one the moment the done “ting” sounds, then there are less timeouts too
      • this also leads me to believe that there is a cache effect somewhere
  • The timeout message appears after 2mins - no matter what I use for the -t parameter??

Adjusted parameters via

nano ~/frappe-bench/config/supervisor.conf

  • Increased the number of worker threads to 8 (from default of 2)
  • Increased timeout to 360s (from default of 60s. I used 360s initially, and this worked on some STE’s but there is one particular BOM type that still fails, so I upped it to 480[fail], 600[fail] and eventually what worked was 900 for those)
  • Changed listener to 0.0.0.0 (from default of 127.0.0.1) to ensure that I access whatever port feels like listening, just in case of DNS type annoyances as outlined in the one section of the old post…

and updated/reloaded it with

sudo supervisorctl reread
sudo systemctl restart supervisor

I now have

ps -ef | grep guni

/home/frappe/frappe-bench/env/bin/python /home/frappe/frappe-bench/env/bin/gunicorn -b 0.0.0.0:8000 -w 8 -t 360 frappe.app:application --preload --limit-request-line 0

This allows me to (albeit slowly) submit them correctly. So, to summarise, what is working is as follows:

  • Save/Submit the STE
  • When it times out…just LEAVE IT untouched…it continues in the BG
  • If the timeouts are adequate, then it will finish its submit process - as long as you don’t mess with it.
  • The CPU utilisation will remain at the ~10%
    • depends what you have in there - (i7 ~10%, i5 ~35% etc) until the process finishes or properly times out or bombs
    • If the CPU drops back to very low (ie <5%) and the screen doesn’t yet show “Submitted”, then the timeout was NOT adequate, and the process is incomplete
      • you’ll need to increase the timeouts further, and restart the supervisor processes like this
        • nano ~/frappe-bench/config/supervisor.conf and increase the -t 60 to a higher number, like say 900 for the horrendously slow stuff
        • sudo systemctl restart supervisor
          • I found that using this
            ⇒ supervisorctl reread
            ⇒ supervisorctl stop all
            ⇒ supervisorctl start all
            • did not load the new/adjusted parameters, while restarting the entire process did
  • The “workable” submit - which eventually finished after ~13mins of nothingness
    • The 3rd submit attempt took “only” 6mins to complete, and the 4th one took only 5mins, so definitely got much faster

I do NOT know why this happens, nor how to actually fix it.

  • hopefully someone who knows what’s what will see these notes/observations, have a eureka moment and change something in the mainstream code which can address/fix it properly
    • I have looked at Submit Stock Entry taking too long time which describes a similar problem, in that the transactions are backdated, but I am not using a script like in that post, and don’t know how to do such things (yet).
      • There is an observation made in that same post about the fact that STE is uniquely slow
      • The post is from 2016, so I’m guessing that things in the code have changed - but apparently not quite enough just yet

Here are some pretty pictures to show the observations

Hopefully this will be helpful. Comments/clues/help welcomed if there is a better way.

1 Like

You have dug around lots and uncovered many prospective parameters at play -

This here Performance issue with v10 pos notes MySQL analysis tools documented here:

https://dev.mysql.com/doc/refman/8.0/en/show-processlist.html
https://dev.mysql.com/doc/refman/8.0/en/sys-processlist.html
https://dev.mysql.com/doc/refman/8.0/en/thread-information.html
https://dev.mysql.com/doc/refman/8.0/en/threads-table.html

They may help learn precisely what query or queries are spending their time that may point you to the holdup?

frappe@ubuntu:~/frappe-bench$ mysql -u root -p
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 348420
Server version: 10.2.13-MariaDB-10.2.13+maria~xenial-log mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> SELECT * FROM performance_schema.threads\G
Empty set (0.08 sec)

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| 1bd3e0294da19198   |
| d56cb677eaab3383   |
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
5 rows in set (0.03 sec)

MariaDB [mysql]> use 1bd3e0294da19198;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [1bd3e0294da19198]> SELECT * FROM performance_schema.threads\G
Empty set (0.00 sec)

MariaDB [1bd3e0294da19198]> show processlist;
+--------+-------------+-----------+------------------+---------+------+--------------------------+------------------+----------+
| Id     | User        | Host      | db               | Command | Time | State                    | Info             | Progress |
+--------+-------------+-----------+------------------+---------+------+--------------------------+------------------+----------+
|      1 | system user |           | NULL             | Daemon  | NULL | InnoDB purge coordinator | NULL             |    0.000 |
|      3 | system user |           | NULL             | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|      2 | system user |           | NULL             | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|      4 | system user |           | NULL             | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|      5 | system user |           | NULL             | Daemon  | NULL | InnoDB shutdown handler  | NULL             |    0.000 |
| 348420 | root        | localhost | 1bd3e0294da19198 | Query   |    0 | init                     | show processlist |    0.000 |
+--------+-------------+-----------+------------------+---------+------+--------------------------+------------------+----------+
6 rows in set (0.04 sec)

MariaDB [1bd3e0294da19198]> SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST;
+--------+-------------+-----------+------------------+---------+------+--------------------------+----------------------------------------------+---------+-------+-----------+----------+-------------+---------------+----------+----------------------------------------------+------+
| ID     | USER        | HOST      | DB               | COMMAND | TIME | STATE                    | INFO                                         | TIME_MS | STAGE | MAX_STAGE | PROGRESS | MEMORY_USED | EXAMINED_ROWS | QUERY_ID | INFO_BINARY                                  | TID  |
+--------+-------------+-----------+------------------+---------+------+--------------------------+----------------------------------------------+---------+-------+-----------+----------+-------------+---------------+----------+----------------------------------------------+------+
| 348420 | root        | localhost | 1bd3e0294da19198 | Query   |    0 | Filling schema table     | SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST | 9.992   |     0 |         0 | 0.000    |       85296 |             0 |  3039715 | SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST | 2725 |
|      5 | system user |           | NULL             | Daemon  |    0 | InnoDB shutdown handler  | NULL                                         | 0.000   |     0 |         0 | 0.000    |       24360 |             0 |        0 | NULL                                         | 2543 |
|      4 | system user |           | NULL             | Daemon  |    0 | InnoDB purge worker      | NULL                                         | 0.000   |     0 |         0 | 0.000    |       24360 |             0 |        0 | NULL                                         | 2539 |
|      2 | system user |           | NULL             | Daemon  |    0 | InnoDB purge worker      | NULL                                         | 0.000   |     0 |         0 | 0.000    |       24360 |             0 |        0 | NULL                                         | 2537 |
|      3 | system user |           | NULL             | Daemon  |    0 | InnoDB purge worker      | NULL                                         | 0.000   |     0 |         0 | 0.000    |       24360 |             0 |        0 | NULL                                         | 2538 |
|      1 | system user |           | NULL             | Daemon  |    0 | InnoDB purge coordinator | NULL                                         | 0.000   |     0 |         0 | 0.000    |       24360 |             0 |        0 | NULL                                         | 2536 |
+--------+-------------+-----------+------------------+---------+------+--------------------------+----------------------------------------------+---------+-------+-----------+----------+-------------+---------------+----------+----------------------------------------------+------+
6 rows in set (0.02 sec)

This Slow Query Log utility may shed light on the problem https://mariadb.com/kb/en/library/slow-query-log-overview/

Just one of your 8x gunicorn workers does the work, presumably that is all that is required to respond to the front end HTTP requests and other housekeeping tasks.

This Record Lock Issue Version 4 lists the output from the query SHOW ENGINE INNODB STATUS\G that apparently concerns a record locking issue - perhaps look into that in your case - you stated one of your BOMs was particularly slow.