Server Memory Issues

Hi,

I seem to be encountering some issues with memory on my ERPNext server. Below is the output of the ‘free’ utility over a 20-minute period (at roughly 5-minute intervals):

      total        used        free      shared  buff/cache   available
Mem:           7982        4770        1600           8        1611        3105
Swap:             0           0           0
ubuntu@ip-172-31-1-158:~$ free -m
      total        used        free      shared  buff/cache   available
Mem:           7982        4971        1592           8        1418        2907
Swap:             0           0           0
ubuntu@ip-172-31-1-158:~$ free -m
      total        used        free      shared  buff/cache   available
Mem:           7982        3515        4383           8          82        4364
Swap:             0           0           0
ubuntu@ip-172-31-1-158:~$ free -m
      total        used        free      shared  buff/cache   available
Mem:           7982        7885          45           8          50           7
Swap:             0           0           0

As shown, the server practically ran out of its 8 GB of memory during a period when only 1 user (me) was accessing any of the sites! It just seems to keep fluctuating significantly. I would really appreciate some guidance here.
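For reference, the same sampling can be done in a single command rather than by re-running free manually (the interval and count below are just an example):

free -m -s 300 -c 5   # print memory in MB every 300 seconds, 5 times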

Thanks

@wale share a list of processes that are using your memory
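Something like this should do it (a sketch using standard ps options, sorted by memory usage):

ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -n 10   # ten largest processes by memory share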

Hi @rmehta

Good to hear from you. Please find below:

PID  PPID CMD                         %MEM %CPU
21111     1 /usr/bin/redis-server 127.0 54.2  9.3
 1579  1288 /usr/sbin/mysqld --basedir=  5.5  0.2
 1611  1027 /usr/bin/redis-server 127.0  0.7  0.0
24248  1608 /home/erpnext/frappe-bench/  0.5  8.0
21120  1027 /home/erpnext/frappe-bench/  0.4  0.3
 1608  1027 /home/erpnext/frappe-bench/  0.4  0.1
24159  1608 /home/erpnext/frappe-bench/  0.3  0.0
24298 24297 /home/erpnext/frappe-bench/  0.3 36.0
12048  1027 /usr/bin/node /home/erpnext  0.3  0.0

Kind regards,

Hi @rmehta

Did you get a chance to view my response above? Looks like Redis is the major factor there. Any suggestions please?

Thanks

Seems very weird. Can you debug why Redis is eating so much memory? I would have assumed MariaDB to be the culprit.

Hi @rmehta, could you give me some guidance on how to go about this?

…Also, is there a recommended maxmemory value for Redis?

Memory info for Redis:

# Memory
used_memory:2144019240
used_memory_human:2.00G
used_memory_rss:2165862400
used_memory_peak:2144072488
used_memory_peak_human:2.00G
used_memory_lua:36864
mem_fragmentation_ratio:1.01
mem_allocator:jemalloc-3.6.0
(4.53s)

This is after setting maxmemory to 2gb. The MariaDB version is 10.0.30.
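For reference, this is roughly how the numbers above were pulled and where the limit is set (the port and file path are from my setup and may differ on yours):

redis-cli -p 13000 INFO memory          # 13000 is the cache instance port in config/redis_cache.conf here
# maxmemory is set in frappe-bench/config/redis_cache.conf, e.g.
#   maxmemory 2gb
#   maxmemory-policy allkeys-lru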

Wale, this Redis howto explains how to conduct a benchmark analysis to profile memory usage.

If you could profile and benchmark your setup and dig into the issue, identifying what you are doing and what the result is, that would help guide others in theorizing and formulating ideas. Thanks.
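A couple of concrete starting points for that profiling (a sketch; replace the port placeholder with the values from your bench config):

redis-cli -p <port> --bigkeys                                          # sample the keyspace and report the largest keys
redis-cli -p <port> INFO memory                                        # snapshot of used_memory, RSS and fragmentation
watch -n 60 'redis-cli -p <port> INFO memory | grep used_memory_rss'   # track RSS growth over time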

Hi @clarkej

Thanks a lot for your feedback. Redis actually seems to have gotten better after I upgraded MariaDB to 10.1.x.
The main issue now is ERPNext, which seems to gobble up memory whenever a list or form is left open on any of the sites. Please see the output of top below:

top - 17:49:48 up 1 day, 17:01,  1 user,  load average: 10.56, 6.47, 6.43
Tasks: 139 total,   1 running, 138 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 15.3 sy,  0.0 ni,  0.1 id, 84.4 wa,  0.0 hi,  0.0 si,  0.1 st
KiB Mem :  8173676 total,    48096 free,  8000808 used,   124772 buff/cache
KiB Swap:        0 total,        0 free,        0 used.     8360 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
20534 erpnext   20   0 6131568 5.766g   1328 D   0.3 74.0   1:05.42 python
 1111 redis     20   0  993224 905212   1292 D   0.3 11.1   3:30.93 redis-serv+
 1425 mysql     20   0 1089460 517424      0 S   0.4  6.3   5:18.33 mysqld
 4383 erpnext   20   0  199976 110528   1228 D   0.3  1.4   0:20.68 gunicorn
31531 erpnext   20   0  194016 104928   1436 S   0.1  1.3   0:24.28 gunicorn
 1521 erpnext   20   0  124872  86908    640 D   0.2  1.1   2:02.23 redis-serv+
 1518 erpnext   20   0  117432  29440    528 D   0.3  0.4   0:50.14 gunicorn
25870 erpnext   20   0  114104  28596    944 D   0.3  0.3   0:00.89 python
 1519 erpnext   20   0 1196640  24288      0 D   0.3  0.3   0:37.07 node
25881 erpnext   20   0   77436  19636   1632 S   0.0  0.2   0:00.29 bench
25889 erpnext   20   0   76484  15812   1832 D   0.3  0.2   0:00.33 python
 1056 root      20   0   58164  12572    836 D   0.3  0.2  16:29.83 supervisord
 1082 root      20   0  221784  11412      0 S   0.7  0.1   1:10.43 snapd
  341 root      20   0   43888  10072   3548 S   0.0  0.1   2:45.07 systemd-jo+
 1522 erpnext   20   0   38856   7140    756 D   0.1  0.1   1:29.60 redis-serv+
17708 www-data  20   0  493456   4196      0 S   0.3  0.1   0:23.98 apache2
17709 www-data  20   0  493520   4140      0 S   0.3  0.1   0:23.90 apache2

@rmehta any ideas? Could this be some kind of memory leak or is there some setting I should be looking at?

Kind regards,

Certainly Python cannot garbage-collect a list or form that is left open.

But circular-reference ‘memory leak’ cases must be identified before they can be fixed and garbage-collected, as noted here: 28.12. gc — Garbage Collector interface — Python 2.7.18 documentation

That is not my cup of soup to diagnose…

Hi @clarkej

Upon further investigation, it appears that ERPNext is eating up RAM regardless of what’s going on in the sites. It just continues to take up more and more RAM until the system kills the process!

Thanks

Still hoping for some assistance here…

Kind regards,

This is anecdotal, and I will try to get some hard facts. Yesterday I attempted to upgrade to version 8. It was a fresh install on Ubuntu 16.04 fully up to date, with a v6 DB imported and migrated.

I saw unusual memory usage, and pages were hanging/timing out in places. I ended up having to roll back.

If I have time today, I’ll try to glean some specific facts. For what it is worth, there seems to be some instability in the ERPNext stack at the moment.

-Jev

Thanks a lot @ebjorsell, I would really appreciate whatever info you can gather. This is quite a significant issue and we need to get it resolved ASAP.

Thanks for responding

Cheers!

Hi Wale,
I’m assuming you are in a production environment.
I’ve also recently been having the same issues.
Could you run this against the ports in your redis_queue.conf and redis_cache.conf files to monitor memory usage and report back?

redis-cli -h [host] -p [port] -r -1 -i 1 INFO | grep rss_human

The hosts and ports are located in the config folder inside your bench folder.
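For example (a sketch, assuming the default bench layout; adjust the path if your bench lives elsewhere):

grep -E '^(bind|port)' ~/frappe-bench/config/redis_cache.conf ~/frappe-bench/config/redis_queue.conf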

Regards

Hi @bomsy

Thanks for your response. I’m getting a ‘connection refused’ error for redis_cache.conf, and redis_queue.conf doesn’t give any output.

Any other ideas?

Kind regards,

OK, good - which process gets killed, and what kills it?

Either the frappe-bench/logs or /var/log/ should have details of your case.

Hi @clarkej

Good to hear from you. There are several different logs in the frappe-bench/logs directory. Which of them would hold the required info? The same goes for /var/log/.

Thanks

For a killed process, as you describe, I would say /var/log/syslog should confirm your case.

Similarly, any of the logs in frappe-bench/logs may give you more context on what ERPNext was doing at the time.
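If the kernel’s OOM killer is terminating the process, something along these lines should surface it (a sketch; exact log names vary by distribution):

grep -iE 'out of memory|oom-killer|killed process' /var/log/syslog
dmesg | grep -iE 'oom|killed process'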