The command bench update fails due to "redis_cache service is not running"

Hello,

We were adding images to the ERPNext File List for our website. The system reported that the files were too big and that the maximum size is 10 MB, although the files were actually much smaller: some were below 1 MB, one was around 3 MB and some were about 4 MB. After the system reported the file size, the website went offline (both the public and the admin part). The public side is showing the text
“Sorry!
We will be back soon.”

We went to the terminal, updated the Linux server and ran “bench restart”. The website came back and we started to upload more images. The upload failed instantly and the same thing happened again. After “bench restart” the website became visible again, including the admin part. The third time we tried to upload files, the site no longer came back with “bench restart”.

Then we tried to run “bench update”, but it proceeds until it fails with the following:

Patching sites…
Migrating [site name].
redis_cache service is not running.
Cannot run bench migrate without the services running.

Command “bench --version” gives:
5.14.3

We’ve killed redis server job that was running and then
sudo systemctl stop redis-server
sudo systemctl start redis-server
sudo systemctl reload redis-server

None of those helped. The redis_cache seems to be the issue.

We’ve also done the following:

bench setup requirements
bench update --reset
and then
bench migrate

This is a problem, since it all happened only after uploading files and this is a live website for a business! It has now been showing "Sorry! We will be back soon." for hours.

We just noticed that the website is online for a few seconds while the command “bench start” is running. When it ends with the lines below, the site goes offline with “Sorry” again.

20:36:46 redis_socketio.1 | 8363:M 10 Oct 2022 20:36:46.062 # Redis is now ready to exit, bye bye...
20:36:46 redis_cache.1    | 8379:M 10 Oct 2022 20:36:46.091 # User requested shutdown...
20:36:46 redis_cache.1    | 8379:M 10 Oct 2022 20:36:46.093 * Removing the pid file.
20:36:46 redis_cache.1    | 8379:M 10 Oct 2022 20:36:46.093 # Redis is now ready to exit, bye bye...
20:36:46 redis_queue.1    | 8362:M 10 Oct 2022 20:36:46.061 # User requested shutdown...
20:36:46 system           | redis_socketio.1 stopped (rc=-15)
20:36:46 system           | redis_cache.1 stopped (rc=-15)
20:36:46 redis_queue.1    | 8362:M 10 Oct 2022 20:36:46.061 * Removing the pid file.
20:36:46 redis_queue.1    | 8362:M 10 Oct 2022 20:36:46.063 # Redis is now ready to exit, bye bye...
20:36:46 system           | redis_queue.1 stopped (rc=-15)
20:36:46 system           | worker_short.1 stopped (rc=-15)
20:36:47 system           | web.1 stopped (rc=-15)

Hi,

Can you tell us the version of Frappe/ERPNext, the hosting environment and the amount of RAM?

bench version

bench_manager 0.0.1
erpnext 13.39.5
frappe 13.41.4

This is running on a server with Ubuntu 20.04 LTS, 25 GB of disk storage and 1 GB of RAM.

The redis server itself is running, but the redis_cache service is not.

sudo systemctl status redis-server
● redis-server.service - Advanced key-value store
     Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-10-10 20:56:18 UTC; 10h ago
       Docs: http://redis.io/documentation,
             man:redis-server(1)
    Process: 1102 ExecStart=/usr/bin/redis-server /etc/redis/redis.conf (code=exited, status=0/SUCCESS)
   Main PID: 1113 (redis-server)
      Tasks: 4 (limit: 1066)
     Memory: 1.3M
     CGroup: /system.slice/redis-server.service
             └─1113 /usr/bin/redis-server 127.0.0.1:6379

Oct 10 20:56:18 localhost systemd[1]: Starting Advanced key-value store...
Oct 10 20:56:18 localhost systemd[1]: redis-server.service: Can't open PID file /run/redis/redis-server.pid (ye>
Oct 10 20:56:18 localhost systemd[1]: Started Advanced key-value store.

Hello,

We realised that “bench start” is for development and “bench restart” is for production. We also realised that the “bench restart” command seems to have the user “frappe” hardcoded.

bench restart
$ supervisorctl restart frappe:
frappe: ERROR (no such group)
frappe: ERROR (no such group)
ERROR: 
Traceback (most recent call last): - - - 

We do not have a user called “frappe”, but we have our own frappe user. How do we run “bench restart” with a different frappe user?
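A note on the error above: in `supervisorctl restart frappe:` the trailing colon marks `frappe` as a supervisor process group rather than a Linux user, so "no such group" suggests that supervisor has no group of that name loaded. A minimal sketch for listing the groups supervisor does know about, assuming the usual `group:process` naming in `supervisorctl status` output (the helper name `supervisor_groups` is made up for illustration):

```shell
# Hypothetical helper: print the unique group names found in
# `supervisorctl status` output read from stdin.
# Grouped processes are listed as "group:process"; entries without a
# colon in the first field are ungrouped and are skipped.
supervisor_groups() {
    awk '$1 ~ /:/ { split($1, parts, ":"); print parts[1] }' | sort -u
}

# Usage (assumes supervisor is installed; sudo may be needed):
#   sudo supervisorctl status | supervisor_groups
```

If the list is empty, the supervisor config for the bench may not be loaded at all; regenerating it with `sudo bench setup supervisor`, followed by `sudo supervisorctl reread` and `sudo supervisorctl update`, is one thing to try.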

I think 1 GB of RAM is not enough. If possible, see if you can get at least 4 GB on the server and take it from there.

We should at least get it running. We added swap RAM already. Should I add more? The website has been running well for many months, we have added images earlier without issues and it is not under heavy load.

swapon -s
Filename				Type		Size	Used	Priority
/dev/sdb                               	partition	524284	0	-2
/swapfile                              	file    	524284	0	-3

As far as I know, Frappe needs more RAM, and Redis especially requires it.

swapon -show
Filename				Type		Size	Used	Priority
/myswapfile                            	file    	1048572	0	-2
/dev/sdb                               	partition	524284	0	-3

We’ve added a 1 GB swapfile. It should be possible to at least survive temporarily and bring up the website.

Since “bench restart” is hardcoded to use “frappe” as the user and our frappe user is different, how can we get the system up and running?
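To see how much swap is configured in total, the sizes in the listing above can simply be summed. A small sketch, assuming the `swapon -s` column layout shown above with sizes in KiB (`swap_total_kib` is a made-up name):

```shell
# Hypothetical helper: sum the Size column (3rd field, in KiB) of
# `swapon -s` style output read from stdin, skipping the header row.
swap_total_kib() {
    awk 'NR > 1 { total += $3 } END { print total + 0 }'
}

# Usage:
#   swapon -s | swap_total_kib
```

For the two entries listed above (1048572 and 524284) this gives 1572856 KiB, roughly 1.5 GiB.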

I’m guessing the problem is that Redis runs into an out-of-memory condition and can’t continue (just a guess). In a terminal, try redis-cli, then type MEMORY (tab) to look at the options for inspecting and maybe purging memory, and see if the info helps. Redis, to my knowledge, is not going to be able to use swap, but again, this isn’t something I know a lot about.

We flushed the redis memory, actually many times already.
When trying “bench update”:

Backup for Site site1.local has been successfully completed
Patching sites...
Migrating site1.local
redis_cache service is not running.
Cannot run bench migrate without the services running.

I can’t find any information anywhere on HOW to get redis_cache running! We have run “sudo systemctl start redis”, but that has no effect on this, and if we look at the status of redis, it is indeed running.

If you haven’t already, take a look at the redis files in the config folder of frappe-bench (or whatever your bench is called) and note the values. Then try bench setup redis and bench setup socketio and look at them again.
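One thing that may explain the mismatch: the `redis-server` unit shown by systemctl listens on 127.0.0.1:6379 (see the status output above), whereas Bench runs its own Redis instances from the files in the bench's `config` folder, usually on different ports. A minimal sketch for reading the configured cache port and pinging it, assuming the bench lives in `~/frappe-bench` and that `config/redis_cache.conf` contains a `port` line (the function name `redis_conf_port` is hypothetical):

```shell
# Hypothetical helper: print the value of the `port` directive
# from the Redis config file given as the first argument.
redis_conf_port() {
    awk '$1 == "port" { print $2; exit }' "$1"
}

# Usage (the path is an assumption; adjust it to your bench folder):
#   port=$(redis_conf_port ~/frappe-bench/config/redis_cache.conf)
#   redis-cli -p "$port" ping    # replies PONG if that instance is up
```

If the ping on that port fails while the one on 6379 succeeds, the system Redis is running but the Bench cache instance is not, which would match the message from `bench update`.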

We got transparent hugepages disabled. Looked here:

Then we adjusted some dirty-page and memory settings; these are in /etc/sysctl.conf:

fs.inotify.max_user_watches=262144
vm.overcommit_memory = 1
vm.swappiness=50
vm.dirty_background_ratio=40
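
Values in `/etc/sysctl.conf` only take effect after a reload (`sudo sysctl -p`) or a reboot, so it can be worth comparing the file with the live setting. A small sketch for reading a value back out of the file; the helper name `sysctl_conf_value` is hypothetical, and it accepts both the `key=value` and `key = value` spellings used above:

```shell
# Hypothetical helper: print the value assigned to a key in a
# sysctl-style file. $1 = file, $2 = key. Comment lines are ignored;
# if the key appears more than once, the last assignment is printed.
sysctl_conf_value() {
    awk -F= -v key="$2" '
        /^[ \t]*[#;]/ { next }
        {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) found = v
        }
        END { if (found != "") print found }
    ' "$1"
}

# Usage: compare the configured value with the live one.
#   sysctl_conf_value /etc/sysctl.conf vm.swappiness
#   sysctl -n vm.swappiness
```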

Still, “bench update” raises the “redis_cache service is not running.” error, and on the other hand “bench restart” stops because it looks for “frappe” as the frappe user and ours is different.

So, no matter how we adjust the conditions on the server, it has no effect until we can get the redis_cache service running and change the frappe user in the bench restart script.

We will go ahead and delete the whole frappe-bench. We have taken a local backup of the site database and config, plus some files. Then we will try “bench init” to bring back a better frappe-bench. Let’s see.

Hey, did you manage to get things working? I’d be happy to help out.

Supervisor manages the redis services for Bench and not systemctl. This may be a simple case of supervisor misbehaving.
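
Following that pointer, a quick way to see which Bench processes supervisor considers down is to filter its status listing. A minimal sketch, assuming the standard `name STATE details` layout of `supervisorctl status` output (the function name `supervisor_not_running` is illustrative):

```shell
# Hypothetical helper: from `supervisorctl status` output on stdin,
# print the name and state of every process that is not RUNNING.
supervisor_not_running() {
    awk 'NF >= 2 && $2 != "RUNNING" { print $1, $2 }'
}

# Usage (sudo may be needed depending on the supervisor socket permissions):
#   sudo supervisorctl status | supervisor_not_running
```

Any redis entry that shows up here (for example in a FATAL or STOPPED state) would be the one to restart through `supervisorctl` rather than through systemctl.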

Hello,
we have removed frappe-bench and installed it again with “bench init”. Now trying to restore it with

bench —-force —-site site1.local restore /site1.local/private/backups/20221010_182909-site1_local-database.sql.gz —-with-public-files /site1.local/public/files/ —-with-private-files /site1.local/private/files/

There is just something wrong with the above format, since it tells us to look at bench help. We had these files outside the bench folder when we deleted it, so we have restructured the folders and the files are there again. So, missing files are not an issue.

Just run the command as bench --site site1.local --force restore /site1.local/private/backups/20221010_182909-site1_local-database.sql.gz and then copy the files over into the right public/files and private/files folders inside the site folder accordingly.
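
When a command is pasted from a web page or a document, it is worth checking whether the double hyphens in front of the options were turned into typographic dashes, since bench will not recognise those as option prefixes. A minimal sketch of such a check (the function name `has_typographic_dash` is made up for illustration):

```shell
# Hypothetical helper: succeed (exit status 0) if the given string
# contains an em dash or an en dash, fail (exit status 1) otherwise.
has_typographic_dash() {
    case "$1" in
        *—*|*–*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   if has_typographic_dash "$cmd"; then
#       echo "retype the option dashes as plain ASCII hyphens"
#   fi
```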

I think we would have got it working otherwise now (the dashes were the wrong type), but now it says:

Site site1.local does not exist!

So, we thought the restore would bring back this part too. Should I add “site1.local” to some config first? (We have a file called “20221011_211618-site1_local-site_config_backup.json”.)

If you made a new bench, then it will not have any site in it. You’ll have to create a new site by running - bench new-site site1.local and then try restoring.

Depending on what you are using, you may also have to install erpnext (and other apps) on this bench.
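
Putting the last replies together, the order of operations on a fresh bench would look roughly like the sketch below. The function only prints the commands so they can be reviewed before running anything. The `version-13` branch is an assumption based on the erpnext 13.39.5 reported earlier in the thread, the final `migrate` step is an addition not mentioned above, and `print_restore_plan` is a made-up name:

```shell
# Hypothetical helper: print (not run) the commands for restoring a
# site backup into a freshly initialised bench.
# $1 = site name, $2 = path to the database backup (.sql.gz)
print_restore_plan() {
    site=$1
    backup=$2
    printf '%s\n' \
        "bench get-app erpnext --branch version-13" \
        "bench new-site $site" \
        "bench --site $site --force restore $backup" \
        "bench --site $site migrate"
}

# Usage, from inside the bench folder:
#   print_restore_plan site1.local /path/to/database.sql.gz
```

The uploaded files still need to be copied into the site's public/files and private/files folders afterwards, as described above.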