
Docker Swarm And Classic Bench Setup

Hello everyone,
First, I would like to thank Revant for the effort he has put into Dockerizing the classic bench setup. That really helped me understand the life cycle of the bench.
The stack concept is what interests me. In the classic bench setup, a code change only takes effect after the bench is restarted. That restart causes downtime for all sites, even if the code change was for one particular site.
The stack gives hope of isolating the worker/redis stacks of each site, so that in the event of an update only the targeted stack is restarted. This would prevent downtime for the other sites.

What would be the best approach to do that?

Ref:

For a single site to run it needs all the services that bench runs.

That makes 1 bench == 1 stack == 1 helm release

Steps in brief (this will also work in classic bench):

  • create 1st bench, install multiple sites.
  • create 2nd bench (updated code), no sites installed.
  • set pause_scheduler and maintenance_mode for site to be moved on 1st bench.
  • backup site to be moved on 1st bench
  • copy all files of the site to be moved from sites/ directory of 1st bench into 2nd bench, skip copying private/public files
  • bench --site … migrate on 2nd bench.
    • if succeeds:
      • move remaining site files from 1st bench to 2nd bench
      • delete site directory from 1st bench.
      • unset pause_scheduler and maintenance_mode on site on 2nd bench
      • redirect site request to 2nd bench: update k8s ingress | update nginx config on 2nd bench | change traefik labels.
    • if fails:
      • save the migration error log
      • restore the backed-up database on the 1st bench
      • delete stub site from 2nd bench.
      • unset pause_scheduler and maintenance_mode on site on 1st bench

Repeat this process site by site. It will result in 1 site going down at a time for upgrade.
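
The steps above can be sketched as a small shell function. This is only a sketch under assumptions: the bench paths and the site name are placeholders, it uses the bench commands `scheduler pause/resume` and `set-maintenance-mode on/off` to set and unset the two flags, and with DRY_RUN=1 (the default) it only prints what it would run. The database restore on failure is left as a message because the backup file name depends on the timestamp.

```shell
#!/usr/bin/env bash
# Sketch of the site-by-site move. Paths and site names are placeholders.
# DRY_RUN=1 (the default) prints the commands instead of running them.

# in_bench <bench-dir> <command...>: run a command from inside a bench directory.
in_bench() {
  local dir="$1"; shift
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ (cd $dir) $*"
  else
    (cd "$dir" && "$@")
  fi
}

# run <command...>: run a command that does not depend on the working directory.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# move_site <site> <1st-bench-dir> <2nd-bench-dir>
move_site() {
  local site="$1" old="$2" new="$3"

  # Freeze the site on the 1st bench and take a backup.
  in_bench "$old" bench --site "$site" scheduler pause
  in_bench "$old" bench --site "$site" set-maintenance-mode on
  in_bench "$old" bench --site "$site" backup

  # Copy the site directory without its private/ and public/ files.
  run rsync -a --exclude=/private --exclude=/public "$old/sites/$site/" "$new/sites/$site/"

  if in_bench "$new" bench --site "$site" migrate; then
    # Success: bring over the remaining files, then retire the old copy.
    run rsync -a "$old/sites/$site/" "$new/sites/$site/"
    run rm -rf "$old/sites/$site"
    in_bench "$new" bench --site "$site" set-maintenance-mode off
    in_bench "$new" bench --site "$site" scheduler resume
    echo "next: route $site to the 2nd bench (k8s ingress, nginx config or traefik labels)"
  else
    # Failure: remove the stub and leave the site frozen until the restore is done.
    run rm -rf "$new/sites/$site"
    echo "migrate failed for $site: save the error log and restore the backup on the 1st bench," >&2
    echo "then run 'set-maintenance-mode off' and 'scheduler resume' there" >&2
    return 1
  fi
}
```

Calling `move_site site1.example.com /home/frappe/bench-1 /home/frappe/bench-2` prints the plan; set DRY_RUN=0 only after checking it against your own setup.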

We still need 2 benches/stacks/releases running at a time for this type of site movement to happen.

Thank you for your input. This is one solution. I have another point of view, if that works.

Create two benches behind a load balancer. Both have the same volume attached.
An env variable is set in the bench container before the update starts.
The health check command checks the env variable and reports the container as unhealthy, so the load balancer only sends requests to the healthy container.

After the update is done, set the container back to the healthy state.

Repeat these steps for the other containers.
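
A minimal sketch of such a health check follows. One deliberate change: the environment of an already running container cannot be changed from outside, so this sketch uses a marker file instead of an env variable. MAINTENANCE_FLAG is a hypothetical path and should live inside the container, not on the shared volume, so that each container is flagged separately.

```shell
#!/usr/bin/env bash
# Health check sketch: report "unhealthy" while a maintenance marker exists.
# MAINTENANCE_FLAG is a hypothetical, container-local path (an assumption).
bench_health() {
  local flag="${MAINTENANCE_FLAG:-/tmp/bench-maintenance}"
  if [ -e "$flag" ]; then
    echo "unhealthy: update in progress"
    return 1
  fi
  echo "healthy"
  return 0
}
```

Before the update you would `touch` the marker in the container being updated and remove it afterwards, with the container's health check calling `bench_health`. How an unhealthy container is treated depends on the orchestrator (Swarm, for example, replaces unhealthy tasks), so this is worth testing before relying on it.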

Thanks for the valuable info.

I’m trying to use Docker Swarm in multi-site / multi-nodes scenario so that I can deploy more sites in the future. Otherwise, I would go through the Bench/Single site path.

I have followed the Docker Swarm installation (Docker swarm, Traefik and Portainer first, then adding the stacks and containers) step by step and got my first site up and running. (Thanks to @revant_one for the help.) I have not yet succeeded in adding the Maintenance Container (migrate sites), but this is not bothering me for now.

PS: I’ve used the “dockerswarm.rocks” documents to deploy docker swarm + Traefik + Portainer. I used in Traefik the “Getting the client IP” option by using the “traefik-host.yml” file.

I’ve added a node to the swarm and tried to install the 2nd site on the new node but it didn’t work. I believe it’s a volumes and frappe services problem.

Now, unless I’m mistaken, I can’t find any guide for this installation scenario: multi-site / multi-node in swarm. Even the “frappe_docker/docs/docker-swarm.md” document doesn’t mention how to scale up the installation to build an optimized multi-site / multi-node platform.

Any advice, please?

Default docker volumes are located on the host node and tied to it. This is the reason why portainer and traefik are deployed on labeled nodes. You can try the label approach for the erpnext stack as well.

Also note that if you create a new stack for frappe-bench-node-04-v13, you should also change the internal service names, e.g. from erpnext-python and erpnext-worker to erpnext-python-node-04-v13 and erpnext-worker-node-04-v13.

I think you have 2 options with docker swarm; the community can add more.

  1. Instead of default volumes use nfs-server and mount volume from there.
  2. Use node labels and deploy stacks to labeled nodes.

Once you figure out mounting shared volumes with read-write access it’s as easy as replicating the containers.
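
For option 1, here is a rough sketch of creating an NFS-backed volume with Docker's built-in local driver. The volume name, server address and export path are placeholders, and the command has to be run on every node that may host the containers. With DRY_RUN=1 (the default) it only prints the command.

```shell
#!/usr/bin/env bash
# create_nfs_volume <volume-name> <nfs-server-address> <export-path>
# Prints (DRY_RUN=1, default) or runs the "docker volume create" command for
# an NFS-backed volume. All three arguments are placeholders for your setup.
create_nfs_volume() {
  local name="$1" server="$2" export_path="$3"
  set -- docker volume create --driver local \
    --opt type=nfs \
    --opt "o=addr=$server,rw" \
    --opt "device=:$export_path" \
    "$name"
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}
```

For example, `create_nfs_volume sites-vol 10.0.0.10 /var/nfs/general` prints the command for a volume called sites-vol. The same driver options can also be declared in the volumes section of the stack file, which avoids creating the volume by hand on each node.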

If you are planning to host a cluster, go for kubernetes. I feel there are more cluster management abstractions on k8s. https://helm.erpnext.com/faq

In general,

  • If you have powerful nodes with at least 16GB RAM, you can host your own volume management cluster. (check openebs.io, rook.io)
  • If you don’t have powerful nodes for state on cluster, use ISP managed db, volumes, cache services.
  • If you buy cheap entry level nodes with max 4GB RAM then buy a separate powerful server in same private network and host all the cluster state there.

Thanks @revant_one.
I will leave Portainer and Traefik on the manager node as per the install script. I will add a label to node-04, then add a constraint to the frappe-bench-v13 stack so that it is deployed on that node (say “node-04”), and make sure the service and volume constraints are also set based on the label.
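
As a sketch of that plan on the command line: the label key/value (bench=v13) is made up for illustration, and the service name assumes the stack is called frappe-bench-v13. With DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/usr/bin/env bash
# pin_to_node <node> <label-key> <label-value> <service>
# Labels a swarm node and constrains one service to nodes with that label.
# The label key and value are illustrative placeholders.
pin_to_node() {
  local node="$1" key="$2" value="$3" service="$4"
  run_or_print docker node update --label-add "$key=$value" "$node"
  run_or_print docker service update --constraint-add "node.labels.$key == $value" "$service"
}

# Print the command in dry-run mode, otherwise execute it.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}
```

Example: `pin_to_node node-04 bench v13 frappe-bench-v13_erpnext-python`. For services managed by a stack, the lasting place for the constraint is `deploy.placement.constraints` in the stack file, since redeploying the stack applies whatever the file says.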

What about the MariaDB stack? Should it be deployed on the same worker node, or can it be deployed on the manager node? Or should the master service run on the manager and the slave service on the worker node? Or do I have to deploy MariaDB on the worker node running the frappe-erpnext stack?

Cluster state, in terms of ERPNext, means redis, mariadb and nfs-server. This is what I do: I have my state server set up on a non-containerized host.

Attach block volumes to the VPS at location /var/lib/mysql and /var/nfs/general, that way you can increase the size of DB without any restarts via VPS dashboard or api.

If you need mariadb and files to be hosted in cluster then play around with node labels, volume plugins.

Set up the server like this: https://gist.github.com/revant/0d2935903c56562e9758e3ceda1a9c35


Hi @revant_one / All
I’m running Frappe/ERPNext on Docker Swarm with Portainer and Traefik as mentioned.

The current versions are:
ERPNext: v13.2.0 (version-13)
Frappe Framework: v13.2.0 (version-13)

Now I get this message

I want to upgrade to the latest version v13.5.1. My question is:

1- Is it possible to stop the container frappe-bench-v13_erpnext-python and then restart it with “always download image” selected? Will this do the trick?
or
2- Do I have to use the command line to upgrade Frappe and ERPNext?

Could you please advise which path I should take: Portainer (GUI, if valid) or the CLI?
Do you have a link for either option?

Thanks a Lot!

It’s not just the single container. The whole stack needs to be redeployed. Go to Portainer > Stacks > “Name of ERPNext stack”, change the *_VERSION environment variables and update the stack.
This will update all the containers that use the old version of Frappe/ERPNext.

If the AUTO_MIGRATE environment variable is set to 1 in the stack for the erpnext-python service, it will also automatically run migrations after loading the new container.
Use this if you’re sure the migration patches won’t fail. (See below for how to find out whether migrations fail or succeed.)
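
For those who prefer the CLI over Portainer, a rough equivalent is sketched below. It assumes the stack file refers to the images via ${ERPNEXT_VERSION} and ${FRAPPE_VERSION}; the stack file name and the version tags are placeholders. With DRY_RUN=1 (the default) it only prints the command.

```shell
#!/usr/bin/env bash
# redeploy_stack <stack-name> <stack-file> <erpnext-version> <frappe-version>
# Redeploys the stack with new *_VERSION values taken from the shell
# environment. File name and versions are placeholders for your setup.
redeploy_stack() {
  local stack="$1" file="$2" erpnext_version="$3" frappe_version="$4"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "ERPNEXT_VERSION=$erpnext_version FRAPPE_VERSION=$frappe_version docker stack deploy -c $file $stack"
  else
    ERPNEXT_VERSION="$erpnext_version" FRAPPE_VERSION="$frappe_version" \
      docker stack deploy -c "$file" "$stack"
  fi
}
```

Example: `redeploy_stack frappe-bench-v13 frappe-bench-v13.yml v13.5.1 v13.5.1`. Any other variables the stack file expects need to be set in the same shell.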

My recommendation:

  • Enter the erpnext-python container of current stack and note down db_name from site.name.com/site_config.json. e.g. _964acac14970fadc
  • Create one more stack for erpnext. (No need to add traefik labels, we just need this stack to test migrations)
  • Call it with different name e.g. bench-210621.
  • Do not forget to add this as a suffix to name of each service in new stack e.g. erpnext-python-210621, erpnext-nginx-210621, frappe-socketio-210621, etc.
  • Once new stack is created successfully, create a new site under the stack by entering the erpnext-python-210621 container.
    • bench new-site bench-210621.local --db-type=mariadb --mariadb-no-socket --mariadb-root-password=secret (replace secret with your mariadb root password)
    • you can Ctrl + C halfway. We just need the file bench-210621.local/site_config.json to exist with valid db credentials
    • cat bench-210621.local/site_config.json; note down the db_name, e.g. _fcadc07dbdbc21db
    • Keep this portainer window with console running
  • Enter the mariadb container and backup current site
    • mysqldump -uroot -p`cat /var/run/secrets/frappe-mariadb-root-password` _964acac14970fadc > /opt/current.sql
  • Drop the DB for bench-210621.local
    • mysql -uroot -p`cat /var/run/secrets/frappe-mariadb-root-password`
    • mysql> drop database `_fcadc07dbdbc21db`; create database `_fcadc07dbdbc21db`; exit
  • Restore the current.sql over to new db _fcadc07dbdbc21db
    • mysql -uroot -p`cat /var/run/secrets/frappe-mariadb-root-password` _fcadc07dbdbc21db < /opt/current.sql
    • If the database is huge, enter the mysql shell and restore:
      • mysql -uroot -p`cat /var/run/secrets/frappe-mariadb-root-password`
      • mysql> use `_fcadc07dbdbc21db`; source /opt/current.sql
    • exit mariadb shell and container
  • After DB restoration is complete go back to the portainer window kept open with erpnext-python-210621 shell running and execute the migration command.
    • bench --site bench-210621.local migrate
  • If you are lucky migrations will be smooth. Exit the python container and follow simple upgrade for successful migrations. Also delete this new stack.
  • If your patches fail, ask questions, raise issue, check code, hack failing patches.
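
The database part of the steps above (dump, drop and recreate, restore) can be wrapped in one function to run inside the mariadb container. This is a sketch, not a tested tool: it reuses the secret path and dump location from the steps above, the two database names are whatever you noted down from the site_config.json files, and with DRY_RUN=1 (the default) it only prints the commands.

```shell
#!/usr/bin/env bash
# clone_db <source-db> <target-db> [dump-file]
# Dumps the source database and restores it over the (recreated) target one.
clone_db() {
  local src="$1" dst="$2" dump="${3:-/opt/current.sql}"
  local pw_file="/var/run/secrets/frappe-mariadb-root-password"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "mysqldump -uroot -p\$(cat $pw_file) $src > $dump"
    echo "mysql -uroot -p\$(cat $pw_file) -e 'DROP DATABASE IF EXISTS \`$dst\`; CREATE DATABASE \`$dst\`;'"
    echo "mysql -uroot -p\$(cat $pw_file) $dst < $dump"
    return 0
  fi
  local pw
  pw="$(cat "$pw_file")"
  # Stop at the first failing step so a bad dump is never restored.
  mysqldump -uroot -p"$pw" "$src" > "$dump" &&
    mysql -uroot -p"$pw" -e "DROP DATABASE IF EXISTS \`$dst\`; CREATE DATABASE \`$dst\`;" &&
    mysql -uroot -p"$pw" "$dst" < "$dump"
}
```

Example: `clone_db _964acac14970fadc _fcadc07dbdbc21db`. Double-check the argument order before running with DRY_RUN=0, because the second database is dropped.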

Dear @revant_one,
Thank you for explaining the steps. I have no issue deploying, but I am facing an issue due to bind volumes.

My setup,
3 bricks of glusterfs
3 nodes docker swarm

Now the bricks are mounted as bind volumes. I have an issue with custom apps regarding the assets volume and the assets folder in the sites volume. It is perfectly fine to persist site data, but not asset data. When containers are redeployed, the assets volume overwrites the container directories.

Could you suggest a workaround, please?

Regards,
Ahmad

I’ve not used glusterfs.

Try using normal local docker volume for assets

I’m using the standard Docker swarm setup as published on


I use Portainer as per the above mentioned instructions.

I simply made a snapshot from my droplet and tried your answer to No. 1 (to avoid the check process described in No. 2) so that if the process fails I restore the droplet from the snapshot.

  • Edited the frappe-bench-v13 stack
  • Changed ERPNEXT_VERSION and FRAPPE_VERSION, trying several options like:
    version-13.10.0, v13.10.0, 13.10.0, etc.
    None of them worked
  • Only when I used version-13 all the services in the stack went to “running”

Then I found out the system was downgraded to version 13.5.1!

Any idea?

Thanks,

Images are available.

It will take some time to download the latest images

All images are available.

If you don’t wish to wait, go to images section on portainer side menu and pull the specific images manually

Because you’re using the version-13 tag, it is using the latest image for version 13 available on your machine, which is v13.5.1.
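
If you prefer the command line to the Portainer images section, pulling the exact tags could look like the sketch below. The image names (frappe/erpnext-nginx, frappe/erpnext-worker, frappe/frappe-socketio) are my assumption of what the stack uses, so check them against your stack file. With DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/usr/bin/env bash
# pull_images <erpnext-version> <frappe-version>
# Pulls specific image tags instead of relying on a locally cached
# "version-13" tag. Image names are assumptions; verify them in your stack file.
pull_images() {
  local erpnext_version="$1" frappe_version="$2" image
  for image in \
    "frappe/erpnext-nginx:$erpnext_version" \
    "frappe/erpnext-worker:$erpnext_version" \
    "frappe/frappe-socketio:$frappe_version"
  do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "docker pull $image"
    else
      docker pull "$image"
    fi
  done
}
```

Example: `pull_images v13.10.0 v13.10.0`. On a multi-node swarm this has to be run on each node where the services are scheduled.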

DM me access / credentials to your portainer so I can check.