
Docker Swarm And Classic Bench Setup

Hello everyone,
First, I would like to thank Revant for the effort he has put into Dockerizing the classic bench setup. It really helped me understand the life cycle of the bench.
The stack concept is what interests me. In the classic bench setup, a code change only takes effect after the bench is restarted. That restart causes downtime for all the sites, even if the code change was for one particular site.
The stack gives hope of isolating the worker/redis stacks of each site, so that in the event of an update only the targeted stack is affected. This would prevent downtime for the other sites.

What would be the best approach to do that?

Ref:

For a single site to run it needs all the services that bench runs.

That makes 1 bench == 1 stack == 1 helm release

Steps in brief (they also work with a classic bench):

  • create 1st bench, install multiple sites.
  • create 2nd bench (updated code), no sites installed.
  • set pause_scheduler and maintenance_mode for site to be moved on 1st bench.
  • backup site to be moved on 1st bench
  • copy all files of the site to be moved from sites/ directory of 1st bench into 2nd bench, skip copying private/public files
  • bench --site … migrate on 2nd bench.
    • if succeeds:
      • move remaining site files from 1st bench to 2nd bench
      • delete site directory from 1st bench.
      • unset pause_scheduler and maintenance_mode on site on 2nd bench
      • redirect site request to 2nd bench: update k8s ingress | update nginx config on 2nd bench | change traefik labels.
    • if fails:
      • save the migration error log
      • restore the backed-up database on the 1st bench
      • delete stub site from 2nd bench.
      • unset pause_scheduler and maintenance_mode on site on 1st bench

Repeat this process site by site. It will result in 1 site going down at a time for upgrade.
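
The steps above can be sketched as a small plan builder. This is only an illustration under stated assumptions, not a tested migration tool: the bench paths, the site name, and the `BACKUP_FILE` placeholder are hypothetical, `rsync` is assumed to be available, and the two flags are assumed to be toggled with `bench set-config`. The function only returns the shell commands as strings so they can be reviewed before anything is run.

```python
import shlex


def site_move_plan(site, old_bench, new_bench, backup_file="BACKUP_FILE"):
    """Build the shell commands for moving one site from old_bench to new_bench.

    Nothing is executed here; the caller decides how and when to run each phase.
    backup_file is a placeholder for the path printed by `bench backup`.
    """
    quoted_site = shlex.quote(site)
    old_dir = f"{old_bench}/sites/{site}/"
    new_dir = f"{new_bench}/sites/{site}/"

    def bench(bench_dir, args):
        return f"cd {shlex.quote(bench_dir)} && bench --site {quoted_site} {args}"

    def flags(bench_dir, value):
        # Assumption: both keys are set in site_config.json via set-config.
        keys = ("pause_scheduler", "maintenance_mode")
        return [bench(bench_dir, f"set-config {key} {value}") for key in keys]

    prepare = flags(old_bench, 1) + [
        bench(old_bench, "backup"),
        # Copy the site directory, skipping private/public files for now.
        f"rsync -a --exclude=private/ --exclude=public/ {old_dir} {new_dir}",
        bench(new_bench, "migrate"),
    ]
    # After a successful migrate: move the rest, clean up, re-enable the site.
    # Redirecting traffic (ingress, nginx, traefik labels) is setup-specific
    # and intentionally left out of this sketch.
    on_success = [
        f"rsync -a {old_dir} {new_dir}",
        f"rm -rf {old_dir}",
    ] + flags(new_bench, 0)
    # After a failed migrate: save the error log by hand, then roll back.
    on_failure = [
        bench(old_bench, f"restore {backup_file}"),
        f"rm -rf {new_dir}",
    ] + flags(old_bench, 0)
    return {"prepare": prepare, "on_success": on_success, "on_failure": on_failure}


plan = site_move_plan("site1.local", "/home/frappe/bench-1", "/home/frappe/bench-2")
for phase, commands in plan.items():
    print(phase)
    for command in commands:
        print("  " + command)
```

Run the `prepare` commands in order, then pick `on_success` or `on_failure` depending on the exit status of the final `migrate` command.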

We still need 2 benches/stacks/releases running at a time for this type of site movement to happen.

Thank you for your input. This is one solution. I have another approach in mind, if it would work.

Create two benches behind a load balancer, both with the same volume attached.
Before the update starts, an env variable is set in the bench container.
The health check command checks that env variable and reports the container as unhealthy, so the load balancer only sends requests to the healthy container.

After the update is done, set the container back to a healthy state.

Repeat these steps for the other containers.
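
A minimal sketch of such a health check, assuming it runs as the container's Docker `HEALTHCHECK` command, where exit code 0 means healthy and 1 means unhealthy. The variable name `BENCH_UPDATING` and the marker path `/tmp/bench-updating` are hypothetical. The marker file is an addition to the idea above: a variable exported in one shell is not visible to a separately started health check process, so a file on disk may be the easier switch to flip on a running container.

```python
import os


def health_exit_code(env, flag_path="/tmp/bench-updating"):
    """Return 0 (healthy) or 1 (unhealthy) for use as a health check exit code.

    The container counts as unhealthy while an update is in progress, which
    is signalled either by BENCH_UPDATING=1 in env or by the marker file.
    Both names are hypothetical and only serve this illustration.
    """
    updating = env.get("BENCH_UPDATING", "0") == "1" or os.path.exists(flag_path)
    return 1 if updating else 0


# Example: report what the check would return for the current process.
print(health_exit_code(os.environ))
```

To use it as the actual check, the script would end with `sys.exit(health_exit_code(os.environ))` instead of the `print`. Creating the marker file before the update and removing it afterwards then switches the container between unhealthy and healthy.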