Helm install issue

I have two AWS Kubernetes clusters, one of which already runs multiple pre-existing, fully functional ERPNext instances. Both clusters give the same error when installing newer versions of ERPNext.

Here’s what I’m using to deploy the chart:

helm install frappe-bench -n erpnext --set persistence.worker.storageClass=cluster-sc --version 4.0.48 frappe/erpnext

In one cluster, the storageClass was pre-existing and was used to provision other, functional PVCs. In the other cluster, the storageClass is new, but it also appears to work fine. In both clusters I end up with the exact same scenario:
$ kubectl get po -n erpnext
NAME                                                   READY   STATUS             RESTARTS   AGE
frappe-bench-erpnext-conf-bench-20220803205447-xhh4c   0/1     Init:Error         0          6m28s
frappe-bench-erpnext-gunicorn-6595dfc78d-m8dwc         1/1     Running            0          6m28s
frappe-bench-erpnext-new-site-20220803205447-mjlgv     0/1     Init:1/2           0          6m28s
frappe-bench-erpnext-nginx-c5885647b-bz82g             1/1     Running            0          6m28s
frappe-bench-erpnext-scheduler-67c5fff586-zfq8x        0/1     CrashLoopBackOff   5          6m27s
frappe-bench-erpnext-socketio-78b799b4d7-2bzzt         0/1     CrashLoopBackOff   6          6m28s
frappe-bench-erpnext-worker-d-dc4c7d77-tvbsq           0/1     CrashLoopBackOff   5          6m27s
frappe-bench-erpnext-worker-l-689cc7c9b5-w8pb5         0/1     CrashLoopBackOff   5          6m28s
frappe-bench-erpnext-worker-s-796bd95d67-nkhrx         0/1     CrashLoopBackOff   5          6m28s
frappe-bench-mariadb-0                                 1/1     Running            0          6m27s
frappe-bench-redis-cache-master-0                      1/1     Running            0          6m27s
frappe-bench-redis-queue-master-0                      1/1     Running            0          6m27s
frappe-bench-redis-socketio-master-0                   1/1     Running            0          6m28s

Looking at the logs, the scheduler complains about a missing apps.txt:
$ kubectl logs -n erpnext frappe-bench-erpnext-scheduler-67c5fff586-zfq8x
Traceback (most recent call last):
  File "/usr/local/bin/patched_bench_helper.py", line 48, in <module>
    raise SystemExit(main())
  File "/usr/local/bin/patched_bench_helper.py", line 43, in main
    frappe.utils.bench_helper.main()
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/bench_helper.py", line 16, in main
    commands = get_app_groups()
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/bench_helper.py", line 25, in get_app_groups
    for app in get_apps():
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/bench_helper.py", line 102, in get_apps
    return frappe.get_all_apps(with_internal_apps=False, sites_path=".")
  File "/home/frappe/frappe-bench/apps/frappe/frappe/__init__.py", line 1385, in get_all_apps
    apps = get_file_items(os.path.join(sites_path, "apps.txt"), raise_not_found=True)
  File "/home/frappe/frappe-bench/apps/frappe/frappe/__init__.py", line 1533, in get_file_items
    content = read_file(path, raise_not_found=raise_not_found)
  File "/home/frappe/frappe-bench/apps/frappe/frappe/__init__.py", line 1561, in read_file
    raise OSError(f"{path} Not Found")
OSError: b'./apps.txt' Not Found
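The failure mode behind the traceback is simple: on startup, frappe reads the list of installed apps from sites/apps.txt and raises if the file does not exist. A minimal sketch of that behavior (a hypothetical reimplementation for illustration, not frappe's actual code):

```python
import os

def read_file(path, raise_not_found=False):
    """Return file contents, or raise OSError the way the traceback shows."""
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    if raise_not_found:
        raise OSError(f"{path} Not Found")
    return ""

def get_file_items(path):
    """Split a file into non-empty lines (one app name per line)."""
    content = read_file(path, raise_not_found=True)
    return [line.strip() for line in content.splitlines() if line.strip()]

# With no apps.txt on the shared volume, startup fails just like the scheduler pod:
try:
    get_file_items("./apps.txt")
except OSError as e:
    print(e)
```

So every pod that boots bench code crashes until something writes apps.txt to the volume, which is exactly what the failed conf-bench job was supposed to do.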

The PVC looks healthy:
$ kubectl get pvc -n erpnext
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-frappe-bench-mariadb-0   Bound    pvc-eba9b63d-8048-439e-86a4-3ae617e71c70   8Gi        RWO            gp2            11m
frappe-bench-erpnext          Bound    pvc-c295e7c5-e13e-47be-bccc-84b79fc66c61   8Gi        RWX            cluster-sc     11m

Inside the filesystem, only the assets directory is present:

$ sudo ls efs/dynamic_provisioning/pvc-c295e7c5-e13e-47be-bccc-84b79fc66c61/
assets

Suggestions?

What's the error here?

The conf-bench job must complete successfully for apps.txt and common_site_config.json to be generated.

$ kubectl logs -n erpnext frappe-bench-erpnext-conf-bench-20220803205447-xhh4c
Defaulted container "configure" out of: configure, frappe-bench-ownership (init)
Error from server (BadRequest): container "configure" in pod "frappe-bench-erpnext-conf-bench-20220803205447-xhh4c" is waiting to start: PodInitializing

Checking the log of the failed init container:

$ kubectl logs -n erpnext frappe-bench-erpnext-conf-bench-20220803205447-xhh4c frappe-bench-ownership
chown: changing ownership of '/home/frappe/frappe-bench/sites/assets': Operation not permitted
chown: changing ownership of '/home/frappe/frappe-bench/sites': Operation not permitted

On the surface, this seems to indicate a problem with the underlying EFS filesystem to which the storage class is tethered. As I mentioned, though, on one of the clusters the same storage class is providing PVCs for other ERPNext installations.

Digging a little more, this seems likely to be related to the EFS provisioner. I'm not sure why the older installations didn't have this issue (something to do with the older charts?).

Nothing really changed in the charts; only the image tag has changed.

Recent successful setup on AWS: castlecraft / aws-eks-erpnext · GitLab

This turned out to be a problem with the "new" EFS provisioner. There is no good option yet for ERPNext other than installing the old provisioner. See here: chown/chgrp on dynamic provisioned pvc fails · Issue #300 · kubernetes-sigs/aws-efs-csi-driver · GitHub
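One workaround discussed around that issue is to have the CSI driver create each access point with the desired owner up front, so the chown in the init container is never needed. A sketch of such a StorageClass, assuming EFS access-point provisioning (efs-ap mode) and assuming the frappe user in the image runs as UID/GID 1000 — the filesystem ID is a placeholder:

```yaml
# Hypothetical StorageClass for the aws-efs-csi-driver; fileSystemId,
# uid/gid, and directoryPerms are assumptions to adapt to your setup.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cluster-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-xxxxxxxx
  directoryPerms: "700"
  uid: "1000"
  gid: "1000"
```

With ownership set at provisioning time, the frappe-bench-ownership init container has nothing left to change.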

Try disabling the initContainer that does the chown: set jobs.configure.fixVolume to false in the chart values.
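Assuming the values path is as stated above (jobs.configure.fixVolume; not verified against the chart here), the override can be passed either as a values file or inline:

```yaml
# values.yaml fragment — sketch, assuming this is where the chart
# exposes the fixVolume toggle for the configure job's init container
jobs:
  configure:
    fixVolume: false
```

Inline, that would be adding --set jobs.configure.fixVolume=false to the helm install command from the top of the thread.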


Thanks! Disabling fixVolume seems to have worked while still keeping the new provisioner.