ERPNext Kubernetes installation: pods stuck in ContainerCreating

Did you change the user and group ownership of the files and directories on the volume?

The pod was not created.

I experimented with initContainers to run commands at pod creation time, but that didn’t succeed.
Those experiments are commented out (hashed).

@mwogi was able to install locally on bare metal using Rancher, with a similar setup.

config file not created, retry

This means the erpnext-python container couldn’t create the common_site_config.json file in the “sites” volume due to insufficient permissions.
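One way to test the permission explanation is to compare the owner of each mounted directory with the uid:gid the container actually runs as. The following is a minimal shell sketch, not part of the ERPNext images: the `check_dir` name is hypothetical, and it assumes GNU coreutils `stat` is available (as in Debian-based images). Keep in mind that root bypasses permission bits, so running it as root always reports "writable".

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each directory, print its numeric owner (uid:gid)
# and whether the current user can create files in it.
# Assumes GNU coreutils `stat`. Root bypasses permission bits, so running
# this as root always reports "writable".
check_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "$dir: missing"
    return 2
  fi
  local owner me
  owner=$(stat -c '%u:%g' "$dir")
  me="$(id -u):$(id -g)"
  # Creating a file needs both write (-w) and search (-x) permission on the directory.
  if [ -w "$dir" ] && [ -x "$dir" ]; then
    echo "$dir: owner $owner, writable by $me"
    return 0
  fi
  echo "$dir: owner $owner, NOT writable by $me"
  return 1
}

# Check every directory passed on the command line.
for d in "$@"; do
  check_dir "$d"
done
```

Saved as a script inside the pod, it could be run as `bash check_dirs.sh ~/frappe-bench/sites ~/frappe-bench/logs`. A "NOT writable" line for `sites` would support the permission explanation; a "writable" line would point elsewhere.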

I logged in to the erpnext-worker pod and ran this:

~$ chown -R 1000:1000 ~/frappe-bench/
chown: changing ownership of '/home/frappe/frappe-bench/commands/migrate.py': Operation not permitted
chown: changing ownership of '/home/frappe/frappe-bench/commands/push_backup.py': Operation not permitted
chown: changing ownership of '/home/frappe/frappe-bench/commands/doctor.py': Operation not permitted
chown: changing ownership of '/home/frappe/frappe-bench/commands/auto_migrate.py': Operation not permitted
chown: changing ownership of '/home/frappe/frappe-bench/commands/backup.py': Operation not permitted
chown: changing ownership of '/home/frappe/frappe-bench/commands/gevent_patch.py': Operation not permitted
chown: changing ownership of '/home/frappe/frappe-bench/commands/utils.py': Operation not permitted
chown: changing ownership of '/home/frappe/frappe-bench/commands/check_connection.py': Operation not permitted
chown: changing ownership of '/home/frappe/frappe-bench/commands/restore_backup.py': Operation not permitted
chown: changing ownership of '/home/frappe/frappe-bench/commands/new.py': Operation not permitted
chown: changing ownership of '/home/frappe/frappe-bench/commands/constants.py': Operation not permitted
chown: cannot read directory '/home/frappe/frappe-bench/logs/lost+found': Permission denied
chown: changing ownership of '/home/frappe/frappe-bench/logs': Operation not permitted
chown: cannot read directory '/home/frappe/frappe-bench/sites/lost+found': Permission denied
chown: changing ownership of '/home/frappe/frappe-bench/sites/assets': Operation not permitted
chown: changing ownership of '/home/frappe/frappe-bench/sites': Operation not permitted

And while trying this:

~$ kubectl -n erpnext exec -it --user=root alfalehtech-erpnext-worker-d-75b9bb6888-t64d8 -- /bin/sh
error: auth info "root" does not exist

When I exec into the pod, the default user is frappe, and this is the directory structure inside:

frappe@alfalehtech-erpnext-worker-s-757699d7bf-zhgwl:~/frappe-bench$ ls -l
total 24
drwxr-xr-x 1 frappe frappe 4096 Dec 11 08:34 apps
drwxr-xr-x 1 frappe frappe 4096 Dec 25 10:52 commands
drwxr-xr-x 1 frappe frappe 4096 Dec 11 08:30 env
drwxrwxrwx 3 root   root   4096 Dec 25 10:51 logs
drwxrwxrwx 4 root   root   4096 Dec 25 10:52 sites

After chown -R 1000:1000 frappe-bench, the same error returns:

config file not created, retry 24
config file not created, retry 25
config file not created, retry 26
config file not created, retry 27
config file not created, retry 28
config file not created, retry 29
config file not created, retry 30
config file not created, retry 31
timeout: config file not created
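The log above suggests the container polls for common_site_config.json a fixed number of times and then gives up. The sketch below is a hypothetical reconstruction of such a wait loop, useful for running the same check by hand inside the pod; it is NOT the script shipped in the ERPNext images, and the function name, retry budget, and delay are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a wait loop like the one behind the
# "config file not created, retry N" messages. Not the actual entrypoint
# of the ERPNext images; max_retries and delay are assumed values.
wait_for_config() {
  local file="$1"
  local max_retries="${2:-30}"   # assumed retry budget
  local delay="${3:-1}"          # assumed seconds between checks
  local i=0
  until [ -f "$file" ]; do
    i=$((i + 1))
    if [ "$i" -gt "$max_retries" ]; then
      echo "timeout: config file not created"
      return 1
    fi
    echo "config file not created, retry $i"
    sleep "$delay"
  done
  echo "found $file"
  return 0
}
```

For example, `wait_for_config ~/frappe-bench/sites/common_site_config.json 30 2` waits up to roughly a minute. If the `sites` directory is writable but the file still never appears, the container that is supposed to write it is the next place to look.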

Okay. Try restarting the erpnext deployment; maybe it’ll create common_site_config.json.

Thanks for the mention, @revant_one. Yes, I have successfully installed on bare metal using Rancher and Longhorn storage, which supports RWX (ReadWriteMany) volumes.

@Nader_Alabed I noticed that these inconsistent errors sometimes show up with some versions of the Docker images and not others. Sometimes it helps to go one version lower, do the installation, and then upgrade to the latest images, e.g. install v13.16 and, after a successful install, upgrade to v13.17. I recently had the same problem: I couldn’t complete the install because it threw a ‘common_site_config not valid’ error, but when I downgraded it installed successfully. After that I upgraded.

It is very strange. I am suddenly getting the same problem with Kubernetes and Longhorn.
It worked for quite some time without a problem.

But now, suddenly, I get exactly this error message too, and I’m not aware of having changed anything. It doesn’t work with any version of ERPNext.

Any suggestions for at least narrowing down the problem? Thank you so much!

Actually, I found the “error”. I wanted to use PostgreSQL as the database and omitted the MariaDB values; that is when this error happens.
Is there a way to use PostgreSQL on Kubernetes? Am I reading the Helm documentation correctly that MariaDB is always required and PostgreSQL is only a secondary database?