*SOLVED* Install.py now FAILS on Ubuntu 14.04, Debian 8, CentOS 7. What happened? (also same on Ubuntu 16.04) *SOLVED*

I have another post about a similar failure on CentOS 7, but the same failure is now happening on ALL Linux server setups. All 3 of the recommended Linux distributions, at their recommended versions, now produce the same error. It appears to be related to ownership issues encountered while the install.py script is running. Here is the dump from my Ubuntu 14.04 test a few minutes ago:


TASK [Check if /tmp/.bench exists] *****************************************************************************
ok: [localhost]
TASK [Check if bench_repo_path exists] *************************************************************************
ok: [localhost]
TASK [move /tmp/.bench if it exists] ***************************************************************************
fatal: [localhost]: FAILED! => {"failed": true, "msg": "Failed to change ownership of the temporary files Ansible needs to create despite connecting as root. Unprivileged become user would be unable to read the file."}
to retry, use: --limit @/tmp/.bench/playbooks/production/install.retry
PLAY RECAP *****************************************************************************************************
localhost : ok=63 changed=40 unreachable=0 failed=1
Traceback (most recent call last):
  File "install.py", line 388, in <module>
    install_bench(args)
  File "install.py", line 114, in install_bench
    run_playbook('production/install.yml', sudo=True, extra_vars=extra_vars)
  File "install.py", line 326, in run_playbook
    success = subprocess.check_call(args, cwd=os.path.join(cwd, 'playbooks'))
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ansible-playbook', '-c', 'local', 'production/install.yml', '-e', '@/tmp/extra_vars.json', '--become', '--become-user=erp_jmi']' returned non-zero exit status 2
erp_jmi@ubu1404-erp10-install-test:~$
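
For anyone comparing dumps like this across several servers, here is a minimal Python 3 sketch that pulls out the two useful facts: the task that was running when the first `fatal:` line appeared, and the counts from the PLAY RECAP line. The function name `summarize`, the regular expressions and the shortened `SAMPLE` text are illustrative assumptions, not part of install.py or Ansible.

```python
import re

# Matches "TASK [name] ****" header lines as they appear in the dump.
TASK_RE = re.compile(r"^TASK \[(?P<name>.+?)\] \*+\s*$")
# Matches recap lines such as "localhost : ok=63 changed=40 unreachable=0 failed=1".
RECAP_RE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counts>(?:\w+=\d+\s*)+)$")


def summarize(log_text):
    """Return (name of the first task that failed, recap counts per host)."""
    last_task = None
    failed_task = None
    recap = {}
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        task = TASK_RE.match(line)
        if task:
            last_task = task.group("name")
            continue
        if line.startswith("fatal:"):
            # Remember only the first failure; later ones are usually knock-on effects.
            if failed_task is None:
                failed_task = last_task
            continue
        host = RECAP_RE.match(line)
        if host:
            pairs = (item.split("=") for item in host.group("counts").split())
            recap[host.group("host")] = {key: int(value) for key, value in pairs}
    return failed_task, recap


# Shortened stand-in for the dump above (hypothetical, trimmed for the example).
SAMPLE = """\
TASK [Check if bench_repo_path exists] ****
ok: [localhost]
TASK [move /tmp/.bench if it exists] ****
fatal: [localhost]: FAILED! => {"failed": true}
PLAY RECAP ****
localhost : ok=63 changed=40 unreachable=0 failed=1
"""

task, recap = summarize(SAMPLE)
print("failed task:", task)
print("recap:", recap)
```

Run against a saved dump, this makes it easy to see whether two servers failed at the same task or only look similar because the Python traceback at the end is identical.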


It is now impossible to install ERPNext on any of the recommended linux server configurations.

I am sure I am not the only one stuck here. Less than 48 hours ago at least 2 of these server types would install and only CentOS 7 had the issue. Now none of them work.

You can see the error on Debian 8 here:

And see the error on CentOS 7 here:

And now here is the same error from Ubuntu 16.04:


TASK [Check if /tmp/.bench exists] *****************************************************************************
ok: [localhost]
TASK [Check if bench_repo_path exists] *************************************************************************
ok: [localhost]
TASK [move /tmp/.bench if it exists] ***************************************************************************
fatal: [localhost]: FAILED! => {"failed": true, "msg": "Failed to change ownership of the temporary files Ansible needs to create despite connecting as root. Unprivileged become user would be unable to read the file."}
to retry, use: --limit @/tmp/.bench/playbooks/production/install.retry
PLAY RECAP *****************************************************************************************************
localhost : ok=63 changed=41 unreachable=0 failed=1
Traceback (most recent call last):
  File "install.py", line 388, in <module>
    install_bench(args)
  File "install.py", line 114, in install_bench
    run_playbook('production/install.yml', sudo=True, extra_vars=extra_vars)
  File "install.py", line 326, in run_playbook
    success = subprocess.check_call(args, cwd=os.path.join(cwd, 'playbooks'))
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ansible-playbook', '-c', 'local', 'production/install.yml', '-e', '@/tmp/extra_vars.json', '--become', '--become-user=erp_jmi']' returned non-zero exit status 2
erp_jmi@ubu16-erp10-installtest:~$
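
The Python traceback at the end of each dump only says that `ansible-playbook` exited with status 2, because the traceback shows `run_playbook` calling `subprocess.check_call`, which raises `CalledProcessError` on any non-zero exit. The real cause is always the `fatal:` line further up. As a rough illustration (not the actual install.py code), a wrapper could catch that exception and report the exit status without the traceback. The name `run_step` is hypothetical, the sketch is written for Python 3, and a child Python process stands in for `ansible-playbook`.

```python
import subprocess
import sys


def run_step(args, cwd=None):
    """Run a command; return its exit status instead of raising on failure."""
    try:
        subprocess.check_call(args, cwd=cwd)
    except subprocess.CalledProcessError as err:
        # err.cmd is the argument list that was passed in; err.returncode is the exit status.
        message = "command failed with exit status {}: {}".format(
            err.returncode, " ".join(err.cmd)
        )
        print(message, file=sys.stderr)
        return err.returncode
    return 0


# Stand-in for ansible-playbook: a child process that exits with status 2.
status = run_step([sys.executable, "-c", "import sys; sys.exit(2)"])
print("exit status:", status)
```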


So even after I go through the patch that corrects the MySQL install error on Ubuntu 16.04, the above error has started popping up.

At this point I am not able to install ERPNext on any linux server using the install.py script.

BKM

There is a PR done 15 hours ago. I am not a fan of this one at all. It is creating symlinks to /var/run for the MariaDB issue. Much better to code the extra config file. Pid and sock files should live in /var/run per Linux FHS guidelines.

There are some extra commands as well in the ansible scripts that I don’t recognize.

I wonder if someone went ahead and put this into the install.py already? :face_with_raised_eyebrow:

That would explain why it worked yesterday morning and not today.

BKM

Did you comment on this at the github PR conversation?

BKM

I am running some tests to see what those commands do first, but yes planning to comment on the PR.

:+1::+1::+1::grin:

Thanks for the assistance. (btw… like your Lego/Robo work. Very cool.)

BKM


OK, I ran a test on Ubuntu 16.04. The errors that everyone is seeing are coming from the playbooks/develop/includes/setup_bench.yml file. This is one of the two files changed in the PR I posted about earlier. I have commented on the PR that it needs to be reverted immediately. It does exactly as you say: it breaks the install.py script as I have instructed folks to use it.

What is interesting (and frustrating) is that this change breaks the behavior of the --user command line argument. If you remove that one argument when running install.py, it seems to run to completion. In that case it creates the default frappe user and puts everything in the /home/frappe directory. I purposely change the user in my instructions to erpnext because that name is a better representation of what the user is for than frappe. From a “good sys admin” perspective, application users (e.g. service accounts) should have meaningful names.

Exactly correct! In my early days of testing I was forced to use the --user switch to install everything in the default Google Cloud Platform user (which happens to be the first part of your email address that owns the account).

Later, when setting up production servers on independent hosts, I continued to use the --user switch to make the command line user more relevant to my tech folks that have to support the site.

The downside of having the install ‘force’ everything into /home/frappe is that in many circumstances, the user setting up the site will NOT have the ability to actually log into the frappe user account on the host. Hosting like the GCP (Google) service will not let you create alternate users, nor will it let you log out of the default user in order to log in as someone else. This means that all further updates would have to be done as “sudo” commands.

Using ‘sudo’ to do updates BREAKS ERPNEXT !!!

This was the big problem in my very first “Step by Step” guide for installing ERPNext on the GCP cloud servers. I neglected to use the --user switch and install.py defaulted to creating a frappe user that I could never access except as a sudo user. From that point on, the server was only ever useful as a test site because you could never successfully update it without breaking it. Unfortunately, there seems to be a time limit on how long after a post is made the original author can edit it. My ability to edit that long step by step post was lost and there are still people sending me :heart: likes about it.

It appears that this PR has further broken the script to always use the frappe account regardless of what you set in the --user switch. That is just poor judgement.

It is patches and untested changes like this that prevent most people from ever using the frappe cloud as their hosting solution for ERPNext. Every ridiculous update like this one is automatically put into the frappe cloud accounts and the users are sometimes stuck with junk code for weeks on end. There needs to be a collective re-think about how this process works.

BKM

That is not true. You can create other users on the virtual server and putty to the box just like any other linux machine. I am hosting my own erp instance in GCP. It is running just like my instructions discuss. I do have to initially logon as my main jwrober user, like you mention. But then I sudo su - erpnext and then do admin work via bench there as needed.

Hey,
The initial PR was sent by me. Creating symlinks was a terrible idea and I’m sorry for it.

I sent in another PR to fix the issue. Instead of creating symlinks, the mariadb configuration is set in another file (erpnext.cnf) now inside the mariadb.conf.d directory.
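
To make the idea concrete, here is a minimal Python 3 sketch of what an override file of this kind might contain. The contents of the actual erpnext.cnf are not shown in this thread, so everything below is an assumption: the helper name `render_mariadb_override`, the choice of sections, and the `mysqld.pid` file name are illustrative. Only the socket path `/var/run/mysqld/mysqld.sock` is taken from the errors posted here. The sketch only prints the text; it does not write anything under /etc.

```python
import configparser
import io


def render_mariadb_override(run_dir="/var/run/mysqld"):
    """Build the text of a small MariaDB option file that pins the socket and pid file."""
    parser = configparser.ConfigParser()
    # Server side: where mysqld creates its socket and pid file (pid name is an assumption).
    parser["mysqld"] = {
        "socket": run_dir + "/mysqld.sock",
        "pid-file": run_dir + "/mysqld.pid",
    }
    # Client side: where tools such as the mysql CLI look for the socket.
    parser["client"] = {"socket": run_dir + "/mysqld.sock"}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


print(render_mariadb_override())
```

Keeping these settings in a separate drop-in file, rather than symlinking into /var/run, leaves the packaged configuration untouched and makes the change easy to remove.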

Also, I tested the script with the --user flag and passed the argument erpnext to it. I tested it on CentOS 7 and Ubuntu 16.04 and it seems to be working fine on both systems. Can you please confirm the same?

Again, terribly sorry for the initial PR.

@codingCoffee - did you happen to create an erpnext user like I do in the admin guide and then install as that user? All files including ./bench and any site bench directories should end up in /home/erpnext and not in /home/frappe, or /home/root. I will run another test later today when I get a chance. I’ll also look over the merged PR as well.

I am beginning testing again right now. It will probably take an hour or two to complete. I will update here when I am finished. Unfortunately @James_Robertson I will not be able to test your specific user name since I am working with GCP and cannot create additional users aside from my account username. I will however go through your procedure on a Debian 8 server after I complete my regular install tests. It will just be with my own username instead of erpnext.

BKM

@codingCoffee
@James_Robertson

I have used the Google Cloud Platform to spin up all of the servers this morning and here are the results:


Debian 8 -----------> Success (install.py works with no prerequisites)

Ubuntu 14.04 ----> Success (install.py works with no prerequisites)

Ubuntu 16.04 ----> FAILED (1st attempt - see traceback below)

CentOS 7 ---------> FAILED (see traceback below)


The Ubuntu v16.04 install attempt with the new install.py failed for the same reason it has failed for the past month. MariaDB still seems to have a configuration problem. Here is the trace:


TASK [Gathering Facts] **************************************************************************
ok: [localhost]

TASK [Check whether a site exists] **************************************************************
ok: [localhost]

TASK [Create new site] **************************************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["bench", "new-site", "site1.local", "--admin-password", "password", "--mariadb-root-password", "password"], "delta": "0:00:01.951457", "end": "2018-01-19 14:34:16.990444", "failed": true, "rc": 1, "start": "2018-01-19 14:34:15.038987", "stderr": "ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)", "stderr_lines": ["ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)"], "stdout": "Database not installed, this can due to lack of permission, or that the database name exists.\nCheck your mysql root password, or use --force to reinstall", "stdout_lines": ["Database not installed, this can due to lack of permission, or that the database name exists.", "Check your mysql root password, or use --force to reinstall"]}
to retry, use: --limit @/tmp/.bench/playbooks/production/install.retry

PLAY RECAP **************************************************************************************
localhost : ok=73 changed=45 unreachable=0 failed=1

Traceback (most recent call last):
  File "install.py", line 388, in <module>
    install_bench(args)
  File "install.py", line 114, in install_bench
    run_playbook('production/install.yml', sudo=True, extra_vars=extra_vars)
  File "install.py", line 326, in run_playbook
    success = subprocess.check_call(args, cwd=os.path.join(cwd, 'playbooks'))
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ansible-playbook', '-c', 'local', 'production/install.yml', '-e', '@/tmp/extra_vars.json', '--become', '--become-user=erp_jmi']' returned non-zero exit status 2
erp_jmi@test-ubu1604:~$


It is possible to get past this error if you use the workaround created by @James_Robertson that involves creating a new .cnf file BEFORE running install.py. On my second attempt, using the workaround of creating a dummy .cnf file, I was able to get ERPNext to install on Ubuntu 16.04.
@codingCoffee New users will not have an easy time figuring this out and it should be fixed in the install script.
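
A note for anyone reading the trace above: the trailing `(2)` in `ERROR 2002` is the operating system's errno, and errno 2 is ENOENT, meaning the socket file did not exist at that path when `bench new-site` ran. A quick way to check this before re-running the installer is sketched below in Python 3. The function name `socket_status` is hypothetical, and the default path is simply the one quoted in the error message.

```python
import errno
import os
import stat


def socket_status(path="/var/run/mysqld/mysqld.sock"):
    """Classify a path as 'missing', 'socket', or 'not-a-socket'."""
    try:
        mode = os.stat(path).st_mode
    except (FileNotFoundError, NotADirectoryError):
        # Same condition the client reports as "(2)", i.e. ENOENT.
        return "missing"
    return "socket" if stat.S_ISSOCK(mode) else "not-a-socket"


print("errno 2 is", errno.errorcode[2], "-", os.strerror(2))
print("socket check:", socket_status())
```

If this reports `missing` while the MariaDB service is running, the server is most likely creating its socket somewhere else, which is consistent with the configuration mismatch the .cnf workaround addresses.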


CENTOS v7

The install script FAILS to complete when run on a CentOS 7 host. The test was run by first disabling SELinux enforcement with the following command:

sudo setenforce 0

Then the install script was retrieved and run as normal.
The following is the trace dump from the failed attempt:


TASK [install erpnext to default site] **********************************************************
skipping: [localhost]
PLAY [localhost] ********************************************************************************
TASK [Gathering Facts] **************************************************************************
ok: [localhost]
TASK [insert/update inputrc for history] ********************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was:
OSError: [Errno 2] No such file or directory
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Could not replace file: /tmp/tmpWGA_Qe to /home/root/.inputrc: [Errno 2] No such file or directory"}
to retry, use: --limit @/tmp/.bench/playbooks/production/install.retry
PLAY RECAP **************************************************************************************
localhost : ok=70 changed=48 unreachable=0 failed=1
Traceback (most recent call last):
  File "install.py", line 388, in <module>
    install_bench(args)
  File "install.py", line 114, in install_bench
    run_playbook('production/install.yml', sudo=True, extra_vars=extra_vars)
  File "install.py", line 326, in run_playbook
    success = subprocess.check_call(args, cwd=os.path.join(cwd, 'playbooks'))
  File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ansible-playbook', '-c', 'local', 'production/install.yml', '-e', '@/tmp/extra_vars.json', '--become', '--become-user=erp_jmi']' returned non-zero exit status 2
[erp_jmi@test-centos7 ~]$


At this point it appears we are at least back to being able to use the install.py script to handle setting up ERPNext production servers on Debian v8 and Ubuntu v14.04.

This was the case a few days ago. However, we are still unable to run the install script on a CentOS 7 server. It fails every time, including when I try some of the other work-around prerequisites from other posts. We are also unable to run the install script on Ubuntu 16.04 unless several prerequisites are performed on the server ahead of the install. Those prerequisites involve creating a dummy .cnf file in order to get MariaDB to complete the install.

@codingCoffee Your PRs were supposed to address both of the failure issues identified above. At this point they do not appear to be working. If you attempt to fix them, please perform regression testing on the rest of the Linux server types before releasing another version of the install script.

BKM

Did you use my instructions to create the /home/root directory first on CentOS?

Yes, I tried that, and it still failed but in a new location in the script. It didn’t seem to make sense to then try to find another workaround for the new failure point. The reality is that I took as much of your install guide as I could and applied it to the CentOS7 attempts. I created the .cnf file, set the selinux to permissive, and created the /home/root directory. Even with all of this, it still fails.
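
The `/home/root/.inputrc` path in the CentOS dumps suggests that somewhere a home directory is being assembled as `/home/<user>`. That is only an inference from the error message, not something confirmed in this thread, but it would explain why creating /home/root by hand was suggested: on CentOS the root account's home is /root, so /home/root does not exist on a fresh server. The Python 3 sketch below contrasts the two approaches. Both function names are hypothetical and neither is taken from the bench playbooks.

```python
import pwd


def naive_home(user):
    """Assume every account lives under /home (wrong for root on most systems)."""
    return "/home/" + user


def account_home(user):
    """Ask the system account database for the user's real home directory."""
    return pwd.getpwnam(user).pw_dir


for name in ("root",):
    print(name, "naive:", naive_home(name), "actual:", account_home(name))
```

`pwd.getpwnam` raises `KeyError` for an account that does not exist, so a caller would need to create the user first or handle that case.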

I tried this on several server iterations in different server farm locations around the country just to make sure I was not suffering server issues. The results were the same. Since I can spin up multiple servers at multiple locations at the same time, I was using that ability to run the install test as many times as I could fit on my screens. Overall the CentOS 7 server was used 9 different times. 4 of those were with your suggested prerequisites and they all failed.

I think it is now important for someone from the development team to put some real effort into making the “Easy Install” script live up to its name. It is no longer useful as an installer except on the 2 oldest and most stable Linux candidates.

If you are interested in the new fail point, I can spin it up again and post the trace dump here. It doesn’t take a lot of effort to do that.

BKM

@James_Robertson
Well, I didn’t try exactly what you had written in your guide. But on a fresh server these are the exact commands I executed as the root user

Ubuntu 16.04

cd ~
apt-get update
apt-get -y upgrade
apt-get install -y python-minimal build-essential python-setuptools
wget https://raw.githubusercontent.com/frappe/bench/master/playbooks/install.py
sudo python install.py --develop --user frappe --mysql-root-password frappe --admin-password frappe

CentOS 7

cd ~
yum check-update
yum update -y
yum groupinstall -y development
curl "https://raw.githubusercontent.com/frappe/bench/master/playbooks/install.py" -o install.py
sudo python install.py --develop --user frappe --mysql-root-password frappe --admin-password frappe

The install script handles the user creation and creates a user named erpnext and installs frappe-bench in the /home/erpnext/ directory

I have tried this on CentOS7 and Ubuntu16.04 droplets on Digital Ocean and it seems to be working.

@bkm

You are right, my PR was supposed to address these problems. I’ll take a look at the trace dumps you posted and try to figure out the solution.

Till then I would be grateful if you could execute the commands I mentioned above and post the trace dumps here. Use the --production flag instead of --develop

Per your request. This is the trace dump from Google Cloud Platform test on Ubuntu 16.04. I copied and pasted your exact instructions into the server to prepare it and run the install. I also made sure to use the --production switch instead of --develop. Here are the results:


TASK [Gathering Facts] ******************************************************************************
ok: [localhost]
TASK [Check whether a site exists] ******************************************************************
ok: [localhost]
TASK [Create new site] ******************************************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["bench", "new-site", "site1.local", "--admin-password", "frappe", "--mariadb-root-password", "frappe"], "delta": "0:00:02.073400", "end": "2018-01-19 17:46:50.195798", "failed": true, "rc": 1, "start": "2018-01-19 17:46:48.122398", "stderr": "ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)", "stderr_lines": ["ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)"], "stdout": "Database not installed, this can due to lack of permission, or that the database name exists.\nCheck your mysql root password, or use --force to reinstall", "stdout_lines": ["Database not installed, this can due to lack of permission, or that the database name exists.", "Check your mysql root password, or use --force to reinstall"]}
to retry, use: --limit @/tmp/.bench/playbooks/production/install.retry
PLAY RECAP ******************************************************************************************
localhost : ok=73 changed=45 unreachable=0 failed=1
Traceback (most recent call last):
  File "install.py", line 388, in <module>
    install_bench(args)
  File "install.py", line 114, in install_bench
    run_playbook('production/install.yml', sudo=True, extra_vars=extra_vars)
  File "install.py", line 326, in run_playbook
    success = subprocess.check_call(args, cwd=os.path.join(cwd, 'playbooks'))
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ansible-playbook', '-c', 'local', 'production/install.yml', '-e', '@/tmp/extra_vars.json', '--become', '--become-user=erp_jmi']' returned non-zero exit status 2


Let me know what you find different. It is possible that the Digital Ocean droplets are more provisioned than the basic server instances at other hosting locations.

BKM

Again… as per your request, I copied and pasted all of your commands into the CentOS 7 server to load all of your prerequisites and install erpnext. Again, it failed. Here is the trace dump:


TASK [create a new default site] *****************************************************************
skipping: [localhost]
TASK [install erpnext to default site] ***********************************************************
skipping: [localhost]
PLAY [localhost] *********************************************************************************
TASK [Gathering Facts] ***************************************************************************
ok: [localhost]
TASK [insert/update inputrc for history] *********************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: OSError: [Errno 2] No such file or directory
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Could not replace file: /tmp/tmpve_Njg to /home/root/.inputrc: [Errno 2] No such file or directory"}
to retry, use: --limit @/tmp/.bench/playbooks/production/install.retry
PLAY RECAP ***************************************************************************************
localhost : ok=70 changed=48 unreachable=0 failed=1
Traceback (most recent call last):
  File "install.py", line 388, in <module>
    install_bench(args)
  File "install.py", line 114, in install_bench
    run_playbook('production/install.yml', sudo=True, extra_vars=extra_vars)
  File "install.py", line 326, in run_playbook
    success = subprocess.check_call(args, cwd=os.path.join(cwd, 'playbooks'))
  File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ansible-playbook', '-c', 'local', 'production/install.yml', '-e', '@/tmp/extra_vars.json', '--become', '--become-user=erp_jmi']' returned non-zero exit status 2


Both times I used your special instructions, for the CentOS 7 and Ubuntu 16.04 servers, they failed.

At this point the only thing I can think of as possibly different ‘might’ be your use of the --develop switch in your install tests. I ONLY use --production servers and never set up developer servers as I am not a developer.

BKM

I tried on Digital Ocean and got the same error. I added --site along with it. I am going to test with the exact script codingCoffee shared.

TASK [Create new site] **********************************************************************************************************************************

fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["bench", "new-site", "xxx", "--admin-password", "xxx", "--mariadb-root-password", "xxx"], "delta": "0:00:02.467042", "end": "2018-01-19 21:56:32.215387", "failed": true, "rc": 1, "start": "2018-01-19 21:56:29.748345", "stderr": "ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)", "stderr_lines": ["ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)"], "stdout": "Database not installed, this can due to lack of permission, or that the database name exists.\nCheck your mysql root password, or use --force to reinstall", "stdout_lines": ["Database not installed, this can due to lack of permission, or that the database name exists.", "Check your mysql root password, or use --force to reinstall"]}
to retry, use: --limit @/tmp/.bench/playbooks/production/install.retry

PLAY RECAP **********************************************************************************************************************************************
localhost : ok=69 changed=41 unreachable=0 failed=1

Traceback (most recent call last):
  File "install.py", line 388, in <module>
    install_bench(args)
  File "install.py", line 114, in install_bench
    run_playbook('production/install.yml', sudo=True, extra_vars=extra_vars)
  File "install.py", line 326, in run_playbook
    success = subprocess.check_call(args, cwd=os.path.join(cwd, 'playbooks'))
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ansible-playbook', '-c', 'local', 'production/install.yml', '-e', '@/tmp/extra_vars.json', '--become', '--become-user=frappe']' returned non-zero exit status 2

This time I used ALL of your instructions in the install guide you published and this is the resulting (new) error:


TASK [restart mysql] *******************************************************************************************
skipping: [localhost]
TASK [create a new default site] *******************************************************************************
skipping: [localhost]
TASK [install erpnext to default site] *************************************************************************
skipping: [localhost]
PLAY [localhost] ***********************************************************************************************
TASK [Gathering Facts] *****************************************************************************************
ok: [localhost]
TASK [insert/update inputrc for history] ***********************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: OSError: [Errno 2] No such file or directory
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Could not replace file: /tmp/tmpAO67He to /home/root/.inputrc: [Errno 2] No such file or directory"}
to retry, use: --limit @/tmp/.bench/playbooks/production/install.retry
PLAY RECAP *****************************************************************************************************
localhost : ok=70 changed=44 unreachable=0 failed=1
Traceback (most recent call last):
  File "install.py", line 388, in <module>
    install_bench(args)
  File "install.py", line 114, in install_bench
    run_playbook('production/install.yml', sudo=True, extra_vars=extra_vars)
  File "install.py", line 326, in run_playbook
    success = subprocess.check_call(args, cwd=os.path.join(cwd, 'playbooks'))
  File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ansible-playbook', '-c', 'local', 'production/install.yml', '-e', '@/tmp/extra_vars.json', '--become', '--become-user=erpnext']' returned non-zero exit status 2
[erpnext@test-centos7 ~]$


This error is different from those I had earlier using other work-around processes. So, even using your guide, the system will not install on CentOS 7. Hope this helps you work through your guide.

BKM