However, I am getting an “access denied” error if a file is not set to public on the S3 side.
Naturally, you wouldn’t want any company files to be openly accessible on the internet, so there must be some sort of authentication process between ERPNext and the S3 bucket.
Can anyone help?
Edit: we have set up the S3 backup, which gives ERPNext write access to a particular S3 bucket.
So I tried putting an attachment inside that same bucket, but that doesn’t work either.
Why exactly is this (“files will still be stored on NFS (or Ceph) first, then mirrored to S3”) bad? Maybe because every “transformation” (if that is even the correct term in this context) bears the risk of a file getting damaged or corrupted?
I agree, this would be a good general feature in the core (ideally not limited to S3, though).
Why exactly is this (“files will still be stored on NFS (or Ceph) first, then mirrored to S3”) bad? Maybe because every “transformation” (if that is even the correct term in this context) bears the risk of a file getting damaged or corrupted?
Not because of damaged files, but because of the inefficiencies:
More computing, time/latency, and network traffic are needed, because the file is uploaded to ERPNext first, then saved to NFS/Ceph, then uploaded to S3.
By comparison, uploading directly from the client to S3 is very fast and uses as much of the available AWS S3 bandwidth as possible. Further operations can be done by background workers.
When a file is changed or deleted, multiple operations are needed, one per storage location.
More storage is needed. If we store 100 GB, we need 100 GB in NFS/Ceph and we need 100 GB in S3.
More sysadmin effort & cost. If we store big files only in S3, we need much less space in NFS/Ceph. For example, we can have a 10 TB S3 bucket and only 10 GB in NFS, because the big files aren’t stored in NFS.
Much less efficient delivery/CDN setup. S3 is very well integrated with AWS CloudFront, meaning that it can serve files really fast & optimized and without any Frappe Bench computing resources.
A comparison example (not an ERP, just the object storage approach) is Strapi: https://strapi.io/documentation/v3.x/plugins/upload.html#using-a-provider . With the S3 provider, files are uploaded, downloaded, and managed directly in S3, which is very efficient and fast. The file metadata is stored in the database (SQL or MongoDB), but since it is just metadata, the amount of data is much smaller than the actual files.
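To make the "directly from client to S3" idea a bit more concrete: private buckets are usually accessed through presigned URLs, where the server signs a short-lived link and the client then talks to S3 itself. Below is a rough sketch of AWS Signature Version 4 query signing for a GET, using only the Python standard library. The function name presign_get_url, the bucket/key names, and the credentials are placeholders I made up, and the virtual-hosted URL style is an assumption. In a real setup you would more likely call boto3's generate_presigned_url instead of signing by hand, and nothing here is existing ERPNext functionality.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the SigV4 key derivation chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(bucket, key, access_key, secret_key, region,
                    expires=3600, now=None):
    """Return a time-limited GET URL for a private S3 object (sketch)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    # Assumption: virtual-hosted-style endpoint for the given region.
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{date_stamp}/{region}/s3/aws4_request"
    canonical_uri = "/" + quote(key, safe="/-_.~")

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Query parameters must be URI-encoded and sorted by name.
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )

    # Only the host header is signed; the payload is left unsigned.
    canonical_request = "\n".join([
        "GET", canonical_uri, canonical_query,
        f"host:{host}\n", "host", "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: date -> region -> service -> "aws4_request".
    k = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k = _hmac(k, region)
    k = _hmac(k, "s3")
    k = _hmac(k, "aws4_request")
    signature = hmac.new(k, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    return (f"https://{host}{canonical_uri}?{canonical_query}"
            f"&X-Amz-Signature={signature}")


if __name__ == "__main__":
    # Placeholder credentials, not real ones.
    print(presign_get_url("my-private-bucket", "attachments/invoice.pdf",
                          "AKIAEXAMPLE", "example-secret", "us-east-1"))
```

The point of this design is that the file itself never passes through the Frappe/ERPNext server; the server only hands out a link that expires after the given number of seconds.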
Goofys seems to work really nicely.
The AWS credentials can be kept in the .aws folder inside root's home directory and need not be accessible to anyone else.
The /etc/fstab file can be edited to mount the S3 volume with permissions for any user.
E.g. sudo goofys -o allow_other --uid=1001 --file-mode=0755 --dir-mode=0755 bucketname mountpoint
will mount it with 0755 file and 0755 directory permissions for the user with uid 1001.
Assuming ERPNext is installed under the frappe user, find the uid of that user with id -u frappe
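For the /etc/fstab variant, here is a small sketch that looks up the uid and prints a candidate fstab line. The helper names uid_for and goofys_fstab_line are made up for illustration. The line layout (goofys#bucket, fuse type, _netdev, flags passed as mount options) follows the pattern shown in the goofys README as far as I remember it, and passing --uid this way is an assumption, so please verify against your goofys version before putting it into fstab.

```python
import pwd


def uid_for(user: str) -> int:
    """Look up a local user's numeric uid (same result as `id -u <user>`)."""
    return pwd.getpwnam(user).pw_uid


def goofys_fstab_line(bucket, mountpoint, uid,
                      file_mode="0755", dir_mode="0755"):
    """Build a candidate /etc/fstab entry for a goofys mount (sketch)."""
    options = ",".join([
        "_netdev",                  # wait for the network before mounting
        "allow_other",              # let users other than root access the mount
        f"--uid={uid}",             # assumption: goofys flags are accepted here
        f"--file-mode={file_mode}",
        f"--dir-mode={dir_mode}",
    ])
    return " ".join([f"goofys#{bucket}", mountpoint, "fuse", options, "0", "0"])


if __name__ == "__main__":
    # "frappe", "bucketname" and the mount path are placeholders.
    print(goofys_fstab_line("bucketname", "/mnt/mountpoint", uid_for("frappe")))
```

The script only prints the line, so you can review it before adding it to /etc/fstab by hand.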
How to get the S3 bucket key is described here, although I had to make some changes in the policy document.
Note that without the "Resource": "USE YOUR BUCKET NAME/*" part, mounting the bucket was successful, but it was not possible to create or delete files in it.
Without the "Resource": "USE YOUR BUCKET NAME" part, the mounting did not succeed.
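The reason both entries are needed, as far as I understand IAM, is that listing (s3:ListBucket) is a bucket-level action, while reading, writing, and deleting files are object-level actions that apply to the /* resource. A sketch that generates such a policy document is below. The function name s3_mount_policy is made up, the exact action list is my assumption rather than the policy from the linked guide, and I wrote the resources in ARN form (arn:aws:s3:::...) where the note above has the bucket-name placeholder.

```python
import json


def s3_mount_policy(bucket: str) -> dict:
    """Build an IAM policy covering both the bucket and its objects (sketch)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level access: without this, mounting/listing fails.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level access: without this, files cannot be
                # created or deleted inside the mounted bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-bucket" is a placeholder; use your own bucket name.
    print(json.dumps(s3_mount_policy("my-bucket"), indent=2))
```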
Besides goofys, s3fs-fuse also seems to be a good alternative for mounting an S3 bucket.
It has an advantage over goofys in caching when there are more non-sequential reads.
I don’t think either goofys or s3fs will provide the advantage of moving files directly from the client to S3, but I did not need that kind of optimisation. Nevertheless, it would be good to know more about that.