Using S3 for attachments - how to grant access?

We want to use S3 storage for attachments.

However, I am getting an “access denied” error if a file is not set to be public on the S3 side.
Naturally you wouldn’t want any company files to be openly accessible on the internet, so you must incorporate some sort of authentication process between ERPNext and the S3 bucket.

Can anyone help?


Edit: we have set up S3 backup, which gives ERPNext write access to a particular S3 bucket.

So I tried to put an attachment inside that same bucket, but that doesn’t work either.

I’ve not tried this.

Thanks for the pointer.

I believe you are also saying that this is not possible with a standard ERPNext instance.

I hope this becomes a fully supported option.

The caveat with frappe-attachments-s3 is that files will still be stored on NFS (or Ceph) first, then mirrored to S3. So it is not a native feature.

Just curious about this in detail …

this (“files will still be stored on NFS (or Ceph) first, then mirrored to S3”) is bad why exactly (maybe because every “transformation” [if that is even the correct term in this context] bears the risk of a file getting damaged or corrupted)?

I agree, this would be a good general feature in the core (ideally not limited to S3, though).

Re: @vrms:

this (“files will still be stored on NFS (or Ceph) first, then mirrored to S3”) is bad why exactly (maybe because every “transformation” [if that is even the correct term in this context] bears the risk of a file getting damaged or corrupted)?

Not because of damaged files, but because of the inefficiencies:

  1. More computing, time/latency, and network traffic are needed because the file is uploaded to ERPNext first, then saved to NFS/Ceph, then uploaded to S3.
  2. By comparison, uploading directly from the client to S3 is very fast and uses as much of the available AWS S3 bandwidth as possible. Further operations can be done by background workers.
  3. When a file is changed or deleted, multiple delete operations are needed.
  4. More storage is needed. If we store 100 GB, we need 100 GB in NFS/Ceph and another 100 GB in S3.
  5. More sysadmin effort and cost. If we store big files only in S3, we need much less space in NFS/Ceph. For example, we can have a 10 TB S3 bucket and only 10 GB in NFS, because the big files aren’t stored in NFS.
  6. Much less efficient delivery/CDN setup. S3 is very well integrated with AWS CloudFront, so it can serve files fast and optimized without using any Frappe Bench computing resources.
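
To put rough numbers on points 1 and 4, here is a back-of-the-envelope sketch in Python. The function name and the counting model (one transfer per step the file goes through, one stored copy per place it ends up) are my own simplification, not something measured on a real bench.

```python
def footprint(total_gb, mirrored):
    """Return (stored_gb, transferred_gb) for a given volume of attachments.

    Simplified model (an assumption, not a measurement): every step a file
    goes through counts once toward transferred_gb, and every place it ends
    up stored counts once toward stored_gb.
    """
    if mirrored:
        # client -> ERPNext, ERPNext -> NFS/Ceph, ERPNext -> S3
        steps = 3
        # one copy on NFS/Ceph plus one copy in S3
        copies = 2
    else:
        # client -> S3 only
        steps = 1
        copies = 1
    return total_gb * copies, total_gb * steps


for label, mirrored in (("mirrored via NFS/Ceph", True), ("direct to S3", False)):
    stored, moved = footprint(100, mirrored)
    print(f"{label}: {stored} GB stored, {moved} GB transferred")
```

The model ignores caching, retries, and CDN delivery (point 6), so treat it only as an illustration of why the mirrored path costs more.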

A comparison example (not an ERP, just the object storage approach) is Strapi: https://strapi.io/documentation/v3.x/plugins/upload.html#using-a-provider . With the S3 provider, files are uploaded, downloaded, and managed directly in S3, which is very efficient and fast. File metadata is stored in the database (SQL or MongoDB), but since it is just metadata, the amount of data is much smaller than the actual files.

I’ve created an AWS S3 file attachment pull request. There is still a lot to be done; any help would be appreciated.

If files go on S3, there is the possibility of dropping RWX storage like NFS in a Kubernetes setup. The configs can go in ConfigMaps instead of volumes.

What if we mount the S3 bucket to the sites folder?

Last time I tried on Kubernetes, it only allowed root to mount the object storage as volumes: https://github.com/ctrox/csi-s3

It can be done for non-Kubernetes installs with https://github.com/kahing/goofys

Have you tried goofys?
I am trying to compare zerodha’s frappe-attachments-s3 with goofys and am not sure which one makes more sense. Can you please provide some pointers?
zerodha https://github.com/zerodha/frappe-attachments-s3
goofys https://github.com/kahing/goofys

I see that goofys would have less latency, but would it need more maintenance?

goofys has broader uses. It mounts S3 as a volume for fast use. If you can manage to mount the volume with the right permissions, it is safer and faster.

If you mount the volume as root, then it is safer to use frappe-attachments-s3.

Thanks for the reply.
How can we mount the volume if not as “root”? I mean, things which are in fstab are allowed, I guess.
Also, goofys says one can mount by simply editing fstab:

goofys#bucket   /mnt/mountpoint        fuse     _netdev,allow_other,--file-mode=0666,--dir-mode=0777    0       0

What do you mean by “managed to mount the volume with right permissions”?
Isn’t that as simple as editing fstab?

I used goofys with https://github.com/ctrox/csi-s3#goofys

That needed all pods to have root permission to access volume. I haven’t tried goofys on its own.

Got it, thanks. You had tried it with Docker and it needed root.
I guess that on its own, once I edit fstab, we won’t need root. So it looks good and fast.

For the sake of completeness:

goofys seems to work really nicely.
The AWS credentials can be kept in the .aws folder inside root’s home directory and need not be accessible to anyone else.
The /etc/fstab file can be edited to mount the S3 volume with permissions for any user.
E.g.
sudo goofys -o allow_other --uid=1001 --file-mode=0755 --dir-mode=0755 bucketname mountpoint
will mount it with 755 file and 755 folder permissions for the user with uid 1001.

Assuming ERPNext is installed under the frappe user, find the uid of the frappe user with
id -u frappe
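
For anyone who wants to script this, below is a small Python sketch that looks up the uid and prints an /etc/fstab line in the goofys#bucket format quoted earlier in the thread. The helper names are made up for illustration, "bucketname" and "/mnt/mountpoint" are placeholders, and I am assuming that --uid is accepted in the fstab options field the same way --file-mode and --dir-mode are; please verify that on your own system before relying on it.

```python
import pwd


def uid_of(username):
    """Look up the numeric uid of a local user (same result as `id -u <username>`)."""
    return pwd.getpwnam(username).pw_uid


def goofys_fstab_entry(bucket, mountpoint, uid, file_mode="0755", dir_mode="0755"):
    """Build one /etc/fstab line in the goofys#bucket format.

    Assumption: --uid is accepted in the options field the same way
    --file-mode and --dir-mode are.
    """
    options = ",".join([
        "_netdev",
        "allow_other",
        f"--uid={uid}",
        f"--file-mode={file_mode}",
        f"--dir-mode={dir_mode}",
    ])
    # fstab fields are whitespace separated; tabs keep the line readable
    return f"goofys#{bucket}\t{mountpoint}\tfuse\t{options}\t0\t0"


if __name__ == "__main__":
    # Placeholders only. uid 1001 is the example value from the command above;
    # on a real server you would pass uid_of("frappe") instead.
    print(goofys_fstab_entry("bucketname", "/mnt/mountpoint", 1001))
```

The script only prints the line; appending it to /etc/fstab is left as a manual step so nothing is changed by accident.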

How to get S3 bucket key is described here

although I had to make some changes in the policy document.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MountFolder",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": "USE YOUR BUCKET NAME",
      "Principal": {
        "AWS": [
          "USE ARN OF YOUR AWS IAM"
        ]
      }
    },
    {
      "Sid": "AccessEditDelete",
      "Action": [
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Effect": "Allow",
      "Resource": "USE YOUR BUCKET NAME/*",
      "Principal": {
        "AWS": [
          "USE ARN OF YOUR AWS IAM"
        ]
      }
    }
  ]
}

Note that without the
"Resource": "USE YOUR BUCKET NAME/*",
part, mounting the bucket succeeded but it was not possible to create or delete files in it,
and without the
"Resource": "USE YOUR BUCKET NAME",
part, the mount did not succeed.
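
If you would rather generate the policy than edit it by hand, here is a minimal Python sketch that builds the same two statements. The function name, the bucket name, and the IAM ARN are placeholders. One assumption on my side: the Resource values are written as S3 ARNs (arn:aws:s3:::bucket-name), which is the form AWS expects in a bucket policy.

```python
import json


def bucket_policy(bucket, principal_arn):
    """Return the two-statement policy from this post as a dict.

    Assumption: Resource values are written as S3 ARNs
    (arn:aws:s3:::<bucket>).
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    principal = {"AWS": [principal_arn]}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level statement: without it the mount did not succeed.
                "Sid": "MountFolder",
                "Action": "s3:*",
                "Effect": "Allow",
                "Resource": bucket_arn,
                "Principal": principal,
            },
            {
                # Object-level statement: without it files could not be
                # created or deleted.
                "Sid": "AccessEditDelete",
                "Action": [
                    "s3:DeleteObject",
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:PutObjectAcl",
                ],
                "Effect": "Allow",
                "Resource": f"{bucket_arn}/*",
                "Principal": principal,
            },
        ],
    }


if __name__ == "__main__":
    # Both arguments are placeholders, not real resources.
    policy = bucket_policy("my-bucket", "arn:aws:iam::123456789012:user/erpnext")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the bucket policy editor. Keeping the bucket-level and object-level resources in separate statements mirrors what I observed above.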

Besides goofys, s3fs-fuse also seems to be a good alternative for mounting an S3 bucket.

It has advantages over goofys in caching when there are more non-sequential reads.

I don’t think either goofys or s3fs will provide the advantage of moving files directly from the client to S3, but I did not need that kind of optimisation. Nevertheless, it would be good to know more about that.

Hope this is useful