503 Service Temporarily Unavailable with GKE

I’m making very slow progress installing ERPNext to a K8s cluster hosted on GKE (Standard).

I’ve set up nginx-ingress and tested it with a hello world app as described here: Ingress with NGINX controller on Google Kubernetes Engine | Google Cloud Platform Community

ERPNext was installed via helm according to the TL;DR section described here: helm/README.md at main · frappe/helm · GitHub

Here is the code I used to create the nginx-ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: erp-resource
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
  - host: "1.2.3.4.nip.io"
    http:
      paths:
      - pathType: Prefix
        path: "/erp"
        backend:
          service:
            name: frappe-bench-erpnext
            port:
              number: 8080

I also tried the 503 debugging tutorial here: Resolve 503 errors when I access Kubernetes in EKS cluster

It seems everything is fine.

I’m stuck here with 503 :dizzy_face:

Can you port-forward the service and try connecting to it with curl -H "Host: site.name.com" http://127.0.0.1:8080/api/method/ping?

The second command needs to be executed as well. It installs the in-cluster NFS server.

Are all your ERPNext pods in the Running state?

Hi Revant,

Thanks for your help.

Yes, all the pods are in running state. The PV and PVC are all successful.

I tried forwarding the port of the erpnext-nginx pod. When I access it via a browser, it shows: Localhost does not exist.

I then SSH'd into a worker pod. In the sites directory, I can see that a site directory was created, e.g., 1.2.3.4.nip.io. However, currentsite.txt was missing. I manually created the file and put the site URL into it. I also tried ‘bench use 1.2.3.4.nip.io’. Then I restarted all the worker pods manually, but they still showed the same thing. I then restarted the worker node; all pods were restarted and are showing the Running state.

Unfortunately the result was the same.

I’m wondering where I made the mistake.

currentsite.txt should not exist for DNS multi-tenancy to work; it forces bench to serve only one site.

Port-forward the service and try curl.

kubectl port-forward -n erpnext svc/frappe-bench-erpnext 8080:8080

curl:

curl -H "Host: 1.2.3.4.nip.io" https://0.0.0.0:8080/api/method/ping

I don’t see a namespace in your Ingress manifest. Is your Ingress created in the same namespace as the service? It should be in the same namespace.

Hi Revant,

Thanks for your help.

Here is the output from kubectl port-forward when I run curl:

Handling connection for 8080
E0724 10:14:46.300864   91714 portforward.go:391] error copying from local connection to remote stream: read tcp4 127.0.0.1:8080->127.0.0.1:64131: read: connection reset by peer

I did not set a namespace for the Ingress, so it was deployed in the default namespace. I will fix this.

Here are the steps I followed to set up ERPNext.

# prepare NFS; the NFS volume has to be slightly larger than 8GiB, or ERPNext will fail to bind the storage
kubectl create namespace nfs
helm repo add nfs-ganesha-server-and-external-provisioner https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner
helm upgrade --install -n nfs in-cluster nfs-ganesha-server-and-external-provisioner/nfs-server-provisioner --set 'storageClass.mountOptions={vers=4.1}' --set persistence.enabled=true --set persistence.size=9Gi
# install ERPNext
kubectl create namespace erpnext
helm repo add frappe https://helm.erpnext.com
helm install frappe-bench -n erpnext -f custom-values.yaml frappe/erpnext

This is the custom-values.yaml file.

# Default values for erpnext.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# Configure external database host
# dbHost: ""
# dbPort: 3306
# dbRootUser: ""
# dbRootPassword: ""

nginx:
  replicaCount: 1
  image:
    repository: frappe/erpnext-nginx
    tag: v13.36.1
    pullPolicy: IfNotPresent
  # config: |
  #   # custom template /etc/nginx/templates/default.conf.template
  #   # https://github.com/nginxinc/docker-nginx-unprivileged/blob/main/stable/alpine/20-envsubst-on-templates.sh
  environment:
    upstreamRealIPAddress: "127.0.0.1"
    upstreamRealIPRecursive: "off"
    upstreamRealIPHeader: "X-Forwarded-For"
    frappeSiteNameHeader: "$host"
  livenessProbe:
    tcpSocket:
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
  readinessProbe:
    tcpSocket:
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
  service:
    type: ClusterIP
    port: 8080
  resources: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
  envVars: []
  initContainers: []
  sidecars: []

worker:
  image:
    repository: frappe/erpnext-worker
    tag: v13.36.1
    pullPolicy: IfNotPresent

  gunicorn:
    replicaCount: 1
    livenessProbe:
      tcpSocket:
        port: 8000
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      tcpSocket:
        port: 8000
      initialDelaySeconds: 5
      periodSeconds: 10
    service:
      type: ClusterIP
      port: 8000
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}
    args:
      # https://pythonspeed.com/articles/gunicorn-in-docker/
      # allowed workerClass options are "gevent", "gthread", "sync", default "gevent"
      # set --config=/opt/patches/gevent_patch.py to use gevent.monkey.patch_all()
    - /home/frappe/frappe-bench/env/bin/gunicorn
    - --bind=0.0.0.0:8000
    - --config=/opt/patches/gevent_patch.py
    - --log-file=-
    - --preload
    - --threads=4
    - --timeout=120
    - --worker-class=gevent
    - --worker-tmp-dir=/dev/shm
    - --workers=2
    - frappe.app:application
    envVars: []
    initContainers: []
    sidecars: []

  default:
    replicaCount: 1
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}
    livenessProbe:
      override: false
      probe: {}
    readinessProbe:
      override: false
      probe: {}
    envVars: []
    initContainers: []
    sidecars: []

  short:
    replicaCount: 1
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}
    livenessProbe:
      override: false
      probe: {}
    readinessProbe:
      override: false
      probe: {}
    envVars: []
    initContainers: []
    sidecars: []

  long:
    replicaCount: 1
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}
    livenessProbe:
      override: false
      probe: {}
    readinessProbe:
      override: false
      probe: {}
    envVars: []
    initContainers: []
    sidecars: []

  scheduler:
    replicaCount: 1
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}
    livenessProbe:
      override: false
      probe: {}
    readinessProbe:
      override: false
      probe: {}
    envVars: []
    initContainers: []
    sidecars: []

  healthProbe: |
    exec:
      command:
        - bash
        - -c
        - echo "Ping backing services";
        {{- if .Values.mariadb.enabled }}
        {{- if eq .Values.mariadb.architecture "replication" }}
        - wait-for-it {{ .Release.Name }}-mariadb-primary:{{ .Values.mariadb.primary.service.ports.mysql }} -t 1;
        {{- else }}
        - wait-for-it {{ .Release.Name }}-mariadb:{{ .Values.mariadb.primary.service.ports.mysql }} -t 1;
        {{- end }}
        {{- else if .Values.dbHost }}
        - wait-for-it {{ .Values.dbHost }}:{{ .Values.mariadb.primary.service.ports.mysql }} -t 1;
        {{- end }}
        {{- if index .Values "redis-cache" "host" }}
        - wait-for-it {{ .Release.Name }}-redis-cache-master:{{ index .Values "redis-cache" "master" "containerPorts" "redis" }} -t 1;
        {{- else if index .Values "redis-cache" "host" }}
        - wait-for-it {{ index .Values "redis-cache" "host" }} -t 1;
        {{- end }}
        {{- if index .Values "redis-queue" "host" }}
        - wait-for-it {{ .Release.Name }}-redis-queue-master:{{ index .Values "redis-queue" "master" "containerPorts" "redis" }} -t 1;
        {{- else if index .Values "redis-queue" "host" }}
        - wait-for-it {{ index .Values "redis-queue" "host" }} -t 1;
        {{- end }}
        {{- if index .Values "redis-socketio" "host" }}
        - wait-for-it {{ .Release.Name }}-redis-socketio-master:{{ index .Values "redis-socketio" "master" "containerPorts" "redis" }} -t 1;
        {{- else if index .Values "redis-socketio" "host" }}
        - wait-for-it {{ index .Values "redis-socketio" "host" }} -t 1;
        {{- end }}
        {{- if .Values.postgresql.host }}
        - wait-for-it {{ .Values.postgresql.host }}:{{ .Values.postgresql.primary.service.ports.postgresql }} -t 1;
        {{- else if .Values.postgresql.enabled }}
        - wait-for-it {{ .Release.Name }}-postgresql:{{ .Values.postgresql.primary.service.ports.postgresql }} -t 1;
        {{- end }}
    initialDelaySeconds: 15
    periodSeconds: 5

socketio:
  replicaCount: 1
  livenessProbe:
    tcpSocket:
      port: 9000
    initialDelaySeconds: 5
    periodSeconds: 10
  readinessProbe:
    tcpSocket:
      port: 9000
    initialDelaySeconds: 5
    periodSeconds: 10
  image:
    repository: frappe/frappe-socketio
    tag: v13.36.2
    pullPolicy: IfNotPresent
  resources: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
  service:
    type: ClusterIP
    port: 9000
  envVars: []
  initContainers: []
  sidecars: []

persistence:
  worker:
    enabled: true
    # existingClaim: ""
    size: 8Gi
    storageClass: "nfs"
  logs:
    # Container based log search and analytics stack recommended
    enabled: false
    # existingClaim: ""
    size: 8Gi
    storageClass: "nfs"

# Ingress
ingress:
  ingressName: "erp-nip-com"
  enabled: false
  annotations:
    kubernetes.io/ingress.class: nginx
    kubernetes.io/tls-acme: "true"
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
  - host: 1.2.3.4.nip.io
    paths:
    - path: /
      pathType: ImplementationSpecific
  tls:
   - secretName: erp-nip-com-tls
     hosts:
       - 1.2.3.4.nip.io

jobs:
  volumePermissions:
    enabled: false
    backoffLimit: 0
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}

  configure:
    enabled: true
    fixVolume: true
    backoffLimit: 0
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}

  createSite:
    enabled: true
    forceCreate: false
    siteName: "1.2.3.4.nip.io"
    adminPassword: "secret"
    installApps:
    - "erpnext"
    dbType: "mariadb"
    backoffLimit: 0
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}

  dropSite:
    enabled: true
    forced: false
    siteName: "1.2.3.4.nip.io"
    backoffLimit: 0
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}

  backup:
    enabled: false
    siteName: "erp.cluster.local"
    withFiles: true
    push:
      enabled: false
      # bucket: "erpnext"
      # region: "us-east-1"
      # accessKey: "ACCESSKEY"
      # secretKey: "SECRETKEY"
      # endpoint: http://store.storage.svc.cluster.local
    backoffLimit: 0
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}

  migrate:
    enabled: false
    siteName: "erp.cluster.local"
    backoffLimit: 0
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  # Specifies whether a service account should be created
  create: true

podSecurityContext:
  supplementalGroups: [1000]

securityContext:
  capabilities:
    add:
    - CAP_CHOWN
  # readOnlyRootFilesystem: true
  # runAsNonRoot: true
  # runAsUser: 1000

redis-cache:
  # https://github.com/bitnami/charts/tree/master/bitnami/redis
  enabled: true
  # host: ""
  architecture: standalone
  auth:
    enabled: false
    sentinal: false
  master:
    containerPorts:
      redis: 6379
    persistence:
      enabled: false

redis-queue:
  # https://github.com/bitnami/charts/tree/master/bitnami/redis
  enabled: true
  # host: ""
  architecture: standalone
  auth:
    enabled: false
    sentinal: false
  master:
    containerPorts:
      redis: 6379
    persistence:
      enabled: false

redis-socketio:
  # https://github.com/bitnami/charts/tree/master/bitnami/redis
  enabled: true
  # host: ""
  architecture: standalone
  auth:
    enabled: false
    sentinal: false
  master:
    containerPorts:
      redis: 6379
    persistence:
      enabled: false

mariadb:
  # https://github.com/bitnami/charts/tree/master/bitnami/mariadb
  enabled: true
  auth:
    rootPassword: "changeit"
    username: "erpnext"
    password: "changeit"
    replicationPassword: "changeit"
  primary:
    service:
      ports:
        mysql: 3306
    configuration: |-
      [mysqld]
      skip-name-resolve
      explicit_defaults_for_timestamp
      basedir=/opt/bitnami/mariadb
      plugin_dir=/opt/bitnami/mariadb/plugin
      port=3306
      socket=/opt/bitnami/mariadb/tmp/mysql.sock
      tmpdir=/opt/bitnami/mariadb/tmp
      max_allowed_packet=16M
      bind-address=::
      pid-file=/opt/bitnami/mariadb/tmp/mysqld.pid
      log-error=/opt/bitnami/mariadb/logs/mysqld.log

      # Frappe Specific Changes
      character-set-client-handshake=FALSE
      character-set-server=utf8mb4
      collation-server=utf8mb4_unicode_ci

      [client]
      port=3306
      socket=/opt/bitnami/mariadb/tmp/mysql.sock
      plugin_dir=/opt/bitnami/mariadb/plugin

      # Frappe Specific Changes
      default-character-set=utf8mb4

      [manager]
      port=3306
      socket=/opt/bitnami/mariadb/tmp/mysql.sock
      pid-file=/opt/bitnami/mariadb/tmp/mysqld.pid

postgresql:
  # https://github.com/bitnami/charts/tree/master/bitnami/postgresql
  enabled: false
  # host: ""
  auth:
    username: "postgres"
    postgresPassword: "changeit"
  primary:
    service:
      ports:
        postgresql: 5432

# render the create-site job from the chart, then apply it
helm template frappe-bench -n erpnext frappe/erpnext -f custom-values.yaml -s templates/job-create-site.yaml > create-new-site-job.yaml

kubectl apply -f create-new-site-job.yaml

Create the Ingress in the erpnext namespace and delete it from the default namespace.

The 503 is because the service frappe-bench-erpnext does not exist in the default namespace.
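
For reference, a minimal sketch of that fix: the Ingress from the first post with an explicit namespace added and nothing else changed. The file name erp-ingress.yaml is only an assumption for illustration, and the kubectl lines are left as comments because they need access to the cluster.

```shell
# Write the Ingress from the first post with one change: an explicit namespace.
# erp-ingress.yaml is a hypothetical file name.
cat > erp-ingress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: erp-resource
  namespace: erpnext   # same namespace as the frappe-bench-erpnext Service
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
  - host: "1.2.3.4.nip.io"
    http:
      paths:
      - pathType: Prefix
        path: "/erp"
        backend:
          service:
            name: frappe-bench-erpnext
            port:
              number: 8080
EOF

# With cluster access, remove the old copy and apply the new one:
#   kubectl delete ingress erp-resource --namespace default
#   kubectl apply -f erp-ingress.yaml
#   kubectl get ingress,service --namespace erpnext
```

An Ingress backend can only refer to a Service in the Ingress's own namespace, so the last command should list both erp-resource and frappe-bench-erpnext together.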

This is something else that you’ll have to figure out on your own.

There is no need to copy the full values file. Only override the variables you need.

If you copy the full file, it overrides everything, and upgrades won’t work when a new chart is released with updated values.
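
As a rough sketch of what a trimmed-down override file could look like, assuming the storage class and the create-site job settings are the only values in the file above that differ from the chart defaults (that is a guess; compare against the chart's own values.yaml before relying on it). The file name minimal-values.yaml is hypothetical.

```shell
# Keep only the keys that are meant to differ from the chart defaults.
# Which keys those are is an assumption here; values are copied from the thread.
cat > minimal-values.yaml <<'EOF'
persistence:
  worker:
    storageClass: "nfs"

jobs:
  createSite:
    enabled: true
    siteName: "1.2.3.4.nip.io"
    adminPassword: "secret"
EOF

# Helm merges this file over the chart defaults, so every other key keeps
# whatever value the installed chart version ships with:
#   helm install frappe-bench -n erpnext -f minimal-values.yaml frappe/erpnext
```

Because unspecified keys fall through to the chart, a later chart release can change its defaults without being masked by a stale copy of the full file.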

Hi Revant,

Thanks again for your help.

I figured this out. The request now returns:

{"message":"pong"}

I changed https to http in the request URL: http://0.0.0.0:8080/api/method/ping.

You are right. Changing the Ingress namespace to erpnext solved the 503 issue. I think this was the root cause.

Thanks for your support. This was not an issue with ERPNext; it was more about how to use K8s.