We are building NeetoDeploy, a compelling alternative to Heroku. Stay updated by following NeetoDeploy on Twitter and reading our blog.
At neeto, we are building 20+ applications, and most of them run on NeetoDeploy. After we migrated from Heroku to NeetoDeploy, we started getting 520 response codes for our applications. The issue occurred randomly and rarely.
A 520 response code is returned when a connection to the origin web server was started, but the request was never completed. This can happen when the server crashes or cannot handle incoming requests because of insufficient resources.
When we looked at our logs closely, we found that all the 520 responses occurred when we restarted or deployed an app. From this, we concluded that the new pods were initially failing to handle requests from clients and started working fine after some time.
Once our investigation narrowed down to the new pods, we quickly realized that requests were arriving at the server before it was fully ready to accept them.
When we create a new pod in Kubernetes, it is marked as "Ready" and starts receiving requests as soon as its containers start. However, the servers running inside these containers may need additional time to boot up before they can fully accept requests. Let's restart one of our deployments to see this in action.
$ kubectl rollout restart deployment bling-staging-web
As we can see, a new container is being created for the new pod. The READY count for the new pod is 0/1, which means it is not yet Ready.
NAME                                READY   STATUS              RESTARTS   AGE
bling-staging-web-656f74d9d-6kpzz   1/1     Running             0          2m8s
bling-staging-web-79fc6f978-cdjf5   0/1     ContainerCreating   0          5s
Now we can see that the new pod is marked as READY (1 out of 1), and the old one is terminating.
NAME                                READY   STATUS        RESTARTS   AGE
bling-staging-web-656f74d9d-6kpzz   0/1     Terminating   0          2m9s
bling-staging-web-79fc6f978-cdjf5   1/1     Running       0          6s
The new pod is shown as Ready as soon as the container was created. But on checking the logs, we could see that the server was still starting up and was not ready yet.
[1] Puma starting in cluster mode...
[1] Installing dependencies...
From the above observations, we understood that the pod is marked as "Ready" right after the container is created. Consequently, requests reach the server before it is fully prepared to serve them, and those requests get a 520 response code.
To fix this issue, we must ensure that pods are marked as "Ready" only after the server is up and ready to accept requests. We can do this using Kubernetes health probes. More than six years ago, we wrote a blog on how to leverage the readiness and liveness probes of Kubernetes.
Initially, we added only the Startup probe since our problem was with the boot-up phase. You can read more about the configuration settings here.
The following configuration adds the Startup probe to the deployments:
startupProbe:
  failureThreshold: 10
  httpGet:
    path: /health_check
    port: 3000
    scheme: HTTP
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 60
  initialDelaySeconds: 10
/health_check is a route in the application that is expected to return a 200 response code if all is going well.
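Our actual endpoint is not shown in this post, but as a minimal sketch, assuming a typical Rails application, it can be as simple as a controller action that returns 200 once the app has booted. The route and the HealthCheckController#healthy action below match what appears in the request logs shown later.

# config/routes.rb
get "/health_check", to: "health_check#healthy"

# app/controllers/health_check_controller.rb
class HealthCheckController < ApplicationController
  def healthy
    # Respond with 200 OK. This can be extended to verify dependencies
    # (database, Redis, etc.) if the probe should check more than boot status.
    head :ok
  end
end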
Now, let's restart the application again after adding the Startup probe. A container is created for the new pod, but the pod is still not "Ready".
NAME                                READY   STATUS    RESTARTS   AGE
bling-staging-web-656f74d9d-6kpzz   1/1     Running   0          2m8s
bling-staging-web-79fc6f978-cdjf5   0/1     Running   0          5s
The new pod is marked as "Ready", and the old one is "Terminating".
NAME                                READY   STATUS        RESTARTS   AGE
bling-staging-web-656f74d9d-6kpzz   0/1     Terminating   0          2m38s
bling-staging-web-79fc6f978-cdjf5   1/1     Running       0          35s
If we check the logs, we can see the health check request:
[1] Puma starting in cluster mode...
[1] Installing dependencies...
[1] * Puma version: 6.3.1 (ruby 3.2.2-p53) ("Mugi No Toki Itaru")
[1] * Min threads: 5
[1] * Max threads: 5
[1] * Environment: heroku
[1] * Master PID: 1
[1] * Workers: 1
[1] * Restarts: (✔) hot (✔) phased
[1] * Listening on http://0.0.0.0:3000
[1] Use Ctrl-C to stop
[2024-02-10T02:40:48.944785 #23] INFO -- : [bb9e756a-51cc-4d6b-9a4a-96b0464f6740] Started GET "/health_check" for 192.168.120.195 at 2024-02-10 02:40:48 +0000
[2024-02-10T02:40:48.946148 #23] INFO -- : [bb9e756a-51cc-4d6b-9a4a-96b0464f6740] Processing by HealthCheckController#healthy as */*
[2024-02-10T02:40:48.949292 #23] INFO -- : [bb9e756a-51cc-4d6b-9a4a-96b0464f6740] Completed 200 OK in 3ms (Allocations: 691)
Now, the pod is marked as "Ready" only after the health check succeeds; in other words, only when the server is ready to accept requests.
Once we rolled out the health checks to our deployments, we found that they were failing for all production applications but working for staging and review applications.
We were getting the following error in our production applications.
Startup probe failed: Get "https://192.168.43.231:3000/health_check": http: server gave HTTP response to HTTPS client
2024-02-12 06:40:04 +0000 HTTP parse error, malformed request: #<Puma::HttpParserError: Invalid HTTP format, parsing fails. Are you trying to open an SSL connection to a non-SSL Puma?>
From the above logs, it was clear that the issue was related to the SSL configuration. On comparing the production environment configuration with the others, we figured out that we had enabled force_ssl for production applications. The force_ssl = true setting ensures that all incoming requests are SSL encrypted and automatically redirects plain HTTP requests to their SSL counterparts.
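In a Rails application this setting typically lives in the production environment configuration. Here is a minimal sketch, assuming the conventional file layout; our actual configuration contains more options.

# config/environments/production.rb
Rails.application.configure do
  # Adds the ActionDispatch::SSL middleware, which redirects plain HTTP
  # requests to HTTPS and marks cookies as secure.
  config.force_ssl = true
end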
The following diagram broadly shows the path of an incoming request.
From the above diagram, we can infer the following things:

- Client requests reach the ingress controller over HTTPS, and the SSL/TLS connection is terminated there.
- The ingress controller then forwards the request to the pod over plain HTTP.
- Health check requests go directly to the pod and do not pass through the ingress controller.
Here is how our health check request works:

- The startup probe sends a plain HTTP request to the /health_check route on the pod.
- Since force_ssl is enabled, the ActionDispatch::SSL middleware redirects the request to HTTPS.
- The probe then tries to open an HTTPS connection to the pod, and Puma rejects it with the "Are you trying to open an SSL connection to a non-SSL Puma?" error since no SSL certificates are configured with the server.

The solution to our problem lies in understanding why only the health check request is rejected, whereas the request from the ingress controller is not, even though both are plain HTTP requests. This is because the ingress controller sets some headers before forwarding the request to the pod, and the header we are concerned about is X-Forwarded-Proto.
The X-Forwarded-Proto header contains the HTTP/HTTPS scheme the client used to access the application. When a client makes an HTTPS request, the ingress controller terminates the SSL/TLS connection and forwards the request to the backend service using plain HTTP, after adding the X-Forwarded-Proto header along with the other headers.
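We can see the difference this header makes with a quick manual check. The commands below are a hypothetical illustration against the pod IP from the error above; they are not part of our probe setup.

# Without the header, ActionDispatch::SSL answers with a redirect to HTTPS.
$ curl -I http://192.168.43.231:3000/health_check

# With the header, the request is treated as already secure and the
# health check responds with 200 OK over plain HTTP.
$ curl -I -H "X-Forwarded-Proto: https" http://192.168.43.231:3000/health_check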
Everything started working after adding the X-Forwarded-Proto header to our startup probe request.
startupProbe:
  failureThreshold: 10
  httpGet:
    httpHeaders:
      - name: X-FORWARDED-PROTO
        value: https
    path: <%= health_check_url %>
    port: <%= port %>
    scheme: HTTP
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 60
  initialDelaySeconds: 10
We also added Readiness and Liveness probes for our deployments.
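The exact values are specific to each deployment, but as a rough sketch, assuming the same /health_check endpoint and X-Forwarded-Proto header as the startup probe, these probes can look like this:

readinessProbe:
  httpGet:
    httpHeaders:
      - name: X-FORWARDED-PROTO
        value: https
    path: /health_check
    port: 3000
    scheme: HTTP
  periodSeconds: 10
  failureThreshold: 3
livenessProbe:
  httpGet:
    httpHeaders:
      - name: X-FORWARDED-PROTO
        value: https
    path: /health_check
    port: 3000
    scheme: HTTP
  periodSeconds: 10
  failureThreshold: 3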
If your application runs on Heroku, you can deploy it on NeetoDeploy without any change. If you want to give NeetoDeploy a try, then please send us an email at [email protected].
If you have questions about NeetoDeploy or want to see the journey, follow NeetoDeploy on Twitter. You can also join our Slack community to chat with us about any Neeto product.
If this blog was helpful, check out our full blog archive.