<?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
     <title>BigBinary Blog</title>
     <link href="https://www.bigbinary.com/feed.xml" rel="self"/>
     <link href="https://www.bigbinary.com/"/>
     <updated>2026-03-06T03:01:15+00:00</updated>
     <id>https://www.bigbinary.com/</id>
     <entry>
       <title><![CDATA[Configuring the Kubernetes Horizontal Pod Autoscaler to scale based on custom metrics from Prometheus]]></title>
       <author><name>Sreeram Venkitesh</name></author>
      <link href="https://www.bigbinary.com/blog/prometheus-adapter"/>
      <updated>2024-07-23T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/prometheus-adapter</id>
<content type="html"><![CDATA[<p>Some of the major upsides of using Kubernetes to manage deployments are the self-healing and autoscaling capabilities of Kubernetes. If a deployment has a sudden spike of traffic, Kubernetes will automatically spin up new containers and handle that load gracefully. It will also scale down deployments when the traffic reduces.</p><p>Kubernetes has <a href="https://www.bigbinary.com/blog/solving-scalability-in-neeto-deploy#understanding-kubernetes-autoscalers">a couple of different ways</a> to scale deployments automatically based on the load the application receives. The <a href="https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/">Horizontal Pod Autoscaler (HPA)</a> can be used out of the box in a Kubernetes cluster to increase or decrease the number of Pods of your deployment. By default, HPA supports scaling based on CPU and memory usage, served by the <a href="https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-server">metrics server</a>.</p><p>While building <a href="https://neeto.com/neetodeploy">NeetoDeploy</a>, we initially set up scaling based on CPU and memory usage, since these were the default metrics supported by the HPA. However, we later wanted to scale deployments based on the average response time of our application.</p><p>This is an example of a case where the metric we want to scale on is not directly related to CPU or memory usage. Other examples could be network metrics from the load balancer, like the number of requests received by the application. In this blog, we will discuss how we achieved autoscaling of deployments in Kubernetes based on the average response time using <a href="https://github.com/kubernetes-sigs/prometheus-adapter">prometheus-adapter</a>.</p><p>When an application receives a lot of requests suddenly, this creates a spike in the average response time.
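</p><p>As an aside on where such a metric comes from: an average response time series is typically derived from the proxy's request-duration histogram. A hedged PromQL sketch, assuming Traefik's standard <code>traefik_service_request_duration_seconds</code> histogram is being scraped (metric names may differ in your setup):</p><pre><code># Average response time per Traefik service over the last 5 minutes
rate(traefik_service_request_duration_seconds_sum[5m])
  / rate(traefik_service_request_duration_seconds_count[5m])</code></pre><p>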
The CPU and memory metrics also spike, but they take longer to catch up. In such cases, being able to scale deployments based on the response time will ensure that the spike in traffic is handled gracefully.</p><p><a href="https://prometheus.io/">Prometheus</a> is one of the most popular cloud native monitoring tools, and the Kubernetes HPA can be extended to scale deployments based on metrics exposed by Prometheus. We used the <code>prometheus-adapter</code> to build autoscaling based on the average response time in <a href="https://neeto.com/neetodeploy">NeetoDeploy</a>.</p><h2>Setting up the custom metrics</h2><p>We took the following steps to make our HPAs work with Prometheus metrics.</p><ol><li>Installed <code>prometheus-adapter</code> in our cluster.</li><li>Configured the metric we wanted for our HPAs as a custom metric in the <code>prometheus-adapter</code>.</li><li>Confirmed that the metric is added to the <code>custom.metrics.k8s.io</code> API endpoint.</li><li>Configured an HPA with the custom metric.</li></ol><h2>Install prometheus-adapter in the cluster</h2><p><a href="https://github.com/kubernetes-sigs/prometheus-adapter">prometheus-adapter</a> is an implementation of the <code>custom.metrics.k8s.io</code> API using Prometheus.
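</p><p>Under the hood, an adapter like this plugs into the Kubernetes API aggregation layer by registering an <code>APIService</code> object for the <code>custom.metrics.k8s.io</code> group. A rough sketch of what that registration looks like is below; the service name and namespace are illustrative and depend on how the chart is installed:</p><pre><code class="language-yaml">apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  version: v1beta1
  # Points the aggregated API at the adapter's Service (names are examples)
  service:
    name: prom-adapter-prometheus-adapter
    namespace: default
  groupPriorityMinimum: 100
  versionPriority: 100</code></pre><p>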
We used the prometheus-adapter to set up Kubernetes metrics APIs for our Prometheus metrics, which can then be used with our HPAs.</p><p>We installed <code>prometheus-adapter</code> in our cluster using <a href="https://helm.sh/">Helm</a>. We got a template for the values file for the Helm installation <a href="https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus-adapter/values.yaml">here</a>.</p><p>We made a few changes to the file before we applied it to our cluster and deployed <code>prometheus-adapter</code>:</p><ol><li>We made sure that the Prometheus deployment is configured properly by giving the correct service URL and port.</li></ol><pre><code class="language-yaml"># values.yaml
prometheus:
  # Value is templated
  url: http://prometheus.monitoring.svc.cluster.local
  port: 9090
  path: &quot;&quot;
# ... rest of the file</code></pre><ol start="2"><li>We made sure that the custom metrics that we needed for our HPA are configured under <code>rules.custom</code> in the <code>values.yaml</code> file.
In the following example, we are using the custom metric <code>traefik_service_avg_response_time</code>, since we'll be using that to calculate the average response time for each deployment.</li></ol><pre><code class="language-yaml"># values.yaml
rules:
  default: false
  custom:
    - seriesQuery: '{__name__=~&quot;traefik_service_avg_response_time&quot;, service!=&quot;&quot;}'
      resources:
        overrides:
          app_name:
            resource: service
          namespace:
            resource: namespace
      metricsQuery: traefik_service_avg_response_time{&lt;&lt;.LabelMatchers&gt;&gt;}</code></pre><p>Once we configured our <code>values.yaml</code> file properly, we installed <code>prometheus-adapter</code> in our cluster with Helm.</p><pre><code class="language-bash">helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prom-adapter prometheus-community/prometheus-adapter --values values.yaml</code></pre><h2>Query for custom metric</h2><p>Once we got <code>prometheus-adapter</code> running, we queried our cluster to check if the custom metric is coming up in the <code>custom.metrics.k8s.io</code> API endpoint.</p><pre><code class="language-bash">kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq</code></pre><p>The response looked like this:</p><pre><code class="language-json">{
  &quot;kind&quot;: &quot;APIResourceList&quot;,
  &quot;apiVersion&quot;: &quot;v1&quot;,
  &quot;groupVersion&quot;: &quot;custom.metrics.k8s.io/v1beta1&quot;,
  &quot;resources&quot;: [
    {
      &quot;name&quot;: &quot;services/traefik_service_avg_response_time&quot;,
      &quot;singularName&quot;: &quot;&quot;,
      &quot;namespaced&quot;: true,
      &quot;kind&quot;: &quot;MetricValueList&quot;,
      &quot;verbs&quot;: [&quot;get&quot;]
    },
    {
      &quot;name&quot;: &quot;namespaces/traefik_service_avg_response_time&quot;,
      &quot;singularName&quot;: &quot;&quot;,
      &quot;namespaced&quot;: false,
      &quot;kind&quot;:
&quot;MetricValueList&quot;,
      &quot;verbs&quot;: [&quot;get&quot;]
    }
  ]
}</code></pre><p>We also queried the metric API for a particular service we've configured the metric for. Here, we're querying the <code>traefik_service_avg_response_time</code> metric for the <code>neeto-chat-web-staging</code> app in the default namespace.</p><pre><code>kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/neeto-chat-web-staging/traefik_service_avg_response_time | jq</code></pre><p>The API response gave the following.</p><pre><code class="language-json">{
  &quot;kind&quot;: &quot;MetricValueList&quot;,
  &quot;apiVersion&quot;: &quot;custom.metrics.k8s.io/v1beta1&quot;,
  &quot;metadata&quot;: {},
  &quot;items&quot;: [
    {
      &quot;describedObject&quot;: {
        &quot;kind&quot;: &quot;Service&quot;,
        &quot;namespace&quot;: &quot;default&quot;,
        &quot;name&quot;: &quot;neeto-chat-web-staging&quot;,
        &quot;apiVersion&quot;: &quot;/v1&quot;
      },
      &quot;metricName&quot;: &quot;traefik_service_avg_response_time&quot;,
      &quot;timestamp&quot;: &quot;2024-02-26T19:31:33Z&quot;,
      &quot;value&quot;: &quot;19m&quot;,
      &quot;selector&quot;: null
    }
  ]
}</code></pre><p>From the response, we can see that the average response time at that instant is reported as <code>19m</code>, Kubernetes quantity notation for 0.019, i.e. 19ms with the metric measured in seconds.</p><h2>Create the HPA</h2><p>Now that we're sure that <code>prometheus-adapter</code> is able to serve custom metrics under the <code>custom.metrics.k8s.io</code> API, we wired this up with a Horizontal Pod Autoscaler to scale our deployments based on our custom metric.</p><pre><code class="language-yaml">apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-name-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-name-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Object
      object:
        metric:
          name: traefik_service_avg_response_time
          selector: {
matchLabels: { app_name: my-app-name } }
        describedObject:
          apiVersion: v1
          kind: Service
          name: my-app-name
        target:
          type: Value
          value: 0.03</code></pre><p>With everything set up, the HPA was able to fetch the custom metric scraped by Prometheus and scale our Pods up and down based on the value of the metric. We also created a recording rule in Prometheus for storing our custom metric queries, and dropped the unwanted labels as a best practice. We can use the custom metric stored with the recording rule directly with <code>prometheus-adapter</code> to expose the metrics as an API endpoint in Kubernetes. This is helpful when your custom metric queries are complex.</p><p>If your application runs on Heroku, you can deploy it on NeetoDeploy without any change. If you want to give NeetoDeploy a try, then please send us an email at invite@neeto.com.</p><p>If you have questions about NeetoDeploy or want to see the journey, follow NeetoDeploy on <a href="https://twitter.com/neetodeploy">X</a>. You can also join our <a href="https://launchpass.com/neetohq">community Slack</a> to chat with us about any Neeto product.</p>]]></content>
    </entry><entry>
       <title><![CDATA[How we fixed app downtime issue in NeetoDeploy]]></title>
       <author><name>Abhishek T</name></author>
      <link href="https://www.bigbinary.com/blog/how-we-fixed-app-down-time-in-neeto-deploy"/>
      <updated>2024-07-09T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/how-we-fixed-app-down-time-in-neeto-deploy</id>
<content type="html"><![CDATA[<p><em>We are building <a href="https://neeto.com/neetoDeploy">NeetoDeploy</a>, a compelling alternative to Heroku. Stay updated by following NeetoDeploy on <a href="https://twitter.com/neetodeploy">Twitter</a> and reading our <a href="https://www.bigbinary.com/blog/categories/neetodeploy">blog</a>.</em></p><p>At <a href="https://www.neeto.com/">neeto</a> we are building 20+ applications, and most of our applications run on NeetoDeploy. Once we migrated from Heroku to NeetoDeploy, we started getting 520 response codes for our applications. This issue occurred randomly and rarely.</p><h3>What is a 520 response code?</h3><p>A 520 response code happens when the connection is started on the origin web server, but the request is not completed. This could be due to server crashes or the inability to handle the incoming requests because of insufficient resources.</p><p>When we looked at our logs closely, we found that all the 520 response code situations occurred when we restarted or deployed the app. From this, we concluded that the new pods were failing to handle requests from the client initially but worked fine after some time.</p><h3>What is wrong with new pods?</h3><p>Once our investigation narrowed down to the new pods, we quickly realized that requests were arriving at the server even when the server was not yet fully ready to take new requests.</p><p>When we create a new pod in Kubernetes, it is marked as &quot;Ready&quot;, and requests are sent to it as soon as its containers start. However, the servers initiated within these containers may require additional time to boot up and become fully ready to accept requests.</p><h4>Let's try restarting the application</h4><pre><code class="language-bash">$ kubectl rollout restart deployment bling-staging-web</code></pre><p>As we can see, a new container is getting created for the new pod. The READY status for the new pod is 0.
It means it's not yet READY.</p><pre><code class="language-bash">NAME                               READY  STATUS             RESTARTS  AGE
bling-staging-web-656f74d9d-6kpzz  1/1    Running            0         2m8s
bling-staging-web-79fc6f978-cdjf5  0/1    ContainerCreating  0         5s</code></pre><p>Now we can see that the new pod is marked as READY (1 out of 1), and the old one is terminating.</p><pre><code class="language-bash">NAME                               READY  STATUS             RESTARTS  AGE
bling-staging-web-656f74d9d-6kpzz  0/1    Terminating        0         2m9s
bling-staging-web-79fc6f978-cdjf5  1/1    Running            0         6s</code></pre><p>The new pod is shown as <code>READY</code> as soon as the container is created. But on checking the logs, we could see that the server was still starting up and not ready yet.</p><pre><code>[1] Puma starting in cluster mode...
[1] Installing dependencies...</code></pre><p>From the above observation, we understood that the pod is marked as &quot;READY&quot; right after the container is created. Consequently, requests are received even before the server is fully prepared to serve them, and they get a 520 response code.</p><h2>Solution</h2><p>To fix this issue, we must ensure that pods are marked as &quot;Ready&quot; only after the server is up and ready to accept requests. We can do this by using Kubernetes <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/">health probes</a>. More than six years ago we wrote <a href="https://www.bigbinary.com/blog/deploying-rails-applications-using-kubernetes-with-zero-downtime">a blog</a> on how we can leverage the readiness and liveness probes of Kubernetes.</p><h3>Adding Startup probe</h3><p>Initially, we only added a <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes">Startup probe</a>, since our problem was with the boot-up phase.
You can read more about the configuration settings <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes">here</a>.</p><p>The following configuration will add the Startup probe for the deployments:</p><pre><code class="language-yaml">startupProbe:
  failureThreshold: 10
  httpGet:
    path: /health_check
    port: 3000
    scheme: HTTP
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 60
  initialDelaySeconds: 10</code></pre><p><code>/health_check</code> is a route in the application that is expected to return a 200 response code if all is going well. Now, let's restart the application again after adding the Startup probe.</p><p>A container is created for the new pod, but the pod is still not &quot;Ready&quot;.</p><pre><code class="language-bash">NAME                               READY  STATUS            RESTARTS  AGE
bling-staging-web-656f74d9d-6kpzz  1/1    Running           0         2m8s
bling-staging-web-79fc6f978-cdjf5  0/1    Running           0         5s</code></pre><p>The new pod is marked as &quot;Ready&quot;, and the old one is &quot;Terminating&quot;.</p><pre><code class="language-bash">NAME                               READY  STATUS            RESTARTS  AGE
bling-staging-web-656f74d9d-6kpzz  0/1    Terminating       0         2m38s
bling-staging-web-79fc6f978-cdjf5  1/1    Running           0         35s</code></pre><p>If we check the logs, we can see the health check request:</p><pre><code>[1] Puma starting in cluster mode...
[1] Installing dependencies...
[1] * Puma version: 6.3.1 (ruby 3.2.2-p53) (&quot;Mugi No Toki Itaru&quot;)
[1] *  Min threads: 5
[1] *  Max threads: 5
[1] *  Environment: heroku
[1] *   Master PID: 1
[1] *      Workers: 1
[1] *     Restarts: () hot () phased
[1] * Listening on http://0.0.0.0:3000
[1] Use Ctrl-C to stop
[2024-02-10T02:40:48.944785 #23]  INFO -- : [bb9e756a-51cc-4d6b-9a4a-96b0464f6740] Started GET &quot;/health_check&quot; for 192.168.120.195 at 2024-02-10 02:40:48 +0000
[2024-02-10T02:40:48.946148 #23]  INFO -- : [bb9e756a-51cc-4d6b-9a4a-96b0464f6740] Processing by HealthCheckController#healthy as */*
[2024-02-10T02:40:48.949292 #23]  INFO -- : [bb9e756a-51cc-4d6b-9a4a-96b0464f6740] Completed 200 OK in 3ms (Allocations: 691)</code></pre><p>Now, the pod is marked as &quot;Ready&quot; only after the health check succeeds; in other words, only when the server is prepared to accept requests.</p><h3>Fixing the Startup probe for production applications</h3><p>Once we released the health check for our deployments, we found that health checks were failing for all production applications but working for staging and review applications.</p><p>We were getting the following error in our production applications.</p><pre><code class="language-bash">Startup probe failed: Get &quot;https://192.168.43.231:3000/health_check&quot;: http: server gave HTTP response to HTTPS client
2024-02-12 06:40:04 +0000 HTTP parse error, malformed request: #&lt;Puma::HttpParserError: Invalid HTTP format, parsing fails. Are you trying to open an SSL connection to a non-SSL Puma?&gt;</code></pre><p>From the above logs, it was clear that the issue was related to SSL configuration. On comparing the production environment configuration with the others, we figured out that we had enabled <a href="https://guides.rubyonrails.org/configuring.html#config-force-ssl">force_ssl</a> for production applications.
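</p><p>For context, <code>force_ssl</code> is a standard Rails setting enabled in the production environment config; a minimal sketch of where it lives:</p><pre><code class="language-ruby"># config/environments/production.rb
Rails.application.configure do
  # Force all access to the app over SSL and use secure cookies.
  config.force_ssl = true
end</code></pre><p>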
The <code>force_ssl=true</code> setting ensures that all incoming requests are SSL encrypted, automatically redirecting plain HTTP requests to their SSL counterparts.</p><p>The following diagram broadly shows the path of an incoming request.</p><p><img src="/blog_images/2024/how-we-fixed-app-down-time-in-neeto-deploy/image4.png" alt="HTTPS request path"></p><p>From the above diagram, we can infer the following things:</p><ul><li>SSL verification happens in the ingress controller and not in the server.</li><li>Client requests go through the ingress controller before reaching the server.</li><li>The request from the ingress controller to the pod is an HTTP request.</li><li>The HTTP health check requests are sent directly from Kubelet to the pod and do not go through the ingress controller.</li></ul><p>Here is how our health check request works.</p><ol><li><a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/">Kubelet</a> sends an HTTP request to the server directly.</li><li>Since <code>force_ssl</code> is enabled, the <a href="https://api.rubyonrails.org/v7.1.2/classes/ActionDispatch/SSL.html">ActionDispatch::SSL</a> middleware redirects the request to HTTPS.</li><li>When the HTTPS request reaches the server, <a href="https://puma.io/">Puma</a> throws an <code>Are you trying to open an SSL connection to a non-SSL Puma?</code> error since no SSL certificates are configured on the server.</li></ol><p>The solution to our problem lies in understanding why only the health check request is rejected, whereas the request from the ingress controller is not, even though both are HTTP requests. This is because the ingress controller sets some headers before forwarding to the pod, and the header we are concerned about is <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-Proto">X-FORWARDED-PROTO</a>. The <code>X-Forwarded-Proto</code> header contains the HTTP/HTTPS scheme the client used to access the application.
When a client makes an HTTPS request, the ingress controller terminates the SSL/TLS connection and forwards the request to the backend service using plain HTTP after adding the <code>X-Forwarded-Proto</code> header along with the other headers.</p><p>Everything started working after adding the <code>X-Forwarded-Proto</code> header to our startup probe request.</p><pre><code class="language-yaml">startupProbe:
  failureThreshold: 10
  httpGet:
    httpHeaders:
      - name: X-FORWARDED-PROTO
        value: https
    path: &lt;%= health_check_url %&gt;
    port: &lt;%= port %&gt;
    scheme: HTTP
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 60
  initialDelaySeconds: 10</code></pre><p>We also added <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes">Readiness</a> and <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request">Liveness</a> probes for our deployments.</p><p>If your application runs on Heroku, you can deploy it on NeetoDeploy without any change. If you want to give NeetoDeploy a try, then please send us an email at <a href="mailto:invite@neeto.com">invite@neeto.com</a>.</p><p>If you have questions about NeetoDeploy or want to see the journey, follow NeetoDeploy on <a href="https://twitter.com/neetodeploy">Twitter</a>. You can also join our Slack community to chat with us about any Neeto product.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Grafana Loki and Kubernetes Event exporter]]></title>
       <author><name>Vishal Yadav</name></author>
      <link href="https://www.bigbinary.com/blog/k8s-event-exporter-and-grafana-loki-integration"/>
      <updated>2024-05-07T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/k8s-event-exporter-and-grafana-loki-integration</id>
<content type="html"><![CDATA[<p>In the previous <a href="https://www.bigbinary.com/blog/prometheus-and-grafana-integration">blog</a>, we discussed integrating <a href="https://prometheus.io/">Prometheus</a> and <a href="https://grafana.com/">Grafana</a> in the Kubernetes cluster. In this blog, we'll explore how to integrate the <a href="https://github.com/resmoio/kubernetes-event-exporter">Kubernetes Event exporter</a> &amp; <a href="https://grafana.com/oss/loki/">Grafana Loki</a> into your Kubernetes cluster using a Helm chart.</p><p>Additionally, you'll also learn how to add Grafana Loki as a data source to your Grafana dashboard. This will help you visualize the Kubernetes events.</p><p>Furthermore, we'll delve into the specifics of setting up the Event exporter and Grafana Loki, ensuring you understand each step of the process. From downloading and configuring the necessary Helm charts to understanding the Grafana Loki dashboard, we'll cover it all.</p><p>By the end of this blog, you'll be able to fully utilize Grafana Loki and Kubernetes Event exporter, gaining insights from your Kubernetes events.</p><h2>How Kubernetes event exporter can help us in monitoring health</h2><p>Objects in Kubernetes, such as Pods, Deployments, Ingresses, and Services, publish events to indicate status updates or problems. Most of the time, these events are overlooked, and their 1-hour lifespan means important updates can be missed. They are also not searchable and cannot be aggregated.</p><p>For instance, they can alert you to changes in the state of pods, errors in scheduling, and resource constraints.
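</p><p>To make this concrete, a scheduling failure surfaces as an event roughly like the following (an illustrative sketch using standard Kubernetes Event fields; the exact payload the exporter emits depends on its layout configuration):</p><pre><code class="language-json">{
  &quot;type&quot;: &quot;Warning&quot;,
  &quot;reason&quot;: &quot;FailedScheduling&quot;,
  &quot;message&quot;: &quot;0/3 nodes are available: 3 Insufficient memory.&quot;,
  &quot;involvedObject&quot;: {
    &quot;kind&quot;: &quot;Pod&quot;,
    &quot;name&quot;: &quot;my-app-7d9c&quot;,
    &quot;namespace&quot;: &quot;default&quot;
  },
  &quot;count&quot;: 4
}</code></pre><p>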
Therefore, exporting these events and visualizing them can be crucial for maintaining the health of your cluster.</p><p>Kubernetes event exporter allows exporting the often missed Kubernetes events to various outputs so that they can be used for observability or alerting purposes. We can have multiple receivers to export the events from the Kubernetes cluster.</p><ul><li><a href="https://www.opsgenie.com/">Opsgenie</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#webhookshttp">Webhooks/HTTP</a></li><li><a href="https://www.elastic.co/">Elasticsearch</a></li><li><a href="https://opensearch.org/">OpenSearch</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#slack">Slack</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#kinesis">Kinesis</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#firehose">Firehose</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#sns">SNS</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#sqs">SQS</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#file">File</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#stdout">Stdout</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#kafka">Kafka</a></li><li><a href="https://docs.aws.amazon.com/systems-manager/latest/userguide/OpsCenter.html">OpsCenter</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#customizing-payload">Customize Payload</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#pubsub">Pubsub</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#teams">Teams</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#syslog">Syslog</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#bigquery">Bigquery</a></li><li><a 
href="https://github.com/resmoio/kubernetes-event-exporter#pipe">Pipe</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#aws-eventbridge">Event Bridge</a></li><li><a href="https://github.com/resmoio/kubernetes-event-exporter#loki">Grafana Loki</a></li></ul><h2>Setting up Grafana Loki &amp; Kubernetes event exporter using Helm chart</h2><p>We will once again use <a href="https://artifacthub.io/">ArtifactHub</a>, which provides a Helm chart for installing Grafana Loki onto a Kubernetes cluster. If you need instructions on how to install Helm on your system, you can refer to this blog.</p><p>In this blog post, we will install a Helm <a href="https://artifacthub.io/packages/helm/grafana/loki">chart</a> that sets up Loki in scalable mode, with separate read-and-write components that can be independently scaled. Alternatively, we can install Loki in monolithic mode, where the Helm chart installation runs the Grafana Loki <em>single binary</em> within a Kubernetes cluster. You can learn more about this <a href="https://grafana.com/docs/loki/latest/setup/install/helm/install-monolithic/#install-the-monolithic-helm-chart">here</a>.</p><h3>1. Create S3 buckets</h3><ul><li><p>grafana-loki-chunks-bucket</p></li><li><p>grafana-loki-admin-bucket</p></li><li><p>grafana-loki-ruler-bucket</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/loki-s3-buckets.png" alt="loki-s3-buckets.png"></p></li></ul><h3>2. 
Create a policy for Grafana Loki</h3><p>Create a new policy under IAM on Amazon AWS using the below snippet.</p><pre><code>{
    &quot;Version&quot;: &quot;2012-10-17&quot;,
    &quot;Statement&quot;: [
        {
            &quot;Sid&quot;: &quot;LokiStorage&quot;,
            &quot;Effect&quot;: &quot;Allow&quot;,
            &quot;Action&quot;: [
                &quot;s3:ListBucket&quot;,
                &quot;s3:PutObject&quot;,
                &quot;s3:GetObject&quot;,
                &quot;s3:DeleteObject&quot;
            ],
            &quot;Resource&quot;: [
                &quot;arn:aws:s3:::grafana-loki-chunks-bucket&quot;,
                &quot;arn:aws:s3:::grafana-loki-chunks-bucket/*&quot;,
                &quot;arn:aws:s3:::grafana-loki-admin-bucket&quot;,
                &quot;arn:aws:s3:::grafana-loki-admin-bucket/*&quot;,
                &quot;arn:aws:s3:::grafana-loki-ruler-bucket&quot;,
                &quot;arn:aws:s3:::grafana-loki-ruler-bucket/*&quot;
            ]
        }
    ]
}</code></pre><p>Output:</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/grafana-loki-policy.png" alt="grafana-loki-policy.png"></p><h3>3. 
Create a role with the above permissions</h3><p>Create a role with a custom trust policy and use the below snippet.</p><pre><code>{
    &quot;Version&quot;: &quot;2012-10-17&quot;,
    &quot;Statement&quot;: [
        {
            &quot;Effect&quot;: &quot;Allow&quot;,
            &quot;Principal&quot;: {
                &quot;Federated&quot;: &quot;arn:aws:iam::account_id:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/open_id&quot;
            },
            &quot;Action&quot;: &quot;sts:AssumeRoleWithWebIdentity&quot;,
            &quot;Condition&quot;: {
                &quot;StringEquals&quot;: {
                    &quot;oidc.eks.us-east-1.amazonaws.com/id/open_id:aud&quot;: &quot;sts.amazonaws.com&quot;,
                    &quot;oidc.eks.us-east-1.amazonaws.com/id/open_id:sub&quot;: &quot;system:serviceaccount:default:grafana-loki-access-s3-role-sa&quot;
                }
            }
        }
    ]
}</code></pre><p>Note: Please update the <code>account_id</code> and <code>open_id</code> in the above snippet.</p><p><strong>grafana-loki-access-s3-role-sa</strong> is the service account name that we will mention in the Loki values.</p><h3>4. Add the Grafana Helm repository</h3><p>To get this Helm chart, run these commands:</p><pre><code class="language-bash">helm repo add grafana https://grafana.github.io/helm-charts
helm repo update</code></pre><p>Output:</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/chart-add-output.png" alt="chart-add-output.png"></p><p>We have added the Grafana Helm repository and updated it to the latest version.</p><h3>5. 
Install the grafana/loki chart using Helm</h3><p>Create a <strong>loki-values.yaml</strong> file with the below snippet.</p><pre><code class="language-yaml">loki:
  readinessProbe: {}
  auth_enabled: false
  storage:
    bucketNames:
      chunks: grafana-loki-chunks-bucket
      ruler: grafana-loki-ruler-bucket
      admin: grafana-loki-admin-bucket
    type: s3
    s3:
      endpoint: null
      region: us-east-1
      secretAccessKey: null
      accessKeyId: null
      s3ForcePathStyle: false
      insecure: false
monitoring:
  lokiCanary:
    enabled: false
  selfMonitoring:
    enabled: false
test:
  enabled: false
serviceAccount:
  create: true
  name: grafana-loki-access-s3-role-sa
  imagePullSecrets: []
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::account_id:role/loki-role
  automountServiceAccountToken: true</code></pre><p>To install Loki using the Helm chart on the Kubernetes cluster, run this <code>helm install</code> command:</p><pre><code class="language-bash">helm install my-loki grafana/loki --values loki-values.yaml</code></pre><p>Output:</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/chart-installation-output.png" alt="chart-installation-output.png"></p><p>We have successfully installed Loki on the Kubernetes cluster.</p><p>Run the following command to view all the resources created by the Loki Helm chart in your Kubernetes cluster:</p><pre><code class="language-bash">kubectl get all -l app.kubernetes.io/name=loki</code></pre><p>Output:</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/all-resources-output.png" alt="all-resources-output.png"></p><p>The Helm chart created the following components:</p><ul><li><strong>Loki read and write:</strong> Loki is installed in scalable mode by default, which includes a read-and-write component. 
These components can be independently scaled out.</li><li><strong>Gateway:</strong> Inspired by Grafana's <a href="https://github.com/grafana/loki/blob/main/production/ksonnet/loki">Tanka setup</a>, the chart installs a gateway component by default. This NGINX component exposes Loki's API and automatically proxies requests to the appropriate Loki components (read or write, or a single instance in the case of filesystem storage). The gateway must be enabled to provide an Ingress, since the Ingress only exposes the gateway. If enabled, Grafana and log shipping agents, such as Promtail, should be configured to use the gateway. If NetworkPolicies are enabled, they become more restrictive when the gateway is active.</li><li><strong>Caching:</strong> In-memory caching is enabled by default. If this type of caching is unsuitable for your deployment, consider setting up memcache.</li></ul><p>Run this command to view all the Kubernetes Services for Loki:</p><pre><code class="language-bash">kubectl get service -l app.kubernetes.io/name=loki</code></pre><p>Output:</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/all-services-output.png" alt="all-services-output.png"></p><p>Listed services for Loki are:</p><ul><li>loki-backend</li><li>loki-backend-headless</li><li>loki-gateway</li><li>loki-memberlist</li><li>loki-read</li><li>loki-read-headless</li><li>loki-write</li><li>loki-write-headless</li><li>query-scheduler-discovery</li></ul><p>The <code>loki-gateway</code> service will be used to add Loki as a data source in Grafana.</p><h3>6. Adding the Loki data source in Grafana</h3><p>On the main page of Grafana, click on &quot;<strong>Home</strong>&quot;. 
Under &quot;<strong>Connections</strong>&quot;, you will find the &quot;<strong>Data sources</strong>&quot; option.</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/grafana-dashboard.png" alt="/blog_images/event-exporter-and-grafana-loki-integration/grafana-dashboard.png"></p><p>On the Data Sources page, click on the &quot;Add new data source&quot; button.</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/data-sources-page.png" alt="/blog_images/event-exporter-and-grafana-loki-integration/data-sources-page.png"></p><p>In the search bar, type &quot;Loki&quot; and search for it.</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/add-data-source.png" alt="/blog_images/event-exporter-and-grafana-loki-integration/add-data-source.png"></p><p>Clicking on &quot;Loki&quot; will redirect you to the dedicated page for the Loki data source.</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/loki-data-source.png" alt="/blog_images/event-exporter-and-grafana-loki-integration/loki-data-source.png"></p><p>To read the metrics from Loki, we will use the <code>loki-gateway</code> service. Add the URL of the service as <code>http://loki-gateway</code>.</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/loki-form.png" alt="/blog_images/event-exporter-and-grafana-loki-integration/loki-form.png"></p><p>After clicking on the &quot;Save &amp; test&quot; button, you will receive the toast message shown in the image below. This message appears because no clients have been created for Loki yet.</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/loki-addon-output.png" alt="loki-addon-output.png"></p><h3>7.
Install Kubernetes event exporter using the helm chart</h3><p>Create an <strong>event-exporter-values.yaml</strong> file with the below snippet:</p><pre><code>config:
  leaderElection: {}
  logLevel: debug
  logFormat: pretty
  metricsNamePrefix: event_exporter_
  receivers:
    - name: &quot;dump&quot;
      file:
        path: &quot;/dev/stdout&quot;
        layout: {}
    - name: &quot;loki&quot;
      loki:
        url: &quot;http://loki-gateway/loki/api/v1/push&quot;
        streamLabels:
          source: kubernetes-event-exporter
          container: kubernetes-event-exporter
  route:
    routes:
      - match:
          - receiver: &quot;dump&quot;
          - receiver: &quot;loki&quot;</code></pre><p>With the above snippet in place, run these commands to install the Kubernetes event exporter in your Kubernetes cluster:</p><pre><code class="language-bash">helm repo add bitnami https://charts.bitnami.com/bitnami
helm install event-exporter bitnami/kubernetes-event-exporter --values event-exporter-values.yaml</code></pre><p>Output:</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/event-exporter-installation-output.png" alt="event-exporter-installation-output.png"></p><p>To view all the resources created by the above Helm chart, run this command:</p><pre><code class="language-bash">kubectl get all -l app.kubernetes.io/name=kubernetes-event-exporter</code></pre><p>Output:</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/event-exporter-all-resources.png" alt="event-exporter-all-resources.png"></p><p>To view the logs of the event exporter Pod, run this command:</p><pre><code class="language-bash">kubectl logs -f pod/kubernetes-event-exporter-586455bbdd-sqlqc</code></pre><p>Note: Replace <strong>kubernetes-event-exporter-586455bbdd-sqlqc</strong> with your pod name.</p><p>Output:</p><p><img
src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/event-exporter-logs.png" alt="event-exporter-logs"></p><p>As you can see in the above image, the event exporter is up and running fine. Event logs are being sent to both the receivers that we'd configured in the values YAML file.</p><p>Once the Pod is created &amp; running, we can go back to the Loki data source under the <strong>Connections</strong> &gt; <strong>Data Sources</strong> page.</p><p>Click on the &quot;Save &amp; test&quot; button again, and this time you'll receive a success toast message.</p><p>Output:</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/loki-data-source-added-output.png" alt="loki-data-source-added-output"></p><h3>8. Kubernetes event exporter dashboard</h3><p>We will import this <a href="https://grafana.com/grafana/dashboards/17882-kubernetes-event-exporter/">dashboard</a> into Grafana to monitor and track the events received from the Kubernetes cluster. You can go through this blog if you want to learn how to import an existing dashboard into Grafana.</p><p>After successfully importing the dashboard, you can view all the events from the cluster, as shown in the image below. Additionally, you can filter the events based on any value within any interval.</p><p>Kubernetes Event Exporter</p><p><img src="/blog_images/2024/k8s-event-exporter-and-grafana-loki-integration/event-exporter-dashboard.png" alt="Kubernetes Event Exporter"></p><h2>Conclusion</h2><p>In this blog post, we discussed the process of setting up Grafana Loki and the Kubernetes Event Exporter.
We covered various steps, such as creating a policy for Grafana Loki, creating a role with the necessary permissions, installing Grafana and the Loki stack using Helm charts, adding Loki as a data source in Grafana, installing the Kubernetes event exporter using its Helm chart, and finally, setting up the Kubernetes event exporter dashboard in Grafana.</p><p>By following the steps outlined in this blog post, you can effectively monitor and track events from your Kubernetes cluster using Grafana Loki and the Kubernetes Event Exporter. This setup provides valuable insights and helps in troubleshooting and analyzing events in your cluster.</p><p>If you have any questions or feedback, please feel free to reach out. Happy monitoring!</p>]]></content>
    </entry><entry>
       <title><![CDATA[Setting up Prometheus and Grafana on Kubernetes using Helm]]></title>
       <author><name>Vishal Yadav</name></author>
      <link href="https://www.bigbinary.com/blog/prometheus-and-grafana-integration"/>
      <updated>2024-01-25T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/prometheus-and-grafana-integration</id>
<content type="html"><![CDATA[<p>In this blog, we will learn how to set up Prometheus and Grafana on Kubernetes using Helm.</p><p><a href="https://prometheus.io/">Prometheus</a>, along with <a href="https://grafana.com/">Grafana</a>, is a highly scalable open-source monitoring framework for <a href="https://devopscube.com/docker-container-clustering-tools/">container orchestration platforms</a>. Prometheus probes the application and collects various data. It stores all this data in its time series database. Grafana is a visualization tool. It uses the data from the database to show information that is meaningful to the user.</p><p>Both Prometheus and Grafana are gaining popularity in the <a href="https://devopscube.com/what-is-observability/">observability</a> space as they help with metrics and alerts. Learning to integrate them using Helm will allow us to monitor our Kubernetes cluster and troubleshoot problems easily. Furthermore, we can deep dive into our cluster's well-being and efficiency, focusing on resource usage and performance metrics within our Kubernetes environment.</p><p>We will also learn how to create a simple <a href="https://grafana.com/grafana/dashboards/">dashboard</a> on Grafana.</p><h2><strong>Why Prometheus and Grafana are good for monitoring</strong></h2><p>Using Prometheus and Grafana for monitoring has many benefits:</p><ul><li><strong>Scalability:</strong> Both tools are highly scalable and can handle the monitoring needs of small to large Kubernetes clusters.</li><li><strong>Flexibility:</strong> They allow us to create custom dashboards tailored to our specific monitoring requirements.</li><li><strong>Real-time Monitoring:</strong> Prometheus provides real-time monitoring, helping us to quickly detect and respond to issues.</li><li><strong>Alerting:</strong> Prometheus enables us to set up alerts based on specific metrics, so we can be notified when issues arise.</li><li><strong>Data Visualization:</strong> Grafana offers powerful data
visualization capabilities, making it easier to understand complex data.</li><li><strong>Open Source:</strong> Both Prometheus and Grafana are open-source, reducing monitoring costs.</li><li><strong>Community Support:</strong> We can benefit from active communities, ensuring continuous development and support.</li><li><strong>Integration:</strong> They seamlessly integrate with other Kubernetes components and applications, simplifying setup.</li><li><strong>Historical Data:</strong> Grafana allows us to explore historical data, aiding in long-term analysis and trend identification.</li><li><strong>Extensible:</strong> Both tools are extensible, allowing us to integrate additional data sources and plugins.</li><li><strong>Efficient Resource Usage:</strong> Prometheus efficiently utilizes resources, ensuring minimal impact on our cluster's performance.</li></ul><p>Two common ways to use Prometheus and Grafana on Kubernetes:</p><ol><li><strong>Manual Kubernetes deployment</strong>: In this method, we need to write <a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/">Kubernetes Deployments</a> and <a href="https://kubernetes.io/docs/concepts/services-networking/service/">Services</a> for both Prometheus and Grafana. In the YAML files, we need to put all the settings for Prometheus and Grafana on Kubernetes. Then we apply these files to our Kubernetes cluster. But we can end up with many YAML files, which can be hard to manage. If we make a mistake in any YAML file, Prometheus and Grafana won't work on Kubernetes.</li><li><strong>Using Helm</strong>: This is an easy way to deploy any application container to Kubernetes. <a href="https://helm.sh/">Helm</a> is the official package manager for Kubernetes. With Helm, we can make installing, deploying, and managing
With Helm, we can make installing, sending, and managingKubernetes makes applications easier.</li></ol><p>A<a href="https://helm.sh/">Helm Chart</a>has all the YAML files:</p><ul><li>Deployments.</li><li>Services.</li><li>Secrets.</li><li>ConfigMaps manifests.</li></ul><p>We use these files to send the application container to Kubernetes. Instead ofmaking individual YAML files for each application container, Helm lets usdownload Helm charts that already have YAML files.</p><h2>Setting up Prometheus and Grafana using Helm chart</h2><p>We will use <a href="https://artifacthub.io/">ArtifactHub</a>, which offers public andprivate repositories for Helm Charts. We will use these Helm Charts to arrangethe pods and services in our Kubernetes cluster.</p><p>To get Prometheus and Grafana working on Kubernetes with Helm, we will start byinstalling Helm.</p><h4>Installing Helm on Linux</h4><pre><code class="language-bash">sudo apt-get install helm</code></pre><h4>Installing Helm on Windows</h4><pre><code class="language-bash">choco install Kubernetes-helm</code></pre><h4>Installing Helm on macOS</h4><pre><code class="language-bash">brew install helm</code></pre><p>We can check out the official<a href="https://helm.sh/docs/intro/install/">Helm documentation</a> if we run into anyissues while installing Helm.</p><p>The image below represents the successful Helm installation on macOS.</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-01_at_1.09.49_PM.png" alt="Screenshot 2023-11-01 at 1.09.49PM.png"></p><p>For this blog, were going to install Helm<a href="https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack">chart</a>and by default, this chart also installs additional, dependent charts (includingGrafana):</p><ul><li><a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-state-metrics">prometheus-community/kube-state-metrics</a></li><li><a 
href="https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-node-exporter">prometheus-community/prometheus-node-exporter</a></li><li><a href="https://github.com/grafana/helm-charts/tree/main/charts/grafana">grafana/grafana</a></li></ul><p>To get this Helm chart, let's run these commands:</p><pre><code class="language-bash">helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update</code></pre><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-01_at_1.15.43_PM.png" alt="Screenshot 2023-11-01 at 1.15.43PM.png"></p><p>We have fetched the latest versions of the Prometheus &amp; Grafana charts.</p><p>To install the Prometheus Helm chart on a Kubernetes cluster, let's run the following command:</p><pre><code class="language-bash">helm install my-kube-prometheus-stack prometheus-community/kube-prometheus-stack</code></pre><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-01_at_1.21.46_PM.png" alt="Screenshot 2023-11-01 at 1.21.46PM.png"></p><p>We have successfully installed Prometheus &amp; Grafana on the Kubernetes cluster. We can access the Prometheus &amp; Grafana servers via ports 9090 &amp; 80, respectively.</p><p>Now, let's run the following command to view all the resources created by the Helm chart in our Kubernetes cluster:</p><pre><code class="language-bash">kubectl get all</code></pre><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_11.19.39_AM.png" alt="Screenshot 2023-11-02 at 11.19.39AM.png">The Helm chart created the following resources:</p><ul><li><strong>Pods</strong>: They host the deployed Prometheus Kubernetes application inside the cluster.</li><li><strong>Replica Sets</strong>: A collection of instances of the same application inside the Kubernetes cluster.
It enhances application reliability.</li><li><strong>Deployments</strong>: They are the blueprints for creating the application pods.</li><li><strong>Services</strong>: They expose the pods running inside the Kubernetes cluster. We use them to access the deployed Kubernetes application.</li><li><strong>Stateful Sets</strong>: They manage the deployment of the stateful application components and ensure stable and predictable network identities for these components.</li><li><strong>Daemon Sets</strong>: They ensure that all (or a specific set of) nodes run a copy of a pod, which is useful for tasks such as logging, monitoring, and other node-specific operations.</li></ul><p>Run this command to view all the Kubernetes services for Prometheus &amp; Grafana:</p><pre><code class="language-bash">kubectl get service</code></pre><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_11.37.39_AM.png" alt="Screenshot 2023-11-02 at 11.37.39AM.png"></p><p>Listed services for Prometheus and Grafana are:</p><ul><li>alertmanager-operated</li><li>kube-prometheus-stack-alertmanager</li><li>kube-prometheus-stack-grafana</li><li>kube-prometheus-stack-kube-state-metrics</li><li>kube-prometheus-stack-operator</li><li>kube-prometheus-stack-prometheus</li><li>kube-prometheus-stack-prometheus-node-exporter</li><li>prometheus-operated</li></ul><p><code>kube-prometheus-stack-grafana</code> and <code>kube-prometheus-stack-prometheus</code> are <code>ClusterIP</code> type services, which means we can only access them within the Kubernetes cluster.</p><p>To expose Prometheus and Grafana outside the Kubernetes cluster, we can use either a NodePort or a LoadBalancer service.</p><h2>Exposing Prometheus and Grafana using NodePort services</h2><p>Let's run the following commands to expose the <code>Prometheus</code> and <code>Grafana</code> Kubernetes services:</p><pre><code class="language-bash">kubectl expose service kube-prometheus-stack-prometheus --type=NodePort --target-port=9090 \
  --name=prometheus-node-port-service
kubectl expose service kube-prometheus-stack-grafana --type=NodePort --target-port=3000 --name=grafana-node-port-service</code></pre><p>These commands will create new services of <code>NodePort</code> type and make Prometheus and Grafana accessible outside the Kubernetes cluster on ports <code>9090</code> and <code>80</code>.</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_11.59.15_AM.png" alt="Screenshot 2023-11-02 at 11.59.15AM.png"></p><p>As we can see, the <code>grafana-node-port-service</code> and <code>prometheus-node-port-service</code> are successfully created and are being exposed on node ports <code>32489</code> &amp; <code>30905</code>.</p><p>Now, we can run this command and get the external IP of any node to access Prometheus and Grafana:</p><pre><code class="language-bash">kubectl get nodes -o wide</code></pre><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_11.57.17_AM.png" alt="Screenshot 2023-11-02 at 11.57.17AM.png"></p><p>We can use the External-IP and the node ports to access the Prometheus and Grafana dashboards outside the cluster environment.</p><p>Prometheus Dashboard</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_12.04.36_PM.png" alt="Prometheus Dashboard"></p><p>Grafana Dashboard</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_12.04.53_PM.png" alt="Grafana Dashboard"></p><p>Run this command to get the password for the <strong>admin</strong> user of the Grafana dashboard:</p><pre><code>kubectl get secret --namespace default kube-prometheus-stack-grafana -o jsonpath=&quot;{.data.admin-password}&quot; | base64 --decode ; echo</code></pre><h2>Grafana Dashboard</h2><p>Upon logging in to the Grafana dashboard, use <code>admin</code> as the username along with the generated password.
We will see the &quot;Welcome to Grafana&quot; homepage as shown below.</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_12.12.04_PM.png" alt="Screenshot 2023-11-02 at 12.12.04PM.png"></p><p>Since we used the Kube Prometheus Stack helm chart, the data sources for Prometheus and Alertmanager are added by default.</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_12.18.14_PM.png" alt="Screenshot 2023-11-02 at 12.18.14PM.png"></p><p>We can add more data sources by clicking on the <strong>Add new data source</strong> button on the top right side.</p><p>By default, this Helm chart adds multiple dashboards to monitor the health of the Kubernetes cluster and its resources.</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_12.22.37_PM.png" alt="Screenshot 2023-11-02 at 12.22.37PM.png"></p><p>Additionally, we also have the option of creating our dashboards from scratch as well as importing multiple Grafana dashboards provided by the <a href="https://grafana.com/grafana/dashboards/">Grafana library</a>.</p><p>To import a Grafana Dashboard, let's follow these steps:</p><ul><li><p>From this <a href="https://grafana.com/grafana/dashboards/">Grafana library</a>, we can add any dashboard</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_12.28.44_PM.png" alt="Screenshot 2023-11-02 at 12.28.44PM.png"></p></li><li><p>Select a dashboard and copy the Dashboard ID</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_12.34.54_PM.png" alt="Screenshot 2023-11-02 at 12.34.54PM.png"></p></li><li><p>On the <strong>Dashboards</strong> page, we can find the <strong>Import</strong> option</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_12.32.55_PM.png" alt="Screenshot 2023-11-02 at 12.32.55PM.png"></p></li><li><p>On the &quot;Import Dashboard&quot;
page, we need to paste the Dashboard ID that we copied earlier &amp; click on the <strong>Load</strong> button.</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_12.35.39_PM.png" alt="Screenshot 2023-11-02 at 12.35.39PM.png"></p></li><li><p>After clicking on the <strong>Load</strong> button, it will auto-load the dashboard from the library, after which we can import the dashboard by clicking on the <strong>Import</strong> button.</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_12.37.50_PM.png" alt="Screenshot 2023-11-02 at 12.37.50PM.png"></p></li><li><p>Once the import is complete, we'll be redirected to the newly imported dashboard, which'll also be visible on the Dashboards page.</p><p><img src="/blog_images/2024/prometheus-and-grafana-integration/Screenshot_2023-11-02_at_12.39.42_PM.png" alt="Screenshot 2023-11-02 at 12.39.42PM.png"></p><p>We can use this Node Exporter dashboard to monitor &amp; observe the health of the nodes present in our Kubernetes cluster.</p></li></ul><h2>Conclusion</h2><p>In this blog, we learned how to integrate Prometheus and Grafana using the Helm chart. We also learned how to import dashboards into Grafana from the <a href="https://grafana.com/grafana/dashboards/">Grafana library</a>.</p><p>In the next blog, we will explore how to integrate <a href="https://grafana.com/oss/loki/">Grafana Loki</a> with Grafana and collect and store event-related metrics using the <a href="https://github.com/resmoio/kubernetes-event-exporter">Kubernetes Event Exporter</a>.</p>]]></content>
    </entry><entry>
       <title><![CDATA[How we added sleep when idle feature to NeetoDeploy and reduced cost]]></title>
       <author><name>Sreeram Venkitesh</name></author>
      <link href="https://www.bigbinary.com/blog/cost-reduction-in-neeto-deploy-by-turning-off-inactive-apps"/>
      <updated>2024-01-19T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/cost-reduction-in-neeto-deploy-by-turning-off-inactive-apps</id>
<content type="html"><![CDATA[<p><em>We are building <a href="https://neeto.com/neetoDeploy">NeetoDeploy</a>, a compelling Heroku alternative. Stay updated by following NeetoDeploy on <a href="https://twitter.com/neetodeploy">Twitter</a> and reading our <a href="https://www.bigbinary.com/blog/categories/neetodeploy">blog</a>.</em></p><h2>What is sleep when idle feature</h2><p>&quot;Sleep when idle&quot; is a feature of NeetoDeploy, which puts the deployed application to sleep when there is no hit to the server for 5 minutes. This helps reduce the cost of running the server.</p><p>The &quot;Sleep when idle&quot; feature can be enabled not only for the pull request review applications, but for staging and production applications too. Many folks build applications to learn or as a hobby. In such cases, there is no point in running the server when it is not likely to get any traffic. Since NeetoDeploy billing is based on usage, the &quot;Sleep when idle&quot; feature helps keep the bill low for the users.</p><p>Let's say you built something and deployed it to production. You shared it with your friends. For a day or two you got a bit of traffic, and after that you moved on to other things. If &quot;sleep when idle&quot; is enabled, then you don't need to worry about anything. If the server is not getting any traffic, then you will not be billed.</p><h2>How is Neeto using sleep when idle feature</h2><p>At <a href="https://neeto.com">neeto</a>, we are building 20+ applications at the same time. It means lots of pull requests for all these products, and thus lots of PR review apps are created.</p><p>For a long time, we were using Heroku to build the review apps. However, when NeetoDeploy started to become stable, we moved the generation of PR review apps from Heroku to NeetoDeploy.
This helped reduce cost.</p><h2>How to make deployments sleep when idle?</h2><p>This video describes how the &quot;sleep when idle&quot; feature is implemented.</p><iframe width="560" height="315" src="https://youtube.com/embed/trn2DJyTjnw" frameborder="0" title="How we designed NeetoDeploy's 'sleep when idle' feature" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe><p>Keeping the apps running only when they're being used involves two steps:</p><ol><li>Scaling the deployments down and bringing them back up again</li><li>Figuring out when to do the scaling</li></ol><p>The deployments can be scaled easily using the <code>kubectl scale</code> command. For example, if we want to turn our deployment off, we can run the following to update our deployment to zero replicas, essentially destroying all the pods.</p><pre><code class="language-bash">kubectl scale deployment/nginx --replicas=0</code></pre><p>We can also delete our service, ingress, or any other resource we might have created for our deployment. The configuration of the deployment itself would still be present in the cluster even when we make it sleep, since the Kubernetes Deployment is not deleted.</p><p>When we want to bring our app back up again, we can use the same command to spin up new pods:</p><pre><code class="language-bash">kubectl scale deployment/nginx --replicas=1</code></pre><p>The challenge was to figure out <em>when</em> to do this. We decided that we'd have a threshold based on the time the app was last accessed by users. If the application is not accessed for more than five minutes, we consider the application to be idle and we will scale it down.
It'll be brought back up when a user tries to access it again.</p><h2>Exploring existing solutions</h2><p>There are existing CNCF projects like <a href="https://knative.dev/">Knative</a> and <a href="https://keda.sh/">Keda</a>, which can potentially be used to achieve what we want here. We spent some time exploring these but realized that these solutions weren't exactly suitable for our requirements. Kubernetes also natively has a <code>HPAScaleToZero</code> <a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates">feature gate</a> which enables the <a href="https://www.bigbinary.com/blog/solving-scalability-in-neeto-deploy#understanding-kubernetes-autoscalers">Horizontal Pod Autoscaler</a> to scale down deployments to zero pods, but this is still in alpha and hence not available in EKS yet.</p><p>Ultimately, we decided to write our own service to achieve this. The entire backend of NeetoDeploy was designed as <a href="https://www.bigbinary.com/blog/neeto-deploy-zero-to-one">a collection of microservices</a> from day one. So it made sense to build our <em>pod idling service</em> as another microservice that runs in our cluster.</p><h2>Figuring out when to make applications sleep</h2><p>To know when applications can be idled, we need to know when people are accessing the applications from their browsers. Since all the requests to applications deployed on NeetoDeploy go through our load balancer, it contains the information of when every app was last accessed.</p><p>We use <a href="https://traefik.io/traefik/">Traefik</a> as our load balancer, and we used Traefik's <a href="https://doc.traefik.io/traefik/middlewares/overview/">middlewares</a> to retrieve and process the information of when apps are being accessed. We wrote a custom middleware to send all the request information to the pod idling service whenever an app is accessed.
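The last-access bookkeeping and the five-minute idleness check described here can be sketched as follows. This is a simplified, illustrative sketch: the actual service keeps timestamps in Redis and scales deployments through the Kubernetes API, and the function names below are our own, not NeetoDeploy's.

```python
import time

IDLE_THRESHOLD_SECONDS = 5 * 60  # the five-minute threshold described above

def record_access(cache, app_url, timestamp=None):
    # Called for every request forwarded by the load balancer middleware:
    # remember when each app was last accessed.
    cache[app_url] = timestamp if timestamp is not None else time.time()

def idle_apps(cache, now=None):
    # Apps whose last access is older than the threshold are candidates
    # for being scaled down to zero replicas by the cron job.
    now = now if now is not None else time.time()
    return [url for url, last_seen in cache.items()
            if now - last_seen > IDLE_THRESHOLD_SECONDS]

cache = {}
record_access(cache, "active-app.neetodeployapp.com", timestamp=1000.0)
record_access(cache, "stale-app.neetodeployapp.com", timestamp=0.0)
print(idle_apps(cache, now=1100.0))  # ['stale-app.neetodeployapp.com']
```

Running the filter on a schedule (every five minutes, as the post describes) bounds how long an unused app keeps running.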
The pod idling service would store all the URLs, along with the timestamp at which they were accessed, in a Redis cache. The following graphic shows how the request information would be collected and stored by the pod idling service into its Redis cache, both of which are running within the cluster.</p><p><img src="/blog_images/2024/cost-reduction-in-neeto-deploy-by-turning-off-inactive-apps/pod-idling-new-architecture.png" alt="The architecture of the pod idling service"></p><p>The pod idling service would then filter the apps that were last accessed more than five minutes ago. It then sends a request to the cluster to scale all these apps down. We'd also delete any related resources, like the Services and the IngressRoutes used to configure networking for the deployments.</p><p>We first tested this by running the service manually, and sure enough, all the inactive deployments were filtered and scaled down properly. We then added this as a cron job in the pod idling service, which would run every five minutes. This means that no app would run for more than five minutes if it is not being used.</p><p>But wait! How would we bring the app back up after scaling it down?</p><h2>Building the downtime service</h2><p>As we discussed above, we use Traefik's IngressRoutes to route traffic to the application being accessed. We made use of the <a href="https://doc.traefik.io/traefik/v2.10/routing/routers/#priority_1">priority parameter</a> of IngressRoutes to boot up apps that are sleeping. Essentially, we created a wildcard Traefik IngressRoute that points to a &quot;downtime service&quot; deployment, which is a React app that serves a message of <code>There's nothing here, yet</code> to let users know that the app they're trying to access doesn't exist.
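A catch-all IngressRoute of the kind described above might look roughly like the fragment below. This is a hedged sketch: the resource name, entry point, domain, and match rule are illustrative assumptions in the Traefik v2 CRD style, not NeetoDeploy's actual configuration.

```yaml
# Illustrative catch-all route; every name here is an assumption.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: downtime-service-wildcard
spec:
  entryPoints:
    - websecure
  routes:
    # Matches any subdomain that no app-specific route claims.
    - match: HostRegexp(`{subdomain:[a-z0-9-]+}.neetodeployapp.com`)
      priority: 1 # lowest, so app-specific IngressRoutes win
      services:
        - name: downtime-service
          port: 80
```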
You can see this in action if you visit a random URL in NeetoDeploy, say something like <a href="https://nonexistent-appname.neetodeployapp.com">nonexistent-appname.neetodeployapp.com</a>.</p><p><img src="/blog_images/2024/cost-reduction-in-neeto-deploy-by-turning-off-inactive-apps/downtime-service-page.png" alt="The downtime service page"></p><p>Wildcard IngressRoutes have the least priority by default. So if we create a &quot;catch-all&quot; wildcard IngressRoute, any invalid URL without an IngressRoute of its own can be redirected to a single Service in Kubernetes. This is how we're redirecting non-existent apps to the page shown above. In the following graphic, we can see how a request to a random URL is routed to the downtime service with the wildcard IngressRoute.</p><p><img src="/blog_images/2024/cost-reduction-in-neeto-deploy-by-turning-off-inactive-apps/downtime-service-architecture.png" alt="Architecture of how the downtime service works in NeetoDeploy"></p><p>This also means that if an app is scaled down by the pod idling service and gets its IngressRoute deleted, the next time a user tries to access the app, the request would instead be routed to the downtime service. We need to handle the scale-up logic from the downtime service.</p><p>Whenever a user requests a URL that doesn't have an IngressRoute, there are two possibilities.</p><ol><li>The app doesn't exist.</li><li>The app exists, but is currently scaled down.</li></ol><p>The downtime service would first check whether the requested app is present in the cluster in a sleeping state. If not, then the user will be served the &quot;There's nothing here, yet&quot; page. If there is a sleeping deployment, however, we boot it back up. The downtime service sends the scale-up request to the cluster. We keep redirecting the user back to the URL till the app is up and running.
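The two-way decision the downtime service makes can be sketched like this. The function names are illustrative, and scale_up is a stub for the Kubernetes API call the post describes; this is not NeetoDeploy's actual code.

```python
def scale_up(app_name):
    # Stub: the real service asks the Kubernetes API to scale the app's
    # Deployment back up (the equivalent of kubectl scale --replicas=1).
    pass

def handle_unrouted_request(app_name, sleeping_apps):
    # A request reaches the downtime service only when no app-specific
    # IngressRoute matched: either the app doesn't exist at all, or it
    # exists but was scaled down by the pod idling service.
    if app_name not in sleeping_apps:
        return "show-nothing-here-page"
    # Trigger scale-up and keep redirecting the user to the same URL;
    # once the app's pods, Service, and IngressRoute are recreated, the
    # higher-priority route takes over and bypasses this service.
    scale_up(app_name)
    return "redirect-back-to-app"

print(handle_unrouted_request("unknown-app", {"sleeping-app"}))   # show-nothing-here-page
print(handle_unrouted_request("sleeping-app", {"sleeping-app"}))  # redirect-back-to-app
```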
This redirection would keep happening until the app is scaled up, since we create the Service and IngressRoute only after the pods of the app are running. At this point, the request will be routed to the correct pod by the app's IngressRoute, since it has a higher priority than the wildcard IngressRoute of the downtime service. All of these steps are illustrated in the GIF below:</p><p><img src="/blog_images/2024/cost-reduction-in-neeto-deploy-by-turning-off-inactive-apps/downtime-service.gif" alt="Illustration of how the downtime service works"></p><p>This design worked flawlessly, and we were able to bring back scaled-down applications with as little as 20-30 seconds of wait time.</p><h2>Conclusion</h2><p>We've been running this setup for almost a year now, and it has been working smoothly so far. The pod idling service and the downtime service started as simple microservices and continue to evolve, adapting to the increasing demand as we grow.</p><p>If your application runs on Heroku, you can deploy it on NeetoDeploy without any change. If you want to give NeetoDeploy a try, then please send us an email at invite@neeto.com.</p><p>If you have questions about NeetoDeploy or want to see the journey, follow NeetoDeploy on <a href="https://twitter.com/neetodeploy">X</a>. You can also join our <a href="https://launchpass.com/neetohq">community Slack</a> to chat with us about any Neeto product.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Building the metrics dashboard in NeetoDeploy with Prometheus]]></title>
       <author><name>Sreeram Venkitesh</name></author>
      <link href="https://www.bigbinary.com/blog/using-prometheus-in-neeto-deploy"/>
      <updated>2024-01-09T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/using-prometheus-in-neeto-deploy</id>
      <content type="html"><![CDATA[<p><em>We are building <a href="https://neeto.com/neetoDeploy">NeetoDeploy</a>, a compelling alternative to Heroku. Stay updated by following NeetoDeploy on <a href="https://twitter.com/neetodeploy">Twitter</a> and reading our <a href="https://www.bigbinary.com/blog/categories/neetodeploy">blog</a>.</em></p><p>One of the features that we wanted in our cloud deployment platform, <strong>NeetoDeploy</strong>, was an application metrics dashboard. We decided to use <a href="https://prometheus.io/">Prometheus</a> for building this feature. Prometheus is an open source monitoring and alerting toolkit and a CNCF graduated project. Venturing into the Cloud Native ecosystem of projects beyond Kubernetes was something we had never done before. We ended up learning a lot about Prometheus and how to use it during the course of building this feature.</p><h2>Initial setup</h2><p>We installed Prometheus in our Kubernetes cluster by writing a deployment configuration YAML and applying it to our cluster. We also provisioned an AWS Elastic Block Store volume using a PersistentVolumeClaim to store the metrics data collected by Prometheus. Prometheus needed a <a href="https://github.com/prometheus/prometheus/blob/main/documentation/examples/prometheus-kubernetes.yml">configuration file</a> where we defined all the targets it would scrape metrics from. This is a YAML file which we stored in a ConfigMap in our cluster.</p><p>Targets in Prometheus can be anything that exposes metrics data in the Prometheus format at a <code>/metrics</code> endpoint. This can be your application servers, Kubernetes API servers or even Prometheus itself. Prometheus scrapes the data at the defined <code>scrape_interval</code> and stores it in the volume as time series data. This can be queried and visualized in the Prometheus dashboard that comes bundled with the Prometheus deployment.</p><p>We used the <code>kubectl port-forward</code> command to test that Prometheus was working locally.
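</p><p>For reference, the port-forward invocation looks roughly like this; the deployment name and namespace are assumptions for illustration:</p><pre><code class="language-bash"># Forward local port 9090 to the Prometheus deployment's port 9090,
# so the dashboard is reachable at http://localhost:9090.
# (Deployment name and namespace are hypothetical.)
kubectl -n monitoring port-forward deployment/prometheus 9090:9090
</code></pre><p>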
Once everything was tested and confirmed to be working, we exposed Prometheus with an ingress so that we could hit its APIs using that URL.</p><p>Initially we had configured the following targets:</p><ol><li><a href="https://github.com/prometheus/node_exporter">node_exporter</a> from Prometheus, which would scrape the metrics of the machine the deployment is running on.</li><li><a href="https://github.com/kubernetes/kube-state-metrics">kube-state-metrics</a>, which would listen to the Kubernetes API and store metrics of all the objects.</li><li><a href="https://traefik.io/traefik/">Traefik</a>, for all the network-related metrics (like the number of requests etc.), since we are using Traefik as our ingress controller.</li><li>kubernetes-nodes</li><li>kubernetes-pods</li><li>kubernetes-cadvisor</li><li>kubernetes-service-endpoints</li></ol><p>The last 4 scrape jobs collect metrics from the Kubernetes REST API related to nodes, pods, containers and services respectively.</p><p>For scraping metrics from all of these targets, we had set a resource request of 500 MB of RAM and 0.5 vCPU on our Prometheus deployment.</p><p>After setting up all of this, the Prometheus deployment was running fine, and we were able to see the data from the Prometheus dashboard. Seeing this, we were satisfied and happily started hacking with PromQL, Prometheus's query language.</p><h2>The CrashLoopBackOff crime scene</h2><p><code>CrashLoopBackOff</code> is when a Kubernetes pod goes into a loop of crashing, restarting itself and then crashing again - and this was what was happening to the Prometheus deployment we had created. From what we could see, the pod had crashed, and when it got recreated, Prometheus would initialize itself and do a reload of the <a href="https://prometheus.io/docs/prometheus/latest/storage/">Write Ahead Log (WAL)</a>.</p><p>The WAL adds additional durability to the database.
Prometheus stores the metrics it scrapes in memory before persisting them to the database as chunks, and the WAL makes sure that the in-memory data will not be lost in the case of a crash. In our case, the Prometheus deployment was crashing and getting recreated. It would try to load the data from the WAL into memory, and then crash again before this was completed, leading to the CrashLoopBackOff state.</p><p>We tried deleting the WAL blocks manually from the volume, even though this would incur some data loss. This brought the deployment back up again, since the WAL replay no longer needed to be done. However, the deployment went into CrashLoopBackOff again after a while.</p><h2>Investigating the error</h2><p>Our first approach was to monitor the CPU, memory, and disk usage of the deployment. The disk usage seemed normal. We had provisioned a 100GB volume and it wasn't anywhere near getting used up. The CPU usage also seemed normal. The memory usage, however, was suspicious.</p><p>After the pods had crashed initially, we recreated the deployment and monitored it using kubectl's <code>--watch</code> flag to follow all the pod updates. While doing this we were able to see that the pods were going into CrashLoopBackOff because they were getting OOMKilled first. The <code>OOMKilled</code> error in Kubernetes is when a pod is terminated because it tries to use more memory than it is allotted in its resource limits. We were consistently seeing the <code>OOMKilled</code> error, so memory must have been the culprit here.</p><p>We added Prometheus itself as a target in Prometheus so that we could monitor the memory usage of the Prometheus deployment. The following was the general trend of how Prometheus's memory usage was increasing over time.
This would go on until the memory crossed the specified limit, and then the pod would go into CrashLoopBackOff.</p><p><img src="/blog_images/2024/using-prometheus-in-neeto-deploy/memory_usage.png" alt="Memory usage of the Prometheus deployment"></p><p>Now that we knew that memory was the issue, we started looking into what was causing the memory leak. After talking with some folks from the Kubernetes Slack workspace, we were asked to look at the TSDB status of the Prometheus deployment. We monitored the stats in real time and saw that the number of time series stored in the database was growing by tens of thousands each second! This lined up with the increase in the memory usage graph from earlier.</p><p><img src="/blog_images/2024/using-prometheus-in-neeto-deploy/tsdb.png" alt="Prometheus TSDB stats"></p><h2>How we fixed it</h2><p>We can calculate the memory requirement for Prometheus based on the number of targets we are scraping metrics from and the frequency at which we are scraping the data. The memory requirement of the deployment is a function of both of these parameters. In our case, this was definitely higher than what we could afford to allocate (based on the nodegroup's machine type), since we were scraping a lot of data at a scrape interval of 15 seconds, which was set in the default configuration for Prometheus.</p><p>We increased the scrape interval to 60 seconds and removed all the targets from the Prometheus configuration whose metrics we didn't need for building the dashboard. Within the targets that we were scraping from, we used the <code>metric_relabel_configs</code> option to persist in the database only those metrics which we needed and to drop everything else.
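</p><p>An allow-list rule of that kind can be sketched as follows; only series whose name matches the regex survive, and everything else is dropped before it reaches the TSDB (and, by extension, the WAL). The surrounding scrape job settings are omitted, and the exact rule NeetoDeploy used may differ:</p><pre><code class="language-yaml"># Inside a scrape job: keep only the named series, drop the rest.
metric_relabel_configs:
  - source_labels: [__name__]
    regex: container_cpu_usage_seconds_total|container_memory_usage_bytes|traefik_service_requests_total
    action: keep
</code></pre><p>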
We only needed the <code>container_cpu_usage_seconds_total</code>, <code>container_memory_usage_bytes</code> and <code>traefik_service_requests_total</code> metrics - so we configured Prometheus so that only these three would be stored in our database, and by extension the WAL.</p><p>We redeployed Prometheus after making these changes, and the memory showed great stability afterwards. The following is the memory usage of Prometheus over the last few days. It has not exceeded 1GB.</p><p><img src="/blog_images/2024/using-prometheus-in-neeto-deploy/memory_usage_after_fix.png" alt="Memory usage of the Prometheus deployment after the fix"></p><h2>The aftermath</h2><p>Once Prometheus was stable, we were able to build the metrics dashboard with the Prometheus API in a straightforward manner. The metrics dashboard proved its worth within a couple of days, when the staging deployment of <a href="https://neetocode.com/">NeetoCode</a> faced a downtime. You can see the changes in the metrics from the time when the outage occurred.</p><p><img src="/blog_images/2024/using-prometheus-in-neeto-deploy/neetocode_metrics.png" alt="NeetoCode metrics showing the downtime"></p><p>The quintessential lesson from this experience is to always be wary of the resources being used up by tasks like scraping metrics over an extended period of time. We were initially scraping all the metrics in order to explore everything, even though they were not all being used. But because of this, we were able to read a lot about how Prometheus works internally, and also learn some Prometheus best practices the hard way.</p><p>If your application runs on Heroku, you can deploy it on NeetoDeploy without any change.
If you want to give NeetoDeploy a try, then please send us an email at <a href="mailto:invite@neeto.com">invite@neeto.com</a>.</p><p>If you have questions about NeetoDeploy or want to see the journey, follow NeetoDeploy on <a href="https://twitter.com/neetodeploy">Twitter</a>. You can also join our <a href="https://neetohq.slack.com/">community Slack</a> to chat with us about any Neeto product.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Use parametrized containers to deploy Rails microservices]]></title>
       <author><name>Rahul Mahale</name></author>
      <link href="https://www.bigbinary.com/blog/deploying-rails-applications-with-parmaetrized-containers"/>
      <updated>2018-09-26T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/deploying-rails-applications-with-parmaetrized-containers</id>
      <content type="html"><![CDATA[<p>When using microservices with containers, one has to consider <strong>modularity</strong> and <strong>reusability</strong> while designing a system.</p><p>While using Kubernetes as a distributed system for container deployments, modularity and reusability can be achieved by parameterizing the containers used to deploy microservices.</p><h2>Parameterized containers</h2><p>If we think of a container as a function in a program, how many parameters does it have? Each parameter represents an input that can customize a generic container to a specific situation.</p><p>Let's assume we have a Rails application isolated into services like puma, sidekiq/delayed-job and websocket. Each service runs as a separate deployment in a separate container for the same application. When deploying a change, we should build the same image for all three containers, but each should run a different function/process. In our case, we will be running 3 pods with the same image. This can be achieved by building a generic image for the containers. The generic container must accept parameters to run different services.</p><p>We need to expose parameters and consume them inside the container.
There are two ways to pass parameters to our container.</p><ol><li>Using environment variables.</li><li>Using command line arguments.</li></ol><p>In this article, we will use environment variables to run parameterized containers like puma, sidekiq/delayed-job and websocket for Rails applications on Kubernetes.</p><p>We will deploy <a href="https://github.com/bigbinary/wheel">wheel</a> on Kubernetes using the parameterized container approach.</p><h4>Pre-requisites</h4><ul><li><p>Understanding of <a href="https://docs.docker.com/engine/reference/builder/">Dockerfile</a> and image building.</p></li><li><p>Access to a working Kubernetes cluster.</p></li><li><p>Understanding of <a href="http://kubernetes.io/">Kubernetes</a> terms like <a href="http://kubernetes.io/docs/user-guide/pods/">pods</a>, <a href="http://kubernetes.io/docs/user-guide/deployments/">deployments</a>, <a href="https://kubernetes.io/docs/concepts/services-networking/service/">services</a>, <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/">configmap</a>, and <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/">annotations</a>.</p></li></ul><h3>Building a generic container image</h3><p>The Dockerfile (link is not available) in wheel uses the bash script <code>setup_while_container_init.sh</code> as the command to start a container. The script is self-explanatory and, as we can see, it consists of two functions, <code>web</code> and <code>background</code>. Function <code>web</code> starts the puma service and <code>background</code> starts the delayed_job service.</p><p>We create two different deployments on Kubernetes for the web and background services. Deployment templates are identical for both web and background.
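</p><p>The dispatch inside such an init script can be sketched as follows; this is an illustrative sketch, and the actual commands in wheel's <code>setup_while_container_init.sh</code> may differ:</p><pre><code class="language-bash">#!/bin/bash
# Hypothetical init script: the POD_TYPE environment variable
# selects which process this container runs.

web() {
  bundle exec puma -C config/puma.rb
}

background() {
  bundle exec rake jobs:work
}

case &quot;$POD_TYPE&quot; in
  WEB) web ;;
  background) background ;;
  *) echo &quot;Unknown POD_TYPE: $POD_TYPE&quot;; exit 1 ;;
esac
</code></pre><p>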
The value of the environment variable <code>POD_TYPE</code> passed to the init script determines which service runs in a pod.</p><p>Once we have the Docker image built, let's deploy the application.</p><h3>Creating kubernetes deployment manifests for the wheel application</h3><p>Wheel uses a PostgreSQL database and we need a postgres service to run the application. We will use the postgres image from Docker Hub and deploy it as a deployment.</p><p><strong>Note:</strong> For production deployments, the database should be deployed as a statefulset, or managed database services should be used.</p><p>K8s manifest for deploying PostgreSQL.</p><pre><code class="language-yaml">---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: db
  name: db
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - image: postgres:9.4
          name: db
          env:
            - name: POSTGRES_USER
              value: postgres
            - name: POSTGRES_PASSWORD
              value: welcome
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: db
  name: db
spec:
  ports:
    - name: headless
      port: 5432
      targetPort: 5432
  selector:
    app: db</code></pre><p>Create the Postgres DB and the service.</p><pre><code class="language-bash">$ kubectl create -f db-deployment.yml -f db-service.yml
deployment db created
service db created</code></pre><p>Now that the DB is available, we need to access it from the application using <code>database.yml</code>.</p><p>We will create a configmap to store the database credentials and mount it on <code>config/database.yml</code> in our application deployments.</p><pre><code class="language-yaml">---
apiVersion: v1
kind: ConfigMap
metadata:
  name: database-config
data:
  database.yml: |
    development:
      adapter: postgresql
      database: wheel_development
      host: db
      username: postgres
      password: welcome
      pool: 5
    test:
      adapter: postgresql
      database: wheel_test
      host: db
      username: postgres
      password: welcome
      pool: 5
    staging:
      adapter: postgresql
      database: postgres
      host: db
      username: postgres
      password: welcome
      pool: 5</code></pre><p>Create the configmap for database.yml.</p><pre><code class="language-bash">$ kubectl create -f database-configmap.yml
configmap database-config created</code></pre><p>We have the database ready for our application, now let's proceed to deploy our Rails services.</p><h3>Deploying Rails micro-services using the same docker image</h3><p>In this blog, we will limit our services to web and background with kubernetes deployments.</p><p>Let's create a deployment and service for our web application.</p><pre><code class="language-yaml">---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: wheel-web
  labels:
    app: wheel-web
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: wheel-web
    spec:
      containers:
      - image: bigbinary/wheel:generic
        name: web
        imagePullPolicy: Always
        env:
        - name: DEPLOY_TIME
          value: $date
        - name: RAILS_ENV
          value: staging
        - name: POD_TYPE
          value: WEB
        ports:
        - containerPort: 80
        volumeMounts:
          - name: database-config
            mountPath: /wheel/config/database.yml
            subPath: database.yml
      volumes:
        - name: database-config
          configMap:
            name: database-config
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: wheel-web
  name: web
spec:
  ports:
  - name: puma
    port: 80
    targetPort: 80
  selector:
    app: wheel-web
  type: LoadBalancer</code></pre><p>Note that we used <code>POD_TYPE</code> as <code>WEB</code>, which will start the puma process from the container startup script.</p><p>Let's create the web/puma deployment and service.</p><pre><code class="language-bash">kubectl create -f web-deployment.yml -f web-service.yml
deployment wheel-web created
service web created</code></pre><pre><code class="language-yaml">---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: wheel-background
  labels:
    app: wheel-background
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: wheel-background
    spec:
      containers:
        - image: bigbinary/wheel:generic
          name: background
          imagePullPolicy: Always
          env:
            - name: DEPLOY_TIME
              value: $date
            - name: POD_TYPE
              value: background
          ports:
            - containerPort: 80
          volumeMounts:
            - name: database-config
              mountPath: /wheel/config/database.yml
              subPath: database.yml
      volumes:
        - name: database-config
          configMap:
            name: database-config
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: wheel-background
  name: background
spec:
  ports:
    - name: background
      port: 80
      targetPort: 80
  selector:
    app: wheel-background</code></pre><p>For background/delayed-job, we set <code>POD_TYPE</code> as <code>background</code>. It will start the delayed-job process.</p><p>Let's create the background deployment and the service.</p><pre><code class="language-bash">kubectl create -f background-deployment.yml -f background-service.yml
deployment wheel-background created
service background created</code></pre><p>Get the application endpoint.</p><pre><code class="language-bash">$ kubectl get svc web -o wide | awk '{print $4}'
a55714dd1a22d11e88d4b0a87a399dcf-2144329260.us-east-1.elb.amazonaws.com</code></pre><p>We can access the application using the endpoint.</p><p>Now let's see the pods.</p><pre><code class="language-bash">$ kubectl get pods
NAME                                READY     STATUS    RESTARTS   AGE
db-5f7d5c96f7-x9fll                 1/1       Running   0          1h
wheel-background-6c7cbb4c75-sd9sd   1/1       Running   0          30m
wheel-web-f5cbf47bd-7hzp8           1/1       Running   0          10m</code></pre><p>We see that the <code>db</code> pod is running postgres, the <code>wheel-web</code> pod is running puma
and the <code>wheel-background</code> pod is running delayed job.</p><p>If we check the logs, everything coming to puma is handled by the web pod. All the background jobs are handled by the background pod.</p><p>Similarly, if we are using websocket or separate API pods, traffic will be routed to the respective services.</p><p>This is how we can deploy Rails micro services using parametrized containers and a generic image.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Kubernetes ingress controller for authenticating apps]]></title>
       <author><name>Rahul Mahale</name></author>
      <link href="https://www.bigbinary.com/blog/using-kubernetes-ingress-authentication"/>
      <updated>2018-08-14T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/using-kubernetes-ingress-authentication</id>
      <content type="html"><![CDATA[<p><a href="https://kubernetes.io/docs/concepts/services-networking/ingress/">Kubernetes Ingress</a> has redefined routing in this era of containerization, and with all these freehand routing techniques the thought of &quot;My router, my rules&quot; seems real.</p><p>We use nginx-ingress as a routing service for our applications. There is a lot more than routing we can do with ingress. One of the important features is setting up authentication using ingress for our application. As all the traffic goes from ingress to our service, it makes sense to set up authentication on ingress.</p><p>As mentioned in the <a href="https://github.com/kubernetes/ingress-nginx/tree/master/docs/examples/">ingress repository</a>, there are different techniques available for authentication, including:</p><ul><li>Basic authentication</li><li>Client-certs authentication</li><li>External authentication</li><li>Oauth external authentication</li></ul><p>In this blog, we will set up authentication for a sample application using the basic ingress authentication technique.</p><h4>Pre-requisites</h4><ul><li><p>Access to a working kubernetes cluster.</p></li><li><p>Understanding of <a href="http://kubernetes.io/">Kubernetes</a> terms like <a href="http://kubernetes.io/docs/user-guide/pods/">pods</a>, <a href="http://kubernetes.io/docs/user-guide/deployments/">deployments</a>, <a href="https://kubernetes.io/docs/concepts/services-networking/service/">services</a>, <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/">configmap</a>, <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/">ingress</a> and <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/">annotations</a></p></li></ul><p>First, let's create the ingress resources from the upstream example by running the following command.</p><pre><code class="language-bash">$ kubectl create -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml
namespace &quot;ingress-nginx&quot; created
deployment &quot;default-http-backend&quot; created
service &quot;default-http-backend&quot; created
configmap &quot;nginx-configuration&quot; created
configmap &quot;tcp-services&quot; created
configmap &quot;udp-services&quot; created
serviceaccount &quot;nginx-ingress-serviceaccount&quot; created
clusterrole &quot;nginx-ingress-clusterrole&quot; created
role &quot;nginx-ingress-role&quot; created
rolebinding &quot;nginx-ingress-role-nisa-binding&quot; created
clusterrolebinding &quot;nginx-ingress-clusterrole-nisa-binding&quot; created
deployment &quot;nginx-ingress-controller&quot; created</code></pre><p>Now that the ingress controller resources are created, we need a service to access the ingress.</p><p>Use the following manifest to create the service for ingress.</p><pre><code class="language-yaml">apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
  labels:
    k8s-addon: ingress-nginx.addons.k8s.io
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  externalTrafficPolicy: Cluster
  ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: http
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
  selector:
    app: ingress-nginx
  type: LoadBalancer</code></pre><p>Now, get the ELB endpoint and bind it with some domain name.</p><pre><code class="language-bash">$ kubectl create -f ingress-service.yml
service ingress-nginx created
$ kubectl -n ingress-nginx get svc ingress-nginx -o wide
NAME            CLUSTER-IP      EXTERNAL-IP                                                               PORT(S)                      AGE       SELECTOR
ingress-nginx   100.71.250.56   abcghccf8540698e8bff782799ca8h04-1234567890.us-east-2.elb.amazonaws.com   80:30032/TCP,443:30108/TCP   10s       app=ingress-nginx</code></pre><p>Let's create a deployment and service for our
sample application, kibana. We need elasticsearch to run kibana.</p><p>Here is the manifest for the sample application.</p><pre><code class="language-yaml">---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: kibana
  name: kibana
  namespace: ingress-nginx
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
        - image: kibana:latest
          name: kibana
          ports:
            - containerPort: 5601
---
apiVersion: v1
kind: Service
metadata:
  annotations:
  labels:
    app: kibana
  name: kibana
  namespace: ingress-nginx
spec:
  ports:
    - name: kibana
      port: 5601
      targetPort: 5601
  selector:
    app: kibana
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: elasticsearch
  name: elasticsearch
  namespace: ingress-nginx
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - image: elasticsearch:latest
          name: elasticsearch
          ports:
            - containerPort: 9200
---
apiVersion: v1
kind: Service
metadata:
  annotations:
  labels:
    app: elasticsearch
  name: elasticsearch
  namespace: ingress-nginx
spec:
  ports:
    - name: elasticsearch
      port: 9200
      targetPort: 9200
  selector:
    app: elasticsearch</code></pre><p>Create the sample application.</p><pre><code class="language-bash">kubectl apply -f kibana.yml
deployment &quot;kibana&quot; created
service &quot;kibana&quot; created
deployment &quot;elasticsearch&quot; created
service &quot;elasticsearch&quot; created</code></pre><p>Now that we have created the application and ingress resources, it's time to create an ingress and access the application.</p><p>Use the following manifest to create the ingress.</p><pre><code class="language-yaml">apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
  name: kibana-ingress
  namespace: ingress-nginx
spec:
  rules:
    - host: logstest.myapp-staging.com
      http:
        paths:
          - path: /
            backend:
              serviceName: kibana
              servicePort: 5601</code></pre><pre><code class="language-bash">$ kubectl -n ingress-nginx create -f ingress.yml
ingress &quot;kibana-ingress&quot; created</code></pre><p>Now that our application is up, when we access the kibana dashboard using the URL http://logstest.myapp-staging.com, we directly have access to our Kibana dashboard, and anyone with this URL can access the logs, as shown in the following image.</p><p><img src="/blog_images/2018/using-kubernetes-ingress-authentication/kibana.png" alt="Kibana dashboard without authentication"></p><p>Now, let's set up basic authentication using htpasswd.</p><p>Follow the commands below to generate the secret for the credentials.</p><p>Let's create an auth file with a username and password.</p><pre><code class="language-bash">$ htpasswd -c auth kibanaadmin
New password: &lt;kibanaadmin&gt;
Re-type new password:
Adding password for user kibanaadmin</code></pre><p>Create the k8s secret.</p><pre><code class="language-bash">$ kubectl -n ingress-nginx create secret generic basic-auth --from-file=auth
secret &quot;basic-auth&quot; created</code></pre><p>Verify the secret.</p><pre><code class="language-yaml">kubectl get secret basic-auth -o yaml
apiVersion: v1
data:
  auth: Zm9vOiRhcHIxJE9GRzNYeWJwJGNrTDBGSERBa29YWUlsSDkuY3lzVDAK
kind: Secret
metadata:
  name: basic-auth
  namespace: ingress-nginx
type: Opaque</code></pre><p>Add the following annotations to the ingress manifest by editing the ingress.</p><pre><code class="language-bash">kubectl -n ingress-nginx edit ingress kibana-ingress</code></pre><p>Paste the following annotations.</p><pre><code class="language-bash">nginx.ingress.kubernetes.io/auth-type: basic
nginx.ingress.kubernetes.io/auth-secret: basic-auth
nginx.ingress.kubernetes.io/auth-realm: &quot;Kibana Authentication Required - kibanaadmin&quot;</code></pre><p>Now that the ingress is updated, hit the URL again, and as shown in the image below we are asked for
authentication.</p><p><img src="/blog_images/2018/using-kubernetes-ingress-authentication/kibana_auth.png" alt="Kibana dashboard asking for authentication"></p>]]></content>
    </entry><entry>
       <title><![CDATA[Speed up Docker image build process of a Rails app]]></title>
       <author><name>Vishal Telangre</name></author>
      <link href="https://www.bigbinary.com/blog/speeding-up-docker-image-build-process-of-a-rails-application"/>
      <updated>2018-07-25T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/speeding-up-docker-image-build-process-of-a-rails-application</id>
      <content type="html"><![CDATA[<p><strong>tl;dr: We reduced the Docker image building time from 10 minutes to 5 minutes by reusing the bundler cache and by precompiling assets.</strong></p><p>We deploy one of our Rails applications on a dedicated Kubernetes cluster. Kubernetes is a good fit for us since it automatically scales the containerized application horizontally as per the load and resource consumption. The prerequisite to deploy any kind of application on Kubernetes is that the application needs to be containerized. We use Docker to containerize our application.</p><p>We have been successfully containerizing and deploying our Rails application on Kubernetes for about a year now. Although containerization was working fine, we were not happy with the overall time spent to containerize the application whenever we changed the source code and deployed the app.</p><p>We use <a href="https://jenkins.io/">Jenkins</a> for building on-demand Docker images of our application with the help of the <a href="https://wiki.jenkins-ci.org/display/JENKINS/CloudBees+Docker+Build+and+Publish+plugin">CloudBees Docker Build and Publish plugin</a>.</p><p>We observed that the average build time of a Jenkins job to build a Docker image was about 9 to 10 minutes.</p><p><img src="/blog_images/2018/speeding-up-docker-image-build-process-of-a-rails-application/build-time-trend-before-speedup-tweaks.png" alt="Screenshot of build time trend graph before speedup tweaks"></p><h2>Investigating what takes most time</h2><p>We wipe the workspace folder of the Jenkins job after finishing each Jenkins build to avoid any unintentional behavior caused by residue left from a previous build. The application's folder is about 500 MiB in size.
Each Jenkins build spends about 20 seconds performing a shallow Git clone of the latest commit of the specified git branch from our remote GitHub repository.</p><p>After cloning the latest source code, Jenkins executes the <code>docker build</code> command to build a Docker image with a unique tag to containerize the cloned source code of the application.</p><p>The Jenkins build spends another 10 seconds invoking the <code>docker build</code> command and sending the build context to the Docker daemon.</p><pre><code class="language-bash">01:05:43 [docker-builder] $ docker build --build-arg RAILS_ENV=production -t bigbinary/xyz:production-role-management-feature-1529436929 --pull=true --file=./Dockerfile /var/lib/jenkins/workspace/docker-builder
01:05:53 Sending build context to Docker daemon 489.4 MB</code></pre><p>We use the same Docker image on a number of Kubernetes pods. Therefore, we do not want to execute the <code>bundle install</code> and <code>rake assets:precompile</code> tasks while starting a container in each pod, which would prevent that pod from accepting any requests until these tasks are finished.</p><p>The recommended approach is to run the <code>bundle install</code> and <code>rake assets:precompile</code> tasks while or before containerizing the Rails application.</p><p>Following is a trimmed down version of our actual Dockerfile which is used by the <code>docker build</code> command to containerize our application.</p><pre><code class="language-dockerfile">FROM bigbinary/xyz-base:latest
ENV APP_PATH /data/app/
WORKDIR $APP_PATH
ADD . $APP_PATH
ARG RAILS_ENV
RUN bin/bundle install --without development test
RUN bin/rake assets:precompile
CMD [&quot;bin/bundle&quot;, &quot;exec&quot;, &quot;puma&quot;]</code></pre><p>The <code>RUN</code> instructions in the above Dockerfile execute the <code>bundle install</code> and <code>rake assets:precompile</code> tasks while building a Docker image.
Therefore, when a Kubernetes pod is created using such a Docker image, Kubernetes pulls the image, starts a Docker container using that image inside the pod and runs the <code>puma</code> server immediately.</p><p>The base Docker image which we use in the <code>FROM</code> instruction contains the necessary system packages. We rarely need to update any system package. Therefore, an intermediate layer which may have been built previously for that instruction is reused while executing the <code>docker build</code> command. If the layer for the <code>FROM</code> instruction is reused, Docker also reuses the cached layers for the next two instructions, <code>ENV</code> and <code>WORKDIR</code>, since both of them never change.</p><pre><code class="language-bash">01:05:53 Step 1/8 : FROM bigbinary/xyz-base:latest
01:05:53 latest: Pulling from bigbinary/xyz-base
01:05:53 Digest: sha256:193951cad605d23e38a6016e07c5d4461b742eb2a89a69b614310ebc898796f0
01:05:53 Status: Image is up to date for bigbinary/xyz-base:latest
01:05:53  ---&gt; c2ab738db405
01:05:53 Step 2/8 : ENV APP_PATH /data/app/
01:05:53  ---&gt; Using cache
01:05:53  ---&gt; 5733bc978f19
01:05:53 Step 3/8 : WORKDIR $APP_PATH
01:05:53  ---&gt; Using cache
01:05:53  ---&gt; 0e5fbc868af8</code></pre><p>Docker checks the contents of the files in the image and calculates a checksum for each file for an <code>ADD</code> instruction. Since the source code changes often, the previously cached layer for the <code>ADD</code> instruction is invalidated due to the mismatching checksums. Therefore, the 4th instruction, <code>ADD</code>, in our Dockerfile has to add the local files in the provided build context to the filesystem of the image being built in a separate intermediate container, instead of reusing the previously cached instruction layer. On average, this instruction spends about 25 seconds.</p><pre><code class="language-bash">01:05:53 Step 4/8 : ADD . 
$APP_PATH
01:06:12  ---&gt; cbb9a6ac297e
01:06:17 Removing intermediate container 99ca98218d99</code></pre><p>We need to build Docker images for our application using different Rails environments. To achieve that, we trigger a <a href="https://wiki.jenkins.io/display/JENKINS/Parameterized+Build">parameterized Jenkins build</a> by specifying the needed Rails environment parameter. This parameter is then passed to the <code>docker build</code> command using the <code>--build-arg RAILS_ENV=production</code> option. The <code>ARG</code> instruction in the Dockerfile defines the <code>RAILS_ENV</code> variable, which is implicitly used as an environment variable by the instructions defined after that <code>ARG</code> instruction. Even if the previous <code>ADD</code> instruction didn't invalidate the build cache, if the <code>ARG</code> variable differs from a previous build, a &quot;cache miss&quot; occurs and the build cache is invalidated for the subsequent instructions.</p><pre><code class="language-bash">01:06:17 Step 5/8 : ARG RAILS_ENV
01:06:17  ---&gt; Running in b793b8cc2fe7
01:06:22  ---&gt; b8a70589e384
01:06:24 Removing intermediate container b793b8cc2fe7</code></pre><p>The next two <code>RUN</code> instructions are used to install gems and precompile static assets using Sprockets. As the earlier instructions have already invalidated the build cache, these <code>RUN</code> instructions are mostly executed instead of reusing a cached layer. 
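For pipelines where <code>Gemfile.lock</code> rarely changes, the usual mitigation is to reorder the Dockerfile so that the gem manifests are added before the rest of the source; the expensive <code>bundle install</code> layer then stays cached across source-only changes. A sketch of that reordering (it did not fit our multi-branch workflow, for reasons explained below):

```dockerfile
FROM bigbinary/xyz-base:latest
ENV APP_PATH /data/app/
WORKDIR $APP_PATH

# Gem manifests first: this layer and the bundle install layer below
# are reused as long as Gemfile and Gemfile.lock are unchanged.
ADD Gemfile Gemfile.lock $APP_PATH
ARG RAILS_ENV
RUN bin/bundle install --without development test

# Source changes invalidate the cache only from here onwards.
ADD . $APP_PATH
RUN bin/rake assets:precompile

CMD ["bin/bundle", "exec", "puma"]
```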
The <code>bundle install</code> command takes about 2.5 minutes and the <code>rake assets:precompile</code> task takes about 4.35 minutes.</p><pre><code class="language-bash">01:06:24 Step 6/8 : RUN bin/bundle install --without development test
01:06:24  ---&gt; Running in a556c7ca842a
01:06:25 bin/bundle install --without development test
01:08:22  ---&gt; 82ab04f1ff42
01:08:40 Removing intermediate container a556c7ca842a
01:08:58 Step 7/8 : RUN bin/rake assets:precompile
01:08:58  ---&gt; Running in b345c73a22c
01:08:58 bin/bundle exec rake assets:precompile
01:09:07 ** Invoke assets:precompile (first_time)
01:09:07 ** Invoke assets:environment (first_time)
01:09:07 ** Execute assets:environment
01:09:07 ** Invoke environment (first_time)
01:09:07 ** Execute environment
01:09:12 ** Execute assets:precompile
01:13:20  ---&gt; 57bf04f3c111
01:13:23 Removing intermediate container b345c73a22c</code></pre><p>These two <code>RUN</code> instructions clearly looked like the main culprits slowing down the whole <code>docker build</code> command, and thus the Jenkins build.</p><p>The final instruction, <code>CMD</code>, which starts the <code>puma</code> server, takes another 10 seconds. 
After building the Docker image, the <code>docker push</code> command spends another minute.</p><pre><code class="language-bash">01:13:23 Step 8/8 : CMD [&quot;bin/bundle&quot;, &quot;exec&quot;, &quot;puma&quot;]
01:13:23  ---&gt; Running in 104967ad1553
01:13:31  ---&gt; 35d2259cdb1d
01:13:34 Removing intermediate container 104967ad1553
01:13:34 Successfully built 35d2259cdb1d
01:13:35 [docker-builder] $ docker inspect 35d2259cdb1d
01:13:35 [docker-builder] $ docker push bigbinary/xyz:production-role-management-feature-1529436929
01:13:35 The push refers to a repository [docker.io/bigbinary/xyz]
01:14:21 d67854546d53: Pushed
01:14:22 production-role-management-feature-1529436929: digest: sha256:07f86cfd58fac412a38908d7a7b7d0773c6a2980092df416502d7a5c051910b3 size: 4106
01:14:22 Finished: SUCCESS</code></pre><p>So, we found the exact commands which were causing the <code>docker build</code> command to take so much time to build a Docker image.</p><p>Let's summarize the steps involved in building our Docker image and the average time each needed to finish.</p><table><thead><tr><th>Command or Instruction</th><th>Average Time Spent</th></tr></thead><tbody><tr><td>Shallow clone of Git Repository by Jenkins</td><td>20 Seconds</td></tr><tr><td>Invocation of <code>docker build</code> by Jenkins and sending build context to Docker daemon</td><td>10 Seconds</td></tr><tr><td><code>FROM bigbinary/xyz-base:latest</code></td><td>0 Seconds</td></tr><tr><td><code>ENV APP_PATH /data/app/</code></td><td>0 Seconds</td></tr><tr><td><code>WORKDIR $APP_PATH</code></td><td>0 Seconds</td></tr><tr><td><code>ADD . 
$APP_PATH</code></td><td>25 Seconds</td></tr><tr><td><code>ARG RAILS_ENV</code></td><td>7 Seconds</td></tr><tr><td><code>RUN bin/bundle install --without development test</code></td><td>2.5 Minutes</td></tr><tr><td><code>RUN bin/rake assets:precompile</code></td><td>4.35 Minutes</td></tr><tr><td><code>CMD [&quot;bin/bundle&quot;, &quot;exec&quot;, &quot;puma&quot;]</code></td><td>1.15 Minutes</td></tr><tr><td><strong>Total</strong></td><td><strong>9 Minutes</strong></td></tr></tbody></table><p>Often, people build Docker images from a single Git branch, like <code>master</code>. Since changes in a single branch are incremental and the <code>Gemfile.lock</code> file rarely differs across commits, the bundler cache need not be managed explicitly. Instead, Docker automatically reuses the previously built layer for the <code>RUN bundle install</code> instruction if the <code>Gemfile.lock</code> file remains unchanged.</p><p>In our case, this does not happen. For every new feature or bug fix, we create a separate Git branch. To verify the changes on a particular branch, we deploy a separate review app which serves the code from that branch. To achieve this workflow, every day we need to build a lot of Docker images containing source code from varying Git branches as well as with varying environments. Most of the time, the <code>Gemfile.lock</code> and the assets have different versions across these Git branches. Therefore, it is hard for Docker to cache the layers for the <code>bundle install</code> and <code>rake assets:precompile</code> tasks and reuse those layers during every <code>docker build</code> command run with different application source code and a different environment. This is why the previously built Docker layers for the <code>RUN bin/bundle install</code> and <code>RUN bin/rake assets:precompile</code> instructions were often not being reused in our case. 
This was causing the <code>RUN</code> instructions to be executed without reusing the previously built Docker layer cache on almost every Docker build.</p><p>Before discussing the approaches to speed up our Docker build flow, let's get familiar with the <code>bundle install</code> and <code>rake assets:precompile</code> tasks and how to speed them up by reusing cache.</p><h2>Speeding up &quot;bundle install&quot; by using cache</h2><p>By default, Bundler installs gems at the location set by RubyGems. Bundler also looks for the installed gems at the same location.</p><p>This location can be explicitly changed by using the <code>--path</code> option.</p><p>If <code>Gemfile.lock</code> does not exist, or no gem is found at the explicitly provided location or at the default gem path, then the <code>bundle install</code> command fetches all remote sources, resolves dependencies if needed and installs the required gems as per the <code>Gemfile</code>.</p><p>The <code>bundle install --path=vendor/cache</code> command would install the gems at the <code>vendor/cache</code> location in the current directory. If the same command is run without making any change in the <code>Gemfile</code>, the command will finish instantly, because the gems were already installed and cached in <code>vendor/cache</code>, so Bundler does not need to fetch any new gems.</p><p>The tree structure of the <code>vendor/cache</code> directory looks like this.</p><pre><code class="language-tree">vendor/cache
├── aasm-4.12.3.gem
├── actioncable-5.1.4.gem
├── activerecord-5.1.4.gem
├── [...]
└── ruby
    └── 2.4.0
        ├── bin
        │   ├── aws.rb
        │   ├── dotenv
        │   ├── erubis
        │   └── [...]
        ├── build_info
        │   └── nokogiri-1.8.1.info
        ├── bundler
        │   └── gems
        │       ├── activeadmin-043ba0c93408
        │       └── [...]
        ├── cache
        │   ├── aasm-4.12.3.gem
        │   ├── actioncable-5.1.4.gem
        │   ├── [...]
        │   ├── bundler
        │   └── git
        ├── specifications
        │   ├── aasm-4.12.3.gemspec
        │   ├── actioncable-5.1.4.gemspec
        │   ├── activerecord-5.1.4.gemspec
        │   └── [...]
        └── [...]
[...]</code></pre><p>It appears that Bundler keeps two separate copies of the <code>.gem</code> files at two different locations, <code>vendor/cache</code> and <code>vendor/cache/ruby/VERSION_HERE/cache</code>.</p><p>Therefore, even if we remove a gem from the <code>Gemfile</code>, that gem will be removed only from the <code>vendor/cache</code> directory. The <code>vendor/cache/ruby/VERSION_HERE/cache</code> directory will still have the cached <code>.gem</code> file for that removed gem.</p><p>Let's see an example.</p><p>We have the <code>'aws-sdk', '2.11.88'</code> gem in our Gemfile and that gem is installed.</p><pre><code class="language-bash">$ ls vendor/cache/aws-sdk-*
vendor/cache/aws-sdk-2.11.88.gem
vendor/cache/aws-sdk-core-2.11.88.gem
vendor/cache/aws-sdk-resources-2.11.88.gem
$ ls vendor/cache/ruby/2.4.0/cache/aws-sdk-*
vendor/cache/ruby/2.4.0/cache/aws-sdk-2.11.88.gem
vendor/cache/ruby/2.4.0/cache/aws-sdk-core-2.11.88.gem
vendor/cache/ruby/2.4.0/cache/aws-sdk-resources-2.11.88.gem</code></pre><p>Now, we will remove the <code>aws-sdk</code> gem from the Gemfile and run <code>bundle install</code>.</p><pre><code class="language-bash">$ bundle install --path=vendor/cache
Using rake 12.3.0
Using aasm 4.12.3
[...]
Updating files in vendor/cache
Removing outdated .gem files from vendor/cache
  * aws-sdk-2.11.88.gem
  * jmespath-1.3.1.gem
  * aws-sdk-resources-2.11.88.gem
  * aws-sdk-core-2.11.88.gem
  * aws-sigv4-1.0.2.gem
Bundled gems are installed into `./vendor/cache`
$ ls vendor/cache/aws-sdk-*
no matches found: vendor/cache/aws-sdk-*
$ ls vendor/cache/ruby/2.4.0/cache/aws-sdk-*
vendor/cache/ruby/2.4.0/cache/aws-sdk-2.11.88.gem
vendor/cache/ruby/2.4.0/cache/aws-sdk-core-2.11.88.gem
vendor/cache/ruby/2.4.0/cache/aws-sdk-resources-2.11.88.gem</code></pre><p>We can see that the cached version of the gem(s) remained unaffected.</p><p>If we add the same gem <code>'aws-sdk', '2.11.88'</code> back to the Gemfile and perform <code>bundle install</code>, instead of fetching that gem from the remote Gem 
repository, Bundler will install that gem from the cache.</p><pre><code class="language-bash">$ bundle install --path=vendor/cache
Resolving dependencies........
[...]
Using aws-sdk 2.11.88
[...]
Updating files in vendor/cache
  * aws-sigv4-1.0.3.gem
  * jmespath-1.4.0.gem
  * aws-sdk-core-2.11.88.gem
  * aws-sdk-resources-2.11.88.gem
  * aws-sdk-2.11.88.gem
$ ls vendor/cache/aws-sdk-*
vendor/cache/aws-sdk-2.11.88.gem
vendor/cache/aws-sdk-core-2.11.88.gem
vendor/cache/aws-sdk-resources-2.11.88.gem</code></pre><p>What we understand from this is that if we can reuse the explicitly provided <code>vendor/cache</code> directory every time we need to execute the <code>bundle install</code> command, the command will be much faster, because Bundler will use gems from the local cache instead of fetching them from the Internet.</p><h2>Speeding up &quot;rake assets:precompile&quot; task by using cache</h2><p>JavaScript code written in TypeScript, Elm, JSX, etc. cannot be served directly to the browser. Almost all web browsers understand JavaScript (ES5), CSS and image files. Therefore, we need to transpile, compile or convert the source asset into formats which browsers can understand. In Rails, <a href="https://github.com/rails/sprockets">Sprockets</a> is the most widely used library for managing and compiling assets.</p><p>In the development environment, Sprockets compiles assets on the fly as and when needed using <code>Sprockets::Server</code>. In the production environment, the recommended approach is to precompile assets into a directory on disk and serve them using a web server like Nginx.</p><p>Precompilation is a multi-step process for converting a source asset file into a static and optimized form using components such as processors, transformers, compressors, directives, environments, a manifest and pipelines, with the help of various gems such as <code>sass-rails</code>, <code>execjs</code>, etc. 
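Before looking at the precompiled output, note the core idea behind Sprockets' fingerprinted file names: the digest in the name is derived from the file's contents, so any content change produces a new name and busts stale caches. The idea in miniature (file names hypothetical, and using md5sum only as a stand-in for Sprockets' digesting):

```shell
# Fingerprinting in miniature: the digest comes from file contents,
# so changing the contents changes the digest (and hence the file name).
set -e
tmp=$(mktemp -d)
echo 'body { color: red; }'  > "$tmp/application.css"
d1=$(md5sum "$tmp/application.css" | cut -d' ' -f1)
cp "$tmp/application.css" "$tmp/application-$d1.css"
echo 'body { color: blue; }' > "$tmp/application.css"
d2=$(md5sum "$tmp/application.css" | cut -d' ' -f1)
test "$d1" != "$d2" && echo "digest changed"
```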
The assets need to be precompiled in production so that Sprockets does not need to resolve the inter-dependencies between required source dependencies every time a static asset is requested. To understand how Sprockets works in great detail, please read <a href="https://github.com/rails/sprockets/blob/0cb3314368f9f9e84343ebedcc09c7137e920bc4/guides/how_sprockets_works.md#sprockets">this guide</a>.</p><p>When we compile source assets using the <code>rake assets:precompile</code> task, we can find the compiled assets in the <code>public/assets</code> directory inside our Rails application.</p><pre><code class="language-bash">$ ls public/assets
manifest-15adda275d6505e4010b95819cf61eb3.json
icons-6250335393ad03df1c67eafe138ab488.eot
icons-6250335393ad03df1c67eafe138ab488.eot.gz
icons-b341bf083c32f9e244d0dea28a763a63.svg
icons-b341bf083c32f9e244d0dea28a763a63.svg.gz
application-8988c56131fcecaf914b22f54359bf20.js
application-8988c56131fcecaf914b22f54359bf20.js.gz
xlsx.full.min-feaaf61b9d67aea9f122309f4e78d5a5.js
xlsx.full.min-feaaf61b9d67aea9f122309f4e78d5a5.js.gz
application-adc697aed7731c864bafaa3319a075b1.css
application-adc697aed7731c864bafaa3319a075b1.css.gz
FontAwesome-42b44fdc9088cae450b47f15fc34c801.otf
FontAwesome-42b44fdc9088cae450b47f15fc34c801.otf.gz
[...]</code></pre><p>We can see that each source asset has been compiled and minified, along with its gzipped version.</p><p>Note that the assets have a unique and random digest or fingerprint in their file names. A digest is a hash calculated by Sprockets from the contents of an asset file. If the contents of an asset change, then that asset's digest also changes. The digest is mainly used for busting the cache, so that a new version of the same asset can be generated if the source file is modified or the configured cache period has expired.</p><p>The <code>rake assets:precompile</code> task also generates a manifest file along with the precompiled assets. 
This manifest is used by Sprockets to perform fast lookups without having to actually compile our asset code.</p><p>An example manifest file, in our case <code>public/assets/manifest-15adda275d6505e4010b95819cf61eb3.json</code>, looks like this.</p><pre><code class="language-json">{
  &quot;files&quot;: {
    &quot;application-8988c56131fcecaf914b22f54359bf20.js&quot;: {
      &quot;logical_path&quot;: &quot;application.js&quot;,
      &quot;mtime&quot;: &quot;2018-07-06T07:32:27+00:00&quot;,
      &quot;size&quot;: 3797752,
      &quot;digest&quot;: &quot;8988c56131fcecaf914b22f54359bf20&quot;
    },
    &quot;xlsx.full.min-feaaf61b9d67aea9f122309f4e78d5a5.js&quot;: {
      &quot;logical_path&quot;: &quot;xlsx.full.min.js&quot;,
      &quot;mtime&quot;: &quot;2018-07-05T22:06:17+00:00&quot;,
      &quot;size&quot;: 883635,
      &quot;digest&quot;: &quot;feaaf61b9d67aea9f122309f4e78d5a5&quot;
    },
    &quot;application-adc697aed7731c864bafaa3319a075b1.css&quot;: {
      &quot;logical_path&quot;: &quot;application.css&quot;,
      &quot;mtime&quot;: &quot;2018-07-06T07:33:12+00:00&quot;,
      &quot;size&quot;: 242611,
      &quot;digest&quot;: &quot;adc697aed7731c864bafaa3319a075b1&quot;
    },
    &quot;FontAwesome-42b44fdc9088cae450b47f15fc34c801.otf&quot;: {
      &quot;logical_path&quot;: &quot;FontAwesome.otf&quot;,
      &quot;mtime&quot;: &quot;2018-06-20T06:51:49+00:00&quot;,
      &quot;size&quot;: 134808,
      &quot;digest&quot;: &quot;42b44fdc9088cae450b47f15fc34c801&quot;
    },
    [...]
  },
  &quot;assets&quot;: {
    &quot;icons.eot&quot;: &quot;icons-6250335393ad03df1c67eafe138ab488.eot&quot;,
    &quot;icons.svg&quot;: &quot;icons-b341bf083c32f9e244d0dea28a763a63.svg&quot;,
    &quot;application.js&quot;: &quot;application-8988c56131fcecaf914b22f54359bf20.js&quot;,
    &quot;xlsx.full.min.js&quot;: &quot;xlsx.full.min-feaaf61b9d67aea9f122309f4e78d5a5.js&quot;,
    &quot;application.css&quot;: &quot;application-adc697aed7731c864bafaa3319a075b1.css&quot;,
    &quot;FontAwesome.otf&quot;: &quot;FontAwesome-42b44fdc9088cae450b47f15fc34c801.otf&quot;,
    [...]
  }
}</code></pre><p>Using this manifest file, Sprockets can quickly find a fingerprinted file name using that file's logical file name and vice versa.</p><p>Also, Sprockets generates a cache in binary format at <code>tmp/cache/assets</code> in the Rails application's folder for the specified Rails environment. Following is an example tree structure of the <code>tmp/cache/assets</code> directory, automatically generated after executing the <code>RAILS_ENV=environment_here rake assets:precompile</code> command for each Rails environment.</p><pre><code class="language-tree">$ cd tmp/cache/assets &amp;&amp; tree
.
├── demo
│   ├── sass
│   │   └── 7de35a15a8ab2f7e131a9a9b42f922a69327805d
│   │       ├── application.css.sassc
│   │       ├── bootstrap.css.sassc
│   │       └── [...]
│   └── sprockets
│       ├── 002a592d665d92efe998c44adc041bd3
│       ├── 7dd8829031d3067dcf26ffc05abd2bd5
│       └── [...]
├── production
│   ├── sass
│   │   └── 80d56752e13dda1267c19f4685546798718ad433
│   │       ├── application.css.sassc
│   │       ├── bootstrap.css.sassc
│   │       └── [...]
│   └── sprockets
│       ├── 143f5a036c623fa60d73a44d8e5b31e7
│       ├── 31ae46e77932002ed3879baa6e195507
│       └── [...]
└── staging
    ├── sass
    │   └── 2101b41985597d41f1e52b280a62cd0786f2ee51
    │       ├── application.css.sassc
    │       ├── bootstrap.css.sassc
    │       └── [...]
    └── sprockets
        ├── 2c154d4604d873c6b7a95db6a7d5787a
        ├── 3ae685d6f922c0e3acea4bbfde7e7466
        └── [...]</code></pre><p>Let's inspect the contents of an example cached file. 
Since the cached file is in binary form, we can force the non-printable control characters, as well as the binary content, to be shown in text form using the <code>cat -v</code> command.</p><pre><code class="language-bash">$ cat -v tmp/cache/assets/staging/sprockets/2c154d4604d873c6b7a95db6a7d5787a^D^H{^QI&quot;class^F:^FETI&quot;^SProcessedAsset^F;^@FI&quot;^Qlogical_path^F;^@TI&quot;^]components/Comparator.js^F;^@TI&quot;^Mpathname^F;^@TI&quot;T$root/app/assets/javascripts/components/Comparator.jsx^F;^@FI&quot;^Qcontent_type^F;^@TI&quot;^[application/javascript^F;^@TI&quot;mtime^F;^@Tl+^GM-gM-z;[I&quot;^Klength^F;^@Ti^BM-L^BI&quot;^Kdigest^F;^@TI&quot;%18138d01fe4c61bbbfeac6d856648ec9^F;^@FI&quot;^Ksource^F;^@TI&quot;^BM-L^Bvar Comparator = function (props) {  var comparatorOptions = [React.createElement(&quot;option&quot;, { key: &quot;?&quot;, value: &quot;?&quot; })];  var allComparators = props.metaData.comparators;  var fieldDataType = props.fieldDataType;  var allowedComparators = allComparators[fieldDataType] || allComparators.integer;  return React.createElement(    &quot;select&quot;,    {      id: &quot;comparator-&quot; + props.id,      disabled: props.disabled,      onChange: props.handleComparatorChange,      value: props.comparatorValue },    comparatorOptions.concat(allowedComparators.map(function (comparator, id) {      return React.createElement(        &quot;option&quot;,        { key: id, value: comparator },        comparator      );    }))  );};^F;^@TI&quot;^Vdependency_digest^F;^@TI&quot;%d6c86298311aa7996dd6b5389f45949f^F;^@FI&quot;^Srequired_paths^F;^@T[^FI&quot;T$root/app/assets/javascripts/components/Comparator.jsx^F;^@FI&quot;^Udependency_paths^F;^@T[^F{^HI&quot;   path^F;^@TI&quot;T$root/app/assets/javascripts/components/Comparator.jsx^F;^@F@^NI&quot;^^2018-07-03T22:38:31+00:00^F;^@T@^QI&quot;%51ab9ceec309501fc13051c173b0324f^F;^@FI&quot;^M_version^F;^@TI&quot;%30fd133466109a42c8cede9d119c3992^F;^@F</code></pre><p>We can see that there are some 
weird-looking characters in the above file, because it is not a regular file meant to be read by humans. It also appears to hold some important information, such as the MIME type, the original source code's path, the compiled source, the digest, and the paths and digests of the required dependencies. The above compiled cache appears to be for the original source file located at <code>app/assets/javascripts/components/Comparator.jsx</code>, whose actual contents are in JSX and ES6 syntax, as shown below.</p><pre><code class="language-jsx">const Comparator = props =&gt; {
  const comparatorOptions = [&lt;option key=&quot;?&quot; value=&quot;?&quot; /&gt;];
  const allComparators = props.metaData.comparators;
  const fieldDataType = props.fieldDataType;
  const allowedComparators =
    allComparators[fieldDataType] || allComparators.integer;
  return (
    &lt;select
      id={`comparator-${props.id}`}
      disabled={props.disabled}
      onChange={props.handleComparatorChange}
      value={props.comparatorValue}
    &gt;
      {comparatorOptions.concat(
        allowedComparators.map((comparator, id) =&gt; (
          &lt;option key={id} value={comparator}&gt;
            {comparator}
          &lt;/option&gt;
        ))
      )}
    &lt;/select&gt;
  );
};</code></pre><p>If a similar cache exists for a Rails environment under <code>tmp/cache/assets</code> and no source asset file is modified, then re-running the <code>rake assets:precompile</code> task for the same environment will finish quickly. 
This is because Sprockets will reuse the cache and therefore will not need to resolve the inter-asset dependencies, perform conversion, etc.</p><p>Even if certain source assets are modified, Sprockets will rebuild the cache and re-generate the compiled and fingerprinted assets just for the modified source assets.</p><p>Therefore, we can now understand that if we can reuse the <code>tmp/cache/assets</code> and <code>public/assets</code> directories every time we need to execute the <code>rake assets:precompile</code> task, Sprockets will perform the precompilation much faster.</p><h2>Speeding up &quot;docker build&quot; -- first attempt</h2><p>As discussed above, we were now familiar with how to speed up the <code>bundle install</code> and <code>rake assets:precompile</code> commands individually.</p><p>We decided to use this knowledge to speed up our slow <code>docker build</code> command. Our initial thought was to mount a directory from the host Jenkins machine into the filesystem of the image being built by the <code>docker build</code> command. This mounted directory could then be used as a cache directory to persist the cache files of both the <code>bundle install</code> and <code>rake assets:precompile</code> commands run as part of the <code>docker build</code> command in each Jenkins build. Every new build could then reuse the previous build's cache and therefore finish faster.</p><p>Unfortunately, this wasn't possible, because Docker does not support it. Unlike the <code>docker run</code> command, we cannot mount a host directory into the <code>docker build</code> command. 
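(As an aside: BuildKit, added to Docker after this post was written, closed exactly this gap with cache mounts, which give a <code>RUN</code> instruction a host-persisted cache directory. It was not available to us at the time; a rough sketch, assuming the official <code>ruby</code> image's default bundle paths:)

```dockerfile
# syntax=docker/dockerfile:1
# Requires building with BuildKit enabled (DOCKER_BUILDKIT=1).
FROM ruby:2.4
WORKDIR /data/app
ADD . /data/app
# The mounted directory persists across builds on the same host, so
# previously downloaded .gem files are reused instead of re-fetched.
RUN --mount=type=cache,target=/usr/local/bundle/cache \
    bundle install --without development test
```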
A feature request for providing a shared host machine directory path option to the <code>docker build</code> command is still <a href="https://github.com/moby/moby/issues/14080#issuecomment-119371247">open here</a>.</p><p>To reuse the cache and perform faster, we need to carry the cache files of both the <code>bundle install</code> and <code>rake assets:precompile</code> commands between each <code>docker build</code> (and therefore each Jenkins build). We were looking for some place which could be treated as a shared cache location and accessed during each build.</p><p>We decided to use Amazon's <a href="https://aws.amazon.com/s3/">S3 service</a> to solve this problem.</p><p>To upload and download files from S3, we needed to inject credentials for S3 into the build context provided to the <code>docker build</code> command.</p><p><img src="/blog_images/2018/speeding-up-docker-image-build-process-of-a-rails-application/jenkins-configuration-to-inject-s3-credentials-in-docker-build.png" alt="Screenshot of Jenkins configuration to inject S3 credentials in docker build command"></p><p>Alternatively, these S3 credentials can be provided to the <code>docker build</code> command using the <code>--build-arg</code> option as discussed earlier.</p><p>We used the <code>s3cmd</code> command-line utility to interact with the S3 service.</p><p>The following shell script, named <code>install_gems_and_precompile_assets.sh</code>, was configured to be executed using a <code>RUN</code> instruction while running the <code>docker build</code> command.</p><pre><code class="language-bash">set -ex

# Step 1.
if [ -e s3cfg ]; then mv s3cfg ~/.s3cfg; fi

bundler_cache_path=&quot;vendor/cache&quot;
assets_cache_path=&quot;tmp/assets/cache&quot;
precompiled_assets_path=&quot;public/assets&quot;
cache_archive_name=&quot;cache.tar.gz&quot;
s3_bucket_path=&quot;s3://docker-builder-bundler-and-assets-cache&quot;
s3_cache_archive_path=&quot;$s3_bucket_path/$cache_archive_name&quot;

# Step 2.
# Fetch tarball archive containing cache and 
extract it.
# The &quot;tar&quot; command extracts the archive into &quot;vendor/cache&quot;,
# &quot;tmp/assets/cache&quot; and &quot;public/assets&quot;.
if s3cmd get $s3_cache_archive_path; then
  tar -xzf $cache_archive_name &amp;&amp; rm -f $cache_archive_name
fi

# Step 3.
# Install gems from &quot;vendor/cache&quot; and pack them up.
bin/bundle install --without development test --path $bundler_cache_path
bin/bundle pack --quiet

# Step 4.
# Precompile assets.
# Note that &quot;RAILS_ENV&quot; is already defined in the Dockerfile
# and will be used implicitly.
bin/rake assets:precompile

# Step 5.
# Compress the &quot;vendor/cache&quot;, &quot;tmp/assets/cache&quot;
# and &quot;public/assets&quot; directories into a tarball archive.
tar -zcf $cache_archive_name $bundler_cache_path \
                             $assets_cache_path  \
                             $precompiled_assets_path

# Step 6.
# Push the compressed archive containing updated cache to S3.
s3cmd put $cache_archive_name $s3_cache_archive_path || true

# Step 7.
rm -f $cache_archive_name ~/.s3cfg</code></pre><p>Let's discuss the various steps annotated in the above script.</p><ol><li>The S3 credentials file injected by Jenkins into the build context needs to be placed at the <code>~/.s3cfg</code> location, so we move that credentials file accordingly.</li><li>Try to fetch the compressed tarball archive comprising directories such as <code>vendor/cache</code>, <code>tmp/assets/cache</code> and <code>public/assets</code>. 
If it exists, extract the tarball archive at the respective paths and remove the tarball.</li><li>Execute the <code>bundle install</code> command, which would reuse the extracted cache from <code>vendor/cache</code>.</li><li>Execute the <code>rake assets:precompile</code> command, which would reuse the extracted cache from <code>tmp/assets/cache</code> and <code>public/assets</code>.</li><li>Compress the cache directories <code>vendor/cache</code>, <code>tmp/assets/cache</code> and <code>public/assets</code> into a tarball archive.</li><li>Upload the compressed tarball archive containing the updated cache directories to S3.</li><li>Remove the compressed tarball archive and the S3 credentials file.</li></ol><p>Please note that in our actual setup we generated different tarball archives depending upon the provided <code>RAILS_ENV</code> environment. For demonstration, here we use just a single archive instead.</p><p>The <code>Dockerfile</code> needed to be updated to execute the <code>install_gems_and_precompile_assets.sh</code> script.</p><pre><code class="language-dockerfile">FROM bigbinary/xyz-base:latest
ENV APP_PATH /data/app/
WORKDIR $APP_PATH
ADD . $APP_PATH
ARG RAILS_ENV
RUN install_gems_and_precompile_assets.sh
CMD [&quot;bin/bundle&quot;, &quot;exec&quot;, &quot;puma&quot;]</code></pre><p>With this setup, the average time of the Jenkins builds was reduced to about 5 minutes. This was a great achievement for us.</p><p>We then reviewed this approach in great detail. We found that although the approach was working fine, it had a major security flaw. It is not at all recommended to inject confidential information such as login credentials, private keys, etc. as part of the build context or via build arguments while building a Docker image using the <code>docker build</code> command. And we were actually injecting S3 credentials into the Docker image. 
Such confidential credentials provided while building a Docker image can be inspected using the <code>docker history</code> command by anyone who has access to that Docker image.</p><p>For this reason, we had to abandon this approach and look for another.</p><h2>Speeding up &quot;docker build&quot; -- second attempt</h2><p>In our second attempt, we decided to execute the <code>bundle install</code> and <code>rake assets:precompile</code> commands outside the <code>docker build</code> command, that is, in the Jenkins build itself. So with the new approach, we had to first execute the <code>bundle install</code> and <code>rake assets:precompile</code> commands as part of the Jenkins build and then execute <code>docker build</code> as usual. With this approach, we could take advantage of the inter-build caching provided by Jenkins.</p><p>The prerequisite was to have all the system packages required by the gems listed in the application's Gemfile installed on the Jenkins machine. We installed all the necessary system packages on our Jenkins server.</p><p>The following screenshot highlights the things we needed to configure in our Jenkins job to make this approach work.</p><p><img src="/blog_images/2018/speeding-up-docker-image-build-process-of-a-rails-application/jenkins-configuration-to-install-arbitrary-ruby-version-and-perform-caching.png" alt="Screenshot of Jenkins configuration highlighting installation of arbitrary Ruby version and maintaining cache and bundling gems and precompiling assets outside Docker build"></p><h4>1. Running the Jenkins build in an RVM-managed environment with the specified Ruby version</h4><p>Sometimes, we need to use a different Ruby version, as specified in the <code>.ruby-version</code> file in the cloned source code of the application. By default, the <code>bundle install</code> command would install the gems for the system Ruby version available on the Jenkins machine. This was not acceptable for us. 
Therefore, we needed a way to execute the <code>bundle install</code> command in the Jenkins build in an isolated environment which could use the Ruby version specified in the <code>.ruby-version</code> file instead of the default system Ruby version. To address this, we used the <a href="https://wiki.jenkins.io/display/JENKINS/RVM+Plugin">RVM plugin</a> for Jenkins. The RVM plugin enabled us to run the Jenkins build in an isolated environment by using or installing the Ruby version specified in the <code>.ruby-version</code> file. The section highlighted in green in the above screenshot shows the configuration required to enable this plugin.</p><h4>2. Carrying cache files between Jenkins builds to speed up the &quot;bundle install&quot; and &quot;rake assets:precompile&quot; commands</h4><p>We used the <a href="https://wiki.jenkins.io/display/JENKINS/Job+Cacher+Plugin">Job Cacher</a> Jenkins plugin to persist and carry the cache directories <code>vendor/cache</code>, <code>tmp/cache/assets</code> and <code>public/assets</code> between builds. At the beginning of a Jenkins build, just after cloning the source code of the application, the Job Cacher plugin restores the previously cached version of these directories into the current build. Similarly, before finishing a Jenkins build, the Job Cacher plugin copies the current version of these directories to <code>/var/lib/jenkins/jobs/docker-builder/cache</code> on the Jenkins machine, which is outside the workspace directory of the Jenkins job. The section highlighted in red in the above screenshot shows the configuration required to enable this plugin.</p><h4>3. 
Executing the &quot;bundle install&quot; and &quot;rake assets:precompile&quot; commands before &quot;docker build&quot; command</h4><p>Using the &quot;Execute shell&quot; build step provided by Jenkins, we execute<code>bundle install</code> and <code>rake assets:precompile</code> commands just before the<code>docker build</code> command invoked by the CloudBees Docker Build and Publish plugin.Since the Job Cacher plugin already restores the version of <code>vendor/cache</code>,<code>tmp/cache/assets</code> and <code>public/assets</code> directories from the previous build intothe current build, the <code>bundle install</code> and <code>rake assets:precompile</code> commandsreuses the cache and performs faster.</p><p>The updated Dockerfile has lesser number of instructions now.</p><pre><code class="language-dockerfile">FROM bigbinary/xyz-base:latestENV APP_PATH /data/app/WORKDIR $APP_PATHADD . $APP_PATHCMD [&quot;bin/bundle&quot;, &quot;exec&quot;, &quot;puma&quot;]</code></pre><p>With this approach, average Jenkins build time is now between 3.5 to 4.5minutes.</p><p>Following graph shows the build time trend of some of the recent builds on ourJenkins server.</p><p><img src="/blog_images/2018/speeding-up-docker-image-build-process-of-a-rails-application/build-time-trend-after-speedup-tweaks.png" alt="Screenshot of build time trend graph after speedup tweaks"></p><p>Please note that the spikes in the above graphs shows that certain Jenkinsbuilds took more than 5 minutes sometimes due to concurrently running builds atthat time. Because our Jenkins server has a limited set of resources,concurrently running builds often run longer than estimated.</p><p>We are still looking to improve the containerization speed even more and stillmaintaining the image size small. Please let us know if there's anything else wecan do to improve the containerization process.</p><p>Note that that our Jenkins server runs on the Ubuntu OS which is based onDebian. 
Our base Docker image is also based on Debian. Some of the gems in ourGemfile are native extensions written in C. The pre-installed gems on Jenkinsmachine have been working without any issues while running inside the Dockercontainers on Kubernetes. It may not work if both of the platforms are differentsince native extension gems installed on Jenkins host may fail to work insidethe Docker container.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Logtrail to tail log with Elasticsearch & Kibana on Kubernetes]]></title>
       <author><name>Rahul Mahale</name></author>
      <link href="https://www.bigbinary.com/blog/tail-log-using-logtrail-with-elk-on-kubernetes"/>
      <updated>2018-06-01T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/tail-log-using-logtrail-with-elk-on-kubernetes</id>
      <content type="html"><![CDATA[<p>Monitoring and logging are important aspects of deployments. Centralized logging is always useful in helping us identify problems.</p><p>EFK (Elasticsearch, Fluentd, Kibana) is a beautiful combination of tools to store logs centrally and visualize them with a single click. There are many other open-source logging tools available in the market, but EFK (ELK if Logstash is used) is one of the most widely used centralized logging tools.</p><p>This blog post shows how to integrate <a href="https://github.com/sivasamyk/logtrail">Logtrail</a>, which has a <a href="https://papertrailapp.com/">Papertrail</a>-like UI to tail the logs. Using Logtrail we can also apply filters to tail the logs centrally.</p><p>As EFK ships as an add-on with Kubernetes, all we have to do is deploy the EFK add-on on our k8s cluster.</p><h4>Pre-requisites:</h4><ul><li><p>Access to a working Kubernetes cluster with <a href="https://kubernetes.io/docs/reference/kubectl/kubectl/">kubectl</a> configured.</p></li><li><p>All our application logs should be redirected to STDOUT, so that Fluentd forwards them to Elasticsearch.</p></li><li><p>Understanding of <a href="http://kubernetes.io/">Kubernetes</a> terms like <a href="http://kubernetes.io/docs/user-guide/pods/">pods</a>, <a href="http://kubernetes.io/docs/user-guide/deployments/">deployments</a>, <a href="https://kubernetes.io/docs/concepts/services-networking/service/">services</a>, <a href="https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/">daemonsets</a>, <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/">configmap</a> and <a href="https://kubernetes.io/docs/concepts/cluster-administration/addons/">addons</a>.</p></li></ul><p>Installing the EFK add-on from <a href="https://github.com/kubernetes/kops/tree/master/addons/logging-elasticsearch">kubernetes upstream</a> is simple.
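</p><p>As noted in the prerequisites, the application must write its logs to STDOUT for Fluentd to pick them up. In a Rails application, for example, this could look like the following (an illustrative config fragment, not specific to any particular app):</p><pre><code class="language-ruby"># config/environments/production.rb (illustrative)
# Send Rails logs to STDOUT so the container runtime captures them
# and Fluentd can forward them to Elasticsearch.
config.logger = ActiveSupport::Logger.new(STDOUT)</code></pre><p>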
Deploy EFK using the following command.</p><pre><code class="language-bash">$ kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/logging-elasticsearch/v1.6.0.yaml
serviceaccount &quot;elasticsearch-logging&quot; created
clusterrole &quot;elasticsearch-logging&quot; created
clusterrolebinding &quot;elasticsearch-logging&quot; created
serviceaccount &quot;fluentd-es&quot; created
clusterrole &quot;fluentd-es&quot; created
clusterrolebinding &quot;fluentd-es&quot; created
daemonset &quot;fluentd-es&quot; created
service &quot;elasticsearch-logging&quot; created
statefulset &quot;elasticsearch-logging&quot; created
deployment &quot;kibana-logging&quot; created
service &quot;kibana-logging&quot; created</code></pre><p>Once the k8s resources are created, access the Kibana dashboard. To access the dashboard, get its URL using <code>kubectl cluster-info</code>.</p><pre><code class="language-bash">$ kubectl cluster-info | grep Kibana
Kibana is running at https://api.k8s-test.com/api/v1/proxy/namespaces/kube-system/services/kibana-logging</code></pre><p>Now go to the Kibana dashboard and we should be able to see the logs.</p><p><img src="/blog_images/2018/tail-log-using-logtrail-with-elk-on-kubernetes/kibana_dashboard.png" alt="Kibana dashboard"></p><p>The above screenshot shows the Kibana UI. We can create metrics and graphs as per our requirements.</p><p>We also want to view logs in <code>tail</code> style. We will use <a href="https://github.com/sivasamyk/logtrail">logtrail</a> to view logs in tail format. For that, we need a Docker image with the Logtrail plugin pre-installed.</p><p><strong>Note:</strong> If the upstream Kibana version of the k8s EFK add-on is 4.x, use a Kibana 4.x image for installing the Logtrail plugin in your custom image. If the add-on ships with Kibana version 5.x, make sure you pre-install Logtrail on a Kibana 5 image.</p><p>Check the Kibana version for the add-on <a href="https://github.com/kubernetes/kops/blob/master/addons/logging-elasticsearch/v1.6.0.yaml#L245">here</a>.</p><p>We will replace the default Kibana image with the <a href="https://hub.docker.com/r/rahulmahale/kubernetes-logtrail/">kubernetes-logtrail image</a>.</p><p>To replace the Docker image, update the Kibana deployment using the below command.</p><pre><code class="language-bash">$ kubectl -n kube-system set image deployment/kibana-logging kibana-logging=rahulmahale/kubernetes-logtrail:latest
deployment &quot;kibana-logging&quot; image updated</code></pre><p>Once the image is deployed, go to the Kibana dashboard and click on Logtrail as shown below.</p><p><img src="/blog_images/2018/tail-log-using-logtrail-with-elk-on-kubernetes/kibana-logtrail-menu.png" alt="Switch to logtrail"></p><p>After switching to Logtrail we will start seeing all the logs in real time, as shown below.</p><p><img src="/blog_images/2018/tail-log-using-logtrail-with-elk-on-kubernetes/logtrail.png" alt="Logs in Logtrail"></p><p>This centralized logging dashboard with Logtrail allows us to filter on several parameters.</p><p>For example, let's say we want to check all the logs for the namespace <code>myapp</code>. We can use the filter <code>kubernetes.namespace_name:&quot;myapp&quot;</code>. We can use the filter <code>kubernetes.container_name:&quot;mycontainer&quot;</code> to monitor logs for a specific container.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Increase reliability using super_fetch of Sidekiq Pro]]></title>
       <author><name>Vishal Telangre</name></author>
      <link href="https://www.bigbinary.com/blog/increase-reliability-of-background-job-processing-using-super_fetch-of-sidekiq-pro"/>
      <updated>2018-05-08T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/increase-reliability-of-background-job-processing-using-super_fetch-of-sidekiq-pro</id>
      <content type="html"><![CDATA[<p><a href="https://github.com/mperham/sidekiq">Sidekiq</a> is a background job processing library for Ruby. Sidekiq offers three versions: OSS, Pro and Enterprise.</p><p>OSS is free and open source and has basic features. The Pro and Enterprise versions are closed source and paid, and thus come with more advanced features. To compare the list of features offered by each of these versions, please visit the <a href="https://sidekiq.org">Sidekiq website</a>.</p><p>Sidekiq Pro 3.4.0 <a href="https://github.com/mperham/sidekiq/blob/6e79f2a860ae558f2ed52b8917d2fede846c0a50/Pro-Changes.md#340">introduced</a> the <code>super_fetch</code> strategy to reliably fetch jobs from the queue in Redis.</p><p>In this post, we will discuss the benefits of using the <code>super_fetch</code> strategy.</p><h2>Problem</h2><p>The open source version of Sidekiq comes with the <code>basic_fetch</code> strategy. Let's see an example to understand how it works.</p><p>Let's add Sidekiq to our <code>Gemfile</code> and run <code>bundle install</code> to install it.</p><pre><code class="language-ruby">gem 'sidekiq'</code></pre><p>Add the following Sidekiq worker in <code>app/workers/sleep_worker.rb</code>.</p><pre><code class="language-ruby">class SleepWorker
  include Sidekiq::Worker

  def perform(name)
    puts &quot;Started #{name}&quot;
    sleep 30
    puts &quot;Finished #{name}&quot;
  end
end</code></pre><p>This worker does nothing but sleep for 30 seconds.</p><p>Let's open a Rails console and schedule this worker to run as a background job asynchronously.</p><pre><code class="language-ruby">&gt;&gt; require &quot;sidekiq/api&quot;
=&gt; true
&gt;&gt; Sidekiq::Queue.new.size
=&gt; 0
&gt;&gt; SleepWorker.perform_async(&quot;A&quot;)
=&gt; &quot;5d8bf898c36a60a1096cf4d3&quot;
&gt;&gt; Sidekiq::Queue.new.size
=&gt; 1</code></pre><p>As we can see, the queue now has 1 job scheduled to be processed.</p><p>Let's start Sidekiq in another terminal tab.</p><pre><code class="language-ruby">$ bundle exec sidekiq
40510 TID-owu1swr1i INFO: Booting Sidekiq 5.1.3 with redis options {:id=&gt;&quot;Sidekiq-server-PID-40510&quot;, :url=&gt;nil}
40510 TID-owu1swr1i INFO: Starting processing, hit Ctrl-C to stop
40510 TID-owu1tr5my SleepWorker JID-5d8bf898c36a60a1096cf4d3 INFO: start
Started A</code></pre><p>As we can see, the job with ID <code>5d8bf898c36a60a1096cf4d3</code> was picked up by Sidekiq, and it started processing the job.</p><p>If we check the Sidekiq queue size in the Rails console, it will be zero now.</p><pre><code class="language-ruby">&gt;&gt; Sidekiq::Queue.new.size
=&gt; 0</code></pre><p>Let's shut down the Sidekiq process gracefully while Sidekiq is still in the middle of processing our scheduled job. Press either <code>Ctrl-C</code> or run the <code>kill -SIGINT &lt;PID&gt;</code> command.</p><pre><code class="language-ruby">$ kill -SIGINT 40510</code></pre><pre><code class="language-ruby">40510 TID-owu1swr1i INFO: Shutting down
40510 TID-owu1swr1i INFO: Terminating quiet workers
40510 TID-owu1x00rm INFO: Scheduler exiting...
40510 TID-owu1swr1i INFO: Pausing to allow workers to finish...
40510 TID-owu1swr1i WARN: Terminating 1 busy worker threads
40510 TID-owu1swr1i WARN: Work still in progress [#&lt;struct Sidekiq::BasicFetch::UnitOfWork queue=&quot;queue:default&quot;, job=&quot;{\&quot;class\&quot;:\&quot;SleepWorker\&quot;,\&quot;args\&quot;:[\&quot;A\&quot;],\&quot;retry\&quot;:true,\&quot;queue\&quot;:\&quot;default\&quot;,\&quot;jid\&quot;:\&quot;5d8bf898c36a60a1096cf4d3\&quot;,\&quot;created_at\&quot;:1525427293.956314,\&quot;enqueued_at\&quot;:1525427293.957355}&quot;&gt;]
40510 TID-owu1swr1i INFO: Pushed 1 jobs back to Redis
40510 TID-owu1tr5my SleepWorker JID-5d8bf898c36a60a1096cf4d3 INFO: fail: 19.576 sec
40510 TID-owu1swr1i INFO: Bye!</code></pre><p>As we can see, Sidekiq pushed the unfinished job back to the Redis queue when it received a <code>SIGINT</code> signal.</p><p>Let's verify it.</p><pre><code class="language-ruby">&gt;&gt; Sidekiq::Queue.new.size
=&gt; 1</code></pre><p>Before we move on, let's learn some basics about signals such as <code>SIGINT</code>.</p><h2>A crash course on POSIX signals</h2><p><code>SIGINT</code> is an interrupt signal. It is an alternative to hitting <code>Ctrl-C</code> on the keyboard. When a process is running in the foreground, we can hit <code>Ctrl-C</code> to signal the process to shut down. When the process is running in the background, we can use the <code>kill</code> command to send a <code>SIGINT</code> signal to the process' PID. A process can optionally catch this signal and shut itself down gracefully. If the process does not respect this signal and ignores it, then nothing really happens and the process keeps running. Both <code>INT</code> and <code>SIGINT</code> are identical signals.</p><p>Another useful signal is <code>SIGTERM</code>. It is called the termination signal. A process can either catch it and perform necessary cleanup, or just ignore it. Similar to a <code>SIGINT</code> signal, if a process ignores this signal, the process keeps running. Note that if no signal is supplied to the <code>kill</code> command, <code>SIGTERM</code> is used by default. Both <code>TERM</code> and <code>SIGTERM</code> are identical signals.</p><p><code>SIGTSTP</code> or <code>TSTP</code> is called the terminal stop signal. It is an alternative to hitting <code>Ctrl-Z</code> on the keyboard. This signal causes a process to suspend further execution.</p><p><code>SIGKILL</code> is known as the kill signal. This signal is intended to kill the process immediately and forcefully. A process cannot catch this signal, therefore the process cannot perform cleanup or a graceful shutdown. This signal is used when a process does not respect and respond to both <code>SIGINT</code> and <code>SIGTERM</code> signals. <code>KILL</code>, <code>SIGKILL</code> and <code>9</code> are identical signals.</p><p>There are a lot of other signals besides these, but they are not relevant for this post. Please check them out <a href="https://en.wikipedia.org/wiki/Signal_(IPC)#POSIX_signals">here</a>.</p><p>A Sidekiq process respects all of these signals and behaves as we expect. When Sidekiq receives a <code>TERM</code> or <code>SIGTERM</code> signal, Sidekiq terminates itself gracefully.</p><h2>Back to our example</h2><p>Coming back to our example from above, we had sent a <code>SIGINT</code> signal to the Sidekiq process.</p><pre><code class="language-ruby">$ kill -SIGINT 40510</code></pre><p>On receiving this <code>SIGINT</code> signal, the Sidekiq process with PID 40510 terminated quiet workers, paused the queue and waited for a while to let busy workers finish their jobs. Since our busy SleepWorker did not finish quickly, Sidekiq terminated that busy worker and pushed its job back to the queue in Redis. After that, Sidekiq gracefully terminated itself with an exit code 0. Note that the default timeout is 8 seconds, for which Sidekiq waits to let the busy workers finish before it pushes the unfinished jobs back to the queue in Redis. This timeout can be changed with the <code>-t</code> option given at the startup of the Sidekiq process.</p><p>Sidekiq <a href="https://github.com/mperham/sidekiq/wiki/Deployment#overview">recommends</a> sending a <code>TSTP</code> and a <code>TERM</code> together to ensure that the Sidekiq process shuts down safely and gracefully. On receiving a <code>TSTP</code> signal, Sidekiq stops pulling new work and finishes the work which is in progress. The idea is to first send a <code>TSTP</code> signal, wait as long as possible (by default for 8 seconds, as discussed above) to ensure that busy workers finish their jobs, and then send a <code>TERM</code> signal to shut down the process.</p><p>Sidekiq pushes the unprocessed jobs back to Redis when terminated gracefully. It means that Sidekiq pulls the unfinished job and starts processing it again when we restart the Sidekiq process.</p><pre><code class="language-ruby">$ bundle exec sidekiq
45916 TID-ovfq8ll0k INFO: Booting Sidekiq 5.1.3 with redis
options {:id=&gt;&quot;Sidekiq-server-PID-45916&quot;, :url=&gt;nil}
45916 TID-ovfq8ll0k INFO: Starting processing, hit Ctrl-C to stop
45916 TID-ovfqajol4 SleepWorker JID-5d8bf898c36a60a1096cf4d3 INFO: start
Started A
Finished A
45916 TID-ovfqajol4 SleepWorker JID-5d8bf898c36a60a1096cf4d3 INFO: done: 30.015 sec</code></pre><p>We can see that Sidekiq pulled the previously terminated job with ID <code>5d8bf898c36a60a1096cf4d3</code> and processed that job again.</p><p>So far so good.</p><p>This behavior is implemented using the <a href="https://github.com/mperham/sidekiq/blob/6e79f2a860ae558f2ed52b8917d2fede846c0a50/lib/sidekiq/fetch.rb"><code>basic_fetch</code></a> strategy, which is present in the open source version of Sidekiq.</p><p>Sidekiq uses the <a href="https://redis.io/commands/brpop">BRPOP</a> Redis command to fetch a scheduled job from the queue. When a job is fetched, that job gets removed from the queue and no longer exists in Redis. If this fetched job is processed, then all is good. Also, if the Sidekiq process is terminated gracefully on receiving either a <code>SIGINT</code> or a <code>SIGTERM</code> signal, Sidekiq will push the unfinished jobs back to the queue in Redis.</p><p>But what if the Sidekiq process crashes in the middle of processing that fetched job?</p><p>A process is considered crashed if it does not shut down gracefully. As we discussed before, when we send a <code>SIGKILL</code> signal to a process, the process cannot receive or catch this signal. Because the process cannot shut down gracefully, it crashes.</p><p>When a Sidekiq process crashes, the jobs fetched by that Sidekiq process which are not yet finished get lost forever.</p><p>Let's try to reproduce this scenario.</p><p>We will schedule another job.</p><pre><code class="language-ruby">&gt;&gt; SleepWorker.perform_async(&quot;B&quot;)
=&gt; &quot;37a5ab4139796c4b9dc1ea6d&quot;
&gt;&gt; Sidekiq::Queue.new.size
=&gt; 1</code></pre><p>Now, let's start the Sidekiq process and kill it using a <code>SIGKILL</code> or <code>9</code> signal.</p><pre><code class="language-ruby">$ bundle exec sidekiq
47395 TID-ow8q4nxzf INFO: Starting processing, hit Ctrl-C to stop
47395 TID-ow8qba0x7 SleepWorker JID-37a5ab4139796c4b9dc1ea6d INFO: start
Started B
[1]    47395 killed     bundle exec sidekiq</code></pre><pre><code class="language-ruby">$ kill -SIGKILL 47395</code></pre><p>Let's check whether Sidekiq had pushed the busy (unprocessed) job back to the queue in Redis before terminating.</p><pre><code class="language-ruby">&gt;&gt; Sidekiq::Queue.new.size
=&gt; 0</code></pre><p>No, it did not.</p><p>Actually, the Sidekiq process did not get a chance to shut down gracefully when it received the <code>SIGKILL</code> signal.</p><p>If we restart the Sidekiq process, it cannot fetch that unprocessed job since the job was not pushed back to the queue in Redis at all.</p><pre><code class="language-ruby">$ bundle exec sidekiq
47733 TID-ox1lau26l INFO: Booting Sidekiq 5.1.3 with redis options {:id=&gt;&quot;Sidekiq-server-PID-47733&quot;, :url=&gt;nil}
47733 TID-ox1lau26l INFO: Starting processing, hit Ctrl-C to stop</code></pre><p>Therefore, the job with the name argument <code>B</code> and ID <code>37a5ab4139796c4b9dc1ea6d</code> is completely lost. There is no way to get that job back.</p><p>Losing jobs like this may not be a problem for some applications, but for critical applications this could be a huge issue.</p><p>We faced a similar problem. One of our clients' applications is deployed on a Kubernetes cluster. Our Sidekiq process runs in a Docker container in Kubernetes <a href="https://kubernetes.io/docs/concepts/workloads/pods/pod">pods</a> which we call <code>background</code> pods.</p><p>Here's a stripped down version of our <a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/">Kubernetes deployment</a> manifest, which creates a Kubernetes deployment resource. Our Sidekiq process runs in the pods spawned by that deployment resource.</p><pre><code class="language-yaml">---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: background
spec:
  replicas: 2
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: background
        image: &lt;%= ENV['IMAGE'] %&gt;
        env:
        - name: POD_TYPE
          value: background
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/bash
              - -l
              - -c
              - for pid in tmp/pids/sidekiq*.pid; do bin/bundle exec sidekiqctl stop $pid 60; done</code></pre><p>When we apply an updated version of this manifest, for example to change the Docker image, the running pods are terminated and new pods are created.</p><p>Before terminating the only container in the pod, Kubernetes executes the <code>sidekiqctl stop $pid 60</code> command which we have defined using the <a href="https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/">preStop</a> event handler. Note that Kubernetes already sends a <code>SIGTERM</code> signal to the container being terminated inside the pod before invoking the <code>preStop</code> event handler. The default termination grace period is 30 seconds, and it is configurable. If the container doesn't terminate within the termination grace period, a <code>SIGKILL</code> signal will be sent to forcefully terminate the container.</p><p>The <code>sidekiqctl stop $pid 60</code> command executed in the <code>preStop</code> handler does three things.</p><ol><li>Sends a <code>SIGTERM</code> signal to the Sidekiq process running in the container.</li><li>Waits for 60 seconds.</li><li>Sends a <code>SIGKILL</code> signal to kill the Sidekiq process forcefully if the process has not terminated gracefully yet.</li></ol><p>This worked for us when the count of busy jobs was relatively small. When the number of jobs being processed is higher, Sidekiq does not get enough time to quiet the busy workers and fails to push some of them back on the Redis queue.</p><p>We found that some of the jobs were getting lost when our <code>background</code> pod restarted. We had to restart our background pod for reasons such as updating the Kubernetes deployment manifest, the pod being automatically evicted by Kubernetes due to the host node encountering an OOM (out of memory) issue, etc.</p><p>We tried increasing both <code>terminationGracePeriodSeconds</code> in the deployment manifest as well as the <code>sidekiqctl stop</code> command's timeout. Despite that, we still kept facing the same issue of losing jobs whenever a pod restarted.</p><p>We even tried sending <code>TSTP</code> and then <code>TERM</code> after a timeout relatively longer than 60 seconds. But the pod was getting harshly terminated without gracefully terminating the Sidekiq process running inside it. Therefore we kept losing the busy jobs which were running during the pod termination.</p><h2>Sidekiq Pro's super_fetch</h2><p>We were looking for a way to stop losing our Sidekiq jobs, or a way to recover them reliably, when our <code>background</code> Kubernetes pod restarts.</p><p>We realized that the commercial version of Sidekiq, Sidekiq Pro, offers an additional fetch strategy, <a href="https://github.com/mperham/sidekiq/wiki/Reliability#using-super_fetch"><code>super_fetch</code></a>, which seemed more efficient and reliable compared to the <code>basic_fetch</code> strategy.</p><p>Let's see what difference the <code>super_fetch</code> strategy makes over <code>basic_fetch</code>.</p><p>We will need to use the <code>sidekiq-pro</code> gem, which needs to be purchased. Since the Sidekiq Pro gem is closed source, we cannot fetch it from the default public gem registry, <a href="https://rubygems.org">https://rubygems.org</a>. Instead, we will have to fetch it from a private gem registry which we get after purchasing it. We add the following code to our <code>Gemfile</code> and run <code>bundle install</code>.</p><pre><code class="language-ruby">source ENV['SIDEKIQ_PRO_GEM_URL'] do
  gem
'sidekiq-pro'end</code></pre><p>To enable <code>super_fetch</code>,we need to add following codein an initializer <code>config/initializers/sidekiq.rb</code>.</p><pre><code class="language-ruby">Sidekiq.configure_server do |config|  config.super_fetch!end</code></pre><p>Well, that's it.Sidekiq will use <code>super_fetch</code> instead of <code>basic_fetch</code> now.</p><pre><code class="language-ruby">$ bundle exec sidekiq75595 TID-owsytgvqj INFO: Sidekiq Pro 4.0.2, commercially licensed.  Thanks for your support!75595 TID-owsytgvqj INFO: Booting Sidekiq 5.1.3 with redis options {:id=&gt;&quot;Sidekiq-server-PID-75595&quot;, :url=&gt;nil}75595 TID-owsytgvqj INFO: Starting processing, hit Ctrl-C to stop75595 TID-owsys5imz INFO: SuperFetch activated</code></pre><p>When <code>super_fetch</code> is activated, Sidekiq process' graceful shutdown behavioris similar to that of <code>basic_fetch</code>.</p><pre><code class="language-ruby">&gt;&gt; SleepWorker.perform_async(&quot;C&quot;)=&gt; &quot;f002a41393f9a79a4366d2b5&quot;&gt;&gt; Sidekiq::Queue.new.size=&gt; 1</code></pre><pre><code class="language-ruby">$ bundle exec sidekiq76021 TID-ow6kdcca5 INFO: Sidekiq Pro 4.0.2, commercially licensed.  
Thanks for your support!76021 TID-ow6kdcca5 INFO: Booting Sidekiq 5.1.3 with redis options {:id=&gt;&quot;Sidekiq-server-PID-76021&quot;, :url=&gt;nil}76021 TID-ow6kdcca5 INFO: Starting processing, hit Ctrl-C to stop76021 TID-ow6klq2cx INFO: SuperFetch activated76021 TID-ow6kiesnp SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: startStarted C</code></pre><pre><code class="language-ruby">&gt;&gt; Sidekiq::Queue.new.size=&gt; 0</code></pre><pre><code class="language-ruby">$ kill -SIGTERM 76021</code></pre><pre><code class="language-ruby">76021 TID-ow6kdcca5 INFO: Shutting down76021 TID-ow6kdcca5 INFO: Terminating quiet workers76021 TID-ow6kieuwh INFO: Scheduler exiting...76021 TID-ow6kdcca5 INFO: Pausing to allow workers to finish...76021 TID-ow6kdcca5 WARN: Terminating 1 busy worker threads76021 TID-ow6kdcca5 WARN: Work still in progress [#&lt;struct Sidekiq::Pro::SuperFetch::Retriever::UnitOfWork queue=&quot;queue:default&quot;, job=&quot;{\&quot;class\&quot;:\&quot;SleepWorker\&quot;,\&quot;args\&quot;:[\&quot;C\&quot;],\&quot;retry\&quot;:true,\&quot;queue\&quot;:\&quot;default\&quot;,\&quot;jid\&quot;:\&quot;f002a41393f9a79a4366d2b5\&quot;,\&quot;created_at\&quot;:1525500653.404454,\&quot;enqueued_at\&quot;:1525500653.404501}&quot;, local_queue=&quot;queue:sq|vishal.local:76021:3e64c4b08393|default&quot;&gt;]76021 TID-ow6kdcca5 INFO: SuperFetch: Moving job from queue:sq|vishal.local:76021:3e64c4b08393|default back to queue:default76021 TID-ow6kiesnp SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: fail: 13.758 sec76021 TID-ow6kdcca5 INFO: Bye!</code></pre><pre><code class="language-ruby">&gt;&gt; Sidekiq::Queue.new.size=&gt; 1</code></pre><p>That looks good.As we can see, Sidekiq moved busy job back from a private queueto the queue in Rediswhen Sidekiq received a <code>SIGTERM</code> signal.</p><p>Now, let's try to kill Sidekiq process forcefullywithout allowing a graceful shutdownby sending a <code>SIGKILL</code> signal.</p><p>Since Sidekiq was gracefully shutdown 
before,if we restart Sidekiq again,it will re-process the pushed back job having ID <code>f002a41393f9a79a4366d2b5</code>.</p><pre><code class="language-ruby">$ bundle exec sidekiq76890 TID-oxecurbtu INFO: Sidekiq Pro 4.0.2, commercially licensed.  Thanks for your support!76890 TID-oxecurbtu INFO: Booting Sidekiq 5.1.3 with redis options {:id=&gt;&quot;Sidekiq-server-PID-76890&quot;, :url=&gt;nil}76890 TID-oxecurbtu INFO: Starting processing, hit Ctrl-C to stop76890 TID-oxecyhftq INFO: SuperFetch activated76890 TID-oxecyotvm SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: startStarted C[1]    76890 killed     bundle exec sidekiq</code></pre><pre><code class="language-ruby">$ kill -SIGKILL 76890</code></pre><pre><code class="language-ruby">&gt;&gt; Sidekiq::Queue.new.size=&gt; 0</code></pre><p>It appears that Sidekiq didn't get any chanceto push the busy job back to the queue in Redison receiving a <code>SIGKILL</code> signal.</p><p>So, where is the magic of <code>super_fetch</code>?</p><p>Did we lose our job again?</p><p>Let's restart Sidekiq and see it ourself.</p><pre><code class="language-ruby">$ bundle exec sidekiq77496 TID-oum04ghgw INFO: Sidekiq Pro 4.0.2, commercially licensed.  
Thanks for your support!77496 TID-oum04ghgw INFO: Booting Sidekiq 5.1.3 with redis options {:id=&gt;&quot;Sidekiq-server-PID-77496&quot;, :url=&gt;nil}77496 TID-oum04ghgw INFO: Starting processing, hit Ctrl-C to stop77496 TID-oum086w9s INFO: SuperFetch activated77496 TID-oum086w9s WARN: SuperFetch: recovered 1 jobs77496 TID-oum08eu3o SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: startStarted CFinished C77496 TID-oum08eu3o SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: done: 30.011 sec</code></pre><p>Whoa, isn't that cool?</p><p>See that line where it says <code>SuperFetch: recovered 1 jobs</code>.</p><p>Although the job wasn't pushed back to the queue in Redis,Sidekiq somehow recovered our lost job having ID <code>f002a41393f9a79a4366d2b5</code>and reprocessed that job again!</p><p>Interested to learn about how Sidekiq did that? Keep on reading.</p><p>Note that, since Sidekiq Pro is a close sourced and commercial software,we cannot explain <code>super_fetch</code>'s exact implementation details.</p><p>As we discussed in-depth before,Sidekiq's <code>basic_fetch</code> strategy uses <code>BRPOP</code> Redis commandto fetch a job from the queue in Redis.It works great to some extent,but it is prone to losing jobif Sidekiq crashes or does not shutdown gracefully.</p><p>On the other hand, Sidekiq Pro offers <code>super_fetch</code> strategy which uses<a href="http://redis.io/commands/rpoplpush">RPOPLPUSH</a> Redis command to fetch a job.</p><p><code>RPOPLPUSH</code> Redis command providesa unique approach towards implementing a reliable queue.<code>RPOPLPUSH</code> command accepts two listsnamely a source list and a destination list.This command atomicallyreturns and removes the last element from the source list,and pushes that element as the first element in the destination list.Atomically means that both pop and push operationsare performed as a single operation at the same time;i.e. 
both should succeed, otherwise both are treated as failed.</p><p><code>super_fetch</code> registers a private queue in Redisfor each Sidekiq process on start-up.<code>super_fetch</code> atomically fetches a scheduled jobfrom the public queue in Redisand pushes that job into the private queue (or working queue)using <code>RPOPLPUSH</code> Redis command.Once the job is finished processing,Sidekiq removes that job from the private queue.During a graceful shutdown,Sidekiq moves back the unfinished jobsfrom the private queue to the public queue.If shutdown of Sidekiq process is not graceful,the unfinished jobs of that Sidekiq processremain there in the private queue which are called as orphaned jobs.On restarting or starting another Sidekiq process,<code>super_fetch</code> looks for such orphaned jobs in the private queues.If Sidekiq finds orphaned jobs, Sidekiq re-enqueue them and processes again.</p><p>It may happen thatwe have multiple Sidekiq processes running at the same time.If a process dies among them, its unfinished jobs become orphans.<a href="https://github.com/mperham/sidekiq/wiki/Reliability#recovering-jobs">This Sidekiq wiki</a>describes in detail the criteria which <code>super_fetch</code> relies uponfor identifying which jobs are orphaned and which jobs are not orphaned.If we don't restart or start another process,<code>super_fetch</code> may take 5 minutes or 3 hours to recover such orphaned jobs.The recommended approach is to restart or start another Sidekiq processto signal <code>super_fetch</code> to look for orphans.</p><p>Interestingly, in the older versions of Sidekiq Pro,<code>super_fetch</code> performed checks for orphaned jobs and queues<a href="https://github.com/mperham/sidekiq/issues/3273">every 24 hours</a>at the Sidekiq process startup.Due to this, when the Sidekiq process crashes,the orphaned jobs of that process remain unpicked for up to 24 hoursuntil the next restart.This orphan delay check windowhad been later lowered to 1 hour in 
Sidekiq Pro 3.4.1.</p><p>Another fun thing to know is that there existed two fetch strategies, namely <a href="https://github.com/mperham/sidekiq/wiki/Reliability/_compare/71312b1f3880bcee9ff47f59c7516c15657553d8...15776fd781848a36a0ddb24c3f2315202696e30c"><code>reliable_fetch</code></a> and <code>timed_fetch</code>, in older versions of Sidekiq Pro. Apparently, <code>reliable_fetch</code> <a href="https://github.com/mperham/sidekiq/wiki/Pro-Reliability-Server#reliable_fetch">did not work with Docker</a>, and <code>timed_fetch</code> had asymptotic computational complexity <code>O(log N)</code>, comparatively <a href="https://github.com/mperham/sidekiq/wiki/Pro-Reliability-Server#timed_fetch">less efficient</a> than <code>super_fetch</code>, which has asymptotic computational complexity <code>O(1)</code>. Both of these strategies were deprecated in Sidekiq Pro 3.4.0 in favor of <code>super_fetch</code>. Later, both of these strategies were <a href="https://github.com/mperham/sidekiq/blob/6e79f2a860ae558f2ed52b8917d2fede846c0a50/Pro-4.0-Upgrade.md#whats-new">removed</a> in Sidekiq Pro 4.0 and <a href="https://github.com/mperham/sidekiq/wiki/Reliability#notes">are not documented anywhere</a>.</p><h2>Final result</h2><p>We have enabled <code>super_fetch</code> in our application and it seems to be working without any major issues so far. Our Kubernetes <code>background</code> pods do not seem to be losing any jobs when these pods are restarted.</p><p>Update: Mike Perham, the author of Sidekiq, posted the following <a href="https://www.reddit.com/r/ruby/comments/8htnpe/increase_reliability_of_background_job_processing">comment</a>.</p><blockquote><p>Faktory provides all of the beanstalkd functionality, including the same reliability, with a nicer Web UI. It's free and OSS. https://github.com/contribsys/faktory http://contribsys.com/faktory/</p></blockquote>]]></content>
    </entry><entry>
       <title><![CDATA[Deploying Docker Registry on Kubernetes using S3 Storage]]></title>
       <author><name>Rahul Mahale</name></author>
      <link href="https://www.bigbinary.com/blog/deploying-docker-registry-on-kubernetes-using-s3-storage"/>
      <updated>2018-05-03T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/deploying-docker-registry-on-kubernetes-using-s3-storage</id>
      <content type="html"><![CDATA[<p>In today's era of containerization, no matter what container we are using, we need an image to run the container. Docker images are stored on container registries like Docker Hub (cloud), Google Container Registry (GCR), AWS ECR, quay.io etc.</p><p>We can also self-host a Docker registry on any Docker platform. In this blog post, we will see how to deploy a Docker registry on Kubernetes using the S3 storage driver.</p><h4>Pre-requisites:</h4><ul><li><p>Access to a working Kubernetes cluster.</p></li><li><p>Understanding of <a href="http://kubernetes.io/">Kubernetes</a> terms like <a href="http://kubernetes.io/docs/user-guide/pods/">pods</a>, <a href="http://kubernetes.io/docs/user-guide/deployments/">deployments</a>, <a href="https://kubernetes.io/docs/concepts/services-networking/service/">services</a>, <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/">configmap</a> and <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/">ingress</a>.</p></li></ul><p>As per the Docker registry <a href="https://docs.docker.com/registry/deploying/">documentation</a>, we can simply start the registry using the <code>registry</code> Docker image.</p><p>Basic parameters when deploying a production registry are:</p><ul><li>Authentication</li><li>SSL</li><li>Storage</li></ul><p>We will use <strong>htpasswd</strong> authentication for this post, though the registry image supports <strong>silly</strong> and <strong>token</strong> based authentication as well.</p><p>The Docker registry requires applications to use an SSL certificate and key. We will use a Kubernetes service which terminates SSL at the ELB level using annotations.</p><p>For registry storage, we can use filesystem, s3, azure, swift etc. For the complete list of options please visit the <a href="https://docs.docker.com/registry/configuration/#storagedocker">Docker site</a>.</p><p>We need to store the Docker images pushed to the registry.
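</p><p>Under the hood, the <code>registry</code> image maps <code>REGISTRY_*</code> environment variables onto its <code>config.yml</code>. As a rough sketch (the access key, secret key and bucket below are placeholders, matching the manifest later in this post), the equivalent storage section of a standalone registry configuration would look like this:</p><pre><code class="language-yaml">storage:
  s3:
    accesskey: &lt;your-s3-access-key&gt;
    secretkey: &lt;your-secret-s3-key&gt;
    region: us-east-1
    bucket: &lt;your-registry-bucket&gt;</code></pre><p>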
We will use S3 to store these Docker images.</p><h4>Steps for deploying the registry on Kubernetes</h4><p>Get the <code>ARN</code> of the SSL certificate to be used for SSL.</p><p>If you don't have the SSL certificate on AWS IAM, upload it using the following command.</p><pre><code class="language-bash">$ aws iam upload-server-certificate --server-certificate-name registry --certificate-body file://registry.crt --private-key file://key.pem</code></pre><p>Get the <code>arn</code> of the certificate using the command.</p><pre><code class="language-bash">$ aws iam get-server-certificate --server-certificate-name registry | grep Arn</code></pre><p>Create the S3 bucket which will be used to store the Docker images, using s3cmd or aws s3.</p><pre><code class="language-bash">$ s3cmd mb s3://myregistry
Bucket 's3://myregistry/' created</code></pre><p>Create a separate namespace, configmap, deployment and service for the registry using the following templates.</p><pre><code class="language-yaml">---
apiVersion: v1
kind: Namespace
metadata:
  name: container-registry
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: auth
  namespace: container-registry
data:
  htpasswd: |
    admin:$2y$05$TpZPzI7U7cr3cipe6jrOPe0bqohiwgEerEB6E4bFLsUf7Bk.SEBRi
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: registry
  name: registry
  namespace: container-registry
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: registry
    spec:
      containers:
        - env:
            - name: REGISTRY_AUTH
              value: htpasswd
            - name: REGISTRY_AUTH_HTPASSWD_PATH
              value: /auth/htpasswd
            - name: REGISTRY_AUTH_HTPASSWD_REALM
              value: Registry Realm
            - name: REGISTRY_STORAGE
              value: s3
            - name: REGISTRY_STORAGE_S3_ACCESSKEY
              value: &lt;your-s3-access-key&gt;
            - name: REGISTRY_STORAGE_S3_BUCKET
              value: &lt;your-registry-bucket&gt;
            - name: REGISTRY_STORAGE_S3_REGION
              value: us-east-1
            - name: REGISTRY_STORAGE_S3_SECRETKEY
              value: &lt;your-secret-s3-key&gt;
          image: registry:2
          name: registry
          ports:
            - containerPort: 5000
          volumeMounts:
            - name: auth
              mountPath: /auth
      volumes:
        - name: auth
          configMap:
            name: auth
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: &lt;your-iam-certificate-arn&gt;
    service.beta.kubernetes.io/aws-load-balancer-instance-protocol: http
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: &quot;443&quot;
  labels:
    app: registry
  name: registry
  namespace: container-registry
spec:
  type: LoadBalancer
  ports:
    - name: &quot;443&quot;
      port: 443
      targetPort: 5000
  selector:
    app: registry</code></pre><p>Let's launch these manifests using <code>kubectl apply</code>.</p><pre><code class="language-bash">$ kubectl apply -f registry-namespace.yml -f registry-configmap.yml -f registry-deployment.yml -f registry-service.yml
namespace &quot;container-registry&quot; created
configmap &quot;auth&quot; created
deployment &quot;registry&quot; created
service &quot;registry&quot; created</code></pre><p>Now that we have created the registry, we should map DNS to the web service ELB endpoint.
We can get the web service ELB endpoint using the following command.</p><pre><code class="language-bash">$ kubectl -n container-registry get svc registry -o wide
NAME       CLUSTER-IP      EXTERNAL-IP                                                               PORT(S)         AGE       SELECTOR
registry   100.71.250.56   abcghccf8540698e8bff782799ca8h04-1234567890.us-east-2.elb.amazonaws.com   443:30494/TCP   1h        app=registry</code></pre><p>We will point DNS to this ELB endpoint with the domain registry.myapp.com.</p><p>Now that we have the registry running, it's time to push an image to it.</p><p>First, pull an image or build one locally to push.</p><p>On the local machine, run the following commands:</p><pre><code class="language-bash">$ docker pull busybox
latest: Pulling from busybox
f9ea5e501ad7: Pull complete
ac3f08b78d4e: Pull complete
Digest: sha256:da268b65d710e5ca91271f161d0ff078dc63930bbd6baac88d21b20d23b427ec
Status: Downloaded newer image for busybox:latest</code></pre><p>Now log in to our registry using the following commands.</p><pre><code class="language-bash">$ sudo docker login registry.myapp.com
Username: admin
Password:
Login Succeeded</code></pre><p>Now tag the image to point it to our registry using the <code>docker tag</code> command.</p><pre><code class="language-bash">$ sudo docker tag busybox registry.myapp.com/my-app:latest</code></pre><p>Once the image is tagged, we are good to push.</p><p>Let's push the image using the <code>docker push</code> command.</p><pre><code class="language-bash">$ sudo docker push registry.myapp.com/my-app:latest
The push refers to a repository [registry.myapp.com/my-app]
05732a3f47b5: Pushed
30de36c4bd15: Pushed
5237590c0d08: Pushed
latest: digest: sha256:f112e608b2639b21498bd4dbca9076d378cc216a80d52287f7f0f6ea6ad739ab size: 205</code></pre><p>We are successfully able to push an image to the registry running on Kubernetes and stored on S3.
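</p><p>As an optional sanity check before looking at S3, the registry's HTTP API can list the repositories it knows about. This is a hedged sketch: it assumes the same registry.myapp.com domain and the admin credentials created above, and the response should look something like the line shown.</p><pre><code class="language-bash">$ curl -u admin:&lt;your-password&gt; https://registry.myapp.com/v2/_catalog
{&quot;repositories&quot;:[&quot;my-app&quot;]}</code></pre><p>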
Let's verify that it exists on S3.</p><p>Navigate to our S3 bucket and we can see that the Docker registry repository <code>busybox</code> has been created.</p><pre><code class="language-bash">$ s3cmd ls s3://myregistry/docker/registry/repositories/
DIR   s3://myregistry/docker/registry/repositories/busybox/</code></pre><p>All our image related files are stored on S3.</p><p>In this way, we can self-host a container registry on Kubernetes backed by S3 storage.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Setup path based routing for a Rails app with HAProxy Ingress]]></title>
       <author><name>Rahul Mahale</name></author>
      <link href="https://www.bigbinary.com/blog/using-haproxy-ingress-with-rails-uniconrn-and-websockets"/>
      <updated>2018-02-28T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/using-haproxy-ingress-with-rails-uniconrn-and-websockets</id>
      <content type="html"><![CDATA[<p>After months of testing, we recently moved a Ruby on Rails application that is using a Kubernetes cluster to production.</p><p>In this article we will discuss how to set up path based routing for a Ruby on Rails application in Kubernetes using HAProxy ingress.</p><p>This post assumes that you have a basic understanding of <a href="http://kubernetes.io/">Kubernetes</a> terms like <a href="http://kubernetes.io/docs/user-guide/pods/">pods</a>, <a href="http://kubernetes.io/docs/user-guide/deployments/">deployments</a>, <a href="https://kubernetes.io/docs/concepts/services-networking/service/">services</a>, <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/">configmap</a> and <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/">ingress</a>.</p><p>Typically our Rails app has services like unicorn/puma, sidekiq/delayed-job/resque, websockets and some dedicated API services. We had one web service exposed to the world using a load balancer and it was working well.
But as the traffic increased, it became necessary to route traffic based on URLs/paths.</p><p>However, Kubernetes does not support this type of load balancing out of the box. There is work in progress in <a href="https://github.com/coreos/alb-ingress-controller">alb-ingress-controller</a> to support this, but we could not rely on it for production usage as it is still in alpha.</p><p>The best way to achieve path based routing was to use an <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-controllers">ingress controller</a>.</p><p>We researched and found that there are different types of ingress available in the k8s world.</p><ol><li><a href="https://github.com/kubernetes/ingress-nginx">nginx-ingress</a></li><li><a href="https://github.com/kubernetes/ingress-gce">ingress-gce</a></li><li><a href="https://github.com/jcmoraisjr/haproxy-ingress">HAProxy-ingress</a></li><li><a href="https://docs.traefik.io/providers/kubernetes-ingress/">traefik</a></li><li><a href="https://github.com/appscode/voyager">voyager</a></li></ol><p>We experimented with nginx-ingress and HAProxy and decided to go with HAProxy. HAProxy has better support for Rails websockets, which we needed in this project.</p><p>We will walk you through, step by step, how to use HAProxy ingress in a Rails app.</p><h3>Configuring a Rails app with the HAProxy ingress controller</h3><p>Here is what we are going to do.</p><ul><li>Create a Rails app with different services and deployments.</li><li>Create a tls secret for SSL.</li><li>Create the HAProxy ingress configmap.</li><li>Create the HAProxy ingress controller.</li><li>Expose the ingress with a service of type LoadBalancer.</li><li>Set up app DNS with the ingress service.</li><li>Create different ingress rules specifying path based routing.</li><li>Test the path based routing.</li></ul><p>Now let's build the Rails application deployment manifests for services like web (unicorn), background (sidekiq), websocket (ruby thin) and API (dedicated unicorn).</p><p>Here is our web app deployment and
service template.</p><pre><code class="language-yaml">---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-production-web
  labels:
    app: test-production-web
  namespace: test
spec:
  template:
    metadata:
      labels:
        app: test-production-web
    spec:
      containers:
      - image: &lt;your-repo&gt;/&lt;your-image-name&gt;:latest
        name: test-production
        imagePullPolicy: Always
        env:
        - name: POSTGRES_HOST
          value: test-production-postgres
        - name: REDIS_HOST
          value: test-production-redis
        - name: APP_ENV
          value: production
        - name: APP_TYPE
          value: web
        - name: CLIENT
          value: test
        ports:
        - containerPort: 80
      imagePullSecrets:
        - name: registrykey
---
apiVersion: v1
kind: Service
metadata:
  name: test-production-web
  labels:
    app: test-production-web
  namespace: test
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: test-production-web</code></pre><p>Here is the background app deployment and service template.</p><pre><code class="language-yaml">---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-production-background
  labels:
    app: test-production-background
  namespace: test
spec:
  template:
    metadata:
      labels:
        app: test-production-background
    spec:
      containers:
      - image: &lt;your-repo&gt;/&lt;your-image-name&gt;:latest
        name: test-production
        imagePullPolicy: Always
        env:
        - name: POSTGRES_HOST
          value: test-production-postgres
        - name: REDIS_HOST
          value: test-production-redis
        - name: APP_ENV
          value: production
        - name: APP_TYPE
          value: background
        - name: CLIENT
          value: test
        ports:
        - containerPort: 80
      imagePullSecrets:
        - name: registrykey
---
apiVersion: v1
kind: Service
metadata:
  name: test-production-background
  labels:
    app: test-production-background
  namespace: test
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: test-production-background</code></pre><p>Here is the websocket app deployment and service template.</p><pre><code class="language-yaml">---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-production-websocket
  labels:
    app: test-production-websocket
  namespace: test
spec:
  template:
    metadata:
      labels:
        app: test-production-websocket
    spec:
      containers:
      - image: &lt;your-repo&gt;/&lt;your-image-name&gt;:latest
        name: test-production
        imagePullPolicy: Always
        env:
        - name: POSTGRES_HOST
          value: test-production-postgres
        - name: REDIS_HOST
          value: test-production-redis
        - name: APP_ENV
          value: production
        - name: APP_TYPE
          value: websocket
        - name: CLIENT
          value: test
        ports:
        - containerPort: 80
      imagePullSecrets:
        - name: registrykey
---
apiVersion: v1
kind: Service
metadata:
  name: test-production-websocket
  labels:
    app: test-production-websocket
  namespace: test
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: test-production-websocket</code></pre><p>Here is the API app deployment and service info.</p><pre><code class="language-yaml">---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-production-api
  labels:
    app: test-production-api
  namespace: test
spec:
  template:
    metadata:
      labels:
        app: test-production-api
    spec:
      containers:
      - image: &lt;your-repo&gt;/&lt;your-image-name&gt;:latest
        name: test-production
        imagePullPolicy: Always
        env:
        - name: POSTGRES_HOST
          value: test-production-postgres
        - name: REDIS_HOST
          value: test-production-redis
        - name: APP_ENV
          value: production
        - name: APP_TYPE
          value: api
        - name: CLIENT
          value: test
        ports:
        - containerPort: 80
      imagePullSecrets:
        - name: registrykey
---
apiVersion: v1
kind: Service
metadata:
  name: test-production-api
  labels:
    app: test-production-api
  namespace: test
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: test-production-api</code></pre><p>Let's launch these manifests using <code>kubectl apply</code>.</p><pre><code class="language-bash">$ kubectl apply -f test-web.yml -f test-background.yml -f test-websocket.yml -f test-api.yml
deployment &quot;test-production-web&quot; created
service &quot;test-production-web&quot; created
deployment &quot;test-production-background&quot; created
service &quot;test-production-background&quot; created
deployment &quot;test-production-websocket&quot; created
service &quot;test-production-websocket&quot; created
deployment &quot;test-production-api&quot; created
service &quot;test-production-api&quot; created</code></pre><p>Once our app is deployed and running, we should create the HAProxy ingress. Before that, let's create a tls secret with our SSL key and certificate.</p><p>This is also used to enable HTTPS for the app URL and to terminate it on L7.</p><pre><code class="language-bash">$ kubectl create secret tls tls-certificate --key server.key --cert server.pem</code></pre><p>Here <code>server.key</code> is our SSL key and <code>server.pem</code> is our SSL certificate in pem format.</p><p>Now let's create the HAProxy controller resources.</p><h3>HAProxy configmap</h3><p>For all the available configuration parameters from HAProxy, refer <a href="https://github.com/jcmoraisjr/HAProxy-ingress#configmap">here</a>.</p><pre><code class="language-yaml">apiVersion: v1
data:
  dynamic-scaling: &quot;true&quot;
  backend-server-slots-increment: &quot;4&quot;
kind: ConfigMap
metadata:
  name: haproxy-configmap
  namespace: test</code></pre><h3>HAProxy Ingress controller deployment</h3><p>Deployment template for the ingress controller, with at least 2 replicas to manage rolling deploys.</p><pre><code class="language-yaml">apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    run: haproxy-ingress
  name: haproxy-ingress
  namespace: test
spec:
  replicas: 2
  selector:
    matchLabels:
      run: haproxy-ingress
  template:
    metadata:
      labels:
        run: haproxy-ingress
    spec:
      containers:
        - name: haproxy-ingress
          image: quay.io/jcmoraisjr/haproxy-ingress:v0.5-beta.1
          args:
            - --default-backend-service=$(POD_NAMESPACE)/test-production-web
            - --default-ssl-certificate=$(POD_NAMESPACE)/tls-certificate
            - --configmap=$(POD_NAMESPACE)/haproxy-configmap
            - --ingress-class=haproxy
          ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
            - name: stat
              containerPort: 1936
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace</code></pre><p>Notable fields in the above manifest are the arguments passed to the controller.</p><p><code>--default-backend-service</code> is the service that serves a request when no rule is matched.</p><p>In our case it is the <code>test-production-web</code> service, but it can be a custom 404 page or whatever you think is better.</p><p><code>--default-ssl-certificate</code> is the SSL secret we just created above. This terminates SSL on L7, and our app is served over HTTPS to the outside world.</p><h3>HAProxy Ingress service</h3><p>This is the <code>LoadBalancer</code> type service which allows client traffic to reach our ingress controller.</p><p>The LoadBalancer has access to both the public network and the internal Kubernetes network, while retaining the L7 routing of the ingress controller.</p><pre><code class="language-yaml">apiVersion: v1
kind: Service
metadata:
  labels:
    run: haproxy-ingress
  name: haproxy-ingress
  namespace: test
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
    - name: https
      port: 443
      protocol: TCP
      targetPort: 443
    - name: stat
      port: 1936
      protocol: TCP
      targetPort: 1936
  selector:
    run: haproxy-ingress</code></pre><p>Now let's apply all the HAProxy manifests.</p><pre><code class="language-bash">$ kubectl apply -f haproxy-configmap.yml -f haproxy-deployment.yml -f haproxy-service.yml
configmap &quot;haproxy-configmap&quot; created
deployment &quot;haproxy-ingress&quot; created
service &quot;haproxy-ingress&quot; created</code></pre><p>Once all the resources are running, get the LoadBalancer endpoint using:</p><pre><code class="language-bash">$ kubectl -n test get svc haproxy-ingress -o wide
NAME              TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)                                     AGE       SELECTOR
haproxy-ingress   LoadBalancer   100.67.194.186   a694abcdefghi11e8bc3b0af2eb5c5d8-806901662.us-east-1.elb.amazonaws.com   80:31788/TCP,443:32274/TCP,1936:32157/TCP   2m        run=haproxy-ingress</code></pre><h3>DNS mapping with application URL</h3><p>Once we have the ELB endpoint of the ingress service, map the DNS with a URL like <code>test-rails-app.com</code>.</p><h3>Ingress Implementation</h3><p>Now, after doing all the hard work, it is time to configure the ingress and the path based rules.</p><p>In our case we want to have the following rules.</p><p><em>https://test-rails-app.com</em> requests to be served by <code>test-production-web</code>.</p><p><em>https://test-rails-app.com/websocket</em> requests to be served by <code>test-production-websocket</code>.</p><p><em>https://test-rails-app.com/api</em> requests to be served by <code>test-production-api</code>.</p><p>Let's create an ingress manifest defining all the rules.</p><pre><code class="language-yaml">---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress
  namespace: test
spec:
  tls:
    - hosts:
        - test-rails-app.com
      secretName: tls-certificate
  rules:
    - host: test-rails-app.com
      http:
        paths:
          - path: /
            backend:
              serviceName: test-production-web
              servicePort: 80
          - path: /api
            backend:
              serviceName: test-production-api
              servicePort: 80
          - path: /websocket
            backend:
              serviceName: test-production-websocket
              servicePort: 80</code></pre><p>Moreover, there are <a href="https://github.com/jcmoraisjr/haproxy-ingress#annotations">Ingress Annotations</a> for adjusting the configuration.</p><p>As expected, our default traffic on <code>/</code> is now routed to the <code>test-production-web</code> service.</p><p><code>/api</code> is routed to the <code>test-production-api</code> service.</p><p><code>/websocket</code> is routed to the <code>test-production-websocket</code> service.</p><p>Thus the ingress implementation solves our purpose of path based routing and terminating SSL on L7 on Kubernetes.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Scheduling pods on nodes in Kubernetes using labels]]></title>
       <author><name>Rahul Mahale</name></author>
      <link href="https://www.bigbinary.com/blog/scheduling-pods-on-nodes-in-kubernetes-using-labels"/>
      <updated>2017-10-16T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/scheduling-pods-on-nodes-in-kubernetes-using-labels</id>
      <content type="html"><![CDATA[<p>This post assumes that you have a basic understanding of <a href="http://kubernetes.io/">Kubernetes</a> terms like <a href="http://kubernetes.io/docs/user-guide/pods/">pods</a>, <a href="http://kubernetes.io/docs/user-guide/deployments/">deployments</a> and <a href="https://kubernetes.io/docs/concepts/architecture/nodes/">nodes</a>.</p><p>A Kubernetes cluster can have many nodes. Each node in turn can run multiple pods. By default Kubernetes manages which pod will run on which node, and this is something we do not need to worry about.</p><p>However, sometimes we want to ensure that certain pods do not run on the same node. For example, we have an application called <em>wheel</em>. We have both staging and production versions of this app, and we want to ensure that the production pod and the staging pod are not on the same host.</p><p>To ensure that certain pods do not run on the same host, we can use the <strong>nodeSelector</strong> constraint in the <strong>PodSpec</strong> to schedule pods on nodes.</p><h3>Kubernetes cluster</h3><p>We will use <a href="https://github.com/kubernetes/kops/">kops</a> to provision the cluster.
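</p><p>For reference, such a cluster can be provisioned with a command along these lines. This is only a sketch: the state store bucket is a placeholder, and the exact sizes and zones should match your own setup rather than the values shown here.</p><pre><code class="language-bash">$ kops create cluster \
    --name=test-k8s.nodes-staging.com \
    --state=s3://&lt;your-kops-state-store&gt; \
    --zones=us-east-1a,us-east-1b,us-east-1c \
    --node-count=2 \
    --node-size=m4.large \
    --master-size=m4.large \
    --yes</code></pre><p>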
We can check the health of the cluster using <code>kops validate cluster</code>.</p><pre><code class="language-bash">$ kops validate cluster
Using cluster from kubectl context: test-k8s.nodes-staging.com
Validating cluster test-k8s.nodes-staging.com

INSTANCE GROUPS
NAME              ROLE   MACHINETYPE MIN MAX SUBNETS
master-us-east-1a Master m4.large    1   1 us-east-1a
master-us-east-1b Master m4.large    1   1 us-east-1b
master-us-east-1c Master m4.large    1   1 us-east-1c
nodes-wheel-stg   Node   m4.large    2   5 us-east-1a,us-east-1b
nodes-wheel-prd   Node   m4.large    2   5 us-east-1a,us-east-1b

NODE STATUS
NAME                           ROLE   READY
ip-192-10-110-59.ec2.internal  master True
ip-192-10-120-103.ec2.internal node   True
ip-192-10-42-9.ec2.internal    master True
ip-192-10-73-191.ec2.internal  master True
ip-192-10-82-66.ec2.internal   node   True
ip-192-10-72-68.ec2.internal   node   True
ip-192-10-182-70.ec2.internal  node   True

Your cluster test-k8s.nodes-staging.com is ready</code></pre><p>Here we can see that there are two instance groups for nodes: <em>nodes-wheel-stg</em> and <em>nodes-wheel-prd</em>.</p><p><em>nodes-wheel-stg</em> might have application pods like <em>pod-wheel-stg-sidekiq</em>, <em>pod-wheel-stg-unicorn</em> and <em>pod-wheel-stg-redis</em>. Similarly, <em>nodes-wheel-prd</em> might have application pods like <em>pod-wheel-prd-sidekiq</em>, <em>pod-wheel-prd-unicorn</em> and <em>pod-wheel-prd-redis</em>.</p><p>As we can see, the <strong>max number of nodes</strong> for the instance groups <em>nodes-wheel-stg</em> and <em>nodes-wheel-prd</em> is 5. This means that if new nodes are created in the future, then based on the instance group the newly created nodes will automatically be labelled, and no manual work is required.</p><h3>Labelling a node</h3><p>We will use <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/">Kubernetes labels</a> to label a node.
To add a label we need to edit the instance group using kops.</p><pre><code class="language-bash">$ kops edit ig nodes-wheel-stg</code></pre><p>This will open up the instance group configuration file. We will add the following label in the instance group spec.</p><pre><code class="language-yaml">nodeLabels:
  type: wheel-stg</code></pre><p>The complete <code>ig</code> configuration looks like this.</p><pre><code class="language-yaml">apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-10-12T06:24:53Z
  labels:
    kops.k8s.io/cluster: k8s.nodes-staging.com
  name: nodes-wheel-stg
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: m4.large
  maxSize: 5
  minSize: 2
  nodeLabels:
    type: wheel-stg
  role: Node
  subnets:
    - us-east-1a
    - us-east-1b
    - us-east-1c</code></pre><p>Similarly, we can label the instance group <em>nodes-wheel-prd</em> with the label <em>type: wheel-prd</em>.</p><p>After making the changes, update the cluster using <code>kops rolling-update cluster --yes --force</code>.
This will update the cluster with the specified labels.</p><p>New nodes added in the future will have labels based on their respective <code>instance groups</code>.</p><p>Once the nodes are labeled, we can verify using <code>kubectl describe node</code>.</p><pre><code class="language-bash">$ kubectl describe node ip-192-10-82-66.ec2.internal
Name:               ip-192-10-82-66.ec2.internal
Roles:              node
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m4.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1a
                    kubernetes.io/hostname=ip-192-10-82-66.ec2.internal
                    kubernetes.io/role=node
                    type=wheel-stg</code></pre><p>In this way we have our node labeled using kops.</p><h3>Labelling nodes using kubectl</h3><p>We can also label a node using <code>kubectl</code>.</p><pre><code class="language-bash">$ kubectl label node ip-192-20-44-136.ec2.internal type=wheel-stg</code></pre><p>After labeling a node, we add the <code>nodeSelector</code> field to the <code>PodSpec</code> in our deployment template.</p><p>We will add the following block to the deployment manifest.</p><pre><code class="language-yaml">nodeSelector:
  type: wheel-stg</code></pre><p>We can add this configuration to the original deployment manifest.</p><pre><code class="language-yaml">apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-staging-node
  labels:
    app: test-staging
  namespace: test
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: test-staging
    spec:
      containers:
      - image: &lt;your-repo&gt;/&lt;your-image-name&gt;:latest
        name: test-staging
        imagePullPolicy: Always
        env:
        - name: REDIS_HOST
          value: test-staging-redis
        - name: APP_ENV
          value: staging
        - name: CLIENT
          value: test
        ports:
        - containerPort: 80
      nodeSelector:
        type: wheel-stg
      imagePullSecrets:
        - name: registrykey</code></pre><p>Let's launch this deployment and check where the pod is scheduled.</p><pre><code class="language-bash">$ kubectl apply -f test-deployment.yml
deployment &quot;test-staging-node&quot; created</code></pre><p>We can verify that our pod is running on a node labeled <code>type=wheel-stg</code>.</p><pre><code class="language-bash">$ kubectl describe pod test-staging-2751555626-9sd4m
Name:           test-staging-2751555626-9sd4m
Namespace:      default
Node:           ip-192-10-82-66.ec2.internal/192.10.82.66
...
...
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
QoS Class:       Burstable
Node-Selectors:  type=wheel-stg
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:          &lt;none&gt;</code></pre><p>Similarly, we run the <em>wheel</em> production pods on nodes labeled with <code>type: wheel-prd</code>.</p><p>Please note that when we specify a <code>nodeSelector</code> and no node matches the label, the pods remain in the <code>pending</code> state as they don't find a node with a matching label.</p><p>In this way we schedule our pods to run on specific nodes for certain use-cases.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Graceful shutdown of Sidekiq processes on Kubernetes]]></title>
       <author><name>Rahul Mahale</name></author>
      <link href="https://www.bigbinary.com/blog/graceful-shutdown-of-sidekiq-processes-on-k8s"/>
      <updated>2017-08-24T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/graceful-shutdown-of-sidekiq-processes-on-k8s</id>
      <content type="html"><![CDATA[<p>In our last <a href="deploying-rails-applications-using-kubernetes-with-zero-downtime">blog</a>, we explained how to handle rolling deployments of Rails applications with no downtime.</p><p>In this article we will walk you through how to handle graceful shutdown of processes in Kubernetes.</p><p>This post assumes that you have a basic understanding of <a href="http://kubernetes.io/">Kubernetes</a> terms like <a href="http://kubernetes.io/docs/user-guide/pods/">pods</a> and <a href="http://kubernetes.io/docs/user-guide/deployments/">deployments</a>.</p><h3>Problem</h3><p>When we deploy Rails applications on Kubernetes, it stops the existing pods and spins up new ones. When an old pod is terminated by the ReplicaSet, any active Sidekiq processes are terminated along with it. We run our batch jobs using Sidekiq, and it is possible that Sidekiq jobs are still running while a deployment is being performed. Terminating the old pod during deployment can kill those already running jobs.</p><h3>Solution #1</h3><p>As per the default <a href="https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods">pod termination</a> policy of Kubernetes, a pod is deleted with a default grace period of 30 seconds. At the start of this period, Kubernetes sends the TERM signal. When the grace period expires, any processes still running in the pod are killed with SIGKILL.</p><p>We can adjust the <code>terminationGracePeriodSeconds</code> timeout as per our need, for example changing it from 30 seconds to 2 minutes.</p><p>However, there might be cases where we are not sure how much time a process takes to gracefully shut down. In such cases we should consider using the <code>PreStop</code> hook, which is our next solution.</p><h3>Solution #2</h3><p>Kubernetes provides many <a href="https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/">Container lifecycle hooks</a>.</p><p>The <code>PreStop</code> hook is called immediately before a container is terminated. It is a blocking call. 
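</p><p>For reference, the grace-period adjustment from Solution #1 is a single field in the pod spec. A minimal sketch (the value of 120 seconds is illustrative, not a recommendation):</p><pre><code class="language-yaml">spec:
  terminationGracePeriodSeconds: 120
  containers:
    - name: test-staging
      image: &lt;your-repo&gt;/&lt;your-image-name&gt;:latest</code></pre><p>Returning to the <code>PreStop</code> hook: it is a blocking call. 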
It means it is synchronous. In other words, this hook must complete before the container is terminated.</p><p>Note that unlike Solution #1, this solution is not bound by a timeout we have to guess up front. Kubernetes waits for the <code>PreStop</code> process to finish (up to the pod's termination grace period). It is never a good idea to have a process which takes more than a minute to shut down, but in the real world there are cases where more time is needed. Use <code>PreStop</code> for such cases.</p><p>We decided to use the <code>preStop</code> hook to stop Sidekiq because we had some really long running processes.</p><h3>Using PreStop hooks in Sidekiq deployment</h3><p>This is a simple deployment template which terminates the <a href="https://github.com/mperham/sidekiq/wiki/Signals">Sidekiq process</a> when the pod is terminated during deployment.</p><pre><code class="language-yaml">apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-staging-sidekiq
  labels:
    app: test-staging
  namespace: test
spec:
  template:
    metadata:
      labels:
        app: test-staging
    spec:
      containers:
        - image: &lt;your-repo&gt;/&lt;your-image-name&gt;:latest
          name: test-staging
          imagePullPolicy: Always
          env:
            - name: REDIS_HOST
              value: test-staging-redis
            - name: APP_ENV
              value: staging
            - name: CLIENT
              value: test
          volumeMounts:
            - mountPath: /etc/sidekiq/config
              name: test-staging-sidekiq
          ports:
            - containerPort: 80
      volumes:
        - name: test-staging-sidekiq
          configMap:
            name: test-staging-sidekiq
            items:
              - key: config
                path: sidekiq.yml
      imagePullSecrets:
        - name: registrykey</code></pre><p>Next we will use the <code>PreStop</code> lifecycle hook to stop Sidekiq safely before pod termination.</p><p>We will add the following block to the deployment manifest.</p><pre><code class="language-yaml">lifecycle:
  preStop:
    exec:
      command:
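        # Illustrative note: the command below loops over every pidfile in
        # tmp/pids and runs `bundle exec sidekiqctl stop` on it, which
        # signals that Sidekiq process to shut down gracefully.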
        [
          &quot;/bin/bash&quot;,
          &quot;-l&quot;,
          &quot;-c&quot;,
          &quot;cd /opt/myapp/current; for f in tmp/pids/sidekiq*.pid; do bundle exec sidekiqctl stop $f; done&quot;,
        ]</code></pre><p>The <code>PreStop</code> hook stops all the Sidekiq processes and performs a graceful shutdown of Sidekiq before the pod is terminated.</p><p>We can add this configuration to the original deployment manifest.</p><pre><code class="language-yaml">apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-staging-sidekiq
  labels:
    app: test-staging
  namespace: test
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: test-staging
    spec:
      containers:
        - image: &lt;your-repo&gt;/&lt;your-image-name&gt;:latest
          name: test-staging
          imagePullPolicy: Always
          lifecycle:
            preStop:
              exec:
                command:
                  [
                    &quot;/bin/bash&quot;,
                    &quot;-l&quot;,
                    &quot;-c&quot;,
                    &quot;cd /opt/myapp/current; for f in tmp/pids/sidekiq*.pid; do bundle exec sidekiqctl stop $f; done&quot;,
                  ]
          env:
            - name: REDIS_HOST
              value: test-staging-redis
            - name: APP_ENV
              value: staging
            - name: CLIENT
              value: test
          volumeMounts:
            - mountPath: /etc/sidekiq/config
              name: test-staging-sidekiq
          ports:
            - containerPort: 80
      volumes:
        - name: test-staging-sidekiq
          configMap:
            name: test-staging-sidekiq
            items:
              - key: config
                path: sidekiq.yml
      imagePullSecrets:
        - name: registrykey</code></pre><p>Let's launch this deployment and monitor the rolling deployment.</p><pre><code class="language-bash">$ kubectl apply -f test-deployment.yml
deployment &quot;test-staging-sidekiq&quot; configured</code></pre><p>We can confirm that existing Sidekiq jobs are completed before the old pod is terminated during the deployment process. In this way we handle a graceful shutdown of the Sidekiq process. We can apply this technique to other processes as well.</p>]]></content>
    </entry><entry>
        <title><![CDATA[Deploying Rails apps on a Kubernetes cluster with no downtime]]></title>
       <author><name>Rahul Mahale</name></author>
      <link href="https://www.bigbinary.com/blog/deploying-rails-applications-using-kubernetes-with-zero-downtime"/>
      <updated>2017-07-25T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/deploying-rails-applications-using-kubernetes-with-zero-downtime</id>
      <content type="html"><![CDATA[<p>This post assumes that you have a basic understanding of <a href="http://kubernetes.io/">Kubernetes</a> terms like <a href="http://kubernetes.io/docs/user-guide/pods/">pods</a> and <a href="http://kubernetes.io/docs/user-guide/deployments/">deployments</a>.</p><h3>Problem</h3><p>We deploy Rails applications on Kubernetes frequently and we need to ensure that deployments do not cause any downtime. When we used Capistrano to manage deployments it was much easier, since it has a provision to restart services in a rolling fashion.</p><p>Kubernetes restarts pods directly, and any process already running on the pod is terminated. So on rolling deployments we face downtime until the new pod is up and running.</p><h3>Solution</h3><p>In Kubernetes we have <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/">readiness probes and liveness probes</a>. Liveness probes take care of keeping a pod live, while readiness probes are responsible for marking a pod as ready.</p><p>This is what the Kubernetes documentation has to say about when to use readiness probes.</p><blockquote><p>Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup. In such cases, you don't want to kill the application, but you don't want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.</p></blockquote><p>It means new traffic should not be routed to those pods which are currently running but are not ready yet.</p><h3>Using readiness probes in deployment flow</h3><p>Here is what we are going to do.</p><ul><li>We will use readiness probes to deploy our Rails app.</li><li>The readiness probe definition has to be specified in the pod <code>spec</code> of the deployment.</li><li>The readiness probe uses a health check to detect the pod's readiness.</li><li>We will create a simple file on our pod with the name <code>health_check</code> returning status <code>200</code>.</li><li>This health check runs on the arbitrary port 81.</li><li>We will expose this port in the nginx config running on the pod.</li><li>When our application is up on nginx, this health_check returns <code>200</code>.</li><li>We will use the above fields to configure the health check in the pod's spec of the deployment.</li></ul><p>Now let's build the test deployment manifest.</p><pre><code class="language-yaml">---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-staging
  labels:
    app: test-staging
  namespace: test
spec:
  template:
    metadata:
      labels:
        app: test-staging
    spec:
      containers:
        - image: &lt;your-repo&gt;/&lt;your-image-name&gt;:latest
          name: test-staging
          imagePullPolicy: Always
          env:
            - name: POSTGRES_HOST
              value: test-staging-postgres
            - name: APP_ENV
              value: staging
            - name: CLIENT
              value: test
          ports:
            - containerPort: 80
      imagePullSecrets:
        - name: registrykey</code></pre><p>This is a simple deployment template which will terminate the pod on a rolling deployment. The application may suffer downtime until the pod is in the running state.</p><p>Next we will use a readiness probe to declare that the pod is ready to accept application traffic. We will add the following block to the deployment manifest.</p><pre><code class="language-yaml">readinessProbe:
  httpGet:
    path: /health_check
    port: 81
  periodSeconds: 5
  successThreshold: 3
  failureThreshold: 2</code></pre><p>In the above readiness probe definition, <code>httpGet</code> performs the health check. It queries the application for the file <code>health_check</code>, which prints <code>200</code> when accessed over port <code>81</code>. We poll it every 5 seconds with the field <code>periodSeconds</code>.</p><p>We mark the pod as ready only after the health check succeeds 3 times (<code>successThreshold</code>). Similarly, we mark it as failed after 2 consecutive failures (<code>failureThreshold</code>). These values can be adjusted as per the application's need. This helps the deployment determine whether the pod is in ready status or not. Along with readiness probes, for rolling updates we will use <code>maxUnavailable</code> and <code>maxSurge</code> in the deployment strategy.</p><p>As per the Kubernetes documentation:</p><blockquote><p><strong><code>maxUnavailable</code></strong> is a field that specifies the maximum number of Pods that can be unavailable during the update process. The value can be an absolute number (e.g. 5) or a percentage of desired Pods (e.g. 10%). The absolute number is calculated from the percentage by rounding down. This can not be 0.</p></blockquote><p>and</p><blockquote><p><strong><code>maxSurge</code></strong> is a field that specifies the maximum number of Pods that can be created above the desired number of Pods. The value can be an absolute number (e.g. 5) or a percentage of desired Pods (e.g. 10%). This cannot be 0 if MaxUnavailable is 0. The absolute number is calculated from the percentage by rounding up. By default, a value of 25% is used.</p></blockquote><p>Now we will update our deployment manifest with two replicas and the rolling update strategy by specifying the following parameters.</p><pre><code class="language-yaml">replicas: 2
minReadySeconds: 50
revisionHistoryLimit: 10
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 50%
    maxSurge: 1</code></pre><p>This makes sure that during a deployment one of our pods is always running, and at most 1 extra pod can be created while deploying.</p><p>We can read more about rolling deployments <a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment">here</a>.</p><p>We can add this configuration to the original deployment manifest.</p><pre><code class="language-yaml">apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-staging
  labels:
    app: test-staging
  namespace: test
spec:
  replicas: 2
  minReadySeconds: 50
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 50%
      maxSurge: 1
  template:
    metadata:
      labels:
        app: test-staging
    spec:
      containers:
        - image: &lt;your-repo&gt;/&lt;your-image-name&gt;:latest
          name: test-staging
          imagePullPolicy: Always
          env:
            - name: POSTGRES_HOST
              value: test-staging-postgres
            - name: APP_ENV
              value: staging
            - name: CLIENT
              value: test
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /health_check
              port: 81
            periodSeconds: 5
            successThreshold: 3
            failureThreshold: 2
      imagePullSecrets:
        - name: registrykey</code></pre><p>Let's launch this deployment using the command given below and monitor the rolling deployment.</p><pre><code class="language-bash">$ kubectl apply -f test-deployment.yml
deployment &quot;test-staging-web&quot; configured</code></pre><p>After the deployment is configured we can check the pods and how they are restarted.</p><p>We can also access the application to check if we face any downtime.</p><pre><code class="language-bash">$ kubectl get pods
NAME                               READY     STATUS    RESTARTS   AGE
test-staging-web-372228001-t85d4   1/1       Running   0          1d
test-staging-web-372424609-1fpqg   0/1       Running   0          50s</code></pre><p>We can see above that only one pod is re-created at a time, and one of the old pods keeps serving the application traffic. Also, the new pod is running but not ready, as it has not yet passed the readiness probe condition.</p><p>After some time, when the new pod is in the ready state, the old pod is re-created and traffic is served by the new pod. In this way, our application does not suffer any downtime and we can confidently do deployments even at peak hours.</p>]]></content>
    </entry><entry>
        <title><![CDATA[Running Rails tasks such as db:migrate and db:seed on Kubernetes]]></title>
       <author><name>Vishal Telangre</name></author>
      <link href="https://www.bigbinary.com/blog/managing-rails-tasks-such-as-db-migrate-and-db-seed-on-kuberenetes-while-performing-rolling-deployments"/>
      <updated>2017-06-16T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/managing-rails-tasks-such-as-db-migrate-and-db-seed-on-kuberenetes-while-performing-rolling-deployments</id>
      <content type="html"><![CDATA[<p>This post assumes that you have a basic understanding of <a href="http://kubernetes.io/">Kubernetes</a> terms like <a href="http://kubernetes.io/docs/user-guide/pods/">pods</a> and <a href="http://kubernetes.io/docs/user-guide/deployments/">deployments</a>.</p><h3>Problem</h3><p>We want to deploy a Rails application on Kubernetes. We assume that the <code>assets:precompile</code> task is run as part of the Docker image build process.</p><p>We want to run rake tasks such as <code>db:migrate</code> and <code>db:seed</code> on the initial deployment, and just the <code>db:migrate</code> task on each later deployment.</p><p>We cannot run these tasks while building the Docker image, as the build would not be able to connect to the database at that moment.</p><p>So, how do we run these tasks?</p><h3>Solution</h3><p>We assume that we have a Docker image named <code>myorg/myapp:v0.0.1</code> which contains the source code of our Rails application.</p><p>We also assume that we have included a <code>database.yml</code> manifest in this Docker image with the configuration needed for connecting to the database.</p><p>We need to create a <a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/">Kubernetes deployment</a> template with the following content.</p><pre><code class="language-yaml">apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - image: myorg/myapp:v0.0.1
          name: myapp
          imagePullPolicy: IfNotPresent
          env:
            - name: DB_NAME
              value: myapp
            - name: DB_USERNAME
              value: username
            - name: DB_PASSWORD
              value: password
            - name: DB_HOST
              value: 54.10.10.245
          ports:
            - containerPort: 80
      imagePullSecrets:
        - name: docker_pull_secret</code></pre><p>Let's save this template file as <code>myapp-deployment.yml</code>.</p><p>We 
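have a manifest that passes the database credentials to the application through environment variables.</p><p>For reference, a minimal <code>database.yml</code> that picks these variables up could look like the sketch below (this assumes the image's <code>database.yml</code> reads them via ERB, which Rails supports):</p><pre><code class="language-yaml">production:
  adapter: postgresql
  database: &lt;%= ENV['DB_NAME'] %&gt;
  username: &lt;%= ENV['DB_USERNAME'] %&gt;
  password: &lt;%= ENV['DB_PASSWORD'] %&gt;
  host: &lt;%= ENV['DB_HOST'] %&gt;</code></pre><p>We 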
can change the options and environment variables in the above template as per our need. The environment variables specified here will be available to our Rails application.</p><p>To apply the above template for the first time on Kubernetes, we will use the following command.</p><pre><code class="language-bash">$ kubectl create -f myapp-deployment.yml</code></pre><p>Later on, to apply the same template after modifications such as a change in the Docker image name or a change in the environment variables, we will use the following command.</p><pre><code class="language-bash">$ kubectl apply -f myapp-deployment.yml</code></pre><p>Applying the deployment template will create a pod for our application on Kubernetes.</p><p>To see the pods, we use the following command.</p><pre><code class="language-bash">$ kubectl get pods</code></pre><p>Let's say that our app is now running in the pod named <code>myapp-4007005961-1st7s</code>.</p><p>To execute a rake task, e.g. <code>db:migrate</code>, on this pod, we can run the following command.</p><pre><code class="language-bash">$ kubectl exec myapp-4007005961-1st7s                              \
          -- bash -c                                               \
          'cd ~/myapp &amp;&amp; RAILS_ENV=production bin/rake db:migrate'</code></pre><p>Similarly, we can execute the <code>db:seed</code> rake task as well.</p><p>If we already have an automated flow for deployments on Kubernetes, we can make use of this approach to programmatically or conditionally run any rake task as per our needs.</p><h3>Why not use <a href="https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/">Kubernetes Jobs</a> to solve this?</h3><p>We faced some issues while using Kubernetes Jobs to run the migration and seed rake tasks.</p><ol><li><p>If the rake task returns a non-zero exit code, the Kubernetes job keeps spawning pods until the task command returns a zero exit code.</p></li><li><p>To get around the issue mentioned above, we needed to implement additional custom logic for checking the job status and the status of all the spawned pods.</p></li><li><p>Capturing the command's STDOUT or STDERR was difficult using a Kubernetes job.</p></li><li><p>Some housekeeping was needed, such as manually terminating the job if it wasn't successful. Otherwise, creating a Kubernetes job with the same name fails, which is bound to occur when we perform later deployments.</p></li></ol><p>Because of these issues, we chose not to rely on Kubernetes jobs to solve this problem.</p>]]></content>
    </entry><entry>
        <title><![CDATA[Using Kubernetes ConfigMap with configuration files to deploy Rails apps]]></title>
       <author><name>Rahul Mahale</name></author>
      <link href="https://www.bigbinary.com/blog/using-kubernetes-configmap-with-configuration-files-for-deploying-rails-app"/>
      <updated>2017-05-25T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/using-kubernetes-configmap-with-configuration-files-for-deploying-rails-app</id>
      <content type="html"><![CDATA[<p>This post assumes that you have a basic understanding of <a href="http://kubernetes.io/">Kubernetes</a> terms like <a href="http://kubernetes.io/docs/user-guide/pods/">pods</a> and <a href="http://kubernetes.io/docs/user-guide/deployments/">deployments</a>.</p><p>We deploy our Rails applications on Kubernetes and frequently do rolling deployments.</p><p>While performing application deployments on a Kubernetes cluster, sometimes we need to change the application configuration file. Changing this configuration file means we need to change the source code, commit the change and then go through the complete deployment process.</p><p>This gets cumbersome for simple changes.</p><p>Let's take the case of wanting to add a queue to the Sidekiq configuration.</p><p>We should be able to change the configuration and restart the pod, instead of modifying the source code, building a new image and then performing a new deployment.</p><p>This is where Kubernetes's <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/">ConfigMap</a> comes in handy. It allows us to handle configuration files much more efficiently.</p><p>Now we will walk you through the process of managing the Sidekiq configuration file using a configmap.</p><h2>Starting with configmap</h2><p>First we need to create a configmap. We can either create it using the <code>kubectl create configmap</code> command or we can use a yaml template.</p><p>We will be using the yaml template <code>test-configmap.yml</code>, which already has the Sidekiq configuration.</p><pre><code class="language-yaml">apiVersion: v1
kind: ConfigMap
metadata:
  name: test-staging-sidekiq
  labels:
    name: test-staging-sidekiq
  namespace: test
data:
  config: |-
    ---
    :verbose: true
    :environment: staging
    :pidfile: tmp/pids/sidekiq.pid
    :logfile: log/sidekiq.log
    :concurrency: 20
    :queues:
      - [default, 1]
    :dynamic: true
    :timeout: 300</code></pre><p>The above template creates the configmap in the <code>test</code> namespace, and it is only accessible in that namespace.</p><p>Let's launch this configmap using the following command.</p><pre><code class="language-bash">$ kubectl create -f test-configmap.yml
configmap &quot;test-staging-sidekiq&quot; created</code></pre><p>After that, let's use this configmap to create our <code>sidekiq.yml</code> configuration file in the deployment template named <code>test-deployment.yml</code>.</p><pre><code class="language-yaml">---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-staging
  labels:
    app: test-staging
  namespace: test
spec:
  template:
    metadata:
      labels:
        app: test-staging
    spec:
      containers:
        - image: &lt;your-repo&gt;/&lt;your-image-name&gt;:latest
          name: test-staging
          imagePullPolicy: Always
          env:
            - name: REDIS_HOST
              value: test-staging-redis
            - name: APP_ENV
              value: staging
            - name: CLIENT
              value: test
          volumeMounts:
            - mountPath: /etc/sidekiq/config
              name: test-staging-sidekiq
          ports:
            - containerPort: 80
      volumes:
        - name: test-staging-sidekiq
          configMap:
            name: test-staging-sidekiq
            items:
              - key: config
                path: sidekiq.yml
      imagePullSecrets:
        - name: registrykey</code></pre><p>Now let's create a deployment using the above template.</p><pre><code class="language-bash">$ kubectl create -f test-deployment.yml
deployment &quot;test-staging&quot; created</code></pre><p>Once the deployment is created, the pod running from that deployment will start Sidekiq using the <code>sidekiq.yml</code> mounted at <code>/etc/sidekiq/config/sidekiq.yml</code>.</p><p>Let's check this on the pod.</p><pre><code class="language-bash">deployer@test-staging-2766611832-jst35:~$ cat /etc/sidekiq/config/sidekiq.yml
---
:verbose: true
:environment: staging
:pidfile: tmp/pids/sidekiq.pid
:logfile: log/sidekiq.log
:concurrency: 20
:timeout: 300
:dynamic: true
:queues:
  - [default, 1]</code></pre><p>Our Sidekiq process uses this configuration to start Sidekiq. Looks like the configmap did its job.</p><p>Further, if we want to add a new queue to Sidekiq, we can simply modify the configmap template and restart the pod.</p><p>For example, if we want to add a <code>mailer</code> queue, we will modify the template as shown below.</p><pre><code class="language-yaml">apiVersion: v1
kind: ConfigMap
metadata:
  name: test-staging-sidekiq
  labels:
    name: test-staging-sidekiq
  namespace: test
data:
  config: |-
    ---
    :verbose: true
    :environment: staging
    :pidfile: tmp/pids/sidekiq.pid
    :logfile: log/sidekiq.log
    :concurrency: 20
    :queues:
      - [default, 1]
      - [mailer, 1]
    :dynamic: true
    :timeout: 300</code></pre><p>Let's apply this configmap using the following command.</p><pre><code class="language-bash">$ kubectl apply -f test-configmap.yml
configmap &quot;test-staging-sidekiq&quot; configured</code></pre><p>Once the pod is restarted, it will use the new Sidekiq configuration fetched from the configmap.</p><p>In this way, we keep our Rails application configuration files out of the source code and tweak them as needed.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Kubernetes Persistent volume to store persistent data]]></title>
       <author><name>Rahul Mahale</name></author>
      <link href="https://www.bigbinary.com/blog/using-kubernetes-persistent-volume-for-persistent-data-storage"/>
      <updated>2017-04-12T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/using-kubernetes-persistent-volume-for-persistent-data-storage</id>
      <content type="html"><![CDATA[<p>In one of our projects we are running a Rails application on a <a href="http://kubernetes.io/">Kubernetes</a> cluster. It is a proven tool for managing and deploying Docker containers in production.</p><p>In Kubernetes, containers run inside <a href="http://kubernetes.io/docs/user-guide/pods/">pods</a>, and pods are managed using <a href="http://kubernetes.io/docs/user-guide/deployments/">deployments</a>. A <code>deployment</code> holds the specification of pods and is responsible for running the pods with the specified resources. When a <code>pod</code> is restarted or a <code>deployment</code> is deleted, the data on the pod is lost. We need to retain data outside the pod's lifecycle for when the <code>pod</code> or <code>deployment</code> is destroyed.</p><p>We use docker-compose during development. In docker-compose, linking a host directory to a container directory works out of the box. We wanted a similar mechanism in Kubernetes to link volumes. Kubernetes offers various types of <a href="https://kubernetes.io/docs/user-guide/volumes/#types-of-volumes1">volumes</a>. We chose a <a href="http://kubernetes.io/docs/user-guide/persistent-volumes/">persistent volume</a> backed by <a href="https://aws.amazon.com/ebs/">AWS EBS</a> storage, and used persistent volume claims as per the application's need.</p><p>As per the <a href="http://kubernetes.io/docs/user-guide/persistent-volumes/">Persistent Volume (PV) definition</a>, cluster administrators must first create the storage in order for Kubernetes to mount it.</p><p>Our Kubernetes cluster is hosted on AWS. We created AWS EBS volumes which can be used to create persistent volumes.</p><p>Let's create a sample volume using the aws cli and try to use it in a deployment.</p><pre><code class="language-bash">aws ec2 create-volume --availability-zone us-east-1a --size 20 --volume-type gp2</code></pre><p>This will create a volume in the <code>us-east-1a</code> availability zone. Note that an EBS volume can only be attached to nodes in the same availability zone. We need to note the <code>VolumeId</code> once the volume is created.</p><pre><code class="language-bash">$ aws ec2 create-volume --availability-zone us-east-1a --size 20 --volume-type gp2
{
    &quot;AvailabilityZone&quot;: &quot;us-east-1a&quot;,
    &quot;Encrypted&quot;: false,
    &quot;VolumeType&quot;: &quot;gp2&quot;,
    &quot;VolumeId&quot;: &quot;vol-123456we7890ilk12&quot;,
    &quot;State&quot;: &quot;creating&quot;,
    &quot;Iops&quot;: 100,
    &quot;SnapshotId&quot;: &quot;&quot;,
    &quot;CreateTime&quot;: &quot;2017-01-04T03:53:00.298Z&quot;,
    &quot;Size&quot;: 20
}</code></pre><p>Now let's create a persistent volume template <code>test-pv</code> to create a volume using this EBS storage.</p><pre><code class="language-yaml">kind: PersistentVolume
apiVersion: v1
metadata:
  name: test-pv
  labels:
    type: amazonEBS
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  awsElasticBlockStore:
    volumeID: &lt;your-volume-id&gt;
    fsType: ext4</code></pre><p>Once we had the template to create the persistent volume, we used <a href="http://kubernetes.io/docs/user-guide/kubectl/">kubectl</a> to launch it. kubectl is a command-line tool to interact with a Kubernetes cluster.</p><pre><code class="language-bash">$ kubectl create -f test-pv.yml
persistentvolume &quot;test-pv&quot; created</code></pre><p>Once the persistent volume is created, we can check it using the following command.</p><pre><code class="language-bash">$ kubectl get pv
NAME      CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS      CLAIM     REASON    AGE
test-pv   10Gi       RWX           Retain          Available                       7s</code></pre><p>Now that our persistent volume is in the available state, we can claim it by creating a <a href="http://kubernetes.io/docs/user-guide/persistent-volumes/#persistentvolumeclaims">persistent volume claim</a>.</p><p>We can define the persistent volume claim using the following template <code>test-pvc.yml</code>.</p><pre><code class="language-yaml">kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-pvc
  labels:
    type: amazonEBS
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi</code></pre><p>Let's create the persistent volume claim using the above template.</p><pre><code class="language-bash">$ kubectl create -f test-pvc.yml
persistentvolumeclaim &quot;test-pvc&quot; created</code></pre><p>After creating the persistent volume claim, our persistent volume will change from the <code>Available</code> state to the <code>Bound</code> state.</p><pre><code class="language-bash">$ kubectl get pv
NAME      CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM              REASON    AGE
test-pv   10Gi       RWX           Retain          Bound     default/test-pvc             2m

$ kubectl get pvc
NAME       STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
test-pvc   Bound     test-pv   10Gi       RWX           1m</code></pre><p>Now that we have the persistent volume claim available on our Kubernetes cluster, let's use it in a deployment.</p><h3>Deploying Kubernetes application</h3><p>We will use the following deployment template as <code>test-pv-deployment.yml</code>.</p><pre><code class="language-yaml">apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-pv
  labels:
    app: test-pv
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: test-pv
        tier: frontend
    spec:
      containers:
        - image: &lt;your-repo&gt;/&lt;your-image-name&gt;:latest
          name: test-pv
          imagePullPolicy: Always
          env:
            - name: APP_ENV
              value: staging
            - name: UNICORN_WORKER_PROCESSES
              value: &quot;2&quot;
          volumeMounts:
            - name: test-volume
              mountPath: &quot;/&lt;path-to-my-app&gt;/shared/data&quot;
          ports:
            - containerPort: 80
      imagePullSecrets:
        - name: registrypullsecret
      volumes:
        - name: test-volume
          persistentVolumeClaim:
            claimName: test-pvc</code></pre><p>Now launch the deployment using the following command.</p><pre><code class="language-bash">$ kubectl create -f test-pv-deployment.yml
deployment &quot;test-pv&quot; created</code></pre><p>Once the deployment is up and running, all the contents of the <code>shared</code> directory will be stored on the persistent volume. Further, when the pod or deployment crashes for any reason, our data is always retained on the persistent volume, and we can use it when launching the application deployment again.</p><p>This solved our goal of retaining data across deployments and pod restarts.</p>]]></content>
    </entry>
     </feed>