Yesod hosting with Docker and Kubernetes

December 14, 2015

By Michael Snoyman

N.B. The stack image command referenced in this blog post has been removed. Please see Building Haskell Apps with Docker for a more up-to-date guide.

About a month ago, yesodweb.com hosting was unstable for a few days, because I was in the midst of moving yesodweb.com (and a few other sites) to new hosting. I went through a few iterations, and wanted to mention how the hosting now works, what I like, and some pain points to watch out for.

The end result of this is a Docker/Kubernetes deployment, consisting of a single Docker image containing various sites (six currently), and an extra application that reverse proxies to the appropriate one based on virtual host. But let's step through it bit by bit.

Stack's Docker support

The Stack build tool supports using Docker in two different ways, both of which are leveraged by this setup.

1. Using a Docker build image to provide build tools (like GHC), system libraries, and optionally Haskell libraries, and performing the build within such a container. This isolates your build from most host-specific configurations, and grants you immediate access to many tools (like PostgreSQL client libraries) without modifying your host.
2. Generating a Docker image based on a base image that includes necessary system libraries, and includes generated executables and any additional files requested (such as configuration files and static resources like CSS and JavaScript).

What's really nice about this setup vs a more standard Docker image generation approach is that our generated runtime image (from (2)) does not include any build-specific tools. This makes our images lighter-weight, and avoids having unnecessary code in production (which is good from a security standpoint).

Equally nice is how little Stack configuration it takes to make all of this happen. Consider the following stack.yaml file:

# Every stack.yaml needs to specify its resolver
resolver: lts-3.14

# Build using Docker. Will use the default stack-build image, which
# includes build tools but not precompiled libraries.
docker:
  enable: true

# Generate a Docker image
image:
  container:
    # New image will be called snoyberg/yesodweb
    name: snoyberg/yesodweb
    # Base it on the stack-run images
    base: fpco/stack-run
    # Include some config and static files in the image
    add:
      config: /app/config
      static: /app/static

With this in place, running stack image container will generate the snoyberg/yesodweb image, which I can then push to whatever Docker registry I want using normal Docker commands.

For more information on Stack's Docker support, see the Docker integration page.

Mega-repo

I initially deployed each of my sites as a separate deployment. However, for various resource-related reasons (disk space, number of machines), I decided to try out deploying the six sites as a single deployment. I'm not convinced yet that this is a great idea, but it's certainly working in practice. The result is my snoyman-webapps repo. Here is a short snippet from its webapps.yaml config file:

apps:
- vhost: www.yesodweb.com
  dir: /app/yesodweb.com
  exe: yesodweb
  args:
  - production
- vhost: www.haskellers.com
  dir: /app/haskellers
  exe: haskellers
  args:
  - production

redirects:
- src: yesodweb.com
  dst: www.yesodweb.com
- src: yesodweb.org
  dst: www.yesodweb.com

This repo has a few important things to note:

A submodule for each site in the deployment, inside the sites directory
The Kubernetes configuration (discussed below) in the kube directory
A reverse proxying web application for running the sites and serving appropriate content based on virtual host.

Let's jump into that last one right away.

Reverse proxying

Probably the most instructive file in this program is the webapps.yaml config file. It shows that the web app is capable of:

Running child applications (the six sites I mentioned)
Reverse proxying to the appropriate applications
Performing simple redirects between domain names

In theory this code could be turned into something standalone, but for now it's really custom-tailored to my needs here.
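To make the virtual-host dispatch concrete, here is a minimal Python sketch of the decision the proxy has to make for each request. This is not the actual webapps code: the `CONFIG` dictionary only mirrors the shape of the webapps.yaml snippet shown earlier, and the `Decision` type and `dispatch` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional

# Assumed in-memory form of the webapps.yaml snippet from earlier in the post.
CONFIG = {
    "apps": [
        {"vhost": "www.yesodweb.com", "exe": "yesodweb"},
        {"vhost": "www.haskellers.com", "exe": "haskellers"},
    ],
    "redirects": [
        {"src": "yesodweb.com", "dst": "www.yesodweb.com"},
        {"src": "yesodweb.org", "dst": "www.yesodweb.com"},
    ],
}


class Decision(NamedTuple):
    action: str            # "proxy", "redirect", or "not_found"
    target: Optional[str]  # exe name for a proxy, destination host for a redirect


def dispatch(host_header: str, config: dict = CONFIG) -> Decision:
    """Pick what to do with a request based on its Host header."""
    # Drop any ":port" suffix and normalize case before matching.
    host = host_header.split(":", 1)[0].strip().lower()
    for app in config["apps"]:
        if app["vhost"] == host:
            return Decision("proxy", app["exe"])
    for redirect in config["redirects"]:
        if redirect["src"] == host:
            return Decision("redirect", redirect["dst"])
    return Decision("not_found", None)
```

A real proxy would then forward the request to whatever port the matching child application listens on, or answer with an HTTP redirect to the destination host; that part is left out of the sketch.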

The biggest downside with this approach is that (without Server Name Indication, or SNI) it doesn't support TLS connections. I chose sites for this that are not served over TLS currently, and do not handle sensitive information (e.g., no password collection). Upgrading to have SNI support and using something like Let's Encrypt would be a fun upgrade in the future.

Kubernetes configuration

Kubernetes uses YAML files for configuration. I'm not going to jump into the syntax of the config files or the overarching model Kubernetes uses for running your applications. If you're unfamiliar and interested, I recommend reading the Kubernetes docs.

Secrets

Three of the sites I'm hosting have databases, and therefore the database credentials need to be securely provided to the apps during deployment. Kubernetes provides a nice mechanism for this: secrets. You specify some (base64-encoded) content in a YAML file, and then you can mount a virtual filesystem for your apps to access the data from. Let's have a look at the haskellers.com (scrubbed) secrets file:

apiVersion: v1
kind: Secret
metadata:
    name: haskellers-postgresql-secret
data:
    # https://github.com/snoyberg/haskellers/blob/master/config/db/postgresql.yml.example
    postgresql.yml: base64-yaml-contents
    # https://github.com/snoyberg/haskellers/blob/master/config/db/aws.example
    aws: base64-yaml-contents
    client-session-key.aes: base64-version-of-client-session-key

I have a separate secrets config for each site. The decision was made mostly for historical reasons, since the sites were originally hosted separately. I still like to keep them separate though, since it's easy to put the secrets into different subdirectories, as we'll see next.

Replication controller

There's a lot of content in the replication controller config. I'll strip it down just a bit:

apiVersion: v1
kind: ReplicationController
metadata:
    name: snoyman-webapps
    labels:
        name: snoyman-webapps
spec:
  replicas: 1
  selector:
    name: snoyman-webapps
  template:
    metadata:
        labels:
            name: snoyman-webapps
    spec:
        volumes:
        - name: haskellers-postgresql-volume
          secret:
              secretName: haskellers-postgresql-secret
        containers:
        - name: webapps
          image: snoyberg/snoyman-webapps:latest
          command: ["webapps"]
          workingDir: /app/webapps
          ports:
          - name: webapps
            containerPort: 3000
          env:
          - name: PORT
            value: "3000"
          volumeMounts:
          - name: haskellers-postgresql-volume
            readOnly: true
            mountPath: /app/haskellers/config/db

The interesting stuff:

We mount the haskellers.com secret at /app/haskellers/config/db, where the app itself expects it
The webapps app needs to know what port to listen on, so we tell it via the PORT environment variable
We also tell Kubernetes that the application is listening on port 3000
The ports each application listens on internally are irrelevant to us: the webapps app handles those for us automatically
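The PORT handoff can be sketched from the application's side as follows. This is a hypothetical Python illustration, not the webapps code: `listen_port` and its default of 3000 are assumptions. It also shows why the value is quoted in the config above: environment variables are always strings, so the app has to parse the number itself.

```python
import os
from typing import Mapping


def listen_port(environ: Mapping[str, str] = os.environ, default: int = 3000) -> int:
    """Read the port to listen on from the PORT environment variable."""
    raw = environ.get("PORT")
    if raw is None:
        # Assumed fallback when PORT is unset.
        return default
    port = int(raw)  # raises ValueError if PORT is not a number
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port
```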

Service (load balancer)

The load balancing service is quite short and idiomatic:

apiVersion: v1
kind: Service
metadata:
  name: snoyman-webapps
  labels:
    name: snoyman-webapps
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: webapps
  selector:
    name: snoyman-webapps

Updates

When it comes time to make updates to one of these sites, I do the following:

Change that site's repo and commit
Update the submodule reference for snoyman-webapps
Run stack image container && docker push snoyberg/snoyman-webapps
- See the stack.yaml file for details
Perform a rolling update with Kubernetes: kubectl rolling-update snoyman-webapps --image=snoyberg/snoyman-webapps:latest

Some notes:

The rolling-update is lackluster in Kubernetes; it can fail to work for a variety of reasons I have yet to fully understand. My biggest advice: when possible, create single-container replication controllers.
I always just push and use the latest image. For more control/reliability, I recommend tagging Docker images with the Git SHA1 of the commit being built from. I'm lazy for these sites, but for client deployments at FP Complete, we always follow this practice.
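As a sketch of the Git SHA1 tagging practice, the following Python builds an image reference from the current commit. It assumes `git` is on the PATH and that the working directory is the repository being built; the function names are hypothetical, and this is not the tooling used at FP Complete.

```python
import subprocess


def current_git_sha(repo_dir: str = ".") -> str:
    """Return the full SHA1 of the commit checked out in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def image_reference(repository: str, sha: str) -> str:
    """Build a repository:tag image reference using a Git SHA1 as the tag."""
    if len(sha) != 40 or any(c not in "0123456789abcdef" for c in sha):
        raise ValueError(f"not a full lowercase SHA1: {sha!r}")
    return f"{repository}:{sha}"


if __name__ == "__main__":
    print(image_reference("snoyberg/snoyman-webapps", current_git_sha()))
```

The printed reference could then be used with docker tag and docker push, and passed as the --image argument in place of the :latest reference, so that each deployment points at one specific build.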

Google Container Engine

I started off this whole project as a way to evaluate Kubernetes. Based on that, I started hosting this on Google Container Engine instead of fiddling with configuring AWS to host Kubernetes myself. Overall, I'm happy with how it turned out. I had a few ugly issues:

Running out of disk space due to the large number of Docker images (likely the primary motivation for moving towards a single Docker image for all the sites).
All my sites went down one day when my account switched over from the trial to non-trial. I don't remember getting an email warning me about this, which would have been nice.

An n1-standard-1 instance size has been plenty to support all of these sites, which is nice (yay lightweight Haskell/Yesod!). That said, at FP Complete, we host our stuff on AWS, and have had a pretty good experience with running Kubernetes there.

Conclusion

Overall, I find the Docker/Kubernetes deployment workflow quite pleasant to work with. I may find more hiccups over time, but for now, I'd strongly recommend people consider it for deployments of their own, especially if you're using tooling like Stack that makes it so easy to create Docker images.
