Running TeamCity on Kubernetes

Definition

Kubernetes – κυβερνήτης • (kyvernítis) m

  1. governor (leader of a region or state)
  2. (nautical) captain, skipper
  3. pilot (of an aircraft)

Motivation

We recently moved to a new office and discovered that one of our bare-metal Continuous Integration build agents didn’t survive the move. Since other developers were already unhappy with the fact that the CI could only be reached from within the office, we took that as an incentive to give building on the Google Cloud Platform a shot.

We had already experimented with CoreOS for our internal logging system. Although we liked it, we were quite curious how a cluster setup with Kubernetes (K8s, if you like) would perform.

So in this post, we’ll show you how to set things up, what pitfalls to avoid, and how we managed to create a robust and easy to set up solution.


Cluster Setup

The setup of a Kubernetes cluster on the Google Cloud Platform is quite easy (and fast as a Google engineer will happily demonstrate) with the Google Cloud Container Engine.
I recommend using the command line tool gcloud because there are almost no Kubernetes features implemented in the GUI yet.

For our use case we figured two CPU cores should be enough and used an n1-standard-2 machine:


gcloud alpha container clusters create\
 -m n1-standard-2\
 teamcity

Build Agent Setup

Now that you have a three node Kubernetes cluster running, it is time to start up some containers. In Kubernetes a group of containers and their resources (volumes, ports…) forms a pod (as in: a pod of dolphins). To ensure that n pods are always running, you can use a replication controller. Almost everything you do in a Kubernetes cluster you do through kubectl, which is hidden under:


gcloud alpha container kubectl

By the way, your life will be a bit easier when you set up an alias first:


alias kubectl='gcloud alpha container kubectl'

Here’s an example for our replication controller:


id: teamcity-agent
kind: ReplicationController
apiVersion: v1beta1
labels:
  name: teamcity-agent
desiredState:
  replicas: 3
  replicaSelector:
    name: teamcity-agent
  podTemplate:
    labels:
      name: teamcity-agent
    desiredState:
      manifest:
        version: v1beta1
        id: teamcity-agent
        containers:
          - name: teamcityagent
            image: smallimprovements/teamcity-agent-docker
            env:
              - name: TEAMCITY_SERVER
                value: http://teamcity:8111
            volumeMounts:
              - name: agent-data
                mountPath: /data
        volumes:
          - name: agent-data
            source:
              hostDir:
                path: /var/buildAgent

This will spawn a replication controller (kind: ReplicationController) named teamcity-agent (id). The controller should fire up three pods (replicas: 3). Each pod consists of:

  • a container named teamcityagent that has its working directory /data mounted to a volume (volumeMounts)
  • a volume named agent-data that points to a directory on the host (hostDir)

If you aren’t comfortable writing YAML, don’t worry: Kubernetes understands JSON too, as you’ll see in Google’s example.

For a full reference of replication controllers refer to the excellent docs.

To fire up anything in Kubernetes you use kubectl create:


kubectl create -f ./teamcity-agent-rc.yml

If everything went well, you can get a list of your running controller and pods with:


kubectl get pods,rc

With the parameter -l you can filter the resources by the labels attached to them. In our example we gave the replication controller (and all the pods spawned by it) the label name=teamcity-agent. So to list the controller and the pods, we run:


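# filter by the label we attached above
kubectl get pods,rc -l name=teamcity-agent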
POD                    IP                  CONTAINER(S)        IMAGE(S)                                    HOST                                                            LABELS                STATUS              CREATED
teamcity-agent-acnbt   10.196.1.4          teamcityagent       smallimprovements/teamcity-agent-docker:latest   k8s-teamcity-node-1.c.eastern-kit-781.internal/146.148.14.166   name=teamcity-agent   Running             12 hours
teamcity-agent-hvz97   10.196.2.4          teamcityagent       smallimprovements/teamcity-agent-docker:latest   k8s-teamcity-node-2.c.eastern-kit-781.internal/104.155.19.117   name=teamcity-agent   Running             12 hours
teamcity-agent-tycvv   10.196.3.4          teamcityagent       smallimprovements/teamcity-agent-docker:latest   k8s-teamcity-node-3.c.eastern-kit-781.internal/104.155.45.72    name=teamcity-agent   Running             12 hours

CONTROLLER          CONTAINER(S)        IMAGE(S)                                    SELECTOR              REPLICAS
teamcity-agent      teamcityagent       smallimprovements/teamcity-agent-docker:latest   name=teamcity-agent   3

At first we weren’t happy with the build times, so we tried running an in-memory disk (a ramdisk) inside the container, but eventually we didn’t see any noticeable performance improvement. If you want to do this, you need to start the whole Kubernetes cluster with --allow_privileged=true so that you can mount a tmpfs inside a container.
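This isn’t what we ended up using, but for reference, mounting a tmpfs from inside a privileged container boils down to something like this sketch (the /data path matches the agent’s working directory from the example above):

# only works if the container runs privileged
mount -t tmpfs -o size=2G,noatime tmpfs /data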

But it’s much easier to set up a ramdisk on your host system and use it as a volume for your containers. This is what the hostDir volume in the example above does.

The fstab on our host system looks like this:


tmpfs /var/buildAgent tmpfs noatime,size=2G  0  0
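If you add the entry while the node is already up, something along these lines should activate the mount (assuming root access on the node):

# create the mount point and mount the tmpfs declared in fstab
sudo mkdir -p /var/buildAgent
sudo mount /var/buildAgent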

Build Server Setup

Another important part of our setup is the TeamCity server. Again, we use a replication controller.

“Why,” you might ask, “use a replication controller if you only need one pod?”

That’s because pods are not durable: when a pod fails, it’s gone. In their design document, Google says that you “should almost always” use a controller to facilitate self-healing in the cluster.


id: teamcity
kind: ReplicationController
apiVersion: v1beta1
labels:
  name: teamcity
desiredState:
  replicas: 1
  replicaSelector:
    name: teamcity
  podTemplate:
    labels:
      name: teamcity
    desiredState:
      manifest:
        version: v1beta1
        id: teamcity
        containers:
          - name: teamcity
            image: smallimprovements/teamcity-docker
            ports:
              - name: teamcity
                containerPort: 8111
            volumeMounts:
              - name:  teamcity-data
                mountPath: /var/lib/teamcity
        volumes:
          - name: teamcity-data
            source:
              persistentDisk:
                pdName: teamcity-backup
                fsType: ext4

Nothing special here, but note that we exposed port 8111 of the teamcity container and named that port “teamcity” (ports). Furthermore, we have given the label name=teamcity to the pods (podTemplate labels). This label will be important for the setup of our TeamCity service.
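The persistentDisk volume assumes that a GCE disk named teamcity-backup already exists in the same zone as your nodes. Creating one looks roughly like this (the size and zone are placeholders, not values from our setup):

# create the persistent disk referenced by pdName above
gcloud compute disks create teamcity-backup --size 10GB --zone europe-west1-b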

Which brings us to the last Kubernetes resource of this post: the service.

Service

Because pods are ephemeral and their containers will get a new IP address every time they are fired up, the only way to get a permanent address to access our TeamCity server is to expose it as a service:


id: teamcity
kind: Service
apiVersion: v1beta1
port: 8111
containerPort: teamcity
labels:
  name: teamcity
selector:
  name: teamcity

Here a service with the id teamcity is created.
We use the container port named “teamcity” (containerPort) from the pods that match the selector name=teamcity, i.e. the pods that carry that label.

The container port is exposed as port 8111 (port) on the teamcity service.
So how do you access the service now? From within a container the service is reachable under a hostname matching the service’s id. So inside a teamcityagent container you can now reach TeamCity at http://teamcity:8111.
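A quick way to verify this from inside an agent container (assuming curl is available in the image):

# should print an HTTP status code if the TeamCity service is reachable
curl -s -o /dev/null -w "%{http_code}\n" http://teamcity:8111/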

But we wanted to access TeamCity from our office. Luckily every service will get an IP address through which it is accessible from the nodes inside the cluster.

To find out the service’s IP address, use kubectl, as in the following example:


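# the teamcity service is listed with its stable IP
kubectl get services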
NAME       LABELS         SELECTOR       IP             PORT
teamcity   name=teamcity  name=teamcity  10.199.243.51  8111

So now we know that from any node in the cluster TeamCity will be accessible via 10.199.243.51:8111.
Because we didn’t want to expose our TeamCity server to the public, we decided to dig an SSH tunnel from our office into the cluster and from there to the TeamCity service.


# we use a load balanced SSH connection
# to reach any node in the TeamCity cluster
LB_SSH_CLUSTER_TEAMCITY=123.123.1.2

# from inside the cluster, TeamCity is reachable
# on the service's IP
SERVICE_TEAMCITY=10.199.243.51
PORT_TEAMCITY=8111

ssh -NfC\
    -L $PORT_TEAMCITY:$SERVICE_TEAMCITY:$PORT_TEAMCITY\
    $LB_SSH_CLUSTER_TEAMCITY

You might have stumbled over the load-balanced comment: to make things more stable we decided to use a load balancer for the SSH connection. So when we destroy or create nodes in our Kubernetes cluster, we add them to the target pool and they become eligible for the tunnel. How to set up a load balancer on the Cloud Platform is described here.
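Setting up such a target pool is roughly sketched below; the pool name, region, zone and instance names are placeholders for illustration, the guide linked above has the details:

# create a target pool and put the cluster nodes into it
gcloud compute target-pools create teamcity-ssh --region europe-west1
gcloud compute target-pools add-instances teamcity-ssh \
    --instances k8s-teamcity-node-1,k8s-teamcity-node-2,k8s-teamcity-node-3 \
    --zone europe-west1-b

# forward SSH traffic hitting the load balancer IP to the pool
gcloud compute forwarding-rules create teamcity-ssh \
    --region europe-west1 --port-range 22 --target-pool teamcity-ssh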

To wrap things up, here are the steps to get from zero to a cluster with three nodes running TeamCity and the agents:


#create the cluster
gcloud alpha container clusters create\
 -m n1-standard-2\
 teamcity

#create TeamCity
kubectl create -f teamcity-rc.yml

#expose it as a service
kubectl create -f teamcity-service.yml

#create the agents
kubectl create -f teamcity-agent-rc.yml

Debugging

To be fair, most of your time will be spent getting the resource configurations right or debugging your containers. A nice touch is that you can read the logs from outside the cluster through kubectl:


# kubectl logs <pod> <container>
kubectl logs teamcity-grizy teamcity

If the pods won’t start, this doesn’t help you much because, as Kubernetes will tell you, there are no logs. In that case a “kubectl get events” will usually help. This gives you a list of the events that happened in your cluster before things went bad.

In theory you can execute commands in the containers through kubectl, as we tried in the following example:


kubectl exec -it \
    -p teamcity-agent-acnbt \
    -c teamcityagent \
    -- bash

Alas, in practice this consistently failed for us, but we guess it will work once the Container Engine reaches beta (it might work for you).

Conclusion

It’s been fun setting up a build system on Kubernetes. Debugging was easier than on plain CoreOS because we could read the cluster’s events.

We loved the service abstraction for linking containers and were quite impressed by the replication controllers (a feature lacking in plain CoreOS).

What we didn’t like was that it isn’t possible to enforce that only one agent runs per node. So it took a few rounds of kubectl delete and kubectl create until the agents ran where they should.

One additional instance is created per cluster to host the Kubernetes API: the master. Currently it has the same machine type as every node in your cluster, so for our three-node setup an additional two-core machine is quite a waste, but this evens out with larger clusters.