Building Custom Kubernetes Operators Part 3: Building Operators in Go using Operator SDK

  • April 16, 2019

Kubernetes operators were introduced as an implementation of the Infrastructure as Software concept. Using them you can abstract the deployment of applications and services in a Kubernetes cluster. This is the third in a series of articles explaining how operators work, and how they can be implemented in different languages.


Previously in this series we’ve seen what Kubernetes operators are and how they work. The last article gave an in-depth review of how we, as developers, can create operators that extend the Kubernetes API.

In this article we are going to use the Go programming language and the Operator SDK to implement the immortal containers operator presented in the previous article. We recommend that you read article II of this series, if you have not already done so, before going further.

This article assumes that you have Go (version 1.11, at least) installed on your computer. You will also need access to a Kubernetes cluster to test the operator (you can use minikube to create a development cluster).

The complete source code for the operator built through this article is available in the companion repository.

Introducing the immortal containers operator

This section presents the immortal containers operator. We will use this simple operator as an example in this and other future articles.

The purpose of the immortal containers operator is to provide users with a way to define containers that should run forever — that is to say that whenever such a container is terminated for any reason, it must be restarted.

Keep in mind that this is just a “toy” operator (one created for the purpose of illustration) to demonstrate the necessary steps to create an operator. The functionality it provides can be achieved with already existing Kubernetes features, such as deployments.
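For comparison, much of the same keep-it-running behavior could be approximated with a built-in single-replica Deployment (a sketch; the names are illustrative):

```yaml
# Built-in alternative: a Deployment that keeps one nginx pod running
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-nginx
  template:
    metadata:
      labels:
        app: example-nginx
    spec:
      containers:
      - name: acontainer
        image: nginx:latest
```

The operator we build below reimplements this restart guarantee by hand, which is exactly what makes it a good vehicle for learning the mechanics.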

This operator defines a new object kind named ImmortalContainer. Users create objects of this kind in order to specify containers that must run continually, “forever”. In each such object the user specifies the image they want to run.

Each ImmortalContainer object has the following structure:

    - Spec
        - Image
    - Status
        - CurrentPod
        - StartTimes

Let’s say the operator has been installed and the user wants to create an immortal container to run the image nginx:latest. To do so, they can use kubectl to create an ImmortalContainer object.

# example.yaml
apiVersion: immortalcontainer.flugel.it/v1alpha1
kind: ImmortalContainer
metadata:
  name: example-immortalcontainer
spec:
  image: nginx:latest
$ kubectl apply -f example.yaml

For each ImmortalContainer object the operator’s controller creates a pod to run the container and then recreates the pod whenever it terminates or is deleted. In the same object the operator exposes the name of the created pod and the number of times it has been created.

$ kubectl get pods
NAME                                    READY   STATUS    RESTARTS   AGE
example-immortalcontainer-immortalpod   1/1     Running   0          25m

If someone deletes the pod, it will be recreated.

$ kubectl delete pods example-immortalcontainer-immortalpod
pod "example-immortalcontainer-immortalpod" deleted
$ kubectl get pods
NAME                                    READY   STATUS              RESTARTS   AGE
example-immortalcontainer-immortalpod   0/1     ContainerCreating   0          3s

Finally, the user can edit the ImmortalContainer object they created to see the CurrentPod and StartTimes fields.

$ kubectl edit immortalcontainer example-immortalcontainer

apiVersion: immortalcontainer.flugel.it/v1alpha1
kind: ImmortalContainer
metadata:
  name: example-immortalcontainer
spec:
  image: nginx:latest
status:
  currentPod: example-immortalcontainer-immortalpod
  startTimes: 2


Implementation process

The implementation of the immortal containers operator, or almost any other operator, involves at least the following tasks:

  1. Project initialization: we must define the structure of our code, install required dependencies, and initialize a repository to store our files (this last step is optional but highly recommended).
  2. Custom resources definitions: create CRDs to define new object kinds and resources. For our operator we must create the object kind ImmortalContainer and define a resource to give users access to these objects.
  3. Custom controller implementation: We need to implement a controller that reacts to relevant events and transforms the actual state into the desired one. For our operator, the controller must watch for ImmortalContainer and Pod events.
  4. Building: To be able to install the operator in a cluster, some artifacts, such as the controller image and some manifest files that we will later see in detail, must be generated.

It’s possible to implement operators using only Go and a Kubernetes API client library, but such an approach would require writing lots of boilerplate code. Instead, we’ve chosen to use the Operator SDK, a framework that makes writing operators easier.

As the Operator SDK README states, the framework provides:

  • High level APIs and abstractions allowing operational logic to be written more intuitively
  • Tools for scaffolding and code generation, facilitating quicker bootstrapping of new projects.
  • Extensions to cover common operator use cases


Getting Operator SDK

To install Operator SDK you can execute the following commands, also described in the project’s README.

$ mkdir -p $GOPATH/src/github.com/operator-framework
$ cd $GOPATH/src/github.com/operator-framework
$ git clone https://github.com/operator-framework/operator-sdk
$ cd operator-sdk
$ git checkout master
$ make dep
$ make install

To verify that you have Operator SDK correctly installed, try the following command, and check the output.

$ operator-sdk --version
operator-sdk version v0.5.0+git


Project initialization

Before proceeding, we need to create a directory in our Go workspace that matches our organization’s (or user’s) repositories’ home. You can read more about how to organize Go workspaces in the official “How to Write Go Code” documentation. In our case the organization is named “flugel-it”, so we created a matching directory under $GOPATH/src/.

$ mkdir -p $GOPATH/src/

The next step is to initialize our operator project using Operator SDK:

$ cd $GOPATH/src/
$ operator-sdk new immortal-containers-operator
$ cd immortal-containers-operator

This operation takes some time, as it scaffolds the project, generates code, and downloads dependencies. Operator SDK also initializes the project to use git, so we will be able to track changes without extra effort. The new project directory, immortal-containers-operator, lives inside the workspace directory created in the previous step.

Note that the just-created operator is namespace-scoped, meaning it is constrained to a single namespace. Using Operator SDK, it’s also possible to create cluster-scoped operators that watch all namespaces at once. Whether to go with a namespace-scoped or cluster-scoped operator is a design decision based on your specific implementation needs.

Defining the custom resource

As we have said, we are going to use a custom resource to expose the desired and actual states. This resource is an API endpoint storing a collection of objects, where each object belongs to the ImmortalContainer object kind.

Via this resource, users will be able to create immortal containers and query their current status.

Rather than writing a YAML file for our Custom Resource Definition, we are going to let Operator SDK help us with this task. First, from inside the project directory, we execute the “add api” command to scaffold the resource definition:

$ operator-sdk add api --api-version=immortalcontainer.flugel.it/v1alpha1 --kind=ImmortalContainer

Note that we’ve indicated that our API group is immortalcontainer.flugel.it, the API version is v1alpha1, and the new object kind name is ImmortalContainer.

Execution of the command generates various files, the most important of which (at least for now) is pkg/apis/immortalcontainer/v1alpha1/immortalcontainer_types.go. This file holds the field definitions for the ImmortalContainer object kind. We need to edit it to add the Image, CurrentPod, and StartTimes fields.

// ImmortalContainerSpec defines the desired state of ImmortalContainer
// +k8s:openapi-gen=true
type ImmortalContainerSpec struct {
    // Image is the container image to keep running
    // +kubebuilder:validation:MinLength=1
    Image string `json:"image"`
}

// ImmortalContainerStatus defines the observed state of ImmortalContainer
// +k8s:openapi-gen=true
type ImmortalContainerStatus struct {
    // CurrentPod is the name of the pod currently running the container
    CurrentPod string `json:"currentPod,omitempty"`
    // StartTimes counts how many times the pod has been created
    StartTimes int `json:"startTimes,omitempty"`
}

For each field, we defined its name, type, and the annotations used to serialize and deserialize objects to and from JSON. Also, using comment annotations, we added a minimum-length validation for the Image field. As you can see, we put the Image field into the spec section and the other two fields, which represent the current state, into the status section.

After modifying the file, we use Operator SDK to regenerate the deepcopy code and the CRD needed by the operator.

$ operator-sdk generate k8s
$ operator-sdk generate openapi

You can take a look at the generated CRD in deploy/crds/immortalcontainer_v1alpha1_immortalcontainer_crd.yaml.


Custom controller implementation

This section describes how the operator’s controller works and how to implement it with the help of Operator SDK.

The mission of the controller is to keep the desired and actual states synchronized. It must therefore watch for changes in ImmortalContainer objects (desired state, what containers to run) and Pods (actual state, containers running) and execute the actions needed to reconcile both states.

The following diagram illustrates the controller’s main components.

Just as we’ve done for the custom resource, we are going to use Operator SDK’s command “add controller” to scaffold our operator’s controller.

$ operator-sdk add controller --api-version=immortalcontainer.flugel.it/v1alpha1 --kind=ImmortalContainer

You can find the generated controller code at pkg/controller/immortalcontainer/immortalcontainer_controller.go. This code is a good starting point, but we need to adapt it to our needs.

Watching for events

Operator SDK provides the functionality needed to watch for events. When an event arrives, the objects whose desired or actual states might have changed are added to a queue for later processing. This queue contains every object whose state reconciliation is pending.
In this case, the generated controller is very close to what the immortal containers operator needs. Event watchers are declared inside the add function. As you can see, it already watches for events about ImmortalContainer and Pod objects.

func add(mgr manager.Manager, r reconcile.Reconciler) error {
    // Create a new controller
    c, err := controller.New("immortalcontainer-controller", mgr, controller.Options{Reconciler: r})
    if err != nil {
        return err
    }

    // Watch for changes to primary resource ImmortalContainer
    err = c.Watch(&source.Kind{Type: &immortalcontainerv1alpha1.ImmortalContainer{}}, &handler.EnqueueRequestForObject{})
    if err != nil {
        return err
    }

    // Watch for changes to secondary resource Pods and requeue the owner ImmortalContainer
    err = c.Watch(&source.Kind{Type: &corev1.Pod{}}, &handler.EnqueueRequestForOwner{
        IsController: true,
        OwnerType:    &immortalcontainerv1alpha1.ImmortalContainer{},
    })
    if err != nil {
        return err
    }

    return nil
}

Note that the first event watcher checks for events about ImmortalContainer objects and uses a handler of type EnqueueRequestForObject. For each received event, this handler enqueues the name of the ImmortalContainer object for later processing.

The second watcher listens for pod events, but uses a handler of type EnqueueRequestForOwner. This means that when an event arrives the name of the pod’s owner will be enqueued. Here we are using an OwnerType filter to catch only the pod events related to ImmortalContainer objects.

We can see, therefore, that these two event watchers guarantee that for every received event, the name of any possibly affected ImmortalContainer object will be enqueued.
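The OwnerType filter works because, as we will see in the reconcile loop, every pod the operator creates carries an ownerReferences entry pointing back at its ImmortalContainer. Roughly, the created pod’s metadata looks like this (a sketch; names follow our example, and the uid is elided):

```yaml
metadata:
  name: example-immortalcontainer-immortalpod
  ownerReferences:
  - apiVersion: immortalcontainer.flugel.it/v1alpha1
    kind: ImmortalContainer
    name: example-immortalcontainer
    uid: "..."
    controller: true
    blockOwnerDeletion: true
```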

Reconciling states

The main task of a controller is to keep the desired and actual states synced. To do so, it processes every object that the event handlers have added to the queue of pending reconciliations.

During this processing, the controller compares the desired and actual states of the object and then executes actions to make them match. The steps of this process make up the controller’s reconcile loop.

To implement the immortal containers operator, the controller’s reconcile loop must create the pod if it does not exist and update the status fields of the ImmortalContainer object.

The generated controller contains a Reconcile function where the reconcile loop must be implemented. This function is a method of the ReconcileImmortalContainer type, which guarantees the function has access to a Kubernetes API client. The Reconcile function parameter is a request containing the name of the ImmortalContainer object to be reconciled.

The first section of the function deals with initializing a logger, fetching the ImmortalContainer object, and storing it in a variable named instance.

When the ImmortalContainer no longer exists, the function returns with no error — because the object may have been deleted.  In the case of any error fetching the object, the function returns, but the object name will be kept in the queue for later reprocessing.

func (r *ReconcileImmortalContainer) Reconcile(request reconcile.Request) (reconcile.Result, error) {
    reqLogger := log.WithValues("Request.Namespace", request.Namespace, "Request.Name", request.Name)
    reqLogger.Info("Reconciling ImmortalContainer")

    // Fetch the ImmortalContainer instance
    instance := &immortalcontainerv1alpha1.ImmortalContainer{}
    err := r.client.Get(context.TODO(), request.NamespacedName, instance)
    if err != nil {
        if errors.IsNotFound(err) {
            // Request object not found, could have been deleted after reconcile request.
            // Owned objects are automatically garbage collected. For additional cleanup logic use finalizers.
            // Return and don't requeue
            return reconcile.Result{}, nil
        }
        // Error reading the object - requeue the request.
        return reconcile.Result{}, err
    }

The next step is to create the definition of the pod for this ImmortalContainer, that is, a pod that runs the image specified in the Image field of the ImmortalContainer object. Note that here we are not creating the pod itself, but only a pod definition to be used (at a later time) to create the pod if needed.

Here we are also setting the ownerReference field of the pod definition to point to the ImmortalContainer object. This is the purpose of the call to SetControllerReference.

    // Define a new Pod object
    pod := newPodForImmortalContainer(instance)

    // Set ImmortalContainer instance as the owner and controller
    if err := controllerutil.SetControllerReference(instance, pod, r.scheme); err != nil {
        return reconcile.Result{}, err
    }

After creating the pod definition, the function tries to fetch a pod matching it. To do that it uses the Get method of r.client, the Kubernetes API client.

    found := &corev1.Pod{}
    err = r.client.Get(context.TODO(), types.NamespacedName{Name: pod.Name, Namespace: pod.Namespace}, found)

If the pod is found, there is nothing else to do in the reconcile loop. But if the pod is not found, the controller first creates it and then updates the ImmortalContainer object’s status, setting the values of the CurrentPod and StartTimes fields.

    if err != nil && errors.IsNotFound(err) {
        reqLogger.Info("Creating a new Pod", "Pod.Namespace", pod.Namespace, "Pod.Name", pod.Name)
        err = r.client.Create(context.TODO(), pod)
        if err != nil {
            return reconcile.Result{}, err
        }

        // Update the status: record the pod's name and count this start
        instance.Status.CurrentPod = pod.Name
        instance.Status.StartTimes++
        reqLogger.Info("Updating status", "ImmortalContainer", instance.Namespace+"/"+instance.Name)
        err := r.client.Status().Update(context.TODO(), instance)
        if err != nil {
            reqLogger.Error(err, "Failed to update ImmortalContainer status.")
            return reconcile.Result{}, err
        }

        // Pod created successfully - don't requeue
        return reconcile.Result{}, nil
    } else if err != nil {
        // Unexpected error fetching the pod - requeue
        return reconcile.Result{}, err
    }

    // Pod already exists - nothing to do
    return reconcile.Result{}, nil
}

That’s all it takes to reconcile states for our operator. In the next section we take a closer look at how the pod definition is created.

Creating the pod definition

This function creates the pod definition for an ImmortalContainer. It defines a pod with one container, using the image specified in the Image field of the ImmortalContainer object spec — received as the cr argument.

func newPodForImmortalContainer(cr *immortalcontainerv1alpha1.ImmortalContainer) *corev1.Pod {
    labels := map[string]string{
        "app": cr.Name,
    }
    return &corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{
            Name:      cr.Name + "-immortalpod",
            Namespace: cr.Namespace,
            Labels:    labels,
        },
        Spec: corev1.PodSpec{
            Containers: []corev1.Container{
                {
                    Name:  "acontainer",
                    Image: cr.Spec.Image,
                },
            },
        },
    }
}

It’s important to note that this function does not create the pod in the cluster, but only its definition, for later use in the controller.

Running the operator

Having implemented the custom resource and custom controller, we are ready to try the operator with a real cluster.

When an operator is installed on a cluster, the operator’s controller runs in a pod inside the cluster. For testing and debugging purposes, it’s also possible to execute the controller from outside the cluster. In both cases, the controller communicates with the cluster using the Kubernetes API.

Before continuing, be sure you have a cluster available for use and your credentials configured. You can run kubectl get nodes to check that you can reach the cluster. If you don’t have a cluster, you can use minikube or microk8s to create a local development cluster.


Running outside the cluster

Running outside the cluster means that while all the resources (e.g. ImmortalContainer objects and pods) live inside the cluster, the controller is executed externally, on the developer’s computer, for example. The following diagram illustrates such a situation:

Assuming the cluster is running and your credentials are stored in ~/.kube/config, we are ready to try the operator.

Check the cluster availability

$ kubectl get nodes

The following two commands install the custom resource in the cluster and run the controller locally, on your computer:

$ kubectl apply -f deploy/crds/immortalcontainer_v1alpha1_immortalcontainer_crd.yaml
$ operator-sdk up local

After that, you should see the following output appear in the logs:

INFO[0000] Running the operator locally.                
INFO[0000] Using namespace default.  

Creating an ImmortalContainer

To see if the operator works correctly, we are going to create an ImmortalContainer object.

First, we are going to create its definition. To do this we need to edit the file deploy/crds/immortalcontainer_v1alpha1_immortalcontainer_cr.yaml as shown:

apiVersion: immortalcontainer.flugel.it/v1alpha1
kind: ImmortalContainer
metadata:
  name: example-immortalcontainer
spec:
  image: nginx:latest

You can use any image; we chose nginx:latest because it’s very common and not too big.

We then use kubectl to create the ImmortalContainer object in the cluster.

$ kubectl apply -f deploy/crds/immortalcontainer_v1alpha1_immortalcontainer_cr.yaml

Upon detecting the new immortal container, the controller will create a pod to run the image. Let’s verify that the pod is created.

$ kubectl get pods
NAME                                    READY   STATUS    RESTARTS   AGE
example-immortalcontainer-immortalpod   1/1     Running   0          25m

With the following command we can see the containers running inside this pod:

$ kubectl get pods example-immortalcontainer-immortalpod \
    -o jsonpath='{.spec.containers[*].name} {.spec.containers[*].image}'
acontainer nginx:latest

Let’s check that the pod is recreated when we delete it.

$ kubectl delete pods example-immortalcontainer-immortalpod
pod "example-immortalcontainer-immortalpod" deleted
$ kubectl get pods
NAME                                    READY   STATUS              RESTARTS   AGE
example-immortalcontainer-immortalpod   0/1     ContainerCreating   0          3s

Finally, we can edit the ImmortalContainer object to see its status fields, CurrentPod and StartTimes.

$ kubectl edit immortalcontainer example-immortalcontainer
apiVersion: immortalcontainer.flugel.it/v1alpha1
kind: ImmortalContainer
metadata:
  name: example-immortalcontainer
spec:
  image: nginx:latest
status:
  currentPod: example-immortalcontainer-immortalpod
  startTimes: 2

As you can see, the operator works as expected.

Deploying the operator to a cluster

Now that we have our operator up and running, we are going to build all the artifacts needed to deploy it inside a cluster. This is the kind of setup we would use in a production system. In this setup, the operator’s controller runs in a pod.

The controller still uses the Kubernetes API to watch events and manage objects. Since it is no longer running on the user’s or developer’s computer, however, it needs its own permissions to access the Kubernetes API (it has no access to the user’s credentials).

Operator SDK automatically generates the needed permissions. You can read more about authorization in the Kubernetes documentation.
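To give an idea of what those generated permissions look like, here is an abridged sketch in the spirit of the generated deploy/role.yaml (the real file grants access to more resource types):

```yaml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: immortal-containers-operator
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["*"]
- apiGroups: ["immortalcontainer.flugel.it"]
  resources: ["*"]
  verbs: ["*"]
```

The Role is bound to the operator’s ServiceAccount via deploy/role_binding.yaml, which is why all three files are applied together below.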

To run the controller in a pod, we need to build an image containing the controller’s executable code, plus a pod or deployment definition to instantiate the pod. Fortunately, Operator SDK automates these tasks too.

The Operator SDK build command creates a Docker image for the controller. For this article we’ve decided to name the image flugelit/immortalcontainer-operator:dev. We push it to Docker Hub so that our Kubernetes cluster can fetch it.

Note: You can use any other public or private registry.

$ operator-sdk build flugelit/immortalcontainer-operator:dev
$ docker push flugelit/immortalcontainer-operator:dev

We’ve pushed the image to Docker Hub as flugelit/immortalcontainer-operator:dev.

Next, we must modify the image name in the deployment file, operator.yaml. You can use the sed command or edit the file and replace REPLACE_IMAGE with flugelit/immortalcontainer-operator:dev.

$ sed -i 's|REPLACE_IMAGE|flugelit/immortalcontainer-operator:dev|g' deploy/operator.yaml

Now we are ready to deploy the operator to the cluster. The following commands install the CRD, configure permissions needed for our controller to access Kubernetes API from inside the cluster, and deploy the operator.

$ kubectl apply -f deploy/crds/immortalcontainer_v1alpha1_immortalcontainer_crd.yaml
$ kubectl apply -f deploy/service_account.yaml
$ kubectl apply -f deploy/role.yaml
$ kubectl apply -f deploy/role_binding.yaml
$ kubectl apply -f deploy/operator.yaml

From here you could repeat the steps described in “Creating an ImmortalContainer” in order to try the operator, the only difference being that this time the controller is running inside the cluster.

Clean up

Using the following commands you can remove the operator from the cluster.

$ kubectl delete -f deploy/operator.yaml
$ kubectl delete -f deploy/role_binding.yaml
$ kubectl delete -f deploy/role.yaml
$ kubectl delete -f deploy/service_account.yaml
$ kubectl delete -f deploy/crds/immortalcontainer_v1alpha1_immortalcontainer_crd.yaml

Be careful; pods that were created for ImmortalContainer objects may still be running. If so, you can delete them. Since the operator is no longer running, they will not be restarted.

In Conclusion

In this article we’ve gone through all the steps needed to implement a simple but complete Kubernetes operator. Thanks to Operator SDK we saved lots of time and built the operator in a somewhat “standard” way.

Since building and running an operator required numerous steps, here we would like to summarize the process.

  1. Project initialization: We used Operator SDK to initialize the project and install required development dependencies.
  2. Custom resources definitions: We created a new resource and defined a new object kind, ImmortalContainer. Then, using Operator SDK, we generated the CRDs.
  3. Custom controller implementation: We generated the barebones of the controller, customized it to watch for relevant events, and implemented the reconcile loop.
  4. Running outside the cluster: We ran the controller outside the cluster. This made it easier to see the logs.
  5. Building: To be able to deploy the operator to a cluster, we generated some artifacts, such as a Docker image and a deployment file.
  6. Deployment: Finally, we installed the operator to a cluster.

We think these steps are a good guide for most operators’ implementations. We hope you find them useful when implementing your own!

Finally, to avoid making this article any longer, we’ve decided to leave automated testing for next time. In the next article, therefore, we will see how to implement unit and end-to-end tests.