Upgrading the CSI Plugin

Because we specify spec.template.spec.priorityClassName = system-cluster-critical, the plugin PODs should be rescheduled even if the server is low on resources. For additional information, see the Kubernetes documentation on pod priority and preemption.

From pod-priority-preemption, we can see that this priority class instructs the scheduler to preempt lower-priority PODs if needed.

In production deployments, we would want to upgrade each node manually, to verify that there is no service loss.

Upgrade Overview

Kubernetes supports two update strategies for these resources:

  • OnDelete - once a POD is deleted, the newly scheduled POD runs with the upgraded specification. Using this strategy, we can choose which POD is upgraded, and we have more control over the flow.
  • RollingUpdate - once applied, Kubernetes upgrades the DaemonSet PODs one by one on its own, without the ability to intervene if something goes wrong.

The manual approach is preferred, to make sure that there is no service loss while upgrading.

This is the flow we recommend for upgrading the CSI plugin:

  1. Upgrade the lb-csi-node DaemonSet PODs manually, one by one.
  2. Verify that the upgraded node is still working.
  3. Upgrade the lb-csi-controller StatefulSet.
  4. Verify that the entire cluster is working.

Applying a Manual Upgrade

Manual flow:

  1. Stage #1: Modify DaemonSet's spec.updateStrategy to OnDelete
  2. Stage #2: Update DaemonSet lb-csi-plugin image
  3. Stage #3: Select One Node And Apply Upgrade and Verify
  4. Stage #4: Verify That The Updated POD Is Functioning Properly
  5. Stage #5: Upgrade the Remaining lb-csi-node PODs
  6. Stage #6: Modify DaemonSet's spec.updateStrategy back to RollingUpdate
  7. Stage #7: Upgrade StatefulSet

Stage #1: Modify DaemonSet's spec.updateStrategy to OnDelete

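The exact command is deployment-specific; a minimal sketch, assuming the DaemonSet is named lb-csi-node and runs in the kube-system namespace:

```shell
# Assumed DaemonSet name and namespace; verify with: kubectl get ds -A | grep lb-csi
kubectl patch daemonset lb-csi-node -n kube-system \
  --type merge \
  -p '{"spec":{"updateStrategy":{"type":"OnDelete"}}}'
```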

Stage #2: Update DaemonSet lb-csi-plugin Image

The only difference between the two DaemonSet versions is the lb-csi-plugin image.

If the discovery-client is deployed as a container in the lb-csi-node POD, its image differs between the two versions as well.

The Docker registry prefix could vary between deployments.

To update only the container image, use kubectl set image.

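A sketch of the command, assuming the container inside the DaemonSet is named lb-csi-plugin; <registry> and <target-version> are placeholders for your deployment's values:

```shell
# Placeholders must be replaced before running.
kubectl set image daemonset/lb-csi-node -n kube-system \
  lb-csi-plugin=<registry>/lb-csi-plugin:<target-version>
```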

If the discovery-client is deployed as a container in the lb-csi-node POD, run an equivalent kubectl set image command for the discovery-client container as well.

Stage #3: Select One Node and Apply Upgrade And Verify

We will specify how to manually upgrade the image in each of the PODs:

  1. List all the lb-csi-node PODs in the cluster:
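For example, assuming the PODs run in kube-system:

```shell
# The -o wide flag adds the NODE column, which we need later.
kubectl get pods -n kube-system -o wide | grep lb-csi-node
```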

For this example, select the first lb-csi-node POD from the listing.
  2. Delete the POD running on the selected node.
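For example, with <pod-name> standing for the POD selected above (namespace assumed to be kube-system):

```shell
kubectl delete pod -n kube-system <pod-name>
```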
  3. Verify that the lb-csi-node POD is upgraded.

Listing the PODs again will show that one of them has a very short AGE and a different name.

We need to verify that it is Running.

We should also verify that the image was updated correctly by running the following command:

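One way to do this, assuming the container is named lb-csi-plugin and substituting the new POD's name:

```shell
# Prints the image of the lb-csi-plugin container only.
kubectl get pod -n kube-system <pod-name> \
  -o jsonpath='{.spec.containers[?(@.name=="lb-csi-plugin")].image}'
```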

If the discovery-client is deployed as a container in the lb-csi-node POD, verify that its image was updated as well.

Stage #4: Verify that the Upgraded lb-csi-node POD is Functioning Properly

Before moving to the next node, we will run a simple verification test: deploying a small workload on the upgraded node to verify that the lb-csi-node POD is functioning properly.

We provide two ways to run the verification test:

  1. Using Static Manifests
  2. Using the Provided Helm Chart

Verify the Upgraded Node Using Static Manifests

Our verification test is very simple and has the following steps:

  1. Create an example PVC.
  2. Deploy a POD consuming this PVC on the upgraded node.

Create a manifest file named fs-workload.yaml containing the two kinds we want to deploy - PVC and POD:

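A minimal sketch of such a manifest; the names, size, and node below are placeholders, and the actual manifest shipped with the bundle may differ:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: example-sc        # replace with your StorageClass name
---
apiVersion: v1
kind: Pod
metadata:
  name: fs-pod
spec:
  nodeName: rack06-server67-vm03      # replace with the upgraded node's name
  containers:
    - name: busybox-date-cont
      image: busybox                  # registry prefix may vary between deployments
      command: ["sh", "-c", "while true; do date >> /mnt/test/dates.txt; sleep 5; done"]
      volumeMounts:
        - name: vol
          mountPath: /mnt/test
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: fs-pvc
```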

Make sure you modify the following fields that are cluster-specific:

  • storageClassName: The name of the SC configured in your cluster.
  • nodeName: The name of the node we want to deploy on.
  • Pod.spec.image: The name of the busybox image. Note that the Docker registry prefix could vary between deployments.

To find the node that runs the upgraded POD, list the PODs together with their nodes.

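For example, using the restarted POD's name and assuming the kube-system namespace:

```shell
# Prints the node that runs the given POD.
kubectl get pod -n kube-system lb-csi-node-stzg6 -o jsonpath='{.spec.nodeName}'
```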

We can see that POD lb-csi-node-stzg6 was the one that had restarted and was updated, so we will set nodeName to be rack06-server67-vm03.

Apply the manifest.

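For example:

```shell
kubectl apply -f fs-workload.yaml
```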

The workload will write some files to the mounted volume. You can inspect the volume to confirm that content is being written.

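For example, assuming the POD is named fs-pod and the volume is mounted at /mnt/test (both are placeholders):

```shell
kubectl exec fs-pod -- ls -l /mnt/test
```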

After a successful workload run on the upgraded node, delete the temporary workload.

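For example:

```shell
kubectl delete -f fs-workload.yaml
```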

Verify the Upgraded Node Using Helm

We will use the workload Helm chart provided with the bundle.

We will use the name of the StorageClass and the name of the upgraded node (rack06-server63-vm04) to deploy the FS pod workload.

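A sketch of the install command; the chart path and value names here are assumptions, so check the chart's values.yaml for the real parameter names:

```shell
# Hypothetical chart location and value names.
helm install fs-workload ./helm/lb-csi-workload-examples \
  --set storageClass.name=<storage-class-name> \
  --set nodeName=rack06-server63-vm04
```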

Now we need to verify that the PVC was Bound and that the POD is in Ready status.

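For example:

```shell
kubectl get pvc,pod
```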

If all is well, we can assume that the upgrade for that node worked.

Now we will uninstall the workload.

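Assuming the release was named fs-workload:

```shell
helm uninstall fs-workload
```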

Stage #5: Upgrade Remaining lb-csi-node PODs

Repeat Stages #3 and #4 for each of the remaining lb-csi-node PODs.

Stage #6: Modify DaemonSet's spec.updateStrategy back to RollingUpdate

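This is the inverse of the Stage #1 patch; a sketch with the same assumed DaemonSet name and namespace:

```shell
kubectl patch daemonset lb-csi-node -n kube-system \
  --type merge \
  -p '{"spec":{"updateStrategy":{"type":"RollingUpdate"}}}'
```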

Stage #7: Upgrade StatefulSet

Since we have only one replica in the lb-csi-controller StatefulSet, there is no need to do a rolling upgrade.

Between the two versions discussed here, there were many modifications to the StatefulSet, since snapshot support was added.

Snapshot support requires the following resources to be deployed on the Kubernetes cluster:

  1. Snapshot RBAC ClusterRoles and ClusterRoleBindings.

  2. Custom resource definitions (version: v0.4.0, apiVersion: apiextensions.k8s.io/v1):

    1. kind: VolumeSnapshot
    2. kind: VolumeSnapshotClass
    3. kind: VolumeSnapshotContent
  3. Two additional containers in the lb-csi-controller POD:

    1. name: snapshot-controller (v4.0.0)
    2. name: csi-snapshotter (v4.0.0)

We assume that the Kubernetes cluster admin will know what is deployed on the system.

The following steps validate whether the ClusterRoles and ClusterRoleBindings needed to work with snapshots are present on the cluster, and show how to add them if they are missing.

  1. Verify whether we have ClusterRoles for snapshots.
  2. Do the same for the ClusterRoleBindings.
  3. If they are missing, deploy the ClusterRoles and ClusterRoleBindings.
  4. Deploy the snapshot CRDs.

We need to determine whether the snapshot CRDs are already deployed on the cluster:

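For example:

```shell
kubectl get crd | grep volumesnapshot
```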

If the command prints matching CRDs, they are already deployed on the cluster and we can skip adding them.

If we get no output, the CRDs are not deployed and we need to deploy them before upgrading the StatefulSet.
  5. Upgrade the lb-csi-controller StatefulSet.

The Docker registry prefix could vary between deployments. Please verify the image prefix before running.


Verify That the StatefulSet and DaemonSet Versions Are as Expected

List all CSI plugin pods:


Verify that the version-rel matches the expected version.

For the controller pod:


The same for each node pod:


Applying a RollingUpdate (Automated Deployment)

Checking DaemonSet Update Strategy

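One way to check the current strategy, assuming the DaemonSet name and namespace used throughout this example:

```shell
# Should print RollingUpdate for an automated rollout.
kubectl get daemonset lb-csi-node -n kube-system \
  -o jsonpath='{.spec.updateStrategy.type}'
```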

Checking StatefulSet Update Strategy

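Likewise for the StatefulSet, assuming it is named lb-csi-controller:

```shell
kubectl get statefulset lb-csi-controller -n kube-system \
  -o jsonpath='{.spec.updateStrategy.type}'
```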

Rollout History

Each time we update the DaemonSet, a new rollout revision is created. This can be viewed using the following command:

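Assuming the DaemonSet name and namespace used in this example:

```shell
kubectl rollout history daemonset/lb-csi-node -n kube-system
```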

The same can be seen in the ControllerRevision resources.

Rollout Status

We can verify the status of a rollout using the following command:

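For example:

```shell
kubectl rollout status daemonset/lb-csi-node -n kube-system
```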

Verify That the StatefulSet and DaemonSet Versions Are as Expected

List all CSI plugin pods:


Verify that the version-rel matches the expected version.

For the controller pod:


The same for each node pod:


Rollback DaemonSet

If the upgraded version does not function properly, we can roll the DaemonSet back.

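A sketch, assuming the DaemonSet name and namespace used above; add --to-revision to target a specific revision from the history:

```shell
# Roll back to the previous revision.
kubectl rollout undo daemonset/lb-csi-node -n kube-system
```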

Now we can see again that the rollout has changed and that we got a new ControllerRevision (the revision number always increments).

Rollback StatefulSet

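A sketch of the equivalent StatefulSet rollback, assuming the controller is named lb-csi-controller in the kube-system namespace:

```shell
# Roll back to the previous revision and wait for it to settle.
kubectl rollout undo statefulset/lb-csi-controller -n kube-system
kubectl rollout status statefulset/lb-csi-controller -n kube-system
```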

Verify that the Upgraded Cluster Is Working

Once you have completed all operations for the upgrade, you should run different workloads to verify that all is functioning properly:

  1. Create a block-mode PVC and a POD that uses it.
  2. Create a filesystem PVC and a POD that uses it.
  3. Create snapshots, clones, and clone PVCs.

You can use the workload examples provided with the lb-csi-bundle-<version>.tar.gz of the target version.
