Upgrading the CSI Plugin
Because we specify spec.template.spec.priorityClassName = system-cluster-critical, the plugin PODs should get rescheduled even if the server is low on resources. See the Kubernetes documentation on pod priority and preemption for additional information.
As described there, this priority class instructs the scheduler to preempt lower-priority PODs if needed.
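As a quick check, the priority class actually configured on the DaemonSet can be read back with a command like the following (a minimal sketch, assuming the DaemonSet is named lb-csi-node in the kube-system namespace, as in the rest of this guide):

```
# Print the priority class configured on the node DaemonSet's POD template.
kubectl get ds/lb-csi-node -n kube-system -o jsonpath='{.spec.template.spec.priorityClassName}' ; echo
# Expected output: system-cluster-critical
```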
In production deployments, we recommend upgrading nodes manually, to verify that there is no service loss.
Upgrade Overview
Kubernetes supports two update strategies for upgrading these resources:

- OnDelete: Once a POD is deleted, the newly scheduled POD runs with the upgraded specification. With this strategy, we can choose which POD is upgraded and we have more control over the flow.
- RollingUpdate: Once applied, Kubernetes upgrades the DaemonSet PODs one by one on its own, without the ability to intervene if something goes wrong.
The manual approach is preferred, to make sure that there is no service loss while upgrading.
This is the flow we recommend for upgrading the CSI plugin:
- Upgrade the lb-csi-node DaemonSet PODs manually, one by one.
- Verify that the upgraded node is still working.
- Upgrade the lb-csi-controller StatefulSet.
- Verify that the entire cluster is working.
Applying a Manual Upgrade
Manual flow:
- Stage #1: Modify DaemonSet's spec.updateStrategy to OnDelete
- Stage #2: Update DaemonSet lb-csi-plugin image
- Stage #3: Select One Node and Apply Upgrade and Verify
- Stage #4: Verify That the Upgraded POD Is Functioning Properly
- Stage #5: Upgrade the Remaining lb-csi-node PODs
- Stage #6: Modify DaemonSet's spec.updateStrategy back to RollingUpdate
- Stage #7: Upgrade StatefulSet
Stage #1: Modify DaemonSet's spec.updateStrategy to OnDelete
```
kubectl patch ds/lb-csi-node -n kube-system -p '{"spec":{"updateStrategy":{"type":"OnDelete"}}}'
daemonset.apps/lb-csi-node patched

# verify changes applied
kubectl get ds/lb-csi-node -o go-template='{{.spec.updateStrategy.type}}{{"\n"}}' -n kube-system
OnDelete
```

Stage #2: Update DaemonSet lb-csi-plugin Image
The only difference between the two DaemonSets is the lb-csi-plugin image:
```
< image: docker.lightbitslabs.com/lightos-csi/lb-csi-plugin:1.2.0
---
> image: docker.lightbitslabs.com/lightos-csi/lb-csi-plugin:1.4.2
```

In case the discovery-client is deployed as a container in the lb-csi-node POD, we should apply the following difference as well:

```
< image: docker.lightbitslabs.com/lightos-csi/lb-nvme-discovery-client:1.2.0
---
> image: docker.lightbitslabs.com/lightos-csi/lb-nvme-discovery-client:1.4.2
```

The Docker registry prefix could vary between deployments.
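To see which registry prefix your deployment currently uses, you can inspect the image configured on the existing DaemonSet; a minimal sketch:

```
# Print the lb-csi-plugin image currently configured in the lb-csi-node DaemonSet.
kubectl get ds/lb-csi-node -n kube-system -o jsonpath='{.spec.template.spec.containers[?(@.name=="lb-csi-plugin")].image}' ; echo
```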
To update only the container image, use kubectl set image:
```
kubectl set image ds/lb-csi-node -n kube-system lb-csi-plugin=docker.lightbitslabs.com/lightos-csi/lb-csi-plugin:1.4.2
```

In case the discovery-client is deployed as a container in the lb-csi-node POD, run the following command as well:

```
kubectl set image ds/lb-csi-node -n kube-system lb-nvme-discovery-client=docker.lightbitslabs.com/lightos-csi/lb-nvme-discovery-client:1.4.2
```

Stage #3: Select One Node and Apply Upgrade and Verify
We will specify how to manually upgrade the image in each of the PODs:
- List all the lb-csi-plugin PODs in the cluster:
```
kubectl get pods -n kube-system -l app=lb-csi-plugin -o wide
NAME                  READY   STATUS    RESTARTS   AGE     IP               NODE                   NOMINATED NODE   READINESS GATES
lb-csi-controller-0   4/4     Running   0          117m    10.244.3.7       rack06-server63-vm04   <none>           <none>
lb-csi-node-rwrz6     3/3     Running   0          5m10s   192.168.20.61    rack06-server63-vm04   <none>           <none>
lb-csi-node-stzg6     3/3     Running   0          5m      192.168.20.84    rack06-server67-vm03   <none>           <none>
lb-csi-node-wc46m     3/3     Running   0          17h     192.168.16.114   rack09-server69-vm01   <none>           <none>
```

For this example, select the first lb-csi-node POD:
```
NAME                READY   STATUS    RESTARTS   AGE     IP              NODE                   NOMINATED NODE   READINESS GATES
lb-csi-node-rwrz6   3/3     Running   0          5m10s   192.168.20.61   rack06-server63-vm04   <none>           <none>
```

- Delete the POD running on the selected server:
```
kubectl delete pods/lb-csi-node-rwrz6 -n kube-system
pod "lb-csi-node-rwrz6" deleted
```

- Verify that the lb-csi-node POD is upgraded.
Listing the PODs again will show that one of them has a very short AGE and a different name:
```
kubectl get pods -n kube-system -l app=lb-csi-plugin -o wide
NAME                READY   STATUS    RESTARTS   AGE   IP              NODE                   NOMINATED NODE   READINESS GATES
lb-csi-node-g47z2   2/2     Running   0          39s   192.168.20.61   rack06-server63-vm04   <none>           <none>
```

We need to verify that its status is Running.
We should also verify that the image was updated correctly by running the following command:
```
kubectl get pods lb-csi-node-g47z2 -n kube-system -o jsonpath='{.spec.containers[?(@.name=="lb-csi-plugin")].image}' ; echo
docker.lightbitslabs.com/lightos-csi/lb-csi-plugin:1.4.2
```

In case the discovery-client is deployed as a container in the lb-csi-node POD, verify that its image was updated as well with the following command:

```
kubectl get pods lb-csi-node-tpd7d -n kube-system -o jsonpath='{.spec.containers[?(@.name=="lb-nvme-discovery-client")].image}' ; echo
docker.lightbitslabs.com/lightos-csi/lb-nvme-discovery-client:1.4.2
```

Stage #4: Verify that the Upgraded lb-csi-node POD is Functioning Properly
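Before deploying a test workload, you can also confirm that the CSI driver re-registered on the upgraded node. A minimal sketch, using the node name from the example above; the output should include csi.lightbitslabs.com:

```
# List the CSI drivers registered on the node via its CSINode object.
kubectl get csinode rack06-server63-vm04 -o jsonpath='{.spec.drivers[*].name}' ; echo
```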
We will run a simple verification test to see that our node is still functioning before we move to the next node.
By deploying a simple workload on the upgraded node, we can verify that the lb-csi-node POD is functioning properly.
We provide two ways to run the verification test:
- Using Static Manifests
- Using the Provided Helm Chart
Verify the Upgraded Node Using Static Manifests
Our verification test is very simple and has the following steps:
- Create an example PVC.
- Deploy a POD consuming this PVC on the upgraded node.
Create a manifest file named fs-workload.yaml containing the two kinds we want to deploy - PVC and POD:
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-fs-after-upgrade-pvc
spec:
  storageClassName: "<STORAGE-CLASS-NAME>"
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: "example-fs-after-upgrade-pod"
spec:
  nodeName: "<NODE-NAME>"
  containers:
  - name: busybox-date-container
    imagePullPolicy: IfNotPresent
    image: busybox
    command:
    - "/bin/sh"
    args:
    - "-c"
    - "if [ -f /mnt/test/hostname ] ; then (md5sum -s -c /mnt/test/hostname.md5 && echo OLD MD5 OK || echo BAD OLD MD5) >> /mnt/test/log ; fi ; echo $KUBE_NODE_NAME: $(date +%Y-%m-%d.%H-%M-%S) >| /mnt/test/hostname ; md5sum /mnt/test/hostname >| /mnt/test/hostname.md5 ; echo NEW NODE: $KUBE_NODE_NAME: $(date +%Y-%m-%d.%H-%M-%S) >> /mnt/test/log ; while true ; do date +%Y-%m-%d.%H-%M-%S >| /mnt/test/date ; sleep 10 ; done"
    env:
    - name: KUBE_NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    stdin: true
    tty: true
    volumeMounts:
    - name: test-mnt
      mountPath: "/mnt/test"
  volumes:
  - name: test-mnt
    persistentVolumeClaim:
      claimName: "example-fs-after-upgrade-pvc"
```

Make sure you modify the following fields that are cluster-specific:
- storageClassName: The name of the SC configured in your cluster.
- nodeName: The name of the node we want to deploy on.
- Pod.spec.image: The name of the busybox image. Note that the Docker registry prefix could vary between deployments.
To find the name of the upgraded node, run the following command:
```
kubectl get pods -n kube-system -l app=lb-csi-plugin -o wide
NAME                  READY   STATUS    RESTARTS   AGE    IP               NODE                   NOMINATED NODE   READINESS GATES
lb-csi-controller-0   4/4     Running   0          117m   10.244.3.7       rack06-server63-vm04   <none>           <none>
lb-csi-node-rwrz6     3/3     Running   0          17h    192.168.20.61    rack06-server63-vm04   <none>           <none>
lb-csi-node-stzg6     3/3     Running   0          5m     192.168.20.84    rack06-server67-vm03   <none>           <none>
lb-csi-node-wc46m     3/3     Running   0          17h    192.168.16.114   rack09-server69-vm01   <none>           <none>
```

We can see that POD lb-csi-node-stzg6 was the one that restarted and was updated, so we will set nodeName to rack06-server67-vm03.
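Alternatively, instead of editing the manifest by hand, the placeholders can be substituted on the fly and applied in one step; a minimal sketch using the values from this example (lb-sc is an assumed StorageClass name, see the Helm section below):

```
# Fill in the cluster-specific placeholders and create the resources in one step.
sed -e 's/<STORAGE-CLASS-NAME>/lb-sc/' \
    -e 's/<NODE-NAME>/rack06-server67-vm03/' \
    fs-workload.yaml | kubectl create -f -
```

Otherwise, edit the placeholders in the file and continue with the next step.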
Apply the following command:
```
kubectl create -f fs-workload.yaml
```

The workload will write some files to the mounted volume. You can run the following command to see that the content is written to the volume:
```
kubectl exec -it pod/example-fs-after-upgrade-pod -- /bin/sh -c "cat /mnt/test/date ; cat /mnt/test/hostname; cat /mnt/test/hostname.md5"
2021-05-23.08-13-10
rack08-server52: 2021-05-23.08-03-30
61afe45d31f826f5b7e54e6bd92ec07d  /mnt/test/hostname
```

After a successful workload run on the upgraded node, delete the temporary workload by running:
```
kubectl delete -f fs-workload.yaml
```

Verify the Upgraded Node Using Helm
We will use the workload Helm chart provided with the bundle for this. First, list the StorageClasses to get the name we need:
```
kubectl get storageclass
NAME    PROVISIONER             RECLAIMPOLICY   BINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
lb-sc   csi.lightbitslabs.com   Delete          Immediate     false                  2d12h
```

We will use the name of the StorageClass and the name of the upgraded node (rack06-server63-vm04) to deploy the FS POD workload.
```
helm install --set filesystem.enabled=true \
  --set global.storageClass.name=lb-sc \
  --set filesystem.nodeName=rack06-server63-vm04 \
  fs-workload \
  lb-csi-workload-examples
```

Now we need to verify that the PVC is Bound and that the POD is in Ready status.
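Optionally, kubectl wait can block until the POD becomes Ready instead of polling manually; a minimal sketch, assuming the chart creates a POD named example-fs-pod in the current namespace (as in the output below):

```
# Wait up to two minutes for the workload POD to report Ready.
kubectl wait --for=condition=Ready pod/example-fs-pod --timeout=120s
```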
```
kubectl get pv,pvc,pod
NAME                                                        STATUS   CLAIM                    SC      AGE
persistentvolume/pvc-6b26b875-fafd-4abe-95bb-2f5305b61a29   Bound    default/example-fs-pvc   lb-sc   12m

NAME                                   STATUS   VOLUME                                     SC      AGE
persistentvolumeclaim/example-fs-pvc   Bound    pvc-6b26b875-fafd-4abe-95bb-2f5305b61a29   lb-sc   12m

NAME                 READY   STATUS    RESTARTS   AGE
pod/example-fs-pod   1/1     Running   0          12m
```

If all is well, we can assume that the upgrade for that node worked.
Now we will uninstall the workload using the command:
```
helm delete fs-workload
```

Stage #5: Upgrade Remaining lb-csi-node PODs
Repeat the following steps:
- Stage #3: Select One Node and Apply Upgrade and Verify
- Stage #4: Verify that the Upgraded lb-csi-node POD is Functioning Properly
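Once all nodes have been upgraded, the image used by every POD carrying the lb-csi-plugin container can be checked in a single pass; a minimal sketch:

```
# Print each POD name together with its lb-csi-plugin container image.
kubectl get pods -n kube-system -l app=lb-csi-plugin \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[?(@.name=="lb-csi-plugin")].image}{"\n"}{end}'
```

All entries should show the target image tag (1.4.2 in this example).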
Stage #6: Modify DaemonSet's spec.updateStrategy back to RollingUpdate
```
kubectl patch ds/lb-csi-node -n kube-system -p '{"spec":{"updateStrategy":{"type":"RollingUpdate"}}}'
daemonset.apps/lb-csi-node patched

# verify changes applied
kubectl get ds/lb-csi-node -o go-template='{{.spec.updateStrategy.type}}{{"\n"}}' -n kube-system
RollingUpdate
```

Stage #7: Upgrade StatefulSet
Since we have only one replica in the lb-csi-controller StatefulSet, there is no need to do a rolling upgrade.
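You can confirm the replica count with a command like:

```
# Print the number of replicas configured for the controller StatefulSet.
kubectl get sts/lb-csi-controller -n kube-system -o jsonpath='{.spec.replicas}' ; echo
# Expected output: 1
```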
Between the two versions discussed here, there were many modifications to the StatefulSet, since snapshot support was added.
Snapshot requires the following resources to be deployed on the Kubernetes cluster:
- Snapshot RBAC: ClusterRole and ClusterRoleBindings.
- Custom resource definitions (version: v0.4.0, apiVersion: apiextensions.k8s.io/v1):
  - kind: VolumeSnapshot
  - kind: VolumeSnapshotClass
  - kind: VolumeSnapshotContent
- Two additional containers in the lb-csi-controller POD:
  - name: snapshot-controller (v4.0.0)
  - name: csi-snapshotter (v4.0.0)
Deploy ClusterRole and ClusterRoleBindings.
We assume that the Kubernetes cluster admin will know what is deployed on the system.
The following steps allow us to validate if we have the Roles and Bindings to work with snapshots.
If resources are not present on the cluster, these steps will guide you as to how to add them.
- Verify if we have ClusterRoles for snapshots:
```
kubectl get clusterrole | grep snap

# If we get an empty response, we will need to deploy the ClusterRoles (see step #3).
# If we get the following output:
external-snapshotter-runner    2d15h
snapshot-controller-runner     2d15h
# it means that the roles are deployed, and the cluster admin needs to make sure that the granted permissions are sufficient.
```

- The same should be done with the ClusterRoleBindings:
```
kubectl get clusterrolebindings | grep snap

# If we get an empty response, we will need to deploy the ClusterRoleBindings (see step #3).
# If we get the following output:
csi-snapshotter-role        2d15h
snapshot-controller-role    2d15h
# it means that the bindings are deployed, and the cluster admin needs to make sure the ClusterRoleBindings are assigned to the correct ServiceAccount.
```

- Deploy ClusterRoles and ClusterRoleBindings using the following command:
```
kubectl create -f snapshot-rbac.yaml
clusterrole.rbac.authorization.k8s.io/snapshot-controller-runner created
clusterrole.rbac.authorization.k8s.io/external-snapshotter-runner created
clusterrolebinding.rbac.authorization.k8s.io/snapshot-controller-role created
clusterrolebinding.rbac.authorization.k8s.io/csi-snapshotter-role created
```

- Deploy the Snapshot CRDs.
We need to understand if we have the snapshot CRDs deployed already on the cluster.
```
kubectl get crd -o jsonpath='{range .items[*]}{@.spec.names.kind}{" , "}{@.apiVersion}{" , "}{@.metadata.annotations.controller-gen\.kubebuilder\.io/version}{"\n"}{end}' ; echo
```

If we see output like this, the CRDs are already deployed on the cluster and we can skip adding them:

```
VolumeSnapshotClass , apiextensions.k8s.io/v1 , v0.4.0
VolumeSnapshotContent , apiextensions.k8s.io/v1 , v0.4.0
VolumeSnapshot , apiextensions.k8s.io/v1 , v0.4.0
```

If we get no output, it means that we do not have the CRDs deployed and we need to deploy them as follows:
```
kubectl create -f snapshot-crds.yaml
customresourcedefinition.apiextensions.k8s.io/volumesnapshotclasses.snapshot.storage.k8s.io created
customresourcedefinition.apiextensions.k8s.io/volumesnapshotcontents.snapshot.storage.k8s.io created
customresourcedefinition.apiextensions.k8s.io/volumesnapshots.snapshot.storage.k8s.io created
```

- Upgrade the lb-csi-controller StatefulSet.
The Docker registry prefix could vary between deployments. Please verify the image prefix before running.
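One quick way to review the image references in the manifest before applying it (assuming the manifest file name used below):

```
# Show every image reference in the StatefulSet manifest so the registry prefix can be verified.
grep 'image:' stateful-set.yaml
```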
```
kubectl apply -f stateful-set.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
statefulset.apps/lb-csi-controller configured
```

Verify StatefulSet And DaemonSet Version As Expected
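Because this upgrade adds the snapshot-controller and csi-snapshotter containers, the controller POD should now report six containers. A minimal sketch for listing them by name:

```
# List the container names in the upgraded controller POD;
# snapshot-controller and csi-snapshotter should appear among them.
kubectl get pods lb-csi-controller-0 -n kube-system -o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{end}'
```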
List all CSI plugin pods:
```
kubectl get pods -n kube-system -l app=lb-csi-plugin
NAME                  READY   STATUS    RESTARTS   AGE
lb-csi-controller-0   6/6     Running   0          3m33s
lb-csi-node-k4bzk     3/3     Running   0          13m
lb-csi-node-pcsmm     3/3     Running   0          13m
lb-csi-node-z7lpr     3/3     Running   0          13m
```

Verify that the version-rel matches the expected version.
For the controller pod:
```
kubectl logs -n kube-system lb-csi-controller-0 -c lb-csi-plugin | grep version-rel
time="2021-03-21T18:50:54.410655+00:00" level=info msg=starting config="{NodeID:rack06-server63-vm04.ctrl Endpoint:unix:///var/lib/csi/sockets/pluginproxy/csi.sock DefaultFS:ext4 LogLevel:debug LogRole:controller LogTimestamps:true LogFormat:text BinaryName: Transport:tcp SquelchPanics:true PrettyJson:false}" driver-name=csi.lightbitslabs.com node=rack06-server63-vm04.ctrl role=controller version-build-id= version-git=v1.4.2-0-gaf08f7e0 version-hash=1.4.2 version-rel=1.4.2
```

The same for each node POD:
```
kubectl logs -n kube-system lb-csi-node-k4bzk -c lb-csi-plugin | grep version-rel
time="2021-03-21T18:41:18.750957+00:00" level=info msg=starting config="{NodeID:rack06-server63-vm04.node Endpoint:unix:///csi/csi.sock DefaultFS:ext4 LogLevel:debug LogRole:node LogTimestamps:true LogFormat:text BinaryName: Transport:tcp SquelchPanics:true PrettyJson:false}" driver-name=csi.lightbitslabs.com node=rack06-server63-vm04.node role=node version-build-id= version-git=v1.4.2-0-gaf08f7e0 version-hash=1.4.2 version-rel=1.4.2
```

Applying RollingUpgrade (Automated Deployment)
Checking DaemonSet Update Strategy
```
kubectl get ds/lb-csi-node -o go-template='{{.spec.updateStrategy.type}}{{"\n"}}' -n kube-system
```

Checking StatefulSet Update Strategy
```
kubectl get sts/lb-csi-controller -o go-template='{{.spec.updateStrategy.type}}{{"\n"}}' -n kube-system
```

Rollout History
Each time we deploy the DaemonSet, a new rollout will be created.
This can be viewed using the following command:
```
kubectl rollout history daemonset lb-csi-node -n kube-system
daemonset.apps/lb-csi-node
REVISION  CHANGE-CAUSE
1         <none>
```

The same can be seen for the StatefulSet resource:
```
kubectl rollout history statefulset lb-csi-controller -n kube-system
statefulset.apps/lb-csi-controller
REVISION
1
2
```

Rollout Status
We can verify the status of a rollout using the following command:
```
kubectl rollout status daemonset lb-csi-node -n kube-system
daemon set "lb-csi-node" successfully rolled out
```

Verify StatefulSet And DaemonSet Version As Expected
List all CSI plugin pods:
```
kubectl get pods -n kube-system -l app=lb-csi-plugin
NAME                  READY   STATUS    RESTARTS   AGE
lb-csi-controller-0   6/6     Running   0          3m33s
lb-csi-node-k4bzk     2/2     Running   0          13m
lb-csi-node-pcsmm     2/2     Running   0          13m
lb-csi-node-z7lpr     2/2     Running   0          13m
```

Verify that the version-rel matches the expected version.
For the controller pod:
```
kubectl logs -n kube-system lb-csi-controller-0 -c lb-csi-plugin | grep version-rel
time="2021-03-21T18:50:54.410655+00:00" level=info msg=starting config="{NodeID:rack06-server63-vm04.ctrl Endpoint:unix:///var/lib/csi/sockets/pluginproxy/csi.sock DefaultFS:ext4 LogLevel:debug LogRole:controller LogTimestamps:true LogFormat:text BinaryName: Transport:tcp SquelchPanics:true PrettyJson:false}" driver-name=csi.lightbitslabs.com node=rack06-server63-vm04.ctrl role=controller version-build-id= version-git=v1.4.2-0-gaf08f7e0 version-hash=1.4.2 version-rel=1.4.2
```

The same for each node POD:
```
kubectl logs -n kube-system lb-csi-node-k4bzk -c lb-csi-plugin | grep version-rel
time="2021-03-21T18:41:18.750957+00:00" level=info msg=starting config="{NodeID:rack06-server63-vm04.node Endpoint:unix:///csi/csi.sock DefaultFS:ext4 LogLevel:debug LogRole:node LogTimestamps:true LogFormat:text BinaryName: Transport:tcp SquelchPanics:true PrettyJson:false}" driver-name=csi.lightbitslabs.com node=rack06-server63-vm04.node role=node version-build-id= version-git=v1.4.2-0-gaf08f7e0 version-hash=1.4.2 version-rel=1.4.2
```

Rollback DaemonSet
If something goes wrong, we can roll back.
```
kubectl rollout undo daemonset lb-csi-node -n kube-system
daemonset.apps/lb-csi-node rolled back
```

Now we can see that the rollout history has changed and that we got a new ControllerRevision (always incrementing):
```
kubectl rollout history daemonset lb-csi-node -n kube-system
daemonset.apps/lb-csi-node
REVISION  CHANGE-CAUSE
2         <none>
3         <none>
```

Rollback StatefulSet
```
kubectl rollout undo statefulset lb-csi-controller -n kube-system
statefulset.apps/lb-csi-controller rolled back

kubectl rollout history statefulset lb-csi-controller -n kube-system
statefulset.apps/lb-csi-controller
REVISION
2
3
```

Verify that the Upgraded Cluster Is Working
Once you have completed all operations for the upgrade, you should run different workloads to verify that all is functioning properly:
- Create a block PVC and POD.
- Create a filesystem PVC and POD.
- Create snapshots, clones, and clone PVCs.
You can use the workload examples provided with the lb-csi-bundle-<version>.tar.gz of the target version.
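For example, the same lb-csi-workload-examples chart used in the Helm verification above can serve as a quick smoke test. The filesystem values below match that earlier example; the release name fs-verify is arbitrary, and the block and snapshot workloads have their own chart values (check the chart's values.yaml for the exact names):

```
# Deploy the filesystem workload example against the upgraded cluster.
helm install --set filesystem.enabled=true \
  --set global.storageClass.name=lb-sc \
  fs-verify \
  lb-csi-workload-examples

# ...verify the PVC is Bound and the POD is Running, then clean up:
helm delete fs-verify
```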