Extending the Lightbits Cluster
The lb-csi-plugin is stateless and holds no persistent information between operations.
Kubernetes API calls that invoke the lb-csi-plugin (that is, calls that use Lightbits storage) require the list of Lightbits management endpoints so that the plugin can access the Lightbits cluster.
Today, this management endpoint list is provided to the plugin via the StorageClass.Parameters.mgmt-endpoint field.
However, the CSI API does not pass the request.Parameters to the plugin in some types of API calls. This results in situations where the plugin needs to access the Lightbits API but does not have the information required to do so.
To overcome this limitation, we defined an internal resource named ResourceID, which holds this information as part of the resource ID passed to Kubernetes.
The ResourceID is used as the value of the PersistentVolume.spec.volumeHandle and VolumeSnapshotContent.status.snapshotHandle fields.
The ResourceID is guaranteed to be passed on every API call from CSI, which means that we can use it to hold limited state.
ResourceID format is: mgmt:<host>:<port>[,<host>:<port>...]|nguid:<nguid>|proj:<proj>|scheme:<grpc|grpcs>
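To make the format concrete, the following is a minimal sketch of how the mgmt field of a ResourceID could be swapped for a new endpoint list while the remaining fields stay untouched. The function name replace_mgmt is hypothetical and is not part of lb-csi-plugin or the patcher script; the sketch assumes the mgmt field always comes first, as in the format above.

```shell
# Hypothetical helper (not part of lightos-patcher.sh): rewrite the mgmt field
# of a ResourceID, leaving the nguid, proj and scheme fields untouched.
# usage: replace_mgmt <resource-id> <new-endpoint-list>
replace_mgmt() {
    resource_id=$1
    new_endpoints=$2
    case $resource_id in
        mgmt:*"|"*) ;;   # expected shape: mgmt:<endpoint-list>|<other fields>
        *) echo "not a ResourceID: $resource_id" >&2; return 1 ;;
    esac
    rest=${resource_id#*"|"}   # everything after the first '|'
    printf 'mgmt:%s|%s\n' "$new_endpoints" "$rest"
}
```

Because the function always writes the full new list, running it twice with the same endpoint list yields the same ResourceID, which is the property a patching tool needs in order to be safe to re-run.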
When the Lightbits cluster is expanded (i.e., servers are added to or changed in an existing cluster), the management-endpoint list must be updated so that the lb-csi-plugin can access any of the Lightbits api-servers.
Existing resources cannot simply be edited to carry the new list, because the PersistentVolume.spec.volumeHandle and VolumeSnapshotContent.status.snapshotHandle fields are immutable and cannot be changed after the resource is created.
To mitigate this problem, we provide a one-shot script that accesses a Kubernetes cluster via standard kubectl calls and patches the following resources with the updated information:
- StorageClass - will modify the parameters.mgmt-endpoint field with the new endpoint list.
- PersistentVolume - will modify the resource spec.volumeHandle by using the kubectl replace call.
- VolumeSnapshotContent - will replace the resource's spec.source.volumeHandle with the new updated spec.source.snapshotHandle, which will contain the new endpoint list.
This reliance on the ResourceID will be removed in future versions of lb-csi-plugin, once the Lightbits cluster supports a VIP. All resources will then point to a single endpoint, which will not change during Lightbits cluster updates.
The script is intended to be idempotent and should be safe to run on resources that were already updated.
Usage
Below is the help output of the patcher script that we provide for patching the existing resources in the cluster:

    lightos-patcher.sh --help
    Usage: lightos-patcher.sh [-s <storage_class>] [-e <endpoints>] [-d <backup_directory>]
      -v <storage_class>           name of the storage class and all related PVs to update
      -s <snapshot_storage_class>  name of the snapshot storage class and all related SnapshotContents to update
      -e <endpoints>               new endpoint list in the form of: <host:port>,<host:port>,...
      -d <backup_directory>        folder to backup before and after resources

    Examples:
      Suppose we have LightOS Cluster los1 with the following mgmt-endpoints:
        192.168.17.2:443,192.168.18.3:443,192.168.20.4:443
      After extending this cluster by adding a new server (192.168.20.5:443) we will have the following mgmt-endpoints:
        192.168.17.2:443,192.168.18.3:443,192.168.20.4:443,192.168.20.5:443

      # patch example-sc StorageClass and all PVs related to that StorageClass
      ./lightos-patcher.sh -v example-sc -e 192.168.17.2:443,192.168.18.3:443,192.168.20.4:443,192.168.20.5:443 -d ~/backup

      # patch example-sc VolumeSnapshotClass and all VolumeSnapshotContents related to that class
      ./lightos-patcher.sh -s example-snap-sc -e 192.168.17.2:443,192.168.18.3:443,192.168.20.4:443,192.168.20.5:443 -d ~/backup

The order of the commands should be:
- Apply the script against all StorageClasses with the -v option. Verify that all StorageClass and PVs are updated.
- If there are VolumeSnapshots on the cluster, apply the script with the -s option.
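Since a malformed -e value would end up inside every patched resource, it may be worth checking the endpoint list before running the patcher. The following is a minimal sketch of such a pre-flight check; the function name valid_endpoints is hypothetical, and the check assumes hosts are IPv4 addresses or DNS names (it does not handle IPv6 literals).

```shell
# Hypothetical pre-flight check (not part of lightos-patcher.sh): succeed only
# if the argument looks like <host:port>,<host:port>,... with numeric ports.
# Assumes IPv4 addresses or DNS host names; IPv6 literals are rejected.
valid_endpoints() {
    printf '%s\n' "$1" |
        grep -Eq '^[A-Za-z0-9.-]+:[0-9]+(,[A-Za-z0-9.-]+:[0-9]+)*$'
}
```

It could be combined with the patcher call, for example: valid_endpoints "$endpoints" && ./lightos-patcher.sh -v example-sc -e "$endpoints" -d ~/backup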
Avoid operations that might access PV, PVC, or VolumeSnapshot resources while this script is running. Operations like kubectl replace delete the resource and recreate it with different values. As a result, the StorageClass may be temporarily inaccessible.
On a cluster that has existing PVs before expanding the Lightbits cluster, run the following:

    ./lightos-patcher.sh -v <storage-class-name> -e <new-comma-separated-endpoint-list> -d <backup-folder>

This command will:
- Patch the StorageClass.Parameters.mgmt-endpoint with the new-comma-separated-endpoint-list value.
- Look up all PVs in the StorageClass, and patch the PersistentVolume.spec.volumeHandle value with the new-comma-separated-endpoint-list.
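One way to verify that the PVs were updated is to compare the mgmt field of each volume handle against the new endpoint list. The sketch below is an assumption about how such a check could look, not something the patcher provides: stale_handles is a hypothetical name, and it reads "<name> <handle>" pairs from standard input so that it can be fed from kubectl. Note that in the Kubernetes API the handle of a CSI-backed PV is stored under spec.csi.volumeHandle.

```shell
# Hypothetical check (not part of lightos-patcher.sh): read "<name> <handle>"
# pairs on stdin and print the names whose mgmt field differs from $1.
stale_handles() {
    awk -v want="mgmt:$1" '{
        split($2, fields, "[|]")   # fields[1] is "mgmt:<endpoint-list>"
        if (fields[1] != want) print $1
    }'
}

# Assumed usage against a live cluster, for the StorageClass example-sc:
# kubectl get pv -o jsonpath='{range .items[?(@.spec.storageClassName=="example-sc")]}{.metadata.name}{" "}{.spec.csi.volumeHandle}{"\n"}{end}' \
#     | stale_handles "$new_endpoints"
```

An empty result means every listed PV already carries the new endpoint list.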
On a cluster that has VolumeSnapshots created before expanding the Lightbits cluster, run the following:

    ./lightos-patcher.sh -s <snapshot-storage-class-name> -e <new-comma-separated-endpoint-list> -d <backup-folder>

This command will:
- Look up all VolumeSnapshotContents in this VolumeSnapshotClass and replace the VolumeSnapshotContent.spec.source.volumeHandle value with the VolumeSnapshotContent.spec.source.snapshotHandle.
The VolumeSnapshotContent.Status.restoreSize field will be zeroed out. It is a property calculated using ListSnapshots, which is not implemented, and it cannot be modified via the API.
The value remains valid under VolumeSnapshot.Status.restoreSize, so an attempt to restore this snapshot into a PVC with a smaller size still fails, as expected, with the following error:

    requested volume size 1073741824 is less than the size 2147483648 for the source snapshot s1