Extending the Lightbits Cluster
The lb-csi-plugin is stateless and holds no persistent information between operations.
Kubernetes API calls that invoke the lb-csi plugin (using Lightbits storage) require the list of Lightbits management endpoints so that the plugin can access the Lightbits cluster.
Today, this management endpoint list is passed to the plugin via the StorageClass.Parameters.mgmt-endpoint field.
The CSI API does not pass the request Parameters to the plugin in some types of API calls. As a result, there are situations where the plugin needs to access the Lightbits API but does not have the information required to do so.
To overcome this limitation, we defined an internal resource named ResourceID, which holds this information as the resource ID passed to Kubernetes.
ResourceID is used to represent the PersistentVolume.spec.volumeHandle and VolumeSnapshotContent.status.snapshotHandle fields.
ResourceID is guaranteed to be passed on every API call from CSI, meaning that we can use it to hold limited state.
The ResourceID format is: mgmt:<host>:<port>[,<host>:<port>...]|nguid:<nguid>|proj:<proj>|scheme:<grpc|grpcs>
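To make the format concrete, here is a minimal sketch of how a ResourceID could be split into its fields. The helper name resource_id_field and the sample values (NGUID, project name) are made up for illustration and are not part of lb-csi-plugin.

```shell
# Hypothetical helper (not part of lb-csi-plugin): print one field of a ResourceID.
# Usage: resource_id_field <resource-id> <key>   (key: mgmt, nguid, proj or scheme)
resource_id_field() {
    # Put each '|'-separated segment on its own line, then print the value
    # of the segment that starts with "<key>:".
    printf '%s\n' "$1" | tr '|' '\n' | sed -n "s/^$2://p"
}

# Sample ResourceID with made-up values.
rid='mgmt:192.168.17.2:443,192.168.18.3:443|nguid:6bb32fb5-99aa-4a4c-a4e7-30b7787bbd66|proj:default|scheme:grpcs'

resource_id_field "$rid" mgmt     # prints 192.168.17.2:443,192.168.18.3:443
resource_id_field "$rid" scheme   # prints grpcs
```

Because only the first "<key>:" prefix is stripped, the colons inside each host:port pair are preserved.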
When the Lightbits cluster is expanded (i.e., servers are added to or changed in an existing cluster), the management endpoint list must be updated so that the lb-csi-plugin can access any of the Lightbits API servers.
Existing resources do not pick up the new list on their own, because the PersistentVolume.spec.volumeHandle and VolumeSnapshotContent.status.snapshotHandle fields are immutable and cannot be changed after the resource is created.
To mitigate this problem, we provide a one-shot script that accesses a Kubernetes cluster via standard kubectl calls and patches the following resources with the updated information:
- StorageClass - will modify the parameters.mgmt-endpoint field with the new endpoint list.
- PersistentVolume - will modify the resource's spec.volumeHandle by using the kubectl replace call.
- VolumeSnapshotContent - will replace the resource's spec.source.volumeHandle with the new updated spec.source.snapshotHandle, which will contain the new endpoint list.
This behavior of relying on ResourceID will be fixed in future versions of lb-csi-plugin, once the Lightbits cluster supports VIP. All resources will then point to a single endpoint, which will not change during Lightbits cluster updates.
This script should be idempotent, and should be safe to run on resources that were already updated.
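The idempotency property can be illustrated with a small sketch of the handle rewrite. This is not the patcher's actual code: the function name replace_mgmt_endpoints and the sample NGUID and project are assumptions, and the sketch only assumes the ResourceID format described above.

```shell
# Hypothetical helper (not the patcher's actual code): swap the mgmt segment
# of a ResourceID for a new endpoint list, leaving nguid/proj/scheme untouched.
# Usage: replace_mgmt_endpoints <resource-id> <endpoints>
replace_mgmt_endpoints() {
    # The mgmt segment runs from the leading "mgmt:" up to the first '|'.
    # Assumes the endpoint list contains no '/', '&' or '\' characters.
    printf '%s\n' "$1" | sed "s/^mgmt:[^|]*/mgmt:$2/"
}

# Sample values; the NGUID and project are made up.
old='mgmt:192.168.17.2:443,192.168.18.3:443,192.168.20.4:443|nguid:6bb32fb5-99aa-4a4c-a4e7-30b7787bbd66|proj:default|scheme:grpcs'
new_eps='192.168.17.2:443,192.168.18.3:443,192.168.20.4:443,192.168.20.5:443'

once=$(replace_mgmt_endpoints "$old" "$new_eps")
twice=$(replace_mgmt_endpoints "$once" "$new_eps")

# Rewriting an already-updated handle yields the same handle.
[ "$once" = "$twice" ] && echo "idempotent"   # prints idempotent
```

Since the whole mgmt segment is overwritten rather than appended to, applying the same endpoint list a second time changes nothing.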
Usage
Below is the help output of the patcher script we provide for patching the existing resources in the cluster:
lightos-patcher.sh --help
Usage: lightos-patcher.sh [-s <storage_class>] [-e <endpoints>] [-d <backup_directory>]
-v <storage_class> name of the storage class and all related PVs to update
-s <snapshot_storage_class> name of the snapshot storage class and all related SnapshotContents to update
-e <endpoints> new endpoint list in the form of: <host:port>,<host:port>,...
-d <backup_directory> folder to backup before and after resources
Examples:
Suppose we have LightOS Cluster los1 with the following mgmt-endpoints:
192.168.17.2:443,192.168.18.3:443,192.168.20.4:443
After extending this cluster by adding a new server (192.168.20.5:443) we will have the following mgmt-endpoints:
192.168.17.2:443,192.168.18.3:443,192.168.20.4:443,192.168.20.5:443
# patch example-sc StorageClass and all PVs related to that StorageClass
./lightos-patcher.sh -v example-sc -e 192.168.17.2:443,192.168.18.3:443,192.168.20.4:443,192.168.20.5:443 -d ~/backup
# patch example-snap-sc VolumeSnapshotClass and all VolumeSnapshotContents related to that class
./lightos-patcher.sh -s example-snap-sc -e 192.168.17.2:443,192.168.18.3:443,192.168.20.4:443,192.168.20.5:443 -d ~/backup
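Before running commands like the ones above, it can help to sanity-check the value passed to -e. The following is a minimal sketch, assuming the <host:port>,<host:port>,... form from the help output. The function name validate_endpoints is hypothetical, the check is not something the patcher is known to perform, and bracketed IPv6 addresses are not handled.

```shell
# Hypothetical pre-flight check (not part of lightos-patcher.sh): succeed only
# if the argument is a non-empty, comma-separated list of <host>:<port> items
# with numeric ports. Bracketed IPv6 addresses are not supported by this sketch.
validate_endpoints() {
    printf '%s\n' "$1" | grep -Eq '^[^,:|]+:[0-9]+(,[^,:|]+:[0-9]+)*$'
}

endpoints='192.168.17.2:443,192.168.18.3:443,192.168.20.4:443,192.168.20.5:443'
if validate_endpoints "$endpoints"; then
    echo "endpoint list looks well-formed"
else
    echo "malformed endpoint list: $endpoints" >&2
fi
```

A list with a missing port, such as 192.168.17.2,192.168.18.3:443, is rejected by this check.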
The order of the commands should be:
- Apply the script against all StorageClasses with the -v option. Verify that all StorageClasses and PVs are updated.
- If there are VolumeSnapshots on the cluster, apply the script with the -s option.
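One way to carry out the verification mentioned in the first step is sketched below. The function name stale_pvs is hypothetical. The sketch assumes the ResourceID format described earlier and assumes that, in the Kubernetes API, the handle of a CSI-backed PV is found under spec.csi.volumeHandle.

```shell
# Hypothetical verification helper (not part of lightos-patcher.sh).
# Reads "<pv-name> <volumeHandle>" lines on stdin and prints the names of the
# PVs whose handle does not start with the expected mgmt endpoint list.
stale_pvs() {
    expected=$1
    while read -r name handle; do
        case $handle in
            "mgmt:$expected|"*) ;;        # handle already carries the new list
            *) printf '%s\n' "$name" ;;   # stale, or not a Lightbits ResourceID
        esac
    done
}

# Possible usage against a live cluster (example-sc is the StorageClass from
# the examples above):
#   kubectl get pv -o jsonpath='{range .items[?(@.spec.storageClassName=="example-sc")]}{.metadata.name}{" "}{.spec.csi.volumeHandle}{"\n"}{end}' \
#     | stale_pvs '192.168.17.2:443,192.168.18.3:443,192.168.20.4:443,192.168.20.5:443'
```

No output means that every listed PV already carries the new endpoint list.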
Avoid operations that might access PV, PVC, or VolumeSnapshot resources while this script is running. Operations like replace delete and recreate the resource with different values. As a result, the StorageClass may temporarily be inaccessible.
On a cluster that has existing PVs before expanding the Lightbits cluster, run the following:
./lightos-patcher.sh -v <storage-class-name> -e <new-comma-separated-endpoint-list> -d <backup-folder>
This command will:
- Patch the StorageClass.Parameters.mgmt-endpoint with the new-comma-separated-endpoint-list value.
- Look up all PVs in the StorageClass, and patch the PersistentVolume.spec.volumeHandle value with the new-comma-separated-endpoint-list.
On a cluster that has VolumeSnapshots created before expanding the Lightbits cluster, run the following:
./lightos-patcher.sh -s <snapshot-class-name> -e <new-comma-separated-endpoint-list> -d <backup-folder>
This command will:
- Look up all VolumeSnapshotContents in this VolumeSnapshotClass and replace the VolumeSnapshotContent.spec.source.volumeHandle value with the VolumeSnapshotContent.spec.source.snapshotHandle.
The VolumeSnapshotContent.Status.restoreSize field will be zeroed out, because it is a property calculated using ListSnapshots, which is not implemented, and it cannot be modified via the API.
The value remains valid under VolumeSnapshot.Status.restoreSize, so an attempt to restore this snapshot into a PVC of a smaller size will still fail, as expected, with the following error:
requested volume size 1073741824 is less than the size 2147483648 for the source snapshot s1