Volume Management
Understanding a Volume Location in the Lightbits Cluster
Creating and managing volumes is the core of working with block devices. This section explains volumes in a Lightbits cluster, including how to identify where a volume is placed and how a node failure affects the volume's protection state.
You can use the lbcli get volume command to see which nodes in the cluster hold the data of a specific volume. Note that a volume's placement can change during its life cycle due to dynamic rebalancing.
Sample Command
$ lbcli -J $LIGHTOS_JWT get volume --uuid=48dbbe54-1548-4444-866e-6438d4877e5f
Note: A Project Admin should add --project-name=<project-name> to this command.
Sample Output
name vol_3
rebuildProgress None
UUID 48dbbe54-1548-4444-866e-6438d4877e5f
ACL
  values: hostnqn1
nsid 4
size "107374182400"
nodeList
  9a625dbf-de1b-4211-b9d3-0bdf70faa5f5
  9f1a3a85-5be5-4f98-a44e-4271fbdbe7cc
  a7f67ad0-1f3d-49cb-9650-b82042378014
protectionState FullyProtected
replicaCount 3
state Created
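If you script around this output, a minimal sketch (not part of the lbcli tooling) can pull out the protection state with awk. The here-doc below stands in for real captured output like the sample above:

```shell
# Sketch: extract a volume's protection state from saved `lbcli get volume`
# output. The here-doc is a stand-in for real command output.
out=$(cat <<'EOF'
name vol_3
UUID 48dbbe54-1548-4444-866e-6438d4877e5f
protectionState FullyProtected
replicaCount 3
state Created
EOF
)
# Print the second field of the protectionState line.
state=$(printf '%s\n' "$out" | awk '$1 == "protectionState" {print $2}')
echo "$state"   # FullyProtected
```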
You can also run the nvme list-subsys command on the application server against a specific block device that corresponds to a volume on the cluster.
Sample Command
nvme list-subsys nvme0n1
Sample Output
nvme-subsys0 -
NQN=nqn.2014-08.org.nvmexpress:NVMf:uuid:b550533c-8bb0-46df-9019-cd4c25c6e6e7
+- nvme0 tcp traddr=10.18.38.4 trsvcid=4420 live
+- nvme1 tcp traddr=10.18.38.5 trsvcid=4420 live
+- nvme2 tcp traddr=10.18.38.7 trsvcid=4420 live inaccessible
+- nvme3 tcp traddr=10.18.38.8 trsvcid=4420 live optimized
+- nvme4 tcp traddr=10.18.38.29 trsvcid=4420 live inaccessible
This command's output includes the IP address of the cluster's primary node for the volume (marked optimized), the secondary nodes (marked inaccessible), and the other nodes that do not hold data for that volume (block device nvme0n1).
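To locate the primary path programmatically, a small parsing sketch (an assumption, not a Lightbits tool) can scan the output for the optimized path and print its traddr. The here-doc stands in for real `nvme list-subsys` output like the sample above:

```shell
# Sketch: find the IP address of the path NVMe multipathing reports as
# optimized (the primary node for this volume).
subsys=$(cat <<'EOF'
+- nvme0 tcp traddr=10.18.38.4 trsvcid=4420 live
+- nvme1 tcp traddr=10.18.38.5 trsvcid=4420 live
+- nvme2 tcp traddr=10.18.38.7 trsvcid=4420 live inaccessible
+- nvme3 tcp traddr=10.18.38.8 trsvcid=4420 live optimized
+- nvme4 tcp traddr=10.18.38.29 trsvcid=4420 live inaccessible
EOF
)
# On the optimized line, strip the "traddr=" prefix and print the address.
primary=$(printf '%s\n' "$subsys" | awk '/optimized/ {
  for (i = 1; i <= NF; i++) if ($i ~ /^traddr=/) { sub("traddr=", "", $i); print $i }
}')
echo "$primary"   # 10.18.38.8
```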
Volume Placement
With Volume Placement, you can specify which failure domain a volume is placed on when it is created. The idea behind volume placement is to statically define, separate, and manage how volumes and their data tolerate failures and remain available through the failure of a server, rack, row, power grid, etc. By default, Lightbits defines server-level failure domains.
Note that this feature is associated with the create volume command.
Current feature limitations of Volume Placement include the following:
- Available only for single replica volumes.
- Only failure domain labels are currently matched (e.g., fd:server00).
- Up to 25 affinities can be specified.
- Value (failure domain name) is limited to up to 100 characters.
- Volume Placement cannot be specified for a clone (a clone is always placed on the same nodes as the parent volume/snapshot).
- Dynamic Rebalancing must be disabled.
- The entire Lightbits cluster must be upgraded to at least release version 2.3.8.
In Lightbits 2.3.8 and above, a new flag is available - 'placement-affinity' - which can be used as follows:
Sample Command
$ lbcli -J $JWT create volume --name=vol1 --acl=acl1 --size="4 GiB" --replica-count=1 --placement-affinity="fd:Server0|fd:rack1|fd:rack0"
In the example above, you can ask the system to place the volume as follows:
- On a node that includes Server0 in its failure domains. Note that the server name is used as the default failure domain configuration in node-manager.yaml. You are responsible for deciding whether you want anything other than the default yaml that Lightbits provides.
- On a node that includes rack1 in its failure domains.
- On a node that includes rack0 in its failure domains.
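The affinity string is an ordered, "|"-separated list of "fd:" labels, so it can be assembled from a list of failure domains in a shell script. The sketch below is an illustration only, using the hypothetical domain names from the example above:

```shell
# Sketch: build a --placement-affinity value from an ordered list of
# failure domains ("fd:" prefix, "|" separator, up to 25 entries).
domains=("fd:Server0" "fd:rack1" "fd:rack0")

# Join the array elements with "|" using a subshell-local IFS.
affinity=$(IFS='|'; echo "${domains[*]}")
echo "$affinity"   # fd:Server0|fd:rack1|fd:rack0
```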
If Lightbits cannot find active nodes with failure domains that match the volume placement request, the create volume command fails and the volume is not placed on other nodes.
Node Failure and Volume Protection State
The Lightbits cluster software continuously monitors the cluster nodes' health and connectivity and responds to changes in the nodes’ status. In AWS, the Auto Healing feature will take preventive measures in case of AWS notification of an instance outage, and remedial measures on abrupt failure. See the Auto Maintenance Overview section for additional information.
If a node fails, volumes that have data stored on that node can be affected. For a volume with a replication factor of 3, a single node failure may cause the volume protection state to become Degraded. If another node fails, the volume’s state may become ReadOnly.
Although not recommended in AWS, RF2 volumes become ReadOnly after a single node failure, and RF1 volumes become Inaccessible if the node holding their data fails.
In case all nodes that hold a volume’s replica fail, the volume becomes Inaccessible.
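The rules above can be summarized as a small decision function. This is an illustrative sketch of the mapping described in the text, not product code; `expected_state` is a hypothetical helper name:

```shell
# Sketch: expected volume protection state given the replica count and
# the number of failed nodes holding that volume's data (per the text above).
expected_state() {
  local replicas=$1 failed=$2
  if [ "$failed" -eq 0 ]; then echo FullyProtected          # no replica lost
  elif [ "$failed" -ge "$replicas" ]; then echo Inaccessible # all replicas lost
  elif [ "$((replicas - failed))" -eq 1 ]; then echo ReadOnly # one replica left
  else echo Degraded                                         # redundancy reduced
  fi
}
expected_state 3 1   # Degraded
expected_state 3 2   # ReadOnly
expected_state 1 1   # Inaccessible
```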
You can view a volume's protection state by issuing the lbcli list volumes command.
Sample Command
$ lbcli -J $LIGHTOS_JWT --project-name=a list volumes
Sample Output
Name  UUID      Protection State  State    Size     Replicas  ACL
vol1  76c3eae8  FullyProtected    Created  200 GiB  3         values:"acl1"
vol2  3f3c3ad2  Degraded          Created  200 GiB  3         values:"acl2"
vol3  8700cba8  ReadOnly          Created  200 GiB  1         values:"acl3"
As you can see in the output, vol2 and vol3 are not in a FullyProtected volume protection state.
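On a large cluster it can help to filter the listing down to volumes that need attention. The sketch below (an assumption, not an lbcli feature) filters saved `lbcli list volumes` output for anything not FullyProtected; the here-doc stands in for real output with the header removed:

```shell
# Sketch: list volumes whose protection state is not FullyProtected.
vols=$(cat <<'EOF'
vol1 76c3eae8 FullyProtected Created 200 GiB 3 values:"acl1"
vol2 3f3c3ad2 Degraded Created 200 GiB 3 values:"acl2"
vol3 8700cba8 ReadOnly Created 200 GiB 1 values:"acl3"
EOF
)
# Column 3 is the protection state; print name and state for the rest.
printf '%s\n' "$vols" | awk '$3 != "FullyProtected" {print $1, $3}'
# vol2 Degraded
# vol3 ReadOnly
```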
Now, you can use the lbcli list nodes command (a Cluster Admin command) to identify which node has failed. In this command's output, the State column shows each node's state (for example, Active or Inactive):
Sample Command
$ lbcli -J $LIGHTOS_JWT list nodes
Sample Output
NAME UUID State NVME-Endpoint
server00-0 192af7c0-d39f-4872-b849-7eb3dc0f7b53 Active 10.23.26.13:4420
server01-0 1f4ef0ce-0634-47c7-9e5f-d4fd910ff376 Active 10.23.26.8:4420
server02-0 6d9b8337-18cd-4b14-bea1-f56aca213d68 Inactive 10.23.26.4:4420
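To single out the failed node from this listing, a parsing sketch like the one below (again, an assumption rather than an lbcli feature) flags any node whose state is not Active; the here-doc stands in for real `lbcli list nodes` output with the header removed:

```shell
# Sketch: identify failed nodes from saved `lbcli list nodes` output.
nodes=$(cat <<'EOF'
server00-0 192af7c0-d39f-4872-b849-7eb3dc0f7b53 Active 10.23.26.13:4420
server01-0 1f4ef0ce-0634-47c7-9e5f-d4fd910ff376 Active 10.23.26.8:4420
server02-0 6d9b8337-18cd-4b14-bea1-f56aca213d68 Inactive 10.23.26.4:4420
EOF
)
# Column 3 is the node state; report name and state for non-Active nodes.
printf '%s\n' "$nodes" | awk '$3 != "Active" {print $1, $3}'
# server02-0 Inactive
```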