Identifying a Node Failure in a Lightbits Cluster
The Lightbits cluster software continuously monitors the health and connectivity of the cluster nodes and responds to changes in node status.
If a node fails, volumes that have data stored on that node can be affected. For a volume with a replication factor of 3, a single node failure may cause the volume’s protection state to become Degraded. If a second node fails, the volume’s state may become ReadOnly.
For a volume with a replication factor of 2, a single node failure may cause the volume to become ReadOnly.
If all nodes that hold a volume’s replicas fail, the volume becomes Inaccessible.
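The rules above can be summarized as a small decision function. This is a simplified model inferred only from the cases described in this section, not Lightbits’ actual implementation. It assumes that `failed_replica_nodes` counts only failed nodes that hold one of the volume’s replicas (which is why the text says a failure "may" change the state), and the function name `expected_protection_state` is hypothetical.

```python
def expected_protection_state(replication_factor: int, failed_replica_nodes: int) -> str:
    """Hypothetical helper modelling the protection states described above.

    replication_factor: number of replicas the volume has (for example 2 or 3).
    failed_replica_nodes: how many of the nodes holding those replicas have failed.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if not 0 <= failed_replica_nodes <= replication_factor:
        raise ValueError("failed_replica_nodes must be between 0 and replication_factor")

    surviving = replication_factor - failed_replica_nodes
    if failed_replica_nodes == 0:
        return "FullyProtected"  # every replica is available
    if surviving == 0:
        return "Inaccessible"    # every node holding a replica has failed
    if surviving == 1:
        return "ReadOnly"        # only one replica is left
    return "Degraded"            # replicas were lost, but more than one survives


for rf in (3, 2):
    for failed in range(rf + 1):
        print(f"replicas={rf} failed={failed} -> {expected_protection_state(rf, failed)}")
```

With three replicas, one failure yields Degraded and two yield ReadOnly; with two replicas, one failure already yields ReadOnly, matching the cases listed above.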
You can view a volume’s protection state by running the lbcli list volumes command.
Sample Command
$ lbcli list volumes
If the JWT is not stored in the lbcli configuration file, pass it with the -J flag:
$ lbcli -J $JWT --project-name=<Project Name> list volumes
Sample Output
Name  UUID      Protection State  State    Size     Replicas  ACL
vol1  76c3eae8  FullyProtected    Created  200 GiB  3         values:"acl1"
vol2  3f3c3ad2  Degraded          Created  200 GiB  3         values:"acl2"
vol3  8700cba8  ReadOnly          Created  200 GiB  2         values:"acl3"
In this output, vol2 (Degraded) and vol3 (ReadOnly) are not in the FullyProtected protection state.
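When a cluster has many volumes, the list can be filtered programmatically. The following Python sketch assumes the whitespace-separated column layout of the sample output above, with Protection State as the third field; the function name `unprotected_volumes` is hypothetical, and the output format of other lbcli versions may differ. On a live system, the text could be captured with `subprocess.run(["lbcli", "list", "volumes"], capture_output=True, text=True, check=True).stdout`.

```python
from typing import List, Tuple


def unprotected_volumes(list_volumes_output: str) -> List[Tuple[str, str]]:
    """Return (name, protection_state) for every volume that is not FullyProtected.

    Assumes the column order of the sample `lbcli list volumes` output:
    Name, UUID, Protection State, State, Size, Replicas, ACL (header row first).
    """
    result = []
    for line in list_volumes_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or truncated lines
        name, protection_state = fields[0], fields[2]
        if protection_state != "FullyProtected":
            result.append((name, protection_state))
    return result


VOLUMES_SAMPLE = """\
Name UUID Protection State State Size Replicas ACL
vol1 76c3eae8 FullyProtected Created 200 GiB 3 values:"acl1"
vol2 3f3c3ad2 Degraded Created 200 GiB 3 values:"acl2"
vol3 8700cba8 ReadOnly Created 200 GiB 2 values:"acl3"
"""

print(unprotected_volumes(VOLUMES_SAMPLE))  # [('vol2', 'Degraded'), ('vol3', 'ReadOnly')]
```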
Next, use the lbcli list nodes command to identify which node has failed. The command output shows each node in one of the following states:
Node state | Description |
---|---|
Activating | Node is being activated and is currently unable to serve IOs. This state can occur after a node is reconnected to the network, coming up from reboot, or recovering from any other failure state. After the activation is complete, the node’s state transitions to Active. |
Active | Node is active and can serve IOs. |
Deactivating | Node failure is detected and the Lightbits cluster software is changing the roles of other nodes in the cluster to keep data accessible. |
Inactive | Node is inactive. |
Sample Command
$ lbcli list nodes
If the JWT is not stored in the lbcli configuration file, pass it with the -J flag after lbcli:
$ lbcli -J $JWT list nodes
Sample Output
NAME        UUID                                  State     NVME-Endpoint
server00-0  192af7c0-d39f-4872-b849-7eb3dc0f7b53  Active    10.23.26.13:4420
server01-0  1f4ef0ce-0634-47c7-9e5f-d4fd910ff376  Active    10.23.26.8:4420
server02-0  6d9b8337-18cd-4b14-bea1-f56aca213d68  Inactive  10.23.26.4:4420
server03-0  912736af-bbc5-45c5-ba22-901eea9f9fde  Active    10.23.26.29:4420
server04-0  dc3ee1b5-0625-4a4c-b627-76fbd66db74c  Active    10.23.26.7:4420
server05-0  e157c1a2-701b-403b-bb73-e0c1f4be0096  Active    10.23.26.5:4420
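The same filtering approach can be applied to the node list. This sketch uses a hypothetical helper, `nodes_not_active`, and assumes the NAME, UUID, State, NVME-Endpoint column order of the sample output above. It reports every node that is not Active; for the sample output, that is only server02-0.

```python
from typing import Dict, List


def nodes_not_active(list_nodes_output: str) -> List[Dict[str, str]]:
    """Return every node from `lbcli list nodes` output whose state is not Active.

    Assumes whitespace-separated columns in the order shown in the sample
    output: NAME, UUID, State, NVME-Endpoint (header row first).
    """
    nodes = []
    for line in list_nodes_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        name, node_uuid, state, endpoint = fields[:4]
        # Only Active is described as able to serve IOs; Activating,
        # Deactivating and Inactive nodes are all worth investigating.
        if state != "Active":
            nodes.append({"name": name, "uuid": node_uuid, "state": state, "endpoint": endpoint})
    return nodes


NODES_SAMPLE = """\
NAME UUID State NVME-Endpoint
server00-0 192af7c0-d39f-4872-b849-7eb3dc0f7b53 Active 10.23.26.13:4420
server01-0 1f4ef0ce-0634-47c7-9e5f-d4fd910ff376 Active 10.23.26.8:4420
server02-0 6d9b8337-18cd-4b14-bea1-f56aca213d68 Inactive 10.23.26.4:4420
server03-0 912736af-bbc5-45c5-ba22-901eea9f9fde Active 10.23.26.29:4420
server04-0 dc3ee1b5-0625-4a4c-b627-76fbd66db74c Active 10.23.26.7:4420
server05-0 e157c1a2-701b-403b-bb73-e0c1f4be0096 Active 10.23.26.5:4420
"""

for node in nodes_not_active(NODES_SAMPLE):
    print(node["name"], node["state"], node["endpoint"])  # server02-0 Inactive 10.23.26.4:4420
```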