Identifying a Node Failure in a Lightbits Cluster

The Lightbits cluster software continuously monitors the cluster nodes' health and connectivity and responds to changes in the nodes’ status.

If a node fails, volumes that have data stored on that node can be affected. For a volume with a replication factor of 3, a single node failure may cause the volume protection state to become Degraded. If another node fails, the volume’s state may become ReadOnly.

For a volume with a replication factor of 2, a single node failure may cause the volume to become ReadOnly.

If all of the nodes that hold a volume’s replicas fail, the volume becomes Inaccessible.
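The rules above can be summarized as a small lookup. The following is a minimal sketch, not part of the Lightbits software: the function name `worst_case_protection_state` and its parameters are hypothetical, and it returns the most severe state the text describes for each case (the text says a failure "may" cause the transition). Only replication factors 2 and 3 are covered, because those are the only cases described here.

```python
# Illustrative sketch only; names are hypothetical and not part of lbcli.
# Encodes the worst-case volume protection state described in the text
# for replication factors 2 and 3.

# (replication_factor, failed_replica_nodes) -> worst-case state
_PARTIAL_FAILURE_STATES = {
    (3, 1): "Degraded",
    (3, 2): "ReadOnly",
    (2, 1): "ReadOnly",
}


def worst_case_protection_state(replication_factor: int, failed_replica_nodes: int) -> str:
    """Return the worst-case protection state for a volume.

    failed_replica_nodes counts only failed nodes that hold a replica
    of this volume.
    """
    if replication_factor not in (2, 3):
        raise ValueError("only replication factors 2 and 3 are described")
    if not 0 <= failed_replica_nodes <= replication_factor:
        raise ValueError("failed_replica_nodes is out of range")
    if failed_replica_nodes == 0:
        return "FullyProtected"
    if failed_replica_nodes == replication_factor:
        # Every node holding a replica has failed.
        return "Inaccessible"
    return _PARTIAL_FAILURE_STATES[(replication_factor, failed_replica_nodes)]


if __name__ == "__main__":
    for rf in (2, 3):
        for failed in range(rf + 1):
            print(rf, failed, worst_case_protection_state(rf, failed))
```

Treat the result as an upper bound on severity: the actual state reported by the cluster depends on which nodes hold the volume's replicas.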

You can view the volume’s protection state by issuing the lbcli list volumes command.

Sample Command


In this command, the JWT is not stored in the lbcli configuration file:


Sample Output


As you can see in the output, vol2 and vol3 are not in a FullyProtected volume protection state.

Next, use the lbcli list nodes command to identify which node has failed. The command’s output shows each node in one of the following states:

Node state | Description
Activating | The node is being activated and is currently unable to serve IOs. This state can occur after a node reconnects to the network, comes up after a reboot, or recovers from any other failure state. After the activation is complete, the node’s state transitions to Active.
Active | The node is active and can serve IOs.
Deactivating | A node failure has been detected, and the Lightbits cluster software is changing the roles of other nodes in the cluster to keep data accessible.
Inactive | The node is inactive.
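As a rough illustration of how the states in the table might be used to single out a failed node, the sketch below flags every node whose state is anything other than Active. The function name, the node names, and the dictionary input are all hypothetical; this is not the format that lbcli list nodes prints, and the states would need to be copied from the real command output.

```python
# Illustrative sketch only; input format and names are hypothetical.
# The four state names come from the node state table in the text.
KNOWN_NODE_STATES = {"Activating", "Active", "Deactivating", "Inactive"}


def nodes_not_active(node_states: dict) -> list:
    """Return the sorted names of nodes whose state is not Active."""
    unknown = {s for s in node_states.values() if s not in KNOWN_NODE_STATES}
    if unknown:
        raise ValueError(f"unrecognized node state(s): {sorted(unknown)}")
    return sorted(name for name, state in node_states.items() if state != "Active")


if __name__ == "__main__":
    # Hypothetical node names and states, typed in by hand.
    example = {
        "server00": "Active",
        "server01": "Inactive",
        "server02": "Active",
    }
    print(nodes_not_active(example))
```

A node in the Activating state also appears in the result, even though it is recovering rather than failed, so the list is a starting point for investigation rather than a definitive list of failures.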

Sample Command


A -J flag after lbcli indicates that the JWT is not stored in the lbcli configuration file.

Sample Output
