Identifying a Failed SSD Drive

With erasure coding (EC) enabled, the Lightbits software continues to serve I/O without interruption when an SSD fails. When troubleshooting a drive failure, each drive reports one of three status values:

| Drive Status | Description |
|---|---|
| Healthy | The SSD is functioning properly. |
| Rebuilding | The SSD has failed and data reconstruction is in progress. |
| Failed | Data reconstruction has completed. You can remove the failed SSD and insert a new SSD. |

  1. Check the NVMe device status by running the lbcli list nvme-devices command to see whether any SSD has failed and is undergoing an EC rebuild.

Sample Command

The -J flag after lbcli is used when the JWT is not stored in the lbcli configuration file.

Sample Output

In this example, the output shows one NVMe SSD that has failed and is now undergoing data reconstruction.

Because this example does not use the --node-uuid or --server-uuid flags, the output shows all of the failed NVMe SSDs across the entire cluster. To limit the output to a specific node or server, add the corresponding flag. Once data reconstruction is complete and the SSD state changes to Failed, the SSD is no longer managed by any node and is no longer associated with a node UUID.
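The check in this step can be wrapped in a small shell helper that tallies drives by status. This is a minimal sketch, not part of the Lightbits tooling. It assumes that each device row in the default text output contains one of the literal status names Healthy, Rebuilding, or Failed, that -J takes the token as its argument, and that the token lives in an environment variable named LIGHTOS_JWT (a hypothetical name). The function names are also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: tally NVMe drive statuses from `lbcli list nvme-devices`.
# Assumptions (not confirmed by the documentation):
#   - each device row contains the literal word Healthy, Rebuilding, or Failed
#   - -J takes the JWT as its argument
#   - LIGHTOS_JWT is a hypothetical environment variable holding that JWT

# Run the listing. Extra arguments, such as --node-uuid <uuid> or
# --server-uuid <uuid>, are passed through unchanged.
list_nvme_devices() {
    if [ -n "${LIGHTOS_JWT:-}" ]; then
        lbcli -J "$LIGHTOS_JWT" list nvme-devices "$@"
    else
        lbcli list nvme-devices "$@"
    fi
}

# Print one "<status> <count>" line for each of the three status values.
summarize_drive_states() {
    local output state count
    output="$(list_nvme_devices "$@")" || return 1
    for state in Healthy Rebuilding Failed; do
        # grep -c prints 0 and exits non-zero when nothing matches,
        # so "|| true" keeps the tally going.
        count="$(printf '%s\n' "$output" | grep -cw -- "$state" || true)"
        printf '%s %s\n' "$state" "$count"
    done
}
```

Calling summarize_drive_states with no arguments covers the whole cluster, and calling it with --node-uuid or --server-uuid and a UUID narrows the tally to one node or server. A non-zero Rebuilding count means at least one drive is still being reconstructed.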

  2. To monitor the rebuild progress of a failed SSD, run the lbcli get node command with the --node-uuid flag, specifying the Lightbits node that manages the failed NVMe SSD.

Sample Command

The -J flag after lbcli is used when the JWT is not stored in the lbcli configuration file.
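As an illustration of this step, the following sketch wraps the lbcli get node call in a function that rejects a malformed UUID before contacting the cluster. It is a sketch under assumptions rather than documented usage: it assumes node UUIDs use the canonical 8-4-4-4-12 hexadecimal form, that -J takes the token as its argument, and that the token is kept in a hypothetical LIGHTOS_JWT environment variable. The names is_uuid and show_node are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: wrapper around `lbcli get node --node-uuid <uuid>`.
# Assumptions (not confirmed by the documentation):
#   - node UUIDs use the canonical 8-4-4-4-12 hexadecimal form
#   - -J takes the JWT as its argument
#   - LIGHTOS_JWT is a hypothetical environment variable holding that JWT

# Succeed only if the argument looks like a canonical UUID.
is_uuid() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Show the node that manages the failed SSD.
show_node() {
    if ! is_uuid "${1:-}"; then
        echo "usage: show_node <node-uuid>" >&2
        return 2
    fi
    if [ -n "${LIGHTOS_JWT:-}" ]; then
        lbcli -J "$LIGHTOS_JWT" get node --node-uuid "$1"
    else
        lbcli get node --node-uuid "$1"
    fi
}
```

Validating the UUID first keeps a typo from turning into a confusing error from the cluster. The node UUID itself comes from the lbcli list nvme-devices output for the drive that is rebuilding.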

Sample Output

  3. Recheck the NVMe device status with the lbcli list nvme-devices command to see whether the status of the failed SSD has changed from Rebuilding to Failed. If it has, the rebuild process is complete.

Sample Command

The -J flag after lbcli is used when the JWT is not stored in the lbcli configuration file.

Sample Output

To replace the failed device, follow the steps for Adding an NVMe SSD to a Lightbits Storage Server.
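The recheck in the last step of the procedure can be automated with a polling loop that returns once no drive reports Rebuilding, which is the point at which the failed SSD can be replaced. This is a minimal sketch under assumptions: it relies on the literal word Rebuilding appearing in the lbcli list nvme-devices text output, and POLL_INTERVAL and MAX_POLLS are hypothetical tuning variables, not Lightbits settings. If the JWT is not stored in the lbcli configuration file, the lbcli call would also need the -J flag.

```shell
#!/usr/bin/env bash
# Sketch only: wait until no NVMe drive reports the Rebuilding status.
# Assumptions (not confirmed by the documentation):
#   - a rebuilding drive shows the literal word Rebuilding in the output
#   - POLL_INTERVAL (seconds) and MAX_POLLS are hypothetical knobs
# Return codes: 0 = no drive is rebuilding, 1 = gave up after MAX_POLLS,
#               2 = the lbcli command itself failed.
wait_for_rebuild() {
    local interval="${POLL_INTERVAL:-30}"
    local max_polls="${MAX_POLLS:-120}"
    local polls=0
    local output
    while [ "$polls" -lt "$max_polls" ]; do
        # Extra arguments (for example --server-uuid <uuid>) pass through.
        output="$(lbcli list nvme-devices "$@")" || return 2
        if ! printf '%s\n' "$output" | grep -qw -- 'Rebuilding'; then
            return 0
        fi
        polls=$((polls + 1))
        sleep "$interval"
    done
    return 1
}
```

The loop tests for the absence of Rebuilding rather than the presence of Failed because, as noted earlier, a drive in the Failed state is no longer associated with a node UUID and might not appear in output filtered by node.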
