Dynamic Rebalancing on Capacity Usage

Dynamic rebalancing based on physical capacity usage is designed to keep the data placed on Lightbits storage instances evenly balanced across the cluster.

This feature can be enabled or disabled according to user requirements. When proactive rebalance mode is enabled, the cluster rebalances capacity automatically. This prevents scenarios in which one storage node reaches read-only status while other nodes in the cluster still have free space.

Test Purpose

The purpose of this test is to verify that this feature works as expected. Several conditions can trigger rebalancing. For additional information, refer to Dynamic Rebalancing in the Lightbits Administration Guide.

Test Steps

  1. Check the proactive-rebalance setting. If it is not enabled, enable it with the “lbcli enable feature-flag proactive-rebalance” command.
  2. Create several large volumes that together can consume 50% of one specific storage node’s physical capacity. You may need to create more volumes than you will use, so that you can select enough volumes whose replicas are located on the same two storage servers. You can then remove the volumes that do not qualify.
  3. On the client server, check the volume and multipath information to confirm that the volumes consume capacity on two specific storage nodes and leave another storage node with more than 80% free capacity.
  4. Run FIO to fill all of these volumes with sequential writes. This raises the capacity utilization on these two nodes above 50%, which is expected to trigger dynamic rebalancing. Filling the volumes can take time, so you may want to script it.
  5. While the volumes are filling, monitor each node’s physical capacity utilization with the “lbcli get node” command. You can also check capacity utilization in the node dashboard of the Grafana monitoring GUI.
  6. After all the volumes are filled, use “nvme list-subsys” to check the NVMe multipath information. Replicas of some volumes should have moved to the node that had free capacity.
  7. Check the capacity utilization of each node again, as in step 5.
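
The step that fills the volumes with FIO can be scripted. The following is a minimal sketch, not the script used in the original test. The device names in DEVICES are hypothetical placeholders, DRY_RUN and the helper functions run and fill_device are names introduced here for illustration, and the fio options (1 MiB blocks, direct I/O, libaio, queue depth 16) are one reasonable choice for a sequential fill rather than documented settings.

```bash
#!/usr/bin/env bash
# Sketch only: fill each test volume once with sequential writes using fio.
set -eu

# Assumption: placeholder device names. Replace them with the NVMe devices
# of the test volumes as seen on the client.
DEVICES=${DEVICES:-"/dev/nvme0n1 /dev/nvme0n2"}
# DRY_RUN=1 (the default here) only prints the fio commands. Set DRY_RUN=0
# to write to the devices, which destroys any data on them.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sequentially write one device from start to end. With no --size given,
# fio uses the full size of the block device.
fill_device() {
  local dev=$1
  run fio --name="fill-$(basename "$dev")" --filename="$dev" \
    --rw=write --bs=1M --direct=1 --ioengine=libaio --iodepth=16
}

# Fill the devices one after another.
for dev in $DEVICES; do
  fill_device "$dev"
done
```

Run the script once with the default DRY_RUN=1 to review the generated commands, then rerun it with DRY_RUN=0 and the real device list. The devices are filled one at a time, which keeps the sketch simple. Running the jobs in parallel would shorten the fill time.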