Dynamic Rebalancing
Fail in Place
When fail in place mode is active, the Lightbits cluster tries to move volume replicas from failed nodes to healthy nodes, while preserving failure domain requirements.
Fail in place mode is activated on user request. The delay between a node failure and the start of volume recovery is set by the cluster configuration parameter DurationToTurnIntoPermanentFailure: replica migration starts once that amount of time has elapsed since the node failure.
Fail in Place Example
Fail in place mode is activated by enabling its feature flag.
lbcli enable feature-flag fail-in-place [flags]
Example:
Enable the fail-in-place feature flag:
lbcli --jwt $JWT enable feature-flag fail-in-place
Fail in place mode is deactivated by disabling its feature flag.
lbcli disable feature-flag fail-in-place [flags]
Example:
Disable the fail-in-place feature flag:
lbcli --jwt $JWT disable feature-flag fail-in-place
Example of setting the duration before replica migration starts (here, 20 minutes):
lbcli -J $JWT update cluster-config --parameter=DurationToTurnIntoPermanentFailure --value=20m
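As a rough illustration of the timing rule, the sketch below computes the earliest time at which replica migration would start for a given failure time and duration. The function names are hypothetical, and the parser only handles a single integer followed by s, m, or h; that format is an assumption made for this example, not a statement about which values lbcli accepts.

```shell
# Hypothetical helper: convert a duration such as 90s, 20m, or 1h to seconds.
# Assumption: a single integer followed by one unit letter.
duration_to_seconds() {
  case "$1" in
    *s) echo "${1%s}" ;;
    *m) echo $(( ${1%m} * 60 )) ;;
    *h) echo $(( ${1%h} * 3600 )) ;;
    *)  echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

# Earliest time (epoch seconds) at which replica migration would start:
# node failure time plus DurationToTurnIntoPermanentFailure.
migration_start_epoch() {
  seconds=$(duration_to_seconds "$2") || return 1
  echo $(( $1 + seconds ))
}

# A node fails at epoch 1700000000 with the duration set to 20m.
migration_start_epoch 1700000000 20m   # prints 1700001200
```

If the failed node comes back before that time, the sketch implies nothing about what the cluster does; it only shows the arithmetic of the delay.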
Proactive Rebalance
When proactive rebalance mode is enabled, the cluster automatically rebalances capacity across nodes. This prevents scenarios where one storage node reaches read-only status while other nodes still have free space to serve more capacity.
During proactive rebalance, a volume's protection state is maintained according to the state of the nodes in the cluster. When a volume migrates off a source node, the cluster creates a temporary additional replica of the volume, and the replica on the source node is removed only after the destination node has synced all of the required volume data.
The conditions under which proactive rebalance is activated are described below. Read-only thresholds apply to:
- Storage effective capacity.
- Available RAM for metadata.
Volume migration (the rebalancing of volumes) is triggered for the following reasons:
Node nearing read-only:
- A source node is eligible to migrate volumes if its utilization is within 10% of the read-only threshold.
- A destination node is eligible for migration if its utilization is at least 30% below the read-only threshold.
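The two eligibility rules can be sketched as simple comparisons. This is an illustration only: it assumes the 10% and 30% margins are percentage points of node capacity, and READ_ONLY_THRESHOLD is a hypothetical placeholder value, not a documented Lightbits setting.

```shell
# All values are integer percentages of node capacity.
# Hypothetical placeholder; the real threshold depends on the cluster.
READ_ONLY_THRESHOLD=90

# Source: utilization is within 10 points of the read-only threshold.
is_source_eligible() {
  [ "$1" -ge $(( READ_ONLY_THRESHOLD - 10 )) ]
}

# Destination: utilization is at least 30 points below the read-only threshold.
is_destination_eligible() {
  [ "$1" -le $(( READ_ONLY_THRESHOLD - 30 )) ]
}

is_source_eligible 85 && echo "node at 85%: can migrate volumes away"
is_destination_eligible 55 && echo "node at 55%: can receive volumes"
is_destination_eligible 70 || echo "node at 70%: not a destination"
```

With the placeholder threshold of 90, a node becomes a source at 80% utilization or more and a destination at 60% or less, so nodes between those two values neither give up nor receive volumes under this rule.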
Cluster capacity imbalance:
- There is a node in the cluster where utilization is under 20%.
- The capacity difference between the two most imbalanced nodes exceeds 30%. Specifically, if the node with the highest capacity utilization has X% utilization and the node with the lowest capacity utilization has Y% utilization, rebalancing occurs when X − Y > 30.
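The imbalance conditions above can be checked with a few lines of shell. The sketch below takes per-node utilization percentages as integers and reports each condition separately; the function name is hypothetical, and the sketch deliberately does not assume how the cluster combines the two conditions.

```shell
# Report the two cluster-imbalance conditions for a list of per-node
# utilization percentages (integers), without combining them.
imbalance_report() {
  min=$1; max=$1
  for u in "$@"; do
    if [ "$u" -lt "$min" ]; then min=$u; fi
    if [ "$u" -gt "$max" ]; then max=$u; fi
  done
  low_node=no; spread_exceeded=no
  # Condition 1: some node has utilization under 20%.
  if [ "$min" -lt 20 ]; then low_node=yes; fi
  # Condition 2: X - Y > 30 for the most and least utilized nodes.
  if [ $(( max - min )) -gt 30 ]; then spread_exceeded=yes; fi
  echo "min=$min max=$max low_node=$low_node spread_exceeded=$spread_exceeded"
}

imbalance_report 15 40 62   # prints: min=15 max=62 low_node=yes spread_exceeded=yes
imbalance_report 35 50 65   # prints: min=35 max=65 low_node=no spread_exceeded=no
```

The second call shows that the comparison is strict: a difference of exactly 30 does not satisfy X − Y > 30.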
Proactive Rebalance CLI Example
Proactive rebalance mode is activated by enabling its feature flag.
lbcli enable feature-flag proactive-rebalance [flags]
Example:
Enable the proactive-rebalance feature flag:
lbcli --jwt $JWT enable feature-flag proactive-rebalance
Proactive rebalance mode is deactivated by disabling its feature flag.
lbcli disable feature-flag proactive-rebalance [flags]
Example:
Disable the proactive-rebalance feature flag:
lbcli --jwt $JWT disable feature-flag proactive-rebalance