Dynamic Rebalancing

Fail in Place

When fail in place mode is active, the Lightbits cluster attempts to move volume replicas from failed nodes to other healthy nodes while preserving failure domain requirements.

Fail in place mode is activated on user request. The delay between a node failure and the start of volume recovery is set by the cluster configuration parameter DurationToTurnIntoPermanentFailure: replica migration begins once this amount of time has elapsed since the node failed.
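The cluster handles this timing itself. The bash sketch below only shows the arithmetic: migration start = failure time + DurationToTurnIntoPermanentFailure. The helper names duration_to_seconds and migration_start_epoch are hypothetical. The sketch assumes the parameter value is a whole number followed by a single s, m or h suffix, as in the 20m example. That format is an assumption, not documented Lightbits behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: estimate when replica migration would begin after a
# node failure. ASSUMPTION: the DurationToTurnIntoPermanentFailure value is a
# whole number plus one unit suffix (s, m or h), e.g. "20m"; other formats
# are rejected rather than guessed at.

# duration_to_seconds VALUE: print VALUE converted to seconds (20m -> 1200).
duration_to_seconds() {
  local value=$1
  local number=${value%[smh]}   # strip a trailing s/m/h, if present
  local unit=${value: -1}       # last character is the unit
  if [[ ! $number =~ ^[0-9]+$ ]]; then
    echo "unsupported duration: $value" >&2
    return 1
  fi
  case $unit in
    s) echo $(( 10#$number )) ;;          # 10# forces base-10 (e.g. "08")
    m) echo $(( 10#$number * 60 )) ;;
    h) echo $(( 10#$number * 3600 )) ;;
    *) echo "unsupported duration: $value" >&2; return 1 ;;
  esac
}

# migration_start_epoch FAILURE_EPOCH VALUE: node failure time plus the
# configured delay, both in Unix epoch seconds.
migration_start_epoch() {
  local failure_epoch=$1 secs
  secs=$(duration_to_seconds "$2") || return 1
  echo $(( failure_epoch + secs ))
}
```

For example, `migration_start_epoch "$(date +%s)" 20m` prints the epoch second that falls 20 minutes after a failure occurring now.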

Fail in Place Example

Fail in place mode is activated by enabling the fail-in-place feature flag:

lbcli enable feature-flag fail-in-place [flags]

Example:

Enable the fail-in-place feature flag:

lbcli --jwt $JWT enable feature-flag fail-in-place

Fail in place mode is disabled by disabling the same feature flag:

lbcli disable feature-flag fail-in-place [flags]

Example:

Disable the fail-in-place feature flag:

lbcli --jwt $JWT disable feature-flag fail-in-place

Example of setting the delay before replica migration starts (DurationToTurnIntoPermanentFailure) to 20 minutes:

lbcli -J $JWT update cluster-config --parameter=DurationToTurnIntoPermanentFailure --value=20m
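The two fail in place steps above can be combined in a small bash helper. This is a minimal sketch that reuses only the lbcli commands shown on this page. The function name enable_fail_in_place is hypothetical. The sketch assumes lbcli is on PATH and that $JWT holds a valid token. The duration is passed in by the caller; the sketch does not assume a recommended value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable fail in place and set the permanent-failure
# delay in one step. ASSUMPTIONS: lbcli is on PATH and $JWT holds a valid token.
set -euo pipefail

# enable_fail_in_place DURATION: DURATION is the value for
# DurationToTurnIntoPermanentFailure, e.g. 20m.
enable_fail_in_place() {
  local duration=${1:?usage: enable_fail_in_place <duration, e.g. 20m>}
  : "${JWT:?JWT must be set to an lbcli access token}"
  # Same commands as in the examples above, in order.
  lbcli --jwt "$JWT" enable feature-flag fail-in-place
  lbcli -J "$JWT" update cluster-config \
    --parameter=DurationToTurnIntoPermanentFailure --value="$duration"
}
```

The helper enables the feature flag before changing the delay. If the first lbcli call fails, `set -e` stops the script before the configuration is updated.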

Proactive Rebalance

When proactive rebalance mode is enabled, the cluster automatically rebalances capacity across nodes. This prevents a situation where one storage node reaches read-only status while other nodes still have free space to serve more capacity.

During proactive rebalance, a volume's protection state continues to reflect the state of the nodes in the cluster. When a volume migrates away from a source node, the cluster creates an additional, temporary replica of the volume. The source node's replica is removed only after the destination node has synced all of the required volume data.

Proactive rebalance is triggered relative to each node's read-only thresholds, which apply to:

Storage effective capacity.
Available RAM for metadata.

Volume migration (rebalancing of volumes) is triggered in the following cases:

Node nearing read-only:
1. A source node is eligible to migrate volumes away if its utilization is within 10% of the read-only threshold.
2. A destination node is eligible to receive migrated volumes if it has at least 30% free capacity relative to the read-only threshold.
Cluster capacity imbalance:
1. There is a node in the cluster whose utilization is under 20%.
2. The difference in capacity utilization between the two most imbalanced nodes exceeds 30%. Specifically, if the node with the highest capacity utilization has X% utilization and the node with the lowest capacity utilization has Y% utilization, rebalancing occurs when X − Y > 30.
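The trigger rules above can be expressed as simple bash predicates. This is a minimal sketch under stated assumptions, not Lightbits' implementation. Utilization and read-only thresholds are treated as whole-number percentages of capacity. "Within 10%" and "at least 30% free" are read as percentage points below the read-only threshold. Each rule is taken to require both of its numbered conditions. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical predicates for the proactive-rebalance trigger rules.
# ASSUMPTIONS: inputs are whole-number percentages; "10%" and "30%" are
# percentage points below the node's read-only threshold.

# source_eligible UTIL RO: node is within 10 points of its read-only threshold.
source_eligible() {
  local util=$1 ro=$2
  (( ro - util <= 10 ))
}

# destination_eligible UTIL RO: node has at least 30 points of headroom
# below its read-only threshold.
destination_eligible() {
  local util=$1 ro=$2
  (( ro - util >= 30 ))
}

# imbalance_triggered UTIL...: one utilization per node. True when some node
# is under 20% AND the gap between the most and least utilized nodes (X - Y)
# is greater than 30.
imbalance_triggered() {
  local min=$1 max=$1 u
  for u in "$@"; do
    if (( u < min )); then min=$u; fi
    if (( u > max )); then max=$u; fi
  done
  (( min < 20 && max - min > 30 ))
}
```

For example, `imbalance_triggered 15 50 60` succeeds, because one node is below 20% and 60 − 15 = 45 > 30. `imbalance_triggered 10 40` does not succeed, because a gap of exactly 30 is not greater than 30.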

Proactive Rebalance CLI Example

Proactive rebalance mode is activated by enabling the proactive-rebalance feature flag:

lbcli enable feature-flag proactive-rebalance [flags]

Example:

Enable the proactive-rebalance feature flag:

lbcli --jwt $JWT enable feature-flag proactive-rebalance

Proactive rebalance mode is disabled by disabling the same feature flag:

lbcli disable feature-flag proactive-rebalance [flags]

Example:

Disable the proactive-rebalance feature flag:

lbcli --jwt $JWT disable feature-flag proactive-rebalance
