Lightbits Cluster Architecture
The Lightbits cluster storage solution distributes services and replicates data across different Lightbits servers to guarantee service and data availability when one or more Lightbits servers experience transient or permanent failures. A cluster of Lightbits servers replicates data internally and keeps it fully consistent and available in the presence of failures. From the perspective of clients accessing the data, data replication is transparent and server failover is seamless.
In multi-AZ deployments, the cluster guarantees that each replica is held in a different availability zone.
The following sections explain the concepts of replication factor, failure domains, and volume placement used in the Lightbits cluster architecture.
For more information about Lightbits cluster architecture, see the Deploying Reliable High-Performance Storage with Lightbits Whitepaper.
Volume Assignments
As described above, the Lightbits storage cluster uses node replication within the cluster for data availability and durability. Lightbits supports Replication Factors 1, 2, and 3, where Replication Factor 3 (RF3) means that volumes are replicated on three separate storage nodes, and Replication Factor 1 (RF1) means that volumes are stored on only one node with no replication.
For Lightbits SDS in Azure, it is recommended to create all volumes with RF3.
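As a rough illustration of the capacity cost of replication, the sketch below computes usable capacity for each replication factor. The node count and sizes are hypothetical, and the calculation ignores metadata, spare space, and data reduction, which affect real-world numbers.

```python
# Rough sketch: usable capacity under a given replication factor.
# The raw capacity figure is hypothetical; metadata, spare space, and
# data reduction are deliberately ignored here.

RAW_CAPACITY_TIB = 3 * 16  # e.g. three storage nodes with 16 TiB of raw flash each


def usable_capacity(raw_tib: float, replication_factor: int) -> float:
    """Every logical byte is stored replication_factor times."""
    return raw_tib / replication_factor


for rf in (1, 2, 3):
    print(f"RF{rf}: ~{usable_capacity(RAW_CAPACITY_TIB, rf):.1f} TiB usable "
          f"out of {RAW_CAPACITY_TIB} TiB raw")
```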
With RF3, one of the storage nodes holding the volume acts as the primary (P) node for that volume, and the other two storage nodes holding replicas of the volume act as secondary (S) nodes.
A storage node that stores data for multiple volumes can act as the primary node for some volumes and as a secondary node for others. The primary node for a given volume appears in the accessible path of the client using that volume, handles all user IO requests for that volume, and replicates data to the secondary nodes. If a primary node fails, the NVMe/TCP multipath feature updates the accessible path and one of the secondary nodes is reassigned as the new primary node.
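The minimal sketch below models the primary/secondary roles and the promotion that happens when a primary fails. The data structures and function names are illustrative only and do not reflect the Lightbits implementation or API.

```python
from dataclasses import dataclass, field


@dataclass
class VolumePlacement:
    """Illustrative model of an RF3 volume: one primary and two secondaries."""
    name: str
    primary: str                          # node currently serving client IO
    secondaries: list = field(default_factory=list)

    def fail_over(self, failed_node: str) -> None:
        """Promote a secondary if the primary fails (illustrative only)."""
        if failed_node != self.primary:
            return  # a failed secondary does not change the accessible path
        # One of the surviving secondaries becomes the new primary; the client's
        # NVMe/TCP multipath layer switches to the new accessible path.
        self.primary = self.secondaries.pop(0)


vol = VolumePlacement("vol1", primary="node-a", secondaries=["node-b", "node-c"])
vol.fail_over("node-a")
print(vol.primary, vol.secondaries)  # node-b ['node-c']
```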
When a user creates a volume, Lightbits transparently selects the nodes that will hold the volume’s data and configures the primary and secondary roles. The node selection logic balances volumes across nodes at volume creation time.
It is recommended that clients created within a specific zone be allocated volumes whose primary node is in the same zone, to reduce latency and lower cross-AZ traffic charges. See the Volume Creation section for additional information.
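One way to picture this zone-aware placement is the sketch below, which prefers a primary in the client’s zone and spreads the secondaries across the remaining zones. It is not the actual Lightbits selection logic, and the node names, zone names, and utilization figures are made up.

```python
# Hypothetical zone-aware placement sketch (not the Lightbits selection logic).
# Nodes are (name, zone, used_fraction) tuples; the least-loaded eligible node wins.

NODES = [
    ("node-a", "zone-1", 0.40),
    ("node-b", "zone-2", 0.55),
    ("node-c", "zone-3", 0.30),
    ("node-d", "zone-1", 0.20),
]


def place_rf3(client_zone: str, nodes=NODES):
    # Primary: least-loaded node in the client's zone, keeping IO local.
    local = [n for n in nodes if n[1] == client_zone]
    primary = min(local, key=lambda n: n[2])
    # Secondaries: least-loaded node from each of the other zones.
    secondaries = []
    for zone in sorted({n[1] for n in nodes} - {client_zone}):
        candidates = [n for n in nodes if n[1] == zone]
        secondaries.append(min(candidates, key=lambda n: n[2]))
    return primary[0], [s[0] for s in secondaries[:2]]


print(place_rf3("zone-1"))  # ('node-d', ['node-b', 'node-c'])
```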
Failure Domains
A failure domain is a set of resources that are negatively impacted by a particular failure, such as a rack or server room failure. In a single Azure AZ, a failure domain can be a VM. In a multi-AZ cluster, a failure domain is an availability zone.
Each volume replica is placed in a different failure domain, so that if one domain fails, the rest of the cluster can continue working with no downtime and minimal impact on the cluster.
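The short sketch below checks that a volume’s replicas span distinct failure domains under the two mappings described above. The node labels and the check itself are illustrative, not part of the Lightbits software.

```python
# Illustrative check that a volume's replicas span distinct failure domains.
# In a single-AZ cluster each VM is its own failure domain; in a multi-AZ
# cluster the availability zone is the failure domain.

def spans_distinct_failure_domains(replica_nodes, failure_domain_of) -> bool:
    domains = [failure_domain_of[node] for node in replica_nodes]
    return len(set(domains)) == len(domains)


# Multi-AZ example: the failure domain of each node is its availability zone.
fd_multi_az = {"node-a": "zone-1", "node-b": "zone-2", "node-c": "zone-2"}
print(spans_distinct_failure_domains(["node-a", "node-b"], fd_multi_az))  # True
print(spans_distinct_failure_domains(["node-b", "node-c"], fd_multi_az))  # False

# Single-AZ example: each VM is its own failure domain.
fd_single_az = {"vm-1": "vm-1", "vm-2": "vm-2", "vm-3": "vm-3"}
print(spans_distinct_failure_domains(["vm-1", "vm-2", "vm-3"], fd_single_az))  # True
```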
Single AZ
In a single AZ, if a VM fails, the cluster remains available with minimal impact. However, if the AZ where the cluster is deployed fails, the cluster is no longer available and data is at risk until the AZ recovers and new servers are allocated.
MultiAZ
In a multi-AZ deployment, if a VM fails or one of the zones goes down, the cluster remains available with minimal impact.
Dynamic Rebalancing
With dynamic rebalancing, the cluster attempts to move volumes from one node to other nodes. Note the distinction between proactive rebalance and fail in place, described below.
Fail in Place
When fail-in-place mode activates, the Lightbits cluster attempts to move volume replicas from failed nodes to other healthy nodes, while preserving failure domain requirements.
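The following sketch shows the idea of relocating a failed node’s replica onto a healthy node without violating the failure-domain requirement. It is a simplified illustration with made-up node and zone names, not the Lightbits implementation.

```python
# Illustrative fail-in-place sketch (not the Lightbits implementation):
# a replica on a failed node is rebuilt on a healthy node that does not already
# hold a copy of the volume and that sits in a failure domain not yet used.

def fail_in_place(volume_replicas, failed_node, healthy_nodes, failure_domain_of):
    """Return a new replica list with the failed node's copy relocated."""
    survivors = [n for n in volume_replicas if n != failed_node]
    used_domains = {failure_domain_of[n] for n in survivors}
    for candidate in healthy_nodes:
        if candidate in survivors:
            continue
        if failure_domain_of[candidate] in used_domains:
            continue  # would violate the failure-domain requirement
        return survivors + [candidate]
    return survivors  # no eligible target; the volume stays degraded


fd = {"node-a": "zone-1", "node-b": "zone-2", "node-c": "zone-3", "node-d": "zone-3"}
print(fail_in_place(["node-a", "node-b", "node-c"], "node-c",
                    ["node-a", "node-b", "node-d"], fd))
# ['node-a', 'node-b', 'node-d']
```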
Proactive Rebalance
The proactive rebalance feature enables the cluster to automatically balance volumes between nodes based on capacity.
When proactive rebalance mode is enabled, the cluster automatically rebalances cluster capacity. This prevents scenarios in which one storage node in the cluster exceeds its capacity and reaches read-only status while other nodes still have available space to serve more capacity.
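The sketch below illustrates a capacity-driven move: when a node’s utilization crosses a threshold, a replica is moved to the least-utilized node. The threshold, capacities, and selection rule are hypothetical and do not describe the actual Lightbits rebalancing algorithm.

```python
# Illustrative capacity-driven rebalance sketch (not the Lightbits algorithm):
# when a node's utilization crosses a hypothetical threshold, suggest moving a
# replica to the least-utilized node, keeping everyone below the read-only limit.

USED_GIB = {"node-a": 900, "node-b": 400, "node-c": 300}
CAPACITY_GIB = 1000
REBALANCE_THRESHOLD = 0.85  # hypothetical trigger point, below the read-only limit


def pick_rebalance_move(used=USED_GIB, capacity=CAPACITY_GIB):
    """Suggest a (source, target) pair when a node is getting too full."""
    fullest = max(used, key=used.get)
    emptiest = min(used, key=used.get)
    if used[fullest] / capacity < REBALANCE_THRESHOLD:
        return None  # no node is close to read-only; nothing to do
    return fullest, emptiest


print(pick_rebalance_move())  # ('node-a', 'node-c')
```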