Contact Lightbits Support before enabling this feature.
When a client (host) wants to connect to an NVMe/TCP server (target), it needs to specify various arguments. One of these arguments is the NVMe Qualified Name (NQN) of the subsystem it connects to.
A subsystem gives access to the actual volumes being served. When a client connects to the subsystem, the client and target create a controller that adheres to the NVMe protocol. Reads and writes are submitted through the controller to the volumes, which are called namespaces in NVMe terminology.
When a controller is created, it receives a 16-bit controller ID that is unique within the subsystem. The NVMe standard reserves a few controller IDs, so not all of them are available. Namespaces, in turn, are assigned a 32-bit namespace ID (NSID). In short, a subsystem is a collection of controllers and namespaces (volumes).
With just a single subsystem NQN, the number of controllers and volumes NVMe can provide is limited. However, an NQN is a string of up to 223 bytes, so the number of possible subsystem names is practically unlimited. This lets NVMe scale to vast numbers of controllers and volumes by increasing the number of subsystems. A host can connect to multiple NVMe subsystems, provided it has access to them.
Controller IDs and NSIDs are ephemeral: as controllers and namespaces are created and destroyed, their IDs are recycled. To identify a volume uniquely, each namespace provides a Namespace Globally Unique Identifier (NGUID).
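The difference between recycled IDs and a stable NGUID can be illustrated with a toy model. This is a hypothetical sketch, not Lightbits or kernel code. The `Subsystem` class and its methods are invented for illustration. The usable controller ID range 1..0xFFEF is an assumption that mirrors the Linux nvmet target. Rendering the NGUID as the volume UUID's 32 hex digits is also an assumption.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List

# Toy model for illustration only; not Lightbits or kernel code.
# Assumption: usable controller IDs are 1..0xFFEF, the range the Linux
# nvmet target allocates from.
CNTLID_MIN, CNTLID_MAX = 1, 0xFFEF


@dataclass
class Subsystem:
    nqn: str
    next_cntlid: int = CNTLID_MIN
    free_cntlids: List[int] = field(default_factory=list)
    namespaces: Dict[int, str] = field(default_factory=dict)  # NSID -> NGUID

    def create_controller(self) -> int:
        """Allocate a controller ID, reusing released IDs first."""
        if self.free_cntlids:
            return self.free_cntlids.pop()
        if self.next_cntlid > CNTLID_MAX:
            raise RuntimeError("controller ID space exhausted")
        cntlid = self.next_cntlid
        self.next_cntlid += 1
        return cntlid

    def destroy_controller(self, cntlid: int) -> None:
        """Release a controller ID so a later controller can reuse it."""
        self.free_cntlids.append(cntlid)

    def attach_volume(self, nsid: int, volume_uuid: uuid.UUID) -> None:
        """The NSID is a reusable slot; the NGUID stays with the volume."""
        # Assumption: NGUID rendered as the UUID's 32 hex digits.
        self.namespaces[nsid] = volume_uuid.hex
```

Destroying controller 1 and then creating a new controller returns ID 1 again. A volume keeps the same NGUID no matter which NSID it is attached under.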
Lightbits NVMe Controllers
Since NVMe itself has no notion of clustering, the controller IDs within a Lightbits (LB) subsystem are sharded across the LB nodes. Sharding is a commonly used approach in distributed systems. Based on some heuristic, each node handles a portion of a shared space, which in this case is the controller ID space. This yields a predictable load distribution. However, each node owns only a slice of the 16-bit controller ID space, which is why a single LB node can serve at most 2,030 controllers per subsystem. The NGUID of each namespace corresponds to the volume UUID that is defined when the volume is created through the API or CLI.
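To make the sharding idea concrete, here is a sketch of one possible scheme. The actual Lightbits heuristic is not described here, so the contiguous fixed-size blocks are purely a hypothetical choice. The ID range 1..0xFFEF is an assumption, and `shard_range` and `owner_of` are invented names. Only the 2,030-per-node figure comes from the text.

```python
# Hypothetical sharding scheme for illustration; the real Lightbits
# heuristic is not documented here.
# Assumptions: usable controller IDs are 1..0xFFEF, and each node owns one
# contiguous, non-overlapping block of IDs.
FIRST_CNTLID = 1
CNTLID_MAX = 0xFFEF
CTRLS_PER_NODE = 2030  # per-node, per-subsystem limit stated in the text

# How many whole blocks fit in the ID space under these assumptions.
MAX_NODES = (CNTLID_MAX - FIRST_CNTLID + 1) // CTRLS_PER_NODE


def shard_range(node_index: int) -> range:
    """Controller IDs owned by one node within a single subsystem."""
    if not 0 <= node_index < MAX_NODES:
        raise ValueError(f"node index {node_index} has no shard")
    start = FIRST_CNTLID + node_index * CTRLS_PER_NODE
    return range(start, start + CTRLS_PER_NODE)


def owner_of(cntlid: int) -> int:
    """Index of the node that owns a controller ID under this scheme."""
    node = (cntlid - FIRST_CNTLID) // CTRLS_PER_NODE
    if cntlid < FIRST_CNTLID or node >= MAX_NODES:
        raise ValueError(f"controller ID {cntlid} is outside every shard")
    return node
```

Because each node draws only from its own block, nodes never need to coordinate when allocating a controller ID. The cost is that each node's block caps how many controllers it can serve in that subsystem.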
Implementation of Auxiliary Subsystems
Lightbits has also implemented auxiliary subsystems, which can be enabled in addition to the default subsystem. The default number of auxiliary subsystems is 4. Together with the default subsystem, this allows a total of 5 × 2,030 = 10,150 controllers per node.
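The capacity arithmetic above can be written as a one-line helper. `controllers_per_node` is a hypothetical name. The sketch assumes, as the text states, that every subsystem gets its own 2,030-controller allowance on each node.

```python
def controllers_per_node(aux_subsystems: int = 4, per_subsystem: int = 2030) -> int:
    """Default subsystem plus N auxiliary subsystems, each with its own
    per-node allowance of controller IDs (hypothetical helper)."""
    return (1 + aux_subsystems) * per_subsystem
```

With the defaults, this yields the 10,150 controllers per node quoted above. Without auxiliary subsystems, it falls back to 2,030.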
Since each subsystem NQN must be unique, connecting to an auxiliary subsystem requires a suffix on the NQN. The suffix has the form "aux-N", where 0 < N <= 4. To avoid reconfiguring existing deployments, NQNs without a suffix are mapped to aux-0, the default subsystem.
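The suffix rule can be expressed as a small validation function. This is a hypothetical sketch, not duroslight code. The text does not say how the suffix is joined to the NQN string, so the function takes the suffix on its own. Rejecting leading zeros such as "aux-01" is an extra assumption.

```python
import re
from typing import Optional

AUX_COUNT = 4  # default number of auxiliary subsystems
# Assumption: only canonical forms such as "aux-3" are accepted (no "aux-03").
_SUFFIX_RE = re.compile(r"aux-(0|[1-9][0-9]*)")


def subsystem_index(suffix: Optional[str]) -> int:
    """Map a requested suffix to a subsystem index (hypothetical helper).

    0 is the default subsystem (aux-0), reached only by omitting the suffix;
    1..AUX_COUNT are the auxiliary subsystems.
    """
    if not suffix:
        return 0  # no suffix: existing hosts land on the default subsystem
    match = _SUFFIX_RE.fullmatch(suffix)
    if match is None:
        raise ValueError(f"malformed suffix: {suffix!r}")
    n = int(match.group(1))
    if not 0 < n <= AUX_COUNT:
        raise ValueError(f"{suffix!r} is outside aux-1..aux-{AUX_COUNT}")
    return n
```

An explicit "aux-0" is rejected, while an empty suffix maps to index 0. This matches the rule that NQNs without a suffix reach the default subsystem.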
The difference between the default subsystem and an auxiliary subsystem is that, by definition, all volumes of the default subsystem are also made accessible through every auxiliary subsystem.
Auxiliary Subsystem Considerations
A namespace is identified by a unique NGUID, so a host should reach a given volume through only one subsystem. Standard NVMe/TCP initiators are not designed to handle the same volume appearing in multiple subsystems simultaneously. Connecting a host to both "aux-1" and "aux-2" concurrently triggers a kernel log error such as: "nvme nvme1: ignoring nsid 9174 because of duplicate IDs". Older kernels (e.g., 5.14) lack this protection, so the same block device appears multiple times. This should not be mistaken for multipathing via Asymmetric Namespace Access (ANA).
As a consequence, once such a device is opened (e.g., by a hypervisor), it becomes difficult (though not impossible) to determine which controller is in use, and therefore which controller to disconnect.
Under normal operation, the discovery-client connects a host to only a single auxiliary subsystem, which prevents this scenario.
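One way to spot the duplicate-namespace situation described above is to group namespaces by NGUID across the subsystems a host is connected to. This is a minimal sketch. `duplicated_nguids` is a hypothetical helper, and it assumes you have already gathered, by whatever means, a mapping from each connected subsystem NQN to the NGUIDs visible through it.

```python
from collections import defaultdict
from typing import Dict, List


def duplicated_nguids(visible: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return NGUIDs that a host sees through more than one subsystem.

    `visible` maps a subsystem NQN to the NGUIDs of the namespaces the host
    sees through it (how that inventory is collected is left open).
    """
    seen: Dict[str, List[str]] = defaultdict(list)
    for nqn, nguids in visible.items():
        for nguid in set(nguids):  # ignore repeats within one subsystem
            seen[nguid].append(nqn)
    return {nguid: sorted(nqns) for nguid, nqns in seen.items() if len(nqns) > 1}
```

Any NGUID in the result is reachable through two or more subsystems. That indicates a host connected to more than one auxiliary subsystem at once.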
Discovery-client Configuration
New hosts should be configured to connect to a specific auxiliary subsystem. Configure this in /etc/discovery-client/discovery-client.yaml by adding a new entry:
AuxSuffix: aux-1
After the updated discovery-client is restarted, it logs the following entry:
level=info msg="Starting service using auxiliary subsystem with suffix aux-1"
Some additional points to keep in mind:
- Before setting the AuxSuffix, ensure that there are no existing connections.
- When no AuxSuffix is specified, the default subsystem NQN is used.
- AuxSuffix: aux-0 is not a valid entry; duroslight will refuse the connections. The only valid entries are aux-1, aux-2, aux-3, and aux-4.
- When moving a host to a different auxiliary subsystem (e.g., from aux-1 to aux-3), first ensure that there are no existing connections.
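The rules in the list above can be collected into a pre-flight check to run before editing discovery-client.yaml. This is a hypothetical sketch, not part of the discovery-client. `check_aux_suffix_change` is an invented name. How you count a host's existing NVMe connections is left to you and passed in as a number.

```python
from typing import List, Optional

# Valid AuxSuffix values from the list above; None means "no AuxSuffix"
# (the default subsystem NQN is used).
VALID_SUFFIXES = {"aux-1", "aux-2", "aux-3", "aux-4"}


def check_aux_suffix_change(current: Optional[str], new: Optional[str],
                            active_connections: int) -> List[str]:
    """Return a list of problems; an empty list means the change looks safe.

    Hypothetical helper: active_connections must be supplied by the caller.
    """
    problems = []
    if new is not None and new not in VALID_SUFFIXES:
        problems.append(f"AuxSuffix {new!r} is invalid; use aux-1..aux-4 or omit it")
    if current != new and active_connections > 0:
        problems.append(
            f"{active_connections} existing connection(s); "
            "disconnect before setting or changing AuxSuffix"
        )
    return problems
```

Leaving the suffix unchanged is always accepted. Setting, changing, or removing it is flagged while connections exist.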
Auxiliary subsystems can only be used with a newer discovery-client that is aware of auxiliary subsystems. Older discovery clients simply connect to the default subsystem.
Duroslight Configuration
No additional configuration is required. However, each connection consumes file descriptors, so make sure enough are available when serving a large number of connections. It is recommended to reduce max_ctrl_io_queues to 32 or 64.
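A rough estimate shows why the queue count matters. In NVMe/TCP, each queue (one admin queue plus each I/O queue) of a controller runs over its own TCP connection, and each connection uses a file descriptor on the target. The sketch below is a hypothetical upper-bound calculation. It assumes every controller negotiates the full max_ctrl_io_queues, which real hosts may not do, and it ignores any other descriptors duroslight holds.

```python
def estimated_tcp_connections(controllers: int, io_queues_per_ctrl: int) -> int:
    """Upper-bound socket count on a target node (hypothetical estimate).

    Each controller contributes one admin queue plus its I/O queues, and
    NVMe/TCP carries every queue over a separate TCP connection.
    """
    return controllers * (io_queues_per_ctrl + 1)
```

At the 10,150-controller per-node maximum, this gives 334,950 connections with 32 I/O queues per controller and 659,750 with 64. Each of those connections needs its own descriptor.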
To confirm that the correct binary is deployed, check that duroslight reports aux_subsystem_count during startup. A new Prometheus metric, lightbox_fe_ctrl_id_used, shows how controller ID usage is distributed across subsystems:
curl localhost:9180/metrics 2>/dev/null | grep fe_ctrl_id_used
# HELP lightbox_fe_ctrl_id_used ctrl_id_used
# TYPE lightbox_fe_ctrl_id_used counter
lightbox_fe_ctrl_id_used{pool_id="0",shard="0",type="counter"} 2
lightbox_fe_ctrl_id_used{pool_id="1",shard="0",type="counter"} 1
lightbox_fe_ctrl_id_used{pool_id="2",shard="0",type="counter"} 2
lightbox_fe_ctrl_id_used{pool_id="3",shard="0",type="counter"} 0
lightbox_fe_ctrl_id_used{pool_id="4",shard="0",type="counter"} 0
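When a node has several shards, summing this metric per pool makes the distribution easier to read. This is a minimal sketch. `ctrl_ids_per_pool` is a hypothetical helper that parses only the simple Prometheus text lines shown in the output above. It assumes each pool_id corresponds to one subsystem, with five pools for the default subsystem plus four auxiliary ones.

```python
import re
from typing import Dict

# Matches: lightbox_fe_ctrl_id_used{labels} value [optional timestamp]
_LINE_RE = re.compile(
    r"^lightbox_fe_ctrl_id_used\{([^}]*)\}\s+(\S+)(?:\s+-?\d+)?$")
_LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def ctrl_ids_per_pool(metrics_text: str) -> Dict[str, float]:
    """Sum lightbox_fe_ctrl_id_used samples per pool_id across shards."""
    totals: Dict[str, float] = {}
    for line in metrics_text.splitlines():
        match = _LINE_RE.match(line.strip())
        if match is None:
            continue  # skip "# HELP" / "# TYPE" comments and other metrics
        labels = dict(_LABEL_RE.findall(match.group(1)))
        pool = labels.get("pool_id", "?")
        totals[pool] = totals.get(pool, 0.0) + float(match.group(2))
    return totals
```

You could, for example, pipe the curl output into a small script that calls `ctrl_ids_per_pool(sys.stdin.read())`. For the sample output above, it reports 2, 1, 2, 0, and 0 controller IDs in use for pools 0 through 4.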