Release 3.15.1

Release Date

v3.15.1 was released to the public on June 03, 2025.

New in This Release

This release introduces the following changes since version 3.14.x. A change is classified as either a new feature, an enhancement, a major issue (e.g., an issue that could lead to potential data loss or service loss), or a minor issue.

Issue Type | Description | ID
New Feature (GA) | Added API-driven rotation of the cluster root encryption key (KEK). Invoking the API creates a new KEK and re-encrypts the existing DEKs with the new KEK. See the relevant documentation in the Cluster-Level Encryption article. | LBM1-33506
New Feature (GA) | Upgraded etcd to version 3.5.18, which has enhanced security and bug fixes. Note: The "DB Size" graph in "ETCD by Prometheus" dashboards in Grafana relies on a deprecated metric. Use the updated dashboard provided in the 3.15.1 release for correct operation of this graph. | LBM1-36732
New Feature (Tech Preview) | Enhanced the NVMe deallocate implementation to improve capacity utilization. This enhancement integrates with the file system’s TRIM command, enabling the operating system to identify and release unused data blocks. Refer to the Lightbits TRIM Support documentation to enable this capability. Note that this feature is in tech preview, which means it should not be used in production setups. | LBM1-35258
Enhancement | api-service: The replace node API now fails at the api-service if cluster encryption is enabled but the destination server is not yet encryption-enabled. | LBM1-36477
Enhancement | build: Updated to golang 1.24.1. | LBM1-36917
Enhancement | duroslight: Removed the timestamp from journal log entries (as the journal itself already provides them). | LBM1-36921
Enhancement | node-manager: The required timeout for node powerup is now calculated automatically, based on node capacity. The timeout was previously a fixed value. | LBM1-36874
Enhancement | node-manager: Enabled selecting NVMe devices by serial number, as device paths are not stable and could change across reboots. | LBM1-36722
Enhancement | prometheus: Updated Prometheus' record.rules.yaml to calculate the metric for total TCP connections per node. | LBM1-36609
Enhancement | tpm: The TPM path is now obtained dynamically, instead of using a hardcoded /dev/tpm0. | LBM1-34675
Major | cluster-manager: Eliminated redundant short rebuilds following a primary switch or a primary failover. Such rebuilds created an unnecessary single point of failure; i.e., additional failures during this short rebuild could cause some volumes to become Unavailable or Read Only, with both cases resulting in a loss of service. | LBM1-13327
Major | cluster-manager/duroslight: Fixed a race condition where, during the graceful shutdown of a node, incorrect node connectivity events could be sent. This could cause Lightbits to mistakenly assume a healthy server was disconnected and mark it as inactive. | LBM1-32958
Major | duroslight: Fixed a rare race condition during node recovery that tripped an assertion, causing duroslight to crash. | LBM1-37085
Major | node-manager: Added an extra safety measure to protect against having more than a single accessible path per volume. Prior to this change, following a primary switch, new primaries would wait for the old ones to disable themselves before marking themselves as optimized paths. However, they would only wait for the amount of time it would take a node to internally consider itself as self-failed. If a node setting itself as a non-optimized path encountered an issue that made it slow or unresponsive to the relevant update, two optimized paths could be exposed. | LBM1-36651
Major | node-manager: Previously, node-manager did not attempt to restart GFTL.service if systemd failed to start the service. This fix introduces a retry mechanism. | LBM1-37361
Major | node-manager: Fixed an issue where a server failed to power up when configured with a single instance across multiple NUMA nodes and all SSDs were located in the second NUMA node. This issue could only be triggered during upgrades or addition of new servers. | LBM1-37114
Major | userlbe: Fixed a bug where IO errors during abrupt recovery were handled incorrectly, causing recovery to fail. | LBM1-37205
Minor | api-service: Prior to this modification, the customer's IdP server would be accessed as soon as the IdP configuration was created, regardless of whether the Federated Authentication feature was enabled. This fix ensures that the IdP server is only accessed if both the IdP configuration is in place and the Federated Authentication feature is enabled. | LBM1-35674
Minor | cluster-manager: Fixed an issue that could cause an event to incorrectly indicate that migrating volumes are degraded when in fact they are fully protected. | LBM1-33865
Minor | data-layer: Fixed an issue when deleting servers that could cause a delete server task to linger after the server was deleted. | LBM1-37395
Minor | data-layer: Fixed a rare issue that caused the rebuild progress to be reported as 1% instead of the actual rebuild progress. | LBM1-37389
Minor | data-layer: Resolved an issue that prevented proactive rebalance from selecting target nodes that previously had permanent failures, and prevented proactive rebalance of volumes that had snapshots, even after a long period of time. The cause was certain conditions that left obsolete snapshot keys in etcd even though their corresponding data had been moved or deleted. | LBM1-34858
Minor | discovery-client: Fixed an issue that could cause nvme discover to crash with a nil pointer exception when receiving malformed responses. | LBM1-36790
Minor | discovery-client: The host's hostid is now passed when issuing nvme connect. Previously, discovery could fail because Lightbits did not pass this value and the kernel would send a random value, which could cause "nvme_fabrics: found same hostnqn <...> but different hostid <...>" when using both the discovery-client and nvme-cli in parallel. | LBM1-36644
Minor | lbcli: Fixed an issue in listHosts where the volumeUUID filter was ignored. lightbits-api: Added a hostNQN filter in the listHosts API. | LBM1-36282
Minor | lightbits-api: Fixed the listHosts and listVolumes APIs, which previously reported hosts that were not in the volume IPACL as connected to the volume, even though they were not connected. | LBM1-36964
Minor | userlbe: Fixed a race condition that, in rare scenarios, could cause a node to perform an abrupt power-up instead of a graceful power-up. | LBM1-37264
Minor | userlbe: Fixed a potential crash when adding a new disk during system runtime. | LBM1-37070
Minor | userlbe: Fixed a GFTL (lbe) crash due to a rare race condition between an SSD failure/removal and SSD read submissions by UNITREADER, observed only with multiple concurrent volume/node rebuilds running. The problem signature in the log is: "IO was partially done. Handling not implemented yet (completed: 0 bytes..." followed by "Assert failed on:(unit->is gc_read_unit)". | LBM1-36882
Minor | userlbe: Fixed a rare issue that could only happen on VMs with very little storage, where recovery could crash with a start position equal to the target position when recovery wrapped around the storage. | LBM1-37387
Minor | userlbe: Resolved an issue where a process would terminate abruptly during shutdown instead of exiting gracefully. Note that in such cases, the system will still perform a graceful power-up. | LBM1-36844
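
The node-manager enhancement LBM1-36722 in the table above selects NVMe devices by serial number because device paths can change across reboots. The following is a minimal sketch of that general idea, not Lightbits' implementation: it assumes a Linux host where each NVMe controller exposes its serial number in a sysfs attribute file named serial under /sys/class/nvme. The function name find_nvme_controller and the sysfs_root parameter are hypothetical, introduced only for illustration.

```python
from pathlib import Path
from typing import Optional


def find_nvme_controller(serial: str, sysfs_root: str = "/sys/class/nvme") -> Optional[str]:
    """Return the controller name (e.g. "nvme0") whose serial matches, else None.

    Hypothetical helper: assumes each controller directory under sysfs_root
    contains a "serial" attribute file, as on typical Linux hosts.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return None
    wanted = serial.strip()
    for ctrl in sorted(root.iterdir()):
        try:
            # sysfs pads the serial with trailing spaces, so strip before comparing.
            found = (ctrl / "serial").read_text().strip()
        except OSError:
            # Entry without a readable serial attribute: skip it.
            continue
        if found == wanted:
            return ctrl.name
    return None
```

Because the lookup keys on the serial rather than on a name such as nvme0, the same physical drive is found even if the kernel enumerates controllers in a different order after a reboot.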

Installation and Upgradeability

You can upgrade to this release from all previous Lightbits 3.12.x, 3.13.x, and 3.14.x releases.
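
As a small illustration of the statement above, the sketch below checks whether an installed version string belongs to one of the release series listed as valid upgrade sources. It encodes only what these notes state; the function name can_upgrade_directly and the constant SUPPORTED_SOURCE_SERIES are hypothetical and are not part of any Lightbits tool.

```python
# Release series that these notes list as valid upgrade sources for v3.15.1.
SUPPORTED_SOURCE_SERIES = {(3, 12), (3, 13), (3, 14)}


def can_upgrade_directly(installed_version: str) -> bool:
    """Return True if installed_version (e.g. "3.13.2" or "v3.14.0") is in a listed series."""
    parts = installed_version.strip().lstrip("v").split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Not a recognizable major.minor[.patch] version string.
        return False
    return (major, minor) in SUPPORTED_SOURCE_SERIES
```

A version outside the listed series, such as 3.11.x, returns False here simply because these notes do not list it as an upgrade source.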
