Release 3.18.2

Release Date

v3.18.2 was released to the public on June 17, 2026.

New in This Release

This release introduces the following changes since version 3.17.x. Each change is classified as a new feature, an enhancement, a major issue (for example, an issue that could lead to data loss or service loss), or a minor issue.

Issue Type	Description	ID
Enhancement	Added the ability to create encrypted thin clones from unencrypted base snapshots, with each derived volume protected by its own unique encryption key. Available on clusters configured with encryption. See the lbcli Create Volume documentation for details.	LBM1-42408
Enhancement	Added the ability to reserve additional RAM at deployment time by setting an optional custom reserve value. This is useful when the default reservation for the OS and non-Lightbits services (8 GiB) is not sufficient. The total reserved RAM is capped at 15% of server memory, or the existing default of 21 GiB per Lightbits instance.	LBM1-40362
Enhancement	Added two new Grafana dashboard panels for connectivity visibility: the number of disconnected nodes (the nodes a client is expected to be connected to but is not), and the number of hosts impacted by those disconnections. This makes it easier to spot and diagnose client-to-node connectivity issues at a glance.	LBM1-41407
Enhancement	Hardened error handling across shared internal components and the lbcli command-line tool - improving robustness of operations such as TLS setup and certificate handling at startup.	LBM1-43622
Enhancement	Hardened error handling during backend subsystem initialization at service startup - preventing rare early-startup failures and improving the reliability of the node and cluster management services.	LBM1-43620
Enhancement	Hardened error handling during Cluster Manager startup, improving the reliability of cluster initialization.	LBM1-43618
Enhancement	Hardened error handling in the API service request handlers, improving the reliability of volume management and other API operations.	LBM1-43619
Enhancement	Hardened error handling in the monitoring and telemetry components so that metrics - including SMART disk-health data - are collected accurately and without interruption.	LBM1-43625
Enhancement	Hardened error handling in the Node Manager core - improving the robustness of startup diagnostics and overall service stability.	LBM1-43617
Enhancement	Hardened error handling in the volume and data-layer task paths so that rare error conditions during service startup are surfaced correctly instead of being silently masked - improving the startup reliability of core data services.	LBM1-43621
Enhancement	Improved resiliency on dual-node servers: when a journaling device fails, only the affected node instance is restarted rather than the entire Node Manager service. This reduces the blast radius of a journaling device failure and improves overall service continuity.	LBM1-39359
Major	A race condition that could cause a deadlock during updates to a Protection Rule (PR) or volume protection state has been resolved.	LBM1-42800
Major	Fixed a PG migration stall caused by snapshot deletion during `createExistingSnapshots`. To improve resiliency, the process now skips failed snapshots rather than aborting - ensuring that remaining node-snapshot keys are still written.	LBM1-41873
Major	Fixed a rare condition where a storage node could fail to start if placement group membership changed while the node was offline. Memory pressure during a prior recovery could leave stale metadata persisted due to internal counters not being reset; a subsequent restart would then fail a consistency check. The recovery fallback logic now fully resets all affected counters when a partial failure is detected.	LBM1-43456
Major	Hardened the Node Manager against a rare deadlock that could occur while updating feature flags, so the node continues operating reliably instead of hanging.	LBM1-42974
Major	Strengthened control-plane resiliency so that a node still rebuilding its data is no longer promoted from secondary to primary. This keeps the active control-plane role on a fully recovered node during rebuilds.	LBM1-42314
Major	Strengthened node-recovery resiliency against a rare timing condition where deleting a volume while a recovering node was updating its volume statistics could cause a storage-layer crash. Volume deletes during recovery are now handled safely.	LBM1-43379
Major	Strengthened the resiliency of node recovery and rebuild against a rare timing condition between snapshot deletion and node recovery. Snapshots are now cleaned up consistently across nodes, preventing a stale snapshot from later causing a rebuild to fail.	LBM1-44159
Major	Strengthened the resiliency of the cluster upgrade flow against a rare timing condition between the Cluster Manager and the upgrade process, where an upgrade task could be completed and removed just as it was being loaded. Upgrade tasks now load reliably, so server upgrades complete as expected instead of entering an unexpected failed state.	LBM1-42249
Minor	A resiliency safeguard has been added to prevent the Node Manager (NM) from reassigning a device already designated as a data device for use as a journal device, further improving overall cluster robustness.	LBM1-44298
Minor	Clarified the lbcli and REST API documentation for the disable-server (evict) operation, accurately describing its behavior for servers hosting RF=1 (single-replica) volumes and the scope of the force flag.	LBM1-44875
Minor	`cluster-manager`: Fixed a rare condition where a deprecated event key in etcd could cause the event cleaner to exit, leading to event accumulation and a potential out-of-memory (OOM) condition during CM switchover.	LBM1-42614
Minor	Fixed the install-lightos cleanup task's deletion of server-config.yaml, the file that holds the cluster endpoints.	LBM1-42366
Minor	Fixed a rare condition where Duroslight could hang for approximately five minutes during shutdown. Duroslight now cancels pending futures upon receiving the shutdown command. As a result, rebuild times upon recovery may be slightly longer.	LBM1-38706
Minor	Improved device-management resiliency so that a healthy NVMe device can be added even while the server is rebuilding data after a previous device failure. Adding a replacement device now succeeds in this scenario instead of being rejected.	LBM1-43410
Minor	Improved event accuracy during planned server-disable operations. Disabling a server now reports an event indicating the node is inactive because the server was disabled, instead of a misleading connectivity-issue event - giving operators a clearer signal during maintenance.	LBM1-40182
Minor	Improved journal device event reporting so that these events now include the originating node identifier - making it easier to pinpoint which node an event relates to.	LBM1-42966
Minor	Improved the accuracy of journal device failure detection so that a Duroslight failure caused by a non-journal issue is no longer misclassified as a journal device failure. This prevents a node from being incorrectly marked as failed when SSD journaling is enabled, and removes spurious journal-device-failure events when journaling is not in use.	LBM1-42963
Minor	Improved the systemd metrics collector by reducing excessive log output and documenting its exposed metrics - giving operators cleaner exporter logs and a clearer monitoring reference.	LBM1-42172
Minor	Improved the accuracy of the alert calculation logic and the firing of the `NodeRebuildNotPossible` alert message, so that the alert triggers as expected.	LBM1-41095
Minor	Strengthened the resiliency of Key Encryption Key (KEK) rotation on encryption-enabled clusters, so the Cluster Manager now restarts reliably even if a failover coincides with a brief window during KEK rotation. This keeps the management API and cluster-state handling available, and the data path is unaffected as long as all nodes are healthy.	LBM1-42309
Minor	Strengthened the resiliency of NVMe SSD device-state tracking so that a network disconnect coinciding precisely with a device state change no longer prevents later state updates for that device - keeping device health reporting accurate.	LBM1-42991
Minor	Strengthened the resiliency of the Cluster Manager's placement-group (PG) replacement flow. New safeguards prevent a rare timing condition, in which two PG members fail permanently at nearly the same time, from placing two replicas of the same volume on one node. This keeps volume protection state consistent and hardens data placement.	LBM1-44273
Minor	Strengthened the resiliency of volume migration so that a snapshot deleted in the background during migration setup is cleaned up correctly on the target node. This prevents stale snapshot data from being left behind, which could otherwise cause a later rebuild or migration to fail.	LBM1-42584
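
The skip-rather-than-abort behavior described for LBM1-41873 can be pictured with a generic sketch. Everything below is hypothetical: `write_snapshot_keys`, `write_key`, and the snapshot identifiers are illustrative stand-ins and do not reflect the actual `createExistingSnapshots` implementation.

```python
from typing import Callable, Iterable, List, Tuple


def write_snapshot_keys(
    snapshot_ids: Iterable[str],
    write_key: Callable[[str], None],
) -> Tuple[List[str], List[str]]:
    """Try every snapshot; record failures instead of aborting on the first one."""
    written: List[str] = []
    skipped: List[str] = []
    for snap_id in snapshot_ids:
        try:
            write_key(snap_id)  # hypothetical per-snapshot key write
        except Exception:
            # For example, the snapshot was deleted concurrently:
            # skip it and continue with the remaining snapshots.
            skipped.append(snap_id)
            continue
        written.append(snap_id)
    return written, skipped
```

The point of the pattern is that one failing snapshot no longer prevents the keys for the remaining snapshots from being written; the caller can still inspect the skipped list afterwards.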

Installation and Upgradeability

You can upgrade to this release from all previous Lightbits 3.15.x, 3.16.x, and 3.17.x releases.
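
As a rough illustration of the upgrade rule above, the following sketch checks whether an installed version string belongs to one of the supported source series. The function name and the assumed `major.minor.patch` version format are for illustration only; this is not a Lightbits tool.

```python
# Hypothetical helper, not part of any Lightbits tooling.
# Release series from which an upgrade to 3.18.2 is supported,
# taken from the statement above.
SUPPORTED_SOURCE_SERIES = {(3, 15), (3, 16), (3, 17)}


def can_upgrade_to_3_18_2(installed_version: str) -> bool:
    """Return True if installed_version (assumed 'major.minor.patch',
    optionally prefixed with 'v') is a supported upgrade source."""
    parts = installed_version.strip().lstrip("v").split(".")
    if len(parts) < 2:
        return False
    try:
        major, minor = int(parts[0]), int(parts[1])
    except ValueError:
        return False
    return (major, minor) in SUPPORTED_SOURCE_SERIES
```

For example, `can_upgrade_to_3_18_2("3.16.2")` is true under these assumptions, while a 3.14.x version is rejected.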
