Release 3.18.2
Release Date
v3.18.2 was released to the public on June 17, 2026.
New in This Release
This release introduces the following changes since version 3.17.x. A change is classified as either a new feature, an enhancement, a major issue (e.g., an issue that could lead to potential data loss or service loss), or a minor issue.
| Issue Type | Description | ID |
|---|---|---|
| | Added the ability to create encrypted thin clones from unencrypted base snapshots, with each derived volume protected by its own unique encryption key. Available on clusters configured with encryption. See the lbcli Create Volume documentation for details. | LBM1-42408 |
| | Added the ability to reserve additional RAM at deployment time by setting an optional custom reserve value. This is useful when the default reservation for the OS and non-Lightbits services (8 GiB) is not sufficient. The total reserved RAM is capped at 15% of server memory, or the existing default of 21 GiB per Lightbits instance. | LBM1-40362 |
| | Added two new Grafana dashboard panels for connectivity visibility: the number of disconnected nodes (the nodes a client is expected to be connected to but is not), and the number of hosts impacted by those disconnections. This makes it easier to spot and diagnose client-to-node connectivity issues at a glance. | LBM1-41407 |
| | Hardened error handling across shared internal components and the lbcli command-line tool, improving the robustness of operations such as TLS setup and certificate handling at startup. | LBM1-43622 |
| | Hardened error handling during backend subsystem initialization at service startup, preventing rare early-startup failures and improving the reliability of the node and cluster management services. | LBM1-43620 |
| | Hardened error handling during Cluster Manager startup, improving the reliability of cluster initialization. | LBM1-43618 |
| | Hardened error handling in the API service request handlers, improving the reliability of volume management and other API operations. | LBM1-43619 |
| | Hardened error handling in the monitoring and telemetry components so that metrics, including SMART disk-health data, are collected accurately and without interruption. | LBM1-43625 |
| | Hardened error handling in the Node Manager core, improving the robustness of startup diagnostics and overall service stability. | LBM1-43617 |
| | Hardened error handling in the volume and data-layer task paths so that rare error conditions during service startup are surfaced correctly instead of being silently masked, improving the startup reliability of core data services. | LBM1-43621 |
| | Improved resiliency on dual-node servers: when a journaling device fails, only the affected node instance is restarted rather than the entire Node Manager service. This reduces the blast radius of a journaling device failure and improves overall service continuity. | LBM1-39359 |
| | Resolved a race condition that could cause a deadlock during updates to a Protection Rule (PR) or volume protection state. | LBM1-42800 |
| | Fixed a PG migration stall caused by snapshot deletion during createExistingSnapshots. To improve resiliency, the process now skips failed snapshots rather than aborting, ensuring that the remaining node-snapshot keys are still written. | LBM1-41873 |
| | Fixed a rare condition where a storage node could fail to start if placement group membership changed while the node was offline. Memory pressure during a prior recovery could leave stale metadata persisted due to internal counters not being reset; a subsequent restart would then fail a consistency check. The recovery fallback logic now fully resets all affected counters when a partial failure is detected. | LBM1-43456 |
| | Hardened the Node Manager against a rare deadlock that could occur while updating feature flags, so the node continues operating reliably instead of hanging. | LBM1-42974 |
| | Strengthened control-plane resiliency so that a node still rebuilding its data is no longer promoted from secondary to primary. This keeps the active control-plane role on a fully recovered node during rebuilds. | LBM1-42314 |
| | Strengthened node-recovery resiliency against a rare timing condition where deleting a volume while a recovering node was updating its volume statistics could cause a storage-layer crash. Volume deletes during recovery are now handled safely. | LBM1-43379 |
| | Strengthened the resiliency of node recovery and rebuild against a rare timing condition between snapshot deletion and node recovery. Snapshots are now cleaned up consistently across nodes, preventing a stale snapshot from later causing a rebuild to fail. | LBM1-44159 |
| | Strengthened the resiliency of the cluster upgrade flow against a rare timing condition between the Cluster Manager and the upgrade process, where an upgrade task could be completed and removed just as it was being loaded. Upgrade tasks now load reliably, so server upgrades complete as expected instead of entering an unexpected failed state. | LBM1-42249 |
| | Added a resiliency safeguard that prevents the Node Manager (NM) from reassigning a device already designated as a data device for use as a journal device, further improving overall cluster robustness. | LBM1-44298 |
| | Clarified the lbcli and REST API documentation for the disable-server (evict) operation, accurately describing its behavior for servers hosting RF=1 (single-replica) volumes and the scope of the force flag. | LBM1-44875 |
| | cluster-manager: Fixed a rare condition where a deprecated event key in etcd could cause the event cleaner to exit, leading to event accumulation and a potential out-of-memory (OOM) condition during CM switchover. | LBM1-42614 |
| | Fixed an issue in which the install-lightos cleanup task deleted server-config.yaml, which holds the cluster endpoints. | LBM1-42366 |
| | Fixed a rare condition where Duroslight could hang for approximately five minutes during shutdown. Duroslight now cancels pending futures upon receiving the shutdown command. As a result, rebuild times upon recovery may be slightly longer. | LBM1-38706 |
| | Improved device-management resiliency so that a healthy NVMe device can be added even while the server is rebuilding data after a previous device failure. Adding a replacement device now succeeds in this scenario instead of being rejected. | LBM1-43410 |
| | Improved event accuracy during planned server-disable operations. Disabling a server now reports an event indicating the node is inactive because the server was disabled, instead of a misleading connectivity-issue event, giving operators a clearer signal during maintenance. | LBM1-40182 |
| | Improved journal device event reporting so that these events now include the originating node identifier, making it easier to pinpoint which node an event relates to. | LBM1-42966 |
| | Improved the accuracy of journal device failure detection so that a Duroslight failure caused by a non-journal issue is no longer misclassified as a journal device failure. This prevents a node from being incorrectly marked as failed when SSD journaling is enabled, and removes spurious journal-device-failure events when journaling is not in use. | LBM1-42963 |
| | Improved the systemd metrics collector by reducing excessive log output and documenting its exposed metrics, giving operators cleaner exporter logs and a clearer monitoring reference. | LBM1-42172 |
| | Increased the accuracy of the alert calculation logic and improved alert message firing for NodeRebuildNotPossible, to ensure that the alert triggers as expected. | LBM1-41095 |
| | Strengthened the resiliency of Key Encryption Key (KEK) rotation on encryption-enabled clusters, so the Cluster Manager now restarts reliably even if a failover coincides with a brief window during KEK rotation. This keeps the management API and cluster-state handling available, and the data path is unaffected as long as all nodes are healthy. | LBM1-42309 |
| | Strengthened the resiliency of NVMe SSD device-state tracking so that a network disconnect coinciding precisely with a device state change no longer prevents later state updates for that device, keeping device health reporting accurate. | LBM1-42991 |
| | Strengthened the resiliency of the Cluster Manager's placement-group (PG) replacement flow. New safeguards prevent a rare timing condition, in which two PG members fail permanently at nearly the same time, from placing two replicas of the same volume on one node. This keeps volume protection state consistent and hardens data placement. | LBM1-44273 |
| | Strengthened the resiliency of volume migration so that a snapshot deleted in the background during migration setup is cleaned up correctly on the target node. This prevents stale snapshot data from being left behind, which could otherwise cause a later rebuild or migration to fail. | LBM1-42584 |
Installation and Upgradeability
You can upgrade to this release from all previous Lightbits 3.15.x, 3.16.x, and 3.17.x releases.
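As a pre-upgrade sanity check, the supported source releases stated above can be encoded in a small script. The following is a minimal sketch, not a Lightbits tool: the function name `can_upgrade_from` and the constant `SUPPORTED_SOURCE_LINES` are hypothetical, and the version string is assumed to have the form `major.minor.patch` with an optional leading `v`.

```python
import re

# Release lines named in this section as supported upgrade origins.
# (Hypothetical constant; it encodes only what the sentence above states.)
SUPPORTED_SOURCE_LINES = {(3, 15), (3, 16), (3, 17)}


def can_upgrade_from(installed_version: str) -> bool:
    """Return True if the installed version is in a listed source release line."""
    # Assumed format: "3.16.4" or "v3.16.4".
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", installed_version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {installed_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) in SUPPORTED_SOURCE_LINES
```

The sketch returns `False` for anything outside the three listed release lines, including earlier 3.18.x patch releases, because this section does not mention them. Check the official upgrade documentation for those cases.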
© 2026 Lightbits Labs™