Lightbits Release Notes
# Known Issues Lightbits 3.16.1
| ID | Description |
|---|---|
| 42614 | The cluster manager and etcd services could suffer a very slow memory leak. Mishandling of a deprecated GFTL data loss event could cause the event cleanup logic to stop removing old events, leading to a continuous increase in the number of events stored on the cluster. |
| 42282 | A volume's protection state might be incorrectly reported by the API as fully protected instead of degraded or read-only, following a permanent-failure rebalance that fails. The issue is limited to the protection state reported by the API; internally, the protection state is handled as expected. |
| 41873 | Under a specific race condition, if a snapshot is created while a node is down and then deleted during the node’s startup at a very precise timing, the node could become unavailable. |
| 41466 | Creating a snapshot with a retention time that exceeds 192 years will fail and cause a restart of the api-service. |
| 41162 | Deleting a snapshot while a node is inactive could cause a subsequent rebuild initiated from that node (acting as primary) to fail. This condition can occur when the inactive node retains metadata for the deleted snapshot while peer nodes do not. Full (migration) rebuilds are more likely to be impacted, as they could include objects associated with the affected snapshot. If this issue is encountered, contact Lightbits Support for an approved procedure to identify and release the problematic snapshots. |
| 41095 | The NodeRebuildNotPossible alert is not triggered as expected. |
| 41068 | A node could crash when powering up from an abrupt failure in the rare case where the volume containing the most recently written data is deleted just before an NVMe device failure - as well as the system completing the full rebuild before any new writes are issued to any volume replicated on that node. If this occurs, the remediation is to either fail the node in place or contact Lightbits Support, who can perform an internal procedure to recover the node from this state. |
| 40428 | In extremely rare cases, the reported logical size of a volume could be incorrect after a discard operation is performed while TRIM support is enabled. |
| 40293 | When an admin-endpoint is deleted or updated, the corresponding iptables rules created for it remain in place. As a result, the related ports stay open even though the admin-endpoint has been deleted or updated. The iptables configuration is refreshed only after a service restart, instead of being properly updated in real time. |
| 40068 | In rare cases, a newly created volume could be assigned the same NSID as an existing volume. This condition can lead to incorrect delete or update operations for volumes sharing the same NSID. If this issue is encountered, contact Lightbits Support for a manual remediation procedure to identify and fix the affected volumes. |
| 39951 | A temporary issue - such as a brief network glitch occurring during a specific short window in the node power-up process - could prevent the node from completing the power-up successfully. If this issue occurs, contact Lightbits Support for assistance. |
| 39628 | To prevent a rare Machine Check Exception (MCE) and forced reboots on Sapphire Rapids machines, we recommend disabling the DSA offload feature. This condition can occur if the duroslight log indicates "Enabling DSA crc32 offload for reads"; it can be prevented by adding dsa_read_crc32: false and dsa_write_crc32: false under the "configurator" section of /etc/duroslight/conf.yaml. |
| 39211 | When deleting the most recent snapshot of a volume while a node holding a replica is offline, recently written data could revert to the data stored in that snapshot if the node later becomes the primary. |
| 39184 | When TRIM is enabled and a user performs the discard operation, the logical report size might be incorrect and not reflect the true logical size. |
| 39168 | When using DCPMM, if a snapshot is taken after an abrupt failure, recently written data could be reverted to the state captured in the snapshot. |
| 38754 | The node-manager service will fail to shut down gracefully if the shutdown is issued before it has successfully completed power-up. |
| 38751 | The wrlat_reply_qued statistic is inaccurate, as it counts all NVMe commands, not just writes. |
| 38497 | When a new server is created to replace another server in the cluster using the --extend-cluster=false flag (the default setting), and much later that server and its node experience a permanent failure with fail-in-place enabled (causing the server's resources to be migrated), the server might not participate properly in the distribution of replicas if it becomes active again, which could cause a resource imbalance across the cluster. |
| 37830 | In a very rare case, a node could fail to recover and return to an active state if an I/O error or bad block is encountered on an underlying SSD during its startup sequence. This issue prevents a key service (gftl) from initializing correctly and could require manual intervention - such as the removal of the failed SSD from the system - to allow the node to successfully complete its recovery. |
| 37544 | In a rare scenario, the discovery-client service could stop if network connectivity to the cluster is disrupted at the same time as multiple volume changes are generating notifications. The service is designed to restart automatically after such an event, and no manual intervention is required. |
| 37505 | In a rare combination of events, the 'physicalOwnedCapacity' volume statistic could report an incorrect value if data at a specific LBA is overwritten with content that has a different compression ratio. In this scenario, the updated length of the overwritten data is not correctly accounted for in the statistic. |
| 29683 | Systems with Solidigm/Intel D5-P5316 drives may experience higher than expected write latency after several drive write cycles. Contact Lightbits Support if you use Solidigm/Intel D5-P5316 SSDs and are experiencing higher than expected write latency. |
| 28027 | A server upgrade status will not update in the following sequence: 1. A server is upgraded to release x.y.z. 2. The operation fails (i.e., times out); however, binaries on the server are updated to version x.y.z. 3. At a later time, the upgrade is attempted again to version x.y.z (this operation is skipped internally, as binaries have already been updated). 4. The upgrade status will continue to show the failed upgrade operation, even though the last upgrade returned with no error. |
| 22582 | A server could remain in "Enabling" state if the enable server command is issued during an upgrade. |
| 19670 | The compression ratio returned by get-cluster API will be incorrect when the cluster has snapshots created over volumes. The calculation of the compression ratio at the cluster level uses different logic for physical used capacity and the amount of uncompressed data written to storage. Hence the compression ratio value might be higher than the actual value. A correct indication of cluster level compression can be deduced from a weighted average of compression ratio at the node levels; i.e., Compression ratio = sum(node compression ratio * node physical usage) / sum(node physical usage). |
| 18948 | The node local rebuild progress (due to SSD failure) shows 100% done when there is no storage space left to complete the rebuild. |
| 18522 | When attempting to add a server to a cluster using lbcli 'create server' or a REST POST to '/api/v2/servers', and the operation fails for any reason, 'list servers' could permanently show the new server in the 'creating' state. |
| 17298 | The migration of volumes due to automatic rebalancing could take a long time, even when the volumes are empty. |
| 15715 | During a volume rebuild, the Grafana dashboard does not show the write IOs for the recovered data. |
| 14995 | A single server cluster cannot be upgraded using the cluster upgrade command. Upgrade using only the upgrade server command. |
| 14889 | In case of an SSD failure, the system will scan the storage and rebuild the data. The entire raw capacity will be scanned, even when not all of it was utilized. This leads to a longer rebuild time than necessary. |
| 14863 | Prior to lb CSI installation, the lb discovery client service must be installed and started on all K8S cluster nodes. |
| 13064 | Following a 'replace node' operation, volumes with a single replica will be created as 'unavailable' in the new node. Note: Single replica volumes are not protected, and data will not move to the new node. Workaround: Delete single replica volumes before replacing the node, or reboot the new server after replacing the node. |
| 11856 | Volume and node usage metrics might show different values between REST/lbcli and Prometheus, when a volume is deleted and a node is disconnected. |
| 11326 | Volume metrics do not return any value for volumes that are created but do not store any data. |
| 10021 | Commands affecting SSD content (such as blkdiscard, nvme format) should not be executed on the Lightbits server. |
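The workaround for issue 39628 above can be applied as a fragment of /etc/duroslight/conf.yaml. The two keys are taken from the issue text; the surrounding "configurator" section may already contain other settings on your system, so merge rather than overwrite:

```yaml
# Sketch of the workaround for issue 39628: disable DSA CRC32 offload
# on Sapphire Rapids machines to avoid a rare MCE and forced reboot.
configurator:
  dsa_read_crc32: false
  dsa_write_crc32: false
```

A service restart is typically needed for a configuration change like this to take effect; check the duroslight log no longer reports "Enabling DSA crc32 offload for reads" afterward.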
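The weighted-average formula given for issue 19670 above can be sketched in a few lines. The field names below are illustrative, not the actual API schema; the inputs are the per-node compression ratio and physical usage as reported by node-level statistics:

```python
# Workaround sketch for issue 19670: compute the cluster-level compression
# ratio as a weighted average of per-node values, since the value returned
# by the get-cluster API can be inflated when snapshots exist.
# Dict keys ("compression_ratio", "physical_usage") are hypothetical names.

def cluster_compression_ratio(nodes):
    """nodes: iterable of dicts with 'compression_ratio' and
    'physical_usage' (bytes) for each node."""
    total_usage = sum(n["physical_usage"] for n in nodes)
    if total_usage == 0:
        return 1.0  # nothing written yet; treat as 1:1
    weighted = sum(n["compression_ratio"] * n["physical_usage"] for n in nodes)
    return weighted / total_usage

# Example: two nodes with different usage and ratios
nodes = [
    {"compression_ratio": 2.0, "physical_usage": 100},
    {"compression_ratio": 1.5, "physical_usage": 300},
]
print(cluster_compression_ratio(nodes))  # (2.0*100 + 1.5*300) / 400 = 1.625
```

This is exactly sum(node compression ratio * node physical usage) / sum(node physical usage) from the issue description, weighting each node's ratio by how much physical capacity it actually uses.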