Known Issues in Lightbits 3.17.1

| ID | Description |
| --- | --- |
| 42614 | The cluster manager and etcd services could experience a very slow memory leak. Mishandling of a deprecated GFTL data loss event could cause the event cleanup logic to stop removing old events, so the number of events stored on the cluster grows continuously. |
| 42309 | Due to a race condition, if a CM failover occurs during the initial KEK rotation process, the new CM might not be able to become active. As a result, many APIs will fail. The data path continues to work as long as all nodes are healthy. |
| 42282 | After a permanent-failure rebalance that fails, the API might incorrectly report a volume's protection state as fully protected instead of degraded or read-only. The issue is limited to the protection state that the API reports; internally, the protection state is handled as expected. |
| 41873 | Under a specific race condition, if a snapshot is created while a node is down and then deleted at a very precise moment during the node’s startup, the node could become unavailable. |
| 41466 | Creating a snapshot with a retention time that exceeds 192 years will fail and trigger a restart of the api-service. |
| 41162 | Deleting a snapshot while a node is inactive could cause a subsequent rebuild initiated from that node (acting as primary) to fail. This condition can occur when the inactive node retains metadata for the deleted snapshot while peer nodes do not. Full (migration) rebuilds are more likely to be impacted, as they could include objects associated with the affected snapshot. If this issue is encountered, contact Lightbits Support for an approved procedure to identify and release the problematic snapshots. |
| 41095 | The NodeRebuildNotPossible alert is not triggered as expected. |
| 41068 | A node could crash when powering up from an abrupt failure in the rare case where the volume containing the most recently written data is deleted just before an NVMe device failure, and the system completes the full rebuild before any new writes are issued to any volume replicated on that node. If this occurs, the remediation is to either fail the node in place or contact Lightbits Support, who can perform an internal procedure to recover the node from this state. |
| 40883 | In the specific case of using VCP to upgrade a cluster, the upgrade to Lightbits version 3.17.1 or higher will fail because VCP cannot parse the new version format. To successfully upgrade the cluster, use the Lightbits core CLI or REST API directly. |
| 40607 | In a specific edge case, if Duroslight fails to write to the Journal device during a rebuild, Duroslight might crash without producing a Journal SSD Failed Event. In such cases, only a NodeInactive event may be recorded. |
| 40428 | In extremely rare cases, the reported logical size of a volume could be incorrect after a discard operation is performed with TRIM support enabled. |
| 40293 | When an admin-endpoint is deleted or updated, the corresponding iptables rules created for it remain in place. As a result, the related ports stay open even though the admin-endpoint has been deleted or updated. The iptables configuration is refreshed only after a service restart, rather than being updated in real time. |
| 40068 | In rare cases, a newly created volume could be assigned the same NSID as an existing volume. This condition can lead to incorrect delete or update operations for volumes sharing the same NSID. If this issue is encountered, contact Lightbits Support for a manual remediation procedure to identify and fix the affected volumes. |
| 39951 | A temporary issue, such as a brief network glitch occurring during a specific short window in the node power-up process, could prevent the node from completing the power-up successfully. If this issue occurs, contact Lightbits Support for assistance. |
| 39742 | In certain scenarios, a volume's protection state might fail to update correctly because of an internal race condition. The race can cause a brief resource inconsistency, which in turn causes the protection state update to fail. |
| 39184 | When TRIM is enabled and a user performs a discard operation, the reported logical size might be incorrect and not reflect the true logical size. |
| 38497 | When a new server is created to replace another server in the cluster using the --extend-cluster=false flag (the default setting), and at a much later time this server and its node experience a permanent failure with fail in place enabled (causing the server's resources to be migrated), the server might not fully participate in the distribution of replicas across the cluster if it becomes active again. This could cause an imbalance of resources. |
| 37830 | In a very rare case, a node could fail to recover and return to an active state if an I/O error or bad block is encountered on an underlying SSD during its startup sequence. This issue prevents a key service (gftl) from initializing correctly and could require manual intervention, such as the removal of the failed SSD from the system, to allow the node to successfully complete its recovery. |
| 37544 | In a rare scenario, the discovery-client service could stop if network connectivity to the cluster is disrupted at the same time as multiple volume changes are generating notifications. The service is designed to restart automatically after such an event, and no manual intervention is required. |
| 37505 | In a rare combination of events, the 'physicalOwnedCapacity' volume statistic could report an incorrect value if data at a specific LBA is overwritten with content that has a different compression ratio. In this scenario, the updated length of the overwritten data is not correctly accounted for in the statistic. |
| 28027 | A server's upgrade status will not update in the following sequence: 1. A server is upgraded to release x.y.z. 2. The operation fails (i.e., times out); however, binaries on the server are updated to version x.y.z. 3. At a later time, the upgrade is attempted again to version x.y.z (this operation is skipped internally, as binaries have already been updated). 4. The upgrade status will continue to show the failed upgrade operation, even though the last upgrade returned with no error. |
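
For issue 40068, one way to spot volumes that might share an NSID is to group an exported volume listing by NSID. The following is a minimal sketch and not a Lightbits tool. It assumes the volume list has already been exported (for example, from the CLI or REST API) into a list of dictionaries, and that NSIDs are expected to be unique within that list. The `name` and `nsid` keys, the function name, and the sample data are hypothetical placeholders, not the actual API field names.

```python
from collections import defaultdict


def find_duplicate_nsids(volumes):
    """Return {nsid: [volume names]} for every NSID used by more than one volume.

    `volumes` is an iterable of dicts. The "name" and "nsid" keys are
    hypothetical field names, not taken from the Lightbits API.
    """
    names_by_nsid = defaultdict(list)
    for volume in volumes:
        names_by_nsid[volume["nsid"]].append(volume["name"])
    # Keep only NSIDs that appear on two or more volumes.
    return {nsid: names for nsid, names in names_by_nsid.items() if len(names) > 1}


# Illustrative data only; this is not real cluster output.
sample_volumes = [
    {"name": "vol-a", "nsid": 1},
    {"name": "vol-b", "nsid": 2},
    {"name": "vol-c", "nsid": 1},
]
duplicates = find_duplicate_nsids(sample_volumes)
print(duplicates)  # {1: ['vol-a', 'vol-c']}
```

A check like this only flags candidates. Identifying and fixing the affected volumes still requires the manual remediation procedure from Lightbits Support.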