Issues Fixed in Lightbits 3.15.3

ID	Description
42614	In rare cases, the cluster manager and etcd services could slowly leak memory. Mishandling of a deprecated GFTL data loss event could cause the event cleanup logic to stop removing old events, leading to a continuous increase in the number of events stored on the cluster.
42309	In certain situations, if a CM failover occurs during the initial KEK rotation process (a race condition), the new CM might not be able to become active, causing many APIs to fail. The data path continues to work as long as all nodes are healthy.
42282	A volume's protection state might be reported incorrectly in the API as fully protected instead of degraded or read-only after a rebalance triggered by a permanent failure fails. The issue is limited to the protection state that the API reports; internally, the protection state is handled as expected.
41162	Deleting a snapshot while a node is inactive could cause a subsequent rebuild initiated from that node (acting as primary) to fail. This condition can occur when the inactive node retains metadata for the deleted snapshot while peer nodes do not. Full (migration) rebuilds are more likely to be impacted, as they could include objects associated with the affected snapshot. If this issue is encountered, contact Lightbits Support for an approved procedure to identify and release the problematic snapshots.
41068	A node could crash when powering up after an abrupt failure in the rare case where the volume containing the most recently written data is deleted just before an NVMe device failure, and the system then completes the full rebuild before any new writes are issued to any volume replicated on that node. If this occurs, the remediation is to either fail the node in place or contact Lightbits Support, who can perform an internal procedure to recover the node from this state.
40871	Instances with seven or more devices and a specific configuration (eight SWLF cores and eight or more recovery cores) could fail graceful recovery and fall back to abrupt recovery, which can take significantly longer. Mitigation: Set the module parameter gracefulrecovery maxrecovery cores="6" in the gftl-options file.
40626	Under an extremely rare race condition that can occur during background garbage collection while two successive snapshots are deleted, it is possible for data from an older volume snapshot to overwrite more recent data. A permanent fix for this issue is in development and will be included in a forthcoming release.
40208	A volume rebuild could fail to complete following an internal error while creating a new snapshot. When a specific portion of the create-snapshot task runs exactly as the cluster manager service is switched over, this volume and other volumes that share the same protection group could enter an inconsistent state that prevents the volume rebuild from completing.
39742	In certain scenarios, a volume's protection state might fail to update correctly because of an internal race condition that causes a brief resource inconsistency, which in turn fails the protection state update.
