Issues Fixed in Lightbits 3.19.1

ID	Description
44298	If a journal NVMe device fails while the node manager is down, the node manager can try to claim another disk for journaling when it comes back up. If the journaling matchers are the same as the data device matchers, it can take one of the data devices by mistake, causing the GFTL to crash. Journaling must use specific matchers that differ from those of the data devices (for example, the serial number).
43456	In a rare combination of conditions, a storage node could fail to start after a restart if placement group membership changed while the node was offline. This requires memory pressure during a prior recovery (causing stale metadata to be persisted), followed by placement group rebalancing that removes volumes from the node. Normal operations and graceful recovery flows are not affected.
42991	A network disconnect that coincides exactly with a state change of an NVMe SSD could prevent later state changes of that specific NVMe SSD from being updated correctly (the issue resolves itself the next time the node-manager service is restarted).
42800	A volume's protection state could fail to update correctly in some cases of network/etcd unavailability.
42614	In rare cases, the cluster manager and etcd services could suffer a very slow memory leak. Mishandling of a deprecated GFTL data loss event could cause the event cleanup logic to stop removing old events, leading to a continuous increase in the number of events stored on the cluster.
42309	In rare cases, a CM failover occurring during the initial KEK rotation process may result in a race condition where the new CM fails to become active, causing some API calls to fail. The data path remains unaffected as long as all nodes are healthy.
41873	Under a specific race condition, if a snapshot is created while a node is down and subsequently deleted during a very precise window in the node's startup sequence, the node may become unavailable.
41466	Creating a snapshot with a retention time greater than 192 years will fail and cause the API service to restart.
41095	The NodeRebuildNotPossible alert may not trigger under conditions where it should, resulting in missed notifications for rebuild-blocking scenarios.
38706	In some rare cases, Duroslight could hang during shutdown.
33865	In certain cases when migrating volumes during dynamic rebalancing, a VolumeInDegradedProtectionState event could be sent out when the volume is actually fully protected.
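
For issue 41466, a client-side guard can reject oversized retention values before they reach the API on clusters that have not yet been upgraded. The following is a minimal sketch, not part of any Lightbits tooling: the function name `exceeds_retention_limit` is hypothetical, and the 365-day year is an assumed, deliberately conservative conversion, since the exact server-side limit is not stated here.

```python
from datetime import timedelta

# The 192-year figure comes from issue 41466. The 365-day year is an
# assumption chosen to be conservative (slightly below 192 calendar years);
# it is not a documented Lightbits constant.
MAX_RETENTION_YEARS = 192
MAX_RETENTION = timedelta(days=365 * MAX_RETENTION_YEARS)


def exceeds_retention_limit(retention: timedelta) -> bool:
    """Return True if the requested snapshot retention is above the limit."""
    return retention > MAX_RETENTION


if __name__ == "__main__":
    for requested in (timedelta(days=30), timedelta(days=365 * 200)):
        verdict = "reject" if exceeds_retention_limit(requested) else "accept"
        print(f"{requested.days} days: {verdict}")
```

A caller would run this check before issuing the snapshot creation request and surface an error to the user instead of sending the request.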
