Issues Fixed in Lightbits 3.19.1

ID | Description
44298 | If a journal NVMe device fails while the node manager is down, the node manager can try to take another disk for journaling when it comes back up. If the matchers for journaling are the same as those for data devices, it can take one of the data devices by mistake, causing the GFTL to crash. Journaling must use specific matchers that differ from those of the data devices (for example, the serial number).
43456 | In a rare combination of conditions, a storage node could fail to start after a restart if placement group membership changed while the node was offline. This requires memory pressure during a prior recovery (causing stale metadata to be persisted), followed by placement group rebalancing that removes volumes from the node. Normal operations and graceful recovery flows are not affected.
42991 | A network disconnect that coincides exactly with a state change of an NVMe SSD could prevent later state changes of that SSD from being updated correctly (the issue resolves itself the next time the node-manager service is restarted).
42800 | A volume's protection state could fail to update correctly in some cases of network or etcd unavailability.
42614 | In rare cases, the cluster manager and etcd services could leak memory very slowly. Mishandling of a deprecated GFTL data loss event could cause the event cleanup logic to stop removing old events, leading to a continuous increase in the number of events stored on the cluster.
42309 | In rare cases, a CM failover occurring during the initial KEK rotation process may result in a race condition where the new CM fails to become active, causing some API calls to fail. The data path remains unaffected as long as all nodes are healthy.
41873 | Under a specific race condition, if a snapshot is created while a node is down and subsequently deleted during a very precise window in the node's startup sequence, the node may become unavailable.
41466 | Creating a snapshot with a retention time greater than 192 years will fail and cause the API service to restart.
41095 | The NodeRebuildNotPossible alert may not trigger under conditions where it should, resulting in missed notifications for rebuild-blocking scenarios.
38706 | In some rare cases, Duroslight could hang during shutdown.
33865 | In certain cases when migrating volumes during dynamic rebalancing, a VolumeInDegradedProtectionState event could be sent out when the volume is actually fully protected.
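
The matcher guidance in issue 44298 can be checked ahead of time. The following is a minimal sketch, not the Lightbits matcher syntax or API: it assumes a hypothetical representation in which a device is a map of attributes and a matcher is a map of attribute values that must all be equal. The names `Device`, `Matcher`, `matches`, and `data_devices_claimable_by_journal`, and the example model and serial values, are illustrative only. It lists the data devices that a journaling matcher would also select, which is the overlap the issue warns about.

```python
from typing import Dict, List

# Hypothetical representations, not the Lightbits configuration format.
Device = Dict[str, str]   # e.g. {"model": "...", "serial": "..."}
Matcher = Dict[str, str]  # every listed attribute must equal the device's value


def matches(device: Device, matcher: Matcher) -> bool:
    """Return True if the device satisfies every attribute in the matcher.

    Note that an empty matcher matches every device.
    """
    return all(device.get(key) == value for key, value in matcher.items())


def data_devices_claimable_by_journal(
    data_devices: List[Device], journal_matchers: List[Matcher]
) -> List[str]:
    """Return serials of data devices that some journaling matcher also selects."""
    return [
        device["serial"]
        for device in data_devices
        if any(matches(device, matcher) for matcher in journal_matchers)
    ]


# Illustrative data devices that share a model with the journal device.
data_devices = [
    {"model": "ExampleSSD", "serial": "SN-D1"},
    {"model": "ExampleSSD", "serial": "SN-D2"},
]

# A journaling matcher on the model alone also selects the data devices.
broad = data_devices_claimable_by_journal(data_devices, [{"model": "ExampleSSD"}])

# A journaling matcher on the journal device's serial number selects none of them.
specific = data_devices_claimable_by_journal(data_devices, [{"serial": "SN-J1"}])

print(broad)     # ['SN-D1', 'SN-D2']
print(specific)  # []
```

An empty result means no journaling matcher can select a data device. Any serial in the result marks a data device that could be taken for journaling by mistake.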