Issues Fixed in Lightbits 3.19.1

ID      Description
44298   If a journal NVMe device failed while the node manager was down, the node manager could try to claim another disk for journaling when it came back up. If the journaling matchers were the same as the data-device matchers, it could mistakenly take one of the data devices, causing the GFTL to crash. Journal devices must be selected with specific matchers that differ from those used for data devices (for example, the device serial number).
43456   Under a rare combination of conditions, a storage node could fail to start after a restart if placement group membership changed while the node was offline. The issue requires memory pressure during a prior recovery (causing stale metadata to be persisted), followed by placement group rebalancing that removes volumes from the node. Normal operations and graceful recovery flows are not affected.
43379   On rare occasions, the GFTL could crash if a volume delete command was received while a node was recovering and the logical stats of pre-existing volumes were being updated as part of the recovery process.
42991   A network disconnect that coincided exactly with a state change of an NVMe SSD could prevent subsequent state changes of that SSD from being updated correctly. The issue resolved itself the next time the node-manager service was restarted.
42800   A volume's protection state could fail to update correctly in some cases of network or etcd unavailability.
42614   The cluster manager and etcd services could suffer from a very slow memory leak. Mishandling of a deprecated GFTL data loss event could stop the event cleanup logic from removing old events, causing the number of events stored on the cluster to grow continuously.
42309   In certain situations, a race condition during a cluster manager (CM) failover in the initial KEK rotation process could prevent the new CM from becoming active, causing many APIs to fail. The data path continued to work as long as all nodes were healthy.
41873   Under a specific race condition, if a snapshot was created while a node was down and then deleted at a very precise point during the node’s startup, the node could become unavailable.
41466   Creating a snapshot with a retention time exceeding 192 years would fail and trigger a restart of the api-service.
41095   The NodeRebuildNotPossible alert was not triggered when expected.
38706   In some rare cases, Duroslight could hang during shutdown.
33865   In certain cases, when volumes were migrated during dynamic rebalancing, a VolumeInDegradedProtectionState event could be sent even though the volume was fully protected.