Issues Fixed in Lightbits 3.18.2

ID	Description
44875	Clarified the lbcli and REST API documentation: the force flag cannot be used to evict a server that holds 1x volumes, or to evict a server when doing so could risk service loss. The force flag only allows evicting a server whose nodes are unavailable, and it still requires other nodes to be active.
44762	Log streaming to an rsyslog target supports only the secured mode of operation. Unsecured mode is not currently supported.
44298	If a journal NVMe device fails while the node manager is down, the node manager can try to take another disk for journaling when it comes back up. If the matchers for journaling are the same as those for data devices, it can take one of the data devices by mistake, causing the GFTL to crash. For journaling, use specific matchers that differ from those of the data devices (for example, the serial number).
44159	In a rare case, a rebuild could fail after the following sequence: two or more snapshots are created while a node is inactive, and then all are deleted with at least one deletion occurring after the node's recovery had started. This has been resolved.
44124	Lightbits 3.19.1 enforces consistent formatting for Protocol Buffer (protobuf) duration fields: values must now be expressed in seconds (e.g., 3600s). Alternative unit formats such as 60m or 1h are no longer accepted.
43456	In a rare combination of conditions, a storage node could fail to start after a restart if placement group membership changed while the node was offline. This requires memory pressure during a prior recovery (causing stale metadata to be persisted), followed by placement group rebalancing that removes volumes from the node. Normal operations and graceful recovery flows are not affected.
42991	A network disconnect that coincides exactly with a state change of an NVMe SSD device could prevent later state changes of that device from being updated correctly. The issue resolves itself the next time the node-manager service is restarted.
42963	When SSD Journaling is enabled, if Duroslight fails due to a non-journal-related issue, the Node Manager (NM) might incorrectly classify the failure as a journal device failure, causing the NM to remain inactive and enter a Permanently Failed state. When SSD Journaling is disabled, failure handling is not affected; however, users might receive a spurious "journal device failed" event even though journaling is not in use. This event is cosmetic only and does not reflect an actual journal issue.
42800	A volume's protection state could fail to update correctly in some cases of network or etcd unavailability.
42614	In rare cases, the cluster manager and etcd services could suffer a very slow memory leak. Mishandling of a deprecated GFTL data loss event could cause the event cleanup logic to stop removing old events, leading to a continuous increase in the number of events stored on the cluster.
42309	In rare cases, a CM failover occurring during the initial KEK rotation process may result in a race condition where the new CM fails to become active, causing some API calls to fail. The data path remains unaffected as long as all nodes are healthy.
41873	Under a specific race condition, if a snapshot is created while a node is down and subsequently deleted during a very precise window in the node's startup sequence, the node may become unavailable.
41095	The NodeRebuildNotPossible alert may not trigger under conditions where it should, resulting in missed notifications for rebuild-blocking scenarios.
38706	In some rare cases, Duroslight could hang during shutdown.
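
As an illustration of the duration format described in issue 44124, the following Python sketch converts values such as 60m or 1h to the seconds-only form (for example, 3600s). The helper name to_seconds_format and the set of accepted input units (s, m, h) are assumptions made for this example. It is not part of lbcli or the Lightbits API, and it handles whole-number values only.

```python
import re

# Hypothetical helper for illustration only; not part of any Lightbits tool.
# Assumed input units: seconds (s), minutes (m), hours (h).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_COMPONENT_RE = re.compile(r"(\d+)([smh])")
_FULL_RE = re.compile(r"(?:\d+[smh])+")


def to_seconds_format(value: str) -> str:
    """Convert a duration such as '60m' or '1h30m' to the 'Ns' form."""
    text = value.strip()
    # Reject anything that is not a sequence of <integer><unit> components.
    if not _FULL_RE.fullmatch(text):
        raise ValueError(f"unsupported duration: {value!r}")
    # Sum every component after scaling it to seconds.
    total = sum(int(n) * _UNIT_SECONDS[u] for n, u in _COMPONENT_RE.findall(text))
    return f"{total}s"
```

With this sketch, to_seconds_format("1h") and to_seconds_format("60m") both return "3600s", and a value already expressed in seconds is returned unchanged.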
