Known Issues in Lightbits 3.9.6
ID | Description |
---|---|
39211 | When deleting the most recent snapshot of a volume while a node holding a replica is offline, recently written data could revert to the data stored in that snapshot if the node later becomes the primary. |
39168 | When using DCPMM, if a snapshot is taken after an abrupt failure, recently written data could be reverted to the state captured in the snapshot. |
38754 | A node-manager service will fail to shut down gracefully if the shutdown is issued before the service has fully completed its power-up. |
38497 | When creating a new server to replace another server in the cluster (using the --extend-cluster=false flag, which is also the default), the new server will not participate in the proper distribution of replicas across the cluster and could cause a resource imbalance. |
38496 | When dynamic rebalance is enabled, creating a new server to replace another server in the cluster (using the --extend-cluster=false flag, which is also the default) could cause the new server to participate in the dynamic rebalance process. The nodes on the server could then move automatically to Active instead of remaining unattached, which prevents the server from acting as a replacement in the replace-node process. |
38043 | If encryption was turned on but enabling it failed, resulting in the creation of an 'EnableServerEncryptionFailed' event, the API service will return stale events: any event created from that point onward will not be returned by the "ListEvents" API. As a workaround, check whether this event exists before upgrading to 3.14/3.15.1 (an example check appears after this table). Note that a similar issue can also occur when a cluster experiences a double disk failure on one of its servers (or a single disk failure with no EC) and Lightbits 3.2.x or older was in use at the time of the failure. |
37505 | The volume statistic 'physicalOwnedCapacity' might report an incorrect value when data is overwritten at the same LBA block with a different length. This can occur when the overwritten data is compressed with a different compression ratio than the original. In such a case, the length of the overwritten data is not accounted for in the statistic. |
37395 | Under rare race conditions, a server may remain stuck in the Deleting state. |
37205 | Incorrect handling of IO errors from NVMe SSDs during abrupt recovery may cause node recovery to fail. |
36882 | The GFTL service could fail locally due to a rare race condition when an SSD failure/removal, an SSD read submission, and multiple volume rebuilds all occur at exactly the same time. |
36722 | Users can reference an NVMe device by its path name (e.g., /dev/nvme0n1), as used during the initial system setup, to determine the storage SSD used by servers in the Lightbits storage cluster. However, this could lead to data loss because device names are not persistent across reboots (see the device-identification sketch after this table). |
36282 | |
36090 | Due to a rare internal error involving long network disconnections, nodes might lose service and remain in the Inactive state even though they should be active. |
36089 | In rare situations involving cluster stress that includes rebalance activity accompanied by disconnections from etcd, the node manager may crash and restart, or fail to complete rebuilds, and volumes may become stuck in the Migrating state. If this happens, restart the affected node manager as a workaround. |
35837 | With a single-instance service on a machine that has multiple NUMA nodes with memory, memory pressure can occur and the kernel will attempt memory reclamation. This leads to startup failures in the duroslight service, leaving the node Inactive. |
35575 | Volumes could remain in a Degraded state after the node has recovered from network issues. |
34169 | Duroslight crashes (segfault) during startup on Sapphire Rapids when the kernel is in lockdown mode. |
28027 | A server upgrade status will not update in the following sequence: |
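For issue 38043, the pre-upgrade workaround is to confirm that no EnableServerEncryptionFailed event exists in the system. The following is a minimal sketch of such a check, assuming the management API is reachable at an events endpoint and authenticated with a JWT; the URL, port, and response field names here are assumptions, so verify them against your cluster's API reference before use.

```python
# Hypothetical pre-upgrade check for issue 38043: look for an
# EnableServerEncryptionFailed event before upgrading to 3.14/3.15.1.
# The endpoint path, port, and response shape below are assumptions;
# consult the Lightbits API reference for the exact ListEvents call.
import json
import urllib.request

API_URL = "https://mgmt.example.com:443/api/v2/events"  # assumed endpoint
JWT = "<cluster-admin-jwt>"                             # assumed auth token

req = urllib.request.Request(API_URL, headers={"Authorization": f"Bearer {JWT}"})
with urllib.request.urlopen(req) as resp:
    events = json.load(resp).get("events", [])          # assumed response field

failed = [e for e in events if e.get("type") == "EnableServerEncryptionFailed"]
if failed:
    print("EnableServerEncryptionFailed event found; resolve before upgrading.")
else:
    print("No EnableServerEncryptionFailed event found.")
```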
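For issue 36722, a safer practice than relying on /dev/nvmeXnY names is to track SSDs by a persistent identifier. The sketch below is a generic Linux approach rather than a Lightbits-specific tool: it lists the stable symlinks under /dev/disk/by-id and the kernel device names they currently resolve to, so a drive can be identified by serial or WWN across reboots.

```python
# Generic Linux sketch (not a Lightbits tool): map persistent device IDs
# under /dev/disk/by-id to the kernel names they currently point at, so an
# SSD can be tracked by serial/WWN rather than by a name like /dev/nvme0n1,
# which may change across reboots.
import os
import re

BY_ID = "/dev/disk/by-id"

for link in sorted(os.listdir(BY_ID)):
    target = os.path.realpath(os.path.join(BY_ID, link))
    name = os.path.basename(target)
    # Whole NVMe namespaces only (e.g. nvme0n1); skip partitions (nvme0n1p1).
    if re.fullmatch(r"nvme\d+n\d+", name):
        print(f"{link} -> {target}")
```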