Known Issues in Lightbits 3.15.1

ID Description
38043

If encryption was turned on but enabling it failed, resulting in the creation of an 'EnableServerEncryptionFailed' event, the API service will return stale events. Any event recorded in the system from that point onward will not be returned by the "ListEvents" API.

As a workaround, check if this event exists before upgrading to 3.14/3.15.1.

Note that a similar issue could also occur when a cluster has a double disk failure on one of the servers (or a single disk failure with no EC), and Lightbits 3.2.x or older was in use at the time of the failure.
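The pre-upgrade check described above can be sketched in Python as follows. This is a minimal illustration, not a Lightbits tool: it assumes the events have already been exported and parsed into a list of dictionaries, and the `eventType` field name is hypothetical, so adjust it to match the actual event output of your cluster.

```python
def has_event(events, event_type, type_field="eventType"):
    """Return True if any event record carries the given type.

    `events` is a list of dicts (for example, parsed from exported JSON).
    `type_field` is a hypothetical key name, not confirmed by the
    Lightbits documentation; change it to match the real event schema.
    """
    return any(event.get(type_field) == event_type for event in events)


# Example with made-up records, purely for illustration.
sample_events = [
    {"eventType": "SomeOtherEvent"},
    {"eventType": "EnableServerEncryptionFailed"},
]
blocked = has_event(sample_events, "EnableServerEncryptionFailed")
```

If the check returns True, the condition described in this issue applies and should be addressed before upgrading.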

37831 In some cases, silent data corruption on an SSD could cause the node to crash rather than attempt to recover the data and report an event. This can occur if the SSD returns invalid data rather than an I/O error.
37738 Under certain scenarios, Lightbits will cause the grub package to be updated during Lightbits installation, including the addition of new servers. On RHEL8 and derivatives, after updating Grub from "grub2-2.02-162.el8_10" to "grub2-2.02-165.el8_10", if the system is using BIOS mode it might enter the "grub rescue>" prompt upon booting. When this happens, see https://access.redhat.com/solutions/7118853 for how to restore system boot to normal operation.
32827 When TRIM is enabled and a user performs the discard operation, the logical report size might be incorrect and not reflect the true logical size.
29683 Systems with Solidigm/Intel D5-P5316 drives may experience higher than expected write latency after several drive write cycles. Contact Lightbits Support if you use Solidigm/Intel D5-P5316 SSDs and are experiencing higher than expected write latency.
25382

Under the conditions below, the amount of storage occupied by cold units (filled with 4096 small objects) is not accounted for and not reported. This could result in reaching a storage full or almost full situation that is not observable in the node storage statistics:

  • A sufficient amount of logical user storage contains highly compressible data; e.g., zeroes.
  • This data has been written in large chunks over a short period of time. During this time, there are no or almost no user writes with lower compression rates or to uncompressed volumes.
  • The highly compressed data written remains unmodified (cold); i.e., not overwritten by user writes for a long period of time.

When such a situation occurs, the control plane software does not detect storage capacity reaching the threshold to start proactive rebalancing to free capacity. The System Administrator also relies on the same storage statistics the control plane exposes, and therefore cannot tell that the system capacity has reached the limit.
22582 A server could remain in "Enabling" state if the enable server command is issued during an upgrade.
19670 The compression ratio returned by get-cluster API will be incorrect when the cluster has snapshots created over volumes. The calculation of the compression ratio at the cluster level uses different logic for physical used capacity and the amount of uncompressed data written to storage. Hence the compression ratio value might be higher than the actual value. A correct indication of cluster level compression can be deduced from a weighted average of compression ratio at the node levels; i.e., Compression ratio = sum(node compression ratio * node physical usage) / sum(node physical usage).
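The weighted average above can be computed with a short Python sketch. It assumes you have already collected each node's compression ratio and physical usage by your own means; the function name and the tuple layout are illustrative and are not part of any Lightbits API.

```python
def cluster_compression_ratio(nodes):
    """Weighted average of node compression ratios.

    `nodes` is a sequence of (compression_ratio, physical_usage) pairs,
    one per node. Physical usage may be in any unit, as long as the same
    unit is used for every node. Returns None when total usage is zero.
    """
    nodes = list(nodes)  # allow generators; the data is traversed twice
    total_usage = sum(usage for _, usage in nodes)
    if total_usage == 0:
        return None
    weighted_sum = sum(ratio * usage for ratio, usage in nodes)
    return weighted_sum / total_usage


# Two hypothetical nodes: ratio 2.0 with 100 units used,
# and ratio 4.0 with 300 units used.
ratio = cluster_compression_ratio([(2.0, 100), (4.0, 300)])  # 3.5
```

Nodes with more physical usage contribute proportionally more to the result, which is why a plain average of the node ratios would not be equivalent.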
18966 "lbcli list events" could fail with "received message larger than max" when there are events that contain a large amount of information. Workaround: Use the --limit and --since arguments to read a smaller amount of data at a time.
18948 The node local rebuild progress (due to SSD failure) shows 100% done when there is no storage space left to complete the rebuild.
18522 When attempting to add a server to a cluster using lbcli 'create server' or REST POST '/api/v2/servers', and the operation fails for any reason, 'list servers' could permanently show the new server in 'creating' state.
18214 Automatic rebalancing features (fail-in-place and proactive-rebalance) should be disabled if enable_iptables is enabled during installation.
17298 The migration of volumes due to automatic rebalancing could take time, even when volumes are empty.
15715 During a volume rebuild, the Grafana dashboard does not show the write IOs for the recovered data.
14995 A single-server cluster cannot be upgraded using the cluster upgrade command; use only the upgrade server command.
14889 In case of an SSD failure, the system will scan the storage and rebuild the data. The entire raw capacity will be scanned, even when not all of it was utilized. This leads to a longer rebuild time than necessary.
14863 Prior to lb CSI installation, the lb discovery client service must be installed and started on all K8S cluster nodes.
14212 OpenStack: Once a volume attach fails, the following attempts to attach it will also fail. Workaround: Remove the discovery-client configuration files for the failed volume and restart the discovery-client and Nova services.
13064 Following a 'replace node' operation, volumes with a single replica will be created as 'unavailable' in the new node. Note: Single replica volumes are not protected, and data will not move to the new node. Workaround: Delete single replica volumes before replacing the node, or reboot the new server after replacing the node.
11856 Volume and node usage metrics might show different values between REST/lbcli and Prometheus, when a volume is deleted and a node is disconnected.
11326 Volume metrics do not return any value for volumes that are created but do not store any data.
10021 Commands affecting SSD content (such as blkdiscard, nvme format) should not be executed on the Lightbits server.