Lightbits Supported Events
Lightbits services generate events for the various operations, activities, status changes, errors, and warnings that can occur in a Lightbits cluster. These events can be collected via the official Lightbits API or - from Release 3.0.3 - by scraping them from the Lightbits service log files.
The format and content of an event are identical in both collection channels (API and logs). In the logs, each event is preceded by a dedicated "LBEVENT" tag, to simplify filtering events out of the log stream.
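Because every logged event carries the "LBEVENT" tag, pulling events out of a service log reduces to a pattern match. A minimal sketch in Python; note that the exact layout of the text around the tag is an assumption (only the tag itself is documented), so the pattern may need adjusting to your cluster's log format:

```python
import re

# Only the "LBEVENT" tag is documented; everything after it is assumed
# to be the event payload on the same line.
LBEVENT_RE = re.compile(r"LBEVENT\s+(?P<body>.*)")

def extract_events(lines):
    """Return the text following the LBEVENT tag for each tagged line."""
    events = []
    for line in lines:
        match = LBEVENT_RE.search(line)
        if match:
            events.append(match.group("body"))
    return events

# Illustrative log lines (hypothetical format, real event/cause codes):
sample = [
    "2024-01-01T00:00:00Z INFO starting node-manager",
    '2024-01-01T00:01:00Z WARN LBEVENT {"eventCode": 201, "causeCode": 107}',
]
```

The same pattern works with a plain `grep LBEVENT /var/log/node-manager.log` when no further processing is needed.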
The default location for all Lightbits service logs is /var/log/<service-name>.log; however, this may be customized per cluster install by updating the service role template:

```yaml
logging:
  filename: /var/log/node-manager.log
```
Event Name | Event Cause | Event Code | Cause Code | Severity | Event Type | Reporting Service | Description |
---|---|---|---|---|---|---|---|
Node Inactive | Local RAID failure | 201 | 102 | High | Node | Node Manager | Node state has changed to inactive due to multiple NVMe device disk failures (EC enabled)/single disk failure (no EC). Note: This event was renamed. |
Node Inactive | OS health check failed | 201 | 110 | High | Node | Node Manager | Node state has changed to inactive due to a kernel panic being detected in the OS. |
Node Inactive | Duroslight health check failed | 201 | 105 | Medium | Node | Node Manager | Node state has changed to inactive due to a failure with Duroslight. |
Node Inactive | Heartbeat to node failed | 201 | 107 | Medium | Node | Cluster Manager | Node state has changed to inactive due to a node being unreachable (network failure/server down). |
Node Inactive | Connectivity issue | 201 | 108 | Medium | Node | Cluster Manager | Node state has changed to inactive due to connectivity errors in dataplane. |
Node Inactive | FailedToEnableEncryption | 201 | 1304 | High | Node | Cluster Manager | Node state has changed to inactive due to a failure to enable encryption. |
Node Active | N/A | 200 | 1 | Info | Node | Cluster Manager | Node is active. |
Node in Permanent Failure State Inactive | Node Inactive | 206 | 112 | High | Node | Cluster Manager | Node entered permanent failure state (duration of node in inactive state exceeded configured time window of inactivity). |
Node Powerup | Started | 208 | 2 | Info | Node | Node Manager | Node powerup started. |
Node Powerup | Completed | 208 | 3 | Info | Node | Node Manager | Node powerup completed. |
Node Powerup | Failed | 208 | 4 | High | Node | Node Manager | Node powerup failed. |
Node High Storage Usage | PerformanceDegredationHighCapacityUtilization | 204 | 500 | Info | Node | Node Manager | Node entered degraded performance state due to high storage utilization (> 70%). |
Node High Storage Usage | PerformanceDegredationHighCapacityUtilizationMd | 204 | 501 | Medium | Node | Node Manager | Node entered degraded performance state due to high metadata utilization. |
Node Read-Only State | Node entered read-only state | 209 | 504 | High | Node | Node Manager | Node entered read-only state. |
Node Read-Only State | Node exited read-only state | 209 | 506 | Info | Node | Node Manager | Node exited the read-only state. |
NodeStorageCapacity | ReadOnlyModeMD | 204 | 505 | High | Node | Node Manager | Node metadata is at high utilization (90%), and nearing read-only state. |
NodeRaidRebuildStatus | Initiated | 205 | 2 | Medium | Node | Node Manager | Local RAID rebuild has started. |
NodeRaidRebuildStatus | Completed | 205 | 3 | Info | Node | Node Manager | Local RAID rebuild has completed. |
NodeRaidRebuildStatus | ReadOnlyMode (Halted) | 205 | 504 | High | Node | Node Manager | Local RAID rebuild was halted (the node entered read-only state and has no free storage left to complete the rebuild). |
NodeRaidRebuildStatus | ExitReadOnlyMode (Resume) | 205 | 506 | Medium | Node | Node Manager | Local RAID rebuild resumed after having been halted. |
Node Unattached | N/A | 210 | 1 | Info | Node | Cluster Manager | Node entered Unattached state. All volume and snapshot resources were migrated off of this node. |
Server Clock Drift | Clock drift detected | 1000 | 1000 | High | Server | Node Manager | Detected a clock drift between the reporting server and the rest of the servers in the cluster. |
Server Linux VM Write Cache Configuration Error | Server Linux VM Write Cache Configuration Error | 1200 | 1200 | High | Server | Node Manager | For internal use. |
Server Upgrade | Started | 301 | 2 | Info | Server | Upgrade Manager | Upgrade of the server started. |
Server Upgrade | Finished | 301 | 3 | Info | Server | Upgrade Manager | Upgrade of the server completed. |
Server Upgrade | Failed | 301 | 4 | High | Server | Upgrade Manager | Upgrade of server has failed. Note: Additional failure-specific information will be returned here. |
Server Upgrade Skipped | Server non-upgradeable | 302 | 1100 | Medium | Server | Upgrade Manager | The upgrade operation was skipped for the server, as the server is non-upgradeable. Note: Additional information specific to the skip cause will be returned here. |
UpgradeManagerStartupFailed | Failed | 700 | 4 | High | Server | Upgrade Manager | The upgrade manager failed to start up. |
Cluster Upgrade | Started | 401 | 2 | Info | Server | Upgrade Manager | Started an upgrade of the cluster. |
Cluster Upgrade | Finished | 401 | 3 | Info | Server | Upgrade Manager | Completed an upgrade of the cluster. |
Cluster Upgrade | Failed | 401 | 4 | High | Server | Upgrade Manager | Upgrade of the cluster has failed. Note: Additional failure-specific information will be returned here. |
NVMeSSDUnhealthy | DeviceHealthReachedMaxReadRetries | 1001 | 1003 | Medium | NVMe SSD | Node Manager | An NVMe SSD device is unhealthy. |
NVMeSSDUnhealthy | DeviceHealthReachedMaxWriteRetries | 1001 | 1002 | Medium | NVMe SSD | Node Manager | An NVMe SSD device is unhealthy. |
NVMeSSDUnhealthy | DeviceHealthAbortedWriteCmds | 1001 | 1004 | Medium | NVMe SSD | Node Manager | An NVMe SSD device is unhealthy. |
NVMeSSDUnhealthy | DeviceHealthAbortedReadCmds | 1001 | 1005 | Medium | NVMe SSD | Node Manager | An NVMe SSD device is unhealthy. |
NVMeSSDUnhealthy | DeviceHealthWriteErrors | 1001 | 1007 | Medium | NVMe SSD | Node Manager | An NVMe SSD device is unhealthy. |
NVMeSSDUnhealthy | DeviceHealthReadErrors | 1001 | 1006 | Medium | NVMe SSD | Node Manager | An NVMe SSD device is unhealthy. |
NVMeDeviceFailed | N/A | 501 | 1 | High | NVMe SSD | Node Manager | An NVMe SSD device has failed. |
NVMeDeviceAdded | N/A | 502 | 1 | Info | NVMe SSD | Node Manager | Added a new NVMe SSD to a node. |
AddNVMeDeviceOperationFailed | AddNVMeDeviceTaskFailure | 503 | 901 | High | NVMe SSD | Node Manager | Failed to add a new NVMe SSD to a node. |
Volumes Fully Protected | N/A | 600 | 1 | Info | Volume | Cluster Manager | Volumes entered the fully protected protection state (the event is issued on entering the new state and returns the list of volumes affected by the change). |
Volumes in Degraded State | N/A | 601 | 1 | Medium | Volume | Cluster Manager | Volumes entered the degraded protection state (the event is issued on entering the new state and returns the list of volumes affected by the change). |
Volumes in Read-Only | N/A | 602 | 1 | High | Volume | Cluster Manager | Volumes entered the read-only protection state (the event is issued on entering the new state and returns the list of volumes affected by the change). |
Volumes are Unavailable | N/A | 603 | 1 | High | Volume | Cluster Manager | Volumes entered the unavailable protection state (the event is issued on entering the new state and returns the list of volumes affected by the change). |
ClusterCapacityFull | HighClusterStorageUtilization | 402 | 1001 | High | Cluster | Cluster Manager | Cluster utilization is high. |
UnRecoverableDataIntegrity | DataIntegrityDuringRebuildDueToInheritanceOfCorruption | 1100 | 1113 | Critical | Volume/NVMeSSD | Node Manager | Unrecoverable data integrity error. |
UnRecoverableDataIntegrity | DataIntegrityDuringUserReadsDueToInheritanceOfCorruption | 1100 | 1112 | Critical | Volume/NVMeSSD | Node Manager | Unrecoverable data integrity error. |
UnRecoverableDataIntegrity | DataIntegrityDuringRebuildDueToMalfunctioningDevices | 1100 | 1111 | Critical | Volume/NVMeSSD | Node Manager | Unrecoverable data integrity error. |
UnRecoverableDataIntegrity | DataIntegrityDuringUserReadsDueToMalfunctioningDevices | 1100 | 1110 | Critical | Volume/NVMeSSD | Node Manager | Unrecoverable data integrity error. |
RecoverableDataIntegrity | DataIntegrityDuringRebuildDueToInheritanceOfCorruption | 1101 | 1113 | High | Volume/NVMeSSD | Node Manager | Recoverable data integrity error. |
RecoverableDataIntegrity | DataIntegrityDuringUserReadsDueToInheritanceOfCorruption | 1101 | 1112 | High | Volume/NVMeSSD | Node Manager | Recoverable data integrity error. |
RecoverableDataIntegrity | DataIntegrityDuringRebuildDueToMalfunctioningDevices | 1101 | 1111 | High | Volume/NVMeSSD | Node Manager | Recoverable data integrity error. |
RecoverableDataIntegrity | DataIntegrityDuringUserReadsDueToMalfunctioningDevices | 1101 | 1110 | High | Volume/NVMeSSD | Node Manager | Recoverable data integrity error. |
UnRecoverableDataIntegrity | DataIntegrityDuringNodeRebuild | 1100 | 1116 | Critical | Node/NVMeSSD | Node Manager | Detected data integrity error during node rebuild. |
RecoverableDataIntegrity | DataIntegrityDuringRecoveryFromGracefulShutdown | 1101 | 1115 | High | Node/NVMeSSD | Node Manager | Detected data integrity error during node rebuild, during recovery from a graceful shutdown. |
GarbageCollectionDataIntegrity | DataIntegrityDuringGarbageCollection | 1102 | 1118 | High | Node/NVMeSSD | Node Manager | Detected data integrity error during garbage collection processing. |
InitializingServerEncryptionFailed | FailedToGetKEK | 1400 | 1500 | Medium | ServerEncryption | Node Manager | Failed to get the encryption key from the cluster manager. |
InitializingServerEncryptionFailed | FailedToInitializeTPM | 1400 | 1501 | Medium | ServerEncryption | Node Manager | Failed to initialize the TPM key. |
InitializingServerEncryptionFailed | FailedToReadKEK | 1400 | 1502 | Medium | ServerEncryption | Node Manager | Failed to read the encryption key from the cache. |
InitializingServerEncryptionFailed | FailedToWriteKEK | 1400 | 1503 | Medium | ServerEncryption | Node Manager | Failed to save the encryption key. |
VolumeEncryptionFailed | MissingDEK | 1301 | 1300 | Critical | Cluster Encryption | Node Manager | Failed to encrypt or update a volume on a node, due to a missing DEK. |
VolumeEncryptionFailed | CorruptedDEK | 1301 | 1301 | Critical | Cluster Encryption | Node Manager | Failed to encrypt or update a volume on a node, due to a corrupted DEK. |
EnableClusterEncryptionFailed | NotEnoughActiveNodes | 1302 | 1302 | Medium | Cluster Encryption | Cluster Manager | There are not enough servers with active nodes up to enable encryption. |
EnableClusterEncryptionFailed | FailedToDistributeKekToNodes | 1302 | 1303 | Medium | Cluster Encryption | Cluster Manager | Cluster Manager failed to distribute the Cluster Encryption Key to all servers with active nodes. |
EnableClusterEncryptionFailed | FailedToEnableEncryption | 1302 | 1304 | Medium | Cluster Encryption | Cluster Manager | Failed to enable encryption on the cluster level. |
EnableClusterEncryptionInitiated | Initiated | 1303 | 2 | Info | Cluster Encryption | Cluster Manager | The cluster encryption process was initiated. |
EnableClusterEncryptionCompleted | Completed | 1303 | 3 | Info | Cluster Encryption | Cluster Manager | The cluster encryption process completed successfully. The cluster is encrypted. |
ServerDisableOperation | Completed | 321 | 3 | Info | Server | Cluster Manager | Server disable operation completed successfully (server moved into maintenance mode). |
ServerDisableOperation | Failed to Complete | 321 | 4 | Medium | Server | Cluster Manager | Server disable operation failed to complete (a more specific error cause is returned in the event). |
ServerEnableOperation | Completed | 322 | 3 | Info | Server | Cluster Manager | Server enable operation completed successfully (server moved out of maintenance mode). |
ServerCreated | Completed | 311 | 3 | Info | Server | Cluster Manager/Node Manager | New server created in the cluster (server added to the cluster). |
ServerCreateFailed | Failed to Complete | 312 | 4 | High | Server | Cluster Manager | Failed to add a new server to the cluster. |
ServerDeleted | Completed | 313 | 3 | Info | Server | Cluster Manager | Deleted a server from the cluster. |
ServerEviction | Initiated | 351 | 2 | High | Server | Cluster Manager | Started an eviction of a server's resources. |
ServerEviction | Completed | 351 | 3 | Info | Server | Cluster Manager | Completed an eviction of a server's resources. |
ServerEviction | Failed to Complete | 351 | 4 | High | Server | Cluster Manager | Failed to complete an eviction of a server's resources. |
ServerEvictionAborted | Initiated | 352 | 2 | Info | Server | Cluster Manager | Initiated an abort of an ongoing server eviction. |
ServerEvictionAborted | Completed | 352 | 3 | Info | Server | Cluster Manager | Completed aborting an ongoing server eviction. |
Cluster root key rotation | Initiated | 1304 | 2 | Info | Cluster Encryption | Cluster Manager | Started the process to rotate the cluster root key. |
Cluster root key rotation | Completed | 1304 | 3 | Info | Cluster Encryption | Cluster Manager | Completed the process of rotating the cluster root key. |
Cluster root key rotation | Failed to Complete | 1304 | 4 | Medium | Cluster Encryption | Cluster Manager | Failed to complete the process of rotating the cluster root key. The error can be fatal (failing the rotation entirely) or non-fatal (causing the rotation to take longer, or to stall without failing). |
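Monitoring pipelines that consume these events can use the (event code, cause code) pairs above to route alerts by severity. A hedged sketch: the lookup table below copies a few rows from this page, while the function names and the paging policy are illustrative assumptions, not part of the Lightbits API:

```python
# Severities for a few (event code, cause code) pairs, taken from the
# table above. Extend with the rows your deployment cares about.
SEVERITY_BY_CODES = {
    (201, 102): "High",        # Node Inactive / local RAID failure
    (201, 107): "Medium",      # Node Inactive / heartbeat to node failed
    (206, 112): "High",        # Node entered permanent failure state
    (401, 4): "High",          # Cluster upgrade failed
    (1100, 1110): "Critical",  # Unrecoverable data integrity error
}

def severity_for(event_code, cause_code, default="Info"):
    """Look up the documented severity, falling back for unknown pairs."""
    return SEVERITY_BY_CODES.get((event_code, cause_code), default)

def should_page(event_code, cause_code):
    """Page an operator on High/Critical events (an assumed policy)."""
    return severity_for(event_code, cause_code) in ("High", "Critical")
```

For example, a heartbeat failure (201/107) would be logged but not paged, while a permanent node failure (206/112) would page immediately under this policy.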