Node is Inactive

Scenario 1


Description	Node is inactive	Version: All versions
Symptoms	Running `lbcli list nodes` shows a node in the Inactive state (node01-0 in the example below). `Name UUID State NVMe endpoint Failure domains Local rebuild progress` `node02-0 0fed6410-935d-5bb4-86e0-aa976f33dfc0 Active 172.16.175.12:4420 [node02] None` `node00-0 2e1ebbf5-1799-5518-a7d7-6d9fb6158e27 Active 172.16.175.10:4420 [node00] None` `node01-0 4eb2935a-1dc0-5486-94c9-1f305a1aa464 Inactive 172.16.175.11:4420 [node01] None`
Logs to View	Log name: `journalctl -u node-manager` Log lines: `Dec 23 12:23:31 node01 node-manager[4516]: warn service/duroslight_health_checker.go:153 Report channel is full. Skipping sending duroglight health-state {"id": "4eb2935a-1dc0-5486-94c9-1f305a1aa464"}` `Dec 23 12:23:31 node01 node-manager[4516]: 2021-12-23 12:23:31.320212062 +0000 UTC m=+605428.378684317 write error: write /var/log/node-manager.log: no space left on device`
Troubleshooting Steps	Check the disk space (`df -h`).
Root Cause	The disk is full, so node-manager cannot write to its log file.
Resolution	Remove unnecessary files to free space.
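
The disk-space check in Scenario 1 can be scripted. The following is a minimal sketch, assuming a POSIX shell with `df` and `awk`; the function names `usage_pct` and `check_disk` and the default threshold of 90% are illustrative assumptions, not part of the Lightbits tooling.

```shell
#!/bin/sh
# Print the used-space percentage (digits only) of the filesystem holding a path.
# Relies on the POSIX "df -P" layout, where the 5th column is the capacity.
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report whether the filesystem holding a path is at or above a threshold.
# The default threshold (90) is an assumption chosen for illustration.
check_disk() {
    path=$1
    limit=${2:-90}
    pct=$(usage_pct "$path")
    # An empty result means df could not report on the path.
    [ -n "$pct" ] || return 2
    if [ "$pct" -ge "$limit" ]; then
        echo "FULL: $path is at ${pct}% (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is at ${pct}%"
}
```

For example, `check_disk /var/log` reports on the filesystem that holds node-manager.log. If it is full, `du -x /var/log | sort -rn | head` lists the largest directories there, and `journalctl --disk-usage` shows how much space the journal occupies.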

Scenario 2


Description	Node is inactive	Version: All versions
Symptoms	Running `lbcli list nodes` shows a node in the Inactive state (node01-0 in the example below). `Name UUID State NVMe endpoint Failure domains Local rebuild progress` `node02-0 0fed6410-935d-5bb4-86e0-aa976f33dfc0 Active 172.16.175.12:4420 [node02] None` `node00-0 2e1ebbf5-1799-5518-a7d7-6d9fb6158e27 Active 172.16.175.10:4420 [node00] None` `node01-0 4eb2935a-1dc0-5486-94c9-1f305a1aa464 Inactive 172.16.175.11:4420 [node01] None`
Logs to View	Log name: `journalctl -u etcd` Repeated log lines like the example below: `Jan 13 09:17:41 light1-2 etcd[17831]: the clock difference against peer <node-id> is too high [5.806176262s > 1s]`
Troubleshooting Steps	Compare the system time across all the servers; there will likely be a difference. Note that even a one-second time difference between servers can cause communication problems in the system.
Root Cause	The time is not synchronized between the Lightbits servers.
Resolution	Verify that all servers are connected to the NTP server and that the time is synced. Check that the value of the `validTicksPercent` parameter is set to 40 in the cluster-manager.yaml file on all the servers. Reboot the server so that all the services come up correctly after the time is synced.
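
The time comparison in Scenario 2 can be sketched in shell as well. This is a rough illustration, not a Lightbits utility: the function names `clock_drifts` and `print_clocks` are hypothetical, `print_clocks` assumes passwordless SSH to each server and GNU `date`, and `clock_drifts` only handles differences that etcd prints in plain seconds, as in the log line above.

```shell
#!/bin/sh
# Read etcd log lines on stdin and print the measured clock difference
# (in seconds) from each "clock difference ... is too high" line.
# Assumption: the value is printed in plain seconds, e.g. "5.806176262s".
clock_drifts() {
    sed -n 's/.*clock difference against peer .* is too high \[\([0-9.]*\)s > .*/\1/p'
}

# Print each host's epoch time so the values can be compared side by side.
# Assumption: passwordless SSH works for every host passed as an argument.
# SSH latency adds noise, so treat very small gaps as inconclusive.
print_clocks() {
    for host in "$@"; do
        printf '%s %s\n' "$host" "$(ssh "$host" date +%s.%N)"
    done
}
```

For example, `journalctl -u etcd | clock_drifts | sort -n | tail -1` prints the largest difference etcd has reported, and `print_clocks node00 node01 node02` shows the current time on each server (the hostnames here are taken from the sample output above). On systemd hosts, `timedatectl` also reports whether the system clock is synchronized.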
