Node is Inactive

Scenario 1

Description: Node is inactive
Version: All versions
Symptoms

Running the command lbcli list nodes displays output like the following:

| Name | UUID | State | NVMe endpoint | Failure domains | Local rebuild progress |
|---|---|---|---|---|---|
| node02-0 | 0fed6410-935d-5bb4-86e0-aa976f33dfc0 | Active | 172.16.175.12:4420 | [node02] | None |
| node00-0 | 2e1ebbf5-1799-5518-a7d7-6d9fb6158e27 | Active | 172.16.175.10:4420 | [node00] | None |
| **node01-0** | **4eb2935a-1dc0-5486-94c9-1f305a1aa464** | **Inactive** | **172.16.175.11:4420** | **[node01]** | **None** |
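A listing like the one above can also be checked by a script. The following is a minimal sketch, assuming the plain-text output has one node per line with Name, UUID, and State as the first three whitespace-separated columns, as in the example. The function name `inactive_nodes` is hypothetical, any state other than `Active` is treated as a problem, and capturing the command output is left to the caller.

```python
def inactive_nodes(listing: str) -> list[tuple[str, str]]:
    """Return (name, uuid) for every node whose State column is not 'Active'."""
    result = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its first field is 'Name').
        if len(fields) < 3 or fields[0] == "Name":
            continue
        name, uuid, state = fields[:3]
        if state != "Active":
            result.append((name, uuid))
    return result


# Sample text copied from the listing in this article.
sample = """\
Name UUID State NVMe endpoint Failure domains Local rebuild progress
node02-0 0fed6410-935d-5bb4-86e0-aa976f33dfc0 Active 172.16.175.12:4420 [node02] None
node00-0 2e1ebbf5-1799-5518-a7d7-6d9fb6158e27 Active 172.16.175.10:4420 [node00] None
node01-0 4eb2935a-1dc0-5486-94c9-1f305a1aa464 Inactive 172.16.175.11:4420 [node01] None
"""
print(inactive_nodes(sample))
# prints [('node01-0', '4eb2935a-1dc0-5486-94c9-1f305a1aa464')]
```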

Logs to View

Log name: journalctl -u node-manager

Log lines:

Dec 23 12:23:31 node01 node-manager[4516]: warn service/duroslight_health_checker.go:153 Report channel is full. Skipping sending duroglight health-state {"id": "4eb2935a-1dc0-5486-94c9-1f305a1aa464"}

Dec 23 12:23:31 node01 node-manager[4516]: 2021-12-23 12:23:31.320212062 +0000 UTC m=+605428.378684317 **write error: write /var/log/node-manager.log: no space left on device**

Troubleshooting Steps: Check the disk space (df -h).
Root Cause: Disk space is full.
Resolution: Remove unnecessary files to free space.
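The disk-space check from the troubleshooting step can also be scripted, for example as part of routine monitoring. This is a minimal sketch using Python's standard `shutil.disk_usage`. The function name `free_space_report`, the default path `/var/log` (taken from the log line above), and the 5% threshold are assumptions for illustration, not values defined by the product.

```python
import shutil


def free_space_report(path: str = "/var/log", min_free_fraction: float = 0.05) -> dict:
    """Report usage of the filesystem holding `path` and flag it when nearly full."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        # Assumed threshold: treat less than 5% free space as "low".
        "low": free_fraction < min_free_fraction,
    }
```

Calling `free_space_report()` on the affected node and checking the `low` flag gives roughly the same information as reading the `df -h` line for the filesystem that holds `/var/log`.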

Scenario 2

Description: Node is inactive
Version: All versions
Symptoms

Running the command lbcli list nodes displays output like the following:

| Name | UUID | State | NVMe endpoint | Failure domains | Local rebuild progress |
|---|---|---|---|---|---|
| node02-0 | 0fed6410-935d-5bb4-86e0-aa976f33dfc0 | Active | 172.16.175.12:4420 | [node02] | None |
| node00-0 | 2e1ebbf5-1799-5518-a7d7-6d9fb6158e27 | Active | 172.16.175.10:4420 | [node00] | None |
| **node01-0** | **4eb2935a-1dc0-5486-94c9-1f305a1aa464** | **Inactive** | **172.16.175.11:4420** | **[node01]** | **None** |

Logs to View

Log name: journalctl -u etcd

Repeated log lines like the example below:

Jan 13 09:17:41 light1-2 etcd[17831]: the clock difference against peer <node-id> is too high [5.806176262s > 1s]
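To see how large the reported difference is across many such lines, the journal text can be scanned with a regular expression. This is a minimal sketch, assuming the message format matches the example above and that both durations are printed in plain seconds; other duration spellings (such as milliseconds or minutes) would not match. The name `clock_drift_warnings` is hypothetical.

```python
import re

# Matches messages shaped like the example log line in this article.
CLOCK_DRIFT = re.compile(
    r"the clock difference against peer (?P<peer>\S+) is too high "
    r"\[(?P<drift>[0-9]+(?:\.[0-9]+)?)s > (?P<limit>[0-9]+(?:\.[0-9]+)?)s\]"
)


def clock_drift_warnings(journal_text: str) -> list[tuple[str, float, float]]:
    """Return (peer, drift_seconds, limit_seconds) for each matching log line."""
    found = []
    for line in journal_text.splitlines():
        match = CLOCK_DRIFT.search(line)
        if match:
            found.append(
                (match["peer"], float(match["drift"]), float(match["limit"]))
            )
    return found


# Sample line copied from this article; "<node-id>" is the placeholder used there.
sample = (
    "Jan 13 09:17:41 light1-2 etcd[17831]: the clock difference against peer "
    "<node-id> is too high [5.806176262s > 1s]"
)
print(clock_drift_warnings(sample))
# prints [('<node-id>', 5.806176262, 1.0)]
```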

Troubleshooting Steps: Compare the system time on all the servers; there will likely be a difference. Note that even a one-second time difference can cause communication problems in the system, because etcd reports a peer clock difference above 1s as too high.
Root Cause: The time is not synchronized between the Lightbits servers.
Resolution
  1. Verify that all servers are connected to the NTP server, and that the time is synced.
  2. Check that the value of the ‘validTicksPercent’ parameter is set to 40 in the cluster-manager.yaml file on all the servers.
  3. Reboot the server so that all the services come up correctly after the time is synced.
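The parameter check in the resolution steps can be repeated on each server with a small script. This is a minimal sketch, assuming the key appears on its own line in the form `validTicksPercent: <integer>`; the actual layout of cluster-manager.yaml is not shown in this article, so the pattern and the function names below are hypothetical. The file location is also not given here, so the caller supplies the file contents.

```python
import re
from typing import Optional

# Assumed layout: "validTicksPercent: <integer>", optionally followed by a comment.
VALID_TICKS = re.compile(
    r"^\s*validTicksPercent\s*:\s*(\d+)\s*(?:#.*)?$", re.MULTILINE
)


def valid_ticks_percent(yaml_text: str) -> Optional[int]:
    """Return the configured validTicksPercent value, or None if the key is absent."""
    match = VALID_TICKS.search(yaml_text)
    return int(match.group(1)) if match else None


def is_expected(yaml_text: str, expected: int = 40) -> bool:
    """True when validTicksPercent is present and equals the expected value."""
    return valid_ticks_percent(yaml_text) == expected
```

On a server, the text would come from reading the cluster-manager.yaml file (for example with `pathlib.Path(...).read_text()`), and `is_expected(text)` returning `False` indicates that the value is missing or differs from 40.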