Auto Revive

Lightbits supports auto revive of node-level services. If node-manager stops functioning, frontend processes halt, or a kernel bug is detected, the server is rebooted and Lightbits services are restarted.

By default, the auto revive feature attempts up to two auto revives in a two-hour window. Both the number of attempts and the time window can be modified using the update cluster config API.

To operate correctly, auto revive needs to generate and store some local files. By default, these are placed in /var/cache/node-manager. They can be placed at any other path on the server by setting nodeManagerAutoReviveDir in the node-manager yaml (the default corresponds to nodeManagerAutoReviveDir: /var/cache/node-manager) and then restarting the service.
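
Because auto revive depends on being able to write its local files, it can be useful to check a candidate directory before pointing nodeManagerAutoReviveDir at it and restarting the service. The following is a minimal sketch of such a check, not part of Lightbits itself; the function name revive_dir_is_usable is hypothetical, and the check only reflects the permissions of the user running it, which may differ from the user node-manager runs as.

```python
import os


def revive_dir_is_usable(path):
    """Return True if `path` is an existing directory that the current
    user can enter and write to.

    Hypothetical helper for illustration; not a Lightbits API.
    """
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)
```

For example, revive_dir_is_usable("/var/cache/node-manager") could be run on the server before the yaml is changed.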

clusterconfig

AllowedNumRevives: The number of attempts to revive services allowed within the specified time window (the default is 2; a value of 0 disables the feature).

RevivesWindowDuration: The time window over which auto revive attempts are counted (the default is two hours).
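
To illustrate how the two settings interact, here is a minimal sketch of an attempt limit over a time window. It is an assumption-laden model, not the Lightbits implementation: the class ReviveBudget and its method are hypothetical, and the sketch assumes a sliding window, which the text above does not specify.

```python
from collections import deque
from datetime import datetime, timedelta


class ReviveBudget:
    """Hypothetical model of a revive limit over a sliding time window."""

    def __init__(self, allowed_num_revives=2,
                 revives_window_duration=timedelta(hours=2)):
        self.allowed_num_revives = allowed_num_revives
        self.revives_window_duration = revives_window_duration
        self._attempts = deque()  # timestamps of attempts still in the window

    def try_revive(self, now):
        """Record an attempt and return True if a revive is allowed at `now`."""
        if self.allowed_num_revives == 0:
            return False  # a value of 0 disables the feature
        # Drop attempts that have aged out of the window (assumed sliding).
        cutoff = now - self.revives_window_duration
        while self._attempts and self._attempts[0] <= cutoff:
            self._attempts.popleft()
        if len(self._attempts) >= self.allowed_num_revives:
            return False  # budget for this window is exhausted
        self._attempts.append(now)
        return True
```

With the defaults, two attempts made 30 minutes apart are both allowed, a third attempt within the same two hours is refused, and a new attempt is allowed again once the earliest one is more than two hours old.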
