Auto Maintenance - Additional Information

Maintenance Parameter Store Objects

The Lightbits STS in AWS stores the maintenance state of instances that are undergoing maintenance activities in the AWS Parameter Store. The tables below list the parameters and their descriptions.
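
As a rough illustration of how such a stored value could be inspected, the sketch below reads one Parameter Store entry through an SSM client. It assumes a client that behaves like `boto3.client("ssm")`; the parameter name in the usage comment is hypothetical and is not taken from Lightbits.

```python
def read_maintenance_parameter(ssm_client, name):
    """Return the raw string value of one Parameter Store entry.

    `ssm_client` is expected to behave like boto3.client("ssm").
    `name` is a hypothetical parameter name chosen by the caller.
    """
    response = ssm_client.get_parameter(Name=name)
    return response["Parameter"]["Value"]


# Example usage (requires boto3 and AWS credentials; the name is made up):
#   import boto3
#   raw = read_maintenance_parameter(boto3.client("ssm"), "/example/ReplaceInfo")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object when no AWS account is available.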

ReplaceInfo

Field             Description
instanceId        The Instance ID of the instance to be replaced.
forceReplace      The Instance ID of the instance to be forcefully replaced (triggers the 'upgrade' role); used to manually force the flow to start.
targetUUID        The UUID of the new server to be configured.
targetInstanceId  The Instance ID of the new server to be configured.
configureRetries  The number of server configuration retries if the configuration fails.

HealingInfo

Field             Description
instanceId        The Instance ID of the instance to be healed.
targetUUID        The UUID of the new server to be configured.
targetInstanceId  The Instance ID of the new server to be configured.

Scale-Out Information

Field                Description
desiredClusterSize   The number of nodes (N) the cluster is to be scaled to.
newServerUUID        The UUID of the new server to be configured.
newServerInstanceID  The Instance ID of the new server to be configured.
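
The three objects above can be modeled as small typed records. The sketch below is a minimal illustration only: it assumes each object is stored as a JSON document whose keys match the field names in the tables, which the tables themselves do not state. The class name `ScaleOutInfo`, the helper `parse_object`, and the field types and defaults are likewise assumptions.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ReplaceInfo:
    instanceId: str = ""
    forceReplace: str = ""
    targetUUID: str = ""
    targetInstanceId: str = ""
    configureRetries: int = 0


@dataclass
class HealingInfo:
    instanceId: str = ""
    targetUUID: str = ""
    targetInstanceId: str = ""


@dataclass
class ScaleOutInfo:  # hypothetical name for the Scale-Out Information object
    desiredClusterSize: int = 0
    newServerUUID: str = ""
    newServerInstanceID: str = ""


def parse_object(cls, raw):
    """Build `cls` from a JSON string, ignoring keys the tables do not list.

    Assumes the stored value is JSON; adjust if the real encoding differs.
    """
    data = json.loads(raw)
    known = {f.name for f in fields(cls)}
    return cls(**{key: value for key, value in data.items() if key in known})
```

For example, `parse_object(ReplaceInfo, raw)` would yield a record whose `targetInstanceId` can be compared against the instances currently in the Auto Scaling group.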

Maintenance State Machine States by Role

The tables below list the various maintenance stages an instance can be in.

Healing States

State                          Description
Idle                           Set scale-in protection for the entire cluster.
scale-out                      Scale out the cluster to n+1.
waiting-for-target             Waiting for the target instance to become available for configuration (no action).
waiting-for-ssm                Waiting for the target instance to be available for SSM commands (no action).
configure-target               Configure the target with Ansible.
create-server                  Create a target server (replace node flow).
waiting-for-target-activation  Waiting for the target to become active (no action).
complete-target-activation     Target node is active; complete the lifecycle action to add the instance to the NLB.
replace-node                   Trigger node replication (source to target).
wait-replace-node              Waiting for node replication to complete (moving volumes to the new node).
disable-server                 Disable the source server.
wait-lb-server-disabled        Waiting for the server to be disabled.
delete-server                  Delete the source server.
decrease-autoscaling-group     Decrease the ASG to n-1 (terminates the source instance).
wait-for-asg-decrease          Waiting for the source instance to be terminated.
terminate-source               Terminate the source instance (terminating:wait would also end in termination, but this is more efficient).
wait-for-terminate-state       Waiting for the instance to be fully terminated.
complete-replace               Reset scale-in protection of the instances if the replace did not fail.
healing-failed                 Replace failed (no action).
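
When monitoring a maintenance flow, it can help to separate the states in which the state machine only waits from those in which it performs work. The sketch below lists the state names from the table above and marks the ones described there as "(no action)". The enum, the set, and the helper name are illustrative assumptions, and the sketch does not encode any transition order, because the table does not define one.

```python
from enum import Enum


class MaintenanceState(Enum):
    IDLE = "Idle"
    SCALE_OUT = "scale-out"
    WAITING_FOR_TARGET = "waiting-for-target"
    WAITING_FOR_SSM = "waiting-for-ssm"
    CONFIGURE_TARGET = "configure-target"
    CREATE_SERVER = "create-server"
    WAITING_FOR_TARGET_ACTIVATION = "waiting-for-target-activation"
    COMPLETE_TARGET_ACTIVATION = "complete-target-activation"
    REPLACE_NODE = "replace-node"
    WAIT_REPLACE_NODE = "wait-replace-node"
    DISABLE_SERVER = "disable-server"
    WAIT_LB_SERVER_DISABLED = "wait-lb-server-disabled"
    DELETE_SERVER = "delete-server"
    DECREASE_AUTOSCALING_GROUP = "decrease-autoscaling-group"
    WAIT_FOR_ASG_DECREASE = "wait-for-asg-decrease"
    TERMINATE_SOURCE = "terminate-source"
    WAIT_FOR_TERMINATE_STATE = "wait-for-terminate-state"
    COMPLETE_REPLACE = "complete-replace"
    HEALING_FAILED = "healing-failed"


# States whose description in the table is marked "(no action)".
NO_ACTION_STATES = {
    MaintenanceState.WAITING_FOR_TARGET,
    MaintenanceState.WAITING_FOR_SSM,
    MaintenanceState.WAITING_FOR_TARGET_ACTIVATION,
    MaintenanceState.HEALING_FAILED,
}


def is_no_action(state_name):
    """Return True if the named state is marked "(no action)" in the table."""
    return MaintenanceState(state_name) in NO_ACTION_STATES
```

Looking up a name that is not in the table raises `ValueError`, which makes typos in a stored state value easy to spot.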