Defining Failure Domains
A failure domain (FD) is a section of a network, power supply, or physical location that is negatively affected when a network, power, cooling, or other critical service problem occurs. Different copies of the data are stored in different FDs to keep the data protected from such failures.
To specify which servers belong to an FD, list items in the failure_domains array of the server configuration files. Consider the server00.yml and server01.yml configurations below.
The server00 failure_domains array is configured with the server's own name and the rack it is placed in, "rack00".
```yaml
name: server00
data_ifaces:
- bootproto: static
  conn_name: ens1
  ifname: ens1
  ip4: 10.10.10.100/24
nodes:
- instanceID: 0
  data_ip: 10.10.10.100
  failure_domains:
  - server00
  - rack00
  ec_enabled: true
  lightfieldMode: SW_LF
  storageDeviceLayout:
    initialDeviceCount: 6
    maxDeviceCount: 12
    allowCrossNumaDevices: false
    deviceMatchers:
    # - model =~ ".*"
    - partition == false
    - size >= gib(300)
```
The server01 failure_domains array is configured with the server's own name and the rack it is placed in, "rack00".
```yaml
name: server01
data_ifaces:
- bootproto: static
  conn_name: ens1
  ifname: ens1
  ip4: 10.10.10.101/24
nodes:
- instanceID: 0
  data_ip: 10.10.10.101
  failure_domains:
  - server01
  - rack00
  ec_enabled: true
  lightfieldMode: SW_LF
  storageDeviceLayout:
    initialDeviceCount: 6
    maxDeviceCount: 12
    allowCrossNumaDevices: false
    deviceMatchers:
    # - model =~ ".*"
    - partition == false
    - size >= gib(300)
```
Note the items in both the server00 and server01 failure_domains arrays.
Because both servers list the same "rack00", volume replicas will not be shared between these two servers (or their nodes).
If the arrays were left at their defaults (server00's failure_domains containing only "server00", and server01's containing only "server01"), volume replicas would be shared between the servers.
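The placement rule described above can be sketched as follows. This is an illustrative sketch only, not Lightbits code; `shares_failure_domain` is a hypothetical helper showing why a common "rack00" entry prevents replicas from spanning the two servers:

```python
# Illustrative sketch (not part of Lightbits): two nodes may hold replicas
# of the same volume only if their failure_domains arrays do not intersect.
def shares_failure_domain(fd_a, fd_b):
    """Return True if the two nodes fall in a common failure domain."""
    return bool(set(fd_a) & set(fd_b))

server00 = ["server00", "rack00"]
server01 = ["server01", "rack00"]

# Both servers list "rack00", so replicas will not span them.
print(shares_failure_domain(server00, server01))   # True -> no shared replicas

# With the default arrays there is no overlap, so replicas may span the servers.
print(shares_failure_domain(["server00"], ["server01"]))   # False
```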
- As a minimum (and a good default) configuration, populate failure_domains with the server names. Add further items, such as "rack00" above, to control how volume replicas are placed.
- In a dual instance/node setup, volume replicas will not land on other nodes of the same server.
- See Host Configuration File Variables for the entire list of variables available for the host variable files.
- The configurations above include a "data_ifaces" section for each server. Typically this section is omitted, because the servers should be preconfigured with their data IPs. However, Ansible can be instructed to configure the data IPs during the Lightbits installation, in which case the "data_ifaces" section tells Ansible to configure the IP and subnet on the named interface.
- Note that for IPv6 addresses, use the 'ip6: ip/prefix' format. For example:
ip6: 2001:0db8:0:f101::1/64
- The address used in the ip4 or ip6 field must match the address used in data_ip. The only difference is that ip4 and ip6 include the subnet or prefix, while data_ip contains the address alone, without the subnet or prefix.
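This consistency rule can be checked mechanically. The sketch below is illustrative and not part of the Lightbits tooling; it uses Python's standard ipaddress module, and `matches_data_ip` is a hypothetical helper name:

```python
import ipaddress

# Illustrative check (not part of Lightbits): the ip4/ip6 value is the
# data_ip plus a subnet or prefix length, so stripping the prefix from the
# interface address must yield exactly the data_ip.
def matches_data_ip(iface_ip, data_ip):
    """True if iface_ip ('address/prefix') carries the same address as data_ip."""
    return ipaddress.ip_interface(iface_ip).ip == ipaddress.ip_address(data_ip)

print(matches_data_ip("10.10.10.101/24", "10.10.10.101"))               # True
print(matches_data_ip("2001:0db8:0:f101::1/64", "2001:db8:0:f101::1"))  # True
```

The same check works for both IPv4 and IPv6, since ip_interface and ip_address normalize either address family.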