Connecting Application Servers to Lightbits
Connecting an application server to the volumes on the Lightbits storage server is accomplished through the following procedure.
Connecting an Application Server to a Volume
Step | Command Type | Simplified Command
---|---|---
Get details about Lightbits storage cluster | Lightbits lbcli CLI | lbcli get cluster, lbcli list nodes
Get details about Lightbits storage cluster | Lightbits REST | GET /api/v2/cluster, GET /api/v2/nodes
Verify network connectivity | Linux command | ping <IP of Lightbits instance>
Connect to Lightbits cluster | NVMe CLI command | nvme connect <your Lightbits connection details>
Review block device details | Linux command | lsblk or nvme list
Only cluster admins have access to cluster and node level APIs. Therefore, tenant admins should get all of the required connection details from their cluster admin.
Before You Begin
Before you begin the process to connect an application server to the Lightbits storage server, confirm that the following conditions are met:
- A volume exists on the Lightbits storage server with the correct ACL (of the client or ALLOW_ANY).
- A TCP/IP connection exists to the Lightbits storage server.
- If you are a tenant admin, you should get all of the connection details from your cluster admin.
Reviewing the Lightbits Storage Cluster Connection Details (Cluster Admin Only)
The following table lists the details required for the nvme connect command on your application server, along with the lbcli command you can use to retrieve each one.
Required Lightbits Storage Cluster Connection Details
Item | Description | NVMe Connect Command Parameter | lbcli Command (To Get the Information) |
---|---|---|---|
Subsystem NQN | The NQN of the Lightbits cluster on which the volume was created. | -n | lbcli get cluster |
Instance IP Addresses | The IP addresses for all of the nodes in the Lightbits cluster. | -a | lbcli list nodes |
TCP ports | The TCP ports used by the Lightbits cluster nodes. | -s | lbcli list nodes |
ACL string | The ACL used when you created the volume on the Lightbits storage cluster. | -q | lbcli get volume <volume name> |
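Putting these together, the general shape of the command is as follows; all values are placeholders that you fill in from the lbcli commands described below:

# Template only: maps the table's items onto nvme connect parameters.
#   -a  instance IP address (one node at a time)
#   -s  TCP port for that node
#   -n  subsystem NQN of the cluster
#   -q  ACL string used when the volume was created
nvme connect -t tcp -a <instance IP> -s <TCP port> -n <subsystem NQN> -q <ACL string>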
Obtaining the Lightbits Cluster Subsystem NQN
On any Lightbits server, enter the lbcli get cluster command.
Sample Command
$ lbcli -J $LIGHTOS_JWT get cluster
Sample Output
UUID: 442a77f8-7f7a-4ab7-9fce-f1d1612e8b03
currentMaxReplicas: 3
subsystemNQN: nqn.2014-08.org.nvmexpress:NVMf:uuid:70492bf6-92e6-498a-872b-408ceb4d52d8
supportedMaxReplicas: 3
The output includes the subsystem NQN for the Lightbits cluster.
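If you only need the NQN value, you can filter the human-readable output shown above; this is a minimal sketch, not a required step:

$ lbcli -J $LIGHTOS_JWT get cluster | grep subsystemNQN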
Obtaining the Lightbits Nodes Data IP Addresses and TCP Ports
On any Lightbits server, enter the lbcli list nodes command.
Sample Command
$ lbcli -J $LIGHTOS_JWT list nodes
Sample Output
NAME UUID State NVME-Endpoint
server00-0 192af7c0-d39f-4872-b849-7eb3dc0f7b53 Active 10.23.26.13:4420
server01-0 1f4ef0ce-0634-47c7-9e5f-d4fd910ff376 Active 10.23.26.8:4420
server02-0 6d9b8337-18cd-4b14-bea1-f56aca213d68 Active 10.23.26.4:4420
The NVME-Endpoint column in the output lists the instance IP address and TCP port for each of the Lightbits cluster's nodes.
Obtaining the Volume ACL String
The ACL string is the ACL you used when you created the volume on the Lightbits storage cluster.
You can also review the list of existing volumes and their ACLs by running the lbcli list volumes or lbcli get volume command on any of the Lightbits servers.
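For example, assuming the same flag conventions as the create volume example later in this section (check lbcli help on your version):

$ lbcli -J $LIGHTOS_JWT list volumes
$ lbcli -J $LIGHTOS_JWT get volume --project-name=default --name=vol1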
Verifying TCP/IP Connectivity
Before you run the nvme connect command on the application server, enter a Linux ping command to check the TCP/IP connectivity between your application server and the Lightbits storage cluster.
Sample Command
$ ping -c 1 10.23.26.8
rack02-server70: An application server
10.23.26.8: The instance IP address on one of the Lightbits storage cluster nodes
Sample Output
PING 10.23.26.8 (10.23.26.8) 56(84) bytes of data.
64 bytes from 10.23.26.8: icmp_seq=1 ttl=255 time=0.032 ms
--- 10.23.26.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.032/0.032/0.032/0.000 ms
The output indicates this application server has a good connection to the Lightbits storage instance.
It is recommended to repeat this check with all of the IP addresses obtained from the lbcli list nodes command.
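A small shell loop makes this quick; the IP addresses below are the sample values from the lbcli list nodes output above:

# Ping each Lightbits node endpoint once (sample IPs from the output above).
for ip in 10.23.26.13 10.23.26.8 10.23.26.4; do
    ping -c 1 "$ip"
done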
Connecting to the Lightbits Cluster
With the IP address, port, subsystem NQN, and ACL values for your volume, you can execute the nvme connect command.
You must repeat the nvme connect command for each of the NVMe endpoints returned by the lbcli list nodes command.
Sample NVMe Connect Command
$ nvme connect -t tcp -a 10.23.26.13 -s 4420 -l -1 -n nqn.2014-08.org.nvmexpress:NVMf:uuid:70492bf6-92e6-498a-872b-408ceb4d52d8 -q test123
- Use the client procedure for each node in the cluster. Remember to use the correct NVME-Endpoint for each node; see the sketch after this list.
- Add the --ctrl-loss-tmo -1 flag to allow infinite attempts to reconnect to nodes. This prevents a timeout from occurring when attempting to connect to a node in a failure state.
- During the connection phase to a client, the system can crash if you use NVMe/TCP drivers that are not supported by Lightbits.
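As a sketch, the per-node connects for the three sample endpoints from the lbcli list nodes output above look like this (the NQN and ACL are the sample values used in this section):

# Connect to every node endpoint; --ctrl-loss-tmo -1 allows infinite reconnect attempts.
for ep in 10.23.26.13 10.23.26.8 10.23.26.4; do
    nvme connect -t tcp -a "$ep" -s 4420 --ctrl-loss-tmo -1 \
        -n nqn.2014-08.org.nvmexpress:NVMf:uuid:70492bf6-92e6-498a-872b-408ceb4d52d8 \
        -q test123
done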
For more details on the NVMe CLI, see the NVMe CLI Overview section of this document.
Currently, Lightbits supports only TCP for the transport type value.
The above connect command connects you to the primary node where the volume resides. It is recommended to have the discovery client installed on all clients. The discovery client automatically pulls the required information from the cluster (or from several clusters), discovers all the volumes the client has access to, and maintains high availability, so that if the primary fails, the optimized NVMe/TCP path moves to the new primary. See the Discovery Client Deployment section for more information.
After you have entered the nvme connect command, you can confirm the client's connection to the Lightbits cluster by entering the nvme list-subsys command.
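For example, on the application server:

$ nvme list-subsys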
Reviewing Block Device Details on the Application Server
After the nvme connect command completes, you can see the available block devices on the application server using the Linux lsblk command or the nvme list command.
The following example shows how to use the Linux lsblk command to list all block devices after the nvme connect command has been executed. The output includes all block devices on the client, including the Lightbits volumes the client is connected to (all volumes whose ACL includes the client, and all volumes set to ALLOW_ANY).
Sample Command
$ lsblk
Sample Output
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 10G 0 disk
In this example output, you can see the 10GB NVMe/TCP device with the name nvme0n1. This name indicates that the device is:
- On NVMe subsystem 0 (the nvme0 prefix)
- The first namespace (volume) on that subsystem (the n1 suffix)
Your Lightbits storage cluster is now initialized and ready for use.
You can configure your applications to use this NVMe/TCP connected volume as you would with any other Linux block device.
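For example, a hypothetical setup that formats the device with XFS and mounts it (the device name and mount point are illustrative):

# Format the Lightbits volume and mount it like any other Linux block device.
mkfs.xfs /dev/nvme0n1
mkdir -p /mnt/lightbits
mount /dev/nvme0n1 /mnt/lightbits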
NVMe/TCP MultiPath
NVMe multipath I/O refers to two or more independent paths between a single host and a namespace. Each path uses its own controller, although multiple controllers can share a subsystem port. Multipath I/O, like namespace sharing, requires that the NVM subsystem contain two or more controllers.
Multipath is part of the NVMe specification and is used by the Lightbits cluster software as follows:
- The primary node exposes the path to the volume.
- Clients send read and write requests to the primary node.
- The primary node replicates to the secondary nodes.
- If the primary node fails, the secondary node exposes a path to the client so the client can continue working with the secondary node.
Lightbits uses a proprietary protocol on top of TCP to replicate data between primary and secondary nodes, without requiring any changes to the client.
The default CloudFormation (CF) stack will not deploy any client machine in your environment. To test the functionality and performance of your AWS-based Lightbits cluster, you will therefore need to deploy an AWS instance with an operating system that supports NVMe over TCP via the nvme_tcp kernel module.
Such distributions include:
- Ubuntu
- RHEL
- Amazon Linux
- Rocky
Make sure your kernel version has a fully functional nvme_tcp kernel module. Lightbits recommends kernel version 5.4 and above.
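A quick way to verify this on the client; a sketch, not a required sequence:

# Confirm the kernel version and that the nvme_tcp module loads.
uname -r
modprobe nvme_tcp
lsmod | grep nvme_tcp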
The following is an example of a sequence of commands to test client connectivity:
- Connect to one of your storage instances using Session Manager (SSH).
- Get the system JWT token.
- List nodes for status.
- Create a test volume using lbcli create volume.
In order to connect to a storage instance, go to the EC2 instances dashboard and select one of the lightbits-node instances. Then click Connect > Session Manager.
Within the session manager (SSH) window:
> sudo su
> cd /opt/light-app/config/cluster1/; . system_jwt
> lbcli list nodes -J $LIGHTOS_JWT
Name                             UUID                                  State   NVMe endpoint       Failure domains                Local rebuild progress
ip-10-240-99-56.ec2.internal-0   1c221509-be0c-57f9-9f66-980174d8f91a  Active  10.240.99.56:4420   [i-020acc69e738c6c39 ip-1...]  None
ip-10-240-99-224.ec2.internal-0  83bb07d7-2bb8-5e8e-a3db-fce76288b971  Active  10.240.99.224:4420  [i-04896c6c251c859b1 ip-1...]  None
ip-10-240-99-148.ec2.internal-0  ad122562-be10-5268-82d0-7fa41349f9f7  Active  10.240.99.148:4420  [i-017209734ebc14b03 ip-1...]  None
> lbcli create volume --project-name=default --name=vol1 --acl=client1 --size="1 Tib" --compression=true --replica-count=3 -J $LIGHTOS_JWT
Name UUID State Protection State NSID Size Replicas Compression ACL Rebuild Progress
vol1 d1264ac8-3fd4-43d4-8738-236ff55451a5 Creating Unknown 0 1.0 TiB 3 true values:"client1"
> lbcli list volumes -J $LIGHTOS_JWT
Name  UUID                                  State      Protection State  NSID  Size     Replicas  Compression  ACL               Rebuild Progress
vol1  d1264ac8-3fd4-43d4-8738-236ff55451a5  Available  FullyProtected    1     1.0 TiB  3         true         values:"client1"  None
You can copy the JWT and add it to the file located at /etc/lbcli/lbcli.yaml.
Example:
output-format: human-readable
dial-timeout: 5s
command-timeout: 60s
insecure-skip-tls-verify: true
debug: false
api-version: 2
insecure-transport: false
endpoint: https://127.0.0.1:443
jwt: <the value you copied>
This will allow you to perform further lbcli management commands against the storage cluster without specifying the JWT with each command.
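For example, with the JWT stored in /etc/lbcli/lbcli.yaml, the -J flag is no longer needed:

> lbcli list nodes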
- Create a client instance within the same VPC as the storage cluster.
- Connect to the client using SSH.
- Load the nvme-tcp module.
- Discover the volume with discovery-service via the NLB (Load Balancer) URL.
- Run an fio read/write test.
> sudo su
> sudo modprobe nvme_tcp
> discovery-client connect-all -t tcp -a ofir-Netwo-KCNFHE07SDVA-5a5f14234bd98447.elb.us-east-1.amazonaws.com -q client1 -p
"Instance"2 "Cntlid"257 "Device""/dev/nvme2"
"Instance"3 "Cntlid"513 "Device""/dev/nvme3"
"Instance"4 "Cntlid"769 "Device""/dev/nvme4"
> nvme list
Node          SN                    Model                             Namespace  Usage                Format       FW Rev
/dev/nvme0n1  vol0edb72d077feb8d69  Amazon Elastic Block Store        1          21.47 GB / 21.47 GB  512 B + 0 B  2.0
/dev/nvme1n1  AWS436D8B424F5F5F8CC  Amazon EC2 NVMe Instance Storage  1          3.75 TB / 3.75 TB    512 B + 0 B  0
/dev/nvme2n1  513ee00237f59136      Lightbits LightOS                 1          1.10 TB / 1.10 TB    4 KiB + 0 B  2.3
> fio --name=rw --ioengine=libaio --iodepth=256 --rw=randread --bs=4k --direct=1 --filename=/dev/nvme2n1 --size=10G --numjobs=8 --runtime=20 --group_reporting
Node Rebuild
The Lightbits cluster has the ability to rebuild data on a node if it is not in sync with the replicas on other nodes. The cluster will identify that a node is out of sync and trigger the rebuild process. This could happen due to connectivity issues to the node, software issues that caused the node to stop responding, or a restart of an instance.
The node will decide whether to perform a partial rebuild (usually after a short disruption or reboot) or a full rebuild (in cases of prolonged downtime). During the rebuild process, all volumes that have a replica on the affected node will be in degraded mode. Once the rebuild is done, all volumes return to being fully available.
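To monitor a rebuild, you can watch the Local rebuild progress column reported by lbcli list nodes (shown in the sample output earlier in this section); a minimal sketch:

# Re-run lbcli list nodes every 10 seconds to watch rebuild progress.
watch -n 10 'lbcli list nodes -J $LIGHTOS_JWT'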