Provisioning Grafana and Prometheus
Prometheus gathers statistics from the Lightbits cluster, and Grafana displays them as graphs on dashboards. A single monitoring deployment can be configured for, and can monitor, several clusters at once.
The following sections describe how to provision a monitoring stack with Prometheus and Grafana. These instructions apply to Lightbits version 3.4.1 and later.
Before deploying, ensure that SELinux and firewall permissions are set to permissive.
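You can check the SELinux part of this requirement with the sketch below. It is a minimal sketch that assumes the standard getenforce and setenforce utilities. It only reports the current mode. It does not inspect firewall rules, because those depend on the firewall tooling in use.

```shell
#!/usr/bin/env bash
# Report the current SELinux mode before deploying the monitoring stack.
# getenforce/setenforce come with the SELinux utilities; on hosts without
# them, the function reports that SELinux is not available.
check_selinux_mode() {
  if ! command -v getenforce >/dev/null 2>&1; then
    echo "SELinux: not available on this host"
    return 0
  fi
  local mode
  mode="$(getenforce)"   # prints Enforcing, Permissive, or Disabled
  echo "SELinux: ${mode}"
  if [ "${mode}" = "Enforcing" ]; then
    # setenforce 0 switches to permissive only until the next reboot;
    # set SELINUX=permissive in /etc/selinux/config to make it persistent.
    echo "Run: sudo setenforce 0"
  fi
}

check_selinux_mode
```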
Prerequisites
Designate a server outside the Lightbits cluster for the Lightbits monitoring solution. Ensure that the designated monitoring host meets the following requirements.
Hardware specifications:
- 10 cores
- 32 GB RAM
- 128 GB of storage
- Connectivity to Lightbits' access network (Lightbits exporter service and API service).
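The following sketch prints the host's CPU count, memory, and storage next to the minimums above. It makes three assumptions: a Linux host, /proc/meminfo and GNU coreutils are present, and the monitoring data lives on the filesystem that holds /var/lib. To check a different filesystem, pass its path as the first argument. The script prints the values for comparison and does not enforce them. Kernel and filesystem overhead make reported sizes slightly smaller than nominal hardware sizes.

```shell
#!/usr/bin/env bash
# Print this host's resources next to the minimum monitoring-host
# specification (10 cores, 32 GB RAM, 128 GB storage) for comparison.
check_monitoring_host() {
  # Filesystem that will hold the monitoring data (assumed: /var/lib).
  local data_path="${1:-/var/lib}"
  local cores mem_kb disk_kb

  cores="$(nproc)"
  # /proc/meminfo reports MemTotal in kB.
  mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  # df -P -k prints POSIX output in 1K blocks; field 2 of line 2 is the size.
  disk_kb="$(df -P -k "${data_path}" | awk 'NR == 2 {print $2}')"

  echo "CPU cores: ${cores} (minimum 10)"
  echo "RAM:       $(( ${mem_kb:-0} / 1024 / 1024 )) GiB (minimum 32 GB)"
  echo "Storage:   $(( ${disk_kb:-0} / 1024 / 1024 )) GiB on ${data_path} (minimum 128 GB)"
}

check_monitoring_host "$@"
```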
Installing Monitoring Packages
To install the monitoring packages on RPM-based operating systems, run the following:
sudo yum install lightos-monitoring-images lightos-monitoring-clustering
For Debian-based operating systems (for example, Ubuntu), see Connecting to the Cluster Client DEB Repository, and then run:
sudo apt-get install lightos-monitoring-images lightos-monitoring-clustering
Monitoring Stack Deployment
To start the Prometheus and Grafana containers, run the following (clustering):
/var/lib/monitoring-images/deploy.sh deploy
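After deployment, a quick connectivity check can confirm that both services are listening. The sketch below uses bash's /dev/tcp redirection, so it needs no extra tools. Port 9090 is the Prometheus port used throughout this guide. Port 3000 is Grafana's upstream default and is an assumption about this deployment. MONITORING_HOST is a hypothetical variable that defaults to localhost.

```shell
#!/usr/bin/env bash
# Check that the Prometheus and Grafana containers accept TCP connections.
# 9090: Prometheus port used in this guide. 3000: Grafana's upstream
# default port (assumed; adjust if this deployment uses another port).
check_port() {
  local name="$1" host="$2" port="$3"
  # bash opens a TCP connection via /dev/tcp; timeout bounds the wait.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${name} (${host}:${port}): listening"
  else
    echo "${name} (${host}:${port}): not reachable"
  fi
}

MONITORING_HOST="${MONITORING_HOST:-localhost}"   # hypothetical variable
check_port "Prometheus" "${MONITORING_HOST}" 9090
check_port "Grafana" "${MONITORING_HOST}" 3000
```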
Configuring Prometheus
Prometheus comes preconfigured with all of the scrape jobs and the alerting and recording rules. The only remaining step is to add the targets for each Lightbits cluster to monitor.
Because this information is deployment-specific, follow the example below and set the hosts for your environment.
Each Prometheus instance can monitor multiple clusters at the same time.
To add a cluster to monitor or to update an existing cluster, run the commands in the section below.
Adding Prometheus Targets
The following example illustrates how to generate the configuration for a cluster named cluster1, which has three servers:
- rack01-server01
- rack01-server02
- rack01-server03
Prometheus must scrape the services that run on all of these nodes.
The following command generates targets.yaml files that define the Prometheus endpoints to scrape. See the <file_sd_config> section of the Prometheus documentation for additional information.
/var/lib/monitoring-images/deploy.sh add_cluster \
-c cluster1 \
-i rack01-server01,rack01-server02,rack01-server03
This action creates the following files:
/var/lib/monitoring-clustering/file_sd_configs/api-service/cluster1-targets.yaml
- labels:
    job: cluster_1
  targets:
    - rack01-server01:443
    - rack01-server02:443
    - rack01-server03:443
/var/lib/monitoring-clustering/file_sd_configs/lightbox-exporter/cluster1-targets.yaml
- labels:
    job: cluster_1
  targets:
    - rack01-server01:8090
    - rack01-server02:8090
    - rack01-server03:8090
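To illustrate the file_sd format, the following sketch writes equivalent target files for the example cluster into a temporary directory. write_targets is a hypothetical helper and is not part of deploy.sh. In practice, use add_cluster, which writes the files under /var/lib/monitoring-clustering/file_sd_configs. Each file is a YAML list of target groups, and every target in a group receives that group's labels.

```shell
#!/usr/bin/env bash
# Illustration only: write Prometheus file_sd target files in the same
# layout as the add_cluster output. write_targets is hypothetical.
write_targets() {
  local out_file="$1" job="$2" port="$3"
  shift 3
  local host
  {
    echo "- labels:"
    echo "    job: ${job}"
    echo "  targets:"
    for host in "$@"; do
      echo "    - ${host}:${port}"   # one host:port entry per server
    done
  } > "${out_file}"
}

demo_dir="$(mktemp -d)"
mkdir -p "${demo_dir}/api-service" "${demo_dir}/lightbox-exporter"
servers=(rack01-server01 rack01-server02 rack01-server03)
# API service on port 443, Lightbits exporter on port 8090, as shown above.
write_targets "${demo_dir}/api-service/cluster1-targets.yaml" cluster_1 443 "${servers[@]}"
write_targets "${demo_dir}/lightbox-exporter/cluster1-targets.yaml" cluster_1 8090 "${servers[@]}"
cat "${demo_dir}/lightbox-exporter/cluster1-targets.yaml"
```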
Ensure that the YAML configuration files have the minimum required permissions. In the Prometheus container, the files must be rw-r--r-- and not rw-------. Reset the file permissions if necessary.
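A minimal sketch for resetting the permissions follows. The file_sd_configs folder is bind-mounted, so the mode set on the host is the mode the Prometheus container sees, and the sketch therefore runs on the host. It assumes GNU find and stat. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Set the generated target files to rw-r--r-- (mode 644) and print the
# resulting mode of each file. fix_target_permissions is hypothetical.
fix_target_permissions() {
  local sd_dir="${1:-/var/lib/monitoring-clustering/file_sd_configs}"
  find "${sd_dir}" -type f -name '*-targets.yaml' -exec chmod 644 {} +
  # GNU stat: %A is the symbolic mode, %n the file name.
  find "${sd_dir}" -type f -name '*-targets.yaml' -exec stat -c '%A %n' {} +
}
```

sudo cannot call a shell function directly. One way to run it as root with the default path is sudo bash -c "$(declare -f fix_target_permissions); fix_target_permissions".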
Because the /var/lib/monitoring-clustering/file_sd_configs folder is bind-mounted into the Prometheus container, the command also issues a reload to Prometheus, which then starts collecting from these endpoints.
Verify that the targets were configured correctly by viewing http://<prometheus_host>:9090/targets.
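The /targets page draws on the same data that the Prometheus HTTP API serves at /api/v1/targets. In that response, each active target has a health field whose value is up, down, or unknown. The sketch below counts targets by health using only grep, sed, sort, and uniq, so it does not require jq. The function names are hypothetical, and curl is assumed to be installed.

```shell
#!/usr/bin/env bash
# Count scrape targets by health from a Prometheus /api/v1/targets
# response read on stdin. Function names are hypothetical.
summarize_target_health() {
  grep -oE '"health": *"[a-z]+"' \
    | sed -E 's/.*"([a-z]+)"$/\1/' \
    | sort | uniq -c
}

# Fetch the targets from a Prometheus host (port 9090, as in this guide).
check_targets() {
  local host="${1:?usage: check_targets <prometheus_host>}"
  curl -fsS "http://${host}:9090/api/v1/targets" | summarize_target_health
}
```

For example, check_targets <prometheus_host> prints one count per health value. All cluster targets should report up.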
Removing Prometheus Targets
The following command reverses the add_cluster command:
/var/lib/monitoring-images/deploy.sh remove_cluster -c cluster1
This will delete the following files:
- /var/lib/monitoring-clustering/file_sd_configs/api-service/cluster1-targets.yaml
- /var/lib/monitoring-clustering/file_sd_configs/lightbox-exporter/cluster1-targets.yaml
It also issues a reload to Prometheus.
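To confirm that the files are gone, the following sketch checks both target paths listed above for a given cluster. cluster_targets_removed is a hypothetical helper. If any file still exists, it lists the file and returns a non-zero status.

```shell
#!/usr/bin/env bash
# Succeed only if neither target file for the cluster exists.
# cluster_targets_removed is hypothetical; paths follow the layout above.
cluster_targets_removed() {
  local cluster="$1"
  local sd_dir="${2:-/var/lib/monitoring-clustering/file_sd_configs}"
  local f status=0
  for f in "${sd_dir}/api-service/${cluster}-targets.yaml" \
           "${sd_dir}/lightbox-exporter/${cluster}-targets.yaml"; do
    if [ -e "${f}" ]; then
      echo "still present: ${f}"
      status=1
    fi
  done
  return "${status}"
}
```

For example: cluster_targets_removed cluster1 && echo "cluster1 targets removed".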
You can verify that the configuration works by:
- Navigating to http://<prometheus_host>:9090/config and making sure that the expected configuration is used.
- Navigating to http://<prometheus_host>:9090/targets?search= and making sure that the targets configured in the previous step are updated.
Configuring Grafana
Grafana is configured automatically with the deployed Prometheus instance as its data source and with all of the dashboards that Lightbits provides for monitoring the cluster.
Cleaning Up Deployed Containers
To run the deployment again, or to clean the machine of the artifacts that the deployment created, run the following command:
/var/lib/monitoring-images/deploy.sh clean
Uninstalling Monitoring Packages
To uninstall the monitoring packages on RPM-based operating systems, run the following command:
sudo yum remove lightos-monitoring-images lightos-monitoring-clustering
For Debian-based operating systems (for example, Ubuntu), run:
sudo apt-get remove lightos-monitoring-images lightos-monitoring-clustering
Enabling Persistent Journaling
To aid troubleshooting and support, Lightbits recommends configuring the journal to keep logs across reboots and shutdowns. To ensure that the OS disk does not fill up, Lightbits also recommends updating the log rotation settings.
Enable persistent journaling:
sudo sed -i 's/#Storage.*/Storage=persistent/' /etc/systemd/journald.conf
Enable log rotation:
sudo sed -i 's| missingok$| daily\n rotate 30\n compress\n missingok\n notifempty|g' /etc/logrotate.d/syslog
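Before you edit the system files, you can preview the effect of both sed expressions on a scratch copy. The sketch below assumes GNU sed, which supports -i and \n in the replacement text, and diffutils. preview_sed_edit, JOURNALD_EXPR, and LOGROTATE_EXPR are hypothetical names. Both expressions are copied verbatim from the commands above.

```shell
#!/usr/bin/env bash
# Apply a sed expression to a temporary copy of a file and show the diff,
# leaving the original untouched. preview_sed_edit is hypothetical.
preview_sed_edit() {
  local expr="$1" file="$2" tmp
  tmp="$(mktemp)"
  cp "${file}" "${tmp}"
  sed -i "${expr}" "${tmp}"
  # diff exits 1 when the files differ, which is the expected case here.
  diff "${file}" "${tmp}" || true
  rm -f "${tmp}"
}

# Expressions copied from the journald and logrotate commands above.
JOURNALD_EXPR='s/#Storage.*/Storage=persistent/'
LOGROTATE_EXPR='s| missingok$| daily\n rotate 30\n compress\n missingok\n notifempty|g'
```

For example, run preview_sed_edit "$JOURNALD_EXPR" /etc/systemd/journald.conf. After you apply the journald change, run sudo systemctl restart systemd-journald so that the new Storage setting takes effect. To check the updated rotation rules, run sudo logrotate -d /etc/logrotate.d/syslog. Debug mode reports what logrotate would do without rotating any logs.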