| ID | Description |
|---|---|
| 39628 | On Sapphire Rapids machines, the DSA offload can become enabled, which can cause a Machine Check Exception (MCE) and force a reboot. The offload is enabled only when the DSA devices on a NUMA node provide at least as many channels as the size of the duroslight process cpuset (each device has eight channels). The duroslight log shows which case applies: "dsa_service: failed to get channels" indicates that the offload is not enabled, while "configurator - Enabling DSA crc32 offload for reads" indicates that the offload is enabled and there is a risk of an MCE or other malfunction. To prevent malfunctions in such configurations, disable the offloads in /etc/duroslight/conf.yaml by adding "dsa_read_crc32: false" and "dsa_write_crc32: false" under "configurator". |
| 37831 | In some cases, silent data corruption on an SSD could cause a node to crash instead of attempting to recover the data and reporting an event. This can occur if the SSD returns invalid data rather than an I/O error. |
| 36478 | Exceeding the maximum of 65536 file descriptors (client connections) causes the Duroslight service to fail. |
| 17329 | Lightbits exposes latency information per request size. The time window for latency measurement is not synchronized with the measurement of the number of read/write requests. Therefore, a weighted average of latency calculated over all request sizes will be inaccurate. |
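
For issue 39628, the log check can be automated. The following is a minimal sketch, not a Lightbits tool: the function name `dsa_offload_status` is hypothetical, the two marker strings are the log messages quoted in the table, and the caller is assumed to supply the duroslight log contents as text (the log file location is not given here).

```python
# Marker strings quoted from issue 39628; everything else is illustrative.
OFFLOAD_DISABLED_MARKER = "dsa_service: failed to get channels"
OFFLOAD_ENABLED_MARKER = "Enabling DSA crc32 offload for reads"


def dsa_offload_status(log_text: str) -> str:
    """Return 'enabled', 'disabled', or 'unknown'.

    The result reflects the last matching line in the log, on the
    assumption that later entries describe the most recent service start.
    """
    status = "unknown"
    for line in log_text.splitlines():
        if OFFLOAD_ENABLED_MARKER in line:
            status = "enabled"
        elif OFFLOAD_DISABLED_MARKER in line:
            status = "disabled"
    return status
```

A result of `enabled` means the node is exposed to the MCE risk described in issue 39628 and is a candidate for the `dsa_read_crc32: false` and `dsa_write_crc32: false` settings. A result of `unknown` only means that neither message was found in the supplied text.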