[PATCH 0/1] nvme-pci: Add CPU latency pm-qos handling

Hello,

Re-posting this now that 6.12-rc1 is out, as the previous RFC didn't
receive any feedback. The patch itself hasn't changed, but I've included
the cover letter again for the details.

The patch adds a mechanism for tackling NVMe latency with random
workloads. A new sysfs knob (cpu_latency_us) is added under NVMe devices,
which can be used to fine-tune the PM QoS CPU latency limit while the
NVMe device is operational.
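
For reference, here is a minimal sketch (not the code in this patch) of
how such a knob can be wired to the CPU latency QoS API declared in
<linux/pm_qos.h>. The container struct, the drvdata plumbing and the
attribute wiring are illustrative assumptions; only the
cpu_latency_qos_*() helpers and PM_QOS_DEFAULT_VALUE are existing kernel
API.

==
/* Illustrative sketch only -- not the actual patch. */
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/pm_qos.h>

struct demo_latency_ctx {
	struct pm_qos_request	qos_req;	/* CPU latency QoS handle */
	s32			latency_us;	/* current limit, -1 = none */
};

static ssize_t cpu_latency_us_store(struct device *dev,
				    struct device_attribute *attr,
				    const char *buf, size_t count)
{
	struct demo_latency_ctx *ctx = dev_get_drvdata(dev);
	s32 val;
	int ret;

	ret = kstrtos32(buf, 0, &val);
	if (ret)
		return ret;

	/* Any negative value maps to PM_QOS_DEFAULT_VALUE (-1), i.e. no limit. */
	if (val < 0)
		val = PM_QOS_DEFAULT_VALUE;

	/* Register the QoS request on first use, update it afterwards. */
	if (!cpu_latency_qos_request_active(&ctx->qos_req))
		cpu_latency_qos_add_request(&ctx->qos_req, val);
	else
		cpu_latency_qos_update_request(&ctx->qos_req, val);

	ctx->latency_us = val;
	return count;
}
static DEVICE_ATTR_WO(cpu_latency_us);
==

On teardown the request would be dropped again with
cpu_latency_qos_remove_request(). A limit of e.g. 10 us keeps cpuidle
from selecting idle states with a higher exit latency than that (such as
C6 on this platform), which is what produces the latency numbers below.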

Below is a post-processed measurement run on an Ice Lake Xeon platform,
measuring latencies with the 'fio' tool running random-read and read
profiles. Five random-read and five bulk-read operations are done with the
latency limit enabled and disabled, and the maximum 'slat' (submission
latency), 'clat' (completion latency) and 'lat' (total latency) values are
shown for each run; values are in microseconds. The bandwidth is measured
with the 'read' payload of fio, and min-avg-max values are shown in MiB/s.
c6% indicates the percentage of time spent in the C6 idle state during the
test by the CPU running 'fio'.

==
Setting cpu_latency_us limit to 10 (enabled)
  slat: 31, clat: 99, lat: 113, bw: 1156-1332-1359, c6%: 2.8
  slat: 49, clat: 135, lat: 143, bw: 1156-1332-1361, c6%: 1.0
  slat: 67, clat: 148, lat: 156, bw: 1159-1331-1361, c6%: 0.9
  slat: 51, clat: 99, lat: 107, bw: 1160-1330-1356, c6%: 1.0
  slat: 82, clat: 114, lat: 122, bw: 1156-1333-1359, c6%: 1.0
Setting cpu_latency_us limit to -1 (disabled)
  slat: 112, clat: 275, lat: 364, bw: 1153-1334-1364, c6%: 80.0
  slat: 110, clat: 270, lat: 324, bw: 1164-1338-1369, c6%: 80.1
  slat: 106, clat: 260, lat: 320, bw: 1159-1330-1362, c6%: 79.7
  slat: 110, clat: 255, lat: 300, bw: 1156-1332-1363, c6%: 80.2
  slat: 107, clat: 248, lat: 322, bw: 1152-1331-1362, c6%: 79.9
==

In summary, the C6-induced latencies are eliminated from the
random-read tests ('clat' drops from 250+ us to 100-150 us), and the
maximum-throughput testing shows no negative impact on bandwidth (the
bandwidth values are practically identical), so the overhead introduced
by the mechanism is minimal.

-Tero