Hello,
This v4 continues Vishal Aslot's "Support zero-sized decoders" series [1]
and addresses the v3 review of patch 1's port->hdm_end handling [2].
CXL r3.2 §8.2.4.20.12 and §14.13.10 permit committing an HDM decoder with
size 0. BIOS commits and LOCKs such decoders to burn the trailing, unused
slots so the OS cannot program regions through them, e.g. for a Type 3
device in a Trusted Computing Base (TCB) established via the TEE Security
Protocol (TSP). init_hdm_decoder() rejected these with -ENXIO during port
enumeration and aborted the whole port, so affected systems showed nothing
under 'cxl list'.
Patch 1 enumerates the decoder into the topology with its HW-reported LOCK
state and skips the DPA reservation it does not need.
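For illustration only, the enumeration decision described above can be modelled in userspace roughly as follows. This is a minimal sketch, not the patch itself: struct mock_hw_decoder, struct mock_decoder, mock_init_decoder() and the allow_zero_size switch are hypothetical names invented for this example, and the real init_hdm_decoder() handles far more state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of what the hardware reports for one decoder. */
struct mock_hw_decoder {
	bool committed;
	bool locked;
	uint64_t size;
};

/* Hypothetical, heavily simplified software view of that decoder. */
struct mock_decoder {
	bool enumerated;	/* visible in the topology */
	bool locked;		/* LOCK state copied from hardware */
	bool dpa_reserved;	/* a DPA range was reserved for it */
};

/*
 * allow_zero_size == false models the old behaviour (reject with -ENXIO,
 * which aborted the whole port); true models the behaviour described for
 * patch 1 (enumerate, keep LOCK state, skip the DPA reservation).
 */
static int mock_init_decoder(const struct mock_hw_decoder *hw,
			     struct mock_decoder *d, bool allow_zero_size)
{
	assert(hw && d);
	*d = (struct mock_decoder){ 0 };

	if (hw->committed && hw->size == 0) {
		if (!allow_zero_size)
			return -ENXIO;
		d->enumerated = true;
		d->locked = hw->locked;
		return 0;	/* nothing to reserve for a zero-size decoder */
	}

	d->enumerated = true;
	d->locked = hw->locked;
	d->dpa_reserved = hw->committed;	/* sized, committed: reserve DPA */
	return 0;
}
```

In this model a committed, locked, zero-size decoder fails with -ENXIO when allow_zero_size is false, and is enumerated with locked set and dpa_reserved clear when it is true.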
On port->hdm_end (the v3 review): v3 advanced the watermark for the
zero-size decoder. sashiko correctly noted the write was outside
cxl_rwsem.dpa, and that advancing it without a balanced release strands
hdm_end -- cxl_dpa_free() returns early on !dpa_res, so it can never be
decremented past the zero-size id, breaking LIFO teardown of lower
decoders. v4 therefore does not touch hdm_end at all. The in-order check
in __cxl_dpa_reserve() is its only consumer and is never legitimately
reached past such a decoder: the burned slots are trailing, so enumeration
reserves no committed decoder after one, and the OS must not program a
region through a locked slot. hdm_end stays at the last sized reservation,
which is accurate. IMHO, if a non-trailing zero-size layout ever needs
support, the check should key off commit_end rather than hdm_end, but that
is out of scope here.
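The stranding problem can be reproduced with a small userspace model. This is a sketch under stated assumptions, not kernel code: struct mock_port and the mock_* helpers are hypothetical, and they only mimic the in-order reserve check, the LIFO free, and the early return when a decoder holds no DPA resource.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_NR_DECODERS 4

/* Hypothetical stand-in for the per-port reservation watermark. */
struct mock_port {
	int hdm_end;				/* id of last reserved decoder, -1 if none */
	bool has_dpa_res[MOCK_NR_DECODERS];	/* decoder holds a DPA resource */
};

static void mock_port_init(struct mock_port *p)
{
	p->hdm_end = -1;
	for (int i = 0; i < MOCK_NR_DECODERS; i++)
		p->has_dpa_res[i] = false;
}

/* Reservations must happen in id order (the in-order check). */
static int mock_dpa_reserve(struct mock_port *p, int id)
{
	assert(id >= 0 && id < MOCK_NR_DECODERS);
	if (p->hdm_end + 1 != id)
		return -1;
	p->has_dpa_res[id] = true;
	p->hdm_end = id;
	return 0;
}

/* Frees must be LIFO; a decoder without a resource is a no-op. */
static int mock_dpa_free(struct mock_port *p, int id)
{
	assert(id >= 0 && id < MOCK_NR_DECODERS);
	if (!p->has_dpa_res[id])
		return 0;	/* early return: hdm_end is not decremented */
	if (p->hdm_end != id)
		return -1;	/* not the most recent reservation */
	p->has_dpa_res[id] = false;
	p->hdm_end--;
	return 0;
}

/*
 * Reserve decoders 0 and 1, treat decoder 2 as zero-size, then tear down
 * from the highest id. advance_for_zero_size models the v3 behaviour.
 */
static bool mock_teardown_succeeds(bool advance_for_zero_size)
{
	struct mock_port p;

	mock_port_init(&p);
	if (mock_dpa_reserve(&p, 0) || mock_dpa_reserve(&p, 1))
		return false;
	if (advance_for_zero_size)
		p.hdm_end = 2;	/* watermark moved with no matching release */

	for (int id = 2; id >= 0; id--)
		if (mock_dpa_free(&p, id))
			return false;
	return p.hdm_end == -1;
}
```

In this model mock_teardown_succeeds(true) returns false because freeing decoder 1 fails while hdm_end is stuck at 2, whereas mock_teardown_succeeds(false), the v4 approach of leaving hdm_end alone, returns true.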
Patch 1 also carries the poison fix: commit_end may now reference a decoder
with no DPA resource, so poison_by_decoder() falls through to
cxl_get_poison_unmapped() and collects poison from the unmapped DPA tail.
Patch 2 adds cxl_test coverage, gated behind a new mock_zero_size_decoders
module parameter (default off). v2 committed the mock slots unconditionally
on the host-bridge0 auto-region endpoints, which the region tests reuse,
regressing 7 of 17 cxl unit tests; defaulting off leaves the shared
topology untouched, and enabling it reproduces the BIOS layout.
Tested with the ndctl cxl unit suite: param off, no regressions (13 pass,
1 environmental skip); param on, cxl-poison's by-memdev-by-dpa case returns
all 4 records. cxl_test exercises the topology and poison handling; the
init_hdm_decoder() enumeration path is validated on real hardware.
The full suite results follow:
"""
$ sudo env "PATH=$PATH" meson test -C build --suite cxl \
> --num-processes 1 -t 6 \
> --print-errorlogs
ninja: Entering directory `/home/nvidia/ndctl/build'
[1/50] Generating version.h with a custom command
1/14 ndctl:cxl / cxl-topology.sh OK 3.38s
2/14 ndctl:cxl / cxl-region-sysfs.sh OK 2.60s
3/14 ndctl:cxl / cxl-labels.sh OK 2.53s
4/14 ndctl:cxl / cxl-create-region.sh OK 3.25s
5/14 ndctl:cxl / cxl-xor-region.sh OK 2.61s
6/14 ndctl:cxl / cxl-events.sh OK 2.46s
7/14 ndctl:cxl / cxl-sanitize.sh OK 5.42s
8/14 ndctl:cxl / cxl-destroy-region.sh OK 2.47s
9/14 ndctl:cxl / cxl-qos-class.sh OK 2.54s
10/14 ndctl:cxl / cxl-translate.sh OK 0.78s
11/14 ndctl:cxl / cxl-elc.sh OK 2.45s
12/14 ndctl:cxl / cxl-security.sh SKIP 0.02s exit status 77
13/14 ndctl:cxl / cxl-features.sh OK 1.27s
14/14 ndctl:cxl / cxl-poison.sh OK 7.91s
Ok: 13
Expected Fail: 0
Fail: 0
Unexpected Pass: 0
Skipped: 1
Timeout: 0
Full log written to /home/nvidia/ndctl/build/meson-logs/testlog.txt
"""
And the tests Alison mentioned:
"""
$ sudo env "PATH=$PATH" meson test -C build --num-processes 1 -t 6 --print-errorlogs \
> cxl-region-sysfs.sh cxl-create-region.sh cxl-xor-region.sh \
> cxl-destroy-region.sh cxl-qos-class.sh cxl-poison.sh
ninja: Entering directory `/home/nvidia/ndctl/build'
[1/50] Generating version.h with a custom command
1/6 ndctl:cxl / cxl-region-sysfs.sh OK 2.62s
2/6 ndctl:cxl / cxl-create-region.sh OK 3.15s
3/6 ndctl:cxl / cxl-xor-region.sh OK 2.61s
4/6 ndctl:cxl / cxl-destroy-region.sh OK 2.39s
5/6 ndctl:cxl / cxl-qos-class.sh OK 2.47s
6/6 ndctl:cxl / cxl-poison.sh OK 7.80s
Ok: 6
Expected Fail: 0
Fail: 0
Unexpected Pass: 0
Skipped: 0
Timeout: 0
Full log written to /home/nvidia/ndctl/build/meson-logs/testlog.txt
"""
[1] https://lore.kernel.org/all/20251015024019.1189713-1-vaslot@nvidia.com/T/#u
[2] https://lore.kernel.org/all/20260607053837.4389-1-icheng@nvidia.com/
Richard Cheng (2):
cxl/hdm: Allow zero sized HDM decoders
tools/testing/cxl: Enable zero sized decoder under hb0
drivers/cxl/core/hdm.c | 23 +++++++---
drivers/cxl/core/region.c | 42 +++++++++---------
tools/testing/cxl/test/cxl.c | 83 +++++++++++++++++++++++++++++++++---
3 files changed, 115 insertions(+), 33 deletions(-)
base-commit: ddd664bbff63e09e7a7f9acae9c43605d4cf185f
--
2.43.0