[PATCH v3 0/2] Support zero-sized HDM decoders

Richard Cheng posted 2 patches 21 hours ago
There is a newer version of this series
Hello,

This v3 picks up Vishal Aslot's "Support zero-sized decoders" series [1]
and addresses the v2 review feedback [2].

CXL r3.2 §8.2.4.20.12 and §14.13.10 permit committing an HDM decoder with
size 0. BIOS commits and LOCKs such decoders to burn slots so the OS
cannot program new regions through them, e.g. for a Type 3 device in a
Trusted Computing Base (TCB) established via the Trusted Security Protocol.
Today the driver fails on such a decoder during enumeration and aborts
the whole port, so affected systems show nothing under 'cxl list'.
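
For readers less familiar with the register layout, here is a minimal
userspace sketch of how a committed zero-size decoder can be told apart
from a free or an active one. The bit positions follow the
CXL_HDM_DECODER0_CTRL_* definitions in drivers/cxl/cxl.h; the enum, the
struct and classify_decoder() are hypothetical names used only for this
illustration and are not part of the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HDM decoder control bits, same positions as in drivers/cxl/cxl.h. */
#define CTRL_LOCK      (1u << 8)
#define CTRL_COMMITTED (1u << 10)

/* Hypothetical classification, for illustration only. */
enum slot_kind {
	SLOT_FREE,	/* not committed: the OS may program it */
	SLOT_ACTIVE,	/* committed with a non-zero size */
	SLOT_BURNED,	/* committed with size 0: only occupies a slot */
};

struct slot_info {
	enum slot_kind kind;
	bool locked;	/* LOCK state exactly as hardware reports it */
};

static struct slot_info classify_decoder(uint32_t ctrl, uint64_t size)
{
	struct slot_info info = {
		.kind = SLOT_FREE,
		.locked = (ctrl & CTRL_LOCK) != 0,
	};

	if (!(ctrl & CTRL_COMMITTED))
		return info;

	info.kind = size ? SLOT_ACTIVE : SLOT_BURNED;
	return info;
}

/* Small self-check: a BIOS-burned slot is committed, locked and empty. */
static void classify_selfcheck(void)
{
	struct slot_info s = classify_decoder(CTRL_COMMITTED | CTRL_LOCK, 0);

	assert(s.kind == SLOT_BURNED && s.locked);
}
```

The point of the sketch is only that "committed with size 0" is a valid
state to enumerate, not an error to bail out on.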

Patch 1 enumerates the committed zero-size decoder into topology with its
HW-reported LOCK state, and resolves the two v2 review issues:

  - hdm_end (sashiko-bot, Alison): the zero-size decoder skips
    devm_cxl_dpa_reserve(), which left port->hdm_end un-incremented, so
    the next committed decoder failed the in-order check in
    __cxl_dpa_reserve() with -EBUSY. hdm_end is now advanced.

  - poison (Alison): commit_end can now reference a decoder with no DPA
    resource. poison_by_decoder() returned early for such a decoder,
    before the commit_end check, so cxl_get_poison_unmapped() never ran
    and unmapped-tail poison was lost (cxl-poison: 4 expected, 2 found).
    The fix is folded into patch 1.
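
The hdm_end issue can be modelled in a few lines of userspace C. This is
a simplified sketch, not the code from patch 1: struct port_model,
dpa_reserve() and enumerate_decoder() are hypothetical stand-ins, and the
advance_on_zero flag exists only so the v2 and v3 behaviour can be
compared side by side.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct port_model {
	int hdm_end;	/* id of the last decoder accounted for, -1 if none */
};

/* Models the in-order check: decoder ids must arrive consecutively. */
static int dpa_reserve(struct port_model *port, int id)
{
	if (port->hdm_end + 1 != id)
		return -EBUSY;
	port->hdm_end++;
	return 0;
}

/*
 * Enumerate one committed decoder. A zero-size decoder has no DPA to
 * reserve, so it skips dpa_reserve(). advance_on_zero selects whether
 * hdm_end is still advanced (v3 behaviour) or left alone (v2 behaviour).
 */
static int enumerate_decoder(struct port_model *port, int id, uint64_t size,
			     bool advance_on_zero)
{
	if (size == 0) {
		if (advance_on_zero)
			port->hdm_end++;
		return 0;
	}
	return dpa_reserve(port, id);
}

/* Zero-size decoder 0 followed by a sized decoder 1, with v3 behaviour. */
static void hdm_end_selfcheck(void)
{
	struct port_model port = { .hdm_end = -1 };

	assert(enumerate_decoder(&port, 0, 0, true) == 0);
	assert(enumerate_decoder(&port, 1, 0x10000000, true) == 0);
}
```

With advance_on_zero set to false, the same sequence fails on decoder 1
with -EBUSY, which is the v2 symptom described above.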

Patch 2 adds cxl_test coverage. v2 committed the mock zero-size slots
unconditionally on the host-bridge0 auto-region endpoints, which the
region tests reuse, regressing 7 of 17 cxl unit tests. v3 gates the
injection behind a new mock_zero_size_decoders module parameter (default
off): the shared topology is unchanged, and enabling it reproduces the
BIOS layout (auto-region at decoder[0], zero-size+locked above at
commit_end).

Tested with the ndctl cxl unit suite: param off, no regressions; param
on, cxl-poison's by-memdev-by-dpa case returns all 4 records (2 without
patch 1's poison fix).
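
To make the 4-versus-2 difference concrete, here is a rough userspace
model of the decoder walk. It is a sketch under stated assumptions, not
the region.c change itself: the structs, visit_decoder() and
unmapped_scan_runs() are hypothetical, and the "fixed" flag only toggles
whether a decoder without a DPA resource still reaches the commit_end
check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct decoder_model {
	int id;
	bool has_dpa;		/* false for a zero-size decoder */
	uint64_t dpa_end;	/* last DPA byte covered, valid if has_dpa */
};

struct poison_ctx {
	int commit_end;		/* id of the last committed decoder */
	uint64_t offset;	/* where the unmapped scan would start */
};

/* Returns 1 once commit_end is reached, 0 to keep iterating. */
static int visit_decoder(const struct decoder_model *d,
			 struct poison_ctx *ctx, bool fixed)
{
	if (!d->has_dpa) {
		/* Old ordering: bail out here, before the check below. */
		if (!fixed)
			return 0;
	} else {
		/* Mapped poison for this decoder would be read here. */
		ctx->offset = d->dpa_end + 1;
	}
	return d->id == ctx->commit_end;
}

/* True if the walk gets as far as scanning the unmapped DPA tail. */
static bool unmapped_scan_runs(const struct decoder_model *decs, size_t n,
			       struct poison_ctx *ctx, bool fixed)
{
	for (size_t i = 0; i < n; i++)
		if (visit_decoder(&decs[i], ctx, fixed))
			return true;
	return false;
}

/* Auto-region at decoder 0, zero-size decoder 1 sitting at commit_end. */
static void poison_selfcheck(void)
{
	const struct decoder_model decs[] = {
		{ .id = 0, .has_dpa = true, .dpa_end = 0x0fffffff },
		{ .id = 1, .has_dpa = false },
	};
	struct poison_ctx ctx = { .commit_end = 1 };

	assert(unmapped_scan_runs(decs, 2, &ctx, true));
	assert(ctx.offset == 0x10000000);
}
```

In this model the old ordering never signals the end of the walk, so the
unmapped tail is never scanned, while the fixed ordering starts the scan
right after the last mapped decoder.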

The results are as follows:
"""
$ sudo env "PATH=$PATH" meson test -C build --suite cxl \
> --num-processes 1 -t 6 \
> --print-errorlogs
ninja: Entering directory `/home/nvidia/ndctl/build'
[1/50] Generating version.h with a custom command
 1/14 ndctl:cxl / cxl-topology.sh               OK                3.38s
 2/14 ndctl:cxl / cxl-region-sysfs.sh           OK                2.60s
 3/14 ndctl:cxl / cxl-labels.sh                 OK                2.53s
 4/14 ndctl:cxl / cxl-create-region.sh          OK                3.25s
 5/14 ndctl:cxl / cxl-xor-region.sh             OK                2.61s
 6/14 ndctl:cxl / cxl-events.sh                 OK                2.46s
 7/14 ndctl:cxl / cxl-sanitize.sh               OK                5.42s
 8/14 ndctl:cxl / cxl-destroy-region.sh         OK                2.47s
 9/14 ndctl:cxl / cxl-qos-class.sh              OK                2.54s
10/14 ndctl:cxl / cxl-translate.sh              OK                0.78s
11/14 ndctl:cxl / cxl-elc.sh                    OK                2.45s
12/14 ndctl:cxl / cxl-security.sh               SKIP              0.02s   exit status 77
13/14 ndctl:cxl / cxl-features.sh               OK                1.27s
14/14 ndctl:cxl / cxl-poison.sh                 OK                7.91s

Ok:                 13  
Expected Fail:      0   
Fail:               0   
Unexpected Pass:    0   
Skipped:            1   
Timeout:            0   

Full log written to /home/nvidia/ndctl/build/meson-logs/testlog.txt
"""

The tests Alison mentioned also pass:
"""
$ sudo env "PATH=$PATH" meson test -C build --num-processes 1 -t 6 --print-errorlogs \
> cxl-region-sysfs.sh cxl-create-region.sh cxl-xor-region.sh \
> cxl-destroy-region.sh cxl-qos-class.sh cxl-poison.sh
ninja: Entering directory `/home/nvidia/ndctl/build'
[1/50] Generating version.h with a custom command
1/6 ndctl:cxl / cxl-region-sysfs.sh          OK                2.62s
2/6 ndctl:cxl / cxl-create-region.sh         OK                3.15s
3/6 ndctl:cxl / cxl-xor-region.sh            OK                2.61s
4/6 ndctl:cxl / cxl-destroy-region.sh        OK                2.39s
5/6 ndctl:cxl / cxl-qos-class.sh             OK                2.47s
6/6 ndctl:cxl / cxl-poison.sh                OK                7.80s

Ok:                 6   
Expected Fail:      0   
Fail:               0   
Unexpected Pass:    0   
Skipped:            0   
Timeout:            0   

Full log written to /home/nvidia/ndctl/build/meson-logs/testlog.txt
"""

[1]
https://lore.kernel.org/all/20251015024019.1189713-1-vaslot@nvidia.com/T/#u
[2] https://lore.kernel.org/all/cover.1779957270.git.icheng@nvidia.com/

Richard Cheng (2):
  cxl/hdm: Allow zero sized HDM decoders
  tools/testing/cxl: Enable zero sized decoder under hb0

 drivers/cxl/core/hdm.c       | 25 ++++++++---
 drivers/cxl/core/region.c    | 42 +++++++++---------
 tools/testing/cxl/test/cxl.c | 83 +++++++++++++++++++++++++++++++++---
 3 files changed, 117 insertions(+), 33 deletions(-)


base-commit: ddd664bbff63e09e7a7f9acae9c43605d4cf185f
-- 
2.43.0