[PATCH v6 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement

Robert Richter posted 11 patches 2 months, 3 weeks ago
There is a newer version of this series
[PATCH v6 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
Posted by Robert Richter 2 months, 3 weeks ago
This patch set adds support for address translation using ACPI PRM and
enables this for AMD Zen5 platforms. v4 is the current approach in
response to earlier attempts to implement CXL address translation:

 * v1: [1] and the comments on it, esp. Dan's [2],
 * v2: [3] and comments on [4], esp. Dave's [5],
 * v3: [6] and comments on it, esp. Dave's [7],
 * v4: [8].

This version addresses review comments. No major changes compared to
the previous submission. See the changelog for details. Thank you all
for your reviews and testing.

Documentation of CXL Address Translation Support will be added to the
Kernel's "Compute Express Link: Linux Conventions". This patch
submission will be the base for a documentation patch that describes CXL
Address Translation support accordingly.

The CXL driver currently does not implement address translation and
assumes that host physical addresses (HPA) and system physical
addresses (SPA) are equal.

Systems with different HPA and SPA addresses need address translation.
In that case the hardware addresses, esp. those used in the HDM
decoder configurations, differ from the system's or parent port's
address ranges. E.g. AMD Zen5 systems may be configured to use
'Normalized addresses'. Then, CXL endpoints have their own physical
address base which is not the same as the SPA used by the CXL host
bridge. Thus, addresses need to be translated from the endpoint's to
its CXL host bridge's address range.

To enable address translation, the endpoint's HPA range must be
translated to the CXL host bridge's address range. A callback is
introduced to translate a decoder's HPA to the CXL host bridge's
address range. The callback is then used to determine the region
parameters, which include the SPA-translated address range of the
endpoint decoder and the interleaving configuration. These are stored
in struct cxl_region, which allows an endpoint decoder to determine
those parameters based on its assigned region.
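
As a rough illustration of the callback idea described above, here is a
minimal userspace sketch. All names (toy_range, toy_region_ctx,
toy_root_ops, toy_translate, TOY_SPA_OFFSET) are hypothetical stand-ins
and are not the definitions used in the series. The fixed offset stands
in for whatever the platform would really report, e.g. via a firmware
call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an inclusive address range. */
struct toy_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical region parameters filled in during setup. */
struct toy_region_ctx {
	struct toy_range hpa_range;	/* in: endpoint view, out: host bridge view */
	int interleave_ways;
	int interleave_granularity;
};

/* Hypothetical root ops holding an optional translation callback. */
struct toy_root_ops {
	int (*translate_hpa_range)(struct toy_region_ctx *ctx);
};

/* Assumption for illustration only: translation is a fixed offset. */
#define TOY_SPA_OFFSET 0x100000000ULL

/* Toy callback: shift the range and report fixed interleave parameters. */
int toy_translate(struct toy_region_ctx *ctx)
{
	ctx->hpa_range.start += TOY_SPA_OFFSET;
	ctx->hpa_range.end += TOY_SPA_OFFSET;
	ctx->interleave_ways = 1;
	ctx->interleave_granularity = 256;
	return 0;
}

/* Region parameter setup: a missing callback is a no-op (HPA == SPA). */
int toy_setup_region_params(const struct toy_root_ops *ops,
			    struct toy_region_ctx *ctx)
{
	if (!ops || !ops->translate_hpa_range)
		return 0;
	return ops->translate_hpa_range(ctx);
}
```

With toy_translate installed, a context whose range is 0x0-0xfffffff
comes back as 0x100000000-0x10fffffff. Without a callback the range is
left untouched.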

Note that only auto-discovery of decoders is supported. Thus, decoders
are locked and cannot be configured manually.

Finally, Zen5 address translation is enabled using ACPI PRMT.

This series is based on cxl/next.

V6:
 * rebased onto v6.18-rc5 and CXL updates for v6.19
 * note: applies on top of: [PATCH v3 0/3] CXL updates for v6.19

V5:
 * fixed build error with !CXL_REGION (kbot),
 * updated sob-chains,
 * added note to get_cxl_root_decoder() to drop reference after use
   (Dave),
 * moved initialization of base* variables in
   cxl_prm_translate_hpa_range() (Dave, Jonathan),
 * fixed initialization of cxlr->hpa_range for the non-auto case
   (Alison),
 * added description of the @hpa_range arg to
   cxl_calc_interleave_pos() (kbot),
 * removed optional patches 12-14 to send them separately (Alison,
   Dave),
 * reordered patches 1-6 to reduce dependencies between them and give
   way for early pick up candidates,
 * rebased onto cxl/next (c692f5a947ad),
 * added commas in comment in cxl_add_to_region() (Jonathan),
 * removed cxlmd from struct cxl_region_context (Dave, Jonathan),
 * removed use of PTR_ERR_OR_ZERO() (Jonathan),
 * increased wrap width to 80 chars for comments in cxl_atl.c (Jonathan),
 * moved (ways > 1) check out of while loop in cxl_prm_translate_hpa_range()
   (Jonathan),
 * removed trailing comma in struct prm_cxl_dpa_spa_data initializer (Jonathan),
 * updated patch description on locking the decoders (Dave, Jonathan),
 * spell fix in patch description (Jonathan),

V4:
 * rebased onto v6.18-rc2 (cxl/next),
 * updated sob-chain,
 * reworked and simplified code to use an address translation callback
   bound to the root port,
 * moved all address translation code to core/atl.c,
 * cxlr->cxlrd change, updated patch description (Alison),
 * use DEFINE_RANGE() (Jonathan),
 * change name to @hpa_range (Dave, Jonathan),
 * updated patch description if there is a no-op (Gregory),
 * use Designated initializers for struct cxl_region_context (Dave),
 * move callback handler to struct cxl_root_ops (Dave),
 * move handler initialization to acpi_probe() (Dave),
 * updated comment where Normalized Addressing is checked (Dave),
 * limit PRM enablement only to AMD supported kernel configs (AMD_NB)
   (Jonathan),
 * added 3 related optional cleanup patches at the end of the series,

V3:
 * rebased onto cxl/next,
 * complete rework to reduce number of required changes/patches and to
   remove platform specific code (Dan and Dave),
 * changed implementation allowing to add address translation to the
   CXL specification (documentation patch in preparation),
 * simplified and generalized determination of interleaving
   parameters using the address translation callback,
 * depend only on the existence of the ACPI PRM GUID for CXL Address
   Translation enablement, removed platform checks,
 * small changes to region code only which does not require a full
   rework and refactoring of the code, just separating region
   parameter setup and region construction,
 * moved code to new core/atl.c file,
 * fixed subsys_initcall order dependency of EFI runtime services
   (Gregory and Joshua),

V2:
 * rebased onto cxl/next,
 * split of v1 in two parts:
   * removed cleanups and updates from this series to post them as a
     separate series (Dave),
   * this part 2 applies on top of part 1, v3,
 * added tags to SOB chain,
 * reworked architecture, vendor and platform setup (Jonathan):
   * added patch "cxl/x86: Prepare for architectural platform setup",
   * added function arch_cxl_port_platform_setup() plus a __weak
     version for archs other than x86,
   * moved code to core/x86,
 * added comment to cxl_to_hpa_fn (Ben),
 * updated year in copyright statement (Ben),
 * cxl_port_calc_hpa(): Removed HPA check for zero (Jonathan), return
   1 if modified,
 * cxl_port_calc_pos(): Updated description and wording (Ben),
 * added several patches around interleaving and SPA calculation in
   cxl_endpoint_decoder_initialize(),
 * reworked iterator in cxl_endpoint_decoder_initialize() (Gregory),
 * fixed region interleaving parameters() (Alison),
 * fixed check in cxl_region_attach() (Alison),
 * Clarified in coverletter that not all ports in a system must
   implement the to_hpa() callback (Terry).

[1] https://lore.kernel.org/linux-cxl/20240701174754.967954-1-rrichter@amd.com/
[2] https://lore.kernel.org/linux-cxl/669086821f136_5fffa29473@dwillia2-xfh.jf.intel.com.notmuch/
[3] https://patchwork.kernel.org/project/cxl/cover/20250218132356.1809075-1-rrichter@amd.com/
[4] https://patchwork.kernel.org/project/cxl/cover/20250715191143.1023512-1-rrichter@amd.com/
[5] https://lore.kernel.org/all/78284b12-3e0b-4758-af18-397f32136c3f@intel.com/
[6] https://patchwork.kernel.org/project/cxl/cover/20250912144514.526441-1-rrichter@amd.com/
[7] https://lore.kernel.org/all/20250912144514.526441-8-rrichter@amd.com/T/#m23c2adb9d1e20770ccd5d11475288bda382b0af5
[8] https://patchwork.kernel.org/project/cxl/cover/20251103184804.509762-1-rrichter@amd.com/

Robert Richter (11):
  cxl/region: Rename misleading variable name @hpa to @hpa_range
  cxl/region: Store root decoder in struct cxl_region
  cxl/region: Store HPA range in struct cxl_region
  cxl: Simplify cxl_root_ops allocation and handling
  cxl/region: Separate region parameter setup and region construction
  cxl/region: Add @hpa_range argument to function
    cxl_calc_interleave_pos()
  cxl/region: Use region data to get the root decoder
  cxl: Introduce callback for HPA address ranges translation
  cxl/acpi: Prepare use of EFI runtime services
  cxl: Enable AMD Zen5 address translation using ACPI PRMT
  cxl/atl: Lock decoders that need address translation

 drivers/cxl/Kconfig       |   5 +
 drivers/cxl/acpi.c        |  17 ++--
 drivers/cxl/core/Makefile |   1 +
 drivers/cxl/core/atl.c    | 207 ++++++++++++++++++++++++++++++++++++++
 drivers/cxl/core/cdat.c   |   8 +-
 drivers/cxl/core/core.h   |   8 ++
 drivers/cxl/core/port.c   |   8 +-
 drivers/cxl/core/region.c | 156 +++++++++++++++++-----------
 drivers/cxl/cxl.h         |  31 ++++--
 9 files changed, 356 insertions(+), 85 deletions(-)
 create mode 100644 drivers/cxl/core/atl.c

-- 
2.47.3
Re: [PATCH v6 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
Posted by Alison Schofield 2 months, 3 weeks ago
Does this work 'as is', no changes required, to support DPA->SPA
(used in CXL Events) or SPA->DPA (used in poison by region offset)?
Re: [PATCH v6 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
Posted by Robert Richter 2 months, 3 weeks ago
On 14.11.25 12:01:29, Alison Schofield wrote:

> Does this work 'as is', no changes required, to support DPA->SPA
> (used in CXL Events) or SPA->DPA (used in poison by region offset)?

The PRM handler could be used for to-SPA translations, but it might
not fit well to other users such as profiling, tracing and error
handling. Those users are executing in a critical path from a
performance or stability point of view. Performing a firmware call
could cause problems here. Since the to-DPA translation is missing
too, a different approach to solve address translation might work
better, such as examining the region parameters. The kernel's address
translation library could possibly be extended and used too. That
needs to be figured out. Also, my main focus for the patches is region
enablement.

Thanks,

-Robert
Re: [PATCH v6 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
Posted by Alison Schofield 2 months ago
On Mon, Nov 17, 2025 at 03:58:40PM +0100, Robert Richter wrote:
> On 14.11.25 12:01:29, Alison Schofield wrote:
> 
> > Does this work 'as is', no changes required, to support DPA->SPA
> > (used in CXL Events) or SPA->DPA (used in poison by region offset)?
> 
> The PRM handler could be used for to-SPA translations, but it might
> not fit well to other users such as profiling, tracing and error
> handling. Those users are executing in a critical path from a
> performance or stability point of view. Performing a firmware call
> could cause problems here. Since the to-DPA translation is missing
> too, a different approach to solve address translation might work
> better, such as examining the region parameters. The kernel's address
> translation library could possibly be extended and used too. That
> needs to be figured out. Also, my main focus for the patches is region
> enablement.

I see a dpa-to-spa prm routine. I don't know enough about the cost of
using it to say it's not worth using for CXL trace events that want
to report a SPA (from a DPA).

If we cannot trust what the address translation code will emit in
this case, forcing it to ULLONG_MAX would be safest.

-- Alison


> 
> Thanks,
> 
> -Robert
Re: [PATCH v6 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
Posted by Robert Richter 2 months ago
On 03.12.25 20:22:17, Alison Schofield wrote:
> On Mon, Nov 17, 2025 at 03:58:40PM +0100, Robert Richter wrote:
> > On 14.11.25 12:01:29, Alison Schofield wrote:
> > 
> > > Does this work 'as is', no changes required, to support DPA->SPA
> > > (used in CXL Events) or SPA->DPA (used in poison by region offset)?
> > 
> > The PRM handler could be used for to-SPA translations, but it might
> > not fit well to other users such as profiling, tracing and error
> > handling. Those users are executing in a critical path from a
> > performance or stability point of view. Performing a firmware call
> > could cause problems here. Since the to-DPA translation is missing
> > too, a different approach to solve address translation might work
> > better, such as examining the region parameters. The kernel's address
> > translation library could possibly be extended and used too. That
> > needs to be figured out. Also, my main focus for the patches is region
> > enablement.
> 
> I see a dpa-to-spa prm routine. I don't know enough about the cost of
> using it to say it's not worth using for CXL trace events that want
> to report a SPA (from a DPA).
> 
> If we cannot trust what the address translation code will emit in
> this case, forcing it to ULLONG_MAX would be safest.

I will implement those handlers accordingly. Thanks Alison.

-Robert
Re: [PATCH v6 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
Posted by Alison Schofield 2 months, 2 weeks ago
On Mon, Nov 17, 2025 at 03:58:40PM +0100, Robert Richter wrote:
> On 14.11.25 12:01:29, Alison Schofield wrote:
> 
> > Does this work 'as is', no changes required, to support DPA->SPA
> > (used in CXL Events) or SPA->DPA (used in poison by region offset)?
> 
> The PRM handler could be used for to-SPA translations, but it might
> not fit well to other users such as profiling, tracing and error
> handling. Those users are executing in a critical path from a
> performance or stability point of view. Performing a firmware call
> could cause problems here. Since the to-DPA translation is missing
> too, a different approach to solve address translation might work
> better, such as examining the region parameters. The kernel's address
> translation library could possibly be extended and used too. That
> needs to be figured out. Also, my main focus for the patches is region
> enablement.

If address translations are not supported/supportable, a quick exit
on any attempt (DPA->SPA or SPA->DPA) with this config seems needed. 

Better to fail and report ULLONG_MAX than leave open the possibility
of adding the wrong address to trace events or using the wrong address
in poison by region offset action.

Maybe you already know that it fails gracefully? If so, then it comes
down to documenting the limitation.
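
A minimal sketch of the "fail and report ULLONG_MAX" idea, assuming a
hypothetical per-region flag that marks configurations where the
translation result cannot be trusted. The names toy_region,
needs_translation and toy_dpa_to_spa are made up for illustration and
are not the driver's API.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long long u64;

/* Hypothetical region description, not the driver's struct cxl_region. */
struct toy_region {
	u64 spa_base;			/* start of the region in SPA space */
	bool needs_translation;		/* assumption: set when HPA != SPA */
};

/*
 * Quick exit: if the result cannot be trusted, return ULLONG_MAX rather
 * than a possibly wrong address. Otherwise use a simple base + offset
 * calculation (no interleaving, for brevity).
 */
u64 toy_dpa_to_spa(const struct toy_region *r, u64 dpa)
{
	if (!r || r->needs_translation)
		return ULLONG_MAX;
	return r->spa_base + dpa;
}
```

A caller such as a trace event would then log ULLONG_MAX as "address
unknown" instead of a misleading SPA.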

-- Alison


> 
> Thanks,
> 
> -Robert
Re: [PATCH v6 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
Posted by Gregory Price 2 months, 2 weeks ago
On Sun, Nov 23, 2025 at 05:14:05PM -0800, Alison Schofield wrote:
> On Mon, Nov 17, 2025 at 03:58:40PM +0100, Robert Richter wrote:
> > On 14.11.25 12:01:29, Alison Schofield wrote:
> > 
> > > Does this work 'as is', no changes required, to support DPA->SPA
> > > (used in CXL Events) or SPA->DPA (used in poison by region offset)?
> > 
> > The PRM handler could be used for to-SPA translations, but it might
> > not fit well to other users such as profiling, tracing and error
> > handling. Those users are executing in a critical path from a
> > performance or stability point of view. Performing a firmware call
> > could cause problems here. Since the to-DPA translation is missing
> > too, a different approach to solve address translation might work
> > better, such as examining the region parameters. The kernel's address
> > translation library could possibly be extended and used too. That
> > needs to be figured out. Also, my main focus for the patches is region
> > enablement.
> 
> If address translations are not supported/supportable, a quick exit
> on any attempt (DPA->SPA or SPA->DPA) with this config seems needed. 
>
> Better to fail and report ULLONG_MAX than leave open the possibility
> of adding the wrong address to trace events or using the wrong address
> in poison by region offset action.
> 
> Maybe you already know that it fails gracefully? If so, then it comes
> down to documenting the limitation.
>


IIRC the to_spa() function wouldn't be populated (will be NULL) if this
is the case, so you wouldn't even be able to call the translation
function.

This was hit in a prior version of the set where I saw it fail on a
system using System Address mode instead of Normalized Address mode.

~Gregory
Re: [PATCH v6 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
Posted by Alison Schofield 2 months, 2 weeks ago
On Mon, Nov 24, 2025 at 03:10:07PM -0500, Gregory Price wrote:
> On Sun, Nov 23, 2025 at 05:14:05PM -0800, Alison Schofield wrote:
> > On Mon, Nov 17, 2025 at 03:58:40PM +0100, Robert Richter wrote:
> > > On 14.11.25 12:01:29, Alison Schofield wrote:
> > > 
> > > > Does this work 'as is', no changes required, to support DPA->SPA
> > > > (used in CXL Events) or SPA->DPA (used in poison by region offset)?
> > > 
> > > The PRM handler could be used for to-SPA translations, but it might
> > > not fit well to other users such as profiling, tracing and error
> > > handling. Those users are executing in a critical path from a
> > > performance or stability point of view. Performing a firmware call
> > > could cause problems here. Since the to-DPA translation is missing
> > > too, a different approach to solve address translation might work
> > > better, such as examining the region parameters. The kernel's address
> > > translation library could possibly be extended and used too. That
> > > needs to be figured out. Also, my main focus for the patches is region
> > > enablement.
> > 
> > If address translations are not supported/supportable, a quick exit
> > on any attempt (DPA->SPA or SPA->DPA) with this config seems needed. 
> >
> > Better to fail and report ULLONG_MAX than leave open the possibility
> > of adding the wrong address to trace events or using the wrong address
> > in poison by region offset action.
> > 
> > Maybe you already know that it fails gracefully? If so, then it comes
> > down to documenting the limitation.
> >
> 
> 
> IIRC the to_spa() function wouldn't be populated (will be NULL) if this
> is the case, so you wouldn't even be able to call the translation
> function.

The hpa_to_spa fn defined as a root decoder ops is an additional layer for
arch's needing HPA to SPA translation. It's optional. If there is no
hpa_to_spa fn, then it is assumed that the CXL HPA==SPA and that is the
'final answer'  added to the trace log.
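
Roughly, the optional-callback behavior described above can be pictured
like this. It is a userspace sketch with hypothetical names
(toy_root_decoder_ops, toy_resolve_spa, toy_offset_hpa_to_spa), not the
kernel code, and the 0x1000 offset is an arbitrary example value.

```c
#include <stddef.h>

typedef unsigned long long u64;

/* Hypothetical ops table with an optional HPA to SPA hook. */
struct toy_root_decoder_ops {
	u64 (*hpa_to_spa)(u64 hpa);
};

/* Example hook: an arbitrary fixed offset, for illustration only. */
u64 toy_offset_hpa_to_spa(u64 hpa)
{
	return hpa + 0x1000ULL;
}

/*
 * If a hook is present, use it. If not, the HPA is taken to be the SPA
 * and is reported unchanged.
 */
u64 toy_resolve_spa(const struct toy_root_decoder_ops *ops, u64 hpa)
{
	if (ops && ops->hpa_to_spa)
		return ops->hpa_to_spa(hpa);
	return hpa;
}
```

Note that in this sketch a NULL hook does not fail. It silently yields
the HPA, which is the case being questioned in this thread.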

Sounds like you are on one of these systems, so maybe you could take a
look at what happens. If your devices support it, try to inject and/or
clear poison and see the resulting kernel trace log. There is an
example for that here:
https://github.com/pmem/ndctl/blob/main/test/cxl-poison.sh

-- Alison

> 
> This was hit in a prior version of the set where I saw it fail on a
> system using System Address mode instead of Normalized Address mode.
> 
> ~Gregory
Re: [PATCH v6 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
Posted by Gregory Price 2 months, 2 weeks ago
On Mon, Nov 24, 2025 at 07:26:51PM -0800, Alison Schofield wrote:
> On Mon, Nov 24, 2025 at 03:10:07PM -0500, Gregory Price wrote:
> > 
> > IIRC the to_spa() function wouldn't be populated (will be NULL) if this
> > is the case, so you wouldn't even be able to call the translation
> > function.
> 
> The hpa_to_spa fn defined as a root decoder ops is an additional layer for
> arch's needing HPA to SPA translation. It's optional. If there is no
> hpa_to_spa fn, then it is assumed that the CXL HPA==SPA and that is the
> 'final answer'  added to the trace log.
> 
> Sounds like you are on one of these systems, so maybe you could take a
> look at what happens. If your devices support it, try to inject and/or
> clear poison and see the resulting kernel trace log. There is an
> example for that here:
> https://github.com/pmem/ndctl/blob/main/test/cxl-poison.sh
> 

Servers I am able to test on are @ 6.16 with cxl .17+.18 backports

I don't see: /sys/kernel/debug/cxl/$dev/${action}_poison
in my sysfs

Have enabled various debug and einj options.

When were these added? Am I missing build options?

~Gregory
Re: [PATCH v6 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
Posted by Alison Schofield 2 months, 1 week ago
On Tue, Nov 25, 2025 at 08:54:58AM -0500, Gregory Price wrote:
> On Mon, Nov 24, 2025 at 07:26:51PM -0800, Alison Schofield wrote:
> > On Mon, Nov 24, 2025 at 03:10:07PM -0500, Gregory Price wrote:
> > > 
> > > IIRC the to_spa() function wouldn't be populated (will be NULL) if this
> > > is the case, so you wouldn't even be able to call the translation
> > > function.
> > 
> > The hpa_to_spa fn defined as a root decoder ops is an additional layer for
> > arch's needing HPA to SPA translation. It's optional. If there is no
> > hpa_to_spa fn, then it is assumed that the CXL HPA==SPA and that is the
> > 'final answer'  added to the trace log.
> > 
> > Sounds like you are on one of these systems, so maybe you could take a
> > look at what happens. If your devices support it, try to inject and/or
> > clear poison and see the resulting kernel trace log. There is an
> > example for that here:
> > https://github.com/pmem/ndctl/blob/main/test/cxl-poison.sh
> > 
> 
> Servers I am able to test on are @ 6.16 with cxl .17+.18 backports
> 
> I don't see: /sys/kernel/debug/cxl/$dev/${action}_poison
> in my sysfs
> 
> Have enabled various debug and einj options.
> 
> When were these added? Am I missing build options?

Inject and clear poison are since 6.4.
Need CONFIG_DEBUG_FS, but of course you wouldn't have even seen the
/sys/kernel/debug path if that were missing.

Could be your devices don't support inject or clear. At init time we set
the bits indicating what poison opcodes the device supports, see
cxl_set_poison_cmd_enabled().
50d527f52cbf ("cxl/mem: Add debugfs attributes for poison inject and clear")

Devices may support list but not inject and clear. Look for this attribute:
/sys/bus/cxl/devices/memX/trigger_poison_list. If that is present, then a
quicker, maybe fruitful check, may be 'cxl list -M --media-errors'. If you're
lucky ;) your devices come pre-loaded with poison. That cmd will emit the
poisoned DPAs and if part of a region, the SPAs too. cxl-list is getting that
all from the kernel trace log, so if you don't have 'cxl list', just trigger
directly and examine the trace log.

-- Alison


> 
> ~Gregory
Re: [PATCH v6 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
Posted by Gregory Price 2 months, 1 week ago
On Tue, Nov 25, 2025 at 08:37:33AM -0800, Alison Schofield wrote:
> On Tue, Nov 25, 2025 at 08:54:58AM -0500, Gregory Price wrote:
> 
> Could be your devices don't support inject or clear. At init time we set
> the bits indicating what poison opcodes the device supports, see
> cxl_set_poison_cmd_enabled().
> 50d527f52cbf ("cxl/mem: Add debugfs attributes for poison inject and clear")
> 
> Devices may support list but not inject and clear. Look for this attribute:
> /sys/bus/cxl/devices/memX/trigger_poison_list. If that is present, then a
> quicker, maybe fruitful check, may be 'cxl list -M --media-errors'. If you're
> lucky ;) your devices come pre-loaded with poison. That cmd will emit the
> poisoned DPAs and if part of a region, the SPAs too. cxl-list is getting that
> all from the kernel trace log, so if you don't have 'cxl list', just trigger
> directly and examine the trace log.
> 

Unfortunately it appears my devices do not support any of this :[

[ /sys/kernel/debug/cxl]$ ls mem0/
dpamem

Is all I have

~Gregory