[PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Robert Richter 4 weeks, 1 day ago
Add AMD Zen5 support for address translation.

Zen5 systems may be configured to use 'Normalized Addressing'. In this
mode, host physical addresses (HPA) differ from their system physical
addresses (SPA). The endpoint has its own physical address space and
an incoming HPA is already converted to the device's physical address
(DPA). Interleaving is therefore disabled at the endpoint and CXL
endpoints are programmed passthrough (DPA == HPA).

Host physical addresses need to be translated from the endpoint to its
CXL host bridge, especially to identify the endpoint's root decoder
and the region's address range. The ACPI Platform Runtime Mechanism
(PRM) provides a handler to translate a DPA to its SPA. This is
documented in:

 AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
 ACPI v6.5 Porting Guide, Publication # 58088
 https://www.amd.com/en/search/documentation/hub.html

With Normalized Addressing, this PRM handler must be used to translate
an endpoint's HPA to its SPA.

Do the following to implement AMD Zen5 address translation:

Introduce a new file core/atl.c to handle ACPI PRM specific address
translation code. The naming is loosely related to the kernel's AMD
Address Translation Library (CONFIG_AMD_ATL) but the implementation does
not depend on it, nor is it vendor specific. Use Kbuild and Kconfig
options respectively to enable the code depending on architecture and
platform options.

AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
call (see ACPI v6.5 Porting Guide, Address Translation - CXL DPA to
System Physical Address). Firmware enables the PRM handler if the
platform has address translation implemented. Check firmware and
kernel support of ACPI PRM using the specific GUID. On success, enable
address translation by setting up the earlier introduced root port
callback, see function cxl_prm_setup_root(). Setup is done in
cxl_setup_prm_address_translation(); it is the only function that
needs to be exported. For low-level PRM firmware calls, use the ACPI
framework.

Identify the region's interleaving ways by comparing the length of the
endpoint's HPA range with the length of the translated SPA range. Also
determine the interleaving granularity using address translation. Note
that the position of an endpoint's chunk may vary from one
interleaving block to the next and thus cannot be considered constant.
Address offsets larger than the interleaving block size therefore
cannot be used to calculate the granularity. Instead, probe the
granularity by translating various HPAs within the same interleaving
block.
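
For illustration only (hypothetical numbers, where translate() stands
for the PRM DPA-to-SPA call): assume a 2-way interleaved region with a
512 byte granularity, an endpoint HPA range starting at 'base' and its
translated start 'spa_base'. Probing within the first interleaving
block then behaves like this:

  translate(base + 0x100) == spa_base + 0x100  -> granularity > 256
  translate(base + 0x200) != spa_base + 0x200  -> granularity is 512
    (the endpoint's next chunk lives elsewhere, e.g. at spa_base + 0x600)

The probe doubles the offset until the translated address is no longer
contiguous with the translated range start; the first non-contiguous
offset is the interleave granularity.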

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/Kconfig       |   5 +
 drivers/cxl/acpi.c        |   2 +
 drivers/cxl/core/Makefile |   1 +
 drivers/cxl/core/atl.c    | 190 ++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h         |   7 ++
 5 files changed, 205 insertions(+)
 create mode 100644 drivers/cxl/core/atl.c

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 48b7314afdb8..103950a9b73e 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -233,4 +233,9 @@ config CXL_MCE
 	def_bool y
 	depends on X86_MCE && MEMORY_FAILURE
 
+config CXL_ATL
+	def_bool y
+	depends on CXL_REGION
+	depends on ACPI_PRMT && AMD_NB
+
 endif
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index a31d0f97f916..50c2987e0459 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -925,6 +925,8 @@ static int cxl_acpi_probe(struct platform_device *pdev)
 	cxl_root->ops.qos_class = cxl_acpi_qos_class;
 	root_port = &cxl_root->port;
 
+	cxl_setup_prm_address_translation(cxl_root);
+
 	rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
 			      add_host_bridge_dport);
 	if (rc < 0)
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 5ad8fef210b5..11fe272a6e29 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -20,3 +20,4 @@ cxl_core-$(CONFIG_CXL_REGION) += region.o
 cxl_core-$(CONFIG_CXL_MCE) += mce.o
 cxl_core-$(CONFIG_CXL_FEATURES) += features.o
 cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
+cxl_core-$(CONFIG_CXL_ATL) += atl.o
diff --git a/drivers/cxl/core/atl.c b/drivers/cxl/core/atl.c
new file mode 100644
index 000000000000..c36984686fb0
--- /dev/null
+++ b/drivers/cxl/core/atl.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#include <linux/prmt.h>
+#include <linux/pci.h>
+#include <linux/acpi.h>
+
+#include <cxlmem.h>
+#include "core.h"
+
+/*
+ * PRM Address Translation - CXL DPA to System Physical Address
+ *
+ * Reference:
+ *
+ * AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
+ * ACPI v6.5 Porting Guide, Publication # 58088
+ */
+
+static const guid_t prm_cxl_dpa_spa_guid =
+	GUID_INIT(0xee41b397, 0x25d4, 0x452c, 0xad, 0x54, 0x48, 0xc6, 0xe3,
+		  0x48, 0x0b, 0x94);
+
+struct prm_cxl_dpa_spa_data {
+	u64 dpa;
+	u8 reserved;
+	u8 devfn;
+	u8 bus;
+	u8 segment;
+	u64 *spa;
+} __packed;
+
+static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa)
+{
+	struct prm_cxl_dpa_spa_data data;
+	u64 spa;
+	int rc;
+
+	data = (struct prm_cxl_dpa_spa_data) {
+		.dpa     = dpa,
+		.devfn   = pci_dev->devfn,
+		.bus     = pci_dev->bus->number,
+		.segment = pci_domain_nr(pci_dev->bus),
+		.spa     = &spa,
+	};
+
+	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
+	if (rc) {
+		pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc);
+		return ULLONG_MAX;
+	}
+
+	pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa);
+
+	return spa;
+}
+
+static int cxl_prm_setup_root(struct cxl_root *cxl_root, void *data)
+{
+	struct cxl_region_context *ctx = data;
+	struct cxl_endpoint_decoder *cxled = ctx->cxled;
+	struct cxl_decoder *cxld = &cxled->cxld;
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct range hpa_range = ctx->hpa_range;
+	struct pci_dev *pci_dev;
+	u64 spa_len, len;
+	u64 addr, base_spa, base;
+	int ways, gran;
+
+	/*
+	 * When Normalized Addressing is enabled, the endpoint maintains a 1:1
+	 * mapping between HPA and DPA. If disabled, skip address translation
+	 * and perform only a range check.
+	 */
+	if (hpa_range.start != cxled->dpa_res->start)
+		return 0;
+
+	/*
+	 * Endpoints are programmed passthrough in Normalized Addressing mode.
+	 */
+	if (ctx->interleave_ways != 1) {
+		dev_dbg(&cxld->dev, "unexpected interleaving config: ways: %d granularity: %d\n",
+			ctx->interleave_ways, ctx->interleave_granularity);
+		return -ENXIO;
+	}
+
+	if (!cxlmd || !dev_is_pci(cxlmd->dev.parent)) {
+		dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n",
+			dev_name(cxld->dev.parent), hpa_range.start,
+			hpa_range.end);
+		return -ENXIO;
+	}
+
+	pci_dev = to_pci_dev(cxlmd->dev.parent);
+
+	/* Translate HPA range to SPA. */
+	base = hpa_range.start;
+	hpa_range.start = prm_cxl_dpa_spa(pci_dev, hpa_range.start);
+	hpa_range.end = prm_cxl_dpa_spa(pci_dev, hpa_range.end);
+	base_spa = hpa_range.start;
+
+	if (hpa_range.start == ULLONG_MAX || hpa_range.end == ULLONG_MAX) {
+		dev_dbg(cxld->dev.parent,
+			"CXL address translation: Failed to translate HPA range: %#llx-%#llx:%#llx-%#llx(%s)\n",
+			hpa_range.start, hpa_range.end, ctx->hpa_range.start,
+			ctx->hpa_range.end, dev_name(&cxld->dev));
+		return -ENXIO;
+	}
+
+	/*
+	 * Since translated addresses include the interleaving offsets, align
+	 * the range to 256 MB.
+	 */
+	hpa_range.start = ALIGN_DOWN(hpa_range.start, SZ_256M);
+	hpa_range.end = ALIGN(hpa_range.end, SZ_256M) - 1;
+
+	len = range_len(&ctx->hpa_range);
+	spa_len = range_len(&hpa_range);
+	if (!len || !spa_len || spa_len % len) {
+		dev_dbg(cxld->dev.parent,
+			"CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
+			hpa_range.start, hpa_range.end, ctx->hpa_range.start,
+			ctx->hpa_range.end, dev_name(&cxld->dev));
+		return -ENXIO;
+	}
+
+	ways = spa_len / len;
+	gran = SZ_256;
+
+	/*
+	 * Determine interleave granularity
+	 *
+	 * Note: The position of the chunk from one interleaving block to the
+	 * next may vary and thus cannot be considered constant. Address offsets
+	 * larger than the interleaving block size cannot be used to calculate
+	 * the granularity.
+	 */
+	if (ways > 1) {
+		while (gran <= SZ_16M) {
+			addr = prm_cxl_dpa_spa(pci_dev, base + gran);
+			if (addr != base_spa + gran)
+				break;
+			gran <<= 1;
+		}
+	}
+
+	if (gran > SZ_16M) {
+		dev_dbg(cxld->dev.parent,
+			"CXL address translation: Cannot determine granularity: %#llx-%#llx:%#llx-%#llx(%s)\n",
+			hpa_range.start, hpa_range.end, ctx->hpa_range.start,
+			ctx->hpa_range.end, dev_name(&cxld->dev));
+		return -ENXIO;
+	}
+
+	ctx->hpa_range = hpa_range;
+	ctx->interleave_ways = ways;
+	ctx->interleave_granularity = gran;
+
+	dev_dbg(&cxld->dev,
+		"address mapping found for %s (hpa -> spa): %#llx+%#llx -> %#llx+%#llx ways:%d granularity:%d\n",
+		dev_name(cxlmd->dev.parent), base, len, hpa_range.start,
+		spa_len, ways, gran);
+
+	return 0;
+}
+
+void cxl_setup_prm_address_translation(struct cxl_root *cxl_root)
+{
+	struct device *host = cxl_root->port.uport_dev;
+	u64 spa;
+	struct prm_cxl_dpa_spa_data data = { .spa = &spa };
+	int rc;
+
+	/*
+	 * Applies only to PCIe Host Bridges which are children of the CXL Root
+	 * Device (HID=“ACPI0017”). Check this and drop cxl_test instances.
+	 */
+	if (!acpi_match_device(host->driver->acpi_match_table, host))
+		return;
+
+	/* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */
+	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
+	if (rc == -EOPNOTSUPP || rc == -ENODEV)
+		return;
+
+	cxl_root->ops.translation_setup_root = cxl_prm_setup_root;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_setup_prm_address_translation, "CXL");
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 8ea334d81edf..20b0fd43fa7b 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -817,6 +817,13 @@ static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
 						struct device *host) { }
 #endif
 
+#ifdef CONFIG_CXL_ATL
+void cxl_setup_prm_address_translation(struct cxl_root *cxl_root);
+#else
+static inline
+void cxl_setup_prm_address_translation(struct cxl_root *cxl_root) {}
+#endif
+
 struct cxl_decoder *to_cxl_decoder(struct device *dev);
 struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
 struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
-- 
2.47.3

Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Ard Biesheuvel 3 weeks, 4 days ago
(cc Peter)

On Sat, 10 Jan 2026 at 12:46, Robert Richter <rrichter@amd.com> wrote:
>
> Add AMD Zen5 support for address translation.
>
...
> Do the following to implement AMD Zen5 address translation:
>
> Introduce a new file core/atl.c to handle ACPI PRM specific address
> translation code. The naming is loosely related to the kernel's AMD
> Address Translation Library (CONFIG_AMD_ATL) but the implementation does
> not depend on it, nor is it vendor specific. Use Kbuild and Kconfig
> options respectively to enable the code depending on architecture and
> platform options.
>
> AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
> call (see ACPI v6.5 Porting Guide, Address Translation - CXL DPA to
> System Physical Address). Firmware enables the PRM handler if the
> platform has address translation implemented. Check firmware and
> kernel support of ACPI PRM using the specific GUID. On success, enable
> address translation by setting up the earlier introduced root port
> callback, see function cxl_prm_setup_root(). Setup is done in
> cxl_setup_prm_address_translation(); it is the only function that
> needs to be exported. For low-level PRM firmware calls, use the ACPI
> framework.
>

Does the PRM service in question tolerate being invoked unprivileged?
The PRM spec requires this, and this is something we may need to
enforce at some point.

cc'ing Peter with whom I've discussed this just recently.

Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Robert Richter 3 weeks, 4 days ago
On Wed, Jan 14, 2026 at 08:47:22AM +0100, Ard Biesheuvel wrote:
> (cc Peter)
> 
> On Sat, 10 Jan 2026 at 12:46, Robert Richter <rrichter@amd.com> wrote:
> >
> > Add AMD Zen5 support for address translation.
> >
> ...
> > Do the following to implement AMD Zen5 address translation:
> >
> > Introduce a new file core/atl.c to handle ACPI PRM specific address
> > translation code. The naming is loosely related to the kernel's AMD
> > Address Translation Library (CONFIG_AMD_ATL) but the implementation does
> > not depend on it, nor is it vendor specific. Use Kbuild and Kconfig
> > options respectively to enable the code depending on architecture and
> > platform options.
> >
> > AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
> > call (see ACPI v6.5 Porting Guide, Address Translation - CXL DPA to
> > System Physical Address). Firmware enables the PRM handler if the
> > platform has address translation implemented. Check firmware and
> > kernel support of ACPI PRM using the specific GUID. On success, enable
> > address translation by setting up the earlier introduced root port
> > callback, see function cxl_prm_setup_root(). Setup is done in
> > cxl_setup_prm_address_translation(); it is the only function that
> > needs to be exported. For low-level PRM firmware calls, use the ACPI
> > framework.
> >
> 
> Does the PRM service in question tolerate being invoked unprivileged?
> The PRM spec requires this, and this is something we may need to
> enforce at some point.
> 
> cc'ing Peter with whom I've discussed this just recently.

Interesting approach, need to check if that works. I haven't tried that
yet. Though, that needs some rework of the kernel code as some high
priority code depends on the translation and that would cause kind of
priority inversion. E.g. an interrupt handler cannot wait until a
dpa-to-spa conversion is done.

For CXL it is only used for region setup in the init path and process
context. For tracing and error handling those translations are
disabled. See patch 13/13.

Thanks,

-Robert
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Ard Biesheuvel 3 weeks, 4 days ago
On Wed, 14 Jan 2026 at 15:00, Robert Richter <rrichter@amd.com> wrote:
>
> On Wed, Jan 14, 2026 at 08:47:22AM +0100, Ard Biesheuvel wrote:
> > (cc Peter)
> >
> > On Sat, 10 Jan 2026 at 12:46, Robert Richter <rrichter@amd.com> wrote:
> > >
> > > Add AMD Zen5 support for address translation.
> > >
> > ...
> > > Do the following to implement AMD Zen5 address translation:
> > >
> > > Introduce a new file core/atl.c to handle ACPI PRM specific address
> > > translation code. The naming is loosely related to the kernel's AMD
> > > Address Translation Library (CONFIG_AMD_ATL) but the implementation does
> > > not depend on it, nor is it vendor specific. Use Kbuild and Kconfig
> > > options respectively to enable the code depending on architecture and
> > > platform options.
> > >
> > > AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
> > > call (see ACPI v6.5 Porting Guide, Address Translation - CXL DPA to
> > > System Physical Address). Firmware enables the PRM handler if the
> > > platform has address translation implemented. Check firmware and
> > > kernel support of ACPI PRM using the specific GUID. On success, enable
> > > address translation by setting up the earlier introduced root port
> > > callback, see function cxl_prm_setup_root(). Setup is done in
> > > cxl_setup_prm_address_translation(); it is the only function that
> > > needs to be exported. For low-level PRM firmware calls, use the ACPI
> > > framework.
> > >
> >
> > Does the PRM service in question tolerate being invoked unprivileged?
> > The PRM spec requires this, and this is something we may need to
> > enforce at some point.
> >
> > cc'ing Peter with whom I've discussed this just recently.
>
> Interesting approach, need to check if that works. I haven't tried that
> yet. Though, that needs some rework of the kernel code as some high
> priority code depends on the translation and that would cause a kind of
> priority inversion. E.g. an interrupt handler cannot wait until a
> dpa-to-spa conversion is done.
>

This is not about running it in user space, but about running the code
in an unprivileged sandbox. So scheduling wouldn't really come into
play here.

> For CXL it is only used for region setup in the init path and process
> context. For tracing and error handling those translations are
> disabled. See patch 13/13.
>
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Jonathan Cameron 3 weeks, 4 days ago
On Wed, 14 Jan 2026 16:21:03 +0100
Ard Biesheuvel <ardb@kernel.org> wrote:

> On Wed, 14 Jan 2026 at 15:00, Robert Richter <rrichter@amd.com> wrote:
> >
> > On Wed, Jan 14, 2026 at 08:47:22AM +0100, Ard Biesheuvel wrote:  
> > > (cc Peter)
> > >
> > > On Sat, 10 Jan 2026 at 12:46, Robert Richter <rrichter@amd.com> wrote:  
> > > >
> > > > Add AMD Zen5 support for address translation.
> > > >  
> > > ...  
> > > > Do the following to implement AMD Zen5 address translation:
> > > >
> > > > Introduce a new file core/atl.c to handle ACPI PRM specific address
> > > > translation code. The naming is loosely related to the kernel's AMD
> > > > Address Translation Library (CONFIG_AMD_ATL) but the implementation does
> > > > not depend on it, nor is it vendor specific. Use Kbuild and Kconfig
> > > > options respectively to enable the code depending on architecture and
> > > > platform options.
> > > >
> > > > AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
> > > > call (see ACPI v6.5 Porting Guide, Address Translation - CXL DPA to
> > > > System Physical Address). Firmware enables the PRM handler if the
> > > > platform has address translation implemented. Check firmware and
> > > > kernel support of ACPI PRM using the specific GUID. On success, enable
> > > > address translation by setting up the earlier introduced root port
> > > > callback, see function cxl_prm_setup_root(). Setup is done in
> > > > cxl_setup_prm_address_translation(); it is the only function that
> > > > needs to be exported. For low-level PRM firmware calls, use the ACPI
> > > > framework.
> > > >  
> > >
> > > Does the PRM service in question tolerate being invoked unprivileged?
> > > The PRM spec requires this, and this is something we may need to
> > > enforce at some point.
> > >
> > > cc'ing Peter with whom I've discussed this just recently.  
> >
> > Interesting approach, need to check if that works. I haven't tried that
> > yet. Though, that needs some rework of the kernel code as some high
> > priority code depends on the translation and that would cause a kind of
> > priority inversion. E.g. an interrupt handler cannot wait until a
> > dpa-to-spa conversion is done.
> >  
> 
> This is not about running it in user space, but about running the code
> in an unprivileged sandbox. So scheduling wouldn't really come into
> play here.

Hi Ard,

I haven't looked into the background yet, so a naive question:

Do we have a potential issue wrt to merging this as it stands and improving
on it later?  i.e. Is this a blocking issue for this patch set?

Thanks,

Jonathan

> 
> > For CXL it is only used for region setup in the init path and process
> > context. For tracing and error handling those translations are
> > disabled. See patch 13/13.
> >  
>
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Peter Zijlstra 3 weeks, 3 days ago
On Wed, Jan 14, 2026 at 06:08:59PM +0000, Jonathan Cameron wrote:

> Do we have a potential issue wrt to merging this as it stands and improving
> on it later?  i.e. Is this a blocking issue for this patch set?

Well, why do you *have* to use PRMT at all? And this is a serious
question; PRMT is basically injecting unaudited magic code into the
kernel, and that is a security risk.

Worse, in order to run this shit, we have to lower or disable various
security measures.

If I had my way, we would WARN and TAINT the kernel whenever such
garbage got used.
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Ard Biesheuvel 3 weeks, 3 days ago
On Thu, 15 Jan 2026 at 09:04, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Jan 14, 2026 at 06:08:59PM +0000, Jonathan Cameron wrote:
>
> > Do we have a potential issue wrt to merging this as it stands and improving
> > on it later?  i.e. Is this a blocking issue for this patch set?
>
> Well, why do you *have* to use PRMT at all? And this is a serious
> question; PRMT is basically injecting unaudited magic code into the
> kernel, and that is a security risk.
>
> Worse, in order to run this shit, we have to lower or disable various
> security measures.
>

Only if we decide to keep running it privileged, which the PRM spec no
longer requires (as you have confirmed yourself when we last discussed
this, right?)

> If I had my way, we would WARN and TAINT the kernel whenever such
> garbage got used.

These are things that used to live in SMM, requiring all CPUs to
disappear into SMM mode in a way that was completely opaque to the OS.

PRM runs under the control of the OS, does not require privileges and
only needs MMIO access to the regions it describes in its manifest
(which the OS can inspect, if desired). So if there are security
concerns with PRM today, it is because we were lazy and did not
implement PRM securely from the beginning.

In my defense, I wasn't aware of the unprivileged requirement until
you spotted it recently: it was something I had asked for when the PRM
spec was put up for "review" by the Intel and MS authors, and they
told me they couldn't possibly make any changes at that point, because
it had already gone into production. But as it turns out, the change
was made after all.

I am a total noob when it comes to how x86 does its ring0/ring3
switching, but with some help, I should be able to prototype something
to call into the PRM service unprivileged, running under the efi_mm.

Would that allay your concerns?
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Peter Zijlstra 3 weeks, 2 days ago
On Thu, Jan 15, 2026 at 09:30:10AM +0100, Ard Biesheuvel wrote:
> On Thu, 15 Jan 2026 at 09:04, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Jan 14, 2026 at 06:08:59PM +0000, Jonathan Cameron wrote:
> >
> > > Do we have a potential issue wrt to merging this as it stands and improving
> > > on it later?  i.e. Is this a blocking issue for this patch set?
> >
> > Well, why do you *have* to use PRMT at all? And this is a serious
> > question; PRMT is basically injecting unaudited magic code into the
> > kernel, and that is a security risk.
> >
> > Worse, in order to run this shit, we have to lower or disable various
> > security measures.
> >
> 
> Only if we decide to keep running it privileged, which the PRM spec no
> longer requires (as you have confirmed yourself when we last discussed
> this, right?)

Indeed. But those very constraints also make me wonder why we would ever
bother with PRM at all, and not simply require a native driver. Then you
actually *know* what the thing does and can debug/fix it without having
to rely on BIOS updates and whatnot.

Worse, you might have to deal with various incompatible buggy PRM
versions because BIOS :/

> > If I had my way, we would WARN and TAINT the kernel whenever such
> > garbage got used.
> 
> These are things that used to live in SMM, requiring all CPUs to
> disappear into SMM mode in a way that was completely opaque to the OS.
> 
> PRM runs under the control of the OS, does not require privileges and
> only needs MMIO access to the regions it describes in its manifest
> (which the OS can inspect, if desired). So if there are security
> concerns with PRM today, it is because we were lazy and did not
> implement PRM securely from the beginning.
> 
> In my defense, I wasn't aware of the unprivileged requirement until
> you spotted it recently: it was something I had asked for when the PRM
> spec was put up for "review" by the Intel and MS authors, and they
> told me they couldn't possibly make any changes at that point, because
> it had already gone into production. But as it turns out, the change
> was made after all.
> 
> I am a total noob when it comes to how x86 does its ring0/ring3
> switching, but with some help, I should be able to prototype something
> to call into the PRM service unprivileged, running under the efi_mm.

The ring transition itself is done using IRET; create an IRET frame with
userspace CS and the right IP (and flags etc.) and off you go. The
problem is getting back in the kernel I suppose. All the 'normal' kernel
entry points assume the kernel stack is empty and all that.

The whole usermodehelper stuff creates a whole extra thread, sets
everything up and drops into userspace. Perhaps that is the easiest
solution. Basically you set the thread's mm to efi_mm, populate
task_pt_regs() with the right bits and simply drop into 'userspace'.

Then it can complete by terminating itself (sys_exit()) and the calling
context reaps the thing and continues.

> Would that allay your concerns?

Yeah, running it as userspace would be fine; we don't trust that.

But again; a native driver is ever so much better than relying on PRM.

In this case it is AMD doing a driver for their own chips, they know how
they work, they should be able to write this natively.
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Robert Richter 2 weeks, 6 days ago
(+Rafael and some AMD folks)

Hi Peter,

On Fri, Jan 16, 2026 at 03:38:38PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 15, 2026 at 09:30:10AM +0100, Ard Biesheuvel wrote:
> > On Thu, 15 Jan 2026 at 09:04, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Jan 14, 2026 at 06:08:59PM +0000, Jonathan Cameron wrote:
> > >
> > > > Do we have a potential issue wrt to merging this as it stands and improving
> > > > on it later?  i.e. Is this a blocking issue for this patch set?
> > >
> > > Well, why do you *have* to use PRMT at all? And this is a serious
> > > question; PRMT is basically injecting unaudited magic code into the
> > > kernel, and that is a security risk.
> > >
> > > Worse, in order to run this shit, we have to lower or disable various
> > > security measures.
> > >
> > 
> > Only if we decide to keep running it privileged, which the PRM spec no
> > longer requires (as you have confirmed yourself when we last discussed
> > this, right?)
> 
> Indeed. But those very constraints also make me wonder why we would ever
> bother with PRM at all, and not simply require a native driver. Then you
> actually *know* what the thing does and can debug/fix it without having
> to rely on BIOS updates and whatnot.

an address translation driver needs the configuration data from the
Data Fabric, which is only known to firmware but not to the kernel.
Other ways would be necessary to expose and calculate that data, if it
is even feasible to make this information available.

So using PRM looks reasonable to me as this abstracts the logic and
data behind a method, same as doing a library call. Of course, you
don't want to trust that, but that could be addressed running it
unprivileged.

> Worse, you might have to deal with various incompatible buggy PRM
> versions because BIOS :/

The address translation functions are straightforward. I haven't
experienced any issues here. If there would be any, this will be
solvable, e.g. by requiring a specific minimum version or uuid to run
PRM.

> 
> > > If I had my way, we would WARN and TAINT the kernel whenever such
> > > garbage got used.
> > 
> > These are things that used to live in SMM, requiring all CPUs to
> > disappear into SMM mode in a way that was completely opaque to the OS.
> > 
> > PRM runs under the control of the OS, does not require privileges and
> > only needs MMIO access to the regions it describes in its manifest
> > (which the OS can inspect, if desired). So if there are security
> > concerns with PRM today, it is because we were lazy and did not
> > implement PRM securely from the beginning.
> > 
> > In my defense, I wasn't aware of the unprivileged requirement until
> > you spotted it recently: it was something I had asked for when the PRM
> > spec was put up for "review" by the Intel and MS authors, and they
> > told me they couldn't possibly make any changes at that point, because
> > it had already gone into production. But as it turns out, the change
> > was made after all.
> > 
> > I am a total noob when it comes to how x86 does its ring0/ring3
> > switching, but with some help, I should be able to prototype something
> > to call into the PRM service unprivileged, running under the efi_mm.
> 
> The ring transition itself is done using IRET; create an IRET frame with
> userspace CS and the right IP (and flags etc.) and off you go. The
> problem is getting back in the kernel I suppose. All the 'normal' kernel
> entry points assume the kernel stack is empty and all that.
> 
> The whole usermodehelper stuff creates a whole extra thread, sets
> everything up and drops into userspace. Perhaps that is the easiest
> solution. Basically you set the thread's mm to efi_mm, populate
> task_pt_regs() with the right bits and simply drop into 'userspace'.
> 
> Then it can complete by terminating itself (sys_exit()) and the calling
> context reaps the thing and continues.

I can help with testing and also work on securing the PRM calls.
Thanks Ard for also looking into this.

> 
> > Would that allay your concerns?
> 
> Yeah, running it as userspace would be fine; we don't trust that.
> 
> But again; a native driver is ever so much better than relying on PRM.
> 
> In this case it is AMD doing a driver for their own chips, they know how
> they work, they should be able to write this natively.

Since a native driver introduces additional issues, as explained
above, I would prefer to use PRM for address translation and instead
ensure the PRM call is secure.

Dan, Dave, regarding this series, the cxl driver just uses existing
PRM kernel code and does not implement anything new here. Is there
anything that would prevent this series from being accepted? We are
already at v10 and review is complete:

https://patchwork.kernel.org/project/cxl/list/?series=1042412

I will follow up with working on unprivileged PRM calls. I think that
will be the best solution here.

Thanks,

-Robert
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by dan.j.williams@intel.com 2 weeks, 5 days ago
Robert Richter wrote:
[..]
> > Indeed. But those very constraints also make me wonder why we would ever
> > bother with PRM at all, and not simply require a native driver. Then you
> > actually *know* what the thing does and can debug/fix it without having
> > to rely on BIOS updates and whatnot.
> 
> an address translation driver needs the configuration data from the
> Data Fabric, which is only known to firmware but not to the kernel.
> Other ways would be necessary to expose and calculate that data, if it
> is even feasible to make this information available.

If it is just data, is it amenable to being put into a table?

Look at the complexity of the XOR addressing mode already defined in the
CEDT.CFMWS table; is the complexity significantly different from that?
 
> So using PRM looks reasonable to me as this abstracts the logic and
> data behind a method, same as doing a library call. Of course, you
> don't want to trust that, but that could be addressed running it
> unprivileged.

PRM should always be a last resort relative to an open specification
with a native driver implementation.

At a minimum, Peter's feedback reignited my simmering concerns with PRM
as a system-software design tool, and this should be a test case for
what Linux is willing and not willing to accept moving forward.

> > Worse, you might have to deal with various incompatible buggy PRM
> > versions because BIOS :/
> 
> The address translation functions are straightforward. I haven't
> experienced any issues here. If there would be any, this will be
> solvable, e.g. by requiring a specific minimum version or uuid to run
> PRM.

Can you publish the source to the PRM handler?

[..]
> > The whole usermodehelper stuff creates a whole extra thread, sets
> > everything up and drops into userspace. Perhaps that is the easiest
> > solution. Basically you set the thread's mm to efi_mm, populate
> > task_pt_regs() with the right bits and simply drop into 'userspace'.
> > 
> > Then it can complete by terminating itself (sys_exit()) and the calling
> > context reaps the thing and continues.
> 
> I can help with testing and also work on securing the PRM calls.
> Thanks Ard for also looking into this.
> 
> > 
> > > Would that allay your concerns?
> > 
> > Yeah, running it as userspace would be fine; we don't trust that.
> > 
> > But again; a native driver is ever so much better than relying on PRM.
> > 
> > In this case it is AMD doing a driver for their own chips, they know how
> > they work, they should be able to write this natively.
> 
> Since a native driver introduces additional issues, as explained
> above, I would prefer to use PRM for address translation and instead
> ensure the PRM call is secure.

How is this case outside of the typical issues that the kernel and its ABI
are meant to abstract?

> Dan, Dave, regarding this series, the cxl driver just uses existing
> PRM kernel code and does not implement anything new here. Is there
> anything that would prevent this series from being accepted? We are
> already at v10 and review is complete:
> 
> https://patchwork.kernel.org/project/cxl/list/?series=1042412
> 
> I will follow up with working on unprivileged PRM calls. I think, that
> will be the best solution here.

The PRM to ring3 work is important for the PRM handlers that are
converting existing SMM flows to use PRM. For new DSMs the answer to the
"why not a native driver?" question needs to be clear.

That said, I am also interested in the PRM to ring3 work and did some
investigation there especially when the threat of runtime updates to PRM
handlers was being proposed. I think it is an important capability that
might also get some reuse with the confidential computing case for some
interactions with platform security services, but that is separate from
the primary question of enabling wider deployment of PRM solutions.
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Yazen Ghannam 2 weeks, 6 days ago
On Mon, Jan 19, 2026 at 03:33:33PM +0100, Robert Richter wrote:
> (+Rafael and some AMD folks)
> 
> Hi Peter,
> 
> On Fri, Jan 16, 2026 at 03:38:38PM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 15, 2026 at 09:30:10AM +0100, Ard Biesheuvel wrote:
> > > On Thu, 15 Jan 2026 at 09:04, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Wed, Jan 14, 2026 at 06:08:59PM +0000, Jonathan Cameron wrote:
> > > >
> > > > > Do we have a potential issue wrt to merging this as it stands and improving
> > > > > on it later?  i.e. Is this a blocking issue for this patch set?
> > > >
> > > > Well, why do you *have* to use PRMT at all? And this is a serious
> > > > question; PRMT is basically injecting unaudited magic code into the
> > > > kernel, and that is a security risk.
> > > >
> > > > Worse, in order to run this shit, we have to lower or disable various
> > > > security measures.
> > > >
> > > 
> > > Only if we decide to keep running it privileged, which the PRM spec no
> > > longer requires (as you have confirmed yourself when we last discussed
> > > this, right?)
> > 
> > Indeed. But those very constraints also make me wonder why we would ever
> > bother with PRM at all, and not simply require a native driver. Then you
> > actually *know* what the thing does and can debug/fix it without having
> > to rely on BIOS updates and whatnot.
> 
> an address translation driver needs the configuration data from the
> Data Fabric, which is only known to firmware but not to the kernel.
> Other ways would be necessary to expose and calculate that data, if it
> is even feasible to make this information available.
> 
> So using PRM looks reasonable to me as this abstracts the logic and
> data behind a method, same as doing a library call. Of course, you
> don't want to trust that, but that could be addressed running it
> unprivileged.
> 

Additionally, the same translation code can be used in multiple places
(tools, FW, kernel, etc.). Most consumers treat the code like a library
that they include. It's coded once and bugs can be fixed in one place.

However, with a native kernel driver, we have to re-write everything to
match coding style, licensing, etc.

Also, new hardware may need changes to the code (sometimes major). So
there's upstream work, backporting (more testing), and so on.

See the AMD Address Translation Library at drivers/ras/amd/atl/.

> > Worse, you might have to deal with various incompatible buggy PRM
> > versions because BIOS :/
> 
> The address translation functions are straightforward. I haven't
> experienced any issues here. If there would be any, this will be
> solvable, e.g. by requiring a specific minimum version or uuid to run
> PRM.
> 

This is a good point, and I've brought this up with some of my
colleagues.

The PRM methods are supposed to be able to be updated at runtime by the
OS. We could think of this as a similar flow to microcode.

Thanks,
Yazen
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by dan.j.williams@intel.com 2 weeks, 5 days ago
Yazen Ghannam wrote:
[..]
> Additionally, the same translation code can be used in multiple places
> (tools, FW, kernel, etc.). Most consumers treat the code like a library
> that they include. It's coded once and bugs can be fixed in one place.
> 
> However, with a native kernel driver, we have to re-write everything to
> match coding style, licensing, etc.
> 
> Also, new hardware may need changes to the code (sometimes major). So
> there's upstream work, backporting (more testing), and so on.
> 
> See the AMD Address Translation Library at drivers/ras/amd/atl/.

There is more nuance here.

There are indeed cases where there are high degrees of non-architectural
details in flux from one product to the next. For example, the details
that EDAC no longer needs to chase because the ADXL DSM exists are a
solution to the problem of shifting and complicated memory topology
details.

CXL is a standard into which the architecture at issue decided to
inject software-model-destroying artifacts like CXL-endpoint-HPA to
CXL-Host-Bridge-SPA (Normalized Addressing) translation.

A Normalized Address looks like a static offset per host bridge, not a
method call round trip to a runtime firmware service.

Note that there are other platforms that break basic HPA-to-SPA
assumptions, but those have been handled with native driver support via
XOR interleave, and non-CXL-Host-Bridge target updates to the
ACPI.CEDT.CFMWS table.

> > > Worse, you might have to deal with various incompatible buggy PRM
> > > versions because BIOS :/
> > 
> > The address translation functions are straightforward. I haven't
> > experienced any issues here. If there would be any, this will be
> > solvable, e.g. by requiring a specific minimum version or uuid to run
> > PRM.
> > 
> 
> This is a good point, and I've brought this up with some of my
> colleagues.

The more that software bugs leak into this interface requiring
consideration of versions and the like, the louder the requests for
"please move this to a driver" will become.

> The PRM methods are supposed to be able to be updated at runtime by the
> OS. We could think of this as a similar flow to microcode.

No, at the point where runtime updates are needed outside of a BIOS
update we have crossed the threshold into Linux actively taking on new
maintenance burden to enable hardware platforms to avoid the discipline
of architectural solutions.

Microcode is a confined solution space. PRM is unbounded.

Now, stepping back, this specific Zen5 support has been a long time
coming. Specifically, there are shipping platforms where Linux is unable
to use any of its CXL RAS support because it gets tripped up on this
fundamental step. I would like to see exact details on what this PRM
handler is doing so that we, the linux-cxl community, can make a
determination about:

    "yes this algorithm is so tiny and static, PRM not indicated"

    "no, this is complicated and guaranteed to keep shifting product to
     product, Linux is better off with a PRM helper"

...but still merge this PRM call, regardless of the determination. Put
the next potential use of PRM on notice that native drivers are required
outside of meeting the "complicated + shifting" criteria that indicate
PRM.
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Yazen Ghannam 2 weeks, 4 days ago
On Tue, Jan 20, 2026 at 04:35:57PM -0800, dan.j.williams@intel.com wrote:
> Yazen Ghannam wrote:
> [..]
> > Additionally, the same translation code can be used in multiple places
> > (tools, FW, kernel, etc.). Most consumers treat the code like a library
> > that they include. It's coded once and bugs can be fixed in one place.
> > 
> > However, with a native kernel driver, we have to re-write everything to
> > match coding style, licensing, etc.
> > 
> > Also, new hardware may need changes to the code (sometimes major). So
> > there's upstream work, backporting (more testing), and so on.
> > 
> > See the AMD Address Translation Library at drivers/ras/amd/atl/.
> 
> There is more nuance here.
> 
> There are indeed cases where there are high degrees of non-architectural
> details in flux from one product to the next. For example, the details
> that EDAC no longer needs to chase because the ADXL DSM exists are a
> solution to the problem of shifting and complicated memory topology
> details.
> 

Right, this is the intended use case. 

> CXL is a standard that this architecture at issue decided to inject
> software-model-destroying artificats like CXL-endpoint-HPA to
> CXL-Host-Bridge-SPA (Normalized Addressing) translation.
> 
> A Normalized Address looks like a static offset per host bridge, not a
> method call round trip to a runtime firmware service.
> 
> Note that there are other platforms that break basic HPA-to-SPA
> assumptions, but those have been handled with native driver support via
> XOR interleave, and non-CXL-Host-Bridge target updates to the
> ACPI.CEDT.CFMWS table.
> 

I see. So the concern is including model-specific methods that would
modify the CXL standard flow, correct?

Or, more specifically, is it reliance on external/system-specific
information?

Or the time spent on a round trip call to another service?

> > > > Worse, you might have to deal with various incompatible buggy PRM
> > > > versions because BIOS :/
> > > 
> > > The address translation functions are straightforward. I haven't
> > > experienced any issues here. If there would be any, this will be
> > > solvable, e.g. by requiring a specific minimum version or uuid to run
> > > PRM.
> > > 
> > 
> > This is a good point, and I've brought this up with some of my
> > colleagues.
> 
> The more that software bugs leak into this interface requiring
> consideration of versions and the like, the louder the requests for
> "please move this to a driver" will become.
> 

Yes, ack.

> > The PRM methods are supposed to be able to be updated at runtime by the
> > OS. We could think of this as a similar flow to microcode.
> 
> No, at the point where runtime updates are needed outside of a BIOS
> update we have crossed the threshold into Linux actively taking on new
> maintenance burden to enable hardware platforms to avoid the discipline
> of architectural solutions.
> 
> Microcode is a confined solution space. PRM is unbounded.
> 
> Now, stepping back, this specific Zen5 support has been a long time
> coming. Specifically, there are shipping platforms where Linux is unable
> to use any of its CXL RAS support because it gets tripped up on this
> fundamental step. I would like to see exact details on what this PRM
> handler is doing so that we, linux-cxl community, can make a
> determination about:
> 
>     "yes this algorithm is so tiny and static, PRM not indicated"
> 
>     "no, this is complicated and guaranteed to keep shifting product to
>      product, Linux is better off with a PRM helper"
> 
> ...but still merge this PRM call, regardless of the determination. Put
> the next potential use of PRM on notice that native drivers are required
> outside of meeting the "complicated + shifting" criteria that indicate
> PRM.

I can give a general overview. The AMD CXL address translation flows are
an extension of the AMD Data Fabric address translation flows.
Specifically for Zen5, it would be "DF v4.5" with adjustments for CXL.

The "DF 4.5" translation is upstream in the AMD Address Translation
Library. See code examples with "git grep -i df4p5".

I would consider this "complicated + shifting". This is true for general
memory errors reported through MCA/EDAC.

I defer to my CXL colleagues on whether the "shifting" criteria apply to
future CXL systems.

Thanks,
Yazen
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by dan.j.williams@intel.com 2 weeks, 4 days ago
Yazen Ghannam wrote:
> On Tue, Jan 20, 2026 at 04:35:57PM -0800, dan.j.williams@intel.com wrote:
> > Yazen Ghannam wrote:
> > [..]
> > > Additionally, the same translation code can be used in multiple places
> > > (tools, FW, kernel, etc.). Most consumers treat the code like a library
> > > that they include. It's coded once and bugs can be fixed in one place.
> > > 
> > > However, with a native kernel driver, we have to re-write everything to
> > > match coding style, licensing, etc.
> > > 
> > > Also, new hardware may need changes to the code (sometimes major). So
> > > there's upstream work, backporting (more testing), and so on.
> > > 
> > > See the AMD Address Translation Library at drivers/ras/amd/atl/.
> > 
> > There is more nuance here.
> > 
> > There are indeed cases where there are high degrees of non-architectural
> > details in flux from one product to the next. For example, the details
> > that EDAC no longer needs to chase because the ADXL DSM exists are a
> > solution to the problem of shifting and complicated memory topology
> > details.
> > 
> 
> Right, this is the intended use case. 
> 
> > CXL is a standard into which this architecture at issue decided to
> > inject software-model-destroying artifacts like CXL-endpoint-HPA to
> > CXL-Host-Bridge-SPA (Normalized Addressing) translation.
> > 
> > A Normalized Address looks like a static offset per host bridge, not a
> > method call round trip to a runtime firmware service.
> > 
> > Note that there are other platforms that break basic HPA-to-SPA
> > assumptions, but those have been handled with native driver support via
> > XOR interleave, and non-CXL-Host-Bridge target updates to the
> > ACPI.CEDT.CFMWS table.
> > 
> 
> I see. So the concern is including model-specific methods that would
> modify the CXL standard flow, correct?

Yes, but more than that, Linux benefits from one vendor's model-specific
feature being upleveled into a standard concept.

With ACPI there is a Code First process to get clarifications and small
features into the specification for situations like this. For CXL we can
only approximate that with documenting "conventions" for shipping
platforms [1]. The request for CXL is to document the driver-breaking
platform features in a way that at least gives Linux a way to say "oh,
hey $HW_VENDOR, you seem to be taking the same liberties with the
specification as $OTHER_HW_VENDOR. Please implement it the same way
while working a change to the CXL specification on the backend."

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7ac6612d6b79

As I told Robert, I want a generic "Normalized Address" facility of
which Zen5 is the first user.

> Or, more specifically, is it reliance on external/system-specific
> information?

Reliance on system information is not a problem. ACPI is great at
distilling platform degrees of freedom into static tables and shared
concepts.

> Or the time spent on a round trip call to another service?

No, overhead is not the concern. The problem is the opaqueness,
complexity, and security implications of sprinkling runtime service
calls for what amounts to "do some limited address math". Static tables
can carry a large problem space without all the pitfalls of runtime
service calls. Examples are the "CXL XOR Interleave Math Structure" and
the "Interleave Set spans non-CXL domains" feature of the ACPI.CEDT.
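
As a rough sketch of the kind of math such a static table carries (this
is only an illustration of the XOR-interleave idea, not the in-tree
CXIMS code), each XORMAP bitmap selects the HPA bits whose parity yields
one bit of the interleave target index:

    /* kernel types assumed; hweight64() from <linux/bitops.h> */
    static int cxims_target_bit(u64 hpa, u64 xormap)
    {
            return hweight64(hpa & xormap) & 1;
    }

All of that is resolvable from table data alone, no firmware round trip.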

> > > The PRM methods are supposed to be able to be updated at runtime by the
> > > OS. We could think of this as a similar flow to microcode.
> > 
> > No, at the point where runtime updates are needed outside of a BIOS
> > update we have crossed the threshold into Linux actively taking on new
> > maintenance burden to enable hardware platforms to avoid the discipline
> > of architectural solutions.
> > 
> > Microcode is a confined solution space. PRM is unbounded.
> > 
> > Now, stepping back, this specific Zen5 support has been a long time
> > coming. Specifically, there are shipping platforms where Linux is unable
> > to use any of its CXL RAS support because it gets tripped up on this
> > fundamental step. I would like to see exact details on what this PRM
> > handler is doing so that we, linux-cxl community, can make a
> > determination about:
> > 
> >     "yes this algorithm is so tiny and static, PRM not indicated"
> > 
> >     "no, this is complicated and guaranteed to keep shifting product to
> >      product, Linux is better off with a PRM helper"
> > 
> > ...but still merge this PRM call, regardless of the determination. Put
> > the next potential use of PRM on notice that native drivers are required
> > outside of meeting the "complicated + shifting" criteria that indicate
> > PRM.
> 
> I can give a general overview. The AMD CXL address translation flows are
> an extension of the AMD Data Fabric address translation flows.
> Specifically for Zen5, it would be "DF v4.5" with adjustments for CXL.
> 
> The "DF 4.5" translation is upstream in the AMD Address Translation
> Library. See code examples with "git grep -i df4p5".

Right, that looks like all the same complexity that the Intel ADXL DSM
deals with, but ADXL only needs to handle the "complicated + shifting"
nature of product-to-product DRAM architecture changes. CXL address
translation is left to the OS driver because CXL is standardized (cannot
shift).

> I would consider this "complicated + shifting". This is true for general
> memory errors reported through MCA/EDAC.
> 
> I defer to my CXL colleagues on whether the "shifting" criterion
> applies to future CXL systems.

My hypothesis is that it was convenient for $HW_VENDOR to glom this
small subset of "CXL Normalized Address" into existing firmware method
infrastructure. It did so at the expense of exporting the complexity of
yet one more PRM method call to Linux.

A static table is unplanned work for $HW_VENDOR, a comparable amount of
work for Linux, and a lower amount of PRM-exposure risk for Linux to
mitigate.

My goal here is to have an archived message to point to the next time
someone wants to reach for the "PRM" tool, so they understand that Linux
has a high bar for new invocations.
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Gregory Price 2 weeks, 4 days ago
On Wed, Jan 21, 2026 at 02:09:27PM -0800, dan.j.williams@intel.com wrote:
> > 
> > I see. So the concern is including model-specific methods that would
> > modify the CXL standard flow, correct?
> 
...
> 
> As I told Robert, I want a generic "Normalized Address" facility of
> which Zen5 is the first user.
> 

Isn't that what this patch functionally is w/ a specific PRM function?

   rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);

Or is the request now: replace this with static table data?


point of ignorance: what facility would you use to expose such tables?

-----

When I initially hacked up driver support for this mode, before getting
PRM support, the "hacked up translation code" I used was this:

  /* Find 0-based offset into whole interleave region */
  dev = (pdev->bus->number == 0xe1) ? 0 : 1;
  offset = (0x100 * (((norm_addr >> 8) * 2) + dev)) + (norm_addr & 0xff);

  /* Find the SPA base for the address */
  for (idx = 0; idx < cfmws_nr; idx++) {
      size = cxl_get_cfmws_size(idx);
      /* We may have a gap in the CFMWS */
      if (offset < size) {
          *sys_addr = cxl_get_cfmws_base(idx) + offset;
          return 0;
      }
      offset -= size;
  }

------

This makes hard assumptions about two things:

  device interleave index  - pcidev(0xe1) => 0
  cfmws base               - all CFMWS are used for this one region

cxl_get_cfmws_base() was a call into ACPI code, and the ACPI code just
kept a global cache of the raw CEDT CFMWS structures (base + size).
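
As a concrete example of that offset math (assuming the hard-coded 2-way,
256-byte interleave above): a normalized address of 0x1234 on interleave
index 1 gives offset = 0x100 * ((0x12 * 2) + 1) + 0x34 = 0x2534, which
then gets rebased onto whichever CFMWS window that offset falls into.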

So, assuming you had such tables, it would need to be like:

                  Normalized Decoders Table
    --------------------------------------------------------
    | CXL PCIDev | Decoder  | CFMW SPAN  |  Interleave IDX |
    --------------------------------------------------------
    |     d1     |    0     |    1,2     |        0        |
    |     e1     |    0     |    1,2     |        1        |
    --------------------------------------------------------
  --------------------------------^
  |            CFMW Index Table
  |  -----------------------------------------
  |  | CFMW ID |     BASE       |    SIZE    |
  |  -----------------------------------------
  |  |    0    | 0xb00000....   |     ...    |
  |->|    1    | 0xc05000....   |            |
  |->|    2    | 0x100500....   |            |
     |    3    | 0x200000....   |     ...    |
     -----------------------------------------

-------

The code above turns into

int cxl_normal_translate(struct pci_dev *pdev, u64 norm_addr, u64 *sys_addr)
{
    int i_idx = cxl_nrm_decoder_interleave_index(pdev);
    int span, i;
    u64 offset;

    if (i_idx < 0)
        return -EINVAL;

    span = cxl_nrm_decoder_window_span(pdev);

    /* Normalized offset into whole region (still assumes 2-way, 256B) */
    offset = (0x100 * (((norm_addr >> 8) * 2) + i_idx)) + (norm_addr & 0xff);

    /* Find actual CFMW Base (might cross multiple w/ gaps) */
    for (i = 0; i < span; i++) {
        u64 base, size;
        int id;

        id = cxl_nrm_decoder_cfmws_id(i);
        if (id < 0)
            return -EINVAL;

        if (!cxl_nrm_decoder_cfmws_data(id, &base, &size))
            return -EINVAL;

        /* Use the base from the table entry the offset falls into */
        if (offset < size) {
            *sys_addr = base + offset;
            return 0;
        }
        offset -= size;
    }
    return -EINVAL;
}

Where the cxl_nrm_*() functions just query the exposed tables - however
that actually happens.
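
For illustration only (the names, layout, and backing table here are all
hypothetical), one of those helpers could be little more than a lookup
into a cached copy of such a table:

  #define CXL_NRM_MAX_CFMWS 8   /* illustrative bound */

  static struct { u64 base; u64 size; } nrm_cfmws[CXL_NRM_MAX_CFMWS];
  static int nrm_cfmws_nr;

  /* Return the cached base/size for a CFMW id, false if unknown */
  static bool cxl_nrm_decoder_cfmws_data(int id, u64 *base, u64 *size)
  {
      if (id < 0 || id >= nrm_cfmws_nr)
          return false;
      *base = nrm_cfmws[id].base;
      *size = nrm_cfmws[id].size;
      return true;
  }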

--------

I don't know whether the above math is actually true; it's basically
just the simple interleave math. If something else is going on, then
this whole table thing might not actually work.

The rest of the patch set would more or less stay the same.

~Gregory
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by dan.j.williams@intel.com 2 weeks, 3 days ago
Gregory Price wrote:
> On Wed, Jan 21, 2026 at 02:09:27PM -0800, dan.j.williams@intel.com wrote:
> > > 
> > > I see. So the concern is including model-specific methods that would
> > > modify the CXL standard flow, correct?
> > 
> ...
> > 
> > As I told Robert, I want a generic "Normalized Address" facility of
> > which Zen5 is the first user.
> > 
> 
> Isn't that what this patch functionally is w/ a specific PRM function?
> 
>    rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> 
> Or is the request now: replace this with static table data?

As I mentioned at the bottom of this message to Yazen [1], the request
is to prove or disprove the hypothesis that a table would have sufficed,
but otherwise go ahead with merging this handler. Set a precedent that
the next attempt to solve a problem like this with PRM will face a
higher bar.

[1]: http://lore.kernel.org/69701f6de978_1d6f1001e@dwillia2-mobl4.notmuch

> point of ignorance: what facility would you use to expose such tables?

New sub-structure of the CEDT similar to the CXIMS.
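
Purely as a strawman for what such a sub-table could carry (nothing below
is from a spec; the names and layout are invented for illustration):

    struct acpi_cedt_nrmds {            /* hypothetical "Normalized Decoder" entry */
            struct acpi_cedt_header header;
            u16 segment;                /* PCI segment of the host bridge */
            u8 bus;                     /* endpoint / host bridge bus */
            u8 interleave_index;        /* position within the interleave set */
            u8 nr_windows;              /* number of CFMWS entries spanned */
            u8 reserved[3];
            u32 window_index[];         /* CFMWS entries, in interleave order */
    } __packed;

That would let the driver do the math Gregory sketches below without any
runtime firmware call.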

> -----
> 
> When I initially hacked up driver support for this mode, before getting
> PRM support, the "hacked up translation code" I used was this:
> 
>   /* Find 0-based offset into whole interleave region */
>   dev = (pdev->bus->number == 0xe1) ? 0 : 1;
>   offset = (0x100 * (((norm_addr >> 8) * 2) + dev)) + (norm_addr & 0xff);
> 
>   /* Find the SPA base for the address */
>   for (idx = 0; idx < cfmws_nr; idx++) {
>       size = cxl_get_cfmws_size(idx);
>       /* We may have a gap in the CFMWS */
>       if (offset < size) {
>           *sys_addr = cxl_get_cfmws_base(idx) + offset;
>           return 0;
>       }
>       offset -= size;
>    }
> 
> ------
> 
> This makes hard-assumptions about two things:
> 
>   device interleave index  - pcidev(0xe1) => 0
>   cfmws base               - all CFMWS are used for this one region
> 
> cxl_get_cfmws_base() was a call into ACPI code, and the acpi code just
> kept a global cache of the raw CEDT CFMWS structures (base + size);
> 
> So, assuming you had such tables, it would need to be like:
> 
>                   Normalized Decoders Table
>     --------------------------------------------------------
>     | CXL PCIDev | Decoder  | CFMW SPAN  |  Interleave IDX |
>     --------------------------------------------------------
>     |     d1     |    0     |    1,2     |        0        |
>     |     e1     |    0     |    1,2     |        1        |
>     --------------------------------------------------------
>   --------------------------------^
>   |            CFMW Index Table
>   |  -----------------------------------------
>   |  | CFMW ID |     BASE       |    SIZE    |
>   |  -----------------------------------------
>   |  |    0    | 0xb00000....   |     ...    |
>   |->|    1    | 0xc05000....   |            |
>   |->|    2    | 0x100500....   |            |
>      |    3    | 0x200000....   |     ...    |
>      -----------------------------------------
> 
> -------
> 
> The code above turns into
> 
> int cxl_normal_translate(struct pci_dev *pdev, u64 norm_addr, u64 *sys_addr)
> {
>     int i_idx = cxl_nrm_decoder_interleave_index(pdev);
>     int span, i;
>     u64 offset;
> 
>     if (i_idx < 0)
>         return -EINVAL;
> 
>     span = cxl_nrm_decoder_window_span(pdev);
> 
>     /* Normalized offset into whole region (still assumes 2-way, 256B) */
>     offset = (0x100 * (((norm_addr >> 8) * 2) + i_idx)) + (norm_addr & 0xff);
> 
>     /* Find actual CFMW Base (might cross multiple w/ gaps) */
>     for (i = 0; i < span; i++) {
>         u64 base, size;
>         int id;
> 
>         id = cxl_nrm_decoder_cfmws_id(i);
>         if (id < 0)
>             return -EINVAL;
> 
>         if (!cxl_nrm_decoder_cfmws_data(id, &base, &size))
>             return -EINVAL;
> 
>         /* Use the base from the table entry the offset falls into */
>         if (offset < size) {
>             *sys_addr = base + offset;
>             return 0;
>         }
>         offset -= size;
>     }
>     return -EINVAL;
> }
> 
> Where the cxl_nrm_*() functions just query the exposed tables - however
> that actually happens.
> 
> --------
> 
> I don't know whether the above math is actually true; it's basically
> just the simple interleave math. If something else is going on, then
> this whole table thing might not actually work.
> 
> The rest of the patch set would more or less stay the same.

If the above is even close to being correct, I would merge that in a
heartbeat over this PRM proposal.

Robert, do you really want to be spending time on trying to move PRM to
userspace vs just doing the above?
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by dan.j.williams@intel.com 2 weeks, 3 days ago
dan.j.williams@ wrote:
[..]
> If the above is even close to being correct, I would merge that in a
> heartbeat over this PRM proposal.
> 
> Robert, do you really want to be spending time on trying to move PRM to
> userspace vs just doing the above?

To be clear, I am still of the opinion that even if it is confirmed that
Gregory's algorithm would have done the trick with a new table, we should
proceed with the PRM solution. The PRM method appears to be already
shipping and it fixes a long overdue problem causing end user pain. The
request is: do not plan to ship new PRM without clarity on why a native
driver approach cannot work.
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Dave Jiang 2 weeks, 6 days ago

On 1/19/26 7:33 AM, Robert Richter wrote:
> (+Rafael and some AMD folks)
> 
> Hi Peter,
> 
> On Fri, Jan 16, 2026 at 03:38:38PM +0100, Peter Zijlstra wrote:
>> On Thu, Jan 15, 2026 at 09:30:10AM +0100, Ard Biesheuvel wrote:
>>> On Thu, 15 Jan 2026 at 09:04, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>
>>>> On Wed, Jan 14, 2026 at 06:08:59PM +0000, Jonathan Cameron wrote:
>>>>
>>>>> Do we have a potential issue wrt to merging this as it stands and improving
>>>>> on it later?  i.e. Is this a blocking issue for this patch set?
>>>>
>>>> Well, why do you *have* to use PRMT at all? And this is a serious
>>>> question; PRMT is basically injecting unaudited magic code into the
>>>> kernel, and that is a security risk.
>>>>
>>>> Worse, in order to run this shit, we have to lower or disable various
>>>> security measures.
>>>>
>>>
>>> Only if we decide to keep running it privileged, which the PRM spec no
>>> longer requires (as you have confirmed yourself when we last discussed
>>> this, right?)
>>
>> Indeed. But those very constraints also make me wonder why we would ever
>> bother with PRM at all, and not simply require a native driver. Then you
>> actually *know* what the thing does and can debug/fix it without having
>> to rely on BIOS updates and whatnot.
> 
> An address translation driver needs the configuration data from the
> Data Fabric, which is known only to firmware, not to the kernel.
> Other ways would be necessary to expose and calculate that data, if it
> is even feasible to make this information available.
> 
> So using PRM looks reasonable to me as this abstracts the logic and
> data behind a method, same as doing a library call. Of course, you
> don't want to trust that, but that could be addressed running it
> unprivileged.
> 
>> Worse, you might have to deal with various incompatible buggy PRM
>> versions because BIOS :/
> 
> The address translation functions are straightforward. I haven't
> experienced any issues here. If there were any, it would be solvable,
> e.g. by requiring a specific minimum version or UUID to run PRM.
> 
>>
>>>> If I had my way, we would WARN and TAINT the kernel whenever such
>>>> garbage got used.
>>>
>>> These are things that used to live in SMM, requiring all CPUs to
>>> disappear into SMM mode in a way that was completely opaque to the OS.
>>>
>>> PRM runs under the control of the OS, does not require privileges and
>>> only needs MMIO access to the regions it describes in its manifest
>>> (which the OS can inspect, if desired). So if there are security
>>> concerns with PRM today, it is because we were lazy and did not
>>> implement PRM securely from the beginning.
>>>
>>> In my defense, I wasn't aware of the unprivileged requirement until
>>> you spotted it recently: it was something I had asked for when the PRM
>>> spec was put up for "review" by the Intel and MS authors, and they
>>> told me they couldn't possibly make any changes at that point, because
>>> it had already gone into production. But as it turns out, the change
>>> was made after all.
>>>
>>> I am a total noob when it comes to how x86 does its ring0/ring3
>>> switching, but with some help, I should be able to prototype something
>>> to call into the PRM service unprivileged, running under the efi_mm.
>>
>> The ring transition itself is done using IRET; create an IRET frame with
>> userspace CS and the right IP (and flags etc.) and off you go. The
>> problem is getting back into the kernel I suppose. All the 'normal' kernel
>> entry points assume the kernel stack is empty and all that.
>>
>> The whole usermodehelper stuff creates a whole extra thread, sets
>> everything up and drops into userspace. Perhaps that is the easiest
>> solution. Basically you set the thread's mm to efi_mm, populate
>> task_pt_regs() with the right bits and simply drop into 'userspace'.
>>
>> Then it can complete by terminating itself (sys_exit()) and the calling
>> context reaps the thing and continues.
> 
> I can help with testing and also work on securing the PRM calls.
> Thanks Ard for also looking into this.
> 
>>
>>> Would that allay your concerns?
>>
>> Yeah, running it as userspace would be fine; we don't trust that.
>>
>> But again; a native driver is ever so much better than relying on PRM.
>>
>> In this case it is AMD doing a driver for their own chips, they know how
>> they work, they should be able to write this natively.
> 
> Since a native driver introduces additional issues, as explained
> above, I would prefer to use PRM for address translation and instead
> ensure the PRM call is secure.
> 
> Dan, Dave, regarding this series, the cxl driver just uses existing
> PRM kernel code and does not implement anything new here. Is there
> anything that would prevent this series from being accepted? We are
> already at v10 and review is complete:
> 
> https://patchwork.kernel.org/project/cxl/list/?series=1042412
> 
> I will follow up with working on unprivileged PRM calls. I think that
> will be the best solution here.

I have no objections, given the promise of work on unprivileged PRM calls. Please rev the conventions doc per Dan's request and we can get this merged.

> 
> Thanks,
> 
> -Robert
Re: [PATCH v9 10/13] cxl: Enable AMD Zen5 address translation using ACPI PRMT
Posted by Gregory Price 2 weeks, 6 days ago
On Mon, Jan 19, 2026 at 03:33:33PM +0100, Robert Richter wrote:
> Dan, Dave, regarding this series, the cxl driver just uses existing
> PRM kernel code and does not implement anything new here. Is there
> anything that would prevent this series from being accepted? We are
> already at v10 and review is complete:
> 
> https://patchwork.kernel.org/project/cxl/list/?series=1042412
> 

I will also add that this code has been heavily tested version to
version on many thousands of boxes for over a year.

~Gregory