[PATCH v2 14/15] cxl/amd: Enable Zen5 address translation using ACPI PRMT

Robert Richter posted 15 patches 10 months ago
Add AMD platform specific Zen5 support for address translation.

Zen5 systems may be configured to use 'Normalized addresses'. In that
mode, CXL endpoints use their own physical address space and are
programmed passthrough (DPA == HPA), with the endpoint's number of
interleave ways set to one. Host Physical Addresses (HPAs) then need
to be translated from the endpoint up to its CXL host bridge. The HPA
of a CXL host bridge is equivalent to the System Physical Address
(SPA).

ACPI Platform Runtime Mechanism (PRM) is used to translate the CXL
Device Physical Address (DPA) to its System Physical Address. This is
documented in:

 AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
 ACPI v6.5 Porting Guide, Publication # 58088
 https://www.amd.com/en/search/documentation/hub.html

To implement AMD Zen5 address translation the following steps are
needed:

Apply platform specific changes to each port where necessary using
platform detection and the existing architectural framework.

Add a function cxl_port_setup_amd() to implement AMD platform specific
code. Use Kbuild and Kconfig options to enable the code depending on
the architecture and platform. Handle architecture specifics in
arch_cxl_port_platform_setup() and create a new file core/x86/amd.c
for this.

Introduce a function cxl_zen5_init() to handle Zen5 specific
enablement. Zen5 platforms are detected using the PCIe vendor and
device ID of the corresponding CXL root port. There is a check for
ACPI PRMT CXL address translation support.

Apply cxl_zen5_to_hpa() as cxl_port->to_hpa() callback to Zen5 CXL
host bridges to enable platform specific address translation.

Use ACPI PRM DPA-to-SPA translation to determine an endpoint's
interleaving configuration and base address during the early
initialization process. This is used to determine an endpoint's SPA
range and to check the address translation setup.

The configuration can be determined by calling the PRM for a set of
given HPAs. The resulting SPAs are used to calculate the interleaving
parameters of the host bridge and root port. The maximum granularity
(chunk size) is 16 KB, the minimum is 256 bytes. Use the following
calculation for the given HPAs:

 ways    = hpa_len(SZ_16K) / SZ_16K
 gran    = (hpa_len(SZ_16K) - hpa_len(SZ_16K - SZ_256) - SZ_256)
          / (ways - 1)
 pos     = (hpa_len(SZ_16K) - ways * SZ_16K) / gran

Before the endpoint is attached to a region, the translation is
checked for reasonable values.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/Kconfig           |   4 +
 drivers/cxl/core/Makefile     |   1 +
 drivers/cxl/core/core.h       |   3 +
 drivers/cxl/core/region.c     |  25 +++-
 drivers/cxl/core/x86/amd.c    | 259 ++++++++++++++++++++++++++++++++++
 drivers/cxl/core/x86/common.c |   2 +
 6 files changed, 293 insertions(+), 1 deletion(-)
 create mode 100644 drivers/cxl/core/x86/amd.c

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 876469e23f7a..e576028dd983 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -146,4 +146,8 @@ config CXL_REGION_INVALIDATION_TEST
 	  If unsure, or if this kernel is meant for production environments,
 	  say N.
 
+config CXL_AMD
+	def_bool y
+	depends on AMD_NB
+
 endif
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index db1d16d39037..cfe41b8edfd3 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -18,3 +18,4 @@ cxl_core-$(CONFIG_TRACING) += trace.o
 cxl_core-$(CONFIG_CXL_REGION) += region.o
 
 cxl_core-$(CONFIG_X86)		+= x86/common.o
+cxl_core-$(CONFIG_CXL_AMD)	+= x86/amd.o
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index e2955f91fd98..d5c94e8cea42 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -119,5 +119,8 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
 
 int cxl_gpf_port_setup(struct device *dport_dev, struct cxl_port *port);
 void arch_cxl_port_platform_setup(struct cxl_port *port);
+#if defined(CONFIG_X86)
+void cxl_port_setup_amd(struct cxl_port *port);
+#endif
 
 #endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index dab059ee26ef..b6806e67c62a 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -837,10 +837,24 @@ static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
 {
 	struct range hpa = *hpa_range;
 	u64 len = range_len(&hpa);
+	int bits;
 
 	if (!port->to_hpa)
 		return 0;
 
+	/*
+	 * Check the range start and length for 16 KB alignment. The
+	 * interleaved address range is 256 MB aligned; even with the
+	 * maximum of 16 interleave ways applied, the per-endpoint
+	 * range stays at least 16 KB aligned.
+	 */
+	if (!len || hpa_range->start & (SZ_16K - 1) || len & (SZ_16K - 1)) {
+		dev_warn(&port->dev,
			"HPA range start or length not a multiple of 16 KB: %#llx-%#llx(%s)\n",
+			hpa_range->start, hpa_range->end, dev_name(&cxld->dev));
+		return -ENXIO;
+	}
+
 	/* Translate HPA to the next upper domain. */
 	hpa.start = port->to_hpa(cxld, hpa.start);
 	hpa.end = port->to_hpa(cxld, hpa.end);
@@ -853,7 +867,16 @@ static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
 		return -ENXIO;
 	}
 
-	if (range_len(&hpa) != len * cxld->interleave_ways) {
+	/*
+	 * Determine the interleave ways from the length ratio and
+	 * expand the range by its power-of-2 part, aligning it to
+	 * the resulting (SZ_16K << bits) boundary.
+	 */
+	bits = range_len(&hpa) > len ? __ffs(range_len(&hpa) / len) : 0;
+	hpa.start &= ~((SZ_16K << bits) - 1);
+	hpa.end |= (SZ_16K << bits) - 1;
+
+	if (range_len(&hpa) % len) {
 		dev_warn(&port->dev,
 			"CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
 			hpa.start, hpa.end, hpa_range->start,
diff --git a/drivers/cxl/core/x86/amd.c b/drivers/cxl/core/x86/amd.c
new file mode 100644
index 000000000000..483c92c18054
--- /dev/null
+++ b/drivers/cxl/core/x86/amd.c
@@ -0,0 +1,259 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#include <linux/prmt.h>
+#include <linux/pci.h>
+
+#include <cxlmem.h>
+#include "../core.h"
+
+#define PCI_DEVICE_ID_AMD_ZEN5_ROOT		0x153e
+
+static const struct pci_device_id zen5_root_port_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ZEN5_ROOT) },
+	{}
+};
+
+static int is_zen5_root_port(struct device *dev, void *unused)
+{
+	if (!dev_is_pci(dev))
+		return 0;
+
+	return !!pci_match_id(zen5_root_port_ids, to_pci_dev(dev));
+}
+
+static bool is_zen5(struct cxl_port *port)
+{
+	if (!IS_ENABLED(CONFIG_ACPI_PRMT))
+		return false;
+
+	/* To get the CXL root port, find the CXL host bridge first. */
+	if (is_cxl_root(port) ||
+	    !port->host_bridge ||
+	    !is_cxl_root(to_cxl_port(port->dev.parent)))
+		return false;
+
+	return !!device_for_each_child(port->host_bridge, NULL,
+				       is_zen5_root_port);
+}
+
+/*
+ * PRM Address Translation - CXL DPA to System Physical Address
+ *
+ * Reference:
+ *
+ * AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
+ * ACPI v6.5 Porting Guide, Publication # 58088
+ */
+
+static const guid_t prm_cxl_dpa_spa_guid =
+	GUID_INIT(0xee41b397, 0x25d4, 0x452c, 0xad, 0x54, 0x48, 0xc6, 0xe3,
+		  0x48, 0x0b, 0x94);
+
+struct prm_cxl_dpa_spa_data {
+	u64 dpa;
+	u8 reserved;
+	u8 devfn;
+	u8 bus;
+	u8 segment;
+	void *out;
+} __packed;
+
+static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa)
+{
+	struct prm_cxl_dpa_spa_data data;
+	u64 spa;
+	int rc;
+
+	data = (struct prm_cxl_dpa_spa_data) {
+		.dpa     = dpa,
+		.devfn   = pci_dev->devfn,
+		.bus     = pci_dev->bus->number,
+		.segment = pci_domain_nr(pci_dev->bus),
+		.out     = &spa,
+	};
+
+	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
+	if (rc) {
+		pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc);
+		return ULLONG_MAX;
+	}
+
+	pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa);
+
+	return spa;
+}
+
+/* Bits used for interleaving. */
+#define SPA_INTERLEAVING_BITS	GENMASK_ULL(14, 8)
+
+static u64 cxl_zen5_to_hpa(struct cxl_decoder *cxld, u64 hpa)
+{
+	struct cxl_memdev *cxlmd;
+	struct pci_dev *pci_dev;
+	struct cxl_port *port;
+	u64 base, spa, spa2, len, len2, offset, granularity, gran_mask;
+	int ways, pos, ways_bits, gran_bits;
+
+	/*
+	 * Nothing to do if the base is non-zero, meaning Normalized
+	 * Addressing is disabled.
+	 */
+	if (cxld->hpa_range.start)
+		return hpa;
+
+	/* Only translate from endpoint to its parent port. */
+	if (!is_endpoint_decoder(&cxld->dev))
+		return hpa;
+
+	/*
+	 * Endpoints are programmed passthrough in Normalized
+	 * Addressing mode.
+	 */
+	if (cxld->interleave_ways != 1) {
+		dev_dbg(&cxld->dev, "unexpected interleaving config: ways: %d granularity: %d\n",
+			cxld->interleave_ways, cxld->interleave_granularity);
+		return ULLONG_MAX;
+	}
+
+	if (hpa > cxld->hpa_range.end) {
+		dev_dbg(&cxld->dev, "hpa addr %#llx out of range %#llx-%#llx\n",
+			hpa, cxld->hpa_range.start, cxld->hpa_range.end);
+		return ULLONG_MAX;
+	}
+
+	port = to_cxl_port(cxld->dev.parent);
+	cxlmd = port ? to_cxl_memdev(port->uport_dev) : NULL;
+	if (!port || !dev_is_pci(cxlmd->dev.parent)) {
+		dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n",
+			dev_name(cxld->dev.parent), cxld->hpa_range.start,
+			cxld->hpa_range.end);
+		return ULLONG_MAX;
+	}
+	pci_dev = to_pci_dev(cxlmd->dev.parent);
+
+	/*
+	 * If the decoder is already attached we are past the decoder
+	 * initialization, do not determine the address mapping and
+	 * just return here.
+	 */
+	if (cxld->region)
+		return prm_cxl_dpa_spa(pci_dev, hpa);
+
+	/*
+	 * Determine the interleaving config. Maximum granularity
+	 * (chunk size) is 16k, minimum is 256. Calculated with:
+	 *
+	 *	ways	= hpa_len(SZ_16K) / SZ_16K
+	 *	gran	= (hpa_len(SZ_16K) - hpa_len(SZ_16K - SZ_256) - SZ_256)
+	 *                / (ways - 1)
+	 *	pos	= (hpa_len(SZ_16K) - ways * SZ_16K) / gran
+	 */
+
+	base = prm_cxl_dpa_spa(pci_dev, 0);
+	spa  = prm_cxl_dpa_spa(pci_dev, SZ_16K);
+	spa2 = prm_cxl_dpa_spa(pci_dev, SZ_16K - SZ_256);
+
+	/* Includes checks to avoid div by zero */
+	if (!base || base == ULLONG_MAX || spa == ULLONG_MAX ||
+	    spa2 == ULLONG_MAX || spa < base + SZ_16K || spa2 <= base ||
+	    (spa > base + SZ_16K && spa - spa2 < SZ_256 * 2)) {
+		dev_dbg(&cxld->dev, "Error translating HPA: base: %#llx spa: %#llx spa2: %#llx\n",
+			base, spa, spa2);
+		return ULLONG_MAX;
+	}
+
+	len = spa - base;
+	len2 = spa2 - base;
+
+	/* offset = pos * granularity */
+	if (len == SZ_16K && len2 == SZ_16K - SZ_256) {
+		ways = 1;
+		offset = 0;
+		granularity = SZ_256;
+		pos = 0;
+		ways_bits = 0;
+		gran_bits = 8;
+	} else {
+		ways = len / SZ_16K;
+		offset = spa & (SZ_16K - 1);
+		granularity = (len - len2 - SZ_256) / (ways - 1);
+		ways_bits = __ffs(ways);
+		gran_bits = __ffs(granularity);
+		pos = offset >> gran_bits;
+	}
+
+	/*
+	 * Check the mapping: the number of ways is a power of 2 or
+	 * 3 times a power of 2 (len == ways * SZ_16K), and the
+	 * granularity is a power of 2.
+	 */
+	if (len & ~(3ULL << (ways_bits + 14)) ||
+	    granularity != 1 << gran_bits || offset != pos << gran_bits) {
+		dev_dbg(&cxld->dev, "Error determining address mapping: base: %#llx spa: %#llx spa2: %#llx ways: %d pos: %d granularity: %llu\n",
+			base, spa, spa2, ways, pos, granularity);
+		return ULLONG_MAX;
+	}
+
+	spa = prm_cxl_dpa_spa(pci_dev, hpa);
+
+	/*
+	 * Check SPA using a PRM call for the closest DPA calculated
+	 * for the HPA. If the HPA matches a different interleaving
+	 * position other than the decoder's, determine its offset to
+	 * adjust the SPA.
+	 */
+
+	gran_mask = GENMASK_ULL(gran_bits, 0);
+	spa2 = base + (hpa & ~gran_mask) * ways + (hpa & gran_mask);
+	base = base - pos * granularity;
+
+	dev_dbg(&cxld->dev,
+		"address mapping found for %s (hpa -> spa): %#llx -> %#llx (%#llx+%#llx) ways: %d pos: %d granularity: %llu\n",
+		pci_name(pci_dev), hpa, spa, base, spa - base, ways, pos,
+		granularity);
+
+	if ((spa ^ spa2) & ~SPA_INTERLEAVING_BITS) {
+		dev_dbg(&cxld->dev, "SPA calculation failed: %#llx:%#llx\n",
+			spa, spa2);
+		return ULLONG_MAX;
+	}
+
+	return spa;
+}
+
+static void cxl_zen5_init(struct cxl_port *port)
+{
+	u64 spa;
+	struct prm_cxl_dpa_spa_data data = { .out = &spa, };
+	int rc;
+
+	if (!is_zen5(port))
+		return;
+
+	/* Check kernel and firmware support */
+	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
+
+	if (rc == -EOPNOTSUPP) {
+		pr_warn_once("ACPI PRMT: PRM address translation not supported by kernel\n");
+		return;
+	}
+
+	if (rc == -ENODEV) {
+		pr_warn_once("ACPI PRMT: PRM address translation not supported by firmware\n");
+		return;
+	}
+
+	port->to_hpa = cxl_zen5_to_hpa;
+
+	dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
+		dev_name(&port->dev));
+}
+
+void cxl_port_setup_amd(struct cxl_port *port)
+{
+	cxl_zen5_init(port);
+}
diff --git a/drivers/cxl/core/x86/common.c b/drivers/cxl/core/x86/common.c
index eeb9bdadb26d..7ccd68b035e6 100644
--- a/drivers/cxl/core/x86/common.c
+++ b/drivers/cxl/core/x86/common.c
@@ -9,4 +9,6 @@
 
 void arch_cxl_port_platform_setup(struct cxl_port *port)
 {
+	if (IS_ENABLED(CONFIG_CXL_AMD))
+		cxl_port_setup_amd(port);
 }
-- 
2.39.5

Re: [PATCH v2 14/15] cxl/amd: Enable Zen5 address translation using ACPI PRMT
Posted by Jonathan Cameron 9 months, 1 week ago
On Tue, 18 Feb 2025 14:23:55 +0100
Robert Richter <rrichter@amd.com> wrote:

> Signed-off-by: Robert Richter <rrichter@amd.com>
Some trivial comments inline.  The rest I may take another look at, as
I've not gotten my head fully around it yet.

> diff --git a/drivers/cxl/core/x86/amd.c b/drivers/cxl/core/x86/amd.c
> new file mode 100644
> index 000000000000..483c92c18054
> --- /dev/null
> +++ b/drivers/cxl/core/x86/amd.c
> @@ -0,0 +1,259 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2025 Advanced Micro Devices, Inc.
> + */
> +
> +#include <linux/prmt.h>
> +#include <linux/pci.h>
> +
> +#include <cxlmem.h>
> +#include "../core.h"
> +
> +#define PCI_DEVICE_ID_AMD_ZEN5_ROOT		0x153e

Might as well just put it inline unless you have lots of uses.

> +
> +static const struct pci_device_id zen5_root_port_ids[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ZEN5_ROOT) },
> +	{},
	{}

Don't want to make it easy to add stuff after.

> +};
> +
> +static int is_zen5_root_port(struct device *dev, void *unused)
> +{
> +	if (!dev_is_pci(dev))
> +		return 0;
> +
> +	return !!pci_match_id(zen5_root_port_ids, to_pci_dev(dev));
> +}
> +
> +static bool is_zen5(struct cxl_port *port)
> +{
> +	if (!IS_ENABLED(CONFIG_ACPI_PRMT))
> +		return false;
> +
> +	/* To get the CXL root port, find the CXL host bridge first. */
> +	if (is_cxl_root(port) ||
> +	    !port->host_bridge ||
combine a few of these on the same line perhaps?
> +	    !is_cxl_root(to_cxl_port(port->dev.parent)))
> +		return false;

> +
> +/* Bits used for interleaving. */
> +#define SPA_INTERLEAVING_BITS	GENMASK_ULL(14, 8)
> +
> +static u64 cxl_zen5_to_hpa(struct cxl_decoder *cxld, u64 hpa)
> +{


> +	port = to_cxl_port(cxld->dev.parent);
> +	cxlmd = port ? to_cxl_memdev(port->uport_dev) : NULL;
> +	if (!port || !dev_is_pci(cxlmd->dev.parent)) {

Given you are going to dereference cxlmd, maybe more logical to do

	if (!cxlmd || !dev_is_pci(cxlmd->dev.parent))

> +		dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n",
> +			dev_name(cxld->dev.parent), cxld->hpa_range.start,
> +			cxld->hpa_range.end);
> +		return ULLONG_MAX;
> +	}
> +	pci_dev = to_pci_dev(cxlmd->dev.parent);
> +
> +	/*
> +	 * If the decoder is already attached we are past the decoder
> +	 * initialization, do not determine the address mapping and
> +	 * just return here.
> +	 */
> +	if (cxld->region)
> +		return prm_cxl_dpa_spa(pci_dev, hpa);
> +
>
> +	/*
> +	 * Check the mapping: Number of ways is power of 2 or a
> +	 * multiple of 3 ways (len == ways * SZ_16K), granularitys is

granularity is 

> +	 * power of 2.
> +	 */
> +	if (len & ~(3ULL << (ways_bits + 14)) ||
> +	    granularity != 1 << gran_bits || offset != pos << gran_bits) {
> +		dev_dbg(&cxld->dev, "Error determining address mapping: base: %#llx spa: %#llx spa2: %#llx ways: %d pos: %d granularity: %llu\n",
> +			base, spa, spa2, ways, pos, granularity);
> +		return ULLONG_MAX;
> +	}
> +
> +	spa = prm_cxl_dpa_spa(pci_dev, hpa);
> +
> +	/*
> +	 * Check SPA using a PRM call for the closest DPA calculated
> +	 * for the HPA. If the HPA matches a different interleaving
> +	 * position other than the decoder's, determine its offset to
> +	 * adjust the SPA.
> +	 */
> +
> +	gran_mask = GENMASK_ULL(gran_bits, 0);
> +	spa2 = base + (hpa & ~gran_mask) * ways + (hpa & gran_mask);
> +	base = base - pos * granularity;
> +
> +	dev_dbg(&cxld->dev,
> +		"address mapping found for %s (hpa -> spa): %#llx -> %#llx (%#llx+%#llx) ways: %d pos: %d granularity: %llu\n",
> +		pci_name(pci_dev), hpa, spa, base, spa - base, ways, pos,
> +		granularity);
> +
> +
> +	if ((spa ^ spa2) & ~SPA_INTERLEAVING_BITS) {
> +		dev_dbg(&cxld->dev, "SPA calculation failed: %#llx:%#llx\n",
> +			spa, spa2);
> +		return ULLONG_MAX;
> +	}
> +
> +	return spa;
> +}
> +
> +static void cxl_zen5_init(struct cxl_port *port)
> +{
> +	u64 spa;
> +	struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> +	int rc;
> +
> +	if (!is_zen5(port))
> +		return;
> +
> +	/* Check kernel and firmware support */
> +	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> +
No blank line here. Keep the error check right up against the call.
> +	if (rc == -EOPNOTSUPP) {
> +		pr_warn_once("ACPI PRMT: PRM address translation not supported by kernel\n");
> +		return;
> +	}
> +
> +	if (rc == -ENODEV) {
> +		pr_warn_once("ACPI PRMT: PRM address translation not supported by firmware\n");
> +		return;
> +	}
> +
> +	port->to_hpa = cxl_zen5_to_hpa;
> +
> +	dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> +		dev_name(&port->dev));
> +}
> +
> +void cxl_port_setup_amd(struct cxl_port *port)
> +{
> +	cxl_zen5_init(port);
> +}
Re: [PATCH v2 14/15] cxl/amd: Enable Zen5 address translation using ACPI PRMT
Posted by Dave Jiang 10 months ago

On 2/18/25 6:23 AM, Robert Richter wrote:

What is the possibility of making this ACPI PRM call interface public and teaching drivers/cxl/acpi.c to make it common code? That way, if the platform implements the ACPI call, the translation gets registered and we don't need platform specific code. I would really like to avoid drivers/cxl/core/$arch if that's possible.

DJ

> ---
>  drivers/cxl/Kconfig           |   4 +
>  drivers/cxl/core/Makefile     |   1 +
>  drivers/cxl/core/core.h       |   3 +
>  drivers/cxl/core/region.c     |  25 +++-
>  drivers/cxl/core/x86/amd.c    | 259 ++++++++++++++++++++++++++++++++++
>  drivers/cxl/core/x86/common.c |   2 +
>  6 files changed, 293 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/cxl/core/x86/amd.c
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 876469e23f7a..e576028dd983 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -146,4 +146,8 @@ config CXL_REGION_INVALIDATION_TEST
>  	  If unsure, or if this kernel is meant for production environments,
>  	  say N.
>  
> +config CXL_AMD
> +       def_bool y
> +       depends on AMD_NB
> +
>  endif
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index db1d16d39037..cfe41b8edfd3 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -18,3 +18,4 @@ cxl_core-$(CONFIG_TRACING) += trace.o
>  cxl_core-$(CONFIG_CXL_REGION) += region.o
>  
>  cxl_core-$(CONFIG_X86)		+= x86/common.o
> +cxl_core-$(CONFIG_CXL_AMD)	+= x86/amd.o
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index e2955f91fd98..d5c94e8cea42 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -119,5 +119,8 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
>  
>  int cxl_gpf_port_setup(struct device *dport_dev, struct cxl_port *port);
>  void arch_cxl_port_platform_setup(struct cxl_port *port);
> +#if defined(CONFIG_X86)
> +void cxl_port_setup_amd(struct cxl_port *port);
> +#endif
>  
>  #endif /* __CXL_CORE_H__ */
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index dab059ee26ef..b6806e67c62a 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -837,10 +837,24 @@ static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
>  {
>  	struct range hpa = *hpa_range;
>  	u64 len = range_len(&hpa);
> +	int bits;
>  
>  	if (!port->to_hpa)
>  		return 0;
>  
> +	/*
> +	 * Check range and length alignment of 256 MB for the
> +	 * interleaved address range. With max. 16-way interleaving
> +	 * applied this is at least 16 KB.
> +	 */
> +
> +	if (!len || hpa_range->start & (SZ_16K - 1) || len & (SZ_16K - 1)) {
> +		dev_warn(&port->dev,
> +			"HPA range not aligned or multiple of 16 kB: %#llx-%#llx(%s)\n",
> +			hpa_range->start, hpa_range->end, dev_name(&cxld->dev));
> +		return -ENXIO;
> +	}
> +
>  	/* Translate HPA to the next upper domain. */
>  	hpa.start = port->to_hpa(cxld, hpa.start);
>  	hpa.end = port->to_hpa(cxld, hpa.end);
> @@ -853,7 +867,16 @@ static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
>  		return -ENXIO;
>  	}
>  
> -	if (range_len(&hpa) != len * cxld->interleave_ways) {
> +	/*
> +	 *  Apply min and max interleaving addresses to the range.
> +	 *  Determine the interleave ways and expand the 16 KB range
> +	 *  by the power-of-2 part it.
> +	 */
> +	bits = range_len(&hpa) > len ? __ffs(range_len(&hpa) / len) : 0;
> +	hpa.start &= ~((SZ_16K << bits) - 1);
> +	hpa.end |= (SZ_16K << bits) - 1;
> +
> +	if (range_len(&hpa) % len) {
>  		dev_warn(&port->dev,
>  			"CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
>  			hpa.start, hpa.end, hpa_range->start,
> diff --git a/drivers/cxl/core/x86/amd.c b/drivers/cxl/core/x86/amd.c
> new file mode 100644
> index 000000000000..483c92c18054
> --- /dev/null
> +++ b/drivers/cxl/core/x86/amd.c
> @@ -0,0 +1,259 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2025 Advanced Micro Devices, Inc.
> + */
> +
> +#include <linux/prmt.h>
> +#include <linux/pci.h>
> +
> +#include <cxlmem.h>
> +#include "../core.h"
> +
> +#define PCI_DEVICE_ID_AMD_ZEN5_ROOT		0x153e
> +
> +static const struct pci_device_id zen5_root_port_ids[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ZEN5_ROOT) },
> +	{},
> +};
> +
> +static int is_zen5_root_port(struct device *dev, void *unused)
> +{
> +	if (!dev_is_pci(dev))
> +		return 0;
> +
> +	return !!pci_match_id(zen5_root_port_ids, to_pci_dev(dev));
> +}
> +
> +static bool is_zen5(struct cxl_port *port)
> +{
> +	if (!IS_ENABLED(CONFIG_ACPI_PRMT))
> +		return false;
> +
> +	/* To get the CXL root port, find the CXL host bridge first. */
> +	if (is_cxl_root(port) ||
> +	    !port->host_bridge ||
> +	    !is_cxl_root(to_cxl_port(port->dev.parent)))
> +		return false;
> +
> +	return !!device_for_each_child(port->host_bridge, NULL,
> +				       is_zen5_root_port);
> +}
> +
> +/*
> + * PRM Address Translation - CXL DPA to System Physical Address
> + *
> + * Reference:
> + *
> + * AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> + * ACPI v6.5 Porting Guide, Publication # 58088
> + */
> +
> +static const guid_t prm_cxl_dpa_spa_guid =
> +	GUID_INIT(0xee41b397, 0x25d4, 0x452c, 0xad, 0x54, 0x48, 0xc6, 0xe3,
> +		  0x48, 0x0b, 0x94);
> +
> +struct prm_cxl_dpa_spa_data {
> +	u64 dpa;
> +	u8 reserved;
> +	u8 devfn;
> +	u8 bus;
> +	u8 segment;
> +	void *out;
> +} __packed;
> +
> +static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa)
> +{
> +	struct prm_cxl_dpa_spa_data data;
> +	u64 spa;
> +	int rc;
> +
> +	data = (struct prm_cxl_dpa_spa_data) {
> +		.dpa     = dpa,
> +		.devfn   = pci_dev->devfn,
> +		.bus     = pci_dev->bus->number,
> +		.segment = pci_domain_nr(pci_dev->bus),
> +		.out     = &spa,
> +	};
> +
> +	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> +	if (rc) {
> +		pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc);
> +		return ULLONG_MAX;
> +	}
> +
> +	pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa);
> +
> +	return spa;
> +}
> +
> +/* Bits used for interleaving. */
> +#define SPA_INTERLEAVING_BITS	GENMASK_ULL(14, 8)
> +
> +static u64 cxl_zen5_to_hpa(struct cxl_decoder *cxld, u64 hpa)
> +{
> +	struct cxl_memdev *cxlmd;
> +	struct pci_dev *pci_dev;
> +	struct cxl_port *port;
> +	u64 base, spa, spa2, len, len2, offset, granularity, gran_mask;
> +	int ways, pos, ways_bits, gran_bits;
> +
> +	/*
> +	 * Nothing to do if base is non-zero and Normalized Addressing
> +	 * is disabled.
> +	 */
> +	if (cxld->hpa_range.start)
> +		return hpa;
> +
> +	/* Only translate from endpoint to its parent port. */
> +	if (!is_endpoint_decoder(&cxld->dev))
> +		return hpa;
> +
> +	/*
> +	 * Endpoints are programmed passthrough in Normalized
> +	 * Addressing mode.
> +	 */
> +	if (cxld->interleave_ways != 1) {
> +		dev_dbg(&cxld->dev, "unexpected interleaving config: ways: %d granularity: %d\n",
> +			cxld->interleave_ways, cxld->interleave_granularity);
> +		return ULLONG_MAX;
> +	}
> +
> +	if (hpa > cxld->hpa_range.end) {
> +		dev_dbg(&cxld->dev, "hpa addr %#llx out of range %#llx-%#llx\n",
> +			hpa, cxld->hpa_range.start, cxld->hpa_range.end);
> +		return ULLONG_MAX;
> +	}
> +
> +	port = to_cxl_port(cxld->dev.parent);
> +	cxlmd = port ? to_cxl_memdev(port->uport_dev) : NULL;
> +	if (!port || !dev_is_pci(cxlmd->dev.parent)) {
> +		dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n",
> +			dev_name(cxld->dev.parent), cxld->hpa_range.start,
> +			cxld->hpa_range.end);
> +		return ULLONG_MAX;
> +	}
> +	pci_dev = to_pci_dev(cxlmd->dev.parent);
> +
> +	/*
> +	 * If the decoder is already attached we are past the decoder
> +	 * initialization, do not determine the address mapping and
> +	 * just return here.
> +	 */
> +	if (cxld->region)
> +		return prm_cxl_dpa_spa(pci_dev, hpa);
> +
> +	/*
> +	 * Determine the interleaving config. Maximum granularity
> +	 * (chunk size) is 16k, minimum is 256. Calculated with:
> +	 *
> +	 *	ways	= hpa_len(SZ_16K) / SZ_16K
> +	 * 	gran	= (hpa_len(SZ_16K) - hpa_len(SZ_16K - SZ_256) - SZ_256)
> +	 *                / (ways - 1)
> +	 *	pos	= (hpa_len(SZ_16K) - ways * SZ_16K) / gran
> +	 */
> +
> +	base = prm_cxl_dpa_spa(pci_dev, 0);
> +	spa  = prm_cxl_dpa_spa(pci_dev, SZ_16K);
> +	spa2 = prm_cxl_dpa_spa(pci_dev, SZ_16K - SZ_256);
> +
> +	/* Includes checks to avoid div by zero */
> +	if (!base || base == ULLONG_MAX || spa == ULLONG_MAX ||
> +	    spa2 == ULLONG_MAX || spa < base + SZ_16K || spa2 <= base ||
> +	    (spa > base + SZ_16K && spa - spa2 < SZ_256 * 2)) {
> +		dev_dbg(&cxld->dev, "Error translating HPA: base: %#llx spa: %#llx spa2: %#llx\n",
> +			base, spa, spa2);
> +		return ULLONG_MAX;
> +	}
> +
> +	len = spa - base;
> +	len2 = spa2 - base;
> +
> +	/* offset = pos * granularity */
> +	if (len == SZ_16K && len2 == SZ_16K - SZ_256) {
> +		ways = 1;
> +		offset = 0;
> +		granularity = SZ_256;
> +		pos = 0;
> +		ways_bits = 0;
> +		gran_bits = 8;
> +	} else {
> +		ways = len / SZ_16K;
> +		offset = spa & (SZ_16K - 1);
> +		granularity = (len - len2 - SZ_256) / (ways - 1);
> +		ways_bits = __ffs(ways);
> +		gran_bits = __ffs(granularity);
> +		pos = offset >> gran_bits;
> +	}
> +
> +	/*
> +	 * Check the mapping: the number of ways is a power of 2 or a
> +	 * multiple of 3 ways (len == ways * SZ_16K), the granularity
> +	 * is a power of 2.
> +	 */
> +	if (len & ~(3ULL << (ways_bits + 14)) ||
> +	    granularity != 1 << gran_bits || offset != pos << gran_bits) {
> +		dev_dbg(&cxld->dev, "Error determining address mapping: base: %#llx spa: %#llx spa2: %#llx ways: %d pos: %d granularity: %llu\n",
> +			base, spa, spa2, ways, pos, granularity);
> +		return ULLONG_MAX;
> +	}
> +
> +	spa = prm_cxl_dpa_spa(pci_dev, hpa);
> +
> +	/*
> +	 * Check SPA using a PRM call for the closest DPA calculated
> +	 * for the HPA. If the HPA matches an interleaving position
> +	 * other than the decoder's, determine its offset to adjust
> +	 * the SPA.
> +	 */
> +
> +	gran_mask = GENMASK_ULL(gran_bits, 0);
> +	spa2 = base + (hpa & ~gran_mask) * ways + (hpa & gran_mask);
> +	base = base - pos * granularity;
> +
> +	dev_dbg(&cxld->dev,
> +		"address mapping found for %s (hpa -> spa): %#llx -> %#llx (%#llx+%#llx) ways: %d pos: %d granularity: %llu\n",
> +		pci_name(pci_dev), hpa, spa, base, spa - base, ways, pos,
> +		granularity);
> +
> +	if ((spa ^ spa2) & ~SPA_INTERLEAVING_BITS) {
> +		dev_dbg(&cxld->dev, "SPA calculation failed: %#llx:%#llx\n",
> +			spa, spa2);
> +		return ULLONG_MAX;
> +	}
> +
> +	return spa;
> +}
> +
> +static void cxl_zen5_init(struct cxl_port *port)
> +{
> +	u64 spa;
> +	struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> +	int rc;
> +
> +	if (!is_zen5(port))
> +		return;
> +
> +	/* Check kernel and firmware support */
> +	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> +
> +	if (rc == -EOPNOTSUPP) {
> +		pr_warn_once("ACPI PRMT: PRM address translation not supported by kernel\n");
> +		return;
> +	}
> +
> +	if (rc == -ENODEV) {
> +		pr_warn_once("ACPI PRMT: PRM address translation not supported by firmware\n");
> +		return;
> +	}
> +
> +	port->to_hpa = cxl_zen5_to_hpa;
> +
> +	dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> +		dev_name(&port->dev));
> +}
> +
> +void cxl_port_setup_amd(struct cxl_port *port)
> +{
> +	cxl_zen5_init(port);
> +}
> diff --git a/drivers/cxl/core/x86/common.c b/drivers/cxl/core/x86/common.c
> index eeb9bdadb26d..7ccd68b035e6 100644
> --- a/drivers/cxl/core/x86/common.c
> +++ b/drivers/cxl/core/x86/common.c
> @@ -9,4 +9,6 @@
>  
>  void arch_cxl_port_platform_setup(struct cxl_port *port)
>  {
> +	if (IS_ENABLED(CONFIG_CXL_AMD))
> +		cxl_port_setup_amd(port);
>  }

[PATCH] [HACK] drop zen5_init checks due to segfault
Posted by Gregory Price 8 months, 2 weeks ago
Unclear why this is occurring, but a raw call to the PRM at this point
causes segfaults on my Zen5 system.  Later calls to the PRM work just
fine, and modifying the structure to include pci_dev info still results
in a segfault.

Debugging this is not possible on my end since the crash happens deep in
the ACPI PRM code.  Seems maybe the PRM interface isn't ready or something?

---
 drivers/cxl/core/x86/amd.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/drivers/cxl/core/x86/amd.c b/drivers/cxl/core/x86/amd.c
index 483c92c18054..5e3708f9e179 100644
--- a/drivers/cxl/core/x86/amd.c
+++ b/drivers/cxl/core/x86/amd.c
@@ -227,26 +227,9 @@ static u64 cxl_zen5_to_hpa(struct cxl_decoder *cxld, u64 hpa)

 static void cxl_zen5_init(struct cxl_port *port)
 {
-	u64 spa;
-	struct prm_cxl_dpa_spa_data data = { .out = &spa, };
-	int rc;
-
 	if (!is_zen5(port))
 		return;

-	/* Check kernel and firmware support */
-	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
-
-	if (rc == -EOPNOTSUPP) {
-		pr_warn_once("ACPI PRMT: PRM address translation not supported by kernel\n");
-		return;
-	}
-
-	if (rc == -ENODEV) {
-		pr_warn_once("ACPI PRMT: PRM address translation not supported by firmware\n");
-		return;
-	}
-
 	port->to_hpa = cxl_zen5_to_hpa;

 	dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
--
2.47.1
Re: [PATCH] [HACK] drop zen5_init checks due to segfault
Posted by Robert Richter 7 months, 1 week ago
On 04.04.25 22:38:58, Gregory Price wrote:
> Unclear why this is occurring, but a raw call to the PRM at this point
> causes segfaults on my Zen5 system.  Later calls to the PRM work just
> fine, and modifying the structure to include pci_dev info still results
> in a segfault.
> 
> Debugging this is not possible on my end since the crash happens deep in
> the ACPI PRM code.  Seems maybe the PRM interface isn't ready or something?

There is a subsys_initcall order dependency if the driver is builtin:

subsys_initcall(cxl_acpi_init);
subsys_initcall(efisubsys_init);

A fix using subsys_initcall_sync(cxl_acpi_init) solves the issue.

The efi_rts_wq workqueue is used by cxl_acpi_init() before its
allocation in efisubsys_init(). I will address that in the next
submission.

Thanks for looking into this.

-Robert
Re: [PATCH] [HACK] drop zen5_init checks due to segfault
Posted by Joshua Hahn 6 months ago
On Tue, 13 May 2025 23:10:36 +0200 Robert Richter <rrichter@amd.com> wrote:

> On 04.04.25 22:38:58, Gregory Price wrote:
> > Unclear why this is occurring, but a raw call to the PRM at this point
> > causes segfaults on my Zen5 system.  Later calls to the PRM work just
> > fine, and modifying the structure to include pci_dev info still results
> > in a segfault.
> > 
> > Debugging this is not possible on my end since the crash happens deep in
> > the ACPI PRM code.  Seems maybe the PRM interface isn't ready or something?
> 
> There is a subsys_initcall order dependency if the driver is builtin:
> 
> subsys_initcall(cxl_acpi_init);
> subsys_initcall(efisubsys_init);
> 
> A fix using subsys_initcall_sync(cxl_acpi_init) solves the issue.
> 
> efi_rts_wq workqueue is used by cxl_acpi_init() before its allocation
> in efisubsys_init(). I will address that in the next submission.
> 
> Thanks for looking into this.
> 
> -Robert
> 

Hello Robert,

I hope you are doing well! Sorry for reviving an old thread. I'm currently
trying to apply this patchset, and saw the same issue that Gregory was having.
Keeping the PRM checks would be helpful for debugging when things go wrong, so
I wanted to try and apply your suggestion, but had a bit of trouble
understanding what the core of the problem was.

I was hoping for some help in understanding your explanation here -- I don't
think I can see where the dependency appears. (In particular, I'm having
trouble understanding where the efi_rts_wq dependency matters during the
cxl_zen5_init function.)

Thank you for this patchset, and for your help!
Joshua

Sent using hkml (https://github.com/sjp38/hackermail)
Re: [PATCH] [HACK] drop zen5_init checks due to segfault
Posted by Robert Richter 5 months, 4 weeks ago
On 17.06.25 13:33:18, Joshua Hahn wrote:
> I was hoping for some help in understanding your explanation here -- I don't
> think I can see where the dependency appears. (In particular, I'm having
> trouble understanding where the efi_rts_wq dependency matters during the
> cxl_zen5_init function.)

Here is a temporary patch with an explanation in the description:


From a540b814d48574b67a9aaa97a5d7536c61d4deda Mon Sep 17 00:00:00 2001
From: Robert Richter <rrichter@amd.com>
Date: Tue, 13 May 2025 15:02:16 +0200
Subject: [PATCH] cxl/acpi: Prepare use of EFI runtime services

In order to use EFI runtime services, esp. ACPI PRM which uses the
efi_rts_wq workqueue, initialize EFI before CXL ACPI.

There is a subsys_initcall order dependency if the driver is builtin:

 subsys_initcall(cxl_acpi_init);
 subsys_initcall(efisubsys_init);

Prevent the efi_rts_wq workqueue from being used by cxl_acpi_init()
before it is allocated. Use subsys_initcall_sync(cxl_acpi_init) so that
efisubsys_init() always runs first.

Reported-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/acpi.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index a1a99ec3f12c..af750a8bd373 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -932,8 +932,12 @@ static void __exit cxl_acpi_exit(void)
 	cxl_bus_drain();
 }
 
-/* load before dax_hmem sees 'Soft Reserved' CXL ranges */
-subsys_initcall(cxl_acpi_init);
+/*
+ * Load before dax_hmem sees 'Soft Reserved' CXL ranges. Use
+ * subsys_initcall_sync() since there is an order dependency with
+ * subsys_initcall(efisubsys_init), which must run first.
+ */
+subsys_initcall_sync(cxl_acpi_init);
 
 /*
  * Arrange for host-bridge ports to be active synchronous with
-- 
2.39.5
Re: [PATCH] [HACK] drop zen5_init checks due to segfault
Posted by Joshua Hahn 5 months, 3 weeks ago
On Tue, 24 Jun 2025 07:43:13 +0200 Robert Richter <rrichter@amd.com> wrote:

> On 17.06.25 13:33:18, Joshua Hahn wrote:
> > I was hoping for some help in understanding your explanation here -- I don't
> > think I can see where the dependency appears. (In particular, I'm having
> > trouble understanding where the efi_rts_wq dependency matters during the
> > cxl_zen5_init function.)
> 
> Here is a temporary patch with an explanation in the description:

Hi Robert,

Thank you for this patch! I just tested it on my machine, and can confirm
that it does indeed fix the problem. I'm not sure if this will be folded
into the rest of the patchset or if it will be a standalone patch, but I
will add my tags below.

Thank you again, and have a great day!

Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>

> From a540b814d48574b67a9aaa97a5d7536c61d4deda Mon Sep 17 00:00:00 2001
> From: Robert Richter <rrichter@amd.com>
> Date: Tue, 13 May 2025 15:02:16 +0200
> Subject: [PATCH] cxl/acpi: Prepare use of EFI runtime services
> 
> In order to use EFI runtime services, esp. ACPI PRM which uses the
> efi_rts_wq workqueue, initialize EFI before CXL ACPI.
> 
> There is a subsys_initcall order dependency if the driver is builtin:
> 
>  subsys_initcall(cxl_acpi_init);
>  subsys_initcall(efisubsys_init);
> 
> Prevent the efi_rts_wq workqueue from being used by cxl_acpi_init()
> before it is allocated. Use subsys_initcall_sync(cxl_acpi_init) so that
> efisubsys_init() always runs first.
> 
> Reported-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: Robert Richter <rrichter@amd.com>

[...snip...]

Sent using hkml (https://github.com/sjp38/hackermail)