[PATCH] [HACK] drop zen5_init checks due to segfault

Gregory Price posted 1 patch 10 months, 1 week ago
drivers/cxl/core/x86/amd.c | 17 -----------------
1 file changed, 17 deletions(-)
[PATCH] [HACK] drop zen5_init checks due to segfault
Posted by Gregory Price 10 months, 1 week ago
Unclear why this is occuring, but a raw call to the PRM at this point
causes segfaults on my Zen5 system.  Later calls to the prm work just
fine, and modifying the structure to include pci_dev info still results
in a segfault.

Debugging this is not possible on my end since the crash happens deep in
the ACPI prm code.  Seems maybe the PRM interface isn't ready or something?

---
 drivers/cxl/core/x86/amd.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/drivers/cxl/core/x86/amd.c b/drivers/cxl/core/x86/amd.c
index 483c92c18054..5e3708f9e179 100644
--- a/drivers/cxl/core/x86/amd.c
+++ b/drivers/cxl/core/x86/amd.c
@@ -227,26 +227,9 @@ static u64 cxl_zen5_to_hpa(struct cxl_decoder *cxld, u64 hpa)

 static void cxl_zen5_init(struct cxl_port *port)
 {
-	u64 spa;
-	struct prm_cxl_dpa_spa_data data = { .out = &spa, };
-	int rc;
-
 	if (!is_zen5(port))
 		return;

-	/* Check kernel and firmware support */
-	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
-
-	if (rc == -EOPNOTSUPP) {
-		pr_warn_once("ACPI PRMT: PRM address translation not supported by kernel\n");
-		return;
-	}
-
-	if (rc == -ENODEV) {
-		pr_warn_once("ACPI PRMT: PRM address translation not supported by firmware\n");
-		return;
-	}
-
 	port->to_hpa = cxl_zen5_to_hpa;

 	dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
--
2.47.1
Re: [PATCH] [HACK] drop zen5_init checks due to segfault
Posted by Robert Richter 9 months ago
On 04.04.25 22:38:58, Gregory Price wrote:
> Unclear why this is occuring, but a raw call to the PRM at this point
> causes segfaults on my Zen5 system.  Later calls to the prm work just
> fine, and modifying the structure to include pci_dev info still results
> in a segfault.
> 
> Debugging this is not possible on my end since the crash happens deep in
> the ACPI prm code.  Seems maybe the PRM interface isn't ready or something?

There is a subsys_initcall order dependency if driver is builtin:

subsys_initcall(cxl_acpi_init);
subsys_initcall(efisubsys_init);

A fix using subsys_initcall_sync(cxl_acpi_init) solves the issue.

efi_rts_wq workqueue is used by cxl_acpi_init() before its allocation
in efisubsys_init(). I will address that in the next submission.

Thanks for looking into this.

-Robert
Re: [PATCH] [HACK] drop zen5_init checks due to segfault
Posted by Joshua Hahn 7 months, 3 weeks ago
On Tue, 13 May 2025 23:10:36 +0200 Robert Richter <rrichter@amd.com> wrote:

> On 04.04.25 22:38:58, Gregory Price wrote:
> > Unclear why this is occuring, but a raw call to the PRM at this point
> > causes segfaults on my Zen5 system.  Later calls to the prm work just
> > fine, and modifying the structure to include pci_dev info still results
> > in a segfault.
> > 
> > Debugging this is not possible on my end since the crash happens deep in
> > the ACPI prm code.  Seems maybe the PRM interface isn't ready or something?
> 
> There is a subsys_initcall order dependency if driver is builtin:
> 
> subsys_initcall(cxl_acpi_init);
> subsys_initcall(efisubsys_init);
> 
> A fix using subsys_initcall_sync(cxl_acpi_init) solves the issue.
> 
> efi_rts_wq workqueue is used by cxl_acpi_init() before its allocation
> in efisubsys_init(). I will address that in the next submission.
> 
> Thanks for looking into this.
> 
> -Robert
> 

Hello Robert,

I hope you are doing well! Sorry for reviving an old thread. I'm currently
trying to apply this patchset, and saw the same issue that Gregory was having.
Keeping the PRM checks would be helpful for debugging when things go wrong, so
I wanted to try and apply your suggestion, but had a bit of trouble
understanding what the core of the problem was.

I was hoping for some help in understanding your explanation here -- I don't
think I can see where the dependency appears. (In particular, I'm having
trouble understanding where the efi_rts_wq dependnecy matters during the
cxl_zen5_init function). 

Thank you for this patchset, and for your help!
Joshua

Sent using hkml (https://github.com/sjp38/hackermail)
Re: [PATCH] [HACK] drop zen5_init checks due to segfault
Posted by Robert Richter 7 months, 2 weeks ago
On 17.06.25 13:33:18, Joshua Hahn wrote:
> I was hoping for some help in understanding your explanation here -- I don't
> think I can see where the dependency appears. (In particular, I'm having
> trouble understanding where the efi_rts_wq dependnecy matters during the
> cxl_zen5_init function). 

Here a temporary patch with an explanation in the description:


From a540b814d48574b67a9aaa97a5d7536c61d4deda Mon Sep 17 00:00:00 2001
From: Robert Richter <rrichter@amd.com>
Date: Tue, 13 May 2025 15:02:16 +0200
Subject: [PATCH] cxl/acpi: Prepare use of EFI runtime services

In order to use EFI runtime services, esp. ACPI PRM which uses the
efi_rts_wq workqueue, initialize EFI before CXL ACPI.

There is a subsys_initcall order dependency if driver is builtin:

 subsys_initcall(cxl_acpi_init);
 subsys_initcall(efisubsys_init);

Prevent the efi_rts_wq workqueue being used by cxl_acpi_init() before
its allocation. Use subsys_initcall_sync(cxl_acpi_init) to always run
efisubsys_init() first.

Reported-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/acpi.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index a1a99ec3f12c..af750a8bd373 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -932,8 +932,12 @@ static void __exit cxl_acpi_exit(void)
 	cxl_bus_drain();
 }
 
-/* load before dax_hmem sees 'Soft Reserved' CXL ranges */
-subsys_initcall(cxl_acpi_init);
+/*
+ * Load before dax_hmem sees 'Soft Reserved' CXL ranges. Use
+ * subsys_initcall_sync() since there is an order dependency with
+ * subsys_initcall(efisubsys_init), which must run first.
+ */
+subsys_initcall_sync(cxl_acpi_init);
 
 /*
  * Arrange for host-bridge ports to be active synchronous with
-- 
2.39.5
Re: [PATCH] [HACK] drop zen5_init checks due to segfault
Posted by Joshua Hahn 7 months, 2 weeks ago
On Tue, 24 Jun 2025 07:43:13 +0200 Robert Richter <rrichter@amd.com> wrote:

> On 17.06.25 13:33:18, Joshua Hahn wrote:
> > I was hoping for some help in understanding your explanation here -- I don't
> > think I can see where the dependency appears. (In particular, I'm having
> > trouble understanding where the efi_rts_wq dependnecy matters during the
> > cxl_zen5_init function). 
> 
> Here a temporary patch with an explanation in the description:

Hi Robert,

Thank you for this patch! I just tested on my machine, and can confirm that
this does indeed fix the problem. I'm not sure if this will be folded into
the rest of the patchset or if it will be its own, but I will add my
signatures below.

Thank you again, Have a great day!

Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>

> From a540b814d48574b67a9aaa97a5d7536c61d4deda Mon Sep 17 00:00:00 2001
> From: Robert Richter <rrichter@amd.com>
> Date: Tue, 13 May 2025 15:02:16 +0200
> Subject: [PATCH] cxl/acpi: Prepare use of EFI runtime services
> 
> In order to use EFI runtime services, esp. ACPI PRM which uses the
> efi_rts_wq workqueue, initialize EFI before CXL ACPI.
> 
> There is a subsys_initcall order dependency if driver is builtin:
> 
>  subsys_initcall(cxl_acpi_init);
>  subsys_initcall(efisubsys_init);
> 
> Prevent the efi_rts_wq workqueue being used by cxl_acpi_init() before
> its allocation. Use subsys_initcall_sync(cxl_acpi_init) to always run
> efisubsys_init() first.
> 
> Reported-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: Robert Richter <rrichter@amd.com>

[...snip...]

Sent using hkml (https://github.com/sjp38/hackermail)