[PATCH v5 1/7] cxl/acpi: Refactor cxl_acpi_probe() to always schedule fallback DAX registration

Smita Koralahalli posted 7 patches 2 months, 3 weeks ago
[PATCH v5 1/7] cxl/acpi: Refactor cxl_acpi_probe() to always schedule fallback DAX registration
Posted by Smita Koralahalli 2 months, 3 weeks ago
Refactor cxl_acpi_probe() to use a single exit path so that the fallback
DAX registration can be scheduled regardless of probe success or failure.

With CONFIG_CXL_ACPI enabled, future patches will bypass DAX device
registration via the HMAT and hmem drivers. To avoid missing DAX
registration for SOFT RESERVED regions, the fallback path must be
triggered regardless of probe outcome.

No functional changes.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/cxl/acpi.c | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index a1a99ec3f12c..ca06d5acdf8f 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -825,7 +825,7 @@ static int pair_cxl_resource(struct device *dev, void *data)
 
 static int cxl_acpi_probe(struct platform_device *pdev)
 {
-	int rc;
+	int rc = 0;
 	struct resource *cxl_res;
 	struct cxl_root *cxl_root;
 	struct cxl_port *root_port;
@@ -837,7 +837,7 @@ static int cxl_acpi_probe(struct platform_device *pdev)
 	rc = devm_add_action_or_reset(&pdev->dev, cxl_acpi_lock_reset_class,
 				      &pdev->dev);
 	if (rc)
-		return rc;
+		goto out;
 
 	cxl_res = devm_kzalloc(host, sizeof(*cxl_res), GFP_KERNEL);
 	if (!cxl_res)
@@ -848,18 +848,20 @@ static int cxl_acpi_probe(struct platform_device *pdev)
 	cxl_res->flags = IORESOURCE_MEM;
 
 	cxl_root = devm_cxl_add_root(host, &acpi_root_ops);
-	if (IS_ERR(cxl_root))
-		return PTR_ERR(cxl_root);
+	if (IS_ERR(cxl_root)) {
+		rc = PTR_ERR(cxl_root);
+		goto out;
+	}
 	root_port = &cxl_root->port;
 
 	rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
 			      add_host_bridge_dport);
 	if (rc < 0)
-		return rc;
+		goto out;
 
 	rc = devm_add_action_or_reset(host, remove_cxl_resources, cxl_res);
 	if (rc)
-		return rc;
+		goto out;
 
 	ctx = (struct cxl_cfmws_context) {
 		.dev = host,
@@ -867,12 +869,14 @@ static int cxl_acpi_probe(struct platform_device *pdev)
 		.cxl_res = cxl_res,
 	};
 	rc = acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, cxl_parse_cfmws, &ctx);
-	if (rc < 0)
-		return -ENXIO;
+	if (rc < 0) {
+		rc = -ENXIO;
+		goto out;
+	}
 
 	rc = add_cxl_resources(cxl_res);
 	if (rc)
-		return rc;
+		goto out;
 
 	/*
 	 * Populate the root decoders with their related iomem resource,
@@ -887,17 +891,19 @@ static int cxl_acpi_probe(struct platform_device *pdev)
 	rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
 			      add_host_bridge_uport);
 	if (rc < 0)
-		return rc;
+		goto out;
 
 	if (IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = device_for_each_child(&root_port->dev, root_port,
 					   add_root_nvdimm_bridge);
 	if (rc < 0)
-		return rc;
+		goto out;
 
 	/* In case PCI is scanned before ACPI re-trigger memdev attach */
 	cxl_bus_rescan();
-	return 0;
+
+out:
+	return rc;
 }
 
 static const struct acpi_device_id cxl_acpi_ids[] = {
-- 
2.17.1
Re: [PATCH v5 1/7] cxl/acpi: Refactor cxl_acpi_probe() to always schedule fallback DAX registration
Posted by dan.j.williams@intel.com 2 months, 2 weeks ago
Smita Koralahalli wrote:
> Refactor cxl_acpi_probe() to use a single exit path so that the fallback
> DAX registration can be scheduled regardless of probe success or failure.

I do not understand why cxl_acpi needs to be responsible for this,
especially in the cxl_acpi_probe() failure path. Certainly if
cxl_acpi_probe() fails, that is a strong signal to give up on the CXL
subsystem altogether and fallback to DAX vanilla discovery exclusively.

Now, maybe the need for this becomes clearer in follow-on patches.
However, I would have expected that DAX, which currently arranges for
CXL to load first would just flush CXL discovery, make a decision about
whether proceed with Soft Reserved, or not.

Something like:

DAX						CXL 
						Scan CXL Windows. Fail on any window
                                                parsing failures
                                                
                                                Launch a work item to flush PCI
                                                discovery and give a reaonable amount of
                                                time for cxl_pci and cxl_mem to quiesce

<assumes CXL Windows are discovered     	
 by virtue of initcall order or         	
 MODULE_SOFTDEP("pre: cxl_acpi")>       	
                                        	
Calls a CXL flush routine to await probe	
completion (will always be racy)        	

Evaluates if all Soft Reserve has
cxl_region coverage

if yes: skip publishing CXL intersecting
Soft Reserve range in iomem, let dax_cxl
attach to the cxl_region devices

if no: decline the already published
cxl_dax_regions, notify cxl_acpi to
shutdown. Install Soft Reserved in iomem
and create dax_hmem devices for the
ranges per usual.

Something like the above puts all the onus on device-dax to decide if
CXL is meeting expectations. CXL is only responsible flagging when it
thinks it has successfully completed init. If device-dax disagrees with
what CXL has done it can tear down the world without ever attaching
'struct cxl_dax_region'. The success/fail is an "all or nothing"
proposition.  Either CXL understands everything or the user needs to
work with their hardware vendor to fix whatever is giving the CXL driver
indigestion.

It needs to be coarse and simple because longer term the expectation is
the Soft Reserved stops going to System RAM by default and instead
becomes an isolated memory pool that requires opt-in. In many ways the
current behavior is optimized for hardware validation not applications.

> With CONFIG_CXL_ACPI enabled, future patches will bypass DAX device
> registration via the HMAT and hmem drivers. To avoid missing DAX
> registration for SOFT RESERVED regions, the fallback path must be
> triggered regardless of probe outcome.
> 
> No functional changes.

A comment below in case something like this patch moves forward:

> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
>  drivers/cxl/acpi.c | 30 ++++++++++++++++++------------
>  1 file changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index a1a99ec3f12c..ca06d5acdf8f 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -825,7 +825,7 @@ static int pair_cxl_resource(struct device *dev, void *data)
>  
>  static int cxl_acpi_probe(struct platform_device *pdev)
>  {
> -	int rc;
> +	int rc = 0;
>  	struct resource *cxl_res;
>  	struct cxl_root *cxl_root;
>  	struct cxl_port *root_port;
> @@ -837,7 +837,7 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>  	rc = devm_add_action_or_reset(&pdev->dev, cxl_acpi_lock_reset_class,
>  				      &pdev->dev);
>  	if (rc)
> -		return rc;
> +		goto out;

No, new goto please. With cleanup.h the momentum is towards elimination
of goto. If you need to do something like this, just wrap the function:

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index a1a99ec3f12c..b50d3aa45ad5 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -823,7 +823,7 @@ static int pair_cxl_resource(struct device *dev, void *data)
 	return 0;
 }
 
-static int cxl_acpi_probe(struct platform_device *pdev)
+static int __cxl_acpi_probe(struct platform_device *pdev)
 {
 	int rc;
 	struct resource *cxl_res;
@@ -900,6 +900,15 @@ static int cxl_acpi_probe(struct platform_device *pdev)
 	return 0;
 }
 
+static int cxl_acpi_probe(struct platform_device *pdev)
+{
+	int rc = __cxl_acpi_probe(pdev);
+
+	/* do something */
+
+	return rc;
+}
+
 static const struct acpi_device_id cxl_acpi_ids[] = {
 	{ "ACPI0017" },
 	{ },
Re: [PATCH v5 1/7] cxl/acpi: Refactor cxl_acpi_probe() to always schedule fallback DAX registration
Posted by Alison Schofield 2 months, 2 weeks ago
On Tue, Jul 22, 2025 at 02:04:00PM -0700, Dan Williams wrote:
> Smita Koralahalli wrote:
> > Refactor cxl_acpi_probe() to use a single exit path so that the fallback
> > DAX registration can be scheduled regardless of probe success or failure.
> 
> I do not understand why cxl_acpi needs to be responsible for this,
> especially in the cxl_acpi_probe() failure path. Certainly if
> cxl_acpi_probe() fails, that is a strong signal to give up on the CXL
> subsystem altogether and fallback to DAX vanilla discovery exclusively.
> 
> Now, maybe the need for this becomes clearer in follow-on patches.
> However, I would have expected that DAX, which currently arranges for
> CXL to load first would just flush CXL discovery, make a decision about
> whether proceed with Soft Reserved, or not.
> 
> Something like:
> 
> DAX						CXL 
> 						Scan CXL Windows. Fail on any window
>                                                 parsing failures
>                                                 
>                                                 Launch a work item to flush PCI
>                                                 discovery and give a reaonable amount of
>                                                 time for cxl_pci and cxl_mem to quiesce
> 
> <assumes CXL Windows are discovered     	
>  by virtue of initcall order or         	
>  MODULE_SOFTDEP("pre: cxl_acpi")>       	
>                                         	
> Calls a CXL flush routine to await probe	
> completion (will always be racy)        	
> 
> Evaluates if all Soft Reserve has
> cxl_region coverage
> 
> if yes: skip publishing CXL intersecting
> Soft Reserve range in iomem, let dax_cxl
> attach to the cxl_region devices
> 
> if no: decline the already published
> cxl_dax_regions, notify cxl_acpi to
> shutdown. Install Soft Reserved in iomem
> and create dax_hmem devices for the
> ranges per usual.

This is super course. If CXL region driver sets up 99 regions with
exact matching SR ranges and there are no CXL Windows with unused SR,
then we have a YES!

But if after those 99 successful assemblies, we get one errant window
with a Soft Reserved for which a region never assembles, it's a hard NO.
DAX declines, ie teardowns the 99 dax_regions and cxl_regions.

Obviously, this is different from the current approach that aimed to
pick up completely unused SRs and the trimmings from SRs that exceeded
region size and offered them to DAX too.

I'm cringing a bit at the fact that one bad apple (like a cxl device
that doesn't show up for it's region) means no CXL devices get managed.

Probably asking the obvious question here. This is what 'we' want,
right?

> 
> Something like the above puts all the onus on device-dax to decide if
> CXL is meeting expectations. CXL is only responsible flagging when it
> thinks it has successfully completed init. If device-dax disagrees with
> what CXL has done it can tear down the world without ever attaching
> 'struct cxl_dax_region'. The success/fail is an "all or nothing"
> proposition.  Either CXL understands everything or the user needs to
> work with their hardware vendor to fix whatever is giving the CXL driver
> indigestion.
> 
> It needs to be coarse and simple because longer term the expectation is
> the Soft Reserved stops going to System RAM by default and instead
> becomes an isolated memory pool that requires opt-in. In many ways the
> current behavior is optimized for hardware validation not applications.
> 
> > With CONFIG_CXL_ACPI enabled, future patches will bypass DAX device
> > registration via the HMAT and hmem drivers. To avoid missing DAX
> > registration for SOFT RESERVED regions, the fallback path must be
> > triggered regardless of probe outcome.
> > 
> > No functional changes.
> 
> A comment below in case something like this patch moves forward:
> 
> > 
> > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> > ---
> >  drivers/cxl/acpi.c | 30 ++++++++++++++++++------------
> >  1 file changed, 18 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index a1a99ec3f12c..ca06d5acdf8f 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -825,7 +825,7 @@ static int pair_cxl_resource(struct device *dev, void *data)
> >  
> >  static int cxl_acpi_probe(struct platform_device *pdev)
> >  {
> > -	int rc;
> > +	int rc = 0;
> >  	struct resource *cxl_res;
> >  	struct cxl_root *cxl_root;
> >  	struct cxl_port *root_port;
> > @@ -837,7 +837,7 @@ static int cxl_acpi_probe(struct platform_device *pdev)
> >  	rc = devm_add_action_or_reset(&pdev->dev, cxl_acpi_lock_reset_class,
> >  				      &pdev->dev);
> >  	if (rc)
> > -		return rc;
> > +		goto out;
> 
> No, new goto please. With cleanup.h the momentum is towards elimination
> of goto. If you need to do something like this, just wrap the function:
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index a1a99ec3f12c..b50d3aa45ad5 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -823,7 +823,7 @@ static int pair_cxl_resource(struct device *dev, void *data)
>  	return 0;
>  }
>  
> -static int cxl_acpi_probe(struct platform_device *pdev)
> +static int __cxl_acpi_probe(struct platform_device *pdev)
>  {
>  	int rc;
>  	struct resource *cxl_res;
> @@ -900,6 +900,15 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>  	return 0;
>  }
>  
> +static int cxl_acpi_probe(struct platform_device *pdev)
> +{
> +	int rc = __cxl_acpi_probe(pdev);
> +
> +	/* do something */
> +
> +	return rc;
> +}
> +
>  static const struct acpi_device_id cxl_acpi_ids[] = {
>  	{ "ACPI0017" },
>  	{ },
Re: [PATCH v5 1/7] cxl/acpi: Refactor cxl_acpi_probe() to always schedule fallback DAX registration
Posted by dan.j.williams@intel.com 2 months, 2 weeks ago
Alison Schofield wrote:
> On Tue, Jul 22, 2025 at 02:04:00PM -0700, Dan Williams wrote:
> > Smita Koralahalli wrote:
> > > Refactor cxl_acpi_probe() to use a single exit path so that the fallback
> > > DAX registration can be scheduled regardless of probe success or failure.
> > 
> > I do not understand why cxl_acpi needs to be responsible for this,
> > especially in the cxl_acpi_probe() failure path. Certainly if
> > cxl_acpi_probe() fails, that is a strong signal to give up on the CXL
> > subsystem altogether and fallback to DAX vanilla discovery exclusively.
> > 
> > Now, maybe the need for this becomes clearer in follow-on patches.
> > However, I would have expected that DAX, which currently arranges for
> > CXL to load first would just flush CXL discovery, make a decision about
> > whether proceed with Soft Reserved, or not.
> > 
> > Something like:
> > 
> > DAX						CXL 
> > 						Scan CXL Windows. Fail on any window
> >                                                 parsing failures
> >                                                 
> >                                                 Launch a work item to flush PCI
> >                                                 discovery and give a reaonable amount of
> >                                                 time for cxl_pci and cxl_mem to quiesce
> > 
> > <assumes CXL Windows are discovered     	
> >  by virtue of initcall order or         	
> >  MODULE_SOFTDEP("pre: cxl_acpi")>       	
> >                                         	
> > Calls a CXL flush routine to await probe	
> > completion (will always be racy)        	
> > 
> > Evaluates if all Soft Reserve has
> > cxl_region coverage
> > 
> > if yes: skip publishing CXL intersecting
> > Soft Reserve range in iomem, let dax_cxl
> > attach to the cxl_region devices
> > 
> > if no: decline the already published
> > cxl_dax_regions, notify cxl_acpi to
> > shutdown. Install Soft Reserved in iomem
> > and create dax_hmem devices for the
> > ranges per usual.
> 
> This is super course. If CXL region driver sets up 99 regions with
> exact matching SR ranges and there are no CXL Windows with unused SR,
> then we have a YES!
>
> But if after those 99 successful assemblies, we get one errant window
> with a Soft Reserved for which a region never assembles, it's a hard NO.
> DAX declines, ie teardowns the 99 dax_regions and cxl_regions.

Exactly.

> Obviously, this is different from the current approach that aimed to
> pick up completely unused SRs and the trimmings from SRs that exceeded
> region size and offered them to DAX too.
> 
> I'm cringing a bit at the fact that one bad apple (like a cxl device
> that doesn't show up for it's region) means no CXL devices get managed.

Think of the linear extended cache case where if you just look at the
endpoint CXL decoders you get half of the capacity and none of the
warning that the DDR side of the cache can not be managed. Consider the
other address translation error cases where endpoint decoders do not
tell the full story.

Now think of the contract of what the CXL driver offers. It says, "hey
if you change this configuration I have high confidence that I have
managed everything associated with this region throughout the whole
platform config", or "if an error happens anywhere within a CXL window
you can rely on my identification of the system components to suspect".

> Probably asking the obvious question here. This is what 'we' want,
> right?

So, in the end it is not a fair fight. It is a handful of Linux
developers vs multiple platform BIOS teams per vendor and per hardware
generation. The amount of Linux tripping, specification-stretching,
innovation is eye watering.

The downside of being drastic and coarse is worth the benefit.

The status quo is end users get nothing (stranded memory), or low
confidence partial configs with latent RAS problems.

The coarse policy is a spec compliant, safe fallback that gets end users
access to the memory the BIOS/firmware promised. It buys time for the
platform vendor to get Linux fixed, or even better, these Linux
coniptions catch problems earlier in platform qualification cycle.

The alternative fine grained policy hopes that the memory the CXL driver
gives up on was not critical or indicative of a deeper misunderstanding
of the platform. I do not want to debug hopeful guesses on top of
undefined breakage.