[RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver

Terry Bowman posted 25 patches 1 month, 1 week ago
[RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver
Posted by Terry Bowman 1 month, 1 week ago
CXL devices handle protocol errors via driver-specific callbacks rather
than the generic pci_driver::err_handlers by default. The callbacks are
implemented in the cxl_pci driver and are not part of struct pci_driver, so
cxl_core must verify that a device is actually bound to the cxl_pci
module's driver before invoking the callbacks (the device could be bound
to another driver, e.g. VFIO).

However, cxl_core can not reference symbols in the cxl_pci module because
it creates a circular dependency. This prevents cxl_core from checking the
EP's bound driver and calling the callbacks.

To fix this, move drivers/cxl/pci.c into drivers/cxl/core/pci_drv.c and
build it as part of the cxl_core module. Compile into cxl_core using
CXL_PCI and CXL_CORE Kconfig dependencies. This removes the standalone
cxl_pci module, consolidates the cxl_pci driver code into cxl_core, and
eliminates the circular dependency so cxl_core can safely perform
bound-driver checks and invoke the CXL PCI callbacks.

Introduce cxl_pci_drv_bound() to return boolean depending on if the PCI EP
parameter is bound to a CXL driver instance. This will be used in future
patch when dequeuing work from the kfifo.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

---

Changes in v12->v13;
- Add Dave Jiang's review-by.

Changes in v11->v12:
- Add device_lock_assert() in cxl_pci_drv_bound() (Dave Jiang)
- Add Jonathan's review-by

Changes in v11->v12:
- None

Changes in v10->v11:
- cxl_error_detected() - Change handlers' scoped_guard() to guard() (Jonathan)
- cxl_error_detected() - Remove extra line (Shiju)
- Changes moved to core/ras.c (Terry)
- cxl_error_detected(), remove 'ue' and return with function call. (Jonathan)
- Remove extra space in documentation for PCI_ERS_RESULT_PANIC definition
- Move #include "pci.h from cxl.h to core.h (Terry)
- Remove unnecessary includes of cxl.h and core.h in mem.c (Terry)
---
 drivers/cxl/Kconfig                   |  6 +++---
 drivers/cxl/Makefile                  |  2 --
 drivers/cxl/core/Makefile             |  1 +
 drivers/cxl/core/core.h               |  9 +++++++++
 drivers/cxl/{pci.c => core/pci_drv.c} | 21 +++++++++++++--------
 drivers/cxl/core/port.c               |  3 +++
 tools/testing/cxl/Kbuild              |  1 +
 7 files changed, 30 insertions(+), 13 deletions(-)
 rename drivers/cxl/{pci.c => core/pci_drv.c} (99%)

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index ffe6ad981434..360c78fa7e97 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -20,7 +20,7 @@ menuconfig CXL_BUS
 if CXL_BUS
 
 config CXL_PCI
-	tristate "PCI manageability"
+	bool "PCI manageability"
 	default CXL_BUS
 	help
 	  The CXL specification defines a "CXL memory device" sub-class in the
@@ -29,12 +29,12 @@ config CXL_PCI
 	  memory to be mapped into the system address map (Host-managed Device
 	  Memory (HDM)).
 
-	  Say 'y/m' to enable a driver that will attach to CXL memory expander
+	  Say 'y' to enable a driver that will attach to CXL memory expander
 	  devices enumerated by the memory device class code for configuration
 	  and management primarily via the mailbox interface. See Chapter 2.3
 	  Type 3 CXL Device in the CXL 2.0 specification for more details.
 
-	  If unsure say 'm'.
+	  If unsure say 'y'.
 
 config CXL_MEM_RAW_COMMANDS
 	bool "RAW Command Interface for Memory Devices"
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index 2caa90fa4bf2..ff6add88b6ae 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -12,10 +12,8 @@ obj-$(CONFIG_CXL_PORT) += cxl_port.o
 obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
 obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
 obj-$(CONFIG_CXL_MEM) += cxl_mem.o
-obj-$(CONFIG_CXL_PCI) += cxl_pci.o
 
 cxl_port-y := port.o
 cxl_acpi-y := acpi.o
 cxl_pmem-y := pmem.o security.o
 cxl_mem-y := mem.o
-cxl_pci-y := pci.o
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index fa1d4aed28b9..2937d0ddcce2 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -21,3 +21,4 @@ cxl_core-$(CONFIG_CXL_FEATURES) += features.o
 cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
 cxl_core-$(CONFIG_CXL_RAS) += ras.o
 cxl_core-$(CONFIG_CXL_RCH_RAS) += ras_rch.o
+cxl_core-$(CONFIG_CXL_PCI) += pci_drv.o
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index e47ae7365ce0..61c6726744d7 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -195,4 +195,13 @@ int cxl_set_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
 		    u16 *return_code);
 #endif
 
+#ifdef CONFIG_CXL_PCI
+bool cxl_pci_drv_bound(struct pci_dev *pdev);
+int cxl_pci_driver_init(void);
+void cxl_pci_driver_exit(void);
+#else
+static inline bool cxl_pci_drv_bound(struct pci_dev *pdev) { return false; };
+static inline int cxl_pci_driver_init(void) { return 0; }
+static inline void cxl_pci_driver_exit(void) { }
+#endif
 #endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/core/pci_drv.c
similarity index 99%
rename from drivers/cxl/pci.c
rename to drivers/cxl/core/pci_drv.c
index bd95be1f3d5c..06f2fd993cb0 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/core/pci_drv.c
@@ -1131,6 +1131,17 @@ static struct pci_driver cxl_pci_driver = {
 	},
 };
 
+bool cxl_pci_drv_bound(struct pci_dev *pdev)
+{
+	device_lock_assert(&pdev->dev);
+
+	if (pdev->driver != &cxl_pci_driver)
+		pr_err_ratelimited("%s device not bound to CXL PCI driver\n",
+				   pci_name(pdev));
+
+	return (pdev->driver == &cxl_pci_driver);
+}
+
 #define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
 static void cxl_handle_cper_event(enum cxl_event_type ev_type,
 				  struct cxl_cper_event_rec *rec)
@@ -1177,7 +1188,7 @@ static void cxl_cper_work_fn(struct work_struct *work)
 }
 static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
 
-static int __init cxl_pci_driver_init(void)
+int __init cxl_pci_driver_init(void)
 {
 	int rc;
 
@@ -1192,15 +1203,9 @@ static int __init cxl_pci_driver_init(void)
 	return rc;
 }
 
-static void __exit cxl_pci_driver_exit(void)
+void cxl_pci_driver_exit(void)
 {
 	cxl_cper_unregister_work(&cxl_cper_work);
 	cancel_work_sync(&cxl_cper_work);
 	pci_unregister_driver(&cxl_pci_driver);
 }
-
-module_init(cxl_pci_driver_init);
-module_exit(cxl_pci_driver_exit);
-MODULE_DESCRIPTION("CXL: PCI manageability");
-MODULE_LICENSE("GPL v2");
-MODULE_IMPORT_NS("CXL");
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 48f6a1492544..b70e1b505b5c 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2507,6 +2507,8 @@ static __init int cxl_core_init(void)
 	if (rc)
 		goto err_ras;
 
+	cxl_pci_driver_init();
+
 	return 0;
 
 err_ras:
@@ -2522,6 +2524,7 @@ static __init int cxl_core_init(void)
 
 static void cxl_core_exit(void)
 {
+	cxl_pci_driver_exit();
 	cxl_ras_exit();
 	cxl_region_exit();
 	bus_unregister(&cxl_bus_type);
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 6905f8e710ab..d8b8272ef87b 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -65,6 +65,7 @@ cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o
 cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o
 cxl_core-$(CONFIG_CXL_RAS) += $(CXL_CORE_SRC)/ras.o
 cxl_core-$(CONFIG_CXL_RCH_RAS) += $(CXL_CORE_SRC)/ras_rch.o
+cxl_core-$(CONFIG_CXL_PCI) += $(CXL_CORE_SRC)/pci_drv.o
 cxl_core-y += config_check.o
 cxl_core-y += cxl_core_test.o
 cxl_core-y += cxl_core_exports.o
-- 
2.34.1
Re: [RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver
Posted by dan.j.williams@intel.com 3 weeks, 6 days ago
Terry Bowman wrote:
> CXL devices handle protocol errors via driver-specific callbacks rather
> than the generic pci_driver::err_handlers by default. The callbacks are
> implemented in the cxl_pci driver and are not part of struct pci_driver, so
> cxl_core must verify that a device is actually bound to the cxl_pci
> module's driver before invoking the callbacks (the device could be bound
> to another driver, e.g. VFIO).
> 
> However, cxl_core can not reference symbols in the cxl_pci module because
> it creates a circular dependency. This prevents cxl_core from checking the
> EP's bound driver and calling the callbacks.
> 
> To fix this, move drivers/cxl/pci.c into drivers/cxl/core/pci_drv.c and
> build it as part of the cxl_core module. Compile into cxl_core using
> CXL_PCI and CXL_CORE Kconfig dependencies. This removes the standalone
> cxl_pci module, consolidates the cxl_pci driver code into cxl_core, and
> eliminates the circular dependency so cxl_core can safely perform
> bound-driver checks and invoke the CXL PCI callbacks.
> 
> Introduce cxl_pci_drv_bound() to return boolean depending on if the PCI EP
> parameter is bound to a CXL driver instance. This will be used in future
> patch when dequeuing work from the kfifo.

I am thoroughly confused about what this patch is trying to do. The
whole point of a cxl_core is to separate the potential shared mechanics
across CXL device types from the specific special case of the CXL memory
class device.

This would be like saying that all PCI drivers need to be built-into the
PCI core to satisfy PCI error handling.

If the core needs to verify the driver before calling the handler then
the design is broken.

The design should accommodate a case of *only* a CXL accelerator driver
loading and not a CXL memory expander driver. Let me go take a look at
how cxl_pci_drv_bound() is used. There must be a simple misunderstanding
that we can resolve quickly.
Re: [RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver
Posted by Alison Schofield 1 month ago
On Tue, Nov 04, 2025 at 11:02:57AM -0600, Terry Bowman wrote:
> CXL devices handle protocol errors via driver-specific callbacks rather
> than the generic pci_driver::err_handlers by default. The callbacks are
> implemented in the cxl_pci driver and are not part of struct pci_driver, so
> cxl_core must verify that a device is actually bound to the cxl_pci
> module's driver before invoking the callbacks (the device could be bound
> to another driver, e.g. VFIO).
> 
> However, cxl_core can not reference symbols in the cxl_pci module because
> it creates a circular dependency. This prevents cxl_core from checking the
> EP's bound driver and calling the callbacks.
> 
> To fix this, move drivers/cxl/pci.c into drivers/cxl/core/pci_drv.c and
> build it as part of the cxl_core module. Compile into cxl_core using
> CXL_PCI and CXL_CORE Kconfig dependencies. This removes the standalone
> cxl_pci module, consolidates the cxl_pci driver code into cxl_core, and
> eliminates the circular dependency so cxl_core can safely perform
> bound-driver checks and invoke the CXL PCI callbacks.
> 
> Introduce cxl_pci_drv_bound() to return boolean depending on if the PCI EP
> parameter is bound to a CXL driver instance. This will be used in future
> patch when dequeuing work from the kfifo.

This one was troublesome in cxl-test, more circular dependencies.
I noticed you and GregP chatting about it, so I simply remove it from
the set for now (made all callsites true).

With it gone, the set builds cxl-test and passes the test suite.
I'll watch what happens with this one, and can take another look at
the cxl-test issues if they persist.

A bit below...

snip

> diff --git a/drivers/cxl/pci.c b/drivers/cxl/core/pci_drv.c
> similarity index 99%
> rename from drivers/cxl/pci.c
> rename to drivers/cxl/core/pci_drv.c
> index bd95be1f3d5c..06f2fd993cb0 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/core/pci_drv.c

Needs:
+#include "core.h"

Compiler is warning: no previous prototypes.

snip
Re: [RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver
Posted by Alison Schofield 1 month ago
On Tue, Nov 11, 2025 at 12:33:53AM -0800, Alison Schofield wrote:
> On Tue, Nov 04, 2025 at 11:02:57AM -0600, Terry Bowman wrote:
> > CXL devices handle protocol errors via driver-specific callbacks rather
> > than the generic pci_driver::err_handlers by default. The callbacks are
> > implemented in the cxl_pci driver and are not part of struct pci_driver, so
> > cxl_core must verify that a device is actually bound to the cxl_pci
> > module's driver before invoking the callbacks (the device could be bound
> > to another driver, e.g. VFIO).
> > 
> > However, cxl_core can not reference symbols in the cxl_pci module because
> > it creates a circular dependency. This prevents cxl_core from checking the
> > EP's bound driver and calling the callbacks.
> > 
> > To fix this, move drivers/cxl/pci.c into drivers/cxl/core/pci_drv.c and
> > build it as part of the cxl_core module. Compile into cxl_core using
> > CXL_PCI and CXL_CORE Kconfig dependencies. This removes the standalone
> > cxl_pci module, consolidates the cxl_pci driver code into cxl_core, and
> > eliminates the circular dependency so cxl_core can safely perform
> > bound-driver checks and invoke the CXL PCI callbacks.
> > 
> > Introduce cxl_pci_drv_bound() to return boolean depending on if the PCI EP
> > parameter is bound to a CXL driver instance. This will be used in future
> > patch when dequeuing work from the kfifo.
> 
> This one was troublesome in cxl-test, more circular dependencies.
> I noticed you and GregP chatting about it, so I simply remove it from
> the set for now (made all callsites true).
> 
> With it gone, the set builds cxl-test and passes the test suite.
> I'll watch what happens with this one, and can take another look at
> the cxl-test issues if they persist.

Hi Terry -

I took another look, suspecting the circle issue started with the
move of pci.c into the core, and not necessarily your new additions.
There are two functions that are wrapped in cxl-test and now with
this move are being called from the core and creating the 'circle':

cxl_await_media_ready()
cxl_rcd_component_reg_phys()

Both those need the 'restrict' method, like for Patch 14.

Once that is resolved, the new function cxl_pci_drv_bound()
seems like it needs mocking and will require the same treatment.

Suggest doing it in separate patches. First patch does the move
and the cxl-test work.  Then a second patch adds the new function
and it's cxl-test support.

--Alison


> 
> A bit below...
> 
> snip
> 
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/core/pci_drv.c
> > similarity index 99%
> > rename from drivers/cxl/pci.c
> > rename to drivers/cxl/core/pci_drv.c
> > index bd95be1f3d5c..06f2fd993cb0 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/core/pci_drv.c
> 
> Needs:
> +#include "core.h"
> 
> Compiler is warning: no previous prototypes.
> 
> snip
>
Re: [RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver
Posted by Bowman, Terry 1 month ago

On 11/13/2025 3:42 PM, Alison Schofield wrote:
> On Tue, Nov 11, 2025 at 12:33:53AM -0800, Alison Schofield wrote:
>> On Tue, Nov 04, 2025 at 11:02:57AM -0600, Terry Bowman wrote:
>>> CXL devices handle protocol errors via driver-specific callbacks rather
>>> than the generic pci_driver::err_handlers by default. The callbacks are
>>> implemented in the cxl_pci driver and are not part of struct pci_driver, so
>>> cxl_core must verify that a device is actually bound to the cxl_pci
>>> module's driver before invoking the callbacks (the device could be bound
>>> to another driver, e.g. VFIO).
>>>
>>> However, cxl_core can not reference symbols in the cxl_pci module because
>>> it creates a circular dependency. This prevents cxl_core from checking the
>>> EP's bound driver and calling the callbacks.
>>>
>>> To fix this, move drivers/cxl/pci.c into drivers/cxl/core/pci_drv.c and
>>> build it as part of the cxl_core module. Compile into cxl_core using
>>> CXL_PCI and CXL_CORE Kconfig dependencies. This removes the standalone
>>> cxl_pci module, consolidates the cxl_pci driver code into cxl_core, and
>>> eliminates the circular dependency so cxl_core can safely perform
>>> bound-driver checks and invoke the CXL PCI callbacks.
>>>
>>> Introduce cxl_pci_drv_bound() to return boolean depending on if the PCI EP
>>> parameter is bound to a CXL driver instance. This will be used in future
>>> patch when dequeuing work from the kfifo.
>> This one was troublesome in cxl-test, more circular dependencies.
>> I noticed you and GregP chatting about it, so I simply remove it from
>> the set for now (made all callsites true).
>>
>> With it gone, the set builds cxl-test and passes the test suite.
>> I'll watch what happens with this one, and can take another look at
>> the cxl-test issues if they persist.
> Hi Terry -
>
> I took another look, suspecting the circle issue started with the
> move of pci.c into the core, and not necessarily your new additions.
> There are two functions that are wrapped in cxl-test and now with
> this move are being called from the core and creating the 'circle':
>
> cxl_await_media_ready()
> cxl_rcd_component_reg_phys()
>
> Both those need the 'restrict' method, like for Patch 14.
>
> Once that is resolved, the new function cxl_pci_drv_bound()
> seems like it needs mocking and will require the same treatment.
>
> Suggest doing it in separate patches. First patch does the move
> and the cxl-test work.  Then a second patch adds the new function
> and it's cxl-test support.
>
> --Alison
>

Hi Alison,

Thanks for finding the issue. I'll start on the fix using the changes 
you described.

-Terry


>> A bit below...
>>
>> snip
>>
>>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/core/pci_drv.c
>>> similarity index 99%
>>> rename from drivers/cxl/pci.c
>>> rename to drivers/cxl/core/pci_drv.c
>>> index bd95be1f3d5c..06f2fd993cb0 100644
>>> --- a/drivers/cxl/pci.c
>>> +++ b/drivers/cxl/core/pci_drv.c
>> Needs:
>> +#include "core.h"
>>
>> Compiler is warning: no previous prototypes.
>>
>> snip
>>

Re: [RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver
Posted by Gregory Price 1 month, 1 week ago
On Tue, Nov 04, 2025 at 11:02:57AM -0600, Terry Bowman wrote:
> CXL devices handle protocol errors via driver-specific callbacks rather
> than the generic pci_driver::err_handlers by default. The callbacks are
> implemented in the cxl_pci driver and are not part of struct pci_driver, so
> cxl_core must verify that a device is actually bound to the cxl_pci
> module's driver before invoking the callbacks (the device could be bound
> to another driver, e.g. VFIO).
> 
> However, cxl_core can not reference symbols in the cxl_pci module because
> it creates a circular dependency. This prevents cxl_core from checking the
> EP's bound driver and calling the callbacks.
> 
> To fix this, move drivers/cxl/pci.c into drivers/cxl/core/pci_drv.c and
> build it as part of the cxl_core module. Compile into cxl_core using
> CXL_PCI and CXL_CORE Kconfig dependencies. This removes the standalone
> cxl_pci module, consolidates the cxl_pci driver code into cxl_core, and
> eliminates the circular dependency so cxl_core can safely perform
> bound-driver checks and invoke the CXL PCI callbacks.
> 
> Introduce cxl_pci_drv_bound() to return boolean depending on if the PCI EP
> parameter is bound to a CXL driver instance. This will be used in future
> patch when dequeuing work from the kfifo.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> 
> ---

This commit causes my QEMU basic expander setup and a real device setup
to fail to probe the cxl_core driver.

[    2.697094] cxl_core 0000:0d:00.0: BAR 0 [mem 0xfe800000-0xfe80ffff 64bit]: not claimed; can't enable device
[    2.697098] cxl_core 0000:0d:00.0: probe with driver cxl_core failed with error -22

Probe order issue when CXL drivers are built-in maybe?

~Gregory
Re: [RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver
Posted by Gregory Price 1 month, 1 week ago
On Wed, Nov 05, 2025 at 12:51:04PM -0500, Gregory Price wrote:
> On Tue, Nov 04, 2025 at 11:02:57AM -0600, Terry Bowman wrote:
> > CXL devices handle protocol errors via driver-specific callbacks rather
> > than the generic pci_driver::err_handlers by default. The callbacks are
> > implemented in the cxl_pci driver and are not part of struct pci_driver, so
> > cxl_core must verify that a device is actually bound to the cxl_pci
> > module's driver before invoking the callbacks (the device could be bound
> > to another driver, e.g. VFIO).
> > 
> > However, cxl_core can not reference symbols in the cxl_pci module because
> > it creates a circular dependency. This prevents cxl_core from checking the
> > EP's bound driver and calling the callbacks.
> > 
> > To fix this, move drivers/cxl/pci.c into drivers/cxl/core/pci_drv.c and
> > build it as part of the cxl_core module. Compile into cxl_core using
> > CXL_PCI and CXL_CORE Kconfig dependencies. This removes the standalone
> > cxl_pci module, consolidates the cxl_pci driver code into cxl_core, and
> > eliminates the circular dependency so cxl_core can safely perform
> > bound-driver checks and invoke the CXL PCI callbacks.
> > 
> > Introduce cxl_pci_drv_bound() to return boolean depending on if the PCI EP
> > parameter is bound to a CXL driver instance. This will be used in future
> > patch when dequeuing work from the kfifo.
> > 
> > Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> > 
> > ---
> 
> This commit causes my QEMU basic expander setup and a real device setup
> to fail to probe the cxl_core driver.
> 
> [    2.697094] cxl_core 0000:0d:00.0: BAR 0 [mem 0xfe800000-0xfe80ffff 64bit]: not claimed; can't enable device
> [    2.697098] cxl_core 0000:0d:00.0: probe with driver cxl_core failed with error -22
> 
> Probe order issue when CXL drivers are built-in maybe?
> 

I've narrowed it down to:

Works
-----
CONFIG_CXL_BUS=m
CONFIG_CXL_MEM=m

Fails
-----
CONFIG_CXL_BUS=y
CONFIG_CXL_MEM=y
or BUS ^ MEM

this commit moves pci -> pci_drv.o and moves it ahead of cxl_mem into
cxl_core into core, but note the comment in the Makefile:

# Order is important here for the built-in case:
# - 'core' first for fundamental init
# - 'port' before platform root drivers like 'acpi' so that CXL-root ports
#   are immediately enabled
# - 'mem' and 'pmem' before endpoint drivers so that memdevs are
#   immediately enabled
# - 'pci' last, also mirrors the hardware enumeration hierarchy

~Gregory
Re: [RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver
Posted by Gregory Price 1 month, 1 week ago
On Wed, Nov 05, 2025 at 02:03:31PM -0500, Gregory Price wrote:
> On Wed, Nov 05, 2025 at 12:51:04PM -0500, Gregory Price wrote:
> > 
> > [    2.697094] cxl_core 0000:0d:00.0: BAR 0 [mem 0xfe800000-0xfe80ffff 64bit]: not claimed; can't enable device
> > [    2.697098] cxl_core 0000:0d:00.0: probe with driver cxl_core failed with error -22
> > 
> > Probe order issue when CXL drivers are built-in maybe?
> > 
> 

moving it back but leaving the function seemed to work for me, i don't
know what the implication of this is though (i.e. it's unclear to me
why you moved it from point a to point b in the first place).

(only tested this on QEMU)
---

diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index ff6add88b6ae..2caa90fa4bf2 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -12,8 +12,10 @@ obj-$(CONFIG_CXL_PORT) += cxl_port.o
 obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
 obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
 obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PCI) += cxl_pci.o

 cxl_port-y := port.o
 cxl_acpi-y := acpi.o
 cxl_pmem-y := pmem.o security.o
 cxl_mem-y := mem.o
+cxl_pci-y := pci.o
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 2937d0ddcce2..fa1d4aed28b9 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -21,4 +21,3 @@ cxl_core-$(CONFIG_CXL_FEATURES) += features.o
 cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
 cxl_core-$(CONFIG_CXL_RAS) += ras.o
 cxl_core-$(CONFIG_CXL_RCH_RAS) += ras_rch.o
-cxl_core-$(CONFIG_CXL_PCI) += pci_drv.o
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index a7a0838c8f23..7c287b4fa699 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -223,13 +223,4 @@ int cxl_set_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
 		    u16 *return_code);
 #endif

-#ifdef CONFIG_CXL_PCI
-bool cxl_pci_drv_bound(struct pci_dev *pdev);
-int cxl_pci_driver_init(void);
-void cxl_pci_driver_exit(void);
-#else
-static inline bool cxl_pci_drv_bound(struct pci_dev *pdev) { return false; };
-static inline int cxl_pci_driver_init(void) { return 0; }
-static inline void cxl_pci_driver_exit(void) { }
-#endif
 #endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index d19ebf052d76..ca02ad58fc57 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2520,8 +2520,6 @@ static __init int cxl_core_init(void)
 	if (rc)
 		goto err_ras;

-	cxl_pci_driver_init();
-
 	return 0;

 err_ras:
@@ -2537,7 +2535,6 @@ static __init int cxl_core_init(void)

 static void cxl_core_exit(void)
 {
-	cxl_pci_driver_exit();
 	cxl_ras_exit();
 	cxl_region_exit();
 	bus_unregister(&cxl_bus_type);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 97e6c187e048..a2660d64c6eb 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -941,4 +941,10 @@ u16 cxl_gpf_get_dvsec(struct device *dev);
 #define devm_cxl_switch_port_decoders_setup DECLARE_TESTABLE(devm_cxl_switch_port_decoders_setup)
 #endif

+#ifdef CONFIG_CXL_PCI
+bool cxl_pci_drv_bound(struct pci_dev *pdev);
+#else
+static inline bool cxl_pci_drv_bound(struct pci_dev *pdev) { return false; };
+#endif
+
 #endif /* __CXL_H__ */
diff --git a/drivers/cxl/core/pci_drv.c b/drivers/cxl/pci.c
similarity index 99%
rename from drivers/cxl/core/pci_drv.c
rename to drivers/cxl/pci.c
index bc3c959f7eb6..e6d741e15ac2 100644
--- a/drivers/cxl/core/pci_drv.c
+++ b/drivers/cxl/pci.c
@@ -1189,7 +1189,7 @@ static void cxl_cper_work_fn(struct work_struct *work)
 }
 static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);

-int __init cxl_pci_driver_init(void)
+static int __init cxl_pci_driver_init(void)
 {
 	int rc;

@@ -1204,9 +1204,15 @@ int __init cxl_pci_driver_init(void)
 	return rc;
 }

-void cxl_pci_driver_exit(void)
+static void cxl_pci_driver_exit(void)
 {
 	cxl_cper_unregister_work(&cxl_cper_work);
 	cancel_work_sync(&cxl_cper_work);
 	pci_unregister_driver(&cxl_pci_driver);
 }
+
+module_init(cxl_pci_driver_init);
+module_exit(cxl_pci_driver_exit);
+MODULE_DESCRIPTION("CXL: PCI manageability");
+MODULE_LICENSE("GPL v2");
+MODULE_IMPORT_NS("CXL");
Re: [RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver
Posted by Bowman, Terry 1 month, 1 week ago

On 11/5/2025 4:26 PM, Gregory Price wrote:
> On Wed, Nov 05, 2025 at 02:03:31PM -0500, Gregory Price wrote:
>> On Wed, Nov 05, 2025 at 12:51:04PM -0500, Gregory Price wrote:
>>> [    2.697094] cxl_core 0000:0d:00.0: BAR 0 [mem 0xfe800000-0xfe80ffff 64bit]: not claimed; can't enable device
>>> [    2.697098] cxl_core 0000:0d:00.0: probe with driver cxl_core failed with error -22
>>>
>>> Probe order issue when CXL drivers are built-in maybe?
>>>
> moving it back but leaving the function seemed to work for me, i don't
> know what the implication of this is though (i.e. it's unclear to me
> why you moved it from point a to point b in the first place).
>
> (only tested this on QEMU)
Thanks for pointing this out.

I expect your changes will not work when using loadable modules (m instead of y). 

It appears cxl_pci_probe() is being called earlier due to the changes. The call stack is:
cxl_pci_probe()     
  pcim_enable_device(pdev);     <---- Silent exit here because cxl_pci_probe() fails below 
    pci_enable_device(struct pci_dev *dev)
      pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
        cxl_pci_probe()         <---- Returns failure due to resource reservation failure

Brief testing is showing the following works:
@@ -922,7 +924,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
        rc = pcim_enable_device(pdev);
        if (rc)
-               return rc;
+               return -EPROBE_DEFER;
+


Terry

> ---
>
> diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> index ff6add88b6ae..2caa90fa4bf2 100644
> --- a/drivers/cxl/Makefile
> +++ b/drivers/cxl/Makefile
> @@ -12,8 +12,10 @@ obj-$(CONFIG_CXL_PORT) += cxl_port.o
>  obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
>  obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
>  obj-$(CONFIG_CXL_MEM) += cxl_mem.o
> +obj-$(CONFIG_CXL_PCI) += cxl_pci.o
>
>  cxl_port-y := port.o
>  cxl_acpi-y := acpi.o
>  cxl_pmem-y := pmem.o security.o
>  cxl_mem-y := mem.o
> +cxl_pci-y := pci.o
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 2937d0ddcce2..fa1d4aed28b9 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -21,4 +21,3 @@ cxl_core-$(CONFIG_CXL_FEATURES) += features.o
>  cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
>  cxl_core-$(CONFIG_CXL_RAS) += ras.o
>  cxl_core-$(CONFIG_CXL_RCH_RAS) += ras_rch.o
> -cxl_core-$(CONFIG_CXL_PCI) += pci_drv.o
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index a7a0838c8f23..7c287b4fa699 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -223,13 +223,4 @@ int cxl_set_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
>  		    u16 *return_code);
>  #endif
>
> -#ifdef CONFIG_CXL_PCI
> -bool cxl_pci_drv_bound(struct pci_dev *pdev);
> -int cxl_pci_driver_init(void);
> -void cxl_pci_driver_exit(void);
> -#else
> -static inline bool cxl_pci_drv_bound(struct pci_dev *pdev) { return false; };
> -static inline int cxl_pci_driver_init(void) { return 0; }
> -static inline void cxl_pci_driver_exit(void) { }
> -#endif
>  #endif /* __CXL_CORE_H__ */
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index d19ebf052d76..ca02ad58fc57 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -2520,8 +2520,6 @@ static __init int cxl_core_init(void)
>  	if (rc)
>  		goto err_ras;
>
> -	cxl_pci_driver_init();
> -
>  	return 0;
>
>  err_ras:
> @@ -2537,7 +2535,6 @@ static __init int cxl_core_init(void)
>
>  static void cxl_core_exit(void)
>  {
> -	cxl_pci_driver_exit();
>  	cxl_ras_exit();
>  	cxl_region_exit();
>  	bus_unregister(&cxl_bus_type);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 97e6c187e048..a2660d64c6eb 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -941,4 +941,10 @@ u16 cxl_gpf_get_dvsec(struct device *dev);
>  #define devm_cxl_switch_port_decoders_setup DECLARE_TESTABLE(devm_cxl_switch_port_decoders_setup)
>  #endif
>
> +#ifdef CONFIG_CXL_PCI
> +bool cxl_pci_drv_bound(struct pci_dev *pdev);
> +#else
> +static inline bool cxl_pci_drv_bound(struct pci_dev *pdev) { return false; };
> +#endif
> +
>  #endif /* __CXL_H__ */
> diff --git a/drivers/cxl/core/pci_drv.c b/drivers/cxl/pci.c
> similarity index 99%
> rename from drivers/cxl/core/pci_drv.c
> rename to drivers/cxl/pci.c
> index bc3c959f7eb6..e6d741e15ac2 100644
> --- a/drivers/cxl/core/pci_drv.c
> +++ b/drivers/cxl/pci.c
> @@ -1189,7 +1189,7 @@ static void cxl_cper_work_fn(struct work_struct *work)
>  }
>  static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
>
> -int __init cxl_pci_driver_init(void)
> +static int __init cxl_pci_driver_init(void)
>  {
>  	int rc;
>
> @@ -1204,9 +1204,15 @@ int __init cxl_pci_driver_init(void)
>  	return rc;
>  }
>
> -void cxl_pci_driver_exit(void)
> +static void cxl_pci_driver_exit(void)
>  {
>  	cxl_cper_unregister_work(&cxl_cper_work);
>  	cancel_work_sync(&cxl_cper_work);
>  	pci_unregister_driver(&cxl_pci_driver);
>  }
> +
> +module_init(cxl_pci_driver_init);
> +module_exit(cxl_pci_driver_exit);
> +MODULE_DESCRIPTION("CXL: PCI manageability");
> +MODULE_LICENSE("GPL v2");
> +MODULE_IMPORT_NS("CXL");

Re: [RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver
Posted by Gregory Price 1 month, 1 week ago
On Wed, Nov 05, 2025 at 05:26:01PM -0500, Gregory Price wrote:
> On Wed, Nov 05, 2025 at 02:03:31PM -0500, Gregory Price wrote:
> > On Wed, Nov 05, 2025 at 12:51:04PM -0500, Gregory Price wrote:
> > > 
> > > [    2.697094] cxl_core 0000:0d:00.0: BAR 0 [mem 0xfe800000-0xfe80ffff 64bit]: not claimed; can't enable device
> > > [    2.697098] cxl_core 0000:0d:00.0: probe with driver cxl_core failed with error -22
> > > 
> > > Probe order issue when CXL drivers are built-in maybe?
> > > 
> > 
> 
> moving it back but leaving the function seemed to work for me, i don't
> know what the implication of this is though (i.e. it's unclear to me
> why you moved it from point a to point b in the first place).
> 
> (only tested this on QEMU)

also tested on Zen5 systems and others.  Seems stable to me.

~Gregory