[PATCH v3 0/3] Fix port enumeration failure

Li Ming posted 3 patches 1 month, 2 weeks ago
drivers/cxl/core/core.h | 18 +++++++++++++++++
drivers/cxl/core/port.c | 52 +++++++++++++++++--------------------------------
2 files changed, 36 insertions(+), 34 deletions(-)
Posted by Li Ming 1 month, 2 weeks ago
When running the CXL mock tests on the cxl/next branch, I frequently hit
the following call trace:

 Oops: general protection fault, probably for non-canonical address 0xdffffc0000000092: 0000 [#1] SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000490-0x0000000000000497]
 CPU: 3 UID: 0 PID: 42 Comm: kworker/u16:1 Tainted: G           O      J 6.19.0-rc5-cxl+ #4 PREEMPT(voluntary)
 Tainted: [O]=OOT_MODULE, [J]=FWCTL
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
 Workqueue: async async_run_entry_fn
 RIP: 0010:cxl_dpa_to_region+0x105/0x1f0 [cxl_core]
 Call Trace:
  <TASK>
  cxl_event_trace_record+0xd1/0xa70 [cxl_core]
  __cxl_event_trace_record+0x12f/0x1e0 [cxl_core]
  cxl_mem_get_records_log+0x261/0x500 [cxl_core]
  cxl_mem_get_event_records+0x7c/0xc0 [cxl_core]
  cxl_mock_mem_probe+0xd38/0x1c60 [cxl_mock_mem]
  platform_probe+0x9d/0x130
  really_probe+0x1c8/0x960
  driver_probe_device+0x45/0x120
  __device_attach_driver+0x15d/0x280
  bus_for_each_drv+0x100/0x180
  __device_attach_async_helper+0x199/0x250
  async_run_entry_fn+0x95/0x430
  process_one_work+0x7db/0x1940

After detailed debugging, I found that the problem is caused by a failure
to add a dport.
What I observed is that when two memdevs try to enumerate the same port,
the first memdev is responsible for creating the port and binding it to
the CXL port driver. However, there is a small window between the point
where the new port becomes visible (after being added to the device list
of the CXL bus) and the point where it is bound to the port driver. During
this window, the second memdev may discover the port and acquire its lock
while attempting to add its dport, which blocks bus_probe_device() inside
device_add(). As a result, the second memdev observes the port as unbound
and fails to add its dport. Because of that, the second memdev->endpoint
is never updated, which triggers the call trace above.

This series fixes the race by holding the host lock of the target port
while adding a dport, which prevents the port from being accessed before
its driver binding has completed.

base-commit: 63fbf275fa9f18f7020fb8acf54fa107e51d0f23 cxl/next

Changes from V2:
- Split the to_port_host() implementation into a lead-in patch. (Dan)
- Use to_port_host() instead of open-coding it. (Dan)
- Rename to_port_host() to port_to_host() to align with dport_to_host().

Changes from V1:
- Remove the patch that initialized memdev->endpoint to NULL. (Dan)
- Fix typos. (Jonathan)
- Introduce a helper called to_port_host().
- Clean up unregister_port().

Signed-off-by: Li Ming <ming.li@zohomail.com>
---
Li Ming (3):
      cxl/port: Introduce port_to_host() helper
      cxl/port: Hold port host lock during dport adding.
      cxl/port: Use port_to_host() to get port host

 drivers/cxl/core/core.h | 18 +++++++++++++++++
 drivers/cxl/core/port.c | 52 +++++++++++++++++--------------------------------
 2 files changed, 36 insertions(+), 34 deletions(-)
---
base-commit: 63fbf275fa9f18f7020fb8acf54fa107e51d0f23
change-id: 20260208-fix-port-enumeration-failure-34e1f4953f02

Best regards,
-- 
Li Ming <ming.li@zohomail.com>
Re: [PATCH v3 0/3] Fix port enumeration failure
Posted by Dave Jiang 1 month, 1 week ago

On 2/10/26 4:46 AM, Li Ming wrote:
> I ran CXL mock testing with next branch, I usually hit the following
> call trace.

Applied the series to cxl/fixes. I squashed patches 1 and 3 together:
0066688dbcdc cxl/port: Hold port host lock during dport adding.
822655e6751d cxl/port: Introduce port_to_host() helper

Re: [PATCH v3 0/3] Fix port enumeration failure
Posted by Alison Schofield 1 month, 2 weeks ago
On Tue, Feb 10, 2026 at 07:46:55PM +0800, Li Ming wrote:
> I ran CXL mock testing with next branch, I usually hit the following
> call trace.

Hi Ming,

Without a crisp reproducer, I just beat on this, and it ran longer than it
had before this patch. It survived 100 runs of --suite=cxl and 500 module
reloads.

For the series:
Tested-by: Alison Schofield <alison.schofield@intel.com>

snip
Re: [PATCH v3 0/3] Fix port enumeration failure
Posted by dan.j.williams@intel.com 1 month, 2 weeks ago
Li Ming wrote:
> I ran CXL mock testing with next branch, I usually hit the following
> call trace.
[..]
> Changes from V2:
> - Split to_port_host() implementation to a lead-in patch. (Dan)
> - Use to_port_host() instead of open coded. (Dan)
> - Rename to_port_host() to port_to_host() to align with dport_to_host().
> 
> Changes from V1:
> - Remove the patch of initializing memdev->endpoint to NULL. (Dan)
> - Fixes typo errors. (Jonathan)
> - Introduce a helper called to_port_host().
> - unregister_port() cleanup.
> 
> Signed-off-by: Li Ming <ming.li@zohomail.com>
> ---
> Li Ming (3):
>       cxl/port: Introduce port_to_host() helper
>       cxl/port: Hold port host lock during dport adding.
>       cxl/port: Use port_to_host() to get port host
> 
>  drivers/cxl/core/core.h | 18 +++++++++++++++++
>  drivers/cxl/core/port.c | 52 +++++++++++++++++--------------------------------
>  2 files changed, 36 insertions(+), 34 deletions(-)
> ---
> base-commit: 63fbf275fa9f18f7020fb8acf54fa107e51d0f23
> change-id: 20260208-fix-port-enumeration-failure-34e1f4953f02

For the series:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

...but I was expecting patches 1 and 3 to be squashed. Dave can do that
when applying, or leave it as is.