When I run CXL mock testing on the next branch, I usually hit the following
call trace.
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000092: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000490-0x0000000000000497]
CPU: 3 UID: 0 PID: 42 Comm: kworker/u16:1 Tainted: G O J 6.19.0-rc5-cxl+ #4 PREEMPT(voluntary)
Tainted: [O]=OOT_MODULE, [J]=FWCTL
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
Workqueue: async async_run_entry_fn
RIP: 0010:cxl_dpa_to_region+0x105/0x1f0 [cxl_core]
Call Trace:
<TASK>
cxl_event_trace_record+0xd1/0xa70 [cxl_core]
__cxl_event_trace_record+0x12f/0x1e0 [cxl_core]
cxl_mem_get_records_log+0x261/0x500 [cxl_core]
cxl_mem_get_event_records+0x7c/0xc0 [cxl_core]
cxl_mock_mem_probe+0xd38/0x1c60 [cxl_mock_mem]
platform_probe+0x9d/0x130
really_probe+0x1c8/0x960
driver_probe_device+0x45/0x120
__device_attach_driver+0x15d/0x280
bus_for_each_drv+0x100/0x180
__device_attach_async_helper+0x199/0x250
async_run_entry_fn+0x95/0x430
process_one_work+0x7db/0x1940
After detailed debugging, I found that a failure to add a dport leads to
the problem.
What I observed is that when two memdevs try to enumerate the same port,
the first memdev is responsible for creating the port and binding it to
the cxl port driver. However, there is a small window between the point
where the new port becomes visible (after being added to the device list
of the cxl bus) and the point where it is bound to the port driver. During
this window, the second memdev may discover the port and acquire its lock
while attempting to add its dport, which blocks bus_probe_device() inside
device_add(). As a result, the second memdev observes the port as
unbound and fails to add its dport. The second memdev->endpoint is then
never updated, which triggers the trace above.
The fix for this race is to hold the host lock of the target port during
dport addition, which prevents access to the port before driver binding
has completed.
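The window described above can be replayed in userspace. The following is
only a sketch, not kernel code: struct port_model, first_memdev(),
second_memdev() and enumerate_port() are hypothetical names, pthread
mutexes stand in for the host and port device locks, and a semaphore
artificially forces the unlucky interleaving. It also assumes that port
creation already runs with the host lock held, which is what makes taking
the same lock on the dport path sufficient.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for one port and its host. */
struct port_model {
	pthread_mutex_t host_lock;	/* models the host device lock */
	pthread_mutex_t port_lock;	/* models the port device lock */
	sem_t visible;			/* posted once the port is discoverable */
	sem_t port_lock_taken;		/* posted once the second memdev holds port_lock */
	bool hold_host_lock;		/* true: second memdev takes host_lock first */
	bool bound;			/* set when the "driver" binds to the port */
	bool dport_added;		/* outcome seen by the second memdev */
};

/* First memdev: makes the port visible, then binds it. */
static void *first_memdev(void *arg)
{
	struct port_model *m = arg;

	pthread_mutex_lock(&m->host_lock);	/* assumed: creation runs under the host lock */
	sem_post(&m->visible);			/* port is now on the "bus list" */
	if (!m->hold_host_lock)
		sem_wait(&m->port_lock_taken);	/* force the unlucky interleaving */
	pthread_mutex_lock(&m->port_lock);	/* binding needs the port lock */
	m->bound = true;
	pthread_mutex_unlock(&m->port_lock);
	pthread_mutex_unlock(&m->host_lock);
	return NULL;
}

/* Second memdev: finds the port and tries to add its dport. */
static void *second_memdev(void *arg)
{
	struct port_model *m = arg;

	sem_wait(&m->visible);
	if (m->hold_host_lock)
		pthread_mutex_lock(&m->host_lock);	/* waits until create + bind finish */
	pthread_mutex_lock(&m->port_lock);
	sem_post(&m->port_lock_taken);
	m->dport_added = m->bound;		/* adding a dport requires a bound port */
	pthread_mutex_unlock(&m->port_lock);
	if (m->hold_host_lock)
		pthread_mutex_unlock(&m->host_lock);
	return NULL;
}

/* Returns true if the second memdev managed to add its dport. */
static bool enumerate_port(bool hold_host_lock)
{
	struct port_model m = { .hold_host_lock = hold_host_lock };
	pthread_t first, second;
	int rc;

	pthread_mutex_init(&m.host_lock, NULL);
	pthread_mutex_init(&m.port_lock, NULL);
	sem_init(&m.visible, 0, 0);
	sem_init(&m.port_lock_taken, 0, 0);

	rc = pthread_create(&first, NULL, first_memdev, &m);
	assert(rc == 0);
	rc = pthread_create(&second, NULL, second_memdev, &m);
	assert(rc == 0);
	(void)rc;
	pthread_join(first, NULL);
	pthread_join(second, NULL);

	sem_destroy(&m.visible);
	sem_destroy(&m.port_lock_taken);
	pthread_mutex_destroy(&m.port_lock);
	pthread_mutex_destroy(&m.host_lock);
	return m.dport_added;
}
```

In this model, enumerate_port(false) replays the failure (the second
memdev sees an unbound port and returns false), while enumerate_port(true)
serializes the second memdev behind creation and binding and returns true.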
base-commit: 63fbf275fa9f18f7020fb8acf54fa107e51d0f23 cxl/next
Changes from V2:
- Split to_port_host() implementation to a lead-in patch. (Dan)
- Use to_port_host() instead of open coding it. (Dan)
- Rename to_port_host() to port_to_host() to align with dport_to_host().
Changes from V1:
- Remove the patch of initializing memdev->endpoint to NULL. (Dan)
- Fix typos. (Jonathan)
- Introduce a helper called to_port_host().
- unregister_port() cleanup.
Signed-off-by: Li Ming <ming.li@zohomail.com>
---
Li Ming (3):
cxl/port: Introduce port_to_host() helper
cxl/port: Hold port host lock during dport adding.
cxl/port: Use port_to_host() to get port host
drivers/cxl/core/core.h | 18 +++++++++++++++++
drivers/cxl/core/port.c | 52 +++++++++++++++++--------------------------------
2 files changed, 36 insertions(+), 34 deletions(-)
---
base-commit: 63fbf275fa9f18f7020fb8acf54fa107e51d0f23
change-id: 20260208-fix-port-enumeration-failure-34e1f4953f02
Best regards,
--
Li Ming <ming.li@zohomail.com>
On 2/10/26 4:46 AM, Li Ming wrote:
> I ran CXL mock testing with next branch, I usually hit the following
> call trace.

Applied series to cxl/fixes. I squashed patch 1 and 3 together.

0066688dbcdc cxl/port: Hold port host lock during dport adding.
822655e6751d cxl/port: Introduce port_to_host() helper

[..]
On Tue, Feb 10, 2026 at 07:46:55PM +0800, Li Ming wrote:
> I ran CXL mock testing with next branch, I usually hit the following
> call trace.

Hi Ming,

Without a crisp reproducer, I just beat this up and it ran beyond what it
had 'before this patch'. It survived 100 --suite=cxl and 500 module reloads.

For the series:
Tested-by: Alison Schofield <alison.schofield@intel.com>

snip
Li Ming wrote:
> I ran CXL mock testing with next branch, I usually hit the following
> call trace.
[..]
> Changes from V2:
> - Split to_port_host() implementation to a lead-in patch. (Dan)
> - Use to_port_host() instead of open coded. (Dan)
> - Rename to_port_host() to port_to_host() to align with dport_to_host().
>
> Changes from V1:
> - Remove the patch of initializing memdev->endpoint to NULL. (Dan)
> - Fixes typo errors. (Jonathan)
> - Introduce a helper called to_port_host().
> - unregister_port() cleanup.
>
> Signed-off-by: Li Ming <ming.li@zohomail.com>
> ---
> Li Ming (3):
> cxl/port: Introduce port_to_host() helper
> cxl/port: Hold port host lock during dport adding.
> cxl/port: Use port_to_host() to get port host
>
> drivers/cxl/core/core.h | 18 +++++++++++++++++
> drivers/cxl/core/port.c | 52 +++++++++++++++++--------------------------------
> 2 files changed, 36 insertions(+), 34 deletions(-)
> ---
> base-commit: 63fbf275fa9f18f7020fb8acf54fa107e51d0f23
> change-id: 20260208-fix-port-enumeration-failure-34e1f4953f02

For the series:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

...but I was expecting patch1 and patch3 to be squashed. Dave can do that
on applying, or leave it as is.