When running CXL mock testing on the next branch, I frequently hit the
following call trace:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000092: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000490-0x0000000000000497]
CPU: 3 UID: 0 PID: 42 Comm: kworker/u16:1 Tainted: G O J 6.19.0-rc5-cxl+ #4 PREEMPT(voluntary)
Tainted: [O]=OOT_MODULE, [J]=FWCTL
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
Workqueue: async async_run_entry_fn
RIP: 0010:cxl_dpa_to_region+0x105/0x1f0 [cxl_core]
Call Trace:
<TASK>
cxl_event_trace_record+0xd1/0xa70 [cxl_core]
__cxl_event_trace_record+0x12f/0x1e0 [cxl_core]
cxl_mem_get_records_log+0x261/0x500 [cxl_core]
cxl_mem_get_event_records+0x7c/0xc0 [cxl_core]
cxl_mock_mem_probe+0xd38/0x1c60 [cxl_mock_mem]
platform_probe+0x9d/0x130
really_probe+0x1c8/0x960
driver_probe_device+0x45/0x120
__device_attach_driver+0x15d/0x280
bus_for_each_drv+0x100/0x180
__device_attach_async_helper+0x199/0x250
async_run_entry_fn+0x95/0x430
process_one_work+0x7db/0x1940
After detailed debugging, I identified that a failure to add a dport
leads to the problem.
What I observed is that when two memdevs try to enumerate the same port,
the first memdev is responsible for creating the port and binding it to
the cxl port driver. However, there is a small window between the point
where the new port becomes visible (after being added to the device list
of the cxl bus) and the point where it is bound to the port driver.
During this window, the second memdev may discover the port and acquire
its lock while attempting to add its dport, which blocks
bus_probe_device() inside device_add(). As a result, the second memdev
observes the port as unbound and fails to add its dport. The second
memdev's memdev->endpoint is therefore never updated, which triggers the
trace above.
The fix is to hold the host lock of the target port during dport
addition, which prevents premature access to the port before driver
binding has completed.
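To illustrate the locking idea outside the kernel, here is a minimal
user-space sketch using pthreads. It is not the code from this series:
struct model_port, first_memdev(), second_memdev() and run_model() are
hypothetical names, and the sketch assumes that the port creator holds
the host lock across both registration and driver binding.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical user-space model of a port; not the kernel's struct cxl_port. */
struct model_port {
	pthread_mutex_t host_lock; /* stands in for the port host's device lock */
	pthread_mutex_t port_lock; /* stands in for the port's own device lock */
	atomic_bool visible;       /* port has been added to the bus device list */
	bool bound;                /* port driver bound; guarded by port_lock */
	bool use_host_lock;        /* true models the proposed fix */
	bool dport_added;          /* outcome observed by the second memdev */
};

static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = ms * 1000000L };

	nanosleep(&ts, NULL);
}

/* First memdev: makes the port visible, then binds it to the port driver. */
static void *first_memdev(void *arg)
{
	struct model_port *p = arg;

	pthread_mutex_lock(&p->host_lock);  /* assumption: creator holds host lock */
	atomic_store(&p->visible, true);    /* port can now be discovered */
	sleep_ms(20);                       /* widen the visible-but-unbound window */
	pthread_mutex_lock(&p->port_lock);  /* models the probe path taking the lock */
	p->bound = true;
	pthread_mutex_unlock(&p->port_lock);
	pthread_mutex_unlock(&p->host_lock);
	return NULL;
}

/* Second memdev: finds the port and adds a dport only if the port is bound. */
static void *second_memdev(void *arg)
{
	struct model_port *p = arg;

	while (!atomic_load(&p->visible))
		sleep_ms(1);                /* wait until the port is discoverable */

	if (p->use_host_lock)
		pthread_mutex_lock(&p->host_lock); /* serialize against creation */
	pthread_mutex_lock(&p->port_lock);
	p->dport_added = p->bound;          /* an unbound port means no dport */
	pthread_mutex_unlock(&p->port_lock);
	if (p->use_host_lock)
		pthread_mutex_unlock(&p->host_lock);
	return NULL;
}

/* Runs both memdevs once; returns true if the second one added its dport. */
static bool run_model(bool use_host_lock)
{
	struct model_port p = { .use_host_lock = use_host_lock };
	pthread_t a, b;
	int rc;

	pthread_mutex_init(&p.host_lock, NULL);
	pthread_mutex_init(&p.port_lock, NULL);
	atomic_init(&p.visible, false);

	rc = pthread_create(&a, NULL, first_memdev, &p);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, second_memdev, &p);
	assert(rc == 0);
	(void)rc;

	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_mutex_destroy(&p.host_lock);
	pthread_mutex_destroy(&p.port_lock);
	return p.dport_added;
}
```

With run_model(true), the second thread cannot inspect the port until
the first one has released the host lock, so it always sees the port as
bound. With run_model(false), the second thread usually wins the race
during the sleep and sees an unbound port, which mirrors the failure
described above; that outcome depends on timing and is not guaranteed.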
base-commit: 63fbf275fa9f18f7020fb8acf54fa107e51d0f23 cxl/next
Changes from V1:
- Drop the patch that initializes memdev->endpoint to NULL. (Dan)
- Fix typos. (Jonathan)
- Introduce a helper called to_port_host().
- unregister_port() cleanup.
Li Ming (2):
cxl/port: Hold port host lock while dport adding.
cxl/port: unregister_port() cleanup
drivers/cxl/core/port.c | 47 +++++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 18 deletions(-)
--
2.43.0