[PATCH v2 0/4] cxl: Consolidate cxlmd->endpoint accessing

Li Ming posted 4 patches 3 weeks, 3 days ago
[PATCH v2 0/4] cxl: Consolidate cxlmd->endpoint accessing
Posted by Li Ming 3 weeks, 3 days ago
Currently, the CXL subsystem has some functions that may access a CXL
memdev's endpoint before endpoint initialization has completed, or
without checking that the endpoint is valid. This patchset fixes the
three scenarios described below.

1. cxl_dpa_to_region() can access an invalid CXL memdev endpoint.
   Two scenarios can trigger this issue:
   a. memdev poison injection/clearing debugfs interfaces:
      devm_cxl_add_endpoint() registers the CXL memdev endpoint and
      updates cxlmd->endpoint from -ENXIO to the endpoint structure.
      The memdev poison injection/clearing debugfs interfaces are
      registered before devm_cxl_add_endpoint() is invoked in
      cxl_mem_probe(), so there is a small window in which a user can
      use the debugfs interfaces to access an invalid endpoint.
   b. cxl_event_config() at the end of cxl_pci_probe():
      cxl_event_config() invokes cxl_mem_get_event_record() to get the
      remaining event logs from the CXL device during cxl_pci_probe().
      If CXL memdev probing failed before that, it is also possible to
      access an invalid endpoint.
   To fix these two cases, callers of cxl_dpa_to_region() are required
   to hold the CXL memdev lock and to check the CXL memdev driver
   binding status. Holding the CXL memdev lock ensures that CXL memdev
   probing has completed, and if the CXL memdev driver is bound,
   cxlmd->endpoint is valid. (PATCH #1-#3)

2. cxl_reset_done() callback in the cxl_pci module.
   The cxl_reset_done() callback also accesses cxlmd->endpoint without
   any checking. If CXL memdev probing fails and cxl_reset_done() is
   then called by the PCI subsystem, it will access an invalid
   endpoint. The solution is to add a CXL memdev driver binding status
   check inside cxl_reset_done(). (PATCH #4)
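
For illustration only (this is not code from the series): the "hold the
lock, then check the binding status" rule described above can be
modelled in userspace roughly as follows. All names here (model_memdev,
model_probe(), model_read_endpoint_id()) are hypothetical, a pthread
mutex stands in for the memdev device lock, and a boolean stands in for
the driver binding status. It is a sketch under those assumptions, not
the kernel implementation.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

struct model_endpoint { int id; };	/* hypothetical endpoint object */

struct model_memdev {			/* hypothetical memdev model */
	pthread_mutex_t lock;		/* models the memdev device lock */
	bool driver_bound;		/* models the driver binding status */
	struct model_endpoint *endpoint;/* ERR_PTR(-ENXIO) until probed */
};

static void model_memdev_init(struct model_memdev *md)
{
	pthread_mutex_init(&md->lock, NULL);
	md->driver_bound = false;
	md->endpoint = ERR_PTR(-ENXIO);
}

/*
 * Models probing: the whole probe runs under the lock, so a reader that
 * takes the lock sees either "not bound" or a fully set up endpoint.
 */
static void model_probe(struct model_memdev *md, struct model_endpoint *ep,
			bool succeed)
{
	pthread_mutex_lock(&md->lock);
	if (succeed) {
		md->endpoint = ep;
		md->driver_bound = true;
	}
	pthread_mutex_unlock(&md->lock);
}

/* Returns 0 and stores the endpoint id in *id, or -ENXIO if unusable. */
static int model_read_endpoint_id(struct model_memdev *md, int *id)
{
	int rc = -ENXIO;

	pthread_mutex_lock(&md->lock);
	/* Only dereference the endpoint when the driver is bound. */
	if (md->driver_bound && !IS_ERR(md->endpoint)) {
		*id = md->endpoint->id;
		rc = 0;
	}
	pthread_mutex_unlock(&md->lock);
	return rc;
}
```

In this model a reader that runs before probing, or after a failed
probe, gets -ENXIO instead of dereferencing the placeholder pointer.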

---
Changes in v2:
- Move holding of the CXL memdev lock to cxl_debugfs_poison_inject/clear(). (Alison)
- Drop device_lock_assert() in cxl_inject/clear_poison_locked(). (Alison)
- Remove device_lock_assert() in cxl_dpa_to_region() to remove patch
  "cxl/region: Hold memdev lock during region poison injection/clear". (Alison)
- Squash patch "cxl/pci: Hold memdev lock in cxl_event_trace_record()"
  and patch "cxl/region: Ensure endpoint is valid in cxl_dpa_to_region()". (Dan & Dave)
- Remove patch "cxl/port: Reset cxlmd->endpoint to -ENXIO by default".
- Link to v1: https://lore.kernel.org/r/20260310-fix_access_endpoint_without_drv_check-v1-0-94fe919a0b87@zohomail.com

---
Li Ming (4):
      driver core: Add conditional guard support for device_lock()
      cxl/memdev: Hold memdev lock during memdev poison injection/clear
      cxl/pci: Hold memdev lock in cxl_event_trace_record()
      cxl/pci: Check memdev driver binding status in cxl_reset_done()

 drivers/cxl/core/mbox.c   |  5 +++--
 drivers/cxl/core/region.c |  8 +++++---
 drivers/cxl/cxlmem.h      |  2 +-
 drivers/cxl/mem.c         | 10 ++++++++++
 drivers/cxl/pci.c         |  3 +++
 include/linux/device.h    |  1 +
 6 files changed, 23 insertions(+), 6 deletions(-)
---
base-commit: 11439c4635edd669ae435eec308f4ab8a0804808
change-id: 20260308-fix_access_endpoint_without_drv_check-f2e6ff4bdc48

Best regards,
-- 
Li Ming <ming.li@zohomail.com>
Re: [PATCH v2 0/4] cxl: Consolidate cxlmd->endpoint accessing
Posted by Dave Jiang 3 weeks ago

On 3/14/26 12:06 AM, Li Ming wrote:

Applied to cxl/next
43e4c205197e cxl/pci: Check memdev driver binding status in cxl_reset_done()
11ce2524b7f3 cxl/pci: Hold memdev lock in cxl_event_trace_record()
b227d1faed0a cxl/memdev: Hold memdev lock during memdev poison injection/clear
e5564e392075 Merge tag 'device_lock_cond_guard-7.1-rc1' into for-7.1/cxl-consolidate-endpoint
