[PATCH 00/20] vfio/pci: Add CXL Type-2 device passthrough support

mhonap@nvidia.com posted 20 patches 3 weeks, 5 days ago
There is a newer version of this series
From: Manish Honap <mhonap@nvidia.com>

This series adds support for passthrough of CXL Type-2 devices to virtual
machines through VFIO. The goal is to expose CXL functionality through
the generic vfio-pci core, without any need for a variant driver.

The current design is based on the CXL core APIs provided by Alejandro's
CXL Type-2 device support patch series, which is currently under upstream
review [1] (see drivers/net/ethernet/sfc/efx_cxl.c).

This patchset should be applied on the cxl next branch, on top of the
base commit specified at the end of this cover letter plus Alejandro's
v23 series mentioned in [1].

This patch series introduces CONFIG_VFIO_CXL_CORE, a new optional
component compiled into vfio-pci-core that hooks into the vfio-pci
open/close and reset paths to provide:

  * Automatic CXL Type-2 detection at device open time via the CXL Device
    DVSEC capability (Vendor ID 0x1E98, ID 0x0000) and HDM Decoder
    Capability block.

  * Kernel-owned HDM decoder management.  The VMM never programs HDM
    decoders directly; instead it reads and writes an emulated shadow copy
    of the HDM register block through a dedicated COMP_REGS VFIO region.
    All bit-field rules (reserved bits, read-only bits, the
    COMMIT/COMMITTED latch) are enforced by the kernel.

  * A DPA VFIO region backed by the kernel-assigned Host Physical Address
    (HPA).  The VMM maps this region with mmap(); PTEs are inserted lazily
    on first fault.  During FLR/reset all PTEs are invalidated atomically
    under memory_lock and re-inserted after the reset path re-enables the
    decoder.

  * CXL DVSEC configuration-space emulation.  Writes to the CXL Control,
    Status, Control2, Status2, Lock, and Range Base registers in the
    device's PCI extended configuration space are intercepted and replayed
    through a per-device shadow (vconfig), enforcing CXL 3.1 register
    semantics including the RWL/RW1CS/RWO access types and the CONFIG_LOCK
    one-shot latch.

  * A new VFIO_DEVICE_INFO_CAP_CXL capability (id=6) returned in the
    VFIO_DEVICE_GET_INFO capability chain, carrying all the information a
    VMM (e.g. QEMU) needs: HDM decoder count, BAR index and offset of the
    component registers, total DPA size, and indices of the two new VFIO
    regions.

  * Two new VFIO region subtypes under the PCI_VENDOR_ID_CXL vendor
    namespace: VFIO_REGION_SUBTYPE_CXL (DPA memory) and
    VFIO_REGION_SUBTYPE_CXL_COMP_REGS (emulated HDM registers).

  * A module parameter (disable_cxl=1) and a per-device flag
    (vdev->disable_cxl) so that the feature can be suppressed for
    individual devices or globally without recompiling.

  * Comprehensive selftests in tools/testing/selftests/vfio/ covering
    device detection, capability parsing, region enumeration, HDM register
    emulation, DPA mmap with page-fault insertion, FLR invalidation, and
    DVSEC register emulation.
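
To illustrate the detection step from the first bullet, here is a minimal
userspace sketch (not the code from this series) that walks the PCIe
extended capability list in a config-space snapshot and looks for a DVSEC
with Vendor ID 0x1E98 and DVSEC ID 0x0000. The function and macro names
are hypothetical. The header layout follows the PCIe DVSEC definition
(extended capability ID 0x0023, vendor ID in DVSEC header 1, DVSEC ID in
DVSEC header 2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCI_CFG_SPACE_EXP_SIZE 4096
#define PCI_EXT_CAP_START      0x100
#define PCI_EXT_CAP_ID_DVSEC   0x23
#define CXL_DVSEC_VENDOR_ID    0x1E98 /* value quoted in the cover letter */
#define CXL_DVSEC_ID_DEVICE    0x0000 /* value quoted in the cover letter */

/* Read a little-endian 32-bit value from a config-space snapshot. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] |
	       ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) |
	       ((uint32_t)cfg[off + 3] << 24);
}

/*
 * Hypothetical helper: return the config-space offset of the CXL Device
 * DVSEC in a 4 KiB snapshot, or 0 if the device does not expose one.
 */
static uint16_t find_cxl_device_dvsec(const uint8_t *cfg)
{
	uint16_t pos = PCI_EXT_CAP_START;
	/* Bound the walk so a malformed (looping) list terminates. */
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_EXT_CAP_START) / 8;

	assert(cfg);
	while (pos >= PCI_EXT_CAP_START &&
	       pos <= PCI_CFG_SPACE_EXP_SIZE - 4 && ttl-- > 0) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0 || hdr == 0xffffffffu)
			break;

		if ((hdr & 0xffff) == PCI_EXT_CAP_ID_DVSEC &&
		    pos <= PCI_CFG_SPACE_EXP_SIZE - 12) {
			uint32_t hdr1 = cfg_read32(cfg, pos + 4);
			uint32_t hdr2 = cfg_read32(cfg, pos + 8);

			if ((hdr1 & 0xffff) == CXL_DVSEC_VENDOR_ID &&
			    (hdr2 & 0xffff) == CXL_DVSEC_ID_DEVICE)
				return pos;
		}
		/* Next capability offset lives in bits 31:20, dword aligned. */
		pos = (hdr >> 20) & 0xffc;
	}
	return 0;
}
```

In the kernel the equivalent lookup would normally go through the PCI
core's DVSEC helpers rather than an open-coded walk; the sketch only shows
which fields are compared.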
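
The RWL/RW1CS/RWO handling mentioned in the DVSEC emulation bullet can be
pictured with the following sketch of a masked write to a shadow register.
This is an illustration under stated assumptions, not the vconfig code from
this series: the struct, field, and function names are hypothetical, the
masks are assumed to be disjoint, and RWO is simplified so that the first
write to the register consumes the one-shot for all of its RWO bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-register shadow state; masks are assumed disjoint. */
struct shadow_reg {
	uint32_t val;       /* current emulated value */
	uint32_t rw_mask;   /* RW: always writable */
	uint32_t rwl_mask;  /* RWL: writable until the lock is set */
	uint32_t rw1c_mask; /* RW1CS: writing 1 clears the bit */
	uint32_t rwo_mask;  /* RWO: writable once, read-only afterwards */
	uint32_t rwo_done;  /* RWO bits whose one-shot has been consumed */
};

/*
 * Apply a guest write to the shadow. 'locked' is non-zero once the
 * lock bit (e.g. CONFIG_LOCK) has been latched.
 */
static void shadow_write(struct shadow_reg *r, uint32_t wval, int locked)
{
	uint32_t rwo_now, writable;

	assert(r);
	rwo_now = r->rwo_mask & ~r->rwo_done;
	writable = r->rw_mask | rwo_now;
	if (!locked)
		writable |= r->rwl_mask;

	/* Bits outside 'writable' keep their previous value. */
	r->val = (r->val & ~writable) | (wval & writable);
	/* RW1C bits are cleared by writing 1 and unaffected by writing 0. */
	r->val &= ~(wval & r->rw1c_mask);
	/* Simplification: any write consumes the one-shot for RWO bits. */
	r->rwo_done |= rwo_now;
}
```

A lock register with a single RWO bit then acts as the one-shot latch:
once it reads back as 1, passing it as 'locked' makes every RWL field in
the other registers read-only.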
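
On the VMM side, the new capability is found by walking the capability
chain returned by VFIO_DEVICE_GET_INFO. The sketch below shows only the
walk over an already-filled info buffer. The header struct mirrors the
layout of struct vfio_info_cap_header from <linux/vfio.h> but is redefined
locally (with hypothetical ex_ names) to keep the example self-contained;
the capability id 6 is the value quoted above, and the payload layout is
deliberately left out since it is defined by the UAPI patch in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_VFIO_DEVICE_INFO_CAP_CXL 6 /* id quoted in the cover letter */

/* Local mirror of struct vfio_info_cap_header (id, version, next). */
struct ex_cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next; /* offset of next capability from start of info, 0 = end */
};

/*
 * Hypothetical helper: return the offset of the capability with the
 * given id inside the info buffer, or 0 if it is not present.
 * 'cap_offset' is the value of vfio_device_info.cap_offset.
 */
static size_t find_info_cap(const uint8_t *info, size_t argsz,
			    size_t cap_offset, uint16_t id)
{
	size_t off = cap_offset;
	int hops = 0;

	assert(info);
	while (off != 0 && argsz >= sizeof(struct ex_cap_header) &&
	       off <= argsz - sizeof(struct ex_cap_header) && hops++ < 64) {
		struct ex_cap_header hdr;

		/* memcpy avoids unaligned access into the byte buffer. */
		memcpy(&hdr, info + off, sizeof(hdr));
		if (hdr.id == id)
			return off;
		off = hdr.next;
	}
	return 0;
}
```

A VMM would typically call VFIO_DEVICE_GET_INFO once to learn argsz, call
it again with a buffer of that size, and only then walk the chain starting
at cap_offset.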

This new design moves away from the variant driver approach; all the
CXL functionality is now part of the vfio-pci driver.

The reasons for this change are:

  * Generic CXL Type-2 support features (DVSEC, HDM, regions, reset)
    are common to all CXL adapters and don't belong in variant drivers.
    When something is vendor-specific (e.g. live migration, proprietary
    features), a variant is appropriate; generic CXL behavior should
    not require a vendor-specific driver. Generic CXL support belongs
    in the core, not behind a variant.

  * With this new approach, the user always binds to vfio-pci. No need to
    choose or document a CXL-specific or vendor-specific driver for
    standard CXL Type-2 passthrough.

  * The CXL-enlightened vfio-pci works with any CXL Type-2 device that
    presents the CXL Device DVSEC and the expected component layout.

  * CXL detection, state, register emulation, region creation, and reset
    live in a CXL-aware layer invoked from the core (optionally built
    via CONFIG_VFIO_CXL_CORE). The core stays a single entry point;
    CXL is an optional extension, not a separate driver stack.

  * Pushing CXL into vfio-pci-core avoids per-device CXL detection and
    feature toggling inside vendor-specific drivers.

Series structure
================

  * Patches 1-5 extend the CXL subsystem to export the interfaces and
    defines that vfio-pci-core needs.

  * Patches 6-8 lay the vfio-pci-core plumbing.

  * Patches 9-12 implement the core device lifecycle and DPA region.

  * Patches 13-15 implement configuration-space and register emulation.

  * Patches 16-17 wire everything together.

  * Patches 18-20 add documentation and testing.

Limitations and future work
===========================

  * This series does not yet support switched topologies with more than one
    caching agent; that is planned for a future series.

  * RAS / ECC / CCA / Reset Support
    Upcoming revisions of this series will integrate RAS and ECC handling
    into generic vfio-pci by leveraging the CXL core and its RAS
    capabilities.

  * cxl_reset support [2]
    Changes from Srirangan's series will be integrated to provide
    VFIO-CXL reset support.

Dependencies
============

[1] Type2 device basic support https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/
[2] CXL Reset support for Type 2 devices https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Cc: linux-cxl@vger.kernel.org
Cc: kvm@vger.kernel.org

Co-developed-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>

--

Manish Honap (20):
  cxl: Introduce cxl_get_hdm_reg_info()
  cxl: Expose cxl subsystem specific functions for vfio
  cxl: Move CXL spec defines to public header
  cxl: Media ready check refactoring
  cxl: Expose BAR index and offset from register map
  vfio/cxl: Add UAPI for CXL Type-2 device passthrough
  vfio/pci: Add CXL state to vfio_pci_core_device
  vfio/pci: Add vfio-cxl Kconfig and build infrastructure
  vfio/cxl: Implement CXL device detection and HDM register probing
  vfio/cxl: CXL region management
  vfio/cxl: Expose DPA memory region to userspace with fault+zap mmap
  vfio/pci: Export config access helpers
  vfio/cxl: Introduce HDM decoder register emulation framework
  vfio/cxl: Check media readiness and create CXL memdev
  vfio/cxl: Introduce CXL DVSEC configuration space emulation
  vfio/pci: Expose CXL device and region info via VFIO ioctl
  vfio/cxl: Provide opt-out for CXL feature
  docs: vfio-pci: Document CXL Type-2 device passthrough
  selftests/vfio: Add CXL Type-2 passthrough tests
  selftests/vfio: Fix VLA initialisation in vfio_pci_irq_set()

 Documentation/driver-api/index.rst            |   1 +
 Documentation/driver-api/vfio-pci-cxl.rst     | 216 +++++
 drivers/cxl/core/pci.c                        |  80 +-
 drivers/cxl/core/regs.c                       |  29 +
 drivers/cxl/cxl.h                             |  34 -
 drivers/vfio/pci/Kconfig                      |   2 +
 drivers/vfio/pci/Makefile                     |   1 +
 drivers/vfio/pci/cxl/Kconfig                  |   7 +
 drivers/vfio/pci/cxl/vfio_cxl_config.c        | 304 +++++++
 drivers/vfio/pci/cxl/vfio_cxl_core.c          | 713 +++++++++++++++
 drivers/vfio/pci/cxl/vfio_cxl_emu.c           | 414 +++++++++
 drivers/vfio/pci/cxl/vfio_cxl_priv.h          | 123 +++
 drivers/vfio/pci/vfio_pci.c                   |  32 +
 drivers/vfio/pci/vfio_pci_config.c            |  58 +-
 drivers/vfio/pci/vfio_pci_core.c              |  31 +
 drivers/vfio/pci/vfio_pci_priv.h              |  72 ++
 drivers/vfio/pci/vfio_pci_rdwr.c              |   8 +
 include/cxl/cxl.h                             |  52 ++
 include/linux/vfio_pci_core.h                 |  10 +
 include/uapi/linux/vfio.h                     |  52 ++
 tools/testing/selftests/vfio/Makefile         |   1 +
 .../selftests/vfio/lib/vfio_pci_device.c      |   4 +-
 .../selftests/vfio/vfio_cxl_type2_test.c      | 816 ++++++++++++++++++
 23 files changed, 3013 insertions(+), 47 deletions(-)
 create mode 100644 Documentation/driver-api/vfio-pci-cxl.rst
 create mode 100644 drivers/vfio/pci/cxl/Kconfig
 create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_config.c
 create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_core.c
 create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_emu.c
 create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_priv.h
 create mode 100644 tools/testing/selftests/vfio/vfio_cxl_type2_test.c

base-commit: 3f7938b1aec7f06d5b23adca83e4542fcf027001
--
2.25.1