Hi,
We're now at v6, thanks for all the review feedback.
First 24 patches are now already merged through drm-tip tree, and I hope
we can get the remaining ones through the VFIO tree.
No major changes worth highlighting in this rev. Full changelog can be
found below.
Cover letter from the previous revision:
Xe is a DRM driver supporting Intel GPUs; for SR-IOV capable
devices, it enables the creation of SR-IOV VFs.
This series adds an xe-vfio-pci driver variant that interacts with the
Xe driver to control VF device state and read/write migration data,
extending regular vfio-pci functionality with the VFIO migration
capability.
The driver doesn't expose PRE_COPY support, as currently supported
hardware lacks the capability to track dirty pages.
While the Xe driver already had the capability to manage VF device
state, management of migration data still needed to be implemented and
constitutes the majority of the series.
The migration data is processed asynchronously by the Xe driver, and is
organized into multiple migration data packet types representing the
hardware interfaces of the device (GGTT / MMIO / GuC FW / VRAM).
Since the VRAM can potentially be larger than available system memory,
it is copied in multiple chunks. The metadata needed for migration
compatibility decisions is added as part of the descriptor packet (currently
limited to PCI device ID / revision).
The Xe driver abstracts away the internals of packet processing and
takes care of tracking the position within individual packets.
The API exported to VFIO is similar to the API that VFIO exports to
userspace: a simple .read()/.write().
Note that some of the VF resources are not virtualized (e.g. GGTT, the
GFX device global virtual address space). This means that the VF driver
needs to be aware that migration has occurred in order to properly
relocate (patching or re-emitting data that contains references to GGTT
addresses) before resuming operation.
The code to handle that is already present in upstream Linux and in
production VF drivers for other OSes.
Links to previous revisions, for reference:
v1:
https://lore.kernel.org/lkml/20251011193847.1836454-1-michal.winiarski@intel.com/
v2:
https://lore.kernel.org/lkml/20251021224133.577765-1-michal.winiarski@intel.com/
v3:
https://lore.kernel.org/lkml/20251030203135.337696-1-michal.winiarski@intel.com/
v4:
https://lore.kernel.org/lkml/20251105151027.540712-1-michal.winiarski@intel.com/
v5:
https://lore.kernel.org/lkml/20251111010439.347045-1-michal.winiarski@intel.com/
v5 -> v6:
* Exclude the patches already merged through drm-tip
* Add logging when migration is enabled in debug mode (Michał)
* Rename the xe_pf_get_pf helper (Michał)
* Don't use "vendor specific" (yet again) (Michał)
* Kerneldoc tweaks (Michał)
* Use guard(xe_pm_runtime_noresume) instead of assert (Michał)
* Check for num_vfs rather than total_vfs (Michał)
v4 -> v5:
* Require GuC version >= 70.54.0
* Fix VFIO migration migf disable
* Fix null-ptr-deref on save_read error
* Don't use "vendor specific" (again) (Kevin)
* Introduce xe_sriov_packet_types.h (Michał)
* Kernel-doc fixes (Michał)
* Use tile_id / gt_id instead of tile / gt in packet header (Michał)
* Don't use struct_group() in packet (Michał)
* And other, more minor changes
v3 -> v4:
* Add error handling on data_read / data_write path
* Don't match on PCI class, use PCI_DRIVER_OVERRIDE_DEVICE_VFIO helper
instead (Lucas De Marchi)
* Use proper node VMA size inside GGTT save / restore helper (Michał)
* Improve data tracking set_bit / clear_bit wrapper names (Michał)
* Improve packet dump helper (Michał)
* Use drmm for migration mutex init (Michał)
* Rename the pf_device access helper (Michał)
* Use non-interruptible sleep in VRAM copy (Matt)
* Rename xe_sriov_migration_data to xe_sriov_packet along with relevant
functions (Michał)
* Rename per-vf device-level data to xe_sriov_migration_state (Michał)
* Use struct name that matches component name instead of anonymous
struct (Michał)
* Don't add XE_GT_SRIOV_STATE_MAX to state enum, use a helper macro
instead (Michał)
* Kernel-doc fixes (Michał)
v2 -> v3:
* Bind xe-vfio-pci to specific devices instead of using vendor and
class (Christoph Hellwig / Jason Gunthorpe)
* Don't refer to the driver as "vendor specific" (Christoph)
* Use pci_iov_get_pf_drvdata and change the interface to take xe_device
(Jason)
* Update the RUNNING_P2P comment (Jason / Kevin Tian)
* Add state_mutex to protect device state transitions (Kevin)
* Implement .error_detected (Kevin)
* Drop redundant comments (Kevin)
* Explain 1-based indexing and wait_flr_done (Kevin)
* Add a missing get_file() (Kevin)
* Drop redundant state transitions when p2p is supported (Kevin)
* Update run/stop naming to match other drivers (Kevin)
* Fix error state handling (Kevin)
* Fix SAVE state diagram rendering (Michał Wajdeczko)
* Control state machine flipping PROCESS / WAIT logic (Michał Wajdeczko)
* Drop GUC / GGTT / MMIO / VRAM from SAVE control state machine
* Use devm instead of drmm for migration-related allocations (Michał)
* Use GGTT node for size calculations (Michał)
* Use mutex guards consistently (Michał)
* Fix build break on 32-bit (lkp)
* Kernel-doc updates (Michał)
* And other, more minor changes
v1 -> v2:
* Do not require debug flag to support migration on PTL/BMG
* Fix PCI class match on VFIO side
* Reorganized PF Control state machine (Michał Wajdeczko)
* Kerneldoc tidying (Michał Wajdeczko)
* Return NULL instead of -ENODATA for produce/consume (Michał Wajdeczko)
* guc_buf s/sync/sync_read (Matt Brost)
* Squash patch 03 (Matt Brost)
* Assert on PM ref instead of taking it (Matt Brost)
* Remove CCS completely (Matt Brost)
* Return ptr on guc_buf_sync_read (Michał Wajdeczko)
* Define default guc_buf size (Michał Wajdeczko)
* Drop CONFIG_PCI_IOV=n stubs where not needed (Michał Wajdeczko)
* And other, more minor changes
Michał Winiarski (4):
drm/xe/pf: Enable SR-IOV VF migration
drm/xe/pci: Introduce a helper to allow VF access to PF xe_device
drm/xe/pf: Export helpers for VFIO
vfio/xe: Add device specific vfio_pci driver variant for Intel
graphics
MAINTAINERS | 7 +
drivers/gpu/drm/xe/Makefile | 2 +
drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c | 9 +
drivers/gpu/drm/xe/xe_pci.c | 17 +
drivers/gpu/drm/xe/xe_pci.h | 3 +
drivers/gpu/drm/xe/xe_sriov_pf_migration.c | 35 +-
drivers/gpu/drm/xe/xe_sriov_pf_migration.h | 1 +
.../gpu/drm/xe/xe_sriov_pf_migration_types.h | 4 +-
drivers/gpu/drm/xe/xe_sriov_vfio.c | 276 +++++++++
drivers/vfio/pci/Kconfig | 2 +
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/xe/Kconfig | 12 +
drivers/vfio/pci/xe/Makefile | 3 +
drivers/vfio/pci/xe/main.c | 568 ++++++++++++++++++
include/drm/intel/xe_sriov_vfio.h | 30 +
15 files changed, 964 insertions(+), 7 deletions(-)
create mode 100644 drivers/gpu/drm/xe/xe_sriov_vfio.c
create mode 100644 drivers/vfio/pci/xe/Kconfig
create mode 100644 drivers/vfio/pci/xe/Makefile
create mode 100644 drivers/vfio/pci/xe/main.c
create mode 100644 include/drm/intel/xe_sriov_vfio.h
--
2.51.2
On Tue, 25 Nov 2025 00:08:37 +0100
Michał Winiarski <michal.winiarski@intel.com> wrote:

> Hi,
>
> We're now at v6, thanks for all the review feedback.
>
> First 24 patches are now already merged through drm-tip tree, and I hope
> we can get the remaining ones through the VFIO tree.

Are all those dependencies in a topic branch somewhere?  Otherwise to
go in through vfio would mean we need to rebase our next branch after
drm is merged.  LPC is happening during this merge window, so we may
not be able to achieve that leniency in ordering.  Is the better
approach to get acks on the variant driver and funnel the whole thing
through the drm tree?  Thanks,

Alex
On Tue, Nov 25, 2025 at 01:13:15PM -0700, Alex Williamson wrote:
> On Tue, 25 Nov 2025 00:08:37 +0100
> Michał Winiarski <michal.winiarski@intel.com> wrote:
>
> > Hi,
> >
> > We're now at v6, thanks for all the review feedback.
> >
> > First 24 patches are now already merged through drm-tip tree, and I hope
> > we can get the remaining ones through the VFIO tree.
>
> Are all those dependencies in a topic branch somewhere?  Otherwise to
> go in through vfio would mean we need to rebase our next branch after
> drm is merged.  LPC is happening during this merge window, so we may
> not be able to achieve that leniency in ordering.  Is the better
> approach to get acks on the variant driver and funnel the whole thing
> through the drm tree?  Thanks,

+1 on merging through drm if VFIO maintainers are ok with this. I've
done this for various drm external changes in the past with maintainers
acks.

Matt

>
> Alex
On Tue, 2025-11-25 at 17:20 -0800, Matthew Brost wrote:
> On Tue, Nov 25, 2025 at 01:13:15PM -0700, Alex Williamson wrote:
> > On Tue, 25 Nov 2025 00:08:37 +0100
> > Michał Winiarski <michal.winiarski@intel.com> wrote:
> >
> > > Hi,
> > >
> > > We're now at v6, thanks for all the review feedback.
> > >
> > > First 24 patches are now already merged through drm-tip tree, and
> > > I hope
> > > we can get the remaining ones through the VFIO tree.
> >
> > Are all those dependencies in a topic branch somewhere?  Otherwise to
> > go in through vfio would mean we need to rebase our next branch after
> > drm is merged.  LPC is happening during this merge window, so we may
> > not be able to achieve that leniency in ordering.  Is the better
> > approach to get acks on the variant driver and funnel the whole thing
> > through the drm tree?  Thanks,
>
> +1 on merging through drm if VFIO maintainers are ok with this. I've
> done this for various drm external changes in the past with maintainers
> acks.
>
> Matt

@Michal Winiarski

Are these patches depending on any other VFIO changes that are queued
for 6.19?

If not and with proper VFIO acks, I could ask Dave / Sima to allow this
for drm-xe-next-fixes pull. Then I also would need a strong
justification for it being in 6.19 rather in 7.0.

Otherwise we'd need to have the VFIO changes it depends on in a topic
branch, or target this for 7.0 and hold off the merge until we can
backmerge 6.9-rc1.

Thanks,
Thomas

> >
> > Alex
On Wed, Nov 26, 2025 at 12:38:34PM +0100, Thomas Hellström wrote:
> On Tue, 2025-11-25 at 17:20 -0800, Matthew Brost wrote:
> > On Tue, Nov 25, 2025 at 01:13:15PM -0700, Alex Williamson wrote:
> > > On Tue, 25 Nov 2025 00:08:37 +0100
> > > Michał Winiarski <michal.winiarski@intel.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > We're now at v6, thanks for all the review feedback.
> > > >
> > > > First 24 patches are now already merged through drm-tip tree, and
> > > > I hope
> > > > we can get the remaining ones through the VFIO tree.
> > >
> > > Are all those dependencies in a topic branch somewhere?  Otherwise to
> > > go in through vfio would mean we need to rebase our next branch after
> > > drm is merged.  LPC is happening during this merge window, so we may
> > > not be able to achieve that leniency in ordering.  Is the better
> > > approach to get acks on the variant driver and funnel the whole thing
> > > through the drm tree?  Thanks,
> >
> > +1 on merging through drm if VFIO maintainers are ok with this. I've
> > done this for various drm external changes in the past with maintainers
> > acks.
> >
> > Matt
>
> @Michal Winiarski
>
> Are these patches depending on any other VFIO changes that are queued
> for 6.19?

No, there's a series that I'm working on in parallel:
https://lore.kernel.org/lkml/20251120123647.3522082-1-michal.winiarski@intel.com/

Which will potentially change the VFIO driver that's part of this
series.
But I believe that this could go through fixes, after we have all the
pieces in place as part of 6.19-rc release.

> If not and with proper VFIO acks, I could ask Dave / Sima to allow this
> for drm-xe-next-fixes pull. Then I also would need a strong
> justification for it being in 6.19 rather in 7.0.
>
> Otherwise we'd need to have the VFIO changes it depends on in a topic
> branch, or target this for 7.0 and hold off the merge until we can
> backmerge 6.9-rc1.

Unless Alex has a different opinion, I think the justification would be
that this is just a matter of logistics - merging through DRM would just
be a simpler process than merging through VFIO. End result would be the
same.

Thanks,
-Michał

> Thanks,
> Thomas
>
> > >
> > > Alex
On Wed, 26 Nov 2025 15:46:43 +0100
Michał Winiarski <michal.winiarski@intel.com> wrote:

> On Wed, Nov 26, 2025 at 12:38:34PM +0100, Thomas Hellström wrote:
> > On Tue, 2025-11-25 at 17:20 -0800, Matthew Brost wrote:
> > > On Tue, Nov 25, 2025 at 01:13:15PM -0700, Alex Williamson wrote:
> > > > On Tue, 25 Nov 2025 00:08:37 +0100
> > > > Michał Winiarski <michal.winiarski@intel.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We're now at v6, thanks for all the review feedback.
> > > > >
> > > > > First 24 patches are now already merged through drm-tip tree, and
> > > > > I hope
> > > > > we can get the remaining ones through the VFIO tree.
> > > >
> > > > Are all those dependencies in a topic branch somewhere?  Otherwise to
> > > > go in through vfio would mean we need to rebase our next branch after
> > > > drm is merged.  LPC is happening during this merge window, so we may
> > > > not be able to achieve that leniency in ordering.  Is the better
> > > > approach to get acks on the variant driver and funnel the whole thing
> > > > through the drm tree?  Thanks,
> > >
> > > +1 on merging through drm if VFIO maintainers are ok with this. I've
> > > done this for various drm external changes in the past with maintainers
> > > acks.
> > >
> > > Matt
> >
> > @Michal Winiarski
> >
> > Are these patches depending on any other VFIO changes that are queued
> > for 6.19?
>
> No, there's a series that I'm working on in parallel:
> https://lore.kernel.org/lkml/20251120123647.3522082-1-michal.winiarski@intel.com/
>
> Which will potentially change the VFIO driver that's part of this
> series.
> But I believe that this could go through fixes, after we have all the
> pieces in place as part of 6.19-rc release.

6.19-rc or 6.19+1, depends on to what extent we decide the other
variant drivers have this same problem.  This driver has worked around
it in the traditional way though and I don't think it needs to be
delayed for a universal helper.

> > If not and with proper VFIO acks, I could ask Dave / Sima to allow this
> > for drm-xe-next-fixes pull. Then I also would need a strong
> > justification for it being in 6.19 rather in 7.0.
> >
> > Otherwise we'd need to have the VFIO changes it depends on in a topic
> > branch, or target this for 7.0 and hold off the merge until we can
> > backmerge 6.9-rc1.
>
> Unless Alex has a different opinion, I think the justification would be
> that this is just a matter of logistics - merging through DRM would just
> be a simpler process than merging through VFIO. End result would be the
> same.

Yes, the result is the same, logistics of waiting for the drm-next
merge, rebasing, and sending a 2nd vfio pull request is the overhead.
The easier route through drm still depends on getting full acks on this
and whether drm will take it.  Thanks,

Alex
On Wed, 2025-11-26 at 12:38 +0100, Thomas Hellström wrote:
> On Tue, 2025-11-25 at 17:20 -0800, Matthew Brost wrote:
> > On Tue, Nov 25, 2025 at 01:13:15PM -0700, Alex Williamson wrote:
> > > On Tue, 25 Nov 2025 00:08:37 +0100
> > > Michał Winiarski <michal.winiarski@intel.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > We're now at v6, thanks for all the review feedback.
> > > >
> > > > First 24 patches are now already merged through drm-tip tree, and
> > > > I hope
> > > > we can get the remaining ones through the VFIO tree.
> > >
> > > Are all those dependencies in a topic branch somewhere?  Otherwise to
> > > go in through vfio would mean we need to rebase our next branch after
> > > drm is merged.  LPC is happening during this merge window, so we may
> > > not be able to achieve that leniency in ordering.  Is the better
> > > approach to get acks on the variant driver and funnel the whole thing
> > > through the drm tree?  Thanks,
> >
> > +1 on merging through drm if VFIO maintainers are ok with this. I've
> > done this for various drm external changes in the past with maintainers
> > acks.
> >
> > Matt
>
> @Michal Winiarski
>
> Are these patches depending on any other VFIO changes that are queued
> for 6.19?
>
> If not and with proper VFIO acks, I could ask Dave / Sima to allow this
> for drm-xe-next-fixes pull. Then I also would need a strong
> justification for it being in 6.19 rather in 7.0.
>
> Otherwise we'd need to have the VFIO changes it depends on in a topic
> branch, or target this for 7.0 and hold off the merge until we can
> backmerge 6.9-rc1.

6.19-rc1

/Thomas

> Thanks,
> Thomas
>
> > >
> > > Alex