On 2/17/26 16:30, ankita@nvidia.com wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> This series enables hugepfnmap support in QEMU for VFIO device memory
> regions that have non-power-of-2 sizes. This specifically addresses the
> needs of Grace-based systems (GB200) where device memory is exposed
> as a BAR.
>
> ## Problem
>
> On Grace-based systems, device memory regions can have sizes like
> 0x2F00F00000 (not power-of-2). The current QEMU VFIO mapping code
> aligns each sparse mmap area independently using the trailing zeros
> of its size (ctz64), which results in suboptimal alignment for the
> overall VMA.
>
> This prevents the kernel from using hugepfnmap, which enables huge
> page mappings for device memory. Without proper alignment, the
> kernel falls back to PTE-level mappings, which significantly hurts
> performance due to increased TLB pressure and page table overhead
> for large memory regions.
>
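
To make the alignment difference in the Problem section concrete, here is a minimal sketch of the two strategies applied to the size quoted above. The helper names are hypothetical stand-ins for the arithmetic only (QEMU has its own ctz64() and pow2ceil() helpers), and `__builtin_ctzll` assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest power of two that divides size. */
static uint64_t align_from_trailing_zeros(uint64_t size)
{
    assert(size != 0);                 /* ctz of 0 is undefined */
    return UINT64_C(1) << __builtin_ctzll(size);
}

/* Hypothetical helper: smallest power of two >= size. */
static uint64_t align_from_pow2ceil(uint64_t size)
{
    assert(size != 0 && size <= (UINT64_C(1) << 63));
    uint64_t p = 1;
    while (p < size) {
        p <<= 1;
    }
    return p;
}
```

For a size of 0x2F00F00000, the trailing-zeros approach gives 0x100000 (1 MiB), which is below the 2 MiB huge-page size used with 4 KiB base pages, while the pow2ceil approach gives 0x4000000000 (256 GiB).
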
> ## Solution
>
> Patch 1: Sort sparse mmap regions by offset during setup and validate
> that they don't overlap. This ensures predictable mapping
> order and enables gap detection.
>
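
The sort-and-validate step described for patch 1 could look roughly like the following. This is only an illustration of the idea, not the code from the patch: `SparseArea`, `cmp_offset()` and `sort_and_validate()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical description of one sparse mmap area. */
typedef struct {
    uint64_t offset;
    uint64_t size;
} SparseArea;

/* qsort() comparator: ascending by offset, without overflow. */
static int cmp_offset(const void *a, const void *b)
{
    const SparseArea *x = a;
    const SparseArea *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Sort areas by offset; return false if any two areas overlap. */
static bool sort_and_validate(SparseArea *areas, size_t n)
{
    qsort(areas, n, sizeof(*areas), cmp_offset);
    for (size_t i = 1; i < n; i++) {
        /*
         * After sorting, offsets are non-decreasing, so the subtraction
         * cannot wrap. Overlap means the previous area extends past the
         * start of the current one.
         */
        if (areas[i].offset - areas[i - 1].offset < areas[i - 1].size) {
            return false;
        }
    }
    return true;
}
```

Once the areas are sorted, the gap between two neighbours is simply the distance from the end of one area to the offset of the next.
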
> Patch 2: Add an Error ** parameter to vfio_region_setup() for better
> error handling and reporting.
>
> Patch 3: Change the alignment strategy from per-sparse-region to
> whole-region alignment using pow2ceil(region->size). Create
> a single aligned base mapping for the entire region, then
> overlay sparse areas with MAP_FIXED. Gaps between sparse
> regions are explicitly unmapped.
>
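
The "aligned base mapping plus MAP_FIXED overlay" idea from patch 3 can be sketched with plain POSIX calls as below. This is an assumption-laden illustration rather than the patch itself: `reserve_aligned()` and `overlay_fixed()` are hypothetical names, `size` and `align` are assumed to be multiples of the page size with `align` a power of two, and an anonymous mapping stands in for the device fd when `fd` is negative.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Reserve `size` bytes of address space whose start is aligned to `align`.
 * Over-allocate by `align`, then trim the unaligned head and the spare tail.
 */
static void *reserve_aligned(size_t size, size_t align)
{
    size_t total = size + align;
    if (total < size) {
        return NULL;                    /* size + align overflowed */
    }
    uint8_t *raw = mmap(NULL, total, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) {
        return NULL;
    }
    uintptr_t start = ((uintptr_t)raw + align - 1) & ~((uintptr_t)align - 1);
    uint8_t *base = (uint8_t *)start;
    if (base > raw) {
        munmap(raw, (size_t)(base - raw));          /* trim head */
    }
    if (raw + total > base + size) {
        munmap(base + size, (size_t)((raw + total) - (base + size)));
    }
    return base;
}

/*
 * Overlay one sparse area at `off` inside the reservation. MAP_FIXED
 * replaces whatever was mapped there. A negative fd selects anonymous
 * memory (a stand-in for the VFIO device fd in this sketch).
 */
static void *overlay_fixed(void *base, size_t off, size_t len,
                           int fd, off_t fd_off)
{
    int flags = MAP_SHARED | MAP_FIXED;
    if (fd < 0) {
        flags |= MAP_ANONYMOUS;
    }
    return mmap((uint8_t *)base + off, len, PROT_READ | PROT_WRITE,
                flags, fd, fd_off);
}
```

In this sketch, any gap between overlaid areas stays PROT_NONE until it is released with munmap(); the cover letter states that the series unmaps such gaps explicitly.
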
> v4
> * Replace lx with PRIx64 in the error in 1/3 (Cedric)
> * Error** param for vfio_setup_region_sparse_mmaps in 2/3 (Cedric)
> * Add a comment noting that the mapping algorithm expects sorted
> offsets. (Cedric)
>
> v3: https://lore.kernel.org/all/20260215084950.4657-1-ankita@nvidia.com/
> * New patch 2/3 to add Error **param in vfio_region_setup (Cedric)
>
> v2: https://lore.kernel.org/all/20260211030615.3202-1-ankita@nvidia.com/
> * Fixed the code returning early without trace (Shameer, Alex)
>
> v1: https://lore.kernel.org/all/20260130040649.42485-1-ankita@nvidia.com/
>
> Ankit Agrawal (3):
> hw/vfio: sort and validate sparse mmap regions by offset
> vfio: Add Error ** parameter to vfio_region_setup()
> hw/vfio: align mmap to power-of-2 of region size for hugepfnmap
>
> hw/vfio/display.c | 6 +-
> hw/vfio/pci.c | 3 +-
> hw/vfio/region.c | 140 ++++++++++++++++++++++++++++++++----------
> hw/vfio/vfio-region.h | 2 +-
> 4 files changed, 114 insertions(+), 37 deletions(-)
>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.