On Wed, 11 Feb 2026 03:06:13 +0000
<ankita@nvidia.com> wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> This series enables hugepfnmap support in QEMU for VFIO device memory
> regions that have non-power-of-2 sizes. This specifically addresses the
> needs of Grace-based systems (GB200) where device memory is exposed
> as a BAR.
>
> ## Problem
>
> On Grace-based systems, device memory regions can have sizes like
> 0x2F00F00000 (not power-of-2). The current QEMU VFIO mapping code
> aligns each sparse mmap area independently using the trailing zeros
> of its size (ctz64), which results in suboptimal alignment for the
> overall VMA.
>
> This prevents the kernel from using hugepfnmap, which provides huge
> page mappings for device memory. Without proper alignment, the
> mapping falls back to PTE-sized entries, significantly impacting
> performance due to increased TLB pressure and page table overhead
> for large memory regions.
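
To make the alignment gap concrete, here is a minimal sketch of the arithmetic for the example size above. The helper names `align_from_ctz` and `align_from_pow2ceil` are hypothetical stand-ins written for this illustration, not QEMU's own `ctz64()` / `pow2ceil()`; `__builtin_ctzll` assumes GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical helper: alignment implied by the trailing zero bits of
 * the size, i.e. the largest power of two that divides it. */
static uint64_t align_from_ctz(uint64_t size)
{
    return size ? 1ULL << __builtin_ctzll(size) : 0;
}

/* Hypothetical helper: smallest power of two >= size.
 * Returns 0 if the result would not fit in 64 bits. */
static uint64_t align_from_pow2ceil(uint64_t size)
{
    uint64_t p = 1;

    if (size > (1ULL << 63)) {
        return 0;
    }
    while (p < size) {
        p <<= 1;
    }
    return p;
}

/*
 * For size = 0x2F00F00000:
 *   align_from_ctz()      gives 0x100000     (1 MiB, lowest set bit is 20)
 *   align_from_pow2ceil() gives 0x4000000000 (256 GiB, 2^38)
 */
```

A 1 MiB alignment is below the 2 MiB PMD size used with 4 KiB base pages, so an address chosen that way cannot be relied on for huge mappings, whereas the power-of-2 ceiling of the region size is a multiple of any huge page size up to the region size itself.
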
>
> ## Solution
>
> Patch 1: Sort sparse mmap regions by offset during setup and validate
> that they don't overlap. This ensures predictable mapping
> order and enables gap detection.
>
> Patch 2: Change the alignment strategy from per-sparse-region to
> whole-region alignment using pow2ceil(region->size). Create
> a single aligned base mapping for the entire region, then
> overlay sparse areas with MAP_FIXED. Gaps between sparse
> regions are explicitly unmapped.
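
The two steps described above could look roughly like the sketch below. It is not the code from the patches: `struct sparse_area`, `sort_and_validate()` and `map_region_aligned()` are hypothetical names, anonymous memory stands in for the VFIO device fd, and offsets and sizes are assumed to be non-zero multiples of the page size. It uses the common over-reserve-and-trim technique to get an aligned base, which may differ from what the series actually does.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical stand-in for one sparse mmap area of a region. */
struct sparse_area {
    size_t offset;  /* page-aligned offset within the region */
    size_t size;    /* page-aligned, non-zero length */
};

static int cmp_offset(const void *a, const void *b)
{
    const struct sparse_area *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Patch 1 idea: sort by offset, then reject overlapping or
 * out-of-range areas. Returns 0 on success, -1 otherwise. */
static int sort_and_validate(struct sparse_area *areas, size_t n,
                             size_t region_size)
{
    size_t end = 0;

    qsort(areas, n, sizeof(*areas), cmp_offset);
    for (size_t i = 0; i < n; i++) {
        if (areas[i].offset < end || areas[i].size > region_size ||
            areas[i].offset > region_size - areas[i].size) {
            return -1;
        }
        end = areas[i].offset + areas[i].size;
    }
    return 0;
}

static size_t pow2ceil_sz(size_t v)
{
    size_t p = 1;
    while (p < v) {          /* assumes v <= SIZE_MAX / 2 + 1 */
        p <<= 1;
    }
    return p;
}

/* Patch 2 idea: one base mapping aligned to pow2ceil(region_size),
 * sparse areas overlaid with MAP_FIXED, gaps unmapped.
 * Expects areas already sorted and validated. */
static void *map_region_aligned(const struct sparse_area *areas, size_t n,
                                size_t region_size)
{
    size_t align = pow2ceil_sz(region_size);
    size_t total = region_size + align;
    size_t cursor = 0;
    uint8_t *raw, *base;

    /* Reserve address space only; nothing is accessible yet. */
    raw = mmap(NULL, total, PROT_NONE,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (raw == MAP_FAILED) {
        return NULL;
    }

    /* Round up to the alignment, then trim the unused head and tail. */
    base = (uint8_t *)(((uintptr_t)raw + align - 1) & ~((uintptr_t)align - 1));
    if (base > raw) {
        munmap(raw, (size_t)(base - raw));
    }
    munmap(base + region_size, (size_t)((raw + total) - (base + region_size)));

    for (size_t i = 0; i < n; i++) {
        if (areas[i].offset > cursor) {
            /* Gap before this area: leave it unmapped. */
            munmap(base + cursor, areas[i].offset - cursor);
        }
        /* Real code would pass the device fd and file offset here. */
        void *p = mmap(base + areas[i].offset, areas[i].size,
                       PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (p == MAP_FAILED) {
            munmap(base, region_size);
            return NULL;
        }
        cursor = areas[i].offset + areas[i].size;
    }
    if (cursor < region_size) {
        munmap(base + cursor, region_size - cursor);
    }
    return base;
}
```

Because every sparse area is placed at `base + offset` and `base` is aligned to the power-of-2 ceiling of the region size, each area keeps the same alignment relative to the region as its offset has, which is the property the whole-region strategy is after.
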
>
> v2:
> * Fixed the code returning early without trace (Shameer, Alex)
>
> Link: https://lore.kernel.org/all/20260130040649.42485-1-ankita@nvidia.com/ [v1]
>
> Ankit Agrawal (2):
> hw/vfio: sort and validate sparse mmap regions by offset
> hw/vfio: align mmap to power-of-2 of region size for hugepfnmap
>
> hw/vfio/region.c | 126 ++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 98 insertions(+), 28 deletions(-)
>
Reviewed-by: Alex Williamson <alex@shazbot.org>