[PATCH v2 0/2] hw/vfio: Enable hugepfnmap for non-power-of-2 device memory regions

ankita@nvidia.com posted 2 patches 1 month, 4 weeks ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20260211030615.3202-1-ankita@nvidia.com
Maintainers: Alex Williamson <alex@shazbot.org>, "Cédric Le Goater" <clg@redhat.com>
There is a newer version of this series
hw/vfio/region.c | 126 ++++++++++++++++++++++++++++++++++++-----------
1 file changed, 98 insertions(+), 28 deletions(-)
Posted by ankita@nvidia.com 1 month, 4 weeks ago
From: Ankit Agrawal <ankita@nvidia.com>

This series enables hugepfnmap support in QEMU for VFIO device memory
regions that have non-power-of-2 sizes. This specifically addresses the
needs of Grace-based systems (GB200) where device memory is exposed
as a BAR.

## Problem

On Grace-based systems, device memory regions can have sizes like
0x2F00F00000 (not power-of-2). The current QEMU VFIO mapping code
aligns each sparse mmap area independently using the trailing zeros
of its size (ctz64), which results in suboptimal alignment for the
overall VMA.

This prevents the kernel from using hugepfnmap, which enables huge
page mappings for device memory. Without proper alignment, the
mapping falls back to PTE-level (base page) mappings, significantly
impacting performance due to increased TLB pressure and page table
overhead for large memory regions.
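
To make the alignment arithmetic concrete, here is a minimal sketch that
compares the two strategies for the example size above. The helper names
are hypothetical stand-ins written for illustration, not QEMU's own
ctz64()/pow2ceil() helpers, and the example applies the trailing-zeros
rule to the whole region size for simplicity (the existing code applies
it to each sparse area).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Alignment implied by the trailing zeros of a size: the lowest set bit,
 * i.e. 1 << ctz(size). Hypothetical helper, for illustration only.
 */
static uint64_t align_from_trailing_zeros(uint64_t size)
{
    assert(size != 0);
    return size & (~size + 1);
}

/*
 * Smallest power of two that is >= value. Hypothetical stand-in for a
 * pow2ceil-style helper.
 */
static uint64_t pow2ceil_u64(uint64_t value)
{
    uint64_t p = 1;

    assert(value != 0 && value <= (UINT64_C(1) << 63));
    while (p < value) {
        p <<= 1;
    }
    return p;
}
```

For a size of 0x2F00F00000 the first helper returns 0x100000 (1 MiB),
which is below a typical PMD-level huge page size (for example 2 MiB with
4 KiB base pages), while the second returns 0x4000000000 (256 GiB), which
is a multiple of every smaller power-of-2 huge page size.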

## Solution

Patch 1: Sort sparse mmap regions by offset during setup and validate
         that they don't overlap. This ensures predictable mapping
         order and enables gap detection.

Patch 2: Change the alignment strategy from per-sparse-region to
         whole-region alignment using pow2ceil(region->size). Create
         a single aligned base mapping for the entire region, then
         overlay sparse areas with MAP_FIXED. Gaps between sparse
         regions are explicitly unmapped.
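
The two steps above can be sketched in user space as follows. This is an
illustration under stated assumptions, not the code in hw/vfio/region.c:
the SparseArea type and all function names are hypothetical, anonymous
memory stands in for the mmap of the device fd, and the sketch also
unmaps uncovered space before the first and after the last area, which
the series description does not spell out.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical stand-in for one sparse mmap area (not QEMU's actual type). */
typedef struct SparseArea {
    uint64_t offset;   /* offset of the area within the region */
    uint64_t size;     /* length of the area in bytes */
} SparseArea;

static int cmp_area_offset(const void *a, const void *b)
{
    const SparseArea *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Sort by offset, then reject areas that overlap or exceed the region. */
static int sort_and_validate(SparseArea *areas, size_t n, uint64_t region_size)
{
    uint64_t prev_end = 0;

    qsort(areas, n, sizeof(*areas), cmp_area_offset);
    for (size_t i = 0; i < n; i++) {
        if (areas[i].offset < prev_end ||
            areas[i].size > region_size ||
            areas[i].offset > region_size - areas[i].size) {
            return -1;
        }
        prev_end = areas[i].offset + areas[i].size;
    }
    return 0;
}

/*
 * Reserve `size` bytes of address space starting at a multiple of `align`.
 * Both must be multiples of the page size; `align` must be a power of two.
 */
static void *reserve_aligned(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t total = size + align;
    uint8_t *raw = mmap(NULL, total, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) {
        return NULL;
    }
    uintptr_t start = ((uintptr_t)raw + align - 1) & ~((uintptr_t)align - 1);
    size_t head = start - (uintptr_t)raw;
    size_t tail = total - head - size;

    /* Trim the over-allocation on both sides of the aligned window. */
    if (head) {
        munmap(raw, head);
    }
    if (tail) {
        munmap((uint8_t *)start + size, tail);
    }
    return (void *)start;
}

/*
 * Overlay each (already sorted) area at base + offset with MAP_FIXED and
 * unmap the ranges no area covers. Anonymous memory stands in for the
 * device fd that a real mapping would use.
 */
static int overlay_sparse(uint8_t *base, const SparseArea *areas, size_t n,
                          uint64_t region_size)
{
    uint64_t cursor = 0;

    for (size_t i = 0; i < n; i++) {
        if (areas[i].offset > cursor) {
            munmap(base + cursor, areas[i].offset - cursor);
        }
        void *p = mmap(base + areas[i].offset, areas[i].size,
                       PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (p == MAP_FAILED) {
            return -1;
        }
        cursor = areas[i].offset + areas[i].size;
    }
    if (cursor < region_size) {
        munmap(base + cursor, region_size - cursor);
    }
    return 0;
}
```

A caller would run sort_and_validate() first, pass the power-of-2 ceiling
of the region size as `align` to reserve_aligned(), and then call
overlay_sparse() on the returned base. Reserving size + align bytes and
trimming both ends is one common way to obtain an aligned window; it is
used here only because it is simple to follow.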

v2:
* Fixed the code returning early without trace (Shameer, Alex)

Link: https://lore.kernel.org/all/20260130040649.42485-1-ankita@nvidia.com/ [v1]

Ankit Agrawal (2):
  hw/vfio: sort and validate sparse mmap regions by offset
  hw/vfio: align mmap to power-of-2 of region size for hugepfnmap

 hw/vfio/region.c | 126 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 98 insertions(+), 28 deletions(-)

-- 
2.34.1
Re: [PATCH v2 0/2] hw/vfio: Enable hugepfnmap for non-power-of-2 device memory regions
Posted by Alex Williamson 1 month, 4 weeks ago
On Wed, 11 Feb 2026 03:06:13 +0000
<ankita@nvidia.com> wrote:

> From: Ankit Agrawal <ankita@nvidia.com>
> 
> This series enables hugepfnmap support in QEMU for VFIO device memory
> regions that have non-power-of-2 sizes. This specifically addresses the
> needs of Grace-based systems (GB200) where device memory is exposed
> as a BAR.
> 
> ## Problem
> 
> On Grace-based systems, device memory regions can have sizes like
> 0x2F00F00000 (not power-of-2). The current QEMU VFIO mapping code
> aligns each sparse mmap area independently using the trailing zeros
> of its size (ctz64), which results in suboptimal alignment for the
> overall VMA.
> 
> This prevents the kernel from using hugepfnmap that enables huge
> page mappings for device memory. Without proper alignment, the
> mapping falls back to PTE, significantly impacting performance
> due to increased TLB pressure and page table overhead for large
> memory regions.
> 
> ## Solution
> 
> Patch 1: Sort sparse mmap regions by offset during setup and validate
>          that they don't overlap. This ensures predictable mapping
>          order and enables gap detection.
> 
> Patch 2: Change the alignment strategy from per-sparse-region to
>          whole-region alignment using pow2ceil(region->size). Create
>          a single aligned base mapping for the entire region, then
>          overlay sparse areas with MAP_FIXED. Gaps between sparse
>          regions are explicitly unmapped.
> 
> v2:
> * Fixed the code returning early without trace (Shameer, Alex)
> 
> Link: https://lore.kernel.org/all/20260130040649.42485-1-ankita@nvidia.com/ [v1]
> 
> Ankit Agrawal (2):
>   hw/vfio: sort and validate sparse mmap regions by offset
>   hw/vfio: align mmap to power-of-2 of region size for hugepfnmap
> 
>  hw/vfio/region.c | 126 ++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 98 insertions(+), 28 deletions(-)
> 

Reviewed-by: Alex Williamson <alex@shazbot.org>