follow_pfnmap_start() walks the page table for a given address and
fills out the struct follow_pfnmap_args in pfnmap_args_setup().
The address mask of the page table level is already provided to this
latter function for calculating the pfn. This address mask can also
be useful for the caller to determine the extent of the contiguous
mapping.

For example, vfio-pci now supports huge_fault for pfnmaps and is able
to insert pud and pmd mappings. When we DMA map these pfnmaps, e.g.
PCI MMIO BARs, we call follow_pfnmap_start() repeatedly to get each pfn
and test for a contiguous pfn range. Providing the mapping address mask
allows us to skip the extent of the mapping level. Assuming a 1GB pud
level and 4KB page size, iterations are reduced by a factor of 256K. In
wall-clock time, mapping a 32GB PCI BAR is reduced from ~1s to <1ms.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: linux-mm@kvack.org
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: "Mitchell Augustin" <mitchell.augustin@canonical.com>
Tested-by: "Mitchell Augustin" <mitchell.augustin@canonical.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
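As a rough check of the figures in the commit message, the reduction factor is just the ratio of the mapping-level size to the base page size. The sketch below is user-space C, not kernel code; `lookups_needed` is a hypothetical helper, and the 4KB and 1GB sizes are the assumptions stated above.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K_ASSUMED (1ULL << 12) /* base page size assumed in the commit message */
#define SZ_1G_ASSUMED (1ULL << 30) /* pud mapping size assumed in the commit message */

/*
 * Hypothetical helper: number of page table lookups needed to cover
 * `len` bytes when each lookup lets the caller advance by `step` bytes.
 */
static uint64_t lookups_needed(uint64_t len, uint64_t step)
{
	assert(step != 0);
	return (len + step - 1) / step;
}

/* Ratio of per-page lookups to per-level lookups for a region of `len` bytes. */
static uint64_t reduction_factor(uint64_t len)
{
	return lookups_needed(len, SZ_4K_ASSUMED) /
	       lookups_needed(len, SZ_1G_ASSUMED);
}
```

For a 32GB BAR this gives 8388608 lookups at 4KB steps versus 32 at 1GB steps, a ratio of 262144 (256K).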
 include/linux/mm.h | 2 ++
 mm/memory.c        | 1 +
 2 files changed, 3 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7b1068ddcbb7..92b30dba7e38 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2417,11 +2417,13 @@ struct follow_pfnmap_args {
 	 * Outputs:
 	 *
 	 * @pfn: the PFN of the address
+	 * @addr_mask: address mask covering pfn
 	 * @pgprot: the pgprot_t of the mapping
 	 * @writable: whether the mapping is writable
 	 * @special: whether the mapping is a special mapping (real PFN maps)
 	 */
 	unsigned long pfn;
+	unsigned long addr_mask;
 	pgprot_t pgprot;
 	bool writable;
 	bool special;
diff --git a/mm/memory.c b/mm/memory.c
index 539c0f7c6d54..8f0969f132fe 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6477,6 +6477,7 @@ static inline void pfnmap_args_setup(struct follow_pfnmap_args *args,
 	args->lock = lock;
 	args->ptep = ptep;
 	args->pfn = pfn_base + ((args->address & ~addr_mask) >> PAGE_SHIFT);
+	args->addr_mask = addr_mask;
 	args->pgprot = pgprot;
 	args->writable = writable;
 	args->special = special;
--
2.48.1
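
To illustrate how a caller might use the new field, here is a minimal user-space model. It copies only the pfn arithmetic from pfnmap_args_setup() in the diff; `struct model_args`, `MODEL_PAGE_SHIFT`, and the `model_*` helpers are hypothetical names for illustration and are not the vfio implementation. It assumes 4KB base pages and that the whole range is mapped at a single level.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4KB base pages */

/* Hypothetical stand-in for the relevant fields of struct follow_pfnmap_args. */
struct model_args {
	uint64_t address;   /* input: address being looked up */
	uint64_t pfn;       /* output: pfn backing that address */
	uint64_t addr_mask; /* output: mask of the mapping level */
};

/* Same pfn calculation as pfnmap_args_setup() in the diff. */
static void model_args_setup(struct model_args *args, uint64_t pfn_base,
			     uint64_t addr_mask)
{
	args->pfn = pfn_base +
		    ((args->address & ~addr_mask) >> MODEL_PAGE_SHIFT);
	args->addr_mask = addr_mask;
}

/* Bytes from args->address to the end of the mapping level that covers it. */
static uint64_t model_bytes_to_level_end(const struct model_args *args)
{
	uint64_t level_size = ~args->addr_mask + 1;

	return level_size - (args->address & ~args->addr_mask);
}

/*
 * Count the lookups needed to walk [start, start + len) when every lookup
 * skips to the end of the current mapping level instead of one page.
 */
static uint64_t model_count_lookups(uint64_t start, uint64_t len,
				    uint64_t addr_mask)
{
	uint64_t addr = start, end = start + len, count = 0;

	assert(end >= start); /* no wraparound in this model */
	while (addr < end) {
		struct model_args args = { .address = addr };

		model_args_setup(&args, 0, addr_mask); /* pfn_base unused here */
		addr += model_bytes_to_level_end(&args);
		count++;
	}
	return count;
}
```

With a 1GB mask of `~((1ULL << 30) - 1)`, walking a 32GB range from an aligned start takes 32 lookups in this model. A real caller would also need to confirm that the pfns of adjacent levels are contiguous before merging them.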

On 18.02.25 23:22, Alex Williamson wrote:
> follow_pfnmap_start() walks the page table for a given address and
> fills out the struct follow_pfnmap_args in pfnmap_args_setup().
> [...]
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb

On Wed, 19 Feb 2025 09:31:48 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 18.02.25 23:22, Alex Williamson wrote:
> > follow_pfnmap_start() walks the page table for a given address and
> > fills out the struct follow_pfnmap_args in pfnmap_args_setup().
> > [...]
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
>
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks, David!

Is there any objection from mm folks to bring this in through the vfio
tree?

Patch:
https://lore.kernel.org/all/20250218222209.1382449-6-alex.williamson@redhat.com/

Series:
https://lore.kernel.org/all/20250218222209.1382449-1-alex.williamson@redhat.com/

Thanks,
Alex

On 26.02.25 20:54, Alex Williamson wrote:
> On Wed, 19 Feb 2025 09:31:48 +0100
> David Hildenbrand <david@redhat.com> wrote:
>
>> On 18.02.25 23:22, Alex Williamson wrote:
>>> follow_pfnmap_start() walks the page table for a given address and
>>> fills out the struct follow_pfnmap_args in pfnmap_args_setup().
>>> [...]
>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>> ---
>>
>> Acked-by: David Hildenbrand <david@redhat.com>
>
> Thanks, David!
>
> Is there any objection from mm folks to bring this in through the vfio
> tree?

I assume it's fine. Andrew is on CC, so he should be aware of it. I'm not
aware of possible clashes.

-- 
Cheers,

David / dhildenb