Let's provide variants of track_pfn_remap() and untrack_pfn() that won't
mess with VMAs, to replace the existing interface step-by-step.

Add some documentation.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/x86/mm/pat/memtype.c | 14 ++++++++++++++
include/linux/pgtable.h | 33 +++++++++++++++++++++++++++++++++
2 files changed, 47 insertions(+)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 193e33251b18f..c011d8dd8f441 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -1068,6 +1068,20 @@ int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size, pgprot_t *prot
return 0;
}
+int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot)
+{
+ const resource_size_t paddr = (resource_size_t)pfn << PAGE_SHIFT;
+
+ return reserve_pfn_range(paddr, size, prot, 0);
+}
+
+void pfnmap_untrack(unsigned long pfn, unsigned long size)
+{
+ const resource_size_t paddr = (resource_size_t)pfn << PAGE_SHIFT;
+
+ free_pfn_range(paddr, size);
+}
+
/*
* untrack_pfn is called while unmapping a pfnmap for a region.
* untrack can be called for a specific region indicated by pfn and size or
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 91aadfe2515a5..898a3ab195578 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1506,6 +1506,16 @@ static inline int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size,
return 0;
}
+static inline int pfnmap_track(unsigned long pfn, unsigned long size,
+ pgprot_t *prot)
+{
+ return 0;
+}
+
+static inline void pfnmap_untrack(unsigned long pfn, unsigned long size)
+{
+}
+
/*
* track_pfn_copy is called when a VM_PFNMAP VMA is about to get the page
* tables copied during copy_page_range(). Will store the pfn to be
@@ -1570,6 +1580,29 @@ extern int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
*/
int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size,
pgprot_t *prot);
+
+/**
+ * pfnmap_track - track a pfn range
+ * @pfn: the start of the pfn range
+ * @size: the size of the pfn range
+ * @prot: the pgprot to track
+ *
+ * Tracking a pfnmap range involves conditionally reserving a pfn range and
+ * sanitizing the pgprot -- see pfnmap_sanitize_pgprot().
+ *
+ * Returns 0 on success and -EINVAL on error.
+ */
+int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot);
+
+/**
+ * pfnmap_untrack - untrack a pfn range
+ * @pfn: the start of the pfn range
+ * @size: the size of the pfn range
+ *
+ * Untrack a pfn range previously tracked through pfnmap_track(), for example,
+ * un-doing any reservation.
+ */
+void pfnmap_untrack(unsigned long pfn, unsigned long size);
extern int track_pfn_copy(struct vm_area_struct *dst_vma,
struct vm_area_struct *src_vma, unsigned long *pfn);
extern void untrack_pfn_copy(struct vm_area_struct *dst_vma,
--
2.49.0
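Here is a minimal userspace sketch of how a caller might pair the two new helpers. It tracks the range before mapping it, and untracks it again if a later step fails or on teardown. The pgprot_t definition, the reservation counter and map_range() are hypothetical stand-ins. In the patch, the x86 pfnmap_track() reserves the range through reserve_pfn_range(), and other architectures get the no-op stubs from include/linux/pgtable.h.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins; not the kernel definitions. */
typedef struct { unsigned long val; } pgprot_t;

static int tracked;	/* live reservations in this mock */

/* Mock of pfnmap_track(): "reserves" by bumping a counter. */
static int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot)
{
	(void)pfn;
	(void)prot;
	if (size == 0)
		return -EINVAL;	/* mock failure case */
	tracked++;
	return 0;
}

/* Mock of pfnmap_untrack(): drops one reservation. */
static void pfnmap_untrack(unsigned long pfn, unsigned long size)
{
	(void)pfn;
	(void)size;
	tracked--;
}

/* Hypothetical mapping step that may fail after tracking succeeded. */
static int map_range(unsigned long pfn, unsigned long size, int fail)
{
	(void)pfn;
	(void)size;
	return fail ? -ENOMEM : 0;
}

/* Track first, then map; undo the tracking if mapping fails. */
static int setup_pfnmap(unsigned long pfn, unsigned long size,
			pgprot_t *prot, int fail)
{
	int ret = pfnmap_track(pfn, size, prot);

	if (ret)
		return ret;
	ret = map_range(pfn, size, fail);
	if (ret)
		pfnmap_untrack(pfn, size);
	return ret;
}
```

The property the sketch illustrates is that every successful pfnmap_track() is balanced by exactly one pfnmap_untrack(), whether the mapping later succeeds and is torn down or fails immediately.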
On Fri, Apr 25, 2025 at 10:17:07AM +0200, David Hildenbrand wrote:
> Let's provide variants of track_pfn_remap() and untrack_pfn() that won't
> mess with VMAs, to replace the existing interface step-by-step.
>
> Add some documentation.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
There's some pedantry below, but this looks fine generally, so
notwithstanding that,
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> arch/x86/mm/pat/memtype.c | 14 ++++++++++++++
> include/linux/pgtable.h | 33 +++++++++++++++++++++++++++++++++
> 2 files changed, 47 insertions(+)
>
> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> index 193e33251b18f..c011d8dd8f441 100644
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -1068,6 +1068,20 @@ int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size, pgprot_t *prot
> return 0;
> }
>
> +int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot)
> +{
> + const resource_size_t paddr = (resource_size_t)pfn << PAGE_SHIFT;
> +
> + return reserve_pfn_range(paddr, size, prot, 0);
Nitty, but a pattern established by Liam which we've followed consistently
in VMA code is to prefix parameters that might be less than obvious,
especially boolean parameters, with a comment naming the parameter, e.g.:

	return reserve_pfn_range(paddr, size, prot, /*strict_prot=*/0);

> +}
> +
> +void pfnmap_untrack(unsigned long pfn, unsigned long size)
> +{
> + const resource_size_t paddr = (resource_size_t)pfn << PAGE_SHIFT;
> +
> + free_pfn_range(paddr, size);
> +}
> +
> /*
> * untrack_pfn is called while unmapping a pfnmap for a region.
> * untrack can be called for a specific region indicated by pfn and size or
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 91aadfe2515a5..898a3ab195578 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1506,6 +1506,16 @@ static inline int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size,
> return 0;
> }
>
> +static inline int pfnmap_track(unsigned long pfn, unsigned long size,
> + pgprot_t *prot)
> +{
> + return 0;
> +}
> +
> +static inline void pfnmap_untrack(unsigned long pfn, unsigned long size)
> +{
> +}
> +
> /*
> * track_pfn_copy is called when a VM_PFNMAP VMA is about to get the page
> * tables copied during copy_page_range(). Will store the pfn to be
> @@ -1570,6 +1580,29 @@ extern int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
> */
> int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size,
> pgprot_t *prot);
> +
> +/**
> + * pfnmap_track - track a pfn range
To risk sounding annoyingly pedantic and giving the kind of review that is
annoying, this really needs to be expanded, I think perhaps this
description is stating the obvious :)
To me the confusing thing is that the 'generic' sounding pfnmap_track() is
actually PAT-specific, so surely the description should give a brief
overview of PAT here, saying it's applicable on x86-64 etc. etc.
I'm not sure there's much use in keeping this generic when it clearly is
not at this point?
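Only x86 PAT implements this today. The header side of the patch follows the usual kernel pattern for that situation: a real prototype where the architecture provides an implementation, and no-op static inline stubs everywhere else, so that generic callers need no #ifdefs. Below is a sketch of that pattern. ARCH_HAS_PFNMAP_TRACKING_SKETCH is a made-up placeholder for whatever opt-in macro the real header uses, and pgprot_t is a stand-in type.

```c
#include <assert.h>

typedef struct { unsigned long val; } pgprot_t;	/* stand-in type */

/* Placeholder opt-in macro; an architecture with PAT would define it. */
/* #define ARCH_HAS_PFNMAP_TRACKING_SKETCH 1 */

#ifdef ARCH_HAS_PFNMAP_TRACKING_SKETCH
/* Real implementations live in arch code (x86: arch/x86/mm/pat/memtype.c). */
int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot);
void pfnmap_untrack(unsigned long pfn, unsigned long size);
#else
/* Architectures without tracking get no-op stubs, as in the patch. */
static inline int pfnmap_track(unsigned long pfn, unsigned long size,
			       pgprot_t *prot)
{
	(void)pfn;
	(void)size;
	(void)prot;
	return 0;
}

static inline void pfnmap_untrack(unsigned long pfn, unsigned long size)
{
	(void)pfn;
	(void)size;
}
#endif
```

With the placeholder undefined, pfnmap_track() always succeeds and pfnmap_untrack() does nothing, matching the stubs added to include/linux/pgtable.h.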
> + * @pfn: the start of the pfn range
> + * @size: the size of the pfn range
In what units? Given it's a pfn range it's a bit ambiguous as to whether it
should be expressed in pages/bytes.
> + * @prot: the pgprot to track
> + *
> + * Tracking a pfnmap range involves conditionally reserving a pfn range and
> + * sanitizing the pgprot -- see pfnmap_sanitize_pgprot().
> + *
> + * Returns 0 on success and -EINVAL on error.
> + */
> +int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot);
> +
> +/**
> + * pfnmap_untrack - untrack a pfn range
> + * @pfn: the start of the pfn range
> + * @size: the size of the pfn range
Same comment as above re: units.
> + *
> + * Untrack a pfn range previously tracked through pfnmap_track(), for example,
> + * un-doing any reservation.
> + */
> +void pfnmap_untrack(unsigned long pfn, unsigned long size);
> extern int track_pfn_copy(struct vm_area_struct *dst_vma,
> struct vm_area_struct *src_vma, unsigned long *pfn);
> extern void untrack_pfn_copy(struct vm_area_struct *dst_vma,
> --
> 2.49.0
>
>>
>> +int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot)
>> +{
>> + const resource_size_t paddr = (resource_size_t)pfn << PAGE_SHIFT;
>> +
>> + return reserve_pfn_range(paddr, size, prot, 0);
>
> Nitty, but a pattern established by Liam which we've followed consistently
> in VMA code is to prefix parameters that might be less than obvious,
> especially boolean parameters, with a comment naming the parameter, e.g.:
> 	return reserve_pfn_range(paddr, size, prot, /*strict_prot=*/0);
Not sure I like that. But as this parameter goes away in patch #8, I'll
leave it as is in this patch and not start a bigger discussion on better
alternatives (don't use these stupid boolean variables ...) ;)
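David doesn't spell out which alternative he has in mind. One common option is to replace the boolean with a named flag, so that the call site documents itself without a /*name=*/ comment. The sketch below is purely illustrative. Both functions and the RESERVE_* names are hypothetical, and they only report which mode was requested.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical call taking a bool, like strict_prot; returns 1 if strict. */
static int reserve_with_bool(unsigned long size, bool strict_prot)
{
	(void)size;
	return strict_prot ? 1 : 0;
}

/* Alternative: named flags, so the call site reads without a comment. */
enum reserve_flags {
	RESERVE_DEFAULT     = 0,
	RESERVE_STRICT_PROT = 1u << 0,
};

static int reserve_with_flags(unsigned long size, unsigned int flags)
{
	(void)size;
	return (flags & RESERVE_STRICT_PROT) ? 1 : 0;
}

static int call_sites(void)
{
	int a = reserve_with_bool(4096, /*strict_prot=*/false);	/* Liam's style */
	int b = reserve_with_flags(4096, RESERVE_DEFAULT);
	int c = reserve_with_flags(4096, RESERVE_STRICT_PROT);

	return a + b + c;	/* 0 + 0 + 1 */
}
```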
[...]
>> +
>> +/**
>> + * pfnmap_track - track a pfn range
>
> To risk sounding annoyingly pedantic and giving the kind of review that is
> annoying, this really needs to be expanded, I think perhaps this
> description is stating the obvious :)
>
> To me the confusing thing is that the 'generic' sounding pfnmap_track() is
> actually PAT-specific, so surely the description should give a brief
> overview of PAT here, saying it's applicable on x86-64 etc. etc.
>
> I'm not sure there's much use in keeping this generic when it clearly is
> not at this point?
Sorry, is your suggestion to document more PAT stuff or what exactly?
As you know, I'm a busy man ... so instructions/suggestions please :)
>
>> + * @pfn: the start of the pfn range
>> + * @size: the size of the pfn range
>
> In what units? Given it's a pfn range it's a bit ambiguous as to whether it
> should be expressed in pages/bytes.
Agreed. It's bytes. (not my favorite here, but good enough)
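To make the units concrete: @pfn counts page frames while @size counts bytes, so the physical range covered is [pfn << PAGE_SHIFT, (pfn << PAGE_SHIFT) + size). The sketch below assumes 4 KiB pages (PAGE_SHIFT of 12) and a 64-bit resource_size_t. It also shows why the patch casts pfn to resource_size_t before shifting: a 32-bit value shifted past 4 GiB wraps around. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages, 64-bit physical addresses. */
#define PAGE_SHIFT 12
typedef uint64_t resource_size_t;

/* Widen before shifting, as the patch does, so no high bits are lost. */
static resource_size_t pfn_to_paddr(unsigned long pfn)
{
	return (resource_size_t)pfn << PAGE_SHIFT;
}

/* What goes wrong without the cast when the pfn is held in 32 bits. */
static uint32_t pfn_to_paddr_32bit(uint32_t pfn)
{
	return pfn << PAGE_SHIFT;	/* wraps modulo 2^32 */
}

/* @size is in bytes; assuming it is page-aligned, this is the page count. */
static unsigned long size_to_pages(unsigned long size)
{
	return size >> PAGE_SHIFT;
}

/* Exclusive end of the tracked physical range, in bytes. */
static resource_size_t range_end(unsigned long pfn, unsigned long size)
{
	return pfn_to_paddr(pfn) + size;
}
```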
--
Cheers,
David / dhildenb
On Mon, Apr 28, 2025 at 07:12:11PM +0200, David Hildenbrand wrote:
>
> > >
> > > +int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot)
> > > +{
> > > + const resource_size_t paddr = (resource_size_t)pfn << PAGE_SHIFT;
> > > +
> > > + return reserve_pfn_range(paddr, size, prot, 0);
> >
> > Nitty, but a pattern established by Liam which we've followed consistently
> > in VMA code is to prefix parameters that might be less than obvious,
> > especially boolean parameters, with a comment naming the parameter, e.g.:
> > 	return reserve_pfn_range(paddr, size, prot, /*strict_prot=*/0);
>
> Not sure I like that. But as this parameter goes away in patch #8, I'll leave
> it as is in this patch and not start a bigger discussion on better
> alternatives (don't use these stupid boolean variables ...) ;)
>
> [...]
>
> > > +
> > > +/**
> > > + * pfnmap_track - track a pfn range
> >
> > To risk sounding annoyingly pedantic and giving the kind of review that is
> > annoying, this really needs to be expanded, I think perhaps this
> > description is stating the obvious :)
> >
> > To me the confusing thing is that the 'generic' sounding pfnmap_track() is
> > actually PAT-specific, so surely the description should give a brief
> > overview of PAT here, saying it's applicable on x86-64 etc. etc.
> >
> > I'm not sure there's much use in keeping this generic when it clearly is
> > not at this point?
>
> Sorry, is your suggestion to document more PAT stuff or what exactly?
>
> As you know, I'm a busy man ... so instructions/suggestions please :)
Haha sure, I _think_ the model here is to have a brief summary then underneath a
more detailed explanation, so that could be:

	This address range is requested to be 'tracked' by a hardware
	implementation allowing fine-grained control of memory attributes at
	page-level granularity.

	This allows for fine-grained control of memory cache behaviour. Tracking
	memory this way is persisted across VMA split and merge.

	Currently there is only one implementation for this: the x86 Page
	Attribute Table (PAT). See Documentation/arch/x86/pat.rst for more details.
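Folding both review comments into the patch's existing kernel-doc might look roughly like the following. This is a draft assembled from the suggested PAT wording and the answer on units. It is not the text that was eventually merged. pgprot_t and the function body are placeholders so that the sketch compiles on its own.

```c
#include <assert.h>
#include <errno.h>

typedef struct { unsigned long val; } pgprot_t;	/* stand-in type */

/**
 * pfnmap_track - track a pfn range
 * @pfn: the start of the pfn range
 * @size: the size of the pfn range in bytes
 * @prot: the pgprot to track
 *
 * Requests the pfn range to be 'tracked' by a hardware implementation that
 * allows fine-grained control of memory attributes, such as cache behaviour,
 * at page-level granularity.
 *
 * Currently there is only one implementation: the x86 Page Attribute Table
 * (PAT). See Documentation/arch/x86/pat.rst for more details.
 *
 * Tracking a pfnmap range involves conditionally reserving a pfn range and
 * sanitizing the pgprot; see pfnmap_sanitize_pgprot().
 *
 * Returns 0 on success and -EINVAL on error.
 */
int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot);

/* Placeholder body so the sketch compiles outside the kernel. */
int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot)
{
	(void)pfn;
	(void)prot;
	return size ? 0 : -EINVAL;	/* mock: reject an empty range */
}
```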
>
> >
> > > + * @pfn: the start of the pfn range
> > > + * @size: the size of the pfn range
> >
> > In what units? Given it's a pfn range it's a bit ambiguous as to whether it
> > should be expressed in pages/bytes.
>
> Agreed. It's bytes. (not my favorite here, but good enough)
Ack, definitely need to spell it out here! Cheers :)
>
>
> --
> Cheers,
>
> David / dhildenb
>