[PATCH v1 03/11] x86/mm/pat: introduce pfnmap_track() and pfnmap_untrack()

David Hildenbrand posted 11 patches 9 months, 2 weeks ago
There is a newer version of this series
Posted by David Hildenbrand 9 months, 2 weeks ago
Let's provide variants of track_pfn_remap() and untrack_pfn() that won't
mess with VMAs, to replace the existing interface step-by-step.

Add some documentation.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/x86/mm/pat/memtype.c | 14 ++++++++++++++
 include/linux/pgtable.h   | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 193e33251b18f..c011d8dd8f441 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -1068,6 +1068,20 @@ int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size, pgprot_t *prot
 	return 0;
 }
 
+int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot)
+{
+	const resource_size_t paddr = (resource_size_t)pfn << PAGE_SHIFT;
+
+	return reserve_pfn_range(paddr, size, prot, 0);
+}
+
+void pfnmap_untrack(unsigned long pfn, unsigned long size)
+{
+	const resource_size_t paddr = (resource_size_t)pfn << PAGE_SHIFT;
+
+	free_pfn_range(paddr, size);
+}
+
 /*
  * untrack_pfn is called while unmapping a pfnmap for a region.
  * untrack can be called for a specific region indicated by pfn and size or
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 91aadfe2515a5..898a3ab195578 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1506,6 +1506,16 @@ static inline int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size,
 	return 0;
 }
 
+static inline int pfnmap_track(unsigned long pfn, unsigned long size,
+		pgprot_t *prot)
+{
+	return 0;
+}
+
+static inline void pfnmap_untrack(unsigned long pfn, unsigned long size)
+{
+}
+
 /*
  * track_pfn_copy is called when a VM_PFNMAP VMA is about to get the page
  * tables copied during copy_page_range(). Will store the pfn to be
@@ -1570,6 +1580,29 @@ extern int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
  */
 int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size,
 		pgprot_t *prot);
+
+/**
+ * pfnmap_track - track a pfn range
+ * @pfn: the start of the pfn range
+ * @size: the size of the pfn range
+ * @prot: the pgprot to track
+ *
+ * Tracking a pfnmap range involves conditionally reserving a pfn range and
+ * sanitizing the pgprot -- see pfnmap_sanitize_pgprot().
+ *
+ * Returns 0 on success and -EINVAL on error.
+ */
+int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot);
+
+/**
+ * pfnmap_untrack - untrack a pfn range
+ * @pfn: the start of the pfn range
+ * @size: the size of the pfn range
+ *
+ * Untrack a pfn range previously tracked through pfnmap_track(), for example,
+ * un-doing any reservation.
+ */
+void pfnmap_untrack(unsigned long pfn, unsigned long size);
 extern int track_pfn_copy(struct vm_area_struct *dst_vma,
 		struct vm_area_struct *src_vma, unsigned long *pfn);
 extern void untrack_pfn_copy(struct vm_area_struct *dst_vma,
-- 
2.49.0
Re: [PATCH v1 03/11] x86/mm/pat: introduce pfnmap_track() and pfnmap_untrack()
Posted by Lorenzo Stoakes 9 months, 2 weeks ago
On Fri, Apr 25, 2025 at 10:17:07AM +0200, David Hildenbrand wrote:
> Let's provide variants of track_pfn_remap() and untrack_pfn() that won't
> mess with VMAs, to replace the existing interface step-by-step.
>
> Add some documentation.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>

There's some pedantry below, but this looks fine generally, so
notwithstanding that,

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
>  arch/x86/mm/pat/memtype.c | 14 ++++++++++++++
>  include/linux/pgtable.h   | 33 +++++++++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+)
>
> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> index 193e33251b18f..c011d8dd8f441 100644
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -1068,6 +1068,20 @@ int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size, pgprot_t *prot
>  	return 0;
>  }
>
> +int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot)
> +{
> +	const resource_size_t paddr = (resource_size_t)pfn << PAGE_SHIFT;
> +
> +	return reserve_pfn_range(paddr, size, prot, 0);

Nitty, but a pattern established by Liam which we've followed consistently
in VMA code is to prefix parameters that might be less than obvious,
especially boolean parameters, with a comment naming the parameter, e.g.:

	return reserve_pfn_range(paddr, size, prot, /*strict_prot=*/0);

> +}
> +
> +void pfnmap_untrack(unsigned long pfn, unsigned long size)
> +{
> +	const resource_size_t paddr = (resource_size_t)pfn << PAGE_SHIFT;
> +
> +	free_pfn_range(paddr, size);
> +}
> +
>  /*
>   * untrack_pfn is called while unmapping a pfnmap for a region.
>   * untrack can be called for a specific region indicated by pfn and size or
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 91aadfe2515a5..898a3ab195578 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1506,6 +1506,16 @@ static inline int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size,
>  	return 0;
>  }
>
> +static inline int pfnmap_track(unsigned long pfn, unsigned long size,
> +		pgprot_t *prot)
> +{
> +	return 0;
> +}
> +
> +static inline void pfnmap_untrack(unsigned long pfn, unsigned long size)
> +{
> +}
> +
>  /*
>   * track_pfn_copy is called when a VM_PFNMAP VMA is about to get the page
>   * tables copied during copy_page_range(). Will store the pfn to be
> @@ -1570,6 +1580,29 @@ extern int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
>   */
>  int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size,
>  		pgprot_t *prot);
> +
> +/**
> + * pfnmap_track - track a pfn range

To risk sounding annoyingly pedantic and giving the kind of review that is
annoying, this really needs to be expanded, I think perhaps this
description is stating the obvious :)

To me the confusing thing is that the 'generic' sounding pfnmap_track() is
actually PAT-specific, so surely the description should give a brief
overview of PAT here, saying it's applicable on x86-64 etc. etc.

I'm not sure there's much use in keeping this generic when it clearly is
not at this point?

> + * @pfn: the start of the pfn range
> + * @size: the size of the pfn range

In what units? Given it's a pfn range it's a bit ambiguous as to whether it
should be expressed in pages/bytes.

> + * @prot: the pgprot to track
> + *
> + * Tracking a pfnmap range involves conditionally reserving a pfn range and
> + * sanitizing the pgprot -- see pfnmap_sanitize_pgprot().
> + *
> + * Returns 0 on success and -EINVAL on error.
> + */
> +int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot);
> +
> +/**
> + * pfnmap_untrack - untrack a pfn range
> + * @pfn: the start of the pfn range
> + * @size: the size of the pfn range

Same comment as above re: units.

> + *
> + * Untrack a pfn range previously tracked through pfnmap_track(), for example,
> + * un-doing any reservation.
> + */
> +void pfnmap_untrack(unsigned long pfn, unsigned long size);
>  extern int track_pfn_copy(struct vm_area_struct *dst_vma,
>  		struct vm_area_struct *src_vma, unsigned long *pfn);
>  extern void untrack_pfn_copy(struct vm_area_struct *dst_vma,
> --
> 2.49.0
>
Re: [PATCH v1 03/11] x86/mm/pat: introduce pfnmap_track() and pfnmap_untrack()
Posted by David Hildenbrand 9 months, 2 weeks ago
>>
>> +int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot)
>> +{
>> +	const resource_size_t paddr = (resource_size_t)pfn << PAGE_SHIFT;
>> +
>> +	return reserve_pfn_range(paddr, size, prot, 0);
> 
> Nitty, but a pattern established by Liam which we've followed consistently
> in VMA code is to prefix parameters that might be less than obvious,
> especially boolean parameters, with a comment naming the parameter, e.g.:
>
> 	return reserve_pfn_range(paddr, size, prot, /*strict_prot=*/0);

Not sure I like that. But as this parameter goes away in patch #8, I'll
leave it as is in this patch and not start a bigger discussion on better
alternatives (don't use these stupid boolean variables ...) ;)

[...]

>> +
>> +/**
>> + * pfnmap_track - track a pfn range
> 
> To risk sounding annoyingly pedantic and giving the kind of review that is
> annoying, this really needs to be expanded, I think perhaps this
> description is stating the obvious :)
> 
> To me the confusing thing is that the 'generic' sounding pfnmap_track() is
> actually PAT-specific, so surely the description should give a brief
> overview of PAT here, saying it's applicable on x86-64 etc. etc.
> 
> I'm not sure there's much use in keeping this generic when it clearly is
> not at this point?

Sorry, is your suggestion to document more PAT stuff or what exactly?

As you know, I'm a busy man ... so instructions/suggestions please :)

> 
>> + * @pfn: the start of the pfn range
>> + * @size: the size of the pfn range
> 
> In what units? Given it's a pfn range it's a bit ambiguous as to whether it
> should be expressed in pages/bytes.

Agreed. It's bytes. (not my favorite here, but good enough)


-- 
Cheers,

David / dhildenb
Re: [PATCH v1 03/11] x86/mm/pat: introduce pfnmap_track() and pfnmap_untrack()
Posted by Lorenzo Stoakes 9 months, 2 weeks ago
On Mon, Apr 28, 2025 at 07:12:11PM +0200, David Hildenbrand wrote:
>
> > >
> > > +int pfnmap_track(unsigned long pfn, unsigned long size, pgprot_t *prot)
> > > +{
> > > +	const resource_size_t paddr = (resource_size_t)pfn << PAGE_SHIFT;
> > > +
> > > +	return reserve_pfn_range(paddr, size, prot, 0);
> >
> > Nitty, but a pattern established by Liam which we've followed consistently
> > in VMA code is to prefix parameters that might be less than obvious,
> > especially boolean parameters, with a comment naming the parameter, e.g.:
> >
> > 	return reserve_pfn_range(paddr, size, prot, /*strict_prot=*/0);
>
> Not sure I like that. But as this parameter goes away in patch #8, I'll leave
> it as is in this patch and not start a bigger discussion on better
> alternatives (don't use these stupid boolean variables ...) ;)
>
> [...]
>
> > > +
> > > +/**
> > > + * pfnmap_track - track a pfn range
> >
> > To risk sounding annoyingly pedantic and giving the kind of review that is
> > annoying, this really needs to be expanded, I think perhaps this
> > description is stating the obvious :)
> >
> > To me the confusing thing is that the 'generic' sounding pfnmap_track() is
> > actually PAT-specific, so surely the description should give a brief
> > overview of PAT here, saying it's applicable on x86-64 etc. etc.
> >
> > I'm not sure there's much use in keeping this generic when it clearly is
> > not at this point?
>
> Sorry, is your suggestion to document more PAT stuff or what exactly?
>
> As you know, I'm a busy man ... so instructions/suggestions please :)

Haha sure, I _think_ the model here is to have a brief summary then underneath a
more detailed explanation, so that could be:

	This address range is requested to be 'tracked' by a hardware
	implementation allowing fine-grained control of memory attributes at
	page level granularity.

	This allows for fine-grained control of memory cache behaviour. Tracking
	memory this way is persisted across VMA split and merge.

	Currently there is only one implementation for this - x86 Page Attribute
	Table (PAT). See Documentation/arch/x86/pat.rst for more details.
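For illustration only, here is one way the wording suggested above might be folded into the kerneldoc from the patch, together with the "bytes" answer David gave for @size. This is a sketch and not the text of any later revision of the series. The pgprot_t typedef is a stand-in so the fragment compiles outside the kernel, and the function body simply mirrors the no-op fallback from include/linux/pgtable.h in the patch.

```c
/* Stand-in so the fragment compiles outside the kernel; not the real type. */
typedef struct { unsigned long pgprot; } pgprot_t;

/**
 * pfnmap_track - track a pfn range
 * @pfn: the start of the pfn range
 * @size: the size of the pfn range, in bytes
 * @prot: the pgprot to track
 *
 * Request that the range be tracked by a hardware implementation allowing
 * fine-grained control of memory attributes, such as cache behaviour, at
 * page level granularity. Tracking is persisted across VMA split and merge.
 *
 * Currently the only implementation is the x86 Page Attribute Table (PAT),
 * see Documentation/arch/x86/pat.rst.
 *
 * Tracking a pfnmap range involves conditionally reserving a pfn range and
 * sanitizing the pgprot -- see pfnmap_sanitize_pgprot().
 *
 * Returns 0 on success and -EINVAL on error.
 */
static inline int pfnmap_track(unsigned long pfn, unsigned long size,
		pgprot_t *prot)
{
	/* Mirrors the no-op fallback from the patch; nothing is tracked here. */
	(void)pfn;
	(void)size;
	(void)prot;
	return 0;
}
```

The summary line is kept short and the PAT background sits in the body, following the "brief summary, then detail" model described above.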

>
> >
> > > + * @pfn: the start of the pfn range
> > > + * @size: the size of the pfn range
> >
> > In what units? Given it's a pfn range it's a bit ambiguous as to whether it
> > should be expressed in pages/bytes.
>
> Agreed. It's bytes. (not my favorite here, but good enough)

Ack, definitely need to spell it out here! Cheers :)
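To make the units point concrete, the following userspace sketch mirrors the conversion done in the patch: the pfn is shifted by the page shift to get a physical address, while size is already in bytes and is used unchanged. All sketch_* names are hypothetical, and the 4 KiB page size is an assumption for illustration; the kernel code uses PAGE_SHIFT and resource_size_t instead.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB base pages. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical userspace stand-in for a physical byte range. */
struct sketch_phys_range {
	uint64_t start;	/* first byte of the range, inclusive */
	uint64_t end;	/* one past the last byte, exclusive */
};

/*
 * The pfn is converted to a physical address; size_bytes is NOT shifted,
 * because the interface takes the size in bytes rather than in pages.
 */
static inline struct sketch_phys_range sketch_pfn_range(uint64_t pfn,
							uint64_t size_bytes)
{
	struct sketch_phys_range r;

	r.start = pfn << SKETCH_PAGE_SHIFT;
	r.end = r.start + size_bytes;
	return r;
}

/* A caller that counts in pages must convert before passing a size. */
static inline uint64_t sketch_pages_to_bytes(uint64_t nr_pages)
{
	return nr_pages << SKETCH_PAGE_SHIFT;
}
```

Under these assumptions, a caller tracking two pages starting at pfn 0x100 would pass sketch_pages_to_bytes(2) as the size, not 2. Mixing the two units silently describes a much smaller range, which is why the kerneldoc needs to spell the unit out.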

>
>
> --
> Cheers,
>
> David / dhildenb
>