[PATCH v5 1/4] mm: Add address apis for ptdescs

Vishal Moola (Oracle) posted 4 patches 1 month, 2 weeks ago
Posted by Vishal Moola (Oracle) 1 month, 2 weeks ago
Architectures frequently only care about the address associated with a
page table. The current ptdesc api forces callers to acquire a ptdesc
just to get at that address. Add more apis to abstract ptdescs away from
architectures that don't need the descriptor.

Add pgtable_alloc_addr() and pgtable_free_addr() to operate on the
underlying addresses associated with page table descriptors, similar to
get_free_pages() and free_pages(). Zero the allocations since
there's no reason to want a page table with stale data.

Have pgtable_alloc_addr() return a void pointer. This will simplify code
for callers since they all want pointers.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
 include/linux/mm.h |  4 ++++
 mm/memory.c        | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f8a8fd47399c..9b6d3d910990 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3419,6 +3419,10 @@ static inline void __pagetable_free(struct ptdesc *pt)
 	__free_pages(page, compound_order(page));
 }
 
+void *pgtable_alloc_addr_noprof(gfp_t gfp, unsigned int order);
+#define pgtable_alloc_addr(...)     alloc_hooks(pgtable_alloc_addr_noprof(__VA_ARGS__))
+void pgtable_free_addr(const void *addr);
+
 #ifdef CONFIG_ASYNC_KERNEL_PGTABLE_FREE
 void pagetable_free_kernel(struct ptdesc *pt);
 #else
diff --git a/mm/memory.c b/mm/memory.c
index 1a26947ed8cd..b9653377d647 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -7452,6 +7452,40 @@ long copy_folio_from_user(struct folio *dst_folio,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
+/**
+ * pgtable_alloc_addr - Allocate pagetables to get an address
+ * @gfp:       GFP flags
+ * @order:     desired pagetable order
+ *
+ * pgtable_alloc_addr() is like pagetable_alloc(), but is for callers who only
+ * want a page table's address, not its ptdesc.
+ *
+ * Return: The address associated with the allocated page table, or NULL on
+ * failure.
+ */
+void *pgtable_alloc_addr_noprof(gfp_t gfp, unsigned int order)
+{
+	struct ptdesc *ptdesc = pagetable_alloc_noprof(gfp | __GFP_ZERO, order);
+
+	if (!ptdesc)
+		return NULL;
+	return ptdesc_address(ptdesc);
+}
+
+/**
+ * pgtable_free_addr - Free pagetables by address
+ * @addr:      The virtual address from pgtable_alloc_addr()
+ *
+ * This function is for callers who have the address but no ptdesc. If you
+ * have the ptdesc, use pagetable_free() instead.
+ */
+void pgtable_free_addr(const void *addr)
+{
+	struct ptdesc *ptdesc = virt_to_ptdesc(addr);
+
+	pagetable_free(ptdesc);
+}
+
 #if defined(CONFIG_SPLIT_PTE_PTLOCKS) && ALLOC_SPLIT_PTLOCKS
 
 static struct kmem_cache *page_ptl_cachep;
-- 
2.52.0
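
For readers who want to see the calling pattern outside the kernel, here is
a rough userspace model of the two helpers. It is only a sketch: every
model_* name, struct ptdesc_model, and the fixed-size descriptor table are
hypothetical stand-ins, with calloc() playing the role of __GFP_ZERO and a
linear table lookup playing the role of virt_to_ptdesc(). It is not how the
kernel implements any of this.

```c
/* Userspace model only; none of the model_* names exist in the kernel. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE  4096u
#define MODEL_MAX_TABLES 16

/* Hypothetical stand-in for struct ptdesc: tracks one allocation. */
struct ptdesc_model {
	void *addr;		/* NULL means the slot is unused */
	unsigned int order;
};

static struct ptdesc_model model_descs[MODEL_MAX_TABLES];

/* Models pagetable_alloc(): hands back a descriptor, or NULL on failure. */
static struct ptdesc_model *model_pagetable_alloc(unsigned int order)
{
	size_t i;

	for (i = 0; i < MODEL_MAX_TABLES; i++) {
		struct ptdesc_model *pt = &model_descs[i];

		if (pt->addr)
			continue;
		/* calloc() zeroes the memory, standing in for __GFP_ZERO. */
		pt->addr = calloc((size_t)1 << order, MODEL_PAGE_SIZE);
		if (!pt->addr)
			return NULL;
		pt->order = order;
		return pt;
	}
	return NULL;
}

/* Models virt_to_ptdesc(): maps an address back to its descriptor. */
static struct ptdesc_model *model_virt_to_ptdesc(const void *addr)
{
	size_t i;

	if (!addr)
		return NULL;
	for (i = 0; i < MODEL_MAX_TABLES; i++)
		if (model_descs[i].addr == addr)
			return &model_descs[i];
	return NULL;
}

/* Models pagetable_free(): releases the memory and the descriptor slot. */
static void model_pagetable_free(struct ptdesc_model *pt)
{
	assert(pt->addr != NULL);
	free(pt->addr);
	pt->addr = NULL;
	pt->order = 0;
}

/* Models pgtable_alloc_addr(): the caller never sees the descriptor. */
void *model_pgtable_alloc_addr(unsigned int order)
{
	struct ptdesc_model *pt = model_pagetable_alloc(order);

	if (!pt)
		return NULL;
	return pt->addr;
}

/* Models pgtable_free_addr(): looks the descriptor up from the address. */
void model_pgtable_free_addr(const void *addr)
{
	struct ptdesc_model *pt = model_virt_to_ptdesc(addr);

	if (pt)
		model_pagetable_free(pt);
}

/* Test helper: number of descriptor slots currently in use. */
size_t model_live_tables(void)
{
	size_t i, n = 0;

	for (i = 0; i < MODEL_MAX_TABLES; i++)
		if (model_descs[i].addr)
			n++;
	return n;
}
```

A caller in this model only ever holds the void pointer: it gets zeroed
memory from model_pgtable_alloc_addr(order) and later passes the same
pointer to model_pgtable_free_addr(). The descriptor stays an internal
detail, which is the point of the new apis.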
Re: [PATCH v5 1/4] mm: Add address apis for ptdescs
Posted by Dave Hansen 1 month, 2 weeks ago
On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> +/**
> + * pgtable_alloc_addr - Allocate pagetables to get an address
> + * @gfp:       GFP flags
> + * @order:     desired pagetable order

FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
like it is: "allocate a page table address", not "allocate a page
table". I don't have a better suggestion other than having:

	pgtable_alloc()

that returns a page table pointer, a void*, and:

	ptdesc_alloc()

which returns a ptdesc*. But I suspect that would get confusing at the
point that ptdescs _themselves_ start getting allocated.
Re: [PATCH v5 1/4] mm: Add address apis for ptdescs
Posted by Vishal Moola (Oracle) 1 month, 2 weeks ago
On Wed, Feb 11, 2026 at 12:13:10PM -0800, Dave Hansen wrote:
> On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > +/**
> > + * pgtable_alloc_addr - Allocate pagetables to get an address
> > + * @gfp:       GFP flags
> > + * @order:     desired pagetable order
> 
> FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
> like it is: "allocate a page table address", not "allocate a page
> table". I don't have a better suggestion other than having:

Hmmm. I meant for it to read "allocate a page table and get its address."

> 	pgtable_alloc()
> 
> that returns a page table pointer, a void*, and:

Initially, I intended to name it pgtable_alloc() & pgtable_free(). I saw
arm using pgtable_alloc() and powerpc using pgtable_free(), so I looked
for another name.

> 	ptdesc_alloc()
> 
> which returns a ptdesc*. But I suspect that would get confusing at the
> point that ptdescs _themselves_ start getting allocated.

The ptdesc_alloc() equivalent right now is named pagetable_alloc(), so I
don't think it'd get confusing.
Re: [PATCH v5 1/4] mm: Add address apis for ptdescs
Posted by Vishal Moola (Oracle) 1 month, 2 weeks ago
On Wed, Feb 11, 2026 at 02:18:20PM -0800, Vishal Moola (Oracle) wrote:
> On Wed, Feb 11, 2026 at 12:13:10PM -0800, Dave Hansen wrote:
> > On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > > +/**
> > > + * pgtable_alloc_addr - Allocate pagetables to get an address
> > > + * @gfp:       GFP flags
> > > + * @order:     desired pagetable order
> > 
> > FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
> > like it is: "allocate a page table address", not "allocate a page
> > table". I don't have a better suggestion other than having:
> 
> Hmmm. I meant for it to read "allocate a page table and get its address."
> 
> > 	pgtable_alloc()
> > 
> > that returns a page table pointer, a void*, and:
> 
> Initially, I intended to name it pgtable_alloc() & pgtable_free(). I saw
> arm using pgtable_alloc() and powerpc using pgtable_free(), so I looked
> for another name.

I've done some digging about these names.
The arm cases use a function pointer, so we should be able to use that
name without issue.

What do you think is a reasonable name for freeing?

pgtable_free() is defined for sparc and powerpc. I could rename them
prefixed with "__" to get the name since they only have 1-2 internal
callers.

> > 	ptdesc_alloc()
> > 
> > which returns a ptdesc*. But I suspect that would get confusing at the
> > point that ptdescs _themselves_ start getting allocated.
> 
> The ptdesc_alloc() equivalent right now is named pagetable_alloc(), so I
> don't think it'd get confusing.
Re: [PATCH v5 1/4] mm: Add address apis for ptdescs
Posted by Vishal Moola (Oracle) 1 month, 1 week ago
On Wed, Feb 11, 2026 at 04:07:54PM -0800, Vishal Moola (Oracle) wrote:
> On Wed, Feb 11, 2026 at 02:18:20PM -0800, Vishal Moola (Oracle) wrote:
> > On Wed, Feb 11, 2026 at 12:13:10PM -0800, Dave Hansen wrote:
> > > On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > > > +/**
> > > > + * pgtable_alloc_addr - Allocate pagetables to get an address
> > > > + * @gfp:       GFP flags
> > > > + * @order:     desired pagetable order
> > > 
> > > FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
> > > like it is: "allocate a page table address", not "allocate a page
> > > table". I don't have a better suggestion other than having:
> > 
> > Hmmm. I meant for it to read "allocate a page table and get its address."
> > 
> > > 	pgtable_alloc()
> > > 
> > > that returns a page table pointer, a void*, and:
> > 
> > Initially, I intended to name it pgtable_alloc() & pgtable_free(). I saw
> > arm using pgtable_alloc() and powerpc using pgtable_free(), so I looked
> > for another name.
> 
> I've done some digging about these names.
> The arm cases use a function pointer, so we should be able to use that
> name without issue.

Dave, I wanted to follow up on the below question:

> What do you think is a reasonable name for freeing?
>
> pgtable_free() is defined for sparc and powerpc. I could rename them
> prefixed with "__" to get the name since they only have 1-2 internal
> callers.

Matthew brought another question to my attention in this particular
scenario. Should pat/set_memory's alloc_*_page() use pte_alloc_one()
instead of get_zeroed_page()? Is there any reason not to?
Re: [PATCH v5 1/4] mm: Add address apis for ptdescs
Posted by Dave Hansen 1 month, 1 week ago
On 2/18/26 12:23, Vishal Moola (Oracle) wrote:
>> What do you think is a reasonable name for freeing?
>>
>> pgtable_free() is defined for sparc and powerpc. I could rename them
>> prefixed with "__" to get the name since they only have 1-2 internal
>> callers.
> Matthew brought another question to my attention in this particular
> scenario. Should pat/set_memory's alloc_*_page() use pte_alloc_one()
> instead of get_zeroed_page()? Is there any reason not to?

They're not special in any way I can think of. There's no reason I know
of to keep them special and avoid converting them.
Re: [PATCH v5 1/4] mm: Add address apis for ptdescs
Posted by Matthew Wilcox 1 month, 2 weeks ago
On Wed, Feb 11, 2026 at 12:13:10PM -0800, Dave Hansen wrote:
> On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > +/**
> > + * pgtable_alloc_addr - Allocate pagetables to get an address
> > + * @gfp:       GFP flags
> > + * @order:     desired pagetable order
> 
> FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
> like it is: "allocate a page table address", not "allocate a page
> table". I don't have a better suggestion other than having:
> 
> 	pgtable_alloc()
> 
> that returns a page table pointer, a void*, and:
> 
> 	ptdesc_alloc()
> 
> which returns a ptdesc*. But I suspect that would get confusing at the
> point that ptdescs _themselves_ start getting allocated.

I think that's fine and consistent with folio_alloc().  Internally to
ptdesc_alloc(), it'll use a kmem_cache_alloc(), so there won't be
any confusion.