Architectures frequently only care about the address associated with a
page table. The current ptdesc API forces callers to acquire a ptdesc
before they can use it. Add more APIs to abstract ptdescs away from
architectures that don't need the descriptor.

Add pgtable_alloc_addr() and pgtable_free_addr() to operate on the
underlying addresses associated with page table descriptors, similar to
__get_free_pages() and free_pages(). Zero the allocations since
there's no reason to want a page table with stale data.

Have pgtable_alloc_addr() return a void pointer. This will simplify code
for callers since they all want pointers.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
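For illustration only, and not part of the patch: the alloc/free-by-address
round trip described above can be modeled in userspace. This is a minimal
sketch under loud assumptions. Every mock_* name, MOCK_PAGE_SIZE, and the
"descriptor lives in a header page in front of the table" layout are
hypothetical stand-ins; the kernel resolves an address to its ptdesc through
virt_to_ptdesc(), not through a header page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096u	/* assumed page size for this model only */

/* Hypothetical stand-in for struct ptdesc; stored in a header page. */
struct mock_ptdesc {
	unsigned int order;	/* the table spans (1 << order) pages */
};

/* Models pagetable_alloc(): one header page followed by the table pages. */
static struct mock_ptdesc *mock_pagetable_alloc(int zero, unsigned int order)
{
	size_t table_bytes = (size_t)MOCK_PAGE_SIZE << order;
	void *base = aligned_alloc(MOCK_PAGE_SIZE, MOCK_PAGE_SIZE + table_bytes);
	struct mock_ptdesc *pt = base;

	if (!base)
		return NULL;
	pt->order = order;
	if (zero)	/* models passing __GFP_ZERO */
		memset((unsigned char *)base + MOCK_PAGE_SIZE, 0, table_bytes);
	return pt;
}

/* Models ptdesc_address(): descriptor -> table address. */
static void *mock_ptdesc_address(struct mock_ptdesc *pt)
{
	return (unsigned char *)pt + MOCK_PAGE_SIZE;
}

/* Models virt_to_ptdesc(): table address -> descriptor. */
static struct mock_ptdesc *mock_virt_to_ptdesc(const void *addr)
{
	return (struct mock_ptdesc *)((uintptr_t)addr - MOCK_PAGE_SIZE);
}

/* Models pagetable_free(): releases descriptor and table together. */
static void mock_pagetable_free(struct mock_ptdesc *pt)
{
	free(pt);
}

/* Address-only allocation: always zeroed, NULL on failure. */
static void *mock_pgtable_alloc_addr(unsigned int order)
{
	struct mock_ptdesc *pt = mock_pagetable_alloc(1, order);

	if (!pt)
		return NULL;
	return mock_ptdesc_address(pt);
}

/* Address-only free: look the descriptor up again, then free it. */
static void mock_pgtable_free_addr(const void *addr)
{
	mock_pagetable_free(mock_virt_to_ptdesc(addr));
}

/* Round trip: allocate by address, check zeroing, free by address. */
static int mock_roundtrip(unsigned int order)
{
	unsigned char *table = mock_pgtable_alloc_addr(order);
	size_t i, n = (size_t)MOCK_PAGE_SIZE << order;

	if (!table)
		return -1;
	for (i = 0; i < n; i++)
		assert(table[i] == 0);
	assert(mock_virt_to_ptdesc(table)->order == order);
	mock_pgtable_free_addr(table);
	return 0;
}
```

The point of the model is that the caller in mock_roundtrip() never names
the descriptor type when allocating or freeing; only the two wrappers do.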
include/linux/mm.h | 4 ++++
mm/memory.c | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f8a8fd47399c..9b6d3d910990 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3419,6 +3419,10 @@ static inline void __pagetable_free(struct ptdesc *pt)
__free_pages(page, compound_order(page));
}
+void *pgtable_alloc_addr_noprof(gfp_t gfp, unsigned int order);
+#define pgtable_alloc_addr(...) alloc_hooks(pgtable_alloc_addr_noprof(__VA_ARGS__))
+void pgtable_free_addr(const void *addr);
+
#ifdef CONFIG_ASYNC_KERNEL_PGTABLE_FREE
void pagetable_free_kernel(struct ptdesc *pt);
#else
diff --git a/mm/memory.c b/mm/memory.c
index 1a26947ed8cd..b9653377d647 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -7452,6 +7452,40 @@ long copy_folio_from_user(struct folio *dst_folio,
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
+/**
+ * pgtable_alloc_addr - Allocate pagetables to get an address
+ * @gfp: GFP flags
+ * @order: desired pagetable order
+ *
+ * pgtable_alloc_addr() is like pagetable_alloc(), but is for callers who only
+ * want a page table's address, not its ptdesc. The memory is always zeroed.
+ *
+ * Return: The address associated with the allocated page table, or NULL on
+ * failure.
+ */
+void *pgtable_alloc_addr_noprof(gfp_t gfp, unsigned int order)
+{
+ struct ptdesc *ptdesc = pagetable_alloc_noprof(gfp | __GFP_ZERO, order);
+
+ if (!ptdesc)
+ return NULL;
+ return ptdesc_address(ptdesc);
+}
+
+/**
+ * pgtable_free_addr - Free pagetables by address
+ * @addr: The virtual address from pgtable_alloc_addr()
+ *
+ * This function is for callers who have the address but no ptdesc. If you
+ * have the ptdesc, use pagetable_free() instead.
+ */
+void pgtable_free_addr(const void *addr)
+{
+ struct ptdesc *ptdesc = virt_to_ptdesc(addr);
+
+ pagetable_free(ptdesc);
+}
+
#if defined(CONFIG_SPLIT_PTE_PTLOCKS) && ALLOC_SPLIT_PTLOCKS
static struct kmem_cache *page_ptl_cachep;
--
2.52.0

On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> +/**
> + * pgtable_alloc_addr - Allocate pagetables to get an address
> + * @gfp: GFP flags
> + * @order: desired pagetable order

FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
like it is: "allocate a page table address", not "allocate a page
table". I don't have a better suggestion other than having:

	pgtable_alloc()

that returns a page table pointer, a void*, and:

	ptdesc_alloc()

which returns a ptdesc*. But I suspect that would get confusing at the
point that ptdescs _themselves_ start getting allocated.

On Wed, Feb 11, 2026 at 12:13:10PM -0800, Dave Hansen wrote:
> On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > +/**
> > + * pgtable_alloc_addr - Allocate pagetables to get an address
> > + * @gfp: GFP flags
> > + * @order: desired pagetable order
>
> FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
> like it is: "allocate a page table address", not "allocate a page
> table". I don't have a better suggestion other than having:

Hmmm. I meant for it to read "allocate a page table and get its address."

> pgtable_alloc()
>
> that returns a page table pointer, a void*, and:

Initially, I intended to name it pgtable_alloc() & pgtable_free(). I saw
arm using pgtable_alloc() and powerpc using pgtable_free(), so I looked
for another name.

> ptdesc_alloc()
>
> which returns a ptdesc*. But I suspect that would get confusing at the
> point that ptdescs _themselves_ start getting allocated.

The ptdesc_alloc() equivalent right now is named pagetable_alloc(), so I
don't think it'd get confusing.

On Wed, Feb 11, 2026 at 02:18:20PM -0800, Vishal Moola (Oracle) wrote:
> On Wed, Feb 11, 2026 at 12:13:10PM -0800, Dave Hansen wrote:
> > On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > > +/**
> > > + * pgtable_alloc_addr - Allocate pagetables to get an address
> > > + * @gfp: GFP flags
> > > + * @order: desired pagetable order
> >
> > FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
> > like it is: "allocate a page table address", not "allocate a page
> > table". I don't have a better suggestion other than having:
>
> Hmmm. I meant for it to read "allocate a page table and get its address."
>
> > pgtable_alloc()
> >
> > that returns a page table pointer, a void*, and:
>
> Initially, I intended to name it pgtable_alloc() & pgtable_free(). I saw
> arm using pgtable_alloc() and powerpc using pgtable_free(), so I looked
> for another name.

I've done some digging about these names.
The arm case uses a function pointer, so we should be able to use that
name without issue.

What do you think is a reasonable name for freeing?

pgtable_free() is defined for sparc and powerpc. I could rename them
prefixed with "__" to get the name since they only have 1-2 internal
callers.

> > ptdesc_alloc()
> >
> > which returns a ptdesc*. But I suspect that would get confusing at the
> > point that ptdescs _themselves_ start getting allocated.
>
> The ptdesc_alloc() equivalent right now is named pagetable_alloc(), so I
> don't think it'd get confusing.

On Wed, Feb 11, 2026 at 04:07:54PM -0800, Vishal Moola (Oracle) wrote:
> On Wed, Feb 11, 2026 at 02:18:20PM -0800, Vishal Moola (Oracle) wrote:
> > On Wed, Feb 11, 2026 at 12:13:10PM -0800, Dave Hansen wrote:
> > > On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > > > +/**
> > > > + * pgtable_alloc_addr - Allocate pagetables to get an address
> > > > + * @gfp: GFP flags
> > > > + * @order: desired pagetable order
> > >
> > > FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
> > > like it is: "allocate a page table address", not "allocate a page
> > > table". I don't have a better suggestion other than having:
> >
> > Hmmm. I meant for it to read "allocate a page table and get its address."
> >
> > > pgtable_alloc()
> > >
> > > that returns a page table pointer, a void*, and:
> >
> > Initially, I intended to name it pgtable_alloc() & pgtable_free(). I saw
> > arm using pgtable_alloc() and powerpc using pgtable_free(), so I looked
> > for another name.
>
> I've done some digging about these names.
> The arm case uses a function pointer, so we should be able to use that
> name without issue.

Dave, I wanted to follow up on the below question:

> What do you think is a reasonable name for freeing?
>
> pgtable_free() is defined for sparc and powerpc. I could rename them
> prefixed with "__" to get the name since they only have 1-2 internal
> callers.

Matthew brought another question to my attention in this particular
scenario. Should pat/set_memory's alloc_*_page() use pte_alloc_one()
instead of get_zeroed_page()? Is there any reason not to?

On 2/18/26 12:23, Vishal Moola (Oracle) wrote:
>> What do you think is a reasonable name for freeing?
>>
>> pgtable_free() is defined for sparc and powerpc. I could rename them
>> prefixed with "__" to get the name since they only have 1-2 internal
>> callers.
> Matthew brought another question to my attention in this particular
> scenario. Should pat/set_memory's alloc_*_page() use pte_alloc_one()
> instead of get_zeroed_page()? Is there any reason not to?

They're not special in any way I can think of. There's no reason I know
of to keep them special and avoid converting them.

On Wed, Feb 11, 2026 at 12:13:10PM -0800, Dave Hansen wrote:
> On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > +/**
> > + * pgtable_alloc_addr - Allocate pagetables to get an address
> > + * @gfp: GFP flags
> > + * @order: desired pagetable order
>
> FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
> like it is: "allocate a page table address", not "allocate a page
> table". I don't have a better suggestion other than having:
>
> pgtable_alloc()
>
> that returns a page table pointer, a void*, and:
>
> ptdesc_alloc()
>
> which returns a ptdesc*. But I suspect that would get confusing at the
> point that ptdescs _themselves_ start getting allocated.

I think that's fine and consistent with folio_alloc(). Internally to
ptdesc_alloc(), it'll use a kmem_cache_alloc(), so there won't be any
confusion.