From: Dave Hansen <dave.hansen@linux.intel.com>
The page tables used to map the kernel and userspace often have very
different handling rules. There are frequently *_kernel() variants of
functions just for kernel page tables. That's not great and has lead
to code duplication.
Instead of having completely separate call paths, allow a 'ptdesc' to
be marked as being for kernel mappings. Introduce helpers to set and
clear this status.
Note: this uses the PG_referenced bit. Page flags are a great fit for
this since it is truly a single bit of information. Use PG_referenced
itself because it's a fairly benign flag (as opposed to things like
PG_lock). It's also (according to Willy) unlikely to go away any time
soon.
PG_referenced is not in PAGE_FLAGS_CHECK_AT_FREE. It does not need to
be cleared before freeing the page, and pages coming out of the
allocator should have it cleared. Regardless, introduce an API to
clear it anyway. Having symmetry in the API makes it easier to change
the underlying implementation later, like if there was a need to move
to a PAGE_FLAGS_CHECK_AT_FREE bit.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
include/linux/page-flags.h | 46 ++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8d3fa3a91ce4..1d82fb6fffe5 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -1244,6 +1244,52 @@ static inline int folio_has_private(const struct folio *folio)
return !!(folio->flags & PAGE_FLAGS_PRIVATE);
}
+/**
+ * ptdesc_set_kernel - Mark a ptdesc used to map the kernel
+ * @ptdesc: The ptdesc to be marked
+ *
+ * Kernel page tables often need special handling. Set a flag so that
+ * the handling code knows this ptdesc will not be used for userspace.
+ */
+static inline void ptdesc_set_kernel(struct ptdesc *ptdesc)
+{
+ struct folio *folio = ptdesc_folio(ptdesc);
+
+ folio_set_referenced(folio);
+}
+
+/**
+ * ptdesc_clear_kernel - Mark a ptdesc as no longer used to map the kernel
+ * @ptdesc: The ptdesc to be unmarked
+ *
+ * Use when the ptdesc is no longer used to map the kernel and no longer
+ * needs special handling.
+ */
+static inline void ptdesc_clear_kernel(struct ptdesc *ptdesc)
+{
+ struct folio *folio = ptdesc_folio(ptdesc);
+
+ /*
+ * Note: the 'PG_referenced' bit does not strictly need to be
+ * cleared before freeing the page. But this is nice for
+ * symmetry.
+ */
+ folio_clear_referenced(folio);
+}
+
+/**
+ * ptdesc_test_kernel - Check if a ptdesc is used to map the kernel
+ * @ptdesc: The ptdesc being tested
+ *
+ * Call to tell if the ptdesc used to map the kernel.
+ */
+static inline bool ptdesc_test_kernel(struct ptdesc *ptdesc)
+{
+ struct folio *folio = ptdesc_folio(ptdesc);
+
+ return folio_test_referenced(folio);
+}
+
#undef PF_ANY
#undef PF_HEAD
#undef PF_NO_TAIL
--
2.43.0
On Fri, Sep 19, 2025 at 01:39:59PM +0800, Lu Baolu wrote:
> +static inline void ptdesc_set_kernel(struct ptdesc *ptdesc)
> +{
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + folio_set_referenced(folio);
> +}
So this was the right way to do this at the time. However, if you look
at commit 522abd92279a this should now be ...
enum pt_flags {
PT_reserved = PG_reserved,
+ PT_kernel = PG_referenced,
/* High bits are used for zone/node/section */
};
[...]
+static inline void ptdesc_set_kernel(struct ptdesc *ptdesc)
+{
+ set_bit(PT_kernel, &pt->pt_flags.f);
+}
(etc)
On 10/9/25 03:56, Matthew Wilcox wrote:
> On Fri, Sep 19, 2025 at 01:39:59PM +0800, Lu Baolu wrote:
>> +static inline void ptdesc_set_kernel(struct ptdesc *ptdesc)
>> +{
>> + struct folio *folio = ptdesc_folio(ptdesc);
>> +
>> + folio_set_referenced(folio);
>> +}
> So this was the right way to do this at the time. However, if you look
> at commit 522abd92279a this should now be ...
>
> enum pt_flags {
> PT_reserved = PG_reserved,
> + PT_kernel = PG_referenced,
> /* High bits are used for zone/node/section */
> };
> [...]
>
> +static inline void ptdesc_set_kernel(struct ptdesc *ptdesc)
> +{
> + set_bit(PT_kernel, &pt->pt_flags.f);
> +}
Thank you for the review comment. I updated the patch like the
following:
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a3f97c551ad8..5abd427b6202 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2940,6 +2940,7 @@ static inline pmd_t *pmd_alloc(struct mm_struct
*mm, pud_t *pud, unsigned long a
#endif /* CONFIG_MMU */
enum pt_flags {
+ PT_kernel = PG_referenced,
PT_reserved = PG_reserved,
/* High bits are used for zone/node/section */
};
@@ -2965,6 +2966,46 @@ static inline bool pagetable_is_reserved(struct
ptdesc *pt)
return test_bit(PT_reserved, &pt->pt_flags.f);
}
+/**
+ * ptdesc_set_kernel - Mark a ptdesc used to map the kernel
+ * @ptdesc: The ptdesc to be marked
+ *
+ * Kernel page tables often need special handling. Set a flag so that
+ * the handling code knows this ptdesc will not be used for userspace.
+ */
+static inline void ptdesc_set_kernel(struct ptdesc *ptdesc)
+{
+ set_bit(PT_kernel, &ptdesc->pt_flags.f);
+}
+
+/**
+ * ptdesc_clear_kernel - Mark a ptdesc as no longer used to map the kernel
+ * @ptdesc: The ptdesc to be unmarked
+ *
+ * Use when the ptdesc is no longer used to map the kernel and no longer
+ * needs special handling.
+ */
+static inline void ptdesc_clear_kernel(struct ptdesc *ptdesc)
+{
+ /*
+ * Note: the 'PG_referenced' bit does not strictly need to be
+ * cleared before freeing the page. But this is nice for
+ * symmetry.
+ */
+ clear_bit(PT_kernel, &ptdesc->pt_flags.f);
+}
+
+/**
+ * ptdesc_test_kernel - Check if a ptdesc is used to map the kernel
+ * @ptdesc: The ptdesc being tested
+ *
+ * Call to tell if the ptdesc used to map the kernel.
+ */
+static inline bool ptdesc_test_kernel(struct ptdesc *ptdesc)
+{
+ return test_bit(PT_kernel, &ptdesc->pt_flags.f);
+}
+
/**
* pagetable_alloc - Allocate pagetables
* @gfp: GFP flags
Thanks,
baolu
© 2016 - 2026 Red Hat, Inc.