From nobody Mon Jun 8 22:53:12 2026
From: Oscar Salvador
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka, Lorenzo Stoakes, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Oscar Salvador
Subject: [RFC PATCH v3 1/8] mm: Add softleaf_from_pud
Date: Mon, 25 May 2026 18:55:21 +0200
Message-ID: <20260525165528.184397-2-osalvador@suse.de>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260525165528.184397-1-osalvador@suse.de>
References: <20260525165528.184397-1-osalvador@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

We want to be able to operate on HugeTLB pages as we do with normal pages, which means we must stop pretending that everything is a PTE in the HugeTLB world and operate at the right entry level. Since HugeTLB pages can be mapped by PUD entries, we need infrastructure that lets us operate on them, so add softleaf_from_pud() and the helpers that come with it.

Signed-off-by: Oscar Salvador --- arch/arm64/include/asm/pgtable.h | 12 +++++ arch/loongarch/include/asm/pgtable.h | 1 + arch/powerpc/include/asm/book3s/64/pgtable.h | 7 +++ arch/s390/include/asm/pgtable.h | 38 ++++++++++++++++ arch/x86/include/asm/pgtable.h | 48 ++++++++++++++++++++ arch/x86/include/asm/pgtable_64.h | 2 + include/asm-generic/pgtable_uffd.h | 15 ++++++ include/linux/leafops.h | 33 ++++++++++++++ include/linux/pgtable.h | 37 +++++++++++++++ 9 files changed, 193 insertions(+) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 4dfa42b7d053..ca0f1fcae7e8 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -593,6 +593,13 @@ static inline int pmd_protnone(pmd_t pmd) #define pmd_mkvalid_k(pmd) pte_pmd(pte_mkvalid_k(pmd_pte(pmd))) #define pmd_mkinvalid(pmd) pte_pmd(pte_mkinvalid(pmd_pte(pmd))) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +#define pud_uffd_wp(pud) pte_uffd_wp(pud_pte(pud)) +#define pud_mkuffd_wp(pud) pte_pud(pte_mkuffd_wp(pud_pte(pud))) +#define pud_clear_uffd_wp(pud) pte_pud(pte_clear_uffd_wp(pud_pte(pud))) +#define pud_swp_uffd_wp(pud) pte_swp_uffd_wp(pud_pte(pud)) +#define pud_swp_mkuffd_wp(pud) pte_pud(pte_swp_mkuffd_wp(pud_pte(pud))) +#define pud_swp_clear_uffd_wp(pud) \ + pte_pud(pte_swp_clear_uffd_wp(pud_pte(pud))) #define pmd_uffd_wp(pmd) pte_uffd_wp(pmd_pte(pmd)) #define pmd_mkuffd_wp(pmd) pte_pmd(pte_mkuffd_wp(pmd_pte(pmd))) #define pmd_clear_uffd_wp(pmd) pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd))) @@ -1539,6 +1546,11 @@ static inline pmd_t pmdp_establish(struct vm_area_st= ruct *vma, #define __swp_entry_to_pmd(swp) __pmd((swp).val) #endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */ =20 +#ifdef CONFIG_HUGETLB_PAGE +#define __pud_to_swp_entry(pud) ((swp_entry_t) { pud_val(pud) }) +#define __swp_entry_to_pud(swp) __pud((swp).val) +#endif + /* * Ensure that there are not more swap files than can be encoded in the ke= rnel * PTEs. 
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/= asm/pgtable.h index 2a0b63ae421f..dc6d841ea269 100644 --- a/arch/loongarch/include/asm/pgtable.h +++ b/arch/loongarch/include/asm/pgtable.h @@ -339,6 +339,7 @@ static inline pte_t mk_swap_pte(unsigned long type, uns= igned long offset) #define __swp_entry_to_pmd(x) __pmd((x).val | _PAGE_HUGE) #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) }) #define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val(pmd) }) +#define __pud_to_swp_entry(pud) ((swp_entry_t) { pud_val(pud) }) =20 static inline bool pte_swp_exclusive(pte_t pte) { diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/in= clude/asm/book3s/64/pgtable.h index e67e64ac6e8c..fb43e7cc09b6 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -1065,6 +1065,13 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd) #define pmd_swp_soft_dirty(pmd) pte_swp_soft_dirty(pmd_pte(pmd)) #define pmd_swp_clear_soft_dirty(pmd) pte_pmd(pte_swp_clear_soft_dirty(pmd= _pte(pmd))) #endif + +#ifdef CONFIG_HUGETLB_PAGE +#define pud_swp_mksoft_dirty(pud) pte_pud(pte_swp_mksoft_dirty(pud_pte(pud= ))) +#define pud_swp_soft_dirty(pud) pte_swp_soft_dirty(pud_pte(pud)) +#define pud_swp_clear_soft_dirty(pud) pte_pud(pte_swp_clear_soft_dirty(pud= _pte(pud))) +#endif + #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */ =20 #ifdef CONFIG_NUMA_BALANCING diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtabl= e.h index 2c6cee8241e0..c0ebf827fdb0 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -903,11 +903,31 @@ static inline pmd_t pmd_clear_soft_dirty(pmd_t pmd) return clear_pmd_bit(pmd, __pgprot(_SEGMENT_ENTRY_SOFT_DIRTY)); } =20 +static inline int pud_soft_dirty(pud_t pud) +{ + return pud_val(pud) & _REGION3_ENTRY_SOFT_DIRTY; +} + +static inline pud_t pud_mksoft_dirty(pud_t pud) +{ + return set_pud_bit(pud, 
__pgprot(_REGION3_ENTRY_SOFT_DIRTY)); +} + +static inline pud_t pud_clear_soft_dirty(pud_t pud) +{ + return clear_pud_bit(pud, __pgprot(_REGION3_ENTRY_SOFT_DIRTY)); +} + #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION #define pmd_swp_soft_dirty(pmd) pmd_soft_dirty(pmd) #define pmd_swp_mksoft_dirty(pmd) pmd_mksoft_dirty(pmd) #define pmd_swp_clear_soft_dirty(pmd) pmd_clear_soft_dirty(pmd) #endif +#ifdef CONFIG_HUGETLB_PAGE +#define pud_swp_soft_dirty(pud) pud_soft_dirty(pud) +#define pud_swp_mksoft_dirty(pud) pud_mksoft_dirty(pud) +#define pud_swp_clear_soft_dirty(pud) pud_clear_soft_dirty(pud) +#endif =20 /* * query functions pte_write/pte_dirty/pte_young only work if @@ -1947,6 +1967,24 @@ static inline unsigned long __swp_offset_rste(swp_en= try_t entry) * requires conversion of the swap type and offset, and not all the possib= le * PTE bits. */ +static inline swp_entry_t __pud_to_swp_entry(pud_t pud) +{ + swp_entry_t arch_entry; + pte_t pte; + + arch_entry =3D __rste_to_swp_entry(pud_val(pud)); + pte =3D mk_swap_pte(__swp_type_rste(arch_entry), __swp_offset_rste(arch_e= ntry)); + return __pte_to_swp_entry(pte); +} + +static inline pud_t __swp_entry_to_pud(swp_entry_t arch_entry) +{ + pud_t pud; + + pud =3D __pud(mk_swap_rste(__swp_type(arch_entry), __swp_offset(arch_entr= y))); + return pud; +} + static inline swp_entry_t __pmd_to_swp_entry(pmd_t pmd) { swp_entry_t arch_entry; diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 2187e9cfcefa..a3cf289948a0 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -648,6 +648,23 @@ static inline pud_t pud_mkwrite(pud_t pud) return pud_clear_saveddirty(pud); } =20 +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +static inline int pud_uffd_wp(pud_t pud) +{ + return pud_flags(pud) & _PAGE_UFFD_WP; +} + +static inline pud_t pud_mkuffd_wp(pud_t pud) +{ + return pud_wrprotect(pud_set_flags(pud, _PAGE_UFFD_WP)); +} + +static inline pud_t pud_clear_uffd_wp(pud_t pud) +{ + return 
pud_clear_flags(pud, _PAGE_UFFD_WP); +} +#endif + #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY static inline int pte_soft_dirty(pte_t pte) { @@ -1549,6 +1566,22 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t p= md) return pmd_clear_flags(pmd, _PAGE_SWP_SOFT_DIRTY); } #endif +#ifdef CONFIG_HUGETLB_PAGE +static inline pud_t pud_swp_mksoft_dirty(pud_t pud) +{ + return pud_set_flags(pud, _PAGE_SWP_SOFT_DIRTY); +} + +static inline int pud_swp_soft_dirty(pud_t pud) +{ + return pud_flags(pud) & _PAGE_SWP_SOFT_DIRTY; +} + +static inline pud_t pud_swp_clear_soft_dirty(pud_t pud) +{ + return pud_clear_flags(pud, _PAGE_SWP_SOFT_DIRTY); +} +#endif #endif =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP @@ -1581,6 +1614,21 @@ static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) { return pmd_clear_flags(pmd, _PAGE_SWP_UFFD_WP); } + +static inline pud_t pud_swp_mkuffd_wp(pud_t pud) +{ + return pud_set_flags(pud, _PAGE_SWP_UFFD_WP); +} + +static inline int pud_swp_uffd_wp(pud_t pud) +{ + return pud_flags(pud) & _PAGE_SWP_UFFD_WP; +} + +static inline pud_t pud_swp_clear_uffd_wp(pud_t pud) +{ + return pud_clear_flags(pud, _PAGE_SWP_UFFD_WP); +} #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 static inline u16 pte_flags_pkey(unsigned long pte_flags) diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtab= le_64.h index ce45882ccd07..0709dee52813 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -234,8 +234,10 @@ static inline void native_pgd_clear(pgd_t *pgd) =20 #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) }) #define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val((pmd)) }) +#define __pud_to_swp_entry(pud) ((swp_entry_t) { pud_val((pud)) }) #define __swp_entry_to_pte(x) (__pte((x).val)) #define __swp_entry_to_pmd(x) (__pmd((x).val)) +#define __swp_entry_to_pud(x) (__pud((x).val)) =20 extern void cleanup_highmap(void); =20 diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtab= le_uffd.h 
index 0d85791efdf7..59c9d6762ec8 100644 --- a/include/asm-generic/pgtable_uffd.h +++ b/include/asm-generic/pgtable_uffd.h @@ -78,6 +78,21 @@ static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) { return pmd; } + +static inline pud_t pud_swp_mkuffd_wp(pud_t pud) +{ + return pud; +} + +static inline int pud_swp_uffd_wp(pud_t pud) +{ + return 0; +} + +static inline pud_t pud_swp_clear_uffd_wp(pud_t pud) +{ + return pud; +} #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 #endif /* _ASM_GENERIC_PGTABLE_UFFD_H */ diff --git a/include/linux/leafops.h b/include/linux/leafops.h index 992cd8bd8ed0..08646398b0fe 100644 --- a/include/linux/leafops.h +++ b/include/linux/leafops.h @@ -117,6 +117,39 @@ static inline softleaf_t softleaf_from_pmd(pmd_t pmd) =20 #endif =20 +#ifdef CONFIG_HUGETLB_PAGE +/** + * softleaf_from_pud() - Obtain a leaf entry from a PUD entry. + * @pud: PUD entry. + * + * If @pud is present (therefore not a leaf entry) the function returns an= empty + * leaf entry. Otherwise, it returns a leaf entry. + * + * Returns: Leaf entry. + */ +static inline softleaf_t softleaf_from_pud(pud_t pud) +{ + softleaf_t arch_entry; + + if (pud_present(pud) || pud_none(pud)) + return softleaf_mk_none(); + + if (pud_swp_soft_dirty(pud)) + pud =3D pud_swp_clear_soft_dirty(pud); + if (pud_swp_uffd_wp(pud)) + pud =3D pud_swp_clear_uffd_wp(pud); + arch_entry =3D __pud_to_swp_entry(pud); + + /* Temporary until swp_entry_t eliminated. */ + return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry)); +} +#else +static inline softleaf_t softleaf_from_pud(pud_t pud) +{ + return softleaf_mk_none(); +} +#endif + /** * softleaf_is_none() - Is the leaf entry empty? * @entry: Leaf entry. 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index cdd68ed3ae1a..70aae957be5b 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1797,6 +1797,22 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t p= md) return pmd; } #endif +#ifndef CONFIG_HUGETLB_PAGE +static inline pud_t pud_swp_mksoft_dirty(pud_t pud) +{ + return pud; +} + +static inline int pud_swp_soft_dirty(pud_t pud) +{ + return 0; +} + +static inline pud_t pud_swp_clear_soft_dirty(pud_t pud) +{ + return pud; +} +#endif #else /* !CONFIG_HAVE_ARCH_SOFT_DIRTY */ static inline int pte_soft_dirty(pte_t pte) { @@ -1857,6 +1873,21 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t p= md) { return pmd; } + +static inline pud_t pud_swp_mksoft_dirty(pud_t pud) +{ + return pud; +} + +static inline int pud_swp_soft_dirty(pud_t pud) +{ + return 0; +} + +static inline pud_t pud_swp_clear_soft_dirty(pud_t pud) +{ + return pud; +} #endif =20 #ifndef __HAVE_PFNMAP_TRACKING @@ -2420,4 +2451,10 @@ pgprot_t vm_get_page_prot(vm_flags_t vm_flags) \ } \ EXPORT_SYMBOL(vm_get_page_prot); =20 +#ifdef CONFIG_HUGETLB_PAGE +#ifndef __pud_to_swp_entry +#define __pud_to_swp_entry(pud) ((swp_entry_t) { pud_val(pud) }) +#endif +#endif + #endif /* _LINUX_PGTABLE_H */ --=20 2.53.0 From nobody Mon Jun 8 22:53:12 2026 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DE73B3F413B for ; Mon, 25 May 2026 16:55:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779728155; cv=none; b=et1Wi0n76Ow1523dQnx1LgitG6mgQQcfJ6gRdqPQfza8MYISvTtz6yRIdP6j0/Eu91bdC+7r3IHbr4dLY85RLRqF5fIBV7Eof32NLYIQYC0jALxMQlNQSFV8dXMG5HYgjB6aFjmXk/i16ZYVlO10M9WxB3MI4R2YGyPcukF0Swg= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1779728155; c=relaxed/simple; bh=BlzaaYDqmNskaOEzBTxtbkzwNhaa2XPVXMyejJDBL7o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kqWwgOSEFNhho0ZI/MTp0g6//EdEiyAg235oteh3acaH4hHPVNYFr62Putk1cYrhK59EU/gutIeifWQbKONb4Yw2TfIp1bToTPieXHvGxCW5NdC1kL/SNuPi+Av07oe2KlAilVBRzYy6rTK7AOtsMWGA4xMgCVvL7nYElF9oYp0=
From: Oscar Salvador
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka, Lorenzo Stoakes, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Oscar Salvador
Subject: [RFC PATCH v3 2/8] mm: Add {pmd,pud}_huge_lock helper
Date: Mon, 25 May 2026 18:55:22 +0200
Message-ID: 
<20260525165528.184397-3-osalvador@suse.de> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260525165528.184397-1-osalvador@suse.de> References: <20260525165528.184397-1-osalvador@suse.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spam-Score: -4.00 X-Rspamd-Queue-Id: EDA686B871 X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Rspamd-Action: no action X-Spam-Level: X-Spamd-Result: default: False [-4.00 / 50.00]; REPLY(-4.00)[] X-Rspamd-Server: rspamd2.dmz-prg2.suse.org X-Spam-Flag: NO Content-Type: text/plain; charset="utf-8" HugeTLB and THP use the same lock for pud and pmd, so create two helpers that can be directly used by both of them, as they will be used in the generic pagewalkers. Signed-off-by: Oscar Salvador --- include/linux/mm_inline.h | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index a171070e15f0..93637890cbeb 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -667,4 +667,36 @@ static inline size_t num_pages_contiguous(struct page = **pages, size_t nr_pages) return i; } =20 +static inline spinlock_t *pmd_huge_lock(pmd_t *pmd, struct vm_area_struct = *vma) +{ + spinlock_t *ptl; + + if (pmd_present(*pmd) || !pmd_none(*pmd)) { + ptl =3D pmd_lock(vma->vm_mm, pmd); + if (pmd_present(*pmd) && pmd_leaf(*pmd)) + return ptl; + else if (!pmd_present(*pmd) && !pmd_none(*pmd)) + return ptl; + spin_unlock(ptl); + } + + return NULL; +} + +static inline spinlock_t *pud_huge_lock(pud_t *pud, struct vm_area_struct = *vma) +{ + spinlock_t *ptl; + + if (pud_present(*pud) || !pud_none(*pud)) { + ptl =3D pud_lock(vma->vm_mm, pud); + if (pud_present(*pud) && pud_leaf(*pud)) + return ptl; + else if 
(!pud_present(*pud) && !pud_none(*pud)) + return ptl; + spin_unlock(ptl); + } + + return NULL; +} + #endif --=20 2.53.0 From nobody Mon Jun 8 22:53:12 2026 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C31EE3F4DD0 for ; Mon, 25 May 2026 16:55:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779728161; cv=none; b=os4ITYbyeJG+ZssYfMniI71J5JXnpDySF1Uxf35RQBgst3kuviHuBhVY0Q/mLs1nzSD4H03t+osuAk+Gi5sA7R5FZ9GEgnN8PedenN3JdTffuijvo0GUO8QSiB5bQsFyvy0tiSgdkciX87xjMLlfVCMZoZfvlcEuwxzhPHCytos= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779728161; c=relaxed/simple; bh=4l81V3zvZWgCddm5KkpIdIWl6Kyqyfwqgw4EKtj52Pw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fVHURLSYhLNiCcPi7Za0JE/F3fnWUVxwiWex4s65fh5lQO9RSi9fmjXnJ1lmNANvP4iSdkHoFsRaC0N+IG0g3RJtpFxpAhISmqnJlkODoAq76cptSjUHH9H6mLRMVdlMqQy+qmsSnwtup6rIrR95fjjoSilnvA7lHX5uTUv2CAE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=suse.de; spf=pass smtp.mailfrom=suse.de; dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de header.b=aKW76ywA; dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de header.b=sAql24Id; dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de header.b=aKW76ywA; dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de header.b=sAql24Id; arc=none smtp.client-ip=195.135.223.130 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de 
header.b="aKW76ywA"; dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de header.b="sAql24Id"
From: Oscar Salvador
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka, Lorenzo Stoakes, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Oscar Salvador
Subject: [RFC PATCH v3 3/8] mm: Implement folio_pmd_batch
Date: Mon, 25 May 2026 18:55:23 +0200
Message-ID: <20260525165528.184397-4-osalvador@suse.de>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260525165528.184397-1-osalvador@suse.de>
References: <20260525165528.184397-1-osalvador@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Spamd-Result: default: False [-6.80 / 50.00]; REPLY(-4.00)[]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000];
R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-0.998]; MIME_GOOD(-0.10)[text/plain]; RCVD_COUNT_TWO(0.00)[2]; RCVD_VIA_SMTP_AUTH(0.00)[]; FUZZY_RATELIMITED(0.00)[rspamd.com]; ARC_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; RCPT_COUNT_SEVEN(0.00)[9]; internal_greylist_whitelist(0.00)[10.150.64.97]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.de:email,suse.de:mid]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; TO_DN_SOME(0.00)[]; RCVD_TLS_ALL(0.00)[] X-Spam-Flag: NO Content-Type: text/plain; charset="utf-8" HugeTLB can be mapped as contiguous PMDs, so we need a way to be able to batch them as we do for contiguous PTEs. Implement folio_pmd_batch in order to do that. Signed-off-by: Oscar Salvador --- arch/arm64/include/asm/pgtable.h | 19 ++++++++ include/linux/pgtable.h | 28 ++++++++++++ mm/internal.h | 75 +++++++++++++++++++++++++++++++- 3 files changed, 121 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index ca0f1fcae7e8..08ae4ee7d1da 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -164,6 +164,8 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t ph= ys) (__boundary - 1 < (end) - 1) ? 
__boundary : (end); \ }) =20 +#define pmd_valid_cont(pmd) (pmd_valid(pmd) && pmd_cont(pmd)) + #define pte_hw_dirty(pte) (pte_write(pte) && !pte_rdonly(pte)) #define pte_sw_dirty(pte) (!!(pte_val(pte) & PTE_DIRTY)) #define pte_dirty(pte) (pte_sw_dirty(pte) || pte_hw_dirty(pte)) @@ -669,6 +671,12 @@ static inline pgprot_t pmd_pgprot(pmd_t pmd) return __pgprot(pmd_val(pfn_pmd(pfn, __pgprot(0))) ^ pmd_val(pmd)); } =20 +#define pmd_advance_pfn pmd_advance_pfn +static inline pmd_t pmd_advance_pfn(pmd_t pmd, unsigned long nr) +{ + return pfn_pmd(pmd_pfn(pmd) + nr, pmd_pgprot(pmd)); +} + #define pud_pgprot pud_pgprot static inline pgprot_t pud_pgprot(pud_t pud) { @@ -1656,6 +1664,17 @@ extern void modify_prot_commit_ptes(struct vm_area_s= truct *vma, unsigned long ad pte_t *ptep, pte_t old_pte, pte_t pte, unsigned int nr); =20 +#ifdef CONFIG_HUGETLB_PAGE +#define pmd_batch_hint pmd_batch_hint +static inline unsigned int pmd_batch_hint(pmd_t *pmdp, pmd_t pmd) +{ + if (!pmd_valid_cont(pmd)) + return 1; + + return CONT_PMDS - (((unsigned long)pmdp >> 3) & (CONT_PMDS - 1)); +} +#endif + #ifdef CONFIG_ARM64_CONTPTE =20 /* diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 70aae957be5b..f5291f9ce583 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -358,6 +358,34 @@ static inline void lazy_mmu_mode_pause(void) {} static inline void lazy_mmu_mode_resume(void) {} #endif =20 +#ifndef pmd_batch_hint +/** + * pmd_batch_hint - Number of PMD entries that can be added to batch witho= ut scanning. + * @pmdp: Page table pointer for the entry. + * @pmd: Page table entry. + * + * Some architectures know that a set of contiguous pmds all map the same + * contiguous memory with the same permissions. In this case, it can provi= de a + * hint to aid pmd batching without the core code needing to scan every pm= d. + * + * An architecture implementation may ignore the PMD accessed state. 
Furth= er, + * the dirty state must apply atomically to all the PMDs described by the = hint. + * + * May be overridden by the architecture, else pmd_batch_hint is always 1. + */ +static inline unsigned int pmd_batch_hint(pmd_t *pmdp, pmd_t pmd) +{ + return 1; +} +#endif + +#ifndef pmd_advance_pfn +static inline pmd_t pmd_advance_pfn(pmd_t pmd, unsigned long nr) +{ + return __pmd(pmd_val(pmd) + (nr << PFN_PTE_SHIFT)); +} +#endif + #ifndef pte_batch_hint /** * pte_batch_hint - Number of pages that can be added to batch without sca= nning. diff --git a/mm/internal.h b/mm/internal.h index 5a2ddcf68e0b..9a0f9e89b054 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -270,7 +270,7 @@ static inline int anon_vma_prepare(struct vm_area_struc= t *vma) return __anon_vma_prepare(vma); } =20 -/* Flags for folio_pte_batch(). */ +/* Flags for folio_{pmd,pte}_batch(). */ typedef int __bitwise fpb_t; =20 /* Compare PTEs respecting the dirty bit. */ @@ -294,6 +294,79 @@ typedef int __bitwise fpb_t; */ #define FPB_MERGE_YOUNG_DIRTY ((__force fpb_t)BIT(4)) =20 +static inline pmd_t __pmd_batch_clear_ignored(pmd_t pmd, fpb_t flags) +{ + if (!(flags & FPB_RESPECT_DIRTY)) + pmd =3D pmd_mkclean(pmd); + if (likely(!(flags & FPB_RESPECT_SOFT_DIRTY))) + pmd =3D pmd_clear_soft_dirty(pmd); + if (likely(!(flags & FPB_RESPECT_WRITE))) + pmd =3D pmd_wrprotect(pmd); + return pmd_mkold(pmd); +} + +/** + * folio_pmd_batch - detect a PMD batch for a large folio. 
+ * - The only user of this is hugetlb for contiguous + * PMDs + **/ +static inline unsigned int folio_pmd_batch(struct folio *folio, pmd_t *pmd= p, pmd_t *pmdentp, + unsigned int max_nr, fpb_t flags, bool *any_writable, + bool *any_young, bool *any_dirty) +{ + pmd_t expected_pmd, pmd =3D *pmdentp; + bool writable, young, dirty; + unsigned int nr, cur_nr; + + if (any_writable) + *any_writable =3D !!pmd_write(*pmdentp); + if (any_young) + *any_young =3D !!pmd_young(*pmdentp); + if (any_dirty) + *any_dirty =3D !!pmd_dirty(*pmdentp); + + VM_WARN_ON_FOLIO(!pmd_present(pmd), folio); + VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio); + VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pmd_pfn(pmd))) !=3D folio, folio); + + /* Limit max_nr to the actual remaining PFNs in the folio we could batch.= */ + max_nr =3D min_t(unsigned long, max_nr, + (folio_pfn(folio) + folio_nr_pages(folio) - + pmd_pfn(pmd)) >> (PMD_SHIFT - PAGE_SHIFT)); + + nr =3D pmd_batch_hint(pmdp, pmd); + expected_pmd =3D __pmd_batch_clear_ignored(pmd_advance_pfn(pmd, nr << (PM= D_SHIFT - PAGE_SHIFT)), flags); + pmdp =3D pmdp + nr; + + while (nr < max_nr) { + pmd =3D pmdp_get(pmdp); + if (any_writable) + writable =3D !!pmd_write(pmd); + if (any_young) + young =3D !!pmd_young(pmd); + if (any_dirty) + dirty =3D !!pmd_dirty(pmd); + pmd =3D __pmd_batch_clear_ignored(pmd, flags); + + if (!pmd_same(pmd, expected_pmd)) + break; + + if (any_writable) + *any_writable |=3D writable; + if (any_young) + *any_young |=3D young; + if (any_dirty) + *any_dirty |=3D dirty; + + cur_nr =3D pmd_batch_hint(pmdp, pmd); + expected_pmd =3D pmd_advance_pfn(expected_pmd, cur_nr << (PMD_SHIFT - PA= GE_SHIFT)); + pmdp +=3D cur_nr; + nr +=3D cur_nr; + } + + return min(nr, max_nr); +} + static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags) { if (!(flags & FPB_RESPECT_DIRTY)) --=20 2.53.0 From nobody Mon Jun 8 22:53:12 2026 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with 
cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A6B433F413B for ; Mon, 25 May 2026 16:56:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779728167; cv=none; b=ZwtNsUi69ObY95FnTa0h2r0w5Nt6s0FMhzuOub4QMVsMJnW8EHdUukJ0HT3OeBj4FR+dXouEf+myGZJu9+QmyaymRRvMf+xMAQj2YhHviQe9DBLQ4w4ZtDNH5CcRKm4bRU0BBBHPbeokjRO3gpyZKLqoDP4tyBn7HTiJuo0IPbM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779728167; c=relaxed/simple; bh=oGagSxDsoOt7uKNgZaWSte2C0mwqEtWPIBOo03Ad/LU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=GWlH0DR9uS43fTQ5JnSTi5tPBqq7YniUo+rTW9zgbF6AycgAd6/3pUXvXbTs7h4cIaVxibmz3kGZxTBkATsDRS5oOC+XxWh+5T3IT/siN702Yk0YmhiFOmkZLfb48rXQs9DlLGkc+qLHZ1ihJNI8GBn/dqci1FUbLmv/9UGJy6A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=suse.de; spf=pass smtp.mailfrom=suse.de; arc=none smtp.client-ip=195.135.223.130 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.de Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id E5B2164DF0; Mon, 25 May 2026 16:55:46 +0000 (UTC) Authentication-Results: smtp-out1.suse.de; none Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate 
requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 77FDE59D4D; Mon, 25 May 2026 16:55:46 +0000 (UTC)
Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id 4Hc3GxJ/FGrlRAAAD6G6ig (envelope-from ); Mon, 25 May 2026 16:55:46 +0000
From: Oscar Salvador
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka, Lorenzo Stoakes, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Oscar Salvador, David Hildenbrand
Subject: [RFC PATCH v3 4/8] mm: Implement pt_range_walk
Date: Mon, 25 May 2026 18:55:24 +0200
Message-ID: <20260525165528.184397-5-osalvador@suse.de>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260525165528.184397-1-osalvador@suse.de>
References: <20260525165528.184397-1-osalvador@suse.de>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated
X-Spam-Score: -4.00
X-Rspamd-Queue-Id: E5B2164DF0
X-Rspamd-Action: no action
X-Spam-Level:
X-Spamd-Result: default: False [-4.00 / 50.00]; REPLY(-4.00)[]
X-Rspamd-Server: rspamd2.dmz-prg2.suse.org
X-Spam-Flag: NO

Implement pt_range_walk, a pagewalk API that handles locking and batching itself and fills in a struct pt_range_walk describing the part of the address space that is backed by the vma. It walks the address range provided and returns whatever it finds there (softleaf entries, folios, etc.), together with information about the entry itself: whether it is dirty, shared or present, the size of the entry, the page table level of the entry, the number of batched entries, etc.

It defines the following types:

 #define PT_TYPE_NONE
 #define PT_TYPE_FOLIO
 #define PT_TYPE_MARKER
 #define PT_TYPE_PFN
 #define PT_TYPE_SWAP
 #define PT_TYPE_MIGRATION
 #define PT_TYPE_DEVICE
 #define PT_TYPE_HWPOISON
 #define PT_TYPE_ALL

and it lets the caller be explicit about which types it is interested in. If it finds a type the caller did not ask for, it keeps scanning the address range until the next requested type is found, or until the range is exhausted.

We have three functions:

 . pt_range_walk_start()
 . pt_range_walk_next()
 . pt_range_walk_done()

pt_range_walk_start() starts scanning the range and returns the first type it finds. We then keep calling pt_range_walk_next() until we get PTW_DONE, which means we exhausted the range. Once that happens, we have to call pt_range_walk_done() in order to clean up the pt_range_walk internal state, such as locking.

An example below:

´´´´
/* .mm must be set, otherwise pt_range_walk_start() returns PTW_DONE */
struct pt_range_walk ptw = { .mm = vma->vm_mm };
enum pt_range_walk_type type;
pt_type_flags_t flags = PT_TYPE_ALL;

type = pt_range_walk_start(&ptw, vma, start, vma->vm_end, flags);
while (type != PTW_DONE) {
	/* do something with ptw */
	type = pt_range_walk_next(&ptw, vma, start, vma->vm_end, flags);
}
pt_range_walk_done(&ptw);
´´´´

The API manages locking within the interface, and also batching, which means that it can handle contiguous ptes (or pmds in the case of hugetlb) itself.

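To make the type filtering and the start/next/done protocol described above easier to follow, here is a minimal userspace sketch. It is not kernel code and not part of this patch: every mock_* and MOCK_* name is a hypothetical stand-in, the "page table" is a plain array of entry types, and the "locked" field only models the internal state that the done() step releases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few of the PT_TYPE_* filter bits. */
#define MOCK_TYPE_NONE	(1u << 0)
#define MOCK_TYPE_FOLIO	(1u << 1)
#define MOCK_TYPE_SWAP	(1u << 2)
#define MOCK_TYPE_ALL	(MOCK_TYPE_NONE | MOCK_TYPE_FOLIO | MOCK_TYPE_SWAP)

/* Hypothetical stand-ins for the PTW_* return values. */
enum mock_walk_type { MOCK_DONE, MOCK_NONE, MOCK_FOLIO, MOCK_SWAP };

struct mock_walk {
	const enum mock_walk_type *entries;	/* fake "page table" */
	size_t curr;	/* index of the entry returned last */
	size_t next;	/* index the next call resumes from */
	int locked;	/* models state that done() must release */
};

/* Map a returned type to the filter bit that selects it. */
static unsigned int mock_type_to_flag(enum mock_walk_type type)
{
	switch (type) {
	case MOCK_NONE:
		return MOCK_TYPE_NONE;
	case MOCK_FOLIO:
		return MOCK_TYPE_FOLIO;
	case MOCK_SWAP:
		return MOCK_TYPE_SWAP;
	default:
		return 0;
	}
}

/* Return the next entry selected by flags; silently skip all others. */
static enum mock_walk_type mock_walk_next(struct mock_walk *w, size_t end,
					  unsigned int flags)
{
	w->locked = 1;
	while (w->next < end) {
		enum mock_walk_type type = w->entries[w->next];

		w->curr = w->next++;
		if (mock_type_to_flag(type) & flags)
			return type;
	}
	return MOCK_DONE;
}

/* Initialise the walk state and return the first selected entry. */
static enum mock_walk_type mock_walk_start(struct mock_walk *w,
					   const enum mock_walk_type *entries,
					   size_t start, size_t end,
					   unsigned int flags)
{
	w->entries = entries;
	w->curr = start;
	w->next = start;
	w->locked = 0;
	return mock_walk_next(w, end, flags);
}

/* Release whatever the walk is still holding. */
static void mock_walk_done(struct mock_walk *w)
{
	w->locked = 0;
}

/* Count the entries in [0, nr) whose type is selected by flags. */
static size_t mock_count(const enum mock_walk_type *entries, size_t nr,
			 unsigned int flags)
{
	struct mock_walk w;
	enum mock_walk_type type;
	size_t count = 0;

	type = mock_walk_start(&w, entries, 0, nr, flags);
	while (type != MOCK_DONE) {
		count++;
		type = mock_walk_next(&w, nr, flags);
	}
	mock_walk_done(&w);
	assert(!w.locked);
	return count;
}
```

With this sketch, calling mock_count() with MOCK_TYPE_ALL & ~MOCK_TYPE_NONE visits only the folio and swap entries of the array, which mirrors how a caller of the real API would clear PT_TYPE_NONE to skip empty entries.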
Suggested-by: David Hildenbrand Signed-off-by: Oscar Salvador --- arch/arm64/include/asm/pgtable.h | 1 + include/linux/mm.h | 2 + include/linux/pagewalk.h | 106 ++++++++ mm/memory.c | 22 ++ mm/pagewalk.c | 400 +++++++++++++++++++++++++++++++ 5 files changed, 531 insertions(+) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 08ae4ee7d1da..7a3c109b9e99 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -639,6 +639,7 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd) #define pmd_pfn(pmd) ((__pmd_to_phys(pmd) & PMD_MASK) >> PAGE_SHIFT) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PA= GE_SHIFT) | pgprot_val(prot)) =20 +#define pud_dirty(pud) pte_dirty(pud_pte(pud)) #define pud_young(pud) pte_young(pud_pte(pud)) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_mkwrite_novma(pud) pte_pud(pte_mkwrite_novma(pud_pte(pud))) diff --git a/include/linux/mm.h b/include/linux/mm.h index af23453e9dbd..0f236363c981 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3073,6 +3073,8 @@ struct folio *vm_normal_folio_pmd(struct vm_area_stru= ct *vma, unsigned long addr, pmd_t pmd); struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long = addr, pmd_t pmd); +struct folio *vm_normal_folio_pud(struct vm_area_struct *vma, + unsigned long addr, pud_t pud); struct page *vm_normal_page_pud(struct vm_area_struct *vma, unsigned long = addr, pud_t pud); =20 diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index b41d7265c01b..370471687ce1 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -198,4 +198,110 @@ struct folio *folio_walk_start(struct folio_walk *fw, vma_pgtable_walk_end(__vma); \ } while (0) =20 +typedef int __bitwise pt_type_flags_t; + +/* + * Types we are interested in returning. Those which are not explicitly set + * will be silently ignored by keep walking the page tables. 
+ */ +#define PT_TYPE_NONE ((__force pt_type_flags_t)BIT(0)) +#define PT_TYPE_FOLIO ((__force pt_type_flags_t)BIT(1)) +#define PT_TYPE_MARKER ((__force pt_type_flags_t)BIT(2)) +#define PT_TYPE_PFN ((__force pt_type_flags_t)BIT(3)) +#define PT_TYPE_SWAP ((__force pt_type_flags_t)BIT(4)) +#define PT_TYPE_MIGRATION ((__force pt_type_flags_t)BIT(5)) +#define PT_TYPE_DEVICE ((__force pt_type_flags_t)BIT(6)) +#define PT_TYPE_HWPOISON ((__force pt_type_flags_t)BIT(7)) +#define PT_TYPE_ALL (PT_TYPE_NONE | PT_TYPE_FOLIO | PT_TYPE_MARKER | \ + PT_TYPE_PFN | PT_TYPE_SWAP | PT_TYPE_MIGRATION | \ + PT_TYPE_DEVICE | PT_TYPE_HWPOISON) + +enum pt_range_walk_level { + PTW_PUD_LEVEL, + PTW_PMD_LEVEL, + PTW_PTE_LEVEL, +}; + +enum pt_range_walk_type { + PTW_ABORT, + PTW_DONE, + PTW_NONE, + PTW_FOLIO, + PTW_MARKER, + PTW_PFN, + PTW_SWAP, + PTW_MIGRATION, + PTW_DEVICE, + PTW_HWPOISON, +}; + +/** + * struct pt_range_walk - pt_range_walk() + * @page: exact folio page referenced (if applicable) + * @folio: folio mapped (if any) + * @nr_entries: number of contiguous entries of the same type + * @size: stores nr_batched * entry_size + * @softleaf_entry: softleaf entry (if any) + * @writable: whether it is writable + * @young: whether it is young + * @dirty: whether it is dirty + * @present: whether it is present in the page tables + * @vma_locked: whether we are holding the vma lock + * @pmd_shared: only used for hugetlb + * @curr_addr: current addr we are operating on + * @next_addr: next addr to be used walk the page tables + * @level: page table level + * @pte: copy of the entry value (PTW_PTE_LEVEL). + * @pmd: copy of the entry value (PTW_PMD_LEVEL). + * @pud: copy of the entry value (PTW_PUD_LEVEL). + * @mm: the mm_struct we are walking + * @vma: the vma we are walking + * @ptl: pointer to the page table lock. 
+ */ + +struct pt_range_walk { + struct page *page; + struct folio *folio; + int nr_entries; + unsigned long size; + softleaf_t softleaf_entry; + bool writable; + bool young; + bool dirty; + bool present; + bool vma_locked; + bool pmd_shared; + bool lock_i_mmap; + bool i_mmap_locked; + unsigned long curr_addr; + unsigned long next_addr; + enum pt_range_walk_level level; + union { + pte_t *ptep; + pud_t *pudp; + pmd_t *pmdp; + }; + union { + pte_t pte; + pud_t pud; + pmd_t pmd; + }; + struct mm_struct *mm; + struct vm_area_struct *vma; + spinlock_t *ptl; +}; + +enum pt_range_walk_type pt_range_walk(struct pt_range_walk *ptw, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end, + pt_type_flags_t flags); +enum pt_range_walk_type pt_range_walk_start(struct pt_range_walk *ptw, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end, + pt_type_flags_t flags); +enum pt_range_walk_type pt_range_walk_next(struct pt_range_walk *ptw, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end, + pt_type_flags_t flags); +void pt_range_walk_done(struct pt_range_walk *ptw); #endif /* _LINUX_PAGEWALK_H */ diff --git a/mm/memory.c b/mm/memory.c index ea6568571131..64457a447224 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -837,6 +837,28 @@ struct page *vm_normal_page_pud(struct vm_area_struct = *vma, return __vm_normal_page(vma, addr, pud_pfn(pud), pud_special(pud), pud_val(pud), PGTABLE_LEVEL_PUD); } + +/** + * vm_normal_folio_pud() - Get the "struct folio" associated with a PUD + * @vma: The VMA mapping the @pud. + * @addr: The address where the @pud is mapped. + * @pud: The PUD. + * + * Get the "struct folio" associated with a PUD. See __vm_normal_page() + * for details on "normal" and "special" mappings. + * + * Return: Returns the "struct folio" if this is a "normal" mapping. Retur= ns + * NULL if this is a "special" mapping. 
+ */ +struct folio *vm_normal_folio_pud(struct vm_area_struct *vma, + unsigned long addr, pud_t pud) +{ + struct page *page =3D vm_normal_page_pud(vma, addr, pud); + + if (page) + return page_folio(page); + return NULL; +} #endif =20 /** diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 3ae2586ff45b..49374fdcfbf6 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -1024,3 +1024,403 @@ struct folio *folio_walk_start(struct folio_walk *f= w, fw->ptl =3D ptl; return page_folio(page); } + +enum pt_range_walk_type pt_range_walk(struct pt_range_walk *ptw, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end, + pt_type_flags_t flags) +{ + pgd_t *pgdp; + p4d_t *p4dp; + pud_t *pudp, pud; + pmd_t *pmdp, pmd; + pte_t *ptep, pte; + int nr_batched =3D 1; + spinlock_t *ptl =3D NULL; + unsigned long entry_size; + struct page *page; + struct folio *folio; + enum pt_range_walk_type ret_type =3D PTW_DONE; + bool writable, young, dirty; + unsigned long curr_addr, next_addr =3D ptw->next_addr ? ptw->next_addr : = addr; + + if (WARN_ON_ONCE(next_addr < vma->vm_start || next_addr >=3D vma->vm_end)) + return ret_type; + + mmap_assert_locked(ptw->mm); + + if (ptw->ptl) { + spin_unlock(ptw->ptl); + ptw->ptl =3D NULL; + } + + if (ptw->level =3D=3D PTW_PTE_LEVEL && ptw->ptep) { + pte_unmap(ptw->ptep); + ptw->ptep =3D NULL; + } + + if (!ptw->vma_locked) { + vma_pgtable_walk_begin(vma); + ptw->vma_locked =3D true; + ptw->vma =3D vma; + } + +keep_walking: + ret_type =3D PTW_DONE; + folio =3D NULL; + page =3D NULL; + writable =3D young =3D dirty =3D false; + ptw->present =3D false; + ptw->pmd_shared =3D false; + ptw->folio =3D NULL; + ptw->page =3D NULL; + + curr_addr =3D next_addr; + if (ptl) { + spin_unlock(ptl); + ptl =3D NULL; + } + /* + * If we keep walking the page tables because we are not interested + * in the type we found, make sure to check whether we reached the end. 
+ */ + if (curr_addr >=3D end) { + ptw->next_addr =3D next_addr; + return ret_type; + } +again: + pgdp =3D pgd_offset(ptw->mm, curr_addr); + next_addr =3D pgd_addr_end(curr_addr, end); + + if (pgd_none_or_clear_bad(pgdp)) + /* PTW_ABORT? */ + goto keep_walking; + + next_addr =3D p4d_addr_end(curr_addr, end); + p4dp =3D p4d_offset(pgdp, curr_addr); + if (p4d_none_or_clear_bad(p4dp)) + /* PTW_ABORT? */ + goto keep_walking; + + entry_size =3D PUD_SIZE; + ptw->level =3D PTW_PUD_LEVEL; + next_addr =3D pud_addr_end(curr_addr, end); + pudp =3D pud_offset(p4dp, curr_addr); + pud =3D pudp_get(pudp); + if (pud_none(pud)) { + if (!(flags & PT_TYPE_NONE)) + goto keep_walking; + ret_type =3D PTW_NONE; + goto found; + } + /* + * For now, there are no architectures which supports pgd or p4d + * leafs, pud is the first level that can be a leaf. + */ + if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) && + (!pud_present(pud) || pud_leaf(pud))) { + ptl =3D pud_huge_lock(pudp, vma); + if (!ptl) + goto again; + + pud =3D pudp_get(pudp); + ptw->pudp =3D pudp; + ptw->pud =3D pud; + if (pud_none(pud)) { + if (!(flags & PT_TYPE_NONE)) + goto keep_walking; + ret_type =3D PTW_NONE; + } else if (pud_present(pud) && !pud_leaf(pud)) { + spin_unlock(ptl); + ptl =3D NULL; + goto pmd_table; + } else if (pud_present(pud)) { + /* + * We do not support PUD-device or pud-PFNMAP, so + * if it is present, we must have a folio (Tm). 
+ */ + page =3D vm_normal_page_pud(vma, curr_addr, pud); + if (!page || !(flags & PT_TYPE_FOLIO)) + goto keep_walking; + + ret_type =3D PTW_FOLIO; + folio =3D page_folio(page); + ptw->present =3D true; + dirty =3D !!pud_dirty(pud); + young =3D !!pud_young(pud); + writable =3D !!pud_write(pud); + } else if (!pud_none(pud)) { + /* PUD-hugetlbs can have special swap entries */ + const softleaf_t entry =3D softleaf_from_pud(pud); + + ptw->softleaf_entry =3D entry; + + if (softleaf_is_marker(entry)) { + if (!(flags & PT_TYPE_MARKER)) + goto keep_walking; + ret_type =3D PTW_MARKER; + } else if (softleaf_has_pfn(entry)) { + if (softleaf_is_migration(entry)) { + if (!(flags & PT_TYPE_MIGRATION)) + goto keep_walking; + ret_type =3D PTW_MIGRATION; + } else if (softleaf_is_hwpoison(entry)) { + if (!(flags & PT_TYPE_HWPOISON)) + goto keep_walking; + ret_type =3D PTW_HWPOISON; + } + + page =3D softleaf_to_page(entry); + if (page) + folio =3D page_folio(page); + } + } else { + /* We found nothing, keep going */ + goto keep_walking; + } + + /* We found a type */ + goto found; + } +pmd_table: + entry_size =3D PMD_SIZE; + ptw->level =3D PTW_PMD_LEVEL; + next_addr =3D pmd_addr_end(curr_addr, end); + pmdp =3D pmd_offset(pudp, curr_addr); + pmd =3D pmdp_get_lockless(pmdp); + if (pmd_none(pmd)) { + if (!(flags & PT_TYPE_NONE)) + goto keep_walking; + ret_type =3D PTW_NONE; + goto found; + } + + if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) && + (!pmd_present(pmd) || pmd_leaf(pmd))) { + ptl =3D pmd_huge_lock(pmdp, vma); + if (!ptl) + goto again; + + pmd =3D pmdp_get(pmdp); + ptw->pmdp =3D pmdp; + ptw->pmd =3D pmd; + if (pmd_none(pmd)) { + if (!(flags & PT_TYPE_NONE)) + goto keep_walking; + ret_type =3D PTW_NONE; + } else if (pmd_present(pmd) && !pmd_leaf(pmd)) { + spin_unlock(ptl); + ptl =3D NULL; + goto pte_table; + } else if (pmd_present(pmd)) { + page =3D vm_normal_page_pmd(vma, curr_addr, pmd); + if (page) { + if (!(flags & PT_TYPE_FOLIO)) + goto keep_walking; + ret_type =3D 
PTW_FOLIO; + folio =3D page_folio(page); + if (folio_size(folio) > entry_size) { + /* We can batch */ + int max_nr =3D folio_size(folio) / entry_size; + + nr_batched =3D folio_pmd_batch(folio, pmdp, &pmd, + max_nr, 0, + &writable, + &young, + &dirty); + } else { + dirty =3D !!pmd_dirty(pmd); + young =3D !!pmd_young(pmd); + writable =3D !!pmd_write(pmd); + } + } else if (!page && (is_huge_zero_pmd(pmd) || + vma->vm_flags & VM_PFNMAP)) { + if (!(flags & PT_TYPE_PFN)) + goto keep_walking; + /* Create a subtype to differentiate them? */ + ret_type =3D PTW_PFN; + } else if (!page) { + goto keep_walking; + } + ptw->present =3D true; + next_addr +=3D (nr_batched * entry_size) - entry_size; + } else if (!pmd_none(pmd)) { + const softleaf_t entry =3D softleaf_from_pmd(pmd); + + ptw->softleaf_entry =3D entry; + + if (softleaf_is_marker(entry)) { + if (!(flags & PT_TYPE_MARKER)) + goto keep_walking; + ret_type =3D PTW_MARKER; + } else if (softleaf_has_pfn(entry)) { + if (softleaf_is_migration(entry)) { + if (!(flags & PT_TYPE_MIGRATION)) + goto keep_walking; + ret_type =3D PTW_MIGRATION; + } else if (softleaf_is_hwpoison(entry)) { + if (!(flags & PT_TYPE_HWPOISON)) + goto keep_walking; + ret_type =3D PTW_HWPOISON; + } else if (softleaf_is_device_private(entry) || + softleaf_is_device_exclusive(entry)) { + if (!(flags & PT_TYPE_DEVICE)) + goto keep_walking; + ptw->present =3D true; + ret_type =3D PTW_DEVICE; + } + page =3D softleaf_to_page(entry); + if (page) + folio =3D page_folio(page); + } + } else { + /* We found nothing, keep going */ + goto keep_walking; + } + + if (ret_type !=3D PTW_NONE && is_vm_hugetlb_page(vma) && + hugetlb_pmd_shared((pte_t *)pmdp)) + ptw->pmd_shared =3D true; + + goto found; + } +pte_table: + entry_size =3D PAGE_SIZE; + ptw->level =3D PTW_PTE_LEVEL; + next_addr =3D curr_addr + PAGE_SIZE; + ptep =3D pte_offset_map_lock(vma->vm_mm, pmdp, curr_addr, &ptl); + if (!ptep) + goto again; + + pte =3D ptep_get(ptep); + ptw->ptep =3D ptep; + ptw->pte =3D pte; 
+ if (pte_none(pte)) { + if (!(flags & PT_TYPE_NONE)) + goto not_found; + ret_type =3D PTW_NONE; + } else if (pte_present(pte)) { + page =3D vm_normal_page(vma, curr_addr, pte); + if (page) { + if (!(flags & PT_TYPE_FOLIO)) + goto not_found; + ret_type =3D PTW_FOLIO; + folio =3D page_folio(page); + if (folio_test_large(folio)) { + /* We can batch */ + unsigned long end_addr =3D pmd_addr_end(curr_addr, end); + int max_nr =3D (end_addr - curr_addr) >> PAGE_SHIFT; + + nr_batched =3D folio_pte_batch_flags(folio, vma, ptep, &pte, max_nr, + FPB_MERGE_WRITE | FPB_MERGE_YOUNG_DIRTY); + } + } else if (!page && (is_zero_pfn(pte_pfn(pte)) || + vma->vm_flags & VM_PFNMAP)) { + if (!(flags & PT_TYPE_PFN)) + goto not_found; + ret_type =3D PTW_PFN; + } + + dirty =3D !!pte_dirty(pte); + young =3D !!pte_young(pte); + writable =3D !!pte_write(pte); + ptw->present =3D true; + next_addr +=3D (nr_batched * entry_size) - entry_size; + } else if (!pte_none(pte)) { + const softleaf_t entry =3D softleaf_from_pte(pte); + + ptw->softleaf_entry =3D entry; + + if (softleaf_is_marker(entry)) { + if (!(flags & PT_TYPE_MARKER)) + goto not_found; + ret_type =3D PTW_MARKER; + } else if (softleaf_is_swap(entry)) { + unsigned long end_addr =3D pmd_addr_end(curr_addr, end); + int max_nr =3D (end_addr - curr_addr) >> PAGE_SHIFT; + + if (!(flags & PT_TYPE_SWAP)) + goto not_found; + + nr_batched =3D swap_pte_batch(ptep, max_nr, pte); + next_addr +=3D (nr_batched * entry_size) - entry_size; + ret_type =3D PTW_SWAP; + } else if (softleaf_has_pfn(entry)) { + if (softleaf_is_migration(entry)) { + if (!(flags & PT_TYPE_MIGRATION)) + goto not_found; + ret_type =3D PTW_MIGRATION; + } else if (softleaf_is_hwpoison(entry)) { + if (!(flags & PT_TYPE_HWPOISON)) + goto not_found; + ret_type =3D PTW_HWPOISON; + } else if (softleaf_is_device_private(entry) || + softleaf_is_device_exclusive(entry)) { + if (!(flags & PT_TYPE_DEVICE)) + goto not_found; + ptw->present =3D true; + ret_type =3D PTW_DEVICE; + } + page =3D 
softleaf_to_page(entry); + if (page) + folio =3D page_folio(page); + } + } else { +not_found: + /* We found nothing, keep going */ + pte_unmap_unlock(ptep, ptl); + ptw->ptep =3D NULL; + ptl =3D NULL; + goto keep_walking; + } + +found: + /* Fill in remaining ptw struct before returning */ + ptw->ptl =3D ptl; + ptw->curr_addr =3D curr_addr; + ptw->next_addr =3D next_addr; + ptw->writable =3D writable; + ptw->young =3D young; + ptw->dirty =3D dirty; + ptw->nr_entries =3D nr_batched; + ptw->size =3D nr_batched * entry_size; + if (folio) { + ptw->folio =3D folio; + ptw->page =3D page + ((curr_addr & (entry_size - 1)) >> PAGE_SHIFT); + } + return ret_type; +} + +enum pt_range_walk_type pt_range_walk_start(struct pt_range_walk *ptw, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end, + pt_type_flags_t flags) +{ + if (!ptw->mm) + return PTW_DONE; + if (addr >=3D end) + return PTW_DONE; + return pt_range_walk(ptw, vma, addr, end, flags); +} + +enum pt_range_walk_type pt_range_walk_next(struct pt_range_walk *ptw, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end, + pt_type_flags_t flags) +{ + /* We went through the complete range */ + if (ptw->next_addr >=3D end) + return PTW_DONE; + return pt_range_walk(ptw, vma, addr, end, flags); +} + +void pt_range_walk_done(struct pt_range_walk *ptw) +{ + if (ptw->ptl) + spin_unlock(ptw->ptl); + if (ptw->level =3D=3D PTW_PTE_LEVEL && ptw->ptep) + pte_unmap(ptw->ptep); + if (ptw->vma_locked) + vma_pgtable_walk_end(ptw->vma); + cond_resched(); +} --=20 2.53.0 From nobody Mon Jun 8 22:53:12 2026 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 99FD53F6C53 for ; Mon, 25 May 2026 16:56:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1779728174; cv=none; b=dvkHa/7V3reJGyuMK/9b4qnP6qcDHj48V8vw8ixjcOYSHEaPNJSKCdkS9/ESFUO7uQQcfwLj6bBdGpMGSTKPHl/d19fPKGL3D5tJtTDv4oGetjpLkSyrLtxHnr9wpNp61YGNYOkf5Ez3t6OwDHwdMvpRwQSwiAZVtSmu31cGQ00= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779728174; c=relaxed/simple; bh=Z5eQCDzqvZFLdYe6OFTgxeNgrw2d/wFycMUbTe4umTo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WtOg9n4/QZwHeilqJZtbh5/wi4YaRvbdc1CmAQE84PkXFHPVxzRhoQG7yu1q5AH+8AGoS8X6oX/421uWfFrMqwbSEFtJAPVFms24/CqbvWUrzUUQbdsMIiGyF9waeOOoGvYDH8yHSx5rFCVUwHV56iem2gukvL508zHIf1UMjFk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=suse.de; spf=pass smtp.mailfrom=suse.de; dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de header.b=CM3l5Lir; dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de header.b=2Hv4XgHv; dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de header.b=CM3l5Lir; dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de header.b=2Hv4XgHv; arc=none smtp.client-ip=195.135.223.130 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de header.b="CM3l5Lir"; dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de header.b="2Hv4XgHv"; dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de header.b="CM3l5Lir"; dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de header.b="2Hv4XgHv" Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 
62FCF64DF5; Mon, 25 May 2026 16:55:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1779728147; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WfkvnGWMnFduz6XN9C7dSVsfQC1CoypcMKVRfccukRA=; b=CM3l5Liri3t9JpTliCAaVapKeYMO357gJZf9Af6i2Du/222p5yMQfe2WiG1I9pYcdp/Btr G0P++3RFzK4eAAH7b7HB99Nstu6rWoPLXRKGOby4L2K9ExziUCTYR62CNiBBY3XwM2vrT7 uXCSetqrk+NC0xqYFCEOSVhysFfBMWk= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1779728147; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WfkvnGWMnFduz6XN9C7dSVsfQC1CoypcMKVRfccukRA=; b=2Hv4XgHv+9cxi3fqNZJzZdas3miCeJo3b8XQHq0fMWs+1XtJ4j9fqHV5StxZH0gSl620OJ 1MJxPG6qi/H5pHDQ== Authentication-Results: smtp-out1.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1779728147; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WfkvnGWMnFduz6XN9C7dSVsfQC1CoypcMKVRfccukRA=; b=CM3l5Liri3t9JpTliCAaVapKeYMO357gJZf9Af6i2Du/222p5yMQfe2WiG1I9pYcdp/Btr G0P++3RFzK4eAAH7b7HB99Nstu6rWoPLXRKGOby4L2K9ExziUCTYR62CNiBBY3XwM2vrT7 uXCSetqrk+NC0xqYFCEOSVhysFfBMWk= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1779728147; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WfkvnGWMnFduz6XN9C7dSVsfQC1CoypcMKVRfccukRA=; b=2Hv4XgHv+9cxi3fqNZJzZdas3miCeJo3b8XQHq0fMWs+1XtJ4j9fqHV5StxZH0gSl620OJ 1MJxPG6qi/H5pHDQ== 
Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id F357159D4B; Mon, 25 May 2026 16:55:46 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id yINSORJ/FGrlRAAAD6G6ig (envelope-from ); Mon, 25 May 2026 16:55:46 +0000 From: Oscar Salvador To: Andrew Morton Cc: David Hildenbrand , Michal Hocko , Muchun Song , Vlastimil Babka , Lorenzo Stoakes , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Oscar Salvador Subject: [RFC PATCH v3 5/8] mm: Make /proc/pid/smaps use the new generic pagewalk API Date: Mon, 25 May 2026 18:55:25 +0200 Message-ID: <20260525165528.184397-6-osalvador@suse.de> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260525165528.184397-1-osalvador@suse.de> References: <20260525165528.184397-1-osalvador@suse.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -6.80 X-Spam-Level: X-Spamd-Result: default: False [-6.80 / 50.00]; REPLY(-4.00)[]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-0.998]; MIME_GOOD(-0.10)[text/plain]; RCVD_COUNT_TWO(0.00)[2]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; MIME_TRACE(0.00)[0:+]; TO_MATCH_ENVRCPT_ALL(0.00)[]; FUZZY_RATELIMITED(0.00)[rspamd.com]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.de:email,suse.de:mid]; internal_greylist_whitelist(0.00)[10.150.64.97]; FROM_EQ_ENVFROM(0.00)[]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; RCPT_COUNT_SEVEN(0.00)[9]; R_RATELIMIT(0.00)[to_ip_from(RLd9dsuofksntgrby8c3fm48h6)]; 
RCVD_TLS_ALL(0.00)[] X-Spam-Flag: NO Content-Type: text/plain; charset="utf-8" Have /proc/pid/smaps make use of the new generic API, and remove the code which was using the old one. Signed-off-by: Oscar Salvador --- fs/proc/task_mmu.c | 306 +++++++++++++++------------------------------ 1 file changed, 100 insertions(+), 206 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 751b9ba160fb..ce958f208b5e 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -915,7 +915,7 @@ static void smaps_page_accumulate(struct mem_size_stats= *mss, =20 static void smaps_account(struct mem_size_stats *mss, struct page *page, bool compound, bool young, bool dirty, bool locked, - bool present) + bool present, int ssize) { struct folio *folio =3D page_folio(page); int i, nr =3D compound ? compound_nr(page) : 1; @@ -923,6 +923,11 @@ static void smaps_account(struct mem_size_stats *mss, = struct page *page, bool exclusive; int mapcount; =20 + if (ssize) { + nr =3D ssize / PAGE_SIZE; + size =3D ssize; + } + /* * First accumulate quantities that depend only on |size| and the type * of the compound page. 
@@ -988,150 +993,6 @@ static void smaps_account(struct mem_size_stats *mss,= struct page *page, } } =20 -#ifdef CONFIG_SHMEM -static int smaps_pte_hole(unsigned long addr, unsigned long end, - __always_unused int depth, struct mm_walk *walk) -{ - struct mem_size_stats *mss =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - - mss->swap +=3D shmem_partial_swap_usage(walk->vma->vm_file->f_mapping, - linear_page_index(vma, addr), - linear_page_index(vma, end)); - - return 0; -} -#else -#define smaps_pte_hole NULL -#endif /* CONFIG_SHMEM */ - -static void smaps_pte_hole_lookup(unsigned long addr, struct mm_walk *walk) -{ -#ifdef CONFIG_SHMEM - if (walk->ops->pte_hole) { - /* depth is not used */ - smaps_pte_hole(addr, addr + PAGE_SIZE, 0, walk); - } -#endif -} - -static void smaps_pte_entry(pte_t *pte, unsigned long addr, - struct mm_walk *walk) -{ - struct mem_size_stats *mss =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - bool locked =3D !!(vma->vm_flags & VM_LOCKED); - struct page *page =3D NULL; - bool present =3D false, young =3D false, dirty =3D false; - pte_t ptent =3D ptep_get(pte); - - if (pte_present(ptent)) { - page =3D vm_normal_page(vma, addr, ptent); - young =3D pte_young(ptent); - dirty =3D pte_dirty(ptent); - present =3D true; - } else if (pte_none(ptent)) { - smaps_pte_hole_lookup(addr, walk); - } else { - const softleaf_t entry =3D softleaf_from_pte(ptent); - - if (softleaf_is_swap(entry)) { - int mapcount; - - mss->swap +=3D PAGE_SIZE; - mapcount =3D swp_swapcount(entry); - if (mapcount >=3D 2) { - u64 pss_delta =3D (u64)PAGE_SIZE << PSS_SHIFT; - - do_div(pss_delta, mapcount); - mss->swap_pss +=3D pss_delta; - } else { - mss->swap_pss +=3D (u64)PAGE_SIZE << PSS_SHIFT; - } - } else if (softleaf_has_pfn(entry)) { - if (softleaf_is_device_private(entry)) - present =3D true; - page =3D softleaf_to_page(entry); - } - } - - if (!page) - return; - - smaps_account(mss, page, false, young, dirty, locked, present); -} - -#ifdef 
CONFIG_TRANSPARENT_HUGEPAGE -static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr, - struct mm_walk *walk) -{ - struct mem_size_stats *mss =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - bool locked =3D !!(vma->vm_flags & VM_LOCKED); - struct page *page =3D NULL; - bool present =3D false; - struct folio *folio; - - if (pmd_none(*pmd)) - return; - if (pmd_present(*pmd)) { - page =3D vm_normal_page_pmd(vma, addr, *pmd); - present =3D true; - } else if (unlikely(thp_migration_supported())) { - const softleaf_t entry =3D softleaf_from_pmd(*pmd); - - if (softleaf_has_pfn(entry)) - page =3D softleaf_to_page(entry); - } - if (IS_ERR_OR_NULL(page)) - return; - folio =3D page_folio(page); - if (folio_test_anon(folio)) - mss->anonymous_thp +=3D HPAGE_PMD_SIZE; - else if (folio_test_swapbacked(folio)) - mss->shmem_thp +=3D HPAGE_PMD_SIZE; - else if (folio_is_zone_device(folio)) - /* pass */; - else - mss->file_thp +=3D HPAGE_PMD_SIZE; - - smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd), - locked, present); -} -#else -static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr, - struct mm_walk *walk) -{ -} -#endif - -static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long e= nd, - struct mm_walk *walk) -{ - struct vm_area_struct *vma =3D walk->vma; - pte_t *pte; - spinlock_t *ptl; - - ptl =3D pmd_trans_huge_lock(pmd, vma); - if (ptl) { - smaps_pmd_entry(pmd, addr, walk); - spin_unlock(ptl); - goto out; - } - - pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (!pte) { - walk->action =3D ACTION_AGAIN; - return 0; - } - for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) - smaps_pte_entry(pte, addr, walk); - pte_unmap_unlock(pte - 1, ptl); -out: - cond_resched(); - return 0; -} - static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct = *vma) { /* @@ -1228,58 +1089,6 @@ static void show_smap_vma_flags(struct seq_file *m, = struct vm_area_struct *vma) seq_putc(m, '\n'); } =20 -#ifdef 
CONFIG_HUGETLB_PAGE -static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, - struct mm_walk *walk) -{ - struct mem_size_stats *mss =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - struct folio *folio =3D NULL; - bool present =3D false; - spinlock_t *ptl; - pte_t ptent; - - ptl =3D huge_pte_lock(hstate_vma(vma), walk->mm, pte); - ptent =3D huge_ptep_get(walk->mm, addr, pte); - if (pte_present(ptent)) { - folio =3D page_folio(pte_page(ptent)); - present =3D true; - } else { - const softleaf_t entry =3D softleaf_from_pte(ptent); - - if (softleaf_has_pfn(entry)) - folio =3D softleaf_to_folio(entry); - } - - if (folio) { - /* We treat non-present entries as "maybe shared". */ - if (!present || folio_maybe_mapped_shared(folio) || - hugetlb_pmd_shared(pte)) - mss->shared_hugetlb +=3D huge_page_size(hstate_vma(vma)); - else - mss->private_hugetlb +=3D huge_page_size(hstate_vma(vma)); - } - spin_unlock(ptl); - return 0; -} -#else -#define smaps_hugetlb_range NULL -#endif /* HUGETLB_PAGE */ - -static const struct mm_walk_ops smaps_walk_ops =3D { - .pmd_entry =3D smaps_pte_range, - .hugetlb_entry =3D smaps_hugetlb_range, - .walk_lock =3D PGWALK_RDLOCK, -}; - -static const struct mm_walk_ops smaps_shmem_walk_ops =3D { - .pmd_entry =3D smaps_pte_range, - .hugetlb_entry =3D smaps_hugetlb_range, - .pte_hole =3D smaps_pte_hole, - .walk_lock =3D PGWALK_RDLOCK, -}; - /* * Gather mem stats from @vma with the indicated beginning * address @start, and keep them in @mss. @@ -1287,14 +1096,20 @@ static const struct mm_walk_ops smaps_shmem_walk_op= s =3D { * Use vm_start of @vma as the beginning address if @start is 0. 
*/ static void smap_gather_stats(struct vm_area_struct *vma, - struct mem_size_stats *mss, unsigned long start) + struct mem_size_stats *mss, + unsigned long start) { - const struct mm_walk_ops *ops =3D &smaps_walk_ops; + struct pt_range_walk ptw =3D { + .mm =3D vma->vm_mm + }; + enum pt_range_walk_type type; + pt_type_flags_t flags =3D PT_TYPE_ALL; =20 - /* Invalid start */ if (start >=3D vma->vm_end) return; =20 + flags &=3D ~(PT_TYPE_NONE|PT_TYPE_PFN); + if (vma->vm_file && shmem_mapping(vma->vm_file->f_mapping)) { /* * For shared or readonly shmem mappings we know that all @@ -1309,18 +1124,97 @@ static void smap_gather_stats(struct vm_area_struct= *vma, unsigned long shmem_swapped =3D shmem_swap_usage(vma); =20 if (!start && (!shmem_swapped || (vma->vm_flags & VM_SHARED) || - !(vma->vm_flags & VM_WRITE))) { + !(vma->vm_flags & VM_WRITE))) { mss->swap +=3D shmem_swapped; } else { - ops =3D &smaps_shmem_walk_ops; + flags |=3D PT_TYPE_NONE; } } =20 - /* mmap_lock is held in m_start */ if (!start) - walk_page_vma(vma, ops, mss); - else - walk_page_range(vma->vm_mm, start, vma->vm_end, ops, mss); + start =3D vma->vm_start; + + type =3D pt_range_walk_start(&ptw, vma, start, vma->vm_end, flags); + while (type !=3D PTW_DONE) { + bool locked =3D !!(vma->vm_flags & VM_LOCKED); + bool compound =3D false, account =3D false; + unsigned long swap_size; + int mapcount; + + switch (type) { + case PTW_FOLIO: + case PTW_MIGRATION: + case PTW_HWPOISON: + case PTW_DEVICE: + /* + * We either have a folio because vm_normal_folio was + * successful, or because we had a special swap entry + * and could retrieve it with softleaf_to_page. 
+ */ + if (is_vm_hugetlb_page(vma)) { + /* HugeTLB */ + unsigned long size =3D huge_page_size(hstate_vma(ptw.vma)); + + if (!ptw.present || folio_maybe_mapped_shared(ptw.folio) || + ptw.pmd_shared) + mss->shared_hugetlb +=3D size; + else + mss->private_hugetlb +=3D size; + } else { + account =3D true; + if (ptw.level =3D=3D PTW_PMD_LEVEL) { + /* THP */ + compound =3D true; + if (folio_test_anon(ptw.folio)) + mss->anonymous_thp +=3D ptw.size; + else if (folio_test_swapbacked(ptw.folio)) + mss->shmem_thp +=3D ptw.size; + else if (folio_is_zone_device(ptw.folio)) + /* pass */; + else + mss->file_thp +=3D ptw.size; + } else if (ptw.level =3D=3D PTW_PTE_LEVEL && ptw.nr_entries > 1) { + compound =3D true; + } + } + break; + case PTW_SWAP: + account =3D true; + swap_size =3D PAGE_SIZE * ptw.nr_entries; + mss->swap +=3D swap_size; + mapcount =3D swp_swapcount(ptw.softleaf_entry); + if (mapcount >=3D 2) { + u64 pss_delta =3D (u64)swap_size << PSS_SHIFT; + + do_div(pss_delta, mapcount); + mss->swap_pss +=3D pss_delta; + } else { + mss->swap_pss +=3D (u64)swap_size << PSS_SHIFT; + } + break; + case PTW_NONE: +#ifdef CONFIG_SHMEM + unsigned long addr =3D ptw.curr_addr; + unsigned long end =3D ptw.next_addr; + + if (ptw.level =3D=3D PTW_PMD_LEVEL || ptw.level =3D=3D PTW_PTE_LEVEL) + mss->swap +=3D shmem_partial_swap_usage(vma->vm_file->f_mapping, + linear_page_index(vma, addr), + linear_page_index(vma, end)); +#endif + break; + default: + /* Ooops */ + break; + } + + if (account && ptw.folio) + smaps_account(mss, ptw.page, compound, ptw.young, + ptw.dirty, locked, ptw.present, ptw.size); + type =3D pt_range_walk_next(&ptw, vma, start, vma->vm_end, flags); + } + + pt_range_walk_done(&ptw); } =20 #define SEQ_PUT_DEC(str, val) \ --=20 2.53.0
From nobody Mon Jun 8 22:53:12 2026 From: Oscar Salvador To: Andrew Morton Cc: David Hildenbrand , Michal Hocko , Muchun Song , Vlastimil Babka , Lorenzo Stoakes , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Oscar Salvador Subject: [RFC PATCH v3 6/8] mm: Make /proc/pid/numa_maps use the new generic pagewalk API Date: Mon, 25 May 2026 18:55:26 +0200 Message-ID: <20260525165528.184397-7-osalvador@suse.de> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260525165528.184397-1-osalvador@suse.de> References: <20260525165528.184397-1-osalvador@suse.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"
Have /proc/pid/numa_maps make use of the new generic API, and remove the code which was using the old one. Signed-off-by: Oscar Salvador --- fs/proc/task_mmu.c | 159 +++++++++------------------------------------ 1 file changed, 32 insertions(+), 127 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index ce958f208b5e..b9ca47f0bc18 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -3060,131 +3060,6 @@ static void gather_stats(struct page *page, struct = numa_maps *md, int pte_dirty, md->node[folio_nid(folio)] +=3D nr_pages; } =20 -static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct= *vma, - unsigned long addr) -{ - struct page *page; -
int nid; - - if (!pte_present(pte)) - return NULL; - - page =3D vm_normal_page(vma, addr, pte); - if (!page || is_zone_device_page(page)) - return NULL; - - if (PageReserved(page)) - return NULL; - - nid =3D page_to_nid(page); - if (!node_isset(nid, node_states[N_MEMORY])) - return NULL; - - return page; -} - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -static struct page *can_gather_numa_stats_pmd(pmd_t pmd, - struct vm_area_struct *vma, - unsigned long addr) -{ - struct page *page; - int nid; - - if (!pmd_present(pmd)) - return NULL; - - page =3D vm_normal_page_pmd(vma, addr, pmd); - if (!page) - return NULL; - - if (PageReserved(page)) - return NULL; - - nid =3D page_to_nid(page); - if (!node_isset(nid, node_states[N_MEMORY])) - return NULL; - - return page; -} -#endif - -static int gather_pte_stats(pmd_t *pmd, unsigned long addr, - unsigned long end, struct mm_walk *walk) -{ - struct numa_maps *md =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - spinlock_t *ptl; - pte_t *orig_pte; - pte_t *pte; - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - ptl =3D pmd_trans_huge_lock(pmd, vma); - if (ptl) { - struct page *page; - - page =3D can_gather_numa_stats_pmd(*pmd, vma, addr); - if (page) - gather_stats(page, md, pmd_dirty(*pmd), - HPAGE_PMD_SIZE/PAGE_SIZE); - spin_unlock(ptl); - return 0; - } -#endif - orig_pte =3D pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); - if (!pte) { - walk->action =3D ACTION_AGAIN; - return 0; - } - do { - pte_t ptent =3D ptep_get(pte); - struct page *page =3D can_gather_numa_stats(ptent, vma, addr); - if (!page) - continue; - gather_stats(page, md, pte_dirty(ptent), 1); - - } while (pte++, addr +=3D PAGE_SIZE, addr !=3D end); - pte_unmap_unlock(orig_pte, ptl); - cond_resched(); - return 0; -} -#ifdef CONFIG_HUGETLB_PAGE -static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, struct mm_walk *walk) -{ - pte_t huge_pte; - struct numa_maps *md; - struct page *page; - spinlock_t *ptl; - 
- ptl =3D huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte); - huge_pte =3D huge_ptep_get(walk->mm, addr, pte); - if (!pte_present(huge_pte)) - goto out; - - page =3D pte_page(huge_pte); - - md =3D walk->private; - gather_stats(page, md, pte_dirty(huge_pte), 1); -out: - spin_unlock(ptl); - return 0; -} - -#else -static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, struct mm_walk *walk) -{ - return 0; -} -#endif - -static const struct mm_walk_ops show_numa_ops =3D { - .hugetlb_entry =3D gather_hugetlb_stats, - .pmd_entry =3D gather_pte_stats, - .walk_lock =3D PGWALK_RDLOCK, -}; - /* * Display pages allocated per node and memory policy via /proc. */ @@ -3196,9 +3071,15 @@ static int show_numa_map(struct seq_file *m, void *v) struct numa_maps *md =3D &numa_priv->md; struct file *file =3D vma->vm_file; struct mm_struct *mm =3D vma->vm_mm; + struct pt_range_walk ptw =3D { + .mm =3D mm + }; + enum pt_range_walk_type type; + pt_type_flags_t flags; char buffer[64]; struct mempolicy *pol; pgoff_t ilx; + int nr_pages; int nid; =20 if (!mm) @@ -3229,8 +3110,32 @@ static int show_numa_map(struct seq_file *m, void *v) if (is_vm_hugetlb_page(vma)) seq_puts(m, " huge"); =20 - /* mmap_lock is held by m_start */ - walk_page_vma(vma, &show_numa_ops, md); + flags =3D PT_TYPE_FOLIO; + type =3D pt_range_walk_start(&ptw, vma, vma->vm_start, vma->vm_end, flags= ); + while (type !=3D PTW_DONE) { + + if (!ptw.folio || !ptw.page || PageReserved(ptw.page)) + goto not_found; + + nid =3D page_to_nid(ptw.page); + if (!node_isset(nid, node_states[N_MEMORY])) + goto not_found; + + if (is_vm_hugetlb_page(vma)) + /* + * As opposed to THP, HugeTLB counts the entire huge + * page as one unit size. 
+ */ + nr_pages =3D 1; + else + nr_pages =3D ptw.size / PAGE_SIZE; + + gather_stats(ptw.page, md, ptw.dirty, nr_pages); +not_found: + type =3D pt_range_walk_next(&ptw, vma, vma->vm_start, vma->vm_end, flags= ); + + } + pt_range_walk_done(&ptw); =20 if (!md->pages) goto out; --=20 2.53.0
From nobody Mon Jun 8 22:53:12 2026 From: Oscar Salvador To: Andrew Morton Cc: David Hildenbrand , Michal Hocko , Muchun Song , Vlastimil Babka , Lorenzo Stoakes , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Oscar Salvador Subject: [RFC PATCH v3 7/8] mm: Make /proc/pid/pagemap use the new generic pagewalk API Date: Mon, 25 May 2026 18:55:27 +0200 Message-ID: <20260525165528.184397-8-osalvador@suse.de> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260525165528.184397-1-osalvador@suse.de> References: <20260525165528.184397-1-osalvador@suse.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"
Have /proc/pid/pagemap make use of the new generic API, and remove the code which was using the old one.
Signed-off-by: Oscar Salvador --- arch/arm64/include/asm/pgtable.h | 9 + arch/x86/include/asm/pgtable.h | 5 + arch/x86/mm/pgtable.c | 18 +- fs/proc/task_mmu.c | 1749 +++++++++++++++--------------- include/linux/leafops.h | 13 + include/linux/pgtable.h | 30 + mm/pgtable-generic.c | 21 + 7 files changed, 986 insertions(+), 859 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 7a3c109b9e99..71719929e2e8 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -639,6 +639,7 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd) #define pmd_pfn(pmd) ((__pmd_to_phys(pmd) & PMD_MASK) >> PAGE_SHIFT) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PA= GE_SHIFT) | pgprot_val(prot)) =20 +#define pud_mkinvalid(pud) pte_pud(pte_mkinvalid(pud_pte(pud))) #define pud_dirty(pud) pte_dirty(pud_pte(pud)) #define pud_young(pud) pte_young(pud_pte(pud)) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) @@ -1527,6 +1528,14 @@ static inline pmd_t pmdp_establish(struct vm_area_st= ruct *vma, } #endif =20 +#define pudp_establish pudp_establish +static inline pud_t pudp_establish(struct vm_area_struct *vma, + unsigned long address, pud_t *pudp, pud_t pud) +{ + page_table_check_pud_set(vma->vm_mm, address, pudp, pud); + return __pud(xchg_relaxed(&pud_val(*pudp), pud_val(pud))); +} + /* * Encode and decode a swap entry: * bits 0-1: present (must be zero) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index a3cf289948a0..7903485df8c3 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1392,10 +1392,15 @@ static inline pud_t pudp_establish(struct vm_area_s= truct *vma, } #endif =20 +#define __HAVE_ARCH_PUDP_INVALIDATE_AD +extern pud_t pudp_invalidate_ad(struct vm_area_struct *vma, + unsigned long address, pud_t *pudp); + #define __HAVE_ARCH_PMDP_INVALIDATE_AD extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, 
unsigned long address, pmd_t *pmdp); =20 +#define __HAVE_ARCH_PUDP_INVALIDATE pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address, pud_t *pudp); =20 diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index da7f0a03cf90..85b912397afa 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -530,8 +530,22 @@ pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, u= nsigned long address, } #endif =20 -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ - defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) +#if (defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ + defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)) || \ + defined CONFIG_HUGETLB_PAGE + +pud_t pudp_invalidate_ad(struct vm_area_struct *vma, unsigned long address, + pud_t *pudp) +{ + VM_WARN_ON_ONCE(!pud_present(*pudp)); + + /* + * No flush is necessary. Once an invalid PUD is established, the PUD's + * access and dirty bits cannot be updated. + */ + return pudp_establish(vma, address, pudp, pud_mkinvalid(*pudp)); +} + pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address, pud_t *pudp) { diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index b9ca47f0bc18..20ffb26692cc 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1785,46 +1785,6 @@ static bool __folio_page_mapped_exclusively(struct f= olio *folio, struct page *pa return !folio_maybe_mapped_shared(folio); } =20 -static int pagemap_pte_hole(unsigned long start, unsigned long end, - __always_unused int depth, struct mm_walk *walk) -{ - struct pagemapread *pm =3D walk->private; - unsigned long addr =3D start; - int err =3D 0; - - while (addr < end) { - struct vm_area_struct *vma =3D find_vma(walk->mm, addr); - pagemap_entry_t pme =3D make_pme(0, 0); - /* End of address space hole, which we mark as non-present. 
*/ - unsigned long hole_end; - - if (vma) - hole_end =3D min(end, vma->vm_start); - else - hole_end =3D end; - - for (; addr < hole_end; addr +=3D PAGE_SIZE) { - err =3D add_to_pagemap(&pme, pm); - if (err) - goto out; - } - - if (!vma) - break; - - /* Addresses in the VMA. */ - if (vma->vm_flags & VM_SOFTDIRTY) - pme =3D make_pme(0, PM_SOFT_DIRTY); - for (; addr < min(end, vma->vm_end); addr +=3D PAGE_SIZE) { - err =3D add_to_pagemap(&pme, pm); - if (err) - goto out; - } - } -out: - return err; -} - static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm, struct vm_area_struct *vma, unsigned long addr, pte_t pte) { @@ -1891,357 +1851,173 @@ static pagemap_entry_t pte_to_pagemap_entry(struc= t pagemapread *pm, return make_pme(frame, flags); } =20 -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigned long addr, - unsigned long end, struct vm_area_struct *vma, - struct pagemapread *pm) +struct pagemap_scan_private { + struct pm_scan_arg arg; + unsigned long masks_of_interest, cur_vma_category; + struct page_region *vec_buf; + unsigned long vec_buf_len, vec_buf_index, found_pages; + struct page_region __user *vec_out; +}; + +static bool pagemap_scan_is_interesting_page(unsigned long categories, + const struct pagemap_scan_private *p) { - unsigned int idx =3D (addr & ~PMD_MASK) >> PAGE_SHIFT; - u64 flags =3D 0, frame =3D 0; - pmd_t pmd =3D *pmdp; - struct page *page =3D NULL; - struct folio *folio =3D NULL; - int err =3D 0; + categories ^=3D p->arg.category_inverted; + if ((categories & p->arg.category_mask) !=3D p->arg.category_mask) + return false; + if (p->arg.category_anyof_mask && !(categories & p->arg.category_anyof_ma= sk)) + return false; =20 - if (vma->vm_flags & VM_SOFTDIRTY) - flags |=3D PM_SOFT_DIRTY; + return true; +} =20 - if (pmd_none(pmd)) - goto populate_pagemap; +#ifdef CONFIG_HUGETLB_PAGE +static void make_uffd_wp_pud(struct vm_area_struct *vma, + unsigned long addr, pud_t *pudp) +{ + pud_t old, pud 
=3D *pudp; =20 - if (pmd_present(pmd)) { - page =3D pmd_page(pmd); + if (pud_present(pud)) { + old =3D pudp_invalidate_ad(vma, addr, pudp); + pud =3D pud_mkuffd_wp(old); + set_pud_at(vma->vm_mm, addr, pudp, pud); + } else if (pud_is_migration_entry(pud)) { + pud =3D pud_swp_mkuffd_wp(pud); + set_pud_at(vma->vm_mm, addr, pudp, pud); + } +} +#else +static void make_uffd_wp_pud(struct vm_area_struct *vma, + unsigned long addr, pud_t *pudp) +{ +} +#endif =20 - flags |=3D PM_PRESENT; - if (pmd_soft_dirty(pmd)) - flags |=3D PM_SOFT_DIRTY; - if (pmd_uffd_wp(pmd)) - flags |=3D PM_UFFD_WP; - if (pm->show_pfn) - frame =3D pmd_pfn(pmd) + idx; - } else if (thp_migration_supported()) { - const softleaf_t entry =3D softleaf_from_pmd(pmd); - unsigned long offset; =20 - if (pm->show_pfn) { - if (softleaf_has_pfn(entry)) - offset =3D softleaf_to_pfn(entry) + idx; - else - offset =3D swp_offset(entry) + idx; - frame =3D swp_type(entry) | - (offset << MAX_SWAPFILES_SHIFT); - } - flags |=3D PM_SWAP; - if (pmd_swp_soft_dirty(pmd)) - flags |=3D PM_SOFT_DIRTY; - if (pmd_swp_uffd_wp(pmd)) - flags |=3D PM_UFFD_WP; - VM_WARN_ON_ONCE(!pmd_is_migration_entry(pmd)); - page =3D softleaf_to_page(entry); - } +static void make_uffd_wp_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmdp) +{ + pmd_t old, pmd =3D *pmdp; =20 - if (page) { - folio =3D page_folio(page); - if (!folio_test_anon(folio)) - flags |=3D PM_FILE; + if (pmd_present(pmd)) { + old =3D pmdp_invalidate_ad(vma, addr, pmdp); + pmd =3D pmd_mkuffd_wp(old); + set_pmd_at(vma->vm_mm, addr, pmdp, pmd); + } else if (pmd_is_migration_entry(pmd)) { + pmd =3D pmd_swp_mkuffd_wp(pmd); + set_pmd_at(vma->vm_mm, addr, pmdp, pmd); } +} =20 -populate_pagemap: - for (; addr !=3D end; addr +=3D PAGE_SIZE, idx++) { - u64 cur_flags =3D flags; - pagemap_entry_t pme; - - if (folio && (flags & PM_PRESENT) && - __folio_page_mapped_exclusively(folio, page)) - cur_flags |=3D PM_MMAP_EXCLUSIVE; +static void make_uffd_wp_pte(struct vm_area_struct 
*vma, + unsigned long addr, pte_t *pte, pte_t ptent) +{ + if (pte_present(ptent)) { + pte_t old_pte; =20 - pme =3D make_pme(frame, cur_flags); - err =3D add_to_pagemap(&pme, pm); - if (err) - break; - if (pm->show_pfn) { - if (flags & PM_PRESENT) - frame++; - else if (flags & PM_SWAP) - frame +=3D (1 << MAX_SWAPFILES_SHIFT); - } + old_pte =3D ptep_modify_prot_start(vma, addr, pte); + ptent =3D pte_mkuffd_wp(old_pte); + ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent); + } else if (pte_none(ptent)) { + set_pte_at(vma->vm_mm, addr, pte, + make_pte_marker(PTE_MARKER_UFFD_WP)); + } else { + ptent =3D pte_swp_mkuffd_wp(ptent); + set_pte_at(vma->vm_mm, addr, pte, ptent); } - return err; } -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ =20 -static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned lon= g end, - struct mm_walk *walk) +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE) +static void pagemap_scan_backout_range(struct pagemap_scan_private *p, + unsigned long addr, unsigned long end) { - struct vm_area_struct *vma =3D walk->vma; - struct pagemapread *pm =3D walk->private; - spinlock_t *ptl; - pte_t *pte, *orig_pte; - int err =3D 0; + struct page_region *cur_buf =3D &p->vec_buf[p->vec_buf_index]; =20 -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - ptl =3D pmd_trans_huge_lock(pmdp, vma); - if (ptl) { - err =3D pagemap_pmd_range_thp(pmdp, addr, end, vma, pm); - spin_unlock(ptl); - return err; - } + if (!p->vec_buf) + return; + + if (cur_buf->start !=3D addr) + cur_buf->end =3D addr; + else + cur_buf->start =3D cur_buf->end =3D 0; + + p->found_pages -=3D (end - addr) / PAGE_SIZE; +} #endif =20 +static bool pagemap_scan_push_range(unsigned long categories, + struct pagemap_scan_private *p, + unsigned long addr, unsigned long end) +{ + struct page_region *cur_buf =3D &p->vec_buf[p->vec_buf_index]; + /* - * We can assume that @vma always points to a valid one and @end never - * goes beyond vma->vm_end. 
+ * When there is no output buffer provided at all, the sentinel values + * won't match here. There is no other way for `cur_buf->end` to be + * non-zero other than it being non-empty. */ - orig_pte =3D pte =3D pte_offset_map_lock(walk->mm, pmdp, addr, &ptl); - if (!pte) { - walk->action =3D ACTION_AGAIN; - return err; + if (addr =3D=3D cur_buf->end && categories =3D=3D cur_buf->categories) { + cur_buf->end =3D end; + return true; } - for (; addr < end; pte++, addr +=3D PAGE_SIZE) { - pagemap_entry_t pme; =20 - pme =3D pte_to_pagemap_entry(pm, vma, addr, ptep_get(pte)); - err =3D add_to_pagemap(&pme, pm); - if (err) - break; + if (cur_buf->end) { + if (p->vec_buf_index >=3D p->vec_buf_len - 1) + return false; + + cur_buf =3D &p->vec_buf[++p->vec_buf_index]; } - pte_unmap_unlock(orig_pte, ptl); =20 - cond_resched(); + cur_buf->start =3D addr; + cur_buf->end =3D end; + cur_buf->categories =3D categories; =20 - return err; + return true; } =20 -#ifdef CONFIG_HUGETLB_PAGE -/* This function walks within one hugetlb entry in the single call */ -static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask, - unsigned long addr, unsigned long end, - struct mm_walk *walk) +static int pagemap_scan_output(unsigned long categories, + struct pagemap_scan_private *p, + unsigned long addr, unsigned long *end) { - struct pagemapread *pm =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - u64 flags =3D 0, frame =3D 0; - spinlock_t *ptl; - int err =3D 0; - pte_t pte; - - if (vma->vm_flags & VM_SOFTDIRTY) - flags |=3D PM_SOFT_DIRTY; - - ptl =3D huge_pte_lock(hstate_vma(vma), walk->mm, ptep); - pte =3D huge_ptep_get(walk->mm, addr, ptep); - if (pte_present(pte)) { - struct folio *folio =3D page_folio(pte_page(pte)); + unsigned long n_pages, total_pages; + int ret =3D 0; =20 - if (!folio_test_anon(folio)) - flags |=3D PM_FILE; + if (!p->vec_buf) + return 0; =20 - if (!folio_maybe_mapped_shared(folio) && - !hugetlb_pmd_shared(ptep)) - flags |=3D PM_MMAP_EXCLUSIVE; + 
categories &=3D p->arg.return_mask; =20 - if (huge_pte_uffd_wp(pte)) - flags |=3D PM_UFFD_WP; + n_pages =3D (*end - addr) / PAGE_SIZE; + if (check_add_overflow(p->found_pages, n_pages, &total_pages) || + total_pages > p->arg.max_pages) { + size_t n_too_much =3D total_pages - p->arg.max_pages; =20 - flags |=3D PM_PRESENT; - if (pm->show_pfn) - frame =3D pte_pfn(pte) + - ((addr & ~hmask) >> PAGE_SHIFT); - } else if (pte_swp_uffd_wp_any(pte)) { - flags |=3D PM_UFFD_WP; + *end -=3D n_too_much * PAGE_SIZE; + n_pages -=3D n_too_much; + ret =3D -ENOSPC; } =20 - for (; addr !=3D end; addr +=3D PAGE_SIZE) { - pagemap_entry_t pme =3D make_pme(frame, flags); - - err =3D add_to_pagemap(&pme, pm); - if (err) - break; - if (pm->show_pfn && (flags & PM_PRESENT)) - frame++; + if (!pagemap_scan_push_range(categories, p, addr, *end)) { + *end =3D addr; + n_pages =3D 0; + ret =3D -ENOSPC; } =20 - spin_unlock(ptl); - cond_resched(); + p->found_pages +=3D n_pages; + if (ret) + p->arg.walk_end =3D *end; =20 - return err; + return ret; } -#else -#define pagemap_hugetlb_range NULL -#endif /* HUGETLB_PAGE */ - -static const struct mm_walk_ops pagemap_ops =3D { - .pmd_entry =3D pagemap_pmd_range, - .pte_hole =3D pagemap_pte_hole, - .hugetlb_entry =3D pagemap_hugetlb_range, - .walk_lock =3D PGWALK_RDLOCK, -}; =20 -/* - * /proc/pid/pagemap - an array mapping virtual pages to pfns - * - * For each page in the address space, this file contains one 64-bit entry - * consisting of the following: - * - * Bits 0-54 page frame number (PFN) if present - * Bits 0-4 swap type if swapped - * Bits 5-54 swap offset if swapped - * Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dir= ty.rst) - * Bit 56 page exclusively mapped - * Bit 57 pte is uffd-wp write-protected - * Bit 58 pte is a guard region - * Bits 59-60 zero - * Bit 61 page is file-page or shared-anon - * Bit 62 page swapped - * Bit 63 page present - * - * If the page is not present but in swap, then the PFN contains an - * 
encoding of the swap file number and the page's offset into the - * swap. Unmapped pages return a null PFN. This allows determining - * precisely which pages are mapped (or in swap) and comparing mapped - * pages between processes. - * - * Efficient users of this interface will use /proc/pid/maps to - * determine which areas of memory are actually mapped and llseek to - * skip over unmapped regions. - */ -static ssize_t pagemap_read(struct file *file, char __user *buf, - size_t count, loff_t *ppos) +static unsigned long pagemap_page_category(struct pagemap_scan_private *p, + struct vm_area_struct *vma, + unsigned long addr, pte_t pte) { - struct mm_struct *mm =3D file->private_data; - struct pagemapread pm; - unsigned long src; - unsigned long svpfn; - unsigned long start_vaddr; - unsigned long end_vaddr; - int ret =3D 0, copied =3D 0; + unsigned long categories; =20 - if (!mm || !mmget_not_zero(mm)) - goto out; - - ret =3D -EINVAL; - /* file position must be aligned */ - if ((*ppos % PM_ENTRY_BYTES) || (count % PM_ENTRY_BYTES)) - goto out_mm; - - ret =3D 0; - if (!count) - goto out_mm; - - /* do not disclose physical addresses: attack vector */ - pm.show_pfn =3D file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN); - - pm.len =3D (PAGEMAP_WALK_SIZE >> PAGE_SHIFT); - pm.buffer =3D kmalloc_array(pm.len, PM_ENTRY_BYTES, GFP_KERNEL); - ret =3D -ENOMEM; - if (!pm.buffer) - goto out_mm; - - src =3D *ppos; - svpfn =3D src / PM_ENTRY_BYTES; - end_vaddr =3D mm->task_size; - - /* watch out for wraparound */ - start_vaddr =3D end_vaddr; - if (svpfn <=3D (ULONG_MAX >> PAGE_SHIFT)) { - unsigned long end; - - ret =3D mmap_read_lock_killable(mm); - if (ret) - goto out_free; - start_vaddr =3D untagged_addr_remote(mm, svpfn << PAGE_SHIFT); - mmap_read_unlock(mm); - - end =3D start_vaddr + ((count / PM_ENTRY_BYTES) << PAGE_SHIFT); - if (end >=3D start_vaddr && end < mm->task_size) - end_vaddr =3D end; - } - - /* Ensure the address is inside the task */ - if (start_vaddr > 
mm->task_size) - start_vaddr =3D end_vaddr; - - ret =3D 0; - while (count && (start_vaddr < end_vaddr)) { - int len; - unsigned long end; - - pm.pos =3D 0; - end =3D (start_vaddr + PAGEMAP_WALK_SIZE) & PAGEMAP_WALK_MASK; - /* overflow ? */ - if (end < start_vaddr || end > end_vaddr) - end =3D end_vaddr; - ret =3D mmap_read_lock_killable(mm); - if (ret) - goto out_free; - ret =3D walk_page_range(mm, start_vaddr, end, &pagemap_ops, &pm); - mmap_read_unlock(mm); - start_vaddr =3D end; - - len =3D min(count, PM_ENTRY_BYTES * pm.pos); - if (copy_to_user(buf, pm.buffer, len)) { - ret =3D -EFAULT; - goto out_free; - } - copied +=3D len; - buf +=3D len; - count -=3D len; - } - *ppos +=3D copied; - if (!ret || ret =3D=3D PM_END_OF_BUFFER) - ret =3D copied; - -out_free: - kfree(pm.buffer); -out_mm: - mmput(mm); -out: - return ret; -} - -static int pagemap_open(struct inode *inode, struct file *file) -{ - struct mm_struct *mm; - - mm =3D proc_mem_open(inode, PTRACE_MODE_READ); - if (IS_ERR_OR_NULL(mm)) - return mm ? 
PTR_ERR(mm) : -ESRCH; - file->private_data =3D mm; - return 0; -} - -static int pagemap_release(struct inode *inode, struct file *file) -{ - struct mm_struct *mm =3D file->private_data; - - if (mm) - mmdrop(mm); - return 0; -} - -#define PM_SCAN_CATEGORIES (PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | \ - PAGE_IS_FILE | PAGE_IS_PRESENT | \ - PAGE_IS_SWAPPED | PAGE_IS_PFNZERO | \ - PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY | \ - PAGE_IS_GUARD) -#define PM_SCAN_FLAGS (PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC) - -struct pagemap_scan_private { - struct pm_scan_arg arg; - unsigned long masks_of_interest, cur_vma_category; - struct page_region *vec_buf; - unsigned long vec_buf_len, vec_buf_index, found_pages; - struct page_region __user *vec_out; -}; - -static unsigned long pagemap_page_category(struct pagemap_scan_private *p, - struct vm_area_struct *vma, - unsigned long addr, pte_t pte) -{ - unsigned long categories; - - if (pte_none(pte)) - return 0; + if (pte_none(pte)) + return 0; =20 if (pte_present(pte)) { struct page *page; @@ -2284,122 +2060,7 @@ static unsigned long pagemap_page_category(struct p= agemap_scan_private *p, return categories; } =20 -static void make_uffd_wp_pte(struct vm_area_struct *vma, - unsigned long addr, pte_t *pte, pte_t ptent) -{ - if (pte_present(ptent)) { - pte_t old_pte; - - old_pte =3D ptep_modify_prot_start(vma, addr, pte); - ptent =3D pte_mkuffd_wp(old_pte); - ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent); - } else if (pte_none(ptent)) { - set_pte_at(vma->vm_mm, addr, pte, - make_pte_marker(PTE_MARKER_UFFD_WP)); - } else { - ptent =3D pte_swp_mkuffd_wp(ptent); - set_pte_at(vma->vm_mm, addr, pte, ptent); - } -} - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -static unsigned long pagemap_thp_category(struct pagemap_scan_private *p, - struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd) -{ - unsigned long categories =3D PAGE_IS_HUGE; - - if (pmd_none(pmd)) - return categories; - - if (pmd_present(pmd)) { - struct page *page; - - categories 
|=3D PAGE_IS_PRESENT; - if (!pmd_uffd_wp(pmd)) - categories |=3D PAGE_IS_WRITTEN; - - if (p->masks_of_interest & PAGE_IS_FILE) { - page =3D vm_normal_page_pmd(vma, addr, pmd); - if (page && !PageAnon(page)) - categories |=3D PAGE_IS_FILE; - } - - if (is_huge_zero_pmd(pmd)) - categories |=3D PAGE_IS_PFNZERO; - if (pmd_soft_dirty(pmd)) - categories |=3D PAGE_IS_SOFT_DIRTY; - } else { - categories |=3D PAGE_IS_SWAPPED; - if (!pmd_swp_uffd_wp(pmd)) - categories |=3D PAGE_IS_WRITTEN; - if (pmd_swp_soft_dirty(pmd)) - categories |=3D PAGE_IS_SOFT_DIRTY; - - if (p->masks_of_interest & PAGE_IS_FILE) { - const softleaf_t entry =3D softleaf_from_pmd(pmd); - - if (softleaf_has_pfn(entry) && - !folio_test_anon(softleaf_to_folio(entry))) - categories |=3D PAGE_IS_FILE; - } - } - - return categories; -} - -static void make_uffd_wp_pmd(struct vm_area_struct *vma, - unsigned long addr, pmd_t *pmdp) -{ - pmd_t old, pmd =3D *pmdp; - - if (pmd_present(pmd)) { - old =3D pmdp_invalidate_ad(vma, addr, pmdp); - pmd =3D pmd_mkuffd_wp(old); - set_pmd_at(vma->vm_mm, addr, pmdp, pmd); - } else if (pmd_is_migration_entry(pmd)) { - pmd =3D pmd_swp_mkuffd_wp(pmd); - set_pmd_at(vma->vm_mm, addr, pmdp, pmd); - } -} -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ - #ifdef CONFIG_HUGETLB_PAGE -static unsigned long pagemap_hugetlb_category(pte_t pte) -{ - unsigned long categories =3D PAGE_IS_HUGE; - - if (pte_none(pte)) - return categories; - - /* - * According to pagemap_hugetlb_range(), file-backed HugeTLB - * page cannot be swapped. So PAGE_IS_FILE is not checked for - * swapped pages. 
- */ - if (pte_present(pte)) { - categories |=3D PAGE_IS_PRESENT; - - if (!huge_pte_uffd_wp(pte)) - categories |=3D PAGE_IS_WRITTEN; - if (!PageAnon(pte_page(pte))) - categories |=3D PAGE_IS_FILE; - if (is_zero_pfn(pte_pfn(pte))) - categories |=3D PAGE_IS_PFNZERO; - if (pte_soft_dirty(pte)) - categories |=3D PAGE_IS_SOFT_DIRTY; - } else { - categories |=3D PAGE_IS_SWAPPED; - - if (!pte_swp_uffd_wp_any(pte)) - categories |=3D PAGE_IS_WRITTEN; - if (pte_swp_soft_dirty(pte)) - categories |=3D PAGE_IS_SOFT_DIRTY; - } - - return categories; -} - static void make_uffd_wp_huge_pte(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t ptent) @@ -2424,497 +2085,535 @@ static void make_uffd_wp_huge_pte(struct vm_area_= struct *vma, huge_ptep_modify_prot_commit(vma, addr, ptep, ptent, huge_pte_mkuffd_wp(ptent)); } -#endif /* CONFIG_HUGETLB_PAGE */ - -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE) -static void pagemap_scan_backout_range(struct pagemap_scan_private *p, - unsigned long addr, unsigned long end) +#else +static void make_uffd_wp_huge_pte(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t ptent) { - struct page_region *cur_buf =3D &p->vec_buf[p->vec_buf_index]; - - if (!p->vec_buf) - return; - - if (cur_buf->start !=3D addr) - cur_buf->end =3D addr; - else - cur_buf->start =3D cur_buf->end =3D 0; - - p->found_pages -=3D (end - addr) / PAGE_SIZE; } #endif =20 -static bool pagemap_scan_is_interesting_page(unsigned long categories, - const struct pagemap_scan_private *p) -{ - categories ^=3D p->arg.category_inverted; - if ((categories & p->arg.category_mask) !=3D p->arg.category_mask) - return false; - if (p->arg.category_anyof_mask && !(categories & p->arg.category_anyof_ma= sk)) - return false; - - return true; -} - -static bool pagemap_scan_is_interesting_vma(unsigned long categories, - const struct pagemap_scan_private *p) -{ - unsigned long required =3D p->arg.category_mask & PAGE_IS_WPALLOWED; - - 
categories ^=3D p->arg.category_inverted; - if ((categories & required) !=3D required) - return false; - - return true; -} - -static int pagemap_scan_test_walk(unsigned long start, unsigned long end, - struct mm_walk *walk) -{ - struct pagemap_scan_private *p =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - unsigned long vma_category =3D 0; - bool wp_allowed =3D userfaultfd_wp_async(vma) && - userfaultfd_wp_use_markers(vma); - - if (!wp_allowed) { - /* User requested explicit failure over wp-async capability */ - if (p->arg.flags & PM_SCAN_CHECK_WPASYNC) - return -EPERM; - /* - * User requires wr-protect, and allows silently skipping - * unsupported vmas. - */ - if (p->arg.flags & PM_SCAN_WP_MATCHING) - return 1; - /* - * Then the request doesn't involve wr-protects at all, - * fall through to the rest checks, and allow vma walk. - */ - } - - if (vma->vm_flags & VM_PFNMAP) - return 1; - - if (wp_allowed) - vma_category |=3D PAGE_IS_WPALLOWED; - - if (vma->vm_flags & VM_SOFTDIRTY) - vma_category |=3D PAGE_IS_SOFT_DIRTY; +/* + * /proc/pid/pagemap - an array mapping virtual pages to pfns + * + * For each page in the address space, this file contains one 64-bit entry + * consisting of the following: + * + * Bits 0-54 page frame number (PFN) if present + * Bits 0-4 swap type if swapped + * Bits 5-54 swap offset if swapped + * Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dir= ty.rst) + * Bit 56 page exclusively mapped + * Bit 57 pte is uffd-wp write-protected + * Bit 58 pte is a guard region + * Bits 59-60 zero + * Bit 61 page is file-page or shared-anon + * Bit 62 page swapped + * Bit 63 page present + * + * If the page is not present but in swap, then the PFN contains an + * encoding of the swap file number and the page's offset into the + * swap. Unmapped pages return a null PFN. This allows determining + * precisely which pages are mapped (or in swap) and comparing mapped + * pages between processes. 
+ * + * Efficient users of this interface will use /proc/pid/maps to + * determine which areas of memory are actually mapped and llseek to + * skip over unmapped regions. + */ =20 - if (!pagemap_scan_is_interesting_vma(vma_category, p)) - return 1; +/* + * /proc/pid/pagemap - an array mapping virtual pages to pfns + * + * For each page in the address space, this file contains one 64-bit entry + * consisting of the following: + * + * Bits 0-54 page frame number (PFN) if present + * Bits 0-4 swap type if swapped + * Bits 5-54 swap offset if swapped + * Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dir= ty.rst) + * Bit 56 page exclusively mapped + * Bit 57 pte is uffd-wp write-protected + * Bit 58 pte is a guard region + * Bits 59-60 zero + * Bit 61 page is file-page or shared-anon + * Bit 62 page swapped + * Bit 63 page present + * + * If the page is not present but in swap, then the PFN contains an + * encoding of the swap file number and the page's offset into the + * swap. Unmapped pages return a null PFN. This allows determining + * precisely which pages are mapped (or in swap) and comparing mapped + * pages between processes. + * + * Efficient users of this interface will use /proc/pid/maps to + * determine which areas of memory are actually mapped and llseek to + * skip over unmapped regions. + */ =20 - p->cur_vma_category =3D vma_category; +static int pagemap_open(struct inode *inode, struct file *file) +{ + struct mm_struct *mm; =20 + mm =3D proc_mem_open(inode, PTRACE_MODE_READ); + if (IS_ERR_OR_NULL(mm)) + return mm ? 
PTR_ERR(mm) : -ESRCH; + file->private_data =3D mm; return 0; } =20 -static bool pagemap_scan_push_range(unsigned long categories, - struct pagemap_scan_private *p, - unsigned long addr, unsigned long end) +static int pagemap_release(struct inode *inode, struct file *file) { - struct page_region *cur_buf =3D &p->vec_buf[p->vec_buf_index]; + struct mm_struct *mm =3D file->private_data; =20 - /* - * When there is no output buffer provided at all, the sentinel values - * won't match here. There is no other way for `cur_buf->end` to be - * non-zero other than it being non-empty. - */ - if (addr =3D=3D cur_buf->end && categories =3D=3D cur_buf->categories) { - cur_buf->end =3D end; - return true; - } + if (mm) + mmdrop(mm); + return 0; +} =20 - if (cur_buf->end) { - if (p->vec_buf_index >=3D p->vec_buf_len - 1) - return false; +#define PM_SCAN_CATEGORIES (PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | \ + PAGE_IS_FILE | PAGE_IS_PRESENT | \ + PAGE_IS_SWAPPED | PAGE_IS_PFNZERO | \ + PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY | \ + PAGE_IS_GUARD) +#define PM_SCAN_FLAGS (PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC) =20 - cur_buf =3D &p->vec_buf[++p->vec_buf_index]; - } +static bool pagemap_scan_is_interesting_vma(unsigned long categories, + const struct pagemap_scan_private *p) +{ + unsigned long required =3D p->arg.category_mask & PAGE_IS_WPALLOWED; =20 - cur_buf->start =3D addr; - cur_buf->end =3D end; - cur_buf->categories =3D categories; + categories ^=3D p->arg.category_inverted; + if ((categories & required) !=3D required) + return false; =20 return true; } =20 -static int pagemap_scan_output(unsigned long categories, - struct pagemap_scan_private *p, - unsigned long addr, unsigned long *end) +static int pagemap_scan_get_args(struct pm_scan_arg *arg, + unsigned long uarg) { - unsigned long n_pages, total_pages; - int ret =3D 0; + if (copy_from_user(arg, (void __user *)uarg, sizeof(*arg))) + return -EFAULT; =20 - if (!p->vec_buf) - return 0; + if (arg->size !=3D sizeof(struct 
pm_scan_arg)) + return -EINVAL; =20 - categories &=3D p->arg.return_mask; + /* Validate requested features */ + if (arg->flags & ~PM_SCAN_FLAGS) + return -EINVAL; + if ((arg->category_inverted | arg->category_mask | + arg->category_anyof_mask | arg->return_mask) & ~PM_SCAN_CATEGORIES) + return -EINVAL; =20 - n_pages =3D (*end - addr) / PAGE_SIZE; - if (check_add_overflow(p->found_pages, n_pages, &total_pages) || - total_pages > p->arg.max_pages) { - size_t n_too_much =3D total_pages - p->arg.max_pages; - *end -=3D n_too_much * PAGE_SIZE; - n_pages -=3D n_too_much; - ret =3D -ENOSPC; - } + arg->start =3D untagged_addr((unsigned long)arg->start); + arg->end =3D untagged_addr((unsigned long)arg->end); + arg->vec =3D untagged_addr((unsigned long)arg->vec); =20 - if (!pagemap_scan_push_range(categories, p, addr, *end)) { - *end =3D addr; - n_pages =3D 0; - ret =3D -ENOSPC; - } + /* Validate memory pointers */ + if (!IS_ALIGNED(arg->start, PAGE_SIZE)) + return -EINVAL; + if (!access_ok((void __user *)(long)arg->start, arg->end - arg->start)) + return -EFAULT; + if (!arg->vec && arg->vec_len) + return -EINVAL; + if (UINT_MAX =3D=3D SIZE_MAX && arg->vec_len > SIZE_MAX) + return -EINVAL; + if (arg->vec && !access_ok((void __user *)(long)arg->vec, + size_mul(arg->vec_len, sizeof(struct page_region)))) + return -EFAULT; =20 - p->found_pages +=3D n_pages; - if (ret) - p->arg.walk_end =3D *end; + /* Fixup default values */ + arg->end =3D ALIGN(arg->end, PAGE_SIZE); + arg->walk_end =3D 0; + if (!arg->max_pages) + arg->max_pages =3D ULONG_MAX; =20 - return ret; + return 0; } =20 -static int pagemap_scan_thp_entry(pmd_t *pmd, unsigned long start, - unsigned long end, struct mm_walk *walk) +static int pagemap_scan_writeback_args(struct pm_scan_arg *arg, + unsigned long uargl) { -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - struct pagemap_scan_private *p =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - unsigned long categories; - spinlock_t *ptl; - int ret =3D 0; - - ptl =3D 
pmd_trans_huge_lock(pmd, vma); - if (!ptl) - return -ENOENT; + struct pm_scan_arg __user *uarg =3D (void __user *)uargl; =20 - categories =3D p->cur_vma_category | - pagemap_thp_category(p, vma, start, *pmd); + if (copy_to_user(&uarg->walk_end, &arg->walk_end, sizeof(arg->walk_end))) + return -EFAULT; =20 - if (!pagemap_scan_is_interesting_page(categories, p)) - goto out_unlock; + return 0; +} =20 - ret =3D pagemap_scan_output(categories, p, start, &end); - if (start =3D=3D end) - goto out_unlock; +static int pagemap_scan_init_bounce_buffer(struct pagemap_scan_private *p) +{ + if (!p->arg.vec_len) + return 0; =20 - if (~p->arg.flags & PM_SCAN_WP_MATCHING) - goto out_unlock; - if (~categories & PAGE_IS_WRITTEN) - goto out_unlock; + p->vec_buf_len =3D min_t(size_t, PAGEMAP_WALK_SIZE >> PAGE_SHIFT, + p->arg.vec_len); + p->vec_buf =3D kmalloc_objs(*p->vec_buf, p->vec_buf_len); + if (!p->vec_buf) + return -ENOMEM; =20 - /* - * Break huge page into small pages if the WP operation - * needs to be performed on a portion of the huge page. 
- */ - if (end !=3D start + HPAGE_SIZE) { - spin_unlock(ptl); - split_huge_pmd(vma, pmd, start); - pagemap_scan_backout_range(p, start, end); - /* Report as if there was no THP */ - return -ENOENT; - } + p->vec_buf->start =3D p->vec_buf->end =3D 0; + p->vec_out =3D (struct page_region __user *)(long)p->arg.vec; =20 - make_uffd_wp_pmd(vma, start, pmd); - flush_tlb_range(vma, start, end); -out_unlock: - spin_unlock(ptl); - return ret; -#else /* !CONFIG_TRANSPARENT_HUGEPAGE */ - return -ENOENT; -#endif + return 0; } =20 -static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start, - unsigned long end, struct mm_walk *walk) +static long pagemap_scan_flush_buffer(struct pagemap_scan_private *p) { - struct pagemap_scan_private *p =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - unsigned long addr, flush_end =3D 0; - pte_t *pte, *start_pte; - spinlock_t *ptl; - int ret; - - ret =3D pagemap_scan_thp_entry(pmd, start, end, walk); - if (ret !=3D -ENOENT) - return ret; + const struct page_region *buf =3D p->vec_buf; + long n =3D p->vec_buf_index; =20 - ret =3D 0; - start_pte =3D pte =3D pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl); - if (!pte) { - walk->action =3D ACTION_AGAIN; + if (!p->vec_buf) return 0; - } =20 - lazy_mmu_mode_enable(); + if (buf[n].end !=3D buf[n].start) + n++; =20 - if ((p->arg.flags & PM_SCAN_WP_MATCHING) && !p->vec_out) { - /* Fast path for performing exclusive WP */ - for (addr =3D start; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { - pte_t ptent =3D ptep_get(pte); + if (!n) + return 0; =20 - if ((pte_present(ptent) && pte_uffd_wp(ptent)) || - pte_swp_uffd_wp_any(ptent)) - continue; - make_uffd_wp_pte(vma, addr, pte, ptent); - if (!flush_end) - start =3D addr; - flush_end =3D addr + PAGE_SIZE; - } - goto flush_and_return; - } + if (copy_to_user(p->vec_out, buf, n * sizeof(*buf))) + return -EFAULT; =20 - if (!p->arg.category_anyof_mask && !p->arg.category_inverted && - p->arg.category_mask =3D=3D PAGE_IS_WRITTEN && - 
p->arg.return_mask =3D=3D PAGE_IS_WRITTEN) { - for (addr =3D start; addr < end; pte++, addr +=3D PAGE_SIZE) { - unsigned long next =3D addr + PAGE_SIZE; - pte_t ptent =3D ptep_get(pte); + p->arg.vec_len -=3D n; + p->vec_out +=3D n; =20 - if ((pte_present(ptent) && pte_uffd_wp(ptent)) || - pte_swp_uffd_wp_any(ptent)) - continue; - ret =3D pagemap_scan_output(p->cur_vma_category | PAGE_IS_WRITTEN, - p, addr, &next); - if (next =3D=3D addr) - break; - if (~p->arg.flags & PM_SCAN_WP_MATCHING) - continue; - make_uffd_wp_pte(vma, addr, pte, ptent); - if (!flush_end) - start =3D addr; - flush_end =3D next; - } - goto flush_and_return; - } + p->vec_buf_index =3D 0; + p->vec_buf_len =3D min_t(size_t, p->vec_buf_len, p->arg.vec_len); + p->vec_buf->start =3D p->vec_buf->end =3D 0; =20 - for (addr =3D start; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { - pte_t ptent =3D ptep_get(pte); - unsigned long categories =3D p->cur_vma_category | - pagemap_page_category(p, vma, addr, ptent); - unsigned long next =3D addr + PAGE_SIZE; + return n; +} =20 - if (!pagemap_scan_is_interesting_page(categories, p)) - continue; +static unsigned long pagemap_set_category(struct pagemap_scan_private *p, + struct pt_range_walk *ptw, + enum pt_range_walk_type type) +{ + unsigned long categories =3D 0; =20 - ret =3D pagemap_scan_output(categories, p, addr, &next); - if (next =3D=3D addr) - break; + if (ptw->level !=3D PTW_PTE_LEVEL) + categories |=3D PAGE_IS_HUGE; =20 - if (~p->arg.flags & PM_SCAN_WP_MATCHING) - continue; - if (~categories & PAGE_IS_WRITTEN) - continue; + if (ptw->present) { + categories |=3D PAGE_IS_PRESENT; =20 - make_uffd_wp_pte(vma, addr, pte, ptent); - if (!flush_end) - start =3D addr; - flush_end =3D next; + if (type =3D=3D PTW_FOLIO && !PageAnon(ptw->page)) + categories |=3D PAGE_IS_FILE; + if (type =3D=3D PTW_PFN) + categories |=3D PAGE_IS_PFNZERO; + } else { + categories |=3D PAGE_IS_SWAPPED; } =20 -flush_and_return: - if (flush_end) - flush_tlb_range(vma, start, addr); - - 
lazy_mmu_mode_disable(); - pte_unmap_unlock(start_pte, ptl); - - cond_resched(); - return ret; -} - -#ifdef CONFIG_HUGETLB_PAGE -static int pagemap_scan_hugetlb_entry(pte_t *ptep, unsigned long hmask, - unsigned long start, unsigned long end, - struct mm_walk *walk) -{ - struct pagemap_scan_private *p =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - unsigned long categories; - spinlock_t *ptl; - int ret =3D 0; - pte_t pte; - - if (~p->arg.flags & PM_SCAN_WP_MATCHING) { - /* Go the short route when not write-protecting pages. */ + switch (ptw->level) { + case PTW_PUD_LEVEL: + if (ptw->present) { + if (!pud_uffd_wp(ptw->pud)) + categories |=3D PAGE_IS_WRITTEN; + if (pud_soft_dirty(ptw->pud)) + categories |=3D PAGE_IS_SOFT_DIRTY; + } else { + if (!pud_swp_uffd_wp(ptw->pud)) + categories |=3D PAGE_IS_WRITTEN; + if (pud_swp_soft_dirty(ptw->pud)) + categories |=3D PAGE_IS_SOFT_DIRTY; + } + break; + case PTW_PMD_LEVEL: + if (ptw->present) { + if (!pmd_uffd_wp(ptw->pmd)) + categories |=3D PAGE_IS_WRITTEN; + if (pmd_soft_dirty(ptw->pmd)) + categories |=3D PAGE_IS_SOFT_DIRTY; + } else { + if (p->masks_of_interest & PAGE_IS_FILE) { + const softleaf_t entry =3D softleaf_from_pmd(ptw->pmd); =20 - pte =3D huge_ptep_get(walk->mm, start, ptep); - categories =3D p->cur_vma_category | pagemap_hugetlb_category(pte); + if (softleaf_has_pfn(entry) && + !folio_test_anon(softleaf_to_folio(entry))) + categories |=3D PAGE_IS_FILE; + } =20 - if (!pagemap_scan_is_interesting_page(categories, p)) - return 0; + if (!pmd_swp_uffd_wp(ptw->pmd)) + categories |=3D PAGE_IS_WRITTEN; =20 - return pagemap_scan_output(categories, p, start, &end); + if (pmd_swp_soft_dirty(ptw->pmd)) + categories |=3D PAGE_IS_SOFT_DIRTY; + } + break; + case PTW_PTE_LEVEL: + if (ptw->present) { + if (!pte_uffd_wp(ptw->pte)) + categories |=3D PAGE_IS_WRITTEN; + if (pte_soft_dirty(ptw->pte)) + categories |=3D PAGE_IS_SOFT_DIRTY; + } else { + if (!pte_swp_uffd_wp_any(ptw->pte)) + categories |=3D 
PAGE_IS_WRITTEN; + if (pte_swp_soft_dirty(ptw->pte)) + categories |=3D PAGE_IS_SOFT_DIRTY; + } + break; } =20 - i_mmap_lock_write(vma->vm_file->f_mapping); - ptl =3D huge_pte_lock(hstate_vma(vma), vma->vm_mm, ptep); - - pte =3D huge_ptep_get(walk->mm, start, ptep); - categories =3D p->cur_vma_category | pagemap_hugetlb_category(pte); + return categories; +} =20 - if (!pagemap_scan_is_interesting_page(categories, p)) - goto out_unlock; +static int pagemap_scan_walk(struct vm_area_struct *vma, struct pagemap_sc= an_private *p, + unsigned long addr) +{ + int ret =3D 0; + struct pt_range_walk ptw =3D { + .mm =3D vma->vm_mm + }; + enum pt_range_walk_type type; + pt_type_flags_t flags =3D PT_TYPE_ALL; =20 - ret =3D pagemap_scan_output(categories, p, start, &end); - if (start =3D=3D end) - goto out_unlock; +start_again: + type =3D pt_range_walk_start(&ptw, vma, addr, vma->vm_end, flags); + while (type !=3D PTW_DONE) { + bool must_return =3D false; + unsigned long categories =3D p->cur_vma_category | + pagemap_set_category(p, &ptw, type); + unsigned long addr; + unsigned long flush_end =3D 0; + unsigned long end =3D ptw.next_addr; + unsigned long curr_addr =3D ptw.curr_addr; + pte_t *ptep; =20 - if (~categories & PAGE_IS_WRITTEN) - goto out_unlock; + addr =3D curr_addr; =20 - if (end !=3D start + HPAGE_SIZE) { - /* Partial HugeTLB page WP isn't possible. 
*/ - pagemap_scan_backout_range(p, start, end); - p->arg.walk_end =3D start; - ret =3D 0; - goto out_unlock; - } + if (type =3D=3D PTW_NONE) { + int err; =20 - make_uffd_wp_huge_pte(vma, start, ptep, pte); - flush_hugetlb_tlb_range(vma, start, end); + if (!vma || !pagemap_scan_is_interesting_page(p->cur_vma_category, p)) + goto keep_walking; =20 -out_unlock: - spin_unlock(ptl); - i_mmap_unlock_write(vma->vm_file->f_mapping); + ret =3D pagemap_scan_output(p->cur_vma_category, p, addr, &end); + if (curr_addr =3D=3D end) + goto out; + if (~p->arg.flags & PM_SCAN_WP_MATCHING) + goto keep_walking; =20 - return ret; -} -#else -#define pagemap_scan_hugetlb_entry NULL -#endif + err =3D uffd_wp_range(vma, curr_addr, end - curr_addr, true); + if (err < 0) { + ret =3D err; + goto out; + } + goto keep_walking; + } =20 -static int pagemap_scan_pte_hole(unsigned long addr, unsigned long end, - int depth, struct mm_walk *walk) -{ - struct pagemap_scan_private *p =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - int ret, err; + if (ptw.level !=3D PTW_PTE_LEVEL) { + if (!pagemap_scan_is_interesting_page(categories, p)) + goto keep_walking; =20 - if (!vma || !pagemap_scan_is_interesting_page(p->cur_vma_category, p)) - return 0; + ret =3D pagemap_scan_output(categories, p, curr_addr, &end); + if (curr_addr =3D=3D end) + goto out; =20 - ret =3D pagemap_scan_output(p->cur_vma_category, p, addr, &end); - if (addr =3D=3D end) - return ret; + if (~p->arg.flags & PM_SCAN_WP_MATCHING) { + if (ret) + goto out; + else + goto keep_walking; + } =20 - if (~p->arg.flags & PM_SCAN_WP_MATCHING) - return ret; + if (~categories & PAGE_IS_WRITTEN) + goto keep_walking; =20 - err =3D uffd_wp_range(vma, addr, end - addr, true); - if (err < 0) - ret =3D err; + if (end !=3D curr_addr + HPAGE_SIZE) { + if (is_vm_hugetlb_page(ptw.vma)) { + /* Partial HugeTLB page WP isn't possible. 
*/ + pagemap_scan_backout_range(p, curr_addr, end); + p->arg.walk_end =3D curr_addr; + ret =3D 0; + goto keep_walking; + } + if (ptw.level =3D=3D PTW_PMD_LEVEL) { + pt_range_walk_done(&ptw); + split_huge_pmd(ptw.vma, ptw.pmdp, curr_addr); + pagemap_scan_backout_range(p, curr_addr, end); + /* Relaunch now that we split the pmd */ + goto start_again; + } + } + } else { + lazy_mmu_mode_enable(); + ptep =3D ptw.ptep; + if ((p->arg.flags & PM_SCAN_WP_MATCHING) && !p->vec_out) { + for (addr =3D curr_addr; addr !=3D end; ptep++, addr +=3D PAGE_SIZE) { + pte_t ptent =3D ptep_get(ptep); + + ptw.next_addr =3D addr + PAGE_SIZE; + if ((pte_present(ptent) && pte_uffd_wp(ptent)) || + pte_swp_uffd_wp_any(ptent)) + continue; + make_uffd_wp_pte(vma, addr, ptep, ptent); + if (!flush_end) + curr_addr =3D addr; + flush_end =3D addr + PAGE_SIZE; + } + goto flush_and_return; + } =20 - return ret; -} + if (!p->arg.category_anyof_mask && !p->arg.category_inverted && + p->arg.category_mask =3D=3D PAGE_IS_WRITTEN && + p->arg.return_mask =3D=3D PAGE_IS_WRITTEN) { + for (addr =3D curr_addr; addr < end; ptep++, addr +=3D PAGE_SIZE) { + unsigned long next =3D addr + PAGE_SIZE; + pte_t ptent =3D ptep_get(ptep); + + ptw.next_addr =3D addr + PAGE_SIZE; + if ((pte_present(ptent) && pte_uffd_wp(ptent)) || + pte_swp_uffd_wp_any(ptent)) + continue; + ret =3D pagemap_scan_output(p->cur_vma_category | PAGE_IS_WRITTEN, + p, addr, &next); + if (next =3D=3D addr) { + must_return =3D true; + break; + } + if (~p->arg.flags & PM_SCAN_WP_MATCHING) + continue; + make_uffd_wp_pte(vma, addr, ptep, ptent); + if (!flush_end) + curr_addr =3D addr; + flush_end =3D next; + } + goto flush_and_return; + } =20 -static const struct mm_walk_ops pagemap_scan_ops =3D { - .test_walk =3D pagemap_scan_test_walk, - .pmd_entry =3D pagemap_scan_pmd_entry, - .pte_hole =3D pagemap_scan_pte_hole, - .hugetlb_entry =3D pagemap_scan_hugetlb_entry, -}; + for (addr =3D curr_addr; addr !=3D end; ptep++, addr +=3D PAGE_SIZE) { + pte_t ptent 
=3D ptep_get(ptep); + unsigned long categories =3D p->cur_vma_category | + pagemap_page_category(p, vma, addr, ptent); + unsigned long next =3D addr + PAGE_SIZE; =20 -static int pagemap_scan_get_args(struct pm_scan_arg *arg, - unsigned long uarg) -{ - if (copy_from_user(arg, (void __user *)uarg, sizeof(*arg))) - return -EFAULT; + ptw.next_addr =3D addr + PAGE_SIZE; + if (!pagemap_scan_is_interesting_page(categories, p)) + continue; =20 - if (arg->size !=3D sizeof(struct pm_scan_arg)) - return -EINVAL; + ret =3D pagemap_scan_output(categories, p, addr, &next); + if (next =3D=3D addr) { + must_return =3D true; + break; + } =20 - /* Validate requested features */ - if (arg->flags & ~PM_SCAN_FLAGS) - return -EINVAL; - if ((arg->category_inverted | arg->category_mask | - arg->category_anyof_mask | arg->return_mask) & ~PM_SCAN_CATEGORIES) - return -EINVAL; + if (~p->arg.flags & PM_SCAN_WP_MATCHING) + continue; + if (~categories & PAGE_IS_WRITTEN) + continue; =20 - arg->start =3D untagged_addr((unsigned long)arg->start); - arg->end =3D untagged_addr((unsigned long)arg->end); - arg->vec =3D untagged_addr((unsigned long)arg->vec); + make_uffd_wp_pte(vma, addr, ptep, ptent); + if (!flush_end) + curr_addr =3D addr; + flush_end =3D next; + } + } =20 - /* Validate memory pointers */ - if (!IS_ALIGNED(arg->start, PAGE_SIZE)) - return -EINVAL; - if (!access_ok((void __user *)(long)arg->start, arg->end - arg->start)) - return -EFAULT; - if (!arg->vec && arg->vec_len) - return -EINVAL; - if (UINT_MAX =3D=3D SIZE_MAX && arg->vec_len > SIZE_MAX) - return -EINVAL; - if (arg->vec && !access_ok((void __user *)(long)arg->vec, - size_mul(arg->vec_len, sizeof(struct page_region)))) - return -EFAULT; + if (ptw.level =3D=3D PTW_PUD_LEVEL) { + if (is_vm_hugetlb_page(ptw.vma)) + make_uffd_wp_huge_pte(vma, curr_addr, ptw.ptep, ptw.pte); + else + make_uffd_wp_pud(ptw.vma, curr_addr, ptw.pudp); + } =20 - /* Fixup default values */ - arg->end =3D ALIGN(arg->end, PAGE_SIZE); - arg->walk_end =3D 0; 
- if (!arg->max_pages) - arg->max_pages =3D ULONG_MAX; + if (ptw.level =3D=3D PTW_PMD_LEVEL) { + if (is_vm_hugetlb_page(ptw.vma)) + make_uffd_wp_huge_pte(vma, curr_addr, ptw.ptep, ptw.pte); + else + make_uffd_wp_pmd(ptw.vma, curr_addr, ptw.pmdp); + } =20 - return 0; + if (is_vm_hugetlb_page(ptw.vma)) { + flush_hugetlb_tlb_range(vma, curr_addr, end); + } else { +flush_and_return: + if (flush_end || ptw.level !=3D PTW_PTE_LEVEL) + flush_tlb_range(vma, curr_addr, end); + if (ptw.level =3D=3D PTW_PTE_LEVEL) + lazy_mmu_mode_disable(); + } + if (must_return) + goto out; +keep_walking: + type =3D pt_range_walk_next(&ptw, vma, vma->vm_start, vma->vm_end, flags= ); + } +out: + pt_range_walk_done(&ptw); + return ret; } =20 -static int pagemap_scan_writeback_args(struct pm_scan_arg *arg, - unsigned long uargl) +static int pagemap_scan_test(unsigned long start, unsigned long end, + struct pagemap_scan_private *p, + struct vm_area_struct *vma) { - struct pm_scan_arg __user *uarg =3D (void __user *)uargl; + unsigned long vma_category =3D 0; + bool wp_allowed =3D userfaultfd_wp_async(vma) && + userfaultfd_wp_use_markers(vma); =20 - if (copy_to_user(&uarg->walk_end, &arg->walk_end, sizeof(arg->walk_end))) - return -EFAULT; + if (!wp_allowed) { + /* User requested explicit failure over wp-async capability */ + if (p->arg.flags & PM_SCAN_CHECK_WPASYNC) + return -EPERM; + /* + * User requires wr-protect, and allows silently skipping + * unsupported vmas. + */ + if (p->arg.flags & PM_SCAN_WP_MATCHING) + return 1; + /* + * Then the request doesn't involve wr-protects at all, + * fall through to the rest checks, and allow vma walk. 
+ */ + } =20 - return 0; -} + if (vma->vm_flags & VM_PFNMAP) + return 1; =20 -static int pagemap_scan_init_bounce_buffer(struct pagemap_scan_private *p) -{ - if (!p->arg.vec_len) - return 0; + if (wp_allowed) + vma_category |=3D PAGE_IS_WPALLOWED; =20 - p->vec_buf_len =3D min_t(size_t, PAGEMAP_WALK_SIZE >> PAGE_SHIFT, - p->arg.vec_len); - p->vec_buf =3D kmalloc_objs(*p->vec_buf, p->vec_buf_len); - if (!p->vec_buf) - return -ENOMEM; + if (vma->vm_flags & VM_SOFTDIRTY) + vma_category |=3D PAGE_IS_SOFT_DIRTY; =20 - p->vec_buf->start =3D p->vec_buf->end =3D 0; - p->vec_out =3D (struct page_region __user *)(long)p->arg.vec; + if (!pagemap_scan_is_interesting_vma(vma_category, p)) + return 1; + + p->cur_vma_category =3D vma_category; =20 return 0; } =20 -static long pagemap_scan_flush_buffer(struct pagemap_scan_private *p) +static int pagemap_scan_pte_hole(unsigned long addr, unsigned long end, + struct pagemap_scan_private *p, + struct vm_area_struct *vma) { - const struct page_region *buf =3D p->vec_buf; - long n =3D p->vec_buf_index; - - if (!p->vec_buf) - return 0; - - if (buf[n].end !=3D buf[n].start) - n++; + int ret, err; =20 - if (!n) + if (!vma || !pagemap_scan_is_interesting_page(p->cur_vma_category, p)) return 0; =20 - if (copy_to_user(p->vec_out, buf, n * sizeof(*buf))) - return -EFAULT; + ret =3D pagemap_scan_output(p->cur_vma_category, p, addr, &end); + if (addr =3D=3D end) + return ret; =20 - p->arg.vec_len -=3D n; - p->vec_out +=3D n; + if (~p->arg.flags & PM_SCAN_WP_MATCHING) + return ret; =20 - p->vec_buf_index =3D 0; - p->vec_buf_len =3D min_t(size_t, p->vec_buf_len, p->arg.vec_len); - p->vec_buf->start =3D p->vec_buf->end =3D 0; + err =3D uffd_wp_range(vma, addr, end - addr, true); + if (err < 0) + ret =3D err; =20 - return n; + return ret; } =20 static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg) { struct pagemap_scan_private p =3D {0}; + struct vm_area_struct *vma; unsigned long walk_start; size_t n_ranges_out =3D 0; int ret; @@ 
-2932,6 +2631,7 @@ static long do_pagemap_scan(struct mm_struct *mm, uns= igned long uarg) for (walk_start =3D p.arg.start; walk_start < p.arg.end; walk_start =3D p.arg.walk_end) { struct mmu_notifier_range range; + unsigned long next; long n_out; =20 if (fatal_signal_pending(current)) { @@ -2950,8 +2650,42 @@ static long do_pagemap_scan(struct mm_struct *mm, un= signed long uarg) mmu_notifier_invalidate_range_start(&range); } =20 - ret =3D walk_page_range(mm, walk_start, p.arg.end, - &pagemap_scan_ops, &p); + vma =3D find_vma(mm, walk_start); + do { + if (!vma) { + walk_start =3D p.arg.end; + next =3D p.arg.end; + ret =3D pagemap_scan_pte_hole(walk_start, next, &p, NULL); + if (ret) + break; + } else if (walk_start < vma->vm_start) { + next =3D min(p.arg.end, vma->vm_start); + ret =3D pagemap_scan_pte_hole(walk_start, next, &p, NULL); + if (ret) + break; + walk_start =3D next; + } else { + next =3D min(p.arg.end, vma->vm_end); + + ret =3D pagemap_scan_test(walk_start, min(p.arg.end, vma->vm_end), + &p, vma); + + if (ret > 0) { + ret =3D 0; + walk_start =3D min(p.arg.end, vma->vm_end); + next =3D walk_start; + vma =3D find_vma(mm, walk_start); + continue; + } + + ret =3D pagemap_scan_walk(vma, &p, walk_start); + if (ret) + break; + walk_start =3D min(p.arg.end, vma->vm_end); + vma =3D find_vma(mm, walk_start); + next =3D walk_start; + } + } while (next < p.arg.end); =20 if (p.arg.flags & PM_SCAN_WP_MATCHING) mmu_notifier_invalidate_range_end(&range); @@ -2985,6 +2719,306 @@ static long do_pagemap_scan(struct mm_struct *mm, u= nsigned long uarg) return ret; } =20 +static int pagemap_read_walk_range(struct vm_area_struct *vma, unsigned lo= ng start, + struct pagemapread *pm) +{ + int err =3D 0; + struct pt_range_walk ptw =3D { + .mm =3D vma->vm_mm + }; + enum pt_range_walk_type type; + pt_type_flags_t wflags =3D PT_TYPE_ALL; + pte_t *ptep; + + wflags &=3D ~(PT_TYPE_PFN); + + type =3D pt_range_walk_start(&ptw, vma, start, vma->vm_end, wflags); + while (type !=3D 
PTW_DONE) { + unsigned long end; + u64 frame =3D 0, flags =3D 0; + struct page *page =3D NULL; + struct folio *folio =3D NULL; + + end =3D 0; + switch (ptw.level) { + case PTW_PUD_LEVEL: + end =3D pud_addr_end(start, vma->vm_end); + if (vma->vm_flags & VM_SOFTDIRTY) + flags |=3D PM_SOFT_DIRTY; + + if (pud_present(ptw.pud)) { + page =3D pud_page(ptw.pud); + folio =3D page_folio(page); + flags |=3D PM_PRESENT; + + if (!folio_test_anon(folio)) + flags |=3D PM_FILE; + + if (pm->show_pfn) { + unsigned long hmask =3D huge_page_mask(hstate_vma(vma)); + + frame =3D pud_pfn(ptw.pud) + + ((start & ~hmask) >> PAGE_SHIFT); + } + } else if (pud_swp_uffd_wp(ptw.pud)) { + flags |=3D PM_UFFD_WP; + } + break; + case PTW_PMD_LEVEL: + unsigned int idx =3D (start & ~PMD_MASK) >> PAGE_SHIFT; + + end =3D pmd_addr_end(start, vma->vm_end); + if (vma->vm_flags & VM_SOFTDIRTY) + flags |=3D PM_SOFT_DIRTY; + + if (pmd_none(ptw.pmd)) + goto populate_pagemap; + + if (pmd_present(ptw.pmd)) { + page =3D pmd_page(ptw.pmd); + flags |=3D PM_PRESENT; + + if (pmd_soft_dirty(ptw.pmd)) + flags |=3D PM_SOFT_DIRTY; + if (pmd_uffd_wp(ptw.pmd)) + flags |=3D PM_UFFD_WP; + if (pm->show_pfn) + frame =3D pmd_pfn(ptw.pmd) + idx; + } else if (thp_migration_supported() || IS_ENABLED(CONFIG_HUGETLB_PAGE)= ) { + const softleaf_t entry =3D softleaf_from_pmd(ptw.pmd); + unsigned long offset; + + if (pm->show_pfn) { + if (softleaf_has_pfn(entry)) + offset =3D softleaf_to_pfn(entry) + idx; + else + offset =3D swp_offset(entry) + idx; + frame =3D swp_type(entry) | + (offset << MAX_SWAPFILES_SHIFT); + } + + if (!is_vm_hugetlb_page(vma)) + flags |=3D PM_SWAP; + if (pmd_swp_soft_dirty(ptw.pmd)) + flags |=3D PM_SOFT_DIRTY; + if (pmd_swp_uffd_wp(ptw.pmd)) + flags |=3D PM_UFFD_WP; + + VM_WARN_ON_ONCE(!pmd_is_migration_entry(ptw.pmd)); + page =3D softleaf_to_page(entry); + } + + if (page) { + folio =3D page_folio(page); + if (!folio_test_anon(folio)) + flags |=3D PM_FILE; + } + + break; + case PTW_PTE_LEVEL: + end =3D 
pmd_addr_end(start, vma->vm_end); + break; + } + + if (ptw.level =3D=3D PTW_PTE_LEVEL) { + ptep =3D ptw.ptep; + for (; start < end; ptep++, start +=3D PAGE_SIZE) { + pagemap_entry_t pme; + + pme =3D pte_to_pagemap_entry(pm, vma, start, ptep_get(ptep)); + err =3D add_to_pagemap(&pme, pm); + ptw.next_addr =3D start + PAGE_SIZE; + if (err) + break; + } + } else if (ptw.level =3D=3D PTW_PMD_LEVEL) { +populate_pagemap: + for (; start !=3D end; start +=3D PAGE_SIZE) { + u64 cur_flags =3D flags; + pagemap_entry_t pme; + + if (folio && (flags & PM_PRESENT) && + __folio_page_mapped_exclusively(folio, page)) + cur_flags |=3D PM_MMAP_EXCLUSIVE; + + pme =3D make_pme(frame, cur_flags); + err =3D add_to_pagemap(&pme, pm); + if (err) + break; + if (pm->show_pfn) { + if (flags & PM_PRESENT) + frame++; + else if (flags & PM_SWAP) + frame +=3D (1 << MAX_SWAPFILES_SHIFT); + } + } + } + if (err) + break; + type =3D pt_range_walk_next(&ptw, vma, vma->vm_start, vma->vm_end, wflag= s); + } + pt_range_walk_done(&ptw); + + return err; +} + +static int pagemap_pte_hole(struct mm_struct *mm, unsigned long start, uns= igned long end, + struct pagemapread *pm) +{ + unsigned long addr =3D start; + int err =3D 0; + + while (addr < end) { + struct vm_area_struct *vma =3D find_vma(mm, addr); + pagemap_entry_t pme =3D make_pme(0, 0); + /* End of address space hole, which we mark as non-present. */ + unsigned long hole_end; + + if (vma) + hole_end =3D min(end, vma->vm_start); + else + hole_end =3D end; + + for (; addr < hole_end; addr +=3D PAGE_SIZE) { + err =3D add_to_pagemap(&pme, pm); + if (err) + goto out; + } + + if (!vma) + break; + + /* Addresses in the VMA. 
*/ + if (vma->vm_flags & VM_SOFTDIRTY) + pme =3D make_pme(0, PM_SOFT_DIRTY); + for (; addr < min(end, vma->vm_end); addr +=3D PAGE_SIZE) { + err =3D add_to_pagemap(&pme, pm); + if (err) + goto out; + } + } +out: + return err; +} + +static ssize_t pagemap_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + struct mm_struct *mm =3D file->private_data; + struct pagemapread pm; + unsigned long src; + unsigned long svpfn; + unsigned long start_vaddr; + unsigned long end_vaddr; + int ret =3D 0, copied =3D 0; + + if (!mm || !mmget_not_zero(mm)) + goto out; + + ret =3D -EINVAL; + /* file position must be aligned */ + if ((*ppos % PM_ENTRY_BYTES) || (count % PM_ENTRY_BYTES)) + goto out_mm; + + ret =3D 0; + if (!count) + goto out_mm; + + /* do not disclose physical addresses: attack vector */ + pm.show_pfn =3D file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN); + + pm.len =3D (PAGEMAP_WALK_SIZE >> PAGE_SHIFT); + pm.buffer =3D kmalloc_array(pm.len, PM_ENTRY_BYTES, GFP_KERNEL); + ret =3D -ENOMEM; + if (!pm.buffer) + goto out_mm; + + src =3D *ppos; + svpfn =3D src / PM_ENTRY_BYTES; + end_vaddr =3D mm->task_size; + + /* watch out for wraparound */ + start_vaddr =3D end_vaddr; + if (svpfn <=3D (ULONG_MAX >> PAGE_SHIFT)) { + unsigned long end; + + ret =3D mmap_read_lock_killable(mm); + if (ret) + goto out_free; + start_vaddr =3D untagged_addr_remote(mm, svpfn << PAGE_SHIFT); + mmap_read_unlock(mm); + + end =3D start_vaddr + ((count / PM_ENTRY_BYTES) << PAGE_SHIFT); + if (end >=3D start_vaddr && end < mm->task_size) + end_vaddr =3D end; + } + + /* Ensure the address is inside the task */ + if (start_vaddr > mm->task_size) + start_vaddr =3D end_vaddr; + + ret =3D 0; + + while (count && (start_vaddr < end_vaddr)) { + int len; + unsigned long end; + unsigned long next; + + pm.pos =3D 0; + end =3D (start_vaddr + PAGEMAP_WALK_SIZE) & PAGEMAP_WALK_MASK; + if (end < start_vaddr || end > end_vaddr) + end =3D end_vaddr; + ret =3D mmap_read_lock_killable(mm); + if 
(ret) + goto out_free; + + struct vm_area_struct *vma =3D find_vma(mm, start_vaddr); + + do { + if (!vma) { + next =3D end; + ret =3D pagemap_pte_hole(mm, start_vaddr, next, &pm); + if (ret) + goto out_err; + } else if (start_vaddr < vma->vm_start) { + next =3D min(end, vma->vm_start); + ret =3D pagemap_pte_hole(mm, start_vaddr, next, &pm); + if (ret) + goto out_err; + start_vaddr =3D next; + } else { + ret =3D pagemap_read_walk_range(vma, start_vaddr, &pm); + if (ret) + goto out_err; + start_vaddr =3D min(end, vma->vm_end); + next =3D start_vaddr; + vma =3D find_vma(mm, start_vaddr); + } + } while (next < end); +out_err: + mmap_read_unlock(mm); + + len =3D min(count, PM_ENTRY_BYTES * pm.pos); + if (copy_to_user(buf, pm.buffer, len)) { + ret =3D -EFAULT; + goto out_free; + } + copied +=3D len; + buf +=3D len; + count -=3D len; + } + *ppos +=3D copied; + if (!ret || ret =3D=3D PM_END_OF_BUFFER) + ret =3D copied; + +out_free: + kfree(pm.buffer); +out_mm: + mmput(mm); +out: + return ret; +} + static long do_pagemap_cmd(struct file *file, unsigned int cmd, unsigned long arg) { @@ -3007,6 +3041,7 @@ const struct file_operations proc_pagemap_operations = =3D { .unlocked_ioctl =3D do_pagemap_cmd, .compat_ioctl =3D do_pagemap_cmd, }; + #endif /* CONFIG_PROC_PAGE_MONITOR */ =20 #ifdef CONFIG_NUMA diff --git a/include/linux/leafops.h b/include/linux/leafops.h index 08646398b0fe..bda9e7971732 100644 --- a/include/linux/leafops.h +++ b/include/linux/leafops.h @@ -628,6 +628,19 @@ static inline bool pmd_is_device_private_entry(pmd_t p= md) =20 #endif /* CONFIG_ZONE_DEVICE && CONFIG_ARCH_ENABLE_THP_MIGRATION */ =20 +#ifdef CONFIG_HUGETLB_PAGE +/** + * pud_is_migration_entry() - Does this PUD entry encode a migration entry? + * @pud: PUD entry. + * + * Returns: true if the PUD encodes a migration entry, otherwise false. 
+ */ +static inline bool pud_is_migration_entry(pud_t pud) +{ + return softleaf_is_migration(softleaf_from_pud(pud)); +} +#endif + /** * pmd_is_migration_entry() - Does this PMD entry encode a migration entry? * @pmd: PMD entry. diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index f5291f9ce583..1512b1bb49e3 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1263,11 +1263,21 @@ static inline pmd_t generic_pmdp_establish(struct v= m_area_struct *vma, } #endif =20 +#ifndef __HAVE_ARCH_PUDP_INVALIDATE +extern pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long add= ress, + pud_t *pudp); +#endif + #ifndef __HAVE_ARCH_PMDP_INVALIDATE extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long add= ress, pmd_t *pmdp); #endif =20 +#ifndef __HAVE_ARCH_PUDP_INVALIDATE_AD +extern pud_t pudp_invalidate_ad(struct vm_area_struct *vma, + unsigned long address, pud_t *pudp); +#endif + #ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD =20 /* @@ -1810,6 +1820,21 @@ static inline pgprot_t pgprot_modify(pgprot_t oldpro= t, pgprot_t newprot) =20 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY #ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION +static inline pud_t pud_swp_mksoft_dirty(pud_t pud) +{ + return pud; +} + +static inline int pud_swp_soft_dirty(pud_t pud) +{ + return 0; +} + +static inline pud_t pud_swp_clear_soft_dirty(pud_t pud) +{ + return pud; +} + static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd) { return pmd; @@ -1852,6 +1877,11 @@ static inline int pmd_soft_dirty(pmd_t pmd) return 0; } =20 +static inline int pud_soft_dirty(pud_t pud) +{ + return 0; +} + static inline pte_t pte_mksoft_dirty(pte_t pte) { return pte; diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index b91b1a98029c..89010192c969 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -208,6 +208,27 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsi= gned long address, } #endif =20 +#ifndef __HAVE_ARCH_PUDP_INVALIDATE +pud_t pudp_invalidate(struct 
vm_area_struct *vma, unsigned long address, + pud_t *pudp) +{ + VM_WARN_ON_ONCE(!pud_present(*pudp)); + pud_t old =3D pudp_establish(vma, address, pudp, pud_mkinvalid(*pudp)); + flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE); + return old; +} +#endif + +#ifndef __HAVE_ARCH_PUDP_INVALIDATE_AD +pud_t pudp_invalidate_ad(struct vm_area_struct *vma, unsigned long address, + pud_t *pudp) + +{ + VM_WARN_ON_ONCE(!pud_present(*pudp)); + return pudp_invalidate(vma, address, pudp); +} +#endif + #ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) --=20 2.53.0
From nobody Mon Jun 8 22:53:12 2026
From: Oscar Salvador
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka, Lorenzo Stoakes, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Oscar Salvador
Subject: [RFC PATCH v3 8/8] mm: Make /proc/pid/clear_refs use the new generic pagewalk API
Date: Mon, 25 May 2026 18:55:28 +0200
Message-ID: <20260525165528.184397-9-osalvador@suse.de>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260525165528.184397-1-osalvador@suse.de>
References: <20260525165528.184397-1-osalvador@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Have /proc/pid/clear_refs make use of the new generic API, and remove the code which was using the old one.
Signed-off-by: Oscar Salvador --- fs/proc/task_mmu.c | 133 ++++++++++++++++++++++++--------------------- 1 file changed, 70 insertions(+), 63 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 20ffb26692cc..5f09a5b26b61 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1571,72 +1571,54 @@ static inline void clear_soft_dirty_pmd(struct vm_a= rea_struct *vma, } #endif =20 -static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr, - unsigned long end, struct mm_walk *walk) +static void clear_refs_pmd_range(struct pt_range_walk *ptw, + enum clear_refs_types type) { - struct clear_refs_private *cp =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - pte_t *pte, ptent; - spinlock_t *ptl; - struct folio *folio; + struct vm_area_struct *vma =3D ptw->vma; + unsigned long addr =3D ptw->curr_addr; =20 - ptl =3D pmd_trans_huge_lock(pmd, vma); - if (ptl) { - if (cp->type =3D=3D CLEAR_REFS_SOFT_DIRTY) { - clear_soft_dirty_pmd(vma, addr, pmd); - goto out; - } - - if (!pmd_present(*pmd)) - goto out; + if (type =3D=3D CLEAR_REFS_SOFT_DIRTY) { + clear_soft_dirty_pmd(vma, addr, ptw->pmdp); + return; + } =20 - folio =3D pmd_folio(*pmd); + if (!pmd_present(ptw->pmd) || !ptw->folio) + return; =20 - /* Clear accessed and referenced bits. 
*/ - pmdp_test_and_clear_young(vma, addr, pmd); - folio_test_clear_young(folio); - folio_clear_referenced(folio); -out: - spin_unlock(ptl); - return 0; - } + pmdp_test_and_clear_young(vma, addr, ptw->pmdp); + folio_test_clear_young(ptw->folio); + folio_clear_referenced(ptw->folio); +} =20 - pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (!pte) { - walk->action =3D ACTION_AGAIN; - return 0; - } - for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { - ptent =3D ptep_get(pte); +static void clear_refs_pte_range(struct pt_range_walk *ptw, + enum clear_refs_types type) +{ + struct vm_area_struct *vma =3D ptw->vma; + unsigned long addr =3D ptw->curr_addr; + unsigned long end =3D pmd_addr_end(addr, vma->vm_end); + pte_t *ptep =3D ptw->ptep; =20 - if (cp->type =3D=3D CLEAR_REFS_SOFT_DIRTY) { - clear_soft_dirty(vma, addr, pte); + for (; addr !=3D end; ptep++, addr +=3D PAGE_SIZE) { + if (type =3D=3D CLEAR_REFS_SOFT_DIRTY) { + clear_soft_dirty(vma, addr, ptep); continue; } =20 - if (!pte_present(ptent)) - continue; - - folio =3D vm_normal_folio(vma, addr, ptent); - if (!folio) + if (!ptw->present || !ptw->folio) continue; =20 /* Clear accessed and referenced bits. 
*/ - ptep_test_and_clear_young(vma, addr, pte); - folio_test_clear_young(folio); - folio_clear_referenced(folio); + ptep_test_and_clear_young(vma, addr, ptep); + folio_test_clear_young(ptw->folio); + folio_clear_referenced(ptw->folio); } - pte_unmap_unlock(pte - 1, ptl); - cond_resched(); - return 0; + + ptw->next_addr =3D end; } =20 -static int clear_refs_test_walk(unsigned long start, unsigned long end, - struct mm_walk *walk) +static int clear_refs_test_vma(struct vm_area_struct *vma, + enum clear_refs_types type) { - struct clear_refs_private *cp =3D walk->private; - struct vm_area_struct *vma =3D walk->vma; - if (vma->vm_flags & VM_PFNMAP) return 1; =20 @@ -1646,19 +1628,13 @@ static int clear_refs_test_walk(unsigned long start= , unsigned long end, * Writing 3 to /proc/pid/clear_refs only affects file mapped pages. * Writing 4 to /proc/pid/clear_refs affects all pages. */ - if (cp->type =3D=3D CLEAR_REFS_ANON && vma->vm_file) + if (type =3D=3D CLEAR_REFS_ANON && vma->vm_file) return 1; - if (cp->type =3D=3D CLEAR_REFS_MAPPED && !vma->vm_file) + if (type =3D=3D CLEAR_REFS_MAPPED && !vma->vm_file) return 1; return 0; } =20 -static const struct mm_walk_ops clear_refs_walk_ops =3D { - .pmd_entry =3D clear_refs_pte_range, - .test_walk =3D clear_refs_test_walk, - .walk_lock =3D PGWALK_WRLOCK, -}; - static ssize_t clear_refs_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { @@ -1688,9 +1664,6 @@ static ssize_t clear_refs_write(struct file *file, co= nst char __user *buf, if (mm) { VMA_ITERATOR(vmi, mm, 0); struct mmu_notifier_range range; - struct clear_refs_private cp =3D { - .type =3D type, - }; =20 if (mmap_write_lock_killable(mm)) { count =3D -EINTR; @@ -1712,13 +1685,47 @@ static ssize_t clear_refs_write(struct file *file, = const char __user *buf, vm_flags_clear(vma, VM_SOFTDIRTY); vma_set_page_prot(vma); } - + vma_iter_init(&vmi, mm, 0); inc_tlb_flush_pending(mm); mmu_notifier_range_init(&range, MMU_NOTIFY_SOFT_DIRTY, 0, mm, 0, 
-1UL); mmu_notifier_invalidate_range_start(&range); } - walk_page_range(mm, 0, -1, &clear_refs_walk_ops, &cp); + + for_each_vma(vmi, vma) { + struct pt_range_walk ptw =3D { + .mm =3D mm, + }; + enum pt_range_walk_type pt_type; + pt_type_flags_t flags =3D PT_TYPE_ALL; + int ret; + + /* We ignore hugetlb vmas */ + if (is_vm_hugetlb_page(vma)) + continue; + + ret =3D clear_refs_test_vma(vma, type); + if (ret > 0) + continue; + else if (ret < 0) + break; + + pt_type =3D pt_range_walk_start(&ptw, vma, vma->vm_start, vma->vm_end, = flags); + while (pt_type !=3D PTW_DONE) { + + /* + * Since we ignore hugetlb vmas, we only care about PMD- + * or PTE-mapped pages. + */ + if (ptw.level =3D=3D PTW_PMD_LEVEL) + clear_refs_pmd_range(&ptw, type); + else if (ptw.level =3D=3D PTW_PTE_LEVEL) + clear_refs_pte_range(&ptw, type); + + pt_type =3D pt_range_walk_next(&ptw, vma, vma->vm_start, vma->vm_end, = flags); + } + pt_range_walk_done(&ptw); + } if (type =3D=3D CLEAR_REFS_SOFT_DIRTY) { mmu_notifier_invalidate_range_end(&range); flush_tlb_mm(mm); --=20 2.53.0