From nobody Sat Feb 7 19:45:35 2026
Date: Sun, 21 May 2023 21:49:45 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Cc: Mike Kravetz, Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox,
    David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman,
    Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple,
    Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park, Naoya Horiguchi,
    Christophe Leroy, Zack Rusin, Jason Gunthorpe, Axel Rasmussen,
    Anshuman Khandual, Pasha Tatashin, Miaohe Lin, Minchan Kim,
    Christoph Hellwig, Song Liu, Thomas Hellstrom,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 01/31] mm: use pmdp_get_lockless() without surplus barrier()
In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>
Message-ID: <34467cca-58b6-3e64-1ee7-e3dc43257a@google.com>
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>

Use pmdp_get_lockless() in preference to READ_ONCE(*pmdp), to get a more
reliable result with PAE (or READ_ONCE as before without PAE); and remove
the unnecessary extra barrier()s which got left behind in its callers.

HOWEVER: note the small print in linux/pgtable.h, where it was designed
specifically for fast GUP, and depends on interrupts being disabled for
its full guarantee: most callers which have been added (here and before)
do NOT have interrupts disabled, so there is still some need for caution.

Signed-off-by: Hugh Dickins
Acked-by: Peter Xu
Acked-by: Yu Zhao
---
 fs/userfaultfd.c        | 10 +---------
 include/linux/pgtable.h | 17 -----------------
 mm/gup.c                |  6 +-----
 mm/hmm.c                |  2 +-
 mm/khugepaged.c         |  5 -----
 mm/ksm.c                |  3 +--
 mm/memory.c             | 14 ++------------
 mm/mprotect.c           |  5 -----
 mm/page_vma_mapped.c    |  2 +-
 9 files changed, 7 insertions(+), 57 deletions(-)
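For illustration only (not part of this patch), a minimal sketch of the read
pattern this series converges on; the walker function and its return values
are hypothetical, the mm helpers are the existing ones:

/* Hypothetical walker: snapshot the pmd once, locklessly, test the snapshot. */
static int pmd_snapshot_example(pmd_t *pmd)
{
        pmd_t pmdval = pmdp_get_lockless(pmd);  /* replaces: pmdval = *pmd; barrier(); */

        if (pmd_none(pmdval))
                return 0;               /* no page table here */
        if (!pmd_present(pmdval))
                return -EAGAIN;         /* e.g. migration entry: caller retries */
        /* work on pmdval, the snapshot, and never re-read *pmd */
        return 1;
}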
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0fd96d6e39ce..f7a0817b1ec0 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -349,15 +349,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	if (!pud_present(*pud))
 		goto out;
 	pmd = pmd_offset(pud, address);
-	/*
-	 * READ_ONCE must function as a barrier with narrower scope
-	 * and it must be equivalent to:
-	 *	_pmd = *pmd; barrier();
-	 *
-	 * This is to deal with the instability (as in
-	 * pmd_trans_unstable) of the pmd.
-	 */
-	_pmd = READ_ONCE(*pmd);
+	_pmd = pmdp_get_lockless(pmd);
 	if (pmd_none(_pmd))
 		goto out;
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c5a51481bbb9..8ec27fe69dc8 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1344,23 +1344,6 @@ static inline int pud_trans_unstable(pud_t *pud)
 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
 {
 	pmd_t pmdval = pmdp_get_lockless(pmd);
-	/*
-	 * The barrier will stabilize the pmdval in a register or on
-	 * the stack so that it will stop changing under the code.
-	 *
-	 * When CONFIG_TRANSPARENT_HUGEPAGE=y on x86 32bit PAE,
-	 * pmdp_get_lockless is allowed to return a not atomic pmdval
-	 * (for example pointing to an hugepage that has never been
-	 * mapped in the pmd). The below checks will only care about
-	 * the low part of the pmd with 32bit PAE x86 anyway, with the
-	 * exception of pmd_none(). So the important thing is that if
-	 * the low part of the pmd is found null, the high part will
-	 * be also null or the pmd_none() check below would be
-	 * confused.
-	 */
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	barrier();
-#endif
 	/*
 	 * !pmd_present() checks for pmd migration entries
 	 *
diff --git a/mm/gup.c b/mm/gup.c
index bbe416236593..3bd5d3854c51 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -653,11 +653,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 
 	pmd = pmd_offset(pudp, address);
-	/*
-	 * The READ_ONCE() will stabilize the pmdval in a register or
-	 * on the stack so that it will stop changing under the code.
-	 */
-	pmdval = READ_ONCE(*pmd);
+	pmdval = pmdp_get_lockless(pmd);
 	if (pmd_none(pmdval))
 		return no_page_table(vma, flags);
 	if (!pmd_present(pmdval))
diff --git a/mm/hmm.c b/mm/hmm.c
index 6a151c09de5e..e23043345615 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -332,7 +332,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 	pmd_t pmd;
 
 again:
-	pmd = READ_ONCE(*pmdp);
+	pmd = pmdp_get_lockless(pmdp);
 	if (pmd_none(pmd))
 		return hmm_vma_walk_hole(start, end, -1, walk);
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6b9d39d65b73..732f9ac393fc 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -961,11 +961,6 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm,
 		return SCAN_PMD_NULL;
 
 	pmde = pmdp_get_lockless(*pmd);
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
-	barrier();
-#endif
 	if (pmd_none(pmde))
 		return SCAN_PMD_NONE;
 	if (!pmd_present(pmde))
diff --git a/mm/ksm.c b/mm/ksm.c
index 0156bded3a66..df2aa281d49d 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1194,8 +1194,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 	 * without holding anon_vma lock for write.  So when looking for a
 	 * genuine pmde (in which to find pte), test present and !THP together.
 	 */
-	pmde = *pmd;
-	barrier();
+	pmde = pmdp_get_lockless(pmd);
 	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
 		goto out;
 
diff --git a/mm/memory.c b/mm/memory.c
index f69fbc251198..2eb54c0d5d3c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4925,18 +4925,9 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	 * So now it's safe to run pte_offset_map().
 	 */
 	vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
-	vmf->orig_pte = *vmf->pte;
+	vmf->orig_pte = ptep_get_lockless(vmf->pte);
 	vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID;
 
-	/*
-	 * some architectures can have larger ptes than wordsize,
-	 * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
-	 * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic
-	 * accesses.  The code below just needs a consistent view
-	 * for the ifs and we later double check anyway with the
-	 * ptl lock held.  So here a barrier will do.
-	 */
-	barrier();
 	if (pte_none(vmf->orig_pte)) {
 		pte_unmap(vmf->pte);
 		vmf->pte = NULL;
@@ -5060,9 +5051,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
 	} else {
-		vmf.orig_pmd = *vmf.pmd;
+		vmf.orig_pmd = pmdp_get_lockless(vmf.pmd);
 
-		barrier();
 		if (unlikely(is_swap_pmd(vmf.orig_pmd))) {
 			VM_BUG_ON(thp_migration_supported() &&
 				  !is_pmd_migration_entry(vmf.orig_pmd));
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 92d3d3ca390a..c5a13c0f1017 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -309,11 +309,6 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
 {
 	pmd_t pmdval = pmdp_get_lockless(pmd);
 
-	/* See pmd_none_or_trans_huge_or_clear_bad for info on barrier */
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	barrier();
-#endif
-
 	if (pmd_none(pmdval))
 		return 1;
 	if (pmd_trans_huge(pmdval))
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 4e448cfbc6ef..64aff6718bdb 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -210,7 +210,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	 * compiler and used as a stale value after we've observed a
 	 * subsequent update.
 	 */
-	pmde = READ_ONCE(*pvmw->pmd);
+	pmde = pmdp_get_lockless(pvmw->pmd);
 
 	if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) ||
 	    (pmd_present(pmde) && pmd_devmap(pmde))) {
-- 
2.35.3

From nobody Sat Feb 7 19:45:35 2026
Date: Sun, 21 May 2023 21:51:00 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 02/31] mm/migrate: remove cruft from migration_entry_wait()s
In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>
Message-ID: <1659568-468a-6d36-c26-6a52a335ab59@google.com>
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>

migration_entry_wait_on_locked() does not need to take a mapped pte
pointer; its callers can do the unmap first.  Annotate it with
__releases(ptl) to reduce sparse warnings.

Fold __migration_entry_wait_huge() into migration_entry_wait_huge().
Fold __migration_entry_wait() into migration_entry_wait(), preferring
the tighter pte_offset_map_lock() to pte_offset_map() and pte_lockptr().

Signed-off-by: Hugh Dickins
Reviewed-by: Alistair Popple
---
 include/linux/migrate.h |  4 ++--
 include/linux/swapops.h | 17 +++--------------
 mm/filemap.c            | 13 ++++---------
 mm/migrate.c            | 37 +++++++++++++------------------------
 4 files changed, 22 insertions(+), 49 deletions(-)
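For illustration only (not part of this patch), a minimal sketch of the new
calling convention, with a hypothetical caller and assuming *ptep is already
known to hold a migration entry: the caller unmaps the pte itself, and hands
only the ptl over to migration_entry_wait_on_locked(), which drops it.

static void wait_example(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
{
        spinlock_t *ptl;
        pte_t *ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
        swp_entry_t entry = pte_to_swp_entry(*ptep);

        pte_unmap(ptep);                                /* unmap first ... */
        migration_entry_wait_on_locked(entry, ptl);     /* ... this drops ptl */
}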
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 6241a1596a75..affea3063473 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -75,8 +75,8 @@ bool isolate_movable_page(struct page *page, isolate_mode_t mode);
 
 int migrate_huge_page_move_mapping(struct address_space *mapping,
 		struct folio *dst, struct folio *src);
-void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
-					spinlock_t *ptl);
+void migration_entry_wait_on_locked(swp_entry_t entry, spinlock_t *ptl)
+		__releases(ptl);
 void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
 void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
 int folio_migrate_mapping(struct address_space *mapping,
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 3a451b7afcb3..4c932cb45e0b 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -332,15 +332,9 @@ static inline bool is_migration_entry_dirty(swp_entry_t entry)
 	return false;
 }
 
-extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
-					spinlock_t *ptl);
 extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					unsigned long address);
-#ifdef CONFIG_HUGETLB_PAGE
-extern void __migration_entry_wait_huge(struct vm_area_struct *vma,
-					pte_t *ptep, spinlock_t *ptl);
 extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte);
-#endif	/* CONFIG_HUGETLB_PAGE */
 #else  /* CONFIG_MIGRATION */
 static inline swp_entry_t make_readable_migration_entry(pgoff_t offset)
 {
@@ -362,15 +356,10 @@ static inline int is_migration_entry(swp_entry_t swp)
 	return 0;
 }
 
-static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
-					 spinlock_t *ptl) { }
 static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
-					 unsigned long address) { }
-#ifdef CONFIG_HUGETLB_PAGE
-static inline void __migration_entry_wait_huge(struct vm_area_struct *vma,
-					pte_t *ptep, spinlock_t *ptl) { }
-static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { }
-#endif	/* CONFIG_HUGETLB_PAGE */
+					unsigned long address) { }
+static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
+					pte_t *pte) { }
 static inline int is_writable_migration_entry(swp_entry_t entry)
 {
 	return 0;
diff --git a/mm/filemap.c b/mm/filemap.c
index b4c9bd368b7e..28b42ee848a4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1359,8 +1359,6 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 /**
  * migration_entry_wait_on_locked - Wait for a migration entry to be removed
  * @entry: migration swap entry.
- * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required
- *        for pte entries, pass NULL for pmd entries.
  * @ptl: already locked ptl. This function will drop the lock.
  *
  * Wait for a migration entry referencing the given page to be removed. This is
  * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
  * should be called while holding the ptl for the migration entry referencing
  * the page.
  *
- * Returns after unmapping and unlocking the pte/ptl with pte_unmap_unlock().
+ * Returns after unlocking the ptl.
  *
  * This follows the same logic as folio_wait_bit_common() so see the comments
  * there.
  */
-void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
-				    spinlock_t *ptl)
+void migration_entry_wait_on_locked(swp_entry_t entry, spinlock_t *ptl)
+	__releases(ptl)
 {
 	struct wait_page_queue wait_page;
 	wait_queue_entry_t *wait = &wait_page.wait;
@@ -1409,10 +1407,7 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
 	 * a valid reference to the page, and it must take the ptl to remove the
 	 * migration entry. So the page is valid until the ptl is dropped.
 	 */
-	if (ptep)
-		pte_unmap_unlock(ptep, ptl);
-	else
-		spin_unlock(ptl);
+	spin_unlock(ptl);
 
 	for (;;) {
 		unsigned int flags;
diff --git a/mm/migrate.c b/mm/migrate.c
index 01cac26a3127..3ecb7a40075f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -296,14 +296,18 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked)
  * get to the page and wait until migration is finished.
  * When we return from this function the fault will be retried.
  */
-void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
-				spinlock_t *ptl)
+void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
+				unsigned long address)
 {
+	spinlock_t *ptl;
+	pte_t *ptep;
 	pte_t pte;
 	swp_entry_t entry;
 
-	spin_lock(ptl);
+	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
 	pte = *ptep;
+	pte_unmap(ptep);
+
 	if (!is_swap_pte(pte))
 		goto out;
 
@@ -311,18 +315,10 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 	if (!is_migration_entry(entry))
 		goto out;
 
-	migration_entry_wait_on_locked(entry, ptep, ptl);
+	migration_entry_wait_on_locked(entry, ptl);
 	return;
 out:
-	pte_unmap_unlock(ptep, ptl);
-}
-
-void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
-				unsigned long address)
-{
-	spinlock_t *ptl = pte_lockptr(mm, pmd);
-	pte_t *ptep = pte_offset_map(pmd, address);
-	__migration_entry_wait(mm, ptep, ptl);
+	spin_unlock(ptl);
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
@@ -332,9 +328,9 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
  *
  * This function will release the vma lock before returning.
  */
-void __migration_entry_wait_huge(struct vm_area_struct *vma,
-				pte_t *ptep, spinlock_t *ptl)
+void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *ptep)
 {
+	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, ptep);
 	pte_t pte;
 
 	hugetlb_vma_assert_locked(vma);
@@ -352,16 +348,9 @@ void __migration_entry_wait_huge(struct vm_area_struct *vma,
 		 * lock release in migration_entry_wait_on_locked().
		 */
		hugetlb_vma_unlock_read(vma);
-		migration_entry_wait_on_locked(pte_to_swp_entry(pte), NULL, ptl);
+		migration_entry_wait_on_locked(pte_to_swp_entry(pte), ptl);
	}
 }
-
-void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
-{
-	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte);
-
-	__migration_entry_wait_huge(vma, pte, ptl);
-}
 #endif
 
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
@@ -372,7 +361,7 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
 	ptl = pmd_lock(mm, pmd);
 	if (!is_pmd_migration_entry(*pmd))
 		goto unlock;
-	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl);
+	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), ptl);
 	return;
 unlock:
 	spin_unlock(ptl);
-- 
2.35.3

From nobody Sat Feb 7 19:45:35 2026
Date: Sun, 21 May 2023 21:52:31 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 03/31] mm/pgtable: kmap_local_page() instead of kmap_atomic()
In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>
Message-ID: <9df4aba7-fd2f-2da3-1543-fc6b4b42f5b9@google.com>
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>

pte_offset_map() was still using kmap_atomic(): update it to the
preferred kmap_local_page() before making further changes there, in
case we need this as a bisection point; but I doubt it can cause any
trouble.

Signed-off-by: Hugh Dickins
---
 include/linux/pgtable.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
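For illustration only (not part of this patch), a minimal sketch of what the
CONFIG_HIGHPTE mapping now amounts to; the helper is hypothetical, and the
kmap_local_page()/kmap_atomic() contrast in the comment is general knowledge,
not something this patch changes:

/* Hypothetical helper: peek at one pte on a CONFIG_HIGHPTE kernel. */
static pte_t peek_pte_example(pmd_t *pmd, unsigned long addr)
{
        /* kmap_local_page() underneath: unlike kmap_atomic(), it does not disable preemption */
        pte_t *pte = pte_offset_map(pmd, addr);
        pte_t entry = *pte;

        pte_unmap(pte);                 /* kunmap_local() underneath */
        return entry;
}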
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 8ec27fe69dc8..94235ff2706e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -96,9 +96,9 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 
 #if defined(CONFIG_HIGHPTE)
 #define pte_offset_map(dir, address)				\
-	((pte_t *)kmap_atomic(pmd_page(*(dir))) +		\
+	((pte_t *)kmap_local_page(pmd_page(*(dir))) +		\
	 pte_index((address)))
-#define pte_unmap(pte) kunmap_atomic((pte))
+#define pte_unmap(pte) kunmap_local((pte))
 #else
 #define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
 #define pte_unmap(pte) ((void)(pte))	/* NOP */
-- 
2.35.3

From nobody Sat Feb 7 19:45:35 2026
Date: Sun, 21 May 2023 21:53:28 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 04/31] mm/pgtable: allow pte_offset_map[_lock]() to fail
In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>
Message-ID: <8218ffdc-8be-54e5-0a8-83f5542af283@google.com>
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>

Make pte_offset_map() a wrapper for __pte_offset_map() (optionally
outputs pmdval), and pte_offset_map_lock() a sparse __cond_lock wrapper
for __pte_offset_map_lock(): those __funcs added in mm/pgtable-generic.c.

__pte_offset_map() does pmdval validation (including pmd_clear_bad()
when pmd_bad()), returning NULL if pmdval is not for a page table.
__pte_offset_map_lock() verifies pmdval unchanged after getting the
lock, trying again if it changed.

No #ifdef CONFIG_TRANSPARENT_HUGEPAGE around them: that could be done
to cover the imminent case, but we expect to generalize it later, and
it makes a mess of where to do the pmd_bad() clearing.
Add pte_offset_map_nolock(): outputs ptl like pte_offset_map_lock(),
without actually taking the lock.  This will be preferred to open uses of
pte_lockptr(), because (when split ptlock is in page table's struct page)
it points to the right lock for the returned pte pointer, even if *pmd
gets changed racily afterwards.

Update corresponding Documentation.

Do not add the anticipated rcu_read_lock() and rcu_read_unlock()s yet:
they have to wait until all architectures are balancing pte_offset_map()s
with pte_unmap()s (as in the arch series posted earlier).  But comment
where they will go, so that it's easy to add them for experiments.  And
only when those are in place can transient racy failure cases be enabled.
Add more safety for the PAE mismatched pmd_low pmd_high case at that time.

Signed-off-by: Hugh Dickins
---
 Documentation/mm/split_page_table_lock.rst | 17 ++++---
 include/linux/mm.h                         | 27 +++++++----
 include/linux/pgtable.h                    | 22 ++++++---
 mm/pgtable-generic.c                       | 56 ++++++++++++++++++++++
 4 files changed, 101 insertions(+), 21 deletions(-)
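For illustration only (not part of this patch), a minimal sketch of the
caller-side pattern once pte_offset_map_lock() can fail; the function and
its return values are hypothetical:

/* Hypothetical caller: per-pte work under the split ptlock. */
static int frob_pte_example(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
{
        spinlock_t *ptl;
        pte_t *pte = pte_offset_map_lock(mm, pmd, addr, &ptl);

        if (!pte)               /* no page table here, or pmd changed under us */
                return -EAGAIN; /* caller decides whether to skip or retry */

        /* ... operate on *pte while holding ptl ... */

        pte_unmap_unlock(pte, ptl);
        return 0;
}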
diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst
index 50ee0dfc95be..a834fad9de12 100644
--- a/Documentation/mm/split_page_table_lock.rst
+++ b/Documentation/mm/split_page_table_lock.rst
@@ -14,15 +14,20 @@ tables. Access to higher level tables protected by mm->page_table_lock.
 There are helpers to lock/unlock a table and other accessor functions:
 
  - pte_offset_map_lock()
-	maps pte and takes PTE table lock, returns pointer to the taken
-	lock;
+	maps PTE and takes PTE table lock, returns pointer to PTE with
+	pointer to its PTE table lock, or returns NULL if no PTE table;
+ - pte_offset_map_nolock()
+	maps PTE, returns pointer to PTE with pointer to its PTE table
+	lock (not taken), or returns NULL if no PTE table;
+ - pte_offset_map()
+	maps PTE, returns pointer to PTE, or returns NULL if no PTE table;
+ - pte_unmap()
+	unmaps PTE table;
  - pte_unmap_unlock()
 	unlocks and unmaps PTE table;
  - pte_alloc_map_lock()
-	allocates PTE table if needed and take the lock, returns pointer
-	to taken lock or NULL if allocation failed;
- - pte_lockptr()
-	returns pointer to PTE table lock;
+	allocates PTE table if needed and takes its lock, returns pointer to
+	PTE with pointer to its lock, or returns NULL if allocation failed;
  - pmd_lock()
 	takes PMD table lock, returns pointer to taken lock;
  - pmd_lockptr()
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27ce77080c79..3c2e56980853 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2787,14 +2787,25 @@ static inline void pgtable_pte_page_dtor(struct page *page)
 	dec_lruvec_page_state(page, NR_PAGETABLE);
 }
 
-#define pte_offset_map_lock(mm, pmd, address, ptlp)	\
-({							\
-	spinlock_t *__ptl = pte_lockptr(mm, pmd);	\
-	pte_t *__pte = pte_offset_map(pmd, address);	\
-	*(ptlp) = __ptl;				\
-	spin_lock(__ptl);				\
-	__pte;						\
-})
+pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp);
+static inline pte_t *pte_offset_map(pmd_t *pmd, unsigned long addr)
+{
+	return __pte_offset_map(pmd, addr, NULL);
+}
+
+pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, spinlock_t **ptlp);
+static inline pte_t *pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, spinlock_t **ptlp)
+{
+	pte_t *pte;
+
+	__cond_lock(*ptlp, pte = __pte_offset_map_lock(mm, pmd, addr, ptlp));
+	return pte;
+}
+
+pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, spinlock_t **ptlp);
 
 #define pte_unmap_unlock(pte, ptl)	do {		\
 	spin_unlock(ptl);				\
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94235ff2706e..3fabbb018557 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -94,14 +94,22 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 #define pte_offset_kernel pte_offset_kernel
 #endif
 
-#if defined(CONFIG_HIGHPTE)
-#define pte_offset_map(dir, address)				\
-	((pte_t *)kmap_local_page(pmd_page(*(dir))) +		\
-	 pte_index((address)))
-#define pte_unmap(pte) kunmap_local((pte))
+#ifdef CONFIG_HIGHPTE
+#define __pte_map(pmd, address) \
+	((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address)))
+#define pte_unmap(pte)	do {	\
+	kunmap_local((pte));	\
+	/* rcu_read_unlock() to be added later */	\
+} while (0)
 #else
-#define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
-#define pte_unmap(pte) ((void)(pte))	/* NOP */
+static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
+{
+	return pte_offset_kernel(pmd, address);
+}
+static inline void pte_unmap(pte_t *pte)
+{
+	/* rcu_read_unlock() to be added later */
+}
 #endif
 
 /* Find an entry in the second-level page table.. */
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d2fc52bffafc..c7ab18a5fb77 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -10,6 +10,8 @@
 #include <linux/pagemap.h>
 #include <linux/hugetlb.h>
 #include <linux/pgtable.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
 #include <linux/mm_inline.h>
 #include <asm/tlb.h>
 
@@ -229,3 +231,57 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
+{
+	pmd_t pmdval;
+
+	/* rcu_read_lock() to be added later */
+	pmdval = pmdp_get_lockless(pmd);
+	if (pmdvalp)
+		*pmdvalp = pmdval;
+	if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
+		goto nomap;
+	if (unlikely(pmd_trans_huge(pmdval) || pmd_devmap(pmdval)))
+		goto nomap;
+	if (unlikely(pmd_bad(pmdval))) {
+		pmd_clear_bad(pmd);
+		goto nomap;
+	}
+	return __pte_map(&pmdval, addr);
+nomap:
+	/* rcu_read_unlock() to be added later */
+	return NULL;
+}
+
+pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
+			     unsigned long addr, spinlock_t **ptlp)
+{
+	pmd_t pmdval;
+	pte_t *pte;
+
+	pte = __pte_offset_map(pmd, addr, &pmdval);
+	if (likely(pte))
+		*ptlp = pte_lockptr(mm, &pmdval);
+	return pte;
+}
+
+pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
+			     unsigned long addr, spinlock_t **ptlp)
+{
+	spinlock_t *ptl;
+	pmd_t pmdval;
+	pte_t *pte;
+again:
+	pte = __pte_offset_map(pmd, addr, &pmdval);
+	if (unlikely(!pte))
+		return pte;
+	ptl = pte_lockptr(mm, &pmdval);
+	spin_lock(ptl);
+	if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
+		*ptlp = ptl;
+		return pte;
+	}
+	pte_unmap_unlock(pte, ptl);
+	goto again;
+}
-- 
2.35.3

From nobody Sat Feb 7 19:45:35 2026
Date: Sun, 21 May 2023 21:54:25 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 05/31] mm/filemap: allow pte_offset_map_lock() to fail
In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>
Message-ID: <3e6d4f8-9f4d-fa7e-304e-1494dddd45b@google.com>
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>

filemap_map_pages(): allow pte_offset_map_lock() to fail; and remove the
pmd_devmap_trans_unstable() check from filemap_map_pmd(), which can
safely return to filemap_map_pages() and let pte_offset_map_lock()
discover that.

Signed-off-by: Hugh Dickins
---
 mm/filemap.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 28b42ee848a4..9e129ad43e0d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3408,13 +3408,6 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct folio *folio,
 	if (pmd_none(*vmf->pmd))
 		pmd_install(mm, vmf->pmd, &vmf->prealloc_pte);
 
-	/* See comment in handle_pte_fault() */
-	if (pmd_devmap_trans_unstable(vmf->pmd)) {
-		folio_unlock(folio);
-		folio_put(folio);
-		return true;
-	}
-
 	return false;
 }
 
@@ -3501,6 +3494,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 
 	addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
+	if (!vmf->pte) {
+		folio_unlock(folio);
+		folio_put(folio);
+		goto out;
+	}
 	do {
 again:
 		page = folio_file_page(folio, xas.xa_index);
-- 
2.35.3

From nobody Sat Feb 7 19:45:35 2026
Date: Sun, 21 May 2023 21:55:50 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 06/31] mm/page_vma_mapped: delete bogosity in page_vma_mapped_walk()
In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>
Message-ID: <502d6743-b0bf-d848-596a-4b3f3e44de8b@google.com>
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>

Revert commit a7a69d8ba88d ("mm/thp: another PVMW_SYNC fix in
page_vma_mapped_walk()"): I was proud of that "Aha!" commit at the time,
but in revisiting page_vma_mapped_walk() for pte_offset_map() failure,
that block raised a doubt: and it now seems utterly bogus.  The prior
map_pte() has taken ptl unconditionally when PVMW_SYNC: I must have
forgotten that when making the change.  It did no harm, but could not
have fixed a BUG or WARN, and is hard to reconcile with coming changes.
Signed-off-by: Hugh Dickins
---
 mm/page_vma_mapped.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 64aff6718bdb..007dc7456f0e 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -275,10 +275,6 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 				goto restart;
 			}
 			pvmw->pte++;
-			if ((pvmw->flags & PVMW_SYNC) && !pvmw->ptl) {
-				pvmw->ptl = pte_lockptr(mm, pvmw->pmd);
-				spin_lock(pvmw->ptl);
-			}
 		} while (pte_none(*pvmw->pte));
 
 		if (!pvmw->ptl) {
-- 
2.35.3

From nobody Sat Feb 7 19:45:35 2026
Date: Sun, 21 May 2023 21:57:25 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 07/31] mm/page_vma_mapped: reformat map_pte() with less indentation
In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>
Message-ID: <4d93bd9-346c-938f-45d0-e073372323f6@google.com>
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>

No functional change here, but adjust the format of map_pte() so that the
following commit will be easier to read: separate out the PVMW_SYNC case
first, and remove two levels of indentation from the ZONE_DEVICE case.

Signed-off-by: Hugh Dickins
---
 mm/page_vma_mapped.c | 65 +++++++++++++++++++++++---------------------
 1 file changed, 34 insertions(+), 31 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 007dc7456f0e..947dc7491815 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -15,38 +15,41 @@ static inline bool not_found(struct page_vma_mapped_walk *pvmw)
 
 static bool map_pte(struct page_vma_mapped_walk *pvmw)
 {
-	pvmw->pte = pte_offset_map(pvmw->pmd, pvmw->address);
-	if (!(pvmw->flags & PVMW_SYNC)) {
-		if (pvmw->flags & PVMW_MIGRATION) {
-			if (!is_swap_pte(*pvmw->pte))
-				return false;
-		} else {
-			/*
-			 * We get here when we are trying to unmap a private
-			 * device page from the process address space. Such
-			 * page is not CPU accessible and thus is mapped as
-			 * a special swap entry, nonetheless it still does
-			 * count as a valid regular mapping for the page (and
-			 * is accounted as such in page maps count).
-			 *
-			 * So handle this special case as if it was a normal
-			 * page mapping ie lock CPU page table and returns
-			 * true.
-			 *
-			 * For more details on device private memory see HMM
-			 * (include/linux/hmm.h or mm/hmm.c).
-			 */
-			if (is_swap_pte(*pvmw->pte)) {
-				swp_entry_t entry;
+	if (pvmw->flags & PVMW_SYNC) {
+		/* Use the stricter lookup */
+		pvmw->pte = pte_offset_map_lock(pvmw->vma->vm_mm, pvmw->pmd,
+						pvmw->address, &pvmw->ptl);
+		return true;
+	}
 
-				/* Handle un-addressable ZONE_DEVICE memory */
-				entry = pte_to_swp_entry(*pvmw->pte);
-				if (!is_device_private_entry(entry) &&
-				    !is_device_exclusive_entry(entry))
-					return false;
-			} else if (!pte_present(*pvmw->pte))
-				return false;
-		}
+	pvmw->pte = pte_offset_map(pvmw->pmd, pvmw->address);
+	if (pvmw->flags & PVMW_MIGRATION) {
+		if (!is_swap_pte(*pvmw->pte))
+			return false;
+	} else if (is_swap_pte(*pvmw->pte)) {
+		swp_entry_t entry;
+		/*
+		 * Handle un-addressable ZONE_DEVICE memory.
+		 *
+		 * We get here when we are trying to unmap a private
+		 * device page from the process address space. Such
+		 * page is not CPU accessible and thus is mapped as
+		 * a special swap entry, nonetheless it still does
+		 * count as a valid regular mapping for the page
+		 * (and is accounted as such in page maps count).
+		 *
+		 * So handle this special case as if it was a normal
+		 * page mapping ie lock CPU page table and return true.
+		 *
+		 * For more details on device private memory see HMM
+		 * (include/linux/hmm.h or mm/hmm.c).
+		 */
+		entry = pte_to_swp_entry(*pvmw->pte);
+		if (!is_device_private_entry(entry) &&
+		    !is_device_exclusive_entry(entry))
+			return false;
+	} else if (!pte_present(*pvmw->pte)) {
+		return false;
 	}
 	pvmw->ptl = pte_lockptr(pvmw->vma->vm_mm, pvmw->pmd);
 	spin_lock(pvmw->ptl);
-- 
2.35.3

From nobody Sat Feb 7 19:45:35 2026
(172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id g189-20020a8152c6000000b00555e1886350sm1827794ywb.78.2023.05.21.21.58.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 21:59:01 -0700 (PDT) Date: Sun, 21 May 2023 21:58:58 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 08/31] mm/page_vma_mapped: pte_offset_map_nolock() not pte_lockptr() In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: <8fa3fb6e-2e39-cbea-c529-ee9e64c7d2d0@google.com> References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" map_pte() use pte_offset_map_nolock(), to make sure of the ptl belonging to pte, even if pmd entry is then changed racily: page_vma_mapped_walk() use that instead of getting pte_lockptr() later, or restart if map_pte() found no page table. Signed-off-by: Hugh Dickins --- mm/page_vma_mapped.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 947dc7491815..2af734274073 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -13,16 +13,28 @@ static inline bool not_found(struct page_vma_mapped_wal= k *pvmw) return false; } =20 -static bool map_pte(struct page_vma_mapped_walk *pvmw) +static bool map_pte(struct page_vma_mapped_walk *pvmw, spinlock_t **ptlp) { if (pvmw->flags & PVMW_SYNC) { /* Use the stricter lookup */ pvmw->pte =3D pte_offset_map_lock(pvmw->vma->vm_mm, pvmw->pmd, pvmw->address, &pvmw->ptl); - return true; + *ptlp =3D pvmw->ptl; + return !!pvmw->pte; } =20 - pvmw->pte =3D pte_offset_map(pvmw->pmd, pvmw->address); + /* + * It is important to return the ptl corresponding to pte, + * in case *pvmw->pmd changes underneath us; so we need to + * return it even when choosing not to lock, in case caller + * proceeds to loop over next ptes, and finds a match later. + * Though, in most cases, page lock already protects this. 
+ */ + pvmw->pte =3D pte_offset_map_nolock(pvmw->vma->vm_mm, pvmw->pmd, + pvmw->address, ptlp); + if (!pvmw->pte) + return false; + if (pvmw->flags & PVMW_MIGRATION) { if (!is_swap_pte(*pvmw->pte)) return false; @@ -51,7 +63,7 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw) } else if (!pte_present(*pvmw->pte)) { return false; } - pvmw->ptl =3D pte_lockptr(pvmw->vma->vm_mm, pvmw->pmd); + pvmw->ptl =3D *ptlp; spin_lock(pvmw->ptl); return true; } @@ -156,6 +168,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *= pvmw) struct vm_area_struct *vma =3D pvmw->vma; struct mm_struct *mm =3D vma->vm_mm; unsigned long end; + spinlock_t *ptl; pgd_t *pgd; p4d_t *p4d; pud_t *pud; @@ -257,8 +270,11 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk = *pvmw) step_forward(pvmw, PMD_SIZE); continue; } - if (!map_pte(pvmw)) + if (!map_pte(pvmw, &ptl)) { + if (!pvmw->pte) + goto restart; goto next_pte; + } this_pte: if (check_pte(pvmw)) return true; @@ -281,7 +297,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *= pvmw) } while (pte_none(*pvmw->pte)); =20 if (!pvmw->ptl) { - pvmw->ptl =3D pte_lockptr(mm, pvmw->pmd); + pvmw->ptl =3D ptl; spin_lock(pvmw->ptl); } goto this_pte; --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A2E93C7EE23 for ; Mon, 22 May 2023 05:00:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231663AbjEVFAX (ORCPT ); Mon, 22 May 2023 01:00:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46634 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230162AbjEVFAV (ORCPT ); Mon, 22 May 2023 01:00:21 -0400 Received: from mail-yb1-xb29.google.com (mail-yb1-xb29.google.com [IPv6:2607:f8b0:4864:20::b29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 50383C6 for ; Sun, 21 May 2023 22:00:20 -0700 (PDT) Received: by mail-yb1-xb29.google.com with SMTP id 3f1490d57ef6-ba94605bcd5so4375348276.2 for ; Sun, 21 May 2023 22:00:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684731619; x=1687323619; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=UuVwG0QveaZLADVtUaHSgqKzTTEqSjST+aR027Vb5lo=; b=gG5CM9oziEo3d2xuTQZP2LZGLbyhx6n0Gd07VVIcPAplxjY92fK4ZBhk/PKkZAtjDN 5xbQCSYDeQd+vsPz3UctEBNNQiAW1PG9SnA0aX9yoaPh6erzg7iRa3/rTzxeaqzGLp7M lSM/wuLv0LneLKuxocS16HoYAdzNdVswju/csw1C62EZM/zSg7SkkxlvZsPb/qR4YDiI JxhcXWwQQZon4YC+DZWlfAheKEVu+DJEgm81Ja53+XH3TYiVcdE+XisUHVOIOH3c+bhB 84wXaTSFtqGDpcIfa2zJuBidBN4AfMzBHzdLZY7lpccTQVYK/Jpluc7HpY0MDlN7jzqs jXCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684731619; x=1687323619; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=UuVwG0QveaZLADVtUaHSgqKzTTEqSjST+aR027Vb5lo=; b=dnpRub+PjzLpvWT8CLwsT+aHLeX7fb1rX2nK40S3FTmUyQxt0krHUYkQ6B8lqp3kY6 d9xEDXcO9vaBh03IUZHEBHQroZo1hQZIC83xm/s/j+sB5wT0TnR5o2VXohDVIMJ7PTGt 3d4i1i+UhBWyAUdkpMydRMGGoj4EP/Uho4zCATJU5kwYxIgiEe3AwKXaLh/b721QuMIl tHtwaHV3A+zcVAYdjzyzTc0Wvyd0yjff6PuEDWcBiTS0aYIjqLpGiLKnT4DfddhjK8e4 nfEJLZg786o1tVDfl6e2AQdSaTeMf79Rd8T+wGYooUgF60XeI/0ZlWz4w4NiDNewrF8W obpA== X-Gm-Message-State: 
AC+VfDyzhco+FBatsGTlgqanuBVgmVesJUSlMkidYIJRF3j/LcfLuVT3 OAYO9YrrzdVXclOYRh02mM/k4Q== X-Google-Smtp-Source: ACHHUZ7OcXDmRntzMQt9n+hv5KPWL1NcuSYc2shkgSn7zkVgG1IKDeMbUgq8Rk9jU0J7GbmzBFEDoA== X-Received: by 2002:a81:6d09:0:b0:561:902e:dc0a with SMTP id i9-20020a816d09000000b00561902edc0amr9837501ywc.32.1684731619134; Sun, 21 May 2023 22:00:19 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id r63-20020a815d42000000b00555df877a4csm1794565ywb.102.2023.05.21.22.00.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:00:18 -0700 (PDT) Date: Sun, 21 May 2023 22:00:15 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 09/31] mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: <6265ac58-6018-a8c6-cf38-69cba698471@google.com> References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Simple walk_page_range() users should set ACTION_AGAIN to retry when pte_offset_map_lock() fails. No need to check pmd_trans_unstable(): that was precisely to avoid the possibility of calling pte_offset_map() on a racily removed or inserted THP entry, but such cases are now safely handled inside it. Likewise there is no need to check pmd_none() or pmd_bad() before calling it. Signed-off-by: Hugh Dickins Reviewed-by: SeongJae Park --- fs/proc/task_mmu.c | 32 ++++++++++++++++---------------- mm/damon/vaddr.c | 12 ++++++++---- mm/mempolicy.c | 7 ++++--- mm/mincore.c | 9 ++++----- mm/mlock.c | 4 ++++ 5 files changed, 36 insertions(+), 28 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 420510f6a545..dba5052ce09b 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -631,14 +631,11 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long = addr, unsigned long end, goto out; } =20 - if (pmd_trans_unstable(pmd)) - goto out; - /* - * The mmap_lock held all the way back in m_start() is what - * keeps khugepaged out of here and from collapsing things - * in here.
- */ pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) smaps_pte_entry(pte, addr, walk); pte_unmap_unlock(pte - 1, ptl); @@ -1191,10 +1188,11 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigne= d long addr, return 0; } =20 - if (pmd_trans_unstable(pmd)) - return 0; - pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { ptent =3D *pte; =20 @@ -1538,9 +1536,6 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned lo= ng addr, unsigned long end, spin_unlock(ptl); return err; } - - if (pmd_trans_unstable(pmdp)) - return 0; #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ =20 /* @@ -1548,6 +1543,10 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned l= ong addr, unsigned long end, * goes beyond vma->vm_end. */ orig_pte =3D pte =3D pte_offset_map_lock(walk->mm, pmdp, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return err; + } for (; addr < end; pte++, addr +=3D PAGE_SIZE) { pagemap_entry_t pme; =20 @@ -1887,11 +1886,12 @@ static int gather_pte_stats(pmd_t *pmd, unsigned lo= ng addr, spin_unlock(ptl); return 0; } - - if (pmd_trans_unstable(pmd)) - return 0; #endif orig_pte =3D pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } do { struct page *page =3D can_gather_numa_stats(*pte, vma, addr); if (!page) diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 1fec16d7263e..b8762ff15c3c 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -318,9 +318,11 @@ static int damon_mkold_pmd_entry(pmd_t *pmd, unsigned = long addr, spin_unlock(ptl); } =20 - if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd))) - return 0; pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } if (!pte_present(*pte)) goto out; damon_ptep_mkold(pte, walk->mm, addr); @@ -464,9 +466,11 @@ static int damon_young_pmd_entry(pmd_t *pmd, unsigned = long addr, regular_page: #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ =20 - if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd))) - return -EINVAL; pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } if (!pte_present(*pte)) goto out; folio =3D damon_get_folio(pte_pfn(*pte)); diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 1756389a0609..4d0bcf6f0d52 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -514,10 +514,11 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigne= d long addr, if (ptl) return queue_folios_pmd(pmd, ptl, addr, end, walk); =20 - if (pmd_trans_unstable(pmd)) - return 0; - mapped_pte =3D pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { if (!pte_present(*pte)) continue; diff --git a/mm/mincore.c b/mm/mincore.c index 2d5be013a25a..f33f6a0b1ded 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -113,12 +113,11 @@ static int mincore_pte_range(pmd_t *pmd, unsigned lon= g addr, unsigned long end, goto out; } =20 - if (pmd_trans_unstable(pmd)) { - __mincore_unmapped_range(addr, end, vma, vec); - goto out; - } - ptep =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); + if (!ptep) { + walk->action =3D ACTION_AGAIN; + return 0; + } for (; addr !=3D end; ptep++, addr +=3D PAGE_SIZE) { pte_t pte =3D *ptep; =20 diff --git 
a/mm/mlock.c b/mm/mlock.c index 40b43f8740df..9f2b1173b1b1 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -329,6 +329,10 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long a= ddr, } =20 start_pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!start_pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } for (pte =3D start_pte; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { if (!pte_present(*pte)) continue; --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5044DC7EE23 for ; Mon, 22 May 2023 05:02:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231672AbjEVFCG (ORCPT ); Mon, 22 May 2023 01:02:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47074 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230162AbjEVFCB (ORCPT ); Mon, 22 May 2023 01:02:01 -0400 Received: from mail-yw1-x112b.google.com (mail-yw1-x112b.google.com [IPv6:2607:f8b0:4864:20::112b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 906B592 for ; Sun, 21 May 2023 22:02:00 -0700 (PDT) Received: by mail-yw1-x112b.google.com with SMTP id 00721157ae682-55db055b412so45888387b3.0 for ; Sun, 21 May 2023 22:02:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684731720; x=1687323720; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=s/wneZk+WXTrjMA3fZuua9PXiv3fUuHnEiWoD80tgK8=; b=Q+WLY7Ro5SD5yNUAGTclzTje+VDS5OJwsYctg43k8TXzrg8fckZAOFDLbOpbv+Cz4d vusZX67zn6q4S5qUq6SfyoErvRul9hzMJrB3MRNnhQ6C5pPl/f+0Ug23z4gIrshnXudw uyL2XPYtPcLeaaYEa9djIcbbIiLW16k5noNko0Ixwnf5lma2IwkwvbUb6+ol9PyhD+iA SFpbS9+EhhrldJnU+af1fSiFWnx4LWMy4hftbnHOIsp4fFcbSzoqMd7Z+gmLZKCZGF9K xNYQslTPradcxr1kOt1BAZkUVCQ1AUWpnoyxrlCIdMtfrWBTqYrF1ShOS0ygR98FffYZ BJig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684731720; x=1687323720; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=s/wneZk+WXTrjMA3fZuua9PXiv3fUuHnEiWoD80tgK8=; b=UVhqOhp1ZjTtMF/MFt1rsG683bC2Mp/5pJngLLUlH8gPKenvlHKDWBPnolBc/NA/SD nnFLamK4e45IxjRJ/I9myHoJFtQiOHUOUrtyS14gnSyjVM0NTmOLTHoJSDQusivkcYKc HnRdOodIbuud/A3VlxK90Pn+3opdcjFbGTEJbTby680QvvHQWoYZqX6ZWBzxS/oMlkkq HcaIPTCHv8z9q8rttuNTAi8fQIjcqf8s75bQrZjWhdNLiP/KlhbLH06MqoZQRdAbOq/m B3nhtuE/he6v0kLlhZrD486NU10C+ls5nqkXFteLe13UcJUm2BWu0iTywjGuWjNXjQOr 6xcQ== X-Gm-Message-State: AC+VfDyDn6YFTb/tEBTOuP6KhM6fcwSKsl5xnaGhyeZDvz0ibPLzAnwU 1k3SyfGRWCOZ0sCwS7KWlQQyPw== X-Google-Smtp-Source: ACHHUZ62zTVVW00Kw9P1qNrMirZM+5qrpLkNDTDbJjpW5hqzfUZErYJorH+/umfKDMFCMPWuq8GGOg== X-Received: by 2002:a0d:cc45:0:b0:55a:3502:d2ca with SMTP id o66-20020a0dcc45000000b0055a3502d2camr10051671ywd.13.1684731719564; Sun, 21 May 2023 22:01:59 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id x67-20020a81a046000000b0054fcbf35b94sm1819620ywg.87.2023.05.21.22.01.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:01:59 -0700 (PDT) Date: Sun, 21 May 2023 22:01:56 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 10/31] mm/pagewalk: walk_pte_range() allow for pte_offset_map() In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" walk_pte_range() has a no_vma option to serve walk_page_range_novma(). I don't know of any problem, but it looks safer to check for init_mm, and use pte_offset_kernel() rather than pte_offset_map() in that case: pte_offset_map()'s pmdval validation is intended for userspace. Allow for its pte_offset_map() or pte_offset_map_lock() to fail, and retry with ACTION_AGAIN if so. Add a second check for ACTION_AGAIN in walk_pmd_range(), to catch it after return from walk_pte_range(). Remove the pmd_trans_unstable() check after split_huge_pmd() in walk_pmd_range(): walk_pte_range() now handles those cases safely (and they must fail powerpc's is_hugepd() check). Signed-off-by: Hugh Dickins --- mm/pagewalk.c | 33 +++++++++++++++++++++++---------- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/mm/pagewalk.c b/mm/pagewalk.c index cb23f8a15c13..64437105fe0d 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -46,15 +46,27 @@ static int walk_pte_range(pmd_t *pmd, unsigned long add= r, unsigned long end, spinlock_t *ptl; =20 if (walk->no_vma) { - pte =3D pte_offset_map(pmd, addr); - err =3D walk_pte_range_inner(pte, addr, end, walk); - pte_unmap(pte); + /* + * pte_offset_map() might apply user-specific validation. 
+ */ + if (walk->mm =3D=3D &init_mm) + pte =3D pte_offset_kernel(pmd, addr); + else + pte =3D pte_offset_map(pmd, addr); + if (pte) { + err =3D walk_pte_range_inner(pte, addr, end, walk); + if (walk->mm !=3D &init_mm) + pte_unmap(pte); + } } else { pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); - err =3D walk_pte_range_inner(pte, addr, end, walk); - pte_unmap_unlock(pte, ptl); + if (pte) { + err =3D walk_pte_range_inner(pte, addr, end, walk); + pte_unmap_unlock(pte, ptl); + } } - + if (!pte) + walk->action =3D ACTION_AGAIN; return err; } =20 @@ -141,11 +153,8 @@ static int walk_pmd_range(pud_t *pud, unsigned long ad= dr, unsigned long end, !(ops->pte_entry)) continue; =20 - if (walk->vma) { + if (walk->vma) split_huge_pmd(walk->vma, pmd, addr); - if (pmd_trans_unstable(pmd)) - goto again; - } =20 if (is_hugepd(__hugepd(pmd_val(*pmd)))) err =3D walk_hugepd_range((hugepd_t *)pmd, addr, next, walk, PMD_SHIFT); @@ -153,6 +162,10 @@ static int walk_pmd_range(pud_t *pud, unsigned long ad= dr, unsigned long end, err =3D walk_pte_range(pmd, addr, next, walk); if (err) break; + + if (walk->action =3D=3D ACTION_AGAIN) + goto again; + } while (pmd++, addr =3D next, addr !=3D end); =20 return err; --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E349C7EE23 for ; Mon, 22 May 2023 05:03:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231681AbjEVFDP (ORCPT ); Mon, 22 May 2023 01:03:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47698 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231674AbjEVFDL (ORCPT ); Mon, 22 May 2023 01:03:11 -0400 Received: from mail-yb1-xb2a.google.com (mail-yb1-xb2a.google.com [IPv6:2607:f8b0:4864:20::b2a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C8433E9 for ; Sun, 21 May 2023 22:03:10 -0700 (PDT) Received: by mail-yb1-xb2a.google.com with SMTP id 3f1490d57ef6-ba86ea269e0so7904390276.1 for ; Sun, 21 May 2023 22:03:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684731790; x=1687323790; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=W3t6elq9R4Rh1GonIcnIp9HMVNr0t0fb96tM78h+6lY=; b=Kc//GQWtdcYt13ahGjffwsLxy+Q5bVB6WROYwiSc3JojtgWwDe3izjSyHuE6U0IG/O 5Fry5zLgCWeTO19pC/0eYwlnvGNsyrmto6ZIBbyVNO56nsRvrMOAOUvYspI147xhZOi+ xqdX2iixj9Eo1KLPuK7IDQx9rj9ZoPh6D5XKx6tya/9I/bhzim+Yk6xTIzKBifA2beI4 8qMCGwFCGRwoHjLeW/d0tZsZvr4CJThANuX9SXUFpqX1pEid4ilFDqnQwxhHJZTeMBQb PwATHquSI9R4s4VAZXrMlE6dKgd4X76gX9zjTUT4KksMHQf4ahMKcn/OqGVTbmF1mVyt 4G7w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684731790; x=1687323790; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=W3t6elq9R4Rh1GonIcnIp9HMVNr0t0fb96tM78h+6lY=; b=IN6gI94sVKJiUeDaC112dAJZ5TO/WkIGwL46DZwmZdrFTSkTu+fufGXiAEXOFKW06g VhjUss3ciLF9Y38n+dvjZgRLltne/dGeLO7SUmv8s/xyQk/vwcFMQBT70+Dbxk87gYaV AM1J9gm8EL4GofkwVNRV1G8wq5ZU+E/EWdQkSShy8wv+nY0DeYUHnq8XyhB6QJffDeS+ yTJlr11aGB1dtMKZmvWZikIW5uEJuYPeib/ESWCC0z6yhbhAfzTvfOVh7fEpLDszPObt SB4GxVZbOPZkCahO71apgufy6uiBdRIdBRlosUZbC31Qo3toAVmSLjfw9/cjII2r+RE+ f2SA== X-Gm-Message-State: 
AC+VfDyX33BbhhZOChB31JfOJhlTgNQX6RX50bFffpVUrx4WkYsg45ox nEEvhoBeY5bFad1pJgsiAnaW3Q== X-Google-Smtp-Source: ACHHUZ6OdnTYy1GhoQl2g/GfXGTjIkyQzrSYRsCypiToKGf2HILo2risNfGACl3kmPaMyL1d5bYIVQ== X-Received: by 2002:a25:aaac:0:b0:bab:eb8b:c484 with SMTP id t41-20020a25aaac000000b00babeb8bc484mr2926154ybi.14.1684731789855; Sun, 21 May 2023 22:03:09 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id e6-20020a25b046000000b00ba73c26f0d6sm1322602ybj.15.2023.05.21.22.03.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:03:09 -0700 (PDT) Date: Sun, 21 May 2023 22:03:06 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 11/31] mm/vmwgfx: simplify pmd & pud mapping dirty helpers In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" wp_clean_pmd_entry() need not check pmd_trans_unstable() or pmd_none(), wp_clean_pud_entry() need not check pud_trans_unstable() or pud_none(): it's just the ACTION_CONTINUE when trans_huge or devmap that's needed to prevent splitting, and we're hoping to remove pmd_trans_unstable(). Is that PUD #ifdef necessary? Maybe some configs are missing a stub. 
Signed-off-by: Hugh Dickins --- mm/mapping_dirty_helpers.c | 34 +++++++++------------------------- 1 file changed, 9 insertions(+), 25 deletions(-) diff --git a/mm/mapping_dirty_helpers.c b/mm/mapping_dirty_helpers.c index e1eb33f49059..87b4beeda4fa 100644 --- a/mm/mapping_dirty_helpers.c +++ b/mm/mapping_dirty_helpers.c @@ -128,19 +128,11 @@ static int wp_clean_pmd_entry(pmd_t *pmd, unsigned lo= ng addr, unsigned long end, { pmd_t pmdval =3D pmdp_get_lockless(pmd); =20 - if (!pmd_trans_unstable(&pmdval)) - return 0; - - if (pmd_none(pmdval)) { - walk->action =3D ACTION_AGAIN; - return 0; - } - - /* Huge pmd, present or migrated */ - walk->action =3D ACTION_CONTINUE; - if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval)) + /* Do not split a huge pmd, present or migrated */ + if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval)) { WARN_ON(pmd_write(pmdval) || pmd_dirty(pmdval)); - + walk->action =3D ACTION_CONTINUE; + } return 0; } =20 @@ -156,23 +148,15 @@ static int wp_clean_pmd_entry(pmd_t *pmd, unsigned lo= ng addr, unsigned long end, static int wp_clean_pud_entry(pud_t *pud, unsigned long addr, unsigned lon= g end, struct mm_walk *walk) { +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD pud_t pudval =3D READ_ONCE(*pud); =20 - if (!pud_trans_unstable(&pudval)) - return 0; - - if (pud_none(pudval)) { - walk->action =3D ACTION_AGAIN; - return 0; - } - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD - /* Huge pud */ - walk->action =3D ACTION_CONTINUE; - if (pud_trans_huge(pudval) || pud_devmap(pudval)) + /* Do not split a huge pud */ + if (pud_trans_huge(pudval) || pud_devmap(pudval)) { WARN_ON(pud_write(pudval) || pud_dirty(pudval)); + walk->action =3D ACTION_CONTINUE; + } #endif - return 0; } =20 --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22A8EC7EE23 for ; Mon, 22 May 2023 05:04:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231693AbjEVFEP (ORCPT ); Mon, 22 May 2023 01:04:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48054 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231689AbjEVFEM (ORCPT ); Mon, 22 May 2023 01:04:12 -0400 Received: from mail-yw1-x1133.google.com (mail-yw1-x1133.google.com [IPv6:2607:f8b0:4864:20::1133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 68EC9F4 for ; Sun, 21 May 2023 22:04:11 -0700 (PDT) Received: by mail-yw1-x1133.google.com with SMTP id 00721157ae682-561c1ae21e7so73851677b3.0 for ; Sun, 21 May 2023 22:04:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684731850; x=1687323850; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=gMNcXWmakk7P+zoeKodH9UUwH6aG3s17nHsuG14zY8Y=; b=yE9oBZ+c+KHVypY5gza4GB1+80d52bTCDbLI/Y6WgFZjswF85OgTmRueNohIKa5HCn RIW2Wnvd56WBT/PaJWNYgidoUhB8V5KAp9zJm+23i0kvJLcqYbw0yNLwpWS8EoUIgMXO 0057efgR9JUAdgc/f2mkKLwhOjnHnkDDKtEeFpSuOpxEwvKhw8lEmNKs6VxGMTFXq9W5 sSyOJRzgEjdA5DausOy5f9zrEnebUpaQfXGLkpy0BqfSupfqQaLTpS0g/cbhzJZLF6eH CdCRTra6FNLJCs5X7UtLw6KjHZYdHYr0T5056rVVYaa+mJu1Od1zuvThKJk70HtbvQjx SxKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684731850; x=1687323850; 
h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=gMNcXWmakk7P+zoeKodH9UUwH6aG3s17nHsuG14zY8Y=; b=hkiQQMZ8kbDZ8Keo5biyQe2tdLlnBWKJ+ULT9N9Q5L7A+F7aXyTg248RXQ9sfHbSXC 3za5l5Bs23jNFVFXvzEfVC4ALwji6ZAZ3vXgb5g0HaK+dwhw3den5KyUrj0aYORalZ8t 7SmIZtMSN8hod9x7MrEhfsR71SPhlALRbD7EzceasRPpow83yvowuONxqG7rHDIJPLoI 8ex0JOYjpt5TOT+5bubnLViLXIBEPZVgDXEwHldlqXtFfaiYl3iBnXx27QYtfNLCvNCc f5sTjiVd4BVkxd7+COIYYBLZAGYs2inrERf4Ob+o8Oylp4l7oo4+hasAiFRugyTJBfpv lCtA== X-Gm-Message-State: AC+VfDzpQS7hLrvOH6HVD1WY7cndSqzkUSNBHUPZpKgmukKdeHNt+psi eQZSVXhejlQRqqLDOnsyROgdyg== X-Google-Smtp-Source: ACHHUZ4s2TnDnXPeA62RlJKiZrtDTHYAyZbSNbhHfgawNyc1d9fDfp4flHO3RJFgNx0d20xXc/Hleg== X-Received: by 2002:a0d:d656:0:b0:55d:c8fb:8f61 with SMTP id y83-20020a0dd656000000b0055dc8fb8f61mr10119663ywd.7.1684731850557; Sun, 21 May 2023 22:04:10 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id t184-20020a0dd1c1000000b00555c30ec361sm1798344ywd.143.2023.05.21.22.04.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:04:10 -0700 (PDT) Date: Sun, 21 May 2023 22:04:07 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 12/31] mm/vmalloc: vmalloc_to_page() use pte_offset_kernel() In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" vmalloc_to_page() was using pte_offset_map() (followed by pte_unmap()), but it's intended for userspace page tables: prefer pte_offset_kernel(). 
Signed-off-by: Hugh Dickins Reviewed-by: Lorenzo Stoakes --- mm/vmalloc.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 9683573f1225..741722d247d5 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -703,11 +703,10 @@ struct page *vmalloc_to_page(const void *vmalloc_addr) if (WARN_ON_ONCE(pmd_bad(*pmd))) return NULL; =20 - ptep =3D pte_offset_map(pmd, addr); + ptep =3D pte_offset_kernel(pmd, addr); pte =3D *ptep; if (pte_present(pte)) page =3D pte_page(pte); - pte_unmap(ptep); =20 return page; } --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60BD5C77B73 for ; Mon, 22 May 2023 05:05:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231708AbjEVFFX (ORCPT ); Mon, 22 May 2023 01:05:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48502 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231694AbjEVFFU (ORCPT ); Mon, 22 May 2023 01:05:20 -0400 Received: from mail-yb1-xb2c.google.com (mail-yb1-xb2c.google.com [IPv6:2607:f8b0:4864:20::b2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 04652E9 for ; Sun, 21 May 2023 22:05:19 -0700 (PDT) Received: by mail-yb1-xb2c.google.com with SMTP id 3f1490d57ef6-b9daef8681fso4783618276.1 for ; Sun, 21 May 2023 22:05:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684731919; x=1687323919; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=Q9fqf/aXpqllLgdqCnqVNZO09v0Ag5UU0wZCu2wmsUs=; b=OKj/gvd6hObldwT6K4phcbvAmhbsjul6iPbLRzKjs71jzjUwd6r3e8xCWCNZVwPENy 0HaWB0qgO/fNnS/Ois67og8DBdTyq3v/IDNkLAR46vDKiCQUR3xPJoyU7eFLCu5OCUxu ene9fgjo5yHZ3AS4/uG54DbEfQUuxcpEJxYx8UgryAp+t99VA5jZFAwJV246EE9mtg1F OZdfnZLlti+W2fNxVTHpJKMKRYyeIyeNsz3sd1gR6zvPQCCVQ9jZz+INsTuvonpNhmYF Bm782lSdhp0xDJA2N1jyxGLFmtKZ57I/CBr/DUG8sWZnMiCyq1l9BTQfv5dwSU+wKfRJ U63g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684731919; x=1687323919; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Q9fqf/aXpqllLgdqCnqVNZO09v0Ag5UU0wZCu2wmsUs=; b=FbAWxP75Sxkv1o8ghhO5EebgDFgN1xxVBnpRxQSPLVGvrZu+l4Zy5oX4Ub+Nta1gML OiFlWdGZFL1qZiMWfdI6XkZq5xHtKMGFLxtiJEAfbd2cgQnRMPie8pNMRi4cL+kKKV3+ WeGvRsduYYzFT1kYASvE4wfLXwwzUAk8llXiU+CrfrvSFsRYz619og6nwrsMyaQpEjIb 9QrOECUHZLElPEKfa448+Zy1Efab9JEbLn1aT/oPPhPFoobGfAfCI7bLBNHDazqhu6lt gSIhbPaWR0fLbsyqQPg4nmUwyv8mHDH/YTwpOzmrXOvEZJ6k9WqrnnmQTh3ZZKKmRRWJ lqLw== X-Gm-Message-State: AC+VfDyzJ/5a8r7C55u8YS3eIQXYFk0fGH7pK7wFJ3BZ0nCOl18W77Cd INa1x4yvnJjjwz1rwfhnbpUn6A== X-Google-Smtp-Source: ACHHUZ5Raj9TudeXZElbzzsJbxf03YA3/zAVWdObhAbA8Pr6yvbRJfnYmXwklG1UR7VtuXWIGmLBMQ== X-Received: by 2002:a25:fa12:0:b0:ba8:1c9e:c77f with SMTP id b18-20020a25fa12000000b00ba81c9ec77fmr9327198ybe.22.1684731919014; Sun, 21 May 2023 22:05:19 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id a12-20020a25938c000000b00ba87e9b5bf9sm1274482ybm.45.2023.05.21.22.05.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:05:18 -0700 (PDT) Date: Sun, 21 May 2023 22:05:15 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 13/31] mm/hmm: retry if pte_offset_map() fails In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: <2edc4657-b6ff-3d6e-2342-6b60bfccc5b@google.com> References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" hmm_vma_walk_pmd() is called through mm_walk, but already has a goto again loop of its own, so take part in that if pte_offset_map() fails. Signed-off-by: Hugh Dickins Reviewed-by: Alistair Popple --- mm/hmm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/hmm.c b/mm/hmm.c index e23043345615..b1a9159d7c92 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, } =20 ptep =3D pte_offset_map(pmdp, addr); + if (!ptep) + goto again; for (; addr < end; addr +=3D PAGE_SIZE, ptep++, hmm_pfns++) { int r; =20 --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 707C5C7EE23 for ; Mon, 22 May 2023 05:06:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231714AbjEVFGl (ORCPT ); Mon, 22 May 2023 01:06:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48864 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229722AbjEVFGi (ORCPT ); Mon, 22 May 2023 01:06:38 -0400 Received: from mail-yw1-x112b.google.com (mail-yw1-x112b.google.com [IPv6:2607:f8b0:4864:20::112b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F330A92 for ; Sun, 21 May 2023 22:06:36 -0700 (PDT) Received: by mail-yw1-x112b.google.com with SMTP id 00721157ae682-561bcd35117so70100797b3.3 for ; Sun, 21 May 2023 22:06:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684731996; x=1687323996; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=itm4lMYCv8hAr7oWbErSKKsiGUM8mghvSpbiblkFfbk=; b=xUO60/u0hCL36TvKXHyURnnxqO5vn7ZybjY7ONPqMVFmXcRaGi5WJ1fjIK85dRtVJa +whV5tpXJtxlVEd0xUGkv7xuykXYLwzIAtD/A83Pd0ByM1utIbe7HqhKdkTngJ8uTZye nXdCtHMDNgLl+HSpG2o6gWxk89SbFvj51bDmDs5EUuxNtQO799f4KEwsfMDYpb1aA0JV G1qpsPEAFFQ60/jP3AFkOagxrcttoK3mZdLQdFsbhRCY1FyAsYOWEizdcI6h50IGdllG JcD8IVUHEVvPfu5cztMQwnxIE3n4xO4E81T6/znW0rbNwJzdKCpC1q5F2RM6c3EnqDt5 BYMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684731996; x=1687323996; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=itm4lMYCv8hAr7oWbErSKKsiGUM8mghvSpbiblkFfbk=; b=JGede4ErEdyjhrVGGycNcFkSlUzDrvCtgqWU0mr2ngN1RWwWkpuG7SwIm9NmseuFLm 3A8tm2hbTSo59diT3g4riGqiQQXsv99qZ6pSNVGejltHyDcspDTDMXwE9ZBaKGdqhmiJ q+wfrWRCje9E9i2Ajz1CeEj/wCp2m+ufJ/A2/VwJLeFrYvZWJ7puGzz0FkFGmk99b7Ki +f/amxmJncen/f2z0D9nFl1cCUw1fGOKdPnx6b29OyqHiKGpX5y46aoKz7pgrqNrzVee USvVE5a0x2q8Lf+emynG6xHaTetF2VkRWOLdXyG1i9oygMjXOPUNDp5u+4dptyjebj7o 9rqQ== X-Gm-Message-State: AC+VfDyWEanvfquZs+woCunWTjm387zYd+Wrl8dX1Ux5wQZRyOpSTP0e 6PZUZ4KCWi18UwXbKwNd72IEIQ== X-Google-Smtp-Source: ACHHUZ78QIP2SjKpIXnCqaoasaQ8h+cR3FlbUtT2N1GVyk/3dI0nRSCd0E3wxVyacPDSOVPhhCpzYw== X-Received: by 2002:a0d:ed43:0:b0:561:9d6e:6f45 with SMTP id w64-20020a0ded43000000b005619d6e6f45mr10989561ywe.26.1684731996077; Sun, 21 May 2023 22:06:36 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id v184-20020a8148c1000000b0054f50f71834sm1805106ywa.124.2023.05.21.22.06.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:06:35 -0700 (PDT) Date: Sun, 21 May 2023 22:06:32 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 14/31] fs/userfaultfd: retry if pte_offset_map() fails In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: <424274a4-7c13-e14-b380-428fc69a45c5@google.com> References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Instead of worrying whether the pmd is stable, userfaultfd_must_wait() call pte_offset_map() as before, but go back to try again if that fails. Risk of endless loop? It already broke out if pmd_none(), !pmd_present() or pmd_trans_huge(), and pte_offset_map() would have cleared pmd_bad(): which leaves pmd_devmap(). Presumably pmd_devmap() is inappropriate in a vma subject to userfaultfd (it would have been mistreated before), but add a check just to avoid all possibility of endless loop there. 
Signed-off-by: Hugh Dickins Acked-by: Peter Xu --- fs/userfaultfd.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index f7a0817b1ec0..ca83423f8d54 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -349,12 +349,13 @@ static inline bool userfaultfd_must_wait(struct userf= aultfd_ctx *ctx, if (!pud_present(*pud)) goto out; pmd =3D pmd_offset(pud, address); +again: _pmd =3D pmdp_get_lockless(pmd); if (pmd_none(_pmd)) goto out; =20 ret =3D false; - if (!pmd_present(_pmd)) + if (!pmd_present(_pmd) || pmd_devmap(_pmd)) goto out; =20 if (pmd_trans_huge(_pmd)) { @@ -363,11 +364,11 @@ static inline bool userfaultfd_must_wait(struct userf= aultfd_ctx *ctx, goto out; } =20 - /* - * the pmd is stable (as in !pmd_trans_unstable) so we can re-read it - * and use the standard pte_offset_map() instead of parsing _pmd. - */ pte =3D pte_offset_map(pmd, address); + if (!pte) { + ret =3D true; + goto again; + } /* * Lockless access: we're in a wait_event so it's ok if it * changes under us. PTE markers should be handled the same as none --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35E0DC77B73 for ; Mon, 22 May 2023 05:07:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231777AbjEVFHu (ORCPT ); Mon, 22 May 2023 01:07:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49220 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231748AbjEVFHp (ORCPT ); Mon, 22 May 2023 01:07:45 -0400 Received: from mail-yb1-xb2c.google.com (mail-yb1-xb2c.google.com [IPv6:2607:f8b0:4864:20::b2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 93863121 for ; Sun, 21 May 2023 22:07:40 -0700 (PDT) Received: by mail-yb1-xb2c.google.com with SMTP id 3f1490d57ef6-ba81ded8d3eso8293222276.3 for ; Sun, 21 May 2023 22:07:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684732059; x=1687324059; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=Po1SIcw5OHs73xEeyHWdahQyrrpxMX/55AnymOheHdY=; b=XzjxgjxtBkGDdrW/JpZQDO+0w80TppLUP+i4zWugdKgK+0BNyYiAZxNPu761zqLwx6 N3d13FtSjEQD3/Az+Ls+AFMRVUTTfTFgq/wZnRC0unRzbTg+QL2AfL6VOrwQD3Fvko86 qlQqsdfEsD0BIjaUfvcqtxxN7EhPjQv4ZZ4GoDxZbghd8kbh23gJUnZyVTa53cW4rd+n tv3DkjOFptWFHPJ1odIqKO55YZwU9ED++0I8TwmndtdJ/9Tx2WjmcnDlE6yL1e/aGSkQ np4d2w4yIyiw0TP07HR0yzbTPdSSCgTIX7OtmwdleJ4u8G05hehjmr7+xcCmgkF4+y38 RHng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684732059; x=1687324059; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Po1SIcw5OHs73xEeyHWdahQyrrpxMX/55AnymOheHdY=; b=dVww+O0f9z+uvXtvp1yOl6IH2wP4gUM7hpmSEoHntm/39WLvWf5tCPg8qTxMLUGeYX FhshdE4h4y2RLGWuJoAmK2539m5ebfA6VVKsswOu3aA8LsIk35PEZlGw7Bqs9ZHFiadS /UemxjSvWZeAuSDtdRL1kgXLwLbjRic3VoP6/2PZ1zjj8aPERNI8V38xrvQHa0rhu+m+ uyq2Uk9hQY+sJ25ankSaVFEE9PWcaYj6f1lZz32C/zKE4essLpIYwGqqk2uMy13LLK2F MozzqfSTPgr50Em/deqCMUGFy777ttj4mPq6dO6BOCVnoenjbHUzcXyQHlUciRKStZ/Y Rq4w== X-Gm-Message-State: AC+VfDzt37Sa2ZjHkFkr6SQwGcgSElx7PqYcv2qnATv0jGBox1tXgT7G uzSVazV1g0025nniqD0CBrrsQw== X-Google-Smtp-Source: 
ACHHUZ4m3WCKHCRUDIqxwtJbDgKim2PjP4Uj/Gl/wyaY7gpZzCtLpfaCmSlMhzQlWiwCXodOnzTDfw== X-Received: by 2002:a25:ada2:0:b0:ba8:fe6:8e3f with SMTP id z34-20020a25ada2000000b00ba80fe68e3fmr8924520ybi.5.1684732059390; Sun, 21 May 2023 22:07:39 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id r10-20020a0de80a000000b0054662f7b42dsm1801064ywe.63.2023.05.21.22.07.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:07:39 -0700 (PDT) Date: Sun, 21 May 2023 22:07:35 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 15/31] mm/userfaultfd: allow pte_offset_map_lock() to fail In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: <49d92b15-3442-4e84-39bd-c77c316bf844@google.com> References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" mfill_atomic_install_pte() and mfill_atomic_pte_zeropage() treat failed pte_offset_map_lock() as -EFAULT, with no attempt to retry. Signed-off-by: Hugh Dickins --- mm/userfaultfd.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index e97a0b4889fc..b1554286a31c 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -76,14 +76,16 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd, if (flags & MFILL_ATOMIC_WP) _dst_pte =3D pte_mkuffd_wp(_dst_pte); =20 + ret =3D -EFAULT; dst_pte =3D pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); + if (!dst_pte) + goto out; =20 if (vma_is_shmem(dst_vma)) { /* serialize against truncate with the page table lock */ inode =3D dst_vma->vm_file->f_inode; offset =3D linear_page_index(dst_vma, dst_addr); max_off =3D DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); - ret =3D -EFAULT; if (unlikely(offset >=3D max_off)) goto out_unlock; } @@ -121,6 +123,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd, ret =3D 0; out_unlock: pte_unmap_unlock(dst_pte, ptl); +out: return ret; } =20 @@ -212,13 +215,15 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd, =20 _dst_pte =3D pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr), dst_vma->vm_page_prot)); + ret =3D -EFAULT; dst_pte =3D pte_offset_map_lock(dst_vma->vm_mm, dst_pmd, dst_addr, &ptl); + if (!dst_pte) + goto out; if (dst_vma->vm_file) { /* the shmem MAP_PRIVATE case requires checking the i_size */ inode =3D dst_vma->vm_file->f_inode; offset =3D linear_page_index(dst_vma, dst_addr); max_off =3D DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); - ret =3D -EFAULT; if (unlikely(offset >=3D max_off)) goto out_unlock; } @@ -231,6 +236,7 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd, ret =3D 0; out_unlock: pte_unmap_unlock(dst_pte, ptl); +out: return ret; } =20 --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 
3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D543C77B75 for ; Mon, 22 May 2023 05:10:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231826AbjEVFJp (ORCPT ); Mon, 22 May 2023 01:09:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50106 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231863AbjEVFJb (ORCPT ); Mon, 22 May 2023 01:09:31 -0400 Received: from mail-yw1-x112f.google.com (mail-yw1-x112f.google.com [IPv6:2607:f8b0:4864:20::112f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4FEDA130 for ; Sun, 21 May 2023 22:09:00 -0700 (PDT) Received: by mail-yw1-x112f.google.com with SMTP id 00721157ae682-561bcd35117so70117817b3.3 for ; Sun, 21 May 2023 22:09:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684732137; x=1687324137; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=7rTGIbKCcYdvVJu9Ybd88rQGNMtFs0WCnevY0u3jIzc=; b=FWm+aAqCrTI8L5F1peTncvX2j+gJ1K5pdecyQ2SgwMPmXn8TR/t30T7yAueI1Z1Prx +ZNB4HFJeDGCEa3TY+DbZFPXAQMoW0lt0/cPYFkOtZn9pNtwNaAji3QwKEg49ie0wiso /aqMK319Yal/5gxVXR6hyaD4EBzJg8qItdk0tBwwmogEtt3sYbaEVCj54AVP/iyafLSN g3n7sbdycHd6xijbyr/rauTFVAs7KxoLkcQWJDCzbFut9bVjwaE8Xx0Mf3/CDuWrY3hk wIWhl9hSpGsCek8QZILjryCAxN2GrBpCITRg0NVy+fYzjrAIzOzQeSq16gHJmxidsl5V /Prg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684732137; x=1687324137; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=7rTGIbKCcYdvVJu9Ybd88rQGNMtFs0WCnevY0u3jIzc=; b=hOsWBwHH0w7/3YIZ04LkX87D6UukroLP1iiw1SRI9Z/6VOJm0fvTXorldxsFT3qZl4 5XyzR549eVJvdFAk53oigyG+jGjmIGUH8+TI/FMk2n4n4exizuocG2iwE2sgRjdvtA5E KVCX3aQbF5e8OJwAYOSGkjiDtWTlQ+iPOZPfrsuKtT1oFw0Dd6jogUnoeDc3e2BbtBWS 4nBEKvv7xnO5i7J2SWkHGAWsHqEZAbXy0vZWzisErAGiTy9bsc2XB0Nmctlo67P3XmLH v/ViPFAUu30Vp2iNKdy89i2nqxP+8sgfKkcAKEWLVEt/MGwf6w20dZt2egzSP/ZbRAJn 9BZw== X-Gm-Message-State: AC+VfDy1R3bw8pf6v+xau4p6GzMvwMlNDfPDW+e8vwQUBzSUSbS/JYu4 +qzNzCIVp4+85m1eRlLqo2NzxA== X-Google-Smtp-Source: ACHHUZ7MEfF9deLoAIS2DPpctOmVhab2WCwoG7BFDeIpVNW1XvRRPWqoYkqwVwVotMomYmO1i86qzw== X-Received: by 2002:a0d:d74a:0:b0:55a:14df:5c10 with SMTP id z71-20020a0dd74a000000b0055a14df5c10mr12334350ywd.18.1684732136920; Sun, 21 May 2023 22:08:56 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id t67-20020a818346000000b0055a503ca1e8sm1804176ywf.109.2023.05.21.22.08.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:08:56 -0700 (PDT) Date: Sun, 21 May 2023 22:08:53 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. 
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 16/31] mm/debug_vm_pgtable,page_table_check: warn pte map fails In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: <87c0e8cc-85c0-806e-da9f-b7b3cacde7d@google.com> References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Failures here would be surprising: pte_advanced_tests() and pte_clear_tests() and __page_table_check_pte_clear_range() each issue a warning if pte_offset_map() or pte_offset_map_lock() fails. Signed-off-by: Hugh Dickins --- mm/debug_vm_pgtable.c | 9 ++++++++- mm/page_table_check.c | 2 ++ 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index c54177aabebd..ee119e33fef1 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -138,6 +138,9 @@ static void __init pte_advanced_tests(struct pgtable_de= bug_args *args) return; =20 pr_debug("Validating PTE advanced\n"); + if (WARN_ON(!args->ptep)) + return; + pte =3D pfn_pte(args->pte_pfn, args->page_prot); set_pte_at(args->mm, args->vaddr, args->ptep, pte); flush_dcache_page(page); @@ -619,6 +622,9 @@ static void __init pte_clear_tests(struct pgtable_debug= _args *args) * the unexpected overhead of cache flushing is acceptable. 
*/ pr_debug("Validating PTE clear\n"); + if (WARN_ON(!args->ptep)) + return; + #ifndef CONFIG_RISCV pte =3D __pte(pte_val(pte) | RANDOM_ORVALUE); #endif @@ -1377,7 +1383,8 @@ static int __init debug_vm_pgtable(void) args.ptep =3D pte_offset_map_lock(args.mm, args.pmdp, args.vaddr, &ptl); pte_clear_tests(&args); pte_advanced_tests(&args); - pte_unmap_unlock(args.ptep, ptl); + if (args.ptep) + pte_unmap_unlock(args.ptep, ptl); =20 ptl =3D pmd_lock(args.mm, args.pmdp); pmd_clear_tests(&args); diff --git a/mm/page_table_check.c b/mm/page_table_check.c index 25d8610c0042..0c511330dbc9 100644 --- a/mm/page_table_check.c +++ b/mm/page_table_check.c @@ -240,6 +240,8 @@ void __page_table_check_pte_clear_range(struct mm_struc= t *mm, pte_t *ptep =3D pte_offset_map(&pmd, addr); unsigned long i; =20 + if (WARN_ON(!ptep)) + return; for (i =3D 0; i < PTRS_PER_PTE; i++) { __page_table_check_pte_clear(mm, addr, *ptep); addr +=3D PAGE_SIZE; --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A13EC77B75 for ; Mon, 22 May 2023 05:10:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231955AbjEVFKx (ORCPT ); Mon, 22 May 2023 01:10:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51950 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231804AbjEVFKg (ORCPT ); Mon, 22 May 2023 01:10:36 -0400 Received: from mail-yb1-xb36.google.com (mail-yb1-xb36.google.com [IPv6:2607:f8b0:4864:20::b36]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9889D109 for ; Sun, 21 May 2023 22:10:14 -0700 (PDT) Received: by mail-yb1-xb36.google.com with SMTP id 3f1490d57ef6-b9e6ec482b3so8134495276.3 for ; Sun, 21 May 2023 22:10:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684732209; x=1687324209; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=qkatc+QCc6cccs9xXaqGfaqNL6OFGEuJiK0miGnv7G8=; b=G8p2CzqFuweOTJakxAlGPjhKL7rDUMtnLNT7EUicI7w+jufYMVBoVuc81ZJQYoSTfb w6t5S+SR5Hp/NszCj/XtFj6CuqMt4ZlVp3PxVkJ3mlWRBt+Iifs7tpZrzkBlF0AC0HZQ qbtqENGFk4bM4644i8k7d4FbjPR3Um0C3R30rKDY1tPszjXHKCLCSRdB9GHFrQ6reUFc o/gs5XQG3ml90zH4fHmJbebDWsSKfY/AnaCH/uwICdx3ExQ+m1F+PTZVZDShUMb4Npmu GRvVy3RMuLlfKPCoJDA4zF1iEeyu+t7qtp3Nzn44IZZdc2OcH7AU7ec8Vtx87ucadbDG 6sAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684732209; x=1687324209; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=qkatc+QCc6cccs9xXaqGfaqNL6OFGEuJiK0miGnv7G8=; b=LVlnEgtWpjBzJAoREowHovZ0aIJp4Du86drlwVFAP9cQNV0pL/EzYtZKe5cs9lMpP7 +K4Loyvr/yjgkGP2PVtJ4XesrCZkPM/OBhgOef9Kc8LSwSytRaHvZBKBo1HGb59LIxYN cfvrn1sq//YJgZl9NEycRzRfFtK+QdAyVgQKVYFhEU9TiwaBxFHPyJ6xsuYvEEsdkTAJ Wk/X2mQl7wmQBb+KoXKa/yllOCM0UUP0WXPB6uGVKgHqgqzlq575x1XLaL7t7KixH039 I1jyESS42tUyEbdilyNTn7ULKkAtvladzXwunG/ghzPukDhREwl3J/aq/I8kQIZ4twb0 fE+g== X-Gm-Message-State: AC+VfDxgfXXadzxpPxSEOd+p7chP8MnIq/wNyoE/W63fshF/zGXNM9yN yhb2hy1Tz6AfHYJgyvzxxP177w== X-Google-Smtp-Source: ACHHUZ7YbSO8bhu9h9PxXguF69pafygjT3CZbvHofHpkGqbBtfy3DyuxisCr5F+9n6J8Jk2l8rmL9g== X-Received: by 2002:a0d:d595:0:b0:561:e910:52f5 with SMTP id 
x143-20020a0dd595000000b00561e91052f5mr12187363ywd.27.1684732208925; Sun, 21 May 2023 22:10:08 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id y185-20020a817dc2000000b00545a08184fdsm1790422ywc.141.2023.05.21.22.10.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:10:08 -0700 (PDT) Date: Sun, 21 May 2023 22:10:05 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 17/31] mm/various: give up if pte_offset_map[_lock]() fails In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Following the examples of nearby code, various functions can just give up if pte_offset_map() or pte_offset_map_lock() fails. And there's no need for a preliminary pmd_trans_unstable() or other such check, since such cases are now safely handled inside. Signed-off-by: Hugh Dickins --- mm/gup.c | 9 ++++++--- mm/ksm.c | 7 ++++--- mm/memcontrol.c | 8 ++++---- mm/memory-failure.c | 8 +++++--- mm/migrate.c | 3 +++ mm/swap_state.c | 3 +++ 6 files changed, 25 insertions(+), 13 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 3bd5d3854c51..bb67193c5460 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -544,10 +544,10 @@ static struct page *follow_page_pte(struct vm_area_st= ruct *vma, if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) =3D=3D (FOLL_PIN | FOLL_GET))) return ERR_PTR(-EINVAL); - if (unlikely(pmd_bad(*pmd))) - return no_page_table(vma, flags); =20 ptep =3D pte_offset_map_lock(mm, pmd, address, &ptl); + if (!ptep) + return no_page_table(vma, flags); pte =3D *ptep; if (!pte_present(pte)) goto no_page; @@ -851,8 +851,9 @@ static int get_gate_page(struct mm_struct *mm, unsigned= long address, pmd =3D pmd_offset(pud, address); if (!pmd_present(*pmd)) return -EFAULT; - VM_BUG_ON(pmd_trans_huge(*pmd)); pte =3D pte_offset_map(pmd, address); + if (!pte) + return -EFAULT; if (pte_none(*pte)) goto unmap; *vma =3D get_gate_vma(mm); @@ -2377,6 +2378,8 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsi= gned long addr, pte_t *ptep, *ptem; =20 ptem =3D ptep =3D pte_offset_map(&pmd, addr); + if (!ptep) + return 0; do { pte_t pte =3D ptep_get_lockless(ptep); struct page *page; diff --git a/mm/ksm.c b/mm/ksm.c index df2aa281d49d..3dc15459dd20 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -431,10 +431,9 @@ static int break_ksm_pmd_entry(pmd_t *pmd, unsigned lo= ng addr, unsigned long nex pte_t *pte; int ret; =20 - if (pmd_leaf(*pmd) || !pmd_present(*pmd)) - return 0; - pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); + if (!pte) + return 0; if (pte_present(*pte)) { page =3D vm_normal_page(walk->vma, addr, *pte); } else if (!pte_none(*pte)) { @@ -1203,6 +1202,8 @@ static int 
replace_page(struct vm_area_struct *vma, s= truct page *page, mmu_notifier_invalidate_range_start(&range); =20 ptep =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!ptep) + goto out_mn; if (!pte_same(*ptep, orig_pte)) { pte_unmap_unlock(ptep, ptl); goto out_mn; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4b27e245a055..fdd953655fe1 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -6057,9 +6057,9 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t= *pmd, return 0; } =20 - if (pmd_trans_unstable(pmd)) - return 0; pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return 0; for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) if (get_mctgt_type(vma, addr, *pte, NULL)) mc.precharge++; /* increment precharge temporarily */ @@ -6277,10 +6277,10 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *= pmd, return 0; } =20 - if (pmd_trans_unstable(pmd)) - return 0; retry: pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return 0; for (; addr !=3D end; addr +=3D PAGE_SIZE) { pte_t ptent =3D *(pte++); bool device =3D false; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5b663eca1f29..b3cc8f213fe3 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -414,6 +414,8 @@ static unsigned long dev_pagemap_mapping_shift(struct v= m_area_struct *vma, if (pmd_devmap(*pmd)) return PMD_SHIFT; pte =3D pte_offset_map(pmd, address); + if (!pte) + return 0; if (pte_present(*pte) && pte_devmap(*pte)) ret =3D PAGE_SHIFT; pte_unmap(pte); @@ -800,11 +802,11 @@ static int hwpoison_pte_range(pmd_t *pmdp, unsigned l= ong addr, goto out; } =20 - if (pmd_trans_unstable(pmdp)) - goto out; - mapped_pte =3D ptep =3D pte_offset_map_lock(walk->vma->vm_mm, pmdp, addr, &ptl); + if (!ptep) + goto out; + for (; addr !=3D end; ptep++, addr +=3D PAGE_SIZE) { ret =3D check_hwpoisoned_entry(*ptep, addr, PAGE_SHIFT, hwp->pfn, &hwp->tk); diff --git a/mm/migrate.c b/mm/migrate.c index 3ecb7a40075f..308a56f0b156 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -305,6 +305,9 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *= pmd, swp_entry_t entry; =20 ptep =3D pte_offset_map_lock(mm, pmd, address, &ptl); + if (!ptep) + return; + pte =3D *ptep; pte_unmap(ptep); =20 diff --git a/mm/swap_state.c b/mm/swap_state.c index b76a65ac28b3..db2ec85ef332 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -734,6 +734,9 @@ static void swap_ra_info(struct vm_fault *vmf, =20 /* Copy the PTEs because the page table may be unmapped */ orig_pte =3D pte =3D pte_offset_map(vmf->pmd, faddr); + if (!pte) + return; + if (fpfn =3D=3D pfn + 1) { lpfn =3D fpfn; rpfn =3D fpfn + win; --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 248A3C7EE23 for ; Mon, 22 May 2023 05:12:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229917AbjEVFMR (ORCPT ); Mon, 22 May 2023 01:12:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229600AbjEVFMO (ORCPT ); Mon, 22 May 2023 01:12:14 -0400 Received: from mail-yw1-x1131.google.com (mail-yw1-x1131.google.com [IPv6:2607:f8b0:4864:20::1131]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AF10EA1 for ; Sun, 21 May 2023 22:12:12 -0700 (PDT) Received: by 
mail-yw1-x1131.google.com with SMTP id 00721157ae682-561c1768bacso74710487b3.1 for ; Sun, 21 May 2023 22:12:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684732332; x=1687324332; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=7GAfzbPNj/2rvKRevzngE24N05kydgFSIoMtFyLCrv4=; b=Xury70gJk/Ie7HEmpym3Ju1gfiqnvb3lYbEkOk+qdASbHtYcBtM6f7yvCnD5Qi5hF5 Xqo4lAQiP6+GN4sGJVvV6KOkXQkulHhobtYfAhUe9/l/2KsHbHtH2T86CZpNKlWRl63d IzMkj8FLns70Ingn1LKVytPrfZjfNvxgPvDTj2KhwHoFQ8Kx+CJ4VG6UzypdkQqqJSvO YWMudU4lJeyeTAt+pPO1YgidnSwVn5tO3duE8/EWyOVS7vVgZwJSIiPPVnzLsKahA0qM ZlEnIC4jzz74OlvLVDlS2OOEGCjrtDJzReiTZcHJL2Ft7dpNZsKSmX1OuM4OkIMXn0/q hnaQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684732332; x=1687324332; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=7GAfzbPNj/2rvKRevzngE24N05kydgFSIoMtFyLCrv4=; b=LiJ6FVxUb5/H7vEheyNkoeGlcqFK4OZc0ItURGDdUJIDFho+rAFiNthtoX575hxln9 xUm4LD7B9RhCcjb7qJl8r66ZMaZZejNoryE7pbiWOsVDLjjqdLRiLM5OWQra+fIcqN3H sfx9/FuLVepHWd0wvT06Opua98aD6ou49X53ZfES3xbeG59szChJeryh5QUAL+MTFhGv /dwAaZfPU0dSZQuF+MaHk9o+igYB3YsRWhF5+ixClAuxLoPb11acC226xvh4i/7ES/+c GR6z1kYuwfMEo3rC9Qd9Teq/vW3ZlRyldEweYPGnuOgWFTm01EeG0Hhj77GVoWu2neVN FScw== X-Gm-Message-State: AC+VfDy+W7ytacL4VYSlIkC/FepYn+PcDHuLYCAFFpWq6Oqabs8Q7SO0 QhZxpncTPqi/VV4c4o269EczYA== X-Google-Smtp-Source: ACHHUZ5YOFvYflvdD4FMdNwpV45kv2pl6g6+I95eOvlzunuPKVoSDfQex4to1CKCQVwtCrNkcGYjoQ== X-Received: by 2002:a0d:d107:0:b0:561:b4e3:5fc8 with SMTP id t7-20020a0dd107000000b00561b4e35fc8mr9743269ywd.37.1684732331752; Sun, 21 May 2023 22:12:11 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id o80-20020a0dcc53000000b00559f03541c6sm1814009ywd.132.2023.05.21.22.12.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:12:11 -0700 (PDT) Date: Sun, 21 May 2023 22:12:08 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 18/31] mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge() In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: <4a834932-9064-9ed7-3cd1-99466f549486@google.com> References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" change_pmd_range() had special pmd_none_or_clear_bad_unless_trans_huge(), required to avoid "bad" choices when setting automatic NUMA hinting under mmap_read_lock(); but most of that is already covered in pte_offset_map() now. 
change_pmd_range() just wants a pmd_none() check before wasting time on MMU notifiers, then checks on the read-once _pmd value to work out what's needed for huge cases. If change_pte_range() returns -EAGAIN to retry if pte_offset_map_lock() fails, nothing more special is needed. Signed-off-by: Hugh Dickins --- mm/mprotect.c | 74 ++++++++++++--------------------------------------- 1 file changed, 17 insertions(+), 57 deletions(-) diff --git a/mm/mprotect.c b/mm/mprotect.c index c5a13c0f1017..64e1df0af514 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -93,22 +93,9 @@ static long change_pte_range(struct mmu_gather *tlb, bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; =20 tlb_change_page_size(tlb, PAGE_SIZE); - - /* - * Can be called with only the mmap_lock for reading by - * prot_numa so we must check the pmd isn't constantly - * changing from under us from pmd_none to pmd_trans_huge - * and/or the other way around. - */ - if (pmd_trans_unstable(pmd)) - return 0; - - /* - * The pmd points to a regular pte so the pmd can't change - * from under us even if the mmap_lock is only hold for - * reading. - */ pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return -EAGAIN; =20 /* Get target node for single threaded private VMAs */ if (prot_numa && !(vma->vm_flags & VM_SHARED) && @@ -301,26 +288,6 @@ static long change_pte_range(struct mmu_gather *tlb, return pages; } =20 -/* - * Used when setting automatic NUMA hinting protection where it is - * critical that a numa hinting PMD is not confused with a bad PMD. - */ -static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd) -{ - pmd_t pmdval =3D pmdp_get_lockless(pmd); - - if (pmd_none(pmdval)) - return 1; - if (pmd_trans_huge(pmdval)) - return 0; - if (unlikely(pmd_bad(pmdval))) { - pmd_clear_bad(pmd); - return 1; - } - - return 0; -} - /* * Return true if we want to split THPs into PTE mappings in change * protection procedure, false otherwise. @@ -398,7 +365,8 @@ static inline long change_pmd_range(struct mmu_gather *= tlb, pmd =3D pmd_offset(pud, addr); do { long ret; - + pmd_t _pmd; +again: next =3D pmd_addr_end(addr, end); =20 ret =3D change_pmd_prepare(vma, pmd, cp_flags); @@ -406,16 +374,8 @@ static inline long change_pmd_range(struct mmu_gather = *tlb, pages =3D ret; break; } - /* - * Automatic NUMA balancing walks the tables with mmap_lock - * held for read. It's possible a parallel update to occur - * between pmd_trans_huge() and a pmd_none_or_clear_bad() - * check leading to a false positive and clearing. - * Hence, it's necessary to atomically read the PMD value - * for all the checks. - */ - if (!is_swap_pmd(*pmd) && !pmd_devmap(*pmd) && - pmd_none_or_clear_bad_unless_trans_huge(pmd)) + + if (pmd_none(*pmd)) goto next; =20 /* invoke the mmu notifier if the pmd is populated */ @@ -426,7 +386,8 @@ static inline long change_pmd_range(struct mmu_gather *= tlb, mmu_notifier_invalidate_range_start(&range); } =20 - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) { + _pmd =3D pmdp_get_lockless(pmd); + if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) { if ((next - addr !=3D HPAGE_PMD_SIZE) || pgtable_split_needed(vma, cp_flags)) { __split_huge_pmd(vma, pmd, addr, false, NULL); @@ -441,15 +402,10 @@ static inline long change_pmd_range(struct mmu_gather= *tlb, break; } } else { - /* - * change_huge_pmd() does not defer TLB flushes, - * so no need to propagate the tlb argument. 
- */ - int nr_ptes =3D change_huge_pmd(tlb, vma, pmd, + ret =3D change_huge_pmd(tlb, vma, pmd, addr, newprot, cp_flags); - - if (nr_ptes) { - if (nr_ptes =3D=3D HPAGE_PMD_NR) { + if (ret) { + if (ret =3D=3D HPAGE_PMD_NR) { pages +=3D HPAGE_PMD_NR; nr_huge_updates++; } @@ -460,8 +416,12 @@ static inline long change_pmd_range(struct mmu_gather = *tlb, } /* fall through, the trans huge pmd just split */ } - pages +=3D change_pte_range(tlb, vma, pmd, addr, next, - newprot, cp_flags); + + ret =3D change_pte_range(tlb, vma, pmd, addr, next, newprot, + cp_flags); + if (ret < 0) + goto again; + pages +=3D ret; next: cond_resched(); } while (pmd++, addr =3D next, addr !=3D end); --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 26CAFC7EE23 for ; Mon, 22 May 2023 05:13:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229922AbjEVFNo (ORCPT ); Mon, 22 May 2023 01:13:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54086 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231565AbjEVFNl (ORCPT ); Mon, 22 May 2023 01:13:41 -0400 Received: from mail-yb1-xb36.google.com (mail-yb1-xb36.google.com [IPv6:2607:f8b0:4864:20::b36]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2A136100 for ; Sun, 21 May 2023 22:13:38 -0700 (PDT) Received: by mail-yb1-xb36.google.com with SMTP id 3f1490d57ef6-ba841216e92so8155539276.1 for ; Sun, 21 May 2023 22:13:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684732417; x=1687324417; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=dPoyGlkGpegFUTFofVDIHWRBddk7VxC7U3Cl7g+dttQ=; b=u/pipKzV3+7Fw72J6ZuzihOhd10w0gGlLGmoTYVNzOrFtguVpB16bQ3yHwI9CSCfkH 3Vl3zrzd94RD+ao7QuGXMU38bZgzZUL1/QZs1hHXZKn7GPINxEVThftcMMnY2o+Yudl0 U2nS3rhKugfJt5YxhFfn9PDdwtey86BwLjUPjmgewXJIUFBVGRQf2ahMAOU7Z/t1DHAm l66iDmXAruV3nmxv3A2DqZgGdZJpE3BSBsWI9c3hTvkUhwfiQW5QjWVbpPvVCeixeUrI GKrbNt5veHLdmjZBBDPCavEMUBfFme3RGuyXCTdVESH6VYwjwFZwaExB5zSeQBSZMndy lQew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684732417; x=1687324417; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=dPoyGlkGpegFUTFofVDIHWRBddk7VxC7U3Cl7g+dttQ=; b=B4dq0P87YlyKv3X+UVN+5bxqCZLlbdW186Ldu8YAmWsH8Sog6dppbeeZvfpo28rIRU rbN13fq6yPhrTpjHJ2S35WGx5Qo9ZViDLLeHrDHSdJvhrAbkHM5vOzkjKBjIiufTBzri CnoRGic17BughCSF6w6cyrAx/vz1SfqhVCVQ+JAjX6azXYaSZzgmJzmVoevmoKDFzJ4k LfV8CNMePLhhr2zvwJBXg/bOPe7dgm8Hn0sKKzKlw9UupKvcl7p6f0trpiK8AwWCtS1Z RWorfgvxiNeAPAyaza5oRGSWfbwV1aEnUbXcf7LQoNGd5vYNWTtZjQyPd32ZtFh336eW SRqA== X-Gm-Message-State: AC+VfDwDvnknkpWwcFsJ9EC0kF5TuCES/Y0+691ojJVLTfW6rBrdahOv BpjTC87yMvAVyZVkBxpBLm6s8w== X-Google-Smtp-Source: ACHHUZ6QCzXgzIS4cFPDNtq3tlt+HJnjXO6ZXovLO9zv6wjiXs/oia4ZbqdffbL8veAaOUMs5BJwAQ== X-Received: by 2002:a0d:d595:0:b0:561:e910:52f5 with SMTP id x143-20020a0dd595000000b00561e91052f5mr12194393ywd.27.1684732417238; Sun, 21 May 2023 22:13:37 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id i67-20020a0df846000000b00559f1cb8444sm1824582ywf.70.2023.05.21.22.13.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:13:36 -0700 (PDT) Date: Sun, 21 May 2023 22:13:33 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 19/31] mm/mremap: retry if either pte_offset_map_*lock() fails In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: <2d3fbfea-5884-8211-0cc-954afe25ae9c@google.com> References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" move_ptes() return -EAGAIN if pte_offset_map_lock() of old fails, or if pte_offset_map_nolock() of new fails: move_page_tables() retry if so. But that does need a pmd_none() check inside, to stop endless loop when huge shmem is truncated (thank you to syzbot); and move_huge_pmd() must tolerate that a page table might have been allocated there just before (of course it would be more satisfying to remove the empty page table, but this is not a path worth optimizing). Signed-off-by: Hugh Dickins --- mm/huge_memory.c | 5 +++-- mm/mremap.c | 28 ++++++++++++++++++++-------- 2 files changed, 23 insertions(+), 10 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 624671aaa60d..d4bd5fa7c823 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1760,9 +1760,10 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsig= ned long old_addr, =20 /* * The destination pmd shouldn't be established, free_pgtables() - * should have release it. + * should have released it; but move_page_tables() might have already + * inserted a page table, if racing against shmem/file collapse. */ - if (WARN_ON(!pmd_none(*new_pmd))) { + if (!pmd_none(*new_pmd)) { VM_BUG_ON(pmd_trans_huge(*new_pmd)); return false; } diff --git a/mm/mremap.c b/mm/mremap.c index b11ce6c92099..1fc47b4f38d7 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -133,7 +133,7 @@ static pte_t move_soft_dirty_pte(pte_t pte) return pte; } =20 -static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, +static int move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, unsigned long old_addr, unsigned long old_end, struct vm_area_struct *new_vma, pmd_t *new_pmd, unsigned long new_addr, bool need_rmap_locks) @@ -143,6 +143,7 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t= *old_pmd, spinlock_t *old_ptl, *new_ptl; bool force_flush =3D false; unsigned long len =3D old_end - old_addr; + int err =3D 0; =20 /* * When need_rmap_locks is true, we take the i_mmap_rwsem and anon_vma @@ -170,8 +171,16 @@ static void move_ptes(struct vm_area_struct *vma, pmd_= t *old_pmd, * pte locks because exclusive mmap_lock prevents deadlock. 
*/ old_pte =3D pte_offset_map_lock(mm, old_pmd, old_addr, &old_ptl); - new_pte =3D pte_offset_map(new_pmd, new_addr); - new_ptl =3D pte_lockptr(mm, new_pmd); + if (!old_pte) { + err =3D -EAGAIN; + goto out; + } + new_pte =3D pte_offset_map_nolock(mm, new_pmd, new_addr, &new_ptl); + if (!new_pte) { + pte_unmap_unlock(old_pte, old_ptl); + err =3D -EAGAIN; + goto out; + } if (new_ptl !=3D old_ptl) spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); flush_tlb_batched_pending(vma->vm_mm); @@ -208,8 +217,10 @@ static void move_ptes(struct vm_area_struct *vma, pmd_= t *old_pmd, spin_unlock(new_ptl); pte_unmap(new_pte - 1); pte_unmap_unlock(old_pte - 1, old_ptl); +out: if (need_rmap_locks) drop_rmap_locks(vma); + return err; } =20 #ifndef arch_supports_page_table_move @@ -537,6 +548,7 @@ unsigned long move_page_tables(struct vm_area_struct *v= ma, new_pmd =3D alloc_new_pmd(vma->vm_mm, vma, new_addr); if (!new_pmd) break; +again: if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) || pmd_devmap(*old_pmd)) { if (extent =3D=3D HPAGE_PMD_SIZE && @@ -544,8 +556,6 @@ unsigned long move_page_tables(struct vm_area_struct *v= ma, old_pmd, new_pmd, need_rmap_locks)) continue; split_huge_pmd(vma, old_pmd, old_addr); - if (pmd_trans_unstable(old_pmd)) - continue; } else if (IS_ENABLED(CONFIG_HAVE_MOVE_PMD) && extent =3D=3D PMD_SIZE) { /* @@ -556,11 +566,13 @@ unsigned long move_page_tables(struct vm_area_struct = *vma, old_pmd, new_pmd, true)) continue; } - + if (pmd_none(*old_pmd)) + continue; if (pte_alloc(new_vma->vm_mm, new_pmd)) break; - move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma, - new_pmd, new_addr, need_rmap_locks); + if (move_ptes(vma, old_pmd, old_addr, old_addr + extent, + new_vma, new_pmd, new_addr, need_rmap_locks) < 0) + goto again; } =20 mmu_notifier_invalidate_range_end(&range); --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D503EC77B75 for ; Mon, 22 May 2023 05:15:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231864AbjEVFPW (ORCPT ); Mon, 22 May 2023 01:15:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231795AbjEVFPL (ORCPT ); Mon, 22 May 2023 01:15:11 -0400 Received: from mail-yw1-x112c.google.com (mail-yw1-x112c.google.com [IPv6:2607:f8b0:4864:20::112c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 74835A8 for ; Sun, 21 May 2023 22:15:10 -0700 (PDT) Received: by mail-yw1-x112c.google.com with SMTP id 00721157ae682-561e919d355so60422787b3.0 for ; Sun, 21 May 2023 22:15:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684732509; x=1687324509; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=wdvmzniCtM+nQioqQdrwu6IzqdoeozaKC+WtG0d7jzc=; b=cRcUhaoiSgJ38o5WfT/WZeLX6uP++hPYcx9RZxFfGqfMkCU/m1hwojIEJua5ROp1t2 3gbVLqFCMhxh733PpSF/0qZ8YjtL+7/P2FqSYSUAY16Ac5W48ksKrjIJgFQ08/pbPPf3 q4tD4wBFecp/AobnrAUouJg1D3xu+hq/x/0yYcV3B7KprXoBoJD5VvXqNme3XhkF8MUs ptKdL72dy1i0NG2qsZVUCDca3Hwe/qFNH0+a+76pWfu+QubvXC6ErHyq6G5+PWt8MPaI 67e8Krgsis/Lu/L/JhEyvnDKGzJIK/fkyJS7kC+QKlqSYLfoaAZoCGiKnWGOJLttwSnL hGFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; 
s=20221208; t=1684732509; x=1687324509; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=wdvmzniCtM+nQioqQdrwu6IzqdoeozaKC+WtG0d7jzc=; b=CDnLzdgX0U5mNQ/db3kVMrjZwaMSJGbaRul28Dk/GarbTdceoi8dmTNg2Ri8jRY8Us NkDifg5KLuDrcIJux1Epc7UBDqc9c6cxnmcv6FDR2gR8q8Bq1tPitv67awmwyLDOqNbE k5VD8pbkEFy4ctBztmRr4RYzqNEGtmL28E6oC5ChgdOJ5bT9A3l+uv6Qn0VwFH5QMlge WuoJ+SW/BhRqLdHnUV8CICHlZbrs/Lhk1hGIwQLBGb+kiOrv97QxLvPV1H3w+GU5Gk+R DV0lS6HwZLVjPB7oNcPL2DHv99G+C5G5bApi+VVZweDXOHMXPhWU5VM0Wa/t5X1HaNuX 7aHA== X-Gm-Message-State: AC+VfDztB79F94uZKbRWVcV7+I2yWkBkDz/U+G60sPOlB4fB7aCjRJoF XBaVvoySKAKzmtc0dH2YUFg2uQ== X-Google-Smtp-Source: ACHHUZ4FV9kV2YREHnhwIbwygT+rPx4Et9mG+j5eesRA86qgE8Xs1bA3kk0cV5xsWSqbz756kifYvg== X-Received: by 2002:a0d:e807:0:b0:55a:4ff4:f97d with SMTP id r7-20020a0de807000000b0055a4ff4f97dmr10153523ywe.48.1684732509567; Sun, 21 May 2023 22:15:09 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id z73-20020a0dd74c000000b00559d9989490sm1828589ywd.41.2023.05.21.22.15.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:15:09 -0700 (PDT) Date: Sun, 21 May 2023 22:15:06 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 20/31] mm/madvise: clean up pte_offset_map_lock() scans In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Came here to make madvise's several pte_offset_map_lock() scans advance to next extent on failure, and remove superfluous pmd_trans_unstable() and pmd_none_or_trans_huge_or_clear_bad() calls. But also did some nearby cleanup. swapin_walk_pmd_entry(): don't name an address "index"; don't drop the lock after every pte, only when calling out to read_swap_cache_async(). madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(): prefer "start_pte" for pointer, orig_pte usually denotes a saved pte value; leave lazy MMU mode before unlocking; merge the success and failure paths after split_folio(). 
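[Aside, not part of the original posting: the "don't drop the lock after every pte" rework relies on a slightly unusual construction, "if (!ptep++)", used in swapin_walk_pmd_entry() below and mirrored by the start_pte handling in the other scans. A NULL pointer means the mapping (and lock) was dropped, or never taken, and must be (re)acquired; a non-NULL pointer is simply advanced to the next entry. The following is a stand-alone C sketch of that control flow only; toy_map(), toy_unmap() and blocking_work() are hypothetical stand-ins for pte_offset_map_lock(), pte_unmap_unlock() and read_swap_cache_async(), and this is an illustration, not kernel code.]

/*
 * Sketch of the lazy map/remap idiom: entries are contiguous (like ptes
 * within one page table), so advancing the pointer is equivalent to
 * mapping the next address; dropping it to NULL forces a remap, and the
 * remap is allowed to fail, in which case the scan just gives up on this
 * extent, as the patch does.
 */
#include <stdio.h>

static int table[8] = { 1, 0, 3, 0, 5, 6, 0, 8 };

static int *toy_map(unsigned long addr)    /* ~ pte_offset_map_lock() */
{
	return addr < 8 ? &table[addr] : NULL; /* may fail in the real API */
}

static void toy_unmap(int *p)              /* ~ pte_unmap_unlock() */
{
	(void)p;
}

static void blocking_work(int value)       /* ~ read_swap_cache_async() */
{
	printf("working on %d\n", value);
}

int main(void)
{
	int *ptep = NULL;
	unsigned long addr;

	for (addr = 0; addr < 8; addr++) {
		int val;

		/*
		 * NULL: (re)map and (re)lock; otherwise the ++ has already
		 * advanced to the next entry.  (The increment of a NULL
		 * pointer mirrors the kernel construct: the stale value is
		 * overwritten immediately.)
		 */
		if (!ptep++) {
			ptep = toy_map(addr);
			if (!ptep)
				break;     /* give up on this extent */
		}
		val = *ptep;
		if (!val)
			continue;          /* nothing to do: keep mapping/lock held */

		toy_unmap(ptep);           /* drop it before the call that may sleep */
		ptep = NULL;
		blocking_work(val);
	}
	if (ptep)
		toy_unmap(ptep);
	return 0;
}

[End of aside: in the real code the remap can fail, which is why each scan now simply ends its extent instead of relying on the old pmd_trans_unstable() style checks.]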
Signed-off-by: Hugh Dickins --- mm/madvise.c | 122 ++++++++++++++++++++++++++++----------------------- 1 file changed, 68 insertions(+), 54 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index b5ffbaf616f5..0af64c4a8f82 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -188,37 +188,43 @@ static int madvise_update_vma(struct vm_area_struct *= vma, =20 #ifdef CONFIG_SWAP static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, - unsigned long end, struct mm_walk *walk) + unsigned long end, struct mm_walk *walk) { struct vm_area_struct *vma =3D walk->private; - unsigned long index; struct swap_iocb *splug =3D NULL; + pte_t *ptep =3D NULL; + spinlock_t *ptl; + unsigned long addr; =20 - if (pmd_none_or_trans_huge_or_clear_bad(pmd)) - return 0; - - for (index =3D start; index !=3D end; index +=3D PAGE_SIZE) { + for (addr =3D start; addr < end; addr +=3D PAGE_SIZE) { pte_t pte; swp_entry_t entry; struct page *page; - spinlock_t *ptl; - pte_t *ptep; =20 - ptep =3D pte_offset_map_lock(vma->vm_mm, pmd, index, &ptl); + if (!ptep++) { + ptep =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!ptep) + break; + } + pte =3D *ptep; - pte_unmap_unlock(ptep, ptl); - if (!is_swap_pte(pte)) continue; entry =3D pte_to_swp_entry(pte); if (unlikely(non_swap_entry(entry))) continue; =20 + pte_unmap_unlock(ptep, ptl); + ptep =3D NULL; + page =3D read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, - vma, index, false, &splug); + vma, addr, false, &splug); if (page) put_page(page); } + + if (ptep) + pte_unmap_unlock(ptep, ptl); swap_read_unplug(splug); cond_resched(); =20 @@ -340,7 +346,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, bool pageout =3D private->pageout; struct mm_struct *mm =3D tlb->mm; struct vm_area_struct *vma =3D walk->vma; - pte_t *orig_pte, *pte, ptent; + pte_t *start_pte, *pte, ptent; spinlock_t *ptl; struct folio *folio =3D NULL; LIST_HEAD(folio_list); @@ -422,11 +428,11 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *p= md, } =20 regular_folio: - if (pmd_trans_unstable(pmd)) - return 0; #endif tlb_change_page_size(tlb, PAGE_SIZE); - orig_pte =3D pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + start_pte =3D pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!start_pte) + return 0; flush_tlb_batched_pending(mm); arch_enter_lazy_mmu_mode(); for (; addr < end; pte++, addr +=3D PAGE_SIZE) { @@ -447,25 +453,28 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *p= md, * are sure it's worth. Split it if we are only owner. 
*/ if (folio_test_large(folio)) { + int err; + if (folio_mapcount(folio) !=3D 1) break; if (pageout_anon_only_filter && !folio_test_anon(folio)) break; + if (!folio_trylock(folio)) + break; folio_get(folio); - if (!folio_trylock(folio)) { - folio_put(folio); - break; - } - pte_unmap_unlock(orig_pte, ptl); - if (split_folio(folio)) { - folio_unlock(folio); - folio_put(folio); - orig_pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); - break; - } + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(start_pte, ptl); + start_pte =3D NULL; + err =3D split_folio(folio); folio_unlock(folio); folio_put(folio); - orig_pte =3D pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + if (err) + break; + start_pte =3D pte =3D + pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!start_pte) + break; + arch_enter_lazy_mmu_mode(); pte--; addr -=3D PAGE_SIZE; continue; @@ -510,8 +519,10 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pm= d, folio_deactivate(folio); } =20 - arch_leave_lazy_mmu_mode(); - pte_unmap_unlock(orig_pte, ptl); + if (start_pte) { + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(start_pte, ptl); + } if (pageout) reclaim_pages(&folio_list); cond_resched(); @@ -612,7 +623,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned = long addr, struct mm_struct *mm =3D tlb->mm; struct vm_area_struct *vma =3D walk->vma; spinlock_t *ptl; - pte_t *orig_pte, *pte, ptent; + pte_t *start_pte, *pte, ptent; struct folio *folio; int nr_swap =3D 0; unsigned long next; @@ -620,13 +631,12 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigne= d long addr, next =3D pmd_addr_end(addr, end); if (pmd_trans_huge(*pmd)) if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next)) - goto next; - - if (pmd_trans_unstable(pmd)) - return 0; + return 0; =20 tlb_change_page_size(tlb, PAGE_SIZE); - orig_pte =3D pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + start_pte =3D pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!start_pte) + return 0; flush_tlb_batched_pending(mm); arch_enter_lazy_mmu_mode(); for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { @@ -664,23 +674,26 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigne= d long addr, * deactivate all pages. 
*/ if (folio_test_large(folio)) { + int err; + if (folio_mapcount(folio) !=3D 1) - goto out; + break; + if (!folio_trylock(folio)) + break; folio_get(folio); - if (!folio_trylock(folio)) { - folio_put(folio); - goto out; - } - pte_unmap_unlock(orig_pte, ptl); - if (split_folio(folio)) { - folio_unlock(folio); - folio_put(folio); - orig_pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); - goto out; - } + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(start_pte, ptl); + start_pte =3D NULL; + err =3D split_folio(folio); folio_unlock(folio); folio_put(folio); - orig_pte =3D pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + if (err) + break; + start_pte =3D pte =3D + pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!start_pte) + break; + arch_enter_lazy_mmu_mode(); pte--; addr -=3D PAGE_SIZE; continue; @@ -725,17 +738,18 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigne= d long addr, } folio_mark_lazyfree(folio); } -out: + if (nr_swap) { if (current->mm =3D=3D mm) sync_mm_rss(mm); - add_mm_counter(mm, MM_SWAPENTS, nr_swap); } - arch_leave_lazy_mmu_mode(); - pte_unmap_unlock(orig_pte, ptl); + if (start_pte) { + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(start_pte, ptl); + } cond_resched(); -next: + return 0; } =20 --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE033C7EE23 for ; Mon, 22 May 2023 05:17:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231917AbjEVFRj (ORCPT ); Mon, 22 May 2023 01:17:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56136 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230204AbjEVFRe (ORCPT ); Mon, 22 May 2023 01:17:34 -0400 Received: from mail-yw1-x1134.google.com (mail-yw1-x1134.google.com [IPv6:2607:f8b0:4864:20::1134]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBFE6AB for ; Sun, 21 May 2023 22:17:33 -0700 (PDT) Received: by mail-yw1-x1134.google.com with SMTP id 00721157ae682-561c5b5e534so74337747b3.2 for ; Sun, 21 May 2023 22:17:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684732653; x=1687324653; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=hzsL+M6cdepZgQw4hIUfWIci6E11MaKNkaWhV1yArkY=; b=f6H5lzyEl/529vvBXo7cjHFatUfG3YtCbKZO/m7kkdLuuofbAW6JB7TNcMTLxhS2Fa oCscaMoZSrdZFIADmHM3vqeNvKHSfDv45YS3VG0xINc3KnCVeYBUvVx6KUlmN5IVqTQ3 eea7j8ExK19Opvsc5Spx705Pw81qEdSQc02WpxsCt2PFWo+K9ekVeiRw9nrsNx+U4KaQ G9GGOC4mudc6X5o4tr3EZNzo4YmhfN4UtuJyHpWp81+YRZq+yrHLruWf2ZHUJUdTErJk SGxCiEVvSkiDFU5p6ro7GOO7WaLMeSjUeJjhiSiayQPvsPwDptONLF8o1b8tv18mP3Yi FtUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684732653; x=1687324653; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=hzsL+M6cdepZgQw4hIUfWIci6E11MaKNkaWhV1yArkY=; b=g97ZtBmScaqaAot69+fV6O0BnK8P/eGAmm/8V2NEsqhXmsDouMlRHRXi0fi5ZwcF0+ s5s5kwjdxKoZAN2/RExCqC5txpgQWRZW6/650z4dJguGhGWbyvM+DX0xr9mSc0VVahMc 7MxGqDg9Al/Psl6qSAfeLgI9qVoXzUOcnR6Q+aICSDwlM7KINS77tI4Vynoap8eEUXId rbIwieaGRAAF0eVzFfgI7UQPwGHN6HR3o2O6eoBhzM46RUWPu3LLEc++9OEO7TDf6aFP fdmtvKiUv9Dqzsmt0CYsFNRfeqwvq7H+8LcPhy8epy5D850bvuSBudklzkliE+Jixh7V sbXw== 
X-Gm-Message-State: AC+VfDyYoaKVEphFd79KXSECBr4pAUrt0J60iQ1pczR769nj1m8mcLFJ Vy1ryh81ocfVDIYiO/EXcCVWog== X-Google-Smtp-Source: ACHHUZ7HPZHU/c0SUjH6d0BWcOB8eLdBcSfh1g/VsYEO0pbi+eAMTALQvB8Hsf5hIiR7CjUOpnbz8A== X-Received: by 2002:a81:6ec5:0:b0:561:e944:a559 with SMTP id j188-20020a816ec5000000b00561e944a559mr9577004ywc.31.1684732652960; Sun, 21 May 2023 22:17:32 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id e129-20020a0dc287000000b00545a08184f8sm1818483ywd.136.2023.05.21.22.17.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:17:32 -0700 (PDT) Date: Sun, 21 May 2023 22:17:29 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 21/31] mm/madvise: clean up force_shm_swapin_readahead() In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Some nearby MADV_WILLNEED cleanup unrelated to pte_offset_map_lock(). shmem_swapin_range() is a better name than force_shm_swapin_readahead(). Fix unimportant off-by-one on end_index. Call the swp_entry_t "entry" rather than "swap": either is okay, but entry is the name used elsewhere in mm/madvise.c. Do not assume GFP_HIGHUSER_MOVABLE: that's right for anon swap, but shmem should take gfp from mapping. Pass the actual vma and address to read_swap_cache_async(), in case a NUMA mempolicy applies. lru_add_drain() at outer level, like madvise_willneed()'s other branch. Signed-off-by: Hugh Dickins --- mm/madvise.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 0af64c4a8f82..9b3c9610052f 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -235,30 +235,34 @@ static const struct mm_walk_ops swapin_walk_ops =3D { .pmd_entry =3D swapin_walk_pmd_entry, }; =20 -static void force_shm_swapin_readahead(struct vm_area_struct *vma, +static void shmem_swapin_range(struct vm_area_struct *vma, unsigned long start, unsigned long end, struct address_space *mapping) { XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start)); - pgoff_t end_index =3D linear_page_index(vma, end + PAGE_SIZE - 1); + pgoff_t end_index =3D linear_page_index(vma, end) - 1; struct page *page; struct swap_iocb *splug =3D NULL; =20 rcu_read_lock(); xas_for_each(&xas, page, end_index) { - swp_entry_t swap; + unsigned long addr; + swp_entry_t entry; =20 if (!xa_is_value(page)) continue; - swap =3D radix_to_swp_entry(page); + entry =3D radix_to_swp_entry(page); /* There might be swapin error entries in shmem mapping. 
*/ - if (non_swap_entry(swap)) + if (non_swap_entry(entry)) continue; + + addr =3D vma->vm_start + + ((xas.xa_index - vma->vm_pgoff) << PAGE_SHIFT); xas_pause(&xas); rcu_read_unlock(); =20 - page =3D read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE, - NULL, 0, false, &splug); + page =3D read_swap_cache_async(entry, mapping_gfp_mask(mapping), + vma, addr, false, &splug); if (page) put_page(page); =20 @@ -266,8 +270,6 @@ static void force_shm_swapin_readahead(struct vm_area_s= truct *vma, } rcu_read_unlock(); swap_read_unplug(splug); - - lru_add_drain(); /* Push any new pages onto the LRU now */ } #endif /* CONFIG_SWAP */ =20 @@ -291,8 +293,8 @@ static long madvise_willneed(struct vm_area_struct *vma, } =20 if (shmem_mapping(file->f_mapping)) { - force_shm_swapin_readahead(vma, start, end, - file->f_mapping); + shmem_swapin_range(vma, start, end, file->f_mapping); + lru_add_drain(); /* Push any new pages onto the LRU now */ return 0; } #else --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4AB2C77B75 for ; Mon, 22 May 2023 05:18:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231939AbjEVFSc (ORCPT ); Mon, 22 May 2023 01:18:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56762 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231737AbjEVFS3 (ORCPT ); Mon, 22 May 2023 01:18:29 -0400 Received: from mail-yw1-x1129.google.com (mail-yw1-x1129.google.com [IPv6:2607:f8b0:4864:20::1129]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9249FAB for ; Sun, 21 May 2023 22:18:28 -0700 (PDT) Received: by mail-yw1-x1129.google.com with SMTP id 00721157ae682-561d5a16be0so74545127b3.2 for ; Sun, 21 May 2023 22:18:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684732708; x=1687324708; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=7TjV4RyPitx7LIAI3Pj7MSABK+uluBQcp84mqUzVC6k=; b=Rn20sER4T4J8WT0bCSNgtKt+s3EuFbG8GtWiix1BcIQNeo/R8lvZKZibiwh3YZIsxd EnCfCzxhNamXvgrmaGBWZwQLLOogQl2zJkptkNsZuwZJP2V7Ysbew+9PodV2Eo/dF6Rz xAM/3hkG4T96NLraRyIuY8t8M3JyxrJk7Ta1kFceKAinzE25KLkprmyFpjg3Ymt8wWQE 3xmly1wR7KhvAsdh/yuCHd1F4QD89xMkxfEmWA4IIW5JNutOsZzL6TS+snO9aVVtcasO VB24BtEFsnlH0J+Nvd70D0Zxa1Rw4VEJiHNtQC31uldZTzMR/gntM0yqANgBYRGqCRYm U+Ow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684732708; x=1687324708; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=7TjV4RyPitx7LIAI3Pj7MSABK+uluBQcp84mqUzVC6k=; b=SCaUZw1FInayFwYaIq8nw5w85aXcwXDQF2AbCS1x9hWuX2cCqqQUrLBPxt+JYOD8dg +Js+sfgpW8M+kw/R2FmukLnQOkirQ5V45v9EbWekHbsdOGTZBjEynHG3lWaJUyvQIB49 RsIkqSm0WTClHA/H0HUz3Mp+Esi8T4aeBQZs6xV0kh6cqe8D4nnY3eixHMqlWUOQ3TBn 50poEyBee3RWyVcHZaPKuGbbthWKeoIbq9pAoKCqYU5wZEOXQI0YUxHLmmiQpnukueqP GsbOjD28zO3VKgJ/ZcMlDpLY/F8a1PwXHB8jLi7b/lFHxFa62q5p+tgcVZYNvlN4oihs ixeg== X-Gm-Message-State: AC+VfDzSHJz4q0K4dR/76qQ5vaC8hDOgDRBtUqCKkB6YS5YXpaQdcJPV tiD5K5ve8CmLzEj/yjIUpkeoRA== X-Google-Smtp-Source: ACHHUZ6v2mlgLkozXZxDjiAaqAiQT2KR+fKn2zWVFJn2KWprg4vexj/qA5jscdHYSkrbKxnnqDZJ7A== X-Received: by 2002:a0d:df8b:0:b0:55a:8b11:5f6a with SMTP id 
i133-20020a0ddf8b000000b0055a8b115f6amr8985666ywe.19.1684732707681; Sun, 21 May 2023 22:18:27 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id u130-20020a816088000000b00552df52450csm1818581ywb.88.2023.05.21.22.18.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:18:27 -0700 (PDT) Date: Sun, 21 May 2023 22:18:24 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 22/31] mm/swapoff: allow pte_offset_map[_lock]() to fail In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: <619c27-d7b0-ae71-329e-9da3d3e7fc7@google.com> References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Adjust unuse_pte() and unuse_pte_range() to allow pte_offset_map_lock() and pte_offset_map() failure; remove pmd_none_or_trans_huge_or_clear_bad() from unuse_pmd_range() now that pte_offset_map() does all that itself. Signed-off-by: Hugh Dickins --- mm/swapfile.c | 38 ++++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 274bbf797480..12d204e6dae2 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1774,7 +1774,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_= t *pmd, hwposioned =3D true; =20 pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) { + if (unlikely(!pte || !pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) { ret =3D 0; goto out; } @@ -1827,7 +1827,8 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_= t *pmd, set_pte_at(vma->vm_mm, addr, pte, new_pte); swap_free(entry); out: - pte_unmap_unlock(pte, ptl); + if (pte) + pte_unmap_unlock(pte, ptl); if (page !=3D swapcache) { unlock_page(page); put_page(page); @@ -1839,17 +1840,22 @@ static int unuse_pte_range(struct vm_area_struct *v= ma, pmd_t *pmd, unsigned long addr, unsigned long end, unsigned int type) { - swp_entry_t entry; - pte_t *pte; + pte_t *pte =3D NULL; struct swap_info_struct *si; - int ret =3D 0; =20 si =3D swap_info[type]; - pte =3D pte_offset_map(pmd, addr); do { struct folio *folio; unsigned long offset; unsigned char swp_count; + swp_entry_t entry; + int ret; + + if (!pte++) { + pte =3D pte_offset_map(pmd, addr); + if (!pte) + break; + } =20 if (!is_swap_pte(*pte)) continue; @@ -1860,6 +1866,8 @@ static int unuse_pte_range(struct vm_area_struct *vma= , pmd_t *pmd, =20 offset =3D swp_offset(entry); pte_unmap(pte); + pte =3D NULL; + folio =3D swap_cache_get_folio(entry, vma, addr); if (!folio) { struct page *page; @@ -1878,8 +1886,7 @@ static int unuse_pte_range(struct vm_area_struct *vma= , pmd_t *pmd, if (!folio) { swp_count =3D READ_ONCE(si->swap_map[offset]); if (swp_count =3D=3D 0 || 
swp_count =3D=3D SWAP_MAP_BAD) - goto try_next; - + continue; return -ENOMEM; } =20 @@ -1889,20 +1896,17 @@ static int unuse_pte_range(struct vm_area_struct *v= ma, pmd_t *pmd, if (ret < 0) { folio_unlock(folio); folio_put(folio); - goto out; + return ret; } =20 folio_free_swap(folio); folio_unlock(folio); folio_put(folio); -try_next: - pte =3D pte_offset_map(pmd, addr); - } while (pte++, addr +=3D PAGE_SIZE, addr !=3D end); - pte_unmap(pte - 1); + } while (addr +=3D PAGE_SIZE, addr !=3D end); =20 - ret =3D 0; -out: - return ret; + if (pte) + pte_unmap(pte); + return 0; } =20 static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud, @@ -1917,8 +1921,6 @@ static inline int unuse_pmd_range(struct vm_area_stru= ct *vma, pud_t *pud, do { cond_resched(); next =3D pmd_addr_end(addr, end); - if (pmd_none_or_trans_huge_or_clear_bad(pmd)) - continue; ret =3D unuse_pte_range(vma, pmd, addr, next, type); if (ret) return ret; --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 98853C7EE23 for ; Mon, 22 May 2023 05:19:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231272AbjEVFTz (ORCPT ); Mon, 22 May 2023 01:19:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231947AbjEVFTt (ORCPT ); Mon, 22 May 2023 01:19:49 -0400 Received: from mail-yb1-xb2c.google.com (mail-yb1-xb2c.google.com [IPv6:2607:f8b0:4864:20::b2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 59C21A0 for ; Sun, 21 May 2023 22:19:48 -0700 (PDT) Received: by mail-yb1-xb2c.google.com with SMTP id 3f1490d57ef6-ba1815e12efso4782697276.3 for ; Sun, 21 May 2023 22:19:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684732787; x=1687324787; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=ooTKFpGgFIG6tpdJyTdSPdnhROWHUT8BFdRqrlDYiG0=; b=c16Z4sMpjKlgLDYv6Gs/y1EYdzJZLSgcmaTkBaIf9Gh+M6R5Fj28OTQhOSCRw7EEv5 CoXdOMLH5hzAb7ctwTVEzalqQl9a+Um/b9wFPM2bKdaAzb1WrTE0ZJDEimE6nNgaQriX ekKs2u5sm5n3wdVCVhRqBQ64w5hCu6jc8AAKpTKma9h0EDNZhsm+d/uAlNMvz+IWh9Vi IdWQzCcYU/nxwQLTQZI99miGWNLqkFHHVO984XkRjgpZ7eBUTR4yIiwEf+nz+MbqBbbu qrDeOVuSWS3vwtPVFhgFnPNPF1tXG11tH/YIDKTxJb7j0INA2XTKzLvkIppZ9XEZec1f V+Og== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684732787; x=1687324787; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ooTKFpGgFIG6tpdJyTdSPdnhROWHUT8BFdRqrlDYiG0=; b=DaTL7e5aD33N1gGI1a7WC6VVwlPGZb1UlzX0mp2ARQDjUT37Oe1u9GPay7Zn0Zo9ef lTbgPmJwJqCvY3bXqcCpznwoaH/zUC1jS6uWLm97JhlVURDrXYoPkWe+SlrTuNxVFmcb /aS7MyRu5Ojf6c+lBWKceYUhLFb59Rggg1S5u00SIBUTPL3OfHKO0ZW675F83cOehQFZ Qzp14pne63gvxukYbdKi8GNTfxJ0SVC+HTtVFGomd5p0KJUz2Rrz3Sb5I5FbCXkDD9xt GXj8mLpufM1EEw2jUcapMbhuhjImPfzBYTntFotI60cCoPBLHKvJ+LMzH1KdMimZw0lj Z21w== X-Gm-Message-State: AC+VfDx+IDSUnkZ/4y9kPCDLoxb0yG6drBx5sEstQtTzBVpVL7kbEAtG UaT+AVIw/X405i5OHF6B/H7oyw== X-Google-Smtp-Source: ACHHUZ4/iOac0pg6azmy6Y4+9ZIQvlTnkM+d3E/2233dtMLIq4uN1+AwZtw93iYqYZBi+HEHW+NHVQ== X-Received: by 2002:a0d:d743:0:b0:55a:59cb:4c1a with SMTP id 
z64-20020a0dd743000000b0055a59cb4c1amr9770012ywd.14.1684732787446; Sun, 21 May 2023 22:19:47 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id o9-20020a817309000000b00556aa81f615sm1809311ywc.68.2023.05.21.22.19.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:19:47 -0700 (PDT) Date: Sun, 21 May 2023 22:19:44 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 23/31] mm/mglru: allow pte_offset_map_nolock() to fail In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: <242721-1e64-845e-226a-bf2b2dc72dd@google.com> References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" MGLRU's walk_pte_range() use the safer pte_offset_map_nolock(), rather than pte_lockptr(), to get the ptl for its trylock. Just return false and move on to next extent if it fails, like when the trylock fails. Remove the VM_WARN_ON_ONCE(pmd_leaf) since that will happen, rarely. Signed-off-by: Hugh Dickins Acked-by: Yu Zhao --- mm/vmscan.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index d257916f39e5..1c344589c145 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3992,15 +3992,15 @@ static bool walk_pte_range(pmd_t *pmd, unsigned lon= g start, unsigned long end, struct pglist_data *pgdat =3D lruvec_pgdat(walk->lruvec); int old_gen, new_gen =3D lru_gen_from_seq(walk->max_seq); =20 - VM_WARN_ON_ONCE(pmd_leaf(*pmd)); - - ptl =3D pte_lockptr(args->mm, pmd); - if (!spin_trylock(ptl)) + pte =3D pte_offset_map_nolock(args->mm, pmd, start & PMD_MASK, &ptl); + if (!pte) return false; + if (!spin_trylock(ptl)) { + pte_unmap(pte); + return false; + } =20 arch_enter_lazy_mmu_mode(); - - pte =3D pte_offset_map(pmd, start & PMD_MASK); restart: for (i =3D pte_index(start), addr =3D start; addr !=3D end; i++, addr += =3D PAGE_SIZE) { unsigned long pfn; @@ -4041,10 +4041,8 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long= start, unsigned long end, if (i < PTRS_PER_PTE && get_next_vma(PMD_MASK, PAGE_SIZE, args, &start, &= end)) goto restart; =20 - pte_unmap(pte); - arch_leave_lazy_mmu_mode(); - spin_unlock(ptl); + pte_unmap_unlock(pte, ptl); =20 return suitable_to_scan(total, young); } --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C5054C7EE23 for ; Mon, 22 May 2023 05:21:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231574AbjEVFVL (ORCPT ); Mon, 22 May 2023 01:21:11 -0400 Received: from 
lindbergh.monkeyblade.net ([23.128.96.19]:57538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231950AbjEVFU6 (ORCPT ); Mon, 22 May 2023 01:20:58 -0400 Received: from mail-yb1-xb30.google.com (mail-yb1-xb30.google.com [IPv6:2607:f8b0:4864:20::b30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F15ED100 for ; Sun, 21 May 2023 22:20:56 -0700 (PDT) Received: by mail-yb1-xb30.google.com with SMTP id 3f1490d57ef6-babb985f9c8so3645159276.1 for ; Sun, 21 May 2023 22:20:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684732856; x=1687324856; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=j6etDg/swU7AOCalaMxD05iIweXIhmZHtm6GGNoHnzQ=; b=YFRKfsKFJp2z/omMF9kAApd0S0vxQuFTDhSSSfTCRKzoqcrub4PhM+O1boooV8+h+7 EnjnOkymlppoTkDw4nRAPps7rFoXTdCM2fk5vgu0nhCdGNkuTC3Ywg1Tg0z3smilw57a AYhYR543c86roHNURCr25oJ9c1fQFj3ujvNOV5N67eBFAEXegF1f2oEskky6yYw53RB6 1LMrFi8toFy6m/MnfxFA7DcCZIfsKlCmP01wxE3ks9bt8Xfi1FEm4Lsd/b5mwwUQxQvr vvJIRG6Xgs+dtaUMASMBUyR0SSOfetdWJ8Iw1sW/hI3XPXcU9B/tW4Jbs4RLezTnbIt4 iIvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684732856; x=1687324856; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=j6etDg/swU7AOCalaMxD05iIweXIhmZHtm6GGNoHnzQ=; b=LGqvEc0AcAmTqYUF03F2YMkBO6oYB2W7Zj8ZKsHm26Qg1M0ZuiC8WTtDcMfzbKgyyq Ht4BekgjBlgceZysA66ZFQDhOGgQj7qV082MQdZcowjtlFbPff7KsJ8ZFAEFS/ZTZ1+u SHfF35UAjzAi4ABzQitVFwUZmfS4IUsRsovniUYTmGB9Hb4A/CcBNvR8M2VQOZxpMLdN EKxAw16MJDMs0wDCwDQsuqrbpiyJY8mS7tqDAjTOe/COSZqC2/lkpVpCXL2RZkvnI5yT jECczQSSqpcruBbvPj6JMNgfeDqiv869pjtY8SSbb9T5NIoHw0wnWLHlkmZN+qZ0P8cZ EbOw== X-Gm-Message-State: AC+VfDz96Pt1qN22rmztFQvoAIdHR1G4abBebg+tbjhu8qhInfQqZwIy aLdPiVZ26RFqIPALrcl4vQl4+A== X-Google-Smtp-Source: ACHHUZ4unls6ogV9SvyZExwOGgG/jyFNvH+bFxzEVTKzBIPGmZGTzZNTpUXsl12MimVOmHUoz8BiUw== X-Received: by 2002:a0d:ea93:0:b0:559:f0ef:aac0 with SMTP id t141-20020a0dea93000000b00559f0efaac0mr10004964ywe.30.1684732856030; Sun, 21 May 2023 22:20:56 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id i16-20020a0ddf10000000b0054f80928ea4sm1795763ywe.140.2023.05.21.22.20.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:20:55 -0700 (PDT) Date: Sun, 21 May 2023 22:20:52 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. 
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 24/31] mm/migrate_device: allow pte_offset_map_lock() to fail In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" migrate_vma_collect_pmd(): remove the pmd_trans_unstable() handling after splitting huge zero pmd, and the pmd_none() handling after successfully splitting huge page: those are now managed inside pte_offset_map_lock(), and by "goto again" when it fails. But the skip after unsuccessful split_huge_page() must stay: it avoids an endless loop. The skip when pmd_bad()? Remove that: it will be treated as a hole rather than a skip once cleared by pte_offset_map_lock(), but with different timing that would be so anyway; and it's arguably best to leave the pmd_bad() handling centralized there. migrate_vma_insert_page(): remove comment on the old pte_offset_map() and old locking limitations; remove the pmd_trans_unstable() check and just proceed to pte_offset_map_lock(), aborting when it fails (page has now been charged to memcg, but that's so in other cases, and presumably uncharged later). Signed-off-by: Hugh Dickins Reviewed-by: Alistair Popple --- mm/migrate_device.c | 31 ++++--------------------------- 1 file changed, 4 insertions(+), 27 deletions(-) diff --git a/mm/migrate_device.c b/mm/migrate_device.c index d30c9de60b0d..a14af6b12b04 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -83,9 +83,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, if (is_huge_zero_page(page)) { spin_unlock(ptl); split_huge_pmd(vma, pmdp, addr); - if (pmd_trans_unstable(pmdp)) - return migrate_vma_collect_skip(start, end, - walk); } else { int ret; =20 @@ -100,16 +97,12 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, if (ret) return migrate_vma_collect_skip(start, end, walk); - if (pmd_none(*pmdp)) - return migrate_vma_collect_hole(start, end, -1, - walk); } } =20 - if (unlikely(pmd_bad(*pmdp))) - return migrate_vma_collect_skip(start, end, walk); - ptep =3D pte_offset_map_lock(mm, pmdp, addr, &ptl); + if (!ptep) + goto again; arch_enter_lazy_mmu_mode(); =20 for (; addr < end; addr +=3D PAGE_SIZE, ptep++) { @@ -595,27 +588,10 @@ static void migrate_vma_insert_page(struct migrate_vm= a *migrate, pmdp =3D pmd_alloc(mm, pudp, addr); if (!pmdp) goto abort; - if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp)) goto abort; - - /* - * Use pte_alloc() instead of pte_alloc_map(). We can't run - * pte_offset_map() on pmds where a huge pmd might be created - * from a different thread. - * - * pte_alloc_map() is safe to use under mmap_write_lock(mm) or when - * parallel threads are excluded by other means. - * - * Here we only have mmap_read_lock(mm). 
- */ if (pte_alloc(mm, pmdp)) goto abort; - - /* See the comment in pte_alloc_one_map() */ - if (unlikely(pmd_trans_unstable(pmdp))) - goto abort; - if (unlikely(anon_vma_prepare(vma))) goto abort; if (mem_cgroup_charge(page_folio(page), vma->vm_mm, GFP_KERNEL)) @@ -650,7 +626,8 @@ static void migrate_vma_insert_page(struct migrate_vma = *migrate, } =20 ptep =3D pte_offset_map_lock(mm, pmdp, addr, &ptl); - + if (!ptep) + goto abort; if (check_stable_address_space(mm)) goto unlock_abort; =20 --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8D201C77B75 for ; Mon, 22 May 2023 05:22:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231956AbjEVFWY (ORCPT ); Mon, 22 May 2023 01:22:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58006 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231958AbjEVFWS (ORCPT ); Mon, 22 May 2023 01:22:18 -0400 Received: from mail-yb1-xb2f.google.com (mail-yb1-xb2f.google.com [IPv6:2607:f8b0:4864:20::b2f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A8347CA for ; Sun, 21 May 2023 22:22:17 -0700 (PDT) Received: by mail-yb1-xb2f.google.com with SMTP id 3f1490d57ef6-ba829f93da3so8148830276.1 for ; Sun, 21 May 2023 22:22:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684732937; x=1687324937; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=bf4ovlbip9o2eAgzZYzeIq4fNWAxMs47VtohjgZo/Hc=; b=xxliRdiYq491ddtLAOp7IvlP6Tgs09+WixlM6YG4PIP9hmkihY34Mp7pqS8UXaXWhi RqkBsAewEsCIX9b0ejcCa5E2jV6uN+a3W57TZ3FlG9ZI/Xxbjh+C1MSqcwhmQFBu25PN rRzcJXLhWTtkRDhu7m269tBf4pXwAXhsMjAzMdb8HFMIiX2bUkp/5SZfLmQ8JxH4T0vk wKyQPeorZ9iGKOGhWx6oCTIHOln7Bg8pdmjG1PPm7ouSR3zthhZvslly93YlThiYWoSt Uq6if1E3nt9EtldOeJnlIf2uspBMjaO0EQ7qGUle/0SCGanLsme+EJIaES6VTaJRp/ZR yfYQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684732937; x=1687324937; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=bf4ovlbip9o2eAgzZYzeIq4fNWAxMs47VtohjgZo/Hc=; b=XbgnM2eD0eOUhKeTJb/YcZu+b7myoXKFCdSuRo1xLr1rJRyPucBV/tNQjRpafiVBqE AmBYdR55lX4wMxlp7YIdKrrzAZq8LYovhWLkuc2uLHqia0ElNsYeR8hPQetQuLRUqO2p LduEJpg/DpFF6/sVHIv3T6NN8yXoC/Fdp+qUf4Neyi7OOKq1oJgq0KqgaAYQEdVOsSgF etd7qqvPuP6Z3kdQ58JI+swGrggIlT49dIDujJ4eZTiGImaQ3jzQeGljh2cwP28ZOZ7A KUe7fmsHN99LPsWnzNDyJJ0kN5Y3HlyG6Hdb8Iy3LJww9NiAiYPb8OYY27Pz+WjenG7g y/aQ== X-Gm-Message-State: AC+VfDweAb3yJIMOD7a0CuEeZnlsgaUn8NHR/FQ9ee0swtUxdxKy6pjx gz/0FtdgNe0wfIUTrk6y7JfByw== X-Google-Smtp-Source: ACHHUZ6vQXHbZLVvPerE0Fu3jg6OZ5EHbThB/3RtX3VnGbimP3VA+F9C2VHWinuyuwC+mVweYmLjPw== X-Received: by 2002:a25:f812:0:b0:ba8:3590:4302 with SMTP id u18-20020a25f812000000b00ba835904302mr10377914ybd.36.1684732936746; Sun, 21 May 2023 22:22:16 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id x7-20020a259a07000000b00b8f6ec5a955sm1266497ybn.49.2023.05.21.22.22.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:22:16 -0700 (PDT) Date: Sun, 21 May 2023 22:22:13 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 25/31] mm/gup: remove FOLL_SPLIT_PMD use of pmd_trans_unstable() In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There is now no reason for follow_pmd_mask()'s FOLL_SPLIT_PMD block to distinguish huge_zero_page from a normal THP: follow_page_pte() handles any instability, and here it's a good idea to replace any pmd_none(*pmd) by a page table a.s.a.p, in the huge_zero_page case as for a normal THP. (Hmm, couldn't the normal THP case have hit an unstably refaulted THP before? But there are only two, exceptional, users of FOLL_SPLIT_PMD.) Signed-off-by: Hugh Dickins --- mm/gup.c | 19 ++++--------------- 1 file changed, 4 insertions(+), 15 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index bb67193c5460..4ad50a59897f 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -681,21 +681,10 @@ static struct page *follow_pmd_mask(struct vm_area_st= ruct *vma, return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); } if (flags & FOLL_SPLIT_PMD) { - int ret; - page =3D pmd_page(*pmd); - if (is_huge_zero_page(page)) { - spin_unlock(ptl); - ret =3D 0; - split_huge_pmd(vma, pmd, address); - if (pmd_trans_unstable(pmd)) - ret =3D -EBUSY; - } else { - spin_unlock(ptl); - split_huge_pmd(vma, pmd, address); - ret =3D pte_alloc(mm, pmd) ? -ENOMEM : 0; - } - - return ret ? ERR_PTR(ret) : + spin_unlock(ptl); + split_huge_pmd(vma, pmd, address); + /* If pmd was left empty, stuff a page table in there quickly */ + return pte_alloc(mm, pmd) ? 
ERR_PTR(-ENOMEM) : follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); } page =3D follow_trans_huge_pmd(vma, address, pmd, flags); --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A4923C77B75 for ; Mon, 22 May 2023 05:23:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231967AbjEVFXd (ORCPT ); Mon, 22 May 2023 01:23:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229559AbjEVFXb (ORCPT ); Mon, 22 May 2023 01:23:31 -0400 Received: from mail-yb1-xb30.google.com (mail-yb1-xb30.google.com [IPv6:2607:f8b0:4864:20::b30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DAE4CAA for ; Sun, 21 May 2023 22:23:29 -0700 (PDT) Received: by mail-yb1-xb30.google.com with SMTP id 3f1490d57ef6-ba1815e12efso4784744276.3 for ; Sun, 21 May 2023 22:23:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684733009; x=1687325009; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=lzQPoLjWXuP3+Ka40j/FX+cgzbLo3MyR/78LE/Otflc=; b=OqBANCRQzvanaas2fLT/vDBYw2XZ+pto0Z3I0g1aU/R1OYHuij2QrMo2xy+Rr6C35x l86kCF5UL52/GbeJlQ6bHEBRSpOVELHuSaKXNybOZhqbs+opPxUSDPmtJNEm+jrMaklN ft3L6EaR3I8ziuMCU5sNY2OaZ+zGmtKvbNZ9Tsv9F9B6wECrNMIjTpzlO5d+Z255LDVp yEk+S1bLKTQfkqEB9/oNMA15Vbb0/7X2Ek3v5V8dnbfTlUQxMGQNWKaElfOhfI5B5peu qIbnRaHYRIRCfQSqqRnyf0yf2fu8VPlbHJkeNdMJAA3hkVwFBfoZ10oZW3ZVmkHCSkfx eCng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684733009; x=1687325009; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=lzQPoLjWXuP3+Ka40j/FX+cgzbLo3MyR/78LE/Otflc=; b=S4jBqKp8JdodbskzYJHzAPTd8jDWsDM1BgE1xI82F5bq828e2GpRDt7Eg1wqjgWRhu 91QFXHBeHj31sfJSHJv2ESVugylnbHWHp0TACqa3M4MvYrqeF0SdqIoNFTZB82QbvHP1 alpAvXUF8tvn3pCRm0N4ZprPUk40vdwgAgYb1WNSPIa9fgPb1V/p6GvOc5kskZ/MOFZN X8+YNuFmRSVcwNhNDBuRGnS04dr7kUrlJFJn40S+K4vAlgCxwAGAopd/C7xSX3B6uH0g gUI7HnoWZee2CgangzwXCGjqps4m0zZSuT9a7fXtKDpu5AOXjO6u6Ib5AQueanHJ61Qb VjEg== X-Gm-Message-State: AC+VfDwZGcSaJ6kt9oxSOqeHiTfGyy/tAUQ3zHah2E2h3CvxSQFqbzme 7h5QgzCsrHX97oKgnuOutUOqRw== X-Google-Smtp-Source: ACHHUZ7qqz3U4uOnP6Ec8LA+Qz8AEibK4IaZAP5jwPoGGf6Fy3SqZkPf36XnHstJeJW2gmcrXMZvlw== X-Received: by 2002:a25:d2c5:0:b0:ba7:20a:3967 with SMTP id j188-20020a25d2c5000000b00ba7020a3967mr11645090ybg.43.1684733009011; Sun, 21 May 2023 22:23:29 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id g10-20020a056902134a00b00b9db62abff3sm1277036ybu.58.2023.05.21.22.23.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:23:28 -0700 (PDT) Date: Sun, 21 May 2023 22:23:25 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. 
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 26/31] mm/huge_memory: split huge pmd under one pte_offset_map() In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: <3f442a9c-af6d-573d-1ad1-f6f413b1abc9@google.com> References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" __split_huge_zero_page_pmd() use a single pte_offset_map() to sweep the extent: it's already under pmd_lock(), so this is no worse for latency; and since it's supposed to have full control of the just-withdrawn page table, here choose to VM_BUG_ON if it were to fail. And please don't increment haddr by PAGE_SIZE, that should remain huge aligned: declare a separate addr (not a bugfix, but it was deceptive). __split_huge_pmd_locked() likewise (but it had declared a separate addr); and change its BUG_ON(!pte_none) to VM_BUG_ON, for consistency with zero (those deposited page tables are sometimes victims of random corruption). Signed-off-by: Hugh Dickins Reviewed-by: Yang Shi --- mm/huge_memory.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d4bd5fa7c823..839c13fa0bbe 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2037,6 +2037,8 @@ static void __split_huge_zero_page_pmd(struct vm_area= _struct *vma, struct mm_struct *mm =3D vma->vm_mm; pgtable_t pgtable; pmd_t _pmd, old_pmd; + unsigned long addr; + pte_t *pte; int i; =20 /* @@ -2052,17 +2054,20 @@ static void __split_huge_zero_page_pmd(struct vm_ar= ea_struct *vma, pgtable =3D pgtable_trans_huge_withdraw(mm, pmd); pmd_populate(mm, &_pmd, pgtable); =20 - for (i =3D 0; i < HPAGE_PMD_NR; i++, haddr +=3D PAGE_SIZE) { - pte_t *pte, entry; - entry =3D pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot); + pte =3D pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte); + for (i =3D 0, addr =3D haddr; i < HPAGE_PMD_NR; i++, addr +=3D PAGE_SIZE)= { + pte_t entry; + + entry =3D pfn_pte(my_zero_pfn(addr), vma->vm_page_prot); entry =3D pte_mkspecial(entry); if (pmd_uffd_wp(old_pmd)) entry =3D pte_mkuffd_wp(entry); - pte =3D pte_offset_map(&_pmd, haddr); VM_BUG_ON(!pte_none(*pte)); - set_pte_at(mm, haddr, pte, entry); - pte_unmap(pte); + set_pte_at(mm, addr, pte, entry); + pte++; } + pte_unmap(pte - 1); smp_wmb(); /* make pte visible before pmd */ pmd_populate(mm, pmd, pgtable); } @@ -2077,6 +2082,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, bool young, write, soft_dirty, pmd_migration =3D false, uffd_wp =3D false; bool anon_exclusive =3D false, dirty =3D false; unsigned long addr; + pte_t *pte; int i; =20 VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); @@ -2205,8 +2211,10 @@ static void __split_huge_pmd_locked(struct vm_area_s= truct *vma, pmd_t *pmd, pgtable =3D pgtable_trans_huge_withdraw(mm, pmd); pmd_populate(mm, &_pmd, pgtable); =20 + pte =3D pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte); for (i =3D 0, addr =3D haddr; i < 
HPAGE_PMD_NR; i++, addr +=3D PAGE_SIZE)= { - pte_t entry, *pte; + pte_t entry; /* * Note that NUMA hinting access restrictions are not * transferred to avoid any possibility of altering @@ -2249,11 +2257,11 @@ static void __split_huge_pmd_locked(struct vm_area_= struct *vma, pmd_t *pmd, entry =3D pte_mkuffd_wp(entry); page_add_anon_rmap(page + i, vma, addr, false); } - pte =3D pte_offset_map(&_pmd, addr); - BUG_ON(!pte_none(*pte)); + VM_BUG_ON(!pte_none(*pte)); set_pte_at(mm, addr, pte, entry); - pte_unmap(pte); + pte++; } + pte_unmap(pte - 1); =20 if (!pmd_migration) page_remove_rmap(page, vma, true); --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4AFCFC7EE29 for ; Mon, 22 May 2023 05:24:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231971AbjEVFYh (ORCPT ); Mon, 22 May 2023 01:24:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59152 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229559AbjEVFYf (ORCPT ); Mon, 22 May 2023 01:24:35 -0400 Received: from mail-yw1-x1131.google.com (mail-yw1-x1131.google.com [IPv6:2607:f8b0:4864:20::1131]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 48BEDF9 for ; Sun, 21 May 2023 22:24:34 -0700 (PDT) Received: by mail-yw1-x1131.google.com with SMTP id 00721157ae682-561c1436c75so78162957b3.1 for ; Sun, 21 May 2023 22:24:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684733073; x=1687325073; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=9aUzicIbTynIqE9awb1KUIJU7FYy6E6iJhesHPP8TVA=; b=FCyyq681fiCl9+MbURk3V7VS6u/az3wH03OAXe0MvSy7niV7M+xCTcD3LFVHKqg1yQ ike7yvjl4pZ53eTIQqFm0t5pi4r6AsMUJr4uYJMWzl0YHx8ZPfzSAps1Y7Xpi29TZEWv oF3xEY1+E9kXiv/OXuel6wSjjlu4EU8KZchqn8ox3RdwN6yhtrdyc37IOWvkIHSmyayR wIwLertYdChnzsX/BWtNMS+OritTekYale8j0KLTkGM2OoyaSuh4ZrnXK1pihTZaxeir zEXvAqI7WXP5/kZeI5P2fpOTxrVe/RXgYq45Sc6JeGL/JmIWIWy78rlaXhstJBzFaLu3 cGzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684733073; x=1687325073; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=9aUzicIbTynIqE9awb1KUIJU7FYy6E6iJhesHPP8TVA=; b=PIqUenp1QqXMWlcsejB5NWwwUUArP4lLVhBSwEfDc+dqDfYDNn2oq1FIsCktv4Mq6q e+tyl7fg9VVQTGQJnL3ov9viN1TpkDt/SjO3sMPC+ue1dA/ke2E7c9HdmMBBMqZGKxjn G9mzJgNpJObKmEwo+HbzQeY0fv3lqvl1hf1sHhpMuQBOwmHvht4QxeLzcEpvU6WMM83I wkK0G26OmmtmgY5JOqHnU9a5LwEWDDjynPlYPJfi+JxGJu5el4gaCml48S78dA5GzIVk f4+VMF+6AcRhfoKTxFTlsODhggyaD/4c5HnAwzeFY7GAAx4NJ+vPgDJ5ySlOmeveqskE wrPw== X-Gm-Message-State: AC+VfDwXRrmRByIWTUKr1TEGc5KX26WoLf21vkN/uHDav6HWMmAxWBXQ DbwzSknkP4PPlnp8pO5h+Q7J9g== X-Google-Smtp-Source: ACHHUZ5k0AE/kz26DrRpHOYfkL1f8tXo8j4DB4kKzH+Hlu6k3+yZrZ26Tvg/gU/iOROW9T7xGJmWkw== X-Received: by 2002:a81:8a01:0:b0:561:b5cc:e10a with SMTP id a1-20020a818a01000000b00561b5cce10amr9111246ywg.6.1684733073356; Sun, 21 May 2023 22:24:33 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id g189-20020a8152c6000000b00555e1886350sm1840019ywb.78.2023.05.21.22.24.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:24:33 -0700 (PDT) Date: Sun, 21 May 2023 22:24:29 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 27/31] mm/khugepaged: allow pte_offset_map[_lock]() to fail In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" __collapse_huge_page_swapin(): don't drop the map after every pte, it only has to be dropped by do_swap_page(); give up if pte_offset_map() fails; trace_mm_collapse_huge_page_swapin() at the end, with result; fix comment on returned result; fix vmf.pgoff, though it's not used. collapse_huge_page(): use pte_offset_map_lock() on the _pmd returned from clearing; allow failure, but it should be impossible there. hpage_collapse_scan_pmd() and collapse_pte_mapped_thp() allow for pte_offset_map_lock() failure. Signed-off-by: Hugh Dickins Reviewed-by: Yang Shi --- mm/khugepaged.c | 72 +++++++++++++++++++++++++++++++++---------------- 1 file changed, 49 insertions(+), 23 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 732f9ac393fc..49cfa7cdfe93 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -993,9 +993,8 @@ static int check_pmd_still_valid(struct mm_struct *mm, * Only done if hpage_collapse_scan_pmd believes it is worthwhile. * * Called and returns without pte mapped or spinlocks held. - * Note that if false is returned, mmap_lock will be released. + * Returns result: if not SCAN_SUCCEED, mmap_lock has been released. 
*/ - static int __collapse_huge_page_swapin(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd, @@ -1004,23 +1003,35 @@ static int __collapse_huge_page_swapin(struct mm_st= ruct *mm, int swapped_in =3D 0; vm_fault_t ret =3D 0; unsigned long address, end =3D haddr + (HPAGE_PMD_NR * PAGE_SIZE); + int result; + pte_t *pte =3D NULL; =20 for (address =3D haddr; address < end; address +=3D PAGE_SIZE) { struct vm_fault vmf =3D { .vma =3D vma, .address =3D address, - .pgoff =3D linear_page_index(vma, haddr), + .pgoff =3D linear_page_index(vma, address), .flags =3D FAULT_FLAG_ALLOW_RETRY, .pmd =3D pmd, }; =20 - vmf.pte =3D pte_offset_map(pmd, address); - vmf.orig_pte =3D *vmf.pte; - if (!is_swap_pte(vmf.orig_pte)) { - pte_unmap(vmf.pte); - continue; + if (!pte++) { + pte =3D pte_offset_map(pmd, address); + if (!pte) { + mmap_read_unlock(mm); + result =3D SCAN_PMD_NULL; + goto out; + } } + + vmf.orig_pte =3D *pte; + if (!is_swap_pte(vmf.orig_pte)) + continue; + + vmf.pte =3D pte; ret =3D do_swap_page(&vmf); + /* Which unmaps pte (after perhaps re-checking the entry) */ + pte =3D NULL; =20 /* * do_swap_page returns VM_FAULT_RETRY with released mmap_lock. @@ -1029,24 +1040,29 @@ static int __collapse_huge_page_swapin(struct mm_st= ruct *mm, * resulting in later failure. */ if (ret & VM_FAULT_RETRY) { - trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0); /* Likely, but not guaranteed, that page lock failed */ - return SCAN_PAGE_LOCK; + result =3D SCAN_PAGE_LOCK; + goto out; } if (ret & VM_FAULT_ERROR) { mmap_read_unlock(mm); - trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0); - return SCAN_FAIL; + result =3D SCAN_FAIL; + goto out; } swapped_in++; } =20 + if (pte) + pte_unmap(pte); + /* Drain LRU add pagevec to remove extra pin on the swapped in pages */ if (swapped_in) lru_add_drain(); =20 - trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 1); - return SCAN_SUCCEED; + result =3D SCAN_SUCCEED; +out: + trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, result); + return result; } =20 static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm, @@ -1146,9 +1162,6 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, address + HPAGE_PMD_SIZE); mmu_notifier_invalidate_range_start(&range); =20 - pte =3D pte_offset_map(pmd, address); - pte_ptl =3D pte_lockptr(mm, pmd); - pmd_ptl =3D pmd_lock(mm, pmd); /* probably unnecessary */ /* * This removes any huge TLB entry from the CPU so we won't allow @@ -1163,13 +1176,18 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, mmu_notifier_invalidate_range_end(&range); tlb_remove_table_sync_one(); =20 - spin_lock(pte_ptl); - result =3D __collapse_huge_page_isolate(vma, address, pte, cc, - &compound_pagelist); - spin_unlock(pte_ptl); + pte =3D pte_offset_map_lock(mm, &_pmd, address, &pte_ptl); + if (pte) { + result =3D __collapse_huge_page_isolate(vma, address, pte, cc, + &compound_pagelist); + spin_unlock(pte_ptl); + } else { + result =3D SCAN_PMD_NULL; + } =20 if (unlikely(result !=3D SCAN_SUCCEED)) { - pte_unmap(pte); + if (pte) + pte_unmap(pte); spin_lock(pmd_ptl); BUG_ON(!pmd_none(*pmd)); /* @@ -1253,6 +1271,11 @@ static int hpage_collapse_scan_pmd(struct mm_struct = *mm, memset(cc->node_load, 0, sizeof(cc->node_load)); nodes_clear(cc->alloc_nmask); pte =3D pte_offset_map_lock(mm, pmd, address, &ptl); + if (!pte) { + result =3D SCAN_PMD_NULL; + goto out; + } + for (_address =3D address, _pte =3D pte; _pte < pte + 
HPAGE_PMD_NR; _pte++, _address +=3D PAGE_SIZE) { pte_t pteval =3D *_pte; @@ -1622,8 +1645,10 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, un= signed long addr, * lockless_pages_from_mm() and the hardware page walker can access page * tables while all the high-level locks are held in write mode. */ - start_pte =3D pte_offset_map_lock(mm, pmd, haddr, &ptl); result =3D SCAN_FAIL; + start_pte =3D pte_offset_map_lock(mm, pmd, haddr, &ptl); + if (!start_pte) + goto drop_immap; =20 /* step 1: check all mapped PTEs are to the right huge page */ for (i =3D 0, addr =3D haddr, pte =3D start_pte; @@ -1697,6 +1722,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, uns= igned long addr, =20 abort: pte_unmap_unlock(start_pte, ptl); +drop_immap: i_mmap_unlock_write(vma->vm_file->f_mapping); goto drop_hpage; } --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B631C7EE23 for ; Mon, 22 May 2023 05:25:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231976AbjEVFZg (ORCPT ); Mon, 22 May 2023 01:25:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59554 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229777AbjEVFZb (ORCPT ); Mon, 22 May 2023 01:25:31 -0400 Received: from mail-yb1-xb31.google.com (mail-yb1-xb31.google.com [IPv6:2607:f8b0:4864:20::b31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B15FEAA for ; Sun, 21 May 2023 22:25:29 -0700 (PDT) Received: by mail-yb1-xb31.google.com with SMTP id 3f1490d57ef6-b9a7e639656so10801845276.0 for ; Sun, 21 May 2023 22:25:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684733129; x=1687325129; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=NBl0ppCRA8l25HwCsBAYjZw+ZZ23COv4vEuhAo6MgQs=; b=Cd3nHwxaC/nh6HEQWhU6+zHLQfimMyJGpQmSRQFhhW/GPP873cSkYFOiS5uVG1tykM Ryshue+/dZ9gOhxIxMYC4NRprObjBBWLSBCffW9HayG++u4h8tLgWq5NECMQwyEn52g4 13GfY4O2VekWiqVxfPenJhxKSjbH9x7XqjT7KEDC9iyAKT/BVUfykNPLsCku8blfINLG XrU8Vy7pHm8Cw4m+KEkicTGibPauERe+Pk26G/c3KRes7WKtltas6Fuf97+EzswuIMvR NjmExJ27Xjr53poHCBGkRWEPVuUwRpPQE56LGrkD5LJsjNMZNI4MUNIfvWEA+VYa9OvY 6Vyg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684733129; x=1687325129; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=NBl0ppCRA8l25HwCsBAYjZw+ZZ23COv4vEuhAo6MgQs=; b=ZG2h7xmMLO0zlVnx8Jwvydn+5SFq673bfVjTpmGW1EEFbNnczaHOi83MTXXEflPpo+ LNxsUgRZFgXbtgv9CBAh8tn2CccldJ2wcnfKAW83KZpDuhdgA7ppOHQZHCi2kdaWN5fz zQlt4mzpWrX6q7bcKLt0VxwFN69OGRKsXT+atWhUxJBO5o2ciBHHXBGrKX97jIil3GJX mTz2MQKzYket6np77rCZ2Ajwu+6d5HGgfUVwidF4h94t+E7HF0qXH2efer7icGYD5ING V1YNrT7HlDydEXl+Szri9w8DYr67A285K/z0X2ENWnkdnLPR56/Spnm1urtKQcibcScU 1TCQ== X-Gm-Message-State: AC+VfDzyjDrWSbH2XC/PyhLR3cauZWiSkmJhTR1qrXqR05RsMysM+SIy 5//+hsuNI54ULUcCvWct4un4Gw== X-Google-Smtp-Source: ACHHUZ73cLvyzajJn77JRRVp9Omy0BAayWsdbmLAc2xDyGvyYdthxJkt0VkVvx7l16HAMXlwEB2XiA== X-Received: by 2002:a05:6902:1101:b0:ba7:3df3:6df5 with SMTP id o1-20020a056902110100b00ba73df36df5mr12099433ybu.38.1684733128711; Sun, 21 May 2023 22:25:28 -0700 (PDT) Received: from ripple.attlocal.net 
(172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id x7-20020a259a07000000b00b8f6ec5a955sm1267873ybn.49.2023.05.21.22.25.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:25:28 -0700 (PDT) Date: Sun, 21 May 2023 22:25:25 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 28/31] mm/memory: allow pte_offset_map[_lock]() to fail In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" copy_pte_range(): use pte_offset_map_nolock(), and allow for it to fail; but with a comment on some further assumptions that are being made there. zap_pte_range() and zap_pmd_range(): adjust their interaction so that a pte_offset_map_lock() failure in zap_pte_range() leads to a retry in zap_pmd_range(); remove call to pmd_none_or_trans_huge_or_clear_bad(). Allow pte_offset_map_lock() to fail in many functions. Update comment on calling pte_alloc() in do_anonymous_page(). Remove redundant calls to pmd_trans_unstable(), pmd_devmap_trans_unstable(), pmd_none() and pmd_bad(); but leave pmd_none_or_clear_bad() calls in free_pmd_range() and copy_pmd_range(), those do simplify the next level down. Signed-off-by: Hugh Dickins --- mm/memory.c | 172 +++++++++++++++++++++++++--------------------------- 1 file changed, 82 insertions(+), 90 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 2eb54c0d5d3c..c7b920291a72 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1012,13 +1012,25 @@ copy_pte_range(struct vm_area_struct *dst_vma, stru= ct vm_area_struct *src_vma, progress =3D 0; init_rss_vec(rss); =20 + /* + * copy_pmd_range()'s prior pmd_none_or_clear_bad(src_pmd), and the + * error handling here, assume that exclusive mmap_lock on dst and src + * protects anon from unexpected THP transitions; with shmem and file + * protected by mmap_lock-less collapse skipping areas with anon_vma + * (whereas vma_needs_copy() skips areas without anon_vma). A rework + * can remove such assumptions later, but this is good enough for now. 
+ */ dst_pte =3D pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl); if (!dst_pte) { ret =3D -ENOMEM; goto out; } - src_pte =3D pte_offset_map(src_pmd, addr); - src_ptl =3D pte_lockptr(src_mm, src_pmd); + src_pte =3D pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl); + if (!src_pte) { + pte_unmap_unlock(dst_pte, dst_ptl); + /* ret =3D=3D 0 */ + goto out; + } spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); orig_src_pte =3D src_pte; orig_dst_pte =3D dst_pte; @@ -1083,8 +1095,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct= vm_area_struct *src_vma, } while (dst_pte++, src_pte++, addr +=3D PAGE_SIZE, addr !=3D end); =20 arch_leave_lazy_mmu_mode(); - spin_unlock(src_ptl); - pte_unmap(orig_src_pte); + pte_unmap_unlock(orig_src_pte, src_ptl); add_mm_rss_vec(dst_mm, rss); pte_unmap_unlock(orig_dst_pte, dst_ptl); cond_resched(); @@ -1388,10 +1399,11 @@ static unsigned long zap_pte_range(struct mmu_gathe= r *tlb, swp_entry_t entry; =20 tlb_change_page_size(tlb, PAGE_SIZE); -again: init_rss_vec(rss); - start_pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); - pte =3D start_pte; + start_pte =3D pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!pte) + return addr; + flush_tlb_batched_pending(mm); arch_enter_lazy_mmu_mode(); do { @@ -1507,17 +1519,10 @@ static unsigned long zap_pte_range(struct mmu_gathe= r *tlb, * If we forced a TLB flush (either due to running out of * batch buffers or because we needed to flush dirty TLB * entries before releasing the ptl), free the batched - * memory too. Restart if we didn't do everything. + * memory too. Come back again if we didn't do everything. */ - if (force_flush) { - force_flush =3D 0; + if (force_flush) tlb_flush_mmu(tlb); - } - - if (addr !=3D end) { - cond_resched(); - goto again; - } =20 return addr; } @@ -1536,8 +1541,10 @@ static inline unsigned long zap_pmd_range(struct mmu= _gather *tlb, if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) { if (next - addr !=3D HPAGE_PMD_SIZE) __split_huge_pmd(vma, pmd, addr, false, NULL); - else if (zap_huge_pmd(tlb, vma, pmd, addr)) - goto next; + else if (zap_huge_pmd(tlb, vma, pmd, addr)) { + addr =3D next; + continue; + } /* fall through */ } else if (details && details->single_folio && folio_test_pmd_mappable(details->single_folio) && @@ -1550,20 +1557,14 @@ static inline unsigned long zap_pmd_range(struct mm= u_gather *tlb, */ spin_unlock(ptl); } - - /* - * Here there can be other concurrent MADV_DONTNEED or - * trans huge page faults running, and if the pmd is - * none or trans huge it can change under us. This is - * because MADV_DONTNEED holds the mmap_lock in read - * mode. 
- */ - if (pmd_none_or_trans_huge_or_clear_bad(pmd)) - goto next; - next =3D zap_pte_range(tlb, vma, pmd, addr, next, details); -next: - cond_resched(); - } while (pmd++, addr =3D next, addr !=3D end); + if (pmd_none(*pmd)) { + addr =3D next; + continue; + } + addr =3D zap_pte_range(tlb, vma, pmd, addr, next, details); + if (addr !=3D next) + pmd--; + } while (pmd++, cond_resched(), addr !=3D end); =20 return addr; } @@ -1905,6 +1906,10 @@ static int insert_pages(struct vm_area_struct *vma, = unsigned long addr, const int batch_size =3D min_t(int, pages_to_write_in_pmd, 8); =20 start_pte =3D pte_offset_map_lock(mm, pmd, addr, &pte_lock); + if (!start_pte) { + ret =3D -EFAULT; + goto out; + } for (pte =3D start_pte; pte_idx < batch_size; ++pte, ++pte_idx) { int err =3D insert_page_in_batch_locked(vma, pte, addr, pages[curr_page_idx], prot); @@ -2572,10 +2577,10 @@ static int apply_to_pte_range(struct mm_struct *mm,= pmd_t *pmd, mapped_pte =3D pte =3D (mm =3D=3D &init_mm) ? pte_offset_kernel(pmd, addr) : pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!pte) + return -EINVAL; } =20 - BUG_ON(pmd_huge(*pmd)); - arch_enter_lazy_mmu_mode(); =20 if (fn) { @@ -2804,7 +2809,6 @@ static inline int __wp_page_copy_user(struct page *ds= t, struct page *src, int ret; void *kaddr; void __user *uaddr; - bool locked =3D false; struct vm_area_struct *vma =3D vmf->vma; struct mm_struct *mm =3D vma->vm_mm; unsigned long addr =3D vmf->address; @@ -2830,12 +2834,12 @@ static inline int __wp_page_copy_user(struct page *= dst, struct page *src, * On architectures with software "accessed" bits, we would * take a double page fault, so mark it accessed here. */ + vmf->pte =3D NULL; if (!arch_has_hw_pte_young() && !pte_young(vmf->orig_pte)) { pte_t entry; =20 vmf->pte =3D pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl); - locked =3D true; - if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) { + if (unlikely(!vmf->pte || !pte_same(*vmf->pte, vmf->orig_pte))) { /* * Other thread has already handled the fault * and update local tlb only @@ -2857,13 +2861,12 @@ static inline int __wp_page_copy_user(struct page *= dst, struct page *src, * zeroes. */ if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) { - if (locked) + if (vmf->pte) goto warn; =20 /* Re-validate under PTL if the page is still mapped */ vmf->pte =3D pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl); - locked =3D true; - if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) { + if (unlikely(!vmf->pte || !pte_same(*vmf->pte, vmf->orig_pte))) { /* The PTE changed under us, update local tlb */ update_mmu_tlb(vma, addr, vmf->pte); ret =3D -EAGAIN; @@ -2888,7 +2891,7 @@ static inline int __wp_page_copy_user(struct page *ds= t, struct page *src, ret =3D 0; =20 pte_unlock: - if (locked) + if (vmf->pte) pte_unmap_unlock(vmf->pte, vmf->ptl); kunmap_atomic(kaddr); flush_dcache_page(dst); @@ -3110,7 +3113,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * Re-check the pte - we dropped the lock */ vmf->pte =3D pte_offset_map_lock(mm, vmf->pmd, vmf->address, &vmf->ptl); - if (likely(pte_same(*vmf->pte, vmf->orig_pte))) { + if (likely(vmf->pte && pte_same(*vmf->pte, vmf->orig_pte))) { if (old_folio) { if (!folio_test_anon(old_folio)) { dec_mm_counter(mm, mm_counter_file(&old_folio->page)); @@ -3178,19 +3181,20 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) /* Free the old page.. 
*/ new_folio =3D old_folio; page_copied =3D 1; - } else { + pte_unmap_unlock(vmf->pte, vmf->ptl); + } else if (vmf->pte) { update_mmu_tlb(vma, vmf->address, vmf->pte); + pte_unmap_unlock(vmf->pte, vmf->ptl); } =20 - if (new_folio) - folio_put(new_folio); - - pte_unmap_unlock(vmf->pte, vmf->ptl); /* * No need to double call mmu_notifier->invalidate_range() callback as * the above ptep_clear_flush_notify() did already call it. */ mmu_notifier_invalidate_range_only_end(&range); + + if (new_folio) + folio_put(new_folio); if (old_folio) { if (page_copied) free_swap_cache(&old_folio->page); @@ -3230,6 +3234,8 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf) WARN_ON_ONCE(!(vmf->vma->vm_flags & VM_SHARED)); vmf->pte =3D pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + return VM_FAULT_NOPAGE; /* * We might have raced with another page fault while we released the * pte_offset_map_lock. @@ -3591,10 +3597,11 @@ static vm_fault_t remove_device_exclusive_entry(str= uct vm_fault *vmf) =20 vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (likely(pte_same(*vmf->pte, vmf->orig_pte))) + if (likely(vmf->pte && pte_same(*vmf->pte, vmf->orig_pte))) restore_exclusive_pte(vma, vmf->page, vmf->address, vmf->pte); =20 - pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->pte) + pte_unmap_unlock(vmf->pte, vmf->ptl); folio_unlock(folio); folio_put(folio); =20 @@ -3625,6 +3632,8 @@ static vm_fault_t pte_marker_clear(struct vm_fault *v= mf) { vmf->pte =3D pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + return 0; /* * Be careful so that we will only recover a special uffd-wp pte into a * none pte. Otherwise it means the pte could have changed, so retry. @@ -3728,11 +3737,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) vmf->page =3D pfn_swap_entry_to_page(entry); vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { - spin_unlock(vmf->ptl); - goto out; - } - + if (unlikely(!vmf->pte || + !pte_same(*vmf->pte, vmf->orig_pte))) + goto unlock; /* * Get a page reference while we know the page can't be * freed. @@ -3807,7 +3814,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) */ vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (likely(pte_same(*vmf->pte, vmf->orig_pte))) + if (likely(vmf->pte && pte_same(*vmf->pte, vmf->orig_pte))) ret =3D VM_FAULT_OOM; goto unlock; } @@ -3877,7 +3884,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) */ vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) + if (unlikely(!vmf->pte || !pte_same(*vmf->pte, vmf->orig_pte))) goto out_nomap; =20 if (unlikely(!folio_test_uptodate(folio))) { @@ -4003,13 +4010,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) /* No need to invalidate - it was non-present before */ update_mmu_cache(vma, vmf->address, vmf->pte); unlock: - pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->pte) + pte_unmap_unlock(vmf->pte, vmf->ptl); out: if (si) put_swap_device(si); return ret; out_nomap: - pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->pte) + pte_unmap_unlock(vmf->pte, vmf->ptl); out_page: folio_unlock(folio); out_release: @@ -4041,22 +4050,12 @@ static vm_fault_t do_anonymous_page(struct vm_fault= *vmf) return VM_FAULT_SIGBUS; =20 /* - * Use pte_alloc() instead of pte_alloc_map(). 
We can't run - * pte_offset_map() on pmds where a huge pmd might be created - * from a different thread. - * - * pte_alloc_map() is safe to use under mmap_write_lock(mm) or when - * parallel threads are excluded by other means. - * - * Here we only have mmap_read_lock(mm). + * Use pte_alloc() instead of pte_alloc_map(), so that OOM can + * be distinguished from a transient failure of pte_offset_map(). */ if (pte_alloc(vma->vm_mm, vmf->pmd)) return VM_FAULT_OOM; =20 - /* See comment in handle_pte_fault() */ - if (unlikely(pmd_trans_unstable(vmf->pmd))) - return 0; - /* Use the zero-page for reads */ if (!(vmf->flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(vma->vm_mm)) { @@ -4064,6 +4063,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) vma->vm_page_prot)); vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + goto unlock; if (vmf_pte_changed(vmf)) { update_mmu_tlb(vma, vmf->address, vmf->pte); goto unlock; @@ -4104,6 +4105,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) =20 vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + goto release; if (vmf_pte_changed(vmf)) { update_mmu_tlb(vma, vmf->address, vmf->pte); goto release; @@ -4131,7 +4134,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) /* No need to invalidate - it was non-present before */ update_mmu_cache(vma, vmf->address, vmf->pte); unlock: - pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->pte) + pte_unmap_unlock(vmf->pte, vmf->ptl); return ret; release: folio_put(folio); @@ -4380,15 +4384,10 @@ vm_fault_t finish_fault(struct vm_fault *vmf) return VM_FAULT_OOM; } =20 - /* - * See comment in handle_pte_fault() for how this scenario happens, we - * need to return NOPAGE so that we drop this page. - */ - if (pmd_devmap_trans_unstable(vmf->pmd)) - return VM_FAULT_NOPAGE; - vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + return VM_FAULT_NOPAGE; =20 /* Re-check under ptl */ if (likely(!vmf_pte_changed(vmf))) { @@ -4630,17 +4629,11 @@ static vm_fault_t do_fault(struct vm_fault *vmf) * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */ if (!vma->vm_ops->fault) { - /* - * If we find a migration pmd entry or a none pmd entry, which - * should never happen, return SIGBUS - */ - if (unlikely(!pmd_present(*vmf->pmd))) + vmf->pte =3D pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, + vmf->address, &vmf->ptl); + if (unlikely(!vmf->pte)) ret =3D VM_FAULT_SIGBUS; else { - vmf->pte =3D pte_offset_map_lock(vmf->vma->vm_mm, - vmf->pmd, - vmf->address, - &vmf->ptl); /* * Make sure this is not a temporary clearing of pte * by holding ptl and checking again. 
A R/M/W update @@ -5429,10 +5422,9 @@ int follow_pte(struct mm_struct *mm, unsigned long a= ddress, pmd =3D pmd_offset(pud, address); VM_BUG_ON(pmd_trans_huge(*pmd)); =20 - if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd))) - goto out; - ptep =3D pte_offset_map_lock(mm, pmd, address, ptlp); + if (!ptep) + goto out; if (!pte_present(*ptep)) goto unlock; *ptepp =3D ptep; --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0EC45C77B75 for ; Mon, 22 May 2023 05:26:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231982AbjEVF0s (ORCPT ); Mon, 22 May 2023 01:26:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60018 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229777AbjEVF0q (ORCPT ); Mon, 22 May 2023 01:26:46 -0400 Received: from mail-yb1-xb36.google.com (mail-yb1-xb36.google.com [IPv6:2607:f8b0:4864:20::b36]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0BBCFA8 for ; Sun, 21 May 2023 22:26:45 -0700 (PDT) Received: by mail-yb1-xb36.google.com with SMTP id 3f1490d57ef6-ba86ec8047bso8190201276.3 for ; Sun, 21 May 2023 22:26:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684733204; x=1687325204; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=6V2+W0bna42qnZ43tvOf68yFS9Jt1uoq2b6WWLaZ+Rg=; b=pmFZ8eUZ5rWyUxNcjOAbP+PUX2zOF/4/kxaRJz4z7PKZCL9sh91Sp1fz2BDSpLA8va PshjIXySjEbJBMm+E5T0DV3GV1KfmGHUHW7NoxKfPJtzpKWDj4nZyySG6hAbDZeMssbe CpZvDb9v4tBFY4XeT7TABIe79HTh891UxQUA304GYCwmuhQf0Ddrievqmcn4LZJQZqo+ AyQ3f5iFkop7qEDUFMOD3A4f3L3NVlS89hjrz223HyzYGxeKrVNBksEbzG6kBQ75cK67 0udH2XswWGkcnuzDyqRllpE286zOHOwA3ju/MkmzjXbIYSmo/1Xbs4kh16I1DKzJVpox VJug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684733204; x=1687325204; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=6V2+W0bna42qnZ43tvOf68yFS9Jt1uoq2b6WWLaZ+Rg=; b=SLjp1DXd5GogZ6IBSNoErJXShZSId/X3GPp9UjFnmyXFjdir+LmVLz6Pb+e/pjUdVt Zf+KZCFbhWxngbMpdIQQaS0TyjLQB2iCJSTfHO0N5LGiDzV7Y9tZqcG8p6iC8OA81A31 g/0eyNnotTtQcXB7WaWXSG/513oyHwtb8BJFsM9ztCOFFtCr62YBQn76zXcNVFGTQfql 5Q3caVFG7FXMP/22J+NrNM6aQSEBO8yGlzX4WhgTCaE2m8mi5o0qN6di9BzpzV1CbcNN 1XxaQSCBoXJhPb1OpuLYHU29CT7WujF23+0cRZXLIqSU0WqrcE1DA3SELJMgA1e11NMS gV2A== X-Gm-Message-State: AC+VfDx5vVYshF3OyGDoNE1+ZcS3h5yKbeW6veR2Fjca1YyLtlHqQLwL MRz2dv+zSkai+hy9qqtlUKLJQw== X-Google-Smtp-Source: ACHHUZ6vQMBkQllrmMVkhqi7UHm68ru97kbfrpxbzO+74u3FaXGIYp1Xs+So1aiUfDsa50JYfZkYNw== X-Received: by 2002:a81:9383:0:b0:55a:7c7:c756 with SMTP id k125-20020a819383000000b0055a07c7c756mr11286138ywg.31.1684733204133; Sun, 21 May 2023 22:26:44 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id d206-20020a814fd7000000b0054605c23114sm1832452ywb.66.2023.05.21.22.26.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:26:43 -0700 (PDT) Date: Sun, 21 May 2023 22:26:40 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. 
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 29/31] mm/memory: handle_pte_fault() use pte_offset_map_nolock() In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: <5f10e87-c413-eb92-fc6-541e52c1f6be@google.com> References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" handle_pte_fault() use pte_offset_map_nolock() to get the vmf.ptl which corresponds to vmf.pte, instead of pte_lockptr() being used later, when there's a chance that the pmd entry might have changed, perhaps to none, or to a huge pmd, with no split ptlock in its struct page. Remove its pmd_devmap_trans_unstable() call: pte_offset_map_nolock() will handle that case by failing. Update the "morph" comment above, looking forward to when shmem or file collapse to THP may not take mmap_lock for write (or not at all). do_numa_page() use the vmf->ptl from handle_pte_fault() at first, but refresh it when refreshing vmf->pte. do_swap_page()'s pte_unmap_same() (the thing that takes ptl to verify a two-part PAE orig_pte) use the vmf->ptl from handle_pte_fault() too; but do_swap_page() is also used by anon THP's __collapse_huge_page_swapin(), so adjust that to set vmf->ptl by pte_offset_map_nolock(). 
Signed-off-by: Hugh Dickins --- mm/khugepaged.c | 6 ++++-- mm/memory.c | 38 +++++++++++++------------------------- 2 files changed, 17 insertions(+), 27 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 49cfa7cdfe93..c11db2e78e95 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1005,6 +1005,7 @@ static int __collapse_huge_page_swapin(struct mm_stru= ct *mm, unsigned long address, end =3D haddr + (HPAGE_PMD_NR * PAGE_SIZE); int result; pte_t *pte =3D NULL; + spinlock_t *ptl; =20 for (address =3D haddr; address < end; address +=3D PAGE_SIZE) { struct vm_fault vmf =3D { @@ -1016,7 +1017,7 @@ static int __collapse_huge_page_swapin(struct mm_stru= ct *mm, }; =20 if (!pte++) { - pte =3D pte_offset_map(pmd, address); + pte =3D pte_offset_map_nolock(mm, pmd, address, &ptl); if (!pte) { mmap_read_unlock(mm); result =3D SCAN_PMD_NULL; @@ -1024,11 +1025,12 @@ static int __collapse_huge_page_swapin(struct mm_st= ruct *mm, } } =20 - vmf.orig_pte =3D *pte; + vmf.orig_pte =3D ptep_get_lockless(pte); if (!is_swap_pte(vmf.orig_pte)) continue; =20 vmf.pte =3D pte; + vmf.ptl =3D ptl; ret =3D do_swap_page(&vmf); /* Which unmaps pte (after perhaps re-checking the entry) */ pte =3D NULL; diff --git a/mm/memory.c b/mm/memory.c index c7b920291a72..4ec46eecefd3 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2786,10 +2786,9 @@ static inline int pte_unmap_same(struct vm_fault *vm= f) int same =3D 1; #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPTION) if (sizeof(pte_t) > sizeof(unsigned long)) { - spinlock_t *ptl =3D pte_lockptr(vmf->vma->vm_mm, vmf->pmd); - spin_lock(ptl); + spin_lock(vmf->ptl); same =3D pte_same(*vmf->pte, vmf->orig_pte); - spin_unlock(ptl); + spin_unlock(vmf->ptl); } #endif pte_unmap(vmf->pte); @@ -4696,7 +4695,6 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) * validation through pte_unmap_same(). It's of NUMA type but * the pfn may be screwed if the read is non atomic. */ - vmf->ptl =3D pte_lockptr(vma->vm_mm, vmf->pmd); spin_lock(vmf->ptl); if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { pte_unmap_unlock(vmf->pte, vmf->ptl); @@ -4767,8 +4765,10 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) flags |=3D TNF_MIGRATED; } else { flags |=3D TNF_MIGRATE_FAIL; - vmf->pte =3D pte_offset_map(vmf->pmd, vmf->address); - spin_lock(vmf->ptl); + vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, + vmf->address, &vmf->ptl); + if (unlikely(!vmf->pte)) + goto out; if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { pte_unmap_unlock(vmf->pte, vmf->ptl); goto out; @@ -4897,27 +4897,16 @@ static vm_fault_t handle_pte_fault(struct vm_fault = *vmf) vmf->pte =3D NULL; vmf->flags &=3D ~FAULT_FLAG_ORIG_PTE_VALID; } else { - /* - * If a huge pmd materialized under us just retry later. Use - * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead - * of pmd_trans_huge() to ensure the pmd didn't become - * pmd_trans_huge under us and then back to pmd_none, as a - * result of MADV_DONTNEED running immediately after a huge pmd - * fault in a different thread of this mm, in turn leading to a - * misleading pmd_trans_huge() retval. All we have to ensure is - * that it is a regular pmd that we can walk with - * pte_offset_map() and we can do that through an atomic read - * in C, which is what pmd_trans_unstable() provides. - */ - if (pmd_devmap_trans_unstable(vmf->pmd)) - return 0; /* * A regular pmd is established and it can't morph into a huge - * pmd from under us anymore at this point because we hold the - * mmap_lock read mode and khugepaged takes it in write mode. 
- * So now it's safe to run pte_offset_map(). + * pmd by anon khugepaged, since that takes mmap_lock in write + * mode; but shmem or file collapse to THP could still morph + * it into a huge pmd: just retry later if so. */ - vmf->pte =3D pte_offset_map(vmf->pmd, vmf->address); + vmf->pte =3D pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd, + vmf->address, &vmf->ptl); + if (unlikely(!vmf->pte)) + return 0; vmf->orig_pte =3D ptep_get_lockless(vmf->pte); vmf->flags |=3D FAULT_FLAG_ORIG_PTE_VALID; =20 @@ -4936,7 +4925,6 @@ static vm_fault_t handle_pte_fault(struct vm_fault *v= mf) if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) return do_numa_page(vmf); =20 - vmf->ptl =3D pte_lockptr(vmf->vma->vm_mm, vmf->pmd); spin_lock(vmf->ptl); entry =3D vmf->orig_pte; if (unlikely(!pte_same(*vmf->pte, entry))) { --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DB4ADC77B75 for ; Mon, 22 May 2023 05:28:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230119AbjEVF2G (ORCPT ); Mon, 22 May 2023 01:28:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60670 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229777AbjEVF2D (ORCPT ); Mon, 22 May 2023 01:28:03 -0400 Received: from mail-yw1-x1135.google.com (mail-yw1-x1135.google.com [IPv6:2607:f8b0:4864:20::1135]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DE0C5A8 for ; Sun, 21 May 2023 22:28:01 -0700 (PDT) Received: by mail-yw1-x1135.google.com with SMTP id 00721157ae682-561e5014336so44501597b3.1 for ; Sun, 21 May 2023 22:28:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684733281; x=1687325281; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=5QGRbISmNdiSrcRCSB3pJg16AgA7v+5ZDTS/B09Kn8o=; b=fVaNEVFFiBF4weUE59/pRbRilwRxpePXgogsf2ieUkcR0MbQmK9TAJMKwiY+Nl3wQv zKHe//Tgn/dCTvP7G0yKMDDCqiRigP7TN+cRzAhOy91u60ZDaLJj9pISGxGf7L/o9RQV RIdEDl0ysDGYdG9CQkCaefawPcW0Pu46VUAhGlWTpeI7kCMeXW2zSJd4jXBBPMxr+Sk7 BBLjIT+TptnFiPfiUV09WlYuflTUtcZ1khoOvaNMb55o7+CG56Ax+prHX5PMQDGLuiD5 5JiMmNrkUrTjybBbOntUDonvOeyTUvUndYAkHD0GpeLxqXMXoE3ar/hPCiH8QI5EkB9C IHZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684733281; x=1687325281; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=5QGRbISmNdiSrcRCSB3pJg16AgA7v+5ZDTS/B09Kn8o=; b=amRV71fttttPb/ZxO5fCWtBl3Ur3yh46nHS8RcaGEnvN3mkmiX4jUY5sKjqn9OVoRf p+jG9WsElZMhtBDlRLKhYNqMRmZTiR9INpSVkCoTthdVz/K/KItG1BscqJoV/T55H2fO jQf8XbDpH0W1WzDXtQiohwN2NL2y6RGYos+9iT5DarvPqPb5psZQFbdh3vcqAr36hLuO RW0LvTd/yPCcVxIhok15c9DwuVWypjMT1NU+bsbhpmFHXTZMRnCqdMaPo9WygLno+iMU kyTrBOoYwWe68vs9Wg2O0ulk7rON4Va6SQJTKEUOf5UFxOaEb+bHwiR0niRzOPbOub2o 76TA== X-Gm-Message-State: AC+VfDwKTMx5mvg68DSTMBCn65jjPHPr9x2lmD5Hq7+aoe3cIdt5KJi1 TXLUZwTy8AI2fRJiaPrPuJDjLQ== X-Google-Smtp-Source: ACHHUZ7e18kwUgmVMtp2lWAMCU5Id6EnMaAFL78obg4zT8QsdVgO5fiVzPuIAxqLocTFuTV77qh/ag== X-Received: by 2002:a81:a044:0:b0:561:e2df:c4d1 with SMTP id x65-20020a81a044000000b00561e2dfc4d1mr10124450ywg.9.1684733280842; Sun, 21 May 2023 22:28:00 -0700 (PDT) Received: from ripple.attlocal.net 
(172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id k131-20020a816f89000000b0055a416529bbsm1831188ywc.24.2023.05.21.22.27.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:28:00 -0700 (PDT) Date: Sun, 21 May 2023 22:27:57 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 30/31] mm/pgtable: delete pmd_trans_unstable() and friends In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Delete pmd_trans_unstable, pmd_none_or_trans_huge_or_clear_bad() and pmd_devmap_trans_unstable(), all now unused. With mixed feelings, delete all the comments on pmd_trans_unstable(). That was very good documentation of a subtle state, and this series does not even eliminate that state: but rather, normalizes and extends it, asking pte_offset_map[_lock]() callers to anticipate failure, without regard for whether mmap_read_lock() or mmap_write_lock() is held. Retain pud_trans_unstable(), which has one use in __handle_mm_fault(), but delete its equivalent pud_none_or_trans_huge_or_dev_or_clear_bad(). While there, move the default arch_needs_pgtable_deposit() definition up near where pgtable_trans_huge_deposit() and withdraw() are declared. Signed-off-by: Hugh Dickins --- include/linux/pgtable.h | 103 +++------------------------------------- mm/khugepaged.c | 4 -- 2 files changed, 7 insertions(+), 100 deletions(-) diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 3fabbb018557..a1326e61d7ee 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -599,6 +599,10 @@ extern void pgtable_trans_huge_deposit(struct mm_struc= t *mm, pmd_t *pmdp, extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *= pmdp); #endif =20 +#ifndef arch_needs_pgtable_deposit +#define arch_needs_pgtable_deposit() (false) +#endif + #ifdef CONFIG_TRANSPARENT_HUGEPAGE /* * This is an implementation of pmdp_establish() that is only suitable for= an @@ -1300,9 +1304,10 @@ static inline int pud_trans_huge(pud_t pud) } #endif =20 -/* See pmd_none_or_trans_huge_or_clear_bad for discussion. */ -static inline int pud_none_or_trans_huge_or_dev_or_clear_bad(pud_t *pud) +static inline int pud_trans_unstable(pud_t *pud) { +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ + defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) pud_t pudval =3D READ_ONCE(*pud); =20 if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval)) @@ -1311,104 +1316,10 @@ static inline int pud_none_or_trans_huge_or_dev_or= _clear_bad(pud_t *pud) pud_clear_bad(pud); return 1; } - return 0; -} - -/* See pmd_trans_unstable for discussion. 
*/ -static inline int pud_trans_unstable(pud_t *pud) -{ -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ - defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) - return pud_none_or_trans_huge_or_dev_or_clear_bad(pud); -#else - return 0; #endif -} - -#ifndef arch_needs_pgtable_deposit -#define arch_needs_pgtable_deposit() (false) -#endif -/* - * This function is meant to be used by sites walking pagetables with - * the mmap_lock held in read mode to protect against MADV_DONTNEED and - * transhuge page faults. MADV_DONTNEED can convert a transhuge pmd - * into a null pmd and the transhuge page fault can convert a null pmd - * into an hugepmd or into a regular pmd (if the hugepage allocation - * fails). While holding the mmap_lock in read mode the pmd becomes - * stable and stops changing under us only if it's not null and not a - * transhuge pmd. When those races occurs and this function makes a - * difference vs the standard pmd_none_or_clear_bad, the result is - * undefined so behaving like if the pmd was none is safe (because it - * can return none anyway). The compiler level barrier() is critically - * important to compute the two checks atomically on the same pmdval. - * - * For 32bit kernels with a 64bit large pmd_t this automatically takes - * care of reading the pmd atomically to avoid SMP race conditions - * against pmd_populate() when the mmap_lock is hold for reading by the - * caller (a special atomic read not done by "gcc" as in the generic - * version above, is also needed when THP is disabled because the page - * fault can populate the pmd from under us). - */ -static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) -{ - pmd_t pmdval =3D pmdp_get_lockless(pmd); - /* - * !pmd_present() checks for pmd migration entries - * - * The complete check uses is_pmd_migration_entry() in linux/swapops.h - * But using that requires moving current function and pmd_trans_unstable= () - * to linux/swapops.h to resolve dependency, which is too much code move. - * - * !pmd_present() is equivalent to is_pmd_migration_entry() currently, - * because !pmd_present() pages can only be under migration not swapped - * out. - * - * pmd_none() is preserved for future condition checks on pmd migration - * entries and not confusing with this function name, although it is - * redundant with !pmd_present(). - */ - if (pmd_none(pmdval) || pmd_trans_huge(pmdval) || - (IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval))) - return 1; - if (unlikely(pmd_bad(pmdval))) { - pmd_clear_bad(pmd); - return 1; - } return 0; } =20 -/* - * This is a noop if Transparent Hugepage Support is not built into - * the kernel. Otherwise it is equivalent to - * pmd_none_or_trans_huge_or_clear_bad(), and shall only be called in - * places that already verified the pmd is not none and they want to - * walk ptes while holding the mmap sem in read mode (write mode don't - * need this). If THP is not enabled, the pmd can't go away under the - * code even if MADV_DONTNEED runs, but if THP is enabled we need to - * run a pmd_trans_unstable before walking the ptes after - * split_huge_pmd returns (because it may have run when the pmd become - * null, but then a page fault can map in a THP and not a regular page). - */ -static inline int pmd_trans_unstable(pmd_t *pmd) -{ -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - return pmd_none_or_trans_huge_or_clear_bad(pmd); -#else - return 0; -#endif -} - -/* - * the ordering of these checks is important for pmds with _page_devmap se= t. 
- * if we check pmd_trans_unstable() first we will trip the bad_pmd() check - * inside of pmd_none_or_trans_huge_or_clear_bad(). this will end up corre= ctly - * returning 1 but not before it spams dmesg with the pmd_clear_bad() outp= ut. - */ -static inline int pmd_devmap_trans_unstable(pmd_t *pmd) -{ - return pmd_devmap(*pmd) || pmd_trans_unstable(pmd); -} - #ifndef CONFIG_NUMA_BALANCING /* * Technically a PTE can be PROTNONE even when not doing NUMA balancing but diff --git a/mm/khugepaged.c b/mm/khugepaged.c index c11db2e78e95..1083f0e38a07 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -946,10 +946,6 @@ static int hugepage_vma_revalidate(struct mm_struct *m= m, unsigned long address, return SCAN_SUCCEED; } =20 -/* - * See pmd_trans_unstable() for how the result may change out from - * underneath us, even if we hold mmap_lock in read. - */ static int find_pmd_or_thp_or_none(struct mm_struct *mm, unsigned long address, pmd_t **pmd) --=20 2.35.3 From nobody Sat Feb 7 19:45:35 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00EC7C77B75 for ; Mon, 22 May 2023 05:29:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231995AbjEVF3P (ORCPT ); Mon, 22 May 2023 01:29:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32900 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229470AbjEVF3M (ORCPT ); Mon, 22 May 2023 01:29:12 -0400 Received: from mail-yw1-x112d.google.com (mail-yw1-x112d.google.com [IPv6:2607:f8b0:4864:20::112d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A1ECAA8 for ; Sun, 21 May 2023 22:29:11 -0700 (PDT) Received: by mail-yw1-x112d.google.com with SMTP id 00721157ae682-55db055b412so46112237b3.0 for ; Sun, 21 May 2023 22:29:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684733351; x=1687325351; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=++qSmbn7bhjg3RT15iglmwv4R1gESGfbvlyrc9z9nPg=; b=KUYquKPQFLz686sxc05IvSA2pptVmZo25juuDJPh5PU0X8mcUoMX85X6dQWcuCeLiy uZgK8UCVFiR0+JLDfI+BG2fZ4jpaUFacf9y4wexEGbBLRrMU7EGQ2WyCANVks/wifwjW WIWR1uH1YZ2eCpYGacQpMGZqDG1Rast16Vhqw6loQySVQm459lCNaPM2V4pX8ejuYDHJ iWokAPVtiNyJ1VEM+GanV1g70rryZ2Zn0pEeEaFvrZDQKTZaXrD5lh2IK6/6AvLj649e dxMB/Rriy/3qD/arXkivaOaOUE1co2k6CyJ03pNexh99nj5a3n2SbOmvYztGkHfCQKXx DWrw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684733351; x=1687325351; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=++qSmbn7bhjg3RT15iglmwv4R1gESGfbvlyrc9z9nPg=; b=QELWOKdeHEuDJWAOisdqC+w+lPOSfvHFt/6s7MecIq25umeWB3gmwYSCSrl6UeL2WM 8w47JIWszwxeeKrcc9ajh7mpv8/KYf56FxbWGBc5AIHVanxt/a4n2ajTz/JAx+9K3H7q yTmcJoiykV37qKxefOkxQf3opfaLp4G8suwmP+PdWRZc5PONlY0v66nKSrJSkNeAXG77 aaSuUHleZOaAcudZjj2UtqnUI+kbmrL5jUi46dRLtvQYLWaD8iXqOxSPN8XIrKTWYdOn qst6A3E4e5pzRRJtR8vavdLfgUcPPYM6BulqniH1sbJ1gYTo44bGayg8dckJon3hGtXc v5CA== X-Gm-Message-State: AC+VfDymliUZORtXohisVzO4clPzDIrtjSYzRcN4XKo7leDoH7709+Lr DnSInEH8YzUrHIAKQcoaMeRuUA== X-Google-Smtp-Source: ACHHUZ58DrOuEAqS21o31T6NkFTUrmPfkTQChFezlC+fyR5u9eV8KPIJhLKCfEeaQKdNd4WYmfpBlw== X-Received: by 2002:a0d:cac7:0:b0:55a:2084:9e05 with SMTP id 
m190-20020a0dcac7000000b0055a20849e05mr10730696ywd.23.1684733350763; Sun, 21 May 2023 22:29:10 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id x67-20020a81a046000000b0054fcbf35b94sm1832465ywg.87.2023.05.21.22.29.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 May 2023 22:29:10 -0700 (PDT) Date: Sun, 21 May 2023 22:29:07 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 31/31] perf/core: Allow pte_offset_map() to fail In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In rare transient cases, not yet made possible, pte_offset_map() and pte_offset_map_lock() may not find a page table: handle appropriately. Signed-off-by: Hugh Dickins --- This is a perf patch, not an mm patch, and it will want to go in through the tip tree in due course; but keep it in this series for now, so that it's not missed, and not submitted before mm review. kernel/events/core.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/events/core.c b/kernel/events/core.c index db016e418931..174be710f3b3 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7490,6 +7490,7 @@ static u64 perf_get_pgtable_size(struct mm_struct *mm= , unsigned long addr) return pud_leaf_size(pud); =20 pmdp =3D pmd_offset_lockless(pudp, pud, addr); +again: pmd =3D pmdp_get_lockless(pmdp); if (!pmd_present(pmd)) return 0; @@ -7498,6 +7499,9 @@ static u64 perf_get_pgtable_size(struct mm_struct *mm= , unsigned long addr) return pmd_leaf_size(pmd); =20 ptep =3D pte_offset_map(&pmd, addr); + if (!ptep) + goto again; + pte =3D ptep_get_lockless(ptep); if (pte_present(pte)) size =3D pte_leaf_size(pte); --=20 2.35.3
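Postscript: a minimal sketch of the calling convention that the patches above converge on, illustrative only and not taken from any patch in the series. The function name example_walk_ptes() is invented; the page table helpers are the ones used throughout the diffs.

	/*
	 * Illustrative sketch: after this series, pte_offset_map_lock() may
	 * return NULL when there is no page table there (or when the pmd
	 * changed underneath); the caller decides whether that means a hole,
	 * a retry, or an abort.
	 */
	static int example_walk_ptes(struct mm_struct *mm, pmd_t *pmd,
				     unsigned long addr, unsigned long end)
	{
		spinlock_t *ptl;
		pte_t *start_pte, *pte;

		start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
		if (!pte)
			return -EAGAIN;		/* no page table: hole, retry or abort */

		for (; addr < end; pte++, addr += PAGE_SIZE) {
			pte_t entry = *pte;

			if (pte_none(entry))
				continue;
			/* ... act on entry: pte is mapped and ptl is held ... */
		}
		pte_unmap_unlock(start_pte, ptl);
		return 0;
	}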