From: Hugh Dickins
Date: Thu, 8 Jun 2023 18:06:53 -0700 (PDT)
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 01/32] mm: use pmdp_get_lockless() without surplus barrier() In-Reply-To: Message-ID: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Use pmdp_get_lockless() in preference to READ_ONCE(*pmdp), to get a more reliable result with PAE (or READ_ONCE as before without PAE); and remove the unnecessary extra barrier()s which got left behind in its callers. HOWEVER: Note the small print in linux/pgtable.h, where it was designed specifically for fast GUP, and depends on interrupts being disabled for its full guarantee: most callers which have been added (here and before) do NOT have interrupts disabled, so there is still some need for caution. Signed-off-by: Hugh Dickins Acked-by: Yu Zhao Acked-by: Peter Xu --- fs/userfaultfd.c | 10 +--------- include/linux/pgtable.h | 17 ----------------- mm/gup.c | 6 +----- mm/hmm.c | 2 +- mm/khugepaged.c | 5 ----- mm/ksm.c | 3 +-- mm/memory.c | 14 ++------------ mm/mprotect.c | 5 ----- mm/page_vma_mapped.c | 2 +- 9 files changed, 7 insertions(+), 57 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 0fd96d6e39ce..f7a0817b1ec0 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -349,15 +349,7 @@ static inline bool userfaultfd_must_wait(struct userfa= ultfd_ctx *ctx, if (!pud_present(*pud)) goto out; pmd =3D pmd_offset(pud, address); - /* - * READ_ONCE must function as a barrier with narrower scope - * and it must be equivalent to: - * _pmd =3D *pmd; barrier(); - * - * This is to deal with the instability (as in - * pmd_trans_unstable) of the pmd. - */ - _pmd =3D READ_ONCE(*pmd); + _pmd =3D pmdp_get_lockless(pmd); if (pmd_none(_pmd)) goto out; =20 diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index c5a51481bbb9..8ec27fe69dc8 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1344,23 +1344,6 @@ static inline int pud_trans_unstable(pud_t *pud) static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) { pmd_t pmdval =3D pmdp_get_lockless(pmd); - /* - * The barrier will stabilize the pmdval in a register or on - * the stack so that it will stop changing under the code. - * - * When CONFIG_TRANSPARENT_HUGEPAGE=3Dy on x86 32bit PAE, - * pmdp_get_lockless is allowed to return a not atomic pmdval - * (for example pointing to an hugepage that has never been - * mapped in the pmd). The below checks will only care about - * the low part of the pmd with 32bit PAE x86 anyway, with the - * exception of pmd_none(). So the important thing is that if - * the low part of the pmd is found null, the high part will - * be also null or the pmd_none() check below would be - * confused. 
- */ -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - barrier(); -#endif /* * !pmd_present() checks for pmd migration entries * diff --git a/mm/gup.c b/mm/gup.c index bbe416236593..3bd5d3854c51 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -653,11 +653,7 @@ static struct page *follow_pmd_mask(struct vm_area_str= uct *vma, struct mm_struct *mm =3D vma->vm_mm; =20 pmd =3D pmd_offset(pudp, address); - /* - * The READ_ONCE() will stabilize the pmdval in a register or - * on the stack so that it will stop changing under the code. - */ - pmdval =3D READ_ONCE(*pmd); + pmdval =3D pmdp_get_lockless(pmd); if (pmd_none(pmdval)) return no_page_table(vma, flags); if (!pmd_present(pmdval)) diff --git a/mm/hmm.c b/mm/hmm.c index 6a151c09de5e..e23043345615 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -332,7 +332,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, pmd_t pmd; =20 again: - pmd =3D READ_ONCE(*pmdp); + pmd =3D pmdp_get_lockless(pmdp); if (pmd_none(pmd)) return hmm_vma_walk_hole(start, end, -1, walk); =20 diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 6b9d39d65b73..732f9ac393fc 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -961,11 +961,6 @@ static int find_pmd_or_thp_or_none(struct mm_struct *m= m, return SCAN_PMD_NULL; =20 pmde =3D pmdp_get_lockless(*pmd); - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - /* See comments in pmd_none_or_trans_huge_or_clear_bad() */ - barrier(); -#endif if (pmd_none(pmde)) return SCAN_PMD_NONE; if (!pmd_present(pmde)) diff --git a/mm/ksm.c b/mm/ksm.c index 0156bded3a66..df2aa281d49d 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1194,8 +1194,7 @@ static int replace_page(struct vm_area_struct *vma, s= truct page *page, * without holding anon_vma lock for write. So when looking for a * genuine pmde (in which to find pte), test present and !THP together. */ - pmde =3D *pmd; - barrier(); + pmde =3D pmdp_get_lockless(pmd); if (!pmd_present(pmde) || pmd_trans_huge(pmde)) goto out; =20 diff --git a/mm/memory.c b/mm/memory.c index f69fbc251198..2eb54c0d5d3c 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4925,18 +4925,9 @@ static vm_fault_t handle_pte_fault(struct vm_fault *= vmf) * So now it's safe to run pte_offset_map(). */ vmf->pte =3D pte_offset_map(vmf->pmd, vmf->address); - vmf->orig_pte =3D *vmf->pte; + vmf->orig_pte =3D ptep_get_lockless(vmf->pte); vmf->flags |=3D FAULT_FLAG_ORIG_PTE_VALID; =20 - /* - * some architectures can have larger ptes than wordsize, - * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=3Dy and - * CONFIG_32BIT=3Dy, so READ_ONCE cannot guarantee atomic - * accesses. The code below just needs a consistent view - * for the ifs and we later double check anyway with the - * ptl lock held. So here a barrier will do. 
- */ - barrier(); if (pte_none(vmf->orig_pte)) { pte_unmap(vmf->pte); vmf->pte =3D NULL; @@ -5060,9 +5051,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_st= ruct *vma, if (!(ret & VM_FAULT_FALLBACK)) return ret; } else { - vmf.orig_pmd =3D *vmf.pmd; + vmf.orig_pmd =3D pmdp_get_lockless(vmf.pmd); =20 - barrier(); if (unlikely(is_swap_pmd(vmf.orig_pmd))) { VM_BUG_ON(thp_migration_supported() && !is_pmd_migration_entry(vmf.orig_pmd)); diff --git a/mm/mprotect.c b/mm/mprotect.c index 92d3d3ca390a..c5a13c0f1017 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -309,11 +309,6 @@ static inline int pmd_none_or_clear_bad_unless_trans_h= uge(pmd_t *pmd) { pmd_t pmdval =3D pmdp_get_lockless(pmd); =20 - /* See pmd_none_or_trans_huge_or_clear_bad for info on barrier */ -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - barrier(); -#endif - if (pmd_none(pmdval)) return 1; if (pmd_trans_huge(pmdval)) diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 4e448cfbc6ef..64aff6718bdb 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -210,7 +210,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *= pvmw) * compiler and used as a stale value after we've observed a * subsequent update. */ - pmde =3D READ_ONCE(*pvmw->pmd); + pmde =3D pmdp_get_lockless(pvmw->pmd); =20 if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) || (pmd_present(pmde) && pmd_devmap(pmde))) { --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE106C7EE25 for ; Fri, 9 Jun 2023 01:08:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229595AbjFIBIb (ORCPT ); Thu, 8 Jun 2023 21:08:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59974 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229520AbjFIBI2 (ORCPT ); Thu, 8 Jun 2023 21:08:28 -0400 Received: from mail-qk1-x729.google.com (mail-qk1-x729.google.com [IPv6:2607:f8b0:4864:20::729]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EBF5CE47 for ; Thu, 8 Jun 2023 18:08:25 -0700 (PDT) Received: by mail-qk1-x729.google.com with SMTP id af79cd13be357-75d50f25cc9so105894385a.1 for ; Thu, 08 Jun 2023 18:08:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686272905; x=1688864905; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=howkEJoV//tXil1Z2yV8DA3fQgmxIGa28HiuE8oWzhc=; b=4TUvz+8KoUFPjBrCAcYR59EYGfTLuV6CLia8XZUURQDZbunBGHJYb+jGkZmsygXgl3 +15vELNZ2tt6xf6nRk+326C0YlS5yYQ2GCB6uL+UjODH+sl0FpGKeWODfJXUrOGUmWtC s13E9TqS54wkrJ0CBEm34+uvV622uJ8q5jR+3xcY9g99e07dWOHLQHDJc1AmZlHp7BF3 G5Ij1yJL5CY3uB6yvHN60sdMB/duTTXIRCBGPDCpPIbQNiuj+9xk+3ynZo+U+EWgi22V hQAEa6GzzVBUVKOsI+w/Ikakst0ZV10IGK7MsNaGUrB0ra9/vx96mtgIfR6mIWCjP/h4 L8aw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686272905; x=1688864905; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=howkEJoV//tXil1Z2yV8DA3fQgmxIGa28HiuE8oWzhc=; b=FgTjEAl0bUZCfGqT3gBsgiGLZ53wdgMngo1VyxEf1VjbbJzyoOxqw30hGPcFlwS/Lc MSwJdRnPn09Hi64jxgdFCxVp9a2f1P8j/3U0okbnI7ICoyF1Cl87JYDVwaRVdWGgQO5T sCIIXsTCehFSMuLtXmj3pMuvECWlCEjpnZK0KQ2Wi+8RM7oT9HoEGoX4MbdYLIBZ3Zpl 
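[Illustrative sketch, not from the patch: the lockless-read pattern that
the patch above standardizes on.  The helper name is made up;
pmdp_get_lockless() and the pmd_*() tests are the real kernel APIs.]

	#include <linux/pgtable.h>

	/* Snapshot the pmd once, then test only the snapshot. */
	static bool example_pmd_maps_pte_table(pmd_t *pmdp)
	{
		/* returns a consistent value even on PAE, so no extra barrier() */
		pmd_t pmdval = pmdp_get_lockless(pmdp);

		if (pmd_none(pmdval) || !pmd_present(pmdval))
			return false;
		if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval))
			return false;	/* huge or devmap entry, not a pte table */
		return true;		/* still only a hint until the ptl is taken */
	}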
From: Hugh Dickins
Date: Thu, 8 Jun 2023 18:08:20 -0700 (PDT)
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 02/32] mm/migrate: remove cruft from migration_entry_wait()s

migration_entry_wait_on_locked() does not need to take a mapped pte
pointer, its callers can do the unmap first.  Annotate it with
__releases(ptl) to reduce sparse warnings.

Fold __migration_entry_wait_huge() into migration_entry_wait_huge().
Fold __migration_entry_wait() into migration_entry_wait(), preferring
the tighter pte_offset_map_lock() to pte_offset_map() and pte_lockptr().
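[Illustrative sketch, not from the patch: what the __releases(ptl)
annotation tells sparse, namely that the lock is held on entry and
dropped before return, so the unlock below is not flagged as a context
imbalance.  The function name is made up.]

	#include <linux/spinlock.h>

	static void example_unlock_and_wait(spinlock_t *ptl)
		__releases(ptl)
	{
		spin_unlock(ptl);
		/* ... sleep or otherwise run without the lock held ... */
	}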
Signed-off-by: Hugh Dickins
Reviewed-by: Alistair Popple
---
 include/linux/migrate.h |  4 ++--
 include/linux/swapops.h | 17 +++--------------
 mm/filemap.c            | 13 ++++---------
 mm/migrate.c            | 37 +++++++++++++------------------------
 4 files changed, 22 insertions(+), 49 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 6241a1596a75..affea3063473 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -75,8 +75,8 @@ bool isolate_movable_page(struct page *page, isolate_mode_t mode);
 
 int migrate_huge_page_move_mapping(struct address_space *mapping,
 		struct folio *dst, struct folio *src);
-void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
-					spinlock_t *ptl);
+void migration_entry_wait_on_locked(swp_entry_t entry, spinlock_t *ptl)
+					__releases(ptl);
 void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
 void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
 int folio_migrate_mapping(struct address_space *mapping,
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 3a451b7afcb3..4c932cb45e0b 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -332,15 +332,9 @@ static inline bool is_migration_entry_dirty(swp_entry_t entry)
 	return false;
 }
 
-extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
-					spinlock_t *ptl);
 extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					unsigned long address);
-#ifdef CONFIG_HUGETLB_PAGE
-extern void __migration_entry_wait_huge(struct vm_area_struct *vma,
-					pte_t *ptep, spinlock_t *ptl);
 extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte);
-#endif	/* CONFIG_HUGETLB_PAGE */
 #else  /* CONFIG_MIGRATION */
 static inline swp_entry_t make_readable_migration_entry(pgoff_t offset)
 {
@@ -362,15 +356,10 @@ static inline int is_migration_entry(swp_entry_t swp)
 	return 0;
 }
 
-static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
-					spinlock_t *ptl) { }
 static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
-					unsigned long address) { }
-#ifdef CONFIG_HUGETLB_PAGE
-static inline void __migration_entry_wait_huge(struct vm_area_struct *vma,
-					pte_t *ptep, spinlock_t *ptl) { }
-static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { }
-#endif	/* CONFIG_HUGETLB_PAGE */
+					unsigned long address) { }
+static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
+					pte_t *pte) { }
 static inline int is_writable_migration_entry(swp_entry_t entry)
 {
 	return 0;
diff --git a/mm/filemap.c b/mm/filemap.c
index b4c9bd368b7e..28b42ee848a4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1359,8 +1359,6 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 /**
  * migration_entry_wait_on_locked - Wait for a migration entry to be removed
  * @entry: migration swap entry.
- * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required
- *        for pte entries, pass NULL for pmd entries.
  * @ptl: already locked ptl. This function will drop the lock.
  *
  * Wait for a migration entry referencing the given page to be removed. This is
@@ -1369,13 +1367,13 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
  * should be called while holding the ptl for the migration entry referencing
  * the page.
  *
- * Returns after unmapping and unlocking the pte/ptl with pte_unmap_unlock().
+ * Returns after unlocking the ptl.
 *
 * This follows the same logic as folio_wait_bit_common() so see the comments
 * there.
 */
-void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
-				spinlock_t *ptl)
+void migration_entry_wait_on_locked(swp_entry_t entry, spinlock_t *ptl)
+	__releases(ptl)
 {
 	struct wait_page_queue wait_page;
 	wait_queue_entry_t *wait = &wait_page.wait;
@@ -1409,10 +1407,7 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
 	 * a valid reference to the page, and it must take the ptl to remove the
 	 * migration entry. So the page is valid until the ptl is dropped.
 	 */
-	if (ptep)
-		pte_unmap_unlock(ptep, ptl);
-	else
-		spin_unlock(ptl);
+	spin_unlock(ptl);
 
 	for (;;) {
 		unsigned int flags;
diff --git a/mm/migrate.c b/mm/migrate.c
index 01cac26a3127..3ecb7a40075f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -296,14 +296,18 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked)
 * get to the page and wait until migration is finished.
 * When we return from this function the fault will be retried.
 */
-void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
-				spinlock_t *ptl)
+void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
+				unsigned long address)
 {
+	spinlock_t *ptl;
+	pte_t *ptep;
 	pte_t pte;
 	swp_entry_t entry;
 
-	spin_lock(ptl);
+	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
 	pte = *ptep;
+	pte_unmap(ptep);
+
 	if (!is_swap_pte(pte))
 		goto out;
 
@@ -311,18 +315,10 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 	if (!is_migration_entry(entry))
 		goto out;
 
-	migration_entry_wait_on_locked(entry, ptep, ptl);
+	migration_entry_wait_on_locked(entry, ptl);
 	return;
 out:
-	pte_unmap_unlock(ptep, ptl);
-}
-
-void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
-				unsigned long address)
-{
-	spinlock_t *ptl = pte_lockptr(mm, pmd);
-	pte_t *ptep = pte_offset_map(pmd, address);
-	__migration_entry_wait(mm, ptep, ptl);
+	spin_unlock(ptl);
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
@@ -332,9 +328,9 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 *
 * This function will release the vma lock before returning.
 */
-void __migration_entry_wait_huge(struct vm_area_struct *vma,
-				pte_t *ptep, spinlock_t *ptl)
+void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *ptep)
 {
+	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, ptep);
 	pte_t pte;
 
 	hugetlb_vma_assert_locked(vma);
@@ -352,16 +348,9 @@ void __migration_entry_wait_huge(struct vm_area_struct *vma,
 		 * lock release in migration_entry_wait_on_locked().
 		 */
 		hugetlb_vma_unlock_read(vma);
-		migration_entry_wait_on_locked(pte_to_swp_entry(pte), NULL, ptl);
+		migration_entry_wait_on_locked(pte_to_swp_entry(pte), ptl);
 	}
 }
-
-void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
-{
-	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte);
-
-	__migration_entry_wait_huge(vma, pte, ptl);
-}
 #endif
 
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
@@ -372,7 +361,7 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
 	ptl = pmd_lock(mm, pmd);
 	if (!is_pmd_migration_entry(*pmd))
 		goto unlock;
-	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl);
+	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), ptl);
 	return;
 unlock:
 	spin_unlock(ptl);
-- 
2.35.3
From: Hugh Dickins
Date: Thu, 8 Jun 2023 18:09:25 -0700 (PDT)
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 03/32] mm/pgtable: kmap_local_page() instead of kmap_atomic()

pte_offset_map() was still using kmap_atomic(): update it to the
preferred kmap_local_page() before making further changes there, in
case we need this as a bisection point; but I doubt it can cause any
trouble.

Signed-off-by: Hugh Dickins
---
 include/linux/pgtable.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 8ec27fe69dc8..94235ff2706e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -96,9 +96,9 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 
 #if defined(CONFIG_HIGHPTE)
 #define pte_offset_map(dir, address)				\
-	((pte_t *)kmap_atomic(pmd_page(*(dir))) +		\
+	((pte_t *)kmap_local_page(pmd_page(*(dir))) +		\
 	 pte_index((address)))
-#define pte_unmap(pte) kunmap_atomic((pte))
+#define pte_unmap(pte) kunmap_local((pte))
 #else
 #define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
 #define pte_unmap(pte) ((void)(pte))	/* NOP */
-- 
2.35.3
From: Hugh Dickins
Date: Thu, 8 Jun 2023 18:10:32 -0700 (PDT)
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 04/32] mm/pgtable: allow pte_offset_map[_lock]() to fail

Make pte_offset_map() a wrapper for __pte_offset_map() (optionally
outputs pmdval), and pte_offset_map_lock() a sparse __cond_lock wrapper
for __pte_offset_map_lock(): those __funcs added in mm/pgtable-generic.c.

__pte_offset_map() does pmdval validation (including pmd_clear_bad()
when pmd_bad()), returning NULL if pmdval is not for a page table.
__pte_offset_map_lock() verifies pmdval unchanged after getting the
lock, trying again if it changed.

No #ifdef CONFIG_TRANSPARENT_HUGEPAGE around them: that could be done
to cover the imminent case, but we expect to generalize it later, and
it makes a mess of where to do the pmd_bad() clearing.

Add pte_offset_map_nolock(): outputs ptl like pte_offset_map_lock(),
without actually taking the lock.  This will be preferred to open uses
of pte_lockptr(), because (when split ptlock is in page table's struct
page) it points to the right lock for the returned pte pointer, even if
*pmd gets changed racily afterwards.

Update corresponding Documentation.

Do not add the anticipated rcu_read_lock() and rcu_read_unlock()s yet:
they have to wait until all architectures are balancing pte_offset_map()s
with pte_unmap()s (as in the arch series posted earlier).  But comment
where they will go, so that it's easy to add them for experiments.  And
only when those are in place can transient racy failure cases be enabled.
Add more safety for the PAE mismatched pmd_low pmd_high case at that time.

Signed-off-by: Hugh Dickins
---
 Documentation/mm/split_page_table_lock.rst | 17 ++++---
 include/linux/mm.h                         | 27 +++++++----
 include/linux/pgtable.h                    | 22 ++++++---
 mm/pgtable-generic.c                       | 56 ++++++++++++++++++++++
 4 files changed, 101 insertions(+), 21 deletions(-)

diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst
index 50ee0dfc95be..a834fad9de12 100644
--- a/Documentation/mm/split_page_table_lock.rst
+++ b/Documentation/mm/split_page_table_lock.rst
@@ -14,15 +14,20 @@ tables. Access to higher level tables protected by mm->page_table_lock.
 There are helpers to lock/unlock a table and other accessor functions:
 
 - pte_offset_map_lock()
-	maps pte and takes PTE table lock, returns pointer to the taken
-	lock;
+	maps PTE and takes PTE table lock, returns pointer to PTE with
+	pointer to its PTE table lock, or returns NULL if no PTE table;
+ - pte_offset_map_nolock()
+	maps PTE, returns pointer to PTE with pointer to its PTE table
+	lock (not taken), or returns NULL if no PTE table;
+ - pte_offset_map()
+	maps PTE, returns pointer to PTE, or returns NULL if no PTE table;
+ - pte_unmap()
+	unmaps PTE table;
 - pte_unmap_unlock()
 	unlocks and unmaps PTE table;
 - pte_alloc_map_lock()
-	allocates PTE table if needed and take the lock, returns pointer
-	to taken lock or NULL if allocation failed;
- - pte_lockptr()
-	returns pointer to PTE table lock;
+	allocates PTE table if needed and takes its lock, returns pointer to
+	PTE with pointer to its lock, or returns NULL if allocation failed;
 - pmd_lock()
 	takes PMD table lock, returns pointer to taken lock;
 - pmd_lockptr()
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27ce77080c79..3c2e56980853 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2787,14 +2787,25 @@ static inline void pgtable_pte_page_dtor(struct page *page)
 	dec_lruvec_page_state(page, NR_PAGETABLE);
 }
 
-#define pte_offset_map_lock(mm, pmd, address, ptlp)	\
-({							\
-	spinlock_t *__ptl = pte_lockptr(mm, pmd);	\
-	pte_t *__pte = pte_offset_map(pmd, address);	\
-	*(ptlp) = __ptl;				\
-	spin_lock(__ptl);				\
-	__pte;						\
-})
+pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp);
+static inline pte_t *pte_offset_map(pmd_t *pmd, unsigned long addr)
+{
+	return __pte_offset_map(pmd, addr, NULL);
+}
+
+pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, spinlock_t **ptlp);
+static inline pte_t *pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, spinlock_t **ptlp)
+{
+	pte_t *pte;
+
+	__cond_lock(*ptlp, pte = __pte_offset_map_lock(mm, pmd, addr, ptlp));
+	return pte;
+}
+
+pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, spinlock_t **ptlp);
 
 #define pte_unmap_unlock(pte, ptl)	do {		\
 	spin_unlock(ptl);				\
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94235ff2706e..3fabbb018557 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -94,14 +94,22 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 #define pte_offset_kernel pte_offset_kernel
 #endif
 
-#if defined(CONFIG_HIGHPTE)
-#define pte_offset_map(dir, address)				\
-	((pte_t *)kmap_local_page(pmd_page(*(dir))) +		\
-	 pte_index((address)))
-#define pte_unmap(pte) kunmap_local((pte))
+#ifdef CONFIG_HIGHPTE
+#define __pte_map(pmd, address) \
+	((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address)))
+#define pte_unmap(pte)	do {	\
+	kunmap_local((pte));	\
+	/* rcu_read_unlock() to be added later */	\
+} while (0)
 #else
-#define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
-#define pte_unmap(pte) ((void)(pte))	/* NOP */
+static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
+{
+	return pte_offset_kernel(pmd, address);
+}
+static inline void pte_unmap(pte_t *pte)
+{
+	/* rcu_read_unlock() to be added later */
+}
 #endif
 
 /* Find an entry in the second-level page table.. */
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d2fc52bffafc..c7ab18a5fb77 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -10,6 +10,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 
@@ -229,3 +231,57 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
+{
+	pmd_t pmdval;
+
+	/* rcu_read_lock() to be added later */
+	pmdval = pmdp_get_lockless(pmd);
+	if (pmdvalp)
+		*pmdvalp = pmdval;
+	if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
+		goto nomap;
+	if (unlikely(pmd_trans_huge(pmdval) || pmd_devmap(pmdval)))
+		goto nomap;
+	if (unlikely(pmd_bad(pmdval))) {
+		pmd_clear_bad(pmd);
+		goto nomap;
+	}
+	return __pte_map(&pmdval, addr);
+nomap:
+	/* rcu_read_unlock() to be added later */
+	return NULL;
+}
+
+pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
+			     unsigned long addr, spinlock_t **ptlp)
+{
+	pmd_t pmdval;
+	pte_t *pte;
+
+	pte = __pte_offset_map(pmd, addr, &pmdval);
+	if (likely(pte))
+		*ptlp = pte_lockptr(mm, &pmdval);
+	return pte;
+}
+
+pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
+			     unsigned long addr, spinlock_t **ptlp)
+{
+	spinlock_t *ptl;
+	pmd_t pmdval;
+	pte_t *pte;
+again:
+	pte = __pte_offset_map(pmd, addr, &pmdval);
+	if (unlikely(!pte))
+		return pte;
+	ptl = pte_lockptr(mm, &pmdval);
+	spin_lock(ptl);
+	if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
+		*ptlp = ptl;
+		return pte;
+	}
+	pte_unmap_unlock(pte, ptl);
+	goto again;
+}
-- 
2.35.3
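[Illustrative sketch, not from the patch: how a caller is expected to
handle the NULL return which pte_offset_map_lock() can now give.  The
function name and return values are made up.]

	#include <linux/mm.h>
	#include <linux/errno.h>

	static int example_probe_pte(struct mm_struct *mm, pmd_t *pmd,
				     unsigned long addr)
	{
		spinlock_t *ptl;
		pte_t *pte;

		pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
		if (!pte)
			return -EAGAIN;	/* pmd none, huge, bad or racily changed: back out */
		if (pte_none(*pte)) {
			pte_unmap_unlock(pte, ptl);
			return -ENOENT;
		}
		/* ... operate on *pte while ptl is held ... */
		pte_unmap_unlock(pte, ptl);
		return 0;
	}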
From: Hugh Dickins
Date: Thu, 8 Jun 2023 18:11:29 -0700 (PDT)
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 05/32] mm/filemap: allow pte_offset_map_lock() to fail

In filemap_map_pages(), allow pte_offset_map_lock() to fail; and remove
the pmd_devmap_trans_unstable() check from filemap_map_pmd(), which can
safely return to filemap_map_pages() and let pte_offset_map_lock()
discover that.
Signed-off-by: Hugh Dickins
---
 mm/filemap.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 28b42ee848a4..9e129ad43e0d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3408,13 +3408,6 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct folio *folio,
 	if (pmd_none(*vmf->pmd))
 		pmd_install(mm, vmf->pmd, &vmf->prealloc_pte);
 
-	/* See comment in handle_pte_fault() */
-	if (pmd_devmap_trans_unstable(vmf->pmd)) {
-		folio_unlock(folio);
-		folio_put(folio);
-		return true;
-	}
-
 	return false;
 }
 
@@ -3501,6 +3494,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 
 	addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
+	if (!vmf->pte) {
+		folio_unlock(folio);
+		folio_put(folio);
+		goto out;
+	}
 	do {
 again:
 		page = folio_file_page(folio, xas.xa_index);
-- 
2.35.3
From: Hugh Dickins
Date: Thu, 8 Jun 2023 18:12:52 -0700 (PDT)
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 06/32] mm/page_vma_mapped: delete bogosity in page_vma_mapped_walk()

Revert commit a7a69d8ba88d ("mm/thp: another PVMW_SYNC fix in
page_vma_mapped_walk()"): I was proud of that "Aha!" commit at the time,
but in revisiting page_vma_mapped_walk() for pte_offset_map() failure,
that block raised a doubt: and it now seems utterly bogus.  The prior
map_pte() has taken ptl unconditionally when PVMW_SYNC: I must have
forgotten that when making the change.

It did no harm, but could not have fixed a BUG or WARN, and is hard to
reconcile with coming changes.
Signed-off-by: Hugh Dickins
---
 mm/page_vma_mapped.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 64aff6718bdb..007dc7456f0e 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -275,10 +275,6 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 				goto restart;
 			}
 			pvmw->pte++;
-			if ((pvmw->flags & PVMW_SYNC) && !pvmw->ptl) {
-				pvmw->ptl = pte_lockptr(mm, pvmw->pmd);
-				spin_lock(pvmw->ptl);
-			}
 		} while (pte_none(*pvmw->pte));
 
 		if (!pvmw->ptl) {
-- 
2.35.3
From: Hugh Dickins
Date: Thu, 8 Jun 2023 18:14:12 -0700 (PDT)
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 07/32] mm/page_vma_mapped: reformat map_pte() with less indentation

No functional change here, but adjust the format of map_pte() so that
the following commit will be easier to read: separate out the PVMW_SYNC
case first, and remove two levels of indentation from the ZONE_DEVICE
case.

Signed-off-by: Hugh Dickins
---
 mm/page_vma_mapped.c | 65 +++++++++++++++++++++++---------------------
 1 file changed, 34 insertions(+), 31 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 007dc7456f0e..947dc7491815 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -15,38 +15,41 @@ static inline bool not_found(struct page_vma_mapped_walk *pvmw)
 
 static bool map_pte(struct page_vma_mapped_walk *pvmw)
 {
-	pvmw->pte = pte_offset_map(pvmw->pmd, pvmw->address);
-	if (!(pvmw->flags & PVMW_SYNC)) {
-		if (pvmw->flags & PVMW_MIGRATION) {
-			if (!is_swap_pte(*pvmw->pte))
-				return false;
-		} else {
-			/*
-			 * We get here when we are trying to unmap a private
-			 * device page from the process address space. Such
-			 * page is not CPU accessible and thus is mapped as
-			 * a special swap entry, nonetheless it still does
-			 * count as a valid regular mapping for the page (and
-			 * is accounted as such in page maps count).
-			 *
-			 * So handle this special case as if it was a normal
-			 * page mapping ie lock CPU page table and returns
-			 * true.
-			 *
-			 * For more details on device private memory see HMM
-			 * (include/linux/hmm.h or mm/hmm.c).
-			 */
-			if (is_swap_pte(*pvmw->pte)) {
-				swp_entry_t entry;
+	if (pvmw->flags & PVMW_SYNC) {
+		/* Use the stricter lookup */
+		pvmw->pte = pte_offset_map_lock(pvmw->vma->vm_mm, pvmw->pmd,
+						pvmw->address, &pvmw->ptl);
+		return true;
+	}
 
-				/* Handle un-addressable ZONE_DEVICE memory */
-				entry = pte_to_swp_entry(*pvmw->pte);
-				if (!is_device_private_entry(entry) &&
-				    !is_device_exclusive_entry(entry))
-					return false;
-			} else if (!pte_present(*pvmw->pte))
-				return false;
-		}
-	}
+	pvmw->pte = pte_offset_map(pvmw->pmd, pvmw->address);
+	if (pvmw->flags & PVMW_MIGRATION) {
+		if (!is_swap_pte(*pvmw->pte))
+			return false;
+	} else if (is_swap_pte(*pvmw->pte)) {
+		swp_entry_t entry;
+		/*
+		 * Handle un-addressable ZONE_DEVICE memory.
+		 *
+		 * We get here when we are trying to unmap a private
+		 * device page from the process address space. Such
+		 * page is not CPU accessible and thus is mapped as
+		 * a special swap entry, nonetheless it still does
+		 * count as a valid regular mapping for the page
+		 * (and is accounted as such in page maps count).
+		 *
+		 * So handle this special case as if it was a normal
+		 * page mapping ie lock CPU page table and return true.
+		 *
+		 * For more details on device private memory see HMM
+		 * (include/linux/hmm.h or mm/hmm.c).
+		 */
+		entry = pte_to_swp_entry(*pvmw->pte);
+		if (!is_device_private_entry(entry) &&
+		    !is_device_exclusive_entry(entry))
+			return false;
+	} else if (!pte_present(*pvmw->pte)) {
+		return false;
 	}
 	pvmw->ptl = pte_lockptr(pvmw->vma->vm_mm, pvmw->pmd);
 	spin_lock(pvmw->ptl);
-- 
2.35.3
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id u15-20020a25840f000000b00b9e2ef25f1asm583095ybk.44.2023.06.08.18.15.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:15:47 -0700 (PDT) Date: Thu, 8 Jun 2023 18:15:43 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 08/32] mm/page_vma_mapped: pte_offset_map_nolock() not pte_lockptr() In-Reply-To: Message-ID: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" map_pte() use pte_offset_map_nolock(), to make sure of the ptl belonging to pte, even if pmd entry is then changed racily: page_vma_mapped_walk() use that instead of getting pte_lockptr() later, or restart if map_pte() found no page table. Signed-off-by: Hugh Dickins --- mm/page_vma_mapped.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 947dc7491815..2af734274073 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -13,16 +13,28 @@ static inline bool not_found(struct page_vma_mapped_wal= k *pvmw) return false; } =20 -static bool map_pte(struct page_vma_mapped_walk *pvmw) +static bool map_pte(struct page_vma_mapped_walk *pvmw, spinlock_t **ptlp) { if (pvmw->flags & PVMW_SYNC) { /* Use the stricter lookup */ pvmw->pte =3D pte_offset_map_lock(pvmw->vma->vm_mm, pvmw->pmd, pvmw->address, &pvmw->ptl); - return true; + *ptlp =3D pvmw->ptl; + return !!pvmw->pte; } =20 - pvmw->pte =3D pte_offset_map(pvmw->pmd, pvmw->address); + /* + * It is important to return the ptl corresponding to pte, + * in case *pvmw->pmd changes underneath us; so we need to + * return it even when choosing not to lock, in case caller + * proceeds to loop over next ptes, and finds a match later. + * Though, in most cases, page lock already protects this. 
+ */ + pvmw->pte =3D pte_offset_map_nolock(pvmw->vma->vm_mm, pvmw->pmd, + pvmw->address, ptlp); + if (!pvmw->pte) + return false; + if (pvmw->flags & PVMW_MIGRATION) { if (!is_swap_pte(*pvmw->pte)) return false; @@ -51,7 +63,7 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw) } else if (!pte_present(*pvmw->pte)) { return false; } - pvmw->ptl =3D pte_lockptr(pvmw->vma->vm_mm, pvmw->pmd); + pvmw->ptl =3D *ptlp; spin_lock(pvmw->ptl); return true; } @@ -156,6 +168,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *= pvmw) struct vm_area_struct *vma =3D pvmw->vma; struct mm_struct *mm =3D vma->vm_mm; unsigned long end; + spinlock_t *ptl; pgd_t *pgd; p4d_t *p4d; pud_t *pud; @@ -257,8 +270,11 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk = *pvmw) step_forward(pvmw, PMD_SIZE); continue; } - if (!map_pte(pvmw)) + if (!map_pte(pvmw, &ptl)) { + if (!pvmw->pte) + goto restart; goto next_pte; + } this_pte: if (check_pte(pvmw)) return true; @@ -281,7 +297,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *= pvmw) } while (pte_none(*pvmw->pte)); =20 if (!pvmw->ptl) { - pvmw->ptl =3D pte_lockptr(mm, pvmw->pmd); + pvmw->ptl =3D ptl; spin_lock(pvmw->ptl); } goto this_pte; --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5173C7EE25 for ; Fri, 9 Jun 2023 01:17:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237463AbjFIBRe (ORCPT ); Thu, 8 Jun 2023 21:17:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36490 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229458AbjFIBRc (ORCPT ); Thu, 8 Jun 2023 21:17:32 -0400 Received: from mail-yw1-x1133.google.com (mail-yw1-x1133.google.com [IPv6:2607:f8b0:4864:20::1133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B2B9419AC for ; Thu, 8 Jun 2023 18:17:31 -0700 (PDT) Received: by mail-yw1-x1133.google.com with SMTP id 00721157ae682-568900c331aso12056237b3.3 for ; Thu, 08 Jun 2023 18:17:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686273451; x=1688865451; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=YGxXR2sylNgXOWjswYon8LmdFOYpbwThTVM1QZPxixo=; b=hfFZS/To38y/kHK62NcQe3NeUtPQadIbTbMGv9CwCwnaOm3+ubft1ovMBDYwgIcJHN maH31bmCKIXTcoH2mw+7AlwbT//z013fnJricqAgo1Y8wR2cLWz5jidfA/WWWdNn++DD eExfA1ytyJkqY/La00m6AnHJyexbZrQSpOFK5OQ6AFOk6cyYk4BIHtO7IYbubUELYE5/ t2x0w817NAACm7JQwyi+uQozGB37VzsQ0dIb5qTadGN0KMsSpWs2GUyHK1Uhm4UV4EVa KWOR3EiG1vfrWrdRdz5JYnJcT4eEEx83Gbf/GYzP39cr1/xlfqJ/0ZZkApmwpKDFSRuJ KEIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686273451; x=1688865451; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=YGxXR2sylNgXOWjswYon8LmdFOYpbwThTVM1QZPxixo=; b=OyCBFfXBk68SoX5I76/UqS6rEIeaOMqYCHFTsOdXQYKgCOqNXubtrWq05rM8GIprOI ykgnwMb6EppVLL7vD/6vo9RmzgkIictS3JZ8V3gfnICd3RWuLfC1TKuOixFkJldVBq7w /i65uzq4/bZ7aT3FvPk8KxzlcYkCT9DJXBweORS9eKKfEuRhcpuHIIEjxsDAg50efl1c VC/dj1+BibrLQPS+E11WN0dfDRN5eM9Y+PyDxJDsF1R+S1x2d27asYdBhWKkm3O0AXNF RB9YAnTNZ4/+PPgjHdojYDJro/OkhF7+/eVCRNk123YieAvYdhGr1GtR/qHkvljiqyg2 Nctg== X-Gm-Message-State: 
AC+VfDwcsYPuDjWvIho1pnyTHqnStFCi4tXaUKmPfAo/UzeqdeKlauav eQQxLhNHMzIVynlm7gSMOyMrfA== X-Google-Smtp-Source: ACHHUZ684nq9AHXRq1UVfeo4n1qoOEQQWARDc3yoeKJKitFHVG0vbG1oJERSIO9SpyMcUhVMBrYK3g== X-Received: by 2002:a81:6cd5:0:b0:564:c4db:631e with SMTP id h204-20020a816cd5000000b00564c4db631emr1257858ywc.5.1686273450756; Thu, 08 Jun 2023 18:17:30 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id p131-20020a817489000000b00560beb1c97bsm287394ywc.97.2023.06.08.18.17.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:17:30 -0700 (PDT) Date: Thu, 8 Jun 2023 18:17:26 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 09/32] mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails In-Reply-To: Message-ID: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Simple walk_page_range() users should set ACTION_AGAIN to retry when pte_offset_map_lock() fails. No need to check pmd_trans_unstable(): that was precisely to avoid the possiblity of calling pte_offset_map() on a racily removed or inserted THP entry, but such cases are now safely handled inside it. Likewise there is no need to check pmd_none() or pmd_bad() before calling it. Signed-off-by: Hugh Dickins Reviewed-by: SeongJae Park for mm/damon part --- fs/proc/task_mmu.c | 32 ++++++++++++++++---------------- mm/damon/vaddr.c | 12 ++++++++---- mm/mempolicy.c | 7 ++++--- mm/mincore.c | 9 ++++----- mm/mlock.c | 4 ++++ 5 files changed, 36 insertions(+), 28 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 420510f6a545..dba5052ce09b 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -631,14 +631,11 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long = addr, unsigned long end, goto out; } =20 - if (pmd_trans_unstable(pmd)) - goto out; - /* - * The mmap_lock held all the way back in m_start() is what - * keeps khugepaged out of here and from collapsing things - * in here. 
- */ pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) smaps_pte_entry(pte, addr, walk); pte_unmap_unlock(pte - 1, ptl); @@ -1191,10 +1188,11 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigne= d long addr, return 0; } =20 - if (pmd_trans_unstable(pmd)) - return 0; - pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { ptent =3D *pte; =20 @@ -1538,9 +1536,6 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned lo= ng addr, unsigned long end, spin_unlock(ptl); return err; } - - if (pmd_trans_unstable(pmdp)) - return 0; #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ =20 /* @@ -1548,6 +1543,10 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned l= ong addr, unsigned long end, * goes beyond vma->vm_end. */ orig_pte =3D pte =3D pte_offset_map_lock(walk->mm, pmdp, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return err; + } for (; addr < end; pte++, addr +=3D PAGE_SIZE) { pagemap_entry_t pme; =20 @@ -1887,11 +1886,12 @@ static int gather_pte_stats(pmd_t *pmd, unsigned lo= ng addr, spin_unlock(ptl); return 0; } - - if (pmd_trans_unstable(pmd)) - return 0; #endif orig_pte =3D pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } do { struct page *page =3D can_gather_numa_stats(*pte, vma, addr); if (!page) diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 1fec16d7263e..b8762ff15c3c 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -318,9 +318,11 @@ static int damon_mkold_pmd_entry(pmd_t *pmd, unsigned = long addr, spin_unlock(ptl); } =20 - if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd))) - return 0; pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } if (!pte_present(*pte)) goto out; damon_ptep_mkold(pte, walk->mm, addr); @@ -464,9 +466,11 @@ static int damon_young_pmd_entry(pmd_t *pmd, unsigned = long addr, regular_page: #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ =20 - if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd))) - return -EINVAL; pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } if (!pte_present(*pte)) goto out; folio =3D damon_get_folio(pte_pfn(*pte)); diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 1756389a0609..4d0bcf6f0d52 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -514,10 +514,11 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigne= d long addr, if (ptl) return queue_folios_pmd(pmd, ptl, addr, end, walk); =20 - if (pmd_trans_unstable(pmd)) - return 0; - mapped_pte =3D pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); + if (!pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { if (!pte_present(*pte)) continue; diff --git a/mm/mincore.c b/mm/mincore.c index 2d5be013a25a..f33f6a0b1ded 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -113,12 +113,11 @@ static int mincore_pte_range(pmd_t *pmd, unsigned lon= g addr, unsigned long end, goto out; } =20 - if (pmd_trans_unstable(pmd)) { - __mincore_unmapped_range(addr, end, vma, vec); - goto out; - } - ptep =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); + if (!ptep) { + walk->action =3D ACTION_AGAIN; + return 0; + } for (; addr !=3D end; ptep++, addr +=3D PAGE_SIZE) { pte_t pte =3D *ptep; =20 diff --git 
a/mm/mlock.c b/mm/mlock.c index 40b43f8740df..9f2b1173b1b1 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -329,6 +329,10 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long a= ddr, } =20 start_pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!start_pte) { + walk->action =3D ACTION_AGAIN; + return 0; + } for (pte =3D start_pte; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { if (!pte_present(*pte)) continue; --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3DC5FC7EE25 for ; Fri, 9 Jun 2023 01:19:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237546AbjFIBS7 (ORCPT ); Thu, 8 Jun 2023 21:18:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37396 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229692AbjFIBS4 (ORCPT ); Thu, 8 Jun 2023 21:18:56 -0400 Received: from mail-yb1-xb2e.google.com (mail-yb1-xb2e.google.com [IPv6:2607:f8b0:4864:20::b2e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1816319AC for ; Thu, 8 Jun 2023 18:18:55 -0700 (PDT) Received: by mail-yb1-xb2e.google.com with SMTP id 3f1490d57ef6-ba86ea269e0so1277519276.1 for ; Thu, 08 Jun 2023 18:18:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686273534; x=1688865534; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=s/wneZk+WXTrjMA3fZuua9PXiv3fUuHnEiWoD80tgK8=; b=ngXLVdTpVyrnOmcx9sqd11fDN/DpMfDjRFc20sS67KuKXUBPfJVN7b54WivfwCCqcm 2w8p0+WuKxMdG7YLSaNER90dnOHjPSlvPspYoe5kIeMhFky06+Enxr+OKG3u1lmvAOqa C+PVbym1M2SnYvLxMqSZg33YxKhOPm3FgR9jzQ+MnLfZGFUC2OJbbjuScpkVkepg/TEp 276YbysaG6RpeWzjjsraxwjBqi8YR2LoYFLfcozNIrD2bqcAOk2V+q1v/295yDeTTnP9 TfyYNMuGFwQjegWuOVfz1GKj11hEcBHgzfIvjlO8LY2ikBAtYqRoDenRCee7HmvBFb1A njIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686273534; x=1688865534; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=s/wneZk+WXTrjMA3fZuua9PXiv3fUuHnEiWoD80tgK8=; b=YMauREB2g/iRaGULaR5xRwXHQNiURkiWyM9zw8gV41v7LapBeB1Q4asogV3syiepZY IIeeM1puuqXlhTpwsm4ZSfOLIPm2J0JoUwww824VOkPl+X2QX4sRVIdvDoXfWUFSSM/j dJbFrZTyFifDHtmPBf24L9GyncP3pLXw8IGuGhTZeuv0J5hfYYggiYM7YdJugrhWhE7+ fMRsCVV+MZqIf/9ZCx23M0KPmq2ORQ83jyB8SP6Wf7fZaoJdHYSUr2ySrU4SH24IvYuz 4hDO8G4+s5w4xlphA/0RA6KZVoTPqtDtDboipD9n480up6Q8YVnylbeCJ8gZFwkH7wK2 WBOw== X-Gm-Message-State: AC+VfDwwuaul5n0n5OVStHTBr9NNgrTa7Ai/T59vbwC///rjjcF5XE+a Fk7rOH5W1c0jI3/y11iMfeTwag== X-Google-Smtp-Source: ACHHUZ5bBUa+IZ+zQxwNmCb4pbZbizWCezt9RItrFgkqOHN09PZD1rbRUEyPvItVL/IZu/e+lOkb4A== X-Received: by 2002:a25:d757:0:b0:ba7:ff37:4603 with SMTP id o84-20020a25d757000000b00ba7ff374603mr1178588ybg.45.1686273534119; Thu, 08 Jun 2023 18:18:54 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id v38-20020a25aba9000000b00b923b2935d9sm603286ybi.20.2023.06.08.18.18.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:18:53 -0700 (PDT) Date: Thu, 8 Jun 2023 18:18:49 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 10/32] mm/pagewalk: walk_pte_range() allow for pte_offset_map() In-Reply-To: Message-ID: <3eba6f0-2b-fb66-6bb6-2ee8533e221@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" walk_pte_range() has a no_vma option to serve walk_page_range_novma(). I don't know of any problem, but it looks safer to check for init_mm, and use pte_offset_kernel() rather than pte_offset_map() in that case: pte_offset_map()'s pmdval validation is intended for userspace. Allow for its pte_offset_map() or pte_offset_map_lock() to fail, and retry with ACTION_AGAIN if so. Add a second check for ACTION_AGAIN in walk_pmd_range(), to catch it after return from walk_pte_range(). Remove the pmd_trans_unstable() check after split_huge_pmd() in walk_pmd_range(): walk_pte_range() now handles those cases safely (and they must fail powerpc's is_hugepd() check). Signed-off-by: Hugh Dickins --- mm/pagewalk.c | 33 +++++++++++++++++++++++---------- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/mm/pagewalk.c b/mm/pagewalk.c index cb23f8a15c13..64437105fe0d 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -46,15 +46,27 @@ static int walk_pte_range(pmd_t *pmd, unsigned long add= r, unsigned long end, spinlock_t *ptl; =20 if (walk->no_vma) { - pte =3D pte_offset_map(pmd, addr); - err =3D walk_pte_range_inner(pte, addr, end, walk); - pte_unmap(pte); + /* + * pte_offset_map() might apply user-specific validation. 
+ */ + if (walk->mm =3D=3D &init_mm) + pte =3D pte_offset_kernel(pmd, addr); + else + pte =3D pte_offset_map(pmd, addr); + if (pte) { + err =3D walk_pte_range_inner(pte, addr, end, walk); + if (walk->mm !=3D &init_mm) + pte_unmap(pte); + } } else { pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); - err =3D walk_pte_range_inner(pte, addr, end, walk); - pte_unmap_unlock(pte, ptl); + if (pte) { + err =3D walk_pte_range_inner(pte, addr, end, walk); + pte_unmap_unlock(pte, ptl); + } } - + if (!pte) + walk->action =3D ACTION_AGAIN; return err; } =20 @@ -141,11 +153,8 @@ static int walk_pmd_range(pud_t *pud, unsigned long ad= dr, unsigned long end, !(ops->pte_entry)) continue; =20 - if (walk->vma) { + if (walk->vma) split_huge_pmd(walk->vma, pmd, addr); - if (pmd_trans_unstable(pmd)) - goto again; - } =20 if (is_hugepd(__hugepd(pmd_val(*pmd)))) err =3D walk_hugepd_range((hugepd_t *)pmd, addr, next, walk, PMD_SHIFT); @@ -153,6 +162,10 @@ static int walk_pmd_range(pud_t *pud, unsigned long ad= dr, unsigned long end, err =3D walk_pte_range(pmd, addr, next, walk); if (err) break; + + if (walk->action =3D=3D ACTION_AGAIN) + goto again; + } while (pmd++, addr =3D next, addr !=3D end); =20 return err; --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44EFFC7EE37 for ; Fri, 9 Jun 2023 01:20:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237570AbjFIBUN (ORCPT ); Thu, 8 Jun 2023 21:20:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37824 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229692AbjFIBUK (ORCPT ); Thu, 8 Jun 2023 21:20:10 -0400 Received: from mail-yw1-x1133.google.com (mail-yw1-x1133.google.com [IPv6:2607:f8b0:4864:20::1133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F8EB198C for ; Thu, 8 Jun 2023 18:20:09 -0700 (PDT) Received: by mail-yw1-x1133.google.com with SMTP id 00721157ae682-565a022ef06so11190247b3.3 for ; Thu, 08 Jun 2023 18:20:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686273609; x=1688865609; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=W3t6elq9R4Rh1GonIcnIp9HMVNr0t0fb96tM78h+6lY=; b=sbrAsg5WUYRFtaZIknvbQiG6/TYzlsI+0wgQgiUfJThncdOkuzQD0khSKs1V9z07r2 Tv51pW/PAoavvTS97xjUmu8LWWg94YJvp9vB5pfhhczalF5/DDl+Uv7HP4vGiWYqNo4X NvPImQC1dON//ZIIaJuY80thsOcoTmEarD24++KqimOdduTFWASKv9w7+mCL83HP6Bjj lCLJbqulD5C8g370KMX/kus7X70GOqLQ+qmTon2m/00V/c4ahLqm0Dsyh/zh31jb7WvE PrxBXcgSpFRmWGUOtPIfuBbXHPk/T7GjZ0IZtugAqqqNtfJYQbkhOVtUktXosevWE1uY kB2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686273609; x=1688865609; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=W3t6elq9R4Rh1GonIcnIp9HMVNr0t0fb96tM78h+6lY=; b=VEa5VIp0wlE6NJvD1tAfgLPLAUKBG2xNrPxnKYIG6/e6sDUPxQOP0f4e3nqQaZBIy6 Bp2J/10wSkJ4fRb4NR0kIe80CNu2bqPtPpb7nd9gY8q7hQCN9r9bOuweNCu9IZsMJeJQ sJLqazlPoPBnoYRws9sqdVTgfn4ljT+f/DPSGVT+TGchNjQcZnPaMzgB9JN6gpouXAad 5XBtzQRf32lLf5YZSBlKeq8TH6j7GQ+ociHdzSuB8oUt+uas4uuhXNJ0COFmLEURfl3m BOZHZZda5lTNehPCZ9jGP8IDXgMOMcB7r6X/DbUWOxx2+eq0kUCZjnB+pJZ1nfBudtLA rwtA== X-Gm-Message-State: 
AC+VfDyjGnwoLqU/iR8IeD7GsqbtJh0W+i3ZhaY009LM/nQC8QZJanuJ DBcXsPNxb2NrSqXMwjhIEvM92g== X-Google-Smtp-Source: ACHHUZ6n8W5YFW3Tw+jdRjglpr2r72K5tunXYPvH/mLAwhgJM3++HeY9qf1iqDSmIO6b6XIA/ifN5Q== X-Received: by 2002:a0d:dd01:0:b0:568:b0f6:ce8a with SMTP id g1-20020a0ddd01000000b00568b0f6ce8amr1205033ywe.24.1686273608570; Thu, 08 Jun 2023 18:20:08 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id j77-20020a819250000000b00565862c5e90sm289860ywg.83.2023.06.08.18.20.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:20:07 -0700 (PDT) Date: Thu, 8 Jun 2023 18:20:04 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 11/32] mm/vmwgfx: simplify pmd & pud mapping dirty helpers In-Reply-To: Message-ID: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" wp_clean_pmd_entry() need not check pmd_trans_unstable() or pmd_none(), wp_clean_pud_entry() need not check pud_trans_unstable() or pud_none(): it's just the ACTION_CONTINUE when trans_huge or devmap that's needed to prevent splitting, and we're hoping to remove pmd_trans_unstable(). Is that PUD #ifdef necessary? Maybe some configs are missing a stub. 
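Condensed from the diff below, the pmd-level helper ends up with roughly this shape. This is an illustrative sketch, not the applied patch: the function name is invented, and only calls visible in the diff are assumed.

	#include <linux/pagewalk.h>
	#include <linux/pgtable.h>

	/* Sketch: leave a huge (or migrating huge) pmd alone, split nothing. */
	static int wp_clean_pmd_entry_sketch(pmd_t *pmd, unsigned long addr,
					     unsigned long end, struct mm_walk *walk)
	{
		pmd_t pmdval = pmdp_get_lockless(pmd);

		if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval)) {
			WARN_ON(pmd_write(pmdval) || pmd_dirty(pmdval));
			walk->action = ACTION_CONTINUE;	/* do not split this pmd */
		}
		return 0;			/* otherwise descend to the ptes */
	}

Setting ACTION_CONTINUE is what stops the page walker from splitting the huge entry; every other case simply falls through to the pte_entry callback.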
Signed-off-by: Hugh Dickins --- mm/mapping_dirty_helpers.c | 34 +++++++++------------------------- 1 file changed, 9 insertions(+), 25 deletions(-) diff --git a/mm/mapping_dirty_helpers.c b/mm/mapping_dirty_helpers.c index e1eb33f49059..87b4beeda4fa 100644 --- a/mm/mapping_dirty_helpers.c +++ b/mm/mapping_dirty_helpers.c @@ -128,19 +128,11 @@ static int wp_clean_pmd_entry(pmd_t *pmd, unsigned lo= ng addr, unsigned long end, { pmd_t pmdval =3D pmdp_get_lockless(pmd); =20 - if (!pmd_trans_unstable(&pmdval)) - return 0; - - if (pmd_none(pmdval)) { - walk->action =3D ACTION_AGAIN; - return 0; - } - - /* Huge pmd, present or migrated */ - walk->action =3D ACTION_CONTINUE; - if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval)) + /* Do not split a huge pmd, present or migrated */ + if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval)) { WARN_ON(pmd_write(pmdval) || pmd_dirty(pmdval)); - + walk->action =3D ACTION_CONTINUE; + } return 0; } =20 @@ -156,23 +148,15 @@ static int wp_clean_pmd_entry(pmd_t *pmd, unsigned lo= ng addr, unsigned long end, static int wp_clean_pud_entry(pud_t *pud, unsigned long addr, unsigned lon= g end, struct mm_walk *walk) { +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD pud_t pudval =3D READ_ONCE(*pud); =20 - if (!pud_trans_unstable(&pudval)) - return 0; - - if (pud_none(pudval)) { - walk->action =3D ACTION_AGAIN; - return 0; - } - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD - /* Huge pud */ - walk->action =3D ACTION_CONTINUE; - if (pud_trans_huge(pudval) || pud_devmap(pudval)) + /* Do not split a huge pud */ + if (pud_trans_huge(pudval) || pud_devmap(pudval)) { WARN_ON(pud_write(pudval) || pud_dirty(pudval)); + walk->action =3D ACTION_CONTINUE; + } #endif - return 0; } =20 --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 23C12C7EE29 for ; Fri, 9 Jun 2023 01:22:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229757AbjFIBWB (ORCPT ); Thu, 8 Jun 2023 21:22:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38740 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237618AbjFIBVr (ORCPT ); Thu, 8 Jun 2023 21:21:47 -0400 Received: from mail-yb1-xb2e.google.com (mail-yb1-xb2e.google.com [IPv6:2607:f8b0:4864:20::b2e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B3F611FDF for ; Thu, 8 Jun 2023 18:21:46 -0700 (PDT) Received: by mail-yb1-xb2e.google.com with SMTP id 3f1490d57ef6-bb3d122a19fso1262860276.0 for ; Thu, 08 Jun 2023 18:21:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686273706; x=1688865706; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=ngswcYXh32wJZMu4ey5160LWyE599vrHnWvZmx6kKak=; b=DCQmA7jyE39MLlNzgNj5Fnvt6VxEVhxfR43nyr9O34TsE/NR89GCRNiD4ZoZdR4Qh/ VVlLGnas01pgDThKof5oMLzzf1GcWd8TwEjNBhNMTY02KeG47WG7BA2R3pdxHonu/WZa V3CvdasgvhS1HvjB0i2UUPMQsDXJR6NqotUE643j7wgPp8AFNy/qFFWZUuLH+fdomurR thKNoJK95KQaiNcwFeztAb+d5ZaqpXtxhvroVgVoxsGEe35huAPILPbeWC7G/RPecM4M XWHbDzBCI7PfoB33hTMYlIVyb/ngBEvTOCi3mr/ilks4HMOz+anwKzwFIqfyaYnbXfBU DGJA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686273706; x=1688865706; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from 
:date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ngswcYXh32wJZMu4ey5160LWyE599vrHnWvZmx6kKak=; b=SAiSofQJRF/nlWxzRZhgeBSxX7agnwj6cK7jw0DMVcRHkv/uDdOKkaDHdlUYOsXU1E iSrKvOiZBC5akRbzaoJ/0kEnQDFUgfh4dDMcRwu0JdS2rOSutOeCxhonRpyu6sFl3SUj Z1BSMz1J8H1nQRkQfrmt4bbOk7eSldr73YL36oeISAHHHi9FVPaRZpFmDZ4L0eji5sqo Ie+QMUzBgGflNqCrD2yHM2zRo7o5StC26ZORynPsbVIn+qmzJiKV3UrBJH2M+nFsVhUq m8CvkXq8IAHticA7GuRSsJEjQJFg8j5ZChx7YvrstpZIu0TYIAqf3IAomJkrt6e313nG Puig== X-Gm-Message-State: AC+VfDzg2BTBr4MPnj2kXmzg9Fzd3zjL3g3z7cIn13FLXQ9bvZeb8Qm+ 2O3VLRGITpII4yzfzYofszDgfg== X-Google-Smtp-Source: ACHHUZ6Ffc3fQZoL4RMB14OmxVVM/4jw5aXSDM9hxxoBcUG6hRO5j/bd0jr5kM6GfmtOtWXvD7TQfA== X-Received: by 2002:a25:f501:0:b0:bb3:9255:33e9 with SMTP id a1-20020a25f501000000b00bb3925533e9mr1145250ybe.53.1686273705789; Thu, 08 Jun 2023 18:21:45 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id t12-20020a5b0dcc000000b00bb138b444dcsm586743ybr.36.2023.06.08.18.21.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:21:44 -0700 (PDT) Date: Thu, 8 Jun 2023 18:21:41 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 12/32] mm/vmalloc: vmalloc_to_page() use pte_offset_kernel() In-Reply-To: Message-ID: <696386a-84f8-b33c-82e5-f865ed6eb39@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" vmalloc_to_page() was using pte_offset_map() (followed by pte_unmap()), but it's intended for userspace page tables: prefer pte_offset_kernel(). 
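The practical difference is in the pairing: pte_offset_kernel() is a plain pointer calculation on kernel page tables, while pte_offset_map() is meant for user page tables, may fail in this series, and must be matched with pte_unmap(). A hedged sketch follows; the two helper names are invented for illustration and are not part of the patch.

	#include <linux/mm.h>
	#include <linux/pgtable.h>

	/* Kernel page tables (init_mm, vmalloc area): no unmap, cannot fail. */
	static pte_t read_kernel_pte(pmd_t *pmd, unsigned long addr)
	{
		return *pte_offset_kernel(pmd, addr);
	}

	/* User page tables: the lookup may fail, and must be unmapped again. */
	static pte_t read_user_pte(pmd_t *pmd, unsigned long addr)
	{
		pte_t pte = __pte(0);
		pte_t *ptep = pte_offset_map(pmd, addr);

		if (ptep) {
			pte = *ptep;
			pte_unmap(ptep);
		}
		return pte;
	}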
Signed-off-by: Hugh Dickins Reviewed-by: Lorenzo Stoakes --- mm/vmalloc.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 9683573f1225..741722d247d5 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -703,11 +703,10 @@ struct page *vmalloc_to_page(const void *vmalloc_addr) if (WARN_ON_ONCE(pmd_bad(*pmd))) return NULL; =20 - ptep =3D pte_offset_map(pmd, addr); + ptep =3D pte_offset_kernel(pmd, addr); pte =3D *ptep; if (pte_present(pte)) page =3D pte_page(pte); - pte_unmap(ptep); =20 return page; } --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F0A50C7EE29 for ; Fri, 9 Jun 2023 01:23:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237614AbjFIBX1 (ORCPT ); Thu, 8 Jun 2023 21:23:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39586 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229931AbjFIBXZ (ORCPT ); Thu, 8 Jun 2023 21:23:25 -0400 Received: from mail-yw1-x1134.google.com (mail-yw1-x1134.google.com [IPv6:2607:f8b0:4864:20::1134]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A694F18D for ; Thu, 8 Jun 2023 18:23:24 -0700 (PDT) Received: by mail-yw1-x1134.google.com with SMTP id 00721157ae682-565c7399afaso11693017b3.1 for ; Thu, 08 Jun 2023 18:23:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686273804; x=1688865804; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=3KTDoQQSpcjB9NQiyOQFUXOX88v07BrrKabYvpMIx6w=; b=PhwpDxcjLXiR7MGVaWKfABLE3dmRB3qka2k6U1l+muDGzWLhomU4qiQ57KmT6VH8qY ACB3DayhC4faseiipd0QCNLUtNtuFH3XTvyWMyC+FvmNiYj2hTBmc8pdfyd4XFmOfbWU tZooxmyCXH+STIuVMaRhFjnJtSpEto0Mxch6xRdf+bo4VO2G6a/c8gheC2mOEmzSx5zk ccYQOeUWSm1v0s9lMRW9nRmwt5CbXGDSvPCaqv/QLtUPzWt0If7EQ4o8TmUAMcKVROe3 I5HEpFf63MgdvN7DdkYJ8qdg2z230og4tgBl+c7UnWgkZm/cRQnWkzi++q5rmkDVXcaG 0vug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686273804; x=1688865804; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=3KTDoQQSpcjB9NQiyOQFUXOX88v07BrrKabYvpMIx6w=; b=GiZd0qIQHOl9r87QXH+ZzsjIRWiv+KvxNhYDHhmgudyKY22riOcjdH3NFHbUdJD/SC 3jg/7wMfqW9hKKhIgzUvde/s9fG3pj39i/HCdw4FEfYUv03qHXVxCqS9DOhYnD20bXBs Ccth8W1bMF4xm0s+mtcvuCaIzjVv5ii1QShydMK60JDjmSALoAbd1c4SnRGiJ5DTrg2A bcr8anlJasdgiBp8wrc1747ZLZH9IoDdPFxyBhHtNCCPjTMIHX6fzO/9Rd+Y4GIpfzrf XxMXoei1MN3D1sJKckPrL0QHpx9UYnM34Nx83hD7XvlkYShA8i/NxuOLfYixT6bwWPnV ySMA== X-Gm-Message-State: AC+VfDxr8usE8FyZufSo6SSZXNUpodEWKNsI8b7L+KdektYVD8M2hOVG pj7+jsCxNDdOkaPwnyX/w1Cu8g== X-Google-Smtp-Source: ACHHUZ5Qxg9qOQctVm9TDBIrvRBwWLkj4+fag3kdEXz2orEpq2AOsjXF+B86JD9LcmOUKUNihoCjxw== X-Received: by 2002:a81:6743:0:b0:568:f9f0:b057 with SMTP id b64-20020a816743000000b00568f9f0b057mr1198700ywc.26.1686273803806; Thu, 08 Jun 2023 18:23:23 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id i11-20020a0ddf0b000000b00568ab5dd873sm288724ywe.65.2023.06.08.18.23.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:23:23 -0700 (PDT) Date: Thu, 8 Jun 2023 18:23:19 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 13/32] mm/hmm: retry if pte_offset_map() fails In-Reply-To: Message-ID: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" hmm_vma_walk_pmd() is called through mm_walk, but already has a goto again loop of its own, so take part in that if pte_offset_map() fails. Signed-off-by: Hugh Dickins Reviewed-by: Alistair Popple --- mm/hmm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/hmm.c b/mm/hmm.c index e23043345615..b1a9159d7c92 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, } =20 ptep =3D pte_offset_map(pmdp, addr); + if (!ptep) + goto again; for (; addr < end; addr +=3D PAGE_SIZE, ptep++, hmm_pfns++) { int r; =20 --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE2E1C7EE29 for ; Fri, 9 Jun 2023 01:24:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237618AbjFIBYz (ORCPT ); Thu, 8 Jun 2023 21:24:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40488 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237202AbjFIBYp (ORCPT ); Thu, 8 Jun 2023 21:24:45 -0400 Received: from mail-yb1-xb30.google.com (mail-yb1-xb30.google.com [IPv6:2607:f8b0:4864:20::b30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E7C9F18D for ; Thu, 8 Jun 2023 18:24:43 -0700 (PDT) Received: by mail-yb1-xb30.google.com with SMTP id 3f1490d57ef6-bacfb7acdb7so1331310276.0 for ; Thu, 08 Jun 2023 18:24:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686273883; x=1688865883; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=wNia0t/q7bIuS9YTvnKusLroNIjYrblkNZ2RtXpvdKE=; b=En3MWCFjZBPbHoaHE9aTCrwnidrPgxHv9KrEbLe/hJSNlvG00NQArOmMAEVh1cdPK1 c0avwttgIxMg1/M4xoSdtZvgHC6q+oGNCMIaib2sYWv0Fx3ipPlY1RUsAjzGUeuRCY7q f4C9d8VWmMay/sRpQantLYEl4VbdRT+IrwRgP0s1LNmuiH920qhuNY3jPTG46R78i5+k 9bDaQ6fLa6fnzNxc3zf976MlEqNb+2p/4gql5IeQqWDTKZblvLqIpzvT31fUaUlXU1zw KT/OmxBpL9FU5MCWQGoReuGTHeYWIqJH3GhVd3emhvyvLQrP6bSWfVqdEbmUf+1ykpPu 0Uaw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686273883; x=1688865883; 
h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=wNia0t/q7bIuS9YTvnKusLroNIjYrblkNZ2RtXpvdKE=; b=UFrFA2/aQlviVuEA4sVXd9YBokF8eFuiDbeDw225RIsbxsCFEbOxRixaDb9flg+2/+ LHes5NQft8DEoqSMr8x+Bm1JP/5YIXkbfDP+4YI1yvT9yuopBhsf3ikFYu2vYIMwh1Ab Svw6S48rn4KZ0gAMNsahZtXKSeKxTfzS/eHoGjUJdW+48f8YJXf723P4JUnqFS9ymge3 rEZwfDylaD8S488wCsHXvv7cje6Z8fascxgjftWHhMTRnUfsE6Q+AyYfNFUzwiHA81CJ yoD/+H7XUeMSL+eRFNZ8vTQRUIY1PzGW/6JrRTMPce37joA6PUxF/B1oInSmeQrIEtB5 zVYg== X-Gm-Message-State: AC+VfDwoGAtuJMfsr/yK3TSD7+uSfqe0ock2RsY2Du/n7I9uMPU6IMrW s7UEMIHiMKsZrh2P2panpgHWyA== X-Google-Smtp-Source: ACHHUZ7fnPyD1AB15VbBenw69LG3/w2FA1xyBpqbSoDD115cddrIVPxKnnpMPhIjJpC81nscf3jcJA== X-Received: by 2002:a25:8d83:0:b0:ba1:6bad:41a4 with SMTP id o3-20020a258d83000000b00ba16bad41a4mr1193186ybl.14.1686273882963; Thu, 08 Jun 2023 18:24:42 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id ch22-20020a0569020b1600b00bac1087b44esm587924ybb.35.2023.06.08.18.24.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:24:42 -0700 (PDT) Date: Thu, 8 Jun 2023 18:24:38 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 14/32] mm/userfaultfd: retry if pte_offset_map() fails In-Reply-To: Message-ID: <54423f-3dff-fd8d-614a-632727cc4cfb@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Instead of worrying whether the pmd is stable, userfaultfd_must_wait() call pte_offset_map() as before, but go back to try again if that fails. Risk of endless loop? It already broke out if pmd_none(), !pmd_present() or pmd_trans_huge(), and pte_offset_map() would have cleared pmd_bad(): which leaves pmd_devmap(). Presumably pmd_devmap() is inappropriate in a vma subject to userfaultfd (it would have been mistreated before), but add a check just to avoid all possibility of endless loop there. 
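Stripped of the userfaultfd specifics, the retry described above is just the loop sketched here. The function name is invented, and only the calls used in the patch below are assumed; this is not the patched userfaultfd code itself.

	#include <linux/mm.h>
	#include <linux/pgtable.h>

	/* Sketch: lockless pte lookup that retries if the pmd changed under us. */
	static bool pte_lookup_sketch(pmd_t *pmd, unsigned long address)
	{
		pmd_t _pmd;
		pte_t *pte;
	again:
		_pmd = pmdp_get_lockless(pmd);
		if (pmd_none(_pmd) || !pmd_present(_pmd) || pmd_devmap(_pmd))
			return false;		/* no page table to look at */
		if (pmd_trans_huge(_pmd))
			return true;		/* huge: no pte level below */

		pte = pte_offset_map(pmd, address);
		if (!pte)
			goto again;		/* raced with a pmd change: retry */
		/* ... lockless read of *pte goes here ... */
		pte_unmap(pte);
		return true;
	}

As the commit message notes, the earlier checks already break out for every other state that could make pte_offset_map() fail, so the extra pmd_devmap() test is what rules out an endless retry.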
Signed-off-by: Hugh Dickins Acked-by: Peter Xu --- fs/userfaultfd.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index f7a0817b1ec0..ca83423f8d54 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -349,12 +349,13 @@ static inline bool userfaultfd_must_wait(struct userf= aultfd_ctx *ctx, if (!pud_present(*pud)) goto out; pmd =3D pmd_offset(pud, address); +again: _pmd =3D pmdp_get_lockless(pmd); if (pmd_none(_pmd)) goto out; =20 ret =3D false; - if (!pmd_present(_pmd)) + if (!pmd_present(_pmd) || pmd_devmap(_pmd)) goto out; =20 if (pmd_trans_huge(_pmd)) { @@ -363,11 +364,11 @@ static inline bool userfaultfd_must_wait(struct userf= aultfd_ctx *ctx, goto out; } =20 - /* - * the pmd is stable (as in !pmd_trans_unstable) so we can re-read it - * and use the standard pte_offset_map() instead of parsing _pmd. - */ pte =3D pte_offset_map(pmd, address); + if (!pte) { + ret =3D true; + goto again; + } /* * Lockless access: we're in a wait_event so it's ok if it * changes under us. PTE markers should be handled the same as none --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 107A8C7EE25 for ; Fri, 9 Jun 2023 01:26:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237687AbjFIB0O (ORCPT ); Thu, 8 Jun 2023 21:26:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41042 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237202AbjFIB0K (ORCPT ); Thu, 8 Jun 2023 21:26:10 -0400 Received: from mail-yw1-x112e.google.com (mail-yw1-x112e.google.com [IPv6:2607:f8b0:4864:20::112e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4411218D for ; Thu, 8 Jun 2023 18:26:09 -0700 (PDT) Received: by mail-yw1-x112e.google.com with SMTP id 00721157ae682-565cd2fc9acso11738727b3.0 for ; Thu, 08 Jun 2023 18:26:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686273968; x=1688865968; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=PmbbTemShg7dGpCJymh8ioCm9PQPk/gLwHXF7mQ6jbw=; b=d+dKJf7MYgGkbogFV5iJ0in/u8QWUjYLBAFmQ7ygcUpoTtYR38tV8VjHHRx79+tqC1 xHAi8VEI8udjhn0i6NXjlbZGM0curW/G09YDGBDO7uNeymgEDUutuEhY67dcv9e1JC3c Kr/lYteRhi8EZZ4ON3qgr8IZYbs22sar5cHF6pJ+hkjtIcaYivcyvRaJDrPKAoQHoxL4 78cmtQoJihNLnrWGPJ0/wzQ+Ft29PFAH7mols+/0p1qq115e8foXt7EV0WK2TTvZzAlK WTFccNgXKTckPziGTMxd+vfG4KkV3xlgmbe2/Bx5rx+kPXmMSxor6N0wIGRCDIqsVdI4 znJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686273968; x=1688865968; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=PmbbTemShg7dGpCJymh8ioCm9PQPk/gLwHXF7mQ6jbw=; b=H1joAbgKagzyKsreS/veSL4IlkDtHwhxRof9KBqhgxgANyKpCbSadHD4EaXyYh4zeI WUC8zpYJBLoZoEnJkbi6wiUysGT2fqzUleFv+KbeinQwx56zdm1HFXCwNwGuf9GZRcI3 2LJv5Jf1LNadnyxqnhvB34//RCq7LmKpGLI04xUlTxUU3XDQZjX/+rnLKYVzUYcVHAuH 5yC53WpipkgwJ4sDYq8sIE+UsrIGY9obo//zeYLCg9YzM0n/uhiBnpCVYzp9gUUjwBA/ n/Dcci/eQj0YQGL8C/AFdwjIQHqy4GpeF/5mSYVR3YcxMBnVZ0uANNwWPPmmvCzl/U+M W4dA== X-Gm-Message-State: AC+VfDzSLijw/uygBUSGnQiLkUNuJEYct4txjH4vfB0/z3LjLW9sJLEm 3NM8QVQ5yMfP42vOxKKFqvd1BQ== X-Google-Smtp-Source: 
ACHHUZ7XTCguPhOybQY+LH+ypWpoP0kYFfabibD6+mqWrxXo7bpsKAzEBYkvZOCdhGRiVh3fVLiw0Q== X-Received: by 2002:a81:5b89:0:b0:560:eadc:3bc9 with SMTP id p131-20020a815b89000000b00560eadc3bc9mr1402433ywb.7.1686273968353; Thu, 08 Jun 2023 18:26:08 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id p82-20020a815b55000000b00561e7639ee8sm296391ywb.57.2023.06.08.18.26.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:26:07 -0700 (PDT) Date: Thu, 8 Jun 2023 18:26:04 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 15/32] mm/userfaultfd: allow pte_offset_map_lock() to fail In-Reply-To: Message-ID: <50cf3930-1bfa-4de9-a079-3da47b7ce17b@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" mfill_atomic_install_pte() and mfill_atomic_pte_zeropage() treat failed pte_offset_map_lock() as -EAGAIN, which mfill_atomic() already returns to user for a similar race. Signed-off-by: Hugh Dickins --- mm/userfaultfd.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index e97a0b4889fc..5fd787158c70 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -76,7 +76,10 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd, if (flags & MFILL_ATOMIC_WP) _dst_pte =3D pte_mkuffd_wp(_dst_pte); =20 + ret =3D -EAGAIN; dst_pte =3D pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); + if (!dst_pte) + goto out; =20 if (vma_is_shmem(dst_vma)) { /* serialize against truncate with the page table lock */ @@ -121,6 +124,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd, ret =3D 0; out_unlock: pte_unmap_unlock(dst_pte, ptl); +out: return ret; } =20 @@ -212,7 +216,10 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd, =20 _dst_pte =3D pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr), dst_vma->vm_page_prot)); + ret =3D -EAGAIN; dst_pte =3D pte_offset_map_lock(dst_vma->vm_mm, dst_pmd, dst_addr, &ptl); + if (!dst_pte) + goto out; if (dst_vma->vm_file) { /* the shmem MAP_PRIVATE case requires checking the i_size */ inode =3D dst_vma->vm_file->f_inode; @@ -231,6 +238,7 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd, ret =3D 0; out_unlock: pte_unmap_unlock(dst_pte, ptl); +out: return ret; } =20 --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A85BC7EE29 for ; Fri, 9 Jun 2023 01:28:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237733AbjFIB2B (ORCPT ); Thu, 8 Jun 2023 21:28:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41836 
"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230002AbjFIB16 (ORCPT ); Thu, 8 Jun 2023 21:27:58 -0400 Received: from mail-yw1-x1132.google.com (mail-yw1-x1132.google.com [IPv6:2607:f8b0:4864:20::1132]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EA24D1716 for ; Thu, 8 Jun 2023 18:27:57 -0700 (PDT) Received: by mail-yw1-x1132.google.com with SMTP id 00721157ae682-569fc874498so12183277b3.1 for ; Thu, 08 Jun 2023 18:27:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686274077; x=1688866077; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=MyPFy0gUSdpV3ibLKHHJg/J/UWoL9nTKOb5gbs61iKQ=; b=OyAJimk3fc+/Li5me7PIcu882/+lDPwqec67fOflTDld8VWvm0ZySNcdoJgLTrD9rx WoOLU/XWyP4MN+Eoi7KJZTv1R6ExtbK2SMofkB2PIt1uKNhRHeS+xx4hN9Wks9SB/Hkk 4gzmFYarU9qov9Uzj7o9h1zLdkAzYRq/b/kKfVItrvxfhKgU8iMFFvE5pQZ3kgV2up+L hNFZoE2nbmMWdsnJ1XbGUBg6gK5gwmUXxa01pub4RN5ECsXTRQyDMvk6goIXaQohEPKU wCMnLqlIXUxUsnIlKrzfLeDs5rhQwhiPDht5XmK01inAm6KH0sUkni/+00j7bluhyFK7 RwQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686274077; x=1688866077; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=MyPFy0gUSdpV3ibLKHHJg/J/UWoL9nTKOb5gbs61iKQ=; b=Amj2ZZfBE1zJhGKS+OgYIko1V49Q2o1WAKfytxdhRnAs8xzzASIGk+1gF91fxOcjQv PceFjKd5LP/niU9HktHHRt3siGFClqGaZ/t8VwnjfWRkP1yOiT6Ibp17+WJG0D6cMova ypgNNqAzjDNbZDFIIzD+mGgFxjzaf3Rs7RCzYS54TuHoZ7l+vICQxGc9T8GS0TYpIRl+ cbjv4aPn8wJ5wxxjOR0v/u2ErbCqYbHB4Y3+LQQuqRh3KQ+L2qpOQUDDAxmD6bW1cZVN mEI+p3HYJcwTaXTQufoe0E13GDIb94fK3c9v0EW/sD541s+Dh/v5bBoWbfTXwJmwXWxP q2lA== X-Gm-Message-State: AC+VfDy7O/TYRS44rJRIWXeM02EwnNdCecexC4FP4LE0hwZ8dSj8bwYf TA+p2Lh2IIn6N9TG1srk7kSFbg== X-Google-Smtp-Source: ACHHUZ5jGBBEQ5gswQSXjac3sdwb/7+VqiRoKupCeGhVPyqG0jNqvRoD86vqt+pEd3PQOO1HZrsBzg== X-Received: by 2002:a81:8005:0:b0:569:74f3:ed2e with SMTP id q5-20020a818005000000b0056974f3ed2emr1337210ywf.22.1686274077005; Thu, 08 Jun 2023 18:27:57 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id c205-20020a814ed6000000b005688ca40c4bsm291663ywb.61.2023.06.08.18.27.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:27:56 -0700 (PDT) Date: Thu, 8 Jun 2023 18:27:52 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. 
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 16/32] mm/debug_vm_pgtable,page_table_check: warn pte map fails In-Reply-To: Message-ID: <3ea9e4f-e5cf-d7d9-4c2-291b3c5a3636@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Failures here would be surprising: pte_advanced_tests() and pte_clear_tests() and __page_table_check_pte_clear_range() each issue a warning if pte_offset_map() or pte_offset_map_lock() fails. Signed-off-by: Hugh Dickins --- mm/debug_vm_pgtable.c | 9 ++++++++- mm/page_table_check.c | 2 ++ 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index c54177aabebd..ee119e33fef1 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -138,6 +138,9 @@ static void __init pte_advanced_tests(struct pgtable_de= bug_args *args) return; =20 pr_debug("Validating PTE advanced\n"); + if (WARN_ON(!args->ptep)) + return; + pte =3D pfn_pte(args->pte_pfn, args->page_prot); set_pte_at(args->mm, args->vaddr, args->ptep, pte); flush_dcache_page(page); @@ -619,6 +622,9 @@ static void __init pte_clear_tests(struct pgtable_debug= _args *args) * the unexpected overhead of cache flushing is acceptable. 
*/ pr_debug("Validating PTE clear\n"); + if (WARN_ON(!args->ptep)) + return; + #ifndef CONFIG_RISCV pte =3D __pte(pte_val(pte) | RANDOM_ORVALUE); #endif @@ -1377,7 +1383,8 @@ static int __init debug_vm_pgtable(void) args.ptep =3D pte_offset_map_lock(args.mm, args.pmdp, args.vaddr, &ptl); pte_clear_tests(&args); pte_advanced_tests(&args); - pte_unmap_unlock(args.ptep, ptl); + if (args.ptep) + pte_unmap_unlock(args.ptep, ptl); =20 ptl =3D pmd_lock(args.mm, args.pmdp); pmd_clear_tests(&args); diff --git a/mm/page_table_check.c b/mm/page_table_check.c index f2baf97d5f38..b743a2f6bce0 100644 --- a/mm/page_table_check.c +++ b/mm/page_table_check.c @@ -246,6 +246,8 @@ void __page_table_check_pte_clear_range(struct mm_struc= t *mm, pte_t *ptep =3D pte_offset_map(&pmd, addr); unsigned long i; =20 + if (WARN_ON(!ptep)) + return; for (i =3D 0; i < PTRS_PER_PTE; i++) { __page_table_check_pte_clear(mm, addr, *ptep); addr +=3D PAGE_SIZE; --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B66BCC7EE29 for ; Fri, 9 Jun 2023 01:29:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237737AbjFIB3b (ORCPT ); Thu, 8 Jun 2023 21:29:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42274 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230002AbjFIB33 (ORCPT ); Thu, 8 Jun 2023 21:29:29 -0400 Received: from mail-yw1-x1136.google.com (mail-yw1-x1136.google.com [IPv6:2607:f8b0:4864:20::1136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 062011FDA for ; Thu, 8 Jun 2023 18:29:28 -0700 (PDT) Received: by mail-yw1-x1136.google.com with SMTP id 00721157ae682-568af2f6454so11394447b3.1 for ; Thu, 08 Jun 2023 18:29:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686274167; x=1688866167; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=wEXGzvow+PeeAaek8TvLjmsJIBTHvkDkqgMD7Fw0vH4=; b=1iKYEdXQU7GyBWXXdTJ/eTNQaKjM8A5uGEtCPlSPe50PEEnGNTaKjIFQd2nLJWOfvh tPsKmHoib44kj+JEUNUeUUirOK6gcFjvoUWHBf5F5sWsiABdMBGtAthxe2WU9qh7noM4 Zh/iyBIZbzHd4k5pq9B8glalrk6L5lYQzkgizx3QlltAhP4mUsNeTxTjLywFtxVFGNvZ jIQGXg/oBKKP81SqRx4Caj2rWxiav0FLHH/NmIS/432ybPMUtKQi+Yq1y0GkDZpiwhGe kNBiMEtti/dkcWXs/faQIjCPcs483ukabcMmUPTpwrdxz/Ln2gtVt+7567CoyLJvwmZM 3ejA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686274167; x=1688866167; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=wEXGzvow+PeeAaek8TvLjmsJIBTHvkDkqgMD7Fw0vH4=; b=ZRju5wkJJWZ3s7e6hUU+c9N6VeJKetfQFe2iqqMh2BuG3RexTITlhVFdaWp8fl+aQs hAny+zjZHiSrnxwQRqxQoaPozTShANerPDllHi14awqRhco3oYx2aJO/KZOwsk8olkl+ aXLKZiZbS+rdsBeFouxDNOdVj0tUzzPX8hSb9sgD+AoUjIJV3X+XdJtEi3160uvrJehn gXDr/fkq5BXN0xVk9YbV4KMWYWx3P3DDMGDMgREkiL5Ad6D0MN1jruC5Lr3c/g/Iwt6m WCDTF/bCKndWpSN/eFC3WSKuhvoSIHeW9p7SH8LDFDVJUBkIby+oYo4GWSNyh8pBoeiH Fv5Q== X-Gm-Message-State: AC+VfDx7Fa0CdqRi/eKMqd2c7LXzIv0JF/d65VBjeNd18pqUYxLwuRud wh4A3KsVvzaxyzRAii13siA/gA== X-Google-Smtp-Source: ACHHUZ7s1itVLKfUVkBt3mJsfUoKC8ShOR4DXlaWFBNzOobINfIUYkdb8mmHGFUq5qbj4FXPU58kgA== X-Received: by 2002:a0d:e8ce:0:b0:561:be2a:43f9 with SMTP id 
r197-20020a0de8ce000000b00561be2a43f9mr1268402ywe.41.1686274167029; Thu, 08 Jun 2023 18:29:27 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id i130-20020a815488000000b00561e2cb2d3bsm300434ywb.23.2023.06.08.18.29.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:29:26 -0700 (PDT) Date: Thu, 8 Jun 2023 18:29:22 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 17/32] mm/various: give up if pte_offset_map[_lock]() fails In-Reply-To: Message-ID: <7b9bd85d-1652-cbf2-159d-f503b45e5b@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Following the examples of nearby code, various functions can just give up if pte_offset_map() or pte_offset_map_lock() fails. And there's no need for a preliminary pmd_trans_unstable() or other such check, since such cases are now safely handled inside. Signed-off-by: Hugh Dickins --- mm/gup.c | 9 ++++++--- mm/ksm.c | 7 ++++--- mm/memcontrol.c | 8 ++++---- mm/memory-failure.c | 8 +++++--- mm/migrate.c | 3 +++ 5 files changed, 22 insertions(+), 13 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 3bd5d3854c51..bb67193c5460 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -544,10 +544,10 @@ static struct page *follow_page_pte(struct vm_area_st= ruct *vma, if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) =3D=3D (FOLL_PIN | FOLL_GET))) return ERR_PTR(-EINVAL); - if (unlikely(pmd_bad(*pmd))) - return no_page_table(vma, flags); =20 ptep =3D pte_offset_map_lock(mm, pmd, address, &ptl); + if (!ptep) + return no_page_table(vma, flags); pte =3D *ptep; if (!pte_present(pte)) goto no_page; @@ -851,8 +851,9 @@ static int get_gate_page(struct mm_struct *mm, unsigned= long address, pmd =3D pmd_offset(pud, address); if (!pmd_present(*pmd)) return -EFAULT; - VM_BUG_ON(pmd_trans_huge(*pmd)); pte =3D pte_offset_map(pmd, address); + if (!pte) + return -EFAULT; if (pte_none(*pte)) goto unmap; *vma =3D get_gate_vma(mm); @@ -2377,6 +2378,8 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsi= gned long addr, pte_t *ptep, *ptem; =20 ptem =3D ptep =3D pte_offset_map(&pmd, addr); + if (!ptep) + return 0; do { pte_t pte =3D ptep_get_lockless(ptep); struct page *page; diff --git a/mm/ksm.c b/mm/ksm.c index df2aa281d49d..3dc15459dd20 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -431,10 +431,9 @@ static int break_ksm_pmd_entry(pmd_t *pmd, unsigned lo= ng addr, unsigned long nex pte_t *pte; int ret; =20 - if (pmd_leaf(*pmd) || !pmd_present(*pmd)) - return 0; - pte =3D pte_offset_map_lock(walk->mm, pmd, addr, &ptl); + if (!pte) + return 0; if (pte_present(*pte)) { page =3D vm_normal_page(walk->vma, addr, *pte); } else if (!pte_none(*pte)) { @@ -1203,6 +1202,8 @@ static int replace_page(struct vm_area_struct *vma, 
s= truct page *page, mmu_notifier_invalidate_range_start(&range); =20 ptep =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!ptep) + goto out_mn; if (!pte_same(*ptep, orig_pte)) { pte_unmap_unlock(ptep, ptl); goto out_mn; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4b27e245a055..fdd953655fe1 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -6057,9 +6057,9 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t= *pmd, return 0; } =20 - if (pmd_trans_unstable(pmd)) - return 0; pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return 0; for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) if (get_mctgt_type(vma, addr, *pte, NULL)) mc.precharge++; /* increment precharge temporarily */ @@ -6277,10 +6277,10 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *= pmd, return 0; } =20 - if (pmd_trans_unstable(pmd)) - return 0; retry: pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return 0; for (; addr !=3D end; addr +=3D PAGE_SIZE) { pte_t ptent =3D *(pte++); bool device =3D false; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5b663eca1f29..b3cc8f213fe3 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -414,6 +414,8 @@ static unsigned long dev_pagemap_mapping_shift(struct v= m_area_struct *vma, if (pmd_devmap(*pmd)) return PMD_SHIFT; pte =3D pte_offset_map(pmd, address); + if (!pte) + return 0; if (pte_present(*pte) && pte_devmap(*pte)) ret =3D PAGE_SHIFT; pte_unmap(pte); @@ -800,11 +802,11 @@ static int hwpoison_pte_range(pmd_t *pmdp, unsigned l= ong addr, goto out; } =20 - if (pmd_trans_unstable(pmdp)) - goto out; - mapped_pte =3D ptep =3D pte_offset_map_lock(walk->vma->vm_mm, pmdp, addr, &ptl); + if (!ptep) + goto out; + for (; addr !=3D end; ptep++, addr +=3D PAGE_SIZE) { ret =3D check_hwpoisoned_entry(*ptep, addr, PAGE_SHIFT, hwp->pfn, &hwp->tk); diff --git a/mm/migrate.c b/mm/migrate.c index 3ecb7a40075f..308a56f0b156 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -305,6 +305,9 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *= pmd, swp_entry_t entry; =20 ptep =3D pte_offset_map_lock(mm, pmd, address, &ptl); + if (!ptep) + return; + pte =3D *ptep; pte_unmap(ptep); =20 --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EFC0CC7EE37 for ; Fri, 9 Jun 2023 01:30:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229726AbjFIBa6 (ORCPT ); Thu, 8 Jun 2023 21:30:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42714 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229688AbjFIBaz (ORCPT ); Thu, 8 Jun 2023 21:30:55 -0400 Received: from mail-yw1-x1132.google.com (mail-yw1-x1132.google.com [IPv6:2607:f8b0:4864:20::1132]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF35D1716 for ; Thu, 8 Jun 2023 18:30:53 -0700 (PDT) Received: by mail-yw1-x1132.google.com with SMTP id 00721157ae682-565bd368e19so10851917b3.1 for ; Thu, 08 Jun 2023 18:30:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686274253; x=1688866253; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=7GAfzbPNj/2rvKRevzngE24N05kydgFSIoMtFyLCrv4=; 
b=gJAMMykaIEU9PkjyCIdSz3ODq0ftmm75epqTVf1DDwNHXYM11vOXndav/A+j1kRFEc 9zEcP31A1kA13aWMOjns1R75xfAWzNDi1gHKk68ewQgq0whuvZDJ2IOLlB+uSUVd5Hp8 svpukbwsNHcvaTpeSeeB57vhMGo3aoLUGSbQC4vDY82UJZYh6UFZthAmKEZpoO6CrQsj zIurR6IMYJtElj3ek8vexr06I8pOIvm9uDnN6jHks/XrVIB5jV9tJeQ4OkFBFlzOLvfn TFupD/LP2R58aC1XHY0D8ruvxKPXB9C8uw3aB2kdOGhPoyncXso5dO85F2PIfWP2s5CT /ipQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686274253; x=1688866253; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=7GAfzbPNj/2rvKRevzngE24N05kydgFSIoMtFyLCrv4=; b=UoDHPYELmYDszENOUZtkICjSsXsJj7AMDraPu7kXEMmN8ZUpTEUqoxfDZ1OIj7mzho 7Qlp+EfCQzSBNp/Zo0q0VM8V2/a0fJ7awSJBb9Gwt4TGhLzaBUqURHbSKs2Ye0pAxS2H 3dCsPVmYQ77EjuK7BIUCSAA4jFABAHzAfF6bVmto4vQZClWnfxJReZXbYgKlubIMBbL8 ZsTDSEmQxixAZl9kY/rXKTQRyTH9esFfGu3aa/1jeWiQsARit/BU9T0Oxv2GSB8EyVz+ Q3rcZWSG98YdtsFiegUTXn0k1XqZYWgk/GGog8uv+IKS2METQMegKwNy/PM0X0EerPe3 kvYA== X-Gm-Message-State: AC+VfDys4OLmLXD9ezvX7BIaKGQX0MVeaevgFFqW8Sp9xee65ECAaPXs C2DGf55jZxEr4weyT2KojnPHTw== X-Google-Smtp-Source: ACHHUZ4zYAZPyewfaoixPaut1Ay+u+IP9P5gVJt0aeUL3XpkhLRU1BODxkde89jUMQkygUDst/++Nw== X-Received: by 2002:a0d:ea8b:0:b0:559:d3a0:4270 with SMTP id t133-20020a0dea8b000000b00559d3a04270mr1322987ywe.34.1686274252865; Thu, 08 Jun 2023 18:30:52 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id y206-20020a817dd7000000b0055a881abfc3sm277355ywc.135.2023.06.08.18.30.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:30:52 -0700 (PDT) Date: Thu, 8 Jun 2023 18:30:48 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 18/32] mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge() In-Reply-To: Message-ID: <725a42a9-91e9-c868-925-e3a5fd40bb4f@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" change_pmd_range() had special pmd_none_or_clear_bad_unless_trans_huge(), required to avoid "bad" choices when setting automatic NUMA hinting under mmap_read_lock(); but most of that is already covered in pte_offset_map() now. change_pmd_range() just wants a pmd_none() check before wasting time on MMU notifiers, then checks on the read-once _pmd value to work out what's needed for huge cases. If change_pte_range() returns -EAGAIN to retry if pte_offset_map_lock() fails, nothing more special is needed. 
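As a rough illustration of the resulting control flow (a hedged sketch only, not this patch: pte_level() and pmd_level() are made-up names standing in for change_pte_range() and change_pmd_range(), and all the real bookkeeping is omitted), the PTE-level function reports -EAGAIN when pte_offset_map_lock() fails, and the PMD-level loop simply goes back and re-evaluates the pmd:

#include <linux/mm.h>
#include <linux/spinlock.h>

/* Illustrative sketch only: names and bodies are simplified, not kernel code. */
static long pte_level(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
{
        spinlock_t *ptl;
        pte_t *pte = pte_offset_map_lock(mm, pmd, addr, &ptl);

        if (!pte)
                return -EAGAIN;   /* page table gone or now huge: caller rechecks */
        /* ... change protections under ptl ... */
        pte_unmap_unlock(pte, ptl);
        return 0;                 /* in reality: number of pages updated */
}

static long pmd_level(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
{
        long ret;
again:
        if (pmd_none(*pmd))
                return 0;         /* nothing mapped: skip the MMU notifiers */
        /* huge cases are decided on a pmdp_get_lockless() snapshot here */
        ret = pte_level(mm, pmd, addr);
        if (ret < 0)
                goto again;       /* -EAGAIN: the pmd may have changed, recheck */
        return ret;
}

The snapshot-plus-retry keeps all of the "unstable pmd" reasoning in one place, instead of spreading it between the helper deleted here and the callers.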
Signed-off-by: Hugh Dickins --- mm/mprotect.c | 74 ++++++++++++--------------------------------------- 1 file changed, 17 insertions(+), 57 deletions(-) diff --git a/mm/mprotect.c b/mm/mprotect.c index c5a13c0f1017..64e1df0af514 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -93,22 +93,9 @@ static long change_pte_range(struct mmu_gather *tlb, bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; =20 tlb_change_page_size(tlb, PAGE_SIZE); - - /* - * Can be called with only the mmap_lock for reading by - * prot_numa so we must check the pmd isn't constantly - * changing from under us from pmd_none to pmd_trans_huge - * and/or the other way around. - */ - if (pmd_trans_unstable(pmd)) - return 0; - - /* - * The pmd points to a regular pte so the pmd can't change - * from under us even if the mmap_lock is only hold for - * reading. - */ pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return -EAGAIN; =20 /* Get target node for single threaded private VMAs */ if (prot_numa && !(vma->vm_flags & VM_SHARED) && @@ -301,26 +288,6 @@ static long change_pte_range(struct mmu_gather *tlb, return pages; } =20 -/* - * Used when setting automatic NUMA hinting protection where it is - * critical that a numa hinting PMD is not confused with a bad PMD. - */ -static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd) -{ - pmd_t pmdval =3D pmdp_get_lockless(pmd); - - if (pmd_none(pmdval)) - return 1; - if (pmd_trans_huge(pmdval)) - return 0; - if (unlikely(pmd_bad(pmdval))) { - pmd_clear_bad(pmd); - return 1; - } - - return 0; -} - /* * Return true if we want to split THPs into PTE mappings in change * protection procedure, false otherwise. @@ -398,7 +365,8 @@ static inline long change_pmd_range(struct mmu_gather *= tlb, pmd =3D pmd_offset(pud, addr); do { long ret; - + pmd_t _pmd; +again: next =3D pmd_addr_end(addr, end); =20 ret =3D change_pmd_prepare(vma, pmd, cp_flags); @@ -406,16 +374,8 @@ static inline long change_pmd_range(struct mmu_gather = *tlb, pages =3D ret; break; } - /* - * Automatic NUMA balancing walks the tables with mmap_lock - * held for read. It's possible a parallel update to occur - * between pmd_trans_huge() and a pmd_none_or_clear_bad() - * check leading to a false positive and clearing. - * Hence, it's necessary to atomically read the PMD value - * for all the checks. - */ - if (!is_swap_pmd(*pmd) && !pmd_devmap(*pmd) && - pmd_none_or_clear_bad_unless_trans_huge(pmd)) + + if (pmd_none(*pmd)) goto next; =20 /* invoke the mmu notifier if the pmd is populated */ @@ -426,7 +386,8 @@ static inline long change_pmd_range(struct mmu_gather *= tlb, mmu_notifier_invalidate_range_start(&range); } =20 - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) { + _pmd =3D pmdp_get_lockless(pmd); + if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) { if ((next - addr !=3D HPAGE_PMD_SIZE) || pgtable_split_needed(vma, cp_flags)) { __split_huge_pmd(vma, pmd, addr, false, NULL); @@ -441,15 +402,10 @@ static inline long change_pmd_range(struct mmu_gather= *tlb, break; } } else { - /* - * change_huge_pmd() does not defer TLB flushes, - * so no need to propagate the tlb argument. 
- */ - int nr_ptes =3D change_huge_pmd(tlb, vma, pmd, + ret =3D change_huge_pmd(tlb, vma, pmd, addr, newprot, cp_flags); - - if (nr_ptes) { - if (nr_ptes =3D=3D HPAGE_PMD_NR) { + if (ret) { + if (ret =3D=3D HPAGE_PMD_NR) { pages +=3D HPAGE_PMD_NR; nr_huge_updates++; } @@ -460,8 +416,12 @@ static inline long change_pmd_range(struct mmu_gather = *tlb, } /* fall through, the trans huge pmd just split */ } - pages +=3D change_pte_range(tlb, vma, pmd, addr, next, - newprot, cp_flags); + + ret =3D change_pte_range(tlb, vma, pmd, addr, next, newprot, + cp_flags); + if (ret < 0) + goto again; + pages +=3D ret; next: cond_resched(); } while (pmd++, addr =3D next, addr !=3D end); --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 307CBC7EE29 for ; Fri, 9 Jun 2023 01:32:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237084AbjFIBc4 (ORCPT ); Thu, 8 Jun 2023 21:32:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43670 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230094AbjFIBcx (ORCPT ); Thu, 8 Jun 2023 21:32:53 -0400 Received: from mail-yw1-x1131.google.com (mail-yw1-x1131.google.com [IPv6:2607:f8b0:4864:20::1131]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D36972D4F for ; Thu, 8 Jun 2023 18:32:52 -0700 (PDT) Received: by mail-yw1-x1131.google.com with SMTP id 00721157ae682-561b7729a12so35138137b3.1 for ; Thu, 08 Jun 2023 18:32:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686274372; x=1688866372; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=dPoyGlkGpegFUTFofVDIHWRBddk7VxC7U3Cl7g+dttQ=; b=E4DRQK86HkB1M0oL3CGdKUO2AI9NqeHpyBpHLli2aF+Pqlit3JP/3FLe7keJkIgcP9 Ui3kDjUhzLuUpOKUr/BBsh3paM313vr/kZTbKcMrzy1rnCosbjtatNixhCUekN/A7pkf A1U0Xx3rYpDhvjGoSHww7Px2zZjjdooHl41F3udSS11WEeWkhqn3r8hfZs7GDrCg2BMs cW8rdWF1rLhD4DHvb+mCyaR83x60p+qcJdlOzarzjsn+EOUWDt8OE4eCU+8SOfMHlxy4 XQHaWxY2xWWX/2GxHjET5xLBgwJ5RGHZxlXYqQkMUK5vHAdHkIfUkQT70qyxIHs4Ypp2 ur1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686274372; x=1688866372; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=dPoyGlkGpegFUTFofVDIHWRBddk7VxC7U3Cl7g+dttQ=; b=Uh+WYhBVaO3iX9ZXVhUlyoqw5HBkQgv2qPAEH4plapoYq4C9sFOJPhSEPiLvSovGRD tXUdqtz8FJKCFAPQAeLnZjYUQPCY932797p4RPgVrm+PMjWn8tFNmk7/zgj7KLWpGqNt QSMqI5kp9XPCwV7fOOyyxdIHEIRL3gaobMcqvsxcOFs02g80UMPGUPXAXehmsXx5VPru fhStZjwdkm+vIurxB+QEXlM7Z4NEdx5HEtcrhX0OYpmeBqKLlbR1Ih4QiqvEhBdzG0IP Bd/9n+BXRbhz0+IdC0Ljkx8znH7fB/vq3a0UPraBMhgZnYnty+Hizr9ReAHB322Xk9jL vNFQ== X-Gm-Message-State: AC+VfDzmPjUcZF/wMPOq0ai/NBB+esdWgIASvALWHRHnFc1JqQPlpeuN 7xoSDh4YBoiP2Qck7BqpanDpvQ== X-Google-Smtp-Source: ACHHUZ6C2iTOD9MuiuvdKguC1n6gRPJbYpY/VQ3aTwN0L3f5aJC3JH3HR0E7O5S3i52MFj8yhumCLw== X-Received: by 2002:a81:4f4c:0:b0:561:1cb6:f3d6 with SMTP id d73-20020a814f4c000000b005611cb6f3d6mr549240ywb.0.1686274371904; Thu, 08 Jun 2023 18:32:51 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id u195-20020a0debcc000000b00565c29cf592sm309752ywe.10.2023.06.08.18.32.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:32:51 -0700 (PDT) Date: Thu, 8 Jun 2023 18:32:47 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 19/32] mm/mremap: retry if either pte_offset_map_*lock() fails In-Reply-To: Message-ID: <65e5e84a-f04-947-23f2-b97d3462e1e@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" move_ptes() return -EAGAIN if pte_offset_map_lock() of old fails, or if pte_offset_map_nolock() of new fails: move_page_tables() retry if so. But that does need a pmd_none() check inside, to stop endless loop when huge shmem is truncated (thank you to syzbot); and move_huge_pmd() must tolerate that a page table might have been allocated there just before (of course it would be more satisfying to remove the empty page table, but this is not a path worth optimizing). Signed-off-by: Hugh Dickins --- mm/huge_memory.c | 5 +++-- mm/mremap.c | 28 ++++++++++++++++++++-------- 2 files changed, 23 insertions(+), 10 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 624671aaa60d..d4bd5fa7c823 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1760,9 +1760,10 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsig= ned long old_addr, =20 /* * The destination pmd shouldn't be established, free_pgtables() - * should have release it. + * should have released it; but move_page_tables() might have already + * inserted a page table, if racing against shmem/file collapse. */ - if (WARN_ON(!pmd_none(*new_pmd))) { + if (!pmd_none(*new_pmd)) { VM_BUG_ON(pmd_trans_huge(*new_pmd)); return false; } diff --git a/mm/mremap.c b/mm/mremap.c index b11ce6c92099..1fc47b4f38d7 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -133,7 +133,7 @@ static pte_t move_soft_dirty_pte(pte_t pte) return pte; } =20 -static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, +static int move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, unsigned long old_addr, unsigned long old_end, struct vm_area_struct *new_vma, pmd_t *new_pmd, unsigned long new_addr, bool need_rmap_locks) @@ -143,6 +143,7 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t= *old_pmd, spinlock_t *old_ptl, *new_ptl; bool force_flush =3D false; unsigned long len =3D old_end - old_addr; + int err =3D 0; =20 /* * When need_rmap_locks is true, we take the i_mmap_rwsem and anon_vma @@ -170,8 +171,16 @@ static void move_ptes(struct vm_area_struct *vma, pmd_= t *old_pmd, * pte locks because exclusive mmap_lock prevents deadlock. 
*/ old_pte =3D pte_offset_map_lock(mm, old_pmd, old_addr, &old_ptl); - new_pte =3D pte_offset_map(new_pmd, new_addr); - new_ptl =3D pte_lockptr(mm, new_pmd); + if (!old_pte) { + err =3D -EAGAIN; + goto out; + } + new_pte =3D pte_offset_map_nolock(mm, new_pmd, new_addr, &new_ptl); + if (!new_pte) { + pte_unmap_unlock(old_pte, old_ptl); + err =3D -EAGAIN; + goto out; + } if (new_ptl !=3D old_ptl) spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); flush_tlb_batched_pending(vma->vm_mm); @@ -208,8 +217,10 @@ static void move_ptes(struct vm_area_struct *vma, pmd_= t *old_pmd, spin_unlock(new_ptl); pte_unmap(new_pte - 1); pte_unmap_unlock(old_pte - 1, old_ptl); +out: if (need_rmap_locks) drop_rmap_locks(vma); + return err; } =20 #ifndef arch_supports_page_table_move @@ -537,6 +548,7 @@ unsigned long move_page_tables(struct vm_area_struct *v= ma, new_pmd =3D alloc_new_pmd(vma->vm_mm, vma, new_addr); if (!new_pmd) break; +again: if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) || pmd_devmap(*old_pmd)) { if (extent =3D=3D HPAGE_PMD_SIZE && @@ -544,8 +556,6 @@ unsigned long move_page_tables(struct vm_area_struct *v= ma, old_pmd, new_pmd, need_rmap_locks)) continue; split_huge_pmd(vma, old_pmd, old_addr); - if (pmd_trans_unstable(old_pmd)) - continue; } else if (IS_ENABLED(CONFIG_HAVE_MOVE_PMD) && extent =3D=3D PMD_SIZE) { /* @@ -556,11 +566,13 @@ unsigned long move_page_tables(struct vm_area_struct = *vma, old_pmd, new_pmd, true)) continue; } - + if (pmd_none(*old_pmd)) + continue; if (pte_alloc(new_vma->vm_mm, new_pmd)) break; - move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma, - new_pmd, new_addr, need_rmap_locks); + if (move_ptes(vma, old_pmd, old_addr, old_addr + extent, + new_vma, new_pmd, new_addr, need_rmap_locks) < 0) + goto again; } =20 mmu_notifier_invalidate_range_end(&range); --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5FE4DC7EE25 for ; Fri, 9 Jun 2023 01:34:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237791AbjFIBeM (ORCPT ); Thu, 8 Jun 2023 21:34:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230094AbjFIBeK (ORCPT ); Thu, 8 Jun 2023 21:34:10 -0400 Received: from mail-yw1-x112d.google.com (mail-yw1-x112d.google.com [IPv6:2607:f8b0:4864:20::112d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 81E4E30E6 for ; Thu, 8 Jun 2023 18:34:08 -0700 (PDT) Received: by mail-yw1-x112d.google.com with SMTP id 00721157ae682-565e6beb7aaso11623887b3.2 for ; Thu, 08 Jun 2023 18:34:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686274447; x=1688866447; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=wdvmzniCtM+nQioqQdrwu6IzqdoeozaKC+WtG0d7jzc=; b=Oi7q9zQXAnHzVKo/FhCPs8tzz9nacAj7GeJ1aGGBUrWEYASbA/COShPsGhE7TSY1ih bxVmRvKdrmFR5jXzZRTgemTvq61YU0XHRU70b1d3QdbaFwgY+kX5E2pxmuHyonISfka/ 2mIjcxl1BRTJLCdi9tDPNj4z8xlbHAAh2ERkE6DkidseXIGdbxnscT3F7ARVo5KkhPVF m5RsG7gDmqBonOEOuZYs+SmyuTOVvsJthOrxfrS4j3QsaT2MI33gyJZxrSrSJZSC8FPD R5ZnuYh4R1BXHnQy+G2iJxg5x1Uo6TM759fPwJnBzBOJAwDjI9gtM3K7QZ52HU2U68va CGEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; 
s=20221208; t=1686274447; x=1688866447; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=wdvmzniCtM+nQioqQdrwu6IzqdoeozaKC+WtG0d7jzc=; b=V3IOVHubJcLZ0Z3D4GWvl5wrMmduS5nQ+DTdipApQkxmXEaIrmnPnydCRqKDX7zpUw RSDQrLtwxplWpGK2JjZApWiYWrReln4/Dt973227ZtvcpxlHGc0XycH8ovYtxviERqIO nrb863IC2XIP0OYZd1dUlPyMDdfIdxSagmkYZwYxAbJu7g1x8TOcEcVU6bJPEWUYW2r6 W+ArFyHbQkN4Xhv3wtNsJCDshOhbzDIBVK39gn9st3aTr+J7M5dRCeNSdaPaZxNOGiwz 7nXS9H9qyCjGz/ci6AsEIZQ/lHMP4ZEoQhpD48x12P/V0Yay4KkkefJa75kLu6mdWlGb XFQw== X-Gm-Message-State: AC+VfDwbT7qOAg948G6GOeqMxw1zGau3dtbsA/h7qAT5UxOxKPY9Kgz+ aT8LNXFbCIGI8odlNP7hxDBqCw== X-Google-Smtp-Source: ACHHUZ5Q/YHvHVewBNkmQMz1HqeMOeXDnHdCMp674SssIxi8mNPsiGfn7zCsWmM3qzcif+cYyekxpQ== X-Received: by 2002:a0d:cb50:0:b0:54d:ea34:c31 with SMTP id n77-20020a0dcb50000000b0054dea340c31mr1301109ywd.29.1686274447361; Thu, 08 Jun 2023 18:34:07 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id q67-20020a818046000000b0054f83731ad2sm314580ywf.0.2023.06.08.18.34.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:34:06 -0700 (PDT) Date: Thu, 8 Jun 2023 18:34:03 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 20/32] mm/madvise: clean up pte_offset_map_lock() scans In-Reply-To: Message-ID: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Came here to make madvise's several pte_offset_map_lock() scans advance to next extent on failure, and remove superfluous pmd_trans_unstable() and pmd_none_or_trans_huge_or_clear_bad() calls. But also did some nearby cleanup. swapin_walk_pmd_entry(): don't name an address "index"; don't drop the lock after every pte, only when calling out to read_swap_cache_async(). madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(): prefer "start_pte" for pointer, orig_pte usually denotes a saved pte value; leave lazy MMU mode before unlocking; merge the success and failure paths after split_folio(). 
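The scan shape that results can be sketched as below (illustrative only: scan_extent() is a made-up name in place of the real madvise walkers, and the actual work is elided). The pte is mapped and locked once per extent, dropped only around calls that may sleep, and the scan simply stops if pte_offset_map_lock() fails:

#include <linux/mm.h>
#include <linux/spinlock.h>

/* Hedged sketch, not mm/madvise.c: scan_extent() and its body are simplified. */
static void scan_extent(struct mm_struct *mm, pmd_t *pmd,
                        unsigned long start, unsigned long end)
{
        pte_t *ptep = NULL;
        spinlock_t *ptl;
        unsigned long addr;

        for (addr = start; addr < end; addr += PAGE_SIZE) {
                if (!ptep++) {          /* not mapped: (re)take pte and lock */
                        ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
                        if (!ptep)
                                break;  /* raced with collapse or zap: give up */
                }
                if (pte_none(*ptep))
                        continue;
                /* before anything that may sleep, drop the pte and its lock */
                pte_unmap_unlock(ptep, ptl);
                ptep = NULL;
                /* ... read_swap_cache_async() or split_folio() would go here ... */
        }
        if (ptep)
                pte_unmap_unlock(ptep, ptl);
}

Holding the lock across a run of ptes avoids the per-pte lock/unlock churn of the old loop, while still never sleeping under the page table lock.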
Signed-off-by: Hugh Dickins --- mm/madvise.c | 122 ++++++++++++++++++++++++++++----------------------- 1 file changed, 68 insertions(+), 54 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index b5ffbaf616f5..0af64c4a8f82 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -188,37 +188,43 @@ static int madvise_update_vma(struct vm_area_struct *= vma, =20 #ifdef CONFIG_SWAP static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, - unsigned long end, struct mm_walk *walk) + unsigned long end, struct mm_walk *walk) { struct vm_area_struct *vma =3D walk->private; - unsigned long index; struct swap_iocb *splug =3D NULL; + pte_t *ptep =3D NULL; + spinlock_t *ptl; + unsigned long addr; =20 - if (pmd_none_or_trans_huge_or_clear_bad(pmd)) - return 0; - - for (index =3D start; index !=3D end; index +=3D PAGE_SIZE) { + for (addr =3D start; addr < end; addr +=3D PAGE_SIZE) { pte_t pte; swp_entry_t entry; struct page *page; - spinlock_t *ptl; - pte_t *ptep; =20 - ptep =3D pte_offset_map_lock(vma->vm_mm, pmd, index, &ptl); + if (!ptep++) { + ptep =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!ptep) + break; + } + pte =3D *ptep; - pte_unmap_unlock(ptep, ptl); - if (!is_swap_pte(pte)) continue; entry =3D pte_to_swp_entry(pte); if (unlikely(non_swap_entry(entry))) continue; =20 + pte_unmap_unlock(ptep, ptl); + ptep =3D NULL; + page =3D read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, - vma, index, false, &splug); + vma, addr, false, &splug); if (page) put_page(page); } + + if (ptep) + pte_unmap_unlock(ptep, ptl); swap_read_unplug(splug); cond_resched(); =20 @@ -340,7 +346,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, bool pageout =3D private->pageout; struct mm_struct *mm =3D tlb->mm; struct vm_area_struct *vma =3D walk->vma; - pte_t *orig_pte, *pte, ptent; + pte_t *start_pte, *pte, ptent; spinlock_t *ptl; struct folio *folio =3D NULL; LIST_HEAD(folio_list); @@ -422,11 +428,11 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *p= md, } =20 regular_folio: - if (pmd_trans_unstable(pmd)) - return 0; #endif tlb_change_page_size(tlb, PAGE_SIZE); - orig_pte =3D pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + start_pte =3D pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!start_pte) + return 0; flush_tlb_batched_pending(mm); arch_enter_lazy_mmu_mode(); for (; addr < end; pte++, addr +=3D PAGE_SIZE) { @@ -447,25 +453,28 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *p= md, * are sure it's worth. Split it if we are only owner. 
*/ if (folio_test_large(folio)) { + int err; + if (folio_mapcount(folio) !=3D 1) break; if (pageout_anon_only_filter && !folio_test_anon(folio)) break; + if (!folio_trylock(folio)) + break; folio_get(folio); - if (!folio_trylock(folio)) { - folio_put(folio); - break; - } - pte_unmap_unlock(orig_pte, ptl); - if (split_folio(folio)) { - folio_unlock(folio); - folio_put(folio); - orig_pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); - break; - } + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(start_pte, ptl); + start_pte =3D NULL; + err =3D split_folio(folio); folio_unlock(folio); folio_put(folio); - orig_pte =3D pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + if (err) + break; + start_pte =3D pte =3D + pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!start_pte) + break; + arch_enter_lazy_mmu_mode(); pte--; addr -=3D PAGE_SIZE; continue; @@ -510,8 +519,10 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pm= d, folio_deactivate(folio); } =20 - arch_leave_lazy_mmu_mode(); - pte_unmap_unlock(orig_pte, ptl); + if (start_pte) { + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(start_pte, ptl); + } if (pageout) reclaim_pages(&folio_list); cond_resched(); @@ -612,7 +623,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned = long addr, struct mm_struct *mm =3D tlb->mm; struct vm_area_struct *vma =3D walk->vma; spinlock_t *ptl; - pte_t *orig_pte, *pte, ptent; + pte_t *start_pte, *pte, ptent; struct folio *folio; int nr_swap =3D 0; unsigned long next; @@ -620,13 +631,12 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigne= d long addr, next =3D pmd_addr_end(addr, end); if (pmd_trans_huge(*pmd)) if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next)) - goto next; - - if (pmd_trans_unstable(pmd)) - return 0; + return 0; =20 tlb_change_page_size(tlb, PAGE_SIZE); - orig_pte =3D pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + start_pte =3D pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!start_pte) + return 0; flush_tlb_batched_pending(mm); arch_enter_lazy_mmu_mode(); for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { @@ -664,23 +674,26 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigne= d long addr, * deactivate all pages. 
*/ if (folio_test_large(folio)) { + int err; + if (folio_mapcount(folio) !=3D 1) - goto out; + break; + if (!folio_trylock(folio)) + break; folio_get(folio); - if (!folio_trylock(folio)) { - folio_put(folio); - goto out; - } - pte_unmap_unlock(orig_pte, ptl); - if (split_folio(folio)) { - folio_unlock(folio); - folio_put(folio); - orig_pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); - goto out; - } + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(start_pte, ptl); + start_pte =3D NULL; + err =3D split_folio(folio); folio_unlock(folio); folio_put(folio); - orig_pte =3D pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + if (err) + break; + start_pte =3D pte =3D + pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!start_pte) + break; + arch_enter_lazy_mmu_mode(); pte--; addr -=3D PAGE_SIZE; continue; @@ -725,17 +738,18 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigne= d long addr, } folio_mark_lazyfree(folio); } -out: + if (nr_swap) { if (current->mm =3D=3D mm) sync_mm_rss(mm); - add_mm_counter(mm, MM_SWAPENTS, nr_swap); } - arch_leave_lazy_mmu_mode(); - pte_unmap_unlock(orig_pte, ptl); + if (start_pte) { + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(start_pte, ptl); + } cond_resched(); -next: + return 0; } =20 --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4DAE7C7EE25 for ; Fri, 9 Jun 2023 01:35:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237805AbjFIBfX (ORCPT ); Thu, 8 Jun 2023 21:35:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45350 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237506AbjFIBfV (ORCPT ); Thu, 8 Jun 2023 21:35:21 -0400 Received: from mail-yw1-x1133.google.com (mail-yw1-x1133.google.com [IPv6:2607:f8b0:4864:20::1133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 04AB52D68 for ; Thu, 8 Jun 2023 18:35:19 -0700 (PDT) Received: by mail-yw1-x1133.google.com with SMTP id 00721157ae682-565bd368e19so10878147b3.1 for ; Thu, 08 Jun 2023 18:35:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686274519; x=1688866519; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=hzsL+M6cdepZgQw4hIUfWIci6E11MaKNkaWhV1yArkY=; b=Sak7mjF34nMHTvCHP5NeKRtkES8TSscUHlyC9R0R46SCrhEJJ2Wo+n7dfWrX77uQe8 nf+QSJZI16aTp4GUl6IGu9WutySoxO4BhydtYaduPdP51lxLi8Eo7UzEJvf+9zsLlQKg 1uAWQEdqAQK63Qji2NQKBkmHFaElkPHhKBew+iyAcD1FGdj+qH73dMkXTiCWxIZUVbNy fXN+aUpQp9yRvXbnP1DZhFGyMAkNlSUaOV8BRQ3aucz5IbMfcsCH3z+WaSedYI9tzU4M sRC3DhkA8g6Vrf7EurgcGfyAeEcYsIRAt7OjP2hbNUKH6TRoKes6M9tBjy+ai0vdg3/i pdHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686274519; x=1688866519; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=hzsL+M6cdepZgQw4hIUfWIci6E11MaKNkaWhV1yArkY=; b=jz8bfDKuaI078HJ+V6g9I9mQKdgBOwYxXI3uRPeWVrH+h2nZC8QWOOTlSwKyzzE4iz al9sAFr0+ELYO2mapanjT+vpfujxvbZa/NcnPnQbuas9t2s3F4NLcazb8DR8i3fw2Klx /GhxSIN3YEzBccOmgIGvwWI3ejQyQ8VTr0McUX+dcS0OoRCzgT4juPcNibsBKL7w5Z3B MNIhtSof1JwmHzc5JozeRWZMyqsB4w6DqIb1wsVMdhuiBxr2nQIU1W1ATrM4pn8QMxsI 4265v1otk3skhCrMt87wiReK3qedTaknpqlfknusEFYz5nxRreUfQ4G4SwSBPGwbbuz/ hlJg== 
X-Gm-Message-State: AC+VfDwWuNUvmBjSyliR6ELuMe1zqkyxlGBK4FxMgPCgvh4DDXgc9WPM VLY8spZn0dydJ4Hhh/1F15pNew== X-Google-Smtp-Source: ACHHUZ4AK/lYHw0bDyLVKmEksYxVFlrDP4tGf35FRUzn6cSvs6hJCxMo87mkPkKs2nriI5iLhLUP+g== X-Received: by 2002:a81:4e04:0:b0:568:ea0e:ae75 with SMTP id c4-20020a814e04000000b00568ea0eae75mr1286809ywb.45.1686274518944; Thu, 08 Jun 2023 18:35:18 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id s4-20020a81bf44000000b0054601ee157fsm283836ywk.114.2023.06.08.18.35.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:35:18 -0700 (PDT) Date: Thu, 8 Jun 2023 18:35:14 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 21/32] mm/madvise: clean up force_shm_swapin_readahead() In-Reply-To: Message-ID: <67e18875-ffb3-ec27-346-f350e07bed87@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Some nearby MADV_WILLNEED cleanup unrelated to pte_offset_map_lock(). shmem_swapin_range() is a better name than force_shm_swapin_readahead(). Fix unimportant off-by-one on end_index. Call the swp_entry_t "entry" rather than "swap": either is okay, but entry is the name used elsewhere in mm/madvise.c. Do not assume GFP_HIGHUSER_MOVABLE: that's right for anon swap, but shmem should take gfp from mapping. Pass the actual vma and address to read_swap_cache_async(), in case a NUMA mempolicy applies. lru_add_drain() at outer level, like madvise_willneed()'s other branch. Signed-off-by: Hugh Dickins --- mm/madvise.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 0af64c4a8f82..9b3c9610052f 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -235,30 +235,34 @@ static const struct mm_walk_ops swapin_walk_ops =3D { .pmd_entry =3D swapin_walk_pmd_entry, }; =20 -static void force_shm_swapin_readahead(struct vm_area_struct *vma, +static void shmem_swapin_range(struct vm_area_struct *vma, unsigned long start, unsigned long end, struct address_space *mapping) { XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start)); - pgoff_t end_index =3D linear_page_index(vma, end + PAGE_SIZE - 1); + pgoff_t end_index =3D linear_page_index(vma, end) - 1; struct page *page; struct swap_iocb *splug =3D NULL; =20 rcu_read_lock(); xas_for_each(&xas, page, end_index) { - swp_entry_t swap; + unsigned long addr; + swp_entry_t entry; =20 if (!xa_is_value(page)) continue; - swap =3D radix_to_swp_entry(page); + entry =3D radix_to_swp_entry(page); /* There might be swapin error entries in shmem mapping. 
*/ - if (non_swap_entry(swap)) + if (non_swap_entry(entry)) continue; + + addr =3D vma->vm_start + + ((xas.xa_index - vma->vm_pgoff) << PAGE_SHIFT); xas_pause(&xas); rcu_read_unlock(); =20 - page =3D read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE, - NULL, 0, false, &splug); + page =3D read_swap_cache_async(entry, mapping_gfp_mask(mapping), + vma, addr, false, &splug); if (page) put_page(page); =20 @@ -266,8 +270,6 @@ static void force_shm_swapin_readahead(struct vm_area_s= truct *vma, } rcu_read_unlock(); swap_read_unplug(splug); - - lru_add_drain(); /* Push any new pages onto the LRU now */ } #endif /* CONFIG_SWAP */ =20 @@ -291,8 +293,8 @@ static long madvise_willneed(struct vm_area_struct *vma, } =20 if (shmem_mapping(file->f_mapping)) { - force_shm_swapin_readahead(vma, start, end, - file->f_mapping); + shmem_swapin_range(vma, start, end, file->f_mapping); + lru_add_drain(); /* Push any new pages onto the LRU now */ return 0; } #else --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0BA5C7EE29 for ; Fri, 9 Jun 2023 01:36:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237454AbjFIBgT (ORCPT ); Thu, 8 Jun 2023 21:36:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45696 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230138AbjFIBgR (ORCPT ); Thu, 8 Jun 2023 21:36:17 -0400 Received: from mail-yw1-x1131.google.com (mail-yw1-x1131.google.com [IPv6:2607:f8b0:4864:20::1131]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 32AFA1FDF for ; Thu, 8 Jun 2023 18:36:16 -0700 (PDT) Received: by mail-yw1-x1131.google.com with SMTP id 00721157ae682-565cfe4ece7so11390107b3.2 for ; Thu, 08 Jun 2023 18:36:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686274575; x=1688866575; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=7TjV4RyPitx7LIAI3Pj7MSABK+uluBQcp84mqUzVC6k=; b=a5ewmipktwitFlsPba8meBOEdOaTniMgfwOwR66LxzGfjmzDH4IFIjppt+zCAFlN0g 4qVFbiljADzlv99p9uKv0I6LLrtlgpWP5C7ZN5jYvFhWguH9wpOq9f3yGPT9NIjCbt8f ccDNMfVoLAT69ZaBFF/iOjiPqCtqRwYPT0P1Yb6UZGbQci3A4kQCZ/DTEhdlS+lQWdYY KzmqHWulcwNNrAp2wlegE+sh4YyxODsI8618mTIxeZ3NAKuS71wgysjyvDWVHPesaDpk PafA85IWdjWaVeXqTpvoM+ur2RY2SxdVNDgFZ9WN6yrvvp/2wYtNHHpsHPpA7aoAQsWI /+rA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686274575; x=1688866575; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=7TjV4RyPitx7LIAI3Pj7MSABK+uluBQcp84mqUzVC6k=; b=Dr0g3tJVQRJk1oseruzU8LjbDYcxIE3mTc0rVo3UUwLy3yEFWvRLlbXAFltfB+uvRs /SdDc1lerothQDoUxiEQMOVQC4+R5B/kpsU08A+ijRe3Hr5+wknarGDIyVqsWyEIRCMc yIoThA+80I/yF1FSJhkfH1+j3bN9ksqJdn2gMvTDNXuV9Ix/lS0ivWJCeZgIC/EafNqo MajkBNYhTD5SzY4aYKM3ktq1BA3RbB3c14CjuJ2nthtG2kZohYFtWdcQrocG2zCcoe+R 0406a/TFSM8Q7Qi3e8l6y4KWflEHLrUZhJ/JBLGb7YHJoSo7D5HaUR2Us5PxFZj7n2RX a4ag== X-Gm-Message-State: AC+VfDyiBbIsFTW5HGLny9BLgdM+InmvrFHGWtJOxFkRp0CqL/x3G0Ki NLd156vzZpMZouMKi+U7Fu0kHg== X-Google-Smtp-Source: ACHHUZ41tpEmUAG3j7GY9w97TvrJQ50NkLe4OKcO0eJ30STZxTamEDNBY6XFXEyrE/Y12cF7SVrDgA== X-Received: by 2002:a81:a0c9:0:b0:568:a870:314f with SMTP id 
x192-20020a81a0c9000000b00568a870314fmr1280229ywg.30.1686274575219; Thu, 08 Jun 2023 18:36:15 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id d191-20020a0ddbc8000000b00569fdf7f58bsm293760ywe.66.2023.06.08.18.36.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:36:14 -0700 (PDT) Date: Thu, 8 Jun 2023 18:36:11 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 22/32] mm/swapoff: allow pte_offset_map[_lock]() to fail In-Reply-To: Message-ID: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Adjust unuse_pte() and unuse_pte_range() to allow pte_offset_map_lock() and pte_offset_map() failure; remove pmd_none_or_trans_huge_or_clear_bad() from unuse_pmd_range() now that pte_offset_map() does all that itself. Signed-off-by: Hugh Dickins --- mm/swapfile.c | 38 ++++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 274bbf797480..12d204e6dae2 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1774,7 +1774,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_= t *pmd, hwposioned =3D true; =20 pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) { + if (unlikely(!pte || !pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) { ret =3D 0; goto out; } @@ -1827,7 +1827,8 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_= t *pmd, set_pte_at(vma->vm_mm, addr, pte, new_pte); swap_free(entry); out: - pte_unmap_unlock(pte, ptl); + if (pte) + pte_unmap_unlock(pte, ptl); if (page !=3D swapcache) { unlock_page(page); put_page(page); @@ -1839,17 +1840,22 @@ static int unuse_pte_range(struct vm_area_struct *v= ma, pmd_t *pmd, unsigned long addr, unsigned long end, unsigned int type) { - swp_entry_t entry; - pte_t *pte; + pte_t *pte =3D NULL; struct swap_info_struct *si; - int ret =3D 0; =20 si =3D swap_info[type]; - pte =3D pte_offset_map(pmd, addr); do { struct folio *folio; unsigned long offset; unsigned char swp_count; + swp_entry_t entry; + int ret; + + if (!pte++) { + pte =3D pte_offset_map(pmd, addr); + if (!pte) + break; + } =20 if (!is_swap_pte(*pte)) continue; @@ -1860,6 +1866,8 @@ static int unuse_pte_range(struct vm_area_struct *vma= , pmd_t *pmd, =20 offset =3D swp_offset(entry); pte_unmap(pte); + pte =3D NULL; + folio =3D swap_cache_get_folio(entry, vma, addr); if (!folio) { struct page *page; @@ -1878,8 +1886,7 @@ static int unuse_pte_range(struct vm_area_struct *vma= , pmd_t *pmd, if (!folio) { swp_count =3D READ_ONCE(si->swap_map[offset]); if (swp_count =3D=3D 0 || swp_count =3D=3D SWAP_MAP_BAD) - goto try_next; - + continue; return -ENOMEM; } =20 @@ -1889,20 
+1896,17 @@ static int unuse_pte_range(struct vm_area_struct *v= ma, pmd_t *pmd, if (ret < 0) { folio_unlock(folio); folio_put(folio); - goto out; + return ret; } =20 folio_free_swap(folio); folio_unlock(folio); folio_put(folio); -try_next: - pte =3D pte_offset_map(pmd, addr); - } while (pte++, addr +=3D PAGE_SIZE, addr !=3D end); - pte_unmap(pte - 1); + } while (addr +=3D PAGE_SIZE, addr !=3D end); =20 - ret =3D 0; -out: - return ret; + if (pte) + pte_unmap(pte); + return 0; } =20 static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud, @@ -1917,8 +1921,6 @@ static inline int unuse_pmd_range(struct vm_area_stru= ct *vma, pud_t *pud, do { cond_resched(); next =3D pmd_addr_end(addr, end); - if (pmd_none_or_trans_huge_or_clear_bad(pmd)) - continue; ret =3D unuse_pte_range(vma, pmd, addr, next, type); if (ret) return ret; --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2FF2C7EE29 for ; Fri, 9 Jun 2023 01:37:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237860AbjFIBhY (ORCPT ); Thu, 8 Jun 2023 21:37:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46306 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237844AbjFIBhT (ORCPT ); Thu, 8 Jun 2023 21:37:19 -0400 Received: from mail-yw1-x1132.google.com (mail-yw1-x1132.google.com [IPv6:2607:f8b0:4864:20::1132]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 582F22D72 for ; Thu, 8 Jun 2023 18:37:17 -0700 (PDT) Received: by mail-yw1-x1132.google.com with SMTP id 00721157ae682-568ba7abc11so11571467b3.3 for ; Thu, 08 Jun 2023 18:37:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686274636; x=1688866636; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=zu44LOJFUZSfjUzXfaM7AXaMVzmcNQ9SKaG/n0S1Di4=; b=g/fK8hP5bFpFVjO76tLNArCNe2GlcEYd1TR7ZwQdCtSODKo9bn2ob62ekzm4ZsThhU yjQK+8xMaXjxfJ1yZWY7NOanC9yVcDVOaMe2iZYwQzT6eJ8h5uTrJC9oVwwCDxswbBHp l/xid7bUgRR1+bdWUBBbvmFDZ/iNYOS5AHsDau932Tn9EMZ4WD6ylqBkwxof8Mw7/egD Ww4k6FcpaXFX8Vqb/E2HUbm90tkOaEYqBxZ6KyMuDMKUc0iBxBMpA3o8pXAN+XBCXRzs f8EIBhd27TS5I+YyqHsJshT+x1siX1ieudcplgI00w6T7xrRyEgdPl7QxJSf2yBgKL3/ hKjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686274636; x=1688866636; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=zu44LOJFUZSfjUzXfaM7AXaMVzmcNQ9SKaG/n0S1Di4=; b=BkAcWE0/HBrOzTrkMOrCSUbtIY7MNRxodkTQvGzYPbaDBpIHfgZwHFOjecZDc2ff6w MI/xgy7N9EPvM4MM911Fk3uH6xw9oxxTi4VKvvL5bF02I7B4wHffQXFqpXkm9xbsDvoO GIgopXEJcKpmjPEiKKxlvbwGDH549hkK9M1zQWKADQ9Kf4jKk1o3n3IwQWeqj+WWH5tT XhwEuQBiG0E3cgADvnigbcbMuN+vmByhRvREFSmLwZkJhNIGqNFfF55c46pOPa9MGVJI mlZWeUJdLSgVwnXUTpQUorZbE1DYM5GsUJ2OavEmMmoknZ274ljF0mKsh2Gilbgza2c0 Pfbg== X-Gm-Message-State: AC+VfDz3NjF5iI86q8ZHokLDPr7fL/1ivpqJyGdP20+IYDo7OctquOoQ qdRe69pvQQbtT74+YKrepnqdjg== X-Google-Smtp-Source: ACHHUZ5BEYGpIOz49tbHOiOve9aTVph95zxtniJYmBvQce3mSMb/qY9GD2nlVpV9ClVLYloJKEZjzA== X-Received: by 2002:a81:46d7:0:b0:565:eedc:8dbe with SMTP id t206-20020a8146d7000000b00565eedc8dbemr1425297ywa.27.1686274636468; Thu, 08 Jun 2023 18:37:16 -0700 (PDT) Received: from 
ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id k124-20020a816f82000000b00565b26a9c9csm292968ywc.64.2023.06.08.18.37.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:37:15 -0700 (PDT) Date: Thu, 8 Jun 2023 18:37:12 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 23/32] mm/mglru: allow pte_offset_map_nolock() to fail In-Reply-To: Message-ID: <51ece73e-7398-2e4a-2384-56708c87844f@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" MGLRU's walk_pte_range() use the safer pte_offset_map_nolock(), rather than pte_lockptr(), to get the ptl for its trylock. Just return false and move on to next extent if it fails, like when the trylock fails. Remove the VM_WARN_ON_ONCE(pmd_leaf) since that will happen, rarely. Signed-off-by: Hugh Dickins Acked-by: Yu Zhao --- mm/vmscan.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 6d0cd2840cf0..6a9bb6b30dc8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3993,15 +3993,15 @@ static bool walk_pte_range(pmd_t *pmd, unsigned lon= g start, unsigned long end, struct pglist_data *pgdat =3D lruvec_pgdat(walk->lruvec); int old_gen, new_gen =3D lru_gen_from_seq(walk->max_seq); =20 - VM_WARN_ON_ONCE(pmd_leaf(*pmd)); - - ptl =3D pte_lockptr(args->mm, pmd); - if (!spin_trylock(ptl)) + pte =3D pte_offset_map_nolock(args->mm, pmd, start & PMD_MASK, &ptl); + if (!pte) return false; + if (!spin_trylock(ptl)) { + pte_unmap(pte); + return false; + } =20 arch_enter_lazy_mmu_mode(); - - pte =3D pte_offset_map(pmd, start & PMD_MASK); restart: for (i =3D pte_index(start), addr =3D start; addr !=3D end; i++, addr += =3D PAGE_SIZE) { unsigned long pfn; @@ -4042,10 +4042,8 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long= start, unsigned long end, if (i < PTRS_PER_PTE && get_next_vma(PMD_MASK, PAGE_SIZE, args, &start, &= end)) goto restart; =20 - pte_unmap(pte); - arch_leave_lazy_mmu_mode(); - spin_unlock(ptl); + pte_unmap_unlock(pte, ptl); =20 return suitable_to_scan(total, young); } --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA3F4C7EE37 for ; Fri, 9 Jun 2023 01:38:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237844AbjFIBi1 (ORCPT ); Thu, 8 Jun 2023 21:38:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229923AbjFIBiZ (ORCPT ); Thu, 8 Jun 2023 21:38:25 
-0400 Received: from mail-ot1-x32e.google.com (mail-ot1-x32e.google.com [IPv6:2607:f8b0:4864:20::32e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7DDF918C for ; Thu, 8 Jun 2023 18:38:23 -0700 (PDT) Received: by mail-ot1-x32e.google.com with SMTP id 46e09a7af769-6b2993c9652so261143a34.1 for ; Thu, 08 Jun 2023 18:38:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686274702; x=1688866702; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=AiDluOv2NKQXWITQOVsUWe3sj/TRHmtPFYOGjZ2ttyw=; b=T0KllypmaI6dTB7DB6gV+sXkQkzh1ghuQsJWEK3jBazIFUwKZcsQ3r5kr988Onz36X Dex8mQRK1D2tkuguCbuqQKOnDZrak1tNbdPv9EQlJylAGS/JOPCnfCNAKd3nfZKgjEDX NyjKhyt6lzb+I77gA/PrGWpgqss/Ipf03liz3QsE5ZInJrgAzICnzUTYMrBYboMu8+Ko Te1BCCl46W+vNegr6tum9ZNWdColvrZcB8P0b/1QpqOrN9yiBM3NUya9DmMuMuxFqXqJ gRuJjfPoEiaWO0bbttWGNF01oOM4LC9iGoxIWBtycc73x4aBvX+fDbj/NGV2UJ/71lCf Nnag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686274702; x=1688866702; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=AiDluOv2NKQXWITQOVsUWe3sj/TRHmtPFYOGjZ2ttyw=; b=d8ZvJRiEINg5LtWCaKk1ndGwufvihXUWXmJyI14yAy8mzulU4+PO2F4Fo+E62dYYAF vsZGnqMgFeDvfC+jXwMen858VHC8tQ3AW1QhYkrn32P6QfNJKwyYfmhQ2MhD458GMhz3 S/ZdnTljEou88FbVrAUOnkNGzmAHi5B+giz0XmXhvlgSJXWnshuVAxSO+I8UP/b2PJkQ mD34YafQqbH3vAP/ZMOOP8dN7NIkI2vBzrVHqQIkGzzcMUl9yk9k6s8a5MLyFVouE/9G BUHzY7uxww8BCf0rWMZmT/kRatZVMR6MZrJJqTeo7t8ES7fWlsEnaKOphFQuB4oDGICS AXMw== X-Gm-Message-State: AC+VfDyG6+7/obxGJUuvdGneQX/SY2C93wk4q6m8n3ZL2JqWavQuxdyX lxpHFTabjCF9P1KfLhh5Ir9OHw== X-Google-Smtp-Source: ACHHUZ4lATUYWdzZUuFyIeEB4NQ1kndPyTKKCxuz2TahCj6Q7NKx5y9YMV8W26QVvIO6910YNrrjBg== X-Received: by 2002:a05:6830:cb:b0:6b1:570c:de5 with SMTP id x11-20020a05683000cb00b006b1570c0de5mr88477oto.17.1686274701910; Thu, 08 Jun 2023 18:38:21 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id y7-20020a056902052700b00b8f13ff2a8esm586262ybs.61.2023.06.08.18.38.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:38:21 -0700 (PDT) Date: Thu, 8 Jun 2023 18:38:17 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. 
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 24/32] mm/migrate_device: allow pte_offset_map_lock() to fail In-Reply-To: Message-ID: <1131be62-2e84-da2f-8f45-807b2cbeeec5@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" migrate_vma_collect_pmd(): remove the pmd_trans_unstable() handling after splitting huge zero pmd, and the pmd_none() handling after successfully splitting huge page: those are now managed inside pte_offset_map_lock(), and by "goto again" when it fails. But the skip after unsuccessful split_huge_page() must stay: it avoids an endless loop. The skip when pmd_bad()? Remove that: it will be treated as a hole rather than a skip once cleared by pte_offset_map_lock(), but with different timing that would be so anyway; and it's arguably best to leave the pmd_bad() handling centralized there. migrate_vma_insert_page(): remove comment on the old pte_offset_map() and old locking limitations; remove the pmd_trans_unstable() check and just proceed to pte_offset_map_lock(), aborting when it fails (page has been charged to memcg, but as in other cases, it's uncharged when freed). Signed-off-by: Hugh Dickins Reviewed-by: Alistair Popple --- mm/migrate_device.c | 31 ++++--------------------------- 1 file changed, 4 insertions(+), 27 deletions(-) diff --git a/mm/migrate_device.c b/mm/migrate_device.c index d30c9de60b0d..a14af6b12b04 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -83,9 +83,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, if (is_huge_zero_page(page)) { spin_unlock(ptl); split_huge_pmd(vma, pmdp, addr); - if (pmd_trans_unstable(pmdp)) - return migrate_vma_collect_skip(start, end, - walk); } else { int ret; =20 @@ -100,16 +97,12 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, if (ret) return migrate_vma_collect_skip(start, end, walk); - if (pmd_none(*pmdp)) - return migrate_vma_collect_hole(start, end, -1, - walk); } } =20 - if (unlikely(pmd_bad(*pmdp))) - return migrate_vma_collect_skip(start, end, walk); - ptep =3D pte_offset_map_lock(mm, pmdp, addr, &ptl); + if (!ptep) + goto again; arch_enter_lazy_mmu_mode(); =20 for (; addr < end; addr +=3D PAGE_SIZE, ptep++) { @@ -595,27 +588,10 @@ static void migrate_vma_insert_page(struct migrate_vm= a *migrate, pmdp =3D pmd_alloc(mm, pudp, addr); if (!pmdp) goto abort; - if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp)) goto abort; - - /* - * Use pte_alloc() instead of pte_alloc_map(). We can't run - * pte_offset_map() on pmds where a huge pmd might be created - * from a different thread. - * - * pte_alloc_map() is safe to use under mmap_write_lock(mm) or when - * parallel threads are excluded by other means. - * - * Here we only have mmap_read_lock(mm). 
- */ if (pte_alloc(mm, pmdp)) goto abort; - - /* See the comment in pte_alloc_one_map() */ - if (unlikely(pmd_trans_unstable(pmdp))) - goto abort; - if (unlikely(anon_vma_prepare(vma))) goto abort; if (mem_cgroup_charge(page_folio(page), vma->vm_mm, GFP_KERNEL)) @@ -650,7 +626,8 @@ static void migrate_vma_insert_page(struct migrate_vma = *migrate, } =20 ptep =3D pte_offset_map_lock(mm, pmdp, addr, &ptl); - + if (!ptep) + goto abort; if (check_stable_address_space(mm)) goto unlock_abort; =20 --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57BB1C7EE29 for ; Fri, 9 Jun 2023 01:40:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237859AbjFIBkN (ORCPT ); Thu, 8 Jun 2023 21:40:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47664 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237849AbjFIBkJ (ORCPT ); Thu, 8 Jun 2023 21:40:09 -0400 Received: from mail-yb1-xb36.google.com (mail-yb1-xb36.google.com [IPv6:2607:f8b0:4864:20::b36]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3C2B22D74 for ; Thu, 8 Jun 2023 18:40:05 -0700 (PDT) Received: by mail-yb1-xb36.google.com with SMTP id 3f1490d57ef6-ba8151a744fso1359054276.2 for ; Thu, 08 Jun 2023 18:40:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686274804; x=1688866804; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=UwuVNK2fY7F7iaoH+tt2s4ap3NLwMgK8mAcGkUuV1QI=; b=xtJR0sJIJzLTXBky/wVlZo29dNTxzgA54BszS4z+bL4XgGa67uu8WNTm+JM7XI/l9k 6m4Lc80dzOLZH5qTELBlC5YxGQJLyz1bh8ZNW1tEJ8D6xjLzQ+u4+WAilxn55/jW0XRb Epze0ZHSodQnM7E7VM26/xnDCzhsJKvRWH4eXMeV0AMMehd09rxHboGfxvtXMtgKk1Z4 SnsamQypCiUjdvefoXG4abvyN6tDU3XIw1oNbEazjBhrMx+Kg+mHF/zkH+Ct85cC8hz5 iw1woW6T1GIXQ5NWzRO4+D6+WwHZuB7CFNtHE38x8bXk+v2L1K3lyzT2dxXGxy6c449H V+iA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686274804; x=1688866804; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=UwuVNK2fY7F7iaoH+tt2s4ap3NLwMgK8mAcGkUuV1QI=; b=HHCUPVwH6Pu6J90z7bTkRR990AxFGLSltMLkiN/w1tlhkDCdReKZX1ywKT90oXKurf tQ4PBPZYK3WdwWHAGfFZGpVuppKZOvqlVBUxtI75kvOFMom5yOIZbbqLJitXDRwkD0J1 nT3zqRd1ZpY8azVF0lv13D2KrSoOHhOHoewXt2qhuD6ces52u49a0JrLQuxAxQ7WtHJ3 jA2f+ktQTOdWGhy/vMNbomQ8AhyulxPjMH8Gf5A/olT7HawVPys6u2dOigSXKzf9mfnR 3PVKx6eIR55he04t5Sy6aXY04k9kL9mzuLEmOxLL5cVt3gSYY7R5mEC2Ij9Zt4J/vSVr DZow== X-Gm-Message-State: AC+VfDxeMQyPB4hU6wkrOiIrFrCjrxcHKKwTBitmBdjwuQ52KeJjQkTC UMri3ZeLisMoQLaP1kH06vxhfA== X-Google-Smtp-Source: ACHHUZ4q7dBjgPopIYtcorM5R7hyIrmP9CgiCOnb4btr582PPdFwrjvxsL7Fli0sAzIP7Ak/8SJsSA== X-Received: by 2002:a0d:f543:0:b0:561:ce93:b560 with SMTP id e64-20020a0df543000000b00561ce93b560mr1378798ywf.43.1686274804318; Thu, 08 Jun 2023 18:40:04 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id t202-20020a8183d3000000b005619cfb1b88sm306230ywf.52.2023.06.08.18.40.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:40:03 -0700 (PDT) Date: Thu, 8 Jun 2023 18:39:59 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 25/32] mm/gup: remove FOLL_SPLIT_PMD use of pmd_trans_unstable() In-Reply-To: Message-ID: <59fd15dd-4d39-5ec-2043-1d5117f7f85@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There is now no reason for follow_pmd_mask()'s FOLL_SPLIT_PMD block to distinguish huge_zero_page from a normal THP: follow_page_pte() handles any instability, and here it's a good idea to replace any pmd_none(*pmd) by a page table a.s.a.p, in the huge_zero_page case as for a normal THP; and this removes an unnecessary possibility of -EBUSY failure. (Hmm, couldn't the normal THP case have hit an unstably refaulted THP before? But there are only two, exceptional, users of FOLL_SPLIT_PMD.) Signed-off-by: Hugh Dickins Reviewed-by: Yang Shi --- mm/gup.c | 19 ++++--------------- 1 file changed, 4 insertions(+), 15 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index bb67193c5460..4ad50a59897f 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -681,21 +681,10 @@ static struct page *follow_pmd_mask(struct vm_area_st= ruct *vma, return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); } if (flags & FOLL_SPLIT_PMD) { - int ret; - page =3D pmd_page(*pmd); - if (is_huge_zero_page(page)) { - spin_unlock(ptl); - ret =3D 0; - split_huge_pmd(vma, pmd, address); - if (pmd_trans_unstable(pmd)) - ret =3D -EBUSY; - } else { - spin_unlock(ptl); - split_huge_pmd(vma, pmd, address); - ret =3D pte_alloc(mm, pmd) ? -ENOMEM : 0; - } - - return ret ? ERR_PTR(ret) : + spin_unlock(ptl); + split_huge_pmd(vma, pmd, address); + /* If pmd was left empty, stuff a page table in there quickly */ + return pte_alloc(mm, pmd) ? 
ERR_PTR(-ENOMEM) : follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); } page =3D follow_trans_huge_pmd(vma, address, pmd, flags); --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 27E37C7EE37 for ; Fri, 9 Jun 2023 01:41:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237878AbjFIBll (ORCPT ); Thu, 8 Jun 2023 21:41:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48244 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237849AbjFIBli (ORCPT ); Thu, 8 Jun 2023 21:41:38 -0400 Received: from mail-yw1-x112f.google.com (mail-yw1-x112f.google.com [IPv6:2607:f8b0:4864:20::112f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 43BC118C for ; Thu, 8 Jun 2023 18:41:37 -0700 (PDT) Received: by mail-yw1-x112f.google.com with SMTP id 00721157ae682-565ba6aee5fso11407947b3.1 for ; Thu, 08 Jun 2023 18:41:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686274896; x=1688866896; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=9rqctUCF3S3JfLMZMEb80lefSOXfBunLGXTWgvpiAkU=; b=ZUnTRtdJaKuP8HpCFk7QtfKJXS4h7TMBZvOJMvdH3Kb9GiIDkKt+HetF/u5hNY0iBZ C1Kvof77sPcx1H7YMQXI4g7GvGYj+dOG69GUGVHoheyoLtvIX7JK20pKUCPc88Hz5F7K 1vDSIi4a6aoe2Zceom9Svwsk2YImxUMqdpYy3MpT8APM++H9OeEHUgaErR5PTJoWvDnG Ooj+D4UNVVjEzEkwYMDebnqNIzPrU3Z0+Y3Vz1q/c+oe13fguEZcDxrw/xPw7f0EQgh9 2MxgqZ31UDTio0TPV7BeKiTntn6xKzAMDwB1lip/A72zUIdR7jZaKNlHZ1NNT8xv0XOp wchQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686274896; x=1688866896; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=9rqctUCF3S3JfLMZMEb80lefSOXfBunLGXTWgvpiAkU=; b=RROmMbfz1k8qf8gEMcbUyn8M/bRnEckjvMw4Ma1pJkd27czc0XA93mzHL9LQK82Mq8 KSVXtKAxUjCQ+O3UlaktpHt6LZoBx+tIgddsv4Yr7Jt7WaaVwUaxbEoyoyBEE7SIoH/j RHzASwBaQhQiTJQlCRx1NXu1fTt9U/4QCPMcSHJo80t7EZjNOwwFVoS5Ni9yNGVv2RYK PD/X8+SrFEMJe3ctFewYAXWTD+Ieuf9JGTGfOA8DMKhpjYdHlIayAV4V3H4hZKmNbqs4 I2b9MvypKd9912Qoy4jKyEwolf/1TQSYMfyNW4qGlOpdZNrL2kpu2BW4+Yc2kN882xtU EtQQ== X-Gm-Message-State: AC+VfDzMkKux4Z4ZQON/GUU78BcMaRsleyZLm8NG31Q6T3L3HL6IXRtl drXgZbJCCmpgr/ZsxottIy0jkA== X-Google-Smtp-Source: ACHHUZ4dqn2pyv+NnV1us+V+eC0MmFm8NCi0bji00YqYjgvIjBIANrUXMpCs8h/qJKTmj/xoG04trg== X-Received: by 2002:a81:6d04:0:b0:565:d3f9:209e with SMTP id i4-20020a816d04000000b00565d3f9209emr1374109ywc.34.1686274896335; Thu, 08 Jun 2023 18:41:36 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id r205-20020a0de8d6000000b00559d9989490sm304902ywe.41.2023.06.08.18.41.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:41:34 -0700 (PDT) Date: Thu, 8 Jun 2023 18:41:31 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. 
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 26/32] mm/huge_memory: split huge pmd under one pte_offset_map() In-Reply-To: Message-ID: <90cbed7f-90d9-b779-4a46-d2485baf9595@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" __split_huge_zero_page_pmd() use a single pte_offset_map() to sweep the extent: it's already under pmd_lock(), so this is no worse for latency; and since it's supposed to have full control of the just-withdrawn page table, here choose to VM_BUG_ON if it were to fail. And please don't increment haddr by PAGE_SIZE, that should remain huge aligned: declare a separate addr (not a bugfix, but it was deceptive). __split_huge_pmd_locked() likewise (but it had declared a separate addr); and change its BUG_ON(!pte_none) to VM_BUG_ON, for consistency with zero (those deposited page tables are sometimes victims of random corruption). Signed-off-by: Hugh Dickins Reviewed-by: Yang Shi --- mm/huge_memory.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d4bd5fa7c823..839c13fa0bbe 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2037,6 +2037,8 @@ static void __split_huge_zero_page_pmd(struct vm_area= _struct *vma, struct mm_struct *mm =3D vma->vm_mm; pgtable_t pgtable; pmd_t _pmd, old_pmd; + unsigned long addr; + pte_t *pte; int i; =20 /* @@ -2052,17 +2054,20 @@ static void __split_huge_zero_page_pmd(struct vm_ar= ea_struct *vma, pgtable =3D pgtable_trans_huge_withdraw(mm, pmd); pmd_populate(mm, &_pmd, pgtable); =20 - for (i =3D 0; i < HPAGE_PMD_NR; i++, haddr +=3D PAGE_SIZE) { - pte_t *pte, entry; - entry =3D pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot); + pte =3D pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte); + for (i =3D 0, addr =3D haddr; i < HPAGE_PMD_NR; i++, addr +=3D PAGE_SIZE)= { + pte_t entry; + + entry =3D pfn_pte(my_zero_pfn(addr), vma->vm_page_prot); entry =3D pte_mkspecial(entry); if (pmd_uffd_wp(old_pmd)) entry =3D pte_mkuffd_wp(entry); - pte =3D pte_offset_map(&_pmd, haddr); VM_BUG_ON(!pte_none(*pte)); - set_pte_at(mm, haddr, pte, entry); - pte_unmap(pte); + set_pte_at(mm, addr, pte, entry); + pte++; } + pte_unmap(pte - 1); smp_wmb(); /* make pte visible before pmd */ pmd_populate(mm, pmd, pgtable); } @@ -2077,6 +2082,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, bool young, write, soft_dirty, pmd_migration =3D false, uffd_wp =3D false; bool anon_exclusive =3D false, dirty =3D false; unsigned long addr; + pte_t *pte; int i; =20 VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); @@ -2205,8 +2211,10 @@ static void __split_huge_pmd_locked(struct vm_area_s= truct *vma, pmd_t *pmd, pgtable =3D pgtable_trans_huge_withdraw(mm, pmd); pmd_populate(mm, &_pmd, pgtable); =20 + pte =3D pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte); for (i =3D 0, addr =3D haddr; i < HPAGE_PMD_NR; i++, addr +=3D PAGE_SIZE)= { - pte_t 
entry, *pte; + pte_t entry; /* * Note that NUMA hinting access restrictions are not * transferred to avoid any possibility of altering @@ -2249,11 +2257,11 @@ static void __split_huge_pmd_locked(struct vm_area_= struct *vma, pmd_t *pmd, entry =3D pte_mkuffd_wp(entry); page_add_anon_rmap(page + i, vma, addr, false); } - pte =3D pte_offset_map(&_pmd, addr); - BUG_ON(!pte_none(*pte)); + VM_BUG_ON(!pte_none(*pte)); set_pte_at(mm, addr, pte, entry); - pte_unmap(pte); + pte++; } + pte_unmap(pte - 1); =20 if (!pmd_migration) page_remove_rmap(page, vma, true); --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51E41C7EE23 for ; Fri, 9 Jun 2023 01:42:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229482AbjFIBmu (ORCPT ); Thu, 8 Jun 2023 21:42:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49030 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237849AbjFIBmr (ORCPT ); Thu, 8 Jun 2023 21:42:47 -0400 Received: from mail-yb1-xb32.google.com (mail-yb1-xb32.google.com [IPv6:2607:f8b0:4864:20::b32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F276F1BF0 for ; Thu, 8 Jun 2023 18:42:45 -0700 (PDT) Received: by mail-yb1-xb32.google.com with SMTP id 3f1490d57ef6-bb3a77abd7bso1301453276.0 for ; Thu, 08 Jun 2023 18:42:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686274965; x=1688866965; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=p1k5/eahjbWsfa/uahTffpcykz1TgCrP6tjOVjA4Rhk=; b=kXMH2hIQkmF2USapaQhiZcpKUKkNYVjPRafKmj8hC8uAM6lWt8Cg7WyWV3i5oVqEWA bi93PCMA7PjdAzeBItn2fVWVBY+/9qgI6tMfBLEeEpjUthSCUQOS9R5oolpPexMofMwP pSgpjbezGBNl2+s7mdLUqylcEDr0Rzwx/U2TjkwQG91WnUJpoHZuxPDmnmO/Yv7J8FdM cwm5pnD7LoAthDQwPzAWpV55QbduVNRTxBSVVH445+SgGv+mnH4NOD/8k4MMNgeNCGj5 /35hYwKb/TBzCQeOI3M6UpGu01ZJaZLpaHwXWmcS91MANZ+pzrD8lrdrhALbZ0SuH8aW G/vA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686274965; x=1688866965; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=p1k5/eahjbWsfa/uahTffpcykz1TgCrP6tjOVjA4Rhk=; b=HY5JXU6BuAeEX5zvLmfXktHcqZG8hsUicq4rViqofVF1QwOnXZlI5MXIQb/aSBQZf9 dNjDBhamY7GllGywm+08Ew6EoCqFFfvvRqtS57D700mO7msq2DvZpp6cONBv5SxlbhAw svFhUBuEeZ/oebYhq/OUDegswsSZaRw3BU3pBDvBJq3TZIABRFN7Dpop241ZgrowVWNB BTciCSDRMhOvaRGssxa9qoO/w4pWYAR/wG4Qn199ZCavQ0Eeta03Fy+qfI6pXkcPXlyK Avzm4rtzR3OH3GrHOeMSGOSabJO9mtPDnkTOWCVzigFjEz/YXtj6MWaTCabVnC2gK+nZ bfqA== X-Gm-Message-State: AC+VfDyKf67mjAUiAIv5ueWRYLkFZdE1+h2oCIb6Fl3ZTmmtTskBducO ihx4EpqTCV6LZseGIYlpVrIozg== X-Google-Smtp-Source: ACHHUZ5fn3kW9FeGHNpW1xzpcB4LGHphf3k1hTSF+y3EK3Em0GkiW50KAIuHoFANUJ/kQPH4MzH72g== X-Received: by 2002:a25:ab53:0:b0:ba8:972d:e380 with SMTP id u77-20020a25ab53000000b00ba8972de380mr1240110ybi.22.1686274964991; Thu, 08 Jun 2023 18:42:44 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id e188-20020a251ec5000000b00baca49c80dcsm615573ybe.28.2023.06.08.18.42.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:42:43 -0700 (PDT) Date: Thu, 8 Jun 2023 18:42:40 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 27/32] mm/khugepaged: allow pte_offset_map[_lock]() to fail In-Reply-To: Message-ID: <6513e85-d798-34ec-3762-7c24ffb9329@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" __collapse_huge_page_swapin(): don't drop the map after every pte, it only has to be dropped by do_swap_page(); give up if pte_offset_map() fails; trace_mm_collapse_huge_page_swapin() at the end, with result; fix comment on returned result; fix vmf.pgoff, though it's not used. collapse_huge_page(): use pte_offset_map_lock() on the _pmd returned from clearing; allow failure, but it should be impossible there. hpage_collapse_scan_pmd() and collapse_pte_mapped_thp() allow for pte_offset_map_lock() failure. Signed-off-by: Hugh Dickins Reviewed-by: Yang Shi --- mm/khugepaged.c | 72 +++++++++++++++++++++++++++++++++---------------- 1 file changed, 49 insertions(+), 23 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 732f9ac393fc..49cfa7cdfe93 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -993,9 +993,8 @@ static int check_pmd_still_valid(struct mm_struct *mm, * Only done if hpage_collapse_scan_pmd believes it is worthwhile. * * Called and returns without pte mapped or spinlocks held. - * Note that if false is returned, mmap_lock will be released. + * Returns result: if not SCAN_SUCCEED, mmap_lock has been released. 
*/ - static int __collapse_huge_page_swapin(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd, @@ -1004,23 +1003,35 @@ static int __collapse_huge_page_swapin(struct mm_st= ruct *mm, int swapped_in =3D 0; vm_fault_t ret =3D 0; unsigned long address, end =3D haddr + (HPAGE_PMD_NR * PAGE_SIZE); + int result; + pte_t *pte =3D NULL; =20 for (address =3D haddr; address < end; address +=3D PAGE_SIZE) { struct vm_fault vmf =3D { .vma =3D vma, .address =3D address, - .pgoff =3D linear_page_index(vma, haddr), + .pgoff =3D linear_page_index(vma, address), .flags =3D FAULT_FLAG_ALLOW_RETRY, .pmd =3D pmd, }; =20 - vmf.pte =3D pte_offset_map(pmd, address); - vmf.orig_pte =3D *vmf.pte; - if (!is_swap_pte(vmf.orig_pte)) { - pte_unmap(vmf.pte); - continue; + if (!pte++) { + pte =3D pte_offset_map(pmd, address); + if (!pte) { + mmap_read_unlock(mm); + result =3D SCAN_PMD_NULL; + goto out; + } } + + vmf.orig_pte =3D *pte; + if (!is_swap_pte(vmf.orig_pte)) + continue; + + vmf.pte =3D pte; ret =3D do_swap_page(&vmf); + /* Which unmaps pte (after perhaps re-checking the entry) */ + pte =3D NULL; =20 /* * do_swap_page returns VM_FAULT_RETRY with released mmap_lock. @@ -1029,24 +1040,29 @@ static int __collapse_huge_page_swapin(struct mm_st= ruct *mm, * resulting in later failure. */ if (ret & VM_FAULT_RETRY) { - trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0); /* Likely, but not guaranteed, that page lock failed */ - return SCAN_PAGE_LOCK; + result =3D SCAN_PAGE_LOCK; + goto out; } if (ret & VM_FAULT_ERROR) { mmap_read_unlock(mm); - trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0); - return SCAN_FAIL; + result =3D SCAN_FAIL; + goto out; } swapped_in++; } =20 + if (pte) + pte_unmap(pte); + /* Drain LRU add pagevec to remove extra pin on the swapped in pages */ if (swapped_in) lru_add_drain(); =20 - trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 1); - return SCAN_SUCCEED; + result =3D SCAN_SUCCEED; +out: + trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, result); + return result; } =20 static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm, @@ -1146,9 +1162,6 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, address + HPAGE_PMD_SIZE); mmu_notifier_invalidate_range_start(&range); =20 - pte =3D pte_offset_map(pmd, address); - pte_ptl =3D pte_lockptr(mm, pmd); - pmd_ptl =3D pmd_lock(mm, pmd); /* probably unnecessary */ /* * This removes any huge TLB entry from the CPU so we won't allow @@ -1163,13 +1176,18 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, mmu_notifier_invalidate_range_end(&range); tlb_remove_table_sync_one(); =20 - spin_lock(pte_ptl); - result =3D __collapse_huge_page_isolate(vma, address, pte, cc, - &compound_pagelist); - spin_unlock(pte_ptl); + pte =3D pte_offset_map_lock(mm, &_pmd, address, &pte_ptl); + if (pte) { + result =3D __collapse_huge_page_isolate(vma, address, pte, cc, + &compound_pagelist); + spin_unlock(pte_ptl); + } else { + result =3D SCAN_PMD_NULL; + } =20 if (unlikely(result !=3D SCAN_SUCCEED)) { - pte_unmap(pte); + if (pte) + pte_unmap(pte); spin_lock(pmd_ptl); BUG_ON(!pmd_none(*pmd)); /* @@ -1253,6 +1271,11 @@ static int hpage_collapse_scan_pmd(struct mm_struct = *mm, memset(cc->node_load, 0, sizeof(cc->node_load)); nodes_clear(cc->alloc_nmask); pte =3D pte_offset_map_lock(mm, pmd, address, &ptl); + if (!pte) { + result =3D SCAN_PMD_NULL; + goto out; + } + for (_address =3D address, _pte =3D pte; _pte < pte + 
HPAGE_PMD_NR; _pte++, _address +=3D PAGE_SIZE) { pte_t pteval =3D *_pte; @@ -1622,8 +1645,10 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, un= signed long addr, * lockless_pages_from_mm() and the hardware page walker can access page * tables while all the high-level locks are held in write mode. */ - start_pte =3D pte_offset_map_lock(mm, pmd, haddr, &ptl); result =3D SCAN_FAIL; + start_pte =3D pte_offset_map_lock(mm, pmd, haddr, &ptl); + if (!start_pte) + goto drop_immap; =20 /* step 1: check all mapped PTEs are to the right huge page */ for (i =3D 0, addr =3D haddr, pte =3D start_pte; @@ -1697,6 +1722,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, uns= igned long addr, =20 abort: pte_unmap_unlock(start_pte, ptl); +drop_immap: i_mmap_unlock_write(vma->vm_file->f_mapping); goto drop_hpage; } --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 447ADC7EE29 for ; Fri, 9 Jun 2023 01:43:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237946AbjFIBnt (ORCPT ); Thu, 8 Jun 2023 21:43:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237153AbjFIBnq (ORCPT ); Thu, 8 Jun 2023 21:43:46 -0400 Received: from mail-yw1-x1136.google.com (mail-yw1-x1136.google.com [IPv6:2607:f8b0:4864:20::1136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1C31C19D for ; Thu, 8 Jun 2023 18:43:44 -0700 (PDT) Received: by mail-yw1-x1136.google.com with SMTP id 00721157ae682-56ca07b34b1so11534767b3.0 for ; Thu, 08 Jun 2023 18:43:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686275023; x=1688867023; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=NBl0ppCRA8l25HwCsBAYjZw+ZZ23COv4vEuhAo6MgQs=; b=h7/lhgABcsEuN80ZtY1CkQPEyiRM02tdU0Fj4Bqu3XEJ6+oMt1JttOuF1+QhBlyKlU nItRcb1lhVcg3eXONIukosC/ovw61FcNIrky9ZYjzzwajHv6HsTcvoIBjY3VhzMNyh1o MljGnp6iHoyALEuaRGEygREAs7NiPhq7Ts0bzgZaUDe8a2nZJcIQ2+Lj1wkKmKdUDbGy NH5AIRGaz2lGcsHiKp41MEO/swZqDgq35wqrQHtww6/nIP6oqDLpTQPv1VdUEAb8WQlv b+6qKzgTWVc0T+xle5LZMl0lKQUz8B9zZKcS4HbNa7M3nikEfj9E/klPVWnDJWzUyVm/ ISKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686275023; x=1688867023; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=NBl0ppCRA8l25HwCsBAYjZw+ZZ23COv4vEuhAo6MgQs=; b=SqBGwOAV+tsL/AwDFBDHK1tgaGdpj8nnEM37Er/kCGbaBoVeANgM0b9rR2+ouBMBhf d938bN1lsRVQKfPhVPOIZ/bYCM/5AkpKDWvV0qSmrARkQYJfNEYTUdhgmKB+E4hXnzng wcm77LIZKdw6BFgqSJG0JIYQ8AApIS0bJYahAFZ8Pc1AqgrkYn4quX5PMv1K9abzKg2N SzmGlychARUkVN3I0HoRox8SmmSxBccJ3VahgUA4sdFTzYcO+XT8gF33Nr7NSSn47uLk 8shOfNVeZHCErvYKRrMIiBdcoJqBkV6BzV+r5TBAsCehnX8kvu+00eBXUsG3uu8+vzOx BDKQ== X-Gm-Message-State: AC+VfDxdeYt/Eh44gFPKIFlACFfCFPXTfx3kqbEpf7PmEzoGVZISr/2A PD7tPWeqdaVaGrLnsb3a65LxIQ== X-Google-Smtp-Source: ACHHUZ7uNPgy7ZBXRocyBo3vsfUWGTN3VMWhF4yrbhs6dEiMkK2v8+kmP/oHR3LauFiKONeXt7BBYg== X-Received: by 2002:a0d:f6c4:0:b0:55a:40d3:4d6f with SMTP id g187-20020a0df6c4000000b0055a40d34d6fmr1156326ywf.26.1686275023013; Thu, 08 Jun 2023 18:43:43 -0700 (PDT) Received: from ripple.attlocal.net 
(172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id a17-20020a81bb51000000b00545a08184fdsm281040ywl.141.2023.06.08.18.43.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:43:41 -0700 (PDT) Date: Thu, 8 Jun 2023 18:43:38 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 28/32] mm/memory: allow pte_offset_map[_lock]() to fail In-Reply-To: Message-ID: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" copy_pte_range(): use pte_offset_map_nolock(), and allow for it to fail; but with a comment on some further assumptions that are being made there. zap_pte_range() and zap_pmd_range(): adjust their interaction so that a pte_offset_map_lock() failure in zap_pte_range() leads to a retry in zap_pmd_range(); remove call to pmd_none_or_trans_huge_or_clear_bad(). Allow pte_offset_map_lock() to fail in many functions. Update comment on calling pte_alloc() in do_anonymous_page(). Remove redundant calls to pmd_trans_unstable(), pmd_devmap_trans_unstable(), pmd_none() and pmd_bad(); but leave pmd_none_or_clear_bad() calls in free_pmd_range() and copy_pmd_range(), those do simplify the next level down. Signed-off-by: Hugh Dickins --- mm/memory.c | 172 +++++++++++++++++++++++++--------------------------- 1 file changed, 82 insertions(+), 90 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 2eb54c0d5d3c..c7b920291a72 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1012,13 +1012,25 @@ copy_pte_range(struct vm_area_struct *dst_vma, stru= ct vm_area_struct *src_vma, progress =3D 0; init_rss_vec(rss); =20 + /* + * copy_pmd_range()'s prior pmd_none_or_clear_bad(src_pmd), and the + * error handling here, assume that exclusive mmap_lock on dst and src + * protects anon from unexpected THP transitions; with shmem and file + * protected by mmap_lock-less collapse skipping areas with anon_vma + * (whereas vma_needs_copy() skips areas without anon_vma). A rework + * can remove such assumptions later, but this is good enough for now. 
+ */ dst_pte =3D pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl); if (!dst_pte) { ret =3D -ENOMEM; goto out; } - src_pte =3D pte_offset_map(src_pmd, addr); - src_ptl =3D pte_lockptr(src_mm, src_pmd); + src_pte =3D pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl); + if (!src_pte) { + pte_unmap_unlock(dst_pte, dst_ptl); + /* ret =3D=3D 0 */ + goto out; + } spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); orig_src_pte =3D src_pte; orig_dst_pte =3D dst_pte; @@ -1083,8 +1095,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct= vm_area_struct *src_vma, } while (dst_pte++, src_pte++, addr +=3D PAGE_SIZE, addr !=3D end); =20 arch_leave_lazy_mmu_mode(); - spin_unlock(src_ptl); - pte_unmap(orig_src_pte); + pte_unmap_unlock(orig_src_pte, src_ptl); add_mm_rss_vec(dst_mm, rss); pte_unmap_unlock(orig_dst_pte, dst_ptl); cond_resched(); @@ -1388,10 +1399,11 @@ static unsigned long zap_pte_range(struct mmu_gathe= r *tlb, swp_entry_t entry; =20 tlb_change_page_size(tlb, PAGE_SIZE); -again: init_rss_vec(rss); - start_pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); - pte =3D start_pte; + start_pte =3D pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!pte) + return addr; + flush_tlb_batched_pending(mm); arch_enter_lazy_mmu_mode(); do { @@ -1507,17 +1519,10 @@ static unsigned long zap_pte_range(struct mmu_gathe= r *tlb, * If we forced a TLB flush (either due to running out of * batch buffers or because we needed to flush dirty TLB * entries before releasing the ptl), free the batched - * memory too. Restart if we didn't do everything. + * memory too. Come back again if we didn't do everything. */ - if (force_flush) { - force_flush =3D 0; + if (force_flush) tlb_flush_mmu(tlb); - } - - if (addr !=3D end) { - cond_resched(); - goto again; - } =20 return addr; } @@ -1536,8 +1541,10 @@ static inline unsigned long zap_pmd_range(struct mmu= _gather *tlb, if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) { if (next - addr !=3D HPAGE_PMD_SIZE) __split_huge_pmd(vma, pmd, addr, false, NULL); - else if (zap_huge_pmd(tlb, vma, pmd, addr)) - goto next; + else if (zap_huge_pmd(tlb, vma, pmd, addr)) { + addr =3D next; + continue; + } /* fall through */ } else if (details && details->single_folio && folio_test_pmd_mappable(details->single_folio) && @@ -1550,20 +1557,14 @@ static inline unsigned long zap_pmd_range(struct mm= u_gather *tlb, */ spin_unlock(ptl); } - - /* - * Here there can be other concurrent MADV_DONTNEED or - * trans huge page faults running, and if the pmd is - * none or trans huge it can change under us. This is - * because MADV_DONTNEED holds the mmap_lock in read - * mode. 
- */ - if (pmd_none_or_trans_huge_or_clear_bad(pmd)) - goto next; - next =3D zap_pte_range(tlb, vma, pmd, addr, next, details); -next: - cond_resched(); - } while (pmd++, addr =3D next, addr !=3D end); + if (pmd_none(*pmd)) { + addr =3D next; + continue; + } + addr =3D zap_pte_range(tlb, vma, pmd, addr, next, details); + if (addr !=3D next) + pmd--; + } while (pmd++, cond_resched(), addr !=3D end); =20 return addr; } @@ -1905,6 +1906,10 @@ static int insert_pages(struct vm_area_struct *vma, = unsigned long addr, const int batch_size =3D min_t(int, pages_to_write_in_pmd, 8); =20 start_pte =3D pte_offset_map_lock(mm, pmd, addr, &pte_lock); + if (!start_pte) { + ret =3D -EFAULT; + goto out; + } for (pte =3D start_pte; pte_idx < batch_size; ++pte, ++pte_idx) { int err =3D insert_page_in_batch_locked(vma, pte, addr, pages[curr_page_idx], prot); @@ -2572,10 +2577,10 @@ static int apply_to_pte_range(struct mm_struct *mm,= pmd_t *pmd, mapped_pte =3D pte =3D (mm =3D=3D &init_mm) ? pte_offset_kernel(pmd, addr) : pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!pte) + return -EINVAL; } =20 - BUG_ON(pmd_huge(*pmd)); - arch_enter_lazy_mmu_mode(); =20 if (fn) { @@ -2804,7 +2809,6 @@ static inline int __wp_page_copy_user(struct page *ds= t, struct page *src, int ret; void *kaddr; void __user *uaddr; - bool locked =3D false; struct vm_area_struct *vma =3D vmf->vma; struct mm_struct *mm =3D vma->vm_mm; unsigned long addr =3D vmf->address; @@ -2830,12 +2834,12 @@ static inline int __wp_page_copy_user(struct page *= dst, struct page *src, * On architectures with software "accessed" bits, we would * take a double page fault, so mark it accessed here. */ + vmf->pte =3D NULL; if (!arch_has_hw_pte_young() && !pte_young(vmf->orig_pte)) { pte_t entry; =20 vmf->pte =3D pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl); - locked =3D true; - if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) { + if (unlikely(!vmf->pte || !pte_same(*vmf->pte, vmf->orig_pte))) { /* * Other thread has already handled the fault * and update local tlb only @@ -2857,13 +2861,12 @@ static inline int __wp_page_copy_user(struct page *= dst, struct page *src, * zeroes. */ if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) { - if (locked) + if (vmf->pte) goto warn; =20 /* Re-validate under PTL if the page is still mapped */ vmf->pte =3D pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl); - locked =3D true; - if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) { + if (unlikely(!vmf->pte || !pte_same(*vmf->pte, vmf->orig_pte))) { /* The PTE changed under us, update local tlb */ update_mmu_tlb(vma, addr, vmf->pte); ret =3D -EAGAIN; @@ -2888,7 +2891,7 @@ static inline int __wp_page_copy_user(struct page *ds= t, struct page *src, ret =3D 0; =20 pte_unlock: - if (locked) + if (vmf->pte) pte_unmap_unlock(vmf->pte, vmf->ptl); kunmap_atomic(kaddr); flush_dcache_page(dst); @@ -3110,7 +3113,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * Re-check the pte - we dropped the lock */ vmf->pte =3D pte_offset_map_lock(mm, vmf->pmd, vmf->address, &vmf->ptl); - if (likely(pte_same(*vmf->pte, vmf->orig_pte))) { + if (likely(vmf->pte && pte_same(*vmf->pte, vmf->orig_pte))) { if (old_folio) { if (!folio_test_anon(old_folio)) { dec_mm_counter(mm, mm_counter_file(&old_folio->page)); @@ -3178,19 +3181,20 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) /* Free the old page.. 
*/ new_folio =3D old_folio; page_copied =3D 1; - } else { + pte_unmap_unlock(vmf->pte, vmf->ptl); + } else if (vmf->pte) { update_mmu_tlb(vma, vmf->address, vmf->pte); + pte_unmap_unlock(vmf->pte, vmf->ptl); } =20 - if (new_folio) - folio_put(new_folio); - - pte_unmap_unlock(vmf->pte, vmf->ptl); /* * No need to double call mmu_notifier->invalidate_range() callback as * the above ptep_clear_flush_notify() did already call it. */ mmu_notifier_invalidate_range_only_end(&range); + + if (new_folio) + folio_put(new_folio); if (old_folio) { if (page_copied) free_swap_cache(&old_folio->page); @@ -3230,6 +3234,8 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf) WARN_ON_ONCE(!(vmf->vma->vm_flags & VM_SHARED)); vmf->pte =3D pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + return VM_FAULT_NOPAGE; /* * We might have raced with another page fault while we released the * pte_offset_map_lock. @@ -3591,10 +3597,11 @@ static vm_fault_t remove_device_exclusive_entry(str= uct vm_fault *vmf) =20 vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (likely(pte_same(*vmf->pte, vmf->orig_pte))) + if (likely(vmf->pte && pte_same(*vmf->pte, vmf->orig_pte))) restore_exclusive_pte(vma, vmf->page, vmf->address, vmf->pte); =20 - pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->pte) + pte_unmap_unlock(vmf->pte, vmf->ptl); folio_unlock(folio); folio_put(folio); =20 @@ -3625,6 +3632,8 @@ static vm_fault_t pte_marker_clear(struct vm_fault *v= mf) { vmf->pte =3D pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + return 0; /* * Be careful so that we will only recover a special uffd-wp pte into a * none pte. Otherwise it means the pte could have changed, so retry. @@ -3728,11 +3737,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) vmf->page =3D pfn_swap_entry_to_page(entry); vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { - spin_unlock(vmf->ptl); - goto out; - } - + if (unlikely(!vmf->pte || + !pte_same(*vmf->pte, vmf->orig_pte))) + goto unlock; /* * Get a page reference while we know the page can't be * freed. @@ -3807,7 +3814,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) */ vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (likely(pte_same(*vmf->pte, vmf->orig_pte))) + if (likely(vmf->pte && pte_same(*vmf->pte, vmf->orig_pte))) ret =3D VM_FAULT_OOM; goto unlock; } @@ -3877,7 +3884,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) */ vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) + if (unlikely(!vmf->pte || !pte_same(*vmf->pte, vmf->orig_pte))) goto out_nomap; =20 if (unlikely(!folio_test_uptodate(folio))) { @@ -4003,13 +4010,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) /* No need to invalidate - it was non-present before */ update_mmu_cache(vma, vmf->address, vmf->pte); unlock: - pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->pte) + pte_unmap_unlock(vmf->pte, vmf->ptl); out: if (si) put_swap_device(si); return ret; out_nomap: - pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->pte) + pte_unmap_unlock(vmf->pte, vmf->ptl); out_page: folio_unlock(folio); out_release: @@ -4041,22 +4050,12 @@ static vm_fault_t do_anonymous_page(struct vm_fault= *vmf) return VM_FAULT_SIGBUS; =20 /* - * Use pte_alloc() instead of pte_alloc_map(). 
We can't run - * pte_offset_map() on pmds where a huge pmd might be created - * from a different thread. - * - * pte_alloc_map() is safe to use under mmap_write_lock(mm) or when - * parallel threads are excluded by other means. - * - * Here we only have mmap_read_lock(mm). + * Use pte_alloc() instead of pte_alloc_map(), so that OOM can + * be distinguished from a transient failure of pte_offset_map(). */ if (pte_alloc(vma->vm_mm, vmf->pmd)) return VM_FAULT_OOM; =20 - /* See comment in handle_pte_fault() */ - if (unlikely(pmd_trans_unstable(vmf->pmd))) - return 0; - /* Use the zero-page for reads */ if (!(vmf->flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(vma->vm_mm)) { @@ -4064,6 +4063,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) vma->vm_page_prot)); vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + goto unlock; if (vmf_pte_changed(vmf)) { update_mmu_tlb(vma, vmf->address, vmf->pte); goto unlock; @@ -4104,6 +4105,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) =20 vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + goto release; if (vmf_pte_changed(vmf)) { update_mmu_tlb(vma, vmf->address, vmf->pte); goto release; @@ -4131,7 +4134,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) /* No need to invalidate - it was non-present before */ update_mmu_cache(vma, vmf->address, vmf->pte); unlock: - pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->pte) + pte_unmap_unlock(vmf->pte, vmf->ptl); return ret; release: folio_put(folio); @@ -4380,15 +4384,10 @@ vm_fault_t finish_fault(struct vm_fault *vmf) return VM_FAULT_OOM; } =20 - /* - * See comment in handle_pte_fault() for how this scenario happens, we - * need to return NOPAGE so that we drop this page. - */ - if (pmd_devmap_trans_unstable(vmf->pmd)) - return VM_FAULT_NOPAGE; - vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + return VM_FAULT_NOPAGE; =20 /* Re-check under ptl */ if (likely(!vmf_pte_changed(vmf))) { @@ -4630,17 +4629,11 @@ static vm_fault_t do_fault(struct vm_fault *vmf) * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */ if (!vma->vm_ops->fault) { - /* - * If we find a migration pmd entry or a none pmd entry, which - * should never happen, return SIGBUS - */ - if (unlikely(!pmd_present(*vmf->pmd))) + vmf->pte =3D pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, + vmf->address, &vmf->ptl); + if (unlikely(!vmf->pte)) ret =3D VM_FAULT_SIGBUS; else { - vmf->pte =3D pte_offset_map_lock(vmf->vma->vm_mm, - vmf->pmd, - vmf->address, - &vmf->ptl); /* * Make sure this is not a temporary clearing of pte * by holding ptl and checking again. 
A R/M/W update @@ -5429,10 +5422,9 @@ int follow_pte(struct mm_struct *mm, unsigned long a= ddress, pmd =3D pmd_offset(pud, address); VM_BUG_ON(pmd_trans_huge(*pmd)); =20 - if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd))) - goto out; - ptep =3D pte_offset_map_lock(mm, pmd, address, ptlp); + if (!ptep) + goto out; if (!pte_present(*ptep)) goto unlock; *ptepp =3D ptep; --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50C6CC7EE23 for ; Fri, 9 Jun 2023 01:45:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237953AbjFIBpQ (ORCPT ); Thu, 8 Jun 2023 21:45:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50696 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236986AbjFIBpL (ORCPT ); Thu, 8 Jun 2023 21:45:11 -0400 Received: from mail-yw1-x112f.google.com (mail-yw1-x112f.google.com [IPv6:2607:f8b0:4864:20::112f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9C1E41A2 for ; Thu, 8 Jun 2023 18:45:10 -0700 (PDT) Received: by mail-yw1-x112f.google.com with SMTP id 00721157ae682-565cdb77b01so10989137b3.0 for ; Thu, 08 Jun 2023 18:45:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686275110; x=1688867110; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=6V2+W0bna42qnZ43tvOf68yFS9Jt1uoq2b6WWLaZ+Rg=; b=5nDVvDFDj0TDSwA6VyREm/b7XZyfd9szkT+LB84tX5XiJW9/M17p2pjZw/Ek5jNCOs 40hFeMoNWqiJJMeNTMe2RktTogEBFqE0QPIaTtgF826N1Ako6QEnbFjFhp7dOZXF3sOJ aEBqtDp9A7OZ2dPSncNi3eN5NLAJGJkMJEjCptn2bKCi4uA/IGUtLDX3g50JH0sMPf9e MNUa9uUW/hKz7iJbU0zOzLOl/zU2VZeyofD7W99CMLPHI1iKGrA1iMdnkGcmDXd/BToo s3p4xYMOEA3jz65+/qIv9llP2GR7KuS7BRga3o0mZ9pxZSvUAyZQGkK0RDUXGGqfUJro ylKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686275110; x=1688867110; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=6V2+W0bna42qnZ43tvOf68yFS9Jt1uoq2b6WWLaZ+Rg=; b=GiXVHYqjHv9w6CXCVqMQlgryVH3OafFX1mIR/VTLvbLO5SZs0SbTfwVITmyVG39785 Ije2xQrB7A7gdOPXuuliRSvAsYEOnYOddAMg2GuDG/dVNLoySqIcoW4Si/foCBfaLw/C cIY5t2eYAVFsu+4EMtjZ6x5mFsd2DdnEDW2RO0vzBY0N/mcfsHn8GTXJ/2TJffI2zpsT kpDostdPkN4aJKDudY61x9PNtpE2KPjfDXixBSH4fBE+XJ9nYD5MaD5HYD4sTyWxtKBj z8MahmTxHa2acgEKLgJzk5NDPSV+2jCI+wees6atCjUCoMnTQVHl5GUhWBEVcTy2UX5E 597Q== X-Gm-Message-State: AC+VfDwkwPJYdj1yTOS0+8PA+JcAsXwsGi6+PK21Pg+mWeHSq57Xi+RS 1hVBx8xWJZfmRQJAEaC2p9NYxA== X-Google-Smtp-Source: ACHHUZ6uvLIOAAtIvmaYvvIxcZjU5faumO4qT6eMLe7uSTWFP55mTra/k3W0VJ0vPQcKSj3TeQZVPw== X-Received: by 2002:a81:8884:0:b0:569:74f3:fd07 with SMTP id y126-20020a818884000000b0056974f3fd07mr12140ywf.0.1686275109703; Thu, 08 Jun 2023 18:45:09 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id d63-20020a816842000000b0055a7ff0a5cdsm303405ywc.27.2023.06.08.18.45.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:45:08 -0700 (PDT) Date: Thu, 8 Jun 2023 18:45:05 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. 
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 29/32] mm/memory: handle_pte_fault() use pte_offset_map_nolock() In-Reply-To: Message-ID: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" handle_pte_fault() use pte_offset_map_nolock() to get the vmf.ptl which corresponds to vmf.pte, instead of pte_lockptr() being used later, when there's a chance that the pmd entry might have changed, perhaps to none, or to a huge pmd, with no split ptlock in its struct page. Remove its pmd_devmap_trans_unstable() call: pte_offset_map_nolock() will handle that case by failing. Update the "morph" comment above, looking forward to when shmem or file collapse to THP may not take mmap_lock for write (or not at all). do_numa_page() use the vmf->ptl from handle_pte_fault() at first, but refresh it when refreshing vmf->pte. do_swap_page()'s pte_unmap_same() (the thing that takes ptl to verify a two-part PAE orig_pte) use the vmf->ptl from handle_pte_fault() too; but do_swap_page() is also used by anon THP's __collapse_huge_page_swapin(), so adjust that to set vmf->ptl by pte_offset_map_nolock(). Signed-off-by: Hugh Dickins --- mm/khugepaged.c | 6 ++++-- mm/memory.c | 38 +++++++++++++------------------------- 2 files changed, 17 insertions(+), 27 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 49cfa7cdfe93..c11db2e78e95 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1005,6 +1005,7 @@ static int __collapse_huge_page_swapin(struct mm_stru= ct *mm, unsigned long address, end =3D haddr + (HPAGE_PMD_NR * PAGE_SIZE); int result; pte_t *pte =3D NULL; + spinlock_t *ptl; =20 for (address =3D haddr; address < end; address +=3D PAGE_SIZE) { struct vm_fault vmf =3D { @@ -1016,7 +1017,7 @@ static int __collapse_huge_page_swapin(struct mm_stru= ct *mm, }; =20 if (!pte++) { - pte =3D pte_offset_map(pmd, address); + pte =3D pte_offset_map_nolock(mm, pmd, address, &ptl); if (!pte) { mmap_read_unlock(mm); result =3D SCAN_PMD_NULL; @@ -1024,11 +1025,12 @@ static int __collapse_huge_page_swapin(struct mm_st= ruct *mm, } } =20 - vmf.orig_pte =3D *pte; + vmf.orig_pte =3D ptep_get_lockless(pte); if (!is_swap_pte(vmf.orig_pte)) continue; =20 vmf.pte =3D pte; + vmf.ptl =3D ptl; ret =3D do_swap_page(&vmf); /* Which unmaps pte (after perhaps re-checking the entry) */ pte =3D NULL; diff --git a/mm/memory.c b/mm/memory.c index c7b920291a72..4ec46eecefd3 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2786,10 +2786,9 @@ static inline int pte_unmap_same(struct vm_fault *vm= f) int same =3D 1; #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPTION) if (sizeof(pte_t) > sizeof(unsigned long)) { - spinlock_t *ptl =3D pte_lockptr(vmf->vma->vm_mm, vmf->pmd); - spin_lock(ptl); + spin_lock(vmf->ptl); same =3D pte_same(*vmf->pte, vmf->orig_pte); - spin_unlock(ptl); + spin_unlock(vmf->ptl); } #endif pte_unmap(vmf->pte); @@ -4696,7 +4695,6 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) * validation 
through pte_unmap_same(). It's of NUMA type but * the pfn may be screwed if the read is non atomic. */ - vmf->ptl =3D pte_lockptr(vma->vm_mm, vmf->pmd); spin_lock(vmf->ptl); if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { pte_unmap_unlock(vmf->pte, vmf->ptl); @@ -4767,8 +4765,10 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) flags |=3D TNF_MIGRATED; } else { flags |=3D TNF_MIGRATE_FAIL; - vmf->pte =3D pte_offset_map(vmf->pmd, vmf->address); - spin_lock(vmf->ptl); + vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, + vmf->address, &vmf->ptl); + if (unlikely(!vmf->pte)) + goto out; if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { pte_unmap_unlock(vmf->pte, vmf->ptl); goto out; @@ -4897,27 +4897,16 @@ static vm_fault_t handle_pte_fault(struct vm_fault = *vmf) vmf->pte =3D NULL; vmf->flags &=3D ~FAULT_FLAG_ORIG_PTE_VALID; } else { - /* - * If a huge pmd materialized under us just retry later. Use - * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead - * of pmd_trans_huge() to ensure the pmd didn't become - * pmd_trans_huge under us and then back to pmd_none, as a - * result of MADV_DONTNEED running immediately after a huge pmd - * fault in a different thread of this mm, in turn leading to a - * misleading pmd_trans_huge() retval. All we have to ensure is - * that it is a regular pmd that we can walk with - * pte_offset_map() and we can do that through an atomic read - * in C, which is what pmd_trans_unstable() provides. - */ - if (pmd_devmap_trans_unstable(vmf->pmd)) - return 0; /* * A regular pmd is established and it can't morph into a huge - * pmd from under us anymore at this point because we hold the - * mmap_lock read mode and khugepaged takes it in write mode. - * So now it's safe to run pte_offset_map(). + * pmd by anon khugepaged, since that takes mmap_lock in write + * mode; but shmem or file collapse to THP could still morph + * it into a huge pmd: just retry later if so. 
*/ - vmf->pte =3D pte_offset_map(vmf->pmd, vmf->address); + vmf->pte =3D pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd, + vmf->address, &vmf->ptl); + if (unlikely(!vmf->pte)) + return 0; vmf->orig_pte =3D ptep_get_lockless(vmf->pte); vmf->flags |=3D FAULT_FLAG_ORIG_PTE_VALID; =20 @@ -4936,7 +4925,6 @@ static vm_fault_t handle_pte_fault(struct vm_fault *v= mf) if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) return do_numa_page(vmf); =20 - vmf->ptl =3D pte_lockptr(vmf->vma->vm_mm, vmf->pmd); spin_lock(vmf->ptl); entry =3D vmf->orig_pte; if (unlikely(!pte_same(*vmf->pte, entry))) { --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4BD93C7EE29 for ; Fri, 9 Jun 2023 01:51:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237994AbjFIBu4 (ORCPT ); Thu, 8 Jun 2023 21:50:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52598 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237975AbjFIBux (ORCPT ); Thu, 8 Jun 2023 21:50:53 -0400 Received: from mail-yb1-xb2d.google.com (mail-yb1-xb2d.google.com [IPv6:2607:f8b0:4864:20::b2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C14721BF0 for ; Thu, 8 Jun 2023 18:50:49 -0700 (PDT) Received: by mail-yb1-xb2d.google.com with SMTP id 3f1490d57ef6-bb167972cffso1373658276.1 for ; Thu, 08 Jun 2023 18:50:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686275449; x=1688867449; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=5QGRbISmNdiSrcRCSB3pJg16AgA7v+5ZDTS/B09Kn8o=; b=VYBwsG1BtIkMXYG/EqMPmNqaJtObBxLa0DzVTT5ka0MvcTVzYnyBd4s72c34jeMJWZ xJedOpJC62Y06znAOr56dfhffB75a2S0AEDjwGdBtR8M50W2ZRRdIZBxB+7WBisOZrKT nPmJFCweF3RKUaSLr026wSZQ2QVhzWfX9ltBG3Uq6D/Lj/QnQ1Y+MBLl1GEWPRjympq5 k7E6IjPRJAuuYRzzFwPhEP7q5GuBMvHkMBcEHG6JFmqkrVIuIbku6yQIyNpLr2Fx94UY CJ6aDp4I29QQMn44W1tjU0XjRP5XR9NluCQv+G6gcGOq9aMokdcjYhjrO1zsvgIIBXRP l9eQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686275449; x=1688867449; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=5QGRbISmNdiSrcRCSB3pJg16AgA7v+5ZDTS/B09Kn8o=; b=PBitS/xhWz1cVDi/iqdnibfEjgQxOZfjOVJwS+RpXYyqsyVnSASjiMMbXwds8Mlba7 brOsDoNoGJsJyBwPZFyEsbIHNAa6nRMCNzTGJ+uZ1+wcItytPS3u0zs/QPNCJL8y8IeY /tXXy3w8lc0PA7Z0YUZ6DyJawnD2Zel8aeUfmpKs7+bIZzCMOv/QziKq+FHxGzQ65zX0 Mf9RNmZymEe+O7kGcPPsCHkjhAswLkznNJZYW9Lpl/Rb7P4VKfBUHSihU4OksrHHHnuA uaZ6NHPK3rhfjqhkw0rkhc8sQm/YqUFMpAssB0eI6q1goq5Wtehrc3Pypv5OfNmJGkq0 GLKw== X-Gm-Message-State: AC+VfDxMlZx3SlHX/Srud9+GLqjUfU2KxxB3a+ZkFG4pdOtaygkdgM4K sahIJ8F6JeqWcbqASLctw0P9+A== X-Google-Smtp-Source: ACHHUZ7rkKxLDUXb7gVbThinQJUAuYEpzA2mTjY1b2VjU6hE71QC6+GRYBvyPmvE/ofxxSjMXUupyg== X-Received: by 2002:a25:4103:0:b0:ba8:8779:667 with SMTP id o3-20020a254103000000b00ba887790667mr1159827yba.41.1686275448723; Thu, 08 Jun 2023 18:50:48 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id n11-20020a25400b000000b00bb393903508sm622475yba.14.2023.06.08.18.50.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:50:47 -0700 (PDT) Date: Thu, 8 Jun 2023 18:50:37 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 30/32] mm/pgtable: delete pmd_trans_unstable() and friends In-Reply-To: Message-ID: <5abdab3-3136-b42e-274d-9c6281bfb79@google.com> References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Delete pmd_trans_unstable, pmd_none_or_trans_huge_or_clear_bad() and pmd_devmap_trans_unstable(), all now unused. With mixed feelings, delete all the comments on pmd_trans_unstable(). That was very good documentation of a subtle state, and this series does not even eliminate that state: but rather, normalizes and extends it, asking pte_offset_map[_lock]() callers to anticipate failure, without regard for whether mmap_read_lock() or mmap_write_lock() is held. Retain pud_trans_unstable(), which has one use in __handle_mm_fault(), but delete its equivalent pud_none_or_trans_huge_or_dev_or_clear_bad(). While there, move the default arch_needs_pgtable_deposit() definition up near where pgtable_trans_huge_deposit() and withdraw() are declared. Signed-off-by: Hugh Dickins --- include/linux/pgtable.h | 103 +++------------------------------------- mm/khugepaged.c | 4 -- 2 files changed, 7 insertions(+), 100 deletions(-) diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 3fabbb018557..a1326e61d7ee 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -599,6 +599,10 @@ extern void pgtable_trans_huge_deposit(struct mm_struc= t *mm, pmd_t *pmdp, extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *= pmdp); #endif =20 +#ifndef arch_needs_pgtable_deposit +#define arch_needs_pgtable_deposit() (false) +#endif + #ifdef CONFIG_TRANSPARENT_HUGEPAGE /* * This is an implementation of pmdp_establish() that is only suitable for= an @@ -1300,9 +1304,10 @@ static inline int pud_trans_huge(pud_t pud) } #endif =20 -/* See pmd_none_or_trans_huge_or_clear_bad for discussion. */ -static inline int pud_none_or_trans_huge_or_dev_or_clear_bad(pud_t *pud) +static inline int pud_trans_unstable(pud_t *pud) { +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ + defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) pud_t pudval =3D READ_ONCE(*pud); =20 if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval)) @@ -1311,104 +1316,10 @@ static inline int pud_none_or_trans_huge_or_dev_or= _clear_bad(pud_t *pud) pud_clear_bad(pud); return 1; } - return 0; -} - -/* See pmd_trans_unstable for discussion. 
*/ -static inline int pud_trans_unstable(pud_t *pud) -{ -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ - defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) - return pud_none_or_trans_huge_or_dev_or_clear_bad(pud); -#else - return 0; #endif -} - -#ifndef arch_needs_pgtable_deposit -#define arch_needs_pgtable_deposit() (false) -#endif -/* - * This function is meant to be used by sites walking pagetables with - * the mmap_lock held in read mode to protect against MADV_DONTNEED and - * transhuge page faults. MADV_DONTNEED can convert a transhuge pmd - * into a null pmd and the transhuge page fault can convert a null pmd - * into an hugepmd or into a regular pmd (if the hugepage allocation - * fails). While holding the mmap_lock in read mode the pmd becomes - * stable and stops changing under us only if it's not null and not a - * transhuge pmd. When those races occurs and this function makes a - * difference vs the standard pmd_none_or_clear_bad, the result is - * undefined so behaving like if the pmd was none is safe (because it - * can return none anyway). The compiler level barrier() is critically - * important to compute the two checks atomically on the same pmdval. - * - * For 32bit kernels with a 64bit large pmd_t this automatically takes - * care of reading the pmd atomically to avoid SMP race conditions - * against pmd_populate() when the mmap_lock is hold for reading by the - * caller (a special atomic read not done by "gcc" as in the generic - * version above, is also needed when THP is disabled because the page - * fault can populate the pmd from under us). - */ -static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) -{ - pmd_t pmdval =3D pmdp_get_lockless(pmd); - /* - * !pmd_present() checks for pmd migration entries - * - * The complete check uses is_pmd_migration_entry() in linux/swapops.h - * But using that requires moving current function and pmd_trans_unstable= () - * to linux/swapops.h to resolve dependency, which is too much code move. - * - * !pmd_present() is equivalent to is_pmd_migration_entry() currently, - * because !pmd_present() pages can only be under migration not swapped - * out. - * - * pmd_none() is preserved for future condition checks on pmd migration - * entries and not confusing with this function name, although it is - * redundant with !pmd_present(). - */ - if (pmd_none(pmdval) || pmd_trans_huge(pmdval) || - (IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval))) - return 1; - if (unlikely(pmd_bad(pmdval))) { - pmd_clear_bad(pmd); - return 1; - } return 0; } =20 -/* - * This is a noop if Transparent Hugepage Support is not built into - * the kernel. Otherwise it is equivalent to - * pmd_none_or_trans_huge_or_clear_bad(), and shall only be called in - * places that already verified the pmd is not none and they want to - * walk ptes while holding the mmap sem in read mode (write mode don't - * need this). If THP is not enabled, the pmd can't go away under the - * code even if MADV_DONTNEED runs, but if THP is enabled we need to - * run a pmd_trans_unstable before walking the ptes after - * split_huge_pmd returns (because it may have run when the pmd become - * null, but then a page fault can map in a THP and not a regular page). - */ -static inline int pmd_trans_unstable(pmd_t *pmd) -{ -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - return pmd_none_or_trans_huge_or_clear_bad(pmd); -#else - return 0; -#endif -} - -/* - * the ordering of these checks is important for pmds with _page_devmap se= t. 
- * if we check pmd_trans_unstable() first we will trip the bad_pmd() check - * inside of pmd_none_or_trans_huge_or_clear_bad(). this will end up corre= ctly - * returning 1 but not before it spams dmesg with the pmd_clear_bad() outp= ut. - */ -static inline int pmd_devmap_trans_unstable(pmd_t *pmd) -{ - return pmd_devmap(*pmd) || pmd_trans_unstable(pmd); -} - #ifndef CONFIG_NUMA_BALANCING /* * Technically a PTE can be PROTNONE even when not doing NUMA balancing but diff --git a/mm/khugepaged.c b/mm/khugepaged.c index c11db2e78e95..1083f0e38a07 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -946,10 +946,6 @@ static int hugepage_vma_revalidate(struct mm_struct *m= m, unsigned long address, return SCAN_SUCCEED; } =20 -/* - * See pmd_trans_unstable() for how the result may change out from - * underneath us, even if we hold mmap_lock in read. - */ static int find_pmd_or_thp_or_none(struct mm_struct *mm, unsigned long address, pmd_t **pmd) --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13C21C7EE29 for ; Fri, 9 Jun 2023 01:52:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238015AbjFIBw0 (ORCPT ); Thu, 8 Jun 2023 21:52:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53368 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238000AbjFIBwY (ORCPT ); Thu, 8 Jun 2023 21:52:24 -0400 Received: from mail-yb1-xb34.google.com (mail-yb1-xb34.google.com [IPv6:2607:f8b0:4864:20::b34]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3ADFA1FDF for ; Thu, 8 Jun 2023 18:52:23 -0700 (PDT) Received: by mail-yb1-xb34.google.com with SMTP id 3f1490d57ef6-bad05c6b389so1366147276.2 for ; Thu, 08 Jun 2023 18:52:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686275542; x=1688867542; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=MICzT8g6d2gbosF05gYeCNQyYA3E/PyjzMQZ3kC9KOI=; b=PYctsQ9b4HrQorKzO7zX+fdJyjXUiN8i4W2q9bIpl7bYnCcy9KOZVUhjq8Som0kotl 8hN3DheRVe/kgHsmckj1HIHUS7QkhDrduKAtxjWrJ7uPd3Z1kPi0K/a74/drhVMrEv5N iyFn0O/U0PF+LJj7jEIWaYX3IwAMgIU4MQFfWd5rKkg5CQiQ4VOa9f2TnjccivpRjNpe MLfjrjWUH5cF/6qTPBYKfnjLb/mQJxgyL/APasFw6ccuLxdD/9qF0oXkbHPENouaIFKL go5KxwidvL12GJkGeTVlF7tb8Sm8kZvi4Yzd/FTjpbEMUsi8U+HBiLJXyA+0bfuLk/Ed ypDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686275542; x=1688867542; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=MICzT8g6d2gbosF05gYeCNQyYA3E/PyjzMQZ3kC9KOI=; b=E5nK6VNSd2JVpdRHj6mGK8T+J1i1SS0VwADD7oaIIA1s6aa0+vDoXS7OEtZS+/D+cC Sc1kY2PmVNtqjEY6G1gUtesUYPwZ/jaGIll+vOPfmlh+jfuGfR3vQ+7/cOEbA2E73C5u B0kqr4ebkvvTQpH2AFZiVLs2A9PF1zewO9zsU9jjpmRuJkGi6q4c3SWS24J2GhM/W/H+ YarSknjebkegQEDUJ6lz+dTPr+12jczzi8JIlFy4WQD6bSX4XX3R7hTFuVCSZZ2hX5fE z2XQRD80qGQtm4nXaMfEzJU0uzomEjXx+DvNTYZwX3GJfQVWGntqY00R5ME1bbGTVuUJ v0HQ== X-Gm-Message-State: AC+VfDymIVjUDNA8Mg7V96qeFBhv3JEgkm21bcETmKuWgKjNtlAXXT0F Sqtb0llfsjEb97pok3r4h6UO3g== X-Google-Smtp-Source: ACHHUZ4s2iHF1O1eWsJO+RR6O7kpy/Z7YTwwVNBnE8hanvEwdCoqR2iged2HkuPKJixJaGakJ6Kn7w== X-Received: by 2002:a0d:cf86:0:b0:559:ed0a:96c4 with SMTP id 

From nobody Sat Feb 7 20:47:53 2026
Date: Thu, 8 Jun 2023 18:52:17 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH v2 31/32] mm/swap: swap_vma_readahead() do the pte_offset_map()

swap_vma_readahead() has been proceeding in an unconventional way, its
preliminary swap_ra_info() doing the pte_offset_map() and pte_unmap(),
then relying on that pte pointer even after the pte_unmap() - in its
CONFIG_64BIT case (I think !CONFIG_HIGHPTE was intended; whereas 32-bit
copied ptes to stack while they were mapped, but had to limit how many).

Though it would be difficult to construct a failing testcase, accessing
page table after pte_unmap() will become bad practice, even on 64-bit: an
rcu_read_unlock() in pte_unmap() will allow page table to be freed.

Move relevant definitions from include/linux/swap.h to mm/swap_state.c,
nothing else used them.  Delete the CONFIG_64BIT distinction and buffer,
delete all reference to ptes from swap_ra_info(), use pte_offset_map()
repeatedly in swap_vma_readahead(), breaking from the loop if it fails.

(Will the repeated "map" and "unmap" show up as a slowdown anywhere?  If
so, maybe modify __read_swap_cache_async() to do the pte_unmap() only when
it does not find the page already in the swapcache.)

Use ptep_get_lockless(), mainly for its READ_ONCE().  Correctly advance
the address passed down to each call of __read_swap_cache_async().
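
(For orientation, the reworked loop has roughly the following shape - a
simplified sketch, not the exact hunk below: plugging, the non_swap_entry()
check, and the stepping of the pte pointer within an already-mapped page
table are all omitted here:

	addr = vmf->address - (ra_info.offset * PAGE_SIZE);
	for (i = 0; i < ra_info.nr_pte; i++, addr += PAGE_SIZE) {
		if (!pte) {
			pte = pte_offset_map(vmf->pmd, addr);
			if (!pte)
				break;	/* no page table here any more */
		}
		pentry = ptep_get_lockless(pte);
		if (is_swap_pte(pentry)) {
			/* unmap before the allocating, possibly sleeping call */
			pte_unmap(pte);
			pte = NULL;
			__read_swap_cache_async(pte_to_swp_entry(pentry),
						gfp_mask, vma, addr,
						&page_allocated);
		}
	}
	if (pte)
		pte_unmap(pte);

The pte is dropped before __read_swap_cache_async() and re-mapped lazily on
the next iteration, so the page table is never touched after pte_unmap().)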
Signed-off-by: Hugh Dickins
Reviewed-by: "Huang, Ying"
---
 include/linux/swap.h | 19 -------------------
 mm/swap_state.c      | 45 +++++++++++++++++++++++---------------------
 2 files changed, 24 insertions(+), 40 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 3c69cb653cb9..1b9f2d92fc10 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -337,25 +337,6 @@ struct swap_info_struct {
 	 */
 };
 
-#ifdef CONFIG_64BIT
-#define SWAP_RA_ORDER_CEILING	5
-#else
-/* Avoid stack overflow, because we need to save part of page table */
-#define SWAP_RA_ORDER_CEILING	3
-#define SWAP_RA_PTE_CACHE_SIZE	(1 << SWAP_RA_ORDER_CEILING)
-#endif
-
-struct vma_swap_readahead {
-	unsigned short win;
-	unsigned short offset;
-	unsigned short nr_pte;
-#ifdef CONFIG_64BIT
-	pte_t *ptes;
-#else
-	pte_t ptes[SWAP_RA_PTE_CACHE_SIZE];
-#endif
-};
-
 static inline swp_entry_t folio_swap_entry(struct folio *folio)
 {
 	swp_entry_t entry = { .val = page_private(&folio->page) };
diff --git a/mm/swap_state.c b/mm/swap_state.c
index b76a65ac28b3..a43b41975da2 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -698,6 +698,14 @@ void exit_swap_address_space(unsigned int type)
 	swapper_spaces[type] = NULL;
 }
 
+#define SWAP_RA_ORDER_CEILING	5
+
+struct vma_swap_readahead {
+	unsigned short win;
+	unsigned short offset;
+	unsigned short nr_pte;
+};
+
 static void swap_ra_info(struct vm_fault *vmf,
 			 struct vma_swap_readahead *ra_info)
 {
@@ -705,11 +713,7 @@ static void swap_ra_info(struct vm_fault *vmf,
 	unsigned long ra_val;
 	unsigned long faddr, pfn, fpfn, lpfn, rpfn;
 	unsigned long start, end;
-	pte_t *pte, *orig_pte;
 	unsigned int max_win, hits, prev_win, win;
-#ifndef CONFIG_64BIT
-	pte_t *tpte;
-#endif
 
 	max_win = 1 << min_t(unsigned int, READ_ONCE(page_cluster),
 			     SWAP_RA_ORDER_CEILING);
@@ -728,12 +732,9 @@ static void swap_ra_info(struct vm_fault *vmf,
 			       max_win, prev_win);
 	atomic_long_set(&vma->swap_readahead_info,
 			SWAP_RA_VAL(faddr, win, 0));
-
 	if (win == 1)
 		return;
 
-	/* Copy the PTEs because the page table may be unmapped */
-	orig_pte = pte = pte_offset_map(vmf->pmd, faddr);
 	if (fpfn == pfn + 1) {
 		lpfn = fpfn;
 		rpfn = fpfn + win;
@@ -753,15 +754,6 @@ static void swap_ra_info(struct vm_fault *vmf,
 
 	ra_info->nr_pte = end - start;
 	ra_info->offset = fpfn - start;
-	pte -= ra_info->offset;
-#ifdef CONFIG_64BIT
-	ra_info->ptes = pte;
-#else
-	tpte = ra_info->ptes;
-	for (pfn = start; pfn != end; pfn++)
-		*tpte++ = *pte++;
-#endif
-	pte_unmap(orig_pte);
 }
 
 /**
@@ -785,7 +777,8 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 	struct swap_iocb *splug = NULL;
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *page;
-	pte_t *pte, pentry;
+	pte_t *pte = NULL, pentry;
+	unsigned long addr;
 	swp_entry_t entry;
 	unsigned int i;
 	bool page_allocated;
@@ -797,17 +790,25 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 	if (ra_info.win == 1)
 		goto skip;
 
+	addr = vmf->address - (ra_info.offset * PAGE_SIZE);
+
 	blk_start_plug(&plug);
-	for (i = 0, pte = ra_info.ptes; i < ra_info.nr_pte;
-	     i++, pte++) {
-		pentry = *pte;
+	for (i = 0; i < ra_info.nr_pte; i++, addr += PAGE_SIZE) {
+		if (!pte++) {
+			pte = pte_offset_map(vmf->pmd, addr);
+			if (!pte)
+				break;
+		}
+		pentry = ptep_get_lockless(pte);
 		if (!is_swap_pte(pentry))
 			continue;
 		entry = pte_to_swp_entry(pentry);
 		if (unlikely(non_swap_entry(entry)))
 			continue;
+		pte_unmap(pte);
+		pte = NULL;
 		page = __read_swap_cache_async(entry, gfp_mask, vma,
-					       vmf->address, &page_allocated);
+					       addr, &page_allocated);
 		if (!page)
 			continue;
 		if (page_allocated) {
@@ -819,6 +820,8 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 		}
 		put_page(page);
 	}
+	if (pte)
+		pte_unmap(pte);
 	blk_finish_plug(&plug);
 	swap_read_unplug(splug);
 	lru_add_drain();
-- 
2.35.3

From nobody Sat Feb 7 20:47:53 2026
Date: Thu, 8 Jun 2023 18:53:23 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH v2 32/32] perf/core: Allow pte_offset_map() to fail

In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.
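
(For the _lock variant mentioned above, which this patch does not use, the
handling is analogous - a rough sketch, not taken from this series, with
the surrounding caller invented purely for illustration:

	pte_t *pte;
	spinlock_t *ptl;

	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!pte)
		return 0;	/* no page table here: treat the range as empty */
	/* ... examine or modify ptes ... */
	pte_unmap_unlock(pte, ptl);

whereas the lockless pte_offset_map() caller below simply retries from the
pmd when it gets back NULL.)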
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 32/32] perf/core: Allow pte_offset_map() to fail In-Reply-To: Message-ID: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In rare transient cases, not yet made possible, pte_offset_map() and pte_offet_map_lock() may not find a page table: handle appropriately. Signed-off-by: Hugh Dickins --- This is a perf patch, not an mm patch, and it will want to go in through the tip tree in due course; but keep it in this series for now, so that it's not missed, and not submitted before mm review. kernel/events/core.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/events/core.c b/kernel/events/core.c index db016e418931..174be710f3b3 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7490,6 +7490,7 @@ static u64 perf_get_pgtable_size(struct mm_struct *mm= , unsigned long addr) return pud_leaf_size(pud); =20 pmdp =3D pmd_offset_lockless(pudp, pud, addr); +again: pmd =3D pmdp_get_lockless(pmdp); if (!pmd_present(pmd)) return 0; @@ -7498,6 +7499,9 @@ static u64 perf_get_pgtable_size(struct mm_struct *mm= , unsigned long addr) return pmd_leaf_size(pmd); =20 ptep =3D pte_offset_map(&pmd, addr); + if (!ptep) + goto again; + pte =3D ptep_get_lockless(ptep); if (pte_present(pte)) size =3D pte_leaf_size(pte); --=20 2.35.3 From nobody Sat Feb 7 20:47:53 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 39938EB64DB for ; Tue, 20 Jun 2023 06:50:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230020AbjFTGuR (ORCPT ); Tue, 20 Jun 2023 02:50:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49482 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229651AbjFTGuO (ORCPT ); Tue, 20 Jun 2023 02:50:14 -0400 Received: from mail-yw1-x1135.google.com (mail-yw1-x1135.google.com [IPv6:2607:f8b0:4864:20::1135]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CABD5B1 for ; Mon, 19 Jun 2023 23:50:13 -0700 (PDT) Received: by mail-yw1-x1135.google.com with SMTP id 00721157ae682-57083a06b71so35803397b3.1 for ; Mon, 19 Jun 2023 23:50:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687243813; x=1689835813; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=I/bRJJ7HKAxHd08pJcszqvVk2aL6jH9Ld13vowySjkQ=; b=Vz0vexlm8Fi1kRWWLKFH0afGyZbpoHfUCJFGK5u6d1yfX9pcmaTaF8B9KhDWYwm1jW Z3OHmDGoXUe3JhXcUAbF965Onew9srRDivZSbN3wndiTA6j7qLe46+c7OONnoi1awyTP PxuAvKUdhV9NZgbHaRPq9uuToBYnY7B3RKXHUc3DgHDN4+2mhVMJWHieVVS5XvM7nsua 94bKAJ2uPjOf+d8ibOrtydWioDkIQYoSfFcweFsuWKU+f3lbQACGTY8l9iGBLIH2Kqz6 

From nobody Sat Feb 7 20:47:53 2026
Date: Mon, 19 Jun 2023 23:50:00 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH] mm/swapfile: delete outdated pte_offset_map() comment
Message-ID: <9022632b-ba9d-8cb0-c25-4be9786481b5@google.com>

Delete a triply out-of-date comment from add_swap_count_continuation():

1. vmalloc_to_page() changed from pte_offset_map() to pte_offset_kernel()
2. pte_offset_map() changed from using kmap_atomic() to kmap_local_page()
3. kmap_atomic() changed from using fixed FIX_KMAP addresses in 2.6.37.

Signed-off-by: Hugh Dickins
---
Here's a late "33/32" to the series just moved to mm-stable - thank you!

 mm/swapfile.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 12d204e6dae2..0a17d85b50cb 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3470,11 +3470,6 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask)
 		goto out;
 	}
 
-	/*
-	 * We are fortunate that although vmalloc_to_page uses pte_offset_map,
-	 * no architecture is using highmem pages for kernel page tables: so it
-	 * will not corrupt the GFP_ATOMIC caller's atomic page table kmaps.
-	 */
 	head = vmalloc_to_page(si->swap_map + offset);
 	offset &= ~PAGE_MASK;
 
-- 
2.35.3