From: Hugh Dickins
Date: Sun, 28 May 2023 23:14:48 -0700 (PDT)
To: Andrew Morton
Cc: Mike Kravetz, Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox,
    David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman,
    Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple,
    Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park, Naoya Horiguchi,
    Christophe Leroy, Zack Rusin, Jason Gunthorpe, Axel Rasmussen,
    Anshuman Khandual, Pasha Tatashin, Miaohe Lin, Minchan Kim,
    Christoph Hellwig, Song Liu, Thomas Hellstrom, Russell King,
    "David S. Miller", Michael Ellerman, "Aneesh Kumar K.V", Heiko Carstens,
    Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev, Jann Horn,
    linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 01/12] mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s
Message-ID: <88c445ae-552-5243-31a4-2674bac62d4d@google.com>
In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com>

Before putting them to use (several commits later), add rcu_read_lock()
to pte_offset_map(), and rcu_read_unlock() to pte_unmap().  Make this a
separate commit, since it risks exposing imbalances: prior commits have
fixed all the known imbalances, but we may find some have been missed.

Signed-off-by: Hugh Dickins
---
 include/linux/pgtable.h | 4 ++--
 mm/pgtable-generic.c    | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index a1326e61d7ee..8b0fc7fdc46f 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -99,7 +99,7 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
     ((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address)))
 #define pte_unmap(pte) do {    \
     kunmap_local((pte));    \
-    /* rcu_read_unlock() to be added later */    \
+    rcu_read_unlock();    \
 } while (0)
 #else
 static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
@@ -108,7 +108,7 @@ static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
 }
 static inline void pte_unmap(pte_t *pte)
 {
-    /* rcu_read_unlock() to be added later */
+    rcu_read_unlock();
 }
 #endif
 
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index c7ab18a5fb77..674671835631 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -236,7 +236,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 {
     pmd_t pmdval;
 
-    /* rcu_read_lock() to be added later */
+    rcu_read_lock();
     pmdval = pmdp_get_lockless(pmd);
     if (pmdvalp)
         *pmdvalp = pmdval;
@@ -250,7 +250,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
     }
     return __pte_map(&pmdval, addr);
 nomap:
-    /* rcu_read_unlock() to be added later */
+    rcu_read_unlock();
     return NULL;
 }
 
-- 
2.35.3
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231494AbjE2GQu (ORCPT ); Mon, 29 May 2023 02:16:50 -0400 Received: from mail-yw1-x1133.google.com (mail-yw1-x1133.google.com [IPv6:2607:f8b0:4864:20::1133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3DF19DD for ; Sun, 28 May 2023 23:16:26 -0700 (PDT) Received: by mail-yw1-x1133.google.com with SMTP id 00721157ae682-561f23dc55aso43805637b3.3 for ; Sun, 28 May 2023 23:16:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1685340981; x=1687932981; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=xPq2eB/hW7ShnFytnKcTY7aukisfg/jLLLPKQ4lP3rM=; b=TyhMHkB8EAAdy+hUHa48C2vLWt0oIXv3HPWA7oGBtoKurezV9dmnaHp/Ug08+YXBMJ q7/doDqEaB13J8DWMDOsmxCRPRcN6V2SXOTsYe+dqJjrxA7f0LVq1fNiXr9IMn2dnMMI ESckYbJKgJgrbJ8T+uYkbBEglbSci8JLRkQ5xTjOVaaAsg2STz0pP4v2Ide+0MWDvyHe y1hUI8uDpAzlLTOdYFAa9G5upyPvxazDxgjqrZ/FkHni8ZrS3oyiM+Gr8t/oORl7+XNi bEMYjRBzoNKzLiTXOPBKXj7pF37UgxQaCI1yMdVBb74YQf5DcZFdirdn0N62WO3feJu6 qWEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685340981; x=1687932981; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xPq2eB/hW7ShnFytnKcTY7aukisfg/jLLLPKQ4lP3rM=; b=XHEIDmwH+szpUk1ygVvlskoXQSWDa84jScZy4d6fMbcrcdcFZhyxpPiJ8Sc+cj48t8 aNZsi5r6HrxnkgsDcvwEqubXaj0bYbmo+zrt6lCKLe0ldEctzoVuXURO+z0FD2YU38Sc MpqNL875yjDx176Pu2omHESmkS6xFo/pPhENzB2zGN0iA9eN+l0TK7b2D1rqhReCFtfO Vqf9pcJt6xgiYNrwZtOWf32wAIEJHlOfRNaCC2GmGj7uzzdq7xqgINewfPVRNJlCP5tw I95BTqKsvYXht+4Rfq6UOJ0vURFc2zvugwkSxnD6FtuosTi2FHD1JVyGbbGzxIn+rM9t ugCQ== X-Gm-Message-State: AC+VfDwpXtrN7Tb0gpEJDM2Jp0dJI4bHrXjG6GKTsa+LS104qAGbk0iv +xayYn16ryhyb2sxJIHR+ckRpw== X-Google-Smtp-Source: ACHHUZ5ZacpUNHnx35XFkkfr+0IABaEA7Bz0HX1XL9bgqfNu7gBbTpGfMWgF9E5JOg8lgzwGl1APAA== X-Received: by 2002:a0d:f003:0:b0:565:a0c8:7e66 with SMTP id z3-20020a0df003000000b00565a0c87e66mr11630388ywe.0.1685340981021; Sun, 28 May 2023 23:16:21 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id n5-20020a819c45000000b00545a081847fsm3407533ywa.15.2023.05.28.23.16.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 May 2023 23:16:20 -0700 (PDT) Date: Sun, 28 May 2023 23:16:16 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. 
Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 02/12] mm/pgtable: add PAE safety to __pte_offset_map() In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> Message-ID: <923480d5-35ab-7cac-79d0-343d16e29318@google.com> References: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There is a faint risk that __pte_offset_map(), on a 32-bit architecture with a 64-bit pmd_t e.g. x86-32 with CONFIG_X86_PAE=3Dy, would succeed on a pmdval assembled from a pmd_low and a pmd_high which never belonged together: their combination not pointing to a page table at all, perhaps not even a valid pfn. pmdp_get_lockless() is not enough to prevent that. Guard against that (on such configs) by local_irq_save() blocking TLB flush between present updates, as linux/pgtable.h suggests. It's only needed around the pmdp_get_lockless() in __pte_offset_map(): a race when __pte_offset_map_lock() repeats the pmdp_get_lockless() after getting the lock, would just send it back to __pte_offset_map() again. CONFIG_GUP_GET_PXX_LOW_HIGH is enabled when required by mips, sh and x86. It is not enabled by arm-32 CONFIG_ARM_LPAE: my understanding is that Will Deacon's 2020 enhancements to READ_ONCE() are sufficient for arm. It is not enabled by arc, but its pmd_t is 32-bit even when pte_t 64-bit. Limit the IRQ disablement to CONFIG_HIGHPTE? Perhaps, but would need a little more work, to retry if pmd_low good for page table, but pmd_high non-zero from THP (and that might be making x86-specific assumptions). Signed-off-by: Hugh Dickins --- mm/pgtable-generic.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 674671835631..d28b63386cef 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -232,12 +232,32 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,= unsigned long address, #endif #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ =20 +#if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \ + (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RCU)) +/* + * See the comment above ptep_get_lockless() in include/linux/pgtable.h: + * the barriers in pmdp_get_lockless() cannot guarantee that the value in + * pmd_high actually belongs with the value in pmd_low; but holding interr= upts + * off blocks the TLB flush between present updates, which guarantees that= a + * successful __pte_offset_map() points to a page from matched halves. 
+ */ +#define config_might_irq_save(flags) local_irq_save(flags) +#define config_might_irq_restore(flags) local_irq_restore(flags) +#else +#define config_might_irq_save(flags) +#define config_might_irq_restore(flags) +#endif + pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp) { + unsigned long __maybe_unused flags; pmd_t pmdval; =20 rcu_read_lock(); + config_might_irq_save(flags); pmdval =3D pmdp_get_lockless(pmd); + config_might_irq_restore(flags); + if (pmdvalp) *pmdvalp =3D pmdval; if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval))) --=20 2.35.3 From nobody Sun Feb 8 14:51:49 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0949AC77B7A for ; Mon, 29 May 2023 06:18:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230430AbjE2GSF (ORCPT ); Mon, 29 May 2023 02:18:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46294 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231438AbjE2GR6 (ORCPT ); Mon, 29 May 2023 02:17:58 -0400 Received: from mail-yb1-xb2b.google.com (mail-yb1-xb2b.google.com [IPv6:2607:f8b0:4864:20::b2b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F280211A for ; Sun, 28 May 2023 23:17:31 -0700 (PDT) Received: by mail-yb1-xb2b.google.com with SMTP id 3f1490d57ef6-ba71cd7ce7fso4322751276.1 for ; Sun, 28 May 2023 23:17:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1685341050; x=1687933050; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=QJ3N9RNUACQk0rp/xTQEOUK2bEy9LceuxJNDbuu7Hdo=; b=PnX8u4a7nFzrI9AMI1PfJ2JIEBsc/6HUoh5hn/jeCJld2cHVyPh5B3mpoBHYP4HlDt huCzuvd10XbNys3q964U8F3Pwm5I1086lfw4K2KXsSdCwoHho0iYP2E/vuoTg9TJPXAw 9Ao+OBlyjIf+ii9nemOKDnm34wjGWTWJPqiOtaDLYpgSbKG3NeKVJfh7S3pXb1pULdQN cjURL7J2sqlVPM4/ao36PpQzMPvierIr1dk2aTiiL2s04dhzXCN3987Ah17CrCgicXI7 uLzuDFZr6ZrjJPk0lh0dr69aYCwSbUbiqg2JiJfrZxUW7xsSohaWtB46bIclcU6whmCj +nDg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685341050; x=1687933050; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=QJ3N9RNUACQk0rp/xTQEOUK2bEy9LceuxJNDbuu7Hdo=; b=ayc/u27ZpsiP91RHSmP2vvjmiz1I5Qy30y1tTw99/+NOSYMX6EdswNwG8RvTWZsBLQ s1LIWd0G+hfTjbs8Z99y6dGjwXu8+khyASBP+lAa7hql1wb2QoRV10tfON6fBwYYMbPI 3Lfihv+H4hoJGMljWpdHCKcBjvQ3je8qkADInNP1I30NzAxQ+hzIK4m1+RsXQPxbfUTx ubvmo8CqaQ69eDEKPVcSVDvDxa6LHmCZMNF7zdG+FB5GA61SP6S5rFtdyKTaWylTna2U VPV9SwPq8CPFTJwMCs4Fu6unDrlGFGKSMd1q2KNL3HH1tHbXhAeEF16MEkivkK08SYLv pkDw== X-Gm-Message-State: AC+VfDzUVvm9WwCa8NJkUfb7qz6ld9Tz3LjYXtgilBATWoD4fSo0X6Ta tt4hzqP8ALP7YZzB36OhyRrRmQ== X-Google-Smtp-Source: ACHHUZ6g2ncsiViAbUymnL1kKwlhN1cywz3Q5/8UINRdk6/z7ELyukBfTa2iPWzHzMx63rqo3lZAZA== X-Received: by 2002:a25:fc19:0:b0:ba1:e06b:bc57 with SMTP id v25-20020a25fc19000000b00ba1e06bbc57mr9201429ybd.64.1685341049679; Sun, 28 May 2023 23:17:29 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
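
The hazard that config_might_irq_save() closes can be modelled outside the
kernel.  The stand-alone program below is an illustration only (not kernel
code; file name and iteration counts are invented): one thread keeps storing a
64-bit value as two 32-bit halves while another reads the halves separately,
so the reader can observe a low half and a high half that were never stored
together, which is exactly the mismatched pmd_low/pmd_high pair described
above.

/* cc -O2 -pthread torn_read.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static _Atomic uint32_t low, high;  /* two halves of one logical 64-bit value */

static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000000; i++) {
        uint32_t v = (i & 1) ? 0xffffffffu : 0; /* stored value is always v:v */
        atomic_store(&low, v);
        atomic_store(&high, v);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    long mismatched = 0;

    pthread_create(&t, NULL, writer, NULL);
    for (int i = 0; i < 100000000; i++) {
        uint32_t lo = atomic_load(&low);
        uint32_t hi = atomic_load(&high);
        if (lo != hi)       /* halves taken from two different stores */
            mismatched++;
    }
    pthread_join(t, NULL);
    printf("mismatched pairs observed: %ld\n", mismatched);
    return 0;
}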

From: Hugh Dickins
Date: Sun, 28 May 2023 23:17:25 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH 03/12] arm: adjust_pte() use pte_offset_map_nolock()
Message-ID: <94c2ebe1-6b23-1cee-4aae-22cb835776ff@google.com>
In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com>

Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
in adjust_pte(): because it gives the not-locked ptl for precisely that
pte, which the caller can then safely lock; whereas pte_lockptr() is not
so tightly coupled, because it dereferences the pmd pointer again.

Signed-off-by: Hugh Dickins
---
 arch/arm/mm/fault-armv.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index ca5302b0b7ee..7cb125497976 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -117,11 +117,10 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
      * must use the nested version.  This also means we need to
      * open-code the spin-locking.
      */
-    pte = pte_offset_map(pmd, address);
+    pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
     if (!pte)
         return 0;
 
-    ptl = pte_lockptr(vma->vm_mm, pmd);
     do_pte_lock(ptl);
 
     ret = do_adjust_pte(vma, address, pfn, pte);
-- 
2.35.3
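
As an illustration of the pattern adjust_pte() now follows (a sketch only, not
from the patch; example_update_pte() is an invented name):
pte_offset_map_nolock() maps the pte and reports, without taking, the ptl that
guards it, leaving the caller to lock it with plain or nested locking as
needed.

static int example_update_pte(struct mm_struct *mm, pmd_t *pmd,
                              unsigned long addr)
{
    spinlock_t *ptl;
    pte_t *pte;

    pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
    if (!pte)
        return 0;       /* no page table here any more */

    spin_lock(ptl);     /* or spin_lock_nested(), as adjust_pte() needs */
    /* ... modify *pte under its page table lock ... */
    spin_unlock(ptl);
    pte_unmap(pte);
    return 1;
}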
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 04/12] powerpc: assert_pte_locked() use pte_offset_map_nolock() In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> Message-ID: References: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Instead of pte_lockptr(), use the recently added pte_offset_map_nolock() in assert_pte_locked(). BUG if pte_offset_map_nolock() fails: this is stricter than the previous implementation, which skipped when pmd_none() (with a comment on khugepaged collapse transitions): but wouldn't we want to know, if an assert_pte_locked() caller can be racing such transitions? This mod might cause new crashes: which either expose my ignorance, or indicate issues to be fixed, or limit the usage of assert_pte_locked(). Signed-off-by: Hugh Dickins --- arch/powerpc/mm/pgtable.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index cb2dcdb18f8e..16b061af86d7 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned l= ong addr) p4d_t *p4d; pud_t *pud; pmd_t *pmd; + pte_t *pte; + spinlock_t *ptl; =20 if (mm =3D=3D &init_mm) return; @@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned= long addr) pud =3D pud_offset(p4d, addr); BUG_ON(pud_none(*pud)); pmd =3D pmd_offset(pud, addr); - /* - * khugepaged to collapse normal pages to hugepage, first set - * pmd to none to force page fault/gup to take mmap_lock. After - * pmd is set to none, we do a pte_clear which does this assertion - * so if we find pmd none, return. 
- */ - if (pmd_none(*pmd)) - return; - BUG_ON(!pmd_present(*pmd)); - assert_spin_locked(pte_lockptr(mm, pmd)); + pte =3D pte_offset_map_nolock(mm, pmd, addr, &ptl); + BUG_ON(!pte); + assert_spin_locked(ptl); + pte_unmap(pte); } #endif /* CONFIG_DEBUG_VM */ =20 --=20 2.35.3 From nobody Sun Feb 8 14:51:49 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC6CAC7EE23 for ; Mon, 29 May 2023 06:20:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231484AbjE2GUe (ORCPT ); Mon, 29 May 2023 02:20:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48096 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231355AbjE2GU2 (ORCPT ); Mon, 29 May 2023 02:20:28 -0400 Received: from mail-yb1-xb2a.google.com (mail-yb1-xb2a.google.com [IPv6:2607:f8b0:4864:20::b2a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 62E21B2 for ; Sun, 28 May 2023 23:20:27 -0700 (PDT) Received: by mail-yb1-xb2a.google.com with SMTP id 3f1490d57ef6-bacf9edc87bso4401016276.1 for ; Sun, 28 May 2023 23:20:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1685341226; x=1687933226; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=osELeF1qrpmk7tMBwVJZ0+Hn0dD614B/zeb//sVH2O8=; b=lDd5t1DNHYrERHggJZt1tFe/2MuojYDB8DYuX4saUcSsWZjN9TJvB4D3jOc40ZJzuw KXTyU64u/loqgfjTny6vyEEHc7I3ap+5KBFfNEbc+et5E0P2BRIz5MO59vg+RyUV5HGF LQIvE8HoWeX7dfHrfHrThINSr2Jkgw1dNwKUT5ElKyxmqFpo9JxBWX275kcmdKdX3BCO TKmGDZP/FC8QUdgVXd4u26ydd2mkiG9rdEZxkd6dJlJez+Ye+8eFZqq1OHv/Cc+ig/ka 6L20gyN+7z7wp1ikvxHZmZH1RAvGKKb0++xvnFOn6a1yNDE6E2enDzRJirlL0XfSo/Zh 2JMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685341226; x=1687933226; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=osELeF1qrpmk7tMBwVJZ0+Hn0dD614B/zeb//sVH2O8=; b=AJ1Vqg1fIVpMNdMTg1tJnO1oS4PBcK8DvBH33sXDoYMd6ssCNoueKS2k45H4w/j1gC ID4bBlD9cwPJFFdy6j72BTH3ls6YdpQy4dk7ggK7DyMRnYNuHsBJPdkXOZuQCIfl8Nvo C52/LmdwuCEGyT7ILP67NyvjZe2Rrq16Ktp2kKAl4Zs9P5gosyV1l/lgBhdci4E/bIQx 7KjZ7cgzNM61hagwa/lOgytAvfPNbV88BiJYy1pH9t0DuTtARGn+5oOugtWJmQ2QA8XR 2sBmNH6rU3/DzzOzz15DKd6AyR6VwsbPwJWGwHedN6vr79QOCRtb0hm/Nu6S6Ncs3P65 V3fg== X-Gm-Message-State: AC+VfDz7+6WRng//oQ53Y5o0hBEA68NNrktWnvp2u/DZTJ96DxB1LtZ9 RHxFJ0nJ84AyJbibpR44cRsOYg== X-Google-Smtp-Source: ACHHUZ5aMde8BUNH3d96u3IOrLaqOFrYAlrmuVUlWt4ugn0e7ueCBkueldmdmxLB7UIM6n20FuWD7w== X-Received: by 2002:a81:8304:0:b0:565:c888:1d09 with SMTP id t4-20020a818304000000b00565c8881d09mr8471629ywf.30.1685341226458; Sun, 28 May 2023 23:20:26 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id y184-20020a0dd6c1000000b00565e57e6662sm1530559ywd.55.2023.05.28.23.20.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 May 2023 23:20:25 -0700 (PDT) Date: Sun, 28 May 2023 23:20:21 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. 
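
A hedged sketch of what the stricter assertion expects of its callers (not
from the patch; example_set_pte() is a made-up name, and this is not how
powerpc itself invokes the assertion): the pte lock for addr must already be
held when assert_pte_locked() runs, which pte_offset_map_lock() arranges.

static void example_set_pte(struct mm_struct *mm, pmd_t *pmd,
                            unsigned long addr, pte_t pteval)
{
    spinlock_t *ptl;
    pte_t *pte;

    pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
    if (!pte)
        return;
    assert_pte_locked(mm, addr);    /* ptl is held, so this passes */
    set_pte_at(mm, addr, pte, pteval);
    pte_unmap_unlock(pte, ptl);
}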
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 05/12] powerpc: add pte_free_defer() for pgtables sharing page In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> Message-ID: <28eb289f-ea2c-8eb9-63bb-9f7d7b9ccc11@google.com> References: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add powerpc-specific pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This precedes the generic version to avoid build breakage from incompatible pgtable_t. Signed-off-by: Hugh Dickins --- arch/powerpc/include/asm/pgalloc.h | 4 ++++ arch/powerpc/mm/pgtable-frag.c | 18 ++++++++++++++++++ 2 files changed, 22 insertions(+) diff --git a/arch/powerpc/include/asm/pgalloc.h b/arch/powerpc/include/asm/= pgalloc.h index 3360cad78ace..3a971e2a8c73 100644 --- a/arch/powerpc/include/asm/pgalloc.h +++ b/arch/powerpc/include/asm/pgalloc.h @@ -45,6 +45,10 @@ static inline void pte_free(struct mm_struct *mm, pgtabl= e_t ptepage) pte_fragment_free((unsigned long *)ptepage, 0); } =20 +/* arch use pte_free_defer() implementation in arch/powerpc/mm/pgtable-fra= g.c */ +#define pte_free_defer pte_free_defer +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable); + /* * Functions that deal with pagetables that could be at any level of * the table need to be passed an "index_size" so they know how to diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c index 20652daa1d7e..3a3dac77faf2 100644 --- a/arch/powerpc/mm/pgtable-frag.c +++ b/arch/powerpc/mm/pgtable-frag.c @@ -120,3 +120,21 @@ void pte_fragment_free(unsigned long *table, int kerne= l) __free_page(page); } } + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static void pte_free_now(struct rcu_head *head) +{ + struct page *page; + + page =3D container_of(head, struct page, rcu_head); + pte_fragment_free((unsigned long *)page_to_virt(page), 0); +} + +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable) +{ + struct page *page; + + page =3D virt_to_page(pgtable); + call_rcu(&page->rcu_head, pte_free_now); +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ --=20 2.35.3 From nobody Sun Feb 8 14:51:49 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D194FC77B7A for ; Mon, 29 May 2023 06:21:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229589AbjE2GVo (ORCPT ); Mon, 29 May 2023 02:21:44 

From: Hugh Dickins
Date: Sun, 28 May 2023 23:21:27 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH 06/12] sparc: add pte_free_defer() for pgtables sharing page
In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com>

Add sparc-specific pte_free_defer(), to call pte_free() via call_rcu().
pte_free_defer() will be called inside khugepaged's retract_page_tables()
loop, where allocating extra memory cannot be relied upon.  This precedes
the generic version to avoid build breakage from incompatible pgtable_t.

Signed-off-by: Hugh Dickins
---
 arch/sparc/include/asm/pgalloc_64.h |  4 ++++
 arch/sparc/mm/init_64.c             | 16 ++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/sparc/include/asm/pgalloc_64.h b/arch/sparc/include/asm/pgalloc_64.h
index 7b5561d17ab1..caa7632be4c2 100644
--- a/arch/sparc/include/asm/pgalloc_64.h
+++ b/arch/sparc/include/asm/pgalloc_64.h
@@ -65,6 +65,10 @@ pgtable_t pte_alloc_one(struct mm_struct *mm);
 void pte_free_kernel(struct mm_struct *mm, pte_t *pte);
 void pte_free(struct mm_struct *mm, pgtable_t ptepage);
 
+/* arch use pte_free_defer() implementation in arch/sparc/mm/init_64.c */
+#define pte_free_defer pte_free_defer
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 #define pmd_populate_kernel(MM, PMD, PTE)    pmd_set(MM, PMD, PTE)
 #define pmd_populate(MM, PMD, PTE)           pmd_set(MM, PMD, PTE)
 
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 04f9db0c3111..b7c6aa085ef6 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2930,6 +2930,22 @@ void pgtable_free(void *table, bool is_page)
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void pte_free_now(struct rcu_head *head)
+{
+    struct page *page;
+
+    page = container_of(head, struct page, rcu_head);
+    __pte_free((pgtable_t)page_to_virt(page));
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+    struct page *page;
+
+    page = virt_to_page(pgtable);
+    call_rcu(&page->rcu_head, pte_free_now);
+}
+
 void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
                           pmd_t *pmd)
 {
-- 
2.35.3

From: Hugh Dickins
Date: Sun, 28 May 2023 23:22:40 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH 07/12] s390: add pte_free_defer(), with use of mmdrop_async()
Message-ID: <6dd63b39-e71f-2e8b-7e0-83e02f3bcb39@google.com>
In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com>

Add s390-specific pte_free_defer(), to call pte_free() via call_rcu().
pte_free_defer() will be called inside khugepaged's retract_page_tables()
loop, where allocating extra memory cannot be relied upon.  This precedes
the generic version to avoid build breakage from incompatible pgtable_t.

This version is more complicated than others: because page_table_free()
needs to know which fragment is being freed, and which mm to link it to.

page_table_free()'s fragment handling is clever, but I could too easily
break it: what's done here in pte_free_defer() and pte_free_now() might
be better integrated with page_table_free()'s cleverness, but not by me!

By the time that page_table_free() gets called via RCU, it's conceivable
that mm would already have been freed: so mmgrab() in pte_free_defer()
and mmdrop() in pte_free_now().  No, that is not a good context to call
mmdrop() from, so make mmdrop_async() public and use that.

Signed-off-by: Hugh Dickins
Reviewed-by: Gerald Schaefer
---
 arch/s390/include/asm/pgalloc.h |  4 ++++
 arch/s390/mm/pgalloc.c          | 34 +++++++++++++++++++++++++++++++++
 include/linux/mm_types.h        |  2 +-
 include/linux/sched/mm.h        |  1 +
 kernel/fork.c                   |  2 +-
 5 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 17eb618f1348..89a9d5ef94f8 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -143,6 +143,10 @@ static inline void pmd_populate(struct mm_struct *mm,
 #define pte_free_kernel(mm, pte) page_table_free(mm, (unsigned long *) pte)
 #define pte_free(mm, pte) page_table_free(mm, (unsigned long *) pte)
 
+/* arch use pte_free_defer() implementation in arch/s390/mm/pgalloc.c */
+#define pte_free_defer pte_free_defer
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 void vmem_map_init(void);
 void *vmem_crst_alloc(unsigned long val);
 pte_t *vmem_pte_alloc(void);
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 66ab68db9842..0129de9addfd 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -346,6 +346,40 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
         __free_page(page);
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void pte_free_now(struct rcu_head *head)
+{
+    struct page *page;
+    unsigned long mm_bit;
+    struct mm_struct *mm;
+    unsigned long *table;
+
+    page = container_of(head, struct page, rcu_head);
+    table = (unsigned long *)page_to_virt(page);
+    mm_bit = (unsigned long)page->pt_mm;
+    /* 4K page has only two 2K fragments, but alignment allows eight */
+    mm = (struct mm_struct *)(mm_bit & ~7);
+    table += PTRS_PER_PTE * (mm_bit & 7);
+    page_table_free(mm, table);
+    mmdrop_async(mm);
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+    struct page *page;
+    unsigned long mm_bit;
+
+    mmgrab(mm);
+    page = virt_to_page(pgtable);
+    /* Which 2K page table fragment of a 4K page? */
+    mm_bit = ((unsigned long)pgtable & ~PAGE_MASK) /
+            (PTRS_PER_PTE * sizeof(pte_t));
+    mm_bit += (unsigned long)mm;
+    page->pt_mm = (struct mm_struct *)mm_bit;
+    call_rcu(&page->rcu_head, pte_free_now);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
 void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
                          unsigned long vmaddr)
 {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 306a3d1a0fa6..1667a1bdb8a8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -146,7 +146,7 @@ struct page {
             pgtable_t pmd_huge_pte; /* protected by page->ptl */
             unsigned long _pt_pad_2;    /* mapping */
             union {
-                struct mm_struct *pt_mm; /* x86 pgds only */
+                struct mm_struct *pt_mm; /* x86 pgd, s390 */
                 atomic_t pt_frag_refcount; /* powerpc */
             };
 #if ALLOC_SPLIT_PTLOCKS
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 8d89c8c4fac1..a9043d1a0d55 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -41,6 +41,7 @@ static inline void smp_mb__after_mmgrab(void)
     smp_mb__after_atomic();
 }
 
+extern void mmdrop_async(struct mm_struct *mm);
 extern void __mmdrop(struct mm_struct *mm);
 
 static inline void mmdrop(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index ed4e01daccaa..fa4486b65c56 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -942,7 +942,7 @@ static void mmdrop_async_fn(struct work_struct *work)
     __mmdrop(mm);
 }
 
-static void mmdrop_async(struct mm_struct *mm)
+void mmdrop_async(struct mm_struct *mm)
 {
     if (unlikely(atomic_dec_and_test(&mm->mm_count))) {
         INIT_WORK(&mm->async_put_work, mmdrop_async_fn);
-- 
2.35.3
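
The pt_mm encoding above packs a fragment index into the low bits of the
mm_struct pointer: a 4K page holds two 2K page-table fragments
(PTRS_PER_PTE * sizeof(pte_t) == 2048 here), and mm_struct pointers are
aligned to far more than 8 bytes, so the low three bits are free.  A
stand-alone model of that arithmetic, for illustration only (the constants
and addresses are assumptions, not taken from s390 headers):

#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE   4096UL
#define FRAG_SIZE   2048UL  /* PTRS_PER_PTE * sizeof(pte_t) */

static uintptr_t encode_pt_mm(uintptr_t mm, uintptr_t pgtable)
{
    /* mm is at least 8-byte aligned, so the fragment index fits in bits 0-2 */
    return mm + (pgtable % PAGE_SIZE) / FRAG_SIZE;
}

static void decode_pt_mm(uintptr_t mm_bit, uintptr_t *mm, unsigned int *frag)
{
    *mm = mm_bit & ~7UL;
    *frag = mm_bit & 7;
}

int main(void)
{
    uintptr_t mm = 0x100000;        /* some aligned mm_struct address */
    uintptr_t pgtable = 0x200800;   /* second 2K fragment of its 4K page */
    uintptr_t out_mm;
    unsigned int frag;

    decode_pt_mm(encode_pt_mm(mm, pgtable), &out_mm, &frag);
    assert(out_mm == mm && frag == 1);
    return 0;
}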

From: Hugh Dickins
Date: Sun, 28 May 2023 23:23:47 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH 08/12] mm/pgtable: add pte_free_defer() for pgtable as page
Message-ID: <739964d-c535-4db4-90ec-2166285b4d47@google.com>
In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com>

Add the generic pte_free_defer(), to call pte_free() via call_rcu().
pte_free_defer() will be called inside khugepaged's retract_page_tables()
loop, where allocating extra memory cannot be relied upon.  This version
suits all those architectures which use an unfragmented page for one page
table (none of whose pte_free()s use the mm arg which was passed to it).

Signed-off-by: Hugh Dickins
---
 include/linux/pgtable.h |  2 ++
 mm/pgtable-generic.c    | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 8b0fc7fdc46f..62a8732d92f0 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -112,6 +112,8 @@ static inline void pte_unmap(pte_t *pte)
 }
 #endif
 
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 /* Find an entry in the second-level page table.. */
 #ifndef pmd_offset
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d28b63386cef..471697dcb244 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 
 /*
@@ -230,6 +231,25 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
     return pmd;
 }
 #endif
+
+/* arch define pte_free_defer in asm/pgalloc.h for its own implementation */
+#ifndef pte_free_defer
+static void pte_free_now(struct rcu_head *head)
+{
+    struct page *page;
+
+    page = container_of(head, struct page, rcu_head);
+    pte_free(NULL /* mm not passed and not used */, (pgtable_t)page);
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+    struct page *page;
+
+    page = pgtable;
+    call_rcu(&page->rcu_head, pte_free_now);
+}
+#endif /* pte_free_defer */
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \
-- 
2.35.3
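
The generic definition above is compiled only when the architecture has not
provided its own.  An arch that needs special handling opts out the same way
the powerpc, sparc and s390 patches earlier in this series do, by defining the
macro and declaring the function in its asm/pgalloc.h; sketched here for a
hypothetical architecture "foo":

/* arch/foo/include/asm/pgalloc.h (hypothetical) */
#define pte_free_defer pte_free_defer
void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);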

From: Hugh Dickins
Date: Sun, 28 May 2023 23:25:15 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH 09/12] mm/khugepaged: retract_page_tables() without mmap or vma lock
Message-ID: <2e9996fa-d238-e7c-1194-834a2bd1f60@google.com>
In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com>

Simplify shmem and file THP collapse's retract_page_tables(), and relax
its locking: to improve its success rate and to lessen impact on others.

Instead of its MADV_COLLAPSE case doing set_huge_pmd() at target_addr of
target_mm, leave that part of the work to madvise_collapse() calling
collapse_pte_mapped_thp() afterwards: just adjust collapse_file()'s
result code to arrange for that.  That spares retract_page_tables() four
arguments; and since it will be successful in retracting all of the page
tables expected of it, no need to track and return a result code itself.

It needs i_mmap_lock_read(mapping) for traversing the vma interval tree,
but it does not need i_mmap_lock_write() for that: page_vma_mapped_walk()
allows for pte_offset_map_lock() etc to fail, and uses pmd_lock() for
THPs.  retract_page_tables() just needs to use those same spinlocks to
exclude it briefly, while transitioning pmd from page table to none: so
restore its use of pmd_lock() inside of which pte lock is nested.

Users of pte_offset_map_lock() etc all now allow for them to fail: so
retract_page_tables() now has no use for mmap_write_trylock() or
vma_try_start_write().  In common with rmap and page_vma_mapped_walk(),
it does not even need the mmap_read_lock().

But those users do expect the page table to remain a good page table,
until they unlock and rcu_read_unlock(): so the page table cannot be
freed immediately, but rather by the recently added pte_free_defer().

retract_page_tables() can be enhanced to replace_page_tables(), which
inserts the final huge pmd without mmap lock: going through an invalid
state instead of pmd_none() followed by fault.  But that does raise some
questions, and requires a more complicated pte_free_defer() for powerpc
(when its arch_needs_pgtable_deposit() for shmem and file THPs).  Leave
that enhancement to a later release.

Signed-off-by: Hugh Dickins
---
 mm/khugepaged.c | 169 +++++++++++++++++-------------------------------
 1 file changed, 60 insertions(+), 109 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1083f0e38a07..4fd408154692 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1617,9 +1617,8 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
         break;
     case SCAN_PMD_NONE:
         /*
-         * In MADV_COLLAPSE path, possible race with khugepaged where
-         * all pte entries have been removed and pmd cleared.  If so,
-         * skip all the pte checks and just update the pmd mapping.
+         * All pte entries have been removed and pmd cleared.
+         * Skip all the pte checks and just update the pmd mapping.
          */
         goto maybe_install_pmd;
     default:
@@ -1748,123 +1747,73 @@ static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_sl
     mmap_write_unlock(mm);
 }
 
-static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
-                               struct mm_struct *target_mm,
-                               unsigned long target_addr, struct page *hpage,
-                               struct collapse_control *cc)
+static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
     struct vm_area_struct *vma;
-    int target_result = SCAN_FAIL;
 
-    i_mmap_lock_write(mapping);
+    i_mmap_lock_read(mapping);
     vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
-        int result = SCAN_FAIL;
-        struct mm_struct *mm = NULL;
-        unsigned long addr = 0;
-        pmd_t *pmd;
-        bool is_target = false;
+        struct mm_struct *mm;
+        unsigned long addr;
+        pmd_t *pmd, pgt_pmd;
+        spinlock_t *pml;
+        spinlock_t *ptl;
 
         /*
          * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
-         * got written to. These VMAs are likely not worth investing
-         * mmap_write_lock(mm) as PMD-mapping is likely to be split
-         * later.
+         * got written to. These VMAs are likely not worth removing
+         * page tables from, as PMD-mapping is likely to be split later.
          *
-         * Note that vma->anon_vma check is racy: it can be set up after
-         * the check but before we took mmap_lock by the fault path.
-         * But page lock would prevent establishing any new ptes of the
-         * page, so we are safe.
-         *
-         * An alternative would be drop the check, but check that page
-         * table is clear before calling pmdp_collapse_flush() under
-         * ptl. It has higher chance to recover THP for the VMA, but
-         * has higher cost too. It would also probably require locking
-         * the anon_vma.
+         * Note that vma->anon_vma check is racy: it can be set after
+         * the check, but page locks (with XA_RETRY_ENTRYs in holes)
+         * prevented establishing new ptes of the page. So we are safe
+         * to remove page table below, without even checking it's empty.
          */
-        if (READ_ONCE(vma->anon_vma)) {
-            result = SCAN_PAGE_ANON;
-            goto next;
-        }
+        if (READ_ONCE(vma->anon_vma))
+            continue;
+
         addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
         if (addr & ~HPAGE_PMD_MASK ||
-            vma->vm_end < addr + HPAGE_PMD_SIZE) {
-            result = SCAN_VMA_CHECK;
-            goto next;
-        }
-        mm = vma->vm_mm;
-        is_target = mm == target_mm && addr == target_addr;
-        result = find_pmd_or_thp_or_none(mm, addr, &pmd);
-        if (result != SCAN_SUCCEED)
-            goto next;
-        /*
-         * We need exclusive mmap_lock to retract page table.
-         *
-         * We use trylock due to lock inversion: we need to acquire
-         * mmap_lock while holding page lock. Fault path does it in
-         * reverse order. Trylock is a way to avoid deadlock.
-         *
-         * Also, it's not MADV_COLLAPSE's job to collapse other
-         * mappings - let khugepaged take care of them later.
-         */
-        result = SCAN_PTE_MAPPED_HUGEPAGE;
-        if ((cc->is_khugepaged || is_target) &&
-            mmap_write_trylock(mm)) {
-            /* trylock for the same lock inversion as above */
-            if (!vma_try_start_write(vma))
-                goto unlock_next;
-
-            /*
-             * Re-check whether we have an ->anon_vma, because
-             * collapse_and_free_pmd() requires that either no
-             * ->anon_vma exists or the anon_vma is locked.
-             * We already checked ->anon_vma above, but that check
-             * is racy because ->anon_vma can be populated under the
-             * mmap lock in read mode.
-             */
-            if (vma->anon_vma) {
-                result = SCAN_PAGE_ANON;
-                goto unlock_next;
-            }
-            /*
-             * When a vma is registered with uffd-wp, we can't
-             * recycle the pmd pgtable because there can be pte
-             * markers installed. Skip it only, so the rest mm/vma
-             * can still have the same file mapped hugely, however
-             * it'll always mapped in small page size for uffd-wp
-             * registered ranges.
-             */
-            if (hpage_collapse_test_exit(mm)) {
-                result = SCAN_ANY_PROCESS;
-                goto unlock_next;
-            }
-            if (userfaultfd_wp(vma)) {
-                result = SCAN_PTE_UFFD_WP;
-                goto unlock_next;
-            }
-            collapse_and_free_pmd(mm, vma, addr, pmd);
-            if (!cc->is_khugepaged && is_target)
-                result = set_huge_pmd(vma, addr, pmd, hpage);
-            else
-                result = SCAN_SUCCEED;
-
-unlock_next:
-            mmap_write_unlock(mm);
-            goto next;
-        }
-        /*
-         * Calling context will handle target mm/addr. Otherwise, let
-         * khugepaged try again later.
-         */
-        if (!is_target) {
-            khugepaged_add_pte_mapped_thp(mm, addr);
+            vma->vm_end < addr + HPAGE_PMD_SIZE)
             continue;
-        }
-next:
-        if (is_target)
-            target_result = result;
+
+        mm = vma->vm_mm;
+        if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
+            continue;
+
+        if (hpage_collapse_test_exit(mm))
+            continue;
+        /*
+         * When a vma is registered with uffd-wp, we cannot recycle
+         * the page table because there may be pte markers installed.
+         * Other vmas can still have the same file mapped hugely, but
+         * skip this one: it will always be mapped in small page size
+         * for uffd-wp registered ranges.
+         *
+         * What if VM_UFFD_WP is set a moment after this check?  No
+         * problem, huge page lock is still held, stopping new mappings
+         * of page which might then get replaced by pte markers: only
+         * existing markers need to be protected here.  (We could check
+         * after getting ptl below, but this comment distracting there!)
+         */
+        if (userfaultfd_wp(vma))
+            continue;
+
+        /* Huge page lock is still held, so page table must be empty */
+        pml = pmd_lock(mm, pmd);
+        ptl = pte_lockptr(mm, pmd);
+        if (ptl != pml)
+            spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+        pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
+        if (ptl != pml)
+            spin_unlock(ptl);
+        spin_unlock(pml);
+
+        mm_dec_nr_ptes(mm);
+        page_table_check_pte_clear_range(mm, addr, pgt_pmd);
+        pte_free_defer(mm, pmd_pgtable(pgt_pmd));
     }
-    i_mmap_unlock_write(mapping);
-    return target_result;
+    i_mmap_unlock_read(mapping);
 }
 
 /**
@@ -2261,9 +2210,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
     /*
      * Remove pte page tables, so we can re-fault the page as huge.
+     * If MADV_COLLAPSE, adjust result to call collapse_pte_mapped_thp().
*/ - result =3D retract_page_tables(mapping, start, mm, addr, hpage, - cc); + retract_page_tables(mapping, start); + if (cc && !cc->is_khugepaged) + result =3D SCAN_PTE_MAPPED_HUGEPAGE; unlock_page(hpage); =20 /* --=20 2.35.3 From nobody Sun Feb 8 14:51:49 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 384F9C7EE2F for ; Mon, 29 May 2023 06:27:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231710AbjE2G1g (ORCPT ); Mon, 29 May 2023 02:27:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231682AbjE2G1X (ORCPT ); Mon, 29 May 2023 02:27:23 -0400 Received: from mail-yb1-xb2b.google.com (mail-yb1-xb2b.google.com [IPv6:2607:f8b0:4864:20::b2b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 277FC19D for ; Sun, 28 May 2023 23:26:54 -0700 (PDT) Received: by mail-yb1-xb2b.google.com with SMTP id 3f1490d57ef6-ba8afcc82c0so5796215276.2 for ; Sun, 28 May 2023 23:26:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1685341612; x=1687933612; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=OPZMX1+LA/FvHP0/JC5qx50WzIWIXFewxM+MGzYgG4M=; b=Ytf3pJx5ZO2h8JphUMEUM+NgVIXTwIVhtYTQ/lA5Y8cir60w0OXuWtSllgiDhinPaR Iy4j/+CUvFlAp4R/CjSOblw5I+znP0kaNSqJXR4VKYKvTUfPb3oYvzoFoZFf0g3HTRPQ tAY4R1a5sWTcuBd7Oz6cpvBBwVDFmQiqte1C3YTUpMO9AoANcfZAyiW/btELN35Kv490 fFp3EQeG93PsokK2axn0N0nvmbnw91onFgB1MigUnnJSNnNJsoUz87zNfUFbsCL7jZ72 cyBArEZpOY67dIbt6WH+/9VlUHqVGsY29EI6EqLyGxDNiZ8V+VTllfG1NJsbPWUsuh6Y 4UjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685341612; x=1687933612; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=OPZMX1+LA/FvHP0/JC5qx50WzIWIXFewxM+MGzYgG4M=; b=dZArGuEZZl+0ysFdi2nbhzpjLv0XjuhXMFscP6iTLbC3XU2t024Y9mVhJA0ypDqTc0 dbryjS3bsYVA0IZhb97bQMNTS+Aby1y28s0zf/FhIv9fIjrbdwZe8xVcsG//C8id8ME6 ggw9dFkiTRIJqObLH44BvRlXW4K/1wyTWwMqGZv70P2AV1rWeej5cfltZVTRL+iKebdY j/hqs+P3qjJBrJbNuMbG4oYTmoNfNaz2eLvPpidR1yi/KKS4fGcdqBnC5hNpTTK2s4zp B4tnxGWfYDjxudJ9bisfNb+I2nlKWdlC1579D+B8KMuiF+U+rRdOanOw5mIffMk+102u 6z+A== X-Gm-Message-State: AC+VfDyjqqtqWWcNfcTH40pGMrvZErTqsOPiTXNXtcqw92vLUZXWhFIb 6ff3HA3imjT5hHpbcF5njAizAQ== X-Google-Smtp-Source: ACHHUZ5aDBGSE8V0e3lWDnHjHjCYn6fQF4N/hxkcmf1ygogwgzLWM5AgQMYGVJCj77dNUpkBtnYW7w== X-Received: by 2002:a81:5b55:0:b0:565:ec67:18f4 with SMTP id p82-20020a815b55000000b00565ec6718f4mr4108617ywb.32.1685341612167; Sun, 28 May 2023 23:26:52 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id m6-20020a0de306000000b0055a486140b6sm3427593ywe.36.2023.05.28.23.26.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 May 2023 23:26:51 -0700 (PDT) Date: Sun, 28 May 2023 23:26:48 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. 
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 10/12] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> Message-ID: <563340a4-7ac9-7cc8-33d8-f7cc6ef19ea6@google.com> References: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Bring collapse_and_free_pmd() back into collapse_pte_mapped_thp(). It does need mmap_read_lock(), but it does not need mmap_write_lock(), nor vma_start_write() nor i_mmap lock nor anon_vma lock. All racing paths are relying on pte_offset_map_lock() and pmd_lock(), so use those. Follow the pattern in retract_page_tables(); and using pte_free_defer() removes the need for tlb_remove_table_sync_one() here. Confirm the preliminary find_pmd_or_thp_or_none() once page lock has been acquired and the page looks suitable: from then on its state is stable. However, collapse_pte_mapped_thp() was doing something others don't: freeing a page table still containing "valid" entries. i_mmap lock did stop a racing truncate from double-freeing those pages, but we prefer collapse_pte_mapped_thp() to clear the entries as usual. Their TLB flush can wait until the pmdp_collapse_flush() which follows, but the mmu_notifier_invalidate_range_start() has to be done earlier. Some cleanup while rearranging: rename "count" to "nr_ptes"; and "step 2" does not need to duplicate the checks in "step 1". 
Signed-off-by: Hugh Dickins --- mm/khugepaged.c | 131 +++++++++++++++--------------------------------- 1 file changed, 41 insertions(+), 90 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 4fd408154692..2999500abdd5 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1485,7 +1485,7 @@ static bool khugepaged_add_pte_mapped_thp(struct mm_s= truct *mm, return ret; } =20 -/* hpage must be locked, and mmap_lock must be held in write */ +/* hpage must be locked, and mmap_lock must be held */ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp, struct page *hpage) { @@ -1497,7 +1497,7 @@ static int set_huge_pmd(struct vm_area_struct *vma, u= nsigned long addr, }; =20 VM_BUG_ON(!PageTransHuge(hpage)); - mmap_assert_write_locked(vma->vm_mm); + mmap_assert_locked(vma->vm_mm); =20 if (do_set_pmd(&vmf, hpage)) return SCAN_FAIL; @@ -1506,48 +1506,6 @@ static int set_huge_pmd(struct vm_area_struct *vma, = unsigned long addr, return SCAN_SUCCEED; } =20 -/* - * A note about locking: - * Trying to take the page table spinlocks would be useless here because t= hose - * are only used to synchronize: - * - * - modifying terminal entries (ones that point to a data page, not to a= nother - * page table) - * - installing *new* non-terminal entries - * - * Instead, we need roughly the same kind of protection as free_pgtables()= or - * mm_take_all_locks() (but only for a single VMA): - * The mmap lock together with this VMA's rmap locks covers all paths towa= rds - * the page table entries we're messing with here, except for hardware page - * table walks and lockless_pages_from_mm(). - */ -static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_str= uct *vma, - unsigned long addr, pmd_t *pmdp) -{ - pmd_t pmd; - struct mmu_notifier_range range; - - mmap_assert_write_locked(mm); - if (vma->vm_file) - lockdep_assert_held_write(&vma->vm_file->f_mapping->i_mmap_rwsem); - /* - * All anon_vmas attached to the VMA have the same root and are - * therefore locked by the same lock. - */ - if (vma->anon_vma) - lockdep_assert_held_write(&vma->anon_vma->root->rwsem); - - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr, - addr + HPAGE_PMD_SIZE); - mmu_notifier_invalidate_range_start(&range); - pmd =3D pmdp_collapse_flush(vma, addr, pmdp); - tlb_remove_table_sync_one(); - mmu_notifier_invalidate_range_end(&range); - mm_dec_nr_ptes(mm); - page_table_check_pte_clear_range(mm, addr, pmd); - pte_free(mm, pmd_pgtable(pmd)); -} - /** * collapse_pte_mapped_thp - Try to collapse a pte-mapped THP for mm at * address haddr. 
@@ -1563,16 +1521,17 @@ static void collapse_and_free_pmd(struct mm_struct = *mm, struct vm_area_struct *v int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, bool install_pmd) { + struct mmu_notifier_range range; unsigned long haddr =3D addr & HPAGE_PMD_MASK; struct vm_area_struct *vma =3D vma_lookup(mm, haddr); struct page *hpage; pte_t *start_pte, *pte; - pmd_t *pmd; - spinlock_t *ptl; - int count =3D 0, result =3D SCAN_FAIL; + pmd_t *pmd, pgt_pmd; + spinlock_t *pml, *ptl; + int nr_ptes =3D 0, result =3D SCAN_FAIL; int i; =20 - mmap_assert_write_locked(mm); + mmap_assert_locked(mm); =20 /* Fast check before locking page if already PMD-mapped */ result =3D find_pmd_or_thp_or_none(mm, haddr, &pmd); @@ -1612,6 +1571,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, uns= igned long addr, goto drop_hpage; } =20 + result =3D find_pmd_or_thp_or_none(mm, haddr, &pmd); switch (result) { case SCAN_SUCCEED: break; @@ -1625,27 +1585,14 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, u= nsigned long addr, goto drop_hpage; } =20 - /* Lock the vma before taking i_mmap and page table locks */ - vma_start_write(vma); + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, + haddr, haddr + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(&range); =20 - /* - * We need to lock the mapping so that from here on, only GUP-fast and - * hardware page walks can access the parts of the page tables that - * we're operating on. - * See collapse_and_free_pmd(). - */ - i_mmap_lock_write(vma->vm_file->f_mapping); - - /* - * This spinlock should be unnecessary: Nobody else should be accessing - * the page tables under spinlock protection here, only - * lockless_pages_from_mm() and the hardware page walker can access page - * tables while all the high-level locks are held in write mode. - */ result =3D SCAN_FAIL; start_pte =3D pte_offset_map_lock(mm, pmd, haddr, &ptl); - if (!start_pte) - goto drop_immap; + if (!start_pte) /* mmap_lock + page lock should prevent this */ + goto abort; =20 /* step 1: check all mapped PTEs are to the right huge page */ for (i =3D 0, addr =3D haddr, pte =3D start_pte; @@ -1671,40 +1618,44 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, u= nsigned long addr, */ if (hpage + i !=3D page) goto abort; - count++; + nr_ptes++; } =20 - /* step 2: adjust rmap */ + /* step 2: clear page table and adjust rmap */ for (i =3D 0, addr =3D haddr, pte =3D start_pte; i < HPAGE_PMD_NR; i++, addr +=3D PAGE_SIZE, pte++) { - struct page *page; - if (pte_none(*pte)) continue; - page =3D vm_normal_page(vma, addr, *pte); - if (WARN_ON_ONCE(page && is_zone_device_page(page))) - goto abort; - page_remove_rmap(page, vma, false); + + /* Must clear entry, or a racing truncate may re-remove it */ + pte_clear(mm, addr, pte); + page_remove_rmap(hpage + i, vma, false); } =20 pte_unmap_unlock(start_pte, ptl); =20 /* step 3: set proper refcount and mm_counters. 
*/ - if (count) { - page_ref_sub(hpage, count); - add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count); + if (nr_ptes) { + page_ref_sub(hpage, nr_ptes); + add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -nr_ptes); } =20 - /* step 4: remove pte entries */ - /* we make no change to anon, but protect concurrent anon page lookup */ - if (vma->anon_vma) - anon_vma_lock_write(vma->anon_vma); + /* step 4: remove page table */ =20 - collapse_and_free_pmd(mm, vma, haddr, pmd); + /* Huge page lock is still held, so page table must remain empty */ + pml =3D pmd_lock(mm, pmd); + if (ptl !=3D pml) + spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); + pgt_pmd =3D pmdp_collapse_flush(vma, haddr, pmd); + if (ptl !=3D pml) + spin_unlock(ptl); + spin_unlock(pml); =20 - if (vma->anon_vma) - anon_vma_unlock_write(vma->anon_vma); - i_mmap_unlock_write(vma->vm_file->f_mapping); + mmu_notifier_invalidate_range_end(&range); + + mm_dec_nr_ptes(mm); + page_table_check_pte_clear_range(mm, haddr, pgt_pmd); + pte_free_defer(mm, pmd_pgtable(pgt_pmd)); =20 maybe_install_pmd: /* step 5: install pmd entry */ @@ -1718,9 +1669,9 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, uns= igned long addr, return result; =20 abort: - pte_unmap_unlock(start_pte, ptl); -drop_immap: - i_mmap_unlock_write(vma->vm_file->f_mapping); + if (start_pte) + pte_unmap_unlock(start_pte, ptl); + mmu_notifier_invalidate_range_end(&range); goto drop_hpage; } =20 @@ -2842,9 +2793,9 @@ int madvise_collapse(struct vm_area_struct *vma, stru= ct vm_area_struct **prev, case SCAN_PTE_MAPPED_HUGEPAGE: BUG_ON(mmap_locked); BUG_ON(*prev); - mmap_write_lock(mm); + mmap_read_lock(mm); result =3D collapse_pte_mapped_thp(mm, addr, true); - mmap_write_unlock(mm); + mmap_locked =3D true; goto handle_result; /* Whitelisted set of results where continuing OK */ case SCAN_PMD_NULL: --=20 2.35.3 From nobody Sun Feb 8 14:51:49 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E09BC77B7A for ; Mon, 29 May 2023 06:30:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231488AbjE2GaU (ORCPT ); Mon, 29 May 2023 02:30:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230347AbjE2GaR (ORCPT ); Mon, 29 May 2023 02:30:17 -0400 Received: from mail-yb1-xb2b.google.com (mail-yb1-xb2b.google.com [IPv6:2607:f8b0:4864:20::b2b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72A12C2 for ; Sun, 28 May 2023 23:29:50 -0700 (PDT) Received: by mail-yb1-xb2b.google.com with SMTP id 3f1490d57ef6-ba818eb96dcso2257803276.0 for ; Sun, 28 May 2023 23:29:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1685341737; x=1687933737; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=xT/EEswLF0xV48TACSfwsUMzv4W28WePE24GrL1l/5o=; b=JvvTudiJ2ueEboa73P4a3lGmzg34HxMFolFy18ddYVZR38ryyurom4tpA+pKKdAKm+ yYAkrdkmojRae9uCGOc+xBDnB7qvKDF9c9FFPnpSBpjBiwbOJHOHU1+piNik8nuJHIHQ vpxQvD6IDQlSrCbcFB+Lk4wcPVE2ZDCw/XyBpkTiIDI1EAgr2XJxMGctVaCjxy4tlymz LrAusDPYqGDSES0yI8fnkSztcqZEaAtqHPYgPXvbz0LuLLq9bawcEEKWwmIcx4F6CA7T FILNf+ZbNWUKWLzPfTzPIQGR9twFhP7psJ5t7QdrhgKtJ5xSlV25NIlSqIAjV0HdJc5K 2WJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20221208; t=1685341737; x=1687933737; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xT/EEswLF0xV48TACSfwsUMzv4W28WePE24GrL1l/5o=; b=LAbjYI95zZ49E0FfPtfWO1a+BOhr2NbfZWj5tyXs4TBVZJle9RxnCVg3AZSQk01KEf mylSD9WfKL4iNGPOQqPij1umSHgoWm71u372uI5Q2R2LqrdkctSsJ8W2+Z/fyVlBVAkV LGHx6szq4DzrjDIzPVWKWLrcJ7XCmkgSKwgTBSmg+gPF9KIToR+S+o8xRdoTSkWwnvWz IsoVEo9/Ubj+AwjU8QPflpxd03YKAy3ZkUShzArr5H4BTs5JmFN8G2hLQWABS3vBjplI NK4aJt2AXw5ndAeVXS+5SBpQfpsk81FiB+ZDf+F/yURMXk97dPWqlYgI4OmdIAAAuMhy cM6g== X-Gm-Message-State: AC+VfDyu/C/YvGZg6QLDhI/24dKD9RzbrhLk15GYPeJhDX66BzMEUJC9 9ZS5JgoAvnHm8K1B2+Q23X7sWQ== X-Google-Smtp-Source: ACHHUZ7HfnnGjkEX5sGUsgNAdc3kbPNe/FVMe5HYLThd9cglb9ZMTh0FR6+KstwImWlJK0i5/3xaIA== X-Received: by 2002:a25:d391:0:b0:bac:f582:eefd with SMTP id e139-20020a25d391000000b00bacf582eefdmr10483734ybf.35.1685341737360; Sun, 28 May 2023 23:28:57 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id 63-20020a251142000000b00ba7cb887380sm2723779ybr.14.2023.05.28.23.28.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 May 2023 23:28:57 -0700 (PDT) Date: Sun, 28 May 2023 23:28:52 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 11/12] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps() In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> Message-ID: <1bf6f10-1f8d-d410-98b9-66cbf9a45c2@google.com> References: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now that retract_page_tables() can retract page tables reliably, without depending on trylocks, delete all the apparatus for khugepaged to try again later: khugepaged_collapse_pte_mapped_thps() etc; and free up the per-mm memory which was set aside for that in the khugepaged_mm_slot. But one part of that is worth keeping: when hpage_collapse_scan_file() found SCAN_PTE_MAPPED_HUGEPAGE, that address was noted in the mm_slot to be tried for retraction later - catching, for example, page tables where a reversible mprotect() of a portion had required splitting the pmd, but now it can be recollapsed. Call collapse_pte_mapped_thp() directly in this case (why was it deferred before? I assume an issue with needing mmap_lock for write, but now it's only needed for read). 
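To illustrate that direct call, the scan loop's handling of SCAN_PTE_MAPPED_HUGEPAGE now amounts to roughly the sketch below. This is simplified from the diff that follows, not a verbatim excerpt: error paths are trimmed, and in the patch itself the read lock is recorded in mmap_locked for the caller's usual unlock path rather than dropped immediately.

/*
 * Sketch only: when scanning a file mapping finds a pte-mapped THP,
 * retry the collapse directly under mmap_read_lock() instead of
 * deferring it to the old pte_mapped_thp[] bookkeeping.
 */
if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
	mmap_read_lock(mm);
	if (!hpage_collapse_test_exit(mm)) {
		result = collapse_pte_mapped_thp(mm, addr, false);
		if (result == SCAN_PMD_MAPPED)
			/* a huge pmd is already installed: count as success */
			result = SCAN_SUCCEED;
	}
	mmap_read_unlock(mm);
}
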
Signed-off-by: Hugh Dickins --- mm/khugepaged.c | 125 +++++++----------------------------------------- 1 file changed, 16 insertions(+), 109 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 2999500abdd5..301c0e54a2ef 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -92,8 +92,6 @@ static __read_mostly DEFINE_HASHTABLE(mm_slots_hash, MM_S= LOTS_HASH_BITS); =20 static struct kmem_cache *mm_slot_cache __read_mostly; =20 -#define MAX_PTE_MAPPED_THP 8 - struct collapse_control { bool is_khugepaged; =20 @@ -107,15 +105,9 @@ struct collapse_control { /** * struct khugepaged_mm_slot - khugepaged information per mm that is being= scanned * @slot: hash lookup from mm to mm_slot - * @nr_pte_mapped_thp: number of pte mapped THP - * @pte_mapped_thp: address array corresponding pte mapped THP */ struct khugepaged_mm_slot { struct mm_slot slot; - - /* pte-mapped THP in this mm */ - int nr_pte_mapped_thp; - unsigned long pte_mapped_thp[MAX_PTE_MAPPED_THP]; }; =20 /** @@ -1441,50 +1433,6 @@ static void collect_mm_slot(struct khugepaged_mm_slo= t *mm_slot) } =20 #ifdef CONFIG_SHMEM -/* - * Notify khugepaged that given addr of the mm is pte-mapped THP. Then - * khugepaged should try to collapse the page table. - * - * Note that following race exists: - * (1) khugepaged calls khugepaged_collapse_pte_mapped_thps() for mm_struc= t A, - * emptying the A's ->pte_mapped_thp[] array. - * (2) MADV_COLLAPSE collapses some file extent with target mm_struct B, a= nd - * retract_page_tables() finds a VMA in mm_struct A mapping the same e= xtent - * (at virtual address X) and adds an entry (for X) into mm_struct A's - * ->pte-mapped_thp[] array. - * (3) khugepaged calls khugepaged_collapse_scan_file() for mm_struct A at= X, - * sees a pte-mapped THP (SCAN_PTE_MAPPED_HUGEPAGE) and adds an entry - * (for X) into mm_struct A's ->pte-mapped_thp[] array. - * Thus, it's possible the same address is added multiple times for the sa= me - * mm_struct. Should this happen, we'll simply attempt - * collapse_pte_mapped_thp() multiple times for the same address, under th= e same - * exclusive mmap_lock, and assuming the first call is successful, subsequ= ent - * attempts will return quickly (without grabbing any additional locks) wh= en - * a huge pmd is found in find_pmd_or_thp_or_none(). Since this is a cheap - * check, and since this is a rare occurrence, the cost of preventing this - * "multiple-add" is thought to be more expensive than just handling it, s= hould - * it occur. 
- */ -static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm, - unsigned long addr) -{ - struct khugepaged_mm_slot *mm_slot; - struct mm_slot *slot; - bool ret =3D false; - - VM_BUG_ON(addr & ~HPAGE_PMD_MASK); - - spin_lock(&khugepaged_mm_lock); - slot =3D mm_slot_lookup(mm_slots_hash, mm); - mm_slot =3D mm_slot_entry(slot, struct khugepaged_mm_slot, slot); - if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP)) { - mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] =3D addr; - ret =3D true; - } - spin_unlock(&khugepaged_mm_lock); - return ret; -} - /* hpage must be locked, and mmap_lock must be held */ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp, struct page *hpage) @@ -1675,29 +1623,6 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, un= signed long addr, goto drop_hpage; } =20 -static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot = *mm_slot) -{ - struct mm_slot *slot =3D &mm_slot->slot; - struct mm_struct *mm =3D slot->mm; - int i; - - if (likely(mm_slot->nr_pte_mapped_thp =3D=3D 0)) - return; - - if (!mmap_write_trylock(mm)) - return; - - if (unlikely(hpage_collapse_test_exit(mm))) - goto out; - - for (i =3D 0; i < mm_slot->nr_pte_mapped_thp; i++) - collapse_pte_mapped_thp(mm, mm_slot->pte_mapped_thp[i], false); - -out: - mm_slot->nr_pte_mapped_thp =3D 0; - mmap_write_unlock(mm); -} - static void retract_page_tables(struct address_space *mapping, pgoff_t pgo= ff) { struct vm_area_struct *vma; @@ -2326,16 +2251,6 @@ static int hpage_collapse_scan_file(struct mm_struct= *mm, unsigned long addr, { BUILD_BUG(); } - -static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot = *mm_slot) -{ -} - -static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm, - unsigned long addr) -{ - return false; -} #endif =20 static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *resul= t, @@ -2365,7 +2280,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, khugepaged_scan.mm_slot =3D mm_slot; } spin_unlock(&khugepaged_mm_lock); - khugepaged_collapse_pte_mapped_thps(mm_slot); =20 mm =3D slot->mm; /* @@ -2418,36 +2332,29 @@ static unsigned int khugepaged_scan_mm_slot(unsigne= d int pages, int *result, khugepaged_scan.address); =20 mmap_read_unlock(mm); - *result =3D hpage_collapse_scan_file(mm, - khugepaged_scan.address, - file, pgoff, cc); mmap_locked =3D false; + *result =3D hpage_collapse_scan_file(mm, + khugepaged_scan.address, file, pgoff, cc); + if (*result =3D=3D SCAN_PTE_MAPPED_HUGEPAGE) { + mmap_read_lock(mm); + mmap_locked =3D true; + if (hpage_collapse_test_exit(mm)) { + fput(file); + goto breakouterloop; + } + *result =3D collapse_pte_mapped_thp(mm, + khugepaged_scan.address, false); + if (*result =3D=3D SCAN_PMD_MAPPED) + *result =3D SCAN_SUCCEED; + } fput(file); } else { *result =3D hpage_collapse_scan_pmd(mm, vma, - khugepaged_scan.address, - &mmap_locked, - cc); + khugepaged_scan.address, &mmap_locked, cc); } - switch (*result) { - case SCAN_PTE_MAPPED_HUGEPAGE: { - pmd_t *pmd; =20 - *result =3D find_pmd_or_thp_or_none(mm, - khugepaged_scan.address, - &pmd); - if (*result !=3D SCAN_SUCCEED) - break; - if (!khugepaged_add_pte_mapped_thp(mm, - khugepaged_scan.address)) - break; - } fallthrough; - case SCAN_SUCCEED: + if (*result =3D=3D SCAN_SUCCEED) ++khugepaged_pages_collapsed; - break; - default: - break; - } =20 /* move to next address */ khugepaged_scan.address +=3D HPAGE_PMD_SIZE; --=20 2.35.3 From nobody Sun Feb 8 14:51:49 2026 
Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75A3EC7EE29 for ; Mon, 29 May 2023 06:31:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230347AbjE2Gbf (ORCPT ); Mon, 29 May 2023 02:31:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230328AbjE2Gbc (ORCPT ); Mon, 29 May 2023 02:31:32 -0400 Received: from mail-yb1-xb32.google.com (mail-yb1-xb32.google.com [IPv6:2607:f8b0:4864:20::b32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C07B6FF for ; Sun, 28 May 2023 23:31:01 -0700 (PDT) Received: by mail-yb1-xb32.google.com with SMTP id 3f1490d57ef6-ba86ea269e0so4431134276.1 for ; Sun, 28 May 2023 23:31:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1685341829; x=1687933829; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=Aa5HWe45UW6rQUJiJNVLC4MPu6AngULI5k321Sbm9Ew=; b=UNKL1Q0Pec+L5oOba/99kxWNjcCwq8SenPXFdijAb/ObswMKjM7OUUmz2ZEICNHgFi Qv4ppRRcA3yebk/gcCjK1B/b31tItw1kcbI5MNnORL3Ys8OpaxJh/WEpZJ/SInbs7OtQ pID86tFe1+AG4qmsZJbOnM+7iKwXNWky09aARIe2tetg2zR8PTrceT/1TSWkarzFIet1 8YIKNPCa18G6ke7Me2QA0b5ztYnbCNDIOf/BDmkOzEl5UNMMd2c3BAGItXCN3K8DkXAw E9u7Ntp2cVwujFtJI7hoKruoONwfQaLYMkiR2S/Hp/B93RwLn01nkmija5yTld6gsNK0 YpVA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685341829; x=1687933829; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Aa5HWe45UW6rQUJiJNVLC4MPu6AngULI5k321Sbm9Ew=; b=Uo96Y0vYFzyzyLUmfRdtBkqwxrUmgFynV6HOfPsyz9VZC5s3e+prNF5SpzNToiJXU5 Fv7HtSXHM2hyRmCPU5AxCR3Qt6z4X8omUxNmbilswC0ta/bbe4EXgG9IhxjZSb3VBVIf JKbo2E1g490X923vZh00ASmPNJX82wgh8TMRWzysHCjQyaPmr3ozfQYnXqfZyTJWIyLt oxpQW7joiF1lx/DqbUxyhytcMaTjlIHOzW5a/dedr1iiUpUNM4UnOXY4NaKOGRVi6ak0 Ikz57mXQMrLQc4InBGvj0fNNYJxw2WsBshm6DlErhnkGweBzRI/oxbHDEeh+v+z4PhaH ncRg== X-Gm-Message-State: AC+VfDyO0kbXgnFwFJoYsJm1S4u8yRP7mSLesbQUvEuhVPb0MBoQvJTV 2Tdw7pbMkYql3vl4k9WcGgUc4A== X-Google-Smtp-Source: ACHHUZ6M1tOYR5IzxCkUxy+sUt0Djeqr4hN+mskEbw3iamyUciaGMEXp/45S6Zcf95QTtDJx+FPlXA== X-Received: by 2002:a81:b40c:0:b0:544:9180:3104 with SMTP id h12-20020a81b40c000000b0054491803104mr11920044ywi.34.1685341828878; Sun, 28 May 2023 23:30:28 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id a7-20020a818a07000000b00555c30ec361sm3363238ywg.143.2023.05.28.23.30.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 May 2023 23:30:28 -0700 (PDT) Date: Sun, 28 May 2023 23:30:24 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. 
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 12/12] mm: delete mmap_write_trylock() and vma_try_start_write() In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> Message-ID: References: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" mmap_write_trylock() and vma_try_start_write() were added just for khugepaged, but now it has no use for them: delete. Signed-off-by: Hugh Dickins --- include/linux/mm.h | 17 ----------------- include/linux/mmap_lock.h | 10 ---------- 2 files changed, 27 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 3c2e56980853..9b24f8fbf899 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -690,21 +690,6 @@ static inline void vma_start_write(struct vm_area_stru= ct *vma) up_write(&vma->vm_lock->lock); } =20 -static inline bool vma_try_start_write(struct vm_area_struct *vma) -{ - int mm_lock_seq; - - if (__is_vma_write_locked(vma, &mm_lock_seq)) - return true; - - if (!down_write_trylock(&vma->vm_lock->lock)) - return false; - - vma->vm_lock_seq =3D mm_lock_seq; - up_write(&vma->vm_lock->lock); - return true; -} - static inline void vma_assert_write_locked(struct vm_area_struct *vma) { int mm_lock_seq; @@ -730,8 +715,6 @@ static inline bool vma_start_read(struct vm_area_struct= *vma) { return false; } static inline void vma_end_read(struct vm_area_struct *vma) {} static inline void vma_start_write(struct vm_area_struct *vma) {} -static inline bool vma_try_start_write(struct vm_area_struct *vma) - { return true; } static inline void vma_assert_write_locked(struct vm_area_struct *vma) {} static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached) {} diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index aab8f1b28d26..d1191f02c7fa 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -112,16 +112,6 @@ static inline int mmap_write_lock_killable(struct mm_s= truct *mm) return ret; } =20 -static inline bool mmap_write_trylock(struct mm_struct *mm) -{ - bool ret; - - __mmap_lock_trace_start_locking(mm, true); - ret =3D down_write_trylock(&mm->mmap_lock) !=3D 0; - __mmap_lock_trace_acquire_returned(mm, true, ret); - return ret; -} - static inline void mmap_write_unlock(struct mm_struct *mm) { __mmap_lock_trace_released(mm, true); --=20 2.35.3