Date: Mon, 27 Feb 2023 09:36:00 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-2-surenb@google.com>
Subject: [PATCH v4 01/33] maple_tree: Be more cautious about dead nodes
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com,
	leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Liam Howlett, Suren Baghdasaryan

From: Liam Howlett

ma_pivots() and ma_data_end() may be called with a dead node.  Ensure that
the node isn't dead before using the returned values.

This is necessary for RCU mode of the maple tree.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett
Signed-off-by: Suren Baghdasaryan
---
 lib/maple_tree.c | 52 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 43 insertions(+), 9 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 646297cae5d1..cc356b8369ad 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -544,6 +544,7 @@ static inline bool ma_dead_node(const struct maple_node *node)
 
 	return (parent == node);
 }
+
 /*
  * mte_dead_node() - check if the @enode is dead.
  * @enode: The encoded maple node
@@ -625,6 +626,8 @@ static inline unsigned int mas_alloc_req(const struct ma_state *mas)
  * @node - the maple node
  * @type - the node type
  *
+ * In the event of a dead node, this array may be %NULL
+ *
  * Return: A pointer to the maple node pivots
  */
 static inline unsigned long *ma_pivots(struct maple_node *node,
@@ -1096,8 +1099,11 @@ static int mas_ascend(struct ma_state *mas)
 		a_type = mas_parent_enum(mas, p_enode);
 		a_node = mte_parent(p_enode);
 		a_slot = mte_parent_slot(p_enode);
-		pivots = ma_pivots(a_node, a_type);
 		a_enode = mt_mk_node(a_node, a_type);
+		pivots = ma_pivots(a_node, a_type);
+
+		if (unlikely(ma_dead_node(a_node)))
+			return 1;
 
 		if (!set_min && a_slot) {
 			set_min = true;
@@ -1401,6 +1407,9 @@ static inline unsigned char ma_data_end(struct maple_node *node,
 {
 	unsigned char offset;
 
+	if (!pivots)
+		return 0;
+
 	if (type == maple_arange_64)
 		return ma_meta_end(node, type);
 
@@ -1436,6 +1445,9 @@ static inline unsigned char mas_data_end(struct ma_state *mas)
 		return ma_meta_end(node, type);
 
 	pivots = ma_pivots(node, type);
+	if (unlikely(ma_dead_node(node)))
+		return 0;
+
 	offset = mt_pivots[type] - 1;
 	if (likely(!pivots[offset]))
 		return ma_meta_end(node, type);
@@ -4505,6 +4517,9 @@ static inline int mas_prev_node(struct ma_state *mas, unsigned long min)
 	node = mas_mn(mas);
 	slots = ma_slots(node, mt);
 	pivots = ma_pivots(node, mt);
+	if (unlikely(ma_dead_node(node)))
+		return 1;
+
 	mas->max = pivots[offset];
 	if (offset)
 		mas->min = pivots[offset - 1] + 1;
@@ -4526,6 +4541,9 @@ static inline int mas_prev_node(struct ma_state *mas, unsigned long min)
 		slots = ma_slots(node, mt);
 		pivots = ma_pivots(node, mt);
 		offset = ma_data_end(node, mt, pivots, mas->max);
+		if (unlikely(ma_dead_node(node)))
+			return 1;
+
 		if (offset)
 			mas->min = pivots[offset - 1] + 1;
 
@@ -4574,6 +4592,7 @@ static inline int mas_next_node(struct ma_state *mas, struct maple_node *node,
 	struct maple_enode *enode;
 	int level = 0;
 	unsigned char offset;
+	unsigned char node_end;
 	enum maple_type mt;
 	void __rcu **slots;
 
@@ -4597,7 +4616,11 @@ static inline int mas_next_node(struct ma_state *mas, struct maple_node *node,
 		node = mas_mn(mas);
 		mt = mte_node_type(mas->node);
 		pivots = ma_pivots(node, mt);
-	} while (unlikely(offset == ma_data_end(node, mt, pivots, mas->max)));
+		node_end = ma_data_end(node, mt, pivots, mas->max);
+		if (unlikely(ma_dead_node(node)))
+			return 1;
+
+	} while (unlikely(offset == node_end));
 
 	slots = ma_slots(node, mt);
 	pivot = mas_safe_pivot(mas, pivots, ++offset, mt);
@@ -4613,6 +4636,9 @@ static inline int mas_next_node(struct ma_state *mas, struct maple_node *node,
 		mt = mte_node_type(mas->node);
 		slots = ma_slots(node, mt);
 		pivots = ma_pivots(node, mt);
+		if (unlikely(ma_dead_node(node)))
+			return 1;
+
 		offset = 0;
 		pivot = pivots[0];
 	}
@@ -4659,11 +4685,14 @@ static inline void *mas_next_nentry(struct ma_state *mas,
 		return NULL;
 	}
 
-	pivots = ma_pivots(node, type);
 	slots = ma_slots(node, type);
-	mas->index = mas_safe_min(mas, pivots, mas->offset);
+	pivots = ma_pivots(node, type);
 	count = ma_data_end(node, type, pivots, mas->max);
-	if (ma_dead_node(node))
+	if (unlikely(ma_dead_node(node)))
+		return NULL;
+
+	mas->index = mas_safe_min(mas, pivots, mas->offset);
+	if (unlikely(ma_dead_node(node)))
 		return NULL;
 
 	if (mas->index > max)
@@ -4817,6 +4846,11 @@ static inline void *mas_prev_nentry(struct ma_state *mas, unsigned long limit,
 
 	slots = ma_slots(mn, mt);
 	pivots = ma_pivots(mn, mt);
+	if (unlikely(ma_dead_node(mn))) {
+		mas_rewalk(mas, index);
+		goto retry;
+	}
+
 	if (offset == mt_pivots[mt])
 		pivot = mas->max;
 	else
@@ -6631,11 +6665,11 @@ static inline void *mas_first_entry(struct ma_state *mas, struct maple_node *mn,
 	while (likely(!ma_is_leaf(mt))) {
 		MT_BUG_ON(mas->tree, mte_dead_node(mas->node));
 		slots = ma_slots(mn, mt);
-		pivots = ma_pivots(mn, mt);
-		max = pivots[0];
 		entry = mas_slot(mas, slots, 0);
+		pivots = ma_pivots(mn, mt);
 		if (unlikely(ma_dead_node(mn)))
 			return NULL;
+		max = pivots[0];
 		mas->node = entry;
 		mn = mas_mn(mas);
 		mt = mte_node_type(mas->node);
@@ -6655,13 +6689,13 @@ static inline void *mas_first_entry(struct ma_state *mas, struct maple_node *mn,
 	if (likely(entry))
 		return entry;
 
-	pivots = ma_pivots(mn, mt);
-	mas->index = pivots[0] + 1;
 	mas->offset = 1;
 	entry = mas_slot(mas, slots, 1);
+	pivots = ma_pivots(mn, mt);
 	if (unlikely(ma_dead_node(mn)))
 		return NULL;
 
+	mas->index = pivots[0] + 1;
 	if (mas->index > limit)
 		goto none;
 
-- 
2.39.2.722.g9855ee24e9-goog

Date: Mon, 27 Feb 2023 09:36:01 -0800
Message-ID: <20230227173632.3292573-3-surenb@google.com>
Subject: [PATCH v4 02/33] maple_tree: Detect dead nodes in mas_start()
From: Suren Baghdasaryan

From: Liam Howlett

When initially starting a search, the root node may already be in the
process of being replaced in RCU mode.  Detect and restart the walk if
this is the case.

This is necessary for RCU mode of the maple tree.
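The restart described above follows a read, then check, then retry pattern. The following is a minimal user-space sketch of that pattern. It uses C11 atomics in place of the kernel's RCU and barrier primitives. Every name in it is hypothetical (`struct node`, `struct tree`, `replace_root()`, `node_is_dead()`, `walk_start()`). The only convention borrowed from the maple tree is that a dead node's parent pointer points back at the node itself, which is what `ma_dead_node()` tests. The release and acquire fences are assumed stand-ins for `smp_wmb()` and `smp_rmb()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical names), not the kernel's struct maple_node. */
struct node {
	_Atomic(struct node *) parent;  /* points at itself once dead */
	_Atomic unsigned long pivot;    /* may be reused after death */
};

struct tree {
	_Atomic(struct node *) root;
};

/* Writer: publish the new root first, then mark the old root dead. */
static void replace_root(struct tree *t, struct node *new_root)
{
	struct node *old = atomic_load_explicit(&t->root, memory_order_relaxed);

	/* Replacing a root with itself would mark the live root dead. */
	assert(new_root != old);
	atomic_store_explicit(&t->root, new_root, memory_order_release);
	if (old) {
		atomic_store_explicit(&old->parent, old, memory_order_relaxed);
		atomic_thread_fence(memory_order_release); /* ~ smp_wmb() */
	}
}

/* Reader: the acquire fence orders earlier data loads before the
 * parent load, in the spirit of the smp_rmb() in ma_dead_node(). */
static bool node_is_dead(struct node *n)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&n->parent, memory_order_relaxed) == n;
}

/* Read the root's data, then check whether the root died meanwhile.
 * If it did, the value just read is untrusted: reload and retry. */
static struct node *walk_start(struct tree *t, unsigned long *pivot)
{
	struct node *root;
	unsigned long piv;

retry:
	root = atomic_load_explicit(&t->root, memory_order_acquire);
	if (!root)
		return NULL;
	piv = atomic_load_explicit(&root->pivot, memory_order_relaxed);
	if (node_is_dead(root))
		goto retry;
	*pivot = piv;
	return root;
}
```

In this sketch the writer publishes the new root before marking the old one dead. A reader that jumps back to `retry` therefore reloads the replacement instead of spinning on the dead node. That ordering is an assumption of the toy model.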
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett
Signed-off-by: Suren Baghdasaryan
---
 lib/maple_tree.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index cc356b8369ad..089cd76ec379 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1360,12 +1360,16 @@ static inline struct maple_enode *mas_start(struct ma_state *mas)
 		mas->max = ULONG_MAX;
 		mas->depth = 0;
 
+retry:
 		root = mas_root(mas);
 		/* Tree with nodes */
 		if (likely(xa_is_node(root))) {
 			mas->depth = 1;
 			mas->node = mte_safe_root(root);
 			mas->offset = 0;
+			if (mte_dead_node(mas->node))
+				goto retry;
+
 			return NULL;
 		}
 

Date: Mon, 27 Feb 2023 09:36:02 -0800
Message-ID: <20230227173632.3292573-4-surenb@google.com>
Subject: [PATCH v4 03/33] maple_tree: Fix freeing of nodes in rcu mode
From: Suren Baghdasaryan

From: Liam Howlett

The walk to destroy the nodes did not always set the node type, which
could result in a destroy method treating the stored values as nodes.
Avoid this by setting the correct node types.

This is necessary for the RCU mode of the maple tree.
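A toy tagged-pointer model shows why the node type has to be written into the node. Like the maple tree, the model assumes that nodes are aligned strongly enough for the type to live in the low bits of the encoded pointer. The names, the 256-byte alignment and the bit layout below are illustrative only; they are not the kernel's `mte_node_type()` or its mask definitions. Once a slot is overwritten with the bare node pointer, the type bits are gone, so the type must be copied into the node first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: type stored in bits 3..6 of an aligned address. */
#define NODE_ALIGN  256u
#define TYPE_SHIFT  3
#define TYPE_MASK   0xfu

enum node_type { NODE_LEAF = 1, NODE_RANGE = 2, NODE_ARANGE = 3 };

struct node {
	_Alignas(NODE_ALIGN) enum node_type type;  /* saved copy of the type */
	unsigned long data;
};

typedef uintptr_t enode_t;  /* node address | (type << TYPE_SHIFT) */

static enode_t node_encode(struct node *n, enum node_type t)
{
	/* The low bits must be free for the tag to fit. */
	assert(((uintptr_t)n & (NODE_ALIGN - 1)) == 0);
	return (uintptr_t)n | ((uintptr_t)t << TYPE_SHIFT);
}

static enum node_type enode_type(enode_t e)
{
	return (enum node_type)((e >> TYPE_SHIFT) & TYPE_MASK);
}

static struct node *enode_to_node(enode_t e)
{
	return (struct node *)(e & ~(uintptr_t)(NODE_ALIGN - 1));
}

/*
 * Replace an encoded slot with the bare node pointer, as a destroy walk
 * does when it reuses slots. The type is copied into the node first
 * (cf. node->type = mte_node_type(enode) in the patch) so a later pass
 * that only holds the bare pointer can still interpret the node.
 */
static struct node *retire_slot(enode_t *slot)
{
	enode_t e = *slot;
	struct node *n = enode_to_node(e);

	n->type = enode_type(e);
	*slot = (uintptr_t)n;  /* the type bits are no longer available here */
	return n;
}
```

Without the `n->type = ...` assignment, a later pass holding only the bare pointer would have no reliable way to tell what kind of node it is looking at. That is the class of problem the commit message describes.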
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett
Signed-off-by: Suren Baghdasaryan
---
 lib/maple_tree.c | 73 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 62 insertions(+), 11 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 089cd76ec379..44d6ce30b28e 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -902,6 +902,44 @@ static inline void ma_set_meta(struct maple_node *mn, enum maple_type mt,
 	meta->end = end;
 }
 
+/*
+ * mas_clear_meta() - clear the metadata information of a node, if it exists
+ * @mas: The maple state
+ * @mn: The maple node
+ * @mt: The maple node type
+ * @offset: The offset of the highest sub-gap in this node.
+ * @end: The end of the data in this node.
+ */
+static inline void mas_clear_meta(struct ma_state *mas, struct maple_node *mn,
+				  enum maple_type mt)
+{
+	struct maple_metadata *meta;
+	unsigned long *pivots;
+	void __rcu **slots;
+	void *next;
+
+	switch (mt) {
+	case maple_range_64:
+		pivots = mn->mr64.pivot;
+		if (unlikely(pivots[MAPLE_RANGE64_SLOTS - 2])) {
+			slots = mn->mr64.slot;
+			next = mas_slot_locked(mas, slots,
+					       MAPLE_RANGE64_SLOTS - 1);
+			if (unlikely((mte_to_node(next) && mte_node_type(next))))
+				return; /* The last slot is a node, no metadata */
+		}
+		fallthrough;
+	case maple_arange_64:
+		meta = ma_meta(mn, mt);
+		break;
+	default:
+		return;
+	}
+
+	meta->gap = 0;
+	meta->end = 0;
+}
+
 /*
  * ma_meta_end() - Get the data end of a node from the metadata
  * @mn: The maple node
@@ -5455,20 +5493,22 @@ static inline int mas_rev_alloc(struct ma_state *mas, unsigned long min,
  * mas_dead_leaves() - Mark all leaves of a node as dead.
  * @mas: The maple state
  * @slots: Pointer to the slot array
+ * @type: The maple node type
  *
  * Must hold the write lock.
  *
  * Return: The number of leaves marked as dead.
  */
 static inline
-unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots)
+unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots,
+			      enum maple_type mt)
 {
 	struct maple_node *node;
 	enum maple_type type;
 	void *entry;
 	int offset;
 
-	for (offset = 0; offset < mt_slot_count(mas->node); offset++) {
+	for (offset = 0; offset < mt_slots[mt]; offset++) {
 		entry = mas_slot_locked(mas, slots, offset);
 		type = mte_node_type(entry);
 		node = mte_to_node(entry);
@@ -5487,14 +5527,13 @@ unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots)
 
 static void __rcu **mas_dead_walk(struct ma_state *mas, unsigned char offset)
 {
-	struct maple_node *node, *next;
+	struct maple_node *next;
 	void __rcu **slots = NULL;
 
 	next = mas_mn(mas);
 	do {
-		mas->node = ma_enode_ptr(next);
-		node = mas_mn(mas);
-		slots = ma_slots(node, node->type);
+		mas->node = mt_mk_node(next, next->type);
+		slots = ma_slots(next, next->type);
 		next = mas_slot_locked(mas, slots, offset);
 		offset = 0;
 	} while (!ma_is_leaf(next->type));
@@ -5558,11 +5597,14 @@ static inline void __rcu **mas_destroy_descend(struct ma_state *mas,
 		node = mas_mn(mas);
 		slots = ma_slots(node, mte_node_type(mas->node));
 		next = mas_slot_locked(mas, slots, 0);
-		if ((mte_dead_node(next)))
+		if ((mte_dead_node(next))) {
+			mte_to_node(next)->type = mte_node_type(next);
 			next = mas_slot_locked(mas, slots, 1);
+		}
 
 		mte_set_node_dead(mas->node);
 		node->type = mte_node_type(mas->node);
+		mas_clear_meta(mas, node, node->type);
 		node->piv_parent = prev;
 		node->parent_slot = offset;
 		offset = 0;
@@ -5582,13 +5624,18 @@ static void mt_destroy_walk(struct maple_enode *enode, unsigned char ma_flags,
 
 	MA_STATE(mas, &mt, 0, 0);
 
-	if (mte_is_leaf(enode))
+	mas.node = enode;
+	if (mte_is_leaf(enode)) {
+		node->type = mte_node_type(enode);
 		goto free_leaf;
+	}
 
+	ma_flags &= ~MT_FLAGS_LOCK_MASK;
 	mt_init_flags(&mt, ma_flags);
 	mas_lock(&mas);
 
-	mas.node = start = enode;
+	mte_to_node(enode)->ma_flags = ma_flags;
+	start = enode;
 	slots = mas_destroy_descend(&mas, start, 0);
 	node = mas_mn(&mas);
 	do {
@@ -5596,7 +5643,8 @@ static void mt_destroy_walk(struct maple_enode *enode, unsigned char ma_flags,
 		unsigned char offset;
 		struct maple_enode *parent, *tmp;
 
-		node->slot_len = mas_dead_leaves(&mas, slots);
+		node->type = mte_node_type(mas.node);
+		node->slot_len = mas_dead_leaves(&mas, slots, node->type);
 		if (free)
 			mt_free_bulk(node->slot_len, slots);
 		offset = node->parent_slot + 1;
@@ -5620,7 +5668,8 @@ static void mt_destroy_walk(struct maple_enode *enode, unsigned char ma_flags,
 	} while (start != mas.node);
 
 	node = mas_mn(&mas);
-	node->slot_len = mas_dead_leaves(&mas, slots);
+	node->type = mte_node_type(mas.node);
+	node->slot_len = mas_dead_leaves(&mas, slots, node->type);
 	if (free)
 		mt_free_bulk(node->slot_len, slots);
 
@@ -5630,6 +5679,8 @@ static void mt_destroy_walk(struct maple_enode *enode, unsigned char ma_flags,
 free_leaf:
 	if (free)
 		mt_free_rcu(&node->rcu);
+	else
+		mas_clear_meta(&mas, node, node->type);
 }
 
 /*

Date: Mon, 27 Feb 2023 09:36:03 -0800
Message-ID: <20230227173632.3292573-5-surenb@google.com>
Subject: [PATCH v4 04/33] maple_tree: remove extra smp_wmb() from mas_dead_leaves()
From: Suren Baghdasaryan

From: Liam Howlett

The call to mte_set_node_dead() that precedes this smp_wmb() already
issues an smp_wmb(), so the second barrier is not needed.

This is an optimization for the RCU mode of the maple tree.
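Here is a C11 sketch of the ordering argument. It assumes that a release fence is an adequate model of `smp_wmb()`. `set_node_dead()` below is a hypothetical analogue of `mte_set_node_dead()`: it stores the self-pointer and then issues the fence. Every later store in the same loop iteration is therefore already ordered after the dead mark, and a second fence would add nothing. The slot rewrite that `rcu_assign_pointer()` performs in the real loop is left out, and all names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical names), not the kernel's maple tree code. */
struct node {
	_Atomic(struct node *) parent;  /* self-pointer means "dead" */
	_Atomic int type;               /* written after the node is dead */
};

/* Mark @n dead and order that store before every later store,
 * the way mte_set_node_dead() follows the parent update with smp_wmb(). */
static void set_node_dead(struct node *n)
{
	atomic_store_explicit(&n->parent, n, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Retire the nodes referenced from @slots, stopping at the first
 * empty slot; return how many were retired. */
static int dead_leaves(_Atomic(struct node *) *slots, int nr_slots, int type)
{
	int i;

	assert(nr_slots >= 0);
	for (i = 0; i < nr_slots; i++) {
		struct node *n = atomic_load_explicit(&slots[i],
						      memory_order_relaxed);

		if (!n)
			break;
		set_node_dead(n);
		/* No second fence: set_node_dead() already ordered the mark. */
		atomic_store_explicit(&n->type, type, memory_order_relaxed);
	}
	return i;
}

/* Reader-side check: acquire fence, then load the parent pointer. */
static bool node_is_dead(struct node *n)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&n->parent, memory_order_relaxed) == n;
}
```

Suppose a reader observes the new `type` value and then calls `node_is_dead()`. Under C11 fence-to-fence synchronization, that reader is guaranteed to see the node as dead.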
Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam Howlett Signed-off-by: Suren Baghdasaryan --- lib/maple_tree.c | 1 - 1 file changed, 1 deletion(-) diff --git a/lib/maple_tree.c b/lib/maple_tree.c index 44d6ce30b28e..3d5ab02f981a 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -5517,7 +5517,6 @@ unsigned char mas_dead_leaves(struct ma_state *mas, v= oid __rcu **slots, break; =20 mte_set_node_dead(entry); - smp_wmb(); /* Needed for RCU */ node->type =3D type; rcu_assign_pointer(slots[offset], node); } --=20 2.39.2.722.g9855ee24e9-goog From nobody Wed Sep 10 09:07:32 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C49B0C7EE23 for ; Mon, 27 Feb 2023 17:37:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230071AbjB0RhE (ORCPT ); Mon, 27 Feb 2023 12:37:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48206 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229881AbjB0Rg4 (ORCPT ); Mon, 27 Feb 2023 12:36:56 -0500 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C219524114 for ; Mon, 27 Feb 2023 09:36:50 -0800 (PST) Received: by mail-pf1-x449.google.com with SMTP id i15-20020aa78d8f000000b005edc307b103so2926495pfr.17 for ; Mon, 27 Feb 2023 09:36:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=9Ei1aTodbgKgkLCHkAcrfXghqfC4aFVkwG1NwHfqe18=; b=mjTDCe3um0lc/xjVfVjgd+Npe9HefebnR6MJBHLvf3YF1uIBfRxVZUnFRJLGFeQ5Qn 8RjZv63JPHIbjxEVYDv4CrjZbJnnn4Mt2fDeT1LOlyG88MFvPDh3biXldQxmJDhFfhkL 
MeulbY1wD03auJFox9o7eR+S24BryOLDDsTTZGQeZ4URN1aDTRabL9SThEbzRf62X937 1Tp2+9t6o/KfoCPMJmoRWi+E8Rd5yccUzXxzvOPdrAcdSgNX0Ahh/fjHi6t+VXE449Oh mngGw09qy20HWkPARPiqIZAx0mZi+xWfyiKlsc96M1s7LVT+aejv+nyRL84gn5iOqmCZ H2mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=9Ei1aTodbgKgkLCHkAcrfXghqfC4aFVkwG1NwHfqe18=; b=6Bj1IQZ/Xzr6i8ILt+Zsgs2/+ytEFfvXSAGOXKODQfp6m1b8I5bZT3hZR3Z4AGaWAX W8vfzHizzuYOHbgMLB0uo+INnMegAhCx1j6kHN4rRZI+oYmLcYCzu+Gpk1nkajvVVtcU zpun2XbDf+L2jzhho1bA4xtsdBrplcWFxwTz9D/An6CRY8KBAc3IXglqxMI3yRxFGBOJ m1ih123O0VB2Qyk8nIYuc4UYPwOPyRVqmH12daSudIzsxeslYv4MM22fcx1ezfO3EKaC BsmJ3NxhYhCdUzy0BRaPkgHWqdiH1bjOvqi2QPWBDvp/e/3N+zLnMh5S23pyIRik5aZo JxTg== X-Gm-Message-State: AO0yUKUibKW9eF8eO7Tkg7oz/3u0kgJWlq8QGNGP/m2idQUyNxkhfYEt uv/lJ2Wscy+KI0yGS/ATx57TzP1qLAs= X-Google-Smtp-Source: AK7set8gSrJb7C7al4vp6knawTGaEfFTfMkKKVc0fZqqef6Oc8nGI6YkUfCL0eSaXZ77p/UxVJxZTTA+0vM= X-Received: from surenb-desktop.mtv.corp.google.com ([2620:15c:211:200:e1f6:21d1:eead:3897]) (user=surenb job=sendgmr) by 2002:a63:714f:0:b0:4fc:2058:fa2b with SMTP id b15-20020a63714f000000b004fc2058fa2bmr6283546pgn.3.1677519409843; Mon, 27 Feb 2023 09:36:49 -0800 (PST) Date: Mon, 27 Feb 2023 09:36:04 -0800 In-Reply-To: <20230227173632.3292573-1-surenb@google.com> Mime-Version: 1.0 References: <20230227173632.3292573-1-surenb@google.com> X-Mailer: git-send-email 2.39.2.722.g9855ee24e9-goog Message-ID: <20230227173632.3292573-6-surenb@google.com> Subject: [PATCH v4 05/33] maple_tree: Fix write memory barrier of nodes once dead for RCU mode From: Suren Baghdasaryan To: akpm@linux-foundation.org Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, 
paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, "Liam R. Howlett", Suren Baghdasaryan

From: "Liam R. Howlett"

During the development of the maple tree, the strategy of freeing multiple nodes changed and, in the process, the pivots were reused to store pointers to dead nodes. To ensure that readers see accurate pivots, writers must mark the nodes as dead and call smp_wmb() so that any reader can identify a node as dead before using its pivot values.

There were two places where the old method of marking the node as dead without smp_wmb() was still being used, which resulted in RCU readers seeing the wrong pivot value before seeing that the node was dead. Fix this race condition by using mte_set_node_dead(), which includes the smp_wmb() call that closes the race.

Add a WARN_ON() to ma_free_rcu() to verify that every node being freed has already been marked as dead, ensuring there are no other call paths besides the two updated ones.

This is necessary for the RCU mode of the maple tree.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett
Signed-off-by: Suren Baghdasaryan
---
 lib/maple_tree.c                 |  7 +++++--
 tools/testing/radix-tree/maple.c | 16 ++++++++++++++++
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 3d5ab02f981a..6b6eddadd9d2 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -185,7 +185,7 @@ static void mt_free_rcu(struct rcu_head *head)
  */
 static void ma_free_rcu(struct maple_node *node)
 {
-	node->parent = ma_parent_ptr(node);
+	WARN_ON(node->parent != ma_parent_ptr(node));
 	call_rcu(&node->rcu, mt_free_rcu);
 }
 
@@ -1778,8 +1778,10 @@ static inline void mas_replace(struct ma_state *mas, bool advanced)
 		rcu_assign_pointer(slots[offset], mas->node);
 	}
 
-	if (!advanced)
+	if (!advanced) {
+		mte_set_node_dead(old_enode);
 		mas_free(mas, old_enode);
+	}
 }
 
 /*
@@ -4218,6 +4220,7 @@ static inline bool mas_wr_node_store(struct ma_wr_state *wr_mas)
 done:
 	mas_leaf_set_meta(mas, newnode, dst_pivots, maple_leaf_64, new_end);
 	if (in_rcu) {
+		mte_set_node_dead(mas->node);
 		mas->node = mt_mk_node(newnode, wr_mas->type);
 		mas_replace(mas, false);
 	} else {
diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
index 958ee9bdb316..4c89ff333f6f 100644
--- a/tools/testing/radix-tree/maple.c
+++ b/tools/testing/radix-tree/maple.c
@@ -108,6 +108,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 	MT_BUG_ON(mt, mn->slot[1] != NULL);
 	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
 
+	mn->parent = ma_parent_ptr(mn);
 	ma_free_rcu(mn);
 	mas.node = MAS_START;
 	mas_nomem(&mas, GFP_KERNEL);
@@ -160,6 +161,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 		MT_BUG_ON(mt, mas_allocated(&mas) != i);
 		MT_BUG_ON(mt, !mn);
 		MT_BUG_ON(mt, not_empty(mn));
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 	}
 
@@ -192,6 +194,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 		MT_BUG_ON(mt, not_empty(mn));
 		MT_BUG_ON(mt, mas_allocated(&mas) != i - 1);
 		MT_BUG_ON(mt, !mn);
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 	}
 
@@ -210,6 +213,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 			mn = mas_pop_node(&mas);
 			MT_BUG_ON(mt, not_empty(mn));
 			MT_BUG_ON(mt, mas_allocated(&mas) != j - 1);
+			mn->parent = ma_parent_ptr(mn);
 			ma_free_rcu(mn);
 		}
 		MT_BUG_ON(mt, mas_allocated(&mas) != 0);
@@ -233,6 +237,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 			MT_BUG_ON(mt, mas_allocated(&mas) != i - j);
 			mn = mas_pop_node(&mas);
 			MT_BUG_ON(mt, not_empty(mn));
+			mn->parent = ma_parent_ptr(mn);
 			ma_free_rcu(mn);
 			MT_BUG_ON(mt, mas_allocated(&mas) != i - j - 1);
 		}
@@ -269,6 +274,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 			mn = mas_pop_node(&mas); /* get the next node. */
 			MT_BUG_ON(mt, mn == NULL);
 			MT_BUG_ON(mt, not_empty(mn));
+			mn->parent = ma_parent_ptr(mn);
 			ma_free_rcu(mn);
 		}
 		MT_BUG_ON(mt, mas_allocated(&mas) != 0);
@@ -294,6 +300,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 			mn = mas_pop_node(&mas2); /* get the next node. */
 			MT_BUG_ON(mt, mn == NULL);
 			MT_BUG_ON(mt, not_empty(mn));
+			mn->parent = ma_parent_ptr(mn);
 			ma_free_rcu(mn);
 		}
 		MT_BUG_ON(mt, mas_allocated(&mas2) != 0);
@@ -334,10 +341,12 @@ static noinline void check_new_node(struct maple_tree *mt)
 	MT_BUG_ON(mt, mas_allocated(&mas) != MAPLE_ALLOC_SLOTS + 2);
 	mn = mas_pop_node(&mas);
 	MT_BUG_ON(mt, not_empty(mn));
+	mn->parent = ma_parent_ptr(mn);
 	ma_free_rcu(mn);
 	for (i = 1; i <= MAPLE_ALLOC_SLOTS + 1; i++) {
 		mn = mas_pop_node(&mas);
 		MT_BUG_ON(mt, not_empty(mn));
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 	}
 	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
@@ -375,6 +384,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 		mas_node_count(&mas, i); /* Request */
 		mas_nomem(&mas, GFP_KERNEL); /* Fill request */
 		mn = mas_pop_node(&mas); /* get the next node. */
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 		mas_destroy(&mas);
 
@@ -382,10 +392,13 @@ static noinline void check_new_node(struct maple_tree *mt)
 		mas_node_count(&mas, i); /* Request */
 		mas_nomem(&mas, GFP_KERNEL); /* Fill request */
 		mn = mas_pop_node(&mas); /* get the next node. */
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 		mn = mas_pop_node(&mas); /* get the next node. */
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 		mn = mas_pop_node(&mas); /* get the next node. */
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 		mas_destroy(&mas);
 	}
@@ -35369,6 +35382,7 @@ static noinline void check_prealloc(struct maple_tree *mt)
 	MT_BUG_ON(mt, allocated != 1 + height * 3);
 	mn = mas_pop_node(&mas);
 	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
+	mn->parent = ma_parent_ptr(mn);
 	ma_free_rcu(mn);
 	MT_BUG_ON(mt, mas_preallocate(&mas, GFP_KERNEL) != 0);
 	mas_destroy(&mas);
@@ -35386,6 +35400,7 @@ static noinline void check_prealloc(struct maple_tree *mt)
 	mas_destroy(&mas);
 	allocated = mas_allocated(&mas);
 	MT_BUG_ON(mt, allocated != 0);
+	mn->parent = ma_parent_ptr(mn);
 	ma_free_rcu(mn);
 
 	MT_BUG_ON(mt, mas_preallocate(&mas, GFP_KERNEL) != 0);
@@ -35756,6 +35771,7 @@ void farmer_tests(void)
 	tree.ma_root = mt_mk_node(node, maple_leaf_64);
 	mt_dump(&tree);
 
+	node->parent = ma_parent_ptr(node);
 	ma_free_rcu(node);
 
 	/* Check things that will make lockdep angry */
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:05 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-7-surenb@google.com>
Subject: [PATCH v4 06/33] maple_tree: Add smp_rmb() to dead node detection
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, "Liam R. Howlett", Suren Baghdasaryan

From: "Liam R. Howlett"

Add an smp_rmb() before reading the parent pointer to ensure that anything read from the node prior to the parent pointer hasn't been reordered ahead of this check.

This is necessary for RCU mode.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett
Signed-off-by: Suren Baghdasaryan
---
 lib/maple_tree.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 6b6eddadd9d2..8ad2d1669fad 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -539,9 +539,11 @@ static inline struct maple_node *mte_parent(const struct maple_enode *enode)
  */
 static inline bool ma_dead_node(const struct maple_node *node)
 {
-	struct maple_node *parent = (void *)((unsigned long)
-					     node->parent & ~MAPLE_NODE_MASK);
+	struct maple_node *parent;
 
+	/* Do not reorder reads from the node prior to the parent check */
+	smp_rmb();
+	parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK);
 	return (parent == node);
 }
 
@@ -556,6 +558,8 @@ static inline bool mte_dead_node(const struct maple_enode *enode)
 	struct maple_node *parent, *node;
 
 	node = mte_to_node(enode);
+	/* Do not reorder reads from the node prior to the parent check */
+	smp_rmb();
 	parent = mte_parent(enode);
 	return (parent == node);
 }
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:06 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-8-surenb@google.com>
Subject: [PATCH v4 07/33] maple_tree: Add RCU lock checking to rcu callback functions
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, "Liam R. Howlett", Suren Baghdasaryan

From: "Liam R. Howlett"

Dereferencing RCU objects within the RCU callback without the RCU check has caused lockdep to complain. Fix the RCU dereferencing by using the RCU callback lock (rcu_callback_map) as the lockdep condition, which shows that the operation is safe.
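The following is a minimal userspace sketch of what a "checked" dereference provides, for illustration only. In the kernel, rcu_dereference_protected(p, c) is a plain pointer load whose caller supplies a condition c explaining why the pointer cannot change underneath it. With CONFIG_PROVE_RCU, lockdep checks that condition. In this patch, c is lock_is_held(&rcu_callback_map), which is true while RCU is invoking callbacks. The names deref_protected, in_rcu_callback, struct node, sum_slots and run_callback are hypothetical, and assert() stands in for the lockdep warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 4

/*
 * Hypothetical stand-in for lockdep state: true only while a simulated
 * RCU callback runs. The kernel asks lock_is_held(&rcu_callback_map).
 */
static bool in_rcu_callback;

/*
 * Userspace analogue of rcu_dereference_protected(p, c): a plain load
 * whose caller must supply a condition proving that the pointer cannot
 * change underneath it. assert() replaces the lockdep warning.
 */
#define deref_protected(p, c) (assert(c), (p))

struct node {
	struct node *slot[NR_SLOTS];
	int value;
};

/* Walk one level of slots; only legal from (simulated) callback context. */
static int sum_slots(const struct node *n)
{
	int sum = 0;

	for (size_t i = 0; i < NR_SLOTS; i++) {
		const struct node *child =
			deref_protected(n->slot[i], in_rcu_callback);

		if (child)
			sum += child->value;
	}
	return sum;
}

/* Simulated RCU callback: enter the "held" state, walk, then leave it. */
static int run_callback(const struct node *n)
{
	int sum;

	in_rcu_callback = true;
	sum = sum_slots(n);
	in_rcu_callback = false;
	return sum;
}
```

The kernel only warns rather than aborting, but the design idea is the same: the load stays cheap, and the justification for it travels with it. Inside the RCU callback that justification holds because the nodes have already been removed from the tree and a grace period has elapsed, so nothing else can still be changing their slots.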

Also stop creating a new lock to use for dereferencing during destruction of the tree or subtree. Instead, pass through a pointer to the tree whose lock is held, so that it can be used for RCU dereference checking. It also does not make sense to use the maple state in the freeing scenario, because the tree walk is a special case where the tree no longer has the normal encodings and parent pointers.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Reported-by: Suren Baghdasaryan
Signed-off-by: Liam R. Howlett
---
 lib/maple_tree.c | 188 ++++++++++++++++++++++++-----------------------
 1 file changed, 96 insertions(+), 92 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 8ad2d1669fad..2be86368237d 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -824,6 +824,11 @@ static inline void *mt_slot(const struct maple_tree *mt,
 	return rcu_dereference_check(slots[offset], mt_locked(mt));
 }
 
+static inline void *mt_slot_locked(struct maple_tree *mt, void __rcu **slots,
+				   unsigned char offset)
+{
+	return rcu_dereference_protected(slots[offset], mt_locked(mt));
+}
 /*
  * mas_slot_locked() - Get the slot value when holding the maple tree lock.
  * @mas: The maple state
@@ -835,7 +840,7 @@ static inline void *mt_slot(const struct maple_tree *mt,
 static inline void *mas_slot_locked(struct ma_state *mas, void __rcu **slots,
 				       unsigned char offset)
 {
-	return rcu_dereference_protected(slots[offset], mt_locked(mas->tree));
+	return mt_slot_locked(mas->tree, slots, offset);
 }
 
 /*
@@ -907,34 +912,35 @@ static inline void ma_set_meta(struct maple_node *mn, enum maple_type mt,
 }
 
 /*
- * mas_clear_meta() - clear the metadata information of a node, if it exists
- * @mas: The maple state
+ * mt_clear_meta() - clear the metadata information of a node, if it exists
+ * @mt: The maple tree
  * @mn: The maple node
- * @mt: The maple node type
+ * @type: The maple node type
  * @offset: The offset of the highest sub-gap in this node.
  * @end: The end of the data in this node.
  */
-static inline void mas_clear_meta(struct ma_state *mas, struct maple_node *mn,
-				  enum maple_type mt)
+static inline void mt_clear_meta(struct maple_tree *mt, struct maple_node *mn,
+				 enum maple_type type)
 {
 	struct maple_metadata *meta;
 	unsigned long *pivots;
 	void __rcu **slots;
 	void *next;
 
-	switch (mt) {
+	switch (type) {
 	case maple_range_64:
 		pivots = mn->mr64.pivot;
 		if (unlikely(pivots[MAPLE_RANGE64_SLOTS - 2])) {
 			slots = mn->mr64.slot;
-			next = mas_slot_locked(mas, slots,
-					       MAPLE_RANGE64_SLOTS - 1);
-			if (unlikely((mte_to_node(next) && mte_node_type(next))))
-				return; /* The last slot is a node, no metadata */
+			next = mt_slot_locked(mt, slots,
+					      MAPLE_RANGE64_SLOTS - 1);
+			if (unlikely((mte_to_node(next) &&
+				      mte_node_type(next))))
+				return; /* no metadata, could be node */
 		}
 		fallthrough;
 	case maple_arange_64:
-		meta = ma_meta(mn, mt);
+		meta = ma_meta(mn, type);
 		break;
 	default:
 		return;
@@ -5497,7 +5503,7 @@ static inline int mas_rev_alloc(struct ma_state *mas, unsigned long min,
 }
 
 /*
- * mas_dead_leaves() - Mark all leaves of a node as dead.
+ * mte_dead_leaves() - Mark all leaves of a node as dead.
  * @mas: The maple state
  * @slots: Pointer to the slot array
  * @type: The maple node type
@@ -5507,16 +5513,16 @@ static inline int mas_rev_alloc(struct ma_state *mas, unsigned long min,
  * Return: The number of leaves marked as dead.
  */
 static inline
-unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots,
-			      enum maple_type mt)
+unsigned char mte_dead_leaves(struct maple_enode *enode, struct maple_tree *mt,
+			      void __rcu **slots)
 {
 	struct maple_node *node;
 	enum maple_type type;
 	void *entry;
 	int offset;
 
-	for (offset = 0; offset < mt_slots[mt]; offset++) {
-		entry = mas_slot_locked(mas, slots, offset);
+	for (offset = 0; offset < mt_slot_count(enode); offset++) {
+		entry = mt_slot(mt, slots, offset);
 		type = mte_node_type(entry);
 		node = mte_to_node(entry);
 		/* Use both node and type to catch LE & BE metadata */
@@ -5531,162 +5537,160 @@ unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots,
 	return offset;
 }
 
-static void __rcu **mas_dead_walk(struct ma_state *mas, unsigned char offset)
+/**
+ * mte_dead_walk() - Walk down a dead tree to just before the leaves
+ * @enode: The maple encoded node
+ * @offset: The starting offset
+ *
+ * Note: This can only be used from the RCU callback context.
+ */
+static void __rcu **mte_dead_walk(struct maple_enode **enode, unsigned char offset)
 {
-	struct maple_node *next;
+	struct maple_node *node, *next;
 	void __rcu **slots = NULL;
 
-	next = mas_mn(mas);
+	next = mte_to_node(*enode);
 	do {
-		mas->node = mt_mk_node(next, next->type);
-		slots = ma_slots(next, next->type);
-		next = mas_slot_locked(mas, slots, offset);
+		*enode = ma_enode_ptr(next);
+		node = mte_to_node(*enode);
+		slots = ma_slots(node, node->type);
+		next = rcu_dereference_protected(slots[offset],
+					lock_is_held(&rcu_callback_map));
 		offset = 0;
 	} while (!ma_is_leaf(next->type));
 
 	return slots;
 }
 
+/**
+ * mt_free_walk() - Walk & free a tree in the RCU callback context
+ * @head: The RCU head that's within the node.
+ *
+ * Note: This can only be used from the RCU callback context.
+ */
 static void mt_free_walk(struct rcu_head *head)
 {
 	void __rcu **slots;
 	struct maple_node *node, *start;
-	struct maple_tree mt;
+	struct maple_enode *enode;
 	unsigned char offset;
 	enum maple_type type;
-	MA_STATE(mas, &mt, 0, 0);
 
 	node = container_of(head, struct maple_node, rcu);
 
 	if (ma_is_leaf(node->type))
 		goto free_leaf;
 
-	mt_init_flags(&mt, node->ma_flags);
-	mas_lock(&mas);
 	start = node;
-	mas.node = mt_mk_node(node, node->type);
-	slots = mas_dead_walk(&mas, 0);
-	node = mas_mn(&mas);
+	enode = mt_mk_node(node, node->type);
+	slots = mte_dead_walk(&enode, 0);
+	node = mte_to_node(enode);
 	do {
 		mt_free_bulk(node->slot_len, slots);
 		offset = node->parent_slot + 1;
-		mas.node = node->piv_parent;
-		if (mas_mn(&mas) == node)
-			goto start_slots_free;
-
-		type = mte_node_type(mas.node);
-		slots = ma_slots(mte_to_node(mas.node), type);
-		if ((offset < mt_slots[type]) && (slots[offset]))
-			slots = mas_dead_walk(&mas, offset);
-
-		node = mas_mn(&mas);
+		enode = node->piv_parent;
+		if (mte_to_node(enode) == node)
+			goto free_leaf;
+
+		type = mte_node_type(enode);
+		slots = ma_slots(mte_to_node(enode), type);
+		if ((offset < mt_slots[type]) &&
+		    rcu_dereference_protected(slots[offset],
+				lock_is_held(&rcu_callback_map)))
+			slots = mte_dead_walk(&enode, offset);
+		node = mte_to_node(enode);
 	} while ((node != start) || (node->slot_len < offset));
 
 	slots = ma_slots(node, node->type);
 	mt_free_bulk(node->slot_len, slots);
 
-start_slots_free:
-	mas_unlock(&mas);
 free_leaf:
 	mt_free_rcu(&node->rcu);
 }
 
-static inline void __rcu **mas_destroy_descend(struct ma_state *mas,
-			struct maple_enode *prev, unsigned char offset)
+static inline void __rcu **mte_destroy_descend(struct maple_enode **enode,
+	struct maple_tree *mt, struct maple_enode *prev, unsigned char offset)
 {
 	struct maple_node *node;
-	struct maple_enode *next = mas->node;
+	struct maple_enode *next = *enode;
 	void __rcu **slots = NULL;
+	enum maple_type type;
+	unsigned char next_offset = 0;
 
 	do {
-		mas->node = next;
-		node = mas_mn(mas);
-		slots = ma_slots(node, mte_node_type(mas->node));
-		next = mas_slot_locked(mas, slots, 0);
-		if ((mte_dead_node(next))) {
-			mte_to_node(next)->type = mte_node_type(next);
-			next = mas_slot_locked(mas, slots, 1);
-		}
+		*enode = next;
+		node = mte_to_node(*enode);
+		type = mte_node_type(*enode);
+		slots = ma_slots(node, type);
+		next = mt_slot_locked(mt, slots, next_offset);
+		if ((mte_dead_node(next)))
+			next = mt_slot_locked(mt, slots, ++next_offset);
 
-		mte_set_node_dead(mas->node);
-		node->type = mte_node_type(mas->node);
-		mas_clear_meta(mas, node, node->type);
+		mte_set_node_dead(*enode);
+		node->type = type;
 		node->piv_parent = prev;
 		node->parent_slot = offset;
-		offset = 0;
-		prev = mas->node;
+		offset = next_offset;
+		next_offset = 0;
+		prev = *enode;
 	} while (!mte_is_leaf(next));
 
 	return slots;
 }
 
-static void mt_destroy_walk(struct maple_enode *enode, unsigned char ma_flags,
+static void mt_destroy_walk(struct maple_enode *enode, struct maple_tree *mt,
 			    bool free)
 {
 	void __rcu **slots;
 	struct maple_node *node = mte_to_node(enode);
 	struct maple_enode *start;
-	struct maple_tree mt;
-
-	MA_STATE(mas, &mt, 0, 0);
 
-	mas.node = enode;
 	if (mte_is_leaf(enode)) {
 		node->type = mte_node_type(enode);
 		goto free_leaf;
 	}
 
-	ma_flags &= ~MT_FLAGS_LOCK_MASK;
-	mt_init_flags(&mt, ma_flags);
-	mas_lock(&mas);
-
-	mte_to_node(enode)->ma_flags = ma_flags;
 	start = enode;
-	slots = mas_destroy_descend(&mas, start, 0);
-	node = mas_mn(&mas);
+	slots = mte_destroy_descend(&enode, mt, start, 0);
+	node = mte_to_node(enode); // Updated in the above call.
 	do {
 		enum maple_type type;
 		unsigned char offset;
 		struct maple_enode *parent, *tmp;
 
-		node->type = mte_node_type(mas.node);
-		node->slot_len = mas_dead_leaves(&mas, slots, node->type);
+		node->slot_len = mte_dead_leaves(enode, mt, slots);
 		if (free)
 			mt_free_bulk(node->slot_len, slots);
 		offset = node->parent_slot + 1;
-		mas.node = node->piv_parent;
-		if (mas_mn(&mas) == node)
-			goto start_slots_free;
+		enode = node->piv_parent;
+		if (mte_to_node(enode) == node)
+			goto free_leaf;
 
-		type = mte_node_type(mas.node);
-		slots = ma_slots(mte_to_node(mas.node), type);
+		type = mte_node_type(enode);
+		slots = ma_slots(mte_to_node(enode), type);
 		if (offset >= mt_slots[type])
 			goto next;
 
-		tmp = mas_slot_locked(&mas, slots, offset);
+		tmp = mt_slot_locked(mt, slots, offset);
 		if (mte_node_type(tmp) && mte_to_node(tmp)) {
-			parent = mas.node;
-			mas.node = tmp;
-			slots = mas_destroy_descend(&mas, parent, offset);
+			parent = enode;
+			enode = tmp;
+			slots = mte_destroy_descend(&enode, mt, parent, offset);
 		}
 next:
-		node = mas_mn(&mas);
-	} while (start != mas.node);
+		node = mte_to_node(enode);
+	} while (start != enode);
 
-	node = mas_mn(&mas);
-	node->type = mte_node_type(mas.node);
-	node->slot_len = mas_dead_leaves(&mas, slots, node->type);
+	node = mte_to_node(enode);
+	node->slot_len = mte_dead_leaves(enode, mt, slots);
 	if (free)
 		mt_free_bulk(node->slot_len, slots);
 
-start_slots_free:
-	mas_unlock(&mas);
-
 free_leaf:
 	if (free)
 		mt_free_rcu(&node->rcu);
 	else
-		mas_clear_meta(&mas, node, node->type);
+		mt_clear_meta(mt, node, node->type);
 }
 
 /*
@@ -5702,10 +5706,10 @@ static inline void mte_destroy_walk(struct maple_enode *enode,
 	struct maple_node *node = mte_to_node(enode);
 
 	if (mt_in_rcu(mt)) {
-		mt_destroy_walk(enode, mt->ma_flags, false);
+		mt_destroy_walk(enode, mt, false);
 		call_rcu(&node->rcu, mt_free_walk);
 	} else {
-		mt_destroy_walk(enode, mt->ma_flags, true);
+		mt_destroy_walk(enode, mt, true);
 	}
 }
 
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:07 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-9-surenb@google.com>
Subject: [PATCH v4 08/33] mm: Enable maple tree RCU mode by default.
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, "Liam R. Howlett", Suren Baghdasaryan

From: "Liam R. Howlett"

Use the maple tree in RCU mode for VMA tracking. This is necessary for the use of per-VMA locking. RCU mode is enabled by default but disabled when exiting an mm and for the new tree during a fork.

Also enable RCU for the tree used in munmap operations to ensure the nodes remain valid for readers.

Signed-off-by: Liam R. Howlett
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm_types.h | 3 ++-
 kernel/fork.c            | 3 +++
 mm/mmap.c                | 4 +++-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 417d25c6a262..22b2ac82bffd 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -779,7 +779,8 @@ struct mm_struct {
 	unsigned long cpu_bitmap[];
 };
 
-#define MM_MT_FLAGS	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN)
+#define MM_MT_FLAGS	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN | \
+			 MT_FLAGS_USE_RCU)
 extern struct mm_struct init_mm;
 
 /* Pointer magic because the dynamic array size confuses some compilers. */
diff --git a/kernel/fork.c b/kernel/fork.c
index 0cbfdc4b509e..abfcf95734c7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -617,6 +617,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	if (retval)
 		goto out;
 
+	mt_clear_in_rcu(vmi.mas.tree);
 	for_each_vma(old_vmi, mpnt) {
 		struct file *file;
 
@@ -700,6 +701,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	retval = arch_dup_mmap(oldmm, mm);
 loop_out:
 	vma_iter_free(&vmi);
+	if (!retval)
+		mt_set_in_rcu(vmi.mas.tree);
 out:
 	mmap_write_unlock(mm);
 	flush_tlb_mm(oldmm);
diff --git a/mm/mmap.c b/mm/mmap.c
index 740b54be3ed4..c234443ee24c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2277,7 +2277,8 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	int count = 0;
 	int error = -ENOMEM;
 	MA_STATE(mas_detach, &mt_detach, 0, 0);
-	mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
+	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags &
+		      (MT_FLAGS_LOCK_MASK | MT_FLAGS_USE_RCU));
 	mt_set_external_lock(&mt_detach, &mm->mmap_lock);
 
 	/*
@@ -3042,6 +3043,7 @@ void exit_mmap(struct mm_struct *mm)
 	 */
 	set_bit(MMF_OOM_SKIP, &mm->flags);
 	mmap_write_lock(mm);
+	mt_clear_in_rcu(&mm->mm_mt);
 	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
 		      USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb);
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:08 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-10-surenb@google.com>
Subject: [PATCH v4 09/33] mm: introduce CONFIG_PER_VMA_LOCK
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

This configuration variable will be used to build the support for VMA
locking during page fault handling.

This is enabled on supported architectures with SMP and MMU set.

The architecture support is needed since the page fault handler is called
from the architecture's page faulting code, which needs modifications to
handle faults under the VMA lock.

Signed-off-by: Suren Baghdasaryan
---
 mm/Kconfig | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index ca98b2072df5..2e4a7e61768a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1211,6 +1211,18 @@ config LRU_GEN_STATS
 	  This option has a per-memcg and per-node memory overhead.
 # }
 
+config ARCH_SUPPORTS_PER_VMA_LOCK
+	def_bool n
+
+config PER_VMA_LOCK
+	def_bool y
+	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
+	help
+	  Allow per-vma locking during page fault handling.
+
+	  This feature allows locking each virtual memory area separately when
+	  handling page faults instead of taking mmap_lock.
+
 source "mm/damon/Kconfig"
 
 endmenu
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:09 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-11-surenb@google.com>
Subject: [PATCH v4 10/33] mm: rcu safe VMA freeing
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

From: Michel Lespinasse

This prepares for page fault handling under the VMA lock, looking up VMAs
under the protection of an RCU read lock instead of the usual mmap read
lock.

Signed-off-by: Michel Lespinasse
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm_types.h | 13 ++++++++++---
 kernel/fork.c            | 20 +++++++++++++++++++-
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 22b2ac82bffd..64a6b3f6b74f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -480,9 +480,16 @@ struct anon_vma_name {
 struct vm_area_struct {
 	/* The first cache line has the info for VMA tree walking. */
 
-	unsigned long vm_start;		/* Our start address within vm_mm. */
-	unsigned long vm_end;		/* The first byte after our end address
-					   within vm_mm. */
+	union {
+		struct {
+			/* VMA covers [vm_start; vm_end) addresses within mm */
+			unsigned long vm_start;
+			unsigned long vm_end;
+		};
+#ifdef CONFIG_PER_VMA_LOCK
+		struct rcu_head vm_rcu;	/* Used for deferred freeing. */
+#endif
+	};
 
 	struct mm_struct *vm_mm;	/* The address space we belong to. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index abfcf95734c7..a63b739aeca9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -479,12 +479,30 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	return new;
 }
 
-void vm_area_free(struct vm_area_struct *vma)
+static void __vm_area_free(struct vm_area_struct *vma)
 {
 	free_anon_vma_name(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+static void vm_area_free_rcu_cb(struct rcu_head *head)
+{
+	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
+						  vm_rcu);
+	__vm_area_free(vma);
+}
+#endif
+
+void vm_area_free(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_PER_VMA_LOCK
+	call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb);
+#else
+	__vm_area_free(vma);
+#endif
+}
+
 static void account_kernel_stack(struct task_struct *tsk, int account)
 {
 	if (IS_ENABLED(CONFIG_VMAP_STACK)) {
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:10 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-12-surenb@google.com>
Subject: [PATCH v4 11/33] mm: move mmap_lock assert function definitions
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

Move mmap_lock assert function definitions up so that they can be used by
other mmap_lock routines.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mmap_lock.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 96e113e23d04..e49ba91bb1f0 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -60,6 +60,18 @@ static inline void __mmap_lock_trace_released(struct mm_struct *mm, bool write)
 
 #endif /* CONFIG_TRACING */
 
+static inline void mmap_assert_locked(struct mm_struct *mm)
+{
+	lockdep_assert_held(&mm->mmap_lock);
+	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+}
+
+static inline void mmap_assert_write_locked(struct mm_struct *mm)
+{
+	lockdep_assert_held_write(&mm->mmap_lock);
+	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+}
+
 static inline void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
@@ -150,18 +162,6 @@ static inline void mmap_read_unlock_non_owner(struct mm_struct *mm)
 	up_read_non_owner(&mm->mmap_lock);
 }
 
-static inline void mmap_assert_locked(struct mm_struct *mm)
-{
-	lockdep_assert_held(&mm->mmap_lock);
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
-}
-
-static inline void mmap_assert_write_locked(struct mm_struct *mm)
-{
-	lockdep_assert_held_write(&mm->mmap_lock);
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
-}
-
 static inline int mmap_lock_is_contended(struct mm_struct *mm)
 {
 	return rwsem_is_contended(&mm->mmap_lock);
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:11 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-13-surenb@google.com>
Subject: [PATCH v4 12/33] mm: add per-VMA lock and helper functions to control it
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

Introduce per-VMA locking.
The lock implementation relies on per-VMA and per-mm sequence counters to
track exclusive locking:
  - read lock (implemented by vma_start_read) requires the VMA
    (vm_lock_seq) and mm (mm_lock_seq) sequence counters to differ. If they
    match, a VMA exclusive lock must be held somewhere.
  - read unlock (implemented by vma_end_read) is a trivial vma->lock unlock.
  - write lock (vma_start_write) requires the mmap_lock to be held
    exclusively and assigns the current mm counter to the VMA counter. This
    allows multiple VMAs to be locked under a single mmap_lock write lock
    (e.g. during VMA merging). The VMA counter is modified under the
    exclusive VMA lock.
  - write unlock (vma_end_write_all) is a batch release of all VMA locks
    held. It doesn't pair with a specific vma_start_write! It is done before
    the exclusive mmap_lock is released, by incrementing the mm sequence
    counter (mm_lock_seq).
  - write downgrade: if the mmap_lock is downgraded to the read lock, all
    VMA write locks are released as well (effectively the same as write
    unlock).

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h        | 82 +++++++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h  |  8 ++++
 include/linux/mmap_lock.h | 13 +++++++
 kernel/fork.c             |  4 ++
 mm/init-mm.c              |  3 ++
 5 files changed, 110 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1f79667824eb..bbad5d4fa81b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -623,6 +623,87 @@ struct vm_operations_struct {
 					  unsigned long addr);
 };
 
+#ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_init_lock(struct vm_area_struct *vma)
+{
+	init_rwsem(&vma->lock);
+	vma->vm_lock_seq = -1;
+}
+
+/*
+ * Try to read-lock a vma. The function is allowed to occasionally yield false
+ * locked result to avoid performance overhead, in which case we fall back to
+ * using mmap_lock. The function should never yield false unlocked result.
+ */
+static inline bool vma_start_read(struct vm_area_struct *vma)
+{
+	/* Check before locking. A race might cause false locked result. */
+	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
+		return false;
+
+	if (unlikely(down_read_trylock(&vma->lock) == 0))
+		return false;
+
+	/*
+	 * Overflow might produce false locked result.
+	 * False unlocked result is impossible because we modify and check
+	 * vma->vm_lock_seq under vma->lock protection and mm->mm_lock_seq
+	 * modification invalidates all existing locks.
+	 */
+	if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
+		up_read(&vma->lock);
+		return false;
+	}
+	return true;
+}
+
+static inline void vma_end_read(struct vm_area_struct *vma)
+{
+	rcu_read_lock(); /* keeps vma alive till the end of up_read */
+	up_read(&vma->lock);
+	rcu_read_unlock();
+}
+
+static inline void vma_start_write(struct vm_area_struct *vma)
+{
+	int mm_lock_seq;
+
+	mmap_assert_write_locked(vma->vm_mm);
+
+	/*
+	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
+	 * mm->mm_lock_seq can't be concurrently modified.
+	 */
+	mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
+	if (vma->vm_lock_seq == mm_lock_seq)
+		return;
+
+	down_write(&vma->lock);
+	vma->vm_lock_seq = mm_lock_seq;
+	up_write(&vma->lock);
+}
+
+static inline void vma_assert_write_locked(struct vm_area_struct *vma)
+{
+	mmap_assert_write_locked(vma->vm_mm);
+	/*
+	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
+	 * mm->mm_lock_seq can't be concurrently modified.
+	 */
+	VM_BUG_ON_VMA(vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq), vma);
+}
+
+#else /* CONFIG_PER_VMA_LOCK */
+
+static inline void vma_init_lock(struct vm_area_struct *vma) {}
+static inline bool vma_start_read(struct vm_area_struct *vma)
+		{ return false; }
+static inline void vma_end_read(struct vm_area_struct *vma) {}
+static inline void vma_start_write(struct vm_area_struct *vma) {}
+static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
+
+#endif /* CONFIG_PER_VMA_LOCK */
+
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	static const struct vm_operations_struct dummy_vm_ops = {};
@@ -631,6 +712,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
+	vma_init_lock(vma);
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 64a6b3f6b74f..a4e7493bacd7 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -508,6 +508,11 @@ struct vm_area_struct {
 		vm_flags_t __private __vm_flags;
 	};
 
+#ifdef CONFIG_PER_VMA_LOCK
+	int vm_lock_seq;
+	struct rw_semaphore lock;
+#endif
+
 	/*
 	 * For areas with an address space and backing store,
 	 * linkage into the address_space->i_mmap interval tree.
@@ -644,6 +649,9 @@ struct mm_struct {
 					  * init_mm.mmlist, and are protected
 					  * by mmlist_lock
 					  */
+#ifdef CONFIG_PER_VMA_LOCK
+		int mm_lock_seq;
+#endif
 
 
 		unsigned long hiwater_rss; /* High-watermark of RSS usage */
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index e49ba91bb1f0..aab8f1b28d26 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -72,6 +72,17 @@ static inline void mmap_assert_write_locked(struct mm_struct *mm)
 	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_end_write_all(struct mm_struct *mm)
+{
+	mmap_assert_write_locked(mm);
+	/* No races during update due to exclusive mmap_lock being held */
+	WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
+}
+#else
+static inline void vma_end_write_all(struct mm_struct *mm) {}
+#endif
+
 static inline void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
@@ -114,12 +125,14 @@ static inline bool mmap_write_trylock(struct mm_struct *mm)
 static inline void mmap_write_unlock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_released(mm, true);
+	vma_end_write_all(mm);
 	up_write(&mm->mmap_lock);
 }
 
 static inline void mmap_write_downgrade(struct mm_struct *mm)
 {
 	__mmap_lock_trace_acquire_returned(mm, false, true);
+	vma_end_write_all(mm);
 	downgrade_write(&mm->mmap_lock);
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index a63b739aeca9..e1dd79c7738c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -474,6 +474,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 		 */
 		data_race(memcpy(new, orig, sizeof(*new)));
 		INIT_LIST_HEAD(&new->anon_vma_chain);
+		vma_init_lock(new);
 		dup_anon_vma_name(orig, new);
 	}
 	return new;
@@ -1216,6 +1217,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	seqcount_init(&mm->write_protect_seq);
 	mmap_init_lock(mm);
 	INIT_LIST_HEAD(&mm->mmlist);
+#ifdef CONFIG_PER_VMA_LOCK
+	mm->mm_lock_seq = 0;
+#endif
 	mm_pgtables_bytes_init(mm);
 	mm->map_count = 0;
 	mm->locked_vm = 0;
diff --git a/mm/init-mm.c b/mm/init-mm.c
index c9327abb771c..33269314e060 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -37,6 +37,9 @@ struct mm_struct init_mm = {
 	.page_table_lock = __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
 	.arg_lock	= __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
 	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
+#ifdef CONFIG_PER_VMA_LOCK
+	.mm_lock_seq	= 0,
+#endif
 	.user_ns	= &init_user_ns,
 	.cpu_bitmap	= CPU_BITS_NONE,
 #ifdef CONFIG_IOMMU_SVA
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:12 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-14-surenb@google.com>
Subject: [PATCH v4 13/33] mm: mark VMA as being written when changing vm_flags
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan

Updates to vm_flags must be done with the VMA marked as being written,
to prevent concurrent page faults or other modifications.
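The marking scheme can be made concrete with a minimal single-threaded userspace model in C. This is a sketch under stated assumptions, not kernel code. The model_* names are hypothetical, the per-VMA rw_semaphore is omitted, and the model assumes that vma_start_write() (defined elsewhere in this series, not shown here) copies the current mm->mm_lock_seq into a per-VMA field, called vm_lock_seq here. Under that assumption, the single increment that vma_end_write_all() performs in mmap_write_unlock() releases every VMA marked during the write-locked section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for mm_struct and vm_area_struct. */
struct model_mm {
	bool mmap_write_locked;	/* models holding mmap_lock for write */
	int mm_lock_seq;	/* bumped by the vma_end_write_all() model */
};

struct model_vma {
	struct model_mm *mm;
	int vm_lock_seq;	/* -1 means "never marked" in this model */
	unsigned long vm_flags;
};

/* A VMA counts as write-locked while its sequence matches the mm's. */
static bool model_vma_is_write_locked(const struct model_vma *vma)
{
	return vma->vm_lock_seq == vma->mm->mm_lock_seq;
}

/* Assumed behaviour of vma_start_write(), minus the per-VMA rwsem. */
static void model_vma_start_write(struct model_vma *vma)
{
	assert(vma->mm->mmap_write_locked); /* like mmap_assert_write_locked() */
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
}

/* Same shape as vm_flags_set() after this patch: mark first, then update. */
static void model_vm_flags_set(struct model_vma *vma, unsigned long flags)
{
	model_vma_start_write(vma);
	vma->vm_flags |= flags;
}

/* Same shape as mmap_write_unlock(): one increment unmarks every VMA. */
static void model_mmap_write_unlock(struct model_mm *mm)
{
	assert(mm->mmap_write_locked);
	mm->mm_lock_seq++;	/* vma_end_write_all() */
	mm->mmap_write_locked = false;
}
```

No VMA is ever explicitly unmarked in the model. A marked VMA stops counting as write-locked once the mm sequence moves on, so the unlock path does no per-VMA work.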
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bbad5d4fa81b..3d5e8666892d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -726,28 +726,28 @@ static inline void vm_flags_init(struct vm_area_struct *vma,
 static inline void vm_flags_reset(struct vm_area_struct *vma,
 				  vm_flags_t flags)
 {
-	mmap_assert_write_locked(vma->vm_mm);
+	vma_start_write(vma);
 	vm_flags_init(vma, flags);
 }
 
 static inline void vm_flags_reset_once(struct vm_area_struct *vma,
 				       vm_flags_t flags)
 {
-	mmap_assert_write_locked(vma->vm_mm);
+	vma_start_write(vma);
 	WRITE_ONCE(ACCESS_PRIVATE(vma, __vm_flags), flags);
 }
 
 static inline void vm_flags_set(struct vm_area_struct *vma,
 				vm_flags_t flags)
 {
-	mmap_assert_write_locked(vma->vm_mm);
+	vma_start_write(vma);
 	ACCESS_PRIVATE(vma, __vm_flags) |= flags;
 }
 
 static inline void vm_flags_clear(struct vm_area_struct *vma,
 				  vm_flags_t flags)
 {
-	mmap_assert_write_locked(vma->vm_mm);
+	vma_start_write(vma);
 	ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
 }
 
@@ -768,7 +768,7 @@ static inline void __vm_flags_mod(struct vm_area_struct *vma,
 static inline void vm_flags_mod(struct vm_area_struct *vma,
 				vm_flags_t set, vm_flags_t clear)
 {
-	mmap_assert_write_locked(vma->vm_mm);
+	vma_start_write(vma);
 	__vm_flags_mod(vma, set, clear);
 }
 
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:13 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-15-surenb@google.com>
Subject: [PATCH v4 14/33] mm/mmap: move vma_prepare before vma_adjust_trans_huge
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan

vma_prepare() acquires all locks required before VMA modifications.
Move vma_prepare() before vma_adjust_trans_huge() so that the VMA is
locked before any modification.

Signed-off-by: Suren Baghdasaryan
---
 mm/mmap.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index c234443ee24c..92893d86c0af 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -683,12 +683,12 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	if (vma_iter_prealloc(vmi))
 		goto nomem;
 
+	vma_prepare(&vp);
 	vma_adjust_trans_huge(vma, start, end, 0);
 	/* VMA iterator points to previous, so set to start if necessary */
 	if (vma_iter_addr(vmi) != start)
 		vma_iter_set(vmi, start);
 
-	vma_prepare(&vp);
 	vma->vm_start = start;
 	vma->vm_end = end;
 	vma->vm_pgoff = pgoff;
@@ -723,8 +723,8 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		return -ENOMEM;
 
 	init_vma_prep(&vp, vma);
-	vma_adjust_trans_huge(vma, start, end, 0);
 	vma_prepare(&vp);
+	vma_adjust_trans_huge(vma, start, end, 0);
 
 	if (vma->vm_start < start)
 		vma_iter_clear(vmi, vma->vm_start, start);
@@ -994,12 +994,12 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 	if (vma_iter_prealloc(vmi))
 		return NULL;
 
-	vma_adjust_trans_huge(vma, vma_start, vma_end, adj_next);
 	init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
 	VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
 		   vp.anon_vma != adjust->anon_vma);
 
 	vma_prepare(&vp);
+	vma_adjust_trans_huge(vma, vma_start, vma_end, adj_next);
 	if (vma_start < vma->vm_start || vma_end > vma->vm_end)
 		vma_expanded = true;
 
@@ -2198,10 +2198,10 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	if (new->vm_ops && new->vm_ops->open)
 		new->vm_ops->open(new);
 
-	vma_adjust_trans_huge(vma, vma->vm_start, addr, 0);
 	init_vma_prep(&vp, vma);
 	vp.insert = new;
 	vma_prepare(&vp);
+	vma_adjust_trans_huge(vma, vma->vm_start, addr, 0);
 
 	if (new_below) {
 		vma->vm_start = addr;
@@ -2910,9 +2910,9 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	if (vma_iter_prealloc(vmi))
 		goto unacct_fail;
 
-	vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
 	init_vma_prep(&vp, vma);
 	vma_prepare(&vp);
+	vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
 	vma->vm_end = addr + len;
 	vm_flags_set(vma, VM_SOFTDIRTY);
 	vma_iter_store(vmi, vma);
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:14 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-16-surenb@google.com>
Subject: [PATCH v4 15/33] mm/khugepaged: write-lock VMA while collapsing a huge page
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan

Protect the VMA from concurrent page fault handlers while collapsing a
huge page. The page fault handler needs a stable PMD to use the PTL and
relies on the per-VMA lock to prevent concurrent PMD changes.
pmdp_collapse_flush(), set_huge_pmd() and collapse_and_free_pmd() can
modify a PMD, and without proper locking a page fault handler would not
detect that change.

Before this patch, page tables could be walked under any one of the
mmap_lock, the mapping lock, and the anon_vma lock; so when khugepaged
unlinks and frees page tables, it must ensure that all of those either
are locked or don't exist. This patch adds a fourth lock under which
page tables can be traversed, so khugepaged must also lock out that one.
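The fallback behaviour this relies on can be sketched with a similar model. In this hypothetical single-threaded C sketch, a fault first tries a per-VMA read lock and falls back to the mmap_lock path when the VMA is marked for writing. The collapse path marks the VMA before touching the PMD, as collapse_huge_page() now does with vma_start_write(). The reader check, the readers counter, the single pmd field, and the result enum are illustrative assumptions. The real per-VMA read path is introduced elsewhere in the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of a fault attempt in this model. */
enum fault_result { FAULT_DONE_PER_VMA, FAULT_RETRY_MMAP_LOCK };

struct model_mm { int mm_lock_seq; };

struct model_vma {
	struct model_mm *mm;
	int vm_lock_seq;	/* equals mm_lock_seq while marked for writing */
	int readers;		/* models the per-VMA rwsem held for read */
	unsigned long pmd;	/* stand-in for the PMD the fault path reads */
};

/* Reader side: refuse the VMA if a writer marked it in this cycle. */
static bool model_vma_start_read(struct model_vma *vma)
{
	if (vma->vm_lock_seq == vma->mm->mm_lock_seq)
		return false;
	vma->readers++;
	return true;
}

static void model_vma_end_read(struct model_vma *vma)
{
	vma->readers--;
}

/* Writer side: the real helper is assumed to wait for readers; we only check. */
static void model_vma_start_write(struct model_vma *vma)
{
	assert(vma->readers == 0);
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
}

static enum fault_result model_fault(struct model_vma *vma, unsigned long *seen_pmd)
{
	if (!model_vma_start_read(vma))
		return FAULT_RETRY_MMAP_LOCK;	/* fall back to the mmap_lock path */
	*seen_pmd = vma->pmd;			/* PMD is stable while read-locked */
	model_vma_end_read(vma);
	return FAULT_DONE_PER_VMA;
}

/* Collapse: mark the VMA first, then change the PMD. */
static void model_collapse(struct model_vma *vma, unsigned long new_pmd)
{
	model_vma_start_write(vma);
	vma->pmd = new_pmd;
}
```

In the model, a fault either completes before the collapse marks the VMA or is sent to the mmap_lock path until the mm sequence advances. The per-VMA path never reads a PMD that the collapse is still allowed to change.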
Signed-off-by: Suren Baghdasaryan
---
 mm/khugepaged.c |  5 +++++
 mm/rmap.c       | 31 ++++++++++++++++---------------
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 941d1c7ea910..c64e01f03f27 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1147,6 +1147,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 
+	vma_start_write(vma);
 	anon_vma_lock_write(vma->anon_vma);
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
@@ -1614,6 +1615,9 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		goto drop_hpage;
 	}
 
+	/* Lock the vma before taking i_mmap and page table locks */
+	vma_start_write(vma);
+
 	/*
 	 * We need to lock the mapping so that from here on, only GUP-fast and
 	 * hardware page walks can access the parts of the page tables that
@@ -1819,6 +1823,7 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 				result = SCAN_PTE_UFFD_WP;
 				goto unlock_next;
 			}
+			vma_start_write(vma);
 			collapse_and_free_pmd(mm, vma, addr, pmd);
 			if (!cc->is_khugepaged && is_target)
 				result = set_huge_pmd(vma, addr, pmd, hpage);
diff --git a/mm/rmap.c b/mm/rmap.c
index 8632e02661ac..cfdaa56cad3e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -25,21 +25,22 @@
  *     mapping->invalidate_lock (in filemap_fault)
  *       page->flags PG_locked (lock_page)
  *         hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share, see hugetlbfs below)
- *           mapping->i_mmap_rwsem
- *             anon_vma->rwsem
- *               mm->page_table_lock or pte_lock
- *                 swap_lock (in swap_duplicate, swap_info_get)
- *                   mmlist_lock (in mmput, drain_mmlist and others)
- *                   mapping->private_lock (in block_dirty_folio)
- *                     folio_lock_memcg move_lock (in block_dirty_folio)
- *                       i_pages lock (widely used)
- *                         lruvec->lru_lock (in folio_lruvec_lock_irq)
- *                   inode->i_lock (in set_page_dirty's __mark_inode_dirty)
- *                   bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
- *                     sb_lock (within inode_lock in fs/fs-writeback.c)
- *                     i_pages lock (widely used, in set_page_dirty,
- *                               in arch-dependent flush_dcache_mmap_lock,
- *                               within bdi.wb->list_lock in __sync_single_inode)
+ *           vma_start_write
+ *             mapping->i_mmap_rwsem
+ *               anon_vma->rwsem
+ *                 mm->page_table_lock or pte_lock
+ *                   swap_lock (in swap_duplicate, swap_info_get)
+ *                     mmlist_lock (in mmput, drain_mmlist and others)
+ *                     mapping->private_lock (in block_dirty_folio)
+ *                       folio_lock_memcg move_lock (in block_dirty_folio)
+ *                         i_pages lock (widely used)
+ *                           lruvec->lru_lock (in folio_lruvec_lock_irq)
+ *                     inode->i_lock (in set_page_dirty's __mark_inode_dirty)
+ *                     bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
+ *                       sb_lock (within inode_lock in fs/fs-writeback.c)
+ *                       i_pages lock (widely used, in set_page_dirty,
+ *                                 in arch-dependent flush_dcache_mmap_lock,
+ *                                 within bdi.wb->list_lock in __sync_single_inode)
  *
  * anon_vma->rwsem,mapping->i_mmap_rwsem   (memory_failure, collect_procs_anon)
  *   ->tasklist_lock
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:15 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-17-surenb@google.com>
Subject: [PATCH v4 16/33] mm/mmap: write-lock VMAs in vma_prepare before modifying them
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan

Write-lock all VMAs that might be affected by a merge, split, expand, or
shrink operation. All of these operations call vma_prepare() before
making modifications, so it provides a centralized place to perform VMA
locking.
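The shape of the change to vma_prepare() can be modeled on its own. This hypothetical C sketch mirrors the struct vma_prepare fields that the patch touches (vma, adj_next, insert, remove, remove2) and reduces the write lock to a flag. It shows one helper marking every pre-existing VMA that an operation might touch while skipping the freshly allocated insert.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical VMA stand-in: the write lock is reduced to a flag. */
struct model_vma { int write_locked; };

/* Hypothetical mirror of the struct vma_prepare fields used in the patch. */
struct model_vma_prepare {
	struct model_vma *vma;
	struct model_vma *adj_next;
	struct model_vma *insert;	/* newly created, not yet reachable */
	struct model_vma *remove;
	struct model_vma *remove2;
};

static void model_vma_start_write(struct model_vma *vma)
{
	vma->write_locked = 1;
}

/* Same shape as the hunk added to vma_prepare(): mark every existing VMA. */
static void model_vma_prepare(struct model_vma_prepare *vp)
{
	if (vp->vma)
		model_vma_start_write(vp->vma);
	if (vp->adj_next)
		model_vma_start_write(vp->adj_next);
	/* vp->insert is skipped on purpose: no other thread can reach it yet. */
	if (vp->remove)
		model_vma_start_write(vp->remove);
	if (vp->remove2)
		model_vma_start_write(vp->remove2);
}
```

vma_expand(), vma_shrink(), vma_merge(), __split_vma() and do_brk_flags() all call vma_prepare() in this series. Marking inside vma_prepare() therefore covers each of those paths without editing them individually.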
Signed-off-by: Suren Baghdasaryan
---
 mm/mmap.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 92893d86c0af..e73fbb84ce12 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -502,6 +502,16 @@ static inline void init_vma_prep(struct vma_prepare *vp,
  */
 static inline void vma_prepare(struct vma_prepare *vp)
 {
+	if (vp->vma)
+		vma_start_write(vp->vma);
+	if (vp->adj_next)
+		vma_start_write(vp->adj_next);
+	/* vp->insert is always a newly created VMA, no need for locking */
+	if (vp->remove)
+		vma_start_write(vp->remove);
+	if (vp->remove2)
+		vma_start_write(vp->remove2);
+
 	if (vp->file) {
 		uprobe_munmap(vp->vma, vp->vma->vm_start, vp->vma->vm_end);
 
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:16 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-18-surenb@google.com>
Subject: [PATCH v4 17/33] mm/mremap: write-lock VMA while remapping it to a new address range
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan, Laurent Dufour

Write-lock the VMA before copying it, and write-lock the new VMA that
copy_vma() produces.
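The ordering in move_vma() and copy_vma() can be illustrated with a small model. The sketch below is hypothetical userspace C. It assumes that vma_init_lock(), which vm_area_dup() now calls after its memcpy(), resets the copy to an unlocked state, modeled as vm_lock_seq = -1. Under that assumption, the copy has to be marked explicitly with vma_start_write() before vma_link() makes it visible.

```c
#include <assert.h>
#include <string.h>

struct model_mm { int mm_lock_seq; };

/* Hypothetical VMA stand-in with an address range and a lock sequence. */
struct model_vma {
	struct model_mm *mm;
	unsigned long start, end;
	int vm_lock_seq;
};

static int model_is_write_locked(const struct model_vma *v)
{
	return v->vm_lock_seq == v->mm->mm_lock_seq;
}

static void model_vma_start_write(struct model_vma *v)
{
	v->vm_lock_seq = v->mm->mm_lock_seq;
}

/* Assumed behaviour of vma_init_lock(): start out not write-locked. */
static void model_vma_init_lock(struct model_vma *v)
{
	v->vm_lock_seq = -1;
}

/* Like vm_area_dup(): copy everything, then reinitialise the lock state. */
static void model_vm_area_dup(struct model_vma *new_vma, const struct model_vma *orig)
{
	memcpy(new_vma, orig, sizeof(*new_vma));
	model_vma_init_lock(new_vma);
}

/* Like move_vma() + copy_vma(): mark the source, copy, mark the copy. */
static void model_move(struct model_vma *src, struct model_vma *dst,
		       unsigned long new_start)
{
	model_vma_start_write(src);
	model_vm_area_dup(dst, src);
	dst->end = new_start + (src->end - src->start);
	dst->start = new_start;
	model_vma_start_write(dst);	/* must happen before the copy is linked */
}
```

Without the reset in vm_area_dup(), the memcpy() would also copy the source's lock state. With the reset, the copy starts unlocked and is marked only by the explicit call.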
Signed-off-by: Suren Baghdasaryan
Reviewed-by: Laurent Dufour
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/mmap.c   | 1 +
 mm/mremap.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index e73fbb84ce12..1f42b9a52b9b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3189,6 +3189,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 			get_file(new_vma->vm_file);
 		if (new_vma->vm_ops && new_vma->vm_ops->open)
 			new_vma->vm_ops->open(new_vma);
+		vma_start_write(new_vma);
 		if (vma_link(mm, new_vma))
 			goto out_vma_link;
 		*need_rmap_locks = false;
diff --git a/mm/mremap.c b/mm/mremap.c
index 1ddf7beb62e9..327c38eb132e 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -623,6 +623,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 		return -ENOMEM;
 	}
 
+	vma_start_write(vma);
 	new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT);
 	new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff,
 			   &need_rmap_locks);
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
09:37:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=r0m7Mx7yExEfpE40BLjkT9H7Bv79s8byLhYJ0wRxTC4=; b=J6HTSW3ije3IDV45f4imLJua2F9c9fG/LezyaDIF0k2VV4fuwgH+fku2QXSjWP2auL d+6+O+xNSm6Xj7FlWbcn2IucG0PD3/gJW2mH/MURmde3Q+9WMeczT04Csr++aQNQAB4v uOclz5rts/w0oMZxgj6JdiddKEJJwswsgJAAZvi48OAzGBi1BuSBhTNF8+d2d3+YbCDm fwaoPEsSaG9C8KiMPsrgpeVhqU2cm1pHUxTYDzrWvN6667+vmOEZ293x3rsB2i9qTzir LhGya0SRahx81MeDy4vFauftQ8Z+i+VjBO+uhvPnINIKKKPeRWHZkWIWZS9shPaay/sM /HGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=r0m7Mx7yExEfpE40BLjkT9H7Bv79s8byLhYJ0wRxTC4=; b=5uYSAYKwD5doJXYhic5PCBoDqbu70mY5a36+UV34OOTYxNhLjtbowNKbbNyt7SX6Zp Hrswm12lbMtbln6IyqLKNshrhvhypHs5hnk+PSUa2LPaCjVhbopMjrzNetsaGayZ/RCV svHV5HwQRwc7zQHAsK86AmtJJLTmbJ/OB9z3YNN1M5gPtB/Q5hLhAwScm+/53Oy2jBE/ buk75JMs16VObL87jxMYDjzXKgdygYDsirawymQC49cDHcfJkjA9bguhu273ldh1XM6g lx3LRGqlB1DeBkmPMROn3/jDKV1wLdhc4KHtgaBsJ3GqKDyIK5vK4IUsbv+rMPle+Ty0 IxyQ== X-Gm-Message-State: AO0yUKU1hEQvZ6J/paSSYmBWR4TNfWXwULDhNcIEN235UxqVfVmQQKYT jW/nHdvkf0I3DuOzbKu/bJaqqQzkb0A= X-Google-Smtp-Source: AK7set8EV36Dr/ESiL80fu1d1SaW9bmdiDECZfWEbGGMOjKuqTD7GEuVWa5jynsNqQ2pFS6J79ENwuY0pY0= X-Received: from surenb-desktop.mtv.corp.google.com ([2620:15c:211:200:e1f6:21d1:eead:3897]) (user=surenb job=sendgmr) by 2002:a25:938e:0:b0:a60:c167:c056 with SMTP id a14-20020a25938e000000b00a60c167c056mr4911453ybm.9.1677519440118; Mon, 27 Feb 2023 09:37:20 -0800 (PST) Date: Mon, 27 Feb 2023 09:36:17 -0800 In-Reply-To: <20230227173632.3292573-1-surenb@google.com> Mime-Version: 1.0 References: <20230227173632.3292573-1-surenb@google.com> X-Mailer: git-send-email 2.39.2.722.g9855ee24e9-goog Message-ID: 
<20230227173632.3292573-19-surenb@google.com> Subject: [PATCH v4 18/33] mm: write-lock VMAs before removing them from VMA tree From: Suren Baghdasaryan To: akpm@linux-foundation.org Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Write-locking VMAs before isolating them ensures that page fault handlers don't operate on isolated VMAs. 
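The ordering can be illustrated with a small single-threaded userspace model. This is a hypothetical sketch, not kernel code. The *_model names, the fixed-size array standing in for the VMA tree, and the sequence-number write lock are all assumptions. The write lock is modelled on the vm_lock_seq/mm_lock_seq comparison that vma_assert_write_locked() performs elsewhere in the series. Where the real rw_semaphore would block until readers leave, this model asserts that there are none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of an mm and its VMAs; not kernel code. */
struct mm_model {
	int mm_lock_seq;		/* a VMA is write-locked while its seq equals this */
	bool mmap_write_locked;		/* stand-in for holding mmap_lock for writing */
};

struct vma_model {
	struct mm_model *mm;
	int vm_lock_seq;
	int readers;			/* stand-in for the per-VMA rw_semaphore */
};

#define TREE_SLOTS 4
static struct vma_model *tree[TREE_SLOTS];	/* stand-in for the VMA tree */

static void vma_init_model(struct vma_model *v, struct mm_model *mm)
{
	v->mm = mm;
	v->vm_lock_seq = -1;		/* not write-locked in any cycle */
	v->readers = 0;
}

static void vma_start_write_model(struct vma_model *v)
{
	assert(v->mm->mmap_write_locked);
	if (v->vm_lock_seq == v->mm->mm_lock_seq)
		return;			/* already write-locked in this cycle */
	assert(v->readers == 0);	/* a real rwsem would wait for readers */
	v->vm_lock_seq = v->mm->mm_lock_seq;
}

/* Page-fault side: refuse a VMA that is currently write-locked. */
static bool vma_start_read_model(struct vma_model *v)
{
	if (v->vm_lock_seq == v->mm->mm_lock_seq)
		return false;
	v->readers++;
	return true;
}

static void vma_end_read_model(struct vma_model *v)
{
	assert(v->readers > 0);
	v->readers--;
}

/* Isolate in the order the patch uses: write-lock first, then remove. */
static void isolate_vma_model(struct vma_model *v)
{
	vma_start_write_model(v);
	for (size_t i = 0; i < TREE_SLOTS; i++)
		if (tree[i] == v)
			tree[i] = NULL;
}
```

After isolate_vma_model() has run, vma_start_read_model() on the isolated VMA fails until the writer advances mm_lock_seq. As a result, a modelled fault handler cannot start working on a VMA that is being taken out of the tree.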
Signed-off-by: Suren Baghdasaryan
---
 mm/mmap.c  | 1 +
 mm/nommu.c | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 1f42b9a52b9b..f7ed357056c4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2255,6 +2255,7 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 static inline int munmap_sidetree(struct vm_area_struct *vma,
 				   struct ma_state *mas_detach)
 {
+	vma_start_write(vma);
 	mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);
 	if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
 		return -ENOMEM;
diff --git a/mm/nommu.c b/mm/nommu.c
index 57ba243c6a37..2ab162d773e2 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -588,6 +588,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma)
 		       current->pid);
 		return -ENOMEM;
 	}
+	vma_start_write(vma);
 	cleanup_vma_from_mm(vma);
 
 	/* remove from the MM's tree and list */
@@ -1519,6 +1520,10 @@ void exit_mmap(struct mm_struct *mm)
 	 */
 	mmap_write_lock(mm);
 	for_each_vma(vmi, vma) {
+		/*
+		 * No need to lock VMA because this is the only mm user and no
+		 * page fault handler can race with it.
+		 */
 		cleanup_vma_from_mm(vma);
 		delete_vma(mm, vma);
 		cond_resched();
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:18 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-20-surenb@google.com>
Subject: [PATCH v4 19/33] mm: conditionally write-lock VMA in free_pgtables
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

Normally free_pgtables needs to lock affected VMAs, except when the VMAs
were isolated under the VMA write-lock. munmap() does exactly that: it
isolates VMAs while holding the appropriate locks, then downgrades
mmap_lock and drops the per-VMA locks before freeing page tables.
Add an mm_wr_locked parameter to free_pgtables for this scenario.

Signed-off-by: Suren Baghdasaryan
---
 mm/internal.h | 2 +-
 mm/memory.c   | 6 +++++-
 mm/mmap.c     | 5 +++--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 08ce56dbb1d9..fce94775819c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -105,7 +105,7 @@ void folio_activate(struct folio *folio);
 
 void free_pgtables(struct mmu_gather *tlb, struct maple_tree *mt,
 		   struct vm_area_struct *start_vma, unsigned long floor,
-		   unsigned long ceiling);
+		   unsigned long ceiling, bool mm_wr_locked);
 void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte);
 
 struct zap_details;
diff --git a/mm/memory.c b/mm/memory.c
index bfa3100ec5a3..f7f412833e42 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -348,7 +348,7 @@ void free_pgd_range(struct mmu_gather *tlb,
 
 void free_pgtables(struct mmu_gather *tlb, struct maple_tree *mt,
 		   struct vm_area_struct *vma, unsigned long floor,
-		   unsigned long ceiling)
+		   unsigned long ceiling, bool mm_wr_locked)
 {
 	MA_STATE(mas, mt, vma->vm_end, vma->vm_end);
 
@@ -366,6 +366,8 @@ void free_pgtables(struct mmu_gather *tlb, struct maple_tree *mt,
 		 * Hide vma from rmap and truncate_pagecache before freeing
 		 * pgtables
 		 */
+		if (mm_wr_locked)
+			vma_start_write(vma);
 		unlink_anon_vmas(vma);
 		unlink_file_vma(vma);
 
@@ -380,6 +382,8 @@ void free_pgtables(struct mmu_gather *tlb, struct maple_tree *mt,
 			       && !is_vm_hugetlb_page(next)) {
 				vma = next;
 				next = mas_find(&mas, ceiling - 1);
+				if (mm_wr_locked)
+					vma_start_write(vma);
 				unlink_anon_vmas(vma);
 				unlink_file_vma(vma);
 			}
diff --git a/mm/mmap.c b/mm/mmap.c
index f7ed357056c4..ec745586785c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2152,7 +2152,8 @@ static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, mt, vma, start, end, mm_wr_locked);
 	free_pgtables(&tlb, mt, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
-				 next ? next->vm_start : USER_PGTABLES_CEILING);
+				 next ? next->vm_start : USER_PGTABLES_CEILING,
+				 mm_wr_locked);
 	tlb_finish_mmu(&tlb);
 }
 
@@ -3056,7 +3057,7 @@ void exit_mmap(struct mm_struct *mm)
 	mmap_write_lock(mm);
 	mt_clear_in_rcu(&mm->mm_mt);
 	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
-		      USER_PGTABLES_CEILING);
+		      USER_PGTABLES_CEILING, true);
 	tlb_finish_mmu(&tlb);
 
 	/*
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:19 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-21-surenb@google.com>
Subject: [PATCH v4 20/33] kernel/fork: assert no VMA readers during its destruction
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

Assert that there are no holders of the VMA lock for reading when the VMA
is about to be destroyed.
Signed-off-by: Suren Baghdasaryan
---
 kernel/fork.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/fork.c b/kernel/fork.c
index e1dd79c7738c..bdb55f25895d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -491,6 +491,9 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 {
 	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
 						  vm_rcu);
+
+	/* The vma should not be locked while being destroyed. */
+	VM_BUG_ON_VMA(rwsem_is_locked(&vma->lock), vma);
 	__vm_area_free(vma);
 }
 #endif
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:20 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-22-surenb@google.com>
Subject: [PATCH v4 21/33] mm/mmap: prevent pagefault handler from racing with mmu_notifier registration
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

Page fault handlers might need to fire MMU notifications while a new
notifier is being registered. Modify mm_take_all_locks to write-lock all
VMAs and prevent this race with page fault handlers that would hold VMA
locks. VMAs are locked before i_mmap_rwsem and anon_vma to keep the same
locking order as in page fault handlers.
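The two-pass ordering and the bulk unlock can be sketched as a single-threaded userspace model. This is a hypothetical illustration, not kernel code. The *_model names are invented, and the mapping_locked flag stands in for the i_mmap_rwsem and anon_vma locks. The model also assumes that vma_end_write_all() releases every VMA write lock at once by advancing mm_lock_seq; this assumption is based on the vm_lock_seq/mm_lock_seq comparison used by vma_assert_write_locked(). The interruption path is assumed to drop everything taken so far, standing in for the out_unlock label.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not kernel code. */
struct mm_model {
	int mm_lock_seq;
	bool mmap_write_locked;
};

struct vma_model {
	struct mm_model *mm;
	int vm_lock_seq;	/* equals mm->mm_lock_seq while write-locked */
	bool mapping_locked;	/* stand-in for i_mmap_rwsem / anon_vma locks */
};

static void vma_start_write_model(struct vma_model *v)
{
	assert(v->mm->mmap_write_locked);
	v->vm_lock_seq = v->mm->mm_lock_seq;
}

static bool vma_is_write_locked_model(const struct vma_model *v)
{
	return v->vm_lock_seq == v->mm->mm_lock_seq;
}

/* Releases every VMA write lock of the mm at once by bumping the sequence. */
static void vma_end_write_all_model(struct mm_model *mm)
{
	assert(mm->mmap_write_locked);
	mm->mm_lock_seq++;
}

static void drop_all_locks_model(struct mm_model *mm, struct vma_model *vmas,
				 size_t n)
{
	for (size_t i = 0; i < n; i++)
		vmas[i].mapping_locked = false;
	vma_end_write_all_model(mm);	/* last, as in mm_drop_all_locks() */
}

/* Returns 0 on success, -1 if interrupted (stand-in for signal_pending()). */
static int take_all_locks_model(struct mm_model *mm, struct vma_model *vmas,
				size_t n, bool signal_pending)
{
	assert(mm->mmap_write_locked);

	/* Pass 1: VMA write locks first, matching the page-fault lock order. */
	for (size_t i = 0; i < n; i++) {
		if (signal_pending)
			goto out_unlock;
		vma_start_write_model(&vmas[i]);
	}
	/* Pass 2: the remaining per-mapping locks. */
	for (size_t i = 0; i < n; i++) {
		if (signal_pending)
			goto out_unlock;
		vmas[i].mapping_locked = true;
	}
	return 0;

out_unlock:
	drop_all_locks_model(mm, vmas, n);
	return -1;
}
```

In this model, dropping the VMA write locks costs one counter increment rather than a second walk over all VMAs. That is why a single vma_end_write_all() call at the end of the drop path is enough.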
Signed-off-by: Suren Baghdasaryan
---
 mm/mmap.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index ec745586785c..b947d82e8522 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3486,6 +3486,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
  * of mm/rmap.c:
  *   - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for
  *     hugetlb mapping);
+ *   - all vmas marked locked
  *   - all i_mmap_rwsem locks;
  *   - all anon_vma->rwseml
  *
@@ -3508,6 +3509,13 @@ int mm_take_all_locks(struct mm_struct *mm)
 
 	mutex_lock(&mm_all_locks_mutex);
 
+	mas_for_each(&mas, vma, ULONG_MAX) {
+		if (signal_pending(current))
+			goto out_unlock;
+		vma_start_write(vma);
+	}
+
+	mas_set(&mas, 0);
 	mas_for_each(&mas, vma, ULONG_MAX) {
 		if (signal_pending(current))
 			goto out_unlock;
@@ -3597,6 +3605,7 @@ void mm_drop_all_locks(struct mm_struct *mm)
 		if (vma->vm_file && vma->vm_file->f_mapping)
 			vm_unlock_mapping(vma->vm_file->f_mapping);
 	}
+	vma_end_write_all(mm);
 
 	mutex_unlock(&mm_all_locks_mutex);
 }
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:21 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-23-surenb@google.com>
Subject: [PATCH v4 22/33] mm: introduce vma detached flag
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

The per-VMA locking mechanism will search for a VMA under RCU protection
and then, after locking it, has to ensure that the VMA was not removed
from the VMA tree after it was found. To make this check efficient,
introduce a vma->detached flag to mark VMAs which were removed from the
VMA tree.
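The race that the flag closes can be replayed step by step in a single-threaded userspace model. This is a hypothetical sketch, not kernel code. The *_model names are invented, RCU is not modelled, and removal from the tree itself is omitted. The sequence-number write lock is an assumption based on the vm_lock_seq/mm_lock_seq comparison in vma_assert_write_locked().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not kernel code. */
struct mm_model {
	int mm_lock_seq;
	bool mmap_write_locked;
};

struct vma_model {
	struct mm_model *mm;
	int vm_lock_seq;	/* equals mm->mm_lock_seq while write-locked */
	int readers;		/* stand-in for the per-VMA rw_semaphore */
	bool detached;
};

static bool vma_is_write_locked_model(const struct vma_model *v)
{
	return v->vm_lock_seq == v->mm->mm_lock_seq;
}

static void vma_start_write_model(struct vma_model *v)
{
	assert(v->mm->mmap_write_locked);
	assert(v->readers == 0);	/* a real rwsem would wait here */
	v->vm_lock_seq = v->mm->mm_lock_seq;
}

/* Mirrors vma_mark_detached(): detaching requires the VMA write lock. */
static void vma_mark_detached_model(struct vma_model *v, bool detached)
{
	if (detached)
		assert(vma_is_write_locked_model(v));
	v->detached = detached;
}

static bool vma_start_read_model(struct vma_model *v)
{
	if (vma_is_write_locked_model(v))
		return false;
	v->readers++;
	return true;
}

static void vma_end_read_model(struct vma_model *v)
{
	assert(v->readers > 0);
	v->readers--;
}

/* Writer: lock, mark detached (tree removal not modelled), unlock all. */
static void writer_detach_model(struct mm_model *mm, struct vma_model *v)
{
	mm->mmap_write_locked = true;
	vma_start_write_model(v);
	vma_mark_detached_model(v, true);
	mm->mm_lock_seq++;		/* stand-in for vma_end_write_all() */
	mm->mmap_write_locked = false;
}

/* Reader: lock a VMA found by an earlier lookup, then re-check it. */
static bool lock_found_vma_model(struct vma_model *found)
{
	if (!vma_start_read_model(found))
		return false;
	if (found->detached) {
		vma_end_read_model(found);
		return false;		/* the kernel retries the lookup here */
	}
	return true;
}
```

In this replay, the read lock on the stale VMA succeeds because the writer has already advanced mm_lock_seq. Only the detached check stops the reader from using a VMA that has left the tree.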
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h       | 11 +++++++++++
 include/linux/mm_types.h |  3 +++
 mm/mmap.c                |  2 ++
 3 files changed, 16 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3d5e8666892d..895bb3950e8a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -693,6 +693,14 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 	VM_BUG_ON_VMA(vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq), vma);
 }
 
+static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
+{
+	/* When detaching vma should be write-locked */
+	if (detached)
+		vma_assert_write_locked(vma);
+	vma->detached = detached;
+}
+
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline void vma_init_lock(struct vm_area_struct *vma) {}
@@ -701,6 +709,8 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
 static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
+static inline void vma_mark_detached(struct vm_area_struct *vma,
+				     bool detached) {}
 
 #endif /* CONFIG_PER_VMA_LOCK */
 
@@ -712,6 +722,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
+	vma_mark_detached(vma, false);
 	vma_init_lock(vma);
 }
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a4e7493bacd7..45a219d33c6b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -511,6 +511,9 @@ struct vm_area_struct {
 #ifdef CONFIG_PER_VMA_LOCK
 	int vm_lock_seq;
 	struct rw_semaphore lock;
+
+	/* Flag to indicate areas detached from the mm->mm_mt tree */
+	bool detached;
 #endif
 
 	/*
diff --git a/mm/mmap.c b/mm/mmap.c
index b947d82e8522..df13c33498db 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -600,6 +600,7 @@ static inline void vma_complete(struct vma_prepare *vp,
 
 	if (vp->remove) {
 again:
+		vma_mark_detached(vp->remove, true);
 		if (vp->file) {
 			uprobe_munmap(vp->remove, vp->remove->vm_start,
 				      vp->remove->vm_end);
@@ -2261,6 +2262,7 @@ static inline int munmap_sidetree(struct vm_area_struct *vma,
 	if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
 		return -ENOMEM;
 
+	vma_mark_detached(vma, true);
 	if (vma->vm_flags & VM_LOCKED)
 		vma->vm_mm->locked_vm -= vma_pages(vma);
 
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:22 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-24-surenb@google.com>
Subject: [PATCH v4 23/33] mm: introduce lock_vma_under_rcu to be used from arch-specific code
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

Introduce the lock_vma_under_rcu function to look up and lock a VMA during
page fault handling. When the VMA is not found, cannot be locked, or
changes after being locked, the function returns NULL. The lookup is
performed under RCU protection to prevent the found VMA from being
destroyed before the VMA lock is acquired. VMA lock statistics are
updated according to the results.
For now, only anonymous VMAs can be searched this way; in other cases the
function returns NULL.
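The sequence of checks described above can be sketched as a single-threaded userspace model, together with a fallback taken when NULL comes back. Everything here is hypothetical. The *_model names and the array standing in for the maple tree are invented. The caller falling back to an mmap_lock path is an assumption about how arch-specific fault handlers might use the function, since the callers are not part of this patch. RCU is not modelled. Where the kernel code jumps back to retry: after finding a detached VMA, this model simply returns NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not kernel code. */
struct mm_model {
	int mm_lock_seq;
};

struct vma_model {
	struct mm_model *mm;
	unsigned long vm_start, vm_end;	/* covers [vm_start, vm_end) */
	bool anonymous;
	bool detached;
	int vm_lock_seq;		/* equals mm->mm_lock_seq while write-locked */
	int readers;
};

#define NR_SLOTS 4
struct mm_tree_model {
	struct mm_model mm;
	struct vma_model *slots[NR_SLOTS];	/* stand-in for mm->mm_mt */
};

static struct vma_model *walk_model(struct mm_tree_model *t, unsigned long addr)
{
	for (size_t i = 0; i < NR_SLOTS; i++) {
		struct vma_model *v = t->slots[i];

		if (v && addr >= v->vm_start && addr < v->vm_end)
			return v;
	}
	return NULL;
}

/* Same order of checks as lock_vma_under_rcu(), minus RCU and the retry. */
static struct vma_model *lock_vma_model(struct mm_tree_model *t,
					unsigned long addr)
{
	struct vma_model *v = walk_model(t, addr);

	if (!v || !v->anonymous)
		return NULL;
	if (v->vm_lock_seq == v->mm->mm_lock_seq)	/* read lock refused */
		return NULL;
	v->readers++;
	if (addr < v->vm_start || addr >= v->vm_end || v->detached) {
		v->readers--;		/* vma_end_read() */
		return NULL;
	}
	return v;
}

enum fault_path { FAULT_PER_VMA_LOCK, FAULT_MMAP_LOCK };

/* Hypothetical caller: try the per-VMA lock, else take the mmap_lock path. */
static enum fault_path handle_fault_model(struct mm_tree_model *t,
					  unsigned long addr)
{
	struct vma_model *v = lock_vma_model(t, addr);

	if (!v)
		return FAULT_MMAP_LOCK;
	/* ... the fault would be handled here under the VMA read lock ... */
	v->readers--;			/* vma_end_read() */
	return FAULT_PER_VMA_LOCK;
}
```

Returning NULL for every failure case keeps the contract simple for callers. Any failure, whether the VMA is missing, not anonymous, write-locked, or changed, leads to the same slow path.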
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h |  3 +++
 mm/memory.c        | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 895bb3950e8a..46d2db743b1a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -701,6 +701,9 @@ static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
 	vma->detached = detached;
 }
 
+struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
+					  unsigned long address);
+
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline void vma_init_lock(struct vm_area_struct *vma) {}
diff --git a/mm/memory.c b/mm/memory.c
index f7f412833e42..bda4c1a991f0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5221,6 +5221,52 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL_GPL(handle_mm_fault);
 
+#ifdef CONFIG_PER_VMA_LOCK
+/*
+ * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be
+ * stable and not isolated. If the VMA is not found or is being modified the
+ * function returns NULL.
+ */
+struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
+					  unsigned long address)
+{
+	MA_STATE(mas, &mm->mm_mt, address, address);
+	struct vm_area_struct *vma;
+
+	rcu_read_lock();
+retry:
+	vma = mas_walk(&mas);
+	if (!vma)
+		goto inval;
+
+	/* Only anonymous vmas are supported for now */
+	if (!vma_is_anonymous(vma))
+		goto inval;
+
+	if (!vma_start_read(vma))
+		goto inval;
+
+	/* Check since vm_start/vm_end might change before we lock the VMA */
+	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
+		vma_end_read(vma);
+		goto inval;
+	}
+
+	/* Check if the VMA got isolated after we found it */
+	if (vma->detached) {
+		vma_end_read(vma);
+		/* The area was replaced with another one */
+		goto retry;
+	}
+
+	rcu_read_unlock();
+	return vma;
+inval:
+	rcu_read_unlock();
+	return NULL;
+}
+#endif /* CONFIG_PER_VMA_LOCK */
+
 #ifndef __PAGETABLE_P4D_FOLDED
 /*
  * Allocate p4d page table.
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:23 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-25-surenb@google.com>
Subject: [PATCH v4 24/33] mm: fall back to mmap_lock if vma->anon_vma is not yet set
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan

When vma->anon_vma is not set, the page fault handler sets it either by
reusing the anon_vma of an adjacent VMA, if the VMAs are compatible, or by
allocating a new one. find_mergeable_anon_vma() walks the VMA tree to find a
compatible adjacent VMA, which requires not only the faulting VMA to be
stable but also the tree structure and the other VMAs in it. Therefore,
locking just the faulting VMA is not enough for this search. Fall back to
taking mmap_lock when vma->anon_vma is not set. This happens only on the
first page fault in a VMA and should not affect overall performance.
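The fallback rule above can be illustrated with a hypothetical model, assuming an anonymous VMA. model_vma, model_anon_vma, model_can_use_vma_lock() and model_fault() are illustrative names only, not kernel APIs. The first fault sees anon_vma == NULL and takes the mmap_lock path, which attaches an anon_vma. Later faults in the same VMA pass the new check and can stay on the per-VMA lock path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names are kernel APIs. */
struct model_anon_vma {
	int refs;
};

struct model_vma {
	struct model_anon_vma *anon_vma;	/* NULL until the first fault sets it */
};

enum fault_path { PATH_VMA_LOCK, PATH_MMAP_LOCK };

/* Mirrors the new check: no anon_vma yet means the per-VMA lock is not enough. */
static bool model_can_use_vma_lock(const struct model_vma *vma)
{
	return vma->anon_vma != NULL;
}

static enum fault_path model_fault(struct model_vma *vma,
				   struct model_anon_vma *neighbour_or_new)
{
	if (model_can_use_vma_lock(vma))
		return PATH_VMA_LOCK;
	/*
	 * Fallback under mmap_lock: the tree and neighbouring VMAs are stable,
	 * so the handler may reuse an adjacent anon_vma (what
	 * find_mergeable_anon_vma() looks for) or allocate a new one.
	 * The model simply attaches the one passed in.
	 */
	vma->anon_vma = neighbour_or_new;
	neighbour_or_new->refs++;
	return PATH_MMAP_LOCK;
}
```

Only the first call pays for the mmap_lock path. Every later call returns PATH_VMA_LOCK without touching the shared anon_vma again, which is why the commit expects no measurable performance impact.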
Signed-off-by: Suren Baghdasaryan
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/memory.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index bda4c1a991f0..8855846a361b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5243,6 +5243,10 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma_is_anonymous(vma))
 		goto inval;
 
+	/* find_mergeable_anon_vma uses adjacent vmas which are not locked */
+	if (!vma->anon_vma)
+		goto inval;
+
 	if (!vma_start_read(vma))
 		goto inval;
 
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:24 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-26-surenb@google.com>
Subject: [PATCH v4 25/33] mm: add FAULT_FLAG_VMA_LOCK flag
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan, Laurent Dufour

Add a new flag to distinguish page faults handled under the protection of a
per-VMA lock.
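Here is a small sketch of how the new bit combines with existing flags and how it might appear in a trace-style name table. The four bit values are copied from the enum fault_flag hunk in this patch; the other flags are omitted. The two-entry name table, fault_under_vma_lock() and fault_flags_to_str() are hypothetical helpers. They are not the kernel's FAULT_FLAG_TRACE machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Values copied from this patch's enum fault_flag hunk; other flags omitted. */
enum fault_flag {
	FAULT_FLAG_INTERRUPTIBLE  = 1 << 9,
	FAULT_FLAG_UNSHARE        = 1 << 10,
	FAULT_FLAG_ORIG_PTE_VALID = 1 << 11,
	FAULT_FLAG_VMA_LOCK       = 1 << 12,
};

/* Hypothetical subset of { flag, "NAME" } pairs, in the style of the trace table. */
static const struct {
	unsigned int flag;
	const char *name;
} fault_flag_names[] = {
	{ FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" },
	{ FAULT_FLAG_VMA_LOCK, "VMA_LOCK" },
};

/* How a handler could tell that it runs under a per-VMA lock. */
static bool fault_under_vma_lock(unsigned int flags)
{
	return (flags & FAULT_FLAG_VMA_LOCK) != 0;
}

/* Render the known set flags as "A|B"; requires len > 0. */
static const char *fault_flags_to_str(unsigned int flags, char *buf, size_t len)
{
	size_t used = 0;

	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(fault_flag_names) / sizeof(fault_flag_names[0]); i++) {
		if (!(flags & fault_flag_names[i].flag))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "|" : "", fault_flag_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated: stop appending */
		used += (size_t)n;
	}
	return buf;
}
```

Because the flag is a separate bit, code that never sets it behaves exactly as before. Only paths that explicitly pass FAULT_FLAG_VMA_LOCK are affected.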
Signed-off-by: Suren Baghdasaryan
Reviewed-by: Laurent Dufour
---
 include/linux/mm.h       | 3 ++-
 include/linux/mm_types.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46d2db743b1a..d07ac923333f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -478,7 +478,8 @@ static inline bool fault_flag_allow_retry_first(enum fault_flag flags)
 	{ FAULT_FLAG_USER, "USER" }, \
 	{ FAULT_FLAG_REMOTE, "REMOTE" }, \
 	{ FAULT_FLAG_INSTRUCTION, "INSTRUCTION" }, \
-	{ FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }
+	{ FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }, \
+	{ FAULT_FLAG_VMA_LOCK, "VMA_LOCK" }
 
 /*
  * vm_fault is filled by the pagefault handler and passed to the vma's
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 45a219d33c6b..6768533a6b7c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1097,6 +1097,7 @@ enum fault_flag {
 	FAULT_FLAG_INTERRUPTIBLE = 1 << 9,
 	FAULT_FLAG_UNSHARE = 1 << 10,
 	FAULT_FLAG_ORIG_PTE_VALID = 1 << 11,
+	FAULT_FLAG_VMA_LOCK = 1 << 12,
 };
 
 typedef unsigned int __bitwise zap_flags_t;
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:25 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-27-surenb@google.com>
Subject: [PATCH v4 26/33] mm: prevent do_swap_page from handling page faults under VMA lock
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan, Laurent Dufour

Because do_swap_page() may drop mmap_lock, abort fault handling under the
VMA lock and retry while holding mmap_lock. This can be handled more
gracefully in the future.
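The abort-and-retry protocol described above can be modeled with a minimal sketch. FLAG_VMA_LOCK stands in for FAULT_FLAG_VMA_LOCK and MODEL_FAULT_RETRY stands in for VM_FAULT_RETRY. The real kernel values are not relied on, and model_do_swap_page(), model_handle_fault() and struct fault_log are hypothetical. The handler refuses to run under the VMA lock, and the caller then repeats the fault without the flag, as if it were holding mmap_lock.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins: FLAG_VMA_LOCK models FAULT_FLAG_VMA_LOCK and
 * MODEL_FAULT_RETRY models VM_FAULT_RETRY; real kernel values are not used.
 */
#define FLAG_VMA_LOCK     (1u << 12)
#define MODEL_FAULT_DONE  0
#define MODEL_FAULT_RETRY 1

struct fault_log {
	int vma_lock_attempts;
	int mmap_lock_attempts;
};

/* Swap-in may need to drop mmap_lock, so it refuses to run under a VMA lock. */
static int model_do_swap_page(unsigned int flags)
{
	if (flags & FLAG_VMA_LOCK)
		return MODEL_FAULT_RETRY;
	return MODEL_FAULT_DONE;	/* pretend the page was swapped in */
}

/* Caller side: try under the per-VMA lock first, then redo under mmap_lock. */
static int model_handle_fault(unsigned int flags, struct fault_log *log)
{
	log->vma_lock_attempts++;
	if (model_do_swap_page(flags | FLAG_VMA_LOCK) != MODEL_FAULT_RETRY)
		return MODEL_FAULT_DONE;
	log->mmap_lock_attempts++;
	return model_do_swap_page(flags & ~FLAG_VMA_LOCK);
}
```

The cost of this simple approach is one wasted attempt per swap-in fault. That is the trade-off the commit accepts until the case can be handled more gracefully.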
Signed-off-by: Suren Baghdasaryan
Reviewed-by: Laurent Dufour
---
 mm/memory.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 8855846a361b..af3c2c59cd11 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3689,6 +3689,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!pte_unmap_same(vmf))
 		goto out;
 
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+		ret = VM_FAULT_RETRY;
+		goto out;
+	}
+
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:26 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-28-surenb@google.com>
Subject: [PATCH v4 27/33] mm: prevent userfaults to be handled under per-vma lock
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan

Because handle_userfault() may drop mmap_lock, avoid fault handling under
the VMA lock and retry while holding mmap_lock. This can be handled more
gracefully in the future.

Signed-off-by: Suren Baghdasaryan
Suggested-by: Peter Xu
---
 mm/memory.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index af3c2c59cd11..f734f80d28ca 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5255,6 +5255,15 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma_start_read(vma))
 		goto inval;
 
+	/*
+	 * Due to the possibility of userfault handler dropping mmap_lock, avoid
+	 * it for now and fall back to page fault handling under mmap_lock.
+	 */
+	if (userfaultfd_armed(vma)) {
+		vma_end_read(vma);
+		goto inval;
+	}
+
 	/* Check since vm_start/vm_end might change before we lock the VMA */
 	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
 		vma_end_read(vma);
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:27 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-29-surenb@google.com>
Subject: [PATCH v4 28/33] mm: introduce per-VMA lock statistics
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan

Add a new CONFIG_PER_VMA_LOCK_STATS config option to dump extra statistics
about page faults handled under a VMA lock.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/vm_event_item.h | 6 ++++++
 include/linux/vmstat.h        | 6 ++++++
 mm/Kconfig.debug              | 6 ++++++
 mm/memory.c                   | 2 ++
 mm/vmstat.c                   | 6 ++++++
 5 files changed, 26 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 7f5d1caf5890..8abfa1240040 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -149,6 +149,12 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_X86
 		DIRECT_MAP_LEVEL2_SPLIT,
 		DIRECT_MAP_LEVEL3_SPLIT,
+#endif
+#ifdef CONFIG_PER_VMA_LOCK_STATS
+		VMA_LOCK_SUCCESS,
+		VMA_LOCK_ABORT,
+		VMA_LOCK_RETRY,
+		VMA_LOCK_MISS,
 #endif
 		NR_VM_EVENT_ITEMS
 };
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 19cf5b6892ce..fed855bae6d8 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -125,6 +125,12 @@ static inline void vm_events_fold_cpu(int cpu)
 #define count_vm_tlb_events(x, y) do { (void)(y); } while (0)
 #endif
 
+#ifdef CONFIG_PER_VMA_LOCK_STATS
+#define count_vm_vma_lock_event(x) count_vm_event(x)
+#else
+#define count_vm_vma_lock_event(x) do {} while (0)
+#endif
+
 #define __count_zid_vm_events(item, zid, delta) \
__count_vm_events(item##_NORMAL - ZONE_NORMAL + zid, delta)
 
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index c3547a373c9c..4965a7333a3f 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -279,3 +279,9 @@ config DEBUG_KMEMLEAK_AUTO_SCAN
 
 	  If unsure, say Y.
 
+config PER_VMA_LOCK_STATS
+	bool "Statistics for per-vma locks"
+	depends on PER_VMA_LOCK
+	default y
+	help
+	  Statistics for per-vma locks.
diff --git a/mm/memory.c b/mm/memory.c
index f734f80d28ca..255b2f4fdd4a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5273,6 +5273,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	/* Check if the VMA got isolated after we found it */
 	if (vma->detached) {
 		vma_end_read(vma);
+		count_vm_vma_lock_event(VMA_LOCK_MISS);
 		/* The area was replaced with another one */
 		goto retry;
 	}
@@ -5281,6 +5282,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	return vma;
 inval:
 	rcu_read_unlock();
+	count_vm_vma_lock_event(VMA_LOCK_ABORT);
 	return NULL;
 }
 #endif /* CONFIG_PER_VMA_LOCK */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1ea6a5ce1c41..4f1089a1860e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1399,6 +1399,12 @@ const char * const vmstat_text[] = {
 	"direct_map_level2_splits",
 	"direct_map_level3_splits",
 #endif
+#ifdef CONFIG_PER_VMA_LOCK_STATS
+	"vma_lock_success",
+	"vma_lock_abort",
+	"vma_lock_retry",
+	"vma_lock_miss",
+#endif
 #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
 };
 #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:28 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-30-surenb@google.com>
Subject: [PATCH v4 29/33] x86/mm: try VMA lock-based page fault handling first
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan
Content-Type: text/plain; charset="utf-8"
Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails.

Signed-off-by: Suren Baghdasaryan
---
 arch/x86/Kconfig    |  1 +
 arch/x86/mm/fault.c | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a825bf031f49..df21fba77db1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,6 +27,7 @@ config X86_64
 	# Options that are inherently 64-bit kernel only:
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_ARCH_SOFT_DIRTY
 	select MODULES_USE_ELF_RELA
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a498ae1fbe66..e4399983c50c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -19,6 +19,7 @@
 #include /* faulthandler_disabled() */
 #include /* efi_crash_gracefully_on_page_fault()*/
 #include
+#include /* find_and_lock_vma() */
 
 #include /* boot_cpu_has, ... */
 #include /* dotraplinkage, ... */
@@ -1333,6 +1334,38 @@ void do_user_addr_fault(struct pt_regs *regs,
 	}
 #endif
 
+#ifdef CONFIG_PER_VMA_LOCK
+	if (!(flags & FAULT_FLAG_USER))
+		goto lock_mmap;
+
+	vma = lock_vma_under_rcu(mm, address);
+	if (!vma)
+		goto lock_mmap;
+
+	if (unlikely(access_error(error_code, vma))) {
+		vma_end_read(vma);
+		goto lock_mmap;
+	}
+	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
+	vma_end_read(vma);
+
+	if (!(fault & VM_FAULT_RETRY)) {
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		goto done;
+	}
+	count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+	/* Quick path to respond to signals */
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			kernelmode_fixup_or_oops(regs, error_code, address,
+						 SIGBUS, BUS_ADRERR,
+						 ARCH_DEFAULT_PKEY);
+		return;
+	}
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
+
 	/*
 	 * Kernel-mode access to the user address space should only occur
 	 * on well-defined single instructions listed in the exception
@@ -1433,6 +1466,9 @@ void do_user_addr_fault(struct pt_regs *regs,
 	}
 
 	mmap_read_unlock(mm);
+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
 	if (likely(!(fault & VM_FAULT_ERROR)))
 		return;
 
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:29 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-31-surenb@google.com>
Subject: [PATCH v4 30/33] arm64/mm: try VMA lock-based page fault handling first
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Content-Type: text/plain; charset="utf-8"

Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails.
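The fast-path-then-fallback flow described above can be modeled in a few lines of userspace C. This is a hypothetical single-threaded sketch, not kernel code. Every toy_* name is an invented stand-in for mmap_lock, the per-VMA lock, vma_start_read(), or the VMA_LOCK_* counters. The toy lock never blocks, and both the RCU lookup and the detached-VMA (VMA_LOCK_MISS) retry are omitted. The model assumes the sequence-number scheme used in this series: a VMA counts as write-locked while its vm_lock_seq equals mm->mm_lock_seq, and changing mm_lock_seq releases every VMA write lock at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical single-threaded model; toy_* names are not kernel APIs.
 * The toy lock never blocks and is not safe for concurrent use.
 */
struct toy_rwsem {
	int readers;
	bool writer;
};

static bool toy_down_read_trylock(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

struct toy_mm {
	int mm_lock_seq;            /* bumped when the mm-wide write lock is released */
	struct toy_rwsem mmap_lock; /* stands in for mmap_lock */
};

struct toy_vma {
	struct toy_mm *mm;
	int vm_lock_seq;            /* equals mm->mm_lock_seq while write-locked */
	struct toy_rwsem lock;      /* stands in for the per-VMA lock */
};

/* Counters modeled on VMA_LOCK_SUCCESS, VMA_LOCK_ABORT and VMA_LOCK_RETRY. */
struct toy_stats {
	unsigned long success, aborts, retries;
};

/* Write-lock one VMA: fast-path readers now see matching sequence numbers. */
static void toy_vma_start_write(struct toy_vma *vma)
{
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
}

/* Releasing the mm-wide write lock unlocks every VMA at once. */
static void toy_vma_end_write_all(struct toy_mm *mm)
{
	mm->mm_lock_seq++;
}

/* Same shape as vma_start_read(): check, trylock, then re-check. */
static bool toy_vma_start_read(struct toy_vma *vma)
{
	if (vma->vm_lock_seq == vma->mm->mm_lock_seq)
		return false;
	if (!toy_down_read_trylock(&vma->lock))
		return false;
	if (vma->vm_lock_seq == vma->mm->mm_lock_seq) {
		toy_up_read(&vma->lock);
		return false;
	}
	return true;
}

/*
 * Returns true if the fault finished on the per-VMA fast path, false if it
 * fell back to the mm-wide lock. needs_retry stands in for handle_mm_fault()
 * returning VM_FAULT_RETRY.
 */
static bool toy_handle_fault(struct toy_mm *mm, struct toy_vma *vma,
			     bool needs_retry, struct toy_stats *st)
{
	bool locked;

	if (vma == NULL || !toy_vma_start_read(vma)) {
		st->aborts++;
		goto lock_mmap;
	}
	/* The fault would be handled here while holding the VMA read lock. */
	toy_up_read(&vma->lock);
	if (!needs_retry) {
		st->success++;
		return true;
	}
	st->retries++;
lock_mmap:
	/* Slow path: take the mm-wide read lock and redo the fault under it. */
	locked = toy_down_read_trylock(&mm->mmap_lock);
	assert(locked);
	(void)locked;
	toy_up_read(&mm->mmap_lock);
	return false;
}
```

In the model, a fault on an unlocked VMA takes the fast path and counts as a success. After toy_vma_start_write(), the same fault counts as an abort and falls back to the mm-wide lock. Once toy_vma_end_write_all() advances mm_lock_seq, the fast path works again. The re-check after the trylock mirrors vma_start_read(): in the kernel, a writer can lock the VMA between the first check and the reader taking the lock.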
Signed-off-by: Suren Baghdasaryan
---
 arch/arm64/Kconfig    |  1 +
 arch/arm64/mm/fault.c | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 27b2592698b0..412207d789b1 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -95,6 +95,7 @@ config ARM64
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
 	select ARCH_WANT_DEFAULT_BPF_JIT
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index f4cb0f85ccf4..9e0db5c387e3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -535,6 +535,9 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	unsigned long vm_flags;
 	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
 	unsigned long addr = untagged_addr(far);
+#ifdef CONFIG_PER_VMA_LOCK
+	struct vm_area_struct *vma;
+#endif
 
 	if (kprobe_page_fault(regs, esr))
 		return 0;
@@ -585,6 +588,36 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
+#ifdef CONFIG_PER_VMA_LOCK
+	if (!(mm_flags & FAULT_FLAG_USER))
+		goto lock_mmap;
+
+	vma = lock_vma_under_rcu(mm, addr);
+	if (!vma)
+		goto lock_mmap;
+
+	if (!(vma->vm_flags & vm_flags)) {
+		vma_end_read(vma);
+		goto lock_mmap;
+	}
+	fault = handle_mm_fault(vma, addr & PAGE_MASK,
+				mm_flags | FAULT_FLAG_VMA_LOCK, regs);
+	vma_end_read(vma);
+
+	if (!(fault & VM_FAULT_RETRY)) {
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		goto done;
+	}
+	count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+	/* Quick path to respond to signals */
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
+		return 0;
+	}
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
 	/*
 	 * As per x86, we may deadlock here. However, since the kernel only
 	 * validly references user space from well defined areas of the code,
@@ -628,6 +661,9 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	}
 	mmap_read_unlock(mm);
 
+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
 	/*
 	 * Handle the "normal" (no error) case first.
 	 */
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:30 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-32-surenb@google.com>
Subject: [PATCH v4 31/33] powerpc/mm: try VMA lock-based page fault handling first
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Content-Type: text/plain; charset="utf-8"

From: Laurent Dufour

Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails.
Copied from "x86/mm: try VMA lock-based page fault handling first"

Signed-off-by: Laurent Dufour
Signed-off-by: Suren Baghdasaryan
---
 arch/powerpc/mm/fault.c                | 41 ++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/Kconfig |  1 +
 arch/powerpc/platforms/pseries/Kconfig |  1 +
 3 files changed, 43 insertions(+)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 2bef19cc1b98..c7ae86b04b8a 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -469,6 +469,44 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (is_exec)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+#ifdef CONFIG_PER_VMA_LOCK
+	if (!(flags & FAULT_FLAG_USER))
+		goto lock_mmap;
+
+	vma = lock_vma_under_rcu(mm, address);
+	if (!vma)
+		goto lock_mmap;
+
+	if (unlikely(access_pkey_error(is_write, is_exec,
+				       (error_code & DSISR_KEYFAULT), vma))) {
+		int rc = bad_access_pkey(regs, address, vma);
+
+		vma_end_read(vma);
+		return rc;
+	}
+
+	if (unlikely(access_error(is_write, is_exec, vma))) {
+		int rc = bad_access(regs, address);
+
+		vma_end_read(vma);
+		return rc;
+	}
+
+	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
+	vma_end_read(vma);
+
+	if (!(fault & VM_FAULT_RETRY)) {
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		goto done;
+	}
+	count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+	if (fault_signal_pending(fault, regs))
+		return user_mode(regs) ? 0 : SIGBUS;
+
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
+
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space. All other faults represent errors in the
 	 * kernel and should generate an OOPS. Unfortunately, in the case of an
@@ -545,6 +583,9 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 
 	mmap_read_unlock(current->mm);
 
+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
 	if (unlikely(fault & VM_FAULT_ERROR))
 		return mm_fault_error(regs, address, fault);
 
diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index ae248a161b43..70a46acc70d6 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -16,6 +16,7 @@ config PPC_POWERNV
 	select PPC_DOORBELL
 	select MMU_NOTIFIER
 	select FORCE_SMP
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	default y
 
 config OPAL_PRD
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index b481c5c8bae1..9c205fe0e619 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -21,6 +21,7 @@ config PPC_PSERIES
 	select HOTPLUG_CPU
 	select FORCE_SMP
 	select SWIOTLB
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	default y
 
 config PARAVIRT
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:31 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-33-surenb@google.com>
Subject: [PATCH v4 32/33] mm/mmap: free vm_area_struct without call_rcu in exit_mmap
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Content-Type: text/plain; charset="utf-8"

call_rcu() can take a long time when callback offloading is enabled. Its use in vm_area_free() can cause regressions in the exit path when multiple VMAs are being freed.

Because exit_mmap() is called only after the last mm user drops its refcount, page fault handlers cannot race with it. Other possible users, such as the oom-reaper or process_mrelease, are already synchronized using mmap_lock. Therefore exit_mmap() can free VMAs directly, without using call_rcu().

Expose __vm_area_free() and use it from exit_mmap() to avoid possible call_rcu() floods and the performance regressions they cause.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h |  2 ++
 kernel/fork.c      |  2 +-
 mm/mmap.c          | 11 +++++++----
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d07ac923333f..5e142bfe7a58 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -256,6 +256,8 @@ void setup_initial_init_mm(void *start_code, void *end_code,
 struct vm_area_struct *vm_area_alloc(struct mm_struct *);
 struct vm_area_struct *vm_area_dup(struct vm_area_struct *);
 void vm_area_free(struct vm_area_struct *);
+/* Use only if VMA has no other users */
+void __vm_area_free(struct vm_area_struct *vma);
 
 #ifndef CONFIG_MMU
 extern struct rb_root nommu_region_tree;
diff --git a/kernel/fork.c b/kernel/fork.c
index bdb55f25895d..ad37f1d0c5ab 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -480,7 +480,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	return new;
 }
 
-static void __vm_area_free(struct vm_area_struct *vma)
+void __vm_area_free(struct vm_area_struct *vma)
 {
 	free_anon_vma_name(vma);
 	kmem_cache_free(vm_area_cachep, vma);
diff --git a/mm/mmap.c b/mm/mmap.c
index df13c33498db..0cd3714c2182 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -133,7 +133,7 @@ void unlink_file_vma(struct vm_area_struct *vma)
 /*
  * Close a vm structure and free it.
  */
-static void remove_vma(struct vm_area_struct *vma)
+static void remove_vma(struct vm_area_struct *vma, bool unreachable)
 {
 	might_sleep();
 	if (vma->vm_ops && vma->vm_ops->close)
@@ -141,7 +141,10 @@ static void remove_vma(struct vm_area_struct *vma)
 	if (vma->vm_file)
 		fput(vma->vm_file);
 	mpol_put(vma_policy(vma));
-	vm_area_free(vma);
+	if (unreachable)
+		__vm_area_free(vma);
+	else
+		vm_area_free(vma);
 }
 
 static inline struct vm_area_struct *vma_prev_limit(struct vma_iterator *vmi,
@@ -2130,7 +2133,7 @@ static inline void remove_mt(struct mm_struct *mm, struct ma_state *mas)
 		if (vma->vm_flags & VM_ACCOUNT)
 			nr_accounted += nrpages;
 		vm_stat_account(mm, vma->vm_flags, -nrpages);
-		remove_vma(vma);
+		remove_vma(vma, false);
 	}
 	vm_unacct_memory(nr_accounted);
 	validate_mm(mm);
@@ -3070,7 +3073,7 @@ void exit_mmap(struct mm_struct *mm)
 	do {
 		if (vma->vm_flags & VM_ACCOUNT)
 			nr_accounted += vma_pages(vma);
-		remove_vma(vma);
+		remove_vma(vma, true);
 		count++;
 		cond_resched();
 	} while ((vma = mas_find(&mas, ULONG_MAX)) != NULL);
-- 
2.39.2.722.g9855ee24e9-goog

From nobody Wed Sep 10 09:07:32 2025
Date: Mon, 27 Feb 2023 09:36:32 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
Message-ID: <20230227173632.3292573-34-surenb@google.com>
Subject: [PATCH v4 33/33] mm: separate vma->lock from vm_area_struct
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Content-Type: text/plain; charset="utf-8"

Having vma->lock inside vm_area_struct causes a performance regression during page faults. Under contention, the lock's count and owner fields are updated constantly, and because other vm_area_struct fields used during page fault handling sit next to them, the cache line bounces constantly. Fix that by moving the lock out of vm_area_struct.
All attempts to keep vma->lock inside vm_area_struct in a separate cache line still produced performance regressions, especially on NUMA machines.
Smallest regression was achieved when lock is placed in the fourth cache line but that bloats vm_area_struct to 256 bytes. Considering performance and memory impact, separate lock looks like the best option. It increases memory footprint of each VMA but that can be optimized later if the new size causes issues. Note that after this change vma_init() does not allocate or initialize vma->lock anymore. A number of drivers allocate a pseudo VMA on the stack but they never use the VMA's lock, therefore it does not need to be allocated. The future drivers which might need the VMA lock should use vm_area_alloc()/vm_area_free() to allocate the VMA. Signed-off-by: Suren Baghdasaryan --- include/linux/mm.h | 23 ++++++------- include/linux/mm_types.h | 6 +++- kernel/fork.c | 73 ++++++++++++++++++++++++++++++++-------- 3 files changed, 74 insertions(+), 28 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 5e142bfe7a58..3d4bb18dfcb7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -627,12 +627,6 @@ struct vm_operations_struct { }; =20 #ifdef CONFIG_PER_VMA_LOCK -static inline void vma_init_lock(struct vm_area_struct *vma) -{ - init_rwsem(&vma->lock); - vma->vm_lock_seq =3D -1; -} - /* * Try to read-lock a vma. The function is allowed to occasionally yield f= alse * locked result to avoid performance overhead, in which case we fall back= to @@ -644,17 +638,17 @@ static inline bool vma_start_read(struct vm_area_stru= ct *vma) if (vma->vm_lock_seq =3D=3D READ_ONCE(vma->vm_mm->mm_lock_seq)) return false; =20 - if (unlikely(down_read_trylock(&vma->lock) =3D=3D 0)) + if (unlikely(down_read_trylock(&vma->vm_lock->lock) =3D=3D 0)) return false; =20 /* * Overflow might produce false locked result. * False unlocked result is impossible because we modify and check - * vma->vm_lock_seq under vma->lock protection and mm->mm_lock_seq + * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq * modification invalidates all existing locks. 
 	 */
 	if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->lock);
+		up_read(&vma->vm_lock->lock);
 		return false;
 	}
 	return true;
@@ -663,7 +657,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
-	up_read(&vma->lock);
+	up_read(&vma->vm_lock->lock);
 	rcu_read_unlock();
 }
 
@@ -681,9 +675,9 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	if (vma->vm_lock_seq == mm_lock_seq)
 		return;
 
-	down_write(&vma->lock);
+	down_write(&vma->vm_lock->lock);
 	vma->vm_lock_seq = mm_lock_seq;
-	up_write(&vma->lock);
+	up_write(&vma->vm_lock->lock);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
@@ -720,6 +714,10 @@ static inline void vma_mark_detached(struct vm_area_struct *vma,
 
 #endif /* CONFIG_PER_VMA_LOCK */
 
+/*
+ * WARNING: vma_init does not initialize vma->vm_lock.
+ * Use vm_area_alloc()/vm_area_free() if vma needs locking.
+ */
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	static const struct vm_operations_struct dummy_vm_ops = {};
@@ -729,7 +727,6 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_ops = &dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
 	vma_mark_detached(vma, false);
-	vma_init_lock(vma);
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6768533a6b7c..89bbf7d8a312 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -471,6 +471,10 @@ struct anon_vma_name {
 	char name[];
 };
 
+struct vma_lock {
+	struct rw_semaphore lock;
+};
+
 /*
  * This struct describes a virtual memory area. There is one of these
  * per VM-area/task. A VM area is any part of the process virtual memory
@@ -510,7 +514,7 @@ struct vm_area_struct {
 
 #ifdef CONFIG_PER_VMA_LOCK
 	int vm_lock_seq;
-	struct rw_semaphore lock;
+	struct vma_lock *vm_lock;
 
 	/* Flag to indicate areas detached from the mm->mm_mt tree */
 	bool detached;
diff --git a/kernel/fork.c b/kernel/fork.c
index ad37f1d0c5ab..75792157f51a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -451,13 +451,49 @@ static struct kmem_cache *vm_area_cachep;
 /* SLAB cache for mm_struct structures (tsk->mm) */
 static struct kmem_cache *mm_cachep;
 
+#ifdef CONFIG_PER_VMA_LOCK
+
+/* SLAB cache for vm_area_struct.lock */
+static struct kmem_cache *vma_lock_cachep;
+
+static bool vma_lock_alloc(struct vm_area_struct *vma)
+{
+	vma->vm_lock = kmem_cache_alloc(vma_lock_cachep, GFP_KERNEL);
+	if (!vma->vm_lock)
+		return false;
+
+	init_rwsem(&vma->vm_lock->lock);
+	vma->vm_lock_seq = -1;
+
+	return true;
+}
+
+static inline void vma_lock_free(struct vm_area_struct *vma)
+{
+	kmem_cache_free(vma_lock_cachep, vma->vm_lock);
+}
+
+#else /* CONFIG_PER_VMA_LOCK */
+
+static inline bool vma_lock_alloc(struct vm_area_struct *vma) { return true; }
+static inline void vma_lock_free(struct vm_area_struct *vma) {}
+
+#endif /* CONFIG_PER_VMA_LOCK */
+
 struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
 
 	vma = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
-	if (vma)
-		vma_init(vma, mm);
+	if (!vma)
+		return NULL;
+
+	vma_init(vma, mm);
+	if (!vma_lock_alloc(vma)) {
+		kmem_cache_free(vm_area_cachep, vma);
+		return NULL;
+	}
+
 	return vma;
 }
 
@@ -465,24 +501,30 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 {
 	struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 
-	if (new) {
-		ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
-		ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
-		/*
-		 * orig->shared.rb may be modified concurrently, but the clone
-		 * will be reinitialized.
-		 */
-		data_race(memcpy(new, orig, sizeof(*new)));
-		INIT_LIST_HEAD(&new->anon_vma_chain);
-		vma_init_lock(new);
-		dup_anon_vma_name(orig, new);
+	if (!new)
+		return NULL;
+
+	ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
+	ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
+	/*
+	 * orig->shared.rb may be modified concurrently, but the clone
+	 * will be reinitialized.
+	 */
+	data_race(memcpy(new, orig, sizeof(*new)));
+	if (!vma_lock_alloc(new)) {
+		kmem_cache_free(vm_area_cachep, new);
+		return NULL;
 	}
+	INIT_LIST_HEAD(&new->anon_vma_chain);
+	dup_anon_vma_name(orig, new);
+
 	return new;
 }
 
 void __vm_area_free(struct vm_area_struct *vma)
 {
 	free_anon_vma_name(vma);
+	vma_lock_free(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
@@ -493,7 +535,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 						  vm_rcu);
 
 	/* The vma should not be locked while being destroyed. */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->lock), vma);
+	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock->lock), vma);
 	__vm_area_free(vma);
 }
 #endif
@@ -3160,6 +3202,9 @@ void __init proc_caches_init(void)
 			NULL);
 
 	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
+#ifdef CONFIG_PER_VMA_LOCK
+	vma_lock_cachep = KMEM_CACHE(vma_lock, SLAB_PANIC|SLAB_ACCOUNT);
+#endif
 	mmap_init();
 	nsproxy_cache_init();
 }
-- 
2.39.2.722.g9855ee24e9-goog