From: Qi Zheng
To: akpm@linux-foundation.org, tkhai@ya.ru, vbabka@suse.cz, christian.koenig@amd.com, hannes@cmpxchg.org, shakeelb@google.com, mhocko@kernel.org, roman.gushchin@linux.dev, muchun.song@linux.dev, david@redhat.com, shy828301@gmail.com
Cc: sultan@kerneltoast.com, dave@stgolabs.net, penguin-kernel@I-love.SAKURA.ne.jp, paulmck@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng
Subject: [PATCH v5 1/8] mm: vmscan: add a map_nr_max field to shrinker_info
Date: Mon, 13 Mar 2023 19:28:12 +0800
Message-Id: <20230313112819.38938-2-zhengqi.arch@bytedance.com>
In-Reply-To: <20230313112819.38938-1-zhengqi.arch@bytedance.com>

To prepare for the subsequent lockless memcg slab shrink, add a map_nr_max
field to struct shrinker_info to record its own real shrinker_nr_max.
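
As an illustration of how the new field is meant to be used (a minimal sketch,
not part of the patch; the demo_* names are hypothetical), the iteration bound
travels with the RCU-protected object itself, so a reader never walks past the
bitmap it actually dereferenced:

```c
#include <linux/bitops.h>
#include <linux/printk.h>
#include <linux/rcupdate.h>

struct demo_info {
        struct rcu_head rcu;
        unsigned long *map;     /* shrinker bitmap */
        int map_nr_max;         /* valid bits in *this* map allocation */
};

static struct demo_info __rcu *demo_slot;

static void demo_walk(void)
{
        struct demo_info *info;
        int i;

        rcu_read_lock();
        info = rcu_dereference(demo_slot);
        if (info) {
                /*
                 * Bound the walk by the object's own size, not by a global
                 * (such as shrinker_nr_max) that may already describe a
                 * newer, larger allocation.
                 */
                for_each_set_bit(i, info->map, info->map_nr_max)
                        pr_info("shrinker bit %d is set\n", i);
        }
        rcu_read_unlock();
}
```
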
Signed-off-by: Qi Zheng
Suggested-by: Kirill Tkhai
Acked-by: Vlastimil Babka
Acked-by: Roman Gushchin
---
 include/linux/memcontrol.h |  1 +
 mm/vmscan.c                | 35 ++++++++++++++++++-----------------
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b6eda2ab205d..aa69ea98e2d8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -97,6 +97,7 @@ struct shrinker_info {
         struct rcu_head rcu;
         atomic_long_t *nr_deferred;
         unsigned long *map;
+        int map_nr_max;
 };

 struct lruvec_stats_percpu {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9414226218f0..9a2a6301052c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -226,7 +226,8 @@ static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,

 static int expand_one_shrinker_info(struct mem_cgroup *memcg,
                                     int map_size, int defer_size,
-                                    int old_map_size, int old_defer_size)
+                                    int old_map_size, int old_defer_size,
+                                    int new_nr_max)
 {
         struct shrinker_info *new, *old;
         struct mem_cgroup_per_node *pn;
@@ -240,12 +241,17 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg,
                 if (!old)
                         return 0;

+                /* Already expanded this shrinker_info */
+                if (new_nr_max <= old->map_nr_max)
+                        continue;
+
                 new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
                 if (!new)
                         return -ENOMEM;

                 new->nr_deferred = (atomic_long_t *)(new + 1);
                 new->map = (void *)new->nr_deferred + defer_size;
+                new->map_nr_max = new_nr_max;

                 /* map: set all old bits, clear all new bits */
                 memset(new->map, (int)0xff, old_map_size);
@@ -295,6 +301,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
                 }
                 info->nr_deferred = (atomic_long_t *)(info + 1);
                 info->map = (void *)info->nr_deferred + defer_size;
+                info->map_nr_max = shrinker_nr_max;
                 rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
         }
         up_write(&shrinker_rwsem);
@@ -302,23 +309,14 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
         return ret;
 }

-static inline bool need_expand(int nr_max)
-{
-        return round_up(nr_max, BITS_PER_LONG) >
-               round_up(shrinker_nr_max, BITS_PER_LONG);
-}
-
 static int expand_shrinker_info(int new_id)
 {
         int ret = 0;
-        int new_nr_max = new_id + 1;
+        int new_nr_max = round_up(new_id + 1, BITS_PER_LONG);
         int map_size, defer_size = 0;
         int old_map_size, old_defer_size = 0;
         struct mem_cgroup *memcg;

-        if (!need_expand(new_nr_max))
-                goto out;
-
         if (!root_mem_cgroup)
                 goto out;

@@ -332,7 +330,8 @@ static int expand_shrinker_info(int new_id)
         memcg = mem_cgroup_iter(NULL, NULL, NULL);
         do {
                 ret = expand_one_shrinker_info(memcg, map_size, defer_size,
-                                               old_map_size, old_defer_size);
+                                               old_map_size, old_defer_size,
+                                               new_nr_max);
                 if (ret) {
                         mem_cgroup_iter_break(NULL, memcg);
                         goto out;
@@ -352,9 +351,11 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)

                 rcu_read_lock();
                 info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
-                /* Pairs with smp mb in shrink_slab() */
-                smp_mb__before_atomic();
-                set_bit(shrinker_id, info->map);
+                if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
+                        /* Pairs with smp mb in shrink_slab() */
+                        smp_mb__before_atomic();
+                        set_bit(shrinker_id, info->map);
+                }
                 rcu_read_unlock();
         }
 }
@@ -432,7 +433,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
                 for_each_node(nid) {
                         child_info = shrinker_info_protected(memcg, nid);
                         parent_info = shrinker_info_protected(parent, nid);
-                        for (i = 0; i < shrinker_nr_max; i++) {
+                        for (i = 0; i < child_info->map_nr_max; i++) {
                                 nr = atomic_long_read(&child_info->nr_deferred[i]);
                                 atomic_long_add(nr, &parent_info->nr_deferred[i]);
                         }
@@ -899,7 +900,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
                 if (unlikely(!info))
                         goto unlock;

-                for_each_set_bit(i, info->map, shrinker_nr_max) {
+                for_each_set_bit(i, info->map, info->map_nr_max) {
                         struct shrink_control sc = {
                                 .gfp_mask = gfp_mask,
                                 .nid = nid,
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH v5 2/8] mm: vmscan: make global slab shrink lockless
Date: Mon, 13 Mar 2023 19:28:13 +0800
Message-Id: <20230313112819.38938-3-zhengqi.arch@bytedance.com>

The shrinker_rwsem is a global read-write lock in the shrinker subsystem,
which protects most operations such as slab shrink and the registration and
unregistration of shrinkers. This can easily cause problems in the following
cases.

1) When memory pressure is high and many filesystems are mounted or unmounted
   at the same time, slab shrink will be affected (down_read_trylock() fails),
   such as in the real workload mentioned by Kirill Tkhai:

```
One of the real workloads from my experience is start of an overcommitted node
containing many starting containers after node crash (or many resuming
containers after reboot for kernel update). In these cases memory pressure is
huge, and the node goes round in long reclaim.
```

2) If a shrinker is blocked (such as the case mentioned in [1]) and a writer
   comes in (such as mounting a fs), then this writer will be blocked and will
   cause all subsequent shrinker-related operations to be blocked.

Even if there is no competitor when shrinking slab, there may still be a
problem. If we have a long shrinker list and we do not reclaim enough memory
with each shrinker, then down_read_trylock() may be called at high frequency.
Because of the poor multicore scalability of atomic operations, this can lead
to a significant drop in IPC (instructions per cycle).

Several times in the past ([2],[3],[4],[5]), people wanted to replace the
shrinker_rwsem trylock with SRCU in the slab shrink path, but those patches
were abandoned because SRCU was not unconditionally enabled.

But now, since commit 1cd0bd06093c ("rcu: Remove CONFIG_SRCU"), SRCU is
unconditionally enabled. So it is time to use SRCU to protect readers that
previously held shrinker_rwsem.

This commit uses SRCU to make the global slab shrink lockless; the memcg slab
shrink is handled in a subsequent patch.

[1]. https://lore.kernel.org/lkml/20191129214541.3110-1-ptikhomirov@virtuozzo.com/
[2]. https://lore.kernel.org/all/1437080113.3596.2.camel@stgolabs.net/
[3]. https://lore.kernel.org/lkml/1510609063-3327-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp/
[4]. https://lore.kernel.org/lkml/153365347929.19074.12509495712735843805.stgit@localhost.localdomain/
[5]. https://lore.kernel.org/lkml/20210927074823.5825-1-sultan@kerneltoast.com/
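
For readers less familiar with the pattern, here is a minimal, illustrative
kernel-style sketch (not part of this patch; the demo_* names are hypothetical,
while the SRCU and RCU-list primitives are the ones this series relies on) of
how a sleepable SRCU read section pairs with unregistration:

```c
#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/srcu.h>

static LIST_HEAD(demo_list);
DEFINE_STATIC_SRCU(demo_srcu);

struct demo_item {
        struct list_head list;
        void (*work)(void);
};

/* Reader: no rwsem, only an SRCU read-side critical section (may sleep). */
static void demo_walk_all(void)
{
        struct demo_item *item;
        int idx = srcu_read_lock(&demo_srcu);

        list_for_each_entry_srcu(item, &demo_list, list,
                                 srcu_read_lock_held(&demo_srcu))
                item->work();

        srcu_read_unlock(&demo_srcu, idx);
}

/* Writer: unlink the entry, then wait out every SRCU reader before freeing. */
static void demo_unregister(struct demo_item *item)
{
        list_del_rcu(&item->list);
        synchronize_srcu(&demo_srcu);   /* no reader can still hold 'item' */
        kfree(item);
}
```
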
Signed-off-by: Qi Zheng
Acked-by: Vlastimil Babka
Acked-by: Kirill Tkhai
---
 mm/vmscan.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9a2a6301052c..db2ed6e08f67 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -57,6 +57,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -202,6 +203,7 @@ static void set_task_reclaim_state(struct task_struct *task,

 LIST_HEAD(shrinker_list);
 DECLARE_RWSEM(shrinker_rwsem);
+DEFINE_SRCU(shrinker_srcu);

 #ifdef CONFIG_MEMCG
 static int shrinker_nr_max;
@@ -700,7 +702,7 @@ void free_prealloced_shrinker(struct shrinker *shrinker)
 void register_shrinker_prepared(struct shrinker *shrinker)
 {
         down_write(&shrinker_rwsem);
-        list_add_tail(&shrinker->list, &shrinker_list);
+        list_add_tail_rcu(&shrinker->list, &shrinker_list);
         shrinker->flags |= SHRINKER_REGISTERED;
         shrinker_debugfs_add(shrinker);
         up_write(&shrinker_rwsem);
@@ -754,13 +756,15 @@ void unregister_shrinker(struct shrinker *shrinker)
                 return;

         down_write(&shrinker_rwsem);
-        list_del(&shrinker->list);
+        list_del_rcu(&shrinker->list);
         shrinker->flags &= ~SHRINKER_REGISTERED;
         if (shrinker->flags & SHRINKER_MEMCG_AWARE)
                 unregister_memcg_shrinker(shrinker);
         debugfs_entry = shrinker_debugfs_remove(shrinker);
         up_write(&shrinker_rwsem);

+        synchronize_srcu(&shrinker_srcu);
+
         debugfs_remove_recursive(debugfs_entry);

         kfree(shrinker->nr_deferred);
@@ -780,6 +784,7 @@ void synchronize_shrinkers(void)
 {
         down_write(&shrinker_rwsem);
         up_write(&shrinker_rwsem);
+        synchronize_srcu(&shrinker_srcu);
 }
 EXPORT_SYMBOL(synchronize_shrinkers);

@@ -990,6 +995,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 {
         unsigned long ret, freed = 0;
         struct shrinker *shrinker;
+        int srcu_idx;

         /*
          * The root memcg might be allocated even though memcg is disabled
@@ -1001,10 +1007,10 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
         if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
                 return shrink_slab_memcg(gfp_mask, nid, memcg, priority);

-        if (!down_read_trylock(&shrinker_rwsem))
-                goto out;
+        srcu_idx = srcu_read_lock(&shrinker_srcu);

-        list_for_each_entry(shrinker, &shrinker_list, list) {
+        list_for_each_entry_srcu(shrinker, &shrinker_list, list,
+                                 srcu_read_lock_held(&shrinker_srcu)) {
                 struct shrink_control sc = {
                         .gfp_mask = gfp_mask,
                         .nid = nid,
@@ -1015,19 +1021,9 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
                 if (ret == SHRINK_EMPTY)
                         ret = 0;
                 freed += ret;
-                /*
-                 * Bail out if someone want to register a new shrinker to
-                 * prevent the registration from being stalled for long periods
-                 * by parallel ongoing shrinking.
-                 */
-                if (rwsem_is_contended(&shrinker_rwsem)) {
-                        freed = freed ? : 1;
-                        break;
-                }
         }

-        up_read(&shrinker_rwsem);
-out:
+        srcu_read_unlock(&shrinker_srcu, srcu_idx);
         cond_resched();
         return freed;
 }
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH v5 3/8] mm: vmscan:
make memcg slab shrink lockless
Date: Mon, 13 Mar 2023 19:28:14 +0800
Message-Id: <20230313112819.38938-4-zhengqi.arch@bytedance.com>

Like the global slab shrink, this commit also uses SRCU to make the memcg slab
shrink lockless.

We can reproduce the down_read_trylock() hotspot through the following script:

```
DIR="/root/shrinker/memcg/mnt"

do_create()
{
    mkdir -p /sys/fs/cgroup/memory/test
    mkdir -p /sys/fs/cgroup/perf_event/test
    echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
    for i in `seq 0 $1`;
    do
        mkdir -p /sys/fs/cgroup/memory/test/$i;
        echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
        echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs;
        mkdir -p $DIR/$i;
    done
}

do_mount()
{
    for i in `seq $1 $2`;
    do
        mount -t tmpfs $i $DIR/$i;
    done
}

do_touch()
{
    for i in `seq $1 $2`;
    do
        echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
        echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs;
        dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 &
    done
}

case "$1" in
  touch)
    do_touch $2 $3
    ;;
  test)
    do_create 4000
    do_mount 0 4000
    do_touch 0 3000
    ;;
  *)
    exit 1
    ;;
esac
```

Save the above script, then run the test and touch commands. We can then use
the following perf command to view hotspots:

perf top -U -F 999

1) Before applying this patchset:

  32.31%  [kernel]  [k] down_read_trylock
  19.40%  [kernel]  [k] pv_native_safe_halt
  16.24%  [kernel]  [k] up_read
  15.70%  [kernel]  [k] shrink_slab
   4.69%  [kernel]  [k] _find_next_bit
   2.62%  [kernel]  [k] shrink_node
   1.78%  [kernel]  [k] shrink_lruvec
   0.76%  [kernel]  [k] do_shrink_slab

2) After applying this patchset:

  27.83%  [kernel]  [k] _find_next_bit
  16.97%  [kernel]  [k] shrink_slab
  15.82%  [kernel]  [k] pv_native_safe_halt
   9.58%  [kernel]  [k] shrink_node
   8.31%  [kernel]  [k] shrink_lruvec
   5.64%  [kernel]  [k] do_shrink_slab
   3.88%  [kernel]  [k] mem_cgroup_iter

At the same time, we use the following perf command to capture IPC information:

perf stat -e cycles,instructions -G test -a --repeat 5 -- sleep 10

1) Before applying this patchset:

 Performance counter stats for 'system wide' (5 runs):

      454187219766      cycles        test                                  ( +-  1.84% )
       78896433101      instructions  test  #  0.17  insn per cycle         ( +-  0.44% )

        10.0020430 +- 0.0000366 seconds time elapsed  ( +-  0.00% )

2) After applying this patchset:

 Performance counter stats for 'system wide' (5 runs):

      841954709443      cycles        test                                  ( +- 15.80% )  (98.69%)
      527258677936      instructions  test  #  0.63  insn per cycle         ( +- 15.11% )  (98.68%)

          10.01064 +- 0.00831 seconds time elapsed  ( +-  0.08% )

We can see that the IPC drops severely when down_read_trylock() is called at
high frequency. After switching to SRCU, the IPC is back at a normal level.
Signed-off-by: Qi Zheng
Acked-by: Kirill Tkhai
Acked-by: Vlastimil Babka
---
 mm/vmscan.c | 45 ++++++++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index db2ed6e08f67..ce7834030f75 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -222,8 +222,21 @@ static inline int shrinker_defer_size(int nr_items)
 static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
                                                      int nid)
 {
-        return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
-                                         lockdep_is_held(&shrinker_rwsem));
+        return srcu_dereference_check(memcg->nodeinfo[nid]->shrinker_info,
+                                      &shrinker_srcu,
+                                      lockdep_is_held(&shrinker_rwsem));
+}
+
+static struct shrinker_info *shrinker_info_srcu(struct mem_cgroup *memcg,
+                                                int nid)
+{
+        return srcu_dereference(memcg->nodeinfo[nid]->shrinker_info,
+                                &shrinker_srcu);
+}
+
+static void free_shrinker_info_rcu(struct rcu_head *head)
+{
+        kvfree(container_of(head, struct shrinker_info, rcu));
 }

 static int expand_one_shrinker_info(struct mem_cgroup *memcg,
@@ -264,7 +277,7 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg,
                        defer_size - old_defer_size);

                 rcu_assign_pointer(pn->shrinker_info, new);
-                kvfree_rcu(old, rcu);
+                call_srcu(&shrinker_srcu, &old->rcu, free_shrinker_info_rcu);
         }

         return 0;
@@ -350,15 +363,16 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
 {
         if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) {
                 struct shrinker_info *info;
+                int srcu_idx;

-                rcu_read_lock();
-                info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+                srcu_idx = srcu_read_lock(&shrinker_srcu);
+                info = shrinker_info_srcu(memcg, nid);
                 if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
                         /* Pairs with smp mb in shrink_slab() */
                         smp_mb__before_atomic();
                         set_bit(shrinker_id, info->map);
                 }
-                rcu_read_unlock();
+                srcu_read_unlock(&shrinker_srcu, srcu_idx);
         }
 }

@@ -372,7 +386,6 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
                 return -ENOSYS;

         down_write(&shrinker_rwsem);
-        /* This may call shrinker, so it must use down_read_trylock() */
         id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
         if (id < 0)
                 goto unlock;
@@ -406,7 +419,7 @@ static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
 {
         struct shrinker_info *info;

-        info = shrinker_info_protected(memcg, nid);
+        info = shrinker_info_srcu(memcg, nid);
         return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
 }

@@ -415,7 +428,7 @@ static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
 {
         struct shrinker_info *info;

-        info = shrinker_info_protected(memcg, nid);
+        info = shrinker_info_srcu(memcg, nid);
         return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
 }

@@ -893,15 +906,14 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 {
         struct shrinker_info *info;
         unsigned long ret, freed = 0;
+        int srcu_idx;
         int i;

         if (!mem_cgroup_online(memcg))
                 return 0;

-        if (!down_read_trylock(&shrinker_rwsem))
-                return 0;
-
-        info = shrinker_info_protected(memcg, nid);
+        srcu_idx = srcu_read_lock(&shrinker_srcu);
+        info = shrinker_info_srcu(memcg, nid);
         if (unlikely(!info))
                 goto unlock;

@@ -951,14 +963,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
                         set_shrinker_bit(memcg, nid, i);
                 }
                 freed += ret;
-
-                if (rwsem_is_contended(&shrinker_rwsem)) {
-                        freed = freed ? : 1;
-                        break;
-                }
         }
 unlock:
-        up_read(&shrinker_rwsem);
+        srcu_read_unlock(&shrinker_srcu, srcu_idx);
         return freed;
 }
 #else /* CONFIG_MEMCG */
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH v5 4/8] mm: vmscan: add shrinker_srcu_generation
Date: Mon, 13 Mar 2023 19:28:15 +0800
Message-Id: <20230313112819.38938-5-zhengqi.arch@bytedance.com>
From: Kirill Tkhai

After slab shrink is made lockless with SRCU, the longest sleep in
unregister_shrinker() will be the wait for all outstanding do_shrink_slab()
calls.

To avoid a long, unbreakable wait in unregister_shrinker(), add
shrinker_srcu_generation to restore a check similar to the
rwsem_is_contended() check that we had before.

And for the memcg slab shrink, we unlock SRCU and continue the iteration from
the next shrinker id.
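
The check added here follows a simple pattern; a minimal illustrative sketch
(not part of the patch; the demo_* names are hypothetical and the walk itself
is elided) looks roughly like this:

```c
#include <linux/atomic.h>
#include <linux/srcu.h>

DEFINE_STATIC_SRCU(demo_srcu);
static atomic_t demo_generation = ATOMIC_INIT(0);

/* Called by the unregister path, before it calls synchronize_srcu(). */
static void demo_unregister_notify(void)
{
        atomic_inc(&demo_generation);
}

static unsigned long demo_shrink_all(void)
{
        unsigned long freed = 0;
        int idx, gen;

        idx = srcu_read_lock(&demo_srcu);
        gen = atomic_read(&demo_generation);

        /* ... walk the shrinker list here, adding to 'freed' ... */

        /*
         * If an unregister bumped the generation while we were walking,
         * stop early so the waiter in synchronize_srcu() is not held up
         * by a long read-side section.
         */
        if (atomic_read(&demo_generation) != gen)
                freed = freed ?: 1;

        srcu_read_unlock(&demo_srcu, idx);
        return freed;
}
```
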
Signed-off-by: Kirill Tkhai
Signed-off-by: Qi Zheng
Acked-by: Vlastimil Babka
---
 mm/vmscan.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ce7834030f75..5c2a22454320 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -204,6 +204,7 @@ static void set_task_reclaim_state(struct task_struct *task,
 LIST_HEAD(shrinker_list);
 DECLARE_RWSEM(shrinker_rwsem);
 DEFINE_SRCU(shrinker_srcu);
+static atomic_t shrinker_srcu_generation = ATOMIC_INIT(0);

 #ifdef CONFIG_MEMCG
 static int shrinker_nr_max;
@@ -776,6 +777,7 @@ void unregister_shrinker(struct shrinker *shrinker)
         debugfs_entry = shrinker_debugfs_remove(shrinker);
         up_write(&shrinker_rwsem);

+        atomic_inc(&shrinker_srcu_generation);
         synchronize_srcu(&shrinker_srcu);

         debugfs_remove_recursive(debugfs_entry);
@@ -797,6 +799,7 @@ void synchronize_shrinkers(void)
 {
         down_write(&shrinker_rwsem);
         up_write(&shrinker_rwsem);
+        atomic_inc(&shrinker_srcu_generation);
         synchronize_srcu(&shrinker_srcu);
 }
 EXPORT_SYMBOL(synchronize_shrinkers);
@@ -906,18 +909,20 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 {
         struct shrinker_info *info;
         unsigned long ret, freed = 0;
-        int srcu_idx;
-        int i;
+        int srcu_idx, generation;
+        int i = 0;

         if (!mem_cgroup_online(memcg))
                 return 0;

+again:
         srcu_idx = srcu_read_lock(&shrinker_srcu);
         info = shrinker_info_srcu(memcg, nid);
         if (unlikely(!info))
                 goto unlock;

-        for_each_set_bit(i, info->map, info->map_nr_max) {
+        generation = atomic_read(&shrinker_srcu_generation);
+        for_each_set_bit_from(i, info->map, info->map_nr_max) {
                 struct shrink_control sc = {
                         .gfp_mask = gfp_mask,
                         .nid = nid,
@@ -963,6 +968,11 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
                         set_shrinker_bit(memcg, nid, i);
                 }
                 freed += ret;
+                if (atomic_read(&shrinker_srcu_generation) != generation) {
+                        srcu_read_unlock(&shrinker_srcu, srcu_idx);
+                        i++;
+                        goto again;
+                }
         }
 unlock:
         srcu_read_unlock(&shrinker_srcu, srcu_idx);
@@ -1002,7 +1012,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 {
         unsigned long ret, freed = 0;
         struct shrinker *shrinker;
-        int srcu_idx;
+        int srcu_idx, generation;

         /*
          * The root memcg might be allocated even though memcg is disabled
@@ -1016,6 +1026,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,

         srcu_idx = srcu_read_lock(&shrinker_srcu);

+        generation = atomic_read(&shrinker_srcu_generation);
         list_for_each_entry_srcu(shrinker, &shrinker_list, list,
                                  srcu_read_lock_held(&shrinker_srcu)) {
                 struct shrink_control sc = {
@@ -1028,6 +1039,11 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
                 if (ret == SHRINK_EMPTY)
                         ret = 0;
                 freed += ret;
+
+                if (atomic_read(&shrinker_srcu_generation) != generation) {
+                        freed = freed ? : 1;
+                        break;
+                }
         }

         srcu_read_unlock(&shrinker_srcu, srcu_idx);
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH v5 5/8] mm: shrinkers: make count and scan in shrinker debugfs lockless
Date: Mon, 13 Mar 2023 19:28:16 +0800
Message-Id: <20230313112819.38938-6-zhengqi.arch@bytedance.com>

Like the global and memcg slab shrink, also use SRCU to make the count and
scan operations in the shrinker debugfs lockless.

Signed-off-by: Qi Zheng
Acked-by: Vlastimil Babka
Acked-by: Kirill Tkhai
---
 mm/shrinker_debug.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index 39c3491e28a3..37d54d037495 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -5,10 +5,12 @@
 #include
 #include
 #include
+#include

 /* defined in vmscan.c */
 extern struct rw_semaphore shrinker_rwsem;
 extern struct list_head shrinker_list;
+extern struct srcu_struct shrinker_srcu;

 static DEFINE_IDA(shrinker_debugfs_ida);
 static struct dentry *shrinker_debugfs_root;
@@ -49,18 +51,13 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
         struct mem_cgroup *memcg;
         unsigned long total;
         bool memcg_aware;
-        int ret, nid;
+        int ret = 0, nid, srcu_idx;

         count_per_node = kcalloc(nr_node_ids, sizeof(unsigned long), GFP_KERNEL);
         if (!count_per_node)
                 return -ENOMEM;

-        ret = down_read_killable(&shrinker_rwsem);
-        if (ret) {
-                kfree(count_per_node);
-                return ret;
-        }
-        rcu_read_lock();
+        srcu_idx = srcu_read_lock(&shrinker_srcu);

         memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;

@@ -91,8 +88,7 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
                 }
         } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);

-        rcu_read_unlock();
-        up_read(&shrinker_rwsem);
+        srcu_read_unlock(&shrinker_srcu, srcu_idx);

         kfree(count_per_node);
         return ret;
@@ -115,9 +111,8 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
                 .gfp_mask = GFP_KERNEL,
         };
         struct mem_cgroup *memcg = NULL;
-        int nid;
+        int nid, srcu_idx;
         char kbuf[72];
-        ssize_t ret;

         read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
         if (copy_from_user(kbuf, buf, read_len))
@@ -146,11 +141,7 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
                 return -EINVAL;
         }

-        ret = down_read_killable(&shrinker_rwsem);
-        if (ret) {
-                mem_cgroup_put(memcg);
-                return ret;
-        }
+        srcu_idx = srcu_read_lock(&shrinker_srcu);

         sc.nid = nid;
         sc.memcg = memcg;
@@ -159,7 +150,7 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,

         shrinker->scan_objects(shrinker, &sc);

-        up_read(&shrinker_rwsem);
+        srcu_read_unlock(&shrinker_srcu, srcu_idx);
         mem_cgroup_put(memcg);

         return size;
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH v5 6/8] mm: vmscan: hold write lock to reparent shrinker nr_deferred
Date: Mon, 13 Mar 2023 19:28:17 +0800
Message-Id: <20230313112819.38938-7-zhengqi.arch@bytedance.com>

For now, reparent_shrinker_deferred() is the only holder of the read lock of
shrinker_rwsem, and it already holds the global cgroup_mutex, so it will not
be called in parallel. Therefore, in order to convert shrinker_rwsem to
shrinker_mutex later, change it to hold the write lock of shrinker_rwsem
while reparenting.

Signed-off-by: Qi Zheng
Acked-by: Vlastimil Babka
Acked-by: Kirill Tkhai
---
 mm/vmscan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5c2a22454320..8c1ae7ea8dea 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -445,7 +445,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
                 parent = root_mem_cgroup;

         /* Prevent from concurrent shrinker_info expand */
-        down_read(&shrinker_rwsem);
+        down_write(&shrinker_rwsem);
         for_each_node(nid) {
                 child_info = shrinker_info_protected(memcg, nid);
                 parent_info = shrinker_info_protected(parent, nid);
@@ -454,7 +454,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
                         atomic_long_add(nr, &parent_info->nr_deferred[i]);
                 }
         }
-        up_read(&shrinker_rwsem);
+        up_write(&shrinker_rwsem);
 }

 static bool cgroup_reclaim(struct scan_control *sc)
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH v5 7/8] mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers()
Date: Mon, 13 Mar 2023 19:28:18 +0800
Message-Id: <20230313112819.38938-8-zhengqi.arch@bytedance.com>

Currently, synchronize_shrinkers() is only used by the TTM pool. It only
requires that no shrinkers run in parallel and does not care about the
registering and unregistering of shrinkers.

Since slab shrink is now protected by SRCU, synchronize_srcu() is sufficient
to ensure that no shrinker is running in parallel, so the shrinker_rwsem in
synchronize_shrinkers() is no longer needed; just remove it.

Signed-off-by: Qi Zheng
Acked-by: Vlastimil Babka
Acked-by: Kirill Tkhai
---
 mm/vmscan.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8c1ae7ea8dea..2b22a42d83c4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -790,15 +790,11 @@ EXPORT_SYMBOL(unregister_shrinker);
 /**
  * synchronize_shrinkers - Wait for all running shrinkers to complete.
  *
- * This is equivalent to calling unregister_shrink() and register_shrinker(),
- * but atomically and with less overhead. This is useful to guarantee that all
- * shrinker invocations have seen an update, before freeing memory, similar to
- * rcu.
+ * This is useful to guarantee that all shrinker invocations have seen an
+ * update, before freeing memory.
  */
 void synchronize_shrinkers(void)
 {
-        down_write(&shrinker_rwsem);
-        up_write(&shrinker_rwsem);
         atomic_inc(&shrinker_srcu_generation);
         synchronize_srcu(&shrinker_srcu);
 }
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH v5 8/8] mm: shrinkers: convert shrinker_rwsem to mutex
Date: Mon, 13 Mar 2023 19:28:19 +0800
Message-Id: <20230313112819.38938-9-zhengqi.arch@bytedance.com>

Now there are no readers of shrinker_rwsem, so we can simply replace it with
a mutex.

Signed-off-by: Qi Zheng
Acked-by: Vlastimil Babka
Acked-by: Kirill Tkhai
---
 drivers/md/dm-cache-metadata.c |  2 +-
 drivers/md/dm-thin-metadata.c  |  2 +-
 fs/super.c                     |  2 +-
 mm/shrinker_debug.c            | 14 +++++++-------
 mm/vmscan.c                    | 34 +++++++++++++++++-----------------
 5 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
index acffed750e3e..9e0c69958587 100644
--- a/drivers/md/dm-cache-metadata.c
+++ b/drivers/md/dm-cache-metadata.c
@@ -1828,7 +1828,7 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)
          * Replacement block manager (new_bm) is created and old_bm destroyed outside of
          * cmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of
          * shrinker associated with the block manager's bufio client vs cmd root_lock).
-         * - must take shrinker_rwsem without holding cmd->root_lock
+         * - must take shrinker_mutex without holding cmd->root_lock
          */
         new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
                                          CACHE_MAX_CONCURRENT_LOCKS);
diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
index fd464fb024c3..9f5cb52c5763 100644
--- a/drivers/md/dm-thin-metadata.c
+++ b/drivers/md/dm-thin-metadata.c
@@ -1887,7 +1887,7 @@ int dm_pool_abort_metadata(struct dm_pool_metadata *pmd)
          * Replacement block manager (new_bm) is created and old_bm destroyed outside of
          * pmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of
          * shrinker associated with the block manager's bufio client vs pmd root_lock).
-         * - must take shrinker_rwsem without holding pmd->root_lock
+         * - must take shrinker_mutex without holding pmd->root_lock
          */
         new_bm = dm_block_manager_create(pmd->bdev, THIN_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
                                          THIN_MAX_CONCURRENT_LOCKS);
diff --git a/fs/super.c b/fs/super.c
index 84332d5cb817..91a4037b1d95 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -54,7 +54,7 @@ static char *sb_writers_name[SB_FREEZE_LEVELS] = {
  * One thing we have to be careful of with a per-sb shrinker is that we don't
  * drop the last active reference to the superblock from within the shrinker.
  * If that happens we could trigger unregistering the shrinker from within the
- * shrinker path and that leads to deadlock on the shrinker_rwsem. Hence we
+ * shrinker path and that leads to deadlock on the shrinker_mutex. Hence we
  * take a passive reference to the superblock to avoid this from occurring.
  */
 static unsigned long super_cache_scan(struct shrinker *shrink,
diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index 37d54d037495..fdd155fd35ed 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -8,7 +8,7 @@
 #include

 /* defined in vmscan.c */
-extern struct rw_semaphore shrinker_rwsem;
+extern struct mutex shrinker_mutex;
 extern struct list_head shrinker_list;
 extern struct srcu_struct shrinker_srcu;

@@ -168,7 +168,7 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
         char buf[128];
         int id;

-        lockdep_assert_held(&shrinker_rwsem);
+        lockdep_assert_held(&shrinker_mutex);

         /* debugfs isn't initialized yet, add debugfs entries later. */
         if (!shrinker_debugfs_root)
@@ -211,7 +211,7 @@ int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
         if (!new)
                 return -ENOMEM;

-        down_write(&shrinker_rwsem);
+        mutex_lock(&shrinker_mutex);

         old = shrinker->name;
         shrinker->name = new;
@@ -229,7 +229,7 @@ int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
                 shrinker->debugfs_entry = entry;
         }

-        up_write(&shrinker_rwsem);
+        mutex_unlock(&shrinker_mutex);

         kfree_const(old);

@@ -241,7 +241,7 @@ struct dentry *shrinker_debugfs_remove(struct shrinker *shrinker)
 {
         struct dentry *entry = shrinker->debugfs_entry;

-        lockdep_assert_held(&shrinker_rwsem);
+        lockdep_assert_held(&shrinker_mutex);

         kfree_const(shrinker->name);
         shrinker->name = NULL;
@@ -266,14 +266,14 @@ static int __init shrinker_debugfs_init(void)
         shrinker_debugfs_root = dentry;

         /* Create debugfs entries for shrinkers registered at boot */
-        down_write(&shrinker_rwsem);
+        mutex_lock(&shrinker_mutex);
         list_for_each_entry(shrinker, &shrinker_list, list)
                 if (!shrinker->debugfs_entry) {
                         ret = shrinker_debugfs_add(shrinker);
                         if (ret)
                                 break;
                 }
-        up_write(&shrinker_rwsem);
+        mutex_unlock(&shrinker_mutex);

         return ret;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2b22a42d83c4..8faac4310cb5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -35,7 +35,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
@@ -202,7 +202,7 @@ static void set_task_reclaim_state(struct task_struct *task,
 }

 LIST_HEAD(shrinker_list);
-DECLARE_RWSEM(shrinker_rwsem);
+DEFINE_MUTEX(shrinker_mutex);
 DEFINE_SRCU(shrinker_srcu);
 static atomic_t shrinker_srcu_generation = ATOMIC_INIT(0);

@@ -225,7 +225,7 @@ static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
 {
         return srcu_dereference_check(memcg->nodeinfo[nid]->shrinker_info,
                                       &shrinker_srcu,
-                                      lockdep_is_held(&shrinker_rwsem));
+                                      lockdep_is_held(&shrinker_mutex));
 }

 static struct shrinker_info *shrinker_info_srcu(struct mem_cgroup *memcg,
@@ -304,7 +304,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
         int nid, size, ret = 0;
         int map_size, defer_size = 0;

-        down_write(&shrinker_rwsem);
+        mutex_lock(&shrinker_mutex);
         map_size = shrinker_map_size(shrinker_nr_max);
         defer_size = shrinker_defer_size(shrinker_nr_max);
         size = map_size + defer_size;
@@ -320,7 +320,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
                 info->map_nr_max = shrinker_nr_max;
                 rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
         }
-        up_write(&shrinker_rwsem);
+        mutex_unlock(&shrinker_mutex);

         return ret;
 }
@@ -336,7 +336,7 @@ static int expand_shrinker_info(int new_id)
         if (!root_mem_cgroup)
                 goto out;

-        lockdep_assert_held(&shrinker_rwsem);
+        lockdep_assert_held(&shrinker_mutex);

         map_size = shrinker_map_size(new_nr_max);
         defer_size = shrinker_defer_size(new_nr_max);
@@ -386,7 +386,7 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
         if (mem_cgroup_disabled())
                 return -ENOSYS;

-        down_write(&shrinker_rwsem);
+        mutex_lock(&shrinker_mutex);
         id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
         if (id < 0)
                 goto unlock;
@@ -400,7 +400,7 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
         shrinker->id = id;
         ret = 0;
 unlock:
-        up_write(&shrinker_rwsem);
+        mutex_unlock(&shrinker_mutex);
         return ret;
 }

@@ -410,7 +410,7 @@ static void unregister_memcg_shrinker(struct shrinker *shrinker)

         BUG_ON(id < 0);

-        lockdep_assert_held(&shrinker_rwsem);
+        lockdep_assert_held(&shrinker_mutex);

         idr_remove(&shrinker_idr, id);
 }
@@ -445,7 +445,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
                 parent = root_mem_cgroup;

         /* Prevent from concurrent shrinker_info expand */
-        down_write(&shrinker_rwsem);
+        mutex_lock(&shrinker_mutex);
         for_each_node(nid) {
                 child_info = shrinker_info_protected(memcg, nid);
                 parent_info = shrinker_info_protected(parent, nid);
@@ -454,7 +454,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
                         atomic_long_add(nr, &parent_info->nr_deferred[i]);
                 }
         }
-        up_write(&shrinker_rwsem);
+        mutex_unlock(&shrinker_mutex);
 }

 static bool cgroup_reclaim(struct scan_control *sc)
@@ -703,9 +703,9 @@ void free_prealloced_shrinker(struct shrinker *shrinker)
         shrinker->name = NULL;
 #endif
         if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
-                down_write(&shrinker_rwsem);
+                mutex_lock(&shrinker_mutex);
                 unregister_memcg_shrinker(shrinker);
-                up_write(&shrinker_rwsem);
+                mutex_unlock(&shrinker_mutex);
                 return;
         }

@@ -715,11 +715,11 @@ void free_prealloced_shrinker(struct shrinker *shrinker)

 void register_shrinker_prepared(struct shrinker *shrinker)
 {
-        down_write(&shrinker_rwsem);
+        mutex_lock(&shrinker_mutex);
         list_add_tail_rcu(&shrinker->list, &shrinker_list);
         shrinker->flags |= SHRINKER_REGISTERED;
         shrinker_debugfs_add(shrinker);
-        up_write(&shrinker_rwsem);
+        mutex_unlock(&shrinker_mutex);
 }

 static int __register_shrinker(struct shrinker *shrinker)
@@ -769,13 +769,13 @@ void unregister_shrinker(struct shrinker *shrinker)
         if (!(shrinker->flags & SHRINKER_REGISTERED))
                 return;

-        down_write(&shrinker_rwsem);
+        mutex_lock(&shrinker_mutex);
         list_del_rcu(&shrinker->list);
         shrinker->flags &= ~SHRINKER_REGISTERED;
         if (shrinker->flags & SHRINKER_MEMCG_AWARE)
                 unregister_memcg_shrinker(shrinker);
         debugfs_entry = shrinker_debugfs_remove(shrinker);
-        up_write(&shrinker_rwsem);
+        mutex_unlock(&shrinker_mutex);

         atomic_inc(&shrinker_srcu_generation);
         synchronize_srcu(&shrinker_srcu);
-- 
2.20.1