From: Qi Zheng
To: akpm@linux-foundation.org, david@fromorbit.com, tkhai@ya.ru, vbabka@suse.cz, roman.gushchin@linux.dev, djwong@kernel.org, brauner@kernel.org, paulmck@kernel.org, tytso@mit.edu, steven.price@arm.com, cel@kernel.org, senozhatsky@chromium.org, yujie.liu@intel.com, gregkh@linuxfoundation.org, muchun.song@linux.dev
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, linux-erofs@lists.ozlabs.org, linux-f2fs-devel@lists.sourceforge.net, cluster-devel@redhat.com, linux-nfs@vger.kernel.org, linux-mtd@lists.infradead.org, rcu@vger.kernel.org, netdev@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-arm-msm@vger.kernel.org,
    dm-devel@redhat.com, linux-raid@vger.kernel.org, linux-bcache@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org, Qi Zheng
Subject: [PATCH v3 47/49] mm: shrinker: make memcg slab shrink lockless
Date: Thu, 27 Jul 2023 16:05:00 +0800
Message-Id: <20230727080502.77895-48-zhengqi.arch@bytedance.com>
In-Reply-To: <20230727080502.77895-1-zhengqi.arch@bytedance.com>
References: <20230727080502.77895-1-zhengqi.arch@bytedance.com>

Like the global slab shrink, this commit uses the refcount+RCU method to make the memcg slab shrink lockless.

Use the following script to stress-test slab shrinking:

```
DIR="/root/shrinker/memcg/mnt"

do_create()
{
    mkdir -p /sys/fs/cgroup/memory/test
    echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
    for i in `seq 0 $1`;
    do
        mkdir -p /sys/fs/cgroup/memory/test/$i;
        echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
        mkdir -p $DIR/$i;
    done
}

do_mount()
{
    for i in `seq $1 $2`;
    do
        mount -t tmpfs $i $DIR/$i;
    done
}

do_touch()
{
    for i in `seq $1 $2`;
    do
        echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
        dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 &
    done
}

case "$1" in
  touch)
    do_touch $2 $3
    ;;
  test)
    do_create 4000
    do_mount 0 4000
    do_touch 0 3000
    ;;
  *)
    exit 1
    ;;
esac
```

Save the above script, then run its `test` and `touch` commands.
Then we can use the following perf command to view hotspots:

perf top -U -F 999

1) Before applying this patchset:

  40.44%  [kernel]  [k] down_read_trylock
  17.59%  [kernel]  [k] up_read
  13.64%  [kernel]  [k] pv_native_safe_halt
  11.90%  [kernel]  [k] shrink_slab
   8.21%  [kernel]  [k] idr_find
   2.71%  [kernel]  [k] _find_next_bit
   1.36%  [kernel]  [k] shrink_node
   0.81%  [kernel]  [k] shrink_lruvec
   0.80%  [kernel]  [k] __radix_tree_lookup
   0.50%  [kernel]  [k] do_shrink_slab
   0.21%  [kernel]  [k] list_lru_count_one
   0.16%  [kernel]  [k] mem_cgroup_iter

2) After applying this patchset:

  60.17%  [kernel]  [k] shrink_slab
  20.42%  [kernel]  [k] pv_native_safe_halt
   3.03%  [kernel]  [k] do_shrink_slab
   2.73%  [kernel]  [k] shrink_node
   2.27%  [kernel]  [k] shrink_lruvec
   2.00%  [kernel]  [k] __rcu_read_unlock
   1.92%  [kernel]  [k] mem_cgroup_iter
   0.98%  [kernel]  [k] __rcu_read_lock
   0.91%  [kernel]  [k] osq_lock
   0.63%  [kernel]  [k] mem_cgroup_calculate_protection
   0.55%  [kernel]  [k] shrinker_put
   0.46%  [kernel]  [k] list_lru_count_one

We can see that the top hotspot is now shrink_slab itself rather than the shrinker_rwsem operations (down_read_trylock and up_read, about 58% combined before the patch), which is what we expect.
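The lockless scheme hinges on shrinker_try_get(): a reader that finds a shrinker through idr_find() under rcu_read_lock() may use it only after pinning it, and pinning must fail once teardown has begun. The sketch below models that rule in userspace with C11 atomics. It assumes shrinker_try_get()/shrinker_put() behave like refcount_inc_not_zero()/refcount_dec_and_test() (their bodies are not part of this patch), and `struct obj`, `obj_init()`, `obj_try_get()` and `obj_put()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct shrinker: only the refcount matters here. */
struct obj {
	atomic_long refcount;	/* 0 means teardown has started: no new users */
};

static void obj_init(struct obj *o)
{
	/* The registering owner holds the initial reference. */
	atomic_init(&o->refcount, 1);
}

/*
 * Take a reference only if the object is still live (assumed to match
 * refcount_inc_not_zero()). Once the count reaches 0 it can never be raised
 * again, so a reader that found the object via an RCU-protected lookup
 * either pins it or skips it.
 */
static bool obj_try_get(struct obj *o)
{
	long old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value; retry. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the last one is gone. */
static bool obj_put(struct obj *o)
{
	long old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);	/* catches an unbalanced put */
	return old == 1;
}
```

For example, after `obj_init()` a try-get succeeds and two puts (the reader's, then the owner's) bring the count to 0; from then on `obj_try_get()` returns false, so a late reader cannot resurrect an object that is being freed. In the patch, that failure path is the `clear_bit(); continue;` branch in shrink_slab_memcg().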
Signed-off-by: Qi Zheng
---
 mm/shrinker.c | 80 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 54 insertions(+), 26 deletions(-)

diff --git a/mm/shrinker.c b/mm/shrinker.c
index d318f5621862..fee6f62904fb 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -107,6 +107,12 @@ static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
 					 lockdep_is_held(&shrinker_rwsem));
 }
 
+static struct shrinker_info *shrinker_info_rcu(struct mem_cgroup *memcg,
+						     int nid)
+{
+	return rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+}
+
 static int expand_one_shrinker_info(struct mem_cgroup *memcg, int new_size,
 				    int old_size, int new_nr_max)
 {
@@ -198,7 +204,7 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
 		struct shrinker_info_unit *unit;
 
 		rcu_read_lock();
-		info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+		info = shrinker_info_rcu(memcg, nid);
 		unit = info->unit[shriner_id_to_index(shrinker_id)];
 		if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
 			/* Pairs with smp mb in shrink_slab() */
@@ -211,7 +217,7 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
 
 static DEFINE_IDR(shrinker_idr);
 
-static int prealloc_memcg_shrinker(struct shrinker *shrinker)
+static int shrinker_memcg_alloc(struct shrinker *shrinker)
 {
 	int id, ret = -ENOMEM;
 
@@ -219,7 +225,6 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
 		return -ENOSYS;
 
 	down_write(&shrinker_rwsem);
-	/* This may call shrinker, so it must use down_read_trylock() */
 	id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
 	if (id < 0)
 		goto unlock;
@@ -237,7 +242,7 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
 	return ret;
 }
 
-static void unregister_memcg_shrinker(struct shrinker *shrinker)
+static void shrinker_memcg_remove(struct shrinker *shrinker)
 {
 	int id = shrinker->id;
 
@@ -253,10 +258,15 @@ static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
 {
 	struct shrinker_info *info;
 	struct shrinker_info_unit *unit;
+	long nr_deferred;
 
-	info = shrinker_info_protected(memcg, nid);
+	rcu_read_lock();
+	info = shrinker_info_rcu(memcg, nid);
 	unit = info->unit[shriner_id_to_index(shrinker->id)];
-	return atomic_long_xchg(&unit->nr_deferred[shriner_id_to_offset(shrinker->id)], 0);
+	nr_deferred = atomic_long_xchg(&unit->nr_deferred[shriner_id_to_offset(shrinker->id)], 0);
+	rcu_read_unlock();
+
+	return nr_deferred;
 }
 
 static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
@@ -264,10 +274,16 @@ static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
 {
 	struct shrinker_info *info;
 	struct shrinker_info_unit *unit;
+	long nr_deferred;
 
-	info = shrinker_info_protected(memcg, nid);
+	rcu_read_lock();
+	info = shrinker_info_rcu(memcg, nid);
 	unit = info->unit[shriner_id_to_index(shrinker->id)];
-	return atomic_long_add_return(nr, &unit->nr_deferred[shriner_id_to_offset(shrinker->id)]);
+	nr_deferred =
+		atomic_long_add_return(nr, &unit->nr_deferred[shriner_id_to_offset(shrinker->id)]);
+	rcu_read_unlock();
+
+	return nr_deferred;
 }
 
 void reparent_shrinker_deferred(struct mem_cgroup *memcg)
@@ -299,12 +315,12 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
 	up_read(&shrinker_rwsem);
 }
 #else
-static int prealloc_memcg_shrinker(struct shrinker *shrinker)
+static int shrinker_memcg_alloc(struct shrinker *shrinker)
 {
 	return -ENOSYS;
 }
 
-static void unregister_memcg_shrinker(struct shrinker *shrinker)
+static void shrinker_memcg_remove(struct shrinker *shrinker)
 {
 }
 
@@ -464,18 +480,23 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 	if (!mem_cgroup_online(memcg))
 		return 0;
 
-	if (!down_read_trylock(&shrinker_rwsem))
-		return 0;
-
-	info = shrinker_info_protected(memcg, nid);
+again:
+	rcu_read_lock();
+	info = shrinker_info_rcu(memcg, nid);
 	if (unlikely(!info))
 		goto unlock;
 
-	for (; index < shriner_id_to_index(info->map_nr_max); index++) {
+	if (index < shriner_id_to_index(info->map_nr_max)) {
 		struct shrinker_info_unit *unit;
 
 		unit = info->unit[index];
 
+		/*
+		 * The shrinker_info_unit will not be freed, so we can
+		 * safely release the RCU lock here.
+		 */
+		rcu_read_unlock();
+
 		for_each_set_bit(offset, unit->map, SHRINKER_UNIT_BITS) {
 			struct shrink_control sc = {
 				.gfp_mask = gfp_mask,
@@ -485,12 +506,14 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 			struct shrinker *shrinker;
 			int shrinker_id = calc_shrinker_id(index, offset);
 
+			rcu_read_lock();
 			shrinker = idr_find(&shrinker_idr, shrinker_id);
-			if (unlikely(!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))) {
-				if (!shrinker)
-					clear_bit(offset, unit->map);
+			if (unlikely(!shrinker || !shrinker_try_get(shrinker))) {
+				clear_bit(offset, unit->map);
+				rcu_read_unlock();
 				continue;
 			}
+			rcu_read_unlock();
 
 			/* Call non-slab shrinkers even though kmem is disabled */
 			if (!memcg_kmem_online() &&
@@ -523,15 +546,20 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 					set_shrinker_bit(memcg, nid, shrinker_id);
 			}
 			freed += ret;
-
-			if (rwsem_is_contended(&shrinker_rwsem)) {
-				freed = freed ? : 1;
-				goto unlock;
-			}
+			shrinker_put(shrinker);
 		}
+
+		/*
+		 * We have already exited the read-side of rcu critical section
+		 * before calling do_shrink_slab(), the shrinker_info may be
+		 * released in expand_one_shrinker_info(), so reacquire the
+		 * shrinker_info.
+		 */
+		index++;
+		goto again;
 	}
 unlock:
-	up_read(&shrinker_rwsem);
+	rcu_read_unlock();
 	return freed;
 }
 #else /* !CONFIG_MEMCG */
@@ -638,7 +666,7 @@ struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...)
 	shrinker->flags = flags | SHRINKER_ALLOCATED;
 
 	if (flags & SHRINKER_MEMCG_AWARE) {
-		err = prealloc_memcg_shrinker(shrinker);
+		err = shrinker_memcg_alloc(shrinker);
 		if (err == -ENOSYS)
 			shrinker->flags &= ~SHRINKER_MEMCG_AWARE;
 		else if (err == 0)
@@ -731,7 +759,7 @@ void shrinker_free(struct shrinker *shrinker)
 	}
 
 	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
-		unregister_memcg_shrinker(shrinker);
+		shrinker_memcg_remove(shrinker);
 	up_write(&shrinker_rwsem);
 
 	if (debugfs_entry)
-- 
2.30.2
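Because the RCU read lock is dropped while each unit's shrinkers run, shrink_slab_memcg() can no longer cache `info` across iterations: expand_one_shrinker_info() may publish a larger array and free the old one in the meantime. That is why the `for` loop becomes `index++; goto again;`, re-reading the pointer on each pass. The single-threaded sketch below models this, with a callback standing in for a concurrent expansion. `struct info`, `struct unit`, `info_expand()`, `grow_on_first_visit()` and `walk_units()` are hypothetical names, and plain malloc()/free() stand in for rcu_assign_pointer() and the RCU-deferred free used in the kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified analogues of shrinker_info_unit / shrinker_info. */
struct unit {
	int visits;
};

struct info {
	int nr_units;
	struct unit *unit[];	/* units survive expansion; only the array is replaced */
};

static struct info *cur_info;	/* stands in for the RCU-protected pointer */

/* Publish a larger array; existing units are carried over, never freed. */
static void info_expand(int new_nr)
{
	struct info *old = cur_info;
	struct info *fresh;

	assert(!old || new_nr >= old->nr_units);	/* the array only grows */
	fresh = malloc(sizeof(*fresh) + new_nr * sizeof(fresh->unit[0]));
	if (!fresh)
		abort();
	fresh->nr_units = new_nr;
	for (int i = 0; i < new_nr; i++) {
		if (old && i < old->nr_units) {
			fresh->unit[i] = old->unit[i];
		} else {
			fresh->unit[i] = calloc(1, sizeof(struct unit));
			if (!fresh->unit[i])
				abort();
		}
	}
	cur_info = fresh;	/* kernel: rcu_assign_pointer() */
	free(old);		/* kernel: freed only after an RCU grace period */
}

/* Demo callback: the first visit simulates a concurrent expansion to 4 units. */
static void grow_on_first_visit(struct unit *u)
{
	static int expanded;

	u->visits++;
	if (!expanded) {
		expanded = 1;
		info_expand(4);
	}
}

/*
 * Visit every unit, re-reading cur_info after each callback, like the
 * "index++; goto again;" loop in shrink_slab_memcg(). 'info' is never
 * dereferenced after work() runs, so a freed array is never touched.
 */
static int walk_units(void (*work)(struct unit *))
{
	int index = 0, visited = 0;
	struct info *info;

again:
	info = cur_info;	/* kernel: rcu_read_lock(); shrinker_info_rcu() */
	if (!info)
		return visited;
	if (index < info->nr_units) {
		struct unit *u = info->unit[index];

		/* kernel: rcu_read_unlock(); 'info' is now off-limits */
		work(u);
		visited++;
		index++;
		goto again;
	}
	return visited;
}
```

Calling `info_expand(2)` and then `walk_units(grow_on_first_visit)` visits all four units exactly once, even though the array was replaced during the first callback. A version that cached `info` before the loop would dereference the freed two-entry array on its next iteration.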