From nobody Thu Dec 18 05:00:03 2025
From: Mateusz Guzik
To: linux-kernel@vger.kernel.org
Cc: dennis@kernel.org, tj@kernel.org, cl@linux.com,
	akpm@linux-foundation.org, shakeelb@google.com,
	vegard.nossum@oracle.com, linux-mm@kvack.org,
	Mateusz Guzik
Subject: [PATCH v3 1/2] pcpcntr: add group allocation/free
Date: Wed, 23 Aug 2023 07:06:08 +0200
Message-Id: <20230823050609.2228718-2-mjguzik@gmail.com>
In-Reply-To: <20230823050609.2228718-1-mjguzik@gmail.com>
References: <20230823050609.2228718-1-mjguzik@gmail.com>

Allocations and frees are globally serialized on the pcpu lock (and the
CPU hotplug lock if enabled, which is the case on Debian).

At least one frequent consumer allocates 4 back-to-back counters (and
frees them in the same manner), exacerbating the problem.

While this does not fully remedy scalability issues, it is a step
towards that goal and provides immediate relief.

Signed-off-by: Mateusz Guzik
Reviewed-by: Dennis Zhou
Reviewed-by: Vegard Nossum
---
 include/linux/percpu_counter.h | 39 ++++++++++++++++++----
 lib/percpu_counter.c           | 61 +++++++++++++++++++++++-----------
 2 files changed, 74 insertions(+), 26 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 75b73c83bc9d..f1e7c987e3d3 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -30,17 +30,27 @@ struct percpu_counter {
 
 extern int percpu_counter_batch;
 
-int __percpu_counter_init(struct percpu_counter *fbc, s64 amount, gfp_t gfp,
-			  struct lock_class_key *key);
+int __percpu_counter_init_many(struct percpu_counter *fbc, s64 amount, gfp_t gfp,
+			       u32 nr_counters, struct lock_class_key *key);
 
-#define percpu_counter_init(fbc, value, gfp)				\
+#define percpu_counter_init_many(fbc, value, gfp, nr_counters)		\
 	({								\
 		static struct lock_class_key __key;			\
 									\
-		__percpu_counter_init(fbc, value, gfp, &__key);		\
+		__percpu_counter_init_many(fbc, value, gfp, nr_counters,\
+					   &__key);			\
 	})
 
-void percpu_counter_destroy(struct percpu_counter *fbc);
+
+#define percpu_counter_init(fbc, value, gfp)				\
+	percpu_counter_init_many(fbc, value, gfp, 1)
+
+void percpu_counter_destroy_many(struct percpu_counter *fbc, u32 nr_counters);
+static inline void percpu_counter_destroy(struct percpu_counter *fbc)
+{
+	percpu_counter_destroy_many(fbc, 1);
+}
+
 void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
 void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
 			      s32 batch);
@@ -116,11 +126,26 @@ struct percpu_counter {
 	s64 count;
 };
 
+static inline int percpu_counter_init_many(struct percpu_counter *fbc, s64 amount,
+					   gfp_t gfp, u32 nr_counters)
+{
+	u32 i;
+
+	for (i = 0; i < nr_counters; i++)
+		fbc[i].count = amount;
+
+	return 0;
+}
+
 static inline int percpu_counter_init(struct percpu_counter *fbc, s64 amount,
 				      gfp_t gfp)
 {
-	fbc->count = amount;
-	return 0;
+	return percpu_counter_init_many(fbc, amount, gfp, 1);
+}
+
+static inline void percpu_counter_destroy_many(struct percpu_counter *fbc,
+					       u32 nr_counters)
+{
 }
 
 static inline void percpu_counter_destroy(struct percpu_counter *fbc)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 5004463c4f9f..9338b27f1cdd 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -151,48 +151,71 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc)
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
 
-int __percpu_counter_init(struct percpu_counter *fbc, s64 amount, gfp_t gfp,
-			  struct lock_class_key *key)
+int __percpu_counter_init_many(struct percpu_counter *fbc, s64 amount, gfp_t gfp,
+			       u32 nr_counters, struct lock_class_key *key)
 {
 	unsigned long flags __maybe_unused;
-
-	raw_spin_lock_init(&fbc->lock);
-	lockdep_set_class(&fbc->lock, key);
-	fbc->count = amount;
-	fbc->counters = alloc_percpu_gfp(s32, gfp);
-	if (!fbc->counters)
+	size_t counter_size;
+	s32 __percpu *counters;
+	u32 i;
+
+	counter_size = ALIGN(sizeof(*counters), __alignof__(*counters));
+	counters = __alloc_percpu_gfp(nr_counters * counter_size,
+				      __alignof__(*counters), gfp);
+	if (!counters) {
+		fbc[0].counters = NULL;
 		return -ENOMEM;
+	}
 
-	debug_percpu_counter_activate(fbc);
+	for (i = 0; i < nr_counters; i++) {
+		raw_spin_lock_init(&fbc[i].lock);
+		lockdep_set_class(&fbc[i].lock, key);
+#ifdef CONFIG_HOTPLUG_CPU
+		INIT_LIST_HEAD(&fbc[i].list);
+#endif
+		fbc[i].count = amount;
+		fbc[i].counters = (void *)counters + (i * counter_size);
+
+		debug_percpu_counter_activate(&fbc[i]);
+	}
 
 #ifdef CONFIG_HOTPLUG_CPU
-	INIT_LIST_HEAD(&fbc->list);
 	spin_lock_irqsave(&percpu_counters_lock, flags);
-	list_add(&fbc->list, &percpu_counters);
+	for (i = 0; i < nr_counters; i++)
+		list_add(&fbc[i].list, &percpu_counters);
 	spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif
 	return 0;
 }
-EXPORT_SYMBOL(__percpu_counter_init);
+EXPORT_SYMBOL(__percpu_counter_init_many);
 
-void percpu_counter_destroy(struct percpu_counter *fbc)
+void percpu_counter_destroy_many(struct percpu_counter *fbc, u32 nr_counters)
 {
 	unsigned long flags __maybe_unused;
+	u32 i;
+
+	if (WARN_ON_ONCE(!fbc))
+		return;
 
-	if (!fbc->counters)
+	if (!fbc[0].counters)
 		return;
 
-	debug_percpu_counter_deactivate(fbc);
+	for (i = 0; i < nr_counters; i++)
+		debug_percpu_counter_deactivate(&fbc[i]);
 
 #ifdef CONFIG_HOTPLUG_CPU
 	spin_lock_irqsave(&percpu_counters_lock, flags);
-	list_del(&fbc->list);
+	for (i = 0; i < nr_counters; i++)
+		list_del(&fbc[i].list);
 	spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif
-	free_percpu(fbc->counters);
-	fbc->counters = NULL;
+
+	free_percpu(fbc[0].counters);
+
+	for (i = 0; i < nr_counters; i++)
+		fbc[i].counters = NULL;
 }
-EXPORT_SYMBOL(percpu_counter_destroy);
+EXPORT_SYMBOL(percpu_counter_destroy_many);
 
 int percpu_counter_batch __read_mostly = 32;
 EXPORT_SYMBOL(percpu_counter_batch);
-- 
2.41.0

From nobody Thu Dec 18 05:00:03 2025
From: Mateusz Guzik
To: linux-kernel@vger.kernel.org
Cc: dennis@kernel.org, tj@kernel.org, cl@linux.com,
	akpm@linux-foundation.org, shakeelb@google.com,
	vegard.nossum@oracle.com, linux-mm@kvack.org,
	Mateusz Guzik
Subject: [PATCH v3 2/2] kernel/fork: group allocation/free of per-cpu counters for mm struct
Date: Wed, 23 Aug 2023 07:06:09 +0200
Message-Id: <20230823050609.2228718-3-mjguzik@gmail.com>
In-Reply-To: <20230823050609.2228718-1-mjguzik@gmail.com>
References: <20230823050609.2228718-1-mjguzik@gmail.com>

A trivial execve scalability test which tries to be very friendly
(statically linked binaries, all separate) is predominantly bottlenecked
by back-to-back per-cpu counter allocations which serialize on global
locks.

Ease the pain by allocating and freeing them in one go.

Bench can be found here:
http://apollo.backplane.com/DFlyMisc/doexec.c

$ cc -static -O2 -o static-doexec doexec.c
$ ./static-doexec $(nproc)

Even at a very modest scale of 26 cores (ops/s):
before:	133543.63
after:	186061.81 (+39%)

While with the patch these allocations remain a significant problem,
the primary bottleneck shifts to page release handling.

Signed-off-by: Mateusz Guzik
Reviewed-by: Dennis Zhou
---
 kernel/fork.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d2e12b6d2b18..4f0ada33457e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -909,8 +909,6 @@ static void cleanup_lazy_tlbs(struct mm_struct *mm)
  */
 void __mmdrop(struct mm_struct *mm)
 {
-	int i;
-
 	BUG_ON(mm == &init_mm);
 	WARN_ON_ONCE(mm == current->mm);
 
@@ -925,9 +923,8 @@ void __mmdrop(struct mm_struct *mm)
 	put_user_ns(mm->user_ns);
 	mm_pasid_drop(mm);
 	mm_destroy_cid(mm);
+	percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
 
-	for (i = 0; i < NR_MM_COUNTERS; i++)
-		percpu_counter_destroy(&mm->rss_stat[i]);
 	free_mm(mm);
 }
 EXPORT_SYMBOL_GPL(__mmdrop);
@@ -1252,8 +1249,6 @@ static void mm_init_uprobes_state(struct mm_struct *mm)
 static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	struct user_namespace *user_ns)
 {
-	int i;
-
 	mt_init_flags(&mm->mm_mt, MM_MT_FLAGS);
 	mt_set_external_lock(&mm->mm_mt, &mm->mmap_lock);
 	atomic_set(&mm->mm_users, 1);
@@ -1301,17 +1296,14 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	if (mm_alloc_cid(mm))
 		goto fail_cid;
 
-	for (i = 0; i < NR_MM_COUNTERS; i++)
-		if (percpu_counter_init(&mm->rss_stat[i], 0, GFP_KERNEL_ACCOUNT))
-			goto fail_pcpu;
+	if (percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL_ACCOUNT, NR_MM_COUNTERS))
+		goto fail_pcpu;
 
 	mm->user_ns = get_user_ns(user_ns);
 	lru_gen_init_mm(mm);
 	return mm;
 
 fail_pcpu:
-	while (i > 0)
-		percpu_counter_destroy(&mm->rss_stat[--i]);
 	mm_destroy_cid(mm);
 fail_cid:
 	destroy_context(mm);
-- 
2.41.0
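
For readers who want to poke at the layout outside the kernel, here is a
rough userspace sketch of the idea behind the series: one backing
allocation shared by a group of counters, with counter 0 holding the base
pointer that is later freed. It is not kernel code and makes several
assumptions: every sketch_* name and SKETCH_NR_CPUS are hypothetical,
calloc()/free() stand in for __alloc_percpu_gfp()/free_percpu(), the
per-CPU area is modelled as a plain strided array, and locking, lockdep
and the hotplug list are left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS 4	/* hypothetical fixed CPU count */

struct sketch_counter {
	int64_t count;		/* shared part of the counter */
	int32_t *counters;	/* CPU 0 slot of this counter; NULL if unallocated */
	size_t stride;		/* slots between consecutive CPUs */
};

/* Return the slot belonging to @cpu for counter @c. */
static int32_t *sketch_cpu_slot(struct sketch_counter *c, unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	return c->counters + (size_t)cpu * c->stride;
}

/* Initialize @nr counters backed by a single allocation; assumes nr >= 1. */
static int sketch_init_many(struct sketch_counter *c, int64_t amount, uint32_t nr)
{
	int32_t *block;
	uint32_t i;

	block = calloc((size_t)SKETCH_NR_CPUS * nr, sizeof(*block));
	if (!block) {
		/* All-or-nothing: mark the group as unallocated. */
		c[0].counters = NULL;
		return -1;
	}

	for (i = 0; i < nr; i++) {
		c[i].count = amount;
		c[i].counters = block + i;	/* slice i of the shared block */
		c[i].stride = nr;
	}
	return 0;
}

/* Free the whole group with one call; c[0] holds the base of the block. */
static void sketch_destroy_many(struct sketch_counter *c, uint32_t nr)
{
	uint32_t i;

	if (!c || !c[0].counters)
		return;

	free(c[0].counters);
	for (i = 0; i < nr; i++)
		c[i].counters = NULL;
}

/* Fold the per-CPU slots into the shared count. */
static int64_t sketch_sum(struct sketch_counter *c)
{
	int64_t sum = c->count;
	unsigned int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += *sketch_cpu_slot(c, cpu);
	return sum;
}
```

Because initialization either sets up the whole group or none of it, a
caller in this sketch needs no partial unwind on failure, which mirrors
the removal of the "while (i > 0)" loop from mm_init() in patch 2/2.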