From nobody Mon Apr 6 14:10:03 2026
From: Jiebin Sun <jiebin.sun@intel.com>
To:
  akpm@linux-foundation.org, vasily.averin@linux.dev, shakeelb@google.com,
  dennis@kernel.org, tj@kernel.org, cl@linux.com, ebiederm@xmission.com,
  legion@kernel.org, manfred@colorfullife.com,
  alexander.mikhalitsyn@virtuozzo.com, linux-mm@kvack.org,
  linux-kernel@vger.kernel.org
Cc: tim.c.chen@intel.com, feng.tang@intel.com, ying.huang@intel.com,
  tianyou.li@intel.com, wangyang.guo@intel.com, jiebin.sun@intel.com
Subject: [PATCH v3 1/2] percpu: Add percpu_counter_add_local
Date: Wed, 7 Sep 2022 00:54:29 +0800
Message-Id: <20220906165430.851424-2-jiebin.sun@intel.com>
In-Reply-To: <20220906165430.851424-1-jiebin.sun@intel.com>
References: <20220902152243.479592-1-jiebin.sun@intel.com>
 <20220906165430.851424-1-jiebin.sun@intel.com>

Add percpu_counter_add_local(), which updates only the local per-CPU
counter without aggregating into the global count. Use it together with
percpu_counter_sum() when a highly accurate count is needed. This can
bring a clear performance improvement when percpu_counter_add() is
called frequently and percpu_counter_sum() is not on the critical path.
If you need the count to be timely but not exact, and the add path is
not hot, use percpu_counter_add_batch() instead.
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
---
 include/linux/percpu_counter.h |  7 +++++++
 lib/percpu_counter.c           | 14 ++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 01861eebed79..344d69ae0fb1 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -40,6 +40,7 @@ int __percpu_counter_init(struct percpu_counter *fbc, s64 amount, gfp_t gfp,
 
 void percpu_counter_destroy(struct percpu_counter *fbc);
 void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
+void percpu_counter_add_local(struct percpu_counter *fbc, s64 amount);
 void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
			      s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
@@ -138,6 +139,12 @@ percpu_counter_add(struct percpu_counter *fbc, s64 amount)
	preempt_enable();
 }
 
+static inline void
+percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
+{
+	percpu_counter_add(fbc, amount);
+}
+
 static inline void
 percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index ed610b75dc32..36907eb573a8 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -72,6 +72,20 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 }
 EXPORT_SYMBOL(percpu_counter_set);
 
+/*
+ * Use this function together with percpu_counter_sum() if you need a
+ * highly accurate count. Since percpu_counter_sum() adds up all the
+ * per-CPU counters, there is no need to check the batch size and fold
+ * into the global count in percpu_counter_add(). If percpu_counter_sum()
+ * is used infrequently while the add is on a critical path, this
+ * combination can perform significantly better than
+ * percpu_counter_add_batch().
+ */
+void percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
+{
+	this_cpu_add(*fbc->counters, amount);
+}
+EXPORT_SYMBOL(percpu_counter_add_local);
+
 /*
  * This function is both preempt and irq safe. The former is due to explicit
  * preemption disable. The latter is guaranteed by the fact that the slow path
-- 
2.31.1

From nobody Mon Apr 6 14:10:03 2026
From: Jiebin Sun <jiebin.sun@intel.com>
To: akpm@linux-foundation.org, vasily.averin@linux.dev, shakeelb@google.com,
  dennis@kernel.org, tj@kernel.org, cl@linux.com, ebiederm@xmission.com,
  legion@kernel.org, manfred@colorfullife.com,
  alexander.mikhalitsyn@virtuozzo.com, linux-mm@kvack.org,
  linux-kernel@vger.kernel.org
Cc: tim.c.chen@intel.com, feng.tang@intel.com, ying.huang@intel.com,
  tianyou.li@intel.com, wangyang.guo@intel.com, jiebin.sun@intel.com
Subject: [PATCH v3 2/2] ipc/msg: mitigate the lock contention with percpu counter
Date: Wed, 7 Sep 2022 00:54:30 +0800
Message-Id: <20220906165430.851424-3-jiebin.sun@intel.com>
In-Reply-To: <20220906165430.851424-1-jiebin.sun@intel.com>
References: <20220902152243.479592-1-jiebin.sun@intel.com>
 <20220906165430.851424-1-jiebin.sun@intel.com>

The msg_bytes and msg_hdrs atomic counters are frequently updated when
an IPC msg queue is in heavy use, causing heavy cache bouncing and
overhead. Changing them to percpu_counter greatly improves performance.
Since there is one percpu struct per namespace, the additional memory
cost is minimal. The counts are only read in the msgctl call, which is
infrequent, so the need to sum up the per-CPU counts is also infrequent.

Applying the patch and running pts/stress-ng-1.4.0 -- system v message
passing (160 threads) gives:
Score gain: 3.38x

CPU: ICX 8380 x 2 sockets
Core number: 40 x 2 physical cores
Benchmark: pts/stress-ng-1.4.0 -- system v message passing (160 threads)

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
---
 include/linux/ipc_namespace.h |  5 ++--
 ipc/msg.c                     | 44 ++++++++++++++++++++++++-----------
 ipc/namespace.c               |  5 +++-
 ipc/util.h                    |  4 ++--
 4 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index e3e8c8662b49..e8240cf2611a 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -11,6 +11,7 @@
 #include <linux/refcount.h>
 #include <linux/rhashtable-types.h>
 #include <linux/sysctl.h>
+#include <linux/percpu_counter.h>
 
 struct user_namespace;
 
@@ -36,8 +37,8 @@ struct ipc_namespace {
	unsigned int	msg_ctlmax;
	unsigned int	msg_ctlmnb;
	unsigned int	msg_ctlmni;
-	atomic_t	msg_bytes;
-	atomic_t	msg_hdrs;
+	struct percpu_counter percpu_msg_bytes;
+	struct percpu_counter percpu_msg_hdrs;
 
	size_t		shm_ctlmax;
	size_t		shm_ctlall;
diff --git a/ipc/msg.c b/ipc/msg.c
index a0d05775af2c..87c30decb23f 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -39,6 +39,7 @@
 #include <linux/rcupdate.h>
 #include <linux/nsproxy.h>
 #include <linux/ipc_namespace.h>
+#include <linux/percpu_counter.h>
 
 #include <asm/current.h>
 #include <linux/uaccess.h>
@@ -285,10 +286,10 @@ static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
	rcu_read_unlock();
 
	list_for_each_entry_safe(msg, t, &msq->q_messages, m_list) {
-		atomic_dec(&ns->msg_hdrs);
+		percpu_counter_add_local(&ns->percpu_msg_hdrs, -1);
		free_msg(msg);
	}
-	atomic_sub(msq->q_cbytes, &ns->msg_bytes);
+	percpu_counter_add_local(&ns->percpu_msg_bytes, -(msq->q_cbytes));
	ipc_update_pid(&msq->q_lspid, NULL);
	ipc_update_pid(&msq->q_lrpid, NULL);
	ipc_rcu_putref(&msq->q_perm, msg_rcu_free);
@@ -495,17 +496,18 @@ static int msgctl_info(struct ipc_namespace *ns, int msqid,
	msginfo->msgssz = MSGSSZ;
	msginfo->msgseg = MSGSEG;
	down_read(&msg_ids(ns).rwsem);
-	if (cmd == MSG_INFO) {
+	if (cmd == MSG_INFO)
		msginfo->msgpool = msg_ids(ns).in_use;
-		msginfo->msgmap = atomic_read(&ns->msg_hdrs);
-		msginfo->msgtql = atomic_read(&ns->msg_bytes);
+
+	max_idx = ipc_get_maxidx(&msg_ids(ns));
+	up_read(&msg_ids(ns).rwsem);
+	if (cmd == MSG_INFO) {
+		msginfo->msgmap = percpu_counter_sum(&ns->percpu_msg_hdrs);
+		msginfo->msgtql = percpu_counter_sum(&ns->percpu_msg_bytes);
	} else {
		msginfo->msgmap = MSGMAP;
		msginfo->msgpool = MSGPOOL;
		msginfo->msgtql = MSGTQL;
	}
-	max_idx = ipc_get_maxidx(&msg_ids(ns));
-	up_read(&msg_ids(ns).rwsem);
	return (max_idx < 0) ? 0 : max_idx;
 }
 
@@ -935,8 +937,8 @@ static long do_msgsnd(int msqid, long mtype, void __user *mtext,
		list_add_tail(&msg->m_list, &msq->q_messages);
		msq->q_cbytes += msgsz;
		msq->q_qnum++;
-		atomic_add(msgsz, &ns->msg_bytes);
-		atomic_inc(&ns->msg_hdrs);
+		percpu_counter_add_local(&ns->percpu_msg_bytes, msgsz);
+		percpu_counter_add_local(&ns->percpu_msg_hdrs, 1);
	}
 
	err = 0;
@@ -1159,8 +1161,8 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int
		msq->q_rtime = ktime_get_real_seconds();
		ipc_update_pid(&msq->q_lrpid, task_tgid(current));
		msq->q_cbytes -= msg->m_ts;
-		atomic_sub(msg->m_ts, &ns->msg_bytes);
-		atomic_dec(&ns->msg_hdrs);
+		percpu_counter_add_local(&ns->percpu_msg_bytes, -(msg->m_ts));
+		percpu_counter_add_local(&ns->percpu_msg_hdrs, -1);
		ss_wakeup(msq, &wake_q, false);
 
		goto out_unlock0;
@@ -1297,20 +1299,34 @@ COMPAT_SYSCALL_DEFINE5(msgrcv, int, msqid, compat_uptr_t, msgp,
 }
 #endif
 
-void msg_init_ns(struct ipc_namespace *ns)
+int msg_init_ns(struct ipc_namespace *ns)
 {
+	int ret;
+
	ns->msg_ctlmax = MSGMAX;
	ns->msg_ctlmnb = MSGMNB;
	ns->msg_ctlmni = MSGMNI;
 
-	atomic_set(&ns->msg_bytes, 0);
-	atomic_set(&ns->msg_hdrs, 0);
+	ret = percpu_counter_init(&ns->percpu_msg_bytes, 0, GFP_KERNEL);
+	if (ret)
+		goto fail_msg_bytes;
+	ret = percpu_counter_init(&ns->percpu_msg_hdrs, 0, GFP_KERNEL);
+	if (ret)
+		goto fail_msg_hdrs;
	ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
+	return 0;
+
+fail_msg_hdrs:
+	percpu_counter_destroy(&ns->percpu_msg_bytes);
+fail_msg_bytes:
+	return ret;
 }
 
 #ifdef CONFIG_IPC_NS
 void msg_exit_ns(struct ipc_namespace *ns)
 {
+	percpu_counter_destroy(&ns->percpu_msg_bytes);
+	percpu_counter_destroy(&ns->percpu_msg_hdrs);
	free_ipcs(ns, &msg_ids(ns), freeque);
	idr_destroy(&ns->ids[IPC_MSG_IDS].ipcs_idr);
	rhashtable_destroy(&ns->ids[IPC_MSG_IDS].key_ht);
diff --git a/ipc/namespace.c b/ipc/namespace.c
index e1fcaedba4fa..8316ea585733 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -66,8 +66,11 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
	if (!setup_ipc_sysctls(ns))
		goto fail_mq;
 
+	err = msg_init_ns(ns);
+	if (err)
+		goto fail_put;
+
	sem_init_ns(ns);
-	msg_init_ns(ns);
	shm_init_ns(ns);
 
	return ns;
diff --git a/ipc/util.h b/ipc/util.h
index 2dd7ce0416d8..1b0086c6346f 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -64,7 +64,7 @@ static inline void mq_put_mnt(struct ipc_namespace *ns) { }
 
 #ifdef CONFIG_SYSVIPC
 void sem_init_ns(struct ipc_namespace *ns);
-void msg_init_ns(struct ipc_namespace *ns);
+int msg_init_ns(struct ipc_namespace *ns);
 void shm_init_ns(struct ipc_namespace *ns);
 
 void sem_exit_ns(struct ipc_namespace *ns);
@@ -72,7 +72,7 @@ void msg_exit_ns(struct ipc_namespace *ns);
 void shm_exit_ns(struct ipc_namespace *ns);
 #else
 static inline void sem_init_ns(struct ipc_namespace *ns) { }
-static inline void msg_init_ns(struct ipc_namespace *ns) { }
+static inline int msg_init_ns(struct ipc_namespace *ns) { return 0; }
 static inline void shm_init_ns(struct ipc_namespace *ns) { }
 
 static inline void sem_exit_ns(struct ipc_namespace *ns) { }
-- 
2.31.1