From: Feng Tang <feng.tang@intel.com>
To: Vlastimil Babka, Andrew Morton, Christoph Lameter, Pekka Enberg,
    David Rientjes, Joonsoo Kim, Roman Gushchin,
    Hyeonggon Yoo <42.hyeyoo@gmail.com>, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Cc: Feng Tang <feng.tang@intel.com>
Subject: [RFC Patch 2/3] mm/slub: double per-cpu partial number for large systems
Date: Tue, 5 Sep 2023 22:13:47 +0800
Message-Id: <20230905141348.32946-3-feng.tang@intel.com>
In-Reply-To: <20230905141348.32946-1-feng.tang@intel.com>
References: <20230905141348.32946-1-feng.tang@intel.com>

There are reports of severe lock contention on slub's per-node
'list_lock' in the 'hackbench' test on server systems [1][2], and
similar contention is also seen when running the 'mmap1' case of
will-it-scale on big systems. As the trend is for one processor
(socket) to carry more and more CPUs (100+, 200+), the contention
could become much more severe and turn into a scalability issue.

One way to help reduce the contention is to double the per-cpu
partial number for large systems. Following is some performance
data, which shows a big improvement in the will-it-scale/mmap1
case, but no obvious change for the 'hackbench' test.
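To make the sizing concrete, below is a small user-space sketch (not
kernel code; the helper name and the hard-coded PAGE_SIZE are
illustrative assumptions) that models the object-size tiers used by
set_cpu_partial() in v6.5 together with the proposed CPU-count scaling:

#include <stdio.h>

#define PAGE_SIZE 4096	/* assumed page size, for illustration only */

/* Model of mm/slub.c:set_cpu_partial()'s v6.5 tiers plus the proposed scaling */
static unsigned int cpu_partial_objects(unsigned int size,
					unsigned int ncpus,
					unsigned int scale)
{
	unsigned int nr_objects;

	/* Same object-size tiers as v6.5 */
	if (size >= PAGE_SIZE)
		nr_objects = 6;
	else if (size >= 1024)
		nr_objects = 24;
	else if (size >= 256)
		nr_objects = 52;
	else
		nr_objects = 120;

	/* Proposed: raise the per-cpu partial cap on systems with many CPUs */
	if (ncpus >= 32)
		nr_objects *= scale;

	return nr_objects;
}

int main(void)
{
	unsigned int sizes[] = { 64, 512, 2048, 8192 };
	int i;

	for (i = 0; i < 4; i++)
		printf("size %4u: base %3u, 2X %3u, 4X %3u\n", sizes[i],
		       cpu_partial_objects(sizes[i], 16, 2),
		       cpu_partial_objects(sizes[i], 224, 2),
		       cpu_partial_objects(sizes[i], 224, 4));
	return 0;
}

On a 224-CPU machine like the test box below, a 64-byte cache's per-cpu
partial cap would go from 120 objects to 240 (2X) or 480 (4X), while a
16-CPU desktop keeps the current values.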
The patch itself only doubles the per-cpu partial number (2X); for
better analysis, the 4X case is also profiled.

will-it-scale/mmap1
-------------------
Run the will-it-scale benchmark's 'mmap1' test case on a 2-socket
Sapphire Rapids server (112 cores / 224 threads) with 256 GB DRAM,
in 3 configurations with parallel test threads at 25%, 50% and 100%
of the number of CPUs. The data is (base is the vanilla v6.5 kernel):

                   base        base + 2X patch     base + 4X patch
wis-mmap1-25      223670    +12.7%     251999    +34.9%     301749    per_process_ops
wis-mmap1-50      186020    +28.0%     238067    +55.6%     289521    per_process_ops
wis-mmap1-100      89200    +40.7%     125478    +62.4%     144858    per_process_ops

In the perf-profile comparison of the 50% test case, the lock
contention is greatly reduced:

    43.80   -11.5    32.27   -27.9    15.91   pp.self.native_queued_spin_lock_slowpath

hackbench
---------
Run the same hackbench test case mentioned in [1], using the same
HW/SW as will-it-scale:

                   base        base + 2X patch     base + 4X patch
hackbench         759951    +0.2%      761506    +0.5%      763972    hackbench.throughput

[1]. https://lore.kernel.org/all/202307172140.3b34825a-oliver.sang@intel.com/
[2]. https://lore.kernel.org/lkml/ZORaUsd+So+tnyMV@chenyu5-mobl2/

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 mm/slub.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index f7940048138c..51ca6dbaad09 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4361,6 +4361,13 @@ static void set_cpu_partial(struct kmem_cache *s)
 	else
 		nr_objects = 120;
 
+	/*
+	 * Give larger systems more per-cpu partial slabs to reduce/postpone
+	 * contention on the per-node partial list.
+	 */
+	if (num_possible_cpus() >= 32)
+		nr_objects *= 2;
+
 	slub_set_cpu_partial(s, nr_objects);
 #endif
 }
-- 
2.27.0