From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Vlastimil Babka, Christoph Lameter, Pekka Enberg, Joonsoo Kim, David Rientjes, Andrew Morton
Cc: Roman Gushchin, Feng Tang, "Sang, Oliver", Jay Patel, Binder Makin, aneesh.kumar@linux.ibm.com, tsahu@linux.ibm.com, piyushs@linux.ibm.com, fengwei.yin@intel.com, ying.huang@intel.com, lkp, oe-lkp@lists.linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Subject: [RFC 1/2] Revert "mm, slub: change percpu partial accounting from objects to pages"
Date: Mon, 24 Jul 2023 04:09:05 +0900
Message-ID: <20230723190906.4082646-2-42.hyeyoo@gmail.com>
In-Reply-To: <20230723190906.4082646-1-42.hyeyoo@gmail.com>
References: <20230723190906.4082646-1-42.hyeyoo@gmail.com>

This is a partial revert of commit b47291ef02b0 ("mm, slub: change percpu
partial accounting from objects to pages") and a full revert of commit
662188c3a20e ("mm/slub: Simplify struct slab slabs field definition").

While b47291ef02b0 prevents the percpu partial slab list from becoming too
long, it assumes that the order of slabs is always oo_order(s->oo). When
allocating high-order slabs fails, the current approach can unexpectedly
lower the number of objects cached per cpu.

Instead of accounting the number of slabs, change the limit back to
accounting objects, but keep the assumption that a slab is always
half-full.

With this change, the number of cached objects per cpu no longer drops
unexpectedly when high-order slab allocation fails. Large inaccuracy is
still avoided, because the accounting uses the half-full assumption rather
than the exact number of free objects when taking slabs.
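[ Not part of the patch: a small userspace sketch of the two accounting
  schemes, to make the "half-full" argument concrete. The function names
  and the numbers (cpu_partial = 120, oo_objects = 64) are made up for the
  example; only the DIV_ROUND_UP() slab cap and the objects/2 half-full
  estimate mirror the kernel logic. Compiled and run, the slab-based column
  should drop from 128 to 16 as the per-slab object count falls, while the
  object-based column should stay at 120 or above. ]

#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/*
 * Cap based on slab count (the scheme being reverted): the number of
 * percpu partial slabs is fixed at DIV_ROUND_UP(nr_objects * 2, oo_objects),
 * computed from the cache's nominal order. If the slabs actually allocated
 * carry fewer objects, the cached-object total shrinks with them.
 */
static int cached_objects_slab_cap(int cpu_partial, int oo_objects, int objs_per_slab)
{
	int cpu_partial_slabs = DIV_ROUND_UP(cpu_partial * 2, oo_objects);

	/* each cached slab is assumed to be half-full */
	return cpu_partial_slabs * objs_per_slab / 2;
}

/*
 * Cap based on objects (this patch): slabs are added while the half-full
 * estimate (objects / 2 per slab) is below cpu_partial, independent of the
 * order that was actually allocated.
 */
static int cached_objects_object_cap(int cpu_partial, int objs_per_slab)
{
	int pobjects = 0;

	while (pobjects < cpu_partial)
		pobjects += objs_per_slab / 2;
	return pobjects;
}

int main(void)
{
	int cpu_partial = 120;	/* target number of cached objects per cpu */
	int oo_objects = 64;	/* objects per slab at the nominal order */
	int objs;

	for (objs = 64; objs >= 8; objs /= 2)
		printf("objs/slab=%2d  slab-based cap=%3d  object-based cap=%3d\n",
		       objs,
		       cached_objects_slab_cap(cpu_partial, oo_objects, objs),
		       cached_objects_object_cap(cpu_partial, objs));
	return 0;
}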
---
 include/linux/slub_def.h |  2 --
 mm/slab.h                |  6 ++++++
 mm/slub.c                | 31 ++++++++++++-------------------
 3 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index deb90cf4bffb..589ff6a2a23f 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -109,8 +109,6 @@ struct kmem_cache {
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 	/* Number of per cpu partial objects to keep around */
 	unsigned int cpu_partial;
-	/* Number of per cpu partial slabs to keep around */
-	unsigned int cpu_partial_slabs;
 #endif
 	struct kmem_cache_order_objects oo;
 
diff --git a/mm/slab.h b/mm/slab.h
index 799a315695c6..be38a264df16 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -65,7 +65,13 @@ struct slab {
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 		struct {
 			struct slab *next;
+#ifdef CONFIG_64BIT
 			int slabs;	/* Nr of slabs left */
+			int pobjects;	/* Approximate count */
+#else
+			short int slabs;
+			short int pobjects;
+#endif
 		};
 #endif
 	};
diff --git a/mm/slub.c b/mm/slub.c
index f7940048138c..199d3d03d5b9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -486,18 +486,7 @@ static inline unsigned int oo_objects(struct kmem_cache_order_objects x)
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 static void slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
 {
-	unsigned int nr_slabs;
-
 	s->cpu_partial = nr_objects;
-
-	/*
-	 * We take the number of objects but actually limit the number of
-	 * slabs on the per cpu partial list, in order to limit excessive
-	 * growth of the list. For simplicity we assume that the slabs will
-	 * be half-full.
-	 */
-	nr_slabs = DIV_ROUND_UP(nr_objects * 2, oo_objects(s->oo));
-	s->cpu_partial_slabs = nr_slabs;
 }
 #else
 static inline void
@@ -2275,7 +2264,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 	struct slab *slab, *slab2;
 	void *object = NULL;
 	unsigned long flags;
-	unsigned int partial_slabs = 0;
+	int objects_taken = 0;
 
 	/*
 	 * Racy check. If we mistakenly see no partial slabs then we
@@ -2312,11 +2301,11 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 		} else {
 			put_cpu_partial(s, slab, 0);
 			stat(s, CPU_PARTIAL_NODE);
-			partial_slabs++;
+			objects_taken += slab->objects / 2;
 		}
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 		if (!kmem_cache_has_cpu_partial(s)
-			|| partial_slabs > s->cpu_partial_slabs / 2)
+			|| objects_taken > s->cpu_partial / 2)
 			break;
 #else
 		break;
@@ -2699,13 +2688,14 @@ static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain)
 	struct slab *slab_to_unfreeze = NULL;
 	unsigned long flags;
 	int slabs = 0;
+	int pobjects = 0;
 
 	local_lock_irqsave(&s->cpu_slab->lock, flags);
 
 	oldslab = this_cpu_read(s->cpu_slab->partial);
 
 	if (oldslab) {
-		if (drain && oldslab->slabs >= s->cpu_partial_slabs) {
+		if (drain && oldslab->pobjects >= s->cpu_partial) {
 			/*
 			 * Partial array is full. Move the existing set to the
 			 * per node partial list. Postpone the actual unfreezing
@@ -2714,14 +2704,17 @@ static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain)
 			slab_to_unfreeze = oldslab;
 			oldslab = NULL;
 		} else {
+			pobjects = oldslab->pobjects;
 			slabs = oldslab->slabs;
 		}
 	}
 
 	slabs++;
+	pobjects += slab->objects / 2;
 
 	slab->slabs = slabs;
 	slab->next = oldslab;
+	slab->pobjects = pobjects;
 
 	this_cpu_write(s->cpu_slab->partial, slab);
 
@@ -5653,13 +5646,13 @@ static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
 
 		slab = slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu));
 
-		if (slab)
+		if (slab) {
 			slabs += slab->slabs;
+			objects += slab->objects;
+		}
 	}
 #endif
 
-	/* Approximate half-full slabs, see slub_set_cpu_partial() */
-	objects = (slabs * oo_objects(s->oo)) / 2;
 	len += sysfs_emit_at(buf, len, "%d(%d)", objects, slabs);
 
 #ifdef CONFIG_SLUB_CPU_PARTIAL
@@ -5669,7 +5662,7 @@ static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
 		slab = slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu));
 		if (slab) {
 			slabs = READ_ONCE(slab->slabs);
-			objects = (slabs * oo_objects(s->oo)) / 2;
+			objects = READ_ONCE(slab->pobjects);
 			len += sysfs_emit_at(buf, len, " C%d=%d(%d)",
 					     cpu, objects, slabs);
 		}
-- 
2.41.0

From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Vlastimil Babka, Christoph Lameter, Pekka Enberg, Joonsoo Kim, David Rientjes, Andrew Morton
Cc: Roman Gushchin, Feng Tang, "Sang, Oliver", Jay Patel, Binder Makin, aneesh.kumar@linux.ibm.com, tsahu@linux.ibm.com, piyushs@linux.ibm.com, fengwei.yin@intel.com, ying.huang@intel.com, lkp, oe-lkp@lists.linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Subject: [RFC 2/2] mm/slub: prefer NUMA locality over slight memory saving on NUMA machines
Date: Mon, 24 Jul 2023 04:09:06 +0900
Message-ID: <20230723190906.4082646-3-42.hyeyoo@gmail.com>
In-Reply-To: <20230723190906.4082646-1-42.hyeyoo@gmail.com>
References: <20230723190906.4082646-1-42.hyeyoo@gmail.com>

By default, SLUB sets remote_node_defrag_ratio to 1000, which makes it (in
most cases) take slabs from remote nodes' partial lists before trying to
allocate new folios on the local node from the buddy allocator.

Documentation/ABI/testing/sysfs-kernel-slab says:

> The file remote_node_defrag_ratio specifies the percentage of
> times SLUB will attempt to refill the cpu slab with a partial
> slab from a remote node as opposed to allocating a new slab on
> the local node. This reduces the amount of wasted memory over
> the entire system but can be expensive.

Although this made sense when it was introduced, the share of per-node
partial lists in overall SLUB memory usage has decreased since per-cpu
partial lists were introduced. Therefore, it is worth reevaluating its cost
in performance and memory usage.

[ XXX: Add performance data. I tried to measure the impact on hackbench on
  a 2-socket NUMA machine, but hackbench seems too synthetic to benefit
  from this, because skbuff_head_cache's size fits into the last level
  cache. More realistic workloads like netperf would probably benefit. ]

Set remote_node_defrag_ratio to zero by default; the new behavior is
(a standalone sketch of this ordering follows the message):

1) try refilling the per-cpu partial list from the local node
2) try allocating new slabs from the local node, without reclamation
3) try refilling the per-cpu partial list from remote nodes
4) try allocating new slabs from the local node or remote nodes

If the user has set remote_node_defrag_ratio, SLUB probabilistically tries
3) first and then 2) and 4) in order, to avoid an unexpected behavioral
change from the user's perspective.
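[ Not part of the patch: a standalone C model of the refill order described
  above. The helpers local_partial(), local_new_slab_no_reclaim(),
  remote_partial() and any_new_slab() are invented stand-ins for
  get_partial_node(), new_slab() with __GFP_THISNODE set and __GFP_RECLAIM
  cleared, get_any_partial(), and a plain new_slab() call; only the ordering
  and the "% 1024" ratio check follow the patch. ]

#include <stdio.h>

static char the_object;	/* dummy object for the sketch */

/* Invented stand-ins for the real SLUB helpers; each returns an object or NULL. */
static void *local_partial(void)             { return NULL; }		/* 1) local node partial list */
static void *local_new_slab_no_reclaim(void) { return NULL; }		/* 2) local new slab, no reclaim */
static void *remote_partial(void)            { return NULL; }		/* 3) remote node partial lists */
static void *any_new_slab(void)              { return &the_object; }	/* 4) new slab, local or remote */

/* Model of the proposed refill order; remote_node_defrag_ratio == 0 is the new default. */
static void *refill(unsigned int remote_node_defrag_ratio, unsigned int pseudo_random)
{
	void *obj;

	if ((obj = local_partial()))
		return obj;

	/* A non-zero ratio keeps the old behavior: sometimes go remote before step 2). */
	if (remote_node_defrag_ratio &&
	    pseudo_random % 1024 <= remote_node_defrag_ratio &&
	    (obj = remote_partial()))
		return obj;

	if ((obj = local_new_slab_no_reclaim()))
		return obj;
	if ((obj = remote_partial()))
		return obj;
	return any_new_slab();
}

int main(void)
{
	/* With the new default and nothing cached anywhere, we fall through to step 4). */
	printf("refilled from: %p\n", refill(0, 0));
	return 0;
}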
---
 mm/slub.c | 45 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 199d3d03d5b9..cfdea3e3e221 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2319,7 +2319,8 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 /*
  * Get a slab from somewhere. Search in increasing NUMA distances.
  */
-static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
+static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc,
+			     bool force_defrag)
 {
 #ifdef CONFIG_NUMA
 	struct zonelist *zonelist;
@@ -2347,8 +2348,8 @@ static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
 	 * may be expensive if we do it every time we are trying to find a slab
 	 * with available objects.
 	 */
-	if (!s->remote_node_defrag_ratio ||
-			get_cycles() % 1024 > s->remote_node_defrag_ratio)
+	if (!force_defrag && (!s->remote_node_defrag_ratio ||
+			get_cycles() % 1024 > s->remote_node_defrag_ratio))
 		return NULL;
 
 	do {
@@ -2382,7 +2383,8 @@ static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
 /*
  * Get a partial slab, lock it and return it.
  */
-static void *get_partial(struct kmem_cache *s, int node, struct partial_context *pc)
+static void *get_partial(struct kmem_cache *s, int node, struct partial_context *pc,
+			 bool force_defrag)
 {
 	void *object;
 	int searchnode = node;
@@ -2394,7 +2396,7 @@ static void *get_partial(struct kmem_cache *s, int node, struct partial_context
 	if (object || node != NUMA_NO_NODE)
 		return object;
 
-	return get_any_partial(s, pc);
+	return get_any_partial(s, pc, force_defrag);
 }
 
 #ifndef CONFIG_SLUB_TINY
@@ -3092,6 +3094,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	struct slab *slab;
 	unsigned long flags;
 	struct partial_context pc;
+	gfp_t local_flags;
 
 	stat(s, ALLOC_SLOWPATH);
 
@@ -3208,10 +3211,35 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	pc.flags = gfpflags;
 	pc.slab = &slab;
 	pc.orig_size = orig_size;
-	freelist = get_partial(s, node, &pc);
+
+	freelist = get_partial(s, node, &pc, false);
 	if (freelist)
 		goto check_new_slab;
 
+	/*
+	 * try allocating slab from the local node first before taking slabs
+	 * from remote nodes. If user specified remote_node_defrag_ratio,
+	 * try taking slabs from remote nodes first.
+	 */
+	slub_put_cpu_ptr(s->cpu_slab);
+	local_flags = (gfpflags | __GFP_NOWARN | __GFP_THISNODE);
+	local_flags &= ~(__GFP_NOFAIL | __GFP_RECLAIM);
+	slab = new_slab(s, local_flags, node);
+	c = slub_get_cpu_ptr(s->cpu_slab);
+
+	if (slab)
+		goto alloc_slab;
+
+	/*
+	 * At this point no memory can be allocated lightly.
+	 * Take slabs from remote nodes.
+	 */
+	if (node == NUMA_NO_NODE) {
+		freelist = get_any_partial(s, &pc, true);
+		if (freelist)
+			goto check_new_slab;
+	}
+
 	slub_put_cpu_ptr(s->cpu_slab);
 	slab = new_slab(s, gfpflags, node);
 	c = slub_get_cpu_ptr(s->cpu_slab);
@@ -3221,6 +3249,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		return NULL;
 	}
 
+alloc_slab:
 	stat(s, ALLOC_SLAB);
 
 	if (kmem_cache_debug(s)) {
@@ -3404,7 +3433,7 @@ static void *__slab_alloc_node(struct kmem_cache *s,
 	pc.flags = gfpflags;
 	pc.slab = &slab;
 	pc.orig_size = orig_size;
-	object = get_partial(s, node, &pc);
+	object = get_partial(s, node, &pc, false);
 
 	if (object)
 		return object;
@@ -4538,7 +4567,7 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 	set_cpu_partial(s);
 
 #ifdef CONFIG_NUMA
-	s->remote_node_defrag_ratio = 1000;
+	s->remote_node_defrag_ratio = 0;
 #endif
 
 	/* Initialize the pre-computed randomized freelist if slab is up */
-- 
2.41.0