From nobody Mon Feb  9 02:03:17 2026
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 5A11CC6FA9E
	for <linux-kernel@archiver.kernel.org>; Sat,  4 Mar 2023 03:48:53 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229714AbjCDDsv (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Fri, 3 Mar 2023 22:48:51 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35350 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229463AbjCDDso (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 3 Mar 2023 22:48:44 -0500
Received: from mail-pl1-x636.google.com (mail-pl1-x636.google.com
 [IPv6:2607:f8b0:4864:20::636])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E4D4E1BAD1
        for <linux-kernel@vger.kernel.org>;
 Fri,  3 Mar 2023 19:48:43 -0800 (PST)
Received: by mail-pl1-x636.google.com with SMTP id u5so4771707plq.7
        for <linux-kernel@vger.kernel.org>;
 Fri, 03 Mar 2023 19:48:43 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=chromium.org; s=google;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:from:to:cc:subject:date
         :message-id:reply-to;
        bh=SEq3MDoFcR7yz1bLKCGz9x04eW60/3Ets++p8vd9Ctw=;
        b=OSN6KWn7hEXJmiSwcCwf+j4CxdT8wPCagEal0qiSFk+/V8atkJMsWd9LYZUtjmsPXr
         9CRmtA+NB7lVYdNh98oMEGUcsk0IzMtp3hVLlNUeYtTmnR1IdOQSDJgHlTb/6bPe258k
         cYr52SWQApBs9ogj8FNmQ3LcCt0GmMZsOzeLY=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=SEq3MDoFcR7yz1bLKCGz9x04eW60/3Ets++p8vd9Ctw=;
        b=ySC2RAwZ2egMCRKbD9JvSFdPlwekXOFlIT5yB5Syv01FGoAUcEzQYA11V9rM+RHooz
         o3Xf26QVWjjfczelKx5rjY8T/mz2G4fo45zRnqjbHtLwM7k7KA3TJRuJEa6rkR4k1Gol
         f2QI3xg2c/7wvcARltn8K57941dN5/4vFcA64PQSW+WfyLUASnGME8uXAIhCN8CEQDxL
         kKFMW1/JDz+Nh+6ZUmWI4T5JUvMBJLDoKWBWd0AF+u3JZQy2m7L86+TaqWimFYecy9hU
         t9plrJ/FAqsE5/59xIf4DvrQ+BdL2jk2qtEhR3s7XUvvbHEv+n7ABJ6jWbpREf2TxZ37
         rEKA==
X-Gm-Message-State: AO0yUKVNqql9V+0IDtT15/xD0l3M+68f0z5Gj82VOgjz2E9FM6kKabzB
        ejX35AdJ/3NEDzuQyIM6cDeyQw==
X-Google-Smtp-Source: 
 AK7set/tt+odDA27bkzjmHOQ+zF7n6+n5T9hL7wxUbL0XcZs6V/tJCFYEzPfag5hO0XniqIk8YzmhQ==
X-Received: by 2002:a17:902:ce92:b0:19e:27a1:dd94 with SMTP id
 f18-20020a170902ce9200b0019e27a1dd94mr5079134plg.35.1677901723382;
        Fri, 03 Mar 2023 19:48:43 -0800 (PST)
Received: from tigerii.tok.corp.google.com
 ([2401:fa00:8f:203:6ac2:6eee:5465:7ee6])
        by smtp.gmail.com with ESMTPSA id
 d6-20020a170902c18600b00199025284b3sm2249204pld.151.2023.03.03.19.48.41
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Fri, 03 Mar 2023 19:48:43 -0800 (PST)
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Minchan Kim <minchan@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>
Cc: Yosry Ahmed <yosryahmed@google.com>, linux-kernel@vger.kernel.org,
        linux-mm@kvack.org, Sergey Senozhatsky <senozhatsky@chromium.org>
Subject: [PATCHv4 1/4] zsmalloc: remove insert_zspage() ->inuse optimization
Date: Sat,  4 Mar 2023 12:48:32 +0900
Message-Id: <20230304034835.2082479-2-senozhatsky@chromium.org>
X-Mailer: git-send-email 2.40.0.rc0.216.gc4246ad0f0-goog
In-Reply-To: <20230304034835.2082479-1-senozhatsky@chromium.org>
References: <20230304034835.2082479-1-senozhatsky@chromium.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

This optimization has no effect. It only ensures that
when a zspage was added to its corresponding fullness
list, its "inuse" counter was higher or lower than the
"inuse" counter of the zspage at the head of the list.
The intention was to keep busy zspages at the head, so
they could be filled up and moved to the ZS_FULL
fullness group more quickly. However, this doesn't work
as the "inuse" counter of a zspage can be modified by
obj_free() but the zspage may still belong to the same
fullness list. So, fix_fullness_group() won't change
the zspage's position in relation to the head's "inuse"
counter, leading to a largely random order of zspages
within the fullness list.

For instance, consider a printout of the "inuse"
counters of the first 8 zspages in a class that holds
93 objects per zspage:

 ZS_ALMOST_EMPTY:  36  67  68  64  35  54  63  52

As we can see, the zspage at the head of the fullness
list has one of the lowest "inuse" counters, while
busier zspages sit behind it.
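
The effect can be reproduced with a small user-space model.
This is only a sketch: struct fullness_list, model_insert(),
model_free_objs() and head_is_busiest() are hypothetical
names, and an array stands in for the kernel's list_head.

```c
#include <assert.h>
#include <string.h>

#define MAX_PAGES 16

/* Hypothetical stand-in for one fullness list: inuse counters, head first. */
struct fullness_list {
	int inuse[MAX_PAGES];
	int nr;
};

/* Mimics the removed logic: a less busy page goes right behind the head. */
static void model_insert(struct fullness_list *l, int inuse)
{
	int pos = (l->nr > 0 && inuse < l->inuse[0]) ? 1 : 0;

	assert(l->nr < MAX_PAGES);
	memmove(&l->inuse[pos + 1], &l->inuse[pos],
		(l->nr - pos) * sizeof(l->inuse[0]));
	l->inuse[pos] = inuse;
	l->nr++;
}

/* Mimics an obj_free() that leaves the page in the same fullness group. */
static void model_free_objs(struct fullness_list *l, int pos, int cnt)
{
	assert(pos < l->nr && cnt <= l->inuse[pos]);
	l->inuse[pos] -= cnt;
}

/* Returns 1 if the head holds the highest inuse counter of the list. */
static int head_is_busiest(const struct fullness_list *l)
{
	for (int i = 1; i < l->nr; i++)
		if (l->inuse[i] > l->inuse[0])
			return 0;
	return 1;
}
```

Inserting pages with 60, 50 and 65 objects leaves the busiest
page at the head, but a few frees on that head page are enough
to break the ordering, because nothing re-sorts the list.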

Remove this pointless "optimization".

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 3aed46ab7e6c..abe0c4d7942d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -762,19 +762,8 @@ static void insert_zspage(struct size_class *class,
 				struct zspage *zspage,
 				enum fullness_group fullness)
 {
-	struct zspage *head;
-
 	class_stat_inc(class, fullness, 1);
-	head =3D list_first_entry_or_null(&class->fullness_list[fullness],
-					struct zspage, list);
-	/*
-	 * We want to see more ZS_FULL pages and less almost empty/full.
-	 * Put pages with higher ->inuse first.
-	 */
-	if (head && get_zspage_inuse(zspage) < get_zspage_inuse(head))
-		list_add(&zspage->list, &head->list);
-	else
-		list_add(&zspage->list, &class->fullness_list[fullness]);
+	list_add(&zspage->list, &class->fullness_list[fullness]);
 }
=20
 /*
--=20
2.40.0.rc0.216.gc4246ad0f0-goog
From nobody Mon Feb  9 02:03:17 2026
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 71BF9C6FA8E
	for <linux-kernel@archiver.kernel.org>; Sat,  4 Mar 2023 03:48:58 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229771AbjCDDs4 (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Fri, 3 Mar 2023 22:48:56 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35504 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229738AbjCDDsu (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 3 Mar 2023 22:48:50 -0500
Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com
 [IPv6:2607:f8b0:4864:20::62f])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B6A621D920
        for <linux-kernel@vger.kernel.org>;
 Fri,  3 Mar 2023 19:48:46 -0800 (PST)
Received: by mail-pl1-x62f.google.com with SMTP id i10so4762811plr.9
        for <linux-kernel@vger.kernel.org>;
 Fri, 03 Mar 2023 19:48:46 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=chromium.org; s=google;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:from:to:cc:subject:date
         :message-id:reply-to;
        bh=INMm7/hLJTx2Fj9d5WNhewPWXG18XbmQ5Z9CNzzHrFk=;
        b=SoaMlTEvZDHc49/2GSVpvjEkRvnBLIEerQ8iyW9U4gt9XZ8+LycTaBewMczNArmI2Z
         ucPl0zr5zq4PJC0/sUazh1P/TutyXZLRJpmo8tHfZ9u4uTUqNWy/MPKlEgoBLZ7z+55A
         Ek9w6F8N27QCaKKE8CoQP149nM6gQOhqHz99w=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=INMm7/hLJTx2Fj9d5WNhewPWXG18XbmQ5Z9CNzzHrFk=;
        b=neohc+EJOQekVGtuwV4A1mRaMXvGBCxfAmGLUokc9ORhIvXjWErUKKBz99AayJjgSz
         xb9sMES4LE/7yPL4Os77OCkbq9rg5GX7x0+i/0pGmU2Dsob902tzkOsKAQFw9lhjD4d8
         eOQy173PT18zuTNgIJHw7dmGhM838jUCajJdBpwOJ1ErGGuPO72Vdi1DqNtNHbVO/wnv
         +hmdILql8WiqqSeC/SMOhQR+qz46s0X+aiR7Zd/sgGtN+wbgKYzFSxIE/zVduQWSbXa5
         EytKmwJwGcZNQObPd4UC5V7zCVDEaw6wZj7lLYOkdUnk87u4vt8FhjwRxVMLyuX9bCU1
         AwFw==
X-Gm-Message-State: AO0yUKVH2R8hXshbjw//Si07MzqozT/RH5OPflIP6I8cJsnVbvtfSh9c
        MTVq0zZbG6UDFl7et+wwNFpWbA==
X-Google-Smtp-Source: 
 AK7set9RMrUQhTc4WlbS1mnljKuOCdXpvvqCSzlH5usKmgHObVWE0odhxRuUQVvdCgcSXfFaVWNrVA==
X-Received: by 2002:a17:902:f54a:b0:19a:98c9:8cea with SMTP id
 h10-20020a170902f54a00b0019a98c98ceamr5024162plf.39.1677901726091;
        Fri, 03 Mar 2023 19:48:46 -0800 (PST)
Received: from tigerii.tok.corp.google.com
 ([2401:fa00:8f:203:6ac2:6eee:5465:7ee6])
        by smtp.gmail.com with ESMTPSA id
 d6-20020a170902c18600b00199025284b3sm2249204pld.151.2023.03.03.19.48.44
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Fri, 03 Mar 2023 19:48:45 -0800 (PST)
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Minchan Kim <minchan@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>
Cc: Yosry Ahmed <yosryahmed@google.com>, linux-kernel@vger.kernel.org,
        linux-mm@kvack.org, Sergey Senozhatsky <senozhatsky@chromium.org>
Subject: [PATCHv4 2/4] zsmalloc: fine-grained inuse ratio based fullness
 grouping
Date: Sat,  4 Mar 2023 12:48:33 +0900
Message-Id: <20230304034835.2082479-3-senozhatsky@chromium.org>
X-Mailer: git-send-email 2.40.0.rc0.216.gc4246ad0f0-goog
In-Reply-To: <20230304034835.2082479-1-senozhatsky@chromium.org>
References: <20230304034835.2082479-1-senozhatsky@chromium.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Each zspage maintains an ->inuse counter which keeps track of the
number of objects stored in the zspage. The ->inuse counter also
determines the zspage's "fullness group" which is calculated as
the ratio of the "inuse" objects to the total number of objects
the zspage can hold (objs_per_zspage). The closer the ->inuse
counter is to objs_per_zspage, the better.

Each size class maintains several fullness lists that keep
track of zspages of particular "fullness". Pages within each
fullness list are stored in random order with regard to the
->inuse counter. This is because sorting the zspages by ->inuse
counter each time obj_malloc() or obj_free() is called would
be too expensive. However, the ->inuse counter is still a
crucial factor in many situations.

For the two major zsmalloc operations, zs_malloc() and zs_compact(),
we typically select the head zspage from the corresponding fullness
list as the best candidate zspage. However, this assumption is not
always accurate.

For the zs_malloc() operation, the optimal candidate zspage should
have the highest ->inuse counter. This is because the goal is to
maximize the number of ZS_FULL zspages and make full use of all
allocated memory.

For the zs_compact() operation, the optimal source zspage should
have the lowest ->inuse counter. This is because compaction needs
to move objects in use to another page before it can release the
zspage and return its physical pages to the buddy allocator. The
fewer objects in use, the quicker compaction can release the zspage.
Additionally, compaction is measured by the number of pages it
releases.

This patch reworks the fullness grouping mechanism. Instead of
having two groups - ZS_ALMOST_EMPTY (usage ratio below 3/4) and
ZS_ALMOST_FULL (usage ratio above 3/4) - that result in too many
zspages being included in the ALMOST_EMPTY group for specific
classes, size classes maintain a larger number of fullness lists
that give strict guarantees on the minimum and maximum ->inuse
values within each group. Each group represents a 10% change in the
->inuse ratio compared to neighboring groups. In essence, there
are groups for zspages with 0%, 10%, 20% usage ratios, and so on,
up to 100%.
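
The mapping from ->inuse to a group index can be sketched in
user space as follows. It mirrors the arithmetic that this patch
adds to get_fullness_group(); the function name fullness_group()
and the unprefixed enum constants are illustrative only.

```c
#include <assert.h>

/* Group indices as laid out by enum fullness_group in this patch. */
enum {
	INUSE_RATIO_0   = 0,
	INUSE_RATIO_10  = 1,
	INUSE_RATIO_99  = 10,
	INUSE_RATIO_100 = 11,
};

static int fullness_group(int inuse, int objs_per_zspage)
{
	assert(objs_per_zspage > 0);
	assert(inuse >= 0 && inuse <= objs_per_zspage);

	if (inuse == 0)
		return INUSE_RATIO_0;
	if (inuse == objs_per_zspage)
		return INUSE_RATIO_100;

	/* +1 so that a nearly empty page lands in the 10% group, not 0% */
	return (100 * inuse / objs_per_zspage) / 10 + 1;
}
```

With 16 objects per zspage, 8 or 9 objects in use both map to
index 6 (the "60%" group) and 15 objects map to index 10 (the
"99%" group); one object out of 127 maps to index 1.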

This enhances the selection of candidate zspages for both zs_malloc()
and zs_compact(). A printout of the ->inuse counters of the first 7
zspages per (random) class fullness group:

 class-768 objs_per_zspage 16:
   fullness 100%:  empty
   fullness  99%:  empty
   fullness  90%:  empty
   fullness  80%:  empty
   fullness  70%:  empty
   fullness  60%:  8  8  9  9  8  8  8
   fullness  50%:  empty
   fullness  40%:  5  5  6  5  5  5  5
   fullness  30%:  4  4  4  4  4  4  4
   fullness  20%:  2  3  2  3  3  2  2
   fullness  10%:  1  1  1  1  1  1  1
   fullness   0%:  empty

The zs_malloc() function searches through the groups of pages
starting with the one having the highest usage ratio. This means
that it always selects a zspage from the group with the least
internal fragmentation (highest usage ratio) and makes it even
less fragmented by increasing its usage ratio.
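
A minimal model of that search order, assuming per-group page
counts in place of the real lists (pick_alloc_group() and the
GROUP_* constants are hypothetical names, not part of the patch):

```c
#include <assert.h>

#define NR_GROUPS	12
#define GROUP_0		0
#define GROUP_99	10

/*
 * Returns the index of the busiest group that still has a zspage with
 * free slots, or -1 if every such group is empty. The 100% group is
 * never scanned because its zspages have no room left.
 */
static int pick_alloc_group(const int nr_zspages[NR_GROUPS])
{
	for (int fg = GROUP_99; fg >= GROUP_0; fg--) {
		assert(nr_zspages[fg] >= 0);
		if (nr_zspages[fg] > 0)
			return fg;
	}
	return -1;
}
```

When -1 is returned the caller has to allocate a fresh zspage,
which is what zs_malloc() does when find_get_zspage() finds
nothing.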

The zs_compact() function, on the other hand, begins by scanning
the group with the highest fragmentation (lowest usage ratio) to
locate the source page. The first available zspage is selected, and
then the function moves downward to find a destination zspage in
the group with the lowest internal fragmentation (highest usage
ratio).
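
The two scan directions can be sketched as below. This is a
simplified model that only tracks how many zspages each group
holds; isolate_src_group(), isolate_dst_group() and the GROUP_*
constants are hypothetical stand-ins for isolate_src_zspage(),
isolate_dst_zspage() and the enum values.

```c
#include <assert.h>

#define NR_GROUPS	12
#define GROUP_10	1
#define GROUP_99	10

/* Source: take a page from the emptiest non-empty group (10% upwards). */
static int isolate_src_group(int nr_zspages[NR_GROUPS])
{
	for (int fg = GROUP_10; fg <= GROUP_99; fg++) {
		assert(nr_zspages[fg] >= 0);
		if (nr_zspages[fg] > 0) {
			nr_zspages[fg]--;
			return fg;
		}
	}
	return -1;
}

/* Destination: take a page from the fullest non-empty group (99% downwards). */
static int isolate_dst_group(int nr_zspages[NR_GROUPS])
{
	for (int fg = GROUP_99; fg >= GROUP_10; fg--) {
		assert(nr_zspages[fg] >= 0);
		if (nr_zspages[fg] > 0) {
			nr_zspages[fg]--;
			return fg;
		}
	}
	return -1;
}
```

Because the source page is removed from its list before the
destination is looked up, the same zspage cannot be chosen as
both source and destination; with a single partially used page
the destination lookup simply fails.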

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 248 ++++++++++++++++++++++++++------------------------
 1 file changed, 130 insertions(+), 118 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index abe0c4d7942d..cc59336a966a 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -127,7 +127,7 @@
 #define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
=20
 #define HUGE_BITS	1
-#define FULLNESS_BITS	2
+#define FULLNESS_BITS	4
 #define CLASS_BITS	8
 #define ISOLATED_BITS	5
 #define MAGIC_VAL_BITS	8
@@ -159,51 +159,46 @@
 #define ZS_SIZE_CLASSES	(DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZ=
E, \
 				      ZS_SIZE_CLASS_DELTA) + 1)
=20
+/*
+ * Pages are distinguished by the ratio of used memory (that is the ratio
+ * of ->inuse objects to all objects that page can store). For example,
+ * INUSE_RATIO_10 means that the ratio of used objects is > 0% and <=3D 10=
%.
+ *
+ * The number of fullness groups is not random. It allows us to keep
+ * difference between the least busy page in the group (minimum permitted
+ * number of ->inuse objects) and the most busy page (maximum permitted
+ * number of ->inuse objects) at a reasonable value.
+ */
 enum fullness_group {
-	ZS_EMPTY,
-	ZS_ALMOST_EMPTY,
-	ZS_ALMOST_FULL,
-	ZS_FULL,
-	NR_ZS_FULLNESS,
+	ZS_INUSE_RATIO_0,
+	ZS_INUSE_RATIO_10,
+	/* NOTE: 5 more fullness groups here */
+	ZS_INUSE_RATIO_70	=3D 7,
+	/* NOTE: 2 more fullness groups here */
+	ZS_INUSE_RATIO_99       =3D 10,
+	ZS_INUSE_RATIO_100,
+	NR_FULLNESS_GROUPS,
 };
=20
 enum class_stat_type {
-	CLASS_EMPTY,
-	CLASS_ALMOST_EMPTY,
-	CLASS_ALMOST_FULL,
-	CLASS_FULL,
-	OBJ_ALLOCATED,
-	OBJ_USED,
-	NR_ZS_STAT_TYPE,
+	/* NOTE: stats for 12 fullness groups here: from inuse 0 to 100 */
+	ZS_OBJS_ALLOCATED       =3D NR_FULLNESS_GROUPS,
+	ZS_OBJS_INUSE,
+	NR_CLASS_STAT_TYPES,
 };
=20
 struct zs_size_stat {
-	unsigned long objs[NR_ZS_STAT_TYPE];
+	unsigned long objs[NR_CLASS_STAT_TYPES];
 };
=20
 #ifdef CONFIG_ZSMALLOC_STAT
 static struct dentry *zs_stat_root;
 #endif
=20
-/*
- * We assign a page to ZS_ALMOST_EMPTY fullness group when:
- *	n <=3D N / f, where
- * n =3D number of allocated objects
- * N =3D total number of objects zspage can store
- * f =3D fullness_threshold_frac
- *
- * Similarly, we assign zspage to:
- *	ZS_ALMOST_FULL	when n > N / f
- *	ZS_EMPTY	when n =3D=3D 0
- *	ZS_FULL		when n =3D=3D N
- *
- * (see: fix_fullness_group())
- */
-static const int fullness_threshold_frac =3D 4;
 static size_t huge_class_size;
=20
 struct size_class {
-	struct list_head fullness_list[NR_ZS_FULLNESS];
+	struct list_head fullness_list[NR_FULLNESS_GROUPS];
 	/*
 	 * Size of objects stored in this class. Must be multiple
 	 * of ZS_ALIGN.
@@ -547,8 +542,8 @@ static inline void set_freeobj(struct zspage *zspage, u=
nsigned int obj)
 }
=20
 static void get_zspage_mapping(struct zspage *zspage,
-				unsigned int *class_idx,
-				enum fullness_group *fullness)
+			       unsigned int *class_idx,
+			       int *fullness)
 {
 	BUG_ON(zspage->magic !=3D ZSPAGE_MAGIC);
=20
@@ -557,14 +552,14 @@ static void get_zspage_mapping(struct zspage *zspage,
 }
=20
 static struct size_class *zspage_class(struct zs_pool *pool,
-					     struct zspage *zspage)
+				       struct zspage *zspage)
 {
 	return pool->size_class[zspage->class];
 }
=20
 static void set_zspage_mapping(struct zspage *zspage,
-				unsigned int class_idx,
-				enum fullness_group fullness)
+			       unsigned int class_idx,
+			       int fullness)
 {
 	zspage->class =3D class_idx;
 	zspage->fullness =3D fullness;
@@ -588,23 +583,19 @@ static int get_size_class_index(int size)
 	return min_t(int, ZS_SIZE_CLASSES - 1, idx);
 }
=20
-/* type can be of enum type class_stat_type or fullness_group */
 static inline void class_stat_inc(struct size_class *class,
 				int type, unsigned long cnt)
 {
 	class->stats.objs[type] +=3D cnt;
 }
=20
-/* type can be of enum type class_stat_type or fullness_group */
 static inline void class_stat_dec(struct size_class *class,
 				int type, unsigned long cnt)
 {
 	class->stats.objs[type] -=3D cnt;
 }
=20
-/* type can be of enum type class_stat_type or fullness_group */
-static inline unsigned long zs_stat_get(struct size_class *class,
-				int type)
+static inline unsigned long zs_stat_get(struct size_class *class, int type)
 {
 	return class->stats.objs[type];
 }
@@ -646,16 +637,27 @@ static int zs_stats_size_show(struct seq_file *s, voi=
d *v)
 			"pages_per_zspage", "freeable");
=20
 	for (i =3D 0; i < ZS_SIZE_CLASSES; i++) {
+		int fg;
+
 		class =3D pool->size_class[i];
=20
 		if (class->index !=3D i)
 			continue;
=20
 		spin_lock(&pool->lock);
-		class_almost_full =3D zs_stat_get(class, CLASS_ALMOST_FULL);
-		class_almost_empty =3D zs_stat_get(class, CLASS_ALMOST_EMPTY);
-		obj_allocated =3D zs_stat_get(class, OBJ_ALLOCATED);
-		obj_used =3D zs_stat_get(class, OBJ_USED);
+		class_almost_full =3D 0;
+		class_almost_empty =3D 0;
+		/*
+		 * Replicate old behaviour for almost_full and almost_empty
+		 * stats.
+		 */
+		for (fg =3D ZS_INUSE_RATIO_70; fg <=3D ZS_INUSE_RATIO_99; fg++)
+			class_almost_full +=3D zs_stat_get(class, fg);
+		for (fg =3D ZS_INUSE_RATIO_10; fg < ZS_INUSE_RATIO_70; fg++)
+			class_almost_empty +=3D zs_stat_get(class, fg);
+
+		obj_allocated =3D zs_stat_get(class, ZS_OBJS_ALLOCATED);
+		obj_used =3D zs_stat_get(class, ZS_OBJS_INUSE);
 		freeable =3D zs_can_compact(class);
 		spin_unlock(&pool->lock);
=20
@@ -726,30 +728,28 @@ static inline void zs_pool_stat_destroy(struct zs_poo=
l *pool)
=20
 /*
  * For each size class, zspages are divided into different groups
- * depending on how "full" they are. This was done so that we could
- * easily find empty or nearly empty zspages when we try to shrink
- * the pool (not yet implemented). This function returns fullness
+ * depending on their usage ratio. This function returns fullness
  * status of the given page.
  */
-static enum fullness_group get_fullness_group(struct size_class *class,
-						struct zspage *zspage)
+static int get_fullness_group(struct size_class *class, struct zspage *zsp=
age)
 {
-	int inuse, objs_per_zspage;
-	enum fullness_group fg;
+	int inuse, objs_per_zspage, ratio;
=20
 	inuse =3D get_zspage_inuse(zspage);
 	objs_per_zspage =3D class->objs_per_zspage;
=20
 	if (inuse =3D=3D 0)
-		fg =3D ZS_EMPTY;
-	else if (inuse =3D=3D objs_per_zspage)
-		fg =3D ZS_FULL;
-	else if (inuse <=3D 3 * objs_per_zspage / fullness_threshold_frac)
-		fg =3D ZS_ALMOST_EMPTY;
-	else
-		fg =3D ZS_ALMOST_FULL;
+		return ZS_INUSE_RATIO_0;
+	if (inuse =3D=3D objs_per_zspage)
+		return ZS_INUSE_RATIO_100;
=20
-	return fg;
+	ratio =3D 100 * inuse / objs_per_zspage;
+	/*
+	 * Take integer division into consideration: a page with one inuse
+	 * object out of 127 possible, will end up having 0 usage ratio,
+	 * which is wrong as it belongs in ZS_INUSE_RATIO_10 fullness group.
+	 */
+	return ratio / 10 + 1;
 }
=20
 /*
@@ -760,7 +760,7 @@ static enum fullness_group get_fullness_group(struct si=
ze_class *class,
  */
 static void insert_zspage(struct size_class *class,
 				struct zspage *zspage,
-				enum fullness_group fullness)
+				int fullness)
 {
 	class_stat_inc(class, fullness, 1);
 	list_add(&zspage->list, &class->fullness_list[fullness]);
@@ -772,7 +772,7 @@ static void insert_zspage(struct size_class *class,
  */
 static void remove_zspage(struct size_class *class,
 				struct zspage *zspage,
-				enum fullness_group fullness)
+				int fullness)
 {
 	VM_BUG_ON(list_empty(&class->fullness_list[fullness]));
=20
@@ -783,17 +783,16 @@ static void remove_zspage(struct size_class *class,
 /*
  * Each size class maintains zspages in different fullness groups depending
  * on the number of live objects they contain. When allocating or freeing
- * objects, the fullness status of the page can change, say, from ALMOST_F=
ULL
- * to ALMOST_EMPTY when freeing an object. This function checks if such
- * a status change has occurred for the given page and accordingly moves t=
he
- * page from the freelist of the old fullness group to that of the new
- * fullness group.
+ * objects, the fullness status of the page can change, for instance, from
+ * INUSE_RATIO_80 to INUSE_RATIO_70 when freeing an object. This function
+ * checks if such a status change has occurred for the given page and
+ * accordingly moves the page from the list of the old fullness group to t=
hat
+ * of the new fullness group.
  */
-static enum fullness_group fix_fullness_group(struct size_class *class,
-						struct zspage *zspage)
+static int fix_fullness_group(struct size_class *class, struct zspage *zsp=
age)
 {
 	int class_idx;
-	enum fullness_group currfg, newfg;
+	int currfg, newfg;
=20
 	get_zspage_mapping(zspage, &class_idx, &currfg);
 	newfg =3D get_fullness_group(class, zspage);
@@ -966,7 +965,7 @@ static void __free_zspage(struct zs_pool *pool, struct =
size_class *class,
 				struct zspage *zspage)
 {
 	struct page *page, *next;
-	enum fullness_group fg;
+	int fg;
 	unsigned int class_idx;
=20
 	get_zspage_mapping(zspage, &class_idx, &fg);
@@ -974,7 +973,7 @@ static void __free_zspage(struct zs_pool *pool, struct =
size_class *class,
 	assert_spin_locked(&pool->lock);
=20
 	VM_BUG_ON(get_zspage_inuse(zspage));
-	VM_BUG_ON(fg !=3D ZS_EMPTY);
+	VM_BUG_ON(fg !=3D ZS_INUSE_RATIO_0);
=20
 	/* Free all deferred handles from zs_free */
 	free_handles(pool, class, zspage);
@@ -992,9 +991,8 @@ static void __free_zspage(struct zs_pool *pool, struct =
size_class *class,
=20
 	cache_free_zspage(pool, zspage);
=20
-	class_stat_dec(class, OBJ_ALLOCATED, class->objs_per_zspage);
-	atomic_long_sub(class->pages_per_zspage,
-					&pool->pages_allocated);
+	class_stat_dec(class, ZS_OBJS_ALLOCATED, class->objs_per_zspage);
+	atomic_long_sub(class->pages_per_zspage, &pool->pages_allocated);
 }
=20
 static void free_zspage(struct zs_pool *pool, struct size_class *class,
@@ -1013,7 +1011,7 @@ static void free_zspage(struct zs_pool *pool, struct =
size_class *class,
 		return;
 	}
=20
-	remove_zspage(class, zspage, ZS_EMPTY);
+	remove_zspage(class, zspage, ZS_INUSE_RATIO_0);
 #ifdef CONFIG_ZPOOL
 	list_del(&zspage->lru);
 #endif
@@ -1149,9 +1147,9 @@ static struct zspage *find_get_zspage(struct size_cla=
ss *class)
 	int i;
 	struct zspage *zspage;
=20
-	for (i =3D ZS_ALMOST_FULL; i >=3D ZS_EMPTY; i--) {
+	for (i =3D ZS_INUSE_RATIO_99; i >=3D ZS_INUSE_RATIO_0; i--) {
 		zspage =3D list_first_entry_or_null(&class->fullness_list[i],
-				struct zspage, list);
+						  struct zspage, list);
 		if (zspage)
 			break;
 	}
@@ -1510,7 +1508,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t =
size, gfp_t gfp)
 {
 	unsigned long handle, obj;
 	struct size_class *class;
-	enum fullness_group newfg;
+	int newfg;
 	struct zspage *zspage;
=20
 	if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
@@ -1532,7 +1530,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t =
size, gfp_t gfp)
 		/* Now move the zspage to another fullness group, if required */
 		fix_fullness_group(class, zspage);
 		record_obj(handle, obj);
-		class_stat_inc(class, OBJ_USED, 1);
+		class_stat_inc(class, ZS_OBJS_INUSE, 1);
 		spin_unlock(&pool->lock);
=20
 		return handle;
@@ -1552,10 +1550,9 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t=
 size, gfp_t gfp)
 	insert_zspage(class, zspage, newfg);
 	set_zspage_mapping(zspage, class->index, newfg);
 	record_obj(handle, obj);
-	atomic_long_add(class->pages_per_zspage,
-				&pool->pages_allocated);
-	class_stat_inc(class, OBJ_ALLOCATED, class->objs_per_zspage);
-	class_stat_inc(class, OBJ_USED, 1);
+	atomic_long_add(class->pages_per_zspage, &pool->pages_allocated);
+	class_stat_inc(class, ZS_OBJS_ALLOCATED, class->objs_per_zspage);
+	class_stat_inc(class, ZS_OBJS_INUSE, 1);
=20
 	/* We completely set up zspage so mark them as movable */
 	SetZsPageMovable(pool, zspage);
@@ -1611,7 +1608,7 @@ void zs_free(struct zs_pool *pool, unsigned long hand=
le)
 	struct page *f_page;
 	unsigned long obj;
 	struct size_class *class;
-	enum fullness_group fullness;
+	int fullness;
=20
 	if (IS_ERR_OR_NULL((void *)handle))
 		return;
@@ -1626,7 +1623,7 @@ void zs_free(struct zs_pool *pool, unsigned long hand=
le)
 	zspage =3D get_zspage(f_page);
 	class =3D zspage_class(pool, zspage);
=20
-	class_stat_dec(class, OBJ_USED, 1);
+	class_stat_dec(class, ZS_OBJS_INUSE, 1);
=20
 #ifdef CONFIG_ZPOOL
 	if (zspage->under_reclaim) {
@@ -1644,7 +1641,7 @@ void zs_free(struct zs_pool *pool, unsigned long hand=
le)
 	obj_free(class->size, obj, NULL);
=20
 	fullness =3D fix_fullness_group(class, zspage);
-	if (fullness =3D=3D ZS_EMPTY)
+	if (fullness =3D=3D ZS_INUSE_RATIO_0)
 		free_zspage(pool, class, zspage);
=20
 	spin_unlock(&pool->lock);
@@ -1826,22 +1823,33 @@ static int migrate_zspage(struct zs_pool *pool, str=
uct size_class *class,
 	return ret;
 }
=20
-static struct zspage *isolate_zspage(struct size_class *class, bool source)
+static struct zspage *isolate_src_zspage(struct size_class *class)
 {
-	int i;
 	struct zspage *zspage;
-	enum fullness_group fg[2] =3D {ZS_ALMOST_EMPTY, ZS_ALMOST_FULL};
+	int fg;
=20
-	if (!source) {
-		fg[0] =3D ZS_ALMOST_FULL;
-		fg[1] =3D ZS_ALMOST_EMPTY;
+	for (fg =3D ZS_INUSE_RATIO_10; fg <=3D ZS_INUSE_RATIO_99; fg++) {
+		zspage =3D list_first_entry_or_null(&class->fullness_list[fg],
+						  struct zspage, list);
+		if (zspage) {
+			remove_zspage(class, zspage, fg);
+			return zspage;
+		}
 	}
=20
-	for (i =3D 0; i < 2; i++) {
-		zspage =3D list_first_entry_or_null(&class->fullness_list[fg[i]],
-							struct zspage, list);
+	return zspage;
+}
+
+static struct zspage *isolate_dst_zspage(struct size_class *class)
+{
+	struct zspage *zspage;
+	int fg;
+
+	for (fg =3D ZS_INUSE_RATIO_99; fg >=3D ZS_INUSE_RATIO_10; fg--) {
+		zspage =3D list_first_entry_or_null(&class->fullness_list[fg],
+						  struct zspage, list);
 		if (zspage) {
-			remove_zspage(class, zspage, fg[i]);
+			remove_zspage(class, zspage, fg);
 			return zspage;
 		}
 	}
@@ -1854,12 +1862,11 @@ static struct zspage *isolate_zspage(struct size_cl=
ass *class, bool source)
  * @class: destination class
  * @zspage: target page
  *
- * Return @zspage's fullness_group
+ * Return @zspage's fullness status
  */
-static enum fullness_group putback_zspage(struct size_class *class,
-			struct zspage *zspage)
+static int putback_zspage(struct size_class *class, struct zspage *zspage)
 {
-	enum fullness_group fullness;
+	int fullness;
=20
 	fullness =3D get_fullness_group(class, zspage);
 	insert_zspage(class, zspage, fullness);
@@ -2123,7 +2130,7 @@ static void async_free_zspage(struct work_struct *wor=
k)
 	int i;
 	struct size_class *class;
 	unsigned int class_idx;
-	enum fullness_group fullness;
+	int fullness;
 	struct zspage *zspage, *tmp;
 	LIST_HEAD(free_pages);
 	struct zs_pool *pool =3D container_of(work, struct zs_pool,
@@ -2135,7 +2142,8 @@ static void async_free_zspage(struct work_struct *wor=
k)
 			continue;
=20
 		spin_lock(&pool->lock);
-		list_splice_init(&class->fullness_list[ZS_EMPTY], &free_pages);
+		list_splice_init(&class->fullness_list[ZS_INUSE_RATIO_0],
+				 &free_pages);
 		spin_unlock(&pool->lock);
 	}
=20
@@ -2144,7 +2152,7 @@ static void async_free_zspage(struct work_struct *wor=
k)
 		lock_zspage(zspage);
=20
 		get_zspage_mapping(zspage, &class_idx, &fullness);
-		VM_BUG_ON(fullness !=3D ZS_EMPTY);
+		VM_BUG_ON(fullness !=3D ZS_INUSE_RATIO_0);
 		class =3D pool->size_class[class_idx];
 		spin_lock(&pool->lock);
 #ifdef CONFIG_ZPOOL
@@ -2192,8 +2200,8 @@ static inline void zs_flush_migration(struct zs_pool =
*pool) { }
 static unsigned long zs_can_compact(struct size_class *class)
 {
 	unsigned long obj_wasted;
-	unsigned long obj_allocated =3D zs_stat_get(class, OBJ_ALLOCATED);
-	unsigned long obj_used =3D zs_stat_get(class, OBJ_USED);
+	unsigned long obj_allocated =3D zs_stat_get(class, ZS_OBJS_ALLOCATED);
+	unsigned long obj_used =3D zs_stat_get(class, ZS_OBJS_INUSE);
=20
 	if (obj_allocated <=3D obj_used)
 		return 0;
@@ -2217,7 +2225,7 @@ static unsigned long __zs_compact(struct zs_pool *poo=
l,
 	 * as well as zpage allocation/free
 	 */
 	spin_lock(&pool->lock);
-	while ((src_zspage =3D isolate_zspage(class, true))) {
+	while ((src_zspage =3D isolate_src_zspage(class))) {
 		/* protect someone accessing the zspage(i.e., zs_map_object) */
 		migrate_write_lock(src_zspage);
=20
@@ -2227,7 +2235,7 @@ static unsigned long __zs_compact(struct zs_pool *poo=
l,
 		cc.obj_idx =3D 0;
 		cc.s_page =3D get_first_page(src_zspage);
=20
-		while ((dst_zspage =3D isolate_zspage(class, false))) {
+		while ((dst_zspage =3D isolate_dst_zspage(class))) {
 			migrate_write_lock_nested(dst_zspage);
=20
 			cc.d_page =3D get_first_page(dst_zspage);
@@ -2252,7 +2260,7 @@ static unsigned long __zs_compact(struct zs_pool *poo=
l,
 		putback_zspage(class, dst_zspage);
 		migrate_write_unlock(dst_zspage);
=20
-		if (putback_zspage(class, src_zspage) =3D=3D ZS_EMPTY) {
+		if (putback_zspage(class, src_zspage) =3D=3D ZS_INUSE_RATIO_0) {
 			migrate_write_unlock(src_zspage);
 			free_zspage(pool, class, src_zspage);
 			pages_freed +=3D class->pages_per_zspage;
@@ -2410,7 +2418,7 @@ struct zs_pool *zs_create_pool(const char *name)
 		int pages_per_zspage;
 		int objs_per_zspage;
 		struct size_class *class;
-		int fullness =3D 0;
+		int fullness;
=20
 		size =3D ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA;
 		if (size > ZS_MAX_ALLOC_SIZE)
@@ -2464,9 +2472,12 @@ struct zs_pool *zs_create_pool(const char *name)
 		class->pages_per_zspage =3D pages_per_zspage;
 		class->objs_per_zspage =3D objs_per_zspage;
 		pool->size_class[i] =3D class;
-		for (fullness =3D ZS_EMPTY; fullness < NR_ZS_FULLNESS;
-							fullness++)
+
+		fullness =3D ZS_INUSE_RATIO_0;
+		while (fullness < NR_FULLNESS_GROUPS) {
 			INIT_LIST_HEAD(&class->fullness_list[fullness]);
+			fullness++;
+		}
=20
 		prev_class =3D class;
 	}
@@ -2512,11 +2523,12 @@ void zs_destroy_pool(struct zs_pool *pool)
 		if (class->index !=3D i)
 			continue;
=20
-		for (fg =3D ZS_EMPTY; fg < NR_ZS_FULLNESS; fg++) {
-			if (!list_empty(&class->fullness_list[fg])) {
-				pr_info("Freeing non-empty class with size %db, fullness group %d\n",
-					class->size, fg);
-			}
+		for (fg =3D ZS_INUSE_RATIO_0; fg < NR_FULLNESS_GROUPS; fg++) {
+			if (list_empty(&class->fullness_list[fg]))
+				continue;
+
+			pr_err("Class-%d fullness group %d is not empty\n",
+			       class->size, fg);
 		}
 		kfree(class);
 	}
@@ -2618,7 +2630,7 @@ static int zs_reclaim_page(struct zs_pool *pool, unsi=
gned int retries)
 	unsigned long handle;
 	struct zspage *zspage;
 	struct page *page;
-	enum fullness_group fullness;
+	int fullness;
=20
 	/* Lock LRU and fullness list */
 	spin_lock(&pool->lock);
@@ -2688,7 +2700,7 @@ static int zs_reclaim_page(struct zs_pool *pool, unsi=
gned int retries)
 			 * while the page is removed from the pool. Fix it
 			 * up for the check in __free_zspage().
 			 */
-			zspage->fullness =3D ZS_EMPTY;
+			zspage->fullness =3D ZS_INUSE_RATIO_0;
=20
 			__free_zspage(pool, class, zspage);
 			spin_unlock(&pool->lock);
--=20
2.40.0.rc0.216.gc4246ad0f0-goog
From nobody Mon Feb  9 02:03:17 2026
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 1364BC6FA8E
	for <linux-kernel@archiver.kernel.org>; Sat,  4 Mar 2023 03:49:03 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229616AbjCDDtB (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Fri, 3 Mar 2023 22:49:01 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35528 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229668AbjCDDsv (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 3 Mar 2023 22:48:51 -0500
Received: from mail-pl1-x62e.google.com (mail-pl1-x62e.google.com
 [IPv6:2607:f8b0:4864:20::62e])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EF0441C7E1
        for <linux-kernel@vger.kernel.org>;
 Fri,  3 Mar 2023 19:48:48 -0800 (PST)
Received: by mail-pl1-x62e.google.com with SMTP id h8so4759162plf.10
        for <linux-kernel@vger.kernel.org>;
 Fri, 03 Mar 2023 19:48:48 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=chromium.org; s=google;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:from:to:cc:subject:date
         :message-id:reply-to;
        bh=mxRoi4TukkX6UlbjK9WtsqOLfwrBv6GFUJd0cC4XWDw=;
        b=PoJckSaXrWf6myxlSmql82ndp3fdHyLe8QuReRZtA53h8cgepOrcFc3wAxf5T+880d
         JxwL2bFvEx57bJWamf060g4cq+HktGLGh3pibGPW+OlzO+ceBhngA6O7ZtD+EcQopQ6T
         MaCJNdUlbXoZGJC2ah7Uxd+e2hfQW5hlYfvYw=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=mxRoi4TukkX6UlbjK9WtsqOLfwrBv6GFUJd0cC4XWDw=;
        b=jRDbnI+C4mJzrQJZWb8ccly74X40CJHnqqCsjjMuLEnY1gaWVVG/Jn3UWevu2+UCiJ
         2Zu7RUPNIGxRZd5eEIJsZaYKJWemtZvHxjuK2PmOxyP1CNrYd+TfxAK0HHPuFkw6jzWE
         D+kLLl/kmVBk3GIKCxCHXvyM1kRKaO1HVelNcI+j3S9xlOw7K0igsgRxPvbU0OLhYNqH
         DVZcztFftDbBjGX8LLhHsOrHfTupEV5VfGZF2UVsu9SmRTlvffJasyBAlE0h4CsLiCZY
         VmGoCOob6QCqwC6SiV616AMFt5j2SgdXPs0OEtJTpMBE6m3ymnjArI++O3YoOmY077Qn
         vBgQ==
X-Gm-Message-State: AO0yUKWh1Pc4I5zLuf3EgeAKnCvUAR3rgGKrrZ4JtSMXfKpCUuUOXfPV
        P07+uGdexFxYWZdW2Z7WjUHd/A==
X-Google-Smtp-Source: 
 AK7set8zpcSUINaMZnuyOqbfw1GR5LVivHiboO47W0G55Vv1CAgAtziIWwGghZEnWnNpAu3eWy170Q==
X-Received: by 2002:a17:903:1cb:b0:19c:eaf0:9859 with SMTP id
 e11-20020a17090301cb00b0019ceaf09859mr5610861plh.38.1677901728469;
        Fri, 03 Mar 2023 19:48:48 -0800 (PST)
Received: from tigerii.tok.corp.google.com
 ([2401:fa00:8f:203:6ac2:6eee:5465:7ee6])
        by smtp.gmail.com with ESMTPSA id
 d6-20020a170902c18600b00199025284b3sm2249204pld.151.2023.03.03.19.48.46
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Fri, 03 Mar 2023 19:48:48 -0800 (PST)
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Minchan Kim <minchan@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>
Cc: Yosry Ahmed <yosryahmed@google.com>, linux-kernel@vger.kernel.org,
        linux-mm@kvack.org, Sergey Senozhatsky <senozhatsky@chromium.org>
Subject: [PATCHv4 3/4] zsmalloc: rework compaction algorithm
Date: Sat,  4 Mar 2023 12:48:34 +0900
Message-Id: <20230304034835.2082479-4-senozhatsky@chromium.org>
X-Mailer: git-send-email 2.40.0.rc0.216.gc4246ad0f0-goog
In-Reply-To: <20230304034835.2082479-1-senozhatsky@chromium.org>
References: <20230304034835.2082479-1-senozhatsky@chromium.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

The zsmalloc compaction algorithm has the potential to
waste some CPU cycles, particularly when compacting pages
within the same fullness group. This is due to the way it
selects the head page of the fullness list for source and
destination pages, and how it reinserts those pages during
each iteration. The algorithm may first use a page as a
migration destination and then as a migration source,
leading to an unnecessary back-and-forth movement of
objects.

Consider the following fullness list:

PageA PageB PageC PageD PageE

During the first iteration, the compaction algorithm will
select PageA as the source and PageB as the destination.
All of PageA's objects will be moved to PageB, and then
PageA will be released while PageB is reinserted into the
fullness list.

PageB PageC PageD PageE

During the next iteration, the compaction algorithm will
again select the head of the list as the source and destination,
meaning that PageB will now serve as the source and PageC as
the destination. This will result in the objects being moved
away from PageB, the same objects that were just moved to PageB
in the previous iteration.

To prevent this avalanche effect, the compaction algorithm
should not reinsert the destination page between iterations.
That way the same destination page keeps being filled, its
usage ratio keeps increasing, and internal fragmentation goes
down. The destination page should only be reinserted into the
fullness list if:
- It becomes full.
- No source page is available.
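
The difference between the two schemes can be sketched with a
small user-space model. This is only an illustration, not the
kernel code: struct page_model, compact_old() and compact_new()
are hypothetical names, pages are reduced to an object count,
and the per-page capacity and object counts are assumptions
(the example above does not give any).

```c
#include <stddef.h>

/* Hypothetical model: a page is reduced to its count of live objects. */
struct page_model {
	int inuse;
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Old scheme: the list head is the source, the next page is the
 * destination, and the destination goes back on the list after every
 * iteration, so it becomes the next source.
 */
static long compact_old(struct page_model *pages, size_t nr, int cap)
{
	long moved = 0;
	size_t head = 0;

	while (head + 1 < nr) {
		struct page_model *src = &pages[head];
		struct page_model *dst = &pages[head + 1];
		int n = min_int(src->inuse, cap - dst->inuse);

		src->inuse -= n;
		dst->inuse += n;
		moved += n;

		if (src->inuse == 0) {
			head++;		/* source is empty and gets freed */
			if (dst->inuse == cap)
				head++;	/* a full destination leaves the list too */
		} else {
			/* destination is full: drop it, keep the source at the head */
			struct page_model tmp = *src;

			*src = *dst;
			*dst = tmp;
			head++;
		}
	}
	return moved;
}

/*
 * New scheme: hold on to the destination until it is full, and only
 * then pick another one.
 */
static long compact_new(struct page_model *pages, size_t nr, int cap)
{
	long moved = 0;
	size_t dst = 0, src = 1;

	while (src < nr) {
		int n = min_int(pages[src].inuse, cap - pages[dst].inuse);

		pages[src].inuse -= n;
		pages[dst].inuse += n;
		moved += n;

		if (pages[dst].inuse < cap) {
			src++;		/* source drained, keep filling dst */
		} else if (pages[src].inuse == 0) {
			dst = src + 1;	/* both are done: take two fresh pages */
			src = dst + 1;
		} else {
			dst = src;	/* leftover source becomes the new dst */
			src = dst + 1;
		}
	}
	return moved;
}
```

With five pages of 2 objects each and an assumed capacity of 10,
the old scheme in this model copies 2 + 4 + 6 + 8 objects, while
the new scheme copies each object only once.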

TEST
=3D=3D=3D=3D

It's very challenging to reliably test this series. I ended up
developing my own synthetic test that has 100% reproducibility.
The test generates significant fragmentation (for each size class)
and then performs compaction for each class individually and tracks
the number of memcpy() in zs_object_copy(), so that we can compare
the amount of work compaction does on a per-class basis.

Total amount of work (zram mm_stat objs_moved)
----------------------------------------------

Old fullness grouping, old compaction algorithm:
323977 memcpy() in zs_object_copy().

Old fullness grouping, new compaction algorithm:
262944 memcpy() in zs_object_copy().

New fullness grouping, new compaction algorithm:
213978 memcpy() in zs_object_copy().

Per-class compaction memcpy() comparison (T-test)
-------------------------------------------------

x Old fullness grouping, old compaction algorithm
+ Old fullness grouping, new compaction algorithm

    N           Min           Max        Median           Avg        Stddev
x 140           349          3513          2461     2314.1214     806.03271
+ 140           289          2778          2006     1878.1714     641.02073
Difference at 95.0% confidence
        -435.95 +/- 170.595
        -18.8387% +/- 7.37193%
        (Student's t, pooled s =3D 728.216)

x Old fullness grouping, old compaction algorithm
+ New fullness grouping, new compaction algorithm

    N           Min           Max        Median           Avg        Stddev
x 140           349          3513          2461     2314.1214     806.03271
+ 140           226          2279          1644     1528.4143     524.85268
Difference at 95.0% confidence
        -785.707 +/- 159.331
        -33.9527% +/- 6.88516%
        (Student's t, pooled s =3D 680.132)

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 78 ++++++++++++++++++++++++---------------------------
 1 file changed, 36 insertions(+), 42 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index cc59336a966a..a61540afbb28 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1782,15 +1782,14 @@ struct zs_compact_control {
 	int obj_idx;
 };
=20
-static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
-				struct zs_compact_control *cc)
+static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
+			   struct zs_compact_control *cc)
 {
 	unsigned long used_obj, free_obj;
 	unsigned long handle;
 	struct page *s_page =3D cc->s_page;
 	struct page *d_page =3D cc->d_page;
 	int obj_idx =3D cc->obj_idx;
-	int ret =3D 0;
=20
 	while (1) {
 		handle =3D find_alloced_obj(class, s_page, &obj_idx);
@@ -1803,10 +1802,8 @@ static int migrate_zspage(struct zs_pool *pool, stru=
ct size_class *class,
 		}
=20
 		/* Stop if there is no more space */
-		if (zspage_full(class, get_zspage(d_page))) {
-			ret =3D -ENOMEM;
+		if (zspage_full(class, get_zspage(d_page)))
 			break;
-		}
=20
 		used_obj =3D handle_to_obj(handle);
 		free_obj =3D obj_malloc(pool, get_zspage(d_page), handle);
@@ -1819,8 +1816,6 @@ static int migrate_zspage(struct zs_pool *pool, struc=
t size_class *class,
 	/* Remember last position in this iteration */
 	cc->s_page =3D s_page;
 	cc->obj_idx =3D obj_idx;
-
-	return ret;
 }
=20
 static struct zspage *isolate_src_zspage(struct size_class *class)
@@ -2216,7 +2211,7 @@ static unsigned long __zs_compact(struct zs_pool *poo=
l,
 				  struct size_class *class)
 {
 	struct zs_compact_control cc;
-	struct zspage *src_zspage;
+	struct zspage *src_zspage =3D NULL;
 	struct zspage *dst_zspage =3D NULL;
 	unsigned long pages_freed =3D 0;
=20
@@ -2225,50 +2220,45 @@ static unsigned long __zs_compact(struct zs_pool *p=
ool,
 	 * as well as zpage allocation/free
 	 */
 	spin_lock(&pool->lock);
-	while ((src_zspage =3D isolate_src_zspage(class))) {
-		/* protect someone accessing the zspage(i.e., zs_map_object) */
-		migrate_write_lock(src_zspage);
+	while (zs_can_compact(class)) {
+		int fg;
=20
-		if (!zs_can_compact(class))
+		if (!dst_zspage) {
+			dst_zspage =3D isolate_dst_zspage(class);
+			if (!dst_zspage)
+				break;
+			migrate_write_lock(dst_zspage);
+			cc.d_page =3D get_first_page(dst_zspage);
+		}
+
+		src_zspage =3D isolate_src_zspage(class);
+		if (!src_zspage)
 			break;
=20
+		migrate_write_lock_nested(src_zspage);
+
 		cc.obj_idx =3D 0;
 		cc.s_page =3D get_first_page(src_zspage);
+		migrate_zspage(pool, class, &cc);
+		fg =3D putback_zspage(class, src_zspage);
+		migrate_write_unlock(src_zspage);
=20
-		while ((dst_zspage =3D isolate_dst_zspage(class))) {
-			migrate_write_lock_nested(dst_zspage);
-
-			cc.d_page =3D get_first_page(dst_zspage);
-			/*
-			 * If there is no more space in dst_page, resched
-			 * and see if anyone had allocated another zspage.
-			 */
-			if (!migrate_zspage(pool, class, &cc))
-				break;
+		if (fg =3D=3D ZS_INUSE_RATIO_0) {
+			free_zspage(pool, class, src_zspage);
+			pages_freed +=3D class->pages_per_zspage;
+		}
+		src_zspage =3D NULL;
=20
+		if (get_fullness_group(class, dst_zspage) =3D=3D ZS_INUSE_RATIO_100
+		    || spin_is_contended(&pool->lock)) {
 			putback_zspage(class, dst_zspage);
 			migrate_write_unlock(dst_zspage);
 			dst_zspage =3D NULL;
-			if (spin_is_contended(&pool->lock))
-				break;
-		}
=20
-		/* Stop if we couldn't find slot */
-		if (dst_zspage =3D=3D NULL)
-			break;
-
-		putback_zspage(class, dst_zspage);
-		migrate_write_unlock(dst_zspage);
-
-		if (putback_zspage(class, src_zspage) =3D=3D ZS_INUSE_RATIO_0) {
-			migrate_write_unlock(src_zspage);
-			free_zspage(pool, class, src_zspage);
-			pages_freed +=3D class->pages_per_zspage;
-		} else
-			migrate_write_unlock(src_zspage);
-		spin_unlock(&pool->lock);
-		cond_resched();
-		spin_lock(&pool->lock);
+			spin_unlock(&pool->lock);
+			cond_resched();
+			spin_lock(&pool->lock);
+		}
 	}
=20
 	if (src_zspage) {
@@ -2276,6 +2266,10 @@ static unsigned long __zs_compact(struct zs_pool *po=
ol,
 		migrate_write_unlock(src_zspage);
 	}
=20
+	if (dst_zspage) {
+		putback_zspage(class, dst_zspage);
+		migrate_write_unlock(dst_zspage);
+	}
 	spin_unlock(&pool->lock);
=20
 	return pages_freed;
--=20
2.40.0.rc0.216.gc4246ad0f0-goog
From nobody Mon Feb  9 02:03:17 2026
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 1A2ECC6FA8E
	for <linux-kernel@archiver.kernel.org>; Sat,  4 Mar 2023 03:49:14 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229721AbjCDDtM (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Fri, 3 Mar 2023 22:49:12 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35508 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229785AbjCDDtD (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 3 Mar 2023 22:49:03 -0500
Received: from mail-pj1-x1029.google.com (mail-pj1-x1029.google.com
 [IPv6:2607:f8b0:4864:20::1029])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7B82E1C31A
        for <linux-kernel@vger.kernel.org>;
 Fri,  3 Mar 2023 19:48:52 -0800 (PST)
Received: by mail-pj1-x1029.google.com with SMTP id
 6-20020a17090a190600b00237c5b6ecd7so8082861pjg.4
        for <linux-kernel@vger.kernel.org>;
 Fri, 03 Mar 2023 19:48:52 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=chromium.org; s=google;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:from:to:cc:subject:date
         :message-id:reply-to;
        bh=VDa9Rt5gN+wqseziIq8dAlRWmwkEfGNI/W086ilXD04=;
        b=ma0hJFSMBbjPeOiDUGSPZH6UdO2Bu0ejhFXINdaxj+UMAC+rJPSDWse/UcI3XjdJB7
         XawTUDuCcYalqmSOEk5p0AaYQkVlIA52BdOk2CEjoo8NtEnudjtGwVlkDwdoKSwqAHFz
         nVpsbZEnKloellNxEGTwamXMjFp6+noHmgOSg=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=VDa9Rt5gN+wqseziIq8dAlRWmwkEfGNI/W086ilXD04=;
        b=jS4nx6okj8/2fVl9kQLHkkuSPjxWwePJywrDy3bPT/n+B+vV17mCHJcpdKOS6jEBzf
         8CcxGio0NacNvMS9lwmznd6ADKZgEGNF6TgeBkLt9b61huWN29BDHQH/NWzp5cszmcJb
         D9pELutA81uAwq0Q8cLerqCoMxbeBNFM2/p4XrAnmF31EfgfO4s9tm3d1j11+g7/hBbM
         bWmY9nj17F1PLX7T+XaZaEtOQrm4/mczETDeOJjDxxs4pZmxIjhOY/LE+gAPC3BwLCcB
         irCUAS/sRC4D2RWnT5pXB7ZUWIwF4hnWVzIWYsngqFvK2BcZxLuw2KAwDyj/b0jXlMaY
         36cQ==
X-Gm-Message-State: AO0yUKU4Xb46QIlnEwZVM1E167moqFI78rOVDnBm1FoDKMicnur/H04N
        Tn391TQMFsVSOg5Hgvcyu6sN8g==
X-Google-Smtp-Source: 
 AK7set+fQINFAZ8pGr8zRqFr/P11wMfjLik9Ns7+gC6nbPtWNZv37U4HFsrWhKuOWZaethOn90CMKw==
X-Received: by 2002:a17:903:1249:b0:19a:ad90:4223 with SMTP id
 u9-20020a170903124900b0019aad904223mr4358621plh.48.1677901731582;
        Fri, 03 Mar 2023 19:48:51 -0800 (PST)
Received: from tigerii.tok.corp.google.com
 ([2401:fa00:8f:203:6ac2:6eee:5465:7ee6])
        by smtp.gmail.com with ESMTPSA id
 d6-20020a170902c18600b00199025284b3sm2249204pld.151.2023.03.03.19.48.49
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Fri, 03 Mar 2023 19:48:51 -0800 (PST)
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Minchan Kim <minchan@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>
Cc: Yosry Ahmed <yosryahmed@google.com>, linux-kernel@vger.kernel.org,
        linux-mm@kvack.org, Sergey Senozhatsky <senozhatsky@chromium.org>
Subject: [PATCHv4 4/4] zsmalloc: show per fullness group class stats
Date: Sat,  4 Mar 2023 12:48:35 +0900
Message-Id: <20230304034835.2082479-5-senozhatsky@chromium.org>
X-Mailer: git-send-email 2.40.0.rc0.216.gc4246ad0f0-goog
In-Reply-To: <20230304034835.2082479-1-senozhatsky@chromium.org>
References: <20230304034835.2082479-1-senozhatsky@chromium.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

zs_stats_size_show() still reports the old fullness (3/4
threshold) stats. Switch from almost full/empty stats to
fine-grained per inuse ratio (fullness group) reporting,
which gives significantly more data on class fragmentation.
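
For readers of the new columns, here is a rough user-space
sketch of how an inuse ratio could map to a fullness group
index. The enum order follows the one shown in the diff below,
but the names (fullness_group_model, fullness_group_of) are
hypothetical and the mapping formula is an assumption, not
something this patch defines.

```c
/* Mirrors the enum order used in this series; names are hypothetical. */
enum fullness_group_model {
	FG_RATIO_0,		/* empty zspage */
	FG_RATIO_10,		/* first non-empty group, then one per 10% step */
	FG_RATIO_99 = 10,	/* nearly full, but not completely */
	FG_RATIO_100,		/* completely full zspage */
	FG_NR_GROUPS,
};

/*
 * Assumed mapping: empty and full zspages get dedicated groups, and
 * everything in between is bucketed in 10% steps. The "+ 1" keeps a
 * zspage with a tiny but non-zero ratio out of the empty group, since
 * integer division would otherwise round its ratio down to 0.
 */
static int fullness_group_of(int inuse, int objs_per_zspage)
{
	int ratio;

	if (inuse == 0)
		return FG_RATIO_0;
	if (inuse == objs_per_zspage)
		return FG_RATIO_100;

	ratio = 100 * inuse / objs_per_zspage;
	return ratio / 10 + 1;
}
```

Under this assumed mapping, a zspage with 1 of 127 objects in
use lands in the first non-empty group, and one with 126 of 127
lands in the group just below full.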

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 53 ++++++++++++++++++++++-----------------------------
 1 file changed, 23 insertions(+), 30 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a61540afbb28..aea50e2aa350 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -172,9 +172,7 @@
 enum fullness_group {
 	ZS_INUSE_RATIO_0,
 	ZS_INUSE_RATIO_10,
-	/* NOTE: 5 more fullness groups here */
-	ZS_INUSE_RATIO_70	=3D 7,
-	/* NOTE: 2 more fullness groups here */
+	/* NOTE: 8 more fullness groups here */
 	ZS_INUSE_RATIO_99       =3D 10,
 	ZS_INUSE_RATIO_100,
 	NR_FULLNESS_GROUPS,
@@ -621,23 +619,22 @@ static unsigned long zs_can_compact(struct size_class=
 *class);
=20
 static int zs_stats_size_show(struct seq_file *s, void *v)
 {
-	int i;
+	int i, fg;
 	struct zs_pool *pool =3D s->private;
 	struct size_class *class;
 	int objs_per_zspage;
-	unsigned long class_almost_full, class_almost_empty;
 	unsigned long obj_allocated, obj_used, pages_used, freeable;
-	unsigned long total_class_almost_full =3D 0, total_class_almost_empty =3D=
 0;
 	unsigned long total_objs =3D 0, total_used_objs =3D 0, total_pages =3D 0;
 	unsigned long total_freeable =3D 0;
+	unsigned long inuse_totals[NR_FULLNESS_GROUPS] =3D {0, };
=20
-	seq_printf(s, " %5s %5s %11s %12s %13s %10s %10s %16s %8s\n",
-			"class", "size", "almost_full", "almost_empty",
+	seq_printf(s, " %5s %5s %9s %9s %9s %9s %9s %9s %9s %9s %9s %9s %9s %13s =
%10s %10s %16s %8s\n",
+			"class", "size", "10%", "20%", "30%", "40%",
+			"50%", "60%", "70%", "80%", "90%", "99%", "100%",
 			"obj_allocated", "obj_used", "pages_used",
 			"pages_per_zspage", "freeable");
=20
 	for (i =3D 0; i < ZS_SIZE_CLASSES; i++) {
-		int fg;
=20
 		class =3D pool->size_class[i];
=20
@@ -645,16 +642,12 @@ static int zs_stats_size_show(struct seq_file *s, voi=
d *v)
 			continue;
=20
 		spin_lock(&pool->lock);
-		class_almost_full =3D 0;
-		class_almost_empty =3D 0;
-		/*
-		 * Replicate old behaviour for almost_full and almost_empty
-		 * stats.
-		 */
-		for (fg =3D ZS_INUSE_RATIO_70; fg <=3D ZS_INUSE_RATIO_99; fg++)
-			class_almost_full +=3D zs_stat_get(class, fg);
-		for (fg =3D ZS_INUSE_RATIO_10; fg < ZS_INUSE_RATIO_70; fg++)
-			class_almost_empty +=3D zs_stat_get(class, fg);
+
+		seq_printf(s, " %5u %5u ", i, class->size);
+		for (fg =3D ZS_INUSE_RATIO_10; fg < NR_FULLNESS_GROUPS; fg++) {
+			inuse_totals[fg] +=3D zs_stat_get(class, fg);
+			seq_printf(s, "%9lu ", zs_stat_get(class, fg));
+		}
=20
 		obj_allocated =3D zs_stat_get(class, ZS_OBJS_ALLOCATED);
 		obj_used =3D zs_stat_get(class, ZS_OBJS_INUSE);
@@ -665,14 +658,10 @@ static int zs_stats_size_show(struct seq_file *s, voi=
d *v)
 		pages_used =3D obj_allocated / objs_per_zspage *
 				class->pages_per_zspage;
=20
-		seq_printf(s, " %5u %5u %11lu %12lu %13lu"
-				" %10lu %10lu %16d %8lu\n",
-			i, class->size, class_almost_full, class_almost_empty,
-			obj_allocated, obj_used, pages_used,
-			class->pages_per_zspage, freeable);
+		seq_printf(s, "%13lu %10lu %10lu %16d %8lu\n",
+			   obj_allocated, obj_used, pages_used,
+			   class->pages_per_zspage, freeable);
=20
-		total_class_almost_full +=3D class_almost_full;
-		total_class_almost_empty +=3D class_almost_empty;
 		total_objs +=3D obj_allocated;
 		total_used_objs +=3D obj_used;
 		total_pages +=3D pages_used;
@@ -680,10 +669,14 @@ static int zs_stats_size_show(struct seq_file *s, voi=
d *v)
 	}
=20
 	seq_puts(s, "\n");
-	seq_printf(s, " %5s %5s %11lu %12lu %13lu %10lu %10lu %16s %8lu\n",
-			"Total", "", total_class_almost_full,
-			total_class_almost_empty, total_objs,
-			total_used_objs, total_pages, "", total_freeable);
+	seq_printf(s, " %5s %5s ", "Total", "");
+
+	for (fg =3D ZS_INUSE_RATIO_10; fg < NR_FULLNESS_GROUPS; fg++)
+		seq_printf(s, "%9lu ", inuse_totals[fg]);
+
+	seq_printf(s, "%13lu %10lu %10lu %16s %8lu\n",
+		   total_objs, total_used_objs, total_pages, "",
+		   total_freeable);
=20
 	return 0;
 }
--=20
2.40.0.rc0.216.gc4246ad0f0-goog