From nobody Thu Feb 12 19:00:58 2026
From: Takero Funaki
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
	Jonathan Corbet, Andrew Morton, Domenico Cerasuolo
Cc: Takero Funaki, linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration
Date: Sat, 8 Jun 2024 15:53:08 +0000
Message-ID: <20240608155316.451600-2-flintglass@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240608155316.451600-1-flintglass@gmail.com>
References: <20240608155316.451600-1-flintglass@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch fixes an issue where the zswap global shrinker stopped
iterating through the memcg tree.

The problem was that `shrink_worker()` would stop iterating when a memcg
was being offlined and restart from the tree root. Now, it properly
handles the offlining memcg and continues shrinking with the next memcg.

This patch also modifies the lock handling of the offline memcg cleaner
to adapt to the change in the iteration, and to avoid a negligibly rare
skip of a memcg during the shrink iteration.

Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
Signed-off-by: Takero Funaki
---
 mm/zswap.c | 87 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 68 insertions(+), 19 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 80c634acb8d5..d720a42069b6 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -827,12 +827,27 @@ void zswap_folio_swapin(struct folio *folio)
 	}
 }
 
+/*
+ * This function should be called when a memcg is being offlined.
+ *
+ * Since the global shrinker shrink_worker() may hold a reference
+ * of the memcg, we must check and release the reference in
+ * zswap_next_shrink.
+ *
+ * shrink_worker() must handle the case where this function releases
+ * the reference of memcg being shrunk.
+ */
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
 {
 	/* lock out zswap shrinker walking memcg tree */
 	spin_lock(&zswap_shrink_lock);
-	if (zswap_next_shrink == memcg)
-		zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
+
+	if (READ_ONCE(zswap_next_shrink) == memcg) {
+		/* put back reference and advance the cursor */
+		memcg = mem_cgroup_iter(NULL, memcg, NULL);
+		WRITE_ONCE(zswap_next_shrink, memcg);
+	}
+
 	spin_unlock(&zswap_shrink_lock);
 }
 
@@ -1401,25 +1416,44 @@ static int shrink_memcg(struct mem_cgroup *memcg)
 
 static void shrink_worker(struct work_struct *w)
 {
-	struct mem_cgroup *memcg;
+	struct mem_cgroup *memcg = NULL;
+	struct mem_cgroup *next_memcg;
 	int ret, failures = 0;
 	unsigned long thr;
 
 	/* Reclaim down to the accept threshold */
 	thr = zswap_accept_thr_pages();
 
-	/* global reclaim will select cgroup in a round-robin fashion. */
+	/* global reclaim will select cgroup in a round-robin fashion.
+	 *
+	 * We save iteration cursor memcg into zswap_next_shrink,
+	 * which can be modified by the offline memcg cleaner
+	 * zswap_memcg_offline_cleanup().
+	 *
+	 * Since the offline cleaner is called only once, we cannot abandon
+	 * offline memcg reference in zswap_next_shrink.
+	 * We can rely on the cleaner only if we get online memcg under lock.
+	 * If we get offline memcg, we cannot determine the cleaner will be
+	 * called later. We must put it before returning from this function.
+	 */
 	do {
+iternext:
 		spin_lock(&zswap_shrink_lock);
-		zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
-		memcg = zswap_next_shrink;
+		next_memcg = READ_ONCE(zswap_next_shrink);
+
+		if (memcg != next_memcg) {
+			/*
+			 * Ours was released by offlining.
+			 * Use the saved memcg reference.
+			 */
+			memcg = next_memcg;
+		} else {
+			/* advance cursor */
+			memcg = mem_cgroup_iter(NULL, memcg, NULL);
+			WRITE_ONCE(zswap_next_shrink, memcg);
+		}
 
 		/*
-		 * We need to retry if we have gone through a full round trip, or if we
-		 * got an offline memcg (or else we risk undoing the effect of the
-		 * zswap memcg offlining cleanup callback). This is not catastrophic
-		 * per se, but it will keep the now offlined memcg hostage for a while.
-		 *
 		 * Note that if we got an online memcg, we will keep the extra
 		 * reference in case the original reference obtained by mem_cgroup_iter
 		 * is dropped by the zswap memcg offlining callback, ensuring that the
@@ -1434,16 +1468,25 @@ static void shrink_worker(struct work_struct *w)
 		}
 
 		if (!mem_cgroup_tryget_online(memcg)) {
-			/* drop the reference from mem_cgroup_iter() */
-			mem_cgroup_iter_break(NULL, memcg);
-			zswap_next_shrink = NULL;
+			/*
+			 * It is an offline memcg which we cannot shrink
+			 * until its pages are reparented.
+			 *
+			 * Since we cannot determine if the offline cleaner has
+			 * been already called or not, the offline memcg must be
+			 * put back unconditionally. We cannot abort the loop while
+			 * zswap_next_shrink has a reference of this offline memcg.
+			 */
 			spin_unlock(&zswap_shrink_lock);
-
-			if (++failures == MAX_RECLAIM_RETRIES)
-				break;
-
-			goto resched;
+			goto iternext;
 		}
+		/*
+		 * We got an extra memcg reference before unlocking.
+		 * The cleaner cannot free it using zswap_next_shrink.
+		 *
+		 * Our memcg can be offlined after we get online memcg here.
+		 * In this case, the cleaner is waiting for the lock just behind us.
+		 */
 		spin_unlock(&zswap_shrink_lock);
 
 		ret = shrink_memcg(memcg);
@@ -1457,6 +1500,12 @@ static void shrink_worker(struct work_struct *w)
 resched:
 		cond_resched();
 	} while (zswap_total_pages() > thr);
+
+	/*
+	 * We can still hold the original memcg reference.
+	 * The reference is stored in zswap_next_shrink, and then reused
+	 * by the next shrink_worker().
+	 */
 }
 
 /*********************************
-- 
2.43.0

From nobody Thu Feb 12 19:00:58 2026
From: Takero Funaki
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
	Jonathan Corbet, Andrew Morton, Domenico Cerasuolo
Cc: Takero Funaki, linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v1 2/3] mm: zswap: fix global shrinker error handling logic
Date: Sat, 8 Jun 2024 15:53:09 +0000
Message-ID: <20240608155316.451600-3-flintglass@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240608155316.451600-1-flintglass@gmail.com>
References: <20240608155316.451600-1-flintglass@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch fixes the zswap global shrinker, which did not shrink the
zpool as expected.

The issue it addresses is that `shrink_worker()` did not distinguish
between unexpected errors and expected error codes that should be
skipped, such as when there is no stored page in a memcg. This led to
the shrinking process being aborted on the expected error codes.

The shrinker should ignore these cases and skip to the next memcg.
However, skipping every memcg presents another problem: the worker could
keep walking the tree without reclaiming anything. To address this, this
patch tracks progress while walking the memcg tree and checks for
progress once the tree walk is completed.

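As a rough illustration of the progress tracking described above, the
following userspace C sketch models the loop. It is not kernel code:
`model_shrink_worker()`, `MODEL_MAX_RETRIES`, and the array of return
codes are hypothetical stand-ins for shrink_worker(),
MAX_RECLAIM_RETRIES, and the memcg tree walk.

```c
#include <errno.h>

/* Hypothetical stand-in for MAX_RECLAIM_RETRIES; the value is arbitrary. */
#define MODEL_MAX_RETRIES 3

/*
 * Userspace model of the loop (a sketch under stated assumptions, not
 * kernel code). results[i] is the code a shrink_memcg()-like helper would
 * return for the i-th memcg; cursor == n stands for the end of one tree
 * walk. Returns how many shrink attempts succeeded before the model
 * stopped, either after `budget` successes or after too many failures.
 */
int model_shrink_worker(const int *results, int n, int budget)
{
	int failures = 0, progress = 1, done = 0, cursor = 0;

	while (done < budget) {
		int ret;

		if (cursor == n) {
			/* tree walk completed: a failure only if nothing progressed */
			cursor = 0;
			if (!progress && ++failures == MODEL_MAX_RETRIES)
				break;
			progress = 0;
			continue;
		}

		ret = results[cursor++];

		/* expected codes: not a writeback candidate, skip to the next */
		if (ret == -EINVAL || ret == -ENOENT)
			continue;

		/* any other non-zero code counts as a failure */
		if (ret && ++failures == MODEL_MAX_RETRIES)
			break;

		/* as in the patch, any candidate memcg counts as progress */
		++progress;
		if (!ret)
			++done;
	}
	return done;
}
```

In this model, a tree that mixes empty memcgs with shrinkable ones keeps
going until the budget is met, while a tree made only of empty memcgs
stops after MODEL_MAX_RETRIES fruitless walks instead of looping forever.
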
To handle the empty memcg case, the helper function `shrink_memcg()` is
modified to check if the memcg is empty and then return -ENOENT.

Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
Signed-off-by: Takero Funaki
---
 mm/zswap.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index d720a42069b6..1a90f434f247 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1393,7 +1393,7 @@ static struct shrinker *zswap_alloc_shrinker(void)
 
 static int shrink_memcg(struct mem_cgroup *memcg)
 {
-	int nid, shrunk = 0;
+	int nid, shrunk = 0, stored = 0;
 
 	if (!mem_cgroup_zswap_writeback_enabled(memcg))
 		return -EINVAL;
@@ -1408,9 +1408,16 @@ static int shrink_memcg(struct mem_cgroup *memcg)
 	for_each_node_state(nid, N_NORMAL_MEMORY) {
 		unsigned long nr_to_walk = 1;
 
+		if (!list_lru_count_one(&zswap_list_lru, nid, memcg))
+			continue;
+		++stored;
 		shrunk += list_lru_walk_one(&zswap_list_lru, nid, memcg,
 					    &shrink_memcg_cb, NULL, &nr_to_walk);
 	}
+
+	if (!stored)
+		return -ENOENT;
+
 	return shrunk ? 0 : -EAGAIN;
 }
 
@@ -1418,12 +1425,18 @@ static void shrink_worker(struct work_struct *w)
 {
 	struct mem_cgroup *memcg = NULL;
 	struct mem_cgroup *next_memcg;
-	int ret, failures = 0;
+	int ret, failures = 0, progress;
 	unsigned long thr;
 
 	/* Reclaim down to the accept threshold */
 	thr = zswap_accept_thr_pages();
 
+	/*
+	 * We might start from the last memcg.
+	 * That is not a failure.
+	 */
+	progress = 1;
+
 	/* global reclaim will select cgroup in a round-robin fashion.
 	 *
 	 * We save iteration cursor memcg into zswap_next_shrink,
@@ -1461,9 +1474,12 @@ static void shrink_worker(struct work_struct *w)
 		 */
 		if (!memcg) {
 			spin_unlock(&zswap_shrink_lock);
-			if (++failures == MAX_RECLAIM_RETRIES)
+
+			/* tree walk completed but no progress */
+			if (!progress && ++failures == MAX_RECLAIM_RETRIES)
 				break;
 
+			progress = 0;
 			goto resched;
 		}
 
@@ -1493,10 +1509,15 @@ static void shrink_worker(struct work_struct *w)
 		/* drop the extra reference */
 		mem_cgroup_put(memcg);
 
-		if (ret == -EINVAL)
-			break;
+		/* not a writeback candidate memcg */
+		if (ret == -EINVAL || ret == -ENOENT)
+			continue;
+
 		if (ret && ++failures == MAX_RECLAIM_RETRIES)
 			break;
+
+		++progress;
+		/* reschedule as we performed some IO */
 resched:
 		cond_resched();
 	} while (zswap_total_pages() > thr);
-- 
2.43.0

From nobody Thu Feb 12 19:00:58 2026
From: Takero Funaki
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
	Jonathan Corbet, Andrew Morton, Domenico Cerasuolo
Cc: Takero Funaki, linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v1 3/3] mm: zswap: proactive shrinking before pool size limit is hit
Date: Sat, 8 Jun 2024 15:53:10 +0000
Message-ID: <20240608155316.451600-4-flintglass@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240608155316.451600-1-flintglass@gmail.com>
References: <20240608155316.451600-1-flintglass@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch implements proactive shrinking of the zswap pool before the
max pool size limit is reached. It also changes zswap to accept new
pages while the shrinker is running.

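A minimal userspace sketch of the resulting store-time decision,
assuming the start threshold sits one percentage point above the accept
threshold (capped at 100%), as the mm/zswap.c change in this patch does.
The `model_*` names are hypothetical and this is not kernel code.

```c
/* All names below are hypothetical; they only mirror the idea of the patch. */
enum model_action {
	MODEL_ACCEPT,            /* store the page, shrinker not needed */
	MODEL_ACCEPT_AND_SHRINK, /* store the page and queue the shrinker */
	MODEL_REJECT_AND_SHRINK, /* pool is full: reject, queue the shrinker */
};

/* Pool size (in pages) at which proactive shrinking would start. */
unsigned long model_shrink_start_pages(unsigned long max_pages,
				       unsigned int accept_thr_percent)
{
	unsigned int start_percent = accept_thr_percent + 1;

	/* never start later than the max pool size itself */
	if (start_percent > 100)
		start_percent = 100;
	return max_pages * start_percent / 100;
}

/* Decide what a store attempt would do at the current pool size. */
enum model_action model_store_decision(unsigned long cur_pages,
				       unsigned long max_pages,
				       unsigned int accept_thr_percent)
{
	if (cur_pages >= max_pages)
		return MODEL_REJECT_AND_SHRINK;
	if (cur_pages >= model_shrink_start_pages(max_pages, accept_thr_percent))
		return MODEL_ACCEPT_AND_SHRINK;
	return MODEL_ACCEPT;
}
```

With max_pages = 1000 and accept_thr_percent = 90, the sketch queues the
shrinker from 910 pages and rejects stores only at 1000 pages.
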
To prevent zswap from rejecting new pages and incurring latency when
zswap is full, this patch queues the global shrinker once pool usage
crosses a threshold between accept_thr_percent and 100%, instead of
waiting for the max pool size. The pool size will be kept between 90%
and 91% for the default accept_thr_percent=90. Since the current global
shrinker continues to shrink until accept_thr_percent, we do not need to
maintain the hysteresis variable tracking the pool limit overage in
zswap_store().

Before this patch, zswap rejected pages while the shrinker was running
without incrementing the zswap_pool_limit_hit counter. This could be one
reason why zswap wrote new pages through to swap before writing back old
pages. With this patch, zswap accepts new pages while shrinking, and
increments the counter if and only if it rejects pages because of the
max pool size.

Now, reclaims smaller than the proactive shrinking amount finish
instantly and trigger background shrinking. Admins can check whether new
pages are buffered by zswap by monitoring the pool_limit_hit counter.

The name of the sysfs tunable accept_threshold_percent is unchanged, as
it is still the stop condition of the shrinker. The respective
documentation is updated to describe the new behavior.

Signed-off-by: Takero Funaki
---
 Documentation/admin-guide/mm/zswap.rst | 17 ++++----
 mm/zswap.c                             | 54 ++++++++++++++++----------
 2 files changed, 42 insertions(+), 29 deletions(-)

diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index 3598dcd7dbe7..a1d8f167a27a 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -111,18 +111,17 @@ checked if it is a same-value filled page before compressing it. If true, the
 compressed length of the page is set to zero and the pattern or same-filled
 value is stored.
 
-To prevent zswap from shrinking pool when zswap is full and there's a high
-pressure on swap (this will result in flipping pages in and out zswap pool
-without any real benefit but with a performance drop for the system), a
-special parameter has been introduced to implement a sort of hysteresis to
-refuse taking pages into zswap pool until it has sufficient space if the limit
-has been hit. To set the threshold at which zswap would start accepting pages
-again after it became full, use the sysfs ``accept_threshold_percent``
-attribute, e. g.::
+To prevent zswap from rejecting new pages and incurring latency when zswap is
+full, zswap initiates a worker called global shrinker that proactively evicts
+some pages from the pool to swap devices while the pool is reaching the limit.
+The global shrinker continues to evict pages until there is sufficient space to
+accept new pages. To control how many pages should remain in the pool, use the
+sysfs ``accept_threshold_percent`` attribute as a percentage of the max pool
+size, e. g.::
 
 	echo 80 > /sys/module/zswap/parameters/accept_threshold_percent
 
-Setting this parameter to 100 will disable the hysteresis.
+Setting this parameter to 100 will disable the proactive shrinking.
 
 Some users cannot tolerate the swapping that comes with zswap store failures
 and zswap writebacks. Swapping can be disabled entirely (without disabling
diff --git a/mm/zswap.c b/mm/zswap.c
index 1a90f434f247..e957bfdeaf70 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -71,8 +71,6 @@ static u64 zswap_reject_kmemcache_fail;
 
 /* Shrinker work queue */
 static struct workqueue_struct *shrink_wq;
-/* Pool limit was hit, we need to calm down */
-static bool zswap_pool_reached_full;
 
 /*********************************
 * tunables
@@ -118,7 +116,10 @@ module_param_cb(zpool, &zswap_zpool_param_ops, &zswap_zpool_type, 0644);
 static unsigned int zswap_max_pool_percent = 20;
 module_param_named(max_pool_percent, zswap_max_pool_percent, uint, 0644);
 
-/* The threshold for accepting new pages after the max_pool_percent was hit */
+/*
+ * The percentage of pool size that the global shrinker keeps in memory.
+ * It does not protect old pages from the dynamic shrinker.
+ */
 static unsigned int zswap_accept_thr_percent = 90; /* of max pool size */
 module_param_named(accept_threshold_percent, zswap_accept_thr_percent,
 		   uint, 0644);
@@ -539,6 +540,20 @@ static unsigned long zswap_accept_thr_pages(void)
 	return zswap_max_pages() * zswap_accept_thr_percent / 100;
 }
 
+/*
+ * Returns threshold to start proactive global shrinking.
+ */
+static inline unsigned long zswap_shrink_start_pages(void)
+{
+	/*
+	 * Shrinker will evict pages to the accept threshold.
+	 * We add 1% to not schedule shrinker too frequently
+	 * for small swapout.
+	 */
+	return zswap_max_pages() *
+		min(100, zswap_accept_thr_percent + 1) / 100;
+}
+
 unsigned long zswap_total_pages(void)
 {
 	struct zswap_pool *pool;
@@ -556,21 +571,6 @@ unsigned long zswap_total_pages(void)
 	return total;
 }
 
-static bool zswap_check_limits(void)
-{
-	unsigned long cur_pages = zswap_total_pages();
-	unsigned long max_pages = zswap_max_pages();
-
-	if (cur_pages >= max_pages) {
-		zswap_pool_limit_hit++;
-		zswap_pool_reached_full = true;
-	} else if (zswap_pool_reached_full &&
-		   cur_pages <= zswap_accept_thr_pages()) {
-		zswap_pool_reached_full = false;
-	}
-	return zswap_pool_reached_full;
-}
-
 /*********************************
 * param callbacks
 **********************************/
@@ -1577,6 +1577,8 @@ bool zswap_store(struct folio *folio)
 	struct obj_cgroup *objcg = NULL;
 	struct mem_cgroup *memcg = NULL;
 	unsigned long value;
+	unsigned long cur_pages;
+	bool need_global_shrink = false;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
@@ -1599,8 +1601,17 @@ bool zswap_store(struct folio *folio)
 		mem_cgroup_put(memcg);
 	}
 
-	if (zswap_check_limits())
+	cur_pages = zswap_total_pages();
+
+	if (cur_pages >= zswap_max_pages()) {
+		zswap_pool_limit_hit++;
+		need_global_shrink = true;
 		goto reject;
+	}
+
+	/* schedule shrink for incoming pages */
+	if (cur_pages >= zswap_shrink_start_pages())
+		queue_work(shrink_wq, &zswap_shrink_work);
 
 	/* allocate entry */
 	entry = zswap_entry_cache_alloc(GFP_KERNEL, folio_nid(folio));
@@ -1643,6 +1654,9 @@ bool zswap_store(struct folio *folio)
 
 		WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
 		zswap_reject_alloc_fail++;
+
+		/* reduce entry in array */
+		need_global_shrink = true;
 		goto store_failed;
 	}
 
@@ -1692,7 +1706,7 @@ bool zswap_store(struct folio *folio)
 	zswap_entry_cache_free(entry);
 reject:
 	obj_cgroup_put(objcg);
-	if (zswap_pool_reached_full)
+	if (need_global_shrink)
 		queue_work(shrink_wq, &zswap_shrink_work);
 check_old:
 	/*
-- 
2.43.0