From nobody Fri Jun 19 15:07:33 2026
From: Hao Jia
To: akpm@linux-foundation.org, tj@kernel.org, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@kernel.org, yosry@kernel.org,
 mkoutny@suse.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
 muchun.song@linux.dev, roman.gushchin@linux.dev
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, Hao Jia
Subject: [PATCH v4 1/5] mm/zswap: Extend shrink_memcg() writeback capability
Date: Thu, 18 Jun 2026 12:48:53 +0800
Message-Id: <20260618044857.69439-2-jiahao.kernel@gmail.com>
In-Reply-To: <20260618044857.69439-1-jiahao.kernel@gmail.com>
References: <20260618044857.69439-1-jiahao.kernel@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Hao Jia

Currently, shrink_memcg() writes back at most one entry per-node during
its traversal. This makes shrink_worker() inefficient, as it must
repeatedly re-enter shrink_memcg() to make any substantial progress.

To address this, extend shrink_memcg() and rewrite its LRU iteration
logic to support batch writeback. Introduce the nr_to_writeback
parameter to support a writeback budget based on compressed size. This
enables batch writeback in the shrink_worker() path, while maintaining
a low writeback budget in the zswap_store() path.
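For reference, the per-node scan bound that follows from this budget can
be sketched in user space. This is an illustrative sketch only, not the
kernel code: the sketch_* names and the 4 KiB page size are assumptions
made for the example, and only the scan factor of 16 and the batch size
of 64 mirror the constants this patch introduces.

```c
#include <assert.h>

/* Assumed for illustration; the kernel takes PAGE_SHIFT from the arch. */
#define SKETCH_PAGE_SHIFT	12UL
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_SCAN_FACTOR	16UL	/* mirrors ZSWAP_WB_SCAN_FACTOR */
#define SKETCH_BATCH		64UL	/* mirrors NR_ZSWAP_WB_BATCH */

static unsigned long sketch_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Remaining byte budget expressed in pages, rounded up. */
static unsigned long sketch_remaining_pages(unsigned long budget_bytes,
					    unsigned long written_bytes)
{
	/* The walk returns as soon as the budget is met, so this holds. */
	assert(written_bytes < budget_bytes);
	return (budget_bytes - written_bytes + SKETCH_PAGE_SIZE - 1) /
	       SKETCH_PAGE_SIZE;
}

/* Entries to scan on one node: capped by LRU length and by the budget. */
static unsigned long sketch_scan_cap(unsigned long lru_len,
				     unsigned long budget_bytes,
				     unsigned long written_bytes)
{
	unsigned long remain;

	remain = sketch_remaining_pages(budget_bytes, written_bytes);
	return sketch_min(lru_len, remain * SKETCH_SCAN_FACTOR);
}

/* Number of batched LRU walks needed to cover scan_cap entries. */
static unsigned long sketch_nr_walks(unsigned long scan_cap)
{
	unsigned long scanned = 0, walks = 0;

	while (scanned < scan_cap) {
		/* Charge the committed batch, not what the walker visited. */
		scanned += sketch_min(SKETCH_BATCH, scan_cap - scanned);
		walks++;
	}
	return walks;
}
```

Under these assumptions, a budget of 64 pages against a long LRU scans
at most 1024 entries in 16 batched walks, while the single-page budget
used on the zswap_store() path scans at most 16 entries.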

Additionally, to prepare for future proactive writeback, update the
return value semantics of shrink_memcg(): a positive value now
represents the actual number of compressed bytes written back, 0
indicates that candidates existed but no writeback succeeded, and a
negative value represents an error code.

Suggested-by: Yosry Ahmed
Signed-off-by: Hao Jia
---
 mm/zswap.c | 116 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 97 insertions(+), 19 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 761cd699e0a3..d7d031dee4cd 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -160,6 +160,11 @@ struct zswap_pool {
 	char tfm_name[CRYPTO_MAX_ALG_NAME];
 };
 
+struct zswap_shrink_walk_arg {
+	unsigned long bytes_written;
+	bool encountered_page_in_swapcache;
+};
+
 /* Global LRU lists shared by all zswap pools. */
 static struct list_lru zswap_list_lru;
 
@@ -1089,8 +1094,9 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 		void *arg)
 {
 	struct zswap_entry *entry = container_of(item, struct zswap_entry, lru);
-	bool *encountered_page_in_swapcache = (bool *)arg;
+	struct zswap_shrink_walk_arg *walk_arg = arg;
 	swp_entry_t swpentry;
+	unsigned int length;
 	enum lru_status ret = LRU_REMOVED_RETRY;
 	int writeback_result;
 
@@ -1135,8 +1141,13 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 	 * Once the lru lock is dropped, the entry might get freed. The
 	 * swpentry is copied to the stack, and entry isn't deref'd again
 	 * until the entry is verified to still be alive in the tree.
+	 *
+	 * entry->length is also copied while the lock is held, because
+	 * zswap_writeback_entry() frees the entry on success and we still
+	 * need its compressed size to account for writeback.
 	 */
 	swpentry = entry->swpentry;
+	length = entry->length;
 
 	/*
 	 * It's safe to drop the lock here because we return either
@@ -1155,12 +1166,13 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 		 * into the warmer region. We should terminate shrinking (if we're in the dynamic
 		 * shrinker context).
 		 */
-		if (writeback_result == -EEXIST && encountered_page_in_swapcache) {
+		if (writeback_result == -EEXIST) {
 			ret = LRU_STOP;
-			*encountered_page_in_swapcache = true;
+			walk_arg->encountered_page_in_swapcache = true;
 		}
 	} else {
 		zswap_written_back_pages++;
+		walk_arg->bytes_written += length;
 	}
 
 	return ret;
@@ -1169,8 +1181,11 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
 		struct shrink_control *sc)
 {
+	struct zswap_shrink_walk_arg walk_arg = {
+		.bytes_written = 0,
+		.encountered_page_in_swapcache = false,
+	};
 	unsigned long shrink_ret;
-	bool encountered_page_in_swapcache = false;
 
 	if (!zswap_shrinker_enabled ||
 			!mem_cgroup_zswap_writeback_enabled(sc->memcg)) {
@@ -1179,9 +1194,9 @@ static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
 	}
 
 	shrink_ret = list_lru_shrink_walk(&zswap_list_lru, sc, &shrink_memcg_cb,
-		&encountered_page_in_swapcache);
+		&walk_arg);
 
-	if (encountered_page_in_swapcache)
+	if (walk_arg.encountered_page_in_swapcache)
 		return SHRINK_STOP;
 
 	return shrink_ret ? shrink_ret : SHRINK_STOP;
@@ -1275,10 +1290,32 @@ static struct shrinker *zswap_alloc_shrinker(void)
 	return shrinker;
 }
 
-static int shrink_memcg(struct mem_cgroup *memcg)
-{
-	int nid, shrunk = 0, scanned = 0;
+/*
+ * The maximum acceptable scan cost factor for writing back
+ * PAGE_SIZE bytes of compressed data.
+ */
+#define ZSWAP_WB_SCAN_FACTOR 16UL
+#define NR_ZSWAP_WB_BATCH 64UL
 
+/*
+ * Iterate over the per-node zswap LRUs of @memcg in batches, writing back
+ * up to @nr_to_writeback * PAGE_SIZE bytes of compressed data.
+ *
+ * Return: The number of bytes written back, or -ENOENT if @memcg has
+ * writeback disabled, is a zombie cgroup, or has empty zswap LRUs.
+ */
+static long shrink_memcg(struct mem_cgroup *memcg,
+			 unsigned long nr_to_writeback)
+{
+	struct zswap_shrink_walk_arg walk_arg = {
+		.bytes_written = 0,
+		.encountered_page_in_swapcache = false,
+	};
+	u64 bytes_to_writeback = nr_to_writeback << PAGE_SHIFT;
+	bool memcg_list_is_empty = true;
+	int nid;
+
+	/* Memcgs with zswap writeback disabled are not candidates. */
 	if (!mem_cgroup_zswap_writeback_enabled(memcg))
 		return -ENOENT;
 
@@ -1290,24 +1327,65 @@ static int shrink_memcg(struct mem_cgroup *memcg)
 		return -ENOENT;
 
 	for_each_node_state(nid, N_NORMAL_MEMORY) {
-		unsigned long nr_to_walk = 1;
+		unsigned long nr_to_scan, nr_scanned = 0;
+		unsigned long remain;
+		walk_arg.encountered_page_in_swapcache = false;
+		/*
+		 * Cap by LRU length: bounds rewalks when referenced
+		 * entries keep rotating to the tail.
+		 */
+		nr_to_scan = list_lru_count_one(&zswap_list_lru, nid, memcg);
+		if (!nr_to_scan)
+			continue;
+		memcg_list_is_empty = false;
+
+		/*
+		 * Cap by SCAN_FACTOR * remain budget: bounds scan cost
+		 * to the remaining writeback budget.
+		 */
+		remain = DIV_ROUND_UP(bytes_to_writeback - walk_arg.bytes_written, PAGE_SIZE);
+		nr_to_scan = min(nr_to_scan,
+				 remain * ZSWAP_WB_SCAN_FACTOR);
 
-		shrunk += list_lru_walk_one(&zswap_list_lru, nid, memcg,
-					    &shrink_memcg_cb, NULL, &nr_to_walk);
-		scanned += 1 - nr_to_walk;
+		while (nr_scanned < nr_to_scan) {
+			unsigned long nr_to_walk = min(NR_ZSWAP_WB_BATCH,
+						       nr_to_scan - nr_scanned);
+
+			/*
+			 * Account for the committed budget rather than the walker's
+			 * actual delta. If the list is emptied concurrently, the
+			 * walker visits nothing and nr_scanned would never advance.
+			 */
+			nr_scanned += nr_to_walk;
+
+			list_lru_walk_one(&zswap_list_lru, nid, memcg,
+					  &shrink_memcg_cb,
+					  &walk_arg,
+					  &nr_to_walk);
+
+			if (walk_arg.bytes_written >= bytes_to_writeback)
+				return walk_arg.bytes_written;
+
+			if (walk_arg.encountered_page_in_swapcache)
+				break;
+
+			cond_resched();
+		}
 	}
 
-	if (!scanned)
+	/* Return -ENOENT if all zswap LRU lists are empty. */
+	if (memcg_list_is_empty)
 		return -ENOENT;
 
-	return shrunk ? 0 : -EAGAIN;
+	return walk_arg.bytes_written;
 }
 
 static void shrink_worker(struct work_struct *w)
 {
 	struct mem_cgroup *memcg;
-	int ret, failures = 0, attempts = 0;
+	int failures = 0, attempts = 0;
 	unsigned long thr;
+	long ret;
 
 	/* Reclaim down to the accept threshold */
 	thr = zswap_accept_thr_pages();
@@ -1368,7 +1446,7 @@ static void shrink_worker(struct work_struct *w)
 			goto resched;
 		}
 
-		ret = shrink_memcg(memcg);
+		ret = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
 		/* drop the extra reference */
 		mem_cgroup_put(memcg);
 
@@ -1382,7 +1460,7 @@ static void shrink_worker(struct work_struct *w)
 			continue;
 		++attempts;
 
-		if (ret && ++failures == MAX_RECLAIM_RETRIES)
+		if (ret <= 0 && ++failures == MAX_RECLAIM_RETRIES)
 			break;
 resched:
 		cond_resched();
@@ -1492,7 +1570,7 @@ bool zswap_store(struct folio *folio)
 	objcg = get_obj_cgroup_from_folio(folio);
 	if (objcg && !obj_cgroup_may_zswap(objcg)) {
 		memcg = get_mem_cgroup_from_objcg(objcg);
-		if (shrink_memcg(memcg)) {
+		if (shrink_memcg(memcg, 1) <= 0) {
 			mem_cgroup_put(memcg);
 			goto put_objcg;
 		}
-- 
2.34.1

From nobody Fri Jun 19 15:07:33 2026
From: Hao Jia
To: akpm@linux-foundation.org, tj@kernel.org, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@kernel.org, yosry@kernel.org,
 mkoutny@suse.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
 muchun.song@linux.dev, roman.gushchin@linux.dev
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, Hao Jia
Subject: [PATCH v4 2/5] mm/zswap: Factor writeback loop out of shrink_worker()
Date: Thu, 18 Jun 2026 12:48:54 +0800
Message-Id: <20260618044857.69439-3-jiahao.kernel@gmail.com>
In-Reply-To: <20260618044857.69439-1-jiahao.kernel@gmail.com>
References: <20260618044857.69439-1-jiahao.kernel@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Hao Jia

In preparation for sharing the writeback loop with proactive writeback,
move the memcg iteration into zswap_iter_global() and the loop into
zswap_try_to_writeback(lower, upper). shrink_worker() is reduced to
computing the accept threshold and invoking the helper.

Suggested-by: Yosry Ahmed
Signed-off-by: Hao Jia
---
 mm/zswap.c | 136 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 81 insertions(+), 55 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index d7d031dee4cd..e29f8a61412d 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1380,61 +1380,75 @@ static long shrink_memcg(struct mem_cgroup *memcg,
 	return walk_arg.bytes_written;
 }
 
-static void shrink_worker(struct work_struct *w)
+/*
+ * Global iteration uses a global cursor to select from all online
+ * memcgs in a round-robin fashion.
+ *
+ * We save iteration cursor memcg into zswap_next_shrink,
+ * which can be modified by the offline memcg cleaner
+ * zswap_memcg_offline_cleanup().
+ *
+ * Since the offline cleaner is called only once, we cannot leave an
+ * offline memcg reference in zswap_next_shrink.
+ * We can rely on the cleaner only if we get online memcg under lock.
+ *
+ * If we get an offline memcg, we cannot determine if the cleaner has
+ * already been called or will be called later. We must put back the
+ * reference before returning from this function. Otherwise, the
+ * offline memcg left in zswap_next_shrink will hold the reference
+ * until the next run of shrink_worker().
+ */
+static struct mem_cgroup *zswap_iter_global(void)
 {
 	struct mem_cgroup *memcg;
-	int failures = 0, attempts = 0;
-	unsigned long thr;
-	long ret;
-
-	/* Reclaim down to the accept threshold */
-	thr = zswap_accept_thr_pages();
 
 	/*
-	 * Global reclaim will select cgroup in a round-robin fashion from all
-	 * online memcgs, but memcgs that have no pages in zswap and
-	 * writeback-disabled memcgs (memory.zswap.writeback=0) are not
-	 * candidates for shrinking.
+	 * Start from the next memcg after zswap_next_shrink.
+	 * When the offline cleaner has already advanced the cursor,
+	 * advancing the cursor here overlooks one memcg, but this
+	 * should be negligibly rare.
 	 *
-	 * Shrinking will be aborted if we encounter the following
-	 * MAX_RECLAIM_RETRIES times:
-	 * - No writeback-candidate memcgs found in a memcg tree walk.
-	 * - Shrinking a writeback-candidate memcg failed.
-	 *
-	 * We save iteration cursor memcg into zswap_next_shrink,
-	 * which can be modified by the offline memcg cleaner
-	 * zswap_memcg_offline_cleanup().
-	 *
-	 * Since the offline cleaner is called only once, we cannot leave an
-	 * offline memcg reference in zswap_next_shrink.
-	 * We can rely on the cleaner only if we get online memcg under lock.
-	 *
-	 * If we get an offline memcg, we cannot determine if the cleaner has
-	 * already been called or will be called later. We must put back the
-	 * reference before returning from this function. Otherwise, the
-	 * offline memcg left in zswap_next_shrink will hold the reference
-	 * until the next run of shrink_worker().
+	 * If we get an online memcg, keep the extra reference in case
+	 * the original one obtained by mem_cgroup_iter() is dropped by
+	 * zswap_memcg_offline_cleanup() while we are shrinking the
+	 * memcg.
 	 */
+	spin_lock(&zswap_shrink_lock);
 	do {
-		/*
-		 * Start shrinking from the next memcg after zswap_next_shrink.
-		 * When the offline cleaner has already advanced the cursor,
-		 * advancing the cursor here overlooks one memcg, but this
-		 * should be negligibly rare.
-		 *
-		 * If we get an online memcg, keep the extra reference in case
-		 * the original one obtained by mem_cgroup_iter() is dropped by
-		 * zswap_memcg_offline_cleanup() while we are shrinking the
-		 * memcg.
-		 */
-		spin_lock(&zswap_shrink_lock);
-		do {
-			memcg = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
-			zswap_next_shrink = memcg;
-		} while (memcg && !mem_cgroup_tryget_online(memcg));
-		spin_unlock(&zswap_shrink_lock);
+		memcg = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
+		zswap_next_shrink = memcg;
+	} while (memcg && !mem_cgroup_tryget_online(memcg));
+	spin_unlock(&zswap_shrink_lock);
+
+	return memcg;
+}
+
+/*
+ * Walk the memcg tree and write back zswap pages until the
+ * (lower_pages, upper_pages) window closes, or abort after hitting
+ * any of the following conditions MAX_RECLAIM_RETRIES times:
+ * - No writeback-candidate memcgs found in a memcg tree walk.
+ * - Shrinking a writeback-candidate memcg failed.
+ *
+ * shrink_worker() passes lower=thr and upper=zswap_total_pages().
+ * The @upper limit is refreshed in each iteration by re-evaluating
+ * zswap_total_pages(), and the window closes once the total falls
+ * below the threshold.
+ */
+static void zswap_try_to_writeback(unsigned long lower_pages,
+				   unsigned long upper_pages)
+{
+	int failures = 0, attempts = 0;
+	struct mem_cgroup *iter_memcg;
+
+	while (lower_pages < upper_pages) {
+		unsigned long batch_size;
+		long shrunk;
 
-		if (!memcg) {
+		cond_resched();
+
+		iter_memcg = zswap_iter_global();
+		if (!iter_memcg) {
 			/*
 			 * Continue shrinking without incrementing failures if
 			 * we found candidate memcgs in the last tree walk.
@@ -1443,12 +1457,16 @@ static void shrink_worker(struct work_struct *w)
 				break;
 
 			attempts = 0;
-			goto resched;
+			continue;
 		}
 
-		ret = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
+		batch_size = min(upper_pages - lower_pages, NR_ZSWAP_WB_BATCH);
+		shrunk = shrink_memcg(iter_memcg, batch_size);
 		/* drop the extra reference */
-		mem_cgroup_put(memcg);
+		mem_cgroup_put(iter_memcg);
+
+		/* zswap total pages might have changed, refresh it. */
+		upper_pages = zswap_total_pages();
 
 		/*
 		 * There are no writeback-candidate pages in the memcg.
@@ -1456,15 +1474,23 @@ static void shrink_worker(struct work_struct *w)
 		 * with pages in zswap. Skip this without incrementing attempts
 		 * and failures.
 		 */
-		if (ret == -ENOENT)
+		if (shrunk == -ENOENT)
 			continue;
 		++attempts;
 
-		if (ret <= 0 && ++failures == MAX_RECLAIM_RETRIES)
+		if (shrunk <= 0 && ++failures == MAX_RECLAIM_RETRIES)
 			break;
-resched:
-		cond_resched();
-	} while (zswap_total_pages() > thr);
+	}
+}
+
+static void shrink_worker(struct work_struct *w)
+{
+	unsigned long thr;
+
+	/* Reclaim down to the accept threshold */
+	thr = zswap_accept_thr_pages();
+
+	zswap_try_to_writeback(thr, zswap_total_pages());
 }
 
 /*********************************
-- 
2.34.1

From nobody Fri Jun 19 15:07:33 2026
From: Hao Jia
To: akpm@linux-foundation.org, tj@kernel.org, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@kernel.org, yosry@kernel.org,
 mkoutny@suse.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
 muchun.song@linux.dev, roman.gushchin@linux.dev
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, Hao Jia
Subject: [PATCH v4 3/5] mm/zswap: Implement proactive writeback
Date: Thu, 18 Jun 2026 12:48:55 +0800
Message-Id: <20260618044857.69439-4-jiahao.kernel@gmail.com>
In-Reply-To: <20260618044857.69439-1-jiahao.kernel@gmail.com>
References: <20260618044857.69439-1-jiahao.kernel@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Hao Jia

Zswap currently writes back pages to backing swap reactively, triggered
either by the shrinker or when the pool reaches its size limit. There is
no mechanism to control the amount of writeback for a specific memory
cgroup. However, users may want to proactively write back zswap pages,
e.g., to free up memory for other applications or to prepare for
memory-intensive workloads.

Introduce a "zswap_writeback_only" key to the memory.reclaim cgroup
interface. When specified, this key bypasses standard memory reclaim
and exclusively performs proactive zswap writeback up to the requested
budget. If omitted, the default reclaim behavior remains unchanged.

Example usage:
  # Write back 10MB of compressed data from zswap to the backing swap
  echo "10M zswap_writeback_only" > memory.reclaim

Note that the actual amount of compressed data written back may be less
than requested due to the zswap second-chance algorithm: referenced
entries are rotated on the LRU on the first encounter and only written
back on a second pass. If fewer bytes are written back than requested,
-EAGAIN is returned, matching the existing memory.reclaim semantics.

Internally, extend user_proactive_reclaim() to parse the new
"zswap_writeback_only" token and invoke the dedicated handler
zswap_proactive_writeback(). This handler reuses
zswap_try_to_writeback() to walk the target memcg subtree, draining
per-node zswap LRUs through list_lru_walk_one() with the
shrink_memcg_cb() callback.

Suggested-by: Yosry Ahmed
Suggested-by: Nhat Pham
Signed-off-by: Hao Jia
---
 Documentation/admin-guide/cgroup-v2.rst | 18 ++++-
 Documentation/admin-guide/mm/zswap.rst  | 11 +++-
 include/linux/zswap.h                   |  7 ++
 mm/vmscan.c                             | 14 ++++
 mm/zswap.c                              | 87 +++++++++++++++++++++----
 5 files changed, 120 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 6efd0095ed99..e52d97e8e9c6 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1425,9 +1425,10 @@ PAGE_SIZE multiple when read back.
 
 	The following nested keys are defined.
 
-	  ==========            ================================
+	  ====================  ==================================================
 	  swappiness            Swappiness value to reclaim with
-	  ==========            ================================
+	  zswap_writeback_only  Only perform proactive zswap writeback
+	  ====================  ==================================================
 
 	Specifying a swappiness value instructs the kernel to perform
 	the reclaim with that swappiness value. Note that this has the
@@ -1437,6 +1438,19 @@ The following nested keys are defined.
 	The valid range for swappiness is [0-200, max], setting
 	swappiness=max exclusively reclaims anonymous memory.
 
+	The zswap_writeback_only key skips ordinary memory reclaim and
+	writes back pages from zswap to the backing swap device until
+	the requested amount has been written or no further candidates
+	are found. This is useful to proactively offload cold compressed
+	data from the zswap pool to the swap device. It is only available
+	if zswap writeback is enabled. zswap_writeback_only cannot be
+	combined with swappiness; specifying both returns -EINVAL.
+
+	Example::
+
+	  # Write back up to 10MB of compressed data from zswap to the backing swap
+	  echo "10M zswap_writeback_only" > memory.reclaim
+
   memory.peak
 	A read-write single value file which exists on non-root cgroups.
 
diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index 2464425c783d..fdeb197d1683 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -131,7 +131,16 @@ User can enable it as follows::
   echo Y > /sys/module/zswap/parameters/shrinker_enabled
 
 This can be enabled at the boot time if ``CONFIG_ZSWAP_SHRINKER_DEFAULT_ON`` is
-selected.
+selected. Once enabled, the shrinker automatically writes back zswap pages to
+backing swap during memory reclaim.
+
+If users want to explicitly trigger proactive zswap writeback for a specific
+memory cgroup without invoking standard page reclaim, it can be done as follows::
+
+  echo "10M zswap_writeback_only" > /sys/fs/cgroup//memory.reclaim
+
+Both of the methods mentioned above are subject to the ``memory.zswap.writeback``
+control. This means that ``memory.zswap.writeback`` can prevent all zswap writeback.
 
 A debugfs interface is provided for various statistic about pool size, number
 of pages stored, same-value filled pages and various counters for the reasons
diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 30c193a1207e..7bf38318dab1 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -35,6 +35,7 @@ void zswap_lruvec_state_init(struct lruvec *lruvec);
 void zswap_folio_swapin(struct folio *folio);
 bool zswap_is_enabled(void);
 bool zswap_never_enabled(void);
+int zswap_proactive_writeback(struct mem_cgroup *memcg, unsigned long nr_to_writeback);
 #else
 
 struct zswap_lruvec_state {};
@@ -69,6 +70,12 @@ static inline bool zswap_never_enabled(void)
 	return true;
 }
 
+static inline int zswap_proactive_writeback(struct mem_cgroup *memcg,
+					    unsigned long nr_to_writeback)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* _LINUX_ZSWAP_H */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 299b5d9e8836..2e6c14569fc2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -64,6 +64,7 @@
 
 #include
#include +#include =20 #include "internal.h" #include "swap.h" @@ -7855,11 +7856,13 @@ static unsigned long __node_reclaim(struct pglist_d= ata *pgdat, gfp_t gfp_mask, enum { MEMORY_RECLAIM_SWAPPINESS =3D 0, MEMORY_RECLAIM_SWAPPINESS_MAX, + MEMORY_RECLAIM_ZSWAP_WRITEBACK_ONLY, MEMORY_RECLAIM_NULL, }; static const match_table_t tokens =3D { { MEMORY_RECLAIM_SWAPPINESS, "swappiness=3D%d"}, { MEMORY_RECLAIM_SWAPPINESS_MAX, "swappiness=3Dmax"}, + { MEMORY_RECLAIM_ZSWAP_WRITEBACK_ONLY, "zswap_writeback_only"}, { MEMORY_RECLAIM_NULL, NULL }, }; =20 @@ -7869,6 +7872,7 @@ int user_proactive_reclaim(char *buf, unsigned int nr_retries =3D MAX_RECLAIM_RETRIES; unsigned long nr_to_reclaim, nr_reclaimed =3D 0; int swappiness =3D -1; + bool zswap_writeback_only =3D false; char *old_buf, *start; substring_t args[MAX_OPT_ARGS]; gfp_t gfp_mask =3D GFP_KERNEL; @@ -7899,11 +7903,21 @@ int user_proactive_reclaim(char *buf, case MEMORY_RECLAIM_SWAPPINESS_MAX: swappiness =3D SWAPPINESS_ANON_ONLY; break; + case MEMORY_RECLAIM_ZSWAP_WRITEBACK_ONLY: + zswap_writeback_only =3D true; + break; default: return -EINVAL; } } =20 + if (zswap_writeback_only) { + /* zswap_writeback_only and swappiness are mutually exclusive. */ + if (swappiness !=3D -1) + return -EINVAL; + return zswap_proactive_writeback(memcg, nr_to_reclaim); + } + while (nr_reclaimed < nr_to_reclaim) { /* Will converge on zero, but reclaim enforces a minimum */ unsigned long batch_size =3D (nr_to_reclaim - nr_reclaimed) / 4; diff --git a/mm/zswap.c b/mm/zswap.c index e29f8a61412d..28200552dde3 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1423,6 +1423,27 @@ static struct mem_cgroup *zswap_iter_global(void) return memcg; } =20 +/* + * Local iteration uses a local cursor to select from online memcgs + * under @root in a round-robin fashion. + * + * Pass the previous return value as @prev to advance the round-robin + * iteration, or pass NULL to start a new walk. 
If exiting early before + * the iteration completes, the caller must call mem_cgroup_iter_break() + * to release the cursor reference. + */ +static struct mem_cgroup *zswap_iter_local(struct mem_cgroup *root, + struct mem_cgroup *prev) +{ + struct mem_cgroup *memcg; + + do { + memcg =3D mem_cgroup_iter(root, prev, NULL); + prev =3D memcg; + } while (memcg && !mem_cgroup_tryget_online(memcg)); + return memcg; +} + /* * Walk the memcg tree and write back zswap pages until the * (lower_pages, upper_pages) window closes, or abort encounter @@ -1430,16 +1451,23 @@ static struct mem_cgroup *zswap_iter_global(void) * - No writeback-candidate memcgs found in a memcg tree walk. * - Shrinking a writeback-candidate memcg failed. * - * For shrink_worker(), it passes lower=3Dthr and upper=3Dzswap_total_page= s(). - * The @upper limit is refreshed in each iteration by re-evaluating - * zswap_total_pages(), and the window closes once the total falls - * below the threshold. + * For shrink_worker() (proactive=3Dfalse), it passes lower=3Dthr and + * upper=3Dzswap_total_pages(). The @upper limit is refreshed in each + * iteration by re-evaluating zswap_total_pages(), and the window + * closes once the total falls below the threshold. + * + * For zswap_proactive_writeback() (proactive=3Dtrue), it passes lower=3D0 + * and upper=3Dnr_to_writeback. The @lower limit is advanced by the + * compressed bytes written back via shrink_memcg(). The window closes + * once @nr_to_writeback pages of compressed data have been written back. 
*/ -static void zswap_try_to_writeback(unsigned long lower_pages, - unsigned long upper_pages) +static int zswap_try_to_writeback(struct mem_cgroup *memcg, + unsigned long lower_pages, + unsigned long upper_pages, bool proactive) { - int failures =3D 0, attempts =3D 0; - struct mem_cgroup *iter_memcg; + int ret =3D 0, failures =3D 0, attempts =3D 0; + struct mem_cgroup *iter_memcg =3D NULL; + u64 bytes_written =3D 0; =20 while (lower_pages < upper_pages) { unsigned long batch_size; @@ -1447,14 +1475,17 @@ static void zswap_try_to_writeback(unsigned long lo= wer_pages, =20 cond_resched(); =20 - iter_memcg =3D zswap_iter_global(); + iter_memcg =3D proactive ? zswap_iter_local(memcg, iter_memcg) + : zswap_iter_global(); if (!iter_memcg) { /* * Continue shrinking without incrementing failures if * we found candidate memcgs in the last tree walk. */ - if (!attempts && ++failures =3D=3D MAX_RECLAIM_RETRIES) + if (!attempts && ++failures =3D=3D MAX_RECLAIM_RETRIES) { + ret =3D -EAGAIN; break; + } =20 attempts =3D 0; continue; @@ -1465,8 +1496,17 @@ static void zswap_try_to_writeback(unsigned long low= er_pages, /* drop the extra reference */ mem_cgroup_put(iter_memcg); =20 - /* zswap total pages might have changed, refresh it. */ - upper_pages =3D zswap_total_pages(); + /* + * Advance the window endpoint owned by this caller: + * - !proactive: zswap total pages might have changed, refresh. + * - proactive: accumulate bytes freed and fold to pages. + */ + if (!proactive) { + upper_pages =3D zswap_total_pages(); + } else if (shrunk > 0) { + bytes_written +=3D shrunk; + lower_pages =3D DIV_ROUND_UP(bytes_written, PAGE_SIZE); + } =20 /* * There are no writeback-candidate pages in the memcg. 
@@ -1478,9 +1518,15 @@ static void zswap_try_to_writeback(unsigned long low= er_pages, continue; ++attempts; =20 - if (shrunk <=3D 0 && ++failures =3D=3D MAX_RECLAIM_RETRIES) + if (shrunk <=3D 0 && ++failures =3D=3D MAX_RECLAIM_RETRIES) { + ret =3D -EAGAIN; break; + } } + + if (proactive) + mem_cgroup_iter_break(memcg, iter_memcg); + return ret; } =20 static void shrink_worker(struct work_struct *w) @@ -1490,7 +1536,7 @@ static void shrink_worker(struct work_struct *w) /* Reclaim down to the accept threshold */ thr =3D zswap_accept_thr_pages(); =20 - zswap_try_to_writeback(thr, zswap_total_pages()); + zswap_try_to_writeback(NULL, thr, zswap_total_pages(), false); } =20 /********************************* @@ -1736,6 +1782,19 @@ int zswap_load(struct folio *folio) return 0; } =20 +int zswap_proactive_writeback(struct mem_cgroup *memcg, + unsigned long nr_to_writeback) +{ + if (!memcg) + return -EINVAL; + if (!mem_cgroup_zswap_writeback_enabled(memcg)) + return -EINVAL; + if (!nr_to_writeback) + return 0; + + return zswap_try_to_writeback(memcg, 0, nr_to_writeback, true); +} + void zswap_invalidate(swp_entry_t swp) { pgoff_t offset =3D swp_offset(swp); --=20 2.34.1 From nobody Fri Jun 19 15:07:33 2026 Received: from mail-pf1-f174.google.com (mail-pf1-f174.google.com [209.85.210.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5DC092BCF45 for ; Thu, 18 Jun 2026 04:49:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781758197; cv=none; b=DvU61GZbJrcEQYXOUf80+5ZAUMTpE8o30vGrs4olY+jv+2ivmzaUYdwiwhiPlJKn9MPbm3tVIYHcxwMUyWCrEmNmv+7adq+6z9kuf/IzaJVjsD1hmx36wFl7LrZ1r0sGrOKZY/gUo9xVAppY4tPZ7TvQKRFgYP81TsBRVP5G3aY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781758197; c=relaxed/simple; 
bh=YF1VqYtDfh/qM73g2Rpr89IjFyrI+p5pxTH6vyO1oZ4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=tvO1zPYJ2QeVMKb3kaebxe2UbcdLca0YOyzgfHemA9Zw+Ut8jIpkD26G0BJk6Lo3H/bWiJlOb5snNJyFvXrWo6Ad5F2mm+9KhpKUH5FicFiaA8EziuxySlY+EiG724EMWnRKMx6zGWzHbEntUrr27pBLcg6IP5L6b59nwxBhkmM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=bEbdq6ZO; arc=none smtp.client-ip=209.85.210.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="bEbdq6ZO" Received: by mail-pf1-f174.google.com with SMTP id d2e1a72fcca58-84237c55ef9so258835b3a.0 for ; Wed, 17 Jun 2026 21:49:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1781758196; x=1782362996; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=lyhAjkY+9Qf6fDn3TYWufq6sYQTYoNNz/JFLMgng6yM=; b=bEbdq6ZOZUK7toDPQKD7Qhj4H4P7XWGD095GsfH3VTHRK9QP1AyBsAVZmR+tkbYAru Ci9SlhxxzGrhZjMcyhYeeoT/3QQKQ/g4URT6cKbhDUL6zb14StQGNlW9ipvvGV85o3LN pk6f4kBPSiIy49/UhTCm2i+xfkadCUJTNBeR3hSE3+iRU7HYLmCHyLU9P9yD1MSn+rxH +NEY/DBhLNMYNYbT0jWJZpGRpLwpD5hIZeh1mPSS760ZYNbjEkmNP/ruKa5uE/cg21ZM 4pqvp3GveH0C3XRTTove5tz2rRqOksXkz2m6o+tFAyqnwrfAhp+uz3kB4pavVYX0TENE DJYQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781758196; x=1782362996; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; 
bh=lyhAjkY+9Qf6fDn3TYWufq6sYQTYoNNz/JFLMgng6yM=; b=dIYF5rHffGs44v2zFNJgDka+V7/L0dVoFBYeYCQ+iAdW7igvtGnGYphICgODakOND9 7DydPAg4mA4ELj1WhbOzsbt0k4JG5ln+9gg6N71bJRD+sSYIMs7W2dXlnBS/cZn4yrCP 35OkqFN2kIPJAqYwM1fgE0QM+epsABzd3KKBzuTmfSRuwOcj5sHIk8e6a8BTUFEqXj/g 83OySDqWLm+w/Oj6oK6VaQ5YubGl6/OC6OutmljEokY4Qaah5BFj81bcLitVBWoRt5HT oXigToOY9ebCgVIqLwO1HOwhb469u3j+eWh7cyJdC5MIb9WwBQZzIH9mQKZrGi7GOHbo sq/w== X-Forwarded-Encrypted: i=1; AFNElJ8wxZoHyClT3aQiig2JHTxaiHfpDDdchaki0VRk3pYyOE3+dngxRo18SUldITBS2dtjbTRYUFnGTMt8Xws=@vger.kernel.org X-Gm-Message-State: AOJu0Yy+vV/i/OcMQVfFnUg7FPUC59qEPqUqQvZB0a9tm4b7/x0MV/RU OeUVWeHbKeUGYGJzwHNphhOeW3z6RelNRcY0nPm6ip3IqPelMMCFQG9h X-Gm-Gg: Acq92OGiSjNhN2M9p8Djqu3oxZ9cb7aHXSdUvR+x7rhpQ5oewIFwn1N3KImLC4P5uFe Wg3fFzuqty77P/ouRGZid/eJd0VyApcUEqAXZSf2fBVj8jH++uOjuFaHd7leL+XAjIU/dDIjfCF v7uaeQN92m0PkjXtf2XydZz2igl0/5RCVp7IuRq5aY84CQaBh6o9/pWNoooPJR9soBBICTelZgS LokXz1jMHTIO02E7noieiksdiEEqmpB3rkhD7A5+jhW9/hEiLoTZGqjI6x4hzkrSjrmpZU2HMF7 mFV3JpAj1sGZmjs1nE/2FVMazZAznVkVezV5nzNp+sc8jMANAl+/mBcMYdSoDr1x6UurYOEtz2H xXiEHTMwk2KiQEre0UHDZuYcN2uTZmZrg6bOJtQI7sF8XXV7ijpAiCy6FMCLUVkWjvaT/egaY6N Q24BriuO7NXxsy6GlSfmrUY38v8xqLs8o9D/LqA4pL X-Received: by 2002:a05:6a00:2192:b0:845:3ac5:1b8a with SMTP id d2e1a72fcca58-8453ac51e2cmr2039626b3a.0.1781758195707; Wed, 17 Jun 2026 21:49:55 -0700 (PDT) Received: from localhost.localdomain ([210.184.73.204]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-8434b020b53sm17214781b3a.47.2026.06.17.21.49.48 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Wed, 17 Jun 2026 21:49:55 -0700 (PDT) From: Hao Jia To: akpm@linux-foundation.org, tj@kernel.org, hannes@cmpxchg.org, shakeel.butt@linux.dev, mhocko@kernel.org, yosry@kernel.org, mkoutny@suse.com, nphamcs@gmail.com, chengming.zhou@linux.dev, muchun.song@linux.dev, roman.gushchin@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Hao Jia Subject: [PATCH v4 4/5] mm/zswap: Add per-memcg stat for proactive writeback 
Date: Thu, 18 Jun 2026 12:48:56 +0800 Message-Id: <20260618044857.69439-5-jiahao.kernel@gmail.com> X-Mailer: git-send-email 2.39.2 (Apple Git-143) In-Reply-To: <20260618044857.69439-1-jiahao.kernel@gmail.com> References: <20260618044857.69439-1-jiahao.kernel@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Hao Jia Add a new stat zswpwb_proactive_b to memory.stat. This counter is incremented by entry->length during proactive writebacks triggered via the zswap_writeback_only key in memory.reclaim. It tracks the compressed size (in bytes) of pages proactively written back from zswap to swap, allowing users to better monitor and tune the proactive writeback mechanism. Signed-off-by: Hao Jia --- Documentation/admin-guide/cgroup-v2.rst | 4 ++++ include/linux/memcontrol.h | 1 + mm/memcontrol.c | 3 +++ mm/zswap.c | 23 ++++++++++++++++++----- 4 files changed, 26 insertions(+), 5 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-= guide/cgroup-v2.rst index e52d97e8e9c6..c164bb415002 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1748,6 +1748,10 @@ The following nested keys are defined. zswpwb Number of pages written from zswap to swap. =20 + zswpwb_proactive_b + Bytes of compressed data proactively written back from + zswap to swap via memory.reclaim zswap_writeback_only key. + zswap_incomp Number of incompressible pages currently stored in zswap without compression. 
These pages could not be compressed to diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index e1f46a0016fc..56580b264dc4 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -40,6 +40,7 @@ enum memcg_stat_item { MEMCG_ZSWAP_B, MEMCG_ZSWAPPED, MEMCG_ZSWAP_INCOMP, + MEMCG_ZSWPWB_PROACTIVE_B, MEMCG_NR_STAT, }; =20 diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 56cd4af08232..5ffb5095f0ee 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -433,6 +433,7 @@ static const unsigned int memcg_stat_items[] =3D { MEMCG_ZSWAP_B, MEMCG_ZSWAPPED, MEMCG_ZSWAP_INCOMP, + MEMCG_ZSWPWB_PROACTIVE_B, }; =20 #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items) @@ -1558,6 +1559,7 @@ static const struct memory_stat memory_stats[] =3D { { "zswap", MEMCG_ZSWAP_B }, { "zswapped", MEMCG_ZSWAPPED }, { "zswap_incomp", MEMCG_ZSWAP_INCOMP }, + { "zswpwb_proactive_b", MEMCG_ZSWPWB_PROACTIVE_B }, #endif { "file_mapped", NR_FILE_MAPPED }, { "file_dirty", NR_FILE_DIRTY }, @@ -1614,6 +1616,7 @@ static int memcg_page_state_unit(int item) switch (item) { case MEMCG_PERCPU_B: case MEMCG_ZSWAP_B: + case MEMCG_ZSWPWB_PROACTIVE_B: case NR_SLAB_RECLAIMABLE_B: case NR_SLAB_UNRECLAIMABLE_B: return 1; diff --git a/mm/zswap.c b/mm/zswap.c index 28200552dde3..d78bacf80209 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -163,6 +163,7 @@ struct zswap_pool { struct zswap_shrink_walk_arg { unsigned long bytes_written; bool encountered_page_in_swapcache; + bool proactive; }; =20 /* Global LRU lists shared by all zswap pools. */ @@ -990,7 +991,8 @@ static bool zswap_decompress(struct zswap_entry *entry,= struct folio *folio) * freed. 
*/ static int zswap_writeback_entry(struct zswap_entry *entry, - swp_entry_t swpentry) + swp_entry_t swpentry, + bool proactive) { struct xarray *tree; pgoff_t offset =3D swp_offset(swpentry); @@ -1045,6 +1047,15 @@ static int zswap_writeback_entry(struct zswap_entry = *entry, if (entry->objcg) count_objcg_events(entry->objcg, ZSWPWB, 1); =20 + if (proactive && entry->objcg) { + struct mem_cgroup *memcg; + + rcu_read_lock(); + memcg =3D obj_cgroup_memcg(entry->objcg); + mod_memcg_state(memcg, MEMCG_ZSWPWB_PROACTIVE_B, entry->length); + rcu_read_unlock(); + } + zswap_entry_free(entry); =20 /* folio is up to date */ @@ -1155,7 +1166,7 @@ static enum lru_status shrink_memcg_cb(struct list_he= ad *item, struct list_lru_o */ spin_unlock(&l->lock); =20 - writeback_result =3D zswap_writeback_entry(entry, swpentry); + writeback_result =3D zswap_writeback_entry(entry, swpentry, walk_arg->pro= active); =20 if (writeback_result) { zswap_reject_reclaim_fail++; @@ -1184,6 +1195,7 @@ static unsigned long zswap_shrinker_scan(struct shrin= ker *shrinker, struct zswap_shrink_walk_arg walk_arg =3D { .bytes_written =3D 0, .encountered_page_in_swapcache =3D false, + .proactive =3D false, }; unsigned long shrink_ret; =20 @@ -1305,11 +1317,12 @@ static struct shrinker *zswap_alloc_shrinker(void) * writeback disabled, is a zombie cgroup, or has empty zswap LRUs. 
*/ static long shrink_memcg(struct mem_cgroup *memcg, - unsigned long nr_to_writeback) + unsigned long nr_to_writeback, bool proactive) { struct zswap_shrink_walk_arg walk_arg =3D { .bytes_written =3D 0, .encountered_page_in_swapcache =3D false, + .proactive =3D proactive, }; u64 bytes_to_writeback =3D nr_to_writeback << PAGE_SHIFT; bool memcg_list_is_empty =3D true; @@ -1492,7 +1505,7 @@ static int zswap_try_to_writeback(struct mem_cgroup *= memcg, } =20 batch_size =3D min(upper_pages - lower_pages, NR_ZSWAP_WB_BATCH); - shrunk =3D shrink_memcg(iter_memcg, batch_size); + shrunk =3D shrink_memcg(iter_memcg, batch_size, proactive); /* drop the extra reference */ mem_cgroup_put(iter_memcg); =20 @@ -1642,7 +1655,7 @@ bool zswap_store(struct folio *folio) objcg =3D get_obj_cgroup_from_folio(folio); if (objcg && !obj_cgroup_may_zswap(objcg)) { memcg =3D get_mem_cgroup_from_objcg(objcg); - if (shrink_memcg(memcg, 1) <=3D 0) { + if (shrink_memcg(memcg, 1, false) <=3D 0) { mem_cgroup_put(memcg); goto put_objcg; } --=20 2.34.1 From nobody Fri Jun 19 15:07:33 2026 Received: from mail-pf1-f174.google.com (mail-pf1-f174.google.com [209.85.210.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3A67617A2F6 for ; Thu, 18 Jun 2026 04:50:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781758206; cv=none; b=ClH/xgtuYiH4qsozTZXmZZgBy6z7zZEir2PWaiLWFQeCRQp/5KcWU4qVaZA49waiiX7wQ7IXSOvmvVVBFfmsEZrFGLzD8mNqyr2DF36iWC3wBuczEt9fS2RN01nUUFZoVvviMBJb+ztLmTRxVY/y/TyfUenP9S7VohChkRKCJUI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781758206; c=relaxed/simple; bh=aIcPhA6gmhErGVoUAefc+7js1FHagPXIMuNDUIB2Pxc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=XHzIdOyh8np8f60YTKRvhfQ3knOh+NJJF56qOlecmT063gFWYpuvndu7mEZY4gTMzjvC3Ciyu6GiOIuW1DNSTonn2w7p97ZcNVaSpnthyqJLZIKd12yet/KspCzOWq9ufcGpkj111BsiRgCYnQzk+Fc+2rETtgxdKnfaLvrVzRo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=JHFlYWbb; arc=none smtp.client-ip=209.85.210.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="JHFlYWbb" Received: by mail-pf1-f174.google.com with SMTP id d2e1a72fcca58-84536e2857eso269370b3a.0 for ; Wed, 17 Jun 2026 21:50:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1781758204; x=1782363004; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=AC9B5YDsIulRerYhn7rz1eBAPxCjqQ7ZRwn/nMekv0E=; b=JHFlYWbbb9+TP27E1auB+z0zoZsU/awklKZN2ILxK5vWgwtSrww8iqMqsl7XdNs78J Wlbdl+DKZHyBf3ABp/6Nepe6hDbP6Ew6Rpc2k43J2FtISbkFRueDwvJqYgV50CSx7z+c kG6PLZlTpui//PaKd1luIeaGZ9DlNk5DrWvBEXGUqjluAWR7bcz4BkVmzjMu/6HONozH /wzf1BOGC6BeKJ86A85H84v7w98ewr08y6B2IwuHw4BXc7RaTm8SZFLytrtyEI66A81+ fpqC0yYMruFahTC+aal68CLmRnyk9tK1ErSqzib2soGUs1882PMN778w3bkRH7OEHvf+ tKpg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781758204; x=1782363004; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=AC9B5YDsIulRerYhn7rz1eBAPxCjqQ7ZRwn/nMekv0E=; b=tICbT4WoluydW3ttCsl2Z2quMGdVwfOUv3RQM++ZxDvyi0FeQCp/77YOLamOJWxJOb 
uv9Ghrfe/YW2O5b1T+1frX7WRuReMI27ChmBwqxuEaM/LZBWfhYnbZUNjvcKP4sAO8Tk SSliyiVQxqxSOlMd1tP2OcVjfw7HT0MrIFBF1NxlJF+YQqgfWZQVgWLvbs15zlIRqk8D 1bnbrL5xWVCThIDWVYRg6gtOrrDmzctoLtRtmxahKxJpx+ddVG5FOs6XjQYwdPomtAwg I4kx1uBU59v8KwRWu+H7YMQMXsd4vYKJBkVslo2X8lUXfD5MYqRT5eZYl0AaPp6Zjvk6 MVhA== X-Forwarded-Encrypted: i=1; AFNElJ+xcActjlDvctg+AFRlkss7jAd6/DduOw7tKt584tuAWtHOGvGjrqLZW4j9ugw+z9mAmx+1wndYJa9A4bo=@vger.kernel.org X-Gm-Message-State: AOJu0Yx5JVd+rzym4Wij/eGkO1wOh53WO3GhaIZCTd/8TMW9CTedFnyJ Z+bvSIUOwaOn9ocM8mA0jvf+sHCbtymu9DjcdDPA/+ZZbhfUSvvE+Pyt X-Gm-Gg: AfdE7ckzPAK/YhgUCsmdzY/JLN8y2JhdldMar/t+0N4kRLWW1NWsVmqa25oQpeJjI5e vmRUQX8T7hTwI5qy/qqyJdvK/yRtThQ4NdCx6hR0wYRY/ypdxpaWSqSNYa0IXxY15n+ygjYxpNP 1akyBB1iY/bI9y9ki8b7k5HVh82Nn9NBNkV/2x3kQ5p7po82ht6QjviZJpWzWaIJESQhVhC5aMv kem42VgeQ3NPc4/v9Bu1JZkW6+6Sop0c5IYqLd/NhQw/SyNkMDqOorLTm1YjP91Cg91ZCI33Z9R tdOgg+6/KA7Gle9WIAr+bPFo6Z5ISzDtMfR0bcYCvu7oagA9YfBVFV4prrTclheoXshm2msPP5y ee6jJNqS/zEfBFvD8gxIEmYGKiHizquz/70WjbMjQPmXB+ctk/6xPTjdwOXu7q8xXdW914y3fPD aD706Nvfx/sYUGwdOa6pU4DPZwNtD0IIlB6GumwDjbKvfjvKbE6ho= X-Received: by 2002:a05:6a00:66c5:b0:845:31a6:d84d with SMTP id d2e1a72fcca58-84531a6df5bmr4041592b3a.7.1781758204555; Wed, 17 Jun 2026 21:50:04 -0700 (PDT) Received: from localhost.localdomain ([210.184.73.204]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-8434b020b53sm17214781b3a.47.2026.06.17.21.49.57 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Wed, 17 Jun 2026 21:50:04 -0700 (PDT) From: Hao Jia To: akpm@linux-foundation.org, tj@kernel.org, hannes@cmpxchg.org, shakeel.butt@linux.dev, mhocko@kernel.org, yosry@kernel.org, mkoutny@suse.com, nphamcs@gmail.com, chengming.zhou@linux.dev, muchun.song@linux.dev, roman.gushchin@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Hao Jia Subject: [PATCH v4 5/5] selftests/cgroup: Add tests for zswap proactive writeback Date: Thu, 18 Jun 2026 12:48:57 +0800 Message-Id: <20260618044857.69439-6-jiahao.kernel@gmail.com> 
X-Mailer: git-send-email 2.39.2 (Apple Git-143) In-Reply-To: <20260618044857.69439-1-jiahao.kernel@gmail.com> References: <20260618044857.69439-1-jiahao.kernel@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Hao Jia Add test_zswap_proactive_writeback() to cover the new memory.reclaim "zswap_writeback_only" key. The test populates a memory cgroup zswap pool, triggers proactive writeback, and verifies the behavior by observing the change in zswpwb_proactive_b. Invalid input combinations are also covered. Extend test_zswap_writeback_one() to assert that the existing non-proactive writeback path leaves zswpwb_proactive_b at zero. Signed-off-by: Hao Jia --- tools/testing/selftests/cgroup/test_zswap.c | 153 +++++++++++++++++++- 1 file changed, 152 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/se= lftests/cgroup/test_zswap.c index 49b36ee79160..9c153fdd3a08 100644 --- a/tools/testing/selftests/cgroup/test_zswap.c +++ b/tools/testing/selftests/cgroup/test_zswap.c @@ -60,7 +60,12 @@ static int get_zswap_stored_pages(size_t *value) =20 static long get_cg_wb_count(const char *cg) { - return cg_read_key_long(cg, "memory.stat", "zswpwb"); + return cg_read_key_long(cg, "memory.stat", "zswpwb "); +} + +static long get_cg_pwb_bytes(const char *cg) +{ + return cg_read_key_long(cg, "memory.stat", "zswpwb_proactive_b "); } =20 static long get_zswpout(const char *cgroup) @@ -355,6 +360,7 @@ static int attempt_writeback(const char *cgroup, void *= arg) static int test_zswap_writeback_one(const char *cgroup, bool wb) { long zswpwb_before, zswpwb_after; + long pwb_bytes; =20 zswpwb_before =3D get_cg_wb_count(cgroup); if (zswpwb_before !=3D 0) { @@ -362,6 +368,12 @@ static int test_zswap_writeback_one(const char *cgroup= , bool wb) return -1; } =20 + 
pwb_bytes =3D get_cg_pwb_bytes(cgroup); + if (pwb_bytes !=3D 0) { + ksft_print_msg("zswpwb_proactive_b_before =3D %ld instead of 0\n", pwb_b= ytes); + return -1; + } + if (cg_run(cgroup, attempt_writeback, (void *) &wb)) return -1; =20 @@ -379,6 +391,17 @@ static int test_zswap_writeback_one(const char *cgroup= , bool wb) return -1; } =20 + /* + * attempt_writeback() does not use the proactive writeback path, so + * zswpwb_proactive_b must stay at zero regardless of whether + * writeback was enabled. + */ + pwb_bytes =3D get_cg_pwb_bytes(cgroup); + if (pwb_bytes !=3D 0) { + ksft_print_msg("zswpwb_proactive_b_after is %ld, expected 0\n", pwb_byte= s); + return -1; + } + return 0; } =20 @@ -770,6 +793,133 @@ static int test_zswap_incompressible(const char *root) return ret; } =20 +/* + * Trigger proactive zswap writeback with the following steps: + * 1. Allocate memory. + * 2. Push allocated memory into zswap. + * 3. Proactively write back zswap pages to swap + * using "zswap_writeback_only". + */ +static int proactive_writeback_workload(const char *cgroup, void *arg) +{ + long pagesize =3D sysconf(_SC_PAGESIZE); + size_t memsize =3D pagesize * 1024; + char reclaim_cmd[64]; + char buf[pagesize]; + long zswap_usage; + int ret =3D -1; + int rc; + char *mem; + + mem =3D (char *)malloc(memsize); + if (!mem) + return ret; + + for (int i =3D 0; i < pagesize; i++) + buf[i] =3D i < pagesize / 2 ? (char)i : 0; + for (int i =3D 0; i < memsize; i +=3D pagesize) + memcpy(&mem[i], buf, pagesize); + + /* Evict allocated memory into zswap. */ + if (cg_write_numeric(cgroup, "memory.reclaim", memsize)) { + ksft_print_msg("Failed to push pages into zswap\n"); + goto out; + } + + zswap_usage =3D cg_read_long(cgroup, "memory.zswap.current"); + if (zswap_usage <=3D 0) { + ksft_print_msg("no zswap pool to write back\n"); + goto out; + } + + /* Trigger proactive zswap writeback. 
*/ + snprintf(reclaim_cmd, sizeof(reclaim_cmd), "%ld zswap_writeback_only", zs= wap_usage); + rc =3D cg_write(cgroup, "memory.reclaim", reclaim_cmd); + if (rc && rc !=3D -EAGAIN) { + ksft_print_msg("proactive zswap writeback failed: %d\n", rc); + goto out; + } + + ret =3D 0; +out: + free(mem); + return ret; +} + +static int check_writeback_invalid_inputs(const char *cgroup) +{ + static char * const bad_inputs[] =3D { + "zswap_writeback_only", + "1M zswap_writeback_only swappiness=3D60", + "1M swappiness=3D60 zswap_writeback_only", + "1M zswap_writeback_only swappiness=3Dmax", + "1M swappiness=3Dmax zswap_writeback_only", + }; + int i, rc; + + for (i =3D 0; i < ARRAY_SIZE(bad_inputs); i++) { + rc =3D cg_write(cgroup, "memory.reclaim", bad_inputs[i]); + if (rc !=3D -EINVAL) { + ksft_print_msg("memory.reclaim '%s': returned %d, expected %d\n", + bad_inputs[i], rc, -EINVAL); + return -1; + } + } + return 0; +} + +static int test_zswap_proactive_writeback(const char *root) +{ + long wb_before, wb_after; + long pwb_b_before, pwb_b_after; + long wb_delta, pwb_b_delta; + int ret =3D KSFT_FAIL; + char *test_group; + + if (cg_read_strcmp(root, "memory.zswap.writeback", "1")) + return KSFT_SKIP; + + test_group =3D cg_name(root, "zswap_proactive_test"); + if (!test_group) + return KSFT_FAIL; + if (cg_create(test_group)) + goto out; + if (check_writeback_invalid_inputs(test_group)) + goto out; + + pwb_b_before =3D get_cg_pwb_bytes(test_group); + wb_before =3D get_cg_wb_count(test_group); + if (pwb_b_before < 0 || wb_before < 0) + goto out; + + if (cg_run(test_group, proactive_writeback_workload, NULL)) + goto out; + + pwb_b_after =3D get_cg_pwb_bytes(test_group); + wb_after =3D get_cg_wb_count(test_group); + if (pwb_b_after < 0 || wb_after < 0) + goto out; + + pwb_b_delta =3D pwb_b_after - pwb_b_before; + wb_delta =3D wb_after - wb_before; + + if (pwb_b_delta <=3D 0) { + ksft_print_msg("zswpwb_proactive_b did not increase: delta=3D%ld\n", + pwb_b_delta); + goto out; + } + if
(wb_delta <=3D 0) { + ksft_print_msg("zswpwb did not increase: delta=3D%ld\n", wb_delta); + goto out; + } + + ret =3D KSFT_PASS; +out: + cg_destroy(test_group); + free(test_group); + return ret; +} + #define T(x) { x, #x } struct zswap_test { int (*fn)(const char *root); @@ -783,6 +933,7 @@ struct zswap_test { T(test_no_kmem_bypass), T(test_no_invasive_cgroup_shrink), T(test_zswap_incompressible), + T(test_zswap_proactive_writeback), }; #undef T =20 --=20 2.34.1
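
Note on the get_cg_wb_count() change in the selftest patch: the lookup key gains a trailing space ("zswpwb " instead of "zswpwb"), presumably so that it can no longer be confused with the new zswpwb_proactive_b line in memory.stat. The stand-alone sketch below illustrates that pitfall under stated assumptions: stat_lookup() is a hypothetical helper written for this note, not the selftests' cg_read_key_long(), and it only assumes the "key value" per-line layout of memory.stat.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the kernel selftests): find
 * "key value" in memory.stat-style text. The key must start a line
 * and be followed by a space, so "zswpwb" cannot match the longer
 * "zswpwb_proactive_b" line.
 *
 * Returns 0 and stores the value in *out on success, -1 if the key
 * is not present.
 */
static int stat_lookup(const char *buf, const char *key, long *out)
{
	size_t klen = strlen(key);
	const char *line = buf;

	while (line && *line) {
		/* Exact key match: same prefix AND a space right after it. */
		if (!strncmp(line, key, klen) && line[klen] == ' ') {
			*out = strtol(line + klen + 1, NULL, 10);
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

With the made-up text "zswap 4096\nzswpwb_proactive_b 8192\nzswpwb 3\n", looking up "zswpwb" yields 3 and looking up "zswpwb_proactive_b" yields 8192: the shorter key is rejected on the longer line because the character after the prefix is '_' rather than a space.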