From: Nhat Pham
To: kasong@tencent.com
Cc: Liam.Howlett@oracle.com, akpm@linux-foundation.org, apopple@nvidia.com,
 axelrasmussen@google.com, baohua@kernel.org, baolin.wang@linux.alibaba.com,
 bhe@redhat.com, byungchul@sk.com, cgroups@vger.kernel.org,
 chengming.zhou@linux.dev, chrisl@kernel.org, corbet@lwn.net,
 david@kernel.org, dev.jain@arm.com, gourry@gourry.net, hannes@cmpxchg.org,
 hughd@google.com, jannh@google.com, joshua.hahnjy@gmail.com,
 lance.yang@linux.dev, lenb@kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-pm@vger.kernel.org,
 lorenzo.stoakes@oracle.com, matthew.brost@intel.com, mhocko@suse.com,
 muchun.song@linux.dev, npache@redhat.com, nphamcs@gmail.com,
 pavel@kernel.org, peterx@redhat.com, peterz@infradead.org, pfalcato@suse.de,
 rafael@kernel.org, rakie.kim@sk.com, roman.gushchin@linux.dev,
 rppt@kernel.org, ryan.roberts@arm.com, shakeel.butt@linux.dev,
 shikemeng@huaweicloud.com, surenb@google.com, tglx@kernel.org,
 vbabka@suse.cz, weixugc@google.com, ying.huang@linux.alibaba.com,
 yosry.ahmed@linux.dev, yuanchu@google.com, zhengqi.arch@bytedance.com,
 ziy@nvidia.com, kernel-team@meta.com, riel@surriel.com
Subject: [PATCH v5 11/21] zswap: move zswap entry management to the virtual swap descriptor
Date: Fri, 20 Mar 2026 12:27:25 -0700
Message-ID: <20260320192735.748051-12-nphamcs@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260320192735.748051-1-nphamcs@gmail.com>
References: <20260320192735.748051-1-nphamcs@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Remove the zswap tree and manage zswap entries directly through the
virtual swap descriptor. This re-partitions the zswap pool (by virtual
swap cluster), which eliminates zswap tree lock contention.

Signed-off-by: Nhat Pham
---
 include/linux/zswap.h |   6 +++
 mm/vswap.c            | 100 ++++++++++++++++++++++++++++++++++++++++++
 mm/zswap.c            |  40 -----------------
 3 files changed, 106 insertions(+), 40 deletions(-)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 1a04caf283dc8..7eb3ce7e124fc 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -6,6 +6,7 @@
 #include
 
 struct lruvec;
+struct zswap_entry;
 
 extern atomic_long_t zswap_stored_pages;
 
@@ -33,6 +34,11 @@ void zswap_lruvec_state_init(struct lruvec *lruvec);
 void zswap_folio_swapin(struct folio *folio);
 bool zswap_is_enabled(void);
 bool zswap_never_enabled(void);
+void *zswap_entry_store(swp_entry_t swpentry, struct zswap_entry *entry);
+void *zswap_entry_load(swp_entry_t swpentry);
+void *zswap_entry_erase(swp_entry_t swpentry);
+bool zswap_empty(swp_entry_t swpentry);
+
 #else
 
 struct zswap_lruvec_state {};
diff --git a/mm/vswap.c b/mm/vswap.c
index 3027294cd872b..9b2122647b850 100644
--- a/mm/vswap.c
+++ b/mm/vswap.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include "swap.h"
 #include "swap_table.h"
 
@@ -37,11 +38,13 @@
  * Swap descriptor - metadata of a swapped out page.
  *
  * @slot: The handle to the physical swap slot backing this page.
+ * @zswap_entry: The zswap entry associated with this swap slot.
  * @swap_cache: The folio in swap cache.
  * @shadow: The shadow entry.
  */
 struct swp_desc {
 	swp_slot_t slot;
+	struct zswap_entry *zswap_entry;
 	union {
 		struct folio *swap_cache;
 		void *shadow;
@@ -238,6 +241,7 @@ static void __vswap_alloc_from_cluster(struct vswap_cluster *cluster, int start)
 	for (i = 0; i < nr; i++) {
 		desc = &cluster->descriptors[start + i];
 		desc->slot.val = 0;
+		desc->zswap_entry = NULL;
 	}
 	cluster->count += nr;
 }
@@ -1008,6 +1012,102 @@ void __swap_cache_replace_folio(struct folio *old, struct folio *new)
 	rcu_read_unlock();
 }
 
+#ifdef CONFIG_ZSWAP
+/**
+ * zswap_entry_store - store a zswap entry for a swap entry
+ * @swpentry: the swap entry
+ * @entry: the zswap entry to store
+ *
+ * Stores a zswap entry in the swap descriptor for the given swap entry.
+ * The cluster is locked during the store operation.
+ *
+ * Return: the old zswap entry if one existed, NULL otherwise
+ */
+void *zswap_entry_store(swp_entry_t swpentry, struct zswap_entry *entry)
+{
+	struct vswap_cluster *cluster = NULL;
+	struct swp_desc *desc;
+	void *old;
+
+	rcu_read_lock();
+	desc = vswap_iter(&cluster, swpentry.val);
+	if (!desc) {
+		rcu_read_unlock();
+		return NULL;
+	}
+
+	old = desc->zswap_entry;
+	desc->zswap_entry = entry;
+	spin_unlock(&cluster->lock);
+	rcu_read_unlock();
+
+	return old;
+}
+
+/**
+ * zswap_entry_load - load a zswap entry for a swap entry
+ * @swpentry: the swap entry
+ *
+ * Loads the zswap entry from the swap descriptor for the given swap entry.
+ *
+ * Return: the zswap entry if one exists, NULL otherwise
+ */
+void *zswap_entry_load(swp_entry_t swpentry)
+{
+	struct vswap_cluster *cluster = NULL;
+	struct swp_desc *desc;
+	void *zswap_entry;
+
+	rcu_read_lock();
+	desc = vswap_iter(&cluster, swpentry.val);
+	if (!desc) {
+		rcu_read_unlock();
+		return NULL;
+	}
+
+	zswap_entry = desc->zswap_entry;
+	spin_unlock(&cluster->lock);
+	rcu_read_unlock();
+
+	return zswap_entry;
+}
+
+/**
+ * zswap_entry_erase - erase a zswap entry for a swap entry
+ * @swpentry: the swap entry
+ *
+ * Erases the zswap entry from the swap descriptor for the given swap entry.
+ * The cluster is locked during the erase operation.
+ *
+ * Return: the zswap entry that was erased, NULL if none existed
+ */
+void *zswap_entry_erase(swp_entry_t swpentry)
+{
+	struct vswap_cluster *cluster = NULL;
+	struct swp_desc *desc;
+	void *old;
+
+	rcu_read_lock();
+	desc = vswap_iter(&cluster, swpentry.val);
+	if (!desc) {
+		rcu_read_unlock();
+		return NULL;
+	}
+
+	old = desc->zswap_entry;
+	desc->zswap_entry = NULL;
+	spin_unlock(&cluster->lock);
+	rcu_read_unlock();
+
+	return old;
+}
+
+bool zswap_empty(swp_entry_t swpentry)
+{
+	return xa_empty(&vswap_cluster_map);
+}
+#endif /* CONFIG_ZSWAP */
+
 int vswap_init(void)
 {
 	int i;
diff --git a/mm/zswap.c b/mm/zswap.c
index f7313261673ff..72441131f094e 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -223,37 +223,6 @@ static bool zswap_has_pool;
 * helpers and fwd declarations
 **********************************/
 
-static DEFINE_XARRAY(zswap_tree);
-
-#define zswap_tree_index(entry)	(entry.val)
-
-static inline void *zswap_entry_store(swp_entry_t swpentry,
-		struct zswap_entry *entry)
-{
-	pgoff_t offset = zswap_tree_index(swpentry);
-
-	return xa_store(&zswap_tree, offset, entry, GFP_KERNEL);
-}
-
-static inline void *zswap_entry_load(swp_entry_t swpentry)
-{
-	pgoff_t offset = zswap_tree_index(swpentry);
-
-	return xa_load(&zswap_tree, offset);
-}
-
-static inline void *zswap_entry_erase(swp_entry_t swpentry)
-{
-	pgoff_t offset = zswap_tree_index(swpentry);
-
-	return xa_erase(&zswap_tree, offset);
-}
-
-static inline bool zswap_empty(swp_entry_t swpentry)
-{
-	return xa_empty(&zswap_tree);
-}
-
 #define zswap_pool_debug(msg, p)			\
 	pr_debug("%s pool %s\n", msg, (p)->tfm_name)
 
@@ -1445,13 +1414,6 @@ static bool zswap_store_page(struct page *page,
 		goto compress_failed;
 
 	old = zswap_entry_store(page_swpentry, entry);
-	if (xa_is_err(old)) {
-		int err = xa_err(old);
-
-		WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
-		zswap_reject_alloc_fail++;
-		goto store_failed;
-	}
 
 	/*
 	 * We may have had an existing entry that became stale when
@@ -1498,8 +1460,6 @@ static bool zswap_store_page(struct page *page,
 
 	return true;
 
-store_failed:
-	zs_free(pool->zs_pool, entry->handle);
 compress_failed:
 	zswap_entry_cache_free(entry);
 	return false;
-- 
2.52.0
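
For readers following the series, the store/load/erase semantics in this
patch can be sketched in userspace C. This is only an illustration under
stated assumptions, not the kernel code: the names cluster, desc, zentry,
desc_lock, entry_store, entry_load, entry_erase, CLUSTER_SIZE and
NR_CLUSTERS are hypothetical stand-ins, the lock is a C11 atomic flag
rather than a kernel spinlock, RCU is left out, and it assumes (as the
unlock-only call sites in the patch suggest) that the lookup helper
returns with the cluster lock held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CLUSTER_SIZE 8UL	/* hypothetical; not the kernel's cluster size */
#define NR_CLUSTERS  4UL	/* hypothetical */

struct zentry { int payload; };		/* stand-in for struct zswap_entry */
struct desc { struct zentry *zentry; };	/* stand-in for struct swp_desc */

struct cluster {
	atomic_bool locked;		/* stand-in for the cluster spinlock */
	struct desc descs[CLUSTER_SIZE];
};

/* Static storage is zero-initialized: all locks free, all slots NULL. */
static struct cluster clusters[NR_CLUSTERS];

void cluster_lock(struct cluster *c)
{
	/* Spin until this caller flips the flag from false to true. */
	while (atomic_exchange_explicit(&c->locked, true, memory_order_acquire))
		;
}

void cluster_unlock(struct cluster *c)
{
	atomic_store_explicit(&c->locked, false, memory_order_release);
}

/*
 * Assumed analogue of vswap_iter(): find the descriptor for @val and
 * return it with its cluster locked, or NULL if @val is out of range.
 */
struct desc *desc_lock(unsigned long val, struct cluster **out)
{
	unsigned long ci = val / CLUSTER_SIZE;

	assert(out != NULL);
	if (ci >= NR_CLUSTERS)
		return NULL;
	*out = &clusters[ci];
	cluster_lock(*out);
	return &(*out)->descs[val % CLUSTER_SIZE];
}

/* Store @e in the descriptor and return the entry it replaced, if any. */
struct zentry *entry_store(unsigned long val, struct zentry *e)
{
	struct cluster *c = NULL;
	struct desc *d = desc_lock(val, &c);
	struct zentry *old;

	if (!d)
		return NULL;
	old = d->zentry;
	d->zentry = e;
	cluster_unlock(c);
	return old;
}

/* Return the entry currently recorded for @val, or NULL. */
struct zentry *entry_load(unsigned long val)
{
	struct cluster *c = NULL;
	struct desc *d = desc_lock(val, &c);
	struct zentry *cur;

	if (!d)
		return NULL;
	cur = d->zentry;
	cluster_unlock(c);
	return cur;
}

/* Erasing is a store of NULL that hands back the old entry. */
struct zentry *entry_erase(unsigned long val)
{
	return entry_store(val, NULL);
}
```

In this sketch, two operations contend only when their swap entries fall
in the same cluster, and a store cannot fail on allocation because the
slot already exists in the descriptor. Note that an out-of-range lookup
returns NULL, which a caller cannot distinguish from "no previous entry".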