From nobody Sat Dec 27 17:04:49 2025 Received: from out-175.mta0.migadu.com (out-175.mta0.migadu.com [91.218.175.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2DE98358B9 for ; Mon, 18 Dec 2023 11:50:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Chengming Zhou Date: Mon, 18 Dec 2023 11:50:31 +0000 Subject: [PATCH v3 1/6] mm/zswap: change dstmem size to one page Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v3-1-4eac09b94ece@bytedance.com> References: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com> To: Seth Jennings , Yosry Ahmed , Vitaly Wool , Dan Streetman , Johannes Weiner , Chris Li , Andrew Morton , Nhat Pham Cc: Chris Li , Yosry Ahmed , linux-kernel@vger.kernel.org, Chengming Zhou , linux-mm@kvack.org, Nhat Pham X-Developer-Signature: v=1; a=ed25519-sha256; t=1702900234; l=2012; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=knbPWkl1sBJOnq5pAv9yoWZfZi+/5lFRxFAsoTFiAeI=; b=Skf5BWIwB7K3APRmPY9IP+iewRS012Hs6COguVdU5PIlg7L+x04yE80HEjpOimgwkl6cW+uik 96/4hLYrUTUCrBSCuMhN8meBpFS79yrWTNl5P7OeI56K+kia94TKabn X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT Change the dstmem size from 2 * PAGE_SIZE to only one page since we only need at most one page when compress, and the "dlen" is also PAGE_SIZE in acomp_request_set_params(). If the output size > PAGE_SIZE we don't wanna store the output in zswap anyway. So change it to one page, and delete the stale comment. There is no any history about the reason why we needed 2 pages, it has been 2 * PAGE_SIZE since the time zswap was first merged. According to Yosry and Nhat, one potential reason is that we used to store a zswap header containing the swap entry in the compressed page for writeback purposes, but we don't do that anymore. This patch works good in kernel build testing even when the input data doesn't compress at all (i.e. dlen =3D=3D PAGE_SIZE), which we can see from the bpftrace tool: bpftrace -e 'k:zpool_malloc {@[(uint32)arg1=3D=3D4096]=3Dcount()}' @[1]: 2 @[0]: 12011430 Reviewed-by: Yosry Ahmed Reviewed-by: Nhat Pham Signed-off-by: Chengming Zhou --- mm/zswap.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 7ee54a3d8281..976f278aa507 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -707,7 +707,7 @@ static int zswap_dstmem_prepare(unsigned int cpu) struct mutex *mutex; u8 *dst; =20 - dst =3D kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu)); + dst =3D kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu)); if (!dst) return -ENOMEM; =20 @@ -1662,8 +1662,7 @@ bool zswap_store(struct folio *folio) sg_init_table(&input, 1); sg_set_page(&input, page, PAGE_SIZE, 0); =20 - /* zswap_dstmem is of size (PAGE_SIZE * 2). 
Reflect same in sg_list */ - sg_init_one(&output, dst, PAGE_SIZE * 2); + sg_init_one(&output, dst, PAGE_SIZE); acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen= ); /* * it maybe looks a little bit silly that we send an asynchronous request, --=20 b4 0.10.1 From nobody Sat Dec 27 17:04:49 2025 Received: from out-179.mta0.migadu.com (out-179.mta0.migadu.com [91.218.175.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E01F23715B for ; Mon, 18 Dec 2023 11:50:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Chengming Zhou Date: Mon, 18 Dec 2023 11:50:32 +0000 Subject: [PATCH v3 2/6] mm/zswap: reuse dstmem when decompress Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v3-2-4eac09b94ece@bytedance.com> References: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com> To: Seth Jennings , Yosry Ahmed , Vitaly Wool , Dan Streetman , Johannes Weiner , Chris Li , Andrew Morton , Nhat Pham Cc: Chris Li , Yosry Ahmed , linux-kernel@vger.kernel.org, Chengming Zhou , linux-mm@kvack.org, Nhat Pham X-Developer-Signature: v=1; a=ed25519-sha256; t=1702900234; l=4279; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=JxHludCxMK0OPSXqNpXYMeMWjmEk17jj62CkEYJJ0Kc=; b=FkiCcWV+1XpQFAG5Ryg3DEVWXLZeHzNrlB7Z4iSS32PbvemI7sdMZ3kL9sbWdE9EZPQ8OiHKX xI4/eIK1X/tCF7SpnRPu4ma50Jp3RTDNqWgnkzjDcnf1OYQ4y7gNKV3 X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT In the !zpool_can_sleep_mapped() case such as zsmalloc, we need to first copy the entry->handle memory to a temporary memory, which is allocated using kmalloc. Obviously we can reuse the per-compressor dstmem to avoid allocating every time, since it's percpu-compressor and protected in percpu mutex. 
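For illustration only (this is not part of the diff below, and every name in the sketch apart from standard C and pthread calls is invented): a minimal userspace analogue of the pattern, where one preallocated, page-sized scratch buffer per CPU, guarded by a mutex, replaces a kmalloc()/kfree() pair on every decompress.

/*
 * Userspace sketch, not kernel code (build: cc sketch.c -lpthread).
 * One preallocated page-sized scratch buffer per "CPU", serialized by
 * a mutex, is reused for staging instead of allocating a temporary
 * buffer on every operation.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NCPUS     4
#define PAGE_SIZE 4096

static unsigned char *scratch[NCPUS];       /* per-"CPU" staging buffer */
static pthread_mutex_t scratch_lock[NCPUS]; /* serializes buffer users  */

static int scratch_prepare(void)
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		scratch[cpu] = malloc(PAGE_SIZE);
		if (!scratch[cpu])
			return -1;
		pthread_mutex_init(&scratch_lock[cpu], NULL);
	}
	return 0;
}

/* Stage @len bytes of @src into the per-CPU buffer, no per-call malloc. */
static void stage_copy(int cpu, const void *src, size_t len)
{
	pthread_mutex_lock(&scratch_lock[cpu]);
	memcpy(scratch[cpu], src, len);
	/* ... decompress from scratch[cpu] into the destination page here ... */
	pthread_mutex_unlock(&scratch_lock[cpu]);
}

int main(void)
{
	const char msg[] = "compressed object";

	if (scratch_prepare())
		return 1;
	stage_copy(0, msg, sizeof(msg));
	printf("staged: %s\n", scratch[0]);
	return 0;
}

The real change is the diff below; the sketch only shows the design choice the commit message describes: the buffer lifetime is tied to the CPU, so the decompress path needs no allocation and the mutex already excludes concurrent compress users.
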
Reviewed-by: Nhat Pham Acked-by: Chris Li Reviewed-by: Yosry Ahmed Signed-off-by: Chengming Zhou --- mm/zswap.c | 44 ++++++++++++-------------------------------- 1 file changed, 12 insertions(+), 32 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 976f278aa507..6b872744e962 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1417,19 +1417,13 @@ static int zswap_writeback_entry(struct zswap_entry= *entry, struct crypto_acomp_ctx *acomp_ctx; struct zpool *pool =3D zswap_find_zpool(entry); bool page_was_allocated; - u8 *src, *tmp =3D NULL; + u8 *src; unsigned int dlen; int ret; struct writeback_control wbc =3D { .sync_mode =3D WB_SYNC_NONE, }; =20 - if (!zpool_can_sleep_mapped(pool)) { - tmp =3D kmalloc(PAGE_SIZE, GFP_KERNEL); - if (!tmp) - return -ENOMEM; - } - /* try to allocate swap cache page */ mpol =3D get_task_policy(current); page =3D __read_swap_cache_async(swpentry, GFP_KERNEL, mpol, @@ -1465,15 +1459,15 @@ static int zswap_writeback_entry(struct zswap_entry= *entry, /* decompress */ acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); dlen =3D PAGE_SIZE; + mutex_lock(acomp_ctx->mutex); =20 src =3D zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); if (!zpool_can_sleep_mapped(pool)) { - memcpy(tmp, src, entry->length); - src =3D tmp; + memcpy(acomp_ctx->dstmem, src, entry->length); + src =3D acomp_ctx->dstmem; zpool_unmap_handle(pool, entry->handle); } =20 - mutex_lock(acomp_ctx->mutex); sg_init_one(&input, src, entry->length); sg_init_table(&output, 1); sg_set_page(&output, page, PAGE_SIZE, 0); @@ -1482,9 +1476,7 @@ static int zswap_writeback_entry(struct zswap_entry *= entry, dlen =3D acomp_ctx->req->dlen; mutex_unlock(acomp_ctx->mutex); =20 - if (!zpool_can_sleep_mapped(pool)) - kfree(tmp); - else + if (zpool_can_sleep_mapped(pool)) zpool_unmap_handle(pool, entry->handle); =20 BUG_ON(ret); @@ -1508,9 +1500,6 @@ static int zswap_writeback_entry(struct zswap_entry *= entry, return ret; =20 fail: - if (!zpool_can_sleep_mapped(pool)) - kfree(tmp); - /* * If we get here because the page is already in swapcache, a * load may be happening concurrently. 
It is safe and okay to @@ -1771,7 +1760,7 @@ bool zswap_load(struct folio *folio) struct zswap_entry *entry; struct scatterlist input, output; struct crypto_acomp_ctx *acomp_ctx; - u8 *src, *dst, *tmp; + u8 *src, *dst; struct zpool *zpool; unsigned int dlen; bool ret; @@ -1796,26 +1785,19 @@ bool zswap_load(struct folio *folio) } =20 zpool =3D zswap_find_zpool(entry); - if (!zpool_can_sleep_mapped(zpool)) { - tmp =3D kmalloc(entry->length, GFP_KERNEL); - if (!tmp) { - ret =3D false; - goto freeentry; - } - } =20 /* decompress */ dlen =3D PAGE_SIZE; - src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); + acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); + mutex_lock(acomp_ctx->mutex); =20 + src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); if (!zpool_can_sleep_mapped(zpool)) { - memcpy(tmp, src, entry->length); - src =3D tmp; + memcpy(acomp_ctx->dstmem, src, entry->length); + src =3D acomp_ctx->dstmem; zpool_unmap_handle(zpool, entry->handle); } =20 - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - mutex_lock(acomp_ctx->mutex); sg_init_one(&input, src, entry->length); sg_init_table(&output, 1); sg_set_page(&output, page, PAGE_SIZE, 0); @@ -1826,15 +1808,13 @@ bool zswap_load(struct folio *folio) =20 if (zpool_can_sleep_mapped(zpool)) zpool_unmap_handle(zpool, entry->handle); - else - kfree(tmp); =20 ret =3D true; stats: count_vm_event(ZSWPIN); if (entry->objcg) count_objcg_event(entry->objcg, ZSWPIN); -freeentry: + spin_lock(&tree->lock); if (ret && zswap_exclusive_loads_enabled) { zswap_invalidate_entry(tree, entry); --=20 b4 0.10.1 From nobody Sat Dec 27 17:04:49 2025 Received: from out-188.mta0.migadu.com (out-188.mta0.migadu.com [91.218.175.188]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 269BE374F2 for ; Mon, 18 Dec 2023 11:50:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Chengming Zhou Date: Mon, 18 Dec 2023 11:50:33 +0000 Subject: [PATCH v3 3/6] mm/zswap: refactor out __zswap_load() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v3-3-4eac09b94ece@bytedance.com> References: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com> To: Seth Jennings , Yosry Ahmed , Vitaly Wool , Dan Streetman , Johannes Weiner , Chris Li , Andrew Morton , Nhat Pham Cc: Chris Li , Yosry Ahmed , linux-kernel@vger.kernel.org, Chengming Zhou , linux-mm@kvack.org, Nhat Pham X-Developer-Signature: v=1; a=ed25519-sha256; t=1702900234; l=4703; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=B99wC6AH9+R5m6Djw0+W/AJA7ag/bUbsLCKvYrrlo8Q=; b=ild77L3TDGLGQu8Eca6JaXi/fIaO5CaZVqyj/qCaIHMIACqX9eH+pkWshvQ4Y9TZWp3ytCZsC rYJYgewIBt6BWiDIhPEOZvJMVJypVdwjYuUQ0PbiowCtIZh16LIxHdH X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT The zswap_load() and zswap_writeback_entry() have the same part that decompress the data from zswap_entry to page, so refactor out the common part as __zswap_load(entry, page). Reviewed-by: Nhat Pham Reviewed-by: Yosry Ahmed Signed-off-by: Chengming Zhou Acked-by: Chris Li (Google) --- mm/zswap.c | 92 ++++++++++++++++++++++------------------------------------= ---- 1 file changed, 32 insertions(+), 60 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 6b872744e962..3433bd6b3cef 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1392,6 +1392,35 @@ static int zswap_enabled_param_set(const char *val, return ret; } =20 +static void __zswap_load(struct zswap_entry *entry, struct page *page) +{ + struct zpool *zpool =3D zswap_find_zpool(entry); + struct scatterlist input, output; + struct crypto_acomp_ctx *acomp_ctx; + u8 *src; + + acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); + mutex_lock(acomp_ctx->mutex); + + src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); + if (!zpool_can_sleep_mapped(zpool)) { + memcpy(acomp_ctx->dstmem, src, entry->length); + src =3D acomp_ctx->dstmem; + zpool_unmap_handle(zpool, entry->handle); + } + + sg_init_one(&input, src, entry->length); + sg_init_table(&output, 1); + sg_set_page(&output, page, PAGE_SIZE, 0); + acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, = PAGE_SIZE); + BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ct= x->wait)); + BUG_ON(acomp_ctx->req->dlen !=3D PAGE_SIZE); + mutex_unlock(acomp_ctx->mutex); + + if (zpool_can_sleep_mapped(zpool)) + zpool_unmap_handle(zpool, entry->handle); +} + /********************************* * writeback code **********************************/ @@ -1413,12 +1442,7 @@ static int zswap_writeback_entry(struct zswap_entry = *entry, swp_entry_t swpentry =3D entry->swpentry; struct page *page; struct mempolicy *mpol; - struct scatterlist input, output; - struct crypto_acomp_ctx *acomp_ctx; - struct zpool *pool =3D zswap_find_zpool(entry); bool page_was_allocated; - u8 *src; - unsigned int dlen; int ret; struct writeback_control wbc =3D { .sync_mode =3D WB_SYNC_NONE, @@ -1456,31 +1480,7 @@ static int zswap_writeback_entry(struct zswap_entry = *entry, } spin_unlock(&tree->lock); =20 - /* decompress */ - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - dlen =3D PAGE_SIZE; - 
mutex_lock(acomp_ctx->mutex); - - src =3D zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); - if (!zpool_can_sleep_mapped(pool)) { - memcpy(acomp_ctx->dstmem, src, entry->length); - src =3D acomp_ctx->dstmem; - zpool_unmap_handle(pool, entry->handle); - } - - sg_init_one(&input, src, entry->length); - sg_init_table(&output, 1); - sg_set_page(&output, page, PAGE_SIZE, 0); - acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, = dlen); - ret =3D crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_c= tx->wait); - dlen =3D acomp_ctx->req->dlen; - mutex_unlock(acomp_ctx->mutex); - - if (zpool_can_sleep_mapped(pool)) - zpool_unmap_handle(pool, entry->handle); - - BUG_ON(ret); - BUG_ON(dlen !=3D PAGE_SIZE); + __zswap_load(entry, page); =20 /* page is up to date */ SetPageUptodate(page); @@ -1758,11 +1758,7 @@ bool zswap_load(struct folio *folio) struct page *page =3D &folio->page; struct zswap_tree *tree =3D zswap_trees[type]; struct zswap_entry *entry; - struct scatterlist input, output; - struct crypto_acomp_ctx *acomp_ctx; - u8 *src, *dst; - struct zpool *zpool; - unsigned int dlen; + u8 *dst; bool ret; =20 VM_WARN_ON_ONCE(!folio_test_locked(folio)); @@ -1784,31 +1780,7 @@ bool zswap_load(struct folio *folio) goto stats; } =20 - zpool =3D zswap_find_zpool(entry); - - /* decompress */ - dlen =3D PAGE_SIZE; - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - mutex_lock(acomp_ctx->mutex); - - src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); - if (!zpool_can_sleep_mapped(zpool)) { - memcpy(acomp_ctx->dstmem, src, entry->length); - src =3D acomp_ctx->dstmem; - zpool_unmap_handle(zpool, entry->handle); - } - - sg_init_one(&input, src, entry->length); - sg_init_table(&output, 1); - sg_set_page(&output, page, PAGE_SIZE, 0); - acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, = dlen); - if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->= wait)) - WARN_ON(1); - mutex_unlock(acomp_ctx->mutex); - - if (zpool_can_sleep_mapped(zpool)) - zpool_unmap_handle(zpool, entry->handle); - + __zswap_load(entry, page); ret =3D true; stats: count_vm_event(ZSWPIN); --=20 b4 0.10.1 From nobody Sat Dec 27 17:04:49 2025 Received: from out-181.mta0.migadu.com (out-181.mta0.migadu.com [91.218.175.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 350EE37D12 for ; Mon, 18 Dec 2023 11:50:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Chengming Zhou Date: Mon, 18 Dec 2023 11:50:34 +0000 Subject: [PATCH v3 4/6] mm/zswap: cleanup zswap_load() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v3-4-4eac09b94ece@bytedance.com> References: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com> To: Seth Jennings , Yosry Ahmed , Vitaly Wool , Dan Streetman , Johannes Weiner , Chris Li , Andrew Morton , Nhat Pham Cc: Chris Li , Yosry Ahmed , linux-kernel@vger.kernel.org, Chengming Zhou , linux-mm@kvack.org, Nhat Pham X-Developer-Signature: v=1; a=ed25519-sha256; t=1702900234; l=1470; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=1eOM2WlTMMQMFmZcsE6sU450y2r77yOSYBMqqAB7HFE=; b=mA1a+oM0pO2x4ySl9ZU6T4A1b2XOYkGKZpXj4gIvy56X1aoJsXnojsb8m/11LLzDFIBsJEhQx i+goKQs7a1CAJLnNJrSVIiCo2UaAEaJ/sfFe3MjPRQ5aHJnbPdiJDGR X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT After the common decompress part goes to __zswap_load(), we can cleanup the zswap_load() a little. Reviewed-by: Yosry Ahmed Signed-off-by: Chengming Zhou Acked-by: Chis Li (Google) --- mm/zswap.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 3433bd6b3cef..86886276cb81 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1759,7 +1759,6 @@ bool zswap_load(struct folio *folio) struct zswap_tree *tree =3D zswap_trees[type]; struct zswap_entry *entry; u8 *dst; - bool ret; =20 VM_WARN_ON_ONCE(!folio_test_locked(folio)); =20 @@ -1776,19 +1775,16 @@ bool zswap_load(struct folio *folio) dst =3D kmap_local_page(page); zswap_fill_page(dst, entry->value); kunmap_local(dst); - ret =3D true; - goto stats; + } else { + __zswap_load(entry, page); } =20 - __zswap_load(entry, page); - ret =3D true; -stats: count_vm_event(ZSWPIN); if (entry->objcg) count_objcg_event(entry->objcg, ZSWPIN); =20 spin_lock(&tree->lock); - if (ret && zswap_exclusive_loads_enabled) { + if (zswap_exclusive_loads_enabled) { zswap_invalidate_entry(tree, entry); folio_mark_dirty(folio); } else if (entry->length) { @@ -1798,7 +1794,7 @@ bool zswap_load(struct folio *folio) zswap_entry_put(tree, entry); spin_unlock(&tree->lock); =20 - return ret; + return true; } =20 void zswap_invalidate(int type, pgoff_t offset) --=20 b4 0.10.1 From nobody Sat Dec 27 17:04:49 2025 Received: from out-185.mta0.migadu.com (out-185.mta0.migadu.com [91.218.175.185]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 87C27381CD for ; Mon, 18 Dec 2023 11:50:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Chengming Zhou Date: Mon, 18 Dec 2023 11:50:35 +0000 Subject: [PATCH v3 5/6] mm/zswap: cleanup zswap_writeback_entry() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v3-5-4eac09b94ece@bytedance.com> References: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com> To: Seth Jennings , Yosry Ahmed , Vitaly Wool , Dan Streetman , Johannes Weiner , Chris Li , Andrew Morton , Nhat Pham Cc: Chris Li , Yosry Ahmed , linux-kernel@vger.kernel.org, Chengming Zhou , linux-mm@kvack.org, Nhat Pham X-Developer-Signature: v=1; a=ed25519-sha256; t=1702900234; l=2215; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=u9D8B3ksO9G67avNjANTAO3qS+/SASwDD1QW2eXuPmA=; b=bX9QDtAp6LhqXJx8Yr0LWdcF0z06lztvYdnRzlwugcBS4LaRmP3c9JNAMHRIFxNxSVfv4KXJP 2yn6oW6AM20DLSOlXaQvsN42SgrRlNXTLt3ZWYzqDYQTNtDhy74I4xe X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT Also after the common decompress part goes to __zswap_load(), we can cleanup the zswap_writeback_entry() a little. Reviewed-by: Yosry Ahmed Reviewed-by: Nhat Pham Signed-off-by: Chengming Zhou Acked-by: Chris Li (Google) --- mm/zswap.c | 25 +++++++++---------------- 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 86886276cb81..2c349fd88904 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1443,7 +1443,6 @@ static int zswap_writeback_entry(struct zswap_entry *= entry, struct page *page; struct mempolicy *mpol; bool page_was_allocated; - int ret; struct writeback_control wbc =3D { .sync_mode =3D WB_SYNC_NONE, }; @@ -1453,15 +1452,18 @@ static int zswap_writeback_entry(struct zswap_entry= *entry, page =3D __read_swap_cache_async(swpentry, GFP_KERNEL, mpol, NO_INTERLEAVE_INDEX, &page_was_allocated, true); if (!page) { - ret =3D -ENOMEM; - goto fail; + /* + * If we get here because the page is already in swapcache, a + * load may be happening concurrently. It is safe and okay to + * not free the entry. It is also okay to return !0. + */ + return -ENOMEM; } =20 /* Found an existing page, we raced with load/swapin */ if (!page_was_allocated) { put_page(page); - ret =3D -EEXIST; - goto fail; + return -EEXIST; } =20 /* @@ -1475,8 +1477,7 @@ static int zswap_writeback_entry(struct zswap_entry *= entry, if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) !=3D entr= y) { spin_unlock(&tree->lock); delete_from_swap_cache(page_folio(page)); - ret =3D -ENOMEM; - goto fail; + return -ENOMEM; } spin_unlock(&tree->lock); =20 @@ -1497,15 +1498,7 @@ static int zswap_writeback_entry(struct zswap_entry = *entry, __swap_writepage(page, &wbc); put_page(page); =20 - return ret; - -fail: - /* - * If we get here because the page is already in swapcache, a - * load may be happening concurrently. It is safe and okay to - * not free the entry. It is also okay to return !0. 
- */ - return ret; + return 0; } =20 static int zswap_is_page_same_filled(void *ptr, unsigned long *value) --=20 b4 0.10.1 From nobody Sat Dec 27 17:04:49 2025 Received: from out-176.mta0.migadu.com (out-176.mta0.migadu.com [91.218.175.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7F55938F94 for ; Mon, 18 Dec 2023 11:50:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Chengming Zhou Date: Mon, 18 Dec 2023 11:50:36 +0000 Subject: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v3-6-4eac09b94ece@bytedance.com> References: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com> To: Seth Jennings , Yosry Ahmed , Vitaly Wool , Dan Streetman , Johannes Weiner , Chris Li , Andrew Morton , Nhat Pham Cc: Chris Li , Yosry Ahmed , linux-kernel@vger.kernel.org, Chengming Zhou , linux-mm@kvack.org, Nhat Pham X-Developer-Signature: v=1; a=ed25519-sha256; t=1702900234; l=6471; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=culrrHCm/Hf4dwHwtvZ06lQ5NsEncVU/QGy2Bp4v+0o=; b=b7n0OpxYpR3XA5GiwrtGYchVJRrAlhz5jqYK7l+kMC6lZWSSrzA4mZoc1L8QALpl7yA1Z8BpB gUTc5av/dY8AUdT5lUxvytYE9E+8PlB9bUSPPKrHM97afZKBM0ohjLG X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT Since the introduce of reusing the dstmem in the load path, it seems confusing that we are now using acomp_ctx->dstmem and acomp_ctx->mutex now for purposes other than what the naming suggests. Yosry suggested removing these two fields from acomp_ctx, and directly using zswap_dstmem and zswap_mutex in both the load and store paths, rename them, and add proper comments above their definitions that they are for generic percpu buffering on the load and store paths. So this patch remove dstmem and mutex from acomp_ctx, and rename the zswap_dstmem to zswap_buffer, using the percpu mutex and buffer on the load and store paths. Suggested-by: Yosry Ahmed Signed-off-by: Chengming Zhou Acked-by: Chris Li (Google) --- mm/zswap.c | 69 +++++++++++++++++++++++++++++++++-------------------------= ---- 1 file changed, 37 insertions(+), 32 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 2c349fd88904..71bdcd552e5b 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -166,8 +166,6 @@ struct crypto_acomp_ctx { struct crypto_acomp *acomp; struct acomp_req *req; struct crypto_wait wait; - u8 *dstmem; - struct mutex *mutex; }; =20 /* @@ -694,7 +692,7 @@ static void zswap_alloc_shrinker(struct zswap_pool *poo= l) /********************************* * per-cpu code **********************************/ -static DEFINE_PER_CPU(u8 *, zswap_dstmem); +static DEFINE_PER_CPU(u8 *, zswap_buffer); /* * If users dynamically change the zpool type and compressor at runtime, i= .e. 
* zswap is running, zswap can have more than one zpool on one cpu, but th= ey @@ -702,39 +700,39 @@ static DEFINE_PER_CPU(u8 *, zswap_dstmem); */ static DEFINE_PER_CPU(struct mutex *, zswap_mutex); =20 -static int zswap_dstmem_prepare(unsigned int cpu) +static int zswap_buffer_prepare(unsigned int cpu) { struct mutex *mutex; - u8 *dst; + u8 *buf; =20 - dst =3D kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu)); - if (!dst) + buf =3D kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu)); + if (!buf) return -ENOMEM; =20 mutex =3D kmalloc_node(sizeof(*mutex), GFP_KERNEL, cpu_to_node(cpu)); if (!mutex) { - kfree(dst); + kfree(buf); return -ENOMEM; } =20 mutex_init(mutex); - per_cpu(zswap_dstmem, cpu) =3D dst; + per_cpu(zswap_buffer, cpu) =3D buf; per_cpu(zswap_mutex, cpu) =3D mutex; return 0; } =20 -static int zswap_dstmem_dead(unsigned int cpu) +static int zswap_buffer_dead(unsigned int cpu) { struct mutex *mutex; - u8 *dst; + u8 *buf; =20 mutex =3D per_cpu(zswap_mutex, cpu); kfree(mutex); per_cpu(zswap_mutex, cpu) =3D NULL; =20 - dst =3D per_cpu(zswap_dstmem, cpu); - kfree(dst); - per_cpu(zswap_dstmem, cpu) =3D NULL; + buf =3D per_cpu(zswap_buffer, cpu); + kfree(buf); + per_cpu(zswap_buffer, cpu) =3D NULL; =20 return 0; } @@ -772,9 +770,6 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, str= uct hlist_node *node) acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG, crypto_req_done, &acomp_ctx->wait); =20 - acomp_ctx->mutex =3D per_cpu(zswap_mutex, cpu); - acomp_ctx->dstmem =3D per_cpu(zswap_dstmem, cpu); - return 0; } =20 @@ -1397,15 +1392,21 @@ static void __zswap_load(struct zswap_entry *entry,= struct page *page) struct zpool *zpool =3D zswap_find_zpool(entry); struct scatterlist input, output; struct crypto_acomp_ctx *acomp_ctx; - u8 *src; + u8 *src, *buf; + int cpu; + struct mutex *mutex; =20 - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - mutex_lock(acomp_ctx->mutex); + cpu =3D raw_smp_processor_id(); + mutex =3D per_cpu(zswap_mutex, cpu); + mutex_lock(mutex); + + acomp_ctx =3D per_cpu_ptr(entry->pool->acomp_ctx, cpu); =20 src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); if (!zpool_can_sleep_mapped(zpool)) { - memcpy(acomp_ctx->dstmem, src, entry->length); - src =3D acomp_ctx->dstmem; + buf =3D per_cpu(zswap_buffer, cpu); + memcpy(buf, src, entry->length); + src =3D buf; zpool_unmap_handle(zpool, entry->handle); } =20 @@ -1415,7 +1416,7 @@ static void __zswap_load(struct zswap_entry *entry, s= truct page *page) acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, = PAGE_SIZE); BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ct= x->wait)); BUG_ON(acomp_ctx->req->dlen !=3D PAGE_SIZE); - mutex_unlock(acomp_ctx->mutex); + mutex_unlock(mutex); =20 if (zpool_can_sleep_mapped(zpool)) zpool_unmap_handle(zpool, entry->handle); @@ -1551,6 +1552,8 @@ bool zswap_store(struct folio *folio) u8 *src, *dst; gfp_t gfp; int ret; + int cpu; + struct mutex *mutex; =20 VM_WARN_ON_ONCE(!folio_test_locked(folio)); VM_WARN_ON_ONCE(!folio_test_swapcache(folio)); @@ -1636,11 +1639,13 @@ bool zswap_store(struct folio *folio) } =20 /* compress */ - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); + cpu =3D raw_smp_processor_id(); + mutex =3D per_cpu(zswap_mutex, cpu); + mutex_lock(mutex); =20 - mutex_lock(acomp_ctx->mutex); + acomp_ctx =3D per_cpu_ptr(entry->pool->acomp_ctx, cpu); + dst =3D per_cpu(zswap_buffer, cpu); =20 - dst =3D acomp_ctx->dstmem; sg_init_table(&input, 1); sg_set_page(&input, page, PAGE_SIZE, 0); =20 @@ -1683,7 
+1688,7 @@ bool zswap_store(struct folio *folio) buf =3D zpool_map_handle(zpool, handle, ZPOOL_MM_WO); memcpy(buf, dst, dlen); zpool_unmap_handle(zpool, handle); - mutex_unlock(acomp_ctx->mutex); + mutex_unlock(mutex); =20 /* populate entry */ entry->swpentry =3D swp_entry(type, offset); @@ -1726,7 +1731,7 @@ bool zswap_store(struct folio *folio) return true; =20 put_dstmem: - mutex_unlock(acomp_ctx->mutex); + mutex_unlock(mutex); put_pool: zswap_pool_put(entry->pool); freepage: @@ -1902,10 +1907,10 @@ static int zswap_setup(void) } =20 ret =3D cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare", - zswap_dstmem_prepare, zswap_dstmem_dead); + zswap_buffer_prepare, zswap_buffer_dead); if (ret) { - pr_err("dstmem alloc failed\n"); - goto dstmem_fail; + pr_err("buffer alloc failed\n"); + goto buffer_fail; } =20 ret =3D cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE, @@ -1940,7 +1945,7 @@ static int zswap_setup(void) zswap_pool_destroy(pool); hp_fail: cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE); -dstmem_fail: +buffer_fail: kmem_cache_destroy(zswap_entry_cache); cache_fail: /* if built-in, we aren't unloaded on failure; don't allow use */ --=20 b4 0.10.1
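
A short standalone demonstration of the patch 1/6 rationale that the rest of the series builds on (illustrative only: it uses liblz4 directly rather than the kernel's acomp API, and PAGE_SIZE here is just a hardcoded 4096): with a destination capacity of exactly one page, a page that does not fit is simply reported as a compression failure, which is the case zswap declines to store anyway.

/*
 * Userspace sketch, not kernel code (build: cc demo.c -llz4).
 * Shows that a one-page destination buffer is enough when oversized
 * output is treated as "incompressible, do not store".
 */
#include <lz4.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

int main(void)
{
	char *src = malloc(PAGE_SIZE), *dst = malloc(PAGE_SIZE);
	int n;

	if (!src || !dst)
		return 1;

	/* Compressible input: one page of zeroes fits easily. */
	memset(src, 0, PAGE_SIZE);
	n = LZ4_compress_default(src, dst, PAGE_SIZE, PAGE_SIZE);
	printf("zero page   -> %d bytes (would be stored)\n", n);

	/*
	 * Incompressible input: the encoded form would exceed one page,
	 * so LZ4_compress_default() reports failure by returning 0.
	 */
	for (int i = 0; i < PAGE_SIZE; i++)
		src[i] = rand();
	n = LZ4_compress_default(src, dst, PAGE_SIZE, PAGE_SIZE);
	printf("random page -> %d (0 means reject, keep out of zswap)\n", n);

	free(src);
	free(dst);
	return 0;
}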