From nobody Sat Dec 27 01:29:27 2025
From: Chengming Zhou
Date: Tue, 26 Dec 2023 15:54:08 +0000
Subject: [PATCH v4 1/6] mm/zswap: change dstmem size to one page
Message-Id: <20231213-zswap-dstmem-v4-1-f228b059dd89@bytedance.com>
References: <20231213-zswap-dstmem-v4-0-f228b059dd89@bytedance.com>
In-Reply-To: <20231213-zswap-dstmem-v4-0-f228b059dd89@bytedance.com>
To: Andrew Morton, Seth Jennings, Johannes Weiner, Vitaly Wool, Nhat Pham, Chris Li, Yosry Ahmed, Dan Streetman
Cc: linux-kernel@vger.kernel.org, Chengming Zhou, linux-mm@kvack.org, Nhat Pham, Yosry Ahmed, Chris Li

Change the dstmem size from 2 * PAGE_SIZE to one page, since we need at
most one page when compressing, and "dlen" is also PAGE_SIZE in
acomp_request_set_params(). If the output size is larger than PAGE_SIZE
we don't want to store the output in zswap anyway. So change it to one
page, and delete the stale comment.

There is no history explaining why we needed 2 pages; it has been
2 * PAGE_SIZE since the time zswap was first merged. According to Yosry
and Nhat, one potential reason is that we used to store a zswap header
containing the swap entry in the compressed page for writeback purposes,
but we don't do that anymore.

This patch works well in kernel build testing even when the input data
doesn't compress at all (i.e. dlen == PAGE_SIZE), which we can see from
the bpftrace tool:

bpftrace -e 'k:zpool_malloc {@[(uint32)arg1==4096]=count()}'
@[1]: 2
@[0]: 12011430

Reviewed-by: Yosry Ahmed
Reviewed-by: Nhat Pham
Acked-by: Chris Li (Google)
Signed-off-by: Chengming Zhou
---
 mm/zswap.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 7ee54a3d8281..976f278aa507 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -707,7 +707,7 @@ static int zswap_dstmem_prepare(unsigned int cpu)
 	struct mutex *mutex;
 	u8 *dst;

-	dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
+	dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
 	if (!dst)
 		return -ENOMEM;

@@ -1662,8 +1662,7 @@ bool zswap_store(struct folio *folio)
 	sg_init_table(&input, 1);
 	sg_set_page(&input, page, PAGE_SIZE, 0);

-	/* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
-	sg_init_one(&output, dst, PAGE_SIZE * 2);
+	sg_init_one(&output, dst, PAGE_SIZE);
 	acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);
 	/*
 	 * it maybe looks a little bit silly that we send an asynchronous request,
-- 
b4 0.10.1

From nobody Sat Dec 27 01:29:27 2025
From: Chengming Zhou
Date: Tue, 26 Dec 2023 15:54:09 +0000
Subject: [PATCH v4 2/6] mm/zswap: reuse dstmem when decompress
Message-Id: <20231213-zswap-dstmem-v4-2-f228b059dd89@bytedance.com>
References: <20231213-zswap-dstmem-v4-0-f228b059dd89@bytedance.com>
In-Reply-To: <20231213-zswap-dstmem-v4-0-f228b059dd89@bytedance.com>
To: Andrew Morton, Seth Jennings, Johannes Weiner, Vitaly Wool, Nhat Pham, Chris Li, Yosry Ahmed, Dan Streetman
Cc: linux-kernel@vger.kernel.org, Chengming Zhou, linux-mm@kvack.org, Nhat Pham, Yosry Ahmed, Chris Li

In the !zpool_can_sleep_mapped() case, such as zsmalloc, we need to first
copy the entry->handle memory into a temporary buffer, which is currently
allocated with kmalloc on every decompression. We can instead reuse the
per-compressor dstmem and avoid that allocation, since it is per-CPU and
already protected by the per-CPU mutex.
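As a rough illustration of this buffer-reuse pattern, here is a standalone
userspace sketch (not the kernel code; the worker_ctx and use_with_* names
are invented for illustration): instead of allocating a scratch buffer per
operation, each context owns one preallocated page-sized buffer that is
only touched while its mutex is held.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* One context per worker, like the per-CPU acomp_ctx in zswap. */
struct worker_ctx {
	pthread_mutex_t lock;	/* serializes buffer use, like the per-CPU mutex */
	unsigned char *buffer;	/* preallocated scratch page, like dstmem */
};

static int worker_ctx_init(struct worker_ctx *ctx)
{
	ctx->buffer = malloc(PAGE_SIZE);
	if (!ctx->buffer)
		return -1;
	pthread_mutex_init(&ctx->lock, NULL);
	return 0;
}

/* Old scheme: allocate and free a temporary copy on every call. */
static void use_with_temp_alloc(const unsigned char *src, size_t len)
{
	unsigned char *tmp = malloc(len);

	if (!tmp)
		return;
	memcpy(tmp, src, len);
	/* ... decompress from tmp ... */
	free(tmp);
}

/* New scheme: reuse the per-context buffer under its lock. */
static void use_with_reused_buffer(struct worker_ctx *ctx,
				   const unsigned char *src, size_t len)
{
	pthread_mutex_lock(&ctx->lock);
	memcpy(ctx->buffer, src, len);
	/* ... decompress from ctx->buffer while the lock is held ... */
	pthread_mutex_unlock(&ctx->lock);
}

int main(void)
{
	struct worker_ctx ctx;
	unsigned char src[PAGE_SIZE] = { 1, 2, 3 };

	if (worker_ctx_init(&ctx))
		return 1;
	use_with_temp_alloc(src, sizeof(src));
	use_with_reused_buffer(&ctx, src, sizeof(src));
	free(ctx.buffer);
	return 0;
}

The diff below applies the same idea inside zswap: the copy target becomes
acomp_ctx->dstmem, and the mutex that already serializes compression is
taken before the copy.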
Reviewed-by: Nhat Pham Reviewed-by: Yosry Ahmed Acked-by: Chris Li (Google) Signed-off-by: Chengming Zhou --- mm/zswap.c | 44 ++++++++++++-------------------------------- 1 file changed, 12 insertions(+), 32 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 976f278aa507..6b872744e962 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1417,19 +1417,13 @@ static int zswap_writeback_entry(struct zswap_entry= *entry, struct crypto_acomp_ctx *acomp_ctx; struct zpool *pool =3D zswap_find_zpool(entry); bool page_was_allocated; - u8 *src, *tmp =3D NULL; + u8 *src; unsigned int dlen; int ret; struct writeback_control wbc =3D { .sync_mode =3D WB_SYNC_NONE, }; =20 - if (!zpool_can_sleep_mapped(pool)) { - tmp =3D kmalloc(PAGE_SIZE, GFP_KERNEL); - if (!tmp) - return -ENOMEM; - } - /* try to allocate swap cache page */ mpol =3D get_task_policy(current); page =3D __read_swap_cache_async(swpentry, GFP_KERNEL, mpol, @@ -1465,15 +1459,15 @@ static int zswap_writeback_entry(struct zswap_entry= *entry, /* decompress */ acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); dlen =3D PAGE_SIZE; + mutex_lock(acomp_ctx->mutex); =20 src =3D zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); if (!zpool_can_sleep_mapped(pool)) { - memcpy(tmp, src, entry->length); - src =3D tmp; + memcpy(acomp_ctx->dstmem, src, entry->length); + src =3D acomp_ctx->dstmem; zpool_unmap_handle(pool, entry->handle); } =20 - mutex_lock(acomp_ctx->mutex); sg_init_one(&input, src, entry->length); sg_init_table(&output, 1); sg_set_page(&output, page, PAGE_SIZE, 0); @@ -1482,9 +1476,7 @@ static int zswap_writeback_entry(struct zswap_entry *= entry, dlen =3D acomp_ctx->req->dlen; mutex_unlock(acomp_ctx->mutex); =20 - if (!zpool_can_sleep_mapped(pool)) - kfree(tmp); - else + if (zpool_can_sleep_mapped(pool)) zpool_unmap_handle(pool, entry->handle); =20 BUG_ON(ret); @@ -1508,9 +1500,6 @@ static int zswap_writeback_entry(struct zswap_entry *= entry, return ret; =20 fail: - if (!zpool_can_sleep_mapped(pool)) - kfree(tmp); - /* * If we get here because the page is already in swapcache, a * load may be happening concurrently. 
It is safe and okay to @@ -1771,7 +1760,7 @@ bool zswap_load(struct folio *folio) struct zswap_entry *entry; struct scatterlist input, output; struct crypto_acomp_ctx *acomp_ctx; - u8 *src, *dst, *tmp; + u8 *src, *dst; struct zpool *zpool; unsigned int dlen; bool ret; @@ -1796,26 +1785,19 @@ bool zswap_load(struct folio *folio) } =20 zpool =3D zswap_find_zpool(entry); - if (!zpool_can_sleep_mapped(zpool)) { - tmp =3D kmalloc(entry->length, GFP_KERNEL); - if (!tmp) { - ret =3D false; - goto freeentry; - } - } =20 /* decompress */ dlen =3D PAGE_SIZE; - src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); + acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); + mutex_lock(acomp_ctx->mutex); =20 + src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); if (!zpool_can_sleep_mapped(zpool)) { - memcpy(tmp, src, entry->length); - src =3D tmp; + memcpy(acomp_ctx->dstmem, src, entry->length); + src =3D acomp_ctx->dstmem; zpool_unmap_handle(zpool, entry->handle); } =20 - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - mutex_lock(acomp_ctx->mutex); sg_init_one(&input, src, entry->length); sg_init_table(&output, 1); sg_set_page(&output, page, PAGE_SIZE, 0); @@ -1826,15 +1808,13 @@ bool zswap_load(struct folio *folio) =20 if (zpool_can_sleep_mapped(zpool)) zpool_unmap_handle(zpool, entry->handle); - else - kfree(tmp); =20 ret =3D true; stats: count_vm_event(ZSWPIN); if (entry->objcg) count_objcg_event(entry->objcg, ZSWPIN); -freeentry: + spin_lock(&tree->lock); if (ret && zswap_exclusive_loads_enabled) { zswap_invalidate_entry(tree, entry); --=20 b4 0.10.1 From nobody Sat Dec 27 01:29:27 2025 Received: from out-187.mta0.migadu.com (out-187.mta0.migadu.com [91.218.175.187]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 532C751C32 for ; Tue, 26 Dec 2023 15:54:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Chengming Zhou Date: Tue, 26 Dec 2023 15:54:10 +0000 Subject: [PATCH v4 3/6] mm/zswap: refactor out __zswap_load() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v4-3-f228b059dd89@bytedance.com> References: <20231213-zswap-dstmem-v4-0-f228b059dd89@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v4-0-f228b059dd89@bytedance.com> To: Andrew Morton , Seth Jennings , Johannes Weiner , Vitaly Wool , Nhat Pham , Chris Li , Yosry Ahmed , Dan Streetman Cc: linux-kernel@vger.kernel.org, Chengming Zhou , linux-mm@kvack.org, Nhat Pham , Yosry Ahmed , Chris Li X-Developer-Signature: v=1; a=ed25519-sha256; t=1703606082; l=4752; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=rqaFBRN5T+K3BGykGqxbunVDkqGsdVLjPtenmvHXNns=; b=2whbnk++qzMgDfpxZEj6RN2GBW2XP2w/exd8KHJg/QUs3bQXx+0f1E7kPbE6TtAa+IGnjLVIP 4/8liyuX+0xCEjYzNGSZpQ+axcBpJmSX+TQmeVlzNI4BpBQ/XmcYob0 X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT The zswap_load() and zswap_writeback_entry() have the same part that decompress the data from zswap_entry to page, so refactor out the common part as __zswap_load(entry, page). Reviewed-by: Nhat Pham Reviewed-by: Yosry Ahmed Acked-by: Chris Li (Google) Signed-off-by: Chengming Zhou --- mm/zswap.c | 92 ++++++++++++++++++++++------------------------------------= ---- 1 file changed, 32 insertions(+), 60 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 6b872744e962..3433bd6b3cef 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1392,6 +1392,35 @@ static int zswap_enabled_param_set(const char *val, return ret; } =20 +static void __zswap_load(struct zswap_entry *entry, struct page *page) +{ + struct zpool *zpool =3D zswap_find_zpool(entry); + struct scatterlist input, output; + struct crypto_acomp_ctx *acomp_ctx; + u8 *src; + + acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); + mutex_lock(acomp_ctx->mutex); + + src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); + if (!zpool_can_sleep_mapped(zpool)) { + memcpy(acomp_ctx->dstmem, src, entry->length); + src =3D acomp_ctx->dstmem; + zpool_unmap_handle(zpool, entry->handle); + } + + sg_init_one(&input, src, entry->length); + sg_init_table(&output, 1); + sg_set_page(&output, page, PAGE_SIZE, 0); + acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, = PAGE_SIZE); + BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ct= x->wait)); + BUG_ON(acomp_ctx->req->dlen !=3D PAGE_SIZE); + mutex_unlock(acomp_ctx->mutex); + + if (zpool_can_sleep_mapped(zpool)) + zpool_unmap_handle(zpool, entry->handle); +} + /********************************* * writeback code **********************************/ @@ -1413,12 +1442,7 @@ static int zswap_writeback_entry(struct zswap_entry = *entry, swp_entry_t swpentry =3D entry->swpentry; struct page *page; struct mempolicy *mpol; - struct scatterlist input, output; - struct crypto_acomp_ctx *acomp_ctx; - struct zpool *pool =3D zswap_find_zpool(entry); bool page_was_allocated; - u8 *src; - unsigned int dlen; int ret; struct writeback_control wbc =3D { .sync_mode =3D WB_SYNC_NONE, @@ -1456,31 +1480,7 @@ static int zswap_writeback_entry(struct zswap_entry = *entry, } spin_unlock(&tree->lock); =20 - /* decompress */ - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - dlen =3D PAGE_SIZE; - 
mutex_lock(acomp_ctx->mutex); - - src =3D zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); - if (!zpool_can_sleep_mapped(pool)) { - memcpy(acomp_ctx->dstmem, src, entry->length); - src =3D acomp_ctx->dstmem; - zpool_unmap_handle(pool, entry->handle); - } - - sg_init_one(&input, src, entry->length); - sg_init_table(&output, 1); - sg_set_page(&output, page, PAGE_SIZE, 0); - acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, = dlen); - ret =3D crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_c= tx->wait); - dlen =3D acomp_ctx->req->dlen; - mutex_unlock(acomp_ctx->mutex); - - if (zpool_can_sleep_mapped(pool)) - zpool_unmap_handle(pool, entry->handle); - - BUG_ON(ret); - BUG_ON(dlen !=3D PAGE_SIZE); + __zswap_load(entry, page); =20 /* page is up to date */ SetPageUptodate(page); @@ -1758,11 +1758,7 @@ bool zswap_load(struct folio *folio) struct page *page =3D &folio->page; struct zswap_tree *tree =3D zswap_trees[type]; struct zswap_entry *entry; - struct scatterlist input, output; - struct crypto_acomp_ctx *acomp_ctx; - u8 *src, *dst; - struct zpool *zpool; - unsigned int dlen; + u8 *dst; bool ret; =20 VM_WARN_ON_ONCE(!folio_test_locked(folio)); @@ -1784,31 +1780,7 @@ bool zswap_load(struct folio *folio) goto stats; } =20 - zpool =3D zswap_find_zpool(entry); - - /* decompress */ - dlen =3D PAGE_SIZE; - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - mutex_lock(acomp_ctx->mutex); - - src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); - if (!zpool_can_sleep_mapped(zpool)) { - memcpy(acomp_ctx->dstmem, src, entry->length); - src =3D acomp_ctx->dstmem; - zpool_unmap_handle(zpool, entry->handle); - } - - sg_init_one(&input, src, entry->length); - sg_init_table(&output, 1); - sg_set_page(&output, page, PAGE_SIZE, 0); - acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, = dlen); - if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->= wait)) - WARN_ON(1); - mutex_unlock(acomp_ctx->mutex); - - if (zpool_can_sleep_mapped(zpool)) - zpool_unmap_handle(zpool, entry->handle); - + __zswap_load(entry, page); ret =3D true; stats: count_vm_event(ZSWPIN); --=20 b4 0.10.1 From nobody Sat Dec 27 01:29:27 2025 Received: from out-171.mta0.migadu.com (out-171.mta0.migadu.com [91.218.175.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 20A0652F67 for ; Tue, 26 Dec 2023 15:54:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Chengming Zhou
Date: Tue, 26 Dec 2023 15:54:11 +0000
Subject: [PATCH v4 4/6] mm/zswap: cleanup zswap_load()
Message-Id: <20231213-zswap-dstmem-v4-4-f228b059dd89@bytedance.com>
References: <20231213-zswap-dstmem-v4-0-f228b059dd89@bytedance.com>
In-Reply-To: <20231213-zswap-dstmem-v4-0-f228b059dd89@bytedance.com>
To: Andrew Morton, Seth Jennings, Johannes Weiner, Vitaly Wool, Nhat Pham, Chris Li, Yosry Ahmed, Dan Streetman
Cc: linux-kernel@vger.kernel.org, Chengming Zhou, linux-mm@kvack.org, Nhat Pham, Yosry Ahmed, Chris Li

After the common decompress part has been moved into __zswap_load(), we
can clean up zswap_load() a little.

Reviewed-by: Yosry Ahmed
Acked-by: Chris Li (Google)
Signed-off-by: Chengming Zhou
---
 mm/zswap.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 3433bd6b3cef..86886276cb81 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1759,7 +1759,6 @@ bool zswap_load(struct folio *folio)
 	struct zswap_tree *tree = zswap_trees[type];
 	struct zswap_entry *entry;
 	u8 *dst;
-	bool ret;

 	VM_WARN_ON_ONCE(!folio_test_locked(folio));

@@ -1776,19 +1775,16 @@ bool zswap_load(struct folio *folio)
 		dst = kmap_local_page(page);
 		zswap_fill_page(dst, entry->value);
 		kunmap_local(dst);
-		ret = true;
-		goto stats;
+	} else {
+		__zswap_load(entry, page);
 	}

-	__zswap_load(entry, page);
-	ret = true;
-stats:
 	count_vm_event(ZSWPIN);
 	if (entry->objcg)
 		count_objcg_event(entry->objcg, ZSWPIN);

 	spin_lock(&tree->lock);
-	if (ret && zswap_exclusive_loads_enabled) {
+	if (zswap_exclusive_loads_enabled) {
 		zswap_invalidate_entry(tree, entry);
 		folio_mark_dirty(folio);
 	} else if (entry->length) {
@@ -1798,7 +1794,7 @@ bool zswap_load(struct folio *folio)
 	zswap_entry_put(tree, entry);
 	spin_unlock(&tree->lock);

-	return ret;
+	return true;
 }

 void zswap_invalidate(int type, pgoff_t offset)
-- 
b4 0.10.1

From nobody Sat Dec 27 01:29:27 2025
From: Chengming Zhou Date: Tue, 26 Dec 2023 15:54:12 +0000 Subject: [PATCH v4 5/6] mm/zswap: cleanup zswap_writeback_entry() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v4-5-f228b059dd89@bytedance.com> References: <20231213-zswap-dstmem-v4-0-f228b059dd89@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v4-0-f228b059dd89@bytedance.com> To: Andrew Morton , Seth Jennings , Johannes Weiner , Vitaly Wool , Nhat Pham , Chris Li , Yosry Ahmed , Dan Streetman Cc: linux-kernel@vger.kernel.org, Chengming Zhou , linux-mm@kvack.org, Nhat Pham , Yosry Ahmed , Chris Li X-Developer-Signature: v=1; a=ed25519-sha256; t=1703606082; l=2338; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=tAV69Q+4us6Py1b2R2r8p5EYfYNSenhTMFhuOp7vNNQ=; b=afViGCtigOHtj8SqyE1/3tYO10W9zGnBEqlNj/WHy0NkwORwH+k6pwR/NJpbIHP3TQeqORo2h ERJwMflduKtC/okQR5pR1LwHwhvGLXlCdgX/gX53ZckJzqrZB76ELZ2 X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT Also after the common decompress part goes to __zswap_load(), we can cleanup the zswap_writeback_entry() a little. Reviewed-by: Yosry Ahmed Reviewed-by: Nhat Pham Acked-by: Chris Li (Google) Signed-off-by: Chengming Zhou --- mm/zswap.c | 29 ++++++++++------------------- 1 file changed, 10 insertions(+), 19 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 86886276cb81..40ee9f109f98 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1443,7 +1443,6 @@ static int zswap_writeback_entry(struct zswap_entry *= entry, struct page *page; struct mempolicy *mpol; bool page_was_allocated; - int ret; struct writeback_control wbc =3D { .sync_mode =3D WB_SYNC_NONE, }; @@ -1452,16 +1451,17 @@ static int zswap_writeback_entry(struct zswap_entry= *entry, mpol =3D get_task_policy(current); page =3D __read_swap_cache_async(swpentry, GFP_KERNEL, mpol, NO_INTERLEAVE_INDEX, &page_was_allocated, true); - if (!page) { - ret =3D -ENOMEM; - goto fail; - } + if (!page) + return -ENOMEM; =20 - /* Found an existing page, we raced with load/swapin */ + /* + * Found an existing page, we raced with load/swapin. We generally + * writeback cold pages from zswap, and swapin means the page just + * became hot. Skip this page and let the caller find another one. + */ if (!page_was_allocated) { put_page(page); - ret =3D -EEXIST; - goto fail; + return -EEXIST; } =20 /* @@ -1475,8 +1475,7 @@ static int zswap_writeback_entry(struct zswap_entry *= entry, if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) !=3D entr= y) { spin_unlock(&tree->lock); delete_from_swap_cache(page_folio(page)); - ret =3D -ENOMEM; - goto fail; + return -ENOMEM; } spin_unlock(&tree->lock); =20 @@ -1497,15 +1496,7 @@ static int zswap_writeback_entry(struct zswap_entry = *entry, __swap_writepage(page, &wbc); put_page(page); =20 - return ret; - -fail: - /* - * If we get here because the page is already in swapcache, a - * load may be happening concurrently. It is safe and okay to - * not free the entry. It is also okay to return !0. 
-	 */
-	return ret;
+	return 0;
 }

 static int zswap_is_page_same_filled(void *ptr, unsigned long *value)
-- 
b4 0.10.1

From nobody Sat Dec 27 01:29:27 2025
From: Chengming Zhou
Date: Tue, 26 Dec 2023 15:54:13 +0000
Subject: [PATCH v4 6/6] mm/zswap: change per-cpu mutex and buffer to per-acomp_ctx
Message-Id: <20231213-zswap-dstmem-v4-6-f228b059dd89@bytedance.com>
References: <20231213-zswap-dstmem-v4-0-f228b059dd89@bytedance.com>
In-Reply-To: <20231213-zswap-dstmem-v4-0-f228b059dd89@bytedance.com>
To: Andrew Morton, Seth Jennings, Johannes Weiner, Vitaly Wool, Nhat Pham, Chris Li, Yosry Ahmed, Dan Streetman
Cc: linux-kernel@vger.kernel.org, Chengming Zhou, linux-mm@kvack.org, Nhat Pham, Yosry Ahmed, Chris Li

First of all, we need to rename the acomp_ctx->dstmem field to buffer,
since we now use it for purposes other than compression.

Then we change the per-cpu mutex and buffer to be per-acomp_ctx, since
they belong to the acomp_ctx and are necessary parts of the
compress/decompress context. So we can remove the old per-cpu mutex and
dstmem.

Acked-by: Chris Li (Google)
Signed-off-by: Chengming Zhou
Reviewed-by: Nhat Pham
---
 include/linux/cpuhotplug.h |  1 -
 mm/zswap.c                 | 98 +++++++++++++-------------------------------
 2 files changed, 28 insertions(+), 71 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index efc0c0b07efb..c3e06e21766a 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -124,7 +124,6 @@ enum cpuhp_state {
 	CPUHP_ARM_BL_PREPARE,
 	CPUHP_TRACE_RB_PREPARE,
 	CPUHP_MM_ZS_PREPARE,
-	CPUHP_MM_ZSWP_MEM_PREPARE,
 	CPUHP_MM_ZSWP_POOL_PREPARE,
 	CPUHP_KVM_PPC_BOOK3S_PREPARE,
 	CPUHP_ZCOMP_PREPARE,
diff --git a/mm/zswap.c b/mm/zswap.c
index 40ee9f109f98..8014509736ad 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -166,8 +166,8 @@ struct crypto_acomp_ctx {
 	struct crypto_acomp *acomp;
 	struct acomp_req *req;
 	struct crypto_wait wait;
-	u8 *dstmem;
-	struct mutex *mutex;
+	u8 *buffer;
+	struct mutex mutex;
 };

 /*
@@ -694,63 +694,26 @@ static void zswap_alloc_shrinker(struct zswap_pool *pool)
 /*********************************
 * per-cpu code
 **********************************/
-static DEFINE_PER_CPU(u8 *, zswap_dstmem);
-/*
- * If users dynamically change the zpool type and compressor at runtime, i.e.
- * zswap is running, zswap can have more than one zpool on one cpu, but th= ey - * are sharing dtsmem. So we need this mutex to be per-cpu. - */ -static DEFINE_PER_CPU(struct mutex *, zswap_mutex); - -static int zswap_dstmem_prepare(unsigned int cpu) -{ - struct mutex *mutex; - u8 *dst; - - dst =3D kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu)); - if (!dst) - return -ENOMEM; - - mutex =3D kmalloc_node(sizeof(*mutex), GFP_KERNEL, cpu_to_node(cpu)); - if (!mutex) { - kfree(dst); - return -ENOMEM; - } - - mutex_init(mutex); - per_cpu(zswap_dstmem, cpu) =3D dst; - per_cpu(zswap_mutex, cpu) =3D mutex; - return 0; -} - -static int zswap_dstmem_dead(unsigned int cpu) -{ - struct mutex *mutex; - u8 *dst; - - mutex =3D per_cpu(zswap_mutex, cpu); - kfree(mutex); - per_cpu(zswap_mutex, cpu) =3D NULL; - - dst =3D per_cpu(zswap_dstmem, cpu); - kfree(dst); - per_cpu(zswap_dstmem, cpu) =3D NULL; - - return 0; -} - static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *nod= e) { struct zswap_pool *pool =3D hlist_entry(node, struct zswap_pool, node); struct crypto_acomp_ctx *acomp_ctx =3D per_cpu_ptr(pool->acomp_ctx, cpu); struct crypto_acomp *acomp; struct acomp_req *req; + int ret; + + mutex_init(&acomp_ctx->mutex); + + acomp_ctx->buffer =3D kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu= )); + if (!acomp_ctx->buffer) + return -ENOMEM; =20 acomp =3D crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu)); if (IS_ERR(acomp)) { pr_err("could not alloc crypto acomp %s : %ld\n", pool->tfm_name, PTR_ERR(acomp)); - return PTR_ERR(acomp); + ret =3D PTR_ERR(acomp); + goto acomp_fail; } acomp_ctx->acomp =3D acomp; =20 @@ -758,8 +721,8 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, str= uct hlist_node *node) if (!req) { pr_err("could not alloc crypto acomp_request %s\n", pool->tfm_name); - crypto_free_acomp(acomp_ctx->acomp); - return -ENOMEM; + ret =3D -ENOMEM; + goto req_fail; } acomp_ctx->req =3D req; =20 @@ -772,10 +735,13 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, s= truct hlist_node *node) acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG, crypto_req_done, &acomp_ctx->wait); =20 - acomp_ctx->mutex =3D per_cpu(zswap_mutex, cpu); - acomp_ctx->dstmem =3D per_cpu(zswap_dstmem, cpu); - return 0; + +req_fail: + crypto_free_acomp(acomp_ctx->acomp); +acomp_fail: + kfree(acomp_ctx->buffer); + return ret; } =20 static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node) @@ -788,6 +754,7 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct= hlist_node *node) acomp_request_free(acomp_ctx->req); if (!IS_ERR_OR_NULL(acomp_ctx->acomp)) crypto_free_acomp(acomp_ctx->acomp); + kfree(acomp_ctx->buffer); } =20 return 0; @@ -1400,12 +1367,12 @@ static void __zswap_load(struct zswap_entry *entry,= struct page *page) u8 *src; =20 acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - mutex_lock(acomp_ctx->mutex); + mutex_lock(&acomp_ctx->mutex); =20 src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); if (!zpool_can_sleep_mapped(zpool)) { - memcpy(acomp_ctx->dstmem, src, entry->length); - src =3D acomp_ctx->dstmem; + memcpy(acomp_ctx->buffer, src, entry->length); + src =3D acomp_ctx->buffer; zpool_unmap_handle(zpool, entry->handle); } =20 @@ -1415,7 +1382,7 @@ static void __zswap_load(struct zswap_entry *entry, s= truct page *page) acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, = PAGE_SIZE); BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ct= x->wait)); BUG_ON(acomp_ctx->req->dlen !=3D 
PAGE_SIZE); - mutex_unlock(acomp_ctx->mutex); + mutex_unlock(&acomp_ctx->mutex); =20 if (zpool_can_sleep_mapped(zpool)) zpool_unmap_handle(zpool, entry->handle); @@ -1636,9 +1603,9 @@ bool zswap_store(struct folio *folio) /* compress */ acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); =20 - mutex_lock(acomp_ctx->mutex); + mutex_lock(&acomp_ctx->mutex); =20 - dst =3D acomp_ctx->dstmem; + dst =3D acomp_ctx->buffer; sg_init_table(&input, 1); sg_set_page(&input, page, PAGE_SIZE, 0); =20 @@ -1681,7 +1648,7 @@ bool zswap_store(struct folio *folio) buf =3D zpool_map_handle(zpool, handle, ZPOOL_MM_WO); memcpy(buf, dst, dlen); zpool_unmap_handle(zpool, handle); - mutex_unlock(acomp_ctx->mutex); + mutex_unlock(&acomp_ctx->mutex); =20 /* populate entry */ entry->swpentry =3D swp_entry(type, offset); @@ -1724,7 +1691,7 @@ bool zswap_store(struct folio *folio) return true; =20 put_dstmem: - mutex_unlock(acomp_ctx->mutex); + mutex_unlock(&acomp_ctx->mutex); put_pool: zswap_pool_put(entry->pool); freepage: @@ -1899,13 +1866,6 @@ static int zswap_setup(void) goto cache_fail; } =20 - ret =3D cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare", - zswap_dstmem_prepare, zswap_dstmem_dead); - if (ret) { - pr_err("dstmem alloc failed\n"); - goto dstmem_fail; - } - ret =3D cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE, "mm/zswap_pool:prepare", zswap_cpu_comp_prepare, @@ -1937,8 +1897,6 @@ static int zswap_setup(void) if (pool) zswap_pool_destroy(pool); hp_fail: - cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE); -dstmem_fail: kmem_cache_destroy(zswap_entry_cache); cache_fail: /* if built-in, we aren't unloaded on failure; don't allow use */ --=20 b4 0.10.1
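For reference, here is a condensed userspace sketch of the shape this
series converges on (not the kernel code; comp_ctx and comp_handle are
invented stand-ins for crypto_acomp_ctx and its crypto members): the
scratch buffer and its mutex live inside the per-compressor context and
are set up and torn down together, with goto-based unwinding on failure.

#include <pthread.h>
#include <stdlib.h>

#define PAGE_SIZE 4096

struct comp_handle { int dummy; };	/* stand-in for the acomp request */

struct comp_ctx {
	struct comp_handle *req;
	unsigned char *buffer;	/* was the separate per-cpu dstmem */
	pthread_mutex_t mutex;	/* was the separate per-cpu mutex */
};

static int comp_ctx_prepare(struct comp_ctx *ctx)
{
	pthread_mutex_init(&ctx->mutex, NULL);

	ctx->buffer = malloc(PAGE_SIZE);
	if (!ctx->buffer)
		return -1;

	ctx->req = malloc(sizeof(*ctx->req));
	if (!ctx->req)
		goto req_fail;

	return 0;

req_fail:
	free(ctx->buffer);
	return -1;
}

static void comp_ctx_dead(struct comp_ctx *ctx)
{
	free(ctx->req);
	free(ctx->buffer);
	pthread_mutex_destroy(&ctx->mutex);
}

int main(void)
{
	struct comp_ctx ctx;

	if (comp_ctx_prepare(&ctx))
		return 1;
	comp_ctx_dead(&ctx);
	return 0;
}

Bundling the buffer and mutex into the context is what lets the series
drop the separate CPUHP_MM_ZSWP_MEM_PREPARE hotplug step: everything is
now allocated in zswap_cpu_comp_prepare() and freed in
zswap_cpu_comp_dead().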