From nobody Sat Dec 27 17:04:42 2025 Received: from out-176.mta1.migadu.com (out-176.mta1.migadu.com [95.215.58.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 640E712B61 for ; Mon, 18 Dec 2023 08:22:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Chengming Zhou Date: Mon, 18 Dec 2023 08:22:00 +0000 Subject: [PATCH v2 1/6] mm/zswap: change dstmem size to one page Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v2-1-daa5d9ae41a7@bytedance.com> References: <20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com> To: Seth Jennings , Dan Streetman , Chris Li , Nhat Pham , Vitaly Wool , Yosry Ahmed , Andrew Morton , Johannes Weiner Cc: linux-kernel@vger.kernel.org, Nhat Pham , linux-mm@kvack.org, Yosry Ahmed , Chengming Zhou , Chris Li X-Developer-Signature: v=1; a=ed25519-sha256; t=1702887745; l=2012; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=knbPWkl1sBJOnq5pAv9yoWZfZi+/5lFRxFAsoTFiAeI=; b=wLIslRILYEE+m1faC45+NiuKmvx2YOrqYhYxcwRKSBnB/cOfAH9ePkRU4ZggKvwoWRd8ApqOE G59vRbwGIL6Cj27KS5MZI/t6FgZ7RE6FdEjYEhi/qQPb2ilNgqbCYBz X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT Change the dstmem size from 2 * PAGE_SIZE to only one page since we only need at most one page when compress, and the "dlen" is also PAGE_SIZE in 
acomp_request_set_params(). If the output size > PAGE_SIZE we don't want to store the output in zswap anyway. So change it to one page, and delete the stale comment. There is no recorded history of why 2 pages were needed; it has been 2 * PAGE_SIZE since the time zswap was first merged. According to Yosry and Nhat, one potential reason is that we used to store a zswap header containing the swap entry in the compressed page for writeback purposes, but we don't do that anymore. This patch works well in kernel build testing even when the input data doesn't compress at all (i.e. dlen =3D=3D PAGE_SIZE), which we can see from the bpftrace tool: bpftrace -e 'k:zpool_malloc {@[(uint32)arg1=3D=3D4096]=3Dcount()}' @[1]: 2 @[0]: 12011430 Reviewed-by: Yosry Ahmed Reviewed-by: Nhat Pham Signed-off-by: Chengming Zhou Acked-by: Chris Li (Google) --- mm/zswap.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 7ee54a3d8281..976f278aa507 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -707,7 +707,7 @@ static int zswap_dstmem_prepare(unsigned int cpu) struct mutex *mutex; u8 *dst; =20 - dst =3D kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu)); + dst =3D kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu)); if (!dst) return -ENOMEM; =20 @@ -1662,8 +1662,7 @@ bool zswap_store(struct folio *folio) sg_init_table(&input, 1); sg_set_page(&input, page, PAGE_SIZE, 0); =20 - /* zswap_dstmem is of size (PAGE_SIZE * 2).
Reflect same in sg_list */ - sg_init_one(&output, dst, PAGE_SIZE * 2); + sg_init_one(&output, dst, PAGE_SIZE); acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen= ); /* * it maybe looks a little bit silly that we send an asynchronous request, --=20 b4 0.10.1 From nobody Sat Dec 27 17:04:42 2025 Received: from out-185.mta1.migadu.com (out-185.mta1.migadu.com [95.215.58.185]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CCD8A134B4 for ; Mon, 18 Dec 2023 08:22:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Chengming Zhou Date: Mon, 18 Dec 2023 08:22:01 +0000 Subject: [PATCH v2 2/6] mm/zswap: reuse dstmem when decompress Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v2-2-daa5d9ae41a7@bytedance.com> References: <20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com> To: Seth Jennings , Dan Streetman , Chris Li , Nhat Pham , Vitaly Wool , Yosry Ahmed , Andrew Morton , Johannes Weiner Cc: linux-kernel@vger.kernel.org, Nhat Pham , linux-mm@kvack.org, Yosry Ahmed , Chengming Zhou , Chris Li X-Developer-Signature: v=1; a=ed25519-sha256; t=1702887745; l=4229; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=aB+tFXtYFtFovOCKjFmKhfN20WzOhqogX+a8O/o262s=; b=Kd7hGtuBRqJ6usbhtM5mZ4Sj4K7wMC6fwtx+Dp2KUGxlw6etTViNm+SeBpEiBp80WtiJS41o8 B1a8dk6TqidDIJodwMtroZsMduENCYRUKjYDOAzxK6wHZnm5C2hLLQ7 X-Developer-Key: 
i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT In the !zpool_can_sleep_mapped() case such as zsmalloc, we need to first copy the entry->handle memory to a temporary memory, which is allocated using kmalloc. Obviously we can reuse the per-compressor dstmem to avoid allocating every time, since it's percpu-compressor and protected in percpu mutex. Reviewed-by: Nhat Pham Acked-by: Chris Li Signed-off-by: Chengming Zhou Reviewed-by: Yosry Ahmed --- mm/zswap.c | 44 ++++++++++++-------------------------------- 1 file changed, 12 insertions(+), 32 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 976f278aa507..6b872744e962 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1417,19 +1417,13 @@ static int zswap_writeback_entry(struct zswap_entry= *entry, struct crypto_acomp_ctx *acomp_ctx; struct zpool *pool =3D zswap_find_zpool(entry); bool page_was_allocated; - u8 *src, *tmp =3D NULL; + u8 *src; unsigned int dlen; int ret; struct writeback_control wbc =3D { .sync_mode =3D WB_SYNC_NONE, }; =20 - if (!zpool_can_sleep_mapped(pool)) { - tmp =3D kmalloc(PAGE_SIZE, GFP_KERNEL); - if (!tmp) - return -ENOMEM; - } - /* try to allocate swap cache page */ mpol =3D get_task_policy(current); page =3D __read_swap_cache_async(swpentry, GFP_KERNEL, mpol, @@ -1465,15 +1459,15 @@ static int zswap_writeback_entry(struct zswap_entry= *entry, /* decompress */ acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); dlen =3D PAGE_SIZE; + mutex_lock(acomp_ctx->mutex); =20 src =3D zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); if (!zpool_can_sleep_mapped(pool)) { - memcpy(tmp, src, entry->length); - src =3D tmp; + memcpy(acomp_ctx->dstmem, src, entry->length); + src =3D acomp_ctx->dstmem; zpool_unmap_handle(pool, entry->handle); } =20 - mutex_lock(acomp_ctx->mutex); sg_init_one(&input, src, entry->length); sg_init_table(&output, 1); sg_set_page(&output, page, PAGE_SIZE, 0); @@ -1482,9 +1476,7 @@ static int 
zswap_writeback_entry(struct zswap_entry *= entry, dlen =3D acomp_ctx->req->dlen; mutex_unlock(acomp_ctx->mutex); =20 - if (!zpool_can_sleep_mapped(pool)) - kfree(tmp); - else + if (zpool_can_sleep_mapped(pool)) zpool_unmap_handle(pool, entry->handle); =20 BUG_ON(ret); @@ -1508,9 +1500,6 @@ static int zswap_writeback_entry(struct zswap_entry *= entry, return ret; =20 fail: - if (!zpool_can_sleep_mapped(pool)) - kfree(tmp); - /* * If we get here because the page is already in swapcache, a * load may be happening concurrently. It is safe and okay to @@ -1771,7 +1760,7 @@ bool zswap_load(struct folio *folio) struct zswap_entry *entry; struct scatterlist input, output; struct crypto_acomp_ctx *acomp_ctx; - u8 *src, *dst, *tmp; + u8 *src, *dst; struct zpool *zpool; unsigned int dlen; bool ret; @@ -1796,26 +1785,19 @@ bool zswap_load(struct folio *folio) } =20 zpool =3D zswap_find_zpool(entry); - if (!zpool_can_sleep_mapped(zpool)) { - tmp =3D kmalloc(entry->length, GFP_KERNEL); - if (!tmp) { - ret =3D false; - goto freeentry; - } - } =20 /* decompress */ dlen =3D PAGE_SIZE; - src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); + acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); + mutex_lock(acomp_ctx->mutex); =20 + src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); if (!zpool_can_sleep_mapped(zpool)) { - memcpy(tmp, src, entry->length); - src =3D tmp; + memcpy(acomp_ctx->dstmem, src, entry->length); + src =3D acomp_ctx->dstmem; zpool_unmap_handle(zpool, entry->handle); } =20 - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - mutex_lock(acomp_ctx->mutex); sg_init_one(&input, src, entry->length); sg_init_table(&output, 1); sg_set_page(&output, page, PAGE_SIZE, 0); @@ -1826,15 +1808,13 @@ bool zswap_load(struct folio *folio) =20 if (zpool_can_sleep_mapped(zpool)) zpool_unmap_handle(zpool, entry->handle); - else - kfree(tmp); =20 ret =3D true; stats: count_vm_event(ZSWPIN); if (entry->objcg) count_objcg_event(entry->objcg, ZSWPIN); -freeentry: + 
spin_lock(&tree->lock); if (ret && zswap_exclusive_loads_enabled) { zswap_invalidate_entry(tree, entry); --=20 b4 0.10.1 From nobody Sat Dec 27 17:04:42 2025 Received: from out-179.mta1.migadu.com (out-179.mta1.migadu.com [95.215.58.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9BC5D13ADC for ; Mon, 18 Dec 2023 08:22:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Chengming Zhou Date: Mon, 18 Dec 2023 08:22:02 +0000 Subject: [PATCH v2 3/6] mm/zswap: refactor out __zswap_load() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v2-3-daa5d9ae41a7@bytedance.com> References: <20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com> To: Seth Jennings , Dan Streetman , Chris Li , Nhat Pham , Vitaly Wool , Yosry Ahmed , Andrew Morton , Johannes Weiner Cc: linux-kernel@vger.kernel.org, Nhat Pham , linux-mm@kvack.org, Yosry Ahmed , Chengming Zhou , Chris Li X-Developer-Signature: v=1; a=ed25519-sha256; t=1702887745; l=4703; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=B99wC6AH9+R5m6Djw0+W/AJA7ag/bUbsLCKvYrrlo8Q=; b=i4FV3NiKGGQh3MrHNtbjPVINB+qPumTYyPJyDJWpxDhI64MoLtB2CBM8elGupDabajgL3lEDR 6iddEsOkX1UA09nUGr1s1pdovfIltzGUWgVzu3xtbR6x9alb5Q0sCF3 X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT The zswap_load() and zswap_writeback_entry() 
have the same part that decompress the data from zswap_entry to page, so refactor out the common part as __zswap_load(entry, page). Reviewed-by: Nhat Pham Reviewed-by: Yosry Ahmed Signed-off-by: Chengming Zhou --- mm/zswap.c | 92 ++++++++++++++++++++++------------------------------------= ---- 1 file changed, 32 insertions(+), 60 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 6b872744e962..3433bd6b3cef 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1392,6 +1392,35 @@ static int zswap_enabled_param_set(const char *val, return ret; } =20 +static void __zswap_load(struct zswap_entry *entry, struct page *page) +{ + struct zpool *zpool =3D zswap_find_zpool(entry); + struct scatterlist input, output; + struct crypto_acomp_ctx *acomp_ctx; + u8 *src; + + acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); + mutex_lock(acomp_ctx->mutex); + + src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); + if (!zpool_can_sleep_mapped(zpool)) { + memcpy(acomp_ctx->dstmem, src, entry->length); + src =3D acomp_ctx->dstmem; + zpool_unmap_handle(zpool, entry->handle); + } + + sg_init_one(&input, src, entry->length); + sg_init_table(&output, 1); + sg_set_page(&output, page, PAGE_SIZE, 0); + acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, = PAGE_SIZE); + BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ct= x->wait)); + BUG_ON(acomp_ctx->req->dlen !=3D PAGE_SIZE); + mutex_unlock(acomp_ctx->mutex); + + if (zpool_can_sleep_mapped(zpool)) + zpool_unmap_handle(zpool, entry->handle); +} + /********************************* * writeback code **********************************/ @@ -1413,12 +1442,7 @@ static int zswap_writeback_entry(struct zswap_entry = *entry, swp_entry_t swpentry =3D entry->swpentry; struct page *page; struct mempolicy *mpol; - struct scatterlist input, output; - struct crypto_acomp_ctx *acomp_ctx; - struct zpool *pool =3D zswap_find_zpool(entry); bool page_was_allocated; - u8 *src; - unsigned int dlen; int ret; struct 
writeback_control wbc =3D { .sync_mode =3D WB_SYNC_NONE, @@ -1456,31 +1480,7 @@ static int zswap_writeback_entry(struct zswap_entry = *entry, } spin_unlock(&tree->lock); =20 - /* decompress */ - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - dlen =3D PAGE_SIZE; - mutex_lock(acomp_ctx->mutex); - - src =3D zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); - if (!zpool_can_sleep_mapped(pool)) { - memcpy(acomp_ctx->dstmem, src, entry->length); - src =3D acomp_ctx->dstmem; - zpool_unmap_handle(pool, entry->handle); - } - - sg_init_one(&input, src, entry->length); - sg_init_table(&output, 1); - sg_set_page(&output, page, PAGE_SIZE, 0); - acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, = dlen); - ret =3D crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_c= tx->wait); - dlen =3D acomp_ctx->req->dlen; - mutex_unlock(acomp_ctx->mutex); - - if (zpool_can_sleep_mapped(pool)) - zpool_unmap_handle(pool, entry->handle); - - BUG_ON(ret); - BUG_ON(dlen !=3D PAGE_SIZE); + __zswap_load(entry, page); =20 /* page is up to date */ SetPageUptodate(page); @@ -1758,11 +1758,7 @@ bool zswap_load(struct folio *folio) struct page *page =3D &folio->page; struct zswap_tree *tree =3D zswap_trees[type]; struct zswap_entry *entry; - struct scatterlist input, output; - struct crypto_acomp_ctx *acomp_ctx; - u8 *src, *dst; - struct zpool *zpool; - unsigned int dlen; + u8 *dst; bool ret; =20 VM_WARN_ON_ONCE(!folio_test_locked(folio)); @@ -1784,31 +1780,7 @@ bool zswap_load(struct folio *folio) goto stats; } =20 - zpool =3D zswap_find_zpool(entry); - - /* decompress */ - dlen =3D PAGE_SIZE; - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - mutex_lock(acomp_ctx->mutex); - - src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); - if (!zpool_can_sleep_mapped(zpool)) { - memcpy(acomp_ctx->dstmem, src, entry->length); - src =3D acomp_ctx->dstmem; - zpool_unmap_handle(zpool, entry->handle); - } - - sg_init_one(&input, src, entry->length); - 
sg_init_table(&output, 1); - sg_set_page(&output, page, PAGE_SIZE, 0); - acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, = dlen); - if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->= wait)) - WARN_ON(1); - mutex_unlock(acomp_ctx->mutex); - - if (zpool_can_sleep_mapped(zpool)) - zpool_unmap_handle(zpool, entry->handle); - + __zswap_load(entry, page); ret =3D true; stats: count_vm_event(ZSWPIN); --=20 b4 0.10.1 From nobody Sat Dec 27 17:04:42 2025 Received: from out-177.mta1.migadu.com (out-177.mta1.migadu.com [95.215.58.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4152713FFF for ; Mon, 18 Dec 2023 08:22:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Chengming Zhou Date: Mon, 18 Dec 2023 08:22:03 +0000 Subject: [PATCH v2 4/6] mm/zswap: cleanup zswap_load() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v2-4-daa5d9ae41a7@bytedance.com> References: <20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com> To: Seth Jennings , Dan Streetman , Chris Li , Nhat Pham , Vitaly Wool , Yosry Ahmed , Andrew Morton , Johannes Weiner Cc: linux-kernel@vger.kernel.org, Nhat Pham , linux-mm@kvack.org, Yosry Ahmed , Chengming Zhou , Chris Li X-Developer-Signature: v=1; a=ed25519-sha256; t=1702887745; l=1470; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=1eOM2WlTMMQMFmZcsE6sU450y2r77yOSYBMqqAB7HFE=; b=YOV157KveESD/iXPiDgVC9WaHMxuNJOt5qyf4ENQkrp7X4z9UFu4fkqe1LUaOnVs8aXOn3lks OXH9Dwct4c+At1moBhyAct1e80egxjcQg0Enm77kPeVULGpQ2btQ6ci X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT After the common decompress part goes to __zswap_load(), we can cleanup the zswap_load() a little. 
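The branch this cleanup reshapes is the same-filled fast path: when entry->length == 0, zswap never stored compressed data, only one repeated word in entry->value, and load simply refills the page. A minimal userspace sketch of that idea (toy names and a scaled-down page size, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_WORDS 512  /* stand-in for PAGE_SIZE / sizeof(unsigned long) */

/* Return true and record the word if every word in the page is identical,
 * mirroring the check zswap_is_page_same_filled() does before storing. */
static bool page_same_filled(const unsigned long *page, unsigned long *value)
{
	for (size_t i = 1; i < TOY_PAGE_WORDS; i++)
		if (page[i] != page[0])
			return false;
	*value = page[0];
	return true;
}

/* Reconstruct the page from the single stored word, like zswap_fill_page(). */
static void fill_page(unsigned long *page, unsigned long value)
{
	for (size_t i = 0; i < TOY_PAGE_WORDS; i++)
		page[i] = value;
}
```

On load, the entry->length == 0 case takes the fill path and everything else goes through __zswap_load(), which is why the function can end with an unconditional return true.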
Reviewed-by: Yosry Ahmed Signed-off-by: Chengming Zhou --- mm/zswap.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 3433bd6b3cef..86886276cb81 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1759,7 +1759,6 @@ bool zswap_load(struct folio *folio) struct zswap_tree *tree =3D zswap_trees[type]; struct zswap_entry *entry; u8 *dst; - bool ret; =20 VM_WARN_ON_ONCE(!folio_test_locked(folio)); =20 @@ -1776,19 +1775,16 @@ bool zswap_load(struct folio *folio) dst =3D kmap_local_page(page); zswap_fill_page(dst, entry->value); kunmap_local(dst); - ret =3D true; - goto stats; + } else { + __zswap_load(entry, page); } =20 - __zswap_load(entry, page); - ret =3D true; -stats: count_vm_event(ZSWPIN); if (entry->objcg) count_objcg_event(entry->objcg, ZSWPIN); =20 spin_lock(&tree->lock); - if (ret && zswap_exclusive_loads_enabled) { + if (zswap_exclusive_loads_enabled) { zswap_invalidate_entry(tree, entry); folio_mark_dirty(folio); } else if (entry->length) { @@ -1798,7 +1794,7 @@ bool zswap_load(struct folio *folio) zswap_entry_put(tree, entry); spin_unlock(&tree->lock); =20 - return ret; + return true; } =20 void zswap_invalidate(int type, pgoff_t offset) --=20 b4 0.10.1 From nobody Sat Dec 27 17:04:42 2025 Received: from out-183.mta1.migadu.com (out-183.mta1.migadu.com [95.215.58.183]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C01D8107BA for ; Mon, 18 Dec 2023 08:22:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Chengming Zhou Date: Mon, 18 Dec 2023 08:22:04 +0000 Subject: [PATCH v2 5/6] mm/zswap: cleanup zswap_writeback_entry() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v2-5-daa5d9ae41a7@bytedance.com> References: <20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com> To: Seth Jennings , Dan Streetman , Chris Li , Nhat Pham , Vitaly Wool , Yosry Ahmed , Andrew Morton , Johannes Weiner Cc: linux-kernel@vger.kernel.org, Nhat Pham , linux-mm@kvack.org, Yosry Ahmed , Chengming Zhou , Chris Li X-Developer-Signature: v=1; a=ed25519-sha256; t=1702887745; l=2215; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=u9D8B3ksO9G67avNjANTAO3qS+/SASwDD1QW2eXuPmA=; b=Y/k0K3dE80CEovxET6orobtOyqyRJJHYc5i0dttC7jiXtrRjSziXVF7ngHADT55LxnZB0hdN9 VU8d58ziWvID/Hp36IATF7GQgb3Z0riqtM9YUjDT6aMJe1wiTZkqkT7 X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT Also after the common decompress part goes to __zswap_load(), we can cleanup the zswap_writeback_entry() a little. 
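The fail label only existed to free the temporary buffer that an earlier patch in this series removed; with nothing left to clean up, each failure site can return its error code directly. A toy illustration of the same transformation (hypothetical function and flags, not the kernel code):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Before: error paths jumped to a label that freed a temp buffer.
 * After: with no buffer to free, every failure returns immediately. */
static int toy_writeback(bool got_page, bool page_was_allocated,
			 bool still_in_tree)
{
	if (!got_page)
		return -ENOMEM;		/* swap cache allocation failed */
	if (!page_was_allocated)
		return -EEXIST;		/* raced with load/swapin */
	if (!still_in_tree)
		return -ENOMEM;		/* entry was invalidated under us */
	/* ... decompress and submit the write here ... */
	return 0;
}
```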
Reviewed-by: Yosry Ahmed Reviewed-by: Nhat Pham Signed-off-by: Chengming Zhou --- mm/zswap.c | 25 +++++++++---------------- 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 86886276cb81..2c349fd88904 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1443,7 +1443,6 @@ static int zswap_writeback_entry(struct zswap_entry *= entry, struct page *page; struct mempolicy *mpol; bool page_was_allocated; - int ret; struct writeback_control wbc =3D { .sync_mode =3D WB_SYNC_NONE, }; @@ -1453,15 +1452,18 @@ static int zswap_writeback_entry(struct zswap_entry= *entry, page =3D __read_swap_cache_async(swpentry, GFP_KERNEL, mpol, NO_INTERLEAVE_INDEX, &page_was_allocated, true); if (!page) { - ret =3D -ENOMEM; - goto fail; + /* + * If we get here because the page is already in swapcache, a + * load may be happening concurrently. It is safe and okay to + * not free the entry. It is also okay to return !0. + */ + return -ENOMEM; } =20 /* Found an existing page, we raced with load/swapin */ if (!page_was_allocated) { put_page(page); - ret =3D -EEXIST; - goto fail; + return -EEXIST; } =20 /* @@ -1475,8 +1477,7 @@ static int zswap_writeback_entry(struct zswap_entry *= entry, if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) !=3D entr= y) { spin_unlock(&tree->lock); delete_from_swap_cache(page_folio(page)); - ret =3D -ENOMEM; - goto fail; + return -ENOMEM; } spin_unlock(&tree->lock); =20 @@ -1497,15 +1498,7 @@ static int zswap_writeback_entry(struct zswap_entry = *entry, __swap_writepage(page, &wbc); put_page(page); =20 - return ret; - -fail: - /* - * If we get here because the page is already in swapcache, a - * load may be happening concurrently. It is safe and okay to - * not free the entry. It is also okay to return !0. 
- */ - return ret; + return 0; } =20 static int zswap_is_page_same_filled(void *ptr, unsigned long *value) --=20 b4 0.10.1 From nobody Sat Dec 27 17:04:42 2025 Received: from out-182.mta1.migadu.com (out-182.mta1.migadu.com [95.215.58.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5FCDE111A2 for ; Mon, 18 Dec 2023 08:22:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Chengming Zhou Date: Mon, 18 Dec 2023 08:22:05 +0000 Subject: [PATCH v2 6/6] mm/zswap: directly use percpu mutex and buffer in load/store Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20231213-zswap-dstmem-v2-6-daa5d9ae41a7@bytedance.com> References: <20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com> In-Reply-To: <20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com> To: Seth Jennings , Dan Streetman , Chris Li , Nhat Pham , Vitaly Wool , Yosry Ahmed , Andrew Morton , Johannes Weiner Cc: linux-kernel@vger.kernel.org, Nhat Pham , linux-mm@kvack.org, Yosry Ahmed , Chengming Zhou , Chris Li X-Developer-Signature: v=1; a=ed25519-sha256; t=1702887745; l=10608; i=zhouchengming@bytedance.com; s=20231204; h=from:subject:message-id; bh=k6WkjFerLmMPPnB4BHXU0oLODY4+tgFUEVqHpaqy/GE=; b=L3QlA6B3Z8gMjlH/tKC9DFpE0gONp9++QxZWqj4jttxtA28i2k06FoJnhzPXl2dTnUCRe0O2I /yfAOqL4yTTBVAWzMzyY/vp/6g77noItOcvUhDa+h91UaW5hpum/Bxo X-Developer-Key: i=zhouchengming@bytedance.com; a=ed25519; pk=xFTmRtMG3vELGJBUiml7OYNdM393WOMv0iWWeQEVVdA= X-Migadu-Flow: FLOW_OUT Since the introduce of 
reusing the dstmem in the load path, it seems confusing that we are now using acomp_ctx->dstmem and acomp_ctx->mutex now for purposes other than what the naming suggests. Yosry suggested removing these two fields from acomp_ctx, and directly using zswap_dstmem and zswap_mutex in both the load and store paths, rename them, and add proper comments above their definitions that they are for generic percpu buffering on the load and store paths. So this patch remove dstmem and mutex from acomp_ctx, and rename the zswap_dstmem to zswap_buffer, using the percpu mutex and buffer on the load and store paths. And refactor out __zswap_store() to only include the compress & store, since I found zswap_store() is too long. Suggested-by: Yosry Ahmed Signed-off-by: Chengming Zhou --- mm/zswap.c | 193 +++++++++++++++++++++++++++++++++------------------------= ---- 1 file changed, 104 insertions(+), 89 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 2c349fd88904..b7449294ec3a 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -166,8 +166,6 @@ struct crypto_acomp_ctx { struct crypto_acomp *acomp; struct acomp_req *req; struct crypto_wait wait; - u8 *dstmem; - struct mutex *mutex; }; =20 /* @@ -694,7 +692,7 @@ static void zswap_alloc_shrinker(struct zswap_pool *poo= l) /********************************* * per-cpu code **********************************/ -static DEFINE_PER_CPU(u8 *, zswap_dstmem); +static DEFINE_PER_CPU(u8 *, zswap_buffer); /* * If users dynamically change the zpool type and compressor at runtime, i= .e. 
* zswap is running, zswap can have more than one zpool on one cpu, but th= ey @@ -702,39 +700,39 @@ static DEFINE_PER_CPU(u8 *, zswap_dstmem); */ static DEFINE_PER_CPU(struct mutex *, zswap_mutex); =20 -static int zswap_dstmem_prepare(unsigned int cpu) +static int zswap_buffer_prepare(unsigned int cpu) { struct mutex *mutex; - u8 *dst; + u8 *buf; =20 - dst =3D kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu)); - if (!dst) + buf =3D kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu)); + if (!buf) return -ENOMEM; =20 mutex =3D kmalloc_node(sizeof(*mutex), GFP_KERNEL, cpu_to_node(cpu)); if (!mutex) { - kfree(dst); + kfree(buf); return -ENOMEM; } =20 mutex_init(mutex); - per_cpu(zswap_dstmem, cpu) =3D dst; + per_cpu(zswap_buffer, cpu) =3D buf; per_cpu(zswap_mutex, cpu) =3D mutex; return 0; } =20 -static int zswap_dstmem_dead(unsigned int cpu) +static int zswap_buffer_dead(unsigned int cpu) { struct mutex *mutex; - u8 *dst; + u8 *buf; =20 mutex =3D per_cpu(zswap_mutex, cpu); kfree(mutex); per_cpu(zswap_mutex, cpu) =3D NULL; =20 - dst =3D per_cpu(zswap_dstmem, cpu); - kfree(dst); - per_cpu(zswap_dstmem, cpu) =3D NULL; + buf =3D per_cpu(zswap_buffer, cpu); + kfree(buf); + per_cpu(zswap_buffer, cpu) =3D NULL; =20 return 0; } @@ -772,9 +770,6 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, str= uct hlist_node *node) acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG, crypto_req_done, &acomp_ctx->wait); =20 - acomp_ctx->mutex =3D per_cpu(zswap_mutex, cpu); - acomp_ctx->dstmem =3D per_cpu(zswap_dstmem, cpu); - return 0; } =20 @@ -1392,20 +1387,98 @@ static int zswap_enabled_param_set(const char *val, return ret; } =20 +static int __zswap_store(struct zswap_entry *entry, struct page *page) +{ + struct scatterlist input, output; + struct crypto_acomp_ctx *acomp_ctx; + struct zpool *zpool; + unsigned long handle; + unsigned int dlen; + u8 *buf, *dst; + gfp_t gfp; + int ret; + int cpu; + struct mutex *mutex; + + cpu =3D raw_smp_processor_id(); + mutex =3D 
per_cpu(zswap_mutex, cpu); + mutex_lock(mutex); + + acomp_ctx =3D per_cpu_ptr(entry->pool->acomp_ctx, cpu); + buf =3D per_cpu(zswap_buffer, cpu); + + sg_init_table(&input, 1); + sg_set_page(&input, page, PAGE_SIZE, 0); + sg_init_one(&output, buf, PAGE_SIZE); + acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, PAGE= _SIZE); + /* + * it maybe looks a little bit silly that we send an asynchronous request, + * then wait for its completion synchronously. This makes the process look + * synchronous in fact. + * Theoretically, acomp supports users send multiple acomp requests in one + * acomp instance, then get those requests done simultaneously. but in th= is + * case, zswap actually does store and load page by page, there is no + * existing method to send the second page before the first page is done + * in one thread doing zwap. + * but in different threads running on different cpu, we have different + * acomp instance, so multiple threads can do (de)compression in parallel. 
+ */ + ret =3D crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx= ->wait); + dlen =3D acomp_ctx->req->dlen; + + if (ret) { + zswap_reject_compress_fail++; + goto unlock; + } + + /* store */ + zpool =3D zswap_find_zpool(entry); + gfp =3D __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM; + if (zpool_malloc_support_movable(zpool)) + gfp |=3D __GFP_HIGHMEM | __GFP_MOVABLE; + ret =3D zpool_malloc(zpool, dlen, gfp, &handle); + if (ret =3D=3D -ENOSPC) { + zswap_reject_compress_poor++; + goto unlock; + } + if (ret) { + zswap_reject_alloc_fail++; + goto unlock; + } + dst =3D zpool_map_handle(zpool, handle, ZPOOL_MM_WO); + memcpy(dst, buf, dlen); + zpool_unmap_handle(zpool, handle); + mutex_unlock(mutex); + + entry->handle =3D handle; + entry->length =3D dlen; + return 0; + +unlock: + mutex_unlock(mutex); + return ret; +} + static void __zswap_load(struct zswap_entry *entry, struct page *page) { struct zpool *zpool =3D zswap_find_zpool(entry); struct scatterlist input, output; struct crypto_acomp_ctx *acomp_ctx; - u8 *src; + u8 *src, *buf; + int cpu; + struct mutex *mutex; + + cpu =3D raw_smp_processor_id(); + mutex =3D per_cpu(zswap_mutex, cpu); + mutex_lock(mutex); =20 - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - mutex_lock(acomp_ctx->mutex); + acomp_ctx =3D per_cpu_ptr(entry->pool->acomp_ctx, cpu); =20 src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); if (!zpool_can_sleep_mapped(zpool)) { - memcpy(acomp_ctx->dstmem, src, entry->length); - src =3D acomp_ctx->dstmem; + buf =3D per_cpu(zswap_buffer, cpu); + memcpy(buf, src, entry->length); + src =3D buf; zpool_unmap_handle(zpool, entry->handle); } =20 @@ -1415,7 +1488,7 @@ static void __zswap_load(struct zswap_entry *entry, s= truct page *page) acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, = PAGE_SIZE); BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ct= x->wait)); BUG_ON(acomp_ctx->req->dlen !=3D PAGE_SIZE); - 
mutex_unlock(acomp_ctx->mutex); + mutex_unlock(mutex); =20 if (zpool_can_sleep_mapped(zpool)) zpool_unmap_handle(zpool, entry->handle); @@ -1539,18 +1612,11 @@ bool zswap_store(struct folio *folio) struct page *page =3D &folio->page; struct zswap_tree *tree =3D zswap_trees[type]; struct zswap_entry *entry, *dupentry; - struct scatterlist input, output; - struct crypto_acomp_ctx *acomp_ctx; struct obj_cgroup *objcg =3D NULL; struct mem_cgroup *memcg =3D NULL; struct zswap_pool *pool; - struct zpool *zpool; - unsigned int dlen =3D PAGE_SIZE; - unsigned long handle, value; - char *buf; - u8 *src, *dst; - gfp_t gfp; - int ret; + u8 *src; + unsigned long value; =20 VM_WARN_ON_ONCE(!folio_test_locked(folio)); VM_WARN_ON_ONCE(!folio_test_swapcache(folio)); @@ -1635,60 +1701,11 @@ bool zswap_store(struct folio *folio) mem_cgroup_put(memcg); } =20 - /* compress */ - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); - - mutex_lock(acomp_ctx->mutex); - - dst =3D acomp_ctx->dstmem; - sg_init_table(&input, 1); - sg_set_page(&input, page, PAGE_SIZE, 0); - - sg_init_one(&output, dst, PAGE_SIZE); - acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen= ); - /* - * it maybe looks a little bit silly that we send an asynchronous request, - * then wait for its completion synchronously. This makes the process look - * synchronous in fact. - * Theoretically, acomp supports users send multiple acomp requests in one - * acomp instance, then get those requests done simultaneously. but in th= is - * case, zswap actually does store and load page by page, there is no - * existing method to send the second page before the first page is done - * in one thread doing zwap. - * but in different threads running on different cpu, we have different - * acomp instance, so multiple threads can do (de)compression in parallel. 
- */ - ret =3D crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx= ->wait); - dlen =3D acomp_ctx->req->dlen; - - if (ret) { - zswap_reject_compress_fail++; - goto put_dstmem; - } - - /* store */ - zpool =3D zswap_find_zpool(entry); - gfp =3D __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM; - if (zpool_malloc_support_movable(zpool)) - gfp |=3D __GFP_HIGHMEM | __GFP_MOVABLE; - ret =3D zpool_malloc(zpool, dlen, gfp, &handle); - if (ret =3D=3D -ENOSPC) { - zswap_reject_compress_poor++; - goto put_dstmem; - } - if (ret) { - zswap_reject_alloc_fail++; - goto put_dstmem; - } - buf =3D zpool_map_handle(zpool, handle, ZPOOL_MM_WO); - memcpy(buf, dst, dlen); - zpool_unmap_handle(zpool, handle); - mutex_unlock(acomp_ctx->mutex); + if (__zswap_store(entry, page)) + goto put_pool; =20 /* populate entry */ entry->swpentry =3D swp_entry(type, offset); - entry->handle =3D handle; - entry->length =3D dlen; =20 insert_entry: entry->objcg =3D objcg; @@ -1725,8 +1742,6 @@ bool zswap_store(struct folio *folio) =20 return true; =20 -put_dstmem: - mutex_unlock(acomp_ctx->mutex); put_pool: zswap_pool_put(entry->pool); freepage: @@ -1902,10 +1917,10 @@ static int zswap_setup(void) } =20 ret =3D cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare", - zswap_dstmem_prepare, zswap_dstmem_dead); + zswap_buffer_prepare, zswap_buffer_dead); if (ret) { - pr_err("dstmem alloc failed\n"); - goto dstmem_fail; + pr_err("buffer alloc failed\n"); + goto buffer_fail; } =20 ret =3D cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE, @@ -1940,7 +1955,7 @@ static int zswap_setup(void) zswap_pool_destroy(pool); hp_fail: cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE); -dstmem_fail: +buffer_fail: kmem_cache_destroy(zswap_entry_cache); cache_fail: /* if built-in, we aren't unloaded on failure; don't allow use */ --=20 b4 0.10.1
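Taken together, the series ends with one page-sized scratch buffer and one mutex per CPU, shared by the store (compress) and load (decompress) paths. A userspace analogue of that protection scheme, using a tiny C11 spinlock in place of the kernel's per-CPU mutex (all names here are illustrative, not the zswap symbols):

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define TOY_NR_CPUS   4
#define TOY_PAGE_SIZE 4096

/* One scratch buffer and one lock per "cpu", like zswap_buffer/zswap_mutex. */
static unsigned char toy_buffer[TOY_NR_CPUS][TOY_PAGE_SIZE];
static atomic_flag toy_lock[TOY_NR_CPUS] = {
	ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT,
};

/* Stage src through the cpu's scratch buffer under its lock, copying the
 * staged bytes out to dst; in zswap the (de)compressor would consume the
 * buffer between the two memcpys, which is why the lock must be held for
 * the whole operation. */
static void toy_stage(int cpu, const void *src, void *dst, size_t len)
{
	while (atomic_flag_test_and_set(&toy_lock[cpu]))
		;				/* spin: buffer is in use */
	memcpy(toy_buffer[cpu], src, len);	/* like memcpy(buf, src, entry->length) */
	memcpy(dst, toy_buffer[cpu], len);
	atomic_flag_clear(&toy_lock[cpu]);
}
```

The kernel version keys both arrays by smp_processor_id() and sleeps on a real mutex instead of spinning, but the invariant is the same: whoever holds the cpu's lock owns that cpu's buffer for the duration of the copy and the (de)compression.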