From nobody Sat Jun 13 07:53:19 2026
From: fujunjie
To: Andrew Morton, Chris Li, Kairui Song, Johannes Weiner, Nhat Pham, Yosry Ahmed
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Jonathan Corbet, David Hildenbrand, Ryan Roberts, Barry Song, Baolin Wang, Chengming Zhou, Baoquan He, Lorenzo Stoakes
Subject: [RFC PATCH 1/5] mm: zswap: decompress into a folio subpage
Date: Fri, 8 May 2026 20:20:29 +0000

zswap_decompress() always writes to offset 0 of the target folio. That is
sufficient while zswap only loads order-0 folios, but large folio swapin
needs to fill each base page from its own zswap entry.

Pass the base-page index to zswap_decompress() and use it for the kmap and
scatterlist output offsets. Existing callers pass index 0, so this is a
preparatory change with no intended behavior change.

Signed-off-by: fujunjie
---
 mm/zswap.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 4b5149173b0e..afe38dfc5a29 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -921,12 +921,14 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	return comp_ret == 0 && alloc_ret == 0;
 }
 
-static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
+static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio,
+			     unsigned long index)
 {
 	struct zswap_pool *pool = entry->pool;
 	struct scatterlist input[2]; /* zsmalloc returns an SG list 1-2 entries */
 	struct scatterlist output;
 	struct crypto_acomp_ctx *acomp_ctx;
+	size_t offset = index * PAGE_SIZE;
 	int ret = 0, dlen;
 
 	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
@@ -939,14 +941,14 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 
 		WARN_ON_ONCE(input->length != PAGE_SIZE);
 
-		dst = kmap_local_folio(folio, 0);
+		dst = kmap_local_folio(folio, offset);
 		memcpy_from_sglist(dst, input, 0, PAGE_SIZE);
 		dlen = PAGE_SIZE;
 		kunmap_local(dst);
 		flush_dcache_folio(folio);
 	} else {
 		sg_init_table(&output, 1);
-		sg_set_folio(&output, folio, PAGE_SIZE, 0);
+		sg_set_folio(&output, folio, PAGE_SIZE, offset);
 		acomp_request_set_params(acomp_ctx->req, input, &output,
 					 entry->length, PAGE_SIZE);
 		ret = crypto_acomp_decompress(acomp_ctx->req);
@@ -1034,7 +1036,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 		goto out;
 	}
 
-	if (!zswap_decompress(entry, folio)) {
+	if (!zswap_decompress(entry, folio, 0)) {
 		ret = -EIO;
 		goto out;
 	}
@@ -1611,7 +1613,7 @@ int zswap_load(struct folio *folio)
 	if (!entry)
 		return -ENOENT;
 
-	if (!zswap_decompress(entry, folio)) {
+	if (!zswap_decompress(entry, folio, 0)) {
 		folio_unlock(folio);
 		return -EIO;
 	}
-- 
2.34.1

From nobody Sat Jun 13 07:53:19 2026
From: fujunjie
Subject: [RFC PATCH 2/5] mm: zswap: add a zswap entry batch helper
Date: Fri, 8 May 2026 20:20:30 +0000

Large folio swapin has to know whether a contiguous swap range is backed
consistently by zswap. A range that is partly in zswap and partly on disk
cannot be read by the existing whole-folio swap_read_folio() backend
selection.

Add zswap_entry_batch(), mirroring the existing zeromap batch query: it
returns how many entries starting at a swap entry share the same zswap
presence, and optionally reports the first entry's state.

The CONFIG_ZSWAP=n stub reports that the whole range is not in zswap.

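
The batch semantics can be modelled outside the kernel. The following is a
rough userspace sketch only, not the kernel code: the present[] array
stands in for the per-offset zswap xarray lookup, and the name
batch_same_presence() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model (assumption): present[i] says whether swap offset i has
 * a zswap entry. Returns how many consecutive offsets, starting at @start
 * and capped at @max_nr, share the presence state of the first one. If
 * @is_present is not NULL, it receives the state of the first offset.
 */
static int batch_same_presence(const bool *present, size_t start, int max_nr,
			       bool *is_present)
{
	bool first;
	int i;

	assert(max_nr > 0);

	first = present[start];
	if (is_present)
		*is_present = first;

	for (i = 1; i < max_nr; i++) {
		if (present[start + i] != first)
			return i;	/* state changes at offset start + i */
	}

	return max_nr;
}
```

A caller that wants a uniform range of nr entries compares the return
value with nr: anything smaller means the range is mixed.
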
Signed-off-by: fujunjie
---
 include/linux/zswap.h |  9 +++++++++
 mm/zswap.c            | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 30c193a1207e..b9d71f027200 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -27,6 +27,7 @@ struct zswap_lruvec_state {
 unsigned long zswap_total_pages(void);
 bool zswap_store(struct folio *folio);
 int zswap_load(struct folio *folio);
+int zswap_entry_batch(swp_entry_t swp, int max_nr, bool *is_zswap);
 void zswap_invalidate(swp_entry_t swp);
 int zswap_swapon(int type, unsigned long nr_pages);
 void zswap_swapoff(int type);
@@ -49,6 +50,14 @@ static inline int zswap_load(struct folio *folio)
 	return -ENOENT;
 }
 
+static inline int zswap_entry_batch(swp_entry_t swp, int max_nr,
+				    bool *is_zswap)
+{
+	if (is_zswap)
+		*is_zswap = false;
+	return max_nr;
+}
+
 static inline void zswap_invalidate(swp_entry_t swp) {}
 static inline int zswap_swapon(int type, unsigned long nr_pages)
 {
diff --git a/mm/zswap.c b/mm/zswap.c
index afe38dfc5a29..27c14b8edd15 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -234,6 +234,42 @@ static inline struct xarray *swap_zswap_tree(swp_entry_t swp)
 		>> ZSWAP_ADDRESS_SPACE_SHIFT];
 }
 
+/*
+ * Return the number of contiguous swap entries that share the same zswap
+ * presence as @swp. If @is_zswap is not NULL, return @swp's zswap status.
+ *
+ * Context: callers must keep the swap type alive. The result is a snapshot
+ * of zswap xarray presence; callers must tolerate races by rechecking under
+ * the lock that matters for their operation or by falling back safely.
+ */
+int zswap_entry_batch(swp_entry_t swp, int max_nr, bool *is_zswap)
+{
+	pgoff_t offset = swp_offset(swp);
+	bool first;
+	int i;
+
+	if (zswap_never_enabled()) {
+		if (is_zswap)
+			*is_zswap = false;
+		return max_nr;
+	}
+
+	first = !!xa_load(swap_zswap_tree(swp), offset);
+	if (is_zswap)
+		*is_zswap = first;
+
+	for (i = 1; i < max_nr; i++) {
+		swp_entry_t entry = swp_entry(swp_type(swp), offset + i);
+		bool present;
+
+		present = !!xa_load(swap_zswap_tree(entry), offset + i);
+		if (present != first)
+			return i;
+	}
+
+	return max_nr;
+}
+
 #define zswap_pool_debug(msg, p)			\
 	pr_debug("%s pool %s\n", msg, (p)->tfm_name)
 
-- 
2.34.1

From nobody Sat Jun 13 07:53:19 2026
From: fujunjie
Subject: [RFC PATCH 3/5] mm: zswap: load fully stored large folios
Date: Fri, 8 May 2026 20:20:31 +0000

zswap_store() already stores every base page of a large folio as a
separate zswap entry and tears the whole folio back down on store failure.
The load side still rejects any large folio, which forces the swapin path
to avoid mTHP swapin once zswap has ever been enabled.

Use zswap_entry_batch() to distinguish three cases: the whole range is
absent from zswap and should fall through to the disk backend, the whole
range is present and can be decompressed one base page at a time, or the
range is mixed and must be treated as an invalid large-folio backend
selection.

After all entries decompress successfully, mark the folio uptodate and
dirty, account the mTHP swpin stat once for the folio, account one ZSWPIN
event per base page, and invalidate each zswap entry because the swapcache
folio becomes authoritative.

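
The three outcomes and the accounting rule can be illustrated with a small
userspace model. This is a sketch under stated assumptions, not the kernel
implementation: present[] replaces the zswap xarray lookup, struct
load_stats and model_zswap_load() are hypothetical names, and the
decompression step itself is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the mTHP swpin stat and ZSWPIN. */
struct load_stats {
	int mthp_swpin;
	int zswpin;
};

/*
 * Userspace model (assumption): present[i] says whether base page i of the
 * folio has a zswap entry. Returns 0 when every page is present, -ENOENT
 * when none is, and -EINVAL for a mixed range.
 */
static int model_zswap_load(const bool *present, int nr_pages,
			    struct load_stats *stats)
{
	int i;

	assert(nr_pages > 0);

	for (i = 1; i < nr_pages; i++) {
		if (present[i] != present[0])
			return -EINVAL;	/* mixed backends */
	}

	if (!present[0])
		return -ENOENT;	/* whole range absent: use the disk path */

	/* Each base page would be decompressed from its own entry here. */
	stats->mthp_swpin += 1;		/* once per folio */
	stats->zswpin += nr_pages;	/* once per base page */
	return 0;
}
```

Only the fully present case touches the counters, so a rejected mixed
range leaves the statistics unchanged.
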
Signed-off-by: fujunjie
---
 Documentation/admin-guide/mm/transhuge.rst |  4 +-
 mm/zswap.c                                 | 65 ++++++++++++++--------
 2 files changed, 45 insertions(+), 24 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 5fbc3d89bb07..05456906aff6 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -644,8 +644,8 @@ zswpout
 	piece without splitting.
 
 swpin
-	is incremented every time a huge page is swapped in from a non-zswap
-	swap device in one piece.
+	is incremented every time a huge page is swapped in from swap I/O or
+	zswap in one piece.
 
 swpin_fallback
 	is incremented if swapin fails to allocate or charge a huge page
diff --git a/mm/zswap.c b/mm/zswap.c
index 27c14b8edd15..863ca1e896ed 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1614,20 +1615,23 @@ bool zswap_store(struct folio *folio)
  * NOT marked up-to-date, so that an IO error is emitted (e.g. do_swap_page()
  * will SIGBUS).
  *
- * -EINVAL: if the swapped out content was in zswap, but the page belongs
- * to a large folio, which is not supported by zswap. The folio is unlocked,
- * but NOT marked up-to-date, so that an IO error is emitted (e.g.
- * do_swap_page() will SIGBUS).
+ * -EINVAL: if the folio spans a mix of zswap and non-zswap entries. The
+ * folio is unlocked, but NOT marked up-to-date, so that an IO error is
+ * emitted (e.g. do_swap_page() will SIGBUS). Large folio swapin should
+ * reject such ranges before calling zswap_load().
  *
- * -ENOENT: if the swapped out content was not in zswap. The folio remains
+ * -ENOENT: if the swapped out content was not in zswap. For a large folio,
+ * this means the whole folio range was not in zswap. The folio remains
  * locked on return.
  */
 int zswap_load(struct folio *folio)
 {
 	swp_entry_t swp = folio->swap;
 	pgoff_t offset = swp_offset(swp);
-	struct xarray *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry;
+	int nr_pages = folio_nr_pages(folio);
+	bool is_zswap;
+	int index;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
@@ -1635,30 +1639,36 @@ int zswap_load(struct folio *folio)
 	if (zswap_never_enabled())
 		return -ENOENT;
 
-	/*
-	 * Large folios should not be swapped in while zswap is being used, as
-	 * they are not properly handled. Zswap does not properly load large
-	 * folios, and a large folio may only be partially in zswap.
-	 */
-	if (WARN_ON_ONCE(folio_test_large(folio))) {
+	if (zswap_entry_batch(swp, nr_pages, &is_zswap) != nr_pages) {
+		WARN_ON_ONCE(folio_test_large(folio));
 		folio_unlock(folio);
 		return -EINVAL;
 	}
 
-	entry = xa_load(tree, offset);
-	if (!entry)
+	if (!is_zswap)
 		return -ENOENT;
 
-	if (!zswap_decompress(entry, folio, 0)) {
-		folio_unlock(folio);
-		return -EIO;
+	for (index = 0; index < nr_pages; index++) {
+		swp_entry_t entry_swp = swp_entry(swp_type(swp),
+						  offset + index);
+		struct xarray *tree = swap_zswap_tree(entry_swp);
+
+		entry = xa_load(tree, offset + index);
+		if (WARN_ON_ONCE(!entry)) {
+			folio_unlock(folio);
+			return -EINVAL;
+		}
+
+		if (!zswap_decompress(entry, folio, index)) {
+			folio_unlock(folio);
+			return -EIO;
+		}
 	}
 
 	folio_mark_uptodate(folio);
 
-	count_vm_event(ZSWPIN);
-	if (entry->objcg)
-		count_objcg_events(entry->objcg, ZSWPIN, 1);
+	count_mthp_stat(folio_order(folio), MTHP_STAT_SWPIN);
+	count_vm_events(ZSWPIN, nr_pages);
 
 	/*
 	 * We are reading into the swapcache, invalidate zswap entry.
@@ -1668,8 +1678,19 @@ int zswap_load(struct folio *folio)
 	 * compression work.
 	 */
 	folio_mark_dirty(folio);
-	xa_erase(tree, offset);
-	zswap_entry_free(entry);
+
+	for (index = 0; index < nr_pages; index++) {
+		swp_entry_t entry_swp = swp_entry(swp_type(swp),
+						  offset + index);
+		struct xarray *tree = swap_zswap_tree(entry_swp);
+
+		entry = xa_erase(tree, offset + index);
+		if (WARN_ON_ONCE(!entry))
+			continue;
+		if (entry->objcg)
+			count_objcg_events(entry->objcg, ZSWPIN, 1);
+		zswap_entry_free(entry);
+	}
 
 	folio_unlock(folio);
 	return 0;
-- 
2.34.1

From nobody Sat Jun 13 07:53:19 2026
From: fujunjie
Subject: [RFC PATCH 4/5] mm: swap: fall back to order-0 after large swapin races
Date: Fri, 8 May 2026 20:20:32 +0000

swapin_folio() documents that a large folio insertion race returns NULL so
the caller can fall back to order-0 swapin. do_swap_page() currently turns
that NULL into VM_FAULT_OOM if the PTE is unchanged, which is harsher than
necessary and gets in the way of rejecting large folio ranges for backend
reasons.

Move the synchronous swapin sequence into a helper and retry with an
order-0 folio when a large folio cannot be inserted into the swap cache.
Count the event as an mTHP swapin fallback before dropping the failed
large allocation.

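
The retry order can be modelled in userspace as follows. This is a
simplified sketch, not the helper itself: struct swapin_model, try_insert()
and swapin_with_fallback() are hypothetical names, a failed insertion is
modelled by a maximum insertable order, and the case where swapin returns
a different, already cached folio is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct swapin_model {
	int max_insertable_order;	/* larger orders fail to insert */
	int swpin_fallback;		/* failed large attempts counted */
};

/* Models swap cache insertion: succeeds only up to the configured order. */
static bool try_insert(const struct swapin_model *m, int order)
{
	return order <= m->max_insertable_order;
}

/*
 * Try the requested order first. If a large attempt fails, count one
 * fallback and retry once with order 0. Returns the order that was
 * swapped in, or -1 if the order-0 attempt failed as well.
 */
static int swapin_with_fallback(struct swapin_model *m, int order)
{
	assert(order >= 0);

	if (try_insert(m, order))
		return order;

	if (order == 0)
		return -1;	/* nothing smaller to fall back to */

	m->swpin_fallback++;	/* counted before the large folio is dropped */
	return try_insert(m, 0) ? 0 : -1;
}
```

The fallback counter only moves when a large attempt fails, so a plain
order-0 fault never inflates it.
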
Signed-off-by: fujunjie
---
 mm/memory.c | 50 +++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 39 insertions(+), 11 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index ea6568571131..84e3b77b8293 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4757,6 +4757,44 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+static struct folio *swapin_synchronous_folio(swp_entry_t entry,
+					      struct vm_fault *vmf)
+{
+	struct folio *swapcache, *folio;
+	bool large;
+	int order;
+
+	folio = alloc_swap_folio(vmf);
+	if (!folio)
+		return NULL;
+
+	large = folio_test_large(folio);
+	order = folio_order(folio);
+
+	/*
+	 * folio is charged, so swapin can only fail due to raced swapin and
+	 * return NULL.
+	 */
+	swapcache = swapin_folio(entry, folio);
+	if (swapcache == folio)
+		return folio;
+
+	if (!swapcache && large)
+		count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK);
+	folio_put(folio);
+	if (swapcache || !large)
+		return swapcache;
+
+	folio = __alloc_swap_folio(vmf);
+	if (!folio)
+		return NULL;
+
+	swapcache = swapin_folio(entry, folio);
+	if (swapcache != folio)
+		folio_put(folio);
+	return swapcache;
+}
+
 /* Sanity check that a folio is fully exclusive */
 static void check_swap_exclusive(struct folio *folio, swp_entry_t entry,
 				 unsigned int nr_pages)
@@ -4860,17 +4898,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		swap_update_readahead(folio, vma, vmf->address);
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
-			folio = alloc_swap_folio(vmf);
-			if (folio) {
-				/*
-				 * folio is charged, so swapin can only fail due
-				 * to raced swapin and return NULL.
-				 */
-				swapcache = swapin_folio(entry, folio);
-				if (swapcache != folio)
-					folio_put(folio);
-				folio = swapcache;
-			}
+			folio = swapin_synchronous_folio(entry, vmf);
 		} else {
 			folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, vmf);
 		}
-- 
2.34.1

From nobody Sat Jun 13 07:53:19 2026
d=qq.com; s=s201512; t=1778271645; bh=29KI1/j+JXdLBNJzbc0TXvIFwLMcWJt6tJQuQMLll40=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=o36snXT4uUPxxiSeCajaLzd00StdwDaXhRZKGmJkzycdoBucQN0YpUhH86lI8Dk9D 0P4zMJEmN7rg1eBVAGQf/eRYgVCPZTZxERUba7iabrOvBuYdC8H7lbVtRmS0uXZ2Yu MoBLG3n34AW0dP8CK2pT0XHzw0O9uYMadRcNg6p4= Received: from node68.. ([166.111.236.25]) by newxmesmtplogicsvrszc56-0.qq.com (NewEsmtp) with SMTP id 52124024; Sat, 09 May 2026 04:20:33 +0800 X-QQ-mid: xmsmtpt1778271642t5u9irdiv Message-ID: X-QQ-XMAILINFO: OAzfp65MIvpERzYT1k3ro0wixZxiJcL77I+UCCx7gAIkkO/isQbR5rcycSQ58A lM/4VEiqT1B75yxeE0/F08hxii/mHvWHARVhTLdadMdbz/AXjuCePyI5oOBOecipuLdvKWGQqR0Y uGKU6YhfFfPesimHLW0fYB0MyNBhFVirEgLBz3AbjEyH3Qu3hyw7ZGqadbsgrCgxIIf0G1JTBgJQ QC0BolnFeSoobUn4TlOvszp0kGYwUBwz4CbqPbmKB8qmThDHpYXtPpZ79bYDzEgz/zfVYN4/3F5P GceU1wq+27Jc3Q+qMXi5mE5ghsXpO9Lc+UGVjYDwOwPTgoc6XPI+wvdERQKx9754Sy3+hX2czNF4 7xwUnsJNYgkisXP5KR5pc1SsWhDfS+1iGi2Qa+hg9b1+/PRZU5VVCC95nidVNneEQKcYPoCtcnK+ hK3ZXQHSJQJ2BoeWsPHUSzeSFQpAxuEPI9rm1GN5xrkIF71DP1/2SU1XOBJndW6xtmO1nZtBwcUI nUulVxL08Xk5GvVrFTL51kn0LRhkjnNCRG8sJy5Tqas1m4Xjj1dqkgwcc+eXUmGYQch2jYLfACp/ ugjLTyoSgwTj2mZxiUYWRWAOsCC8xM+XNVERQy3HWR2m84Uw9zUzX2pJHntR6iSp569OnYY2vG4X 8FcMYREcE+1BSAfavOA2/R3S/5nFalVzoBwqJceMiIj3miPxbcMoPoeTtTj9dFj9z+bK4HMcgITe +q6ubAJueWLZcrXEAp9+h2WDrQl7T71BI9oiYtOsrQ4F4KFPQ9SbFYrCDow6AV9ilPmjdJB06sJ9 1SBbMJBH4hUxb7Vi9bVD7+WTDOgy02L+s4Gow/FUrufXRsDYouYasZhw4UilxEgVxAlJ6Lr5op7/ FgUaCYazB9Guf6gAibFPqbZNwwVsIyt0nSdIz0NlLKVoSkk7+6ba3SRYBQWGU0E6kCp258PqNwwN obPLFTxcf+8zed+A6wGNqHuy1z4N+RoDuW1rHa4W0W9FL7LEBTu7CEcEN/kJz/6wOqoYrfh24aog UvOvahj/Y5Y6aZO+Or X-QQ-XMRINFO: Nq+8W0+stu50tPAe92KXseR0ZZmBTk3gLg== From: fujunjie To: Andrew Morton , Chris Li , Kairui Song , Johannes Weiner , Nhat Pham , Yosry Ahmed Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Jonathan Corbet , David Hildenbrand , Ryan Roberts , Barry Song , Baolin Wang , Chengming Zhou , Baoquan He , Lorenzo Stoakes Subject: [RFC PATCH 5/5] mm: swap: 
 allow zswap-backed large folio swapin
Date: Fri, 8 May 2026 20:20:33 +0000
X-OQ-MSGID: <20260508202033.1834876-5-fujunjie1@qq.com>
X-Mailer: git-send-email 2.34.1
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

alloc_swap_folio() has been falling back to order-0 in the anonymous
synchronous swapin path whenever zswap was ever enabled, because a large
folio range could contain a mixture of zswap and non-zswap entries and
zswap_load() could not handle large folios.

zswap_load() can now load a range that is fully present in zswap, and
zswap_entry_batch() can identify mixed zswap ranges. Use that check
alongside the existing zeromap and swapcache checks when selecting a
large folio for anonymous swapin, and recheck before inserting a large
folio into the swap cache while holding the swap cluster lock.

With mixed zswap ranges rejected and the insertion-race fallback in
place, remove the blanket zswap_never_enabled() fallback from the
anonymous swapin path so all-zswap and all-disk anonymous ranges can use
mTHP swapin. Shmem keeps its existing zswap fallback and is outside this
RFC.

Signed-off-by: fujunjie
---
 mm/memory.c     | 21 ++++++---------------
 mm/swap_state.c | 23 +++++++++++++++--------
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 84e3b77b8293..0be249108de1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -78,6 +78,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -4635,13 +4636,11 @@ static bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep, int nr_pages)
 	if (swap_pte_batch(ptep, nr_pages, pte) != nr_pages)
 		return false;
 
-	/*
-	 * swap_read_folio() can't handle the case a large folio is hybridly
-	 * from different backends. And they are likely corner cases. Similar
-	 * things might be added once zswap support large folios.
-	 */
+	/* swap_read_folio() can't handle hybrid backend large folios. */
 	if (unlikely(swap_zeromap_batch(entry, nr_pages, NULL) != nr_pages))
 		return false;
+	if (unlikely(zswap_entry_batch(entry, nr_pages, NULL) != nr_pages))
+		return false;
 	if (unlikely(non_swapcache_batch(entry, nr_pages) != nr_pages))
 		return false;
 
@@ -4690,14 +4689,6 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 	if (unlikely(userfaultfd_armed(vma)))
 		goto fallback;
 
-	/*
-	 * A large swapped out folio could be partially or fully in zswap. We
-	 * lack handling for such cases, so fallback to swapping in order-0
-	 * folio.
-	 */
-	if (!zswap_never_enabled())
-		goto fallback;
-
 	entry = softleaf_from_pte(vmf->orig_pte);
 	/*
 	 * Get a list of all the (large) orders below PMD_ORDER that are enabled
@@ -4772,8 +4763,8 @@ static struct folio *swapin_synchronous_folio(swp_entry_t entry,
 	order = folio_order(folio);
 
 	/*
-	 * folio is charged, so swapin can only fail due to raced swapin and
-	 * return NULL.
+	 * folio is charged, so NULL means the large folio could not be
+	 * inserted and needs order-0 fallback.
 	 */
 	swapcache = swapin_folio(entry, folio);
 	if (swapcache == folio)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 1415a5c54a43..4e58fad5e5f0 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 #include "swap_table.h"
 #include "swap.h"
@@ -207,6 +208,11 @@ static int swap_cache_add_folio(struct folio *folio, swp_entry_t entry,
 		if (swp_tb_is_shadow(old_tb))
 			shadow = swp_tb_to_shadow(old_tb);
 	} while (++ci_off < ci_end);
+	if (unlikely(folio_test_large(folio) &&
+		     zswap_entry_batch(entry, nr_pages, NULL) != nr_pages)) {
+		err = -EAGAIN;
+		goto failed;
+	}
 	__swap_cache_add_folio(ci, folio, entry);
 	swap_cluster_unlock(ci);
 	if (shadowp)
@@ -460,7 +466,8 @@ void swap_update_readahead(struct folio *folio, struct vm_area_struct *vma,
  *
  * Context: Caller must protect the swap device with reference count or locks.
  * Return: Returns the folio being added on success. Returns the existing folio
- * if @entry is already cached. Returns NULL if raced with swapin or swapoff.
+ * if @entry is already cached. Returns NULL if raced with swapin or swapoff,
+ * or if a large folio fails a backend recheck before insertion.
  */
 static struct folio *__swap_cache_prepare_and_add(swp_entry_t entry,
 						  struct folio *folio,
@@ -483,10 +490,10 @@ static struct folio *__swap_cache_prepare_and_add(swp_entry_t entry,
 
 		/*
 		 * Large order allocation needs special handling on
-		 * race: if a smaller folio exists in cache, swapin needs
-		 * to fallback to order 0, and doing a swap cache lookup
-		 * might return a folio that is irrelevant to the faulting
-		 * entry because @entry is aligned down. Just return NULL.
+		 * race or backend recheck failure: swapin needs to fall back
+		 * to order 0, and doing a swap cache lookup might return a
+		 * folio that is irrelevant to the faulting entry because
+		 * @entry is aligned down. Just return NULL.
 		 */
 		if (ret != -EEXIST || folio_test_large(folio))
 			goto failed;
@@ -567,9 +574,9 @@ struct folio *swap_cache_alloc_folio(swp_entry_t entry, gfp_t gfp_mask,
  * with the folio size.
  *
  * Return: returns pointer to @folio on success. If folio is a large folio
- * and this raced with another swapin, NULL will be returned to allow fallback
- * to order 0. Else, if another folio was already added to the swap cache,
- * return that swap cache folio instead.
+ * and it raced with another swapin or failed a backend recheck, NULL will be
+ * returned to allow fallback to order 0. Else, if another folio was already
+ * added to the swap cache, return that swap cache folio instead.
  */
 struct folio *swapin_folio(swp_entry_t entry, struct folio *folio)
 {
-- 
2.34.1
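
Illustration only, not part of the patch: the "reject mixed ranges" rule
from the changelog can be modelled in userspace roughly as below. This is
a sketch under assumptions. same_backend_batch(), range_allows_large_folio()
and the in_zswap[] array are hypothetical names invented for this example,
and the model assumes zswap_entry_batch() follows the same convention as the
swap_zeromap_batch() call next to it in can_swapin_thp(), where a return
value different from nr_pages means the range is not uniform. The real
zswap_entry_batch() is defined elsewhere in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy userspace model, NOT kernel code. in_zswap[i] is a hypothetical
 * per-slot flag saying whether swap slot i currently lives in zswap.
 *
 * Returns how many slots, starting at slot 0, share the backend state
 * of slot 0. A result smaller than nr means the range is mixed.
 */
size_t same_backend_batch(const bool *in_zswap, size_t nr)
{
	size_t i;

	assert(nr > 0);
	for (i = 1; i < nr; i++)
		if (in_zswap[i] != in_zswap[0])
			break;
	return i;
}

/*
 * A large folio is only attempted when every slot in the range has the
 * same backend, i.e. the range is all-zswap or all-disk.
 */
bool range_allows_large_folio(const bool *in_zswap, size_t nr)
{
	return same_backend_batch(in_zswap, nr) == nr;
}
```

For a four-slot range with flags { true, true, false, true } the batch
stops at 2, so the model rejects the range and the caller would fall back
to order-0. All-true and all-false ranges both pass. In the patch the same
comparison runs twice: once when choosing the folio order, and once more
under the swap cluster lock before the folio is inserted into the swap
cache.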