From nobody Tue Dec 16 13:09:01 2025
From: Kairui Song
Date: Fri, 05 Dec 2025 03:29:15 +0800
Subject: [PATCH v4 07/19] mm/shmem: never bypass the swap cache for SWP_SYNCHRONOUS_IO
Message-Id: <20251205-swap-table-p2-v4-7-cb7e28a26a40@tencent.com>
References: <20251205-swap-table-p2-v4-0-cb7e28a26a40@tencent.com>
In-Reply-To: <20251205-swap-table-p2-v4-0-cb7e28a26a40@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
    Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
    Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
    "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song
X-Mailer: b4 0.14.3

From: Kairui Song

Now that the overhead of the swap cache is trivial to none, bypassing the
swap cache is no longer a good optimization. We have already removed the
cache-bypass swapin path for anon memory; now do the same for shmem. Many
helpers and functions can be dropped as a result.

Performance may drop slightly because of the coexistence and double update
of swap_map and the swap table; this is improved very soon by later commits
that partially drop the swap_map update:

Swapin of a 24 GB file on tmpfs with transparent_hugepage_tmpfs=within_size
and ZRAM, 3 test runs on my machine:

Before:    After this commit:    After this series:
5.99s      6.29s                 6.08s

Later swap table phases drop the swap_map completely to avoid the overhead
and reduce memory usage.
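As a rough, standalone illustration of the kind of workload measured above
(not part of this patch), the sketch below pushes a shmem region out to swap
with MADV_PAGEOUT and times faulting it back in. It assumes a swap device
(e.g. zram) is already configured and uses a scaled-down 1 GB region instead
of the 24 GB tmpfs file, so its numbers are not comparable to the table:

/*
 * Standalone illustration only (not part of this patch): push a shmem
 * region out to swap with MADV_PAGEOUT, then time faulting it back in.
 * Assumes a swap device (e.g. zram) is configured; the size is scaled
 * down from the 24 GB tmpfs file used in the commit message.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21		/* Linux 5.4+, in case libc headers are old */
#endif

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	size_t size = 1UL << 30;	/* 1 GB shmem region */
	volatile unsigned char sum = 0;
	unsigned char *p;
	double t0, t1;
	int fd;

	fd = memfd_create("shmem-swapin-test", 0);
	if (fd < 0 || ftruncate(fd, size) < 0)
		return 1;
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	memset(p, 0x5a, size);			/* populate the shmem pages */
	if (madvise(p, size, MADV_PAGEOUT))	/* reclaim them to swap */
		perror("madvise(MADV_PAGEOUT)");

	t0 = now_sec();
	for (size_t i = 0; i < size; i += 4096)	/* fault every page back in */
		sum += p[i];
	t1 = now_sec();

	printf("swapin of %zu MB took %.2fs (checksum %u)\n",
	       size >> 20, t1 - t0, (unsigned)sum);
	return 0;
}

Built with a plain cc -O2 and run unprivileged, it reports the wall-clock
time of the swapin pass only; it is meant to show the measured path, not to
reproduce the exact results above.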
Reviewed-by: Baolin Wang
Tested-by: Baolin Wang
Signed-off-by: Kairui Song
---
 mm/shmem.c    | 65 +++++++++++++++++-----------------------------------------
 mm/swap.h     |  4 ----
 mm/swapfile.c | 35 +++++++++-----------------------
 3 files changed, 27 insertions(+), 77 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index ad18172ff831..d08248fd67ff 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2001,10 +2001,9 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		swp_entry_t entry, int order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct folio *new, *swapcache;
 	int nr_pages = 1 << order;
-	struct folio *new;
 	gfp_t alloc_gfp;
-	void *shadow;
 
 	/*
 	 * We have arrived here because our zones are constrained, so don't
@@ -2044,34 +2043,19 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		goto fallback;
 	}
 
-	/*
-	 * Prevent parallel swapin from proceeding with the swap cache flag.
-	 *
-	 * Of course there is another possible concurrent scenario as well,
-	 * that is to say, the swap cache flag of a large folio has already
-	 * been set by swapcache_prepare(), while another thread may have
-	 * already split the large swap entry stored in the shmem mapping.
-	 * In this case, shmem_add_to_page_cache() will help identify the
-	 * concurrent swapin and return -EEXIST.
-	 */
-	if (swapcache_prepare(entry, nr_pages)) {
+	swapcache = swapin_folio(entry, new);
+	if (swapcache != new) {
 		folio_put(new);
-		new = ERR_PTR(-EEXIST);
-		/* Try smaller folio to avoid cache conflict */
-		goto fallback;
+		if (!swapcache) {
+			/*
+			 * The new folio is charged already, swapin can
+			 * only fail due to another raced swapin.
+			 */
+			new = ERR_PTR(-EEXIST);
+			goto fallback;
+		}
 	}
-
-	__folio_set_locked(new);
-	__folio_set_swapbacked(new);
-	new->swap = entry;
-
-	memcg1_swapin(entry, nr_pages);
-	shadow = swap_cache_get_shadow(entry);
-	if (shadow)
-		workingset_refault(new, shadow);
-	folio_add_lru(new);
-	swap_read_folio(new, NULL);
-	return new;
+	return swapcache;
 fallback:
 	/* Order 0 swapin failed, nothing to fallback to, abort */
 	if (!order)
@@ -2161,8 -2145,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
 }
 
 static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
-					 struct folio *folio, swp_entry_t swap,
-					 bool skip_swapcache)
+					 struct folio *folio, swp_entry_t swap)
 {
 	struct address_space *mapping = inode->i_mapping;
 	swp_entry_t swapin_error;
@@ -2178,8 +2161,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 
 	nr_pages = folio_nr_pages(folio);
 	folio_wait_writeback(folio);
-	if (!skip_swapcache)
-		swap_cache_del_folio(folio);
+	swap_cache_del_folio(folio);
 	/*
 	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks
 	 * won't be 0 when inode is released and thus trigger WARN_ON(i_blocks)
@@ -2279,7 +2261,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	softleaf_t index_entry;
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
-	bool skip_swapcache = false;
 	int error, nr_pages, order;
 	pgoff_t offset;
 
@@ -2322,7 +2303,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			folio = NULL;
 			goto failed;
 		}
-		skip_swapcache = true;
 	} else {
 		/* Cached swapin only supports order 0 folio */
 		folio = shmem_swapin_cluster(swap, gfp, info, index);
@@ -2378,9 +2358,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	 * and swap cache folios are never partially freed.
 	 */
 	folio_lock(folio);
-	if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
-	    shmem_confirm_swap(mapping, index, swap) < 0 ||
-	    folio->swap.val != swap.val) {
+	if (!folio_matches_swap_entry(folio, swap) ||
+	    shmem_confirm_swap(mapping, index, swap) < 0) {
 		error = -EEXIST;
 		goto unlock;
 	}
@@ -2412,12 +2391,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (sgp == SGP_WRITE)
 		folio_mark_accessed(folio);
 
-	if (skip_swapcache) {
-		folio->swap.val = 0;
-		swapcache_clear(si, swap, nr_pages);
-	} else {
-		swap_cache_del_folio(folio);
-	}
+	swap_cache_del_folio(folio);
 	folio_mark_dirty(folio);
 	swap_free_nr(swap, nr_pages);
 	put_swap_device(si);
@@ -2428,14 +2402,11 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (shmem_confirm_swap(mapping, index, swap) < 0)
 		error = -EEXIST;
 	if (error == -EIO)
-		shmem_set_folio_swapin_error(inode, index, folio, swap,
-					     skip_swapcache);
+		shmem_set_folio_swapin_error(inode, index, folio, swap);
 unlock:
 	if (folio)
 		folio_unlock(folio);
 failed_nolock:
-	if (skip_swapcache)
-		swapcache_clear(si, folio->swap, folio_nr_pages(folio));
 	if (folio)
 		folio_put(folio);
 	put_swap_device(si);
diff --git a/mm/swap.h b/mm/swap.h
index 214e7d041030..e0f05babe13a 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -403,10 +403,6 @@ static inline int swap_writeout(struct folio *folio,
 	return 0;
 }
 
-static inline void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr)
-{
-}
-
 static inline struct folio *swap_cache_get_folio(swp_entry_t entry)
 {
 	return NULL;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e5284067a442..3762b8f3f9e9 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1614,22 +1614,6 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
 	return NULL;
 }
 
-static void swap_entries_put_cache(struct swap_info_struct *si,
-				   swp_entry_t entry, int nr)
-{
-	unsigned long offset = swp_offset(entry);
-	struct swap_cluster_info *ci;
-
-	ci = swap_cluster_lock(si, offset);
-	if (swap_only_has_cache(si, offset, nr)) {
-		swap_entries_free(si, ci, entry, nr);
-	} else {
-		for (int i = 0; i < nr; i++, entry.val++)
-			swap_entry_put_locked(si, ci, entry, SWAP_HAS_CACHE);
-	}
-	swap_cluster_unlock(ci);
-}
-
 static bool swap_entries_put_map(struct swap_info_struct *si,
 				 swp_entry_t entry, int nr)
 {
@@ -1765,13 +1749,21 @@ void swap_free_nr(swp_entry_t entry, int nr_pages)
 void put_swap_folio(struct folio *folio, swp_entry_t entry)
 {
 	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
 	int size = 1 << swap_entry_order(folio_order(folio));
 
 	si = _swap_info_get(entry);
 	if (!si)
 		return;
 
-	swap_entries_put_cache(si, entry, size);
+	ci = swap_cluster_lock(si, offset);
+	if (swap_only_has_cache(si, offset, size))
+		swap_entries_free(si, ci, entry, size);
+	else
+		for (int i = 0; i < size; i++, entry.val++)
+			swap_entry_put_locked(si, ci, entry, SWAP_HAS_CACHE);
+	swap_cluster_unlock(ci);
 }
 
 int __swap_count(swp_entry_t entry)
@@ -3784,15 +3776,6 @@ int swapcache_prepare(swp_entry_t entry, int nr)
 	return __swap_duplicate(entry, SWAP_HAS_CACHE, nr);
 }
 
-/*
- * Caller should ensure entries belong to the same folio so
- * the entries won't span cross cluster boundary.
- */
-void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr)
-{
-	swap_entries_put_cache(si, entry, nr);
-}
-
 /*
  * add_swap_count_continuation - called when a swap count is duplicated
  * beyond SWAP_MAP_MAX, it allocates a new page and links that to the entry's
-- 
2.52.0