From nobody Tue Dec 16 13:09:01 2025
From: Kairui Song
Date: Fri, 05 Dec 2025 03:29:15 +0800
Subject: [PATCH v4 07/19] mm/shmem: never bypass the swap cache for SWP_SYNCHRONOUS_IO
Message-Id: <20251205-swap-table-p2-v4-7-cb7e28a26a40@tencent.com>
References: <20251205-swap-table-p2-v4-0-cb7e28a26a40@tencent.com>
In-Reply-To: <20251205-swap-table-p2-v4-0-cb7e28a26a40@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
    Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
    Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
    "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song
X-Mailer: b4 0.14.3

From: Kairui Song

Now that the overhead of the swap cache is trivial to none, bypassing the
swap cache is no longer a good optimization. We have already removed the
cache-bypass swapin path for anon memory; now do the same for shmem. Many
helpers and functions can be dropped as a result.

Performance may drop slightly because of the coexistence and double update
of swap_map and the swap table; this is improved very soon by later commits
that partially drop the swap_map update:

Swapin of a 24 GB file on tmpfs with transparent_hugepage_tmpfs=within_size
and ZRAM, 3 test runs on my machine:

Before:    After this commit:    After this series:
5.99s      6.29s                 6.08s

Later swap table phases drop the swap_map completely to avoid the overhead
and reduce memory usage.
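As a rough, standalone illustration of the kind of workload measured above
(not part of this patch), the sketch below pushes a shmem region out to swap
with MADV_PAGEOUT and times faulting it back in. It assumes a swap device
(e.g. zram) is already configured and uses a scaled-down 1 GB region instead
of the 24 GB tmpfs file, so its numbers are not comparable to the table:

/*
 * Standalone illustration only (not part of this patch): push a shmem
 * region out to swap with MADV_PAGEOUT, then time faulting it back in.
 * Assumes a swap device (e.g. zram) is configured; the size is scaled
 * down from the 24 GB tmpfs file used in the commit message.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21		/* Linux 5.4+, in case libc headers are old */
#endif

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	size_t size = 1UL << 30;	/* 1 GB shmem region */
	volatile unsigned char sum = 0;
	unsigned char *p;
	double t0, t1;
	int fd;

	fd = memfd_create("shmem-swapin-test", 0);
	if (fd < 0 || ftruncate(fd, size) < 0)
		return 1;
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	memset(p, 0x5a, size);			/* populate the shmem pages */
	if (madvise(p, size, MADV_PAGEOUT))	/* reclaim them to swap */
		perror("madvise(MADV_PAGEOUT)");

	t0 = now_sec();
	for (size_t i = 0; i < size; i += 4096)	/* fault every page back in */
		sum += p[i];
	t1 = now_sec();

	printf("swapin of %zu MB took %.2fs (checksum %u)\n",
	       size >> 20, t1 - t0, (unsigned)sum);
	return 0;
}

Built with a plain cc -O2 and run unprivileged, it reports the wall-clock
time of the swapin pass only; it is meant to show the measured path, not to
reproduce the exact results above.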
Reviewed-by: Baolin Wang
Tested-by: Baolin Wang
Signed-off-by: Kairui Song
---
 mm/shmem.c    | 65 +++++++++++++++++-----------------------------------------
 mm/swap.h     |  4 ----
 mm/swapfile.c | 35 +++++++++-----------------------
 3 files changed, 27 insertions(+), 77 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index ad18172ff831..d08248fd67ff 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2001,10 +2001,9 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		swp_entry_t entry, int order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct folio *new, *swapcache;
 	int nr_pages = 1 << order;
-	struct folio *new;
 	gfp_t alloc_gfp;
-	void *shadow;
 
 	/*
 	 * We have arrived here because our zones are constrained, so don't
@@ -2044,34 +2043,19 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		goto fallback;
 	}
 
-	/*
-	 * Prevent parallel swapin from proceeding with the swap cache flag.
-	 *
-	 * Of course there is another possible concurrent scenario as well,
-	 * that is to say, the swap cache flag of a large folio has already
-	 * been set by swapcache_prepare(), while another thread may have
-	 * already split the large swap entry stored in the shmem mapping.
-	 * In this case, shmem_add_to_page_cache() will help identify the
-	 * concurrent swapin and return -EEXIST.
-	 */
-	if (swapcache_prepare(entry, nr_pages)) {
+	swapcache = swapin_folio(entry, new);
+	if (swapcache != new) {
 		folio_put(new);
-		new = ERR_PTR(-EEXIST);
-		/* Try smaller folio to avoid cache conflict */
-		goto fallback;
+		if (!swapcache) {
+			/*
+			 * The new folio is charged already, swapin can
+			 * only fail due to another raced swapin.
+			 */
+			new = ERR_PTR(-EEXIST);
+			goto fallback;
+		}
 	}
-
-	__folio_set_locked(new);
-	__folio_set_swapbacked(new);
-	new->swap = entry;
-
-	memcg1_swapin(entry, nr_pages);
-	shadow = swap_cache_get_shadow(entry);
-	if (shadow)
-		workingset_refault(new, shadow);
-	folio_add_lru(new);
-	swap_read_folio(new, NULL);
-	return new;
+	return swapcache;
 fallback:
 	/* Order 0 swapin failed, nothing to fallback to, abort */
 	if (!order)
@@ -2161,8 -2145,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
 }
 
 static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
-					 struct folio *folio, swp_entry_t swap,
-					 bool skip_swapcache)
+					 struct folio *folio, swp_entry_t swap)
 {
 	struct address_space *mapping = inode->i_mapping;
 	swp_entry_t swapin_error;
@@ -2178,8 +2161,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 
 	nr_pages = folio_nr_pages(folio);
 	folio_wait_writeback(folio);
-	if (!skip_swapcache)
-		swap_cache_del_folio(folio);
+	swap_cache_del_folio(folio);
 	/*
 	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks
 	 * won't be 0 when inode is released and thus trigger WARN_ON(i_blocks)
@@ -2279,7 +2261,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	softleaf_t index_entry;
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
-	bool skip_swapcache = false;
 	int error, nr_pages, order;
 	pgoff_t offset;
 
@@ -2322,7 +2303,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			folio = NULL;
 			goto failed;
 		}
-		skip_swapcache = true;
 	} else {
 		/* Cached swapin only supports order 0 folio */
 		folio = shmem_swapin_cluster(swap, gfp, info, index);
@@ -2378,9 +2358,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	 * and swap cache folios are never partially freed.
 	 */
 	folio_lock(folio);
-	if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
-	    shmem_confirm_swap(mapping, index, swap) < 0 ||
-	    folio->swap.val != swap.val) {
+	if (!folio_matches_swap_entry(folio, swap) ||
+	    shmem_confirm_swap(mapping, index, swap) < 0) {
 		error = -EEXIST;
 		goto unlock;
 	}
@@ -2412,12 +2391,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (sgp == SGP_WRITE)
 		folio_mark_accessed(folio);
 
-	if (skip_swapcache) {
-		folio->swap.val = 0;
-		swapcache_clear(si, swap, nr_pages);
-	} else {
-		swap_cache_del_folio(folio);
-	}
+	swap_cache_del_folio(folio);
 	folio_mark_dirty(folio);
 	swap_free_nr(swap, nr_pages);
 	put_swap_device(si);
@@ -2428,14 +2402,11 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (shmem_confirm_swap(mapping, index, swap) < 0)
 		error = -EEXIST;
 	if (error == -EIO)
-		shmem_set_folio_swapin_error(inode, index, folio, swap,
-					     skip_swapcache);
+		shmem_set_folio_swapin_error(inode, index, folio, swap);
 unlock:
 	if (folio)
 		folio_unlock(folio);
 failed_nolock:
-	if (skip_swapcache)
-		swapcache_clear(si, folio->swap, folio_nr_pages(folio));
 	if (folio)
 		folio_put(folio);
 	put_swap_device(si);
diff --git a/mm/swap.h b/mm/swap.h
index 214e7d041030..e0f05babe13a 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -403,10 +403,6 @@ static inline int swap_writeout(struct folio *folio,
 	return 0;
 }
 
-static inline void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr)
-{
-}
-
 static inline struct folio *swap_cache_get_folio(swp_entry_t entry)
 {
 	return NULL;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e5284067a442..3762b8f3f9e9 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1614,22 +1614,6 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
 	return NULL;
 }
 
-static void swap_entries_put_cache(struct swap_info_struct *si,
-				   swp_entry_t entry, int nr)
-{
-	unsigned long offset = swp_offset(entry);
-	struct swap_cluster_info *ci;
-
-	ci = swap_cluster_lock(si, offset);
-	if (swap_only_has_cache(si, offset, nr)) {
-		swap_entries_free(si, ci, entry, nr);
-	} else {
-		for (int i = 0; i < nr; i++, entry.val++)
-			swap_entry_put_locked(si, ci, entry, SWAP_HAS_CACHE);
-	}
-	swap_cluster_unlock(ci);
-}
-
 static bool swap_entries_put_map(struct swap_info_struct *si,
 				 swp_entry_t entry, int nr)
 {
@@ -1765,13 +1749,21 @@ void swap_free_nr(swp_entry_t entry, int nr_pages)
 void put_swap_folio(struct folio *folio, swp_entry_t entry)
 {
 	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
 	int size = 1 << swap_entry_order(folio_order(folio));
 
 	si = _swap_info_get(entry);
 	if (!si)
 		return;
 
-	swap_entries_put_cache(si, entry, size);
+	ci = swap_cluster_lock(si, offset);
+	if (swap_only_has_cache(si, offset, size))
+		swap_entries_free(si, ci, entry, size);
+	else
+		for (int i = 0; i < size; i++, entry.val++)
+			swap_entry_put_locked(si, ci, entry, SWAP_HAS_CACHE);
+	swap_cluster_unlock(ci);
 }
 
 int __swap_count(swp_entry_t entry)
@@ -3784,15 +3776,6 @@ int swapcache_prepare(swp_entry_t entry, int nr)
 	return __swap_duplicate(entry, SWAP_HAS_CACHE, nr);
 }
 
-/*
- * Caller should ensure entries belong to the same folio so
- * the entries won't span cross cluster boundary.
- */
-void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr)
-{
-	swap_entries_put_cache(si, entry, nr);
-}
-
 /*
  * add_swap_count_continuation - called when a swap count is duplicated
  * beyond SWAP_MAP_MAX, it allocates a new page and links that to the entry's
-- 
2.52.0