From: Kairui Song
To: linux-mm@kvack.org
Cc: Kairui Song, Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
    Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
    Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
    "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v5 07/19] mm/shmem: never bypass the swap cache for SWP_SYNCHRONOUS_IO
Date: Sat, 20 Dec 2025 03:57:51 +0800
Message-ID: <20251219195751.61328-1-ryncsn@gmail.com>
In-Reply-To: <20251220-swap-table-p2-v5-0-8862a265a033@tencent.com>
References: <20251220-swap-table-p2-v5-0-8862a265a033@tencent.com>

From: Kairui Song

Now that the overhead of the swap cache is trivial to none, bypassing
it is no longer a worthwhile optimization. We have already removed the
cache-bypass swapin path for anonymous memory; do the same for shmem.
Many helpers and functions can be dropped as a result.

Performance may drop slightly because swap_map and the swap table
currently coexist and are both updated; later commits in this series
improve this by partially dropping the swap_map update:

Swapin of a 24 GB file on tmpfs with
transparent_hugepage_tmpfs=within_size and ZRAM, 3 test runs on my
machine:

  Before    After this commit    After this series
  5.99s     6.29s                6.08s

Later swap table phases will drop swap_map entirely, avoiding the
overhead and reducing memory usage.
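For reference, a condensed sketch (not the literal hunk) of the
resulting allocation path in shmem_swap_alloc_folio(); swapin_folio()
is the swap-cache-based swapin helper introduced earlier in this
series:

	struct folio *swapcache;

	/*
	 * Sketch: every SWP_SYNCHRONOUS_IO swapin now goes through the
	 * swap cache. swapin_folio() either installs `new` into the
	 * swap cache and starts the read, or returns the folio that a
	 * racing swapin already cached; it returns NULL only when it
	 * lost a race with another swapin.
	 */
	swapcache = swapin_folio(entry, new);
	if (swapcache != new) {
		/* Our already-charged folio lost the race; drop it. */
		folio_put(new);
		if (!swapcache) {
			/* Raced swapin: fall back to a smaller order. */
			new = ERR_PTR(-EEXIST);
			goto fallback;
		}
	}
	return swapcache;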
Reviewed-by: Baolin Wang
Tested-by: Baolin Wang
Signed-off-by: Kairui Song
---
 mm/shmem.c    | 65 +++++++++++++++++------------------------------------
 mm/swap.h     |  4 ----
 mm/swapfile.c | 35 +++++++++-----------------------
 3 files changed, 27 insertions(+), 77 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index dd136d40631c..d7eeeaa9580d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2014,10 +2014,9 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		swp_entry_t entry, int order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct folio *new, *swapcache;
 	int nr_pages = 1 << order;
-	struct folio *new;
 	gfp_t alloc_gfp;
-	void *shadow;
 
 	/*
 	 * We have arrived here because our zones are constrained, so don't
@@ -2057,34 +2056,19 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		goto fallback;
 	}
 
-	/*
-	 * Prevent parallel swapin from proceeding with the swap cache flag.
-	 *
-	 * Of course there is another possible concurrent scenario as well,
-	 * that is to say, the swap cache flag of a large folio has already
-	 * been set by swapcache_prepare(), while another thread may have
-	 * already split the large swap entry stored in the shmem mapping.
-	 * In this case, shmem_add_to_page_cache() will help identify the
-	 * concurrent swapin and return -EEXIST.
-	 */
-	if (swapcache_prepare(entry, nr_pages)) {
+	swapcache = swapin_folio(entry, new);
+	if (swapcache != new) {
 		folio_put(new);
-		new = ERR_PTR(-EEXIST);
-		/* Try smaller folio to avoid cache conflict */
-		goto fallback;
+		if (!swapcache) {
+			/*
+			 * The new folio is charged already, swapin can
+			 * only fail due to another raced swapin.
+			 */
+			new = ERR_PTR(-EEXIST);
+			goto fallback;
+		}
 	}
-
-	__folio_set_locked(new);
-	__folio_set_swapbacked(new);
-	new->swap = entry;
-
-	memcg1_swapin(entry, nr_pages);
-	shadow = swap_cache_get_shadow(entry);
-	if (shadow)
-		workingset_refault(new, shadow);
-	folio_add_lru(new);
-	swap_read_folio(new, NULL);
-	return new;
+	return swapcache;
 fallback:
 	/* Order 0 swapin failed, nothing to fallback to, abort */
 	if (!order)
@@ -2174,8 +2158,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
 }
 
 static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
-					 struct folio *folio, swp_entry_t swap,
-					 bool skip_swapcache)
+					 struct folio *folio, swp_entry_t swap)
 {
 	struct address_space *mapping = inode->i_mapping;
 	swp_entry_t swapin_error;
@@ -2191,8 +2174,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 
 	nr_pages = folio_nr_pages(folio);
 	folio_wait_writeback(folio);
-	if (!skip_swapcache)
-		swap_cache_del_folio(folio);
+	swap_cache_del_folio(folio);
 	/*
 	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks
 	 * won't be 0 when inode is released and thus trigger WARN_ON(i_blocks)
@@ -2292,7 +2274,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	softleaf_t index_entry;
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
-	bool skip_swapcache = false;
 	int error, nr_pages, order;
 	pgoff_t offset;
 
@@ -2335,7 +2316,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			folio = NULL;
 			goto failed;
 		}
-		skip_swapcache = true;
 	} else {
 		/* Cached swapin only supports order 0 folio */
 		folio = shmem_swapin_cluster(swap, gfp, info, index);
@@ -2391,9 +2371,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	 * and swap cache folios are never partially freed.
 	 */
 	folio_lock(folio);
-	if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
-	    shmem_confirm_swap(mapping, index, swap) < 0 ||
-	    folio->swap.val != swap.val) {
+	if (!folio_matches_swap_entry(folio, swap) ||
+	    shmem_confirm_swap(mapping, index, swap) < 0) {
 		error = -EEXIST;
 		goto unlock;
 	}
@@ -2425,12 +2404,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (sgp == SGP_WRITE)
 		folio_mark_accessed(folio);
 
-	if (skip_swapcache) {
-		folio->swap.val = 0;
-		swapcache_clear(si, swap, nr_pages);
-	} else {
-		swap_cache_del_folio(folio);
-	}
+	swap_cache_del_folio(folio);
 	folio_mark_dirty(folio);
 	swap_free_nr(swap, nr_pages);
 	put_swap_device(si);
@@ -2441,14 +2415,11 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (shmem_confirm_swap(mapping, index, swap) < 0)
 		error = -EEXIST;
 	if (error == -EIO)
-		shmem_set_folio_swapin_error(inode, index, folio, swap,
-					     skip_swapcache);
+		shmem_set_folio_swapin_error(inode, index, folio, swap);
 unlock:
 	if (folio)
 		folio_unlock(folio);
 failed_nolock:
-	if (skip_swapcache)
-		swapcache_clear(si, folio->swap, folio_nr_pages(folio));
 	if (folio)
 		folio_put(folio);
 	put_swap_device(si);
diff --git a/mm/swap.h b/mm/swap.h
index 214e7d041030..e0f05babe13a 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -403,10 +403,6 @@ static inline int swap_writeout(struct folio *folio,
 	return 0;
 }
 
-static inline void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr)
-{
-}
-
 static inline struct folio *swap_cache_get_folio(swp_entry_t entry)
 {
 	return NULL;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e5284067a442..3762b8f3f9e9 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1614,22 +1614,6 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
 	return NULL;
 }
 
-static void swap_entries_put_cache(struct swap_info_struct *si,
-				   swp_entry_t entry, int nr)
-{
-	unsigned long offset = swp_offset(entry);
-	struct swap_cluster_info *ci;
-
-	ci = swap_cluster_lock(si, offset);
-	if (swap_only_has_cache(si, offset, nr)) {
-		swap_entries_free(si, ci, entry, nr);
-	} else {
-		for (int i = 0; i < nr; i++, entry.val++)
-			swap_entry_put_locked(si, ci, entry, SWAP_HAS_CACHE);
-	}
-	swap_cluster_unlock(ci);
-}
-
 static bool swap_entries_put_map(struct swap_info_struct *si,
 				 swp_entry_t entry, int nr)
 {
@@ -1765,13 +1749,21 @@ void swap_free_nr(swp_entry_t entry, int nr_pages)
 void put_swap_folio(struct folio *folio, swp_entry_t entry)
 {
 	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
 	int size = 1 << swap_entry_order(folio_order(folio));
 
 	si = _swap_info_get(entry);
 	if (!si)
 		return;
 
-	swap_entries_put_cache(si, entry, size);
+	ci = swap_cluster_lock(si, offset);
+	if (swap_only_has_cache(si, offset, size))
+		swap_entries_free(si, ci, entry, size);
+	else
+		for (int i = 0; i < size; i++, entry.val++)
+			swap_entry_put_locked(si, ci, entry, SWAP_HAS_CACHE);
+	swap_cluster_unlock(ci);
 }
 
 int __swap_count(swp_entry_t entry)
@@ -3784,15 +3776,6 @@ int swapcache_prepare(swp_entry_t entry, int nr)
 	return __swap_duplicate(entry, SWAP_HAS_CACHE, nr);
 }
 
-/*
- * Caller should ensure entries belong to the same folio so
- * the entries won't span cross cluster boundary.
- */
-void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr)
-{
-	swap_entries_put_cache(si, entry, nr);
-}
-
 /*
  * add_swap_count_continuation - called when a swap count is duplicated
  * beyond SWAP_MAP_MAX, it allocates a new page and links that to the entry's
-- 
2.52.0
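A closing note on the mm/swapfile.c side: put_swap_folio() and
swapcache_clear() were the only callers of swap_entries_put_cache(), so
with swapcache_clear() gone the helper is folded into put_swap_folio().
A condensed sketch of the inlined release path (identifiers as in the
hunk above):

	/*
	 * Sketch: drop the SWAP_HAS_CACHE reference on every swap
	 * entry backing the folio, under the cluster lock.
	 */
	ci = swap_cluster_lock(si, offset);
	if (swap_only_has_cache(si, offset, size))
		/* The cache was the last user: free the whole range at once. */
		swap_entries_free(si, ci, entry, size);
	else
		/* Entries still have other users: drop only the cache ref. */
		for (int i = 0; i < size; i++, entry.val++)
			swap_entry_put_locked(si, ci, entry, SWAP_HAS_CACHE);
	swap_cluster_unlock(ci);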