From: Kairui Song
Date: Sat, 20 Dec 2025 03:43:33 +0800
Subject: [PATCH v5 04/19] mm, swap: always try to free swap cache for SWP_SYNCHRONOUS_IO devices
Message-Id: <20251220-swap-table-p2-v5-4-8862a265a033@tencent.com>
References: <20251220-swap-table-p2-v5-0-8862a265a033@tencent.com>
In-Reply-To: <20251220-swap-table-p2-v5-0-8862a265a033@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
    Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
    Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
    "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song

From: Kairui Song

Now SWP_SYNCHRONOUS_IO devices are also using the swap cache. One side
effect is that a folio may stay in the swap cache for a longer time due
to lazy freeing (vm_swap_full()). This can save some CPU / IO if folios
are swapped out again shortly after swapin, improving performance. But
the long pinning of swap slots also significantly increases
fragmentation of the swap device, and since all in-tree
SWP_SYNCHRONOUS_IO devices are currently RAM disks, it also pins the
backing memory, increasing memory pressure.

So drop the swap cache immediately for SWP_SYNCHRONOUS_IO devices once
swapin finishes. By that point the swap cache has served its role as a
synchronization layer, preventing parallel swapins from wasting CPU or
memory allocations, and redundant IO is not a major concern for
SWP_SYNCHRONOUS_IO devices.

Worth noting: without this patch, the series so far can provide a ~30%
performance gain for certain workloads like MySQL or kernel
compilation, but it causes significant regressions or OOMs under
extreme global memory pressure. With this patch, most workloads still
see a nice performance gain, without any observable regressions.
This is a hint that further optimization can be done based on the new
unified swapin with swap cache, but for now, just keep the behaviour
consistent with before.

Signed-off-by: Kairui Song
---
 mm/memory.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3d6ab2689b5e..9e391a283946 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4354,12 +4354,26 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	return 0;
 }
 
-static inline bool should_try_to_free_swap(struct folio *folio,
+/*
+ * Check if we should call folio_free_swap to free the swap cache.
+ * folio_free_swap only frees the swap cache to release the slot if swap
+ * count is zero, so we don't need to check the swap count here.
+ */
+static inline bool should_try_to_free_swap(struct swap_info_struct *si,
+					   struct folio *folio,
 					   struct vm_area_struct *vma,
 					   unsigned int fault_flags)
 {
 	if (!folio_test_swapcache(folio))
 		return false;
+	/*
+	 * Always try to free swap cache for SWP_SYNCHRONOUS_IO devices. Swap
+	 * cache can help save some IO or memory overhead, but these devices
+	 * are fast, and meanwhile, swap cache pinning the slot deferring the
+	 * release of metadata or fragmentation is a more critical issue.
+	 */
+	if (data_race(si->flags & SWP_SYNCHRONOUS_IO))
+		return true;
 	if (mem_cgroup_swap_full(folio) || (vma->vm_flags & VM_LOCKED) ||
 	    folio_test_mlocked(folio))
 		return true;
@@ -4931,7 +4945,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * yet.
 	 */
 	swap_free_nr(entry, nr_pages);
-	if (should_try_to_free_swap(folio, vma, vmf->flags))
+	if (should_try_to_free_swap(si, folio, vma, vmf->flags))
 		folio_free_swap(folio);
 
 	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
-- 
2.52.0
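
For readers skimming the series, the net effect on the fault path can be
summed up with a simplified sketch, not part of the patch itself: the
names mirror the kernel helpers touched by the diff, while the locking
context and the trailing write-fault heuristic of
should_try_to_free_swap() are left out because this patch does not
change them.

/*
 * Simplified sketch (not the in-tree code): ordering of checks after
 * this patch. @si is the swap device the folio was read from.
 */
static inline bool should_try_to_free_swap_sketch(struct swap_info_struct *si,
						  struct folio *folio,
						  struct vm_area_struct *vma)
{
	if (!folio_test_swapcache(folio))
		return false;	/* nothing cached, nothing to free */
	/* New in this patch: fast RAM-backed devices always drop the cache */
	if (data_race(si->flags & SWP_SYNCHRONOUS_IO))
		return true;
	/* Pre-existing heuristics for ordinary swap devices, unchanged */
	if (mem_cgroup_swap_full(folio) || (vma->vm_flags & VM_LOCKED) ||
	    folio_test_mlocked(folio))
		return true;
	return false;	/* trailing write-fault heuristic elided here */
}

On the caller side, do_swap_page() only needs to pass along the
swap_info_struct it already holds, as the second hunk shows.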