From: Kairui Song
Date: Tue, 25 Nov 2025 03:13:47 +0800
Subject: [PATCH v3 04/19] mm, swap: always try to free swap cache for SWP_SYNCHRONOUS_IO devices
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Message-Id: <20251125-swap-table-p2-v3-4-33f54f707a5c@tencent.com>
References: <20251125-swap-table-p2-v3-0-33f54f707a5c@tencent.com>
In-Reply-To: <20251125-swap-table-p2-v3-0-33f54f707a5c@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
    Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
    Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
    "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song
X-Mailer: b4 0.14.3

From: Kairui Song

Now SWP_SYNCHRONOUS_IO devices are also using the swap cache. One side
effect is that a folio may stay in the swap cache for a longer time due
to lazy freeing (vm_swap_full()). This can save some CPU / IO if folios
are swapped out again very frequently right after swapin, hence
improving performance.

But the long pinning of swap slots also significantly increases the
fragmentation rate of the swap device, and since all in-tree
SWP_SYNCHRONOUS_IO devices are currently RAM disks, it also pins the
backing memory, increasing memory pressure.

So drop the swap cache immediately for SWP_SYNCHRONOUS_IO devices once
swapin finishes. By then the swap cache has already served its role as
a synchronization layer that prevents parallel swap-ins from wasting
CPU or memory allocations, and the redundant IO is not a major concern
for SWP_SYNCHRONOUS_IO devices.

Worth noting: without this patch, the series so far provides a ~30%
performance gain for certain workloads like MySQL or kernel
compilation, but causes significant regressions or OOMs under extreme
global memory pressure. With this patch, we still get a nice
performance gain for most workloads, without introducing any observable
regressions.
This is a hint that further optimization can be done based on the new
unified swapin with swap cache, but for now, just keep the behaviour
consistent with before.

Signed-off-by: Kairui Song
---
 mm/memory.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 41b690eb8c00..9fb2032772f2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4354,12 +4354,26 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	return 0;
 }
 
-static inline bool should_try_to_free_swap(struct folio *folio,
+/*
+ * Check if we should call folio_free_swap to free the swap cache.
+ * folio_free_swap only frees the swap cache to release the slot if swap
+ * count is zero, so we don't need to check the swap count here.
+ */
+static inline bool should_try_to_free_swap(struct swap_info_struct *si,
+					   struct folio *folio,
 					   struct vm_area_struct *vma,
 					   unsigned int fault_flags)
 {
 	if (!folio_test_swapcache(folio))
 		return false;
+	/*
+	 * Always try to free swap cache for SWP_SYNCHRONOUS_IO devices. Swap
+	 * cache can help save some IO or memory overhead, but these devices
+	 * are fast, and meanwhile, swap cache pinning the slot deferring the
+	 * release of metadata or fragmentation is a more critical issue.
+	 */
+	if (data_race(si->flags & SWP_SYNCHRONOUS_IO))
+		return true;
 	if (mem_cgroup_swap_full(folio) || (vma->vm_flags & VM_LOCKED) ||
 	    folio_test_mlocked(folio))
 		return true;
@@ -4931,7 +4945,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * yet.
 	 */
 	swap_free_nr(entry, nr_pages);
-	if (should_try_to_free_swap(folio, vma, vmf->flags))
+	if (should_try_to_free_swap(si, folio, vma, vmf->flags))
 		folio_free_swap(folio);
 
 	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
-- 
2.52.0
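
[Editor's note] For readers who would rather see the resulting logic than
the hunks, the decision in should_try_to_free_swap() after this patch
boils down to roughly the sketch below. This is a condensed illustration
assembled from the hunks above, not a verbatim copy of mm/memory.c; the
unchanged tail of the function (the write-fault exclusivity check) is
elided and stubbed out with a placeholder return.

/* Condensed sketch, not the literal mm/memory.c code. */
static inline bool should_try_to_free_swap(struct swap_info_struct *si,
					   struct folio *folio,
					   struct vm_area_struct *vma,
					   unsigned int fault_flags)
{
	/* Not in the swap cache: nothing to free. */
	if (!folio_test_swapcache(folio))
		return false;

	/*
	 * New in this patch: synchronous (RAM-backed) swap devices always
	 * drop the swap cache so the slot and its backing memory are
	 * released right away.
	 */
	if (data_race(si->flags & SWP_SYNCHRONOUS_IO))
		return true;

	/* Pre-existing heuristics: swap pressure or mlocked mappings. */
	if (mem_cgroup_swap_full(folio) || (vma->vm_flags & VM_LOCKED) ||
	    folio_test_mlocked(folio))
		return true;

	/* ... unchanged tail of the function elided in this sketch ... */
	return false;
}

The only caller touched is do_swap_page(), which already holds the
swap_info_struct it now passes in, so no extra lookups are added on the
fault path.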