From: Kairui Song
Date: Sat, 20 Dec 2025 03:43:35 +0800
Subject: [PATCH v5 06/19] mm, swap: free the swap cache after folio is mapped
Message-Id: <20251220-swap-table-p2-v5-6-8862a265a033@tencent.com>
References: <20251220-swap-table-p2-v5-0-8862a265a033@tencent.com>
In-Reply-To: <20251220-swap-table-p2-v5-0-8862a265a033@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
    Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
    Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
    "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song

From: Kairui Song

Currently, we free the swap entries and, where possible, drop the folio
from the swap cache before mapping the PTE. To reduce repeated faults due
to parallel swapins of the same PTE, change it to remove the folio from
the swap cache after it is mapped, so new faults on the swap PTE are much
more likely to see the folio in the swap cache and wait on it.

This does not eliminate all swapin races: an ongoing swapin fault may
still see an empty swap cache. That's harmless, because the PTE is changed
before the swap cache is cleared, so such a fault will simply return
without triggering a repeated fault. Still, the reordering reduces the
window for redundant swapins.
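For illustration, a rough timeline of the race this reordering targets
(simplified; the racing CPU's exact path through do_swap_page is
abbreviated, function names as used in mm/memory.c):

    CPU A (this swapin fault)          CPU B (parallel fault, same PTE)
    -------------------------          --------------------------------
    folio locked, in swap cache
    set_ptes() installs the PTE        hits the swap PTE, looks up the
    swap_free_nr(entry, nr_pages)        swap cache and finds the folio
    folio_free_swap(folio)             waits on the folio lock
    folio_unlock(folio)                wakes up, sees the PTE is no longer
                                         a swap entry, and returns

    Before this change, folio_free_swap() could run before the PTE was
    set, so CPU B might miss the swap cache and start a redundant swapin.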
Reviewed-by: Baoquan He
Signed-off-by: Kairui Song
---
 mm/memory.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index ca54009cd586..a4c58341c44a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4362,6 +4362,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 static inline bool should_try_to_free_swap(struct swap_info_struct *si,
 					   struct folio *folio,
 					   struct vm_area_struct *vma,
+					   unsigned int extra_refs,
 					   unsigned int fault_flags)
 {
 	if (!folio_test_swapcache(folio))
@@ -4384,7 +4385,7 @@ static inline bool should_try_to_free_swap(struct swap_info_struct *si,
 	 * reference only in case it's likely that we'll be the exclusive user.
 	 */
 	return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
-		folio_ref_count(folio) == (1 + folio_nr_pages(folio));
+		folio_ref_count(folio) == (extra_refs + folio_nr_pages(folio));
 }
 
 static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
@@ -4936,15 +4937,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 */
 	arch_swap_restore(folio_swap(entry, folio), folio);
 
-	/*
-	 * Remove the swap entry and conditionally try to free up the swapcache.
-	 * We're already holding a reference on the page but haven't mapped it
-	 * yet.
-	 */
-	swap_free_nr(entry, nr_pages);
-	if (should_try_to_free_swap(si, folio, vma, vmf->flags))
-		folio_free_swap(folio);
-
 	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
 	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
 	pte = mk_pte(page, vma->vm_page_prot);
@@ -4998,6 +4990,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	arch_do_swap_page_nr(vma->vm_mm, vma, address,
 			pte, pte, nr_pages);
 
+	/*
+	 * Remove the swap entry and conditionally try to free up the swapcache.
+	 * Do it after mapping, so raced page faults will likely see the folio
+	 * in swap cache and wait on the folio lock.
+	 */
+	swap_free_nr(entry, nr_pages);
+	if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
+		folio_free_swap(folio);
+
 	folio_unlock(folio);
 	if (unlikely(folio != swapcache)) {
 		/*

-- 
2.52.0