From nobody Tue Dec 2 01:06:33 2025
From: Kairui Song
Date: Tue, 25 Nov 2025 03:13:59 +0800
Subject: [PATCH v3 16/19] mm, swap: check swap table directly for checking cache
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20251125-swap-table-p2-v3-16-33f54f707a5c@tencent.com>
References: <20251125-swap-table-p2-v3-0-33f54f707a5c@tencent.com>
In-Reply-To: <20251125-swap-table-p2-v3-0-33f54f707a5c@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
 Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
 Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
 "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song
X-Mailer: b4 0.14.3

From: Kairui Song

Instead of looking at the swap map, check swap table directly to tell if a
swap slot is cached.
Prepares for the removal of SWAP_HAS_CACHE.

Signed-off-by: Kairui Song
---
 mm/swap.h        | 11 ++++++++---
 mm/swap_state.c  | 16 ++++++++++++++++
 mm/swapfile.c    | 55 +++++++++++++++++++++++++++++--------------------------
 mm/userfaultfd.c | 10 +++-------
 4 files changed, 56 insertions(+), 36 deletions(-)

diff --git a/mm/swap.h b/mm/swap.h
index ec1ef7d0c35b..3692e143eeba 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -275,6 +275,7 @@ void __swapcache_clear_cached(struct swap_info_struct *si,
  * swap entries in the page table, similar to locking swap cache folio.
  * - See the comment of get_swap_device() for more complex usage.
  */
+bool swap_cache_has_folio(swp_entry_t entry);
 struct folio *swap_cache_get_folio(swp_entry_t entry);
 void *swap_cache_get_shadow(swp_entry_t entry);
 void swap_cache_del_folio(struct folio *folio);
@@ -335,8 +336,6 @@ static inline int swap_zeromap_batch(swp_entry_t entry, int max_nr,
 
 static inline int non_swapcache_batch(swp_entry_t entry, int max_nr)
 {
-	struct swap_info_struct *si = __swap_entry_to_info(entry);
-	pgoff_t offset = swp_offset(entry);
 	int i;
 
 	/*
@@ -345,8 +344,9 @@ static inline int non_swapcache_batch(swp_entry_t entry, int max_nr)
 	 * be in conflict with the folio in swap cache.
 	 */
 	for (i = 0; i < max_nr; i++) {
-		if ((si->swap_map[offset + i] & SWAP_HAS_CACHE))
+		if (swap_cache_has_folio(entry))
 			return i;
+		entry.val++;
 	}
 
 	return i;
@@ -449,6 +449,11 @@ static inline int swap_writeout(struct folio *folio,
 	return 0;
 }
 
+static inline bool swap_cache_has_folio(swp_entry_t entry)
+{
+	return false;
+}
+
 static inline struct folio *swap_cache_get_folio(swp_entry_t entry)
 {
 	return NULL;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index eb7710120d5f..94b6d368e3e8 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -103,6 +103,22 @@ struct folio *swap_cache_get_folio(swp_entry_t entry)
 	return NULL;
 }
 
+/**
+ * swap_cache_has_folio - Check if a swap slot has cache.
+ * @entry: swap entry indicating the slot.
+ *
+ * Context: Caller must ensure @entry is valid and protect the swap
+ * device with reference count or locks.
+ */
+bool swap_cache_has_folio(swp_entry_t entry)
+{
+	unsigned long swp_tb;
+
+	swp_tb = swap_table_get(__swap_entry_to_cluster(entry),
+				swp_cluster_offset(entry));
+	return swp_tb_is_folio(swp_tb);
+}
+
 /**
  * swap_cache_get_shadow - Looks up a shadow in the swap cache.
  * @entry: swap entry used for the lookup.
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 91368294170f..7e28d60d90e1 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -792,23 +792,18 @@ static bool cluster_reclaim_range(struct swap_info_struct *si,
 	unsigned int nr_pages = 1 << order;
 	unsigned long offset = start, end = start + nr_pages;
 	unsigned char *map = si->swap_map;
-	int nr_reclaim;
+	unsigned long swp_tb;
 
 	spin_unlock(&ci->lock);
 	do {
-		switch (READ_ONCE(map[offset])) {
-		case 0:
+		if (swap_count(READ_ONCE(map[offset])))
 			break;
-		case SWAP_HAS_CACHE:
-			nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
-			if (nr_reclaim < 0)
-				goto out;
-			break;
-		default:
-			goto out;
+		swp_tb = swap_table_get(ci, offset % SWAPFILE_CLUSTER);
+		if (swp_tb_is_folio(swp_tb)) {
+			if (__try_to_reclaim_swap(si, offset, TTRS_ANYWAY) < 0)
+				break;
 		}
 	} while (++offset < end);
-out:
 	spin_lock(&ci->lock);
 
 	/*
@@ -829,37 +824,41 @@ static bool cluster_reclaim_range(struct swap_info_struct *si,
 	 * Recheck the range no matter reclaim succeeded or not, the slot
 	 * could have been be freed while we are not holding the lock.
 	 */
-	for (offset = start; offset < end; offset++)
-		if (READ_ONCE(map[offset]))
+	for (offset = start; offset < end; offset++) {
+		swp_tb = __swap_table_get(ci, offset % SWAPFILE_CLUSTER);
+		if (swap_count(map[offset]) || !swp_tb_is_null(swp_tb))
 			return false;
+	}
 
 	return true;
 }
 
 static bool cluster_scan_range(struct swap_info_struct *si,
 			       struct swap_cluster_info *ci,
-			       unsigned long start, unsigned int nr_pages,
+			       unsigned long offset, unsigned int nr_pages,
 			       bool *need_reclaim)
 {
-	unsigned long offset, end = start + nr_pages;
+	unsigned long end = offset + nr_pages;
 	unsigned char *map = si->swap_map;
+	unsigned long swp_tb;
 
 	if (cluster_is_empty(ci))
 		return true;
 
-	for (offset = start; offset < end; offset++) {
-		switch (READ_ONCE(map[offset])) {
-		case 0:
-			continue;
-		case SWAP_HAS_CACHE:
+	do {
+		if (swap_count(map[offset]))
+			return false;
+		swp_tb = __swap_table_get(ci, offset % SWAPFILE_CLUSTER);
+		if (swp_tb_is_folio(swp_tb)) {
+			WARN_ON_ONCE(!(map[offset] & SWAP_HAS_CACHE));
 			if (!vm_swap_full())
 				return false;
 			*need_reclaim = true;
-			continue;
-		default:
-			return false;
+		} else {
+			/* A entry with no count and no cache must be null */
+			VM_WARN_ON_ONCE(!swp_tb_is_null(swp_tb));
 		}
-	}
+	} while (++offset < end);
 
 	return true;
 }
@@ -1026,7 +1025,8 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force)
 		to_scan--;
 
 		while (offset < end) {
-			if (READ_ONCE(map[offset]) == SWAP_HAS_CACHE) {
+			if (!swap_count(READ_ONCE(map[offset])) &&
+			    swp_tb_is_folio(__swap_table_get(ci, offset % SWAPFILE_CLUSTER))) {
 				spin_unlock(&ci->lock);
 				nr_reclaim = __try_to_reclaim_swap(si, offset,
 								   TTRS_ANYWAY);
@@ -1968,6 +1968,7 @@ void swap_put_entries_direct(swp_entry_t entry, int nr)
 	struct swap_info_struct *si;
 	bool any_only_cache = false;
 	unsigned long offset;
+	unsigned long swp_tb;
 
 	si = get_swap_device(entry);
 	if (WARN_ON_ONCE(!si))
@@ -1992,7 +1993,9 @@ void swap_put_entries_direct(swp_entry_t entry, int nr)
 	 */
 	for (offset = start_offset; offset < end_offset; offset += nr) {
 		nr = 1;
-		if (READ_ONCE(si->swap_map[offset]) == SWAP_HAS_CACHE) {
+		swp_tb = swap_table_get(__swap_offset_to_cluster(si, offset),
+					offset % SWAPFILE_CLUSTER);
+		if (!swap_count(READ_ONCE(si->swap_map[offset])) && swp_tb_is_folio(swp_tb)) {
 			/*
 			 * Folios are always naturally aligned in swap so
 			 * advance forward to the next boundary. Zero means no
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e6dfd5f28acd..3f28aa319988 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1190,17 +1190,13 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
 	 * Check if the swap entry is cached after acquiring the src_pte
 	 * lock. Otherwise, we might miss a newly loaded swap cache folio.
 	 *
-	 * Check swap_map directly to minimize overhead, READ_ONCE is sufficient.
 	 * We are trying to catch newly added swap cache, the only possible case is
 	 * when a folio is swapped in and out again staying in swap cache, using the
 	 * same entry before the PTE check above. The PTL is acquired and released
-	 * twice, each time after updating the swap_map's flag. So holding
-	 * the PTL here ensures we see the updated value. False positive is possible,
-	 * e.g. SWP_SYNCHRONOUS_IO swapin may set the flag without touching the
-	 * cache, or during the tiny synchronization window between swap cache and
-	 * swap_map, but it will be gone very quickly, worst result is retry jitters.
+	 * twice, each time after updating the swap table. So holding
+	 * the PTL here ensures we see the updated value.
 	 */
-	if (READ_ONCE(si->swap_map[swp_offset(entry)]) & SWAP_HAS_CACHE) {
+	if (swap_cache_has_folio(entry)) {
 		double_pt_unlock(dst_ptl, src_ptl);
 		return -EAGAIN;
 	}
-- 
2.52.0
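
Illustrative note, not part of the patch: the move from a flag in the swap map to a type check on the swap table word can be sketched in a small user-space model. Everything below is an assumption made for illustration: the toy_* names, the flag value, the cluster size, and the tagged-word encoding (zero for an empty slot, low bit set for a shadow, anything else for a folio pointer) are hypothetical. The real helpers used by the patch (swap_table_get(), swp_tb_is_folio(), swp_tb_is_null()) are defined elsewhere in the kernel tree and may encode entries differently.

```c
/*
 * Toy user-space model, NOT kernel code. All toy_* names, the flag
 * value and the tagged-word encoding are assumptions for illustration.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CLUSTER_SLOTS 8	/* hypothetical stand-in for SWAPFILE_CLUSTER */
#define TOY_HAS_CACHE 0x40	/* hypothetical "cached" flag in the map byte */

struct toy_cluster {
	unsigned char map[TOY_CLUSTER_SLOTS];	/* per-slot count plus flag */
	uintptr_t table[TOY_CLUSTER_SLOTS];	/* per-slot tagged word */
};

/* An empty slot is modelled as an all-zero word. */
static inline bool toy_tb_is_null(uintptr_t tb)
{
	return tb == 0;
}

/* A folio is modelled as a non-zero word with the low (shadow) bit clear. */
static inline bool toy_tb_is_folio(uintptr_t tb)
{
	return !toy_tb_is_null(tb) && !(tb & 1);
}

/* Old style: trust a flag bit kept in the per-slot map byte. */
static inline bool toy_cached_by_map(const struct toy_cluster *c, size_t off)
{
	assert(off < TOY_CLUSTER_SLOTS);
	return (c->map[off] & TOY_HAS_CACHE) != 0;
}

/* New style: decode the table word itself, no separate flag needed. */
static inline bool toy_cached_by_table(const struct toy_cluster *c, size_t off)
{
	assert(off < TOY_CLUSTER_SLOTS);
	return toy_tb_is_folio(c->table[off]);
}

/*
 * Count leading slots that are not cached, in the spirit of the
 * non_swapcache_batch() loop. The caller keeps the range in bounds.
 */
static inline int toy_non_cached_batch(const struct toy_cluster *c,
				       size_t off, int max_nr)
{
	int i;

	for (i = 0; i < max_nr; i++)
		if (toy_cached_by_table(c, off + i))
			return i;
	return i;
}
```

In this model a slot holding a shadow word is not reported as cached, and a slot is reported as cached as soon as a folio pointer is stored in the table, with no second flag to keep in sync. Whether the kernel encoding matches this is not shown in the patch itself.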