From: Kairui Song
Date: Fri, 05 Dec 2025 03:29:09 +0800
Subject: [PATCH v4 01/19] mm, swap: rename __read_swap_cache_async to swap_cache_alloc_folio
Message-Id: <20251205-swap-table-p2-v4-1-cb7e28a26a40@tencent.com>
References: <20251205-swap-table-p2-v4-0-cb7e28a26a40@tencent.com>
In-Reply-To: <20251205-swap-table-p2-v4-0-cb7e28a26a40@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
 Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
 Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
 "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song

From: Kairui Song

__read_swap_cache_async() is widely used to allocate a folio and ensure it
is in the swap cache, or to return the existing folio if one is already
there. It is not async, and it does not do any read. Rename it to better
reflect its usage, and to prepare for reworking it as part of the new swap
cache APIs. Also add kernel-doc comments for the function.

Worth noting that the skip_if_exists argument is a long-standing workaround
that will be dropped soon.
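For illustration only, not part of the patch: a minimal sketch of the typical
call pattern after the rename, modeled on the read_swap_cache_async() caller
updated in the diff below. The wrapper name example_swapin_folio() is
hypothetical, and error handling and readahead are omitted.

/*
 * Hypothetical sketch (not part of this patch): uses the renamed helper
 * the same way the updated read_swap_cache_async() below does. The caller
 * must already hold a reference to the swap device, as noted in the
 * kernel-doc added by this patch.
 */
static struct folio *example_swapin_folio(swp_entry_t entry, gfp_t gfp_mask,
					   struct vm_area_struct *vma,
					   unsigned long addr)
{
	struct mempolicy *mpol;
	struct folio *folio;
	bool page_allocated;
	pgoff_t ilx;

	mpol = get_vma_policy(vma, addr, 0, &ilx);
	/* Return the cached folio for @entry, or allocate and insert a new one. */
	folio = swap_cache_alloc_folio(entry, gfp_mask, mpol, ilx,
				       &page_allocated, false);
	mpol_cond_put(mpol);

	/* Only a newly allocated folio needs the actual read from swap. */
	if (folio && page_allocated)
		swap_read_folio(folio, NULL);
	return folio;
}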
Reviewed-by: Yosry Ahmed
Acked-by: Chris Li
Reviewed-by: Barry Song
Reviewed-by: Nhat Pham
Signed-off-by: Kairui Song
Reviewed-by: Baoquan He
---
 mm/swap.h       |  6 +++---
 mm/swap_state.c | 46 +++++++++++++++++++++++++++++++++-------------
 mm/swapfile.c   |  2 +-
 mm/zswap.c      |  4 ++--
 4 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/mm/swap.h b/mm/swap.h
index d034c13d8dd2..0fff92e42cfe 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -249,6 +249,9 @@ struct folio *swap_cache_get_folio(swp_entry_t entry);
 void *swap_cache_get_shadow(swp_entry_t entry);
 void swap_cache_add_folio(struct folio *folio, swp_entry_t entry, void **shadow);
 void swap_cache_del_folio(struct folio *folio);
+struct folio *swap_cache_alloc_folio(swp_entry_t entry, gfp_t gfp_flags,
+		struct mempolicy *mpol, pgoff_t ilx,
+		bool *alloced, bool skip_if_exists);
 /* Below helpers require the caller to lock and pass in the swap cluster. */
 void __swap_cache_del_folio(struct swap_cluster_info *ci,
 		struct folio *folio, swp_entry_t entry, void *shadow);
@@ -261,9 +264,6 @@ void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr);
 struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		struct vm_area_struct *vma, unsigned long addr,
 		struct swap_iocb **plug);
-struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_flags,
-		struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated,
-		bool skip_if_exists);
 struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
 		struct mempolicy *mpol, pgoff_t ilx);
 struct folio *swapin_readahead(swp_entry_t entry, gfp_t flag,
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 5f97c6ae70a2..08252eaef32f 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -402,9 +402,29 @@ void swap_update_readahead(struct folio *folio, struct vm_area_struct *vma,
 	}
 }
 
-struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
-		struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated,
-		bool skip_if_exists)
+/**
+ * swap_cache_alloc_folio - Allocate folio for swapped out slot in swap cache.
+ * @entry: the swapped out swap entry to be bound to the folio.
+ * @gfp_mask: memory allocation flags
+ * @mpol: NUMA memory allocation policy to be applied
+ * @ilx: NUMA interleave index, for use only when MPOL_INTERLEAVE
+ * @new_page_allocated: set to true if allocation happened, false otherwise
+ * @skip_if_exists: if the slot is in a partially cached state, return NULL.
+ *                  This is a workaround that will be removed shortly.
+ *
+ * Allocate a folio in the swap cache for one swap slot, typically before
+ * doing IO (e.g. swap in or zswap writeback). The swap slot indicated by
+ * @entry must have a non-zero swap count (swapped out).
+ * Currently only supports order 0.
+ *
+ * Context: Caller must protect the swap device with reference count or locks.
+ * Return: Returns the existing folio if @entry is cached already. Returns
+ * NULL if failed due to -ENOMEM or @entry has a swap count < 1.
+ */
+struct folio *swap_cache_alloc_folio(swp_entry_t entry, gfp_t gfp_mask,
+		struct mempolicy *mpol, pgoff_t ilx,
+		bool *new_page_allocated,
+		bool skip_if_exists)
 {
 	struct swap_info_struct *si = __swap_entry_to_info(entry);
 	struct folio *folio;
@@ -452,12 +472,12 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		goto put_and_return;
 
 	/*
-	 * Protect against a recursive call to __read_swap_cache_async()
+	 * Protect against a recursive call to swap_cache_alloc_folio()
 	 * on the same entry waiting forever here because SWAP_HAS_CACHE
 	 * is set but the folio is not the swap cache yet. This can
 	 * happen today if mem_cgroup_swapin_charge_folio() below
 	 * triggers reclaim through zswap, which may call
-	 * __read_swap_cache_async() in the writeback path.
+	 * swap_cache_alloc_folio() in the writeback path.
 	 */
 	if (skip_if_exists)
 		goto put_and_return;
@@ -466,7 +486,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	 * We might race against __swap_cache_del_folio(), and
 	 * stumble across a swap_map entry whose SWAP_HAS_CACHE
 	 * has not yet been cleared. Or race against another
-	 * __read_swap_cache_async(), which has set SWAP_HAS_CACHE
+	 * swap_cache_alloc_folio(), which has set SWAP_HAS_CACHE
 	 * in swap_map, but not yet added its folio to swap cache.
 	 */
 	schedule_timeout_uninterruptible(1);
@@ -525,7 +545,7 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		return NULL;
 
 	mpol = get_vma_policy(vma, addr, 0, &ilx);
-	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
+	folio = swap_cache_alloc_folio(entry, gfp_mask, mpol, ilx,
 			&page_allocated, false);
 	mpol_cond_put(mpol);
 
@@ -643,9 +663,9 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	blk_start_plug(&plug);
 	for (offset = start_offset; offset <= end_offset ; offset++) {
 		/* Ok, do the async read-ahead now */
-		folio = __read_swap_cache_async(
-				swp_entry(swp_type(entry), offset),
-				gfp_mask, mpol, ilx, &page_allocated, false);
+		folio = swap_cache_alloc_folio(
+				swp_entry(swp_type(entry), offset), gfp_mask, mpol, ilx,
+				&page_allocated, false);
 		if (!folio)
 			continue;
 		if (page_allocated) {
@@ -662,7 +682,7 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 skip:
 	/* The page was likely read above, so no need for plugging here */
-	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
+	folio = swap_cache_alloc_folio(entry, gfp_mask, mpol, ilx,
 				&page_allocated, false);
 	if (unlikely(page_allocated))
 		swap_read_folio(folio, NULL);
@@ -767,7 +787,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 			if (!si)
 				continue;
 		}
-		folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
+		folio = swap_cache_alloc_folio(entry, gfp_mask, mpol, ilx,
 				&page_allocated, false);
 		if (si)
 			put_swap_device(si);
@@ -789,7 +809,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	lru_add_drain();
 skip:
 	/* The folio was likely read above, so no need for plugging here */
-	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
+	folio = swap_cache_alloc_folio(targ_entry, gfp_mask, mpol, targ_ilx,
 				&page_allocated, false);
 	if (unlikely(page_allocated))
 		swap_read_folio(folio, NULL);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 46d2008e4b99..e5284067a442 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1574,7 +1574,7 @@ static unsigned char swap_entry_put_locked(struct swap_info_struct *si,
 	 * CPU1				CPU2
 	 * do_swap_page()
 	 *   ...			swapoff+swapon
-	 *				__read_swap_cache_async()
+	 *				swap_cache_alloc_folio()
 	 *				  swapcache_prepare()
 	 *				    __swap_duplicate()
 	 *				      // check swap_map
diff --git a/mm/zswap.c b/mm/zswap.c
index 5d0f8b13a958..a7a2443912f4 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1014,8 +1014,8 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 		return -EEXIST;
 
 	mpol = get_task_policy(current);
-	folio = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
-				NO_INTERLEAVE_INDEX, &folio_was_allocated, true);
+	folio = swap_cache_alloc_folio(swpentry, GFP_KERNEL, mpol,
+				NO_INTERLEAVE_INDEX, &folio_was_allocated, true);
 	put_swap_device(si);
 	if (!folio)
 		return -ENOMEM;
-- 
2.52.0
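For readers outside mm: a hypothetical, simplified sketch (not part of the
patch) of how the zswap writeback path shown in the mm/zswap.c hunk uses the
renamed helper with skip_if_exists set, to avoid the recursive-reclaim wait
described in the comment updated above. The function name and the -EEXIST
handling here are illustrative assumptions, not a verbatim copy of
zswap_writeback_entry().

/*
 * Hypothetical, simplified sketch of the zswap writeback call site shown
 * in the mm/zswap.c hunk above. skip_if_exists=true makes the helper bail
 * out instead of waiting on a slot that is only partially cached.
 */
static int example_zswap_writeback(swp_entry_t swpentry,
				   struct swap_info_struct *si)
{
	struct mempolicy *mpol = get_task_policy(current);
	bool folio_was_allocated;
	struct folio *folio;

	folio = swap_cache_alloc_folio(swpentry, GFP_KERNEL, mpol,
				       NO_INTERLEAVE_INDEX,
				       &folio_was_allocated, true);
	put_swap_device(si);
	if (!folio)
		return -ENOMEM;
	if (!folio_was_allocated) {
		/* Someone else already owns the swap cache folio. */
		folio_put(folio);
		return -EEXIST;
	}
	/* ... write the decompressed data into the new folio and submit I/O ... */
	return 0;
}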