From: Kairui Song
Date: Fri, 05 Dec 2025 03:29:10 +0800
Subject: [PATCH v4 02/19] mm, swap: split swap cache preparation loop into a standalone helper
Message-Id: <20251205-swap-table-p2-v4-2-cb7e28a26a40@tencent.com>
References: <20251205-swap-table-p2-v4-0-cb7e28a26a40@tencent.com>
In-Reply-To: <20251205-swap-table-p2-v4-0-cb7e28a26a40@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
    Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
    Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
    "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song

From: Kairui Song

To prepare for the removal of swap cache bypass swapin, introduce a new
helper that accepts an allocated and charged fresh folio, prepares the
folio and the swap map, and then adds the folio to the swap cache.

This doesn't change how the swap cache works yet; we are still depending
on SWAP_HAS_CACHE in the swap map for synchronization. But all the
synchronization hacks are now in this single helper.

No feature change.

Acked-by: Chris Li
Reviewed-by: Barry Song
Signed-off-by: Kairui Song
---
 mm/swap_state.c | 197 +++++++++++++++++++++++++++++++--------------------------
 1 file changed, 109 insertions(+), 88 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 08252eaef32f..a8511ce43242 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -402,6 +402,97 @@ void swap_update_readahead(struct folio *folio, struct vm_area_struct *vma,
 	}
 }
 
+/**
+ * __swap_cache_prepare_and_add - Prepare the folio and add it to swap cache.
+ * @entry: swap entry to be bound to the folio.
+ * @folio: folio to be added.
+ * @gfp: memory allocation flags for charge, can be 0 if @charged is true.
+ * @charged: if the folio is already charged.
+ * @skip_if_exists: if the slot is in a cached state, return NULL.
+ * This is an old workaround that will be removed shortly.
+ *
+ * Update the swap_map and add folio as swap cache, typically before swapin.
+ * All swap slots covered by the folio must have a non-zero swap count.
+ *
+ * Context: Caller must protect the swap device with reference count or locks.
+ * Return: Returns the folio being added on success. Returns the existing folio
+ * if @entry is already cached. Returns NULL if raced with swapin or swapoff.
+ */
+static struct folio *__swap_cache_prepare_and_add(swp_entry_t entry,
+						  struct folio *folio,
+						  gfp_t gfp, bool charged,
+						  bool skip_if_exists)
+{
+	struct folio *swapcache;
+	void *shadow;
+	int ret;
+
+	/*
+	 * Check and pin the swap map with SWAP_HAS_CACHE, then add the folio
+	 * into the swap cache. Loop with a schedule delay if raced with
+	 * another process setting SWAP_HAS_CACHE. This hackish loop will
+	 * be fixed very soon.
+	 */
+	for (;;) {
+		ret = swapcache_prepare(entry, folio_nr_pages(folio));
+		if (!ret)
+			break;
+
+		/*
+		 * The skip_if_exists is for protecting against a recursive
+		 * call to this helper on the same entry waiting forever
+		 * here because SWAP_HAS_CACHE is set but the folio is not
+		 * in the swap cache yet. This can happen today if
+		 * mem_cgroup_swapin_charge_folio() below triggers reclaim
+		 * through zswap, which may call this helper again in the
+		 * writeback path.
+		 *
+		 * Large order allocation also needs special handling on
+		 * race: if a smaller folio exists in cache, swapin needs
+		 * to fallback to order 0, and doing a swap cache lookup
+		 * might return a folio that is irrelevant to the faulting
+		 * entry because @entry is aligned down. Just return NULL.
+		 */
+		if (ret != -EEXIST || skip_if_exists || folio_test_large(folio))
+			return NULL;
+
+		/*
+		 * Check the swap cache again, we can only arrive
+		 * here because swapcache_prepare returns -EEXIST.
+		 */
+		swapcache = swap_cache_get_folio(entry);
+		if (swapcache)
+			return swapcache;
+
+		/*
+		 * We might race against __swap_cache_del_folio(), and
+		 * stumble across a swap_map entry whose SWAP_HAS_CACHE
+		 * has not yet been cleared. Or race against another
+		 * swap_cache_alloc_folio(), which has set SWAP_HAS_CACHE
+		 * in swap_map, but not yet added its folio to swap cache.
+		 */
+		schedule_timeout_uninterruptible(1);
+	}
+
+	__folio_set_locked(folio);
+	__folio_set_swapbacked(folio);
+
+	if (!charged && mem_cgroup_swapin_charge_folio(folio, NULL, gfp, entry)) {
+		put_swap_folio(folio, entry);
+		folio_unlock(folio);
+		return NULL;
+	}
+
+	swap_cache_add_folio(folio, entry, &shadow);
+	memcg1_swapin(entry, folio_nr_pages(folio));
+	if (shadow)
+		workingset_refault(folio, shadow);
+
+	/* Caller will initiate read into locked folio */
+	folio_add_lru(folio);
+	return folio;
+}
+
 /**
  * swap_cache_alloc_folio - Allocate folio for swapped out slot in swap cache.
  * @entry: the swapped out swap entry to be binded to the folio.
@@ -428,99 +519,29 @@ struct folio *swap_cache_alloc_folio(swp_entry_t entry, gfp_t gfp_mask,
 {
 	struct swap_info_struct *si = __swap_entry_to_info(entry);
 	struct folio *folio;
-	struct folio *new_folio = NULL;
 	struct folio *result = NULL;
-	void *shadow = NULL;
 
 	*new_page_allocated = false;
-	for (;;) {
-		int err;
-
-		/*
-		 * Check the swap cache first, if a cached folio is found,
-		 * return it unlocked. The caller will lock and check it.
-		 */
-		folio = swap_cache_get_folio(entry);
-		if (folio)
-			goto got_folio;
-
-		/*
-		 * Just skip read ahead for unused swap slot.
-		 */
-		if (!swap_entry_swapped(si, entry))
-			goto put_and_return;
-
-		/*
-		 * Get a new folio to read into from swap. Allocate it now if
-		 * new_folio not exist, before marking swap_map SWAP_HAS_CACHE,
-		 * when -EEXIST will cause any racers to loop around until we
-		 * add it to cache.
-		 */
-		if (!new_folio) {
-			new_folio = folio_alloc_mpol(gfp_mask, 0, mpol, ilx, numa_node_id());
-			if (!new_folio)
-				goto put_and_return;
-		}
-
-		/*
-		 * Swap entry may have been freed since our caller observed it.
-		 */
-		err = swapcache_prepare(entry, 1);
-		if (!err)
-			break;
-		else if (err != -EEXIST)
-			goto put_and_return;
-
-		/*
-		 * Protect against a recursive call to swap_cache_alloc_folio()
-		 * on the same entry waiting forever here because SWAP_HAS_CACHE
-		 * is set but the folio is not the swap cache yet. This can
-		 * happen today if mem_cgroup_swapin_charge_folio() below
-		 * triggers reclaim through zswap, which may call
-		 * swap_cache_alloc_folio() in the writeback path.
-		 */
-		if (skip_if_exists)
-			goto put_and_return;
+	/* Check the swap cache again for readahead path. */
+	folio = swap_cache_get_folio(entry);
+	if (folio)
+		return folio;
 
-		/*
-		 * We might race against __swap_cache_del_folio(), and
-		 * stumble across a swap_map entry whose SWAP_HAS_CACHE
-		 * has not yet been cleared. Or race against another
-		 * swap_cache_alloc_folio(), which has set SWAP_HAS_CACHE
-		 * in swap_map, but not yet added its folio to swap cache.
-		 */
-		schedule_timeout_uninterruptible(1);
-	}
-
-	/*
-	 * The swap entry is ours to swap in. Prepare the new folio.
-	 */
-	__folio_set_locked(new_folio);
-	__folio_set_swapbacked(new_folio);
-
-	if (mem_cgroup_swapin_charge_folio(new_folio, NULL, gfp_mask, entry))
-		goto fail_unlock;
-
-	swap_cache_add_folio(new_folio, entry, &shadow);
-	memcg1_swapin(entry, 1);
+	/* Skip allocation for unused swap slot for readahead path. */
+	if (!swap_entry_swapped(si, entry))
+		return NULL;
 
-	if (shadow)
-		workingset_refault(new_folio, shadow);
-
-	/* Caller will initiate read into locked new_folio */
-	folio_add_lru(new_folio);
-	*new_page_allocated = true;
-	folio = new_folio;
-got_folio:
-	result = folio;
-	goto put_and_return;
-
-fail_unlock:
-	put_swap_folio(new_folio, entry);
-	folio_unlock(new_folio);
-put_and_return:
-	if (!(*new_page_allocated) && new_folio)
-		folio_put(new_folio);
+	/* Allocate a new folio to be added into the swap cache. */
+	folio = folio_alloc_mpol(gfp_mask, 0, mpol, ilx, numa_node_id());
+	if (!folio)
+		return NULL;
+	/* Try add the new folio, returns existing folio or NULL on failure. */
+	result = __swap_cache_prepare_and_add(entry, folio, gfp_mask,
+					      false, skip_if_exists);
+	if (result == folio)
+		*new_page_allocated = true;
+	else
+		folio_put(folio);
 	return result;
 }
 
-- 
2.52.0
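
Editor's note, not part of the patch: below is a minimal illustrative sketch of how a swapin
path that has already allocated and charged its folio might call the new helper once the swap
cache bypass is removed. The caller name swapin_prepared_folio() is hypothetical; it would have
to live in mm/swap_state.c because __swap_cache_prepare_and_add() is static, and the
return-value handling follows the helper's kernel-doc and mirrors swap_cache_alloc_folio().

/*
 * Hypothetical caller, for illustration only. The folio is assumed to be
 * freshly allocated and already charged, so @gfp can be 0 and @charged is
 * true, per the helper's kernel-doc.
 */
static struct folio *swapin_prepared_folio(swp_entry_t entry,
					   struct folio *folio)
{
	struct folio *swapcache;

	swapcache = __swap_cache_prepare_and_add(entry, folio, 0,
						 true, false);
	if (swapcache != folio) {
		/*
		 * Raced with swapin/swapoff or the entry is already cached:
		 * drop our folio and return whatever the cache has (or NULL).
		 */
		folio_put(folio);
		return swapcache;
	}

	/*
	 * Success: the folio is locked, in the swap cache, and on the LRU.
	 * The caller now initiates the read into the locked folio.
	 */
	return folio;
}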