From: Kairui Song
Date: Tue, 25 Nov 2025 03:13:56 +0800
Subject: [PATCH v3 13/19] mm, swap: remove workaround for unsynchronized swap map cache state
Message-Id: <20251125-swap-table-p2-v3-13-33f54f707a5c@tencent.com>
References: <20251125-swap-table-p2-v3-0-33f54f707a5c@tencent.com>
In-Reply-To: <20251125-swap-table-p2-v3-0-33f54f707a5c@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
 Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
 Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
 "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song

From: Kairui Song

Remove the "skip if exists" check introduced by commit a65b0e7607ccb
("zswap: make shrinking memcg-aware"). It was needed because there used
to be a tiny time window between setting the SWAP_HAS_CACHE bit and
actually adding the folio to the swap cache: if one user tried to add a
folio to the swap cache while another user had been interrupted after
setting SWAP_HAS_CACHE but before adding its folio, it might lead to a
deadlock. Setting the bit has since been moved into the same critical
section as adding the folio, so the window no longer exists. Remove the
workaround and clean up the callers.

Signed-off-by: Kairui Song
---
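As background for reviewers, below is a minimal userspace sketch of the
old two-step publish versus the new single critical section. It is
illustrative only, not kernel code: cluster_lock, has_cache_bit and
cached_folio are hypothetical stand-ins for the cluster lock, the swap
map bit and the swap cache slot, modeled with a pthread mutex.

/* Build with: gcc -pthread sketch.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t cluster_lock = PTHREAD_MUTEX_INITIALIZER;
static bool has_cache_bit;	/* stand-in for SWAP_HAS_CACHE */
static void *cached_folio;	/* stand-in for the swap cache slot */

/*
 * Old scheme: the bit and the cache slot were published in two
 * separate critical sections. In the window between them, a second
 * user could observe has_cache_bit == true while cached_folio was
 * still NULL and had to either spin or bail out, which is what the
 * skip_if_exists workaround did.
 */
static void add_folio_old(void *folio)
{
	pthread_mutex_lock(&cluster_lock);
	has_cache_bit = true;		/* step 1: pin the slot */
	pthread_mutex_unlock(&cluster_lock);
	/* <-- window: bit is set, folio is not yet visible */
	pthread_mutex_lock(&cluster_lock);
	cached_folio = folio;		/* step 2: publish the folio */
	pthread_mutex_unlock(&cluster_lock);
}

/*
 * New scheme: both updates happen in one critical section, so any
 * user that sees the bit also finds the folio and can simply wait on
 * or reuse it; no special-case bailout is needed.
 */
static void add_folio_new(void *folio)
{
	pthread_mutex_lock(&cluster_lock);
	has_cache_bit = true;
	cached_folio = folio;
	pthread_mutex_unlock(&cluster_lock);
}

int main(void)
{
	int folio;

	add_folio_new(&folio);
	printf("bit=%d folio=%p\n", has_cache_bit, cached_folio);
	return 0;
}

Because the racing user now either sees neither the bit nor the folio,
or both, the skip_if_exists escape hatch in the callers can go.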
 mm/swap.h       |  2 +-
 mm/swap_state.c | 27 ++++++++++-----------------
 mm/zswap.c      |  2 +-
 3 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/mm/swap.h b/mm/swap.h
index b5075a1aee04..6777b2ab9d92 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -260,7 +260,7 @@ int swap_cache_add_folio(struct folio *folio, swp_entry_t entry,
 void swap_cache_del_folio(struct folio *folio);
 struct folio *swap_cache_alloc_folio(swp_entry_t entry, gfp_t gfp_flags,
				     struct mempolicy *mpol, pgoff_t ilx,
-				     bool *alloced, bool skip_if_exists);
+				     bool *alloced);
 /* Below helpers require the caller to lock and pass in the swap cluster. */
 void __swap_cache_del_folio(struct swap_cluster_info *ci,
			    struct folio *folio, swp_entry_t entry, void *shadow);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 847763c6dd4a..c29b7e386a7c 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -445,8 +445,6 @@ void swap_update_readahead(struct folio *folio, struct vm_area_struct *vma,
  * @folio: folio to be added.
  * @gfp: memory allocation flags for charge, can be 0 if @charged if true.
  * @charged: if the folio is already charged.
- * @skip_if_exists: if the slot is in a cached state, return NULL.
- *                  This is an old workaround that will be removed shortly.
  *
  * Update the swap_map and add folio as swap cache, typically before swapin.
  * All swap slots covered by the folio must have a non-zero swap count.
@@ -457,8 +455,7 @@ void swap_update_readahead(struct folio *folio, struct vm_area_struct *vma,
  */
 static struct folio *__swap_cache_prepare_and_add(swp_entry_t entry,
						  struct folio *folio,
-						  gfp_t gfp, bool charged,
-						  bool skip_if_exists)
+						  gfp_t gfp, bool charged)
 {
	struct folio *swapcache = NULL;
	void *shadow;
@@ -478,7 +475,7 @@ static struct folio *__swap_cache_prepare_and_add(swp_entry_t entry,
	 * might return a folio that is irrelevant to the faulting
	 * entry because @entry is aligned down. Just return NULL.
	 */
-	if (ret != -EEXIST || skip_if_exists || folio_test_large(folio))
+	if (ret != -EEXIST || folio_test_large(folio))
		goto failed;

	swapcache = swap_cache_get_folio(entry);
@@ -511,8 +508,6 @@ static struct folio *__swap_cache_prepare_and_add(swp_entry_t entry,
  * @mpol: NUMA memory allocation policy to be applied
  * @ilx: NUMA interleave index, for use only when MPOL_INTERLEAVE
  * @new_page_allocated: sets true if allocation happened, false otherwise
- * @skip_if_exists: if the slot is a partially cached state, return NULL.
- *                  This is a workaround that would be removed shortly.
  *
  * Allocate a folio in the swap cache for one swap slot, typically before
  * doing IO (e.g. swap in or zswap writeback). The swap slot indicated by
@@ -525,8 +520,7 @@ static struct folio *__swap_cache_prepare_and_add(swp_entry_t entry,
  */
 struct folio *swap_cache_alloc_folio(swp_entry_t entry, gfp_t gfp_mask,
				     struct mempolicy *mpol, pgoff_t ilx,
-				     bool *new_page_allocated,
-				     bool skip_if_exists)
+				     bool *new_page_allocated)
 {
	struct swap_info_struct *si = __swap_entry_to_info(entry);
	struct folio *folio;
@@ -547,8 +541,7 @@ struct folio *swap_cache_alloc_folio(swp_entry_t entry, gfp_t gfp_mask,
	if (!folio)
		return NULL;
	/* Try add the new folio, returns existing folio or NULL on failure. */
-	result = __swap_cache_prepare_and_add(entry, folio, gfp_mask,
-					      false, skip_if_exists);
+	result = __swap_cache_prepare_and_add(entry, folio, gfp_mask, false);
	if (result == folio)
		*new_page_allocated = true;
	else
@@ -577,7 +570,7 @@ struct folio *swapin_folio(swp_entry_t entry, struct folio *folio)
	unsigned long nr_pages = folio_nr_pages(folio);

	entry = swp_entry(swp_type(entry), round_down(offset, nr_pages));
-	swapcache = __swap_cache_prepare_and_add(entry, folio, 0, true, false);
+	swapcache = __swap_cache_prepare_and_add(entry, folio, 0, true);
	if (swapcache == folio)
		swap_read_folio(folio, NULL);
	return swapcache;
@@ -605,7 +598,7 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,

	mpol = get_vma_policy(vma, addr, 0, &ilx);
	folio = swap_cache_alloc_folio(entry, gfp_mask, mpol, ilx,
-				       &page_allocated, false);
+				       &page_allocated);
	mpol_cond_put(mpol);

	if (page_allocated)
@@ -724,7 +717,7 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
		/* Ok, do the async read-ahead now */
		folio = swap_cache_alloc_folio(
				swp_entry(swp_type(entry), offset), gfp_mask, mpol, ilx,
-				&page_allocated, false);
+				&page_allocated);
		if (!folio)
			continue;
		if (page_allocated) {
@@ -742,7 +735,7 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 skip:
	/* The page was likely read above, so no need for plugging here */
	folio = swap_cache_alloc_folio(entry, gfp_mask, mpol, ilx,
-				       &page_allocated, false);
+				       &page_allocated);
	if (unlikely(page_allocated))
		swap_read_folio(folio, NULL);
	return folio;
@@ -847,7 +840,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
			continue;
		}
		folio = swap_cache_alloc_folio(entry, gfp_mask, mpol, ilx,
-					       &page_allocated, false);
+					       &page_allocated);
		if (si)
			put_swap_device(si);
		if (!folio)
@@ -869,7 +862,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 skip:
	/* The folio was likely read above, so no need for plugging here */
	folio = swap_cache_alloc_folio(targ_entry, gfp_mask, mpol, targ_ilx,
-				       &page_allocated, false);
+				       &page_allocated);
	if (unlikely(page_allocated))
		swap_read_folio(folio, NULL);
	return folio;
diff --git a/mm/zswap.c b/mm/zswap.c
index a7a2443912f4..d8a33db9d3cc 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1015,7 +1015,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,

	mpol = get_task_policy(current);
	folio = swap_cache_alloc_folio(swpentry, GFP_KERNEL, mpol,
-			NO_INTERLEAVE_INDEX, &folio_was_allocated, true);
+			NO_INTERLEAVE_INDEX, &folio_was_allocated);
	put_swap_device(si);
	if (!folio)
		return -ENOMEM;
--
2.52.0