From: Kairui Song
Date: Fri, 05 Dec 2025 03:29:21 +0800
Subject: [PATCH v4 13/19] mm, swap: remove workaround for unsynchronized swap map cache state
Message-Id: <20251205-swap-table-p2-v4-13-cb7e28a26a40@tencent.com>
References: <20251205-swap-table-p2-v4-0-cb7e28a26a40@tencent.com>
In-Reply-To: <20251205-swap-table-p2-v4-0-cb7e28a26a40@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
    Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
    Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
    "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song
X-Mailer: b4 0.14.3

From: Kairui Song

Remove the "skip if exists" check introduced by commit a65b0e7607ccb
("zswap: make shrinking memcg-aware"). It was needed because there used
to be a tiny time window between setting the SWAP_HAS_CACHE bit and
actually adding the folio to the swap cache: if one task tried to add a
folio to the swap cache while another task had set SWAP_HAS_CACHE but
was interrupted before inserting its folio, the first task might wait
indefinitely, leading to a deadlock.

The bit setting has now been moved into the same critical section as
the swap cache insertion, so this window no longer exists. Remove the
workaround and clean up the callers.

Signed-off-by: Kairui Song
---
 mm/swap.h       |  2 +-
 mm/swap_state.c | 27 ++++++++++-----------------
 mm/zswap.c      |  2 +-
 3 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/mm/swap.h b/mm/swap.h
index b5075a1aee04..6777b2ab9d92 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -260,7 +260,7 @@ int swap_cache_add_folio(struct folio *folio, swp_entry_t entry,
 void swap_cache_del_folio(struct folio *folio);
 struct folio *swap_cache_alloc_folio(swp_entry_t entry, gfp_t gfp_flags,
                                      struct mempolicy *mpol, pgoff_t ilx,
-                                     bool *alloced, bool skip_if_exists);
+                                     bool *alloced);
 /* Below helpers require the caller to lock and pass in the swap cluster. */
 void __swap_cache_del_folio(struct swap_cluster_info *ci,
                             struct folio *folio, swp_entry_t entry, void *shadow);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index df7df8b75e52..1a69ba3be87f 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -445,8 +445,6 @@ void swap_update_readahead(struct folio *folio, struct vm_area_struct *vma,
  * @folio: folio to be added.
  * @gfp: memory allocation flags for charge, can be 0 if @charged if true.
  * @charged: if the folio is already charged.
- * @skip_if_exists: if the slot is in a cached state, return NULL.
- *                  This is an old workaround that will be removed shortly.
  *
  * Update the swap_map and add folio as swap cache, typically before swapin.
  * All swap slots covered by the folio must have a non-zero swap count.
@@ -457,8 +455,7 @@ void swap_update_readahead(struct folio *folio, struct vm_area_struct *vma,
  */
 static struct folio *__swap_cache_prepare_and_add(swp_entry_t entry,
                                                   struct folio *folio,
-                                                  gfp_t gfp, bool charged,
-                                                  bool skip_if_exists)
+                                                  gfp_t gfp, bool charged)
 {
 	struct folio *swapcache = NULL;
 	void *shadow;
@@ -478,7 +475,7 @@ static struct folio *__swap_cache_prepare_and_add(swp_entry_t entry,
 		 * might return a folio that is irrelevant to the faulting
 		 * entry because @entry is aligned down. Just return NULL.
 		 */
-		if (ret != -EEXIST || skip_if_exists || folio_test_large(folio))
+		if (ret != -EEXIST || folio_test_large(folio))
 			goto failed;
 
 		swapcache = swap_cache_get_folio(entry);
@@ -511,8 +508,6 @@ static struct folio *__swap_cache_prepare_and_add(swp_entry_t entry,
  * @mpol: NUMA memory allocation policy to be applied
  * @ilx: NUMA interleave index, for use only when MPOL_INTERLEAVE
  * @new_page_allocated: sets true if allocation happened, false otherwise
- * @skip_if_exists: if the slot is a partially cached state, return NULL.
- *                  This is a workaround that would be removed shortly.
  *
  * Allocate a folio in the swap cache for one swap slot, typically before
  * doing IO (e.g. swap in or zswap writeback). The swap slot indicated by
@@ -525,8 +520,7 @@ static struct folio *__swap_cache_prepare_and_add(swp_entry_t entry,
  */
 struct folio *swap_cache_alloc_folio(swp_entry_t entry, gfp_t gfp_mask,
                                      struct mempolicy *mpol, pgoff_t ilx,
-                                     bool *new_page_allocated,
-                                     bool skip_if_exists)
+                                     bool *new_page_allocated)
 {
 	struct swap_info_struct *si = __swap_entry_to_info(entry);
 	struct folio *folio;
@@ -547,8 +541,7 @@ struct folio *swap_cache_alloc_folio(swp_entry_t entry, gfp_t gfp_mask,
 	if (!folio)
 		return NULL;
 	/* Try add the new folio, returns existing folio or NULL on failure. */
-	result = __swap_cache_prepare_and_add(entry, folio, gfp_mask,
-					      false, skip_if_exists);
+	result = __swap_cache_prepare_and_add(entry, folio, gfp_mask, false);
 	if (result == folio)
 		*new_page_allocated = true;
 	else
@@ -577,7 +570,7 @@ struct folio *swapin_folio(swp_entry_t entry, struct folio *folio)
 	unsigned long nr_pages = folio_nr_pages(folio);
 
 	entry = swp_entry(swp_type(entry), round_down(offset, nr_pages));
-	swapcache = __swap_cache_prepare_and_add(entry, folio, 0, true, false);
+	swapcache = __swap_cache_prepare_and_add(entry, folio, 0, true);
 	if (swapcache == folio)
 		swap_read_folio(folio, NULL);
 	return swapcache;
@@ -605,7 +598,7 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 
 	mpol = get_vma_policy(vma, addr, 0, &ilx);
 	folio = swap_cache_alloc_folio(entry, gfp_mask, mpol, ilx,
-				       &page_allocated, false);
+				       &page_allocated);
 	mpol_cond_put(mpol);
 
 	if (page_allocated)
@@ -724,7 +717,7 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		/* Ok, do the async read-ahead now */
 		folio = swap_cache_alloc_folio(
 				swp_entry(swp_type(entry), offset), gfp_mask, mpol, ilx,
-				&page_allocated, false);
+				&page_allocated);
 		if (!folio)
 			continue;
 		if (page_allocated) {
@@ -742,7 +735,7 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 skip:
 	/* The page was likely read above, so no need for plugging here */
 	folio = swap_cache_alloc_folio(entry, gfp_mask, mpol, ilx,
-				       &page_allocated, false);
+				       &page_allocated);
 	if (unlikely(page_allocated))
 		swap_read_folio(folio, NULL);
 	return folio;
@@ -847,7 +840,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 			continue;
 		}
 		folio = swap_cache_alloc_folio(entry, gfp_mask, mpol, ilx,
-					       &page_allocated, false);
+					       &page_allocated);
 		if (si)
 			put_swap_device(si);
 		if (!folio)
@@ -869,7 +862,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 skip:
 	/* The folio was likely read above, so no need for plugging here */
 	folio = swap_cache_alloc_folio(targ_entry, gfp_mask, mpol, targ_ilx,
-				       &page_allocated, false);
+				       &page_allocated);
 	if (unlikely(page_allocated))
 		swap_read_folio(folio, NULL);
 	return folio;
diff --git a/mm/zswap.c b/mm/zswap.c
index a7a2443912f4..d8a33db9d3cc 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1015,7 +1015,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 
 	mpol = get_task_policy(current);
 	folio = swap_cache_alloc_folio(swpentry, GFP_KERNEL, mpol,
-				       NO_INTERLEAVE_INDEX, &folio_was_allocated, true);
+				       NO_INTERLEAVE_INDEX, &folio_was_allocated);
 	put_swap_device(si);
 	if (!folio)
 		return -ENOMEM;
-- 
2.52.0