From: Kairui Song via B4 Relay
Date: Wed, 13 May 2026 17:21:11 +0800
Subject: [PATCH] mm, swap: avoid leaving unused extend table after alloc race
Message-Id: <20260513-swap-extend-table-fix-v1-1-a71dea851fb3@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Breno Leitao, Chris Li, Kemeng Shi, Nhat Pham,
    Baoquan He, Barry Song, Youngjun Park, Kairui Song,
    linux-kernel@vger.kernel.org, Kairui Song
Reply-To: kasong@tencent.com

From: Kairui Song

Allocating an extend table requires dropping the ci lock first. While the
lock is dropped, a concurrent put can decrease the slot's swap count to a
value that is no longer maxed out, so the extend table is no longer
required.

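As a rough illustration of the window, using the swap_dup_entries_cluster()
caller and assuming the concurrent put updates the count under ci->lock
(the put path is not part of this patch, so the right-hand column is only
a sketch):

    CPU0: swap_dup_entries_cluster()      CPU1: concurrent put
    --------------------------------      --------------------
    spin_lock(&ci->lock)
    err == -ENOMEM
    spin_unlock(&ci->lock)
    swap_extend_table_alloc()
      table = kzalloc(...)
                                          spin_lock(&ci->lock)
                                          slot count drops, no
                                            longer maxed out
                                          spin_unlock(&ci->lock)
      spin_lock(&ci->lock)
      ci->extend_table = table
        (attached, but unused)
      spin_unlock(&ci->lock)
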
The current allocation path still attaches the new extend table to the
cluster anyway, leaving it unused. The table is not really leaked: the
next maxed-out count on the same cluster reuses it and frees it properly,
and swapoff will also clean it up. The worst case is one unused page
pinned per cluster until the next maxed-out allocation or swapoff.

To eliminate the waste, re-check under the ci lock that the extend table
is still needed before publishing it, and free the local allocation
otherwise. The added overhead is negligible.

Fixes: 0d6af9bcf383 ("mm, swap: use the swap table to track the swap count")
Reported-by: Breno Leitao
Closes: https://lore.kernel.org/linux-mm/agG6Dp0umhs6O1SY@gmail.com/
Signed-off-by: Kairui Song
Tested-by: Breno Leitao
---
 mm/swapfile.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4840fd40f36f..451d20bb9f47 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1442,8 +1442,10 @@ static bool swap_sync_discard(void)
 }
 
 static int swap_extend_table_alloc(struct swap_info_struct *si,
-				   struct swap_cluster_info *ci, gfp_t gfp)
+				   struct swap_cluster_info *ci,
+				   unsigned int ci_off, gfp_t gfp)
 {
+	int count;
 	void *table;
 
 	table = kzalloc(sizeof(ci->extend_table[0]) * SWAPFILE_CLUSTER, gfp);
@@ -1451,11 +1453,27 @@ static int swap_extend_table_alloc(struct swap_info_struct *si,
 		return -ENOMEM;
 
 	spin_lock(&ci->lock);
-	if (!ci->extend_table)
-		ci->extend_table = table;
-	else
-		kfree(table);
+	/*
+	 * Extend table allocation requires releasing ci lock first so it's
+	 * possible that the slot has been freed, no longer overflowed, or
+	 * a concurrent extend table allocation has already succeeded, so
+	 * the allocation is no longer needed.
+	 */
+	if (!cluster_table_is_alloced(ci))
+		goto out_free;
+	count = swp_tb_get_count(__swap_table_get(ci, ci_off));
+	if (count < (SWP_TB_COUNT_MAX - 1))
+		goto out_free;
+	if (ci->extend_table)
+		goto out_free;
+
+	ci->extend_table = table;
+	spin_unlock(&ci->lock);
+	return 0;
+
+out_free:
 	spin_unlock(&ci->lock);
+	kfree(table);
 	return 0;
 }
 
@@ -1471,7 +1489,7 @@ int swap_retry_table_alloc(swp_entry_t entry, gfp_t gfp)
 		return 0;
 
 	ci = __swap_offset_to_cluster(si, offset);
-	ret = swap_extend_table_alloc(si, ci, gfp);
+	ret = swap_extend_table_alloc(si, ci, swp_cluster_offset(entry), gfp);
 
 	put_swap_device(si);
 	return ret;
@@ -1664,7 +1682,7 @@ static int swap_dup_entries_cluster(struct swap_info_struct *si,
 		if (unlikely(err)) {
 			if (err == -ENOMEM) {
 				spin_unlock(&ci->lock);
-				err = swap_extend_table_alloc(si, ci, GFP_ATOMIC);
+				err = swap_extend_table_alloc(si, ci, ci_off, GFP_ATOMIC);
 				spin_lock(&ci->lock);
 				if (!err)
 					goto restart;

---
base-commit: 972c53e0ec3abfc6f5fe2cb503640710fb23cf95
change-id: 20260512-swap-extend-table-fix-ed7d1f458dac

Best regards,
-- 
Kairui Song