From: Kairui Song
Date: Sat, 20 Dec 2025 03:43:39 +0800
Subject: [PATCH v5 10/19] mm, swap: consolidate cluster reclaim and usability check
Message-Id: <20251220-swap-table-p2-v5-10-8862a265a033@tencent.com>
References: <20251220-swap-table-p2-v5-0-8862a265a033@tencent.com>
In-Reply-To: <20251220-swap-table-p2-v5-0-8862a265a033@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
 Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
 Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
 "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song
X-Mailing-List: linux-kernel@vger.kernel.org
X-Mailer: b4 0.14.3

From: Kairui Song

Swap cluster cache reclaim requires releasing the lock, so the cluster
may become unusable after the reclaim.
To prepare for checking the swap cache using the swap table directly,
consolidate the swap cluster reclaim and the usability check logic.
With the swap table, we will want to avoid touching the cluster's data
entirely, to avoid RCU overhead here. Moving the cluster usability
check into the reclaim helper also avoids a redundant scan of the
slots when the cluster is no longer usable.

Also, adjust the helper slightly while at it: always scan the whole
region during reclaim, and don't skip slots covered by a reclaimed
folio. Because the reclaim is lockless, new cache may land at any
time, and for allocation we want all caches reclaimed to avoid
fragmentation. Besides, if the scan offset is not aligned with the
size of the reclaimed folio, we might skip some existing cache and
fail the reclaim unexpectedly.

There should be no observable behavior change. It might slightly
improve the fragmentation issue or performance.

Signed-off-by: Kairui Song
---
 mm/swapfile.c | 45 +++++++++++++++++++++++++++++----------------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6d2ee1af0477..f3516e3c9e40 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -777,33 +777,51 @@ static int swap_cluster_setup_bad_slot(struct swap_cluster_info *cluster_info,
 	return 0;
 }
 
+/*
+ * Reclaim drops the ci lock, so the cluster may become unusable (freed or
+ * stolen by a lower order). @usable will be set to false if that happens.
+ */
 static bool cluster_reclaim_range(struct swap_info_struct *si,
 				  struct swap_cluster_info *ci,
-				  unsigned long start, unsigned long end)
+				  unsigned long start, unsigned int order,
+				  bool *usable)
 {
+	unsigned int nr_pages = 1 << order;
+	unsigned long offset = start, end = start + nr_pages;
 	unsigned char *map = si->swap_map;
-	unsigned long offset = start;
 	int nr_reclaim;
 
 	spin_unlock(&ci->lock);
 	do {
 		switch (READ_ONCE(map[offset])) {
 		case 0:
-			offset++;
 			break;
 		case SWAP_HAS_CACHE:
 			nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
-			if (nr_reclaim > 0)
-				offset += nr_reclaim;
-			else
+			if (nr_reclaim < 0)
 				goto out;
 			break;
 		default:
 			goto out;
 		}
-	} while (offset < end);
+	} while (++offset < end);
 out:
 	spin_lock(&ci->lock);
+
+	/*
+	 * We just dropped ci->lock so the cluster could be used by another
+	 * order or got freed, check if it's still usable or empty.
+	 */
+	if (!cluster_is_usable(ci, order)) {
+		*usable = false;
+		return false;
+	}
+	*usable = true;
+
+	/* Fast path, no need to scan if the whole cluster is empty */
+	if (cluster_is_empty(ci))
+		return true;
+
 	/*
 	 * Recheck the range no matter reclaim succeeded or not, the slot
 	 * could have been freed while we are not holding the lock.
 	 */
@@ -900,9 +918,10 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
 	unsigned long start = ALIGN_DOWN(offset, SWAPFILE_CLUSTER);
 	unsigned long end = min(start + SWAPFILE_CLUSTER, si->max);
 	unsigned int nr_pages = 1 << order;
-	bool need_reclaim, ret;
+	bool need_reclaim, ret, usable;
 
 	lockdep_assert_held(&ci->lock);
+	VM_WARN_ON(!cluster_is_usable(ci, order));
 
 	if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER)
 		goto out;
@@ -912,14 +931,8 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
 		if (!cluster_scan_range(si, ci, offset, nr_pages, &need_reclaim))
 			continue;
 		if (need_reclaim) {
-			ret = cluster_reclaim_range(si, ci, offset, offset + nr_pages);
-			/*
-			 * Reclaim drops ci->lock and cluster could be used
-			 * by another order. Not checking flag as off-list
-			 * cluster has no flag set, and change of list
-			 * won't cause fragmentation.
-			 */
-			if (!cluster_is_usable(ci, order))
+			ret = cluster_reclaim_range(si, ci, offset, order, &usable);
+			if (!usable)
 				goto out;
 			if (cluster_is_empty(ci))
 				offset = start;
-- 
2.52.0