From: Kairui Song
Date: Wed, 28 Jan 2026 17:28:30 +0800
Subject: [PATCH v2 06/12] mm, swap: implement helpers for reserving data in the swap table
Message-Id: <20260128-swap-table-p3-v2-6-fe0b67ef0215@tencent.com>
References: <20260128-swap-table-p3-v2-0-fe0b67ef0215@tencent.com>
In-Reply-To: <20260128-swap-table-p3-v2-0-fe0b67ef0215@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
 Johannes Weiner, David Hildenbrand, Lorenzo Stoakes, Youngjun Park,
 linux-kernel@vger.kernel.org, Chris Li, Kairui Song

From: Kairui Song

To prepare for using the swap table as the unified swap layer, introduce
macros and helpers for storing multiple kinds of data in a swap table
entry.

From now on, we are storing PFN in the swap table to make space for
extra counting bits (SWAP_COUNT).
Shadows are still stored as they are, as SWAP_COUNT is not used yet.

Also, rename shadow_swp_to_tb to shadow_to_swp_tb. The old name was a
spelling error and not worth a separate fix.

No behaviour change yet; this only prepares the API.

Signed-off-by: Kairui Song
---
 mm/swap_state.c |   6 +--
 mm/swap_table.h | 131 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 124 insertions(+), 13 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 6d0eef7470be..e213ee35c1d2 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -148,7 +148,7 @@ void __swap_cache_add_folio(struct swap_cluster_info *ci,
 	VM_WARN_ON_ONCE_FOLIO(folio_test_swapcache(folio), folio);
 	VM_WARN_ON_ONCE_FOLIO(!folio_test_swapbacked(folio), folio);
 
-	new_tb = folio_to_swp_tb(folio);
+	new_tb = folio_to_swp_tb(folio, 0);
 	ci_start = swp_cluster_offset(entry);
 	ci_off = ci_start;
 	ci_end = ci_start + nr_pages;
@@ -249,7 +249,7 @@ void __swap_cache_del_folio(struct swap_cluster_info *ci, struct folio *folio,
 	VM_WARN_ON_ONCE_FOLIO(folio_test_writeback(folio), folio);
 
 	si = __swap_entry_to_info(entry);
-	new_tb = shadow_swp_to_tb(shadow);
+	new_tb = shadow_to_swp_tb(shadow, 0);
 	ci_start = swp_cluster_offset(entry);
 	ci_end = ci_start + nr_pages;
 	ci_off = ci_start;
@@ -331,7 +331,7 @@ void __swap_cache_replace_folio(struct swap_cluster_info *ci,
 	VM_WARN_ON_ONCE(!entry.val);
 
 	/* Swap cache still stores N entries instead of a high-order entry */
-	new_tb = folio_to_swp_tb(new);
+	new_tb = folio_to_swp_tb(new, 0);
 	do {
 		old_tb = __swap_table_xchg(ci, ci_off, new_tb);
 		WARN_ON_ONCE(!swp_tb_is_folio(old_tb) || swp_tb_to_folio(old_tb) != old);
diff --git a/mm/swap_table.h b/mm/swap_table.h
index 10e11d1f3b04..10762ac5f4f5 100644
--- a/mm/swap_table.h
+++ b/mm/swap_table.h
@@ -12,17 +12,72 @@ struct swap_table {
 };
 
 #define SWP_TABLE_USE_PAGE (sizeof(struct swap_table) == PAGE_SIZE)
-#define SWP_TB_COUNT_BITS 4
 
 /*
  * A swap table entry represents the status of a swap slot on a swap
  * (physical or virtual) device. The swap table in each cluster is a
  * 1:1 map of the swap slots in this cluster.
  *
- * Each swap table entry could be a pointer (folio), a XA_VALUE
- * (shadow), or NULL.
+ * Swap table entry types and bit layouts:
+ *
+ * NULL:     |---------------- 0 ---------------| - Free slot
+ * Shadow:   | SWAP_COUNT |---- SHADOW_VAL ---|1| - Swapped out slot
+ * PFN:      | SWAP_COUNT |------ PFN -------|10| - Cached slot
+ * Pointer:  |----------- Pointer ----------|100| - (Unused)
+ * Bad:      |------------- 1 -------------|1000| - Bad slot
+ *
+ * SWAP_COUNT is `SWP_TB_COUNT_BITS` long, each entry is an atomic long.
+ *
+ * Usages:
+ *
+ * - NULL: Swap slot is unused, could be allocated.
+ *
+ * - Shadow: Swap slot is used and not cached (usually swapped out). It reuses
+ *   the XA_VALUE format to be compatible with working set shadows. SHADOW_VAL
+ *   might be all 0 if the working set shadow info is absent. In that case,
+ *   we still want to keep the shadow format as a placeholder.
+ *
+ *   Memcg ID is embedded in SHADOW_VAL.
+ *
+ * - PFN: Swap slot is in use, and cached. Memcg info is recorded on the page
+ *   struct.
+ *
+ * - Pointer: Not used yet. `0b100` is reserved for potential pointer usage
+ *   because only the lower three bits can be used as a marker for 8-byte
+ *   aligned pointers.
+ *
+ * - Bad: Swap slot is reserved, protects swap header or holes on swap devices.
  */
 
+#if defined(MAX_POSSIBLE_PHYSMEM_BITS)
+#define SWAP_CACHE_PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)
+#elif defined(MAX_PHYSMEM_BITS)
+#define SWAP_CACHE_PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT)
+#else
+#define SWAP_CACHE_PFN_BITS (BITS_PER_LONG - PAGE_SHIFT)
+#endif
+
+/* NULL Entry, all 0 */
+#define SWP_TB_NULL		0UL
+
+/* Swapped out: shadow */
+#define SWP_TB_SHADOW_MARK	0b1UL
+
+/* Cached: PFN */
+#define SWP_TB_PFN_BITS		(SWAP_CACHE_PFN_BITS + SWP_TB_PFN_MARK_BITS)
+#define SWP_TB_PFN_MARK		0b10UL
+#define SWP_TB_PFN_MARK_BITS	2
+#define SWP_TB_PFN_MARK_MASK	(BIT(SWP_TB_PFN_MARK_BITS) - 1)
+
+/* SWAP_COUNT part for PFN or shadow, the width can be shrunk or extended */
+#define SWP_TB_COUNT_BITS	min(4, BITS_PER_LONG - SWP_TB_PFN_BITS)
+#define SWP_TB_COUNT_MASK	(~((~0UL) >> SWP_TB_COUNT_BITS))
+#define SWP_TB_COUNT_SHIFT	(BITS_PER_LONG - SWP_TB_COUNT_BITS)
+#define SWP_TB_COUNT_MAX	((1 << SWP_TB_COUNT_BITS) - 1)
+
+/* Bad slot: ends with 0b1000 and the rest of the bits are all 1 */
+#define SWP_TB_BAD		((~0UL) << 3)
+
 /* Macro for shadow offset calculation */
 #define SWAP_COUNT_SHIFT	SWP_TB_COUNT_BITS
 
@@ -35,18 +90,47 @@ static inline unsigned long null_to_swp_tb(void)
 	return 0;
 }
 
-static inline unsigned long folio_to_swp_tb(struct folio *folio)
+static inline unsigned long __count_to_swp_tb(unsigned char count)
 {
+	/*
+	 * At least three values are needed to distinguish free (0),
+	 * used (count > 0 && count < SWP_TB_COUNT_MAX), and
+	 * overflow (count == SWP_TB_COUNT_MAX).
+	 */
+	BUILD_BUG_ON(SWP_TB_COUNT_MAX < 2 || SWP_TB_COUNT_BITS < 2);
+	VM_WARN_ON(count > SWP_TB_COUNT_MAX);
+	return ((unsigned long)count) << SWP_TB_COUNT_SHIFT;
+}
+
+static inline unsigned long pfn_to_swp_tb(unsigned long pfn, unsigned int count)
+{
+	unsigned long swp_tb;
+
 	BUILD_BUG_ON(sizeof(unsigned long) != sizeof(void *));
-	return (unsigned long)folio;
+	BUILD_BUG_ON(SWAP_CACHE_PFN_BITS >
+		     (BITS_PER_LONG - SWP_TB_PFN_MARK_BITS - SWP_TB_COUNT_BITS));
+
+	swp_tb = (pfn << SWP_TB_PFN_MARK_BITS) | SWP_TB_PFN_MARK;
+	VM_WARN_ON_ONCE(swp_tb & SWP_TB_COUNT_MASK);
+
+	return swp_tb | __count_to_swp_tb(count);
+}
+
+static inline unsigned long folio_to_swp_tb(struct folio *folio, unsigned int count)
+{
+	return pfn_to_swp_tb(folio_pfn(folio), count);
 }
 
-static inline unsigned long shadow_swp_to_tb(void *shadow)
+static inline unsigned long shadow_to_swp_tb(void *shadow, unsigned int count)
 {
 	BUILD_BUG_ON((BITS_PER_XA_VALUE + 1) !=
 		     BITS_PER_BYTE * sizeof(unsigned long));
+	BUILD_BUG_ON((unsigned long)xa_mk_value(0) != SWP_TB_SHADOW_MARK);
+
 	VM_WARN_ON_ONCE(shadow && !xa_is_value(shadow));
-	return (unsigned long)shadow;
+	VM_WARN_ON_ONCE(shadow && ((unsigned long)shadow & SWP_TB_COUNT_MASK));
+
+	return (unsigned long)shadow | __count_to_swp_tb(count) | SWP_TB_SHADOW_MARK;
 }
 
 /*
@@ -59,7 +143,7 @@ static inline bool swp_tb_is_null(unsigned long swp_tb)
 
 static inline bool swp_tb_is_folio(unsigned long swp_tb)
 {
-	return !xa_is_value((void *)swp_tb) && !swp_tb_is_null(swp_tb);
+	return ((swp_tb & SWP_TB_PFN_MARK_MASK) == SWP_TB_PFN_MARK);
 }
 
 static inline bool swp_tb_is_shadow(unsigned long swp_tb)
@@ -67,19 +151,44 @@ static inline bool swp_tb_is_shadow(unsigned long swp_tb)
 	return xa_is_value((void *)swp_tb);
 }
 
+static inline bool swp_tb_is_bad(unsigned long swp_tb)
+{
+	return swp_tb == SWP_TB_BAD;
+}
+
+static inline bool swp_tb_is_countable(unsigned long swp_tb)
+{
+	return (swp_tb_is_shadow(swp_tb) || swp_tb_is_folio(swp_tb) ||
+		swp_tb_is_null(swp_tb));
+}
+
 /*
  * Helpers for retrieving info from swap table.
  */
 static inline struct folio *swp_tb_to_folio(unsigned long swp_tb)
 {
 	VM_WARN_ON(!swp_tb_is_folio(swp_tb));
-	return (void *)swp_tb;
+	return pfn_folio((swp_tb & ~SWP_TB_COUNT_MASK) >> SWP_TB_PFN_MARK_BITS);
 }
 
 static inline void *swp_tb_to_shadow(unsigned long swp_tb)
 {
 	VM_WARN_ON(!swp_tb_is_shadow(swp_tb));
-	return (void *)swp_tb;
+	/* No shift needed, xa_value is stored as it is in the lower bits. */
+	return (void *)(swp_tb & ~SWP_TB_COUNT_MASK);
+}
+
+static inline unsigned char __swp_tb_get_count(unsigned long swp_tb)
+{
+	VM_WARN_ON(!swp_tb_is_countable(swp_tb));
+	return ((swp_tb & SWP_TB_COUNT_MASK) >> SWP_TB_COUNT_SHIFT);
+}
+
+static inline int swp_tb_get_count(unsigned long swp_tb)
+{
+	if (swp_tb_is_countable(swp_tb))
+		return __swp_tb_get_count(swp_tb);
+	return -EINVAL;
 }
 
 /*
@@ -124,6 +233,8 @@ static inline unsigned long swap_table_get(struct swap_cluster_info *ci,
 	atomic_long_t *table;
 	unsigned long swp_tb;
 
+	VM_WARN_ON_ONCE(off >= SWAPFILE_CLUSTER);
+
 	rcu_read_lock();
 	table = rcu_dereference(ci->table);
 	swp_tb = table ? atomic_long_read(&table[off]) : null_to_swp_tb();
-- 
2.52.0
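
Illustration only, not part of the patch: the entry layout described in
the swap_table.h comment can be tried out in userspace with a minimal
sketch like the one below. Everything here is an assumption made for the
example: the demo_/DEMO_ names are hypothetical, entries are taken to be
64 bits wide, and the PFN width assumes 52 physical address bits with 4K
pages. assert() stands in for the VM_WARN_ON checks, and the shadow
helper takes a raw value and applies the xa_mk_value()-style encoding
itself, whereas the kernel helper receives an already encoded pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for this sketch only: 64-bit entries, 52 physical bits, 4K pages. */
#define DEMO_BITS_PER_LONG	64
#define DEMO_PFN_BITS		(52 - 12)
#define DEMO_SHADOW_MARK	UINT64_C(0x1)	/* 0b1: swapped out slot */
#define DEMO_PFN_MARK		UINT64_C(0x2)	/* 0b10: cached slot */
#define DEMO_PFN_MARK_BITS	2
#define DEMO_PFN_MARK_MASK	((UINT64_C(1) << DEMO_PFN_MARK_BITS) - 1)
#define DEMO_COUNT_BITS		4
#define DEMO_COUNT_SHIFT	(DEMO_BITS_PER_LONG - DEMO_COUNT_BITS)
#define DEMO_COUNT_MASK		(~(~UINT64_C(0) >> DEMO_COUNT_BITS))
#define DEMO_COUNT_MAX		((1u << DEMO_COUNT_BITS) - 1)
#define DEMO_BAD		(~UINT64_C(0) << 3)	/* all 1s, ends in 0b1000 */

/* Place the count in the topmost DEMO_COUNT_BITS bits of the entry. */
static inline uint64_t demo_count_to_tb(unsigned int count)
{
	assert(count <= DEMO_COUNT_MAX);
	return (uint64_t)count << DEMO_COUNT_SHIFT;
}

/* Cached slot: | count | PFN | 10 | */
static inline uint64_t demo_pfn_to_tb(uint64_t pfn, unsigned int count)
{
	uint64_t tb = (pfn << DEMO_PFN_MARK_BITS) | DEMO_PFN_MARK;

	assert(!(tb & DEMO_COUNT_MASK));	/* PFN must not reach the count bits */
	return tb | demo_count_to_tb(count);
}

/* Swapped out slot: | count | shadow value | 1 | */
static inline uint64_t demo_shadow_to_tb(uint64_t val, unsigned int count)
{
	uint64_t tb = (val << 1) | DEMO_SHADOW_MARK;

	assert(!(tb & DEMO_COUNT_MASK));
	return tb | demo_count_to_tb(count);
}

static inline bool demo_tb_is_null(uint64_t tb)
{
	return tb == 0;
}

static inline bool demo_tb_is_shadow(uint64_t tb)
{
	return (tb & DEMO_SHADOW_MARK) != 0;
}

static inline bool demo_tb_is_pfn(uint64_t tb)
{
	return (tb & DEMO_PFN_MARK_MASK) == DEMO_PFN_MARK;
}

static inline bool demo_tb_is_bad(uint64_t tb)
{
	return tb == DEMO_BAD;
}

/* Strip the count bits, then drop the two marker bits to recover the PFN. */
static inline uint64_t demo_tb_to_pfn(uint64_t tb)
{
	return (tb & ~DEMO_COUNT_MASK) >> DEMO_PFN_MARK_BITS;
}

static inline unsigned int demo_tb_get_count(uint64_t tb)
{
	return (unsigned int)((tb & DEMO_COUNT_MASK) >> DEMO_COUNT_SHIFT);
}
```

With these assumed widths, 40 PFN bits plus 2 marker bits leave 22 spare
bits, so the 4-bit count fits without shrinking. The type checks do not
overlap: a shadow always has bit 0 set, a PFN entry ends in 0b10, and
the bad marker ends in 0b1000.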