From nobody Fri Apr 3 12:50:01 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4599433032A; Thu, 19 Feb 2026 23:42:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771544528; cv=none; b=hl9R7smX227XInQ/0llZmVfOYTv/QEdLReUYo5Oo+x95QR5UnqEgdyyTre39LKF2s63w7J4np37LlNDPsnkA13W5cwbj5OI6XjtQhCc3KXZeNXVB54Y94va94nTjryLB3GN4jxIhrvVJmAtXUXIF+MhtyO1bhh33t9bvUtyZKl8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771544528; c=relaxed/simple; bh=SVkjz87H+v7qEvTRagEPaFIS/kko6XF9kiNxmbSXkpQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Uead0N7gIDwP+WGqichRnJHD/fCdhM51L7aNxlUkWV32yrfobiyREcBtKHC0G8wBQTrJ7j4IvunpMewkS4zlZ6x5ja71CdpHj+a2Y1iIC1pLtoERSShBaQ7ANH8QOnMBmV3lFvwpLZ/qi6Qm0SNbcq8wOTF7bU5JT+P39LvWtzE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=crl8C4zW; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="crl8C4zW" Received: by smtp.kernel.org (Postfix) with ESMTPS id 0CCDBC2BCB2; Thu, 19 Feb 2026 23:42:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1771544528; bh=SVkjz87H+v7qEvTRagEPaFIS/kko6XF9kiNxmbSXkpQ=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=crl8C4zWRKZwRLaByTmREkcM4IQWEXl33WnmqGuSEWKF00sOUTp2eCuGZ6HoK5YLL lBfdSZFAdLR1JLfFLM9UZ7I/p6Aui7ZLzvRv3jIe+deFP7DGQ8ksmmJxOP13Jte0zP MLB9ojJ0d4E9jBhKTKOK1OovK9I7vKOjjgSRa0PGTQJheSTQJ+okRwVYuvU1MKGXyi W2paG5sr3Jb9EXXrt0nHhgR0DJflFDx5xnAunvXQePHA2DcJn3j3EFtWpeuKnKQthn M9rXU9RyPxeZoCMz1QsUJOwOQ5t2+fDwRUd1GqAXSAtnTt4zOoxiL5LR3uyYqVb7Nq rF3dvGMth+mpw== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03203C531EA; Thu, 19 Feb 2026 23:42:08 +0000 (UTC) From: Kairui Song via B4 Relay Date: Fri, 20 Feb 2026 07:42:12 +0800 Subject: [PATCH RFC 11/15] mm/swap, memcg: remove swap cgroup array Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260220-swap-table-p4-v1-11-104795d19815@tencent.com> References: <20260220-swap-table-p4-v1-0-104795d19815@tencent.com> In-Reply-To: <20260220-swap-table-p4-v1-0-104795d19815@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Zi Yan , Baolin Wang , Barry Song , Hugh Dickins , Chris Li , Kemeng Shi , Nhat Pham , Baoquan He , Johannes Weiner , Yosry Ahmed , Youngjun Park , Chengming Zhou , Roman Gushchin , Shakeel Butt , Muchun Song , Qi Zheng , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Kairui Song X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=ed25519-sha256; t=1771544524; l=15582; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=ab1DMIR8yqVLL69CexiaEYLTLIUHukDgwvK/gc3LcsM=; b=W5W0v6jgjF0XBwiRd/XQqgsIQe0I03NglUh7sJSKbsTpBIHlTYOWBv8Bup+fBAi5446LgGVyV eqfznMFyEv+Cy0zY0FiaHNTajlorvDD1ujYZO7cjxI/YmbNz9mMM5g5 X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song Now swap table contains the swap cgropu info all the time, the swap cgroup array can be dropped. Signed-off-by: Kairui Song --- MAINTAINERS | 1 - include/linux/memcontrol.h | 6 +- include/linux/swap_cgroup.h | 47 ------------ mm/Makefile | 3 - mm/internal.h | 1 - mm/memcontrol-v1.c | 1 - mm/memcontrol.c | 19 ++--- mm/swap_cgroup.c | 171 ----------------------------------------= ---- mm/swap_state.c | 3 +- mm/swapfile.c | 23 +----- 10 files changed, 11 insertions(+), 264 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index aa1734a12887..05e633611e0b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6571,7 +6571,6 @@ F: mm/memcontrol.c F: mm/memcontrol-v1.c F: mm/memcontrol-v1.h F: mm/page_counter.c -F: mm/swap_cgroup.c F: samples/cgroup/* F: tools/testing/selftests/cgroup/memcg_protection.m F: tools/testing/selftests/cgroup/test_hugetlb_memcg.c diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 8fc794baf736..4bfe905bffb0 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -667,8 +667,7 @@ static inline int mem_cgroup_charge(struct folio *folio= , struct mm_struct *mm, int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp); =20 int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *= mm, - gfp_t gfp, swp_entry_t entry, - unsigned short id); + gfp_t gfp, unsigned short id); =20 void __mem_cgroup_uncharge(struct folio *folio); =20 @@ -1146,8 +1145,7 @@ static inline int mem_cgroup_charge_hugetlb(struct fo= lio* folio, gfp_t gfp) } =20 static inline int mem_cgroup_swapin_charge_folio(struct folio *folio, - struct mm_struct *mm, gfp_t gfp, swp_entry_t entry, - unsigned short id) + struct mm_struct *mm, gfp_t gfp, unsigned short id) { return 0; } diff --git a/include/linux/swap_cgroup.h b/include/linux/swap_cgroup.h deleted file mode 100644 index 91cdf12190a0..000000000000 --- a/include/linux/swap_cgroup.h +++ /dev/null @@ -1,47 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __LINUX_SWAP_CGROUP_H -#define __LINUX_SWAP_CGROUP_H - -#include - -#if defined(CONFIG_MEMCG) && defined(CONFIG_SWAP) - -extern void swap_cgroup_record(struct folio *folio, unsigned short id, swp= _entry_t ent); -extern unsigned short swap_cgroup_clear(swp_entry_t ent, unsigned int nr_e= nts); -extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent); -extern int swap_cgroup_swapon(int type, unsigned long max_pages); -extern void swap_cgroup_swapoff(int type); - -#else - -static inline -void swap_cgroup_record(struct folio *folio, unsigned short id, swp_entry_= t ent) -{ -} - -static inline -unsigned short swap_cgroup_clear(swp_entry_t ent, unsigned int nr_ents) -{ - return 0; -} - -static inline -unsigned short lookup_swap_cgroup_id(swp_entry_t ent) -{ - return 0; -} - -static inline int -swap_cgroup_swapon(int type, unsigned long max_pages) -{ - return 0; -} - -static inline void swap_cgroup_swapoff(int type) -{ - return; -} - -#endif - -#endif /* __LINUX_SWAP_CGROUP_H */ diff --git a/mm/Makefile b/mm/Makefile index 8ad2ab08244e..eff9f9e7e061 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -103,9 +103,6 @@ obj-$(CONFIG_PAGE_COUNTER) +=3D page_counter.o obj-$(CONFIG_LIVEUPDATE_MEMFD) +=3D memfd_luo.o obj-$(CONFIG_MEMCG_V1) +=3D memcontrol-v1.o obj-$(CONFIG_MEMCG) +=3D memcontrol.o vmpressure.o -ifdef CONFIG_SWAP -obj-$(CONFIG_MEMCG) +=3D swap_cgroup.o -endif ifdef CONFIG_BPF_SYSCALL obj-$(CONFIG_MEMCG) +=3D bpf_memcontrol.o endif diff --git a/mm/internal.h b/mm/internal.h index 416d3401aa17..26691885d75f 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -16,7 +16,6 @@ #include #include #include -#include #include =20 /* Internal core VMA manipulation functions. */ diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 038e630dc7e1..eff18eda0707 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -5,7 +5,6 @@ #include #include #include -#include #include #include #include diff --git a/mm/memcontrol.c b/mm/memcontrol.c index d0f50019d733..8d0c9f3a011e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -54,7 +54,6 @@ #include #include #include -#include #include #include #include @@ -4793,7 +4792,6 @@ int mem_cgroup_charge_hugetlb(struct folio *folio, gf= p_t gfp) * @folio: folio to charge. * @mm: mm context of the victim * @gfp: reclaim mode - * @entry: swap entry for which the folio is allocated * @id: the mem cgroup id * * This function charges a folio allocated for swapin. Please call this be= fore @@ -4802,21 +4800,17 @@ int mem_cgroup_charge_hugetlb(struct folio *folio, = gfp_t gfp) * Returns 0 on success. Otherwise, an error code is returned. */ int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *= mm, - gfp_t gfp, swp_entry_t entry, unsigned short id) + gfp_t gfp, unsigned short id) { struct mem_cgroup *memcg, *swap_memcg; - unsigned short memcg_id; unsigned int nr_pages; int ret; =20 if (mem_cgroup_disabled()) return 0; =20 - memcg_id =3D lookup_swap_cgroup_id(entry); nr_pages =3D folio_nr_pages(folio); =20 - WARN_ON_ONCE(id !=3D memcg_id); - rcu_read_lock(); swap_memcg =3D mem_cgroup_from_private_id(id); if (!swap_memcg) { @@ -4836,10 +4830,11 @@ int mem_cgroup_swapin_charge_folio(struct folio *fo= lio, struct mm_struct *mm, =20 /* * On successful charge, the folio itself now belongs to the memcg, - * so is folio->swap. So we can release the swap cgroup table's - * pinning of the private id. + * so is folio->swap. And the folio takes place of the shadow in + * the swap table so we can release the shadow's pinning of the + * private id. */ - swap_cgroup_clear(folio->swap, nr_pages); + VM_WARN_ON_ONCE_FOLIO(!folio_test_swapcache(folio), folio); mem_cgroup_private_id_put(swap_memcg, nr_pages); =20 /* @@ -5324,8 +5319,6 @@ struct mem_cgroup *__mem_cgroup_swap_free_folio(struc= t folio *folio, { unsigned int nr_pages =3D folio_nr_pages(folio); struct mem_cgroup *memcg, *swap_memcg; - swp_entry_t entry =3D folio->swap; - unsigned short id; =20 VM_WARN_ON_ONCE_FOLIO(!folio_memcg_charged(folio), folio); VM_WARN_ON_ONCE_FOLIO(!folio_test_swapcache(folio), folio); @@ -5337,8 +5330,6 @@ struct mem_cgroup *__mem_cgroup_swap_free_folio(struc= t folio *folio, */ memcg =3D folio_memcg(folio); swap_memcg =3D mem_cgroup_private_id_get_online(memcg, nr_pages); - id =3D mem_cgroup_private_id(swap_memcg); - swap_cgroup_record(folio, id, entry); =20 if (reclaim && do_memsw_account()) { memcg1_swapout(folio, swap_memcg); diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c deleted file mode 100644 index b5a7f21c3afe..000000000000 --- a/mm/swap_cgroup.c +++ /dev/null @@ -1,171 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#include -#include -#include - -#include /* depends on mm.h include */ - -static DEFINE_MUTEX(swap_cgroup_mutex); - -/* Pack two cgroup id (short) of two entries in one swap_cgroup (atomic_t)= */ -#define ID_PER_SC (sizeof(struct swap_cgroup) / sizeof(unsigned short)) -#define ID_SHIFT (BITS_PER_TYPE(unsigned short)) -#define ID_MASK (BIT(ID_SHIFT) - 1) -struct swap_cgroup { - atomic_t ids; -}; - -struct swap_cgroup_ctrl { - struct swap_cgroup *map; -}; - -static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES]; - -static unsigned short __swap_cgroup_id_lookup(struct swap_cgroup *map, - pgoff_t offset) -{ - unsigned int shift =3D (offset % ID_PER_SC) * ID_SHIFT; - unsigned int old_ids =3D atomic_read(&map[offset / ID_PER_SC].ids); - - BUILD_BUG_ON(!is_power_of_2(ID_PER_SC)); - BUILD_BUG_ON(sizeof(struct swap_cgroup) !=3D sizeof(atomic_t)); - - return (old_ids >> shift) & ID_MASK; -} - -static unsigned short __swap_cgroup_id_xchg(struct swap_cgroup *map, - pgoff_t offset, - unsigned short new_id) -{ - unsigned short old_id; - struct swap_cgroup *sc =3D &map[offset / ID_PER_SC]; - unsigned int shift =3D (offset % ID_PER_SC) * ID_SHIFT; - unsigned int new_ids, old_ids =3D atomic_read(&sc->ids); - - do { - old_id =3D (old_ids >> shift) & ID_MASK; - new_ids =3D (old_ids & ~(ID_MASK << shift)); - new_ids |=3D ((unsigned int)new_id) << shift; - } while (!atomic_try_cmpxchg(&sc->ids, &old_ids, new_ids)); - - return old_id; -} - -/** - * swap_cgroup_record - record mem_cgroup for a set of swap entries. - * These entries must belong to one single folio, and that folio - * must be being charged for swap space (swap out). - * - * @folio: the folio that the swap entry belongs to - * @id: mem_cgroup ID to be recorded - * @ent: the first swap entry to be recorded - */ -void swap_cgroup_record(struct folio *folio, unsigned short id, - swp_entry_t ent) -{ - unsigned int nr_ents =3D folio_nr_pages(folio); - struct swap_cgroup *map; - pgoff_t offset, end; - unsigned short old; - - offset =3D swp_offset(ent); - end =3D offset + nr_ents; - map =3D swap_cgroup_ctrl[swp_type(ent)].map; - - do { - old =3D __swap_cgroup_id_xchg(map, offset, id); - VM_WARN_ON_ONCE(old); - } while (++offset !=3D end); -} - -/** - * swap_cgroup_clear - clear mem_cgroup for a set of swap entries. - * These entries must be being uncharged from swap. They either - * belongs to one single folio in the swap cache (swap in for - * cgroup v1), or no longer have any users (slot freeing). - * - * @ent: the first swap entry to be recorded into - * @nr_ents: number of swap entries to be recorded - * - * Returns the existing old value. - */ -unsigned short swap_cgroup_clear(swp_entry_t ent, unsigned int nr_ents) -{ - pgoff_t offset, end; - struct swap_cgroup *map; - unsigned short old, iter =3D 0; - - offset =3D swp_offset(ent); - end =3D offset + nr_ents; - map =3D swap_cgroup_ctrl[swp_type(ent)].map; - - do { - old =3D __swap_cgroup_id_xchg(map, offset, 0); - if (!iter) - iter =3D old; - VM_BUG_ON(iter !=3D old); - } while (++offset !=3D end); - - return old; -} - -/** - * lookup_swap_cgroup_id - lookup mem_cgroup id tied to swap entry - * @ent: swap entry to be looked up. - * - * Returns ID of mem_cgroup at success. 0 at failure. (0 is invalid ID) - */ -unsigned short lookup_swap_cgroup_id(swp_entry_t ent) -{ - struct swap_cgroup_ctrl *ctrl; - - if (mem_cgroup_disabled()) - return 0; - - ctrl =3D &swap_cgroup_ctrl[swp_type(ent)]; - return __swap_cgroup_id_lookup(ctrl->map, swp_offset(ent)); -} - -int swap_cgroup_swapon(int type, unsigned long max_pages) -{ - struct swap_cgroup *map; - struct swap_cgroup_ctrl *ctrl; - - if (mem_cgroup_disabled()) - return 0; - - BUILD_BUG_ON(sizeof(unsigned short) * ID_PER_SC !=3D - sizeof(struct swap_cgroup)); - map =3D vzalloc(DIV_ROUND_UP(max_pages, ID_PER_SC) * - sizeof(struct swap_cgroup)); - if (!map) - goto nomem; - - ctrl =3D &swap_cgroup_ctrl[type]; - mutex_lock(&swap_cgroup_mutex); - ctrl->map =3D map; - mutex_unlock(&swap_cgroup_mutex); - - return 0; -nomem: - pr_info("couldn't allocate enough memory for swap_cgroup\n"); - pr_info("swap_cgroup can be disabled by swapaccount=3D0 boot option\n"); - return -ENOMEM; -} - -void swap_cgroup_swapoff(int type) -{ - struct swap_cgroup *map; - struct swap_cgroup_ctrl *ctrl; - - if (mem_cgroup_disabled()) - return; - - mutex_lock(&swap_cgroup_mutex); - ctrl =3D &swap_cgroup_ctrl[type]; - map =3D ctrl->map; - ctrl->map =3D NULL; - mutex_unlock(&swap_cgroup_mutex); - - vfree(map); -} diff --git a/mm/swap_state.c b/mm/swap_state.c index 5ab3a41fe42c..c6ba15de4094 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -163,7 +163,6 @@ static int __swap_cache_check_batch(struct swap_cluster= _info *ci, *shadowp =3D swp_tb_to_shadow(old_tb); memcgid =3D shadow_to_memcgid(*shadowp); =20 - WARN_ON_ONCE(!mem_cgroup_disabled() && !memcgid); do { old_tb =3D __swap_table_get(ci, ci_off); if (unlikely(swp_tb_is_folio(old_tb)) || @@ -255,7 +254,7 @@ static struct folio *__swap_cache_alloc(struct swap_clu= ster_info *ci, WARN_ON(!shadow); =20 if (mem_cgroup_swapin_charge_folio(folio, vmf ? vmf->vma->vm_mm : NULL, - gfp, entry, shadow_to_memcgid(shadow))) { + gfp, shadow_to_memcgid(shadow))) { spin_lock(&ci->lock); __swap_cache_del_folio(ci, folio, shadow, false, false); spin_unlock(&ci->lock); diff --git a/mm/swapfile.c b/mm/swapfile.c index cd2d3b2ca6f0..de34f1990209 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -45,7 +45,6 @@ =20 #include #include -#include #include "swap_table.h" #include "internal.h" #include "swap_table.h" @@ -1885,8 +1884,7 @@ void __swap_cluster_free_entries(struct swap_info_str= uct *si, { void *shadow; unsigned long old_tb; - unsigned int type =3D si->type; - unsigned int id =3D 0, id_iter, id_check; + unsigned int id =3D 0, id_iter; unsigned int ci_off =3D ci_start, ci_end =3D ci_start + nr_pages; unsigned long offset =3D cluster_offset(si, ci); unsigned int ci_batch =3D ci_off; @@ -1903,23 +1901,15 @@ void __swap_cluster_free_entries(struct swap_info_s= truct *si, shadow =3D swp_tb_to_shadow(old_tb); id_iter =3D shadow_to_memcgid(shadow); if (id !=3D id_iter) { - if (id) { - id_check =3D swap_cgroup_clear(swp_entry(type, offset + ci_batch), - ci_off - ci_batch); - WARN_ON(id !=3D id_check); + if (id) mem_cgroup_uncharge_swap(id, ci_off - ci_batch); - } id =3D id_iter; ci_batch =3D ci_off; } } while (++ci_off < ci_end); =20 - if (id) { - id_check =3D swap_cgroup_clear(swp_entry(type, offset + ci_batch), - ci_off - ci_batch); - WARN_ON(id !=3D id_check); + if (id) mem_cgroup_uncharge_swap(id, ci_off - ci_batch); - } =20 swap_range_free(si, offset + ci_start, nr_pages); swap_cluster_assert_empty(ci, ci_start, nr_pages, false); @@ -3034,8 +3024,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, special= file) p->global_cluster =3D NULL; kvfree(zeromap); free_swap_cluster_info(cluster_info, maxpages); - /* Destroy swap account information */ - swap_cgroup_swapoff(p->type); =20 inode =3D mapping->host; =20 @@ -3567,10 +3555,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, special= file, int, swap_flags) if (error) goto bad_swap_unlock_inode; =20 - error =3D swap_cgroup_swapon(si->type, maxpages); - if (error) - goto bad_swap_unlock_inode; - /* * Use kvmalloc_array instead of bitmap_zalloc as the allocation order mi= ght * be above MAX_PAGE_ORDER incase of a large swap file. @@ -3681,7 +3665,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialf= ile, int, swap_flags) si->global_cluster =3D NULL; inode =3D NULL; destroy_swap_extents(si, swap_file); - swap_cgroup_swapoff(si->type); free_swap_cluster_info(si->cluster_info, si->max); si->cluster_info =3D NULL; kvfree(si->zeromap); --=20 2.53.0