From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeel.butt@linux.dev>, Muchun Song <muchun.song@linux.dev>,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 01/14] mm: memcg: introduce memcontrol-v1.c
Date: Mon, 24 Jun 2024 17:58:53 -0700
Message-ID: <20240625005906.106920-2-roman.gushchin@linux.dev>
In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev>
References: <20240625005906.106920-1-roman.gushchin@linux.dev>

This patch introduces the mm/memcontrol-v1.c source file, which will be
used for all legacy (cgroup v1) memory cgroup code.
It also introduces mm/memcontrol-v1.h to keep declarations shared between
mm/memcontrol.c and mm/memcontrol-v1.c.

For now, let's compile it whenever CONFIG_MEMCG is set, similar to
mm/memcontrol.c. Later on it can be switched to use a separate config
option, so that the legacy code won't be compiled if not required.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/Makefile        | 3 ++-
 mm/memcontrol-v1.c | 3 +++
 mm/memcontrol-v1.h | 7 +++++++
 3 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 mm/memcontrol-v1.c
 create mode 100644 mm/memcontrol-v1.h

diff --git a/mm/Makefile b/mm/Makefile
index 8fb85acda1b1..124d4dea2035 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -26,6 +26,7 @@ KCOV_INSTRUMENT_page_alloc.o := n
 KCOV_INSTRUMENT_debug-pagealloc.o := n
 KCOV_INSTRUMENT_kmemleak.o := n
 KCOV_INSTRUMENT_memcontrol.o := n
+KCOV_INSTRUMENT_memcontrol-v1.o := n
 KCOV_INSTRUMENT_mmzone.o := n
 KCOV_INSTRUMENT_vmstat.o := n
 KCOV_INSTRUMENT_failslab.o := n
@@ -95,7 +96,7 @@ obj-$(CONFIG_NUMA) += memory-tiers.o
 obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
 obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
-obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
+obj-$(CONFIG_MEMCG) += memcontrol.o memcontrol-v1.o vmpressure.o
 ifdef CONFIG_SWAP
 obj-$(CONFIG_MEMCG) += swap_cgroup.o
 endif
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
new file mode 100644
index 000000000000..a941446ba575
--- /dev/null
+++ b/mm/memcontrol-v1.c
@@ -0,0 +1,3 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include "memcontrol-v1.h"
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
new file mode 100644
index 000000000000..7c5f094755ff
--- /dev/null
+++ b/mm/memcontrol-v1.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __MM_MEMCONTROL_V1_H
+#define __MM_MEMCONTROL_V1_H
+
+
+#endif /* __MM_MEMCONTROL_V1_H */
-- 
2.45.2
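The separate config option mentioned in the log above is left for a later
patch. As a minimal sketch of where this is heading (the CONFIG_MEMCG_V1
symbol name and help text here are illustrative assumptions, not something
this patch adds), the legacy object could eventually be guarded like this:

config MEMCG_V1
	bool "Legacy cgroup v1 memory controller"
	depends on MEMCG
	default y
	help
	  Legacy cgroup v1 memory controller interface. Can be disabled
	  on systems that only mount cgroup v2, shrinking the kernel.

with the Makefile rule split accordingly:

obj-$(CONFIG_MEMCG)    += memcontrol.o vmpressure.o
obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o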
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeel.butt@linux.dev>, Muchun Song <muchun.song@linux.dev>,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 02/14] mm: memcg: move soft limit reclaim code to memcontrol-v1.c
Date: Mon, 24 Jun 2024 17:58:54 -0700
Message-ID: <20240625005906.106920-3-roman.gushchin@linux.dev>
In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev>
References: <20240625005906.106920-1-roman.gushchin@linux.dev>

Soft limits are cgroup v1-specific and are not supported by cgroup v2,
so let's move the corresponding code into memcontrol-v1.c.

Aside from simply moving the code, this commit introduces a trivial
memcg1_soft_limit_reset() function to reset soft limits, and also moves
the global soft limit tree initialization code into a new memcg1_init()
function. It also moves the corresponding declarations shared between
memcontrol.c and memcontrol-v1.c into mm/memcontrol-v1.h.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memcontrol-v1.c | 342 +++++++++++++++++++++++++++++++++++++++++++++
 mm/memcontrol-v1.h |   7 +
 mm/memcontrol.c    | 337 +-------------------------------------------
 3 files changed, 353 insertions(+), 333 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index a941446ba575..2ccb8406fa84 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1,3 +1,345 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 
+#include <linux/memcontrol.h>
+#include <linux/swap.h>
+#include <linux/mm_inline.h>
+
 #include "memcontrol-v1.h"
+
+/*
+ * Cgroups above their limits are maintained in a RB-Tree, independent of
+ * their hierarchy representation
+ */
+
+struct mem_cgroup_tree_per_node {
+	struct rb_root rb_root;
+	struct rb_node *rb_rightmost;
+	spinlock_t lock;
+};
+
+struct mem_cgroup_tree {
+	struct mem_cgroup_tree_per_node *rb_tree_per_node[MAX_NUMNODES];
+};
+
+static struct mem_cgroup_tree soft_limit_tree __read_mostly;
+
+/*
+ * Maximum loops in mem_cgroup_soft_reclaim(), used for soft
+ * limit reclaim to prevent infinite loops, if they ever occur.
+ */
+#define MEM_CGROUP_MAX_RECLAIM_LOOPS		100
+#define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS	2
+
+static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
+					 struct mem_cgroup_tree_per_node *mctz,
+					 unsigned long new_usage_in_excess)
+{
+	struct rb_node **p = &mctz->rb_root.rb_node;
+	struct rb_node *parent = NULL;
+	struct mem_cgroup_per_node *mz_node;
+	bool rightmost = true;
+
+	if (mz->on_tree)
+		return;
+
+	mz->usage_in_excess = new_usage_in_excess;
+	if (!mz->usage_in_excess)
+		return;
+	while (*p) {
+		parent = *p;
+		mz_node = rb_entry(parent, struct mem_cgroup_per_node,
+				   tree_node);
+		if (mz->usage_in_excess < mz_node->usage_in_excess) {
+			p = &(*p)->rb_left;
+			rightmost = false;
+		} else {
+			p = &(*p)->rb_right;
+		}
+	}
+
+	if (rightmost)
+		mctz->rb_rightmost = &mz->tree_node;
+
+	rb_link_node(&mz->tree_node, parent, p);
+	rb_insert_color(&mz->tree_node, &mctz->rb_root);
+	mz->on_tree = true;
+}
+
+static void __mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz,
+					 struct mem_cgroup_tree_per_node *mctz)
+{
+	if (!mz->on_tree)
+		return;
+
+	if (&mz->tree_node == mctz->rb_rightmost)
+		mctz->rb_rightmost = rb_prev(&mz->tree_node);
+
+	rb_erase(&mz->tree_node, &mctz->rb_root);
+	mz->on_tree = false;
+}
+
+static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz,
+				       struct mem_cgroup_tree_per_node *mctz)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&mctz->lock, flags);
+	__mem_cgroup_remove_exceeded(mz, mctz);
+	spin_unlock_irqrestore(&mctz->lock, flags);
+}
+
+static unsigned long soft_limit_excess(struct mem_cgroup *memcg)
+{
+	unsigned long nr_pages = page_counter_read(&memcg->memory);
+	unsigned long soft_limit = READ_ONCE(memcg->soft_limit);
+	unsigned long excess = 0;
+
+	if (nr_pages > soft_limit)
+		excess = nr_pages - soft_limit;
+
+	return excess;
+}
+
+void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid)
+{
+	unsigned long excess;
+	struct mem_cgroup_per_node *mz;
+	struct mem_cgroup_tree_per_node *mctz;
+
+	if (lru_gen_enabled()) {
+		if (soft_limit_excess(memcg))
+			lru_gen_soft_reclaim(memcg, nid);
+		return;
+	}
+
+	mctz = soft_limit_tree.rb_tree_per_node[nid];
+	if (!mctz)
+		return;
+	/*
+	 * Necessary to update all ancestors when hierarchy is used.
+	 * because their event counter is not touched.
+	 */
+	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
+		mz = memcg->nodeinfo[nid];
+		excess = soft_limit_excess(memcg);
+		/*
+		 * We have to update the tree if mz is on RB-tree or
+		 * mem is over its softlimit.
+		 */
+		if (excess || mz->on_tree) {
+			unsigned long flags;
+
+			spin_lock_irqsave(&mctz->lock, flags);
+			/* if on-tree, remove it */
+			if (mz->on_tree)
+				__mem_cgroup_remove_exceeded(mz, mctz);
+			/*
+			 * Insert again. mz->usage_in_excess will be updated.
+			 * If excess is 0, no tree ops.
+			 */
+			__mem_cgroup_insert_exceeded(mz, mctz, excess);
+			spin_unlock_irqrestore(&mctz->lock, flags);
+		}
+	}
+}
+
+void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup_tree_per_node *mctz;
+	struct mem_cgroup_per_node *mz;
+	int nid;
+
+	for_each_node(nid) {
+		mz = memcg->nodeinfo[nid];
+		mctz = soft_limit_tree.rb_tree_per_node[nid];
+		if (mctz)
+			mem_cgroup_remove_exceeded(mz, mctz);
+	}
+}
+
+static struct mem_cgroup_per_node *
+__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
+{
+	struct mem_cgroup_per_node *mz;
+
+retry:
+	mz = NULL;
+	if (!mctz->rb_rightmost)
+		goto done;		/* Nothing to reclaim from */
+
+	mz = rb_entry(mctz->rb_rightmost,
+		      struct mem_cgroup_per_node, tree_node);
+	/*
+	 * Remove the node now but someone else can add it back,
+	 * we will to add it back at the end of reclaim to its correct
+	 * position in the tree.
+	 */
+	__mem_cgroup_remove_exceeded(mz, mctz);
+	if (!soft_limit_excess(mz->memcg) ||
+	    !css_tryget(&mz->memcg->css))
+		goto retry;
+done:
+	return mz;
+}
+
+static struct mem_cgroup_per_node *
+mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
+{
+	struct mem_cgroup_per_node *mz;
+
+	spin_lock_irq(&mctz->lock);
+	mz = __mem_cgroup_largest_soft_limit_node(mctz);
+	spin_unlock_irq(&mctz->lock);
+	return mz;
+}
+
+static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
+				   pg_data_t *pgdat,
+				   gfp_t gfp_mask,
+				   unsigned long *total_scanned)
+{
+	struct mem_cgroup *victim = NULL;
+	int total = 0;
+	int loop = 0;
+	unsigned long excess;
+	unsigned long nr_scanned;
+	struct mem_cgroup_reclaim_cookie reclaim = {
+		.pgdat = pgdat,
+	};
+
+	excess = soft_limit_excess(root_memcg);
+
+	while (1) {
+		victim = mem_cgroup_iter(root_memcg, victim, &reclaim);
+		if (!victim) {
+			loop++;
+			if (loop >= 2) {
+				/*
+				 * If we have not been able to reclaim
+				 * anything, it might because there are
+				 * no reclaimable pages under this hierarchy
+				 */
+				if (!total)
+					break;
+				/*
+				 * We want to do more targeted reclaim.
+				 * excess >> 2 is not to excessive so as to
+				 * reclaim too much, nor too less that we keep
+				 * coming back to reclaim from this cgroup
+				 */
+				if (total >= (excess >> 2) ||
+				    (loop > MEM_CGROUP_MAX_RECLAIM_LOOPS))
+					break;
+			}
+			continue;
+		}
+		total += mem_cgroup_shrink_node(victim, gfp_mask, false,
+						pgdat, &nr_scanned);
+		*total_scanned += nr_scanned;
+		if (!soft_limit_excess(root_memcg))
+			break;
+	}
+	mem_cgroup_iter_break(root_memcg, victim);
+	return total;
+}
+
+unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
+					    gfp_t gfp_mask,
+					    unsigned long *total_scanned)
+{
+	unsigned long nr_reclaimed = 0;
+	struct mem_cgroup_per_node *mz, *next_mz = NULL;
+	unsigned long reclaimed;
+	int loop = 0;
+	struct mem_cgroup_tree_per_node *mctz;
+	unsigned long excess;
+
+	if (lru_gen_enabled())
+		return 0;
+
+	if (order > 0)
+		return 0;
+
+	mctz = soft_limit_tree.rb_tree_per_node[pgdat->node_id];
+
+	/*
+	 * Do not even bother to check the largest node if the root
+	 * is empty. Do it lockless to prevent lock bouncing. Races
+	 * are acceptable as soft limit is best effort anyway.
+	 */
+	if (!mctz || RB_EMPTY_ROOT(&mctz->rb_root))
+		return 0;
+
+	/*
+	 * This loop can run a while, specially if mem_cgroup's continuously
+	 * keep exceeding their soft limit and putting the system under
+	 * pressure
+	 */
+	do {
+		if (next_mz)
+			mz = next_mz;
+		else
+			mz = mem_cgroup_largest_soft_limit_node(mctz);
+		if (!mz)
+			break;
+
+		reclaimed = mem_cgroup_soft_reclaim(mz->memcg, pgdat,
+						    gfp_mask, total_scanned);
+		nr_reclaimed += reclaimed;
+		spin_lock_irq(&mctz->lock);
+
+		/*
+		 * If we failed to reclaim anything from this memory cgroup
+		 * it is time to move on to the next cgroup
+		 */
+		next_mz = NULL;
+		if (!reclaimed)
+			next_mz = __mem_cgroup_largest_soft_limit_node(mctz);
+
+		excess = soft_limit_excess(mz->memcg);
+		/*
+		 * One school of thought says that we should not add
+		 * back the node to the tree if reclaim returns 0.
+		 * But our reclaim could return 0, simply because due
+		 * to priority we are exposing a smaller subset of
+		 * memory to reclaim from. Consider this as a longer
+		 * term TODO.
+		 */
+		/* If excess == 0, no tree ops */
+		__mem_cgroup_insert_exceeded(mz, mctz, excess);
+		spin_unlock_irq(&mctz->lock);
+		css_put(&mz->memcg->css);
+		loop++;
+		/*
+		 * Could not reclaim anything and there are no more
+		 * mem cgroups to try or we seem to be looping without
+		 * reclaiming anything.
+		 */
+		if (!nr_reclaimed &&
+		    (next_mz == NULL ||
+		     loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS))
+			break;
+	} while (!nr_reclaimed);
+	if (next_mz)
+		css_put(&next_mz->memcg->css);
+	return nr_reclaimed;
+}
+
+static int __init memcg1_init(void)
+{
+	int node;
+
+	for_each_node(node) {
+		struct mem_cgroup_tree_per_node *rtpn;
+
+		rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL, node);
+
+		rtpn->rb_root = RB_ROOT;
+		rtpn->rb_rightmost = NULL;
+		spin_lock_init(&rtpn->lock);
+		soft_limit_tree.rb_tree_per_node[node] = rtpn;
+	}
+
+	return 0;
+}
+subsys_initcall(memcg1_init);
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 7c5f094755ff..4da6fa561c6d 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -3,5 +3,12 @@
 #ifndef __MM_MEMCONTROL_V1_H
 #define __MM_MEMCONTROL_V1_H
 
+void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid);
+void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg);
+
+static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg)
+{
+	WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX);
+}
 
 #endif /* __MM_MEMCONTROL_V1_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 974bd160838c..003e944f34ea 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -72,6 +72,7 @@
 #include <net/ip.h>
 #include "slab.h"
 #include "swap.h"
+#include "memcontrol-v1.h"
 
 #include <linux/uaccess.h>
 
@@ -108,23 +109,6 @@ static bool do_memsw_account(void)
 #define THRESHOLDS_EVENTS_TARGET 128
 #define SOFTLIMIT_EVENTS_TARGET 1024
 
-/*
- * Cgroups above their limits are maintained in a RB-Tree, independent of
- * their hierarchy representation
- */
-
-struct mem_cgroup_tree_per_node {
-	struct rb_root rb_root;
-	struct rb_node *rb_rightmost;
-	spinlock_t lock;
-};
-
-struct mem_cgroup_tree {
-	struct mem_cgroup_tree_per_node *rb_tree_per_node[MAX_NUMNODES];
-};
-
-static struct mem_cgroup_tree soft_limit_tree __read_mostly;
-
 /* for OOM */
 struct mem_cgroup_eventfd_list {
 	struct list_head list;
@@ -199,13 +183,6 @@ static struct move_charge_struct {
 	.waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq),
 };
 
-/*
- * Maximum loops in mem_cgroup_soft_reclaim(), used for soft
- * limit reclaim to prevent infinite loops, if they ever occur.
- */
-#define MEM_CGROUP_MAX_RECLAIM_LOOPS		100
-#define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS	2
-
 /* for encoding cft->private value on file */
 enum res_type {
 	_MEM,
@@ -413,169 +390,6 @@ ino_t page_cgroup_ino(struct page *page)
 	return ino;
 }
 
-static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
-					 struct mem_cgroup_tree_per_node *mctz,
-					 unsigned long new_usage_in_excess)
-{
-	struct rb_node **p = &mctz->rb_root.rb_node;
-	struct rb_node *parent = NULL;
-	struct mem_cgroup_per_node *mz_node;
-	bool rightmost = true;
-
-	if (mz->on_tree)
-		return;
-
-	mz->usage_in_excess = new_usage_in_excess;
-	if (!mz->usage_in_excess)
-		return;
-	while (*p) {
-		parent = *p;
-		mz_node = rb_entry(parent, struct mem_cgroup_per_node,
-				   tree_node);
-		if (mz->usage_in_excess < mz_node->usage_in_excess) {
-			p = &(*p)->rb_left;
-			rightmost = false;
-		} else {
-			p = &(*p)->rb_right;
-		}
-	}
-
-	if (rightmost)
-		mctz->rb_rightmost = &mz->tree_node;
-
-	rb_link_node(&mz->tree_node, parent, p);
-	rb_insert_color(&mz->tree_node, &mctz->rb_root);
-	mz->on_tree = true;
-}
-
-static void __mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz,
-					 struct mem_cgroup_tree_per_node *mctz)
-{
-	if (!mz->on_tree)
-		return;
-
-	if (&mz->tree_node == mctz->rb_rightmost)
-		mctz->rb_rightmost = rb_prev(&mz->tree_node);
-
-	rb_erase(&mz->tree_node, &mctz->rb_root);
-	mz->on_tree = false;
-}
-
-static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz,
-				       struct mem_cgroup_tree_per_node *mctz)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&mctz->lock, flags);
-	__mem_cgroup_remove_exceeded(mz, mctz);
-	spin_unlock_irqrestore(&mctz->lock, flags);
-}
-
-static unsigned long soft_limit_excess(struct mem_cgroup *memcg)
-{
-	unsigned long nr_pages = page_counter_read(&memcg->memory);
-	unsigned long soft_limit = READ_ONCE(memcg->soft_limit);
-	unsigned long excess = 0;
-
-	if (nr_pages > soft_limit)
-		excess = nr_pages - soft_limit;
-
-	return excess;
-}
-
-static void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid)
-{
-	unsigned long excess;
-	struct mem_cgroup_per_node *mz;
-	struct mem_cgroup_tree_per_node *mctz;
-
-	if (lru_gen_enabled()) {
-		if (soft_limit_excess(memcg))
-			lru_gen_soft_reclaim(memcg, nid);
-		return;
-	}
-
-	mctz = soft_limit_tree.rb_tree_per_node[nid];
-	if (!mctz)
-		return;
-	/*
-	 * Necessary to update all ancestors when hierarchy is used.
-	 * because their event counter is not touched.
-	 */
-	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
-		mz = memcg->nodeinfo[nid];
-		excess = soft_limit_excess(memcg);
-		/*
-		 * We have to update the tree if mz is on RB-tree or
-		 * mem is over its softlimit.
-		 */
-		if (excess || mz->on_tree) {
-			unsigned long flags;
-
-			spin_lock_irqsave(&mctz->lock, flags);
-			/* if on-tree, remove it */
-			if (mz->on_tree)
-				__mem_cgroup_remove_exceeded(mz, mctz);
-			/*
-			 * Insert again. mz->usage_in_excess will be updated.
-			 * If excess is 0, no tree ops.
-			 */
-			__mem_cgroup_insert_exceeded(mz, mctz, excess);
-			spin_unlock_irqrestore(&mctz->lock, flags);
-		}
-	}
-}
-
-static void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
-{
-	struct mem_cgroup_tree_per_node *mctz;
-	struct mem_cgroup_per_node *mz;
-	int nid;
-
-	for_each_node(nid) {
-		mz = memcg->nodeinfo[nid];
-		mctz = soft_limit_tree.rb_tree_per_node[nid];
-		if (mctz)
-			mem_cgroup_remove_exceeded(mz, mctz);
-	}
-}
-
-static struct mem_cgroup_per_node *
-__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
-{
-	struct mem_cgroup_per_node *mz;
-
-retry:
-	mz = NULL;
-	if (!mctz->rb_rightmost)
-		goto done;		/* Nothing to reclaim from */
-
-	mz = rb_entry(mctz->rb_rightmost,
-		      struct mem_cgroup_per_node, tree_node);
-	/*
-	 * Remove the node now but someone else can add it back,
-	 * we will to add it back at the end of reclaim to its correct
-	 * position in the tree.
-	 */
-	__mem_cgroup_remove_exceeded(mz, mctz);
-	if (!soft_limit_excess(mz->memcg) ||
-	    !css_tryget(&mz->memcg->css))
-		goto retry;
-done:
-	return mz;
-}
-
-static struct mem_cgroup_per_node *
-mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
-{
-	struct mem_cgroup_per_node *mz;
-
-	spin_lock_irq(&mctz->lock);
-	mz = __mem_cgroup_largest_soft_limit_node(mctz);
-	spin_unlock_irq(&mctz->lock);
-	return mz;
-}
-
 /* Subset of node_stat_item for memcg stats */
 static const unsigned int memcg_node_stat_items[] = {
 	NR_INACTIVE_ANON,
@@ -1980,56 +1794,6 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	return ret;
 }
 
-static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
-				   pg_data_t *pgdat,
-				   gfp_t gfp_mask,
-				   unsigned long *total_scanned)
-{
-	struct mem_cgroup *victim = NULL;
-	int total = 0;
-	int loop = 0;
-	unsigned long excess;
-	unsigned long nr_scanned;
-	struct mem_cgroup_reclaim_cookie reclaim = {
-		.pgdat = pgdat,
-	};
-
-	excess = soft_limit_excess(root_memcg);
-
-	while (1) {
-		victim = mem_cgroup_iter(root_memcg, victim, &reclaim);
-		if (!victim) {
-			loop++;
-			if (loop >= 2) {
-				/*
-				 * If we have not been able to reclaim
-				 * anything, it might because there are
-				 * no reclaimable pages under this hierarchy
-				 */
-				if (!total)
-					break;
-				/*
-				 * We want to do more targeted reclaim.
-				 * excess >> 2 is not to excessive so as to
-				 * reclaim too much, nor too less that we keep
-				 * coming back to reclaim from this cgroup
-				 */
-				if (total >= (excess >> 2) ||
-				    (loop > MEM_CGROUP_MAX_RECLAIM_LOOPS))
-					break;
-			}
-			continue;
-		}
-		total += mem_cgroup_shrink_node(victim, gfp_mask, false,
-						pgdat, &nr_scanned);
-		*total_scanned += nr_scanned;
-		if (!soft_limit_excess(root_memcg))
-			break;
-	}
-	mem_cgroup_iter_break(root_memcg, victim);
-	return total;
-}
-
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map memcg_oom_lock_dep_map = {
 	.name = "memcg_oom_lock",
@@ -3925,88 +3689,6 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 	return ret;
 }
 
-unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
-					    gfp_t gfp_mask,
-					    unsigned long *total_scanned)
-{
-	unsigned long nr_reclaimed = 0;
-	struct mem_cgroup_per_node *mz, *next_mz = NULL;
-	unsigned long reclaimed;
-	int loop = 0;
-	struct mem_cgroup_tree_per_node *mctz;
-	unsigned long excess;
-
-	if (lru_gen_enabled())
-		return 0;
-
-	if (order > 0)
-		return 0;
-
-	mctz = soft_limit_tree.rb_tree_per_node[pgdat->node_id];
-
-	/*
-	 * Do not even bother to check the largest node if the root
-	 * is empty. Do it lockless to prevent lock bouncing. Races
-	 * are acceptable as soft limit is best effort anyway.
-	 */
-	if (!mctz || RB_EMPTY_ROOT(&mctz->rb_root))
-		return 0;
-
-	/*
-	 * This loop can run a while, specially if mem_cgroup's continuously
-	 * keep exceeding their soft limit and putting the system under
-	 * pressure
-	 */
-	do {
-		if (next_mz)
-			mz = next_mz;
-		else
-			mz = mem_cgroup_largest_soft_limit_node(mctz);
-		if (!mz)
-			break;
-
-		reclaimed = mem_cgroup_soft_reclaim(mz->memcg, pgdat,
-						    gfp_mask, total_scanned);
-		nr_reclaimed += reclaimed;
-		spin_lock_irq(&mctz->lock);
-
-		/*
-		 * If we failed to reclaim anything from this memory cgroup
-		 * it is time to move on to the next cgroup
-		 */
-		next_mz = NULL;
-		if (!reclaimed)
-			next_mz = __mem_cgroup_largest_soft_limit_node(mctz);
-
-		excess = soft_limit_excess(mz->memcg);
-		/*
-		 * One school of thought says that we should not add
-		 * back the node to the tree if reclaim returns 0.
-		 * But our reclaim could return 0, simply because due
-		 * to priority we are exposing a smaller subset of
-		 * memory to reclaim from. Consider this as a longer
-		 * term TODO.
-		 */
-		/* If excess == 0, no tree ops */
-		__mem_cgroup_insert_exceeded(mz, mctz, excess);
-		spin_unlock_irq(&mctz->lock);
-		css_put(&mz->memcg->css);
-		loop++;
-		/*
-		 * Could not reclaim anything and there are no more
-		 * mem cgroups to try or we seem to be looping without
-		 * reclaiming anything.
-		 */
-		if (!nr_reclaimed &&
-		    (next_mz == NULL ||
-		     loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS))
-			break;
-	} while (!nr_reclaimed);
-	if (next_mz)
-		css_put(&next_mz->memcg->css);
-	return nr_reclaimed;
-}
-
 /*
  * Reclaims as many pages from the given memcg as possible.
  *
@@ -5784,7 +5466,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		return ERR_CAST(memcg);
 
 	page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX);
-	WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX);
+	memcg1_soft_limit_reset(memcg);
 #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
 	memcg->zswap_max = PAGE_COUNTER_MAX;
 	WRITE_ONCE(memcg->zswap_writeback,
@@ -5957,7 +5639,7 @@ static void mem_cgroup_css_reset(struct cgroup_subsys_state *css)
 	page_counter_set_min(&memcg->memory, 0);
 	page_counter_set_low(&memcg->memory, 0);
 	page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX);
-	WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX);
+	memcg1_soft_limit_reset(memcg);
 	page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX);
 	memcg_wb_domain_size_changed(memcg);
 }
@@ -7984,7 +7666,7 @@ __setup("cgroup.memory=", cgroup_memory);
  */
 static int __init mem_cgroup_init(void)
 {
-	int cpu, node;
+	int cpu;
 
 	/*
 	 * Currently s32 type (can refer to struct batched_lruvec_stat) is
@@ -8001,17 +7683,6 @@ static int __init mem_cgroup_init(void)
 		INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
 			  drain_local_stock);
 
-	for_each_node(node) {
-		struct mem_cgroup_tree_per_node *rtpn;
-
-		rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL, node);
-
-		rtpn->rb_root = RB_ROOT;
-		rtpn->rb_rightmost = NULL;
-		spin_lock_init(&rtpn->lock);
-		soft_limit_tree.rb_tree_per_node[node] = rtpn;
-	}
-
 	return 0;
 }
 subsys_initcall(mem_cgroup_init);
-- 
2.45.2
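The soft limit machinery moved by the patch above is easier to follow in
isolation. Below is a self-contained, hypothetical user-space sketch (plain
C, not kernel code; the names only mirror the kernel's) of the same
bookkeeping idea: each cgroup is keyed by how far its usage exceeds its
soft limit, and reclaim always picks the largest-excess node first, the way
mem_cgroup_largest_soft_limit_node() picks mctz->rb_rightmost:

#include <stdio.h>

struct group {
	const char *name;
	unsigned long usage;      /* pages currently charged */
	unsigned long soft_limit; /* pages allowed before soft reclaim */
};

/* Mirrors soft_limit_excess(): 0 when the group is under its limit. */
static unsigned long excess(const struct group *g)
{
	return g->usage > g->soft_limit ? g->usage - g->soft_limit : 0;
}

/* Stand-in for the rb_rightmost lookup: the largest excess wins.
 * The kernel keeps an RB-tree so this is O(log n) there; a linear
 * scan is enough to show the selection rule. */
static struct group *largest_excess(struct group *groups, int n)
{
	struct group *best = NULL;

	for (int i = 0; i < n; i++)
		if (excess(&groups[i]) &&
		    (!best || excess(&groups[i]) > excess(best)))
			best = &groups[i];
	return best;
}

int main(void)
{
	struct group groups[] = {
		{ "a", 1200, 1000 }, /* 200 pages over */
		{ "b",  900, 1000 }, /* under its limit, never selected */
		{ "c", 5000, 1000 }, /* 4000 pages over: reclaimed first */
	};
	struct group *victim = largest_excess(groups, 3);

	if (victim)
		printf("reclaim from %s (excess=%lu pages)\n",
		       victim->name, excess(victim));
	return 0;
}

As in the kernel, a group whose excess is zero is simply never a candidate,
which is why __mem_cgroup_insert_exceeded() bails out early in that case.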
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeel.butt@linux.dev>, Muchun Song <muchun.song@linux.dev>,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 03/14] mm: memcg: rename soft limit reclaim-related functions
Date: Mon, 24 Jun 2024 17:58:55 -0700
Message-ID: <20240625005906.106920-4-roman.gushchin@linux.dev>
In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev>
References: <20240625005906.106920-1-roman.gushchin@linux.dev>

Rename the exported functions related to soft limit reclaim to have the
memcg1_ prefix.
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/memcontrol.h | 12 ++++++------
 mm/memcontrol-v1.c         |  6 +++---
 mm/memcontrol-v1.h         |  4 ++--
 mm/memcontrol.c            |  4 ++--
 mm/vmscan.c                | 10 +++++-----
 5 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7403dd5926eb..83c8327455d8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1121,9 +1121,9 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
 
 void split_page_memcg(struct page *head, int old_order, int new_order);
 
-unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
-					    gfp_t gfp_mask,
-					    unsigned long *total_scanned);
+unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
+					gfp_t gfp_mask,
+					unsigned long *total_scanned);
 
 #else /* CONFIG_MEMCG */
 
@@ -1572,9 +1572,9 @@ static inline void split_page_memcg(struct page *head, int old_order, int new_or
 }
 
 static inline
-unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
-					    gfp_t gfp_mask,
-					    unsigned long *total_scanned)
+unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
+					gfp_t gfp_mask,
+					unsigned long *total_scanned)
 {
 	return 0;
 }
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 2ccb8406fa84..68e2f1a718d3 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -100,7 +100,7 @@ static unsigned long soft_limit_excess(struct mem_cgroup *memcg)
 	return excess;
 }
 
-void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid)
+void memcg1_update_tree(struct mem_cgroup *memcg, int nid)
 {
 	unsigned long excess;
 	struct mem_cgroup_per_node *mz;
@@ -143,7 +143,7 @@ void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid)
 	}
 }
 
-void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
+void memcg1_remove_from_trees(struct mem_cgroup *memcg)
 {
 	struct mem_cgroup_tree_per_node *mctz;
 	struct mem_cgroup_per_node *mz;
@@ -243,7 +243,7 @@ static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
 	return total;
 }
 
-unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
+unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
 					gfp_t gfp_mask,
 					unsigned long *total_scanned)
 {
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 4da6fa561c6d..e37bc7e8d955 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -3,8 +3,8 @@
 #ifndef __MM_MEMCONTROL_V1_H
 #define __MM_MEMCONTROL_V1_H
 
-void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid);
-void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg);
+void memcg1_update_tree(struct mem_cgroup *memcg, int nid);
+void memcg1_remove_from_trees(struct mem_cgroup *memcg);
 
 static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 003e944f34ea..3479e1af12d5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1012,7 +1012,7 @@ static void memcg_check_events(struct mem_cgroup *memcg, int nid)
 						MEM_CGROUP_TARGET_SOFTLIMIT);
 		mem_cgroup_threshold(memcg);
 		if (unlikely(do_softlimit))
-			mem_cgroup_update_tree(memcg, nid);
+			memcg1_update_tree(memcg, nid);
 	}
 }
 
@@ -5610,7 +5610,7 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
 
 	vmpressure_cleanup(&memcg->vmpressure);
 	cancel_work_sync(&memcg->high_work);
-	mem_cgroup_remove_from_trees(memcg);
+	memcg1_remove_from_trees(memcg);
 	free_shrinker_info(memcg);
 	mem_cgroup_free(memcg);
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 900bad16e506..3d4c681c6d40 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6186,9 +6186,9 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			 * and balancing, not for a memcg's limit.
 			 */
 			nr_soft_scanned = 0;
-			nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone->zone_pgdat,
-						sc->order, sc->gfp_mask,
-						&nr_soft_scanned);
+			nr_soft_reclaimed = memcg1_soft_limit_reclaim(zone->zone_pgdat,
+						sc->order, sc->gfp_mask,
+						&nr_soft_scanned);
 			sc->nr_reclaimed += nr_soft_reclaimed;
 			sc->nr_scanned += nr_soft_scanned;
 			/* need some check for avoid more shrink_zone() */
@@ -6952,8 +6952,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
 		/* Call soft limit reclaim before calling shrink_node. */
 		sc.nr_scanned = 0;
 		nr_soft_scanned = 0;
-		nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(pgdat, sc.order,
-						sc.gfp_mask, &nr_soft_scanned);
+		nr_soft_reclaimed = memcg1_soft_limit_reclaim(pgdat, sc.order,
+						sc.gfp_mask, &nr_soft_scanned);
 		sc.nr_reclaimed += nr_soft_reclaimed;
 
 		/*
-- 
2.45.2
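Note how the rename above relies on the usual header stub pattern: the real
memcg1_soft_limit_reclaim() lives under CONFIG_MEMCG, while a static inline
no-op keeps callers such as vmscan.c free of #ifdefs. A minimal illustration
of that pattern, using a made-up frob_reclaim() hook and CONFIG_FROB symbol
(neither is a real kernel interface):

#ifdef CONFIG_FROB
unsigned long frob_reclaim(int order);	/* real version in frob.c */
#else
static inline unsigned long frob_reclaim(int order)
{
	return 0;	/* compiles away; callers need no #ifdef */
}
#endif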
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeel.butt@linux.dev>, Muchun Song <muchun.song@linux.dev>,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 04/14] mm: memcg: move charge migration code to memcontrol-v1.c
Date: Mon, 24 Jun 2024 17:58:56 -0700
Message-ID: <20240625005906.106920-5-roman.gushchin@linux.dev>
In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev>
References: <20240625005906.106920-1-roman.gushchin@linux.dev>

Unlike the legacy cgroup v1 memory controller, the cgroup v2 memory
controller doesn't support moving charged pages between cgroups. This is
a fairly large and complicated piece of code, which has created a number
of problems in the past. Let's move it into memcontrol-v1.c; it shaves
off about 1000 lines from memcontrol.c. It's also another step towards
making the legacy memory controller code optionally compiled.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memcontrol-v1.c |  981 +++++++++++++++++++++++++++++++++++++++++++
 mm/memcontrol-v1.h |   30 ++
 mm/memcontrol.c    | 1004 +-------------------------------------------
 3 files changed, 1019 insertions(+), 996 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 68e2f1a718d3..f4c8bec5ae1b 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -3,7 +3,12 @@
 #include <linux/memcontrol.h>
 #include <linux/swap.h>
 #include <linux/mm_inline.h>
+#include <linux/pagewalk.h>
+#include <linux/backing-dev.h>
+#include <linux/swap_cgroup.h>
 
+#include "internal.h"
+#include "swap.h"
 #include "memcontrol-v1.h"
 
 /*
@@ -30,6 +35,31 @@ static struct mem_cgroup_tree soft_limit_tree __read_mostly;
 #define MEM_CGROUP_MAX_RECLAIM_LOOPS		100
 #define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS	2
 
+/* Stuffs for move charges at task migration. */
+/*
+ * Types of charges to be moved.
+ */
+#define MOVE_ANON	0x1U
+#define MOVE_FILE	0x2U
+#define MOVE_MASK	(MOVE_ANON | MOVE_FILE)
+
+/* "mc" and its members are protected by cgroup_mutex */
+static struct move_charge_struct {
+	spinlock_t	  lock; /* for from, to */
+	struct mm_struct  *mm;
+	struct mem_cgroup *from;
+	struct mem_cgroup *to;
+	unsigned long flags;
+	unsigned long precharge;
+	unsigned long moved_charge;
+	unsigned long moved_swap;
+	struct task_struct *moving_task;	/* a task moving charges */
+	wait_queue_head_t waitq;		/* a waitq for other context */
+} mc = {
+	.lock = __SPIN_LOCK_UNLOCKED(mc.lock),
+	.waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq),
+};
+
 static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
 					 struct mem_cgroup_tree_per_node *mctz,
 					 unsigned long new_usage_in_excess)
@@ -325,6 +355,957 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
 	return nr_reclaimed;
 }
 
+/*
+ * A routine for checking "mem" is under move_account() or not.
+ *
+ * Checking a cgroup is mc.from or mc.to or under hierarchy of
+ * moving cgroups. This is for waiting at high-memory pressure
+ * caused by "move".
+ */
+static bool mem_cgroup_under_move(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *from;
+	struct mem_cgroup *to;
+	bool ret = false;
+	/*
+	 * Unlike task_move routines, we access mc.to, mc.from not under
+	 * mutual exclusion by cgroup_mutex. Here, we take spinlock instead.
+	 */
+	spin_lock(&mc.lock);
+	from = mc.from;
+	to = mc.to;
+	if (!from)
+		goto unlock;
+
+	ret = mem_cgroup_is_descendant(from, memcg) ||
+		mem_cgroup_is_descendant(to, memcg);
+unlock:
+	spin_unlock(&mc.lock);
+	return ret;
+}
+
+bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
+{
+	if (mc.moving_task && current != mc.moving_task) {
+		if (mem_cgroup_under_move(memcg)) {
+			DEFINE_WAIT(wait);
+			prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
+			/* moving charge context might have finished. */
+			if (mc.moving_task)
+				schedule();
+			finish_wait(&mc.waitq, &wait);
+			return true;
+		}
+	}
+	return false;
+}
+
+/**
+ * folio_memcg_lock - Bind a folio to its memcg.
+ * @folio: The folio.
+ *
+ * This function prevents unlocked LRU folios from being moved to
+ * another cgroup.
+ *
+ * It ensures lifetime of the bound memcg. The caller is responsible
+ * for the lifetime of the folio.
+ */
+void folio_memcg_lock(struct folio *folio)
+{
+	struct mem_cgroup *memcg;
+	unsigned long flags;
+
+	/*
+	 * The RCU lock is held throughout the transaction. The fast
+	 * path can get away without acquiring the memcg->move_lock
+	 * because page moving starts with an RCU grace period.
+	 */
+	rcu_read_lock();
+
+	if (mem_cgroup_disabled())
+		return;
+again:
+	memcg = folio_memcg(folio);
+	if (unlikely(!memcg))
+		return;
+
+#ifdef CONFIG_PROVE_LOCKING
+	local_irq_save(flags);
+	might_lock(&memcg->move_lock);
+	local_irq_restore(flags);
+#endif
+
+	if (atomic_read(&memcg->moving_account) <= 0)
+		return;
+
+	spin_lock_irqsave(&memcg->move_lock, flags);
+	if (memcg != folio_memcg(folio)) {
+		spin_unlock_irqrestore(&memcg->move_lock, flags);
+		goto again;
+	}
+
+	/*
+	 * When charge migration first begins, we can have multiple
+	 * critical sections holding the fast-path RCU lock and one
+	 * holding the slowpath move_lock. Track the task who has the
+	 * move_lock for folio_memcg_unlock().
+	 */
+	memcg->move_lock_task = current;
+	memcg->move_lock_flags = flags;
+}
+
+static void __folio_memcg_unlock(struct mem_cgroup *memcg)
+{
+	if (memcg && memcg->move_lock_task == current) {
+		unsigned long flags = memcg->move_lock_flags;
+
+		memcg->move_lock_task = NULL;
+		memcg->move_lock_flags = 0;
+
+		spin_unlock_irqrestore(&memcg->move_lock, flags);
+	}
+
+	rcu_read_unlock();
+}
+
+/**
+ * folio_memcg_unlock - Release the binding between a folio and its memcg.
+ * @folio: The folio.
+ *
+ * This releases the binding created by folio_memcg_lock(). This does
+ * not change the accounting of this folio to its memcg, but it does
+ * permit others to change it.
+ */
+void folio_memcg_unlock(struct folio *folio)
+{
+	__folio_memcg_unlock(folio_memcg(folio));
+}
+
+#ifdef CONFIG_SWAP
+/**
+ * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
+ * @entry: swap entry to be moved
+ * @from:  mem_cgroup which the entry is moved from
+ * @to:  mem_cgroup which the entry is moved to
+ *
+ * It succeeds only when the swap_cgroup's record for this entry is the same
+ * as the mem_cgroup's id of @from.
+ *
+ * Returns 0 on success, -EINVAL on failure.
+ *
+ * The caller must have charged to @to, IOW, called page_counter_charge() about
+ * both res and memsw, and called css_get().
+ */
+static int mem_cgroup_move_swap_account(swp_entry_t entry,
+				struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	unsigned short old_id, new_id;
+
+	old_id = mem_cgroup_id(from);
+	new_id = mem_cgroup_id(to);
+
+	if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
+		mod_memcg_state(from, MEMCG_SWAP, -1);
+		mod_memcg_state(to, MEMCG_SWAP, 1);
+		return 0;
+	}
+	return -EINVAL;
+}
+#else
+static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
+				struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	return -EINVAL;
+}
+#endif
+
+u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
+				struct cftype *cft)
+{
+	return mem_cgroup_from_css(css)->move_charge_at_immigrate;
+}
+
+#ifdef CONFIG_MMU
+int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
+				 struct cftype *cft, u64 val)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+	pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. "
+		     "Please report your usecase to linux-mm@kvack.org if you "
+		     "depend on this functionality.\n");
+
+	if (val & ~MOVE_MASK)
+		return -EINVAL;
+
+	/*
+	 * No kind of locking is needed in here, because ->can_attach() will
+	 * check this value once in the beginning of the process, and then carry
+	 * on with stale data. This means that changes to this value will only
+	 * affect task migrations starting after the change.
+	 */
+	memcg->move_charge_at_immigrate = val;
+	return 0;
+}
+#else
+int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
+				 struct cftype *cft, u64 val)
+{
+	return -ENOSYS;
+}
+#endif
+
+#ifdef CONFIG_MMU
+/* Handlers for move charge at task migration. */
+static int mem_cgroup_do_precharge(unsigned long count)
+{
+	int ret;
+
+	/* Try a single bulk charge without reclaim first, kswapd may wake */
+	ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count);
+	if (!ret) {
+		mc.precharge += count;
+		return ret;
+	}
+
+	/* Try charges one by one with reclaim, but do not retry */
+	while (count--) {
+		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
+		if (ret)
+			return ret;
+		mc.precharge++;
+		cond_resched();
+	}
+	return 0;
+}
+
+union mc_target {
+	struct folio	*folio;
+	swp_entry_t	ent;
+};
+
+enum mc_target_type {
+	MC_TARGET_NONE = 0,
+	MC_TARGET_PAGE,
+	MC_TARGET_SWAP,
+	MC_TARGET_DEVICE,
+};
+
+static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
+						unsigned long addr, pte_t ptent)
+{
+	struct page *page = vm_normal_page(vma, addr, ptent);
+
+	if (!page)
+		return NULL;
+	if (PageAnon(page)) {
+		if (!(mc.flags & MOVE_ANON))
+			return NULL;
+	} else {
+		if (!(mc.flags & MOVE_FILE))
+			return NULL;
+	}
+	get_page(page);
+
+	return page;
+}
+
+#if defined(CONFIG_SWAP) || defined(CONFIG_DEVICE_PRIVATE)
+static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
+			pte_t ptent, swp_entry_t *entry)
+{
+	struct page *page = NULL;
+	swp_entry_t ent = pte_to_swp_entry(ptent);
+
+	if (!(mc.flags & MOVE_ANON))
+		return NULL;
+
+	/*
+	 * Handle device private pages that are not accessible by the CPU, but
+	 * stored as special swap entries in the page table.
+	 */
+	if (is_device_private_entry(ent)) {
+		page = pfn_swap_entry_to_page(ent);
+		if (!get_page_unless_zero(page))
+			return NULL;
+		return page;
+	}
+
+	if (non_swap_entry(ent))
+		return NULL;
+
+	/*
+	 * Because swap_cache_get_folio() updates some statistics counter,
+	 * we call find_get_page() with swapper_space directly.
+	 */
+	page = find_get_page(swap_address_space(ent), swap_cache_index(ent));
+	entry->val = ent.val;
+
+	return page;
+}
+#else
+static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
+			pte_t ptent, swp_entry_t *entry)
+{
+	return NULL;
+}
+#endif
+
+static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
+			unsigned long addr, pte_t ptent)
+{
+	unsigned long index;
+	struct folio *folio;
+
+	if (!vma->vm_file) /* anonymous vma */
+		return NULL;
+	if (!(mc.flags & MOVE_FILE))
+		return NULL;
+
+	/* folio is moved even if it's not RSS of this task(page-faulted). */
+	/* shmem/tmpfs may report page out on swap: account for that too. */
+	index = linear_page_index(vma, addr);
+	folio = filemap_get_incore_folio(vma->vm_file->f_mapping, index);
+	if (IS_ERR(folio))
+		return NULL;
+	return folio_file_page(folio, index);
+}
+
+/**
+ * mem_cgroup_move_account - move account of the folio
+ * @folio: The folio.
+ * @compound: charge the page as compound or small page
+ * @from: mem_cgroup which the folio is moved from.
+ * @to:	mem_cgroup which the folio is moved to. @from != @to.
+ *
+ * The folio must be locked and not on the LRU.
+ *
+ * This function doesn't do "charge" to new cgroup and doesn't do "uncharge"
+ * from old cgroup.
+ */
+static int mem_cgroup_move_account(struct folio *folio,
+				   bool compound,
+				   struct mem_cgroup *from,
+				   struct mem_cgroup *to)
+{
+	struct lruvec *from_vec, *to_vec;
+	struct pglist_data *pgdat;
+	unsigned int nr_pages = compound ? folio_nr_pages(folio) : 1;
+	int nid, ret;
+
+	VM_BUG_ON(from == to);
+	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+	VM_BUG_ON(compound && !folio_test_large(folio));
+
+	ret = -EINVAL;
+	if (folio_memcg(folio) != from)
+		goto out;
+
+	pgdat = folio_pgdat(folio);
+	from_vec = mem_cgroup_lruvec(from, pgdat);
+	to_vec = mem_cgroup_lruvec(to, pgdat);
+
+	folio_memcg_lock(folio);
+
+	if (folio_test_anon(folio)) {
+		if (folio_mapped(folio)) {
+			__mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
+			__mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
+			if (folio_test_pmd_mappable(folio)) {
+				__mod_lruvec_state(from_vec, NR_ANON_THPS,
+						   -nr_pages);
+				__mod_lruvec_state(to_vec, NR_ANON_THPS,
+						   nr_pages);
+			}
+		}
+	} else {
+		__mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages);
+		__mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages);
+
+		if (folio_test_swapbacked(folio)) {
+			__mod_lruvec_state(from_vec, NR_SHMEM, -nr_pages);
+			__mod_lruvec_state(to_vec, NR_SHMEM, nr_pages);
+		}
+
+		if (folio_mapped(folio)) {
+			__mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
+			__mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
+		}
+
+		if (folio_test_dirty(folio)) {
+			struct address_space *mapping = folio_mapping(folio);
+
+			if (mapping_can_writeback(mapping)) {
+				__mod_lruvec_state(from_vec, NR_FILE_DIRTY,
+						   -nr_pages);
+				__mod_lruvec_state(to_vec, NR_FILE_DIRTY,
+						   nr_pages);
+			}
+		}
+	}
+
+#ifdef CONFIG_SWAP
+	if (folio_test_swapcache(folio)) {
+		__mod_lruvec_state(from_vec, NR_SWAPCACHE, -nr_pages);
+		__mod_lruvec_state(to_vec, NR_SWAPCACHE, nr_pages);
+	}
+#endif
+	if (folio_test_writeback(folio)) {
+		__mod_lruvec_state(from_vec, NR_WRITEBACK, -nr_pages);
+		__mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages);
+	}
+
+	/*
+	 * All state has been migrated, let's switch to the new memcg.
+	 *
+	 * It is safe to change page's memcg here because the page
+	 * is referenced, charged, isolated, and locked: we can't race
+	 * with (un)charging, migration, LRU putback, or anything else
+	 * that would rely on a stable page's memory cgroup.
+	 *
+	 * Note that folio_memcg_lock is a memcg lock, not a page lock,
+	 * to save space. As soon as we switch page's memory cgroup to a
+	 * new memcg that isn't locked, the above state can change
+	 * concurrently again. Make sure we're truly done with it.
+	 */
+	smp_mb();
+
+	css_get(&to->css);
+	css_put(&from->css);
+
+	folio->memcg_data = (unsigned long)to;
+
+	__folio_memcg_unlock(from);
+
+	ret = 0;
+	nid = folio_nid(folio);
+
+	local_irq_disable();
+	mem_cgroup_charge_statistics(to, nr_pages);
+	memcg_check_events(to, nid);
+	mem_cgroup_charge_statistics(from, -nr_pages);
+	memcg_check_events(from, nid);
+	local_irq_enable();
+out:
+	return ret;
+}
+
+/**
+ * get_mctgt_type - get target type of moving charge
+ * @vma: the vma the pte to be checked belongs
+ * @addr: the address corresponding to the pte to be checked
+ * @ptent: the pte to be checked
+ * @target: the pointer the target page or swap ent will be stored(can be NULL)
+ *
+ * Context: Called with pte lock held.
+ * Return:
+ * * MC_TARGET_NONE - If the pte is not a target for move charge.
+ * * MC_TARGET_PAGE - If the page corresponding to this pte is a target for
+ *   move charge. If @target is not NULL, the folio is stored in target->folio
+ *   with extra refcnt taken (Caller should release it).
+ * * MC_TARGET_SWAP - If the swap entry corresponding to this pte is a
+ *   target for charge migration. If @target is not NULL, the entry is
+ *   stored in target->ent.
+ * * MC_TARGET_DEVICE - Like MC_TARGET_PAGE but page is device memory and
+ *   thus not on the lru. For now such page is charged like a regular page
+ *   would be as it is just special memory taking the place of a regular page.
+ *   See Documentations/vm/hmm.txt and include/linux/hmm.h
+ */
+static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
+		unsigned long addr, pte_t ptent, union mc_target *target)
+{
+	struct page *page = NULL;
+	struct folio *folio;
+	enum mc_target_type ret = MC_TARGET_NONE;
+	swp_entry_t ent = { .val = 0 };
+
+	if (pte_present(ptent))
+		page = mc_handle_present_pte(vma, addr, ptent);
+	else if (pte_none_mostly(ptent))
+		/*
+		 * PTE markers should be treated as a none pte here, separated
+		 * from other swap handling below.
+		 */
+		page = mc_handle_file_pte(vma, addr, ptent);
+	else if (is_swap_pte(ptent))
+		page = mc_handle_swap_pte(vma, ptent, &ent);
+
+	if (page)
+		folio = page_folio(page);
+	if (target && page) {
+		if (!folio_trylock(folio)) {
+			folio_put(folio);
+			return ret;
+		}
+		/*
+		 * page_mapped() must be stable during the move. This
+		 * pte is locked, so if it's present, the page cannot
+		 * become unmapped. If it isn't, we have only partial
+		 * control over the mapped state: the page lock will
+		 * prevent new faults against pagecache and swapcache,
+		 * so an unmapped page cannot become mapped. However,
+		 * if the page is already mapped elsewhere, it can
+		 * unmap, and there is nothing we can do about it.
+		 * Alas, skip moving the page in this case.
+		 */
+		if (!pte_present(ptent) && page_mapped(page)) {
+			folio_unlock(folio);
+			folio_put(folio);
+			return ret;
+		}
+	}
+
+	if (!page && !ent.val)
+		return ret;
+	if (page) {
+		/*
+		 * Do only loose check w/o serialization.
+		 * mem_cgroup_move_account() checks the page is valid or
+		 * not under LRU exclusion.
+ */ + if (folio_memcg(folio) =3D=3D mc.from) { + ret =3D MC_TARGET_PAGE; + if (folio_is_device_private(folio) || + folio_is_device_coherent(folio)) + ret =3D MC_TARGET_DEVICE; + if (target) + target->folio =3D folio; + } + if (!ret || !target) { + if (target) + folio_unlock(folio); + folio_put(folio); + } + } + /* + * There is a swap entry and a page doesn't exist or isn't charged. + * But we cannot move a tail-page in a THP. + */ + if (ent.val && !ret && (!page || !PageTransCompound(page)) && + mem_cgroup_id(mc.from) =3D=3D lookup_swap_cgroup_id(ent)) { + ret =3D MC_TARGET_SWAP; + if (target) + target->ent =3D ent; + } + return ret; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +/* + * We don't consider PMD mapped swapping or file mapped pages because THP = does + * not support them for now. + * Caller should make sure that pmd_trans_huge(pmd) is true. + */ +static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, + unsigned long addr, pmd_t pmd, union mc_target *target) +{ + struct page *page =3D NULL; + struct folio *folio; + enum mc_target_type ret =3D MC_TARGET_NONE; + + if (unlikely(is_swap_pmd(pmd))) { + VM_BUG_ON(thp_migration_supported() && + !is_pmd_migration_entry(pmd)); + return ret; + } + page =3D pmd_page(pmd); + VM_BUG_ON_PAGE(!page || !PageHead(page), page); + folio =3D page_folio(page); + if (!(mc.flags & MOVE_ANON)) + return ret; + if (folio_memcg(folio) =3D=3D mc.from) { + ret =3D MC_TARGET_PAGE; + if (target) { + folio_get(folio); + if (!folio_trylock(folio)) { + folio_put(folio); + return MC_TARGET_NONE; + } + target->folio =3D folio; + } + } + return ret; +} +#else +static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct= *vma, + unsigned long addr, pmd_t pmd, union mc_target *target) +{ + return MC_TARGET_NONE; +} +#endif + +static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, + unsigned long addr, unsigned long end, + struct mm_walk *walk) +{ + struct vm_area_struct *vma =3D walk->vma; + pte_t *pte; + spinlock_t *ptl; + + ptl =3D pmd_trans_huge_lock(pmd, vma); + if (ptl) { + /* + * Note there can not be MC_TARGET_DEVICE for now as we do not + * support transparent huge page with MEMORY_DEVICE_PRIVATE but + * this might change. + */ + if (get_mctgt_type_thp(vma, addr, *pmd, NULL) =3D=3D MC_TARGET_PAGE) + mc.precharge +=3D HPAGE_PMD_NR; + spin_unlock(ptl); + return 0; + } + + pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return 0; + for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) + if (get_mctgt_type(vma, addr, ptep_get(pte), NULL)) + mc.precharge++; /* increment precharge temporarily */ + pte_unmap_unlock(pte - 1, ptl); + cond_resched(); + + return 0; +} + +static const struct mm_walk_ops precharge_walk_ops =3D { + .pmd_entry =3D mem_cgroup_count_precharge_pte_range, + .walk_lock =3D PGWALK_RDLOCK, +}; + +static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm) +{ + unsigned long precharge; + + mmap_read_lock(mm); + walk_page_range(mm, 0, ULONG_MAX, &precharge_walk_ops, NULL); + mmap_read_unlock(mm); + + precharge =3D mc.precharge; + mc.precharge =3D 0; + + return precharge; +} + +static int mem_cgroup_precharge_mc(struct mm_struct *mm) +{ + unsigned long precharge =3D mem_cgroup_count_precharge(mm); + + VM_BUG_ON(mc.moving_task); + mc.moving_task =3D current; + return mem_cgroup_do_precharge(precharge); +} + +/* cancels all extra charges on mc.from and mc.to, and wakes up all waiters.
*/ +static void __mem_cgroup_clear_mc(void) +{ + struct mem_cgroup *from =3D mc.from; + struct mem_cgroup *to =3D mc.to; + + /* we must uncharge all the leftover precharges from mc.to */ + if (mc.precharge) { + mem_cgroup_cancel_charge(mc.to, mc.precharge); + mc.precharge =3D 0; + } + /* + * we didn't uncharge from mc.from at mem_cgroup_move_account(), so + * we must uncharge here. + */ + if (mc.moved_charge) { + mem_cgroup_cancel_charge(mc.from, mc.moved_charge); + mc.moved_charge =3D 0; + } + /* we must fixup refcnts and charges */ + if (mc.moved_swap) { + /* uncharge swap account from the old cgroup */ + if (!mem_cgroup_is_root(mc.from)) + page_counter_uncharge(&mc.from->memsw, mc.moved_swap); + + mem_cgroup_id_put_many(mc.from, mc.moved_swap); + + /* + * we charged both to->memory and to->memsw, so we + * should uncharge to->memory. + */ + if (!mem_cgroup_is_root(mc.to)) + page_counter_uncharge(&mc.to->memory, mc.moved_swap); + + mc.moved_swap =3D 0; + } + memcg_oom_recover(from); + memcg_oom_recover(to); + wake_up_all(&mc.waitq); +} + +static void mem_cgroup_clear_mc(void) +{ + struct mm_struct *mm =3D mc.mm; + + /* + * we must clear moving_task before waking up waiters at the end of + * task migration. + */ + mc.moving_task =3D NULL; + __mem_cgroup_clear_mc(); + spin_lock(&mc.lock); + mc.from =3D NULL; + mc.to =3D NULL; + mc.mm =3D NULL; + spin_unlock(&mc.lock); + + mmput(mm); +} + +int mem_cgroup_can_attach(struct cgroup_taskset *tset) +{ + struct cgroup_subsys_state *css; + struct mem_cgroup *memcg =3D NULL; /* unneeded init to make gcc happy */ + struct mem_cgroup *from; + struct task_struct *leader, *p; + struct mm_struct *mm; + unsigned long move_flags; + int ret =3D 0; + + /* charge immigration isn't supported on the default hierarchy */ + if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return 0; + + /* + * Multi-process migrations only happen on the default hierarchy + * where charge immigration is not used. Perform charge + * immigration if @tset contains a leader and whine if there are + * multiple. + */ + p =3D NULL; + cgroup_taskset_for_each_leader(leader, css, tset) { + WARN_ON_ONCE(p); + p =3D leader; + memcg =3D mem_cgroup_from_css(css); + } + if (!p) + return 0; + + /* + * We are now committed to this value whatever it is. Changes in this + * tunable will only affect upcoming migrations, not the current one. + * So we need to save it, and keep it going. 
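+ * That is why move_charge_at_immigrate is read only once below with + * READ_ONCE() and the saved value is reused for the whole migration.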
+ */ + move_flags =3D READ_ONCE(memcg->move_charge_at_immigrate); + if (!move_flags) + return 0; + + from =3D mem_cgroup_from_task(p); + + VM_BUG_ON(from =3D=3D memcg); + + mm =3D get_task_mm(p); + if (!mm) + return 0; + /* We move charges only when we move an owner of the mm */ + if (mm->owner =3D=3D p) { + VM_BUG_ON(mc.from); + VM_BUG_ON(mc.to); + VM_BUG_ON(mc.precharge); + VM_BUG_ON(mc.moved_charge); + VM_BUG_ON(mc.moved_swap); + + spin_lock(&mc.lock); + mc.mm =3D mm; + mc.from =3D from; + mc.to =3D memcg; + mc.flags =3D move_flags; + spin_unlock(&mc.lock); + /* We set mc.moving_task later */ + + ret =3D mem_cgroup_precharge_mc(mm); + if (ret) + mem_cgroup_clear_mc(); + } else { + mmput(mm); + } + return ret; +} + +void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) +{ + if (mc.to) + mem_cgroup_clear_mc(); +} + +static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, + unsigned long addr, unsigned long end, + struct mm_walk *walk) +{ + int ret =3D 0; + struct vm_area_struct *vma =3D walk->vma; + pte_t *pte; + spinlock_t *ptl; + enum mc_target_type target_type; + union mc_target target; + struct folio *folio; + + ptl =3D pmd_trans_huge_lock(pmd, vma); + if (ptl) { + if (mc.precharge < HPAGE_PMD_NR) { + spin_unlock(ptl); + return 0; + } + target_type =3D get_mctgt_type_thp(vma, addr, *pmd, &target); + if (target_type =3D=3D MC_TARGET_PAGE) { + folio =3D target.folio; + if (folio_isolate_lru(folio)) { + if (!mem_cgroup_move_account(folio, true, + mc.from, mc.to)) { + mc.precharge -=3D HPAGE_PMD_NR; + mc.moved_charge +=3D HPAGE_PMD_NR; + } + folio_putback_lru(folio); + } + folio_unlock(folio); + folio_put(folio); + } else if (target_type =3D=3D MC_TARGET_DEVICE) { + folio =3D target.folio; + if (!mem_cgroup_move_account(folio, true, + mc.from, mc.to)) { + mc.precharge -=3D HPAGE_PMD_NR; + mc.moved_charge +=3D HPAGE_PMD_NR; + } + folio_unlock(folio); + folio_put(folio); + } + spin_unlock(ptl); + return 0; + } + +retry: + pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return 0; + for (; addr !=3D end; addr +=3D PAGE_SIZE) { + pte_t ptent =3D ptep_get(pte++); + bool device =3D false; + swp_entry_t ent; + + if (!mc.precharge) + break; + + switch (get_mctgt_type(vma, addr, ptent, &target)) { + case MC_TARGET_DEVICE: + device =3D true; + fallthrough; + case MC_TARGET_PAGE: + folio =3D target.folio; + /* + * We can have a part of the split pmd here. Moving it + * can be done but it would be too convoluted so simply + * ignore such a partial THP and keep it in the original + * memcg. There should be somebody mapping the head. + */ + if (folio_test_large(folio)) + goto put; + if (!device && !folio_isolate_lru(folio)) + goto put; + if (!mem_cgroup_move_account(folio, false, + mc.from, mc.to)) { + mc.precharge--; + /* we uncharge from mc.from later. */ + mc.moved_charge++; + } + if (!device) + folio_putback_lru(folio); +put: /* get_mctgt_type() gets & locks the page */ + folio_unlock(folio); + folio_put(folio); + break; + case MC_TARGET_SWAP: + ent =3D target.ent; + if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { + mc.precharge--; + mem_cgroup_id_get_many(mc.to, 1); + /* we fixup other refcnts and charges later. */ + mc.moved_swap++; + } + break; + default: + break; + } + } + pte_unmap_unlock(pte - 1, ptl); + cond_resched(); + + if (addr !=3D end) { + /* + * We have consumed all precharges we got in can_attach(). + * We try to charge one by one, but don't do any additional + * charges to mc.to if we have failed to charge once in the + * attach() phase.
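+ * mem_cgroup_do_precharge(1) below charges a single page, this + * time with reclaim allowed; on success the walk resumes from the + * current address.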
+ */ + ret =3D mem_cgroup_do_precharge(1); + if (!ret) + goto retry; + } + + return ret; +} + +static const struct mm_walk_ops charge_walk_ops =3D { + .pmd_entry =3D mem_cgroup_move_charge_pte_range, + .walk_lock =3D PGWALK_RDLOCK, +}; + +static void mem_cgroup_move_charge(void) +{ + lru_add_drain_all(); + /* + * Signal folio_memcg_lock() to take the memcg's move_lock + * while we're moving its pages to another memcg. Then wait + * for already started RCU-only updates to finish. + */ + atomic_inc(&mc.from->moving_account); + synchronize_rcu(); +retry: + if (unlikely(!mmap_read_trylock(mc.mm))) { + /* + * Someone who is holding the mmap_lock might be waiting on + * the waitq. So we cancel all extra charges, wake up all waiters, + * and retry. Because we cancel precharges, we might not be able + * to move enough charges, but moving charge is a best-effort + * feature anyway, so it wouldn't be a big problem. + */ + __mem_cgroup_clear_mc(); + cond_resched(); + goto retry; + } + /* + * When we have consumed all precharges and failed to do an + * additional charge, the page walk just aborts. + */ + walk_page_range(mc.mm, 0, ULONG_MAX, &charge_walk_ops, NULL); + mmap_read_unlock(mc.mm); + atomic_dec(&mc.from->moving_account); +} + +void mem_cgroup_move_task(void) +{ + if (mc.to) { + mem_cgroup_move_charge(); + mem_cgroup_clear_mc(); + } +} + +#else /* !CONFIG_MMU */ +static int mem_cgroup_can_attach(struct cgroup_taskset *tset) +{ + return 0; +} +static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) +{ +} +static void mem_cgroup_move_task(void) +{ +} +#endif + static int __init memcg1_init(void) { int node; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index e37bc7e8d955..55e7c4f90c39 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -11,4 +11,34 @@ static inline void memcg1_soft_limit_reset(struct mem_cg= roup *memcg) WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); } =20 +void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages); +void memcg_check_events(struct mem_cgroup *memcg, int nid); +void memcg_oom_recover(struct mem_cgroup *memcg); +int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, + unsigned int nr_pages); + +static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, + unsigned int nr_pages) +{ + if (mem_cgroup_is_root(memcg)) + return 0; + + return try_charge_memcg(memcg, gfp_mask, nr_pages); +} + +void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n); +void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n); + +bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg); +struct cgroup_taskset; +int mem_cgroup_can_attach(struct cgroup_taskset *tset); +void mem_cgroup_cancel_attach(struct cgroup_taskset *tset); +void mem_cgroup_move_task(void); + +struct cftype; +u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, + struct cftype *cft); +int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val); + #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3479e1af12d5..3332c89cae2e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -28,7 +28,6 @@ #include #include #include -#include #include #include #include @@ -45,7 +44,6 @@ #include #include #include -#include #include #include #include @@ -71,7 +69,6 @@ #include #include #include "slab.h" -#include "swap.h" #include "memcontrol-v1.h" =20 #include @@ -158,31 +155,6 @@ struct mem_cgroup_event { static void mem_cgroup_threshold(struct mem_cgro= up *memcg);
static void mem_cgroup_oom_notify(struct mem_cgroup *memcg); =20 -/* Stuffs for move charges at task migration. */ -/* - * Types of charges to be moved. - */ -#define MOVE_ANON 0x1U -#define MOVE_FILE 0x2U -#define MOVE_MASK (MOVE_ANON | MOVE_FILE) - -/* "mc" and its members are protected by cgroup_mutex */ -static struct move_charge_struct { - spinlock_t lock; /* for from, to */ - struct mm_struct *mm; - struct mem_cgroup *from; - struct mem_cgroup *to; - unsigned long flags; - unsigned long precharge; - unsigned long moved_charge; - unsigned long moved_swap; - struct task_struct *moving_task; /* a task moving charges */ - wait_queue_head_t waitq; /* a waitq for other context */ -} mc =3D { - .lock =3D __SPIN_LOCK_UNLOCKED(mc.lock), - .waitq =3D __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq), -}; - /* for encoding cft->private value on file */ enum res_type { _MEM, @@ -955,8 +927,7 @@ static unsigned long memcg_events_local(struct mem_cgro= up *memcg, int event) return READ_ONCE(memcg->vmstats->events_local[i]); } =20 -static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, - int nr_pages) +void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages) { /* pagein of a big page is an event. So, ignore page size */ if (nr_pages > 0) @@ -998,7 +969,7 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgrou= p *memcg, * Check events in order. * */ -static void memcg_check_events(struct mem_cgroup *memcg, int nid) +void memcg_check_events(struct mem_cgroup *memcg, int nid) { if (IS_ENABLED(CONFIG_PREEMPT_RT)) return; @@ -1467,51 +1438,6 @@ static unsigned long mem_cgroup_margin(struct mem_cg= roup *memcg) return margin; } =20 -/* - * A routine for checking "mem" is under move_account() or not. - * - * Checking a cgroup is mc.from or mc.to or under hierarchy of - * moving cgroups. This is for waiting at high-memory pressure - * caused by "move". - */ -static bool mem_cgroup_under_move(struct mem_cgroup *memcg) -{ - struct mem_cgroup *from; - struct mem_cgroup *to; - bool ret =3D false; - /* - * Unlike task_move routines, we access mc.to, mc.from not under - * mutual exclusion by cgroup_mutex. Here, we take spinlock instead. - */ - spin_lock(&mc.lock); - from =3D mc.from; - to =3D mc.to; - if (!from) - goto unlock; - - ret =3D mem_cgroup_is_descendant(from, memcg) || - mem_cgroup_is_descendant(to, memcg); -unlock: - spin_unlock(&mc.lock); - return ret; -} - -static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg) -{ - if (mc.moving_task && current !=3D mc.moving_task) { - if (mem_cgroup_under_move(memcg)) { - DEFINE_WAIT(wait); - prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE); - /* moving charge context might have finished. */ - if (mc.moving_task) - schedule(); - finish_wait(&mc.waitq, &wait); - return true; - } - } - return false; -} - struct memory_stat { const char *name; unsigned int idx; @@ -1904,7 +1830,7 @@ static int memcg_oom_wake_function(wait_queue_entry_t= *wait, return autoremove_wake_function(wait, mode, sync, arg); } =20 -static void memcg_oom_recover(struct mem_cgroup *memcg) +void memcg_oom_recover(struct mem_cgroup *memcg) { /* * For the following lockless ->under_oom test, the only required @@ -2093,87 +2019,6 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *m= emcg) pr_cont(" are going to be killed due to memory.oom.group set\n"); } =20 -/** - * folio_memcg_lock - Bind a folio to its memcg. - * @folio: The folio. - * - * This function prevents unlocked LRU folios from being moved to - * another cgroup. 
- * - * It ensures lifetime of the bound memcg. The caller is responsible - * for the lifetime of the folio. - */ -void folio_memcg_lock(struct folio *folio) -{ - struct mem_cgroup *memcg; - unsigned long flags; - - /* - * The RCU lock is held throughout the transaction. The fast - * path can get away without acquiring the memcg->move_lock - * because page moving starts with an RCU grace period. - */ - rcu_read_lock(); - - if (mem_cgroup_disabled()) - return; -again: - memcg =3D folio_memcg(folio); - if (unlikely(!memcg)) - return; - -#ifdef CONFIG_PROVE_LOCKING - local_irq_save(flags); - might_lock(&memcg->move_lock); - local_irq_restore(flags); -#endif - - if (atomic_read(&memcg->moving_account) <=3D 0) - return; - - spin_lock_irqsave(&memcg->move_lock, flags); - if (memcg !=3D folio_memcg(folio)) { - spin_unlock_irqrestore(&memcg->move_lock, flags); - goto again; - } - - /* - * When charge migration first begins, we can have multiple - * critical sections holding the fast-path RCU lock and one - * holding the slowpath move_lock. Track the task who has the - * move_lock for folio_memcg_unlock(). - */ - memcg->move_lock_task =3D current; - memcg->move_lock_flags =3D flags; -} - -static void __folio_memcg_unlock(struct mem_cgroup *memcg) -{ - if (memcg && memcg->move_lock_task =3D=3D current) { - unsigned long flags =3D memcg->move_lock_flags; - - memcg->move_lock_task =3D NULL; - memcg->move_lock_flags =3D 0; - - spin_unlock_irqrestore(&memcg->move_lock, flags); - } - - rcu_read_unlock(); -} - -/** - * folio_memcg_unlock - Release the binding between a folio and its memcg. - * @folio: The folio. - * - * This releases the binding created by folio_memcg_lock(). This does - * not change the accounting of this folio to its memcg, but it does - * permit others to change it. - */ -void folio_memcg_unlock(struct folio *folio) -{ - __folio_memcg_unlock(folio_memcg(folio)); -} - struct memcg_stock_pcp { local_lock_t stock_lock; struct mem_cgroup *cached; /* this never be root cgroup */ @@ -2653,8 +2498,8 @@ void mem_cgroup_handle_over_high(gfp_t gfp_mask) css_put(&memcg->css); } =20 -static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, - unsigned int nr_pages) +int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, + unsigned int nr_pages) { unsigned int batch =3D max(MEMCG_CHARGE_BATCH, nr_pages); int nr_retries =3D MAX_RECLAIM_RETRIES; @@ -2849,15 +2694,6 @@ static int try_charge_memcg(struct mem_cgroup *memcg= , gfp_t gfp_mask, return 0; } =20 -static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, - unsigned int nr_pages) -{ - if (mem_cgroup_is_root(memcg)) - return 0; - - return try_charge_memcg(memcg, gfp_mask, nr_pages); -} - /** * mem_cgroup_cancel_charge() - cancel an uncommitted try_charge() call. * @memcg: memcg previously charged. @@ -3595,43 +3431,6 @@ void split_page_memcg(struct page *head, int old_ord= er, int new_order) css_get_many(&memcg->css, old_nr / new_nr - 1); } =20 -#ifdef CONFIG_SWAP -/** - * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's recor= d. - * @entry: swap entry to be moved - * @from: mem_cgroup which the entry is moved from - * @to: mem_cgroup which the entry is moved to - * - * It succeeds only when the swap_cgroup's record for this entry is the sa= me - * as the mem_cgroup's id of @from. - * - * Returns 0 on success, -EINVAL on failure. - * - * The caller must have charged to @to, IOW, called page_counter_charge() = about - * both res and memsw, and called css_get(). 
- */ -static int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) -{ - unsigned short old_id, new_id; - - old_id =3D mem_cgroup_id(from); - new_id =3D mem_cgroup_id(to); - - if (swap_cgroup_cmpxchg(entry, old_id, new_id) =3D=3D old_id) { - mod_memcg_state(from, MEMCG_SWAP, -1); - mod_memcg_state(to, MEMCG_SWAP, 1); - return 0; - } - return -EINVAL; -} -#else -static inline int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) -{ - return -EINVAL; -} -#endif =20 static DEFINE_MUTEX(memcg_max_mutex); =20 @@ -4015,42 +3814,6 @@ static ssize_t mem_cgroup_reset(struct kernfs_open_f= ile *of, char *buf, return nbytes; } =20 -static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, - struct cftype *cft) -{ - return mem_cgroup_from_css(css)->move_charge_at_immigrate; -} - -#ifdef CONFIG_MMU -static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); - - pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecat= ed. " - "Please report your usecase to linux-mm@kvack.org if you " - "depend on this functionality.\n"); - - if (val & ~MOVE_MASK) - return -EINVAL; - - /* - * No kind of locking is needed in here, because ->can_attach() will - * check this value once in the beginning of the process, and then carry - * on with stale data. This means that changes to this value will only - * affect task migrations starting after the change. - */ - memcg->move_charge_at_immigrate =3D val; - return 0; -} -#else -static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - return -ENOSYS; -} -#endif - #ifdef CONFIG_NUMA =20 #define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE)) @@ -5261,13 +5024,13 @@ static void mem_cgroup_id_remove(struct mem_cgroup = *memcg) } } =20 -static void __maybe_unused mem_cgroup_id_get_many(struct mem_cgroup *memcg, - unsigned int n) +void __maybe_unused mem_cgroup_id_get_many(struct mem_cgroup *memcg, + unsigned int n) { refcount_add(n, &memcg->id.ref); } =20 -static void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int = n) +void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n) { if (refcount_sub_and_test(n, &memcg->id.ref)) { mem_cgroup_id_remove(memcg); @@ -5747,757 +5510,6 @@ static void mem_cgroup_css_rstat_flush(struct cgrou= p_subsys_state *css, int cpu) atomic64_set(&memcg->vmstats->stats_updates, 0); } =20 -#ifdef CONFIG_MMU -/* Handlers for move charge at task migration. 
*/ -static int mem_cgroup_do_precharge(unsigned long count) -{ - int ret; - - /* Try a single bulk charge without reclaim first, kswapd may wake */ - ret =3D try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count); - if (!ret) { - mc.precharge +=3D count; - return ret; - } - - /* Try charges one by one with reclaim, but do not retry */ - while (count--) { - ret =3D try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1); - if (ret) - return ret; - mc.precharge++; - cond_resched(); - } - return 0; -} - -union mc_target { - struct folio *folio; - swp_entry_t ent; -}; - -enum mc_target_type { - MC_TARGET_NONE =3D 0, - MC_TARGET_PAGE, - MC_TARGET_SWAP, - MC_TARGET_DEVICE, -}; - -static struct page *mc_handle_present_pte(struct vm_area_struct *vma, - unsigned long addr, pte_t ptent) -{ - struct page *page =3D vm_normal_page(vma, addr, ptent); - - if (!page) - return NULL; - if (PageAnon(page)) { - if (!(mc.flags & MOVE_ANON)) - return NULL; - } else { - if (!(mc.flags & MOVE_FILE)) - return NULL; - } - get_page(page); - - return page; -} - -#if defined(CONFIG_SWAP) || defined(CONFIG_DEVICE_PRIVATE) -static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, - pte_t ptent, swp_entry_t *entry) -{ - struct page *page =3D NULL; - swp_entry_t ent =3D pte_to_swp_entry(ptent); - - if (!(mc.flags & MOVE_ANON)) - return NULL; - - /* - * Handle device private pages that are not accessible by the CPU, but - * stored as special swap entries in the page table. - */ - if (is_device_private_entry(ent)) { - page =3D pfn_swap_entry_to_page(ent); - if (!get_page_unless_zero(page)) - return NULL; - return page; - } - - if (non_swap_entry(ent)) - return NULL; - - /* - * Because swap_cache_get_folio() updates some statistics counter, - * we call find_get_page() with swapper_space directly. - */ - page =3D find_get_page(swap_address_space(ent), swap_cache_index(ent)); - entry->val =3D ent.val; - - return page; -} -#else -static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, - pte_t ptent, swp_entry_t *entry) -{ - return NULL; -} -#endif - -static struct page *mc_handle_file_pte(struct vm_area_struct *vma, - unsigned long addr, pte_t ptent) -{ - unsigned long index; - struct folio *folio; - - if (!vma->vm_file) /* anonymous vma */ - return NULL; - if (!(mc.flags & MOVE_FILE)) - return NULL; - - /* folio is moved even if it's not RSS of this task(page-faulted). */ - /* shmem/tmpfs may report page out on swap: account for that too. */ - index =3D linear_page_index(vma, addr); - folio =3D filemap_get_incore_folio(vma->vm_file->f_mapping, index); - if (IS_ERR(folio)) - return NULL; - return folio_file_page(folio, index); -} - -/** - * mem_cgroup_move_account - move account of the folio - * @folio: The folio. - * @compound: charge the page as compound or small page - * @from: mem_cgroup which the folio is moved from. - * @to: mem_cgroup which the folio is moved to. @from !=3D @to. - * - * The folio must be locked and not on the LRU. - * - * This function doesn't do "charge" to new cgroup and doesn't do "uncharg= e" - * from old cgroup. - */ -static int mem_cgroup_move_account(struct folio *folio, - bool compound, - struct mem_cgroup *from, - struct mem_cgroup *to) -{ - struct lruvec *from_vec, *to_vec; - struct pglist_data *pgdat; - unsigned int nr_pages =3D compound ? 
folio_nr_pages(folio) : 1; - int nid, ret; - - VM_BUG_ON(from =3D=3D to); - VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); - VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); - VM_BUG_ON(compound && !folio_test_large(folio)); - - ret =3D -EINVAL; - if (folio_memcg(folio) !=3D from) - goto out; - - pgdat =3D folio_pgdat(folio); - from_vec =3D mem_cgroup_lruvec(from, pgdat); - to_vec =3D mem_cgroup_lruvec(to, pgdat); - - folio_memcg_lock(folio); - - if (folio_test_anon(folio)) { - if (folio_mapped(folio)) { - __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages); - __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages); - if (folio_test_pmd_mappable(folio)) { - __mod_lruvec_state(from_vec, NR_ANON_THPS, - -nr_pages); - __mod_lruvec_state(to_vec, NR_ANON_THPS, - nr_pages); - } - } - } else { - __mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages); - __mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages); - - if (folio_test_swapbacked(folio)) { - __mod_lruvec_state(from_vec, NR_SHMEM, -nr_pages); - __mod_lruvec_state(to_vec, NR_SHMEM, nr_pages); - } - - if (folio_mapped(folio)) { - __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages); - __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages); - } - - if (folio_test_dirty(folio)) { - struct address_space *mapping =3D folio_mapping(folio); - - if (mapping_can_writeback(mapping)) { - __mod_lruvec_state(from_vec, NR_FILE_DIRTY, - -nr_pages); - __mod_lruvec_state(to_vec, NR_FILE_DIRTY, - nr_pages); - } - } - } - -#ifdef CONFIG_SWAP - if (folio_test_swapcache(folio)) { - __mod_lruvec_state(from_vec, NR_SWAPCACHE, -nr_pages); - __mod_lruvec_state(to_vec, NR_SWAPCACHE, nr_pages); - } -#endif - if (folio_test_writeback(folio)) { - __mod_lruvec_state(from_vec, NR_WRITEBACK, -nr_pages); - __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages); - } - - /* - * All state has been migrated, let's switch to the new memcg. - * - * It is safe to change page's memcg here because the page - * is referenced, charged, isolated, and locked: we can't race - * with (un)charging, migration, LRU putback, or anything else - * that would rely on a stable page's memory cgroup. - * - * Note that folio_memcg_lock is a memcg lock, not a page lock, - * to save space. As soon as we switch page's memory cgroup to a - * new memcg that isn't locked, the above state can change - * concurrently again. Make sure we're truly done with it. - */ - smp_mb(); - - css_get(&to->css); - css_put(&from->css); - - folio->memcg_data =3D (unsigned long)to; - - __folio_memcg_unlock(from); - - ret =3D 0; - nid =3D folio_nid(folio); - - local_irq_disable(); - mem_cgroup_charge_statistics(to, nr_pages); - memcg_check_events(to, nid); - mem_cgroup_charge_statistics(from, -nr_pages); - memcg_check_events(from, nid); - local_irq_enable(); -out: - return ret; -} - -/** - * get_mctgt_type - get target type of moving charge - * @vma: the vma the pte to be checked belongs - * @addr: the address corresponding to the pte to be checked - * @ptent: the pte to be checked - * @target: the pointer the target page or swap ent will be stored(can be = NULL) - * - * Context: Called with pte lock held. - * Return: - * * MC_TARGET_NONE - If the pte is not a target for move charge. - * * MC_TARGET_PAGE - If the page corresponding to this pte is a target for - * move charge. If @target is not NULL, the folio is stored in target->f= olio - * with extra refcnt taken (Caller should release it). - * * MC_TARGET_SWAP - If the swap entry corresponding to this pte is a - * target for charge migration. 
If @target is not NULL, the entry is - * stored in target->ent. - * * MC_TARGET_DEVICE - Like MC_TARGET_PAGE but page is device memory and - * thus not on the lru. For now such page is charged like a regular page - * would be as it is just special memory taking the place of a regular p= age. - * See Documentations/vm/hmm.txt and include/linux/hmm.h - */ -static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, - unsigned long addr, pte_t ptent, union mc_target *target) -{ - struct page *page =3D NULL; - struct folio *folio; - enum mc_target_type ret =3D MC_TARGET_NONE; - swp_entry_t ent =3D { .val =3D 0 }; - - if (pte_present(ptent)) - page =3D mc_handle_present_pte(vma, addr, ptent); - else if (pte_none_mostly(ptent)) - /* - * PTE markers should be treated as a none pte here, separated - * from other swap handling below. - */ - page =3D mc_handle_file_pte(vma, addr, ptent); - else if (is_swap_pte(ptent)) - page =3D mc_handle_swap_pte(vma, ptent, &ent); - - if (page) - folio =3D page_folio(page); - if (target && page) { - if (!folio_trylock(folio)) { - folio_put(folio); - return ret; - } - /* - * page_mapped() must be stable during the move. This - * pte is locked, so if it's present, the page cannot - * become unmapped. If it isn't, we have only partial - * control over the mapped state: the page lock will - * prevent new faults against pagecache and swapcache, - * so an unmapped page cannot become mapped. However, - * if the page is already mapped elsewhere, it can - * unmap, and there is nothing we can do about it. - * Alas, skip moving the page in this case. - */ - if (!pte_present(ptent) && page_mapped(page)) { - folio_unlock(folio); - folio_put(folio); - return ret; - } - } - - if (!page && !ent.val) - return ret; - if (page) { - /* - * Do only loose check w/o serialization. - * mem_cgroup_move_account() checks the page is valid or - * not under LRU exclusion. - */ - if (folio_memcg(folio) =3D=3D mc.from) { - ret =3D MC_TARGET_PAGE; - if (folio_is_device_private(folio) || - folio_is_device_coherent(folio)) - ret =3D MC_TARGET_DEVICE; - if (target) - target->folio =3D folio; - } - if (!ret || !target) { - if (target) - folio_unlock(folio); - folio_put(folio); - } - } - /* - * There is a swap entry and a page doesn't exist or isn't charged. - * But we cannot move a tail-page in a THP. - */ - if (ent.val && !ret && (!page || !PageTransCompound(page)) && - mem_cgroup_id(mc.from) =3D=3D lookup_swap_cgroup_id(ent)) { - ret =3D MC_TARGET_SWAP; - if (target) - target->ent =3D ent; - } - return ret; -} - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -/* - * We don't consider PMD mapped swapping or file mapped pages because THP = does - * not support them for now. - * Caller should make sure that pmd_trans_huge(pmd) is true. 
- */ -static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) -{ - struct page *page =3D NULL; - struct folio *folio; - enum mc_target_type ret =3D MC_TARGET_NONE; - - if (unlikely(is_swap_pmd(pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmd)); - return ret; - } - page =3D pmd_page(pmd); - VM_BUG_ON_PAGE(!page || !PageHead(page), page); - folio =3D page_folio(page); - if (!(mc.flags & MOVE_ANON)) - return ret; - if (folio_memcg(folio) =3D=3D mc.from) { - ret =3D MC_TARGET_PAGE; - if (target) { - folio_get(folio); - if (!folio_trylock(folio)) { - folio_put(folio); - return MC_TARGET_NONE; - } - target->folio =3D folio; - } - } - return ret; -} -#else -static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct= *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) -{ - return MC_TARGET_NONE; -} -#endif - -static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, - unsigned long addr, unsigned long end, - struct mm_walk *walk) -{ - struct vm_area_struct *vma =3D walk->vma; - pte_t *pte; - spinlock_t *ptl; - - ptl =3D pmd_trans_huge_lock(pmd, vma); - if (ptl) { - /* - * Note their can not be MC_TARGET_DEVICE for now as we do not - * support transparent huge page with MEMORY_DEVICE_PRIVATE but - * this might change. - */ - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) =3D=3D MC_TARGET_PAGE) - mc.precharge +=3D HPAGE_PMD_NR; - spin_unlock(ptl); - return 0; - } - - pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (!pte) - return 0; - for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) - if (get_mctgt_type(vma, addr, ptep_get(pte), NULL)) - mc.precharge++; /* increment precharge temporarily */ - pte_unmap_unlock(pte - 1, ptl); - cond_resched(); - - return 0; -} - -static const struct mm_walk_ops precharge_walk_ops =3D { - .pmd_entry =3D mem_cgroup_count_precharge_pte_range, - .walk_lock =3D PGWALK_RDLOCK, -}; - -static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm) -{ - unsigned long precharge; - - mmap_read_lock(mm); - walk_page_range(mm, 0, ULONG_MAX, &precharge_walk_ops, NULL); - mmap_read_unlock(mm); - - precharge =3D mc.precharge; - mc.precharge =3D 0; - - return precharge; -} - -static int mem_cgroup_precharge_mc(struct mm_struct *mm) -{ - unsigned long precharge =3D mem_cgroup_count_precharge(mm); - - VM_BUG_ON(mc.moving_task); - mc.moving_task =3D current; - return mem_cgroup_do_precharge(precharge); -} - -/* cancels all extra charges on mc.from and mc.to, and wakes up all waiter= s. */ -static void __mem_cgroup_clear_mc(void) -{ - struct mem_cgroup *from =3D mc.from; - struct mem_cgroup *to =3D mc.to; - - /* we must uncharge all the leftover precharges from mc.to */ - if (mc.precharge) { - mem_cgroup_cancel_charge(mc.to, mc.precharge); - mc.precharge =3D 0; - } - /* - * we didn't uncharge from mc.from at mem_cgroup_move_account(), so - * we must uncharge here. - */ - if (mc.moved_charge) { - mem_cgroup_cancel_charge(mc.from, mc.moved_charge); - mc.moved_charge =3D 0; - } - /* we must fixup refcnts and charges */ - if (mc.moved_swap) { - /* uncharge swap account from the old cgroup */ - if (!mem_cgroup_is_root(mc.from)) - page_counter_uncharge(&mc.from->memsw, mc.moved_swap); - - mem_cgroup_id_put_many(mc.from, mc.moved_swap); - - /* - * we charged both to->memory and to->memsw, so we - * should uncharge to->memory. 
- */ - if (!mem_cgroup_is_root(mc.to)) - page_counter_uncharge(&mc.to->memory, mc.moved_swap); - - mc.moved_swap =3D 0; - } - memcg_oom_recover(from); - memcg_oom_recover(to); - wake_up_all(&mc.waitq); -} - -static void mem_cgroup_clear_mc(void) -{ - struct mm_struct *mm =3D mc.mm; - - /* - * we must clear moving_task before waking up waiters at the end of - * task migration. - */ - mc.moving_task =3D NULL; - __mem_cgroup_clear_mc(); - spin_lock(&mc.lock); - mc.from =3D NULL; - mc.to =3D NULL; - mc.mm =3D NULL; - spin_unlock(&mc.lock); - - mmput(mm); -} - -static int mem_cgroup_can_attach(struct cgroup_taskset *tset) -{ - struct cgroup_subsys_state *css; - struct mem_cgroup *memcg =3D NULL; /* unneeded init to make gcc happy */ - struct mem_cgroup *from; - struct task_struct *leader, *p; - struct mm_struct *mm; - unsigned long move_flags; - int ret =3D 0; - - /* charge immigration isn't supported on the default hierarchy */ - if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) - return 0; - - /* - * Multi-process migrations only happen on the default hierarchy - * where charge immigration is not used. Perform charge - * immigration if @tset contains a leader and whine if there are - * multiple. - */ - p =3D NULL; - cgroup_taskset_for_each_leader(leader, css, tset) { - WARN_ON_ONCE(p); - p =3D leader; - memcg =3D mem_cgroup_from_css(css); - } - if (!p) - return 0; - - /* - * We are now committed to this value whatever it is. Changes in this - * tunable will only affect upcoming migrations, not the current one. - * So we need to save it, and keep it going. - */ - move_flags =3D READ_ONCE(memcg->move_charge_at_immigrate); - if (!move_flags) - return 0; - - from =3D mem_cgroup_from_task(p); - - VM_BUG_ON(from =3D=3D memcg); - - mm =3D get_task_mm(p); - if (!mm) - return 0; - /* We move charges only when we move a owner of the mm */ - if (mm->owner =3D=3D p) { - VM_BUG_ON(mc.from); - VM_BUG_ON(mc.to); - VM_BUG_ON(mc.precharge); - VM_BUG_ON(mc.moved_charge); - VM_BUG_ON(mc.moved_swap); - - spin_lock(&mc.lock); - mc.mm =3D mm; - mc.from =3D from; - mc.to =3D memcg; - mc.flags =3D move_flags; - spin_unlock(&mc.lock); - /* We set mc.moving_task later */ - - ret =3D mem_cgroup_precharge_mc(mm); - if (ret) - mem_cgroup_clear_mc(); - } else { - mmput(mm); - } - return ret; -} - -static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) -{ - if (mc.to) - mem_cgroup_clear_mc(); -} - -static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, - unsigned long addr, unsigned long end, - struct mm_walk *walk) -{ - int ret =3D 0; - struct vm_area_struct *vma =3D walk->vma; - pte_t *pte; - spinlock_t *ptl; - enum mc_target_type target_type; - union mc_target target; - struct folio *folio; - - ptl =3D pmd_trans_huge_lock(pmd, vma); - if (ptl) { - if (mc.precharge < HPAGE_PMD_NR) { - spin_unlock(ptl); - return 0; - } - target_type =3D get_mctgt_type_thp(vma, addr, *pmd, &target); - if (target_type =3D=3D MC_TARGET_PAGE) { - folio =3D target.folio; - if (folio_isolate_lru(folio)) { - if (!mem_cgroup_move_account(folio, true, - mc.from, mc.to)) { - mc.precharge -=3D HPAGE_PMD_NR; - mc.moved_charge +=3D HPAGE_PMD_NR; - } - folio_putback_lru(folio); - } - folio_unlock(folio); - folio_put(folio); - } else if (target_type =3D=3D MC_TARGET_DEVICE) { - folio =3D target.folio; - if (!mem_cgroup_move_account(folio, true, - mc.from, mc.to)) { - mc.precharge -=3D HPAGE_PMD_NR; - mc.moved_charge +=3D HPAGE_PMD_NR; - } - folio_unlock(folio); - folio_put(folio); - } - spin_unlock(ptl); - return 0; - } - -retry: - pte =3D 
pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (!pte) - return 0; - for (; addr !=3D end; addr +=3D PAGE_SIZE) { - pte_t ptent =3D ptep_get(pte++); - bool device =3D false; - swp_entry_t ent; - - if (!mc.precharge) - break; - - switch (get_mctgt_type(vma, addr, ptent, &target)) { - case MC_TARGET_DEVICE: - device =3D true; - fallthrough; - case MC_TARGET_PAGE: - folio =3D target.folio; - /* - * We can have a part of the split pmd here. Moving it - * can be done but it would be too convoluted so simply - * ignore such a partial THP and keep it in original - * memcg. There should be somebody mapping the head. - */ - if (folio_test_large(folio)) - goto put; - if (!device && !folio_isolate_lru(folio)) - goto put; - if (!mem_cgroup_move_account(folio, false, - mc.from, mc.to)) { - mc.precharge--; - /* we uncharge from mc.from later. */ - mc.moved_charge++; - } - if (!device) - folio_putback_lru(folio); -put: /* get_mctgt_type() gets & locks the page */ - folio_unlock(folio); - folio_put(folio); - break; - case MC_TARGET_SWAP: - ent =3D target.ent; - if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { - mc.precharge--; - mem_cgroup_id_get_many(mc.to, 1); - /* we fixup other refcnts and charges later. */ - mc.moved_swap++; - } - break; - default: - break; - } - } - pte_unmap_unlock(pte - 1, ptl); - cond_resched(); - - if (addr !=3D end) { - /* - * We have consumed all precharges we got in can_attach(). - * We try charge one by one, but don't do any additional - * charges to mc.to if we have failed in charge once in attach() - * phase. - */ - ret =3D mem_cgroup_do_precharge(1); - if (!ret) - goto retry; - } - - return ret; -} - -static const struct mm_walk_ops charge_walk_ops =3D { - .pmd_entry =3D mem_cgroup_move_charge_pte_range, - .walk_lock =3D PGWALK_RDLOCK, -}; - -static void mem_cgroup_move_charge(void) -{ - lru_add_drain_all(); - /* - * Signal folio_memcg_lock() to take the memcg's move_lock - * while we're moving its pages to another memcg. Then wait - * for already started RCU-only updates to finish. - */ - atomic_inc(&mc.from->moving_account); - synchronize_rcu(); -retry: - if (unlikely(!mmap_read_trylock(mc.mm))) { - /* - * Someone who are holding the mmap_lock might be waiting in - * waitq. So we cancel all extra charges, wake up all waiters, - * and retry. Because we cancel precharges, we might not be able - * to move enough charges, but moving charge is a best-effort - * feature anyway, so it wouldn't be a big problem. - */ - __mem_cgroup_clear_mc(); - cond_resched(); - goto retry; - } - /* - * When we have consumed all precharges and failed in doing - * additional charge, the page walk just aborts. 
- */ - walk_page_range(mc.mm, 0, ULONG_MAX, &charge_walk_ops, NULL); - mmap_read_unlock(mc.mm); - atomic_dec(&mc.from->moving_account); -} - -static void mem_cgroup_move_task(void) -{ - if (mc.to) { - mem_cgroup_move_charge(); - mem_cgroup_clear_mc(); - } -} - -#else /* !CONFIG_MMU */ -static int mem_cgroup_can_attach(struct cgroup_taskset *tset) -{ - return 0; -} -static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) -{ -} -static void mem_cgroup_move_task(void) -{ -} -#endif - #ifdef CONFIG_MEMCG_KMEM static void mem_cgroup_fork(struct task_struct *task) { --=20 2.45.2 From nobody Sat Feb 7 20:06:57 2026 Received: from out-175.mta0.migadu.com (out-175.mta0.migadu.com [91.218.175.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 72BEE1C6A8 for ; Tue, 25 Jun 2024 00:59:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.175 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719277181; cv=none; b=F/Nae9bBM5d7x8+bxxEZxvjOoBGk8/uPBFcYqU6jUPfZRUIt5eKQiQdN3RUpESVDJj5f0iAN1g6vCdRid1hVwyVfaXw0Og9vKMVdZVqTlMtAr6NjVsZprqBkxNqEsUq5dTazbv1Y5UOBFI86lYGqlYWgX2snPC1e3YkLC7q1lbg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719277181; c=relaxed/simple; bh=xv9NOCGodCEQxYKo91VaisLIINAyjCN1JxW1krl4FrM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lwQdhZa98uHNVBJ2noqf9TkNk+h9qRR6FQeg95uMcpCAO+LCoYNYknFsR8sylif77pOtayYbKvcK3UbA2ibnI5fQFeMLQqkS4T2ABfCptsFIH5wOL6dhGZc9Fec0H9Gm6s8DUROYU0RWFbo8aafi7ZNKUiXJj9beBOiYLf+SI88= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=SFL+k+TQ; arc=none smtp.client-ip=91.218.175.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="SFL+k+TQ" X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277177; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=v/tf3lBw3jG/rT7MJZGDLV/4uYe72LY+pJRmiKuAdfM=; b=SFL+k+TQiNlRjgoH6x50pq3n0QiBjYIdOlFT3DK3/1aCGerHYnl1GSqWHiUELtnGjVl6cf EHBcQ4kFu/PRip99CGJsa6MpFTpsdvUA9eQKJSLiBJ56N+W0FFhsvBi6fisG0L48+JrNBH AoKWwYioVwfLl56ihuUsHU1AN4cR280= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 05/14] mm: memcg: rename charge move-related functions Date: Mon, 24 Jun 2024 17:58:57 -0700 Message-ID: <20240625005906.106920-6-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Rename the exported functions related to charge moving to have the memcg1_ prefix. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko Acked-by: Shakeel Butt Suggested-by: Matthew Wilcox (Oracle) --- mm/memcontrol-v1.c | 14 +++++++------- mm/memcontrol-v1.h | 8 ++++---- mm/memcontrol.c | 8 ++++---- 3 files changed, 15 insertions(+), 15 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index f4c8bec5ae1b..c25e038ac874 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -384,7 +384,7 @@ static bool mem_cgroup_under_move(struct mem_cgroup *me= mcg) return ret; } =20 -bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg) +bool memcg1_wait_acct_move(struct mem_cgroup *memcg) { if (mc.moving_task && current !=3D mc.moving_task) { if (mem_cgroup_under_move(memcg)) { @@ -1056,7 +1056,7 @@ static void mem_cgroup_clear_mc(void) mmput(mm); } =20 -int mem_cgroup_can_attach(struct cgroup_taskset *tset) +int memcg1_can_attach(struct cgroup_taskset *tset) { struct cgroup_subsys_state *css; struct mem_cgroup *memcg =3D NULL; /* unneeded init to make gcc happy */ @@ -1126,7 +1126,7 @@ int mem_cgroup_can_attach(struct cgroup_taskset *tset) return ret; } =20 -void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) +void memcg1_cancel_attach(struct cgroup_taskset *tset) { if (mc.to) mem_cgroup_clear_mc(); @@ -1285,7 +1285,7 @@ static void mem_cgroup_move_charge(void) atomic_dec(&mc.from->moving_account); } =20 -void mem_cgroup_move_task(void) +void memcg1_move_task(void) { if (mc.to) { mem_cgroup_move_charge(); @@ -1294,14 +1294,14 @@ void mem_cgroup_move_task(void) } =20 #else /* !CONFIG_MMU */ -static int mem_cgroup_can_attach(struct cgroup_taskset *tset) +int memcg1_can_attach(struct cgroup_taskset *tset) { return 0; } -static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) +void memcg1_cancel_attach(struct cgroup_taskset *tset) { } -static void mem_cgroup_move_task(void) +void memcg1_move_task(void) { } #endif diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 55e7c4f90c39..d377c0be9880 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -29,11 +29,11 @@ static inline int try_charge(struct mem_cgroup *memcg, = gfp_t gfp_mask, void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n); void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n); =20 -bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg); +bool memcg1_wait_acct_move(struct mem_cgroup *memcg); struct cgroup_taskset; -int mem_cgroup_can_attach(struct cgroup_taskset *tset); -void mem_cgroup_cancel_attach(struct cgroup_taskset *tset); -void mem_cgroup_move_task(void); +int memcg1_can_attach(struct cgroup_taskset *tset); +void memcg1_cancel_attach(struct cgroup_taskset *tset); +void memcg1_move_task(void); =20 struct cftype; u64
mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3332c89cae2e..da2c0fa0de1b 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2582,7 +2582,7 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t = gfp_mask, * At task move, charge accounts can be doubly counted. So, it's * better to wait until the end of task_move if something is going on. */ - if (mem_cgroup_wait_acct_move(mem_over_limit)) + if (memcg1_wait_acct_move(mem_over_limit)) goto retry; =20 if (nr_retries--) @@ -6030,12 +6030,12 @@ struct cgroup_subsys memory_cgrp_subsys =3D { .css_free =3D mem_cgroup_css_free, .css_reset =3D mem_cgroup_css_reset, .css_rstat_flush =3D mem_cgroup_css_rstat_flush, - .can_attach =3D mem_cgroup_can_attach, + .can_attach =3D memcg1_can_attach, #if defined(CONFIG_LRU_GEN) || defined(CONFIG_MEMCG_KMEM) .attach =3D mem_cgroup_attach, #endif - .cancel_attach =3D mem_cgroup_cancel_attach, - .post_attach =3D mem_cgroup_move_task, + .cancel_attach =3D memcg1_cancel_attach, + .post_attach =3D memcg1_move_task, #ifdef CONFIG_MEMCG_KMEM .fork =3D mem_cgroup_fork, .exit =3D mem_cgroup_exit, --=20 2.45.2 From nobody Sat Feb 7 20:06:57 2026 Received: from out-180.mta0.migadu.com (out-180.mta0.migadu.com [91.218.175.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 518F614AA9 for ; Tue, 25 Jun 2024 00:59:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719277184; cv=none; b=CAR1fk3bUVzIMTAPAStsnb1fxDnONOKja479QcV3xu/LZu3yjnK4WBgEM90z+CUGlxm5r5YWuqtn9/giL66XoAQyzgg2pwNyV52ISTZ8xgQz/P87df/aRJz6QgGfaQ/C7z3oQ2QrcwCuE8G655c0OYq7ZhVooBG8v/nDCZHFCrs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719277184; c=relaxed/simple; bh=5lJa8ccXeBkkQFuH5fsvJWUmOrLNQiuvCIJPlYSYLd8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ezMMja/R0qYD9R2fvp8HQes6JU1Q0bKTEohDv6+pSIEESi73BEVUdT/yKvn17eqNJ7w5zojmP/mEz0QfQQshKyL99nJ5qZNroAI9h0kDB97ZCsh6l2g7GL227vJQQ0a3JmZCdVs9Dkbj2UjVXKjbbMOfOr97Qt8UWXxqRWO6DsA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=sRsGEcY6; arc=none smtp.client-ip=91.218.175.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="sRsGEcY6" X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277179; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uHE77+TOfsb44R4Zdivy+arvI6EdnE8krvjgQcOnIU4=; b=sRsGEcY6QL5IC0YQkXYUoQ7Kg1j8SVrZCi5BN8/Fx9mXprwavTJIVwpDVmJNHe5gWla/RV 8nn2dOVKeiKzenbNl2HTsZiqfYJ82CVfF58+i6cK4RJJRCAnC+GhVA9S8dt5ckBi7XAQGX Am4uKaptrsTYH28tdXRhRb6+sbe1C1o= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: 
muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 06/14] mm: memcg: move legacy memcg event code into memcontrol-v1.c Date: Mon, 24 Jun 2024 17:58:58 -0700 Message-ID: <20240625005906.106920-7-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Cgroup v1's memory controller contains a pretty complicated event notification mechanism which is not used on cgroup v2. Let's move the corresponding code into memcontrol-v1.c. Please note that mem_cgroup_event_ratelimit() remains in memcontrol.c; otherwise it would require exporting too many details of memcg stats outside of memcontrol.c. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko Acked-by: Shakeel Butt Suggested-by: Matthew Wilcox (Oracle) --- include/linux/memcontrol.h | 12 - mm/memcontrol-v1.c | 653 +++++++++++++++++++++++++++++++++++ mm/memcontrol-v1.h | 51 +++ mm/memcontrol.c | 687 +------------------------------------ 4 files changed, 709 insertions(+), 694 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 83c8327455d8..588179d29849 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -69,18 +69,6 @@ struct mem_cgroup_id { refcount_t ref; }; =20 -/* - * Per memcg event counter is incremented at every pagein/pageout. With TH= P, - * it will be incremented by the number of pages. This counter is used - * to trigger some periodic events. This is straightforward and better - * than using jiffies etc. to handle periodic memcg event. - */ -enum mem_cgroup_events_target { - MEM_CGROUP_TARGET_THRESH, - MEM_CGROUP_TARGET_SOFTLIMIT, - MEM_CGROUP_NTARGETS, -}; - struct memcg_vmstats_percpu; struct memcg_vmstats; struct lruvec_stats_percpu; diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index c25e038ac874..4b2290ceace6 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -6,6 +6,10 @@ #include #include #include +#include +#include +#include +#include =20 #include "internal.h" #include "swap.h" @@ -60,6 +64,54 @@ static struct move_charge_struct { .waitq =3D __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq), }; =20 +/* for OOM */ +struct mem_cgroup_eventfd_list { + struct list_head list; + struct eventfd_ctx *eventfd; +}; + +/* + * cgroup_event represents events which userspace wants to receive. + */ +struct mem_cgroup_event { + /* + * memcg which the event belongs to. + */ + struct mem_cgroup *memcg; + /* + * eventfd to signal userspace about the event. + */ + struct eventfd_ctx *eventfd; + /* + * Each of these is stored in a list by the cgroup. + */ + struct list_head list; + /* + * register_event() callback will be used to add a new userspace + * waiter for changes related to this event. Use eventfd_signal() + * on eventfd to send notification to userspace.
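+ * Different event types (e.g. usage thresholds, OOM notification) + * supply their own register_event()/unregister_event() pair; the + * eventfd plumbing around them is shared.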
+ */ + int (*register_event)(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args); + /* + * unregister_event() callback will be called when userspace closes + * the eventfd or on cgroup removal. This callback must be set + * if you want to provide notification functionality. + */ + void (*unregister_event)(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd); + /* + * All fields below are needed to unregister the event when + * userspace closes the eventfd. + */ + poll_table pt; + wait_queue_head_t *wqh; + wait_queue_entry_t wait; + struct work_struct remove; +}; + +extern spinlock_t memcg_oom_lock; + static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz, struct mem_cgroup_tree_per_node *mctz, unsigned long new_usage_in_excess) @@ -1306,6 +1358,607 @@ void memcg1_move_task(void) } #endif =20 +static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap) +{ + struct mem_cgroup_threshold_ary *t; + unsigned long usage; + int i; + + rcu_read_lock(); + if (!swap) + t =3D rcu_dereference(memcg->thresholds.primary); + else + t =3D rcu_dereference(memcg->memsw_thresholds.primary); + + if (!t) + goto unlock; + + usage =3D mem_cgroup_usage(memcg, swap); + + /* + * current_threshold points to the threshold just below or equal to usage. + * If that's not true, a threshold was crossed after the last + * call of __mem_cgroup_threshold(). + */ + i =3D t->current_threshold; + + /* + * Iterate backward over the array of thresholds starting from + * current_threshold and check if a threshold is crossed. + * If none of the thresholds below usage is crossed, we read + * only one element of the array here. + */ + for (; i >=3D 0 && unlikely(t->entries[i].threshold > usage); i--) + eventfd_signal(t->entries[i].eventfd); + + /* i =3D current_threshold + 1 */ + i++; + + /* + * Iterate forward over the array of thresholds starting from + * current_threshold+1 and check if a threshold is crossed. + * If none of the thresholds above usage is crossed, we read + * only one element of the array here. + */ + for (; i < t->size && unlikely(t->entries[i].threshold <=3D usage); i++) + eventfd_signal(t->entries[i].eventfd); + + /* Update current_threshold */ + t->current_threshold =3D i - 1; +unlock: + rcu_read_unlock(); +} + +static void mem_cgroup_threshold(struct mem_cgroup *memcg) +{ + while (memcg) { + __mem_cgroup_threshold(memcg, false); + if (do_memsw_account()) + __mem_cgroup_threshold(memcg, true); + + memcg =3D parent_mem_cgroup(memcg); + } +} + +/* + * Check events in order.
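+ * Thresholds are re-evaluated on the finer MEM_CGROUP_TARGET_THRESH + * window; the soft limit tree is only updated on the coarser + * MEM_CGROUP_TARGET_SOFTLIMIT window.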
+ * + */ +void memcg_check_events(struct mem_cgroup *memcg, int nid) +{ + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + return; + + /* threshold event is triggered in finer grain than soft limit */ + if (unlikely(mem_cgroup_event_ratelimit(memcg, + MEM_CGROUP_TARGET_THRESH))) { + bool do_softlimit; + + do_softlimit =3D mem_cgroup_event_ratelimit(memcg, + MEM_CGROUP_TARGET_SOFTLIMIT); + mem_cgroup_threshold(memcg); + if (unlikely(do_softlimit)) + memcg1_update_tree(memcg, nid); + } +} + +static int compare_thresholds(const void *a, const void *b) +{ + const struct mem_cgroup_threshold *_a =3D a; + const struct mem_cgroup_threshold *_b =3D b; + + if (_a->threshold > _b->threshold) + return 1; + + if (_a->threshold < _b->threshold) + return -1; + + return 0; +} + +static int mem_cgroup_oom_notify_cb(struct mem_cgroup *memcg) +{ + struct mem_cgroup_eventfd_list *ev; + + spin_lock(&memcg_oom_lock); + + list_for_each_entry(ev, &memcg->oom_notify, list) + eventfd_signal(ev->eventfd); + + spin_unlock(&memcg_oom_lock); + return 0; +} + +void mem_cgroup_oom_notify(struct mem_cgroup *memcg) +{ + struct mem_cgroup *iter; + + for_each_mem_cgroup_tree(iter, memcg) + mem_cgroup_oom_notify_cb(iter); +} + +static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args, enum res_type type) +{ + struct mem_cgroup_thresholds *thresholds; + struct mem_cgroup_threshold_ary *new; + unsigned long threshold; + unsigned long usage; + int i, size, ret; + + ret =3D page_counter_memparse(args, "-1", &threshold); + if (ret) + return ret; + + mutex_lock(&memcg->thresholds_lock); + + if (type =3D=3D _MEM) { + thresholds =3D &memcg->thresholds; + usage =3D mem_cgroup_usage(memcg, false); + } else if (type =3D=3D _MEMSWAP) { + thresholds =3D &memcg->memsw_thresholds; + usage =3D mem_cgroup_usage(memcg, true); + } else + BUG(); + + /* Check if a threshold crossed before adding a new one */ + if (thresholds->primary) + __mem_cgroup_threshold(memcg, type =3D=3D _MEMSWAP); + + size =3D thresholds->primary ? thresholds->primary->size + 1 : 1; + + /* Allocate memory for new array of thresholds */ + new =3D kmalloc(struct_size(new, entries, size), GFP_KERNEL); + if (!new) { + ret =3D -ENOMEM; + goto unlock; + } + new->size =3D size; + + /* Copy thresholds (if any) to new array */ + if (thresholds->primary) + memcpy(new->entries, thresholds->primary->entries, + flex_array_size(new, entries, size - 1)); + + /* Add new threshold */ + new->entries[size - 1].eventfd =3D eventfd; + new->entries[size - 1].threshold =3D threshold; + + /* Sort thresholds. Registering of new threshold isn't time-critical */ + sort(new->entries, size, sizeof(*new->entries), + compare_thresholds, NULL); + + /* Find current threshold */ + new->current_threshold =3D -1; + for (i =3D 0; i < size; i++) { + if (new->entries[i].threshold <=3D usage) { + /* + * new->current_threshold will not be used until + * rcu_assign_pointer(), so it's safe to increment + * it here. 
+ */ + ++new->current_threshold; + } else + break; + } + + /* Free old spare buffer and save old primary buffer as spare */ + kfree(thresholds->spare); + thresholds->spare =3D thresholds->primary; + + rcu_assign_pointer(thresholds->primary, new); + + /* To be sure that nobody uses thresholds */ + synchronize_rcu(); + +unlock: + mutex_unlock(&memcg->thresholds_lock); + + return ret; +} + +static int mem_cgroup_usage_register_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args) +{ + return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEM); +} + +static int memsw_cgroup_usage_register_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args) +{ + return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEMSWAP); +} + +static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, enum res_type type) +{ + struct mem_cgroup_thresholds *thresholds; + struct mem_cgroup_threshold_ary *new; + unsigned long usage; + int i, j, size, entries; + + mutex_lock(&memcg->thresholds_lock); + + if (type =3D=3D _MEM) { + thresholds =3D &memcg->thresholds; + usage =3D mem_cgroup_usage(memcg, false); + } else if (type =3D=3D _MEMSWAP) { + thresholds =3D &memcg->memsw_thresholds; + usage =3D mem_cgroup_usage(memcg, true); + } else + BUG(); + + if (!thresholds->primary) + goto unlock; + + /* Check if a threshold crossed before removing */ + __mem_cgroup_threshold(memcg, type =3D=3D _MEMSWAP); + + /* Calculate new number of threshold */ + size =3D entries =3D 0; + for (i =3D 0; i < thresholds->primary->size; i++) { + if (thresholds->primary->entries[i].eventfd !=3D eventfd) + size++; + else + entries++; + } + + new =3D thresholds->spare; + + /* If no items related to eventfd have been cleared, nothing to do */ + if (!entries) + goto unlock; + + /* Set thresholds array to NULL if we don't have thresholds */ + if (!size) { + kfree(new); + new =3D NULL; + goto swap_buffers; + } + + new->size =3D size; + + /* Copy thresholds and find current threshold */ + new->current_threshold =3D -1; + for (i =3D 0, j =3D 0; i < thresholds->primary->size; i++) { + if (thresholds->primary->entries[i].eventfd =3D=3D eventfd) + continue; + + new->entries[j] =3D thresholds->primary->entries[i]; + if (new->entries[j].threshold <=3D usage) { + /* + * new->current_threshold will not be used + * until rcu_assign_pointer(), so it's safe to increment + * it here. 
+ */ + ++new->current_threshold; + } + j++; + } + +swap_buffers: + /* Swap primary and spare array */ + thresholds->spare =3D thresholds->primary; + + rcu_assign_pointer(thresholds->primary, new); + + /* To be sure that nobody uses thresholds */ + synchronize_rcu(); + + /* If all events are unregistered, free the spare array */ + if (!new) { + kfree(thresholds->spare); + thresholds->spare =3D NULL; + } +unlock: + mutex_unlock(&memcg->thresholds_lock); +} + +static void mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd) +{ + return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEM); +} + +static void memsw_cgroup_usage_unregister_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd) +{ + return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEMSWAP); +} + +static int mem_cgroup_oom_register_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args) +{ + struct mem_cgroup_eventfd_list *event; + + event =3D kmalloc(sizeof(*event), GFP_KERNEL); + if (!event) + return -ENOMEM; + + spin_lock(&memcg_oom_lock); + + event->eventfd =3D eventfd; + list_add(&event->list, &memcg->oom_notify); + + /* already in OOM ? */ + if (memcg->under_oom) + eventfd_signal(eventfd); + spin_unlock(&memcg_oom_lock); + + return 0; +} + +static void mem_cgroup_oom_unregister_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd) +{ + struct mem_cgroup_eventfd_list *ev, *tmp; + + spin_lock(&memcg_oom_lock); + + list_for_each_entry_safe(ev, tmp, &memcg->oom_notify, list) { + if (ev->eventfd =3D=3D eventfd) { + list_del(&ev->list); + kfree(ev); + } + } + + spin_unlock(&memcg_oom_lock); +} + +/* + * DO NOT USE IN NEW FILES. + * + * "cgroup.event_control" implementation. + * + * This is way over-engineered. It tries to support fully configurable + * events for each user. Such level of flexibility is completely + * unnecessary especially in the light of the planned unified hierarchy. + * + * Please deprecate this and replace with something simpler if at all + * possible. + */ + +/* + * Unregister event and free resources. + * + * Gets called from workqueue. + */ +static void memcg_event_remove(struct work_struct *work) +{ + struct mem_cgroup_event *event =3D + container_of(work, struct mem_cgroup_event, remove); + struct mem_cgroup *memcg =3D event->memcg; + + remove_wait_queue(event->wqh, &event->wait); + + event->unregister_event(memcg, event->eventfd); + + /* Notify userspace the event is going away. */ + eventfd_signal(event->eventfd); + + eventfd_ctx_put(event->eventfd); + kfree(event); + css_put(&memcg->css); +} + +/* + * Gets called on EPOLLHUP on eventfd when user closes it. + * + * Called with wqh->lock held and interrupts disabled. + */ +static int memcg_event_wake(wait_queue_entry_t *wait, unsigned mode, + int sync, void *key) +{ + struct mem_cgroup_event *event =3D + container_of(wait, struct mem_cgroup_event, wait); + struct mem_cgroup *memcg =3D event->memcg; + __poll_t flags =3D key_to_poll(key); + + if (flags & EPOLLHUP) { + /* + * If the event has been detached at cgroup removal, we + * can simply return knowing the other side will cleanup + * for us. + * + * We can't race against event freeing since the other + * side will require wqh->lock via remove_wait_queue(), + * which we hold. + */ + spin_lock(&memcg->event_list_lock); + if (!list_empty(&event->list)) { + list_del_init(&event->list); + /* + * We are in atomic context, but cgroup_event_remove() + * may sleep, so we have to call it in workqueue. 
+ */ + schedule_work(&event->remove); + } + spin_unlock(&memcg->event_list_lock); + } + + return 0; +} + +static void memcg_event_ptable_queue_proc(struct file *file, + wait_queue_head_t *wqh, poll_table *pt) +{ + struct mem_cgroup_event *event =3D + container_of(pt, struct mem_cgroup_event, pt); + + event->wqh =3D wqh; + add_wait_queue(wqh, &event->wait); +} + +/* + * DO NOT USE IN NEW FILES. + * + * Parse input and register new cgroup event handler. + * + * Input must be in format ' '. + * Interpretation of args is defined by control file implementation. + */ +ssize_t memcg_write_event_control(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct cgroup_subsys_state *css =3D of_css(of); + struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); + struct mem_cgroup_event *event; + struct cgroup_subsys_state *cfile_css; + unsigned int efd, cfd; + struct fd efile; + struct fd cfile; + struct dentry *cdentry; + const char *name; + char *endp; + int ret; + + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + return -EOPNOTSUPP; + + buf =3D strstrip(buf); + + efd =3D simple_strtoul(buf, &endp, 10); + if (*endp !=3D ' ') + return -EINVAL; + buf =3D endp + 1; + + cfd =3D simple_strtoul(buf, &endp, 10); + if ((*endp !=3D ' ') && (*endp !=3D '\0')) + return -EINVAL; + buf =3D endp + 1; + + event =3D kzalloc(sizeof(*event), GFP_KERNEL); + if (!event) + return -ENOMEM; + + event->memcg =3D memcg; + INIT_LIST_HEAD(&event->list); + init_poll_funcptr(&event->pt, memcg_event_ptable_queue_proc); + init_waitqueue_func_entry(&event->wait, memcg_event_wake); + INIT_WORK(&event->remove, memcg_event_remove); + + efile =3D fdget(efd); + if (!efile.file) { + ret =3D -EBADF; + goto out_kfree; + } + + event->eventfd =3D eventfd_ctx_fileget(efile.file); + if (IS_ERR(event->eventfd)) { + ret =3D PTR_ERR(event->eventfd); + goto out_put_efile; + } + + cfile =3D fdget(cfd); + if (!cfile.file) { + ret =3D -EBADF; + goto out_put_eventfd; + } + + /* the process need read permission on control file */ + /* AV: shouldn't we check that it's been opened for read instead? */ + ret =3D file_permission(cfile.file, MAY_READ); + if (ret < 0) + goto out_put_cfile; + + /* + * The control file must be a regular cgroup1 file. As a regular cgroup + * file can't be renamed, it's safe to access its name afterwards. + */ + cdentry =3D cfile.file->f_path.dentry; + if (cdentry->d_sb->s_type !=3D &cgroup_fs_type || !d_is_reg(cdentry)) { + ret =3D -EINVAL; + goto out_put_cfile; + } + + /* + * Determine the event callbacks and set them in @event. This used + * to be done via struct cftype but cgroup core no longer knows + * about these events. The following is crude but the whole thing + * is for compatibility anyway. + * + * DO NOT ADD NEW FILES. 
+ */ + name =3D cdentry->d_name.name; + + if (!strcmp(name, "memory.usage_in_bytes")) { + event->register_event =3D mem_cgroup_usage_register_event; + event->unregister_event =3D mem_cgroup_usage_unregister_event; + } else if (!strcmp(name, "memory.oom_control")) { + event->register_event =3D mem_cgroup_oom_register_event; + event->unregister_event =3D mem_cgroup_oom_unregister_event; + } else if (!strcmp(name, "memory.pressure_level")) { + event->register_event =3D vmpressure_register_event; + event->unregister_event =3D vmpressure_unregister_event; + } else if (!strcmp(name, "memory.memsw.usage_in_bytes")) { + event->register_event =3D memsw_cgroup_usage_register_event; + event->unregister_event =3D memsw_cgroup_usage_unregister_event; + } else { + ret =3D -EINVAL; + goto out_put_cfile; + } + + /* + * Verify @cfile should belong to @css. Also, remaining events are + * automatically removed on cgroup destruction but the removal is + * asynchronous, so take an extra ref on @css. + */ + cfile_css =3D css_tryget_online_from_dir(cdentry->d_parent, + &memory_cgrp_subsys); + ret =3D -EINVAL; + if (IS_ERR(cfile_css)) + goto out_put_cfile; + if (cfile_css !=3D css) { + css_put(cfile_css); + goto out_put_cfile; + } + + ret =3D event->register_event(memcg, event->eventfd, buf); + if (ret) + goto out_put_css; + + vfs_poll(efile.file, &event->pt); + + spin_lock_irq(&memcg->event_list_lock); + list_add(&event->list, &memcg->event_list); + spin_unlock_irq(&memcg->event_list_lock); + + fdput(cfile); + fdput(efile); + + return nbytes; + +out_put_css: + css_put(css); +out_put_cfile: + fdput(cfile); +out_put_eventfd: + eventfd_ctx_put(event->eventfd); +out_put_efile: + fdput(efile); +out_kfree: + kfree(event); + + return ret; +} + +void memcg1_css_offline(struct mem_cgroup *memcg) +{ + struct mem_cgroup_event *event, *tmp; + + /* + * Unregister events and notify userspace. + * Notify userspace about cgroup removing only after rmdir of cgroup + * directory to avoid race between userspace and kernelspace. + */ + spin_lock_irq(&memcg->event_list_lock); + list_for_each_entry_safe(event, tmp, &memcg->event_list, list) { + list_del_init(&event->list); + schedule_work(&event->remove); + } + spin_unlock_irq(&memcg->event_list_lock); +} + static int __init memcg1_init(void) { int node; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index d377c0be9880..524a2c76ffc9 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -41,4 +41,55 @@ u64 mem_cgroup_move_charge_read(struct cgroup_subsys_sta= te *css, int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, struct cftype *cft, u64 val); =20 +/* + * Per memcg event counter is incremented at every pagein/pageout. With TH= P, + * it will be incremented by the number of pages. This counter is used + * to trigger some periodic events. This is straightforward and better + * than using jiffies etc. to handle periodic memcg event. + */ +enum mem_cgroup_events_target { + MEM_CGROUP_TARGET_THRESH, + MEM_CGROUP_TARGET_SOFTLIMIT, + MEM_CGROUP_NTARGETS, +}; + +/* Whether legacy memory+swap accounting is active */ +static bool do_memsw_account(void) +{ + return !cgroup_subsys_on_dfl(memory_cgrp_subsys); +} + +/* + * Iteration constructs for visiting all cgroups (under a tree). If + * loops are exited prematurely (break), mem_cgroup_iter_break() must + * be used for reference counting. 
+ */ +#define for_each_mem_cgroup_tree(iter, root) \ + for (iter =3D mem_cgroup_iter(root, NULL, NULL); \ + iter !=3D NULL; \ + iter =3D mem_cgroup_iter(root, iter, NULL)) + +#define for_each_mem_cgroup(iter) \ + for (iter =3D mem_cgroup_iter(NULL, NULL, NULL); \ + iter !=3D NULL; \ + iter =3D mem_cgroup_iter(NULL, iter, NULL)) + +void memcg1_css_offline(struct mem_cgroup *memcg); + +/* for encoding cft->private value on file */ +enum res_type { + _MEM, + _MEMSWAP, + _KMEM, + _TCP, +}; + +bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, + enum mem_cgroup_events_target target); +unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap); +void mem_cgroup_oom_notify(struct mem_cgroup *memcg); +ssize_t memcg_write_event_control(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off); + + #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index da2c0fa0de1b..bd4b26a73596 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -46,9 +46,6 @@ #include #include #include -#include -#include -#include #include #include #include @@ -59,7 +56,6 @@ #include #include #include -#include #include #include #include @@ -97,91 +93,13 @@ static bool cgroup_memory_nobpf __ro_after_init; static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq); #endif =20 -/* Whether legacy memory+swap accounting is active */ -static bool do_memsw_account(void) -{ - return !cgroup_subsys_on_dfl(memory_cgrp_subsys); -} - #define THRESHOLDS_EVENTS_TARGET 128 #define SOFTLIMIT_EVENTS_TARGET 1024 =20 -/* for OOM */ -struct mem_cgroup_eventfd_list { - struct list_head list; - struct eventfd_ctx *eventfd; -}; - -/* - * cgroup_event represents events which userspace want to receive. - */ -struct mem_cgroup_event { - /* - * memcg which the event belongs to. - */ - struct mem_cgroup *memcg; - /* - * eventfd to signal userspace about the event. - */ - struct eventfd_ctx *eventfd; - /* - * Each of these stored in a list by the cgroup. - */ - struct list_head list; - /* - * register_event() callback will be used to add new userspace - * waiter for changes related to this event. Use eventfd_signal() - * on eventfd to send notification to userspace. - */ - int (*register_event)(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args); - /* - * unregister_event() callback will be called when userspace closes - * the eventfd or on cgroup removing. This callback must be set, - * if you want provide notification functionality. - */ - void (*unregister_event)(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd); - /* - * All fields below needed to unregister event when - * userspace closes eventfd. - */ - poll_table pt; - wait_queue_head_t *wqh; - wait_queue_entry_t wait; - struct work_struct remove; -}; - -static void mem_cgroup_threshold(struct mem_cgroup *memcg); -static void mem_cgroup_oom_notify(struct mem_cgroup *memcg); - -/* for encoding cft->private value on file */ -enum res_type { - _MEM, - _MEMSWAP, - _KMEM, - _TCP, -}; - #define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val)) #define MEMFILE_TYPE(val) ((val) >> 16 & 0xffff) #define MEMFILE_ATTR(val) ((val) & 0xffff) =20 -/* - * Iteration constructs for visiting all cgroups (under a tree). If - * loops are exited prematurely (break), mem_cgroup_iter_break() must - * be used for reference counting. 
- */ -#define for_each_mem_cgroup_tree(iter, root) \ - for (iter =3D mem_cgroup_iter(root, NULL, NULL); \ - iter !=3D NULL; \ - iter =3D mem_cgroup_iter(root, iter, NULL)) - -#define for_each_mem_cgroup(iter) \ - for (iter =3D mem_cgroup_iter(NULL, NULL, NULL); \ - iter !=3D NULL; \ - iter =3D mem_cgroup_iter(NULL, iter, NULL)) - static inline bool task_is_dying(void) { return tsk_is_oom_victim(current) || fatal_signal_pending(current) || @@ -940,8 +858,8 @@ void mem_cgroup_charge_statistics(struct mem_cgroup *me= mcg, int nr_pages) __this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages); } =20 -static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, - enum mem_cgroup_events_target target) +bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, + enum mem_cgroup_events_target target) { unsigned long val, next; =20 @@ -965,28 +883,6 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgro= up *memcg, return false; } =20 -/* - * Check events in order. - * - */ -void memcg_check_events(struct mem_cgroup *memcg, int nid) -{ - if (IS_ENABLED(CONFIG_PREEMPT_RT)) - return; - - /* threshold event is triggered in finer grain than soft limit */ - if (unlikely(mem_cgroup_event_ratelimit(memcg, - MEM_CGROUP_TARGET_THRESH))) { - bool do_softlimit; - - do_softlimit =3D mem_cgroup_event_ratelimit(memcg, - MEM_CGROUP_TARGET_SOFTLIMIT); - mem_cgroup_threshold(memcg); - if (unlikely(do_softlimit)) - memcg1_update_tree(memcg, nid); - } -} - struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p) { /* @@ -1726,7 +1622,7 @@ static struct lockdep_map memcg_oom_lock_dep_map =3D { }; #endif =20 -static DEFINE_SPINLOCK(memcg_oom_lock); +DEFINE_SPINLOCK(memcg_oom_lock); =20 /* * Check OOM-Killer is already running under our hierarchy. @@ -3545,7 +3441,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup_s= ubsys_state *css, return -EINVAL; } =20 -static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) +unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) { unsigned long val; =20 @@ -4046,331 +3942,6 @@ static int mem_cgroup_swappiness_write(struct cgrou= p_subsys_state *css, return 0; } =20 -static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap) -{ - struct mem_cgroup_threshold_ary *t; - unsigned long usage; - int i; - - rcu_read_lock(); - if (!swap) - t =3D rcu_dereference(memcg->thresholds.primary); - else - t =3D rcu_dereference(memcg->memsw_thresholds.primary); - - if (!t) - goto unlock; - - usage =3D mem_cgroup_usage(memcg, swap); - - /* - * current_threshold points to threshold just below or equal to usage. - * If it's not true, a threshold was crossed after last - * call of __mem_cgroup_threshold(). - */ - i =3D t->current_threshold; - - /* - * Iterate backward over array of thresholds starting from - * current_threshold and check if a threshold is crossed. - * If none of thresholds below usage is crossed, we read - * only one element of the array here. - */ - for (; i >=3D 0 && unlikely(t->entries[i].threshold > usage); i--) - eventfd_signal(t->entries[i].eventfd); - - /* i =3D current_threshold + 1 */ - i++; - - /* - * Iterate forward over array of thresholds starting from - * current_threshold+1 and check if a threshold is crossed. - * If none of thresholds above usage is crossed, we read - * only one element of the array here. 
- */ - for (; i < t->size && unlikely(t->entries[i].threshold <=3D usage); i++) - eventfd_signal(t->entries[i].eventfd); - - /* Update current_threshold */ - t->current_threshold =3D i - 1; -unlock: - rcu_read_unlock(); -} - -static void mem_cgroup_threshold(struct mem_cgroup *memcg) -{ - while (memcg) { - __mem_cgroup_threshold(memcg, false); - if (do_memsw_account()) - __mem_cgroup_threshold(memcg, true); - - memcg =3D parent_mem_cgroup(memcg); - } -} - -static int compare_thresholds(const void *a, const void *b) -{ - const struct mem_cgroup_threshold *_a =3D a; - const struct mem_cgroup_threshold *_b =3D b; - - if (_a->threshold > _b->threshold) - return 1; - - if (_a->threshold < _b->threshold) - return -1; - - return 0; -} - -static int mem_cgroup_oom_notify_cb(struct mem_cgroup *memcg) -{ - struct mem_cgroup_eventfd_list *ev; - - spin_lock(&memcg_oom_lock); - - list_for_each_entry(ev, &memcg->oom_notify, list) - eventfd_signal(ev->eventfd); - - spin_unlock(&memcg_oom_lock); - return 0; -} - -static void mem_cgroup_oom_notify(struct mem_cgroup *memcg) -{ - struct mem_cgroup *iter; - - for_each_mem_cgroup_tree(iter, memcg) - mem_cgroup_oom_notify_cb(iter); -} - -static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args, enum res_type type) -{ - struct mem_cgroup_thresholds *thresholds; - struct mem_cgroup_threshold_ary *new; - unsigned long threshold; - unsigned long usage; - int i, size, ret; - - ret =3D page_counter_memparse(args, "-1", &threshold); - if (ret) - return ret; - - mutex_lock(&memcg->thresholds_lock); - - if (type =3D=3D _MEM) { - thresholds =3D &memcg->thresholds; - usage =3D mem_cgroup_usage(memcg, false); - } else if (type =3D=3D _MEMSWAP) { - thresholds =3D &memcg->memsw_thresholds; - usage =3D mem_cgroup_usage(memcg, true); - } else - BUG(); - - /* Check if a threshold crossed before adding a new one */ - if (thresholds->primary) - __mem_cgroup_threshold(memcg, type =3D=3D _MEMSWAP); - - size =3D thresholds->primary ? thresholds->primary->size + 1 : 1; - - /* Allocate memory for new array of thresholds */ - new =3D kmalloc(struct_size(new, entries, size), GFP_KERNEL); - if (!new) { - ret =3D -ENOMEM; - goto unlock; - } - new->size =3D size; - - /* Copy thresholds (if any) to new array */ - if (thresholds->primary) - memcpy(new->entries, thresholds->primary->entries, - flex_array_size(new, entries, size - 1)); - - /* Add new threshold */ - new->entries[size - 1].eventfd =3D eventfd; - new->entries[size - 1].threshold =3D threshold; - - /* Sort thresholds. Registering of new threshold isn't time-critical */ - sort(new->entries, size, sizeof(*new->entries), - compare_thresholds, NULL); - - /* Find current threshold */ - new->current_threshold =3D -1; - for (i =3D 0; i < size; i++) { - if (new->entries[i].threshold <=3D usage) { - /* - * new->current_threshold will not be used until - * rcu_assign_pointer(), so it's safe to increment - * it here. 
- */ - ++new->current_threshold; - } else - break; - } - - /* Free old spare buffer and save old primary buffer as spare */ - kfree(thresholds->spare); - thresholds->spare =3D thresholds->primary; - - rcu_assign_pointer(thresholds->primary, new); - - /* To be sure that nobody uses thresholds */ - synchronize_rcu(); - -unlock: - mutex_unlock(&memcg->thresholds_lock); - - return ret; -} - -static int mem_cgroup_usage_register_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args) -{ - return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEM); -} - -static int memsw_cgroup_usage_register_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args) -{ - return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEMSWAP); -} - -static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, enum res_type type) -{ - struct mem_cgroup_thresholds *thresholds; - struct mem_cgroup_threshold_ary *new; - unsigned long usage; - int i, j, size, entries; - - mutex_lock(&memcg->thresholds_lock); - - if (type =3D=3D _MEM) { - thresholds =3D &memcg->thresholds; - usage =3D mem_cgroup_usage(memcg, false); - } else if (type =3D=3D _MEMSWAP) { - thresholds =3D &memcg->memsw_thresholds; - usage =3D mem_cgroup_usage(memcg, true); - } else - BUG(); - - if (!thresholds->primary) - goto unlock; - - /* Check if a threshold crossed before removing */ - __mem_cgroup_threshold(memcg, type =3D=3D _MEMSWAP); - - /* Calculate new number of threshold */ - size =3D entries =3D 0; - for (i =3D 0; i < thresholds->primary->size; i++) { - if (thresholds->primary->entries[i].eventfd !=3D eventfd) - size++; - else - entries++; - } - - new =3D thresholds->spare; - - /* If no items related to eventfd have been cleared, nothing to do */ - if (!entries) - goto unlock; - - /* Set thresholds array to NULL if we don't have thresholds */ - if (!size) { - kfree(new); - new =3D NULL; - goto swap_buffers; - } - - new->size =3D size; - - /* Copy thresholds and find current threshold */ - new->current_threshold =3D -1; - for (i =3D 0, j =3D 0; i < thresholds->primary->size; i++) { - if (thresholds->primary->entries[i].eventfd =3D=3D eventfd) - continue; - - new->entries[j] =3D thresholds->primary->entries[i]; - if (new->entries[j].threshold <=3D usage) { - /* - * new->current_threshold will not be used - * until rcu_assign_pointer(), so it's safe to increment - * it here. 
- */ - ++new->current_threshold; - } - j++; - } - -swap_buffers: - /* Swap primary and spare array */ - thresholds->spare =3D thresholds->primary; - - rcu_assign_pointer(thresholds->primary, new); - - /* To be sure that nobody uses thresholds */ - synchronize_rcu(); - - /* If all events are unregistered, free the spare array */ - if (!new) { - kfree(thresholds->spare); - thresholds->spare =3D NULL; - } -unlock: - mutex_unlock(&memcg->thresholds_lock); -} - -static void mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd) -{ - return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEM); -} - -static void memsw_cgroup_usage_unregister_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd) -{ - return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEMSWAP); -} - -static int mem_cgroup_oom_register_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args) -{ - struct mem_cgroup_eventfd_list *event; - - event =3D kmalloc(sizeof(*event), GFP_KERNEL); - if (!event) - return -ENOMEM; - - spin_lock(&memcg_oom_lock); - - event->eventfd =3D eventfd; - list_add(&event->list, &memcg->oom_notify); - - /* already in OOM ? */ - if (memcg->under_oom) - eventfd_signal(eventfd); - spin_unlock(&memcg_oom_lock); - - return 0; -} - -static void mem_cgroup_oom_unregister_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd) -{ - struct mem_cgroup_eventfd_list *ev, *tmp; - - spin_lock(&memcg_oom_lock); - - list_for_each_entry_safe(ev, tmp, &memcg->oom_notify, list) { - if (ev->eventfd =3D=3D eventfd) { - list_del(&ev->list); - kfree(ev); - } - } - - spin_unlock(&memcg_oom_lock); -} - static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v) { struct mem_cgroup *memcg =3D mem_cgroup_from_seq(sf); @@ -4611,243 +4182,6 @@ static void memcg_wb_domain_size_changed(struct mem= _cgroup *memcg) =20 #endif /* CONFIG_CGROUP_WRITEBACK */ =20 -/* - * DO NOT USE IN NEW FILES. - * - * "cgroup.event_control" implementation. - * - * This is way over-engineered. It tries to support fully configurable - * events for each user. Such level of flexibility is completely - * unnecessary especially in the light of the planned unified hierarchy. - * - * Please deprecate this and replace with something simpler if at all - * possible. - */ - -/* - * Unregister event and free resources. - * - * Gets called from workqueue. - */ -static void memcg_event_remove(struct work_struct *work) -{ - struct mem_cgroup_event *event =3D - container_of(work, struct mem_cgroup_event, remove); - struct mem_cgroup *memcg =3D event->memcg; - - remove_wait_queue(event->wqh, &event->wait); - - event->unregister_event(memcg, event->eventfd); - - /* Notify userspace the event is going away. */ - eventfd_signal(event->eventfd); - - eventfd_ctx_put(event->eventfd); - kfree(event); - css_put(&memcg->css); -} - -/* - * Gets called on EPOLLHUP on eventfd when user closes it. - * - * Called with wqh->lock held and interrupts disabled. - */ -static int memcg_event_wake(wait_queue_entry_t *wait, unsigned mode, - int sync, void *key) -{ - struct mem_cgroup_event *event =3D - container_of(wait, struct mem_cgroup_event, wait); - struct mem_cgroup *memcg =3D event->memcg; - __poll_t flags =3D key_to_poll(key); - - if (flags & EPOLLHUP) { - /* - * If the event has been detached at cgroup removal, we - * can simply return knowing the other side will cleanup - * for us. 
- * - * We can't race against event freeing since the other - * side will require wqh->lock via remove_wait_queue(), - * which we hold. - */ - spin_lock(&memcg->event_list_lock); - if (!list_empty(&event->list)) { - list_del_init(&event->list); - /* - * We are in atomic context, but cgroup_event_remove() - * may sleep, so we have to call it in workqueue. - */ - schedule_work(&event->remove); - } - spin_unlock(&memcg->event_list_lock); - } - - return 0; -} - -static void memcg_event_ptable_queue_proc(struct file *file, - wait_queue_head_t *wqh, poll_table *pt) -{ - struct mem_cgroup_event *event =3D - container_of(pt, struct mem_cgroup_event, pt); - - event->wqh =3D wqh; - add_wait_queue(wqh, &event->wait); -} - -/* - * DO NOT USE IN NEW FILES. - * - * Parse input and register new cgroup event handler. - * - * Input must be in format ' '. - * Interpretation of args is defined by control file implementation. - */ -static ssize_t memcg_write_event_control(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off) -{ - struct cgroup_subsys_state *css =3D of_css(of); - struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); - struct mem_cgroup_event *event; - struct cgroup_subsys_state *cfile_css; - unsigned int efd, cfd; - struct fd efile; - struct fd cfile; - struct dentry *cdentry; - const char *name; - char *endp; - int ret; - - if (IS_ENABLED(CONFIG_PREEMPT_RT)) - return -EOPNOTSUPP; - - buf =3D strstrip(buf); - - efd =3D simple_strtoul(buf, &endp, 10); - if (*endp !=3D ' ') - return -EINVAL; - buf =3D endp + 1; - - cfd =3D simple_strtoul(buf, &endp, 10); - if ((*endp !=3D ' ') && (*endp !=3D '\0')) - return -EINVAL; - buf =3D endp + 1; - - event =3D kzalloc(sizeof(*event), GFP_KERNEL); - if (!event) - return -ENOMEM; - - event->memcg =3D memcg; - INIT_LIST_HEAD(&event->list); - init_poll_funcptr(&event->pt, memcg_event_ptable_queue_proc); - init_waitqueue_func_entry(&event->wait, memcg_event_wake); - INIT_WORK(&event->remove, memcg_event_remove); - - efile =3D fdget(efd); - if (!efile.file) { - ret =3D -EBADF; - goto out_kfree; - } - - event->eventfd =3D eventfd_ctx_fileget(efile.file); - if (IS_ERR(event->eventfd)) { - ret =3D PTR_ERR(event->eventfd); - goto out_put_efile; - } - - cfile =3D fdget(cfd); - if (!cfile.file) { - ret =3D -EBADF; - goto out_put_eventfd; - } - - /* the process need read permission on control file */ - /* AV: shouldn't we check that it's been opened for read instead? */ - ret =3D file_permission(cfile.file, MAY_READ); - if (ret < 0) - goto out_put_cfile; - - /* - * The control file must be a regular cgroup1 file. As a regular cgroup - * file can't be renamed, it's safe to access its name afterwards. - */ - cdentry =3D cfile.file->f_path.dentry; - if (cdentry->d_sb->s_type !=3D &cgroup_fs_type || !d_is_reg(cdentry)) { - ret =3D -EINVAL; - goto out_put_cfile; - } - - /* - * Determine the event callbacks and set them in @event. This used - * to be done via struct cftype but cgroup core no longer knows - * about these events. The following is crude but the whole thing - * is for compatibility anyway. - * - * DO NOT ADD NEW FILES. 
- */ - name =3D cdentry->d_name.name; - - if (!strcmp(name, "memory.usage_in_bytes")) { - event->register_event =3D mem_cgroup_usage_register_event; - event->unregister_event =3D mem_cgroup_usage_unregister_event; - } else if (!strcmp(name, "memory.oom_control")) { - event->register_event =3D mem_cgroup_oom_register_event; - event->unregister_event =3D mem_cgroup_oom_unregister_event; - } else if (!strcmp(name, "memory.pressure_level")) { - event->register_event =3D vmpressure_register_event; - event->unregister_event =3D vmpressure_unregister_event; - } else if (!strcmp(name, "memory.memsw.usage_in_bytes")) { - event->register_event =3D memsw_cgroup_usage_register_event; - event->unregister_event =3D memsw_cgroup_usage_unregister_event; - } else { - ret =3D -EINVAL; - goto out_put_cfile; - } - - /* - * Verify @cfile should belong to @css. Also, remaining events are - * automatically removed on cgroup destruction but the removal is - * asynchronous, so take an extra ref on @css. - */ - cfile_css =3D css_tryget_online_from_dir(cdentry->d_parent, - &memory_cgrp_subsys); - ret =3D -EINVAL; - if (IS_ERR(cfile_css)) - goto out_put_cfile; - if (cfile_css !=3D css) { - css_put(cfile_css); - goto out_put_cfile; - } - - ret =3D event->register_event(memcg, event->eventfd, buf); - if (ret) - goto out_put_css; - - vfs_poll(efile.file, &event->pt); - - spin_lock_irq(&memcg->event_list_lock); - list_add(&event->list, &memcg->event_list); - spin_unlock_irq(&memcg->event_list_lock); - - fdput(cfile); - fdput(efile); - - return nbytes; - -out_put_css: - css_put(css); -out_put_cfile: - fdput(cfile); -out_put_eventfd: - eventfd_ctx_put(event->eventfd); -out_put_efile: - fdput(efile); -out_kfree: - kfree(event); - - return ret; -} - #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) static int mem_cgroup_slab_show(struct seq_file *m, void *p) { @@ -5314,19 +4648,8 @@ static int mem_cgroup_css_online(struct cgroup_subsy= s_state *css) static void mem_cgroup_css_offline(struct cgroup_subsys_state *css) { struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); - struct mem_cgroup_event *event, *tmp; =20 - /* - * Unregister events and notify userspace. - * Notify userspace about cgroup removing only after rmdir of cgroup - * directory to avoid race between userspace and kernelspace. 
- */ - spin_lock_irq(&memcg->event_list_lock); - list_for_each_entry_safe(event, tmp, &memcg->event_list, list) { - list_del_init(&event->list); - schedule_work(&event->remove); - } - spin_unlock_irq(&memcg->event_list_lock); + memcg1_css_offline(memcg); =20 page_counter_set_min(&memcg->memory, 0); page_counter_set_low(&memcg->memory, 0); --=20 2.45.2
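The mechanism moved by this patch is driven from userspace through
cgroup.event_control: registration is a write of
"<event_fd> <control_fd> <args>", exactly the format parsed by
memcg_write_event_control() above. What follows is a minimal sketch of a
threshold watcher against this ABI, not part of the patch itself; the
mount point, the group name "mygroup" and the 64M threshold are
illustrative assumptions, and most error handling is omitted.

/* threshold watcher sketch; build: cc -o thresh thresh.c */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
	const char *cg = "/sys/fs/cgroup/memory/mygroup";  /* hypothetical group */
	char path[128], cmd[64];
	uint64_t hits;
	int efd, cfd, ecfd;

	efd = eventfd(0, 0);			/* notification channel */

	snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cg);
	cfd = open(path, O_RDONLY);		/* control file to watch */

	snprintf(path, sizeof(path), "%s/cgroup.event_control", cg);
	ecfd = open(path, O_WRONLY);

	/* "<event_fd> <control_fd> <args>"; args here is a threshold in bytes */
	snprintf(cmd, sizeof(cmd), "%d %d %llu", efd, cfd, 64ULL << 20);
	if (write(ecfd, cmd, strlen(cmd)) < 0) {
		perror("cgroup.event_control");
		return 1;
	}

	/* blocks until usage crosses the 64M threshold, in either direction */
	if (read(efd, &hits, sizeof(hits)) == sizeof(hits))
		printf("threshold crossed, %llu event(s)\n",
		       (unsigned long long)hits);
	return 0;
}

Once registered, __mem_cgroup_threshold() signals the eventfd each time
usage crosses the threshold in either direction, subject to the
MEM_CGROUP_TARGET_THRESH rate limiting applied in memcg_check_events().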
From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 07/14] mm: memcg: rename memcg_check_events() Date: Mon, 24 Jun 2024 17:58:59 -0700 Message-ID: <20240625005906.106920-8-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Rename memcg_check_events() into memcg1_check_events() for consistency with other cgroup v1-specific functions. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko Acked-by: Shakeel Butt Suggested-by: Matthew Wilcox (Oracle) --- mm/memcontrol-v1.c | 6 +++--- mm/memcontrol-v1.h | 2 +- mm/memcontrol.c | 8 ++++---- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 4b2290ceace6..d7b5c4c14732 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -835,9 +835,9 @@ static int mem_cgroup_move_account(struct folio *folio, =20 local_irq_disable(); mem_cgroup_charge_statistics(to, nr_pages); - memcg_check_events(to, nid); + memcg1_check_events(to, nid); mem_cgroup_charge_statistics(from, -nr_pages); - memcg_check_events(from, nid); + memcg1_check_events(from, nid); local_irq_enable(); out: return ret; @@ -1424,7 +1424,7 @@ static void mem_cgroup_threshold(struct mem_cgroup *m= emcg) * Check events in order. * */ -void memcg_check_events(struct mem_cgroup *memcg, int nid) +void memcg1_check_events(struct mem_cgroup *memcg, int nid) { if (IS_ENABLED(CONFIG_PREEMPT_RT)) return; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 524a2c76ffc9..ef1b7037cbdc 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -12,7 +12,7 @@ static inline void memcg1_soft_limit_reset(struct mem_cgr= oup *memcg) } =20 void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages); -void memcg_check_events(struct mem_cgroup *memcg, int nid); +void memcg1_check_events(struct mem_cgroup *memcg, int nid); void memcg_oom_recover(struct mem_cgroup *memcg); int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, unsigned int nr_pages); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index bd4b26a73596..92fb72bbd494 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2632,7 +2632,7 @@ void mem_cgroup_commit_charge(struct folio *folio, st= ruct mem_cgroup *memcg) =20 local_irq_disable(); mem_cgroup_charge_statistics(memcg, folio_nr_pages(folio)); - memcg_check_events(memcg, folio_nid(folio)); + memcg1_check_events(memcg, folio_nid(folio)); local_irq_enable(); } =20 @@ -5697,7 +5697,7 @@ static void uncharge_batch(const struct uncharge_gath= er *ug) local_irq_save(flags); __count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout); __this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory); - memcg_check_events(ug->memcg, ug->nid); + memcg1_check_events(ug->memcg, ug->nid); local_irq_restore(flags); =20 /* drop reference from uncharge_folio */ @@ -5836,7 +5836,7 @@ void mem_cgroup_replace_folio(struct folio *old, stru= ct folio *new) =20 local_irq_save(flags); mem_cgroup_charge_statistics(memcg, nr_pages); - memcg_check_events(memcg, folio_nid(new)); + memcg1_check_events(memcg, folio_nid(new)); local_irq_restore(flags); } 
=20 @@ -6104,7 +6104,7 @@ void mem_cgroup_swapout(struct folio *folio, swp_entr= y_t entry) memcg_stats_lock(); mem_cgroup_charge_statistics(memcg, -nr_entries); memcg_stats_unlock(); - memcg_check_events(memcg, folio_nid(folio)); + memcg1_check_events(memcg, folio_nid(folio)); =20 css_put(&memcg->css); } --=20 2.45.2
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org,
	linux-mm@kvack.org,
	Roman Gushchin <roman.gushchin@linux.dev>
Subject: [PATCH v2 08/14] mm: memcg: move cgroup v1 oom handling code into memcontrol-v1.c
Date: Mon, 24 Jun 2024 17:59:00 -0700
Message-ID: <20240625005906.106920-9-roman.gushchin@linux.dev>
In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev>
References: <20240625005906.106920-1-roman.gushchin@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Cgroup v1 supports a complicated mechanism for handling OOM in
userspace, which is not supported by cgroup v2. Let's move the
corresponding code into memcontrol-v1.c.

Aside from the mechanical code movement, this patch introduces two new
functions: memcg1_oom_prepare() and memcg1_oom_finish(). They implement
the cgroup v1-specific parts of the common memcg OOM handling path.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Suggested-by: Matthew Wilcox (Oracle)
---
 mm/memcontrol-v1.c | 229 ++++++++++++++++++++++++++++++++++++++++++++-
 mm/memcontrol-v1.h |   3 +-
 mm/memcontrol.c    | 216 +-----------------------------------
 3 files changed, 231 insertions(+), 217 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index d7b5c4c14732..253d49d5fb12 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -110,7 +110,13 @@ struct mem_cgroup_event { struct work_struct remove; }; =20 -extern spinlock_t memcg_oom_lock; +#ifdef CONFIG_LOCKDEP +static struct lockdep_map memcg_oom_lock_dep_map =3D { + .name =3D "memcg_oom_lock", +}; +#endif + +DEFINE_SPINLOCK(memcg_oom_lock); =20 static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz, struct mem_cgroup_tree_per_node *mctz, @@ -1469,7 +1475,7 @@ static int mem_cgroup_oom_notify_cb(struct mem_cgroup= *memcg) return 0; } =20 -void mem_cgroup_oom_notify(struct mem_cgroup *memcg) +static void mem_cgroup_oom_notify(struct mem_cgroup *memcg) { struct mem_cgroup *iter; =20 @@ -1959,6 +1965,225 @@ void memcg1_css_offline(struct mem_cgroup *memcg) spin_unlock_irq(&memcg->event_list_lock); } =20 +/* + * Check OOM-Killer is already running under our hierarchy. + * If someone is running, return false. + */ +static bool mem_cgroup_oom_trylock(struct mem_cgroup *memcg) +{ + struct mem_cgroup *iter, *failed =3D NULL; + + spin_lock(&memcg_oom_lock); + + for_each_mem_cgroup_tree(iter, memcg) { + if (iter->oom_lock) { + /* + * this subtree of our hierarchy is already locked + * so we cannot give a lock.
+ */ + failed =3D iter; + mem_cgroup_iter_break(memcg, iter); + break; + } else + iter->oom_lock =3D true; + } + + if (failed) { + /* + * OK, we failed to lock the whole subtree so we have + * to clean up what we set up to the failing subtree + */ + for_each_mem_cgroup_tree(iter, memcg) { + if (iter =3D=3D failed) { + mem_cgroup_iter_break(memcg, iter); + break; + } + iter->oom_lock =3D false; + } + } else + mutex_acquire(&memcg_oom_lock_dep_map, 0, 1, _RET_IP_); + + spin_unlock(&memcg_oom_lock); + + return !failed; +} + +static void mem_cgroup_oom_unlock(struct mem_cgroup *memcg) +{ + struct mem_cgroup *iter; + + spin_lock(&memcg_oom_lock); + mutex_release(&memcg_oom_lock_dep_map, _RET_IP_); + for_each_mem_cgroup_tree(iter, memcg) + iter->oom_lock =3D false; + spin_unlock(&memcg_oom_lock); +} + +static void mem_cgroup_mark_under_oom(struct mem_cgroup *memcg) +{ + struct mem_cgroup *iter; + + spin_lock(&memcg_oom_lock); + for_each_mem_cgroup_tree(iter, memcg) + iter->under_oom++; + spin_unlock(&memcg_oom_lock); +} + +static void mem_cgroup_unmark_under_oom(struct mem_cgroup *memcg) +{ + struct mem_cgroup *iter; + + /* + * Be careful about under_oom underflows because a child memcg + * could have been added after mem_cgroup_mark_under_oom. + */ + spin_lock(&memcg_oom_lock); + for_each_mem_cgroup_tree(iter, memcg) + if (iter->under_oom > 0) + iter->under_oom--; + spin_unlock(&memcg_oom_lock); +} + +static DECLARE_WAIT_QUEUE_HEAD(memcg_oom_waitq); + +struct oom_wait_info { + struct mem_cgroup *memcg; + wait_queue_entry_t wait; +}; + +static int memcg_oom_wake_function(wait_queue_entry_t *wait, + unsigned mode, int sync, void *arg) +{ + struct mem_cgroup *wake_memcg =3D (struct mem_cgroup *)arg; + struct mem_cgroup *oom_wait_memcg; + struct oom_wait_info *oom_wait_info; + + oom_wait_info =3D container_of(wait, struct oom_wait_info, wait); + oom_wait_memcg =3D oom_wait_info->memcg; + + if (!mem_cgroup_is_descendant(wake_memcg, oom_wait_memcg) && + !mem_cgroup_is_descendant(oom_wait_memcg, wake_memcg)) + return 0; + return autoremove_wake_function(wait, mode, sync, arg); +} + +void memcg_oom_recover(struct mem_cgroup *memcg) +{ + /* + * For the following lockless ->under_oom test, the only required + * guarantee is that it must see the state asserted by an OOM when + * this function is called as a result of userland actions + * triggered by the notification of the OOM. This is trivially + * achieved by invoking mem_cgroup_mark_under_oom() before + * triggering notification. + */ + if (memcg && memcg->under_oom) + __wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg); +} + +/** + * mem_cgroup_oom_synchronize - complete memcg OOM handling + * @handle: actually kill/wait or just clean up the OOM state + * + * This has to be called at the end of a page fault if the memcg OOM + * handler was enabled. + * + * Memcg supports userspace OOM handling where failed allocations must + * sleep on a waitqueue until the userspace task resolves the + * situation. Sleeping directly in the charge context with all kinds + * of locks held is not a good idea, instead we remember an OOM state + * in the task and mem_cgroup_oom_synchronize() has to be called at + * the end of the page fault to complete the OOM handling. + * + * Returns %true if an ongoing memcg OOM situation was detected and + * completed, %false otherwise. 
+ */ +bool mem_cgroup_oom_synchronize(bool handle) +{ + struct mem_cgroup *memcg =3D current->memcg_in_oom; + struct oom_wait_info owait; + bool locked; + + /* OOM is global, do not handle */ + if (!memcg) + return false; + + if (!handle) + goto cleanup; + + owait.memcg =3D memcg; + owait.wait.flags =3D 0; + owait.wait.func =3D memcg_oom_wake_function; + owait.wait.private =3D current; + INIT_LIST_HEAD(&owait.wait.entry); + + prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE); + mem_cgroup_mark_under_oom(memcg); + + locked =3D mem_cgroup_oom_trylock(memcg); + + if (locked) + mem_cgroup_oom_notify(memcg); + + schedule(); + mem_cgroup_unmark_under_oom(memcg); + finish_wait(&memcg_oom_waitq, &owait.wait); + + if (locked) + mem_cgroup_oom_unlock(memcg); +cleanup: + current->memcg_in_oom =3D NULL; + css_put(&memcg->css); + return true; +} + + +bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked) +{ + /* + * We are in the middle of the charge context here, so we + * don't want to block when potentially sitting on a callstack + * that holds all kinds of filesystem and mm locks. + * + * cgroup1 allows disabling the OOM killer and waiting for outside + * handling until the charge can succeed; remember the context and put + * the task to sleep at the end of the page fault when all locks are + * released. + * + * On the other hand, in-kernel OOM killer allows for an async victim + * memory reclaim (oom_reaper) and that means that we are not solely + * relying on the oom victim to make a forward progress and we can + * invoke the oom killer here. + * + * Please note that mem_cgroup_out_of_memory might fail to find a + * victim and then we have to bail out from the charge path. + */ + if (READ_ONCE(memcg->oom_kill_disable)) { + if (current->in_user_fault) { + css_get(&memcg->css); + current->memcg_in_oom =3D memcg; + } + return false; + } + + mem_cgroup_mark_under_oom(memcg); + + *locked =3D mem_cgroup_oom_trylock(memcg); + + if (*locked) + mem_cgroup_oom_notify(memcg); + + mem_cgroup_unmark_under_oom(memcg); + + return true; +} + +void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked) +{ + if (locked) + mem_cgroup_oom_unlock(memcg); +} + static int __init memcg1_init(void) { int node; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index ef1b7037cbdc..3de956b2422f 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -87,9 +87,10 @@ enum res_type { bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, enum mem_cgroup_events_target target); unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap); -void mem_cgroup_oom_notify(struct mem_cgroup *memcg); ssize_t memcg_write_event_control(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off); =20 +bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); +void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); =20 #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 92fb72bbd494..8abd364ac837 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1616,130 +1616,6 @@ static bool mem_cgroup_out_of_memory(struct mem_cgr= oup *memcg, gfp_t gfp_mask, return ret; } =20 -#ifdef CONFIG_LOCKDEP -static struct lockdep_map memcg_oom_lock_dep_map =3D { - .name =3D "memcg_oom_lock", -}; -#endif - -DEFINE_SPINLOCK(memcg_oom_lock); - -/* - * Check OOM-Killer is already running under our hierarchy. - * If someone is running, return false. 
- */ -static bool mem_cgroup_oom_trylock(struct mem_cgroup *memcg) -{ - struct mem_cgroup *iter, *failed =3D NULL; - - spin_lock(&memcg_oom_lock); - - for_each_mem_cgroup_tree(iter, memcg) { - if (iter->oom_lock) { - /* - * this subtree of our hierarchy is already locked - * so we cannot give a lock. - */ - failed =3D iter; - mem_cgroup_iter_break(memcg, iter); - break; - } else - iter->oom_lock =3D true; - } - - if (failed) { - /* - * OK, we failed to lock the whole subtree so we have - * to clean up what we set up to the failing subtree - */ - for_each_mem_cgroup_tree(iter, memcg) { - if (iter =3D=3D failed) { - mem_cgroup_iter_break(memcg, iter); - break; - } - iter->oom_lock =3D false; - } - } else - mutex_acquire(&memcg_oom_lock_dep_map, 0, 1, _RET_IP_); - - spin_unlock(&memcg_oom_lock); - - return !failed; -} - -static void mem_cgroup_oom_unlock(struct mem_cgroup *memcg) -{ - struct mem_cgroup *iter; - - spin_lock(&memcg_oom_lock); - mutex_release(&memcg_oom_lock_dep_map, _RET_IP_); - for_each_mem_cgroup_tree(iter, memcg) - iter->oom_lock =3D false; - spin_unlock(&memcg_oom_lock); -} - -static void mem_cgroup_mark_under_oom(struct mem_cgroup *memcg) -{ - struct mem_cgroup *iter; - - spin_lock(&memcg_oom_lock); - for_each_mem_cgroup_tree(iter, memcg) - iter->under_oom++; - spin_unlock(&memcg_oom_lock); -} - -static void mem_cgroup_unmark_under_oom(struct mem_cgroup *memcg) -{ - struct mem_cgroup *iter; - - /* - * Be careful about under_oom underflows because a child memcg - * could have been added after mem_cgroup_mark_under_oom. - */ - spin_lock(&memcg_oom_lock); - for_each_mem_cgroup_tree(iter, memcg) - if (iter->under_oom > 0) - iter->under_oom--; - spin_unlock(&memcg_oom_lock); -} - -static DECLARE_WAIT_QUEUE_HEAD(memcg_oom_waitq); - -struct oom_wait_info { - struct mem_cgroup *memcg; - wait_queue_entry_t wait; -}; - -static int memcg_oom_wake_function(wait_queue_entry_t *wait, - unsigned mode, int sync, void *arg) -{ - struct mem_cgroup *wake_memcg =3D (struct mem_cgroup *)arg; - struct mem_cgroup *oom_wait_memcg; - struct oom_wait_info *oom_wait_info; - - oom_wait_info =3D container_of(wait, struct oom_wait_info, wait); - oom_wait_memcg =3D oom_wait_info->memcg; - - if (!mem_cgroup_is_descendant(wake_memcg, oom_wait_memcg) && - !mem_cgroup_is_descendant(oom_wait_memcg, wake_memcg)) - return 0; - return autoremove_wake_function(wait, mode, sync, arg); -} - -void memcg_oom_recover(struct mem_cgroup *memcg) -{ - /* - * For the following lockless ->under_oom test, the only required - * guarantee is that it must see the state asserted by an OOM when - * this function is called as a result of userland actions - * triggered by the notification of the OOM. This is trivially - * achieved by invoking mem_cgroup_mark_under_oom() before - * triggering notification. - */ - if (memcg && memcg->under_oom) - __wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg); -} - /* * Returns true if successfully killed one or more processes. Though in so= me * corner cases it can return true even without killing any process. @@ -1753,104 +1629,16 @@ static bool mem_cgroup_oom(struct mem_cgroup *memc= g, gfp_t mask, int order) =20 memcg_memory_event(memcg, MEMCG_OOM); =20 - /* - * We are in the middle of the charge context here, so we - * don't want to block when potentially sitting on a callstack - * that holds all kinds of filesystem and mm locks. 
- * - * cgroup1 allows disabling the OOM killer and waiting for outside - * handling until the charge can succeed; remember the context and put - * the task to sleep at the end of the page fault when all locks are - * released. - * - * On the other hand, in-kernel OOM killer allows for an async victim - * memory reclaim (oom_reaper) and that means that we are not solely - * relying on the oom victim to make a forward progress and we can - * invoke the oom killer here. - * - * Please note that mem_cgroup_out_of_memory might fail to find a - * victim and then we have to bail out from the charge path. - */ - if (READ_ONCE(memcg->oom_kill_disable)) { - if (current->in_user_fault) { - css_get(&memcg->css); - current->memcg_in_oom =3D memcg; - } + if (!memcg1_oom_prepare(memcg, &locked)) return false; - } - - mem_cgroup_mark_under_oom(memcg); =20 - locked =3D mem_cgroup_oom_trylock(memcg); - - if (locked) - mem_cgroup_oom_notify(memcg); - - mem_cgroup_unmark_under_oom(memcg); ret =3D mem_cgroup_out_of_memory(memcg, mask, order); =20 - if (locked) - mem_cgroup_oom_unlock(memcg); + memcg1_oom_finish(memcg, locked); =20 return ret; } =20 -/** - * mem_cgroup_oom_synchronize - complete memcg OOM handling - * @handle: actually kill/wait or just clean up the OOM state - * - * This has to be called at the end of a page fault if the memcg OOM - * handler was enabled. - * - * Memcg supports userspace OOM handling where failed allocations must - * sleep on a waitqueue until the userspace task resolves the - * situation. Sleeping directly in the charge context with all kinds - * of locks held is not a good idea, instead we remember an OOM state - * in the task and mem_cgroup_oom_synchronize() has to be called at - * the end of the page fault to complete the OOM handling. - * - * Returns %true if an ongoing memcg OOM situation was detected and - * completed, %false otherwise. 
- */ -bool mem_cgroup_oom_synchronize(bool handle) -{ - struct mem_cgroup *memcg =3D current->memcg_in_oom; - struct oom_wait_info owait; - bool locked; - - /* OOM is global, do not handle */ - if (!memcg) - return false; - - if (!handle) - goto cleanup; - - owait.memcg =3D memcg; - owait.wait.flags =3D 0; - owait.wait.func =3D memcg_oom_wake_function; - owait.wait.private =3D current; - INIT_LIST_HEAD(&owait.wait.entry); - - prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE); - mem_cgroup_mark_under_oom(memcg); - - locked =3D mem_cgroup_oom_trylock(memcg); - - if (locked) - mem_cgroup_oom_notify(memcg); - - schedule(); - mem_cgroup_unmark_under_oom(memcg); - finish_wait(&memcg_oom_waitq, &owait.wait); - - if (locked) - mem_cgroup_oom_unlock(memcg); -cleanup: - current->memcg_in_oom =3D NULL; - css_put(&memcg->css); - return true; -} - /** * mem_cgroup_get_oom_group - get a memory cgroup to clean up after OOM * @victim: task to be killed by the OOM killer --=20 2.45.2 From nobody Sat Feb 7 20:06:57 2026 Received: from out-185.mta0.migadu.com (out-185.mta0.migadu.com [91.218.175.185]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 264C843AB3 for ; Tue, 25 Jun 2024 00:59:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.185 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719277188; cv=none; b=Xm53CESsnlrTqO1d4V4RZP/gpwCM9BU8xydvEIhFHWMTlkp+w4++A89s9bY9oCbLyvzZ+BfJ3IUQtHNSbXmUMLO1bMfpz2CE1COtIKZhv/ZnI8IQDZeNIfk+D/o47PzOe66P3lySyjoAnAUqEKjSJ1++QsGJgPKH5yPrHPVCKfw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719277188; c=relaxed/simple; bh=TeuSUGNM7krOR1ImWUbyFCeUYRReVvBXyseqinm1CaA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OngtRQEJueIPGQbSPuhDbET2BbVOYvAJPGoSJQqpxXovrr1S/MiRWxAMWVGj0lo4eYwrJ9zZWfKRffzjeoWDbk4PPUTxXdS1W/Et/ELuOX+yiPp+q6AFXvf6P0hOznc7wBi1Q2c/Il6QFPyWcGT+Fi2xSMgAEfBEOp3mH1z2FzQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=eV89MCas; arc=none smtp.client-ip=91.218.175.185 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="eV89MCas" X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277185; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fbzH39LFdgO7g4Zh8QO5WwcNo4wg+o7OIMv0JFj+P7c=; b=eV89MCast7EgJhdesN7qSvQtr85YMmOO0k2w6ijFUtydlNQ78AauHRfRhUjsPEeOfbC5Gx dtcK/V9nKa9SNmF/5FFvf0Xxi6O7J2yw96sE17lemuK/TV1CjppWoMQQb/mE6M7L50bNMj 8VHqifkS/V/BI3Pm3oEoZbviAukkS9Q= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev 
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 09/14] mm: memcg: rename memcg_oom_recover() Date: Mon, 24 Jun 2024 17:59:01 -0700 Message-ID: <20240625005906.106920-10-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Rename memcg_oom_recover() into memcg1_oom_recover() for consistency with other memory cgroup v1-related functions. Move the declaration in mm/memcontrol-v1.h to be nearby other memcg v1 oom handling functions. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko Acked-by: Shakeel Butt Suggested-by: Matthew Wilcox (Oracle) --- mm/memcontrol-v1.c | 6 +++--- mm/memcontrol-v1.h | 2 +- mm/memcontrol.c | 6 +++--- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 253d49d5fb12..1d5608ee1606 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -1090,8 +1090,8 @@ static void __mem_cgroup_clear_mc(void) =20 mc.moved_swap =3D 0; } - memcg_oom_recover(from); - memcg_oom_recover(to); + memcg1_oom_recover(from); + memcg1_oom_recover(to); wake_up_all(&mc.waitq); } =20 @@ -2067,7 +2067,7 @@ static int memcg_oom_wake_function(wait_queue_entry_t= *wait, return autoremove_wake_function(wait, mode, sync, arg); } =20 -void memcg_oom_recover(struct mem_cgroup *memcg) +void memcg1_oom_recover(struct mem_cgroup *memcg) { /* * For the following lockless ->under_oom test, the only required diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 3de956b2422f..972c493a8ae3 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -13,7 +13,6 @@ static inline void memcg1_soft_limit_reset(struct mem_cgr= oup *memcg) =20 void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages); void memcg1_check_events(struct mem_cgroup *memcg, int nid); -void memcg_oom_recover(struct mem_cgroup *memcg); int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, unsigned int nr_pages); =20 @@ -92,5 +91,6 @@ ssize_t memcg_write_event_control(struct kernfs_open_file= *of, =20 bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); +void memcg1_oom_recover(struct mem_cgroup *memcg); =20 #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8abd364ac837..37e0af5b26f3 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3167,7 +3167,7 @@ static int mem_cgroup_resize_max(struct mem_cgroup *m= emcg, } while (true); =20 if (!ret && enlarge) - memcg_oom_recover(memcg); + memcg1_oom_recover(memcg); =20 return ret; } @@ -3752,7 +3752,7 @@ static int mem_cgroup_oom_control_write(struct cgroup= _subsys_state *css, =20 WRITE_ONCE(memcg->oom_kill_disable, val); if (!val) - memcg_oom_recover(memcg); + memcg1_oom_recover(memcg); =20 return 0; } @@ -5479,7 +5479,7 @@ static void uncharge_batch(const struct uncharge_gath= er *ug) page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory); if (ug->nr_kmem) memcg_account_kmem(ug->memcg, -ug->nr_kmem); - 
memcg_oom_recover(ug->memcg); + memcg1_oom_recover(ug->memcg); } =20 local_irq_save(flags); --=20 2.45.2 From nobody Sat Feb 7 20:06:57 2026 Received: from out-177.mta0.migadu.com (out-177.mta0.migadu.com [91.218.175.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A5D3349623 for ; Tue, 25 Jun 2024 00:59:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719277193; cv=none; b=MgDMW0HWi8kjtavKTLNGFsa1iPfDMAKkeERxRMgyhqwcP9Ms6fqsJqCNMfA7wBMdDSxZwL+lJv6doFohPFaH3wPtD6F23Ac+QUtNbNKbrY6D6wlHkhSuzHxRiUnGfFiD0qCldDhsAsNF9p0hBjVJOB0WR7WQ6PIjuMniBN9tUdM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719277193; c=relaxed/simple; bh=WjwjXYumcVOpdrXi7YS+/E4hhBrfelVUrkqDwdixEXc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=szfMME5qyY+qcABSigDoe4Oj9zuxLXv7ZGFta+na9KCdPXL0Do8OpNPgw7Wnm1gmMWKKNMTjAizozLohzh1m26mBjTTdzBBAXv308iRLR099xqbvxKav3LFnIkiddURg3qiYNqdEUHAA+VPG1TE1YUmg6z7PpRWhNyYk+BbJHtw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=CS6G31DD; arc=none smtp.client-ip=91.218.175.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="CS6G31DD" X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277188; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GPeyaNOSxBqGrmhIQ0Yy2cBzUve/uQScgREmCFFE80A=; b=CS6G31DDySi/Xt0aJx00BATVEUW2DIQKSsJ0IKZnXL6+TmgPsBUi5GK+Ap2FWWoXcGDcAR Q4KTj0NOAajPbPkhkSn27NKhCZY26SsS3jEniRpROz7KSL7rrGiIDH8W+6CcJ4LUZoUIrC cpLmvnsleHo/2OgjosFJ77JwOftICeI= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 10/14] mm: memcg: move cgroup v1 interface files to memcontrol-v1.c Date: Mon, 24 Jun 2024 17:59:02 -0700 Message-ID: <20240625005906.106920-11-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Move legacy cgroup v1 memory controller interfaces and corresponding code into memcontrol-v1.c. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko Acked-by: Shakeel Butt Suggested-by: Matthew Wilcox (Oracle) --- mm/memcontrol-v1.c | 739 ++++++++++++++++++++++++++++++++++++++++++++- mm/memcontrol-v1.h | 29 +- mm/memcontrol.c | 721 +------------------------------------------ 3 files changed, 767 insertions(+), 722 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 1d5608ee1606..1b7337d0170d 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -10,6 +10,7 @@ #include #include #include +#include =20 #include "internal.h" #include "swap.h" @@ -110,6 +111,18 @@ struct mem_cgroup_event { struct work_struct remove; }; =20 +#define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val)) +#define MEMFILE_TYPE(val) ((val) >> 16 & 0xffff) +#define MEMFILE_ATTR(val) ((val) & 0xffff) + +enum { + RES_USAGE, + RES_LIMIT, + RES_MAX_USAGE, + RES_FAILCNT, + RES_SOFT_LIMIT, +}; + #ifdef CONFIG_LOCKDEP static struct lockdep_map memcg_oom_lock_dep_map =3D { .name =3D "memcg_oom_lock", @@ -577,14 +590,14 @@ static inline int mem_cgroup_move_swap_account(swp_en= try_t entry, } #endif =20 -u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, +static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, struct cftype *cft) { return mem_cgroup_from_css(css)->move_charge_at_immigrate; } =20 #ifdef CONFIG_MMU -int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, +static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, struct cftype *cft, u64 val) { struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); @@ -606,7 +619,7 @@ int mem_cgroup_move_charge_write(struct cgroup_subsys_s= tate *css, return 0; } #else -int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, +static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, struct cftype *cft, u64 val) { return -ENOSYS; @@ -1803,8 +1816,8 @@ static void memcg_event_ptable_queue_proc(struct file= *file, * Input must be in format ' '. * Interpretation of args is defined by control file implementation. 
*/ -ssize_t memcg_write_event_control(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off) +static ssize_t memcg_write_event_control(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) { struct cgroup_subsys_state *css =3D of_css(of); struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); @@ -2184,6 +2197,722 @@ void memcg1_oom_finish(struct mem_cgroup *memcg, bo= ol locked) mem_cgroup_oom_unlock(memcg); } =20 +static DEFINE_MUTEX(memcg_max_mutex); + +static int mem_cgroup_resize_max(struct mem_cgroup *memcg, + unsigned long max, bool memsw) +{ + bool enlarge =3D false; + bool drained =3D false; + int ret; + bool limits_invariant; + struct page_counter *counter =3D memsw ? &memcg->memsw : &memcg->memory; + + do { + if (signal_pending(current)) { + ret =3D -EINTR; + break; + } + + mutex_lock(&memcg_max_mutex); + /* + * Make sure that the new limit (memsw or memory limit) doesn't + * break our basic invariant rule memory.max <=3D memsw.max. + */ + limits_invariant =3D memsw ? max >=3D READ_ONCE(memcg->memory.max) : + max <=3D memcg->memsw.max; + if (!limits_invariant) { + mutex_unlock(&memcg_max_mutex); + ret =3D -EINVAL; + break; + } + if (max > counter->max) + enlarge =3D true; + ret =3D page_counter_set_max(counter, max); + mutex_unlock(&memcg_max_mutex); + + if (!ret) + break; + + if (!drained) { + drain_all_stock(memcg); + drained =3D true; + continue; + } + + if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, + memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, NULL)) { + ret =3D -EBUSY; + break; + } + } while (true); + + if (!ret && enlarge) + memcg1_oom_recover(memcg); + + return ret; +} + +/* + * Reclaims as many pages from the given memcg as possible. + * + * Caller is responsible for holding css reference for memcg. + */ +static int mem_cgroup_force_empty(struct mem_cgroup *memcg) +{ + int nr_retries =3D MAX_RECLAIM_RETRIES; + + /* we call try-to-free pages for make this cgroup empty */ + lru_add_drain_all(); + + drain_all_stock(memcg); + + /* try to free all pages in this cgroup */ + while (nr_retries && page_counter_read(&memcg->memory)) { + if (signal_pending(current)) + return -EINTR; + + if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, + MEMCG_RECLAIM_MAY_SWAP, NULL)) + nr_retries--; + } + + return 0; +} + +static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, + loff_t off) +{ + struct mem_cgroup *memcg =3D mem_cgroup_from_css(of_css(of)); + + if (mem_cgroup_is_root(memcg)) + return -EINVAL; + return mem_cgroup_force_empty(memcg) ?: nbytes; +} + +static u64 mem_cgroup_hierarchy_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + return 1; +} + +static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + if (val =3D=3D 1) + return 0; + + pr_warn_once("Non-hierarchical mode is deprecated. 
" + "Please report your usecase to linux-mm@kvack.org if you " + "depend on this functionality.\n"); + + return -EINVAL; +} + +static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); + struct page_counter *counter; + + switch (MEMFILE_TYPE(cft->private)) { + case _MEM: + counter =3D &memcg->memory; + break; + case _MEMSWAP: + counter =3D &memcg->memsw; + break; + case _KMEM: + counter =3D &memcg->kmem; + break; + case _TCP: + counter =3D &memcg->tcpmem; + break; + default: + BUG(); + } + + switch (MEMFILE_ATTR(cft->private)) { + case RES_USAGE: + if (counter =3D=3D &memcg->memory) + return (u64)mem_cgroup_usage(memcg, false) * PAGE_SIZE; + if (counter =3D=3D &memcg->memsw) + return (u64)mem_cgroup_usage(memcg, true) * PAGE_SIZE; + return (u64)page_counter_read(counter) * PAGE_SIZE; + case RES_LIMIT: + return (u64)counter->max * PAGE_SIZE; + case RES_MAX_USAGE: + return (u64)counter->watermark * PAGE_SIZE; + case RES_FAILCNT: + return counter->failcnt; + case RES_SOFT_LIMIT: + return (u64)READ_ONCE(memcg->soft_limit) * PAGE_SIZE; + default: + BUG(); + } +} + +/* + * This function doesn't do anything useful. Its only job is to provide a = read + * handler for a file so that cgroup_file_mode() will add read permissions. + */ +static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m, + __always_unused void *v) +{ + return -EINVAL; +} + +static int memcg_update_tcp_max(struct mem_cgroup *memcg, unsigned long ma= x) +{ + int ret; + + mutex_lock(&memcg_max_mutex); + + ret =3D page_counter_set_max(&memcg->tcpmem, max); + if (ret) + goto out; + + if (!memcg->tcpmem_active) { + /* + * The active flag needs to be written after the static_key + * update. This is what guarantees that the socket activation + * function is the last one to run. See mem_cgroup_sk_alloc() + * for details, and note that we don't mark any socket as + * belonging to this memcg until that flag is up. + * + * We need to do this, because static_keys will span multiple + * sites, but we can't control their order. If we mark a socket + * as accounted, but the accounting functions are not patched in + * yet, we'll lose accounting. + * + * We never race with the readers in mem_cgroup_sk_alloc(), + * because when this value change, the code to process it is not + * patched in yet. + */ + static_branch_inc(&memcg_sockets_enabled_key); + memcg->tcpmem_active =3D true; + } +out: + mutex_unlock(&memcg_max_mutex); + return ret; +} + +/* + * The user of this function is... + * RES_LIMIT. + */ +static ssize_t mem_cgroup_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct mem_cgroup *memcg =3D mem_cgroup_from_css(of_css(of)); + unsigned long nr_pages; + int ret; + + buf =3D strstrip(buf); + ret =3D page_counter_memparse(buf, "-1", &nr_pages); + if (ret) + return ret; + + switch (MEMFILE_ATTR(of_cft(of)->private)) { + case RES_LIMIT: + if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */ + ret =3D -EINVAL; + break; + } + switch (MEMFILE_TYPE(of_cft(of)->private)) { + case _MEM: + ret =3D mem_cgroup_resize_max(memcg, nr_pages, false); + break; + case _MEMSWAP: + ret =3D mem_cgroup_resize_max(memcg, nr_pages, true); + break; + case _KMEM: + pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. " + "Writing any value to this file has no effect. 
" + "Please report your usecase to linux-mm@kvack.org if you " + "depend on this functionality.\n"); + ret =3D 0; + break; + case _TCP: + ret =3D memcg_update_tcp_max(memcg, nr_pages); + break; + } + break; + case RES_SOFT_LIMIT: + if (IS_ENABLED(CONFIG_PREEMPT_RT)) { + ret =3D -EOPNOTSUPP; + } else { + WRITE_ONCE(memcg->soft_limit, nr_pages); + ret =3D 0; + } + break; + } + return ret ?: nbytes; +} + +static ssize_t mem_cgroup_reset(struct kernfs_open_file *of, char *buf, + size_t nbytes, loff_t off) +{ + struct mem_cgroup *memcg =3D mem_cgroup_from_css(of_css(of)); + struct page_counter *counter; + + switch (MEMFILE_TYPE(of_cft(of)->private)) { + case _MEM: + counter =3D &memcg->memory; + break; + case _MEMSWAP: + counter =3D &memcg->memsw; + break; + case _KMEM: + counter =3D &memcg->kmem; + break; + case _TCP: + counter =3D &memcg->tcpmem; + break; + default: + BUG(); + } + + switch (MEMFILE_ATTR(of_cft(of)->private)) { + case RES_MAX_USAGE: + page_counter_reset_watermark(counter); + break; + case RES_FAILCNT: + counter->failcnt =3D 0; + break; + default: + BUG(); + } + + return nbytes; +} + +#ifdef CONFIG_NUMA + +#define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE)) +#define LRU_ALL_ANON (BIT(LRU_INACTIVE_ANON) | BIT(LRU_ACTIVE_ANON)) +#define LRU_ALL ((1 << NR_LRU_LISTS) - 1) + +/* static unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *me= mcg, */ +/* int nid, unsigned int lru_mask, bool tree) */ +/* { */ +/* struct lruvec *lruvec =3D mem_cgroup_lruvec(memcg, NODE_DATA(nid)); */ +/* unsigned long nr =3D 0; */ +/* enum lru_list lru; */ + +/* VM_BUG_ON((unsigned)nid >=3D nr_node_ids); */ + +/* for_each_lru(lru) { */ +/* if (!(BIT(lru) & lru_mask)) */ +/* continue; */ +/* if (tree) */ +/* nr +=3D lruvec_page_state(lruvec, NR_LRU_BASE + lru); */ +/* else */ +/* nr +=3D lruvec_page_state_local(lruvec, NR_LRU_BASE + lru); */ +/* } */ +/* return nr; */ +/* } */ + +/* static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, = */ +/* unsigned int lru_mask, */ +/* bool tree) */ +/* { */ +/* unsigned long nr =3D 0; */ +/* enum lru_list lru; */ + +/* for_each_lru(lru) { */ +/* if (!(BIT(lru) & lru_mask)) */ +/* continue; */ +/* if (tree) */ +/* nr +=3D memcg_page_state(memcg, NR_LRU_BASE + lru); */ +/* else */ +/* nr +=3D memcg_page_state_local(memcg, NR_LRU_BASE + lru); */ +/* } */ +/* return nr; */ +/* } */ + +static int memcg_numa_stat_show(struct seq_file *m, void *v) +{ + struct numa_stat { + const char *name; + unsigned int lru_mask; + }; + + static const struct numa_stat stats[] =3D { + { "total", LRU_ALL }, + { "file", LRU_ALL_FILE }, + { "anon", LRU_ALL_ANON }, + { "unevictable", BIT(LRU_UNEVICTABLE) }, + }; + const struct numa_stat *stat; + int nid; + struct mem_cgroup *memcg =3D mem_cgroup_from_seq(m); + + mem_cgroup_flush_stats(memcg); + + for (stat =3D stats; stat < stats + ARRAY_SIZE(stats); stat++) { + seq_printf(m, "%s=3D%lu", stat->name, + mem_cgroup_nr_lru_pages(memcg, stat->lru_mask, + false)); + for_each_node_state(nid, N_MEMORY) + seq_printf(m, " N%d=3D%lu", nid, + mem_cgroup_node_nr_lru_pages(memcg, nid, + stat->lru_mask, false)); + seq_putc(m, '\n'); + } + + for (stat =3D stats; stat < stats + ARRAY_SIZE(stats); stat++) { + + seq_printf(m, "hierarchical_%s=3D%lu", stat->name, + mem_cgroup_nr_lru_pages(memcg, stat->lru_mask, + true)); + for_each_node_state(nid, N_MEMORY) + seq_printf(m, " N%d=3D%lu", nid, + mem_cgroup_node_nr_lru_pages(memcg, nid, + stat->lru_mask, true)); + seq_putc(m, '\n'); + } + + return 0; +} +#endif /* 
CONFIG_NUMA */ + +static const unsigned int memcg1_stats[] =3D { + NR_FILE_PAGES, + NR_ANON_MAPPED, +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + NR_ANON_THPS, +#endif + NR_SHMEM, + NR_FILE_MAPPED, + NR_FILE_DIRTY, + NR_WRITEBACK, + WORKINGSET_REFAULT_ANON, + WORKINGSET_REFAULT_FILE, +#ifdef CONFIG_SWAP + MEMCG_SWAP, + NR_SWAPCACHE, +#endif +}; + +static const char *const memcg1_stat_names[] =3D { + "cache", + "rss", +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + "rss_huge", +#endif + "shmem", + "mapped_file", + "dirty", + "writeback", + "workingset_refault_anon", + "workingset_refault_file", +#ifdef CONFIG_SWAP + "swap", + "swapcached", +#endif +}; + +/* Universal VM events cgroup1 shows, original sort order */ +static const unsigned int memcg1_events[] =3D { + PGPGIN, + PGPGOUT, + PGFAULT, + PGMAJFAULT, +}; + +void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) +{ + unsigned long memory, memsw; + struct mem_cgroup *mi; + unsigned int i; + + BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) !=3D ARRAY_SIZE(memcg1_stats)); + + mem_cgroup_flush_stats(memcg); + + for (i =3D 0; i < ARRAY_SIZE(memcg1_stats); i++) { + unsigned long nr; + + nr =3D memcg_page_state_local_output(memcg, memcg1_stats[i]); + seq_buf_printf(s, "%s %lu\n", memcg1_stat_names[i], nr); + } + + for (i =3D 0; i < ARRAY_SIZE(memcg1_events); i++) + seq_buf_printf(s, "%s %lu\n", vm_event_name(memcg1_events[i]), + memcg_events_local(memcg, memcg1_events[i])); + + for (i =3D 0; i < NR_LRU_LISTS; i++) + seq_buf_printf(s, "%s %lu\n", lru_list_name(i), + memcg_page_state_local(memcg, NR_LRU_BASE + i) * + PAGE_SIZE); + + /* Hierarchical information */ + memory =3D memsw =3D PAGE_COUNTER_MAX; + for (mi =3D memcg; mi; mi =3D parent_mem_cgroup(mi)) { + memory =3D min(memory, READ_ONCE(mi->memory.max)); + memsw =3D min(memsw, READ_ONCE(mi->memsw.max)); + } + seq_buf_printf(s, "hierarchical_memory_limit %llu\n", + (u64)memory * PAGE_SIZE); + seq_buf_printf(s, "hierarchical_memsw_limit %llu\n", + (u64)memsw * PAGE_SIZE); + + for (i =3D 0; i < ARRAY_SIZE(memcg1_stats); i++) { + unsigned long nr; + + nr =3D memcg_page_state_output(memcg, memcg1_stats[i]); + seq_buf_printf(s, "total_%s %llu\n", memcg1_stat_names[i], + (u64)nr); + } + + for (i =3D 0; i < ARRAY_SIZE(memcg1_events); i++) + seq_buf_printf(s, "total_%s %llu\n", + vm_event_name(memcg1_events[i]), + (u64)memcg_events(memcg, memcg1_events[i])); + + for (i =3D 0; i < NR_LRU_LISTS; i++) + seq_buf_printf(s, "total_%s %llu\n", lru_list_name(i), + (u64)memcg_page_state(memcg, NR_LRU_BASE + i) * + PAGE_SIZE); + +#ifdef CONFIG_DEBUG_VM + { + pg_data_t *pgdat; + struct mem_cgroup_per_node *mz; + unsigned long anon_cost =3D 0; + unsigned long file_cost =3D 0; + + for_each_online_pgdat(pgdat) { + mz =3D memcg->nodeinfo[pgdat->node_id]; + + anon_cost +=3D mz->lruvec.anon_cost; + file_cost +=3D mz->lruvec.file_cost; + } + seq_buf_printf(s, "anon_cost %lu\n", anon_cost); + seq_buf_printf(s, "file_cost %lu\n", file_cost); + } +#endif +} + +static u64 mem_cgroup_swappiness_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); + + return mem_cgroup_swappiness(memcg); +} + +static int mem_cgroup_swappiness_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); + + if (val > MAX_SWAPPINESS) + return -EINVAL; + + if (!mem_cgroup_is_root(memcg)) + WRITE_ONCE(memcg->swappiness, val); + else + WRITE_ONCE(vm_swappiness, val); + + return 0; +} + +static int 
mem_cgroup_oom_control_read(struct seq_file *sf, void *v) +{ + struct mem_cgroup *memcg =3D mem_cgroup_from_seq(sf); + + seq_printf(sf, "oom_kill_disable %d\n", READ_ONCE(memcg->oom_kill_disable= )); + seq_printf(sf, "under_oom %d\n", (bool)memcg->under_oom); + seq_printf(sf, "oom_kill %lu\n", + atomic_long_read(&memcg->memory_events[MEMCG_OOM_KILL])); + return 0; +} + +static int mem_cgroup_oom_control_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); + + /* cannot set to root cgroup and only 0 and 1 are allowed */ + if (mem_cgroup_is_root(memcg) || !((val =3D=3D 0) || (val =3D=3D 1))) + return -EINVAL; + + WRITE_ONCE(memcg->oom_kill_disable, val); + if (!val) + memcg1_oom_recover(memcg); + + return 0; +} + +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) +static int mem_cgroup_slab_show(struct seq_file *m, void *p) +{ + /* + * Deprecated. + * Please, take a look at tools/cgroup/memcg_slabinfo.py . + */ + return 0; +} +#endif + +struct cftype mem_cgroup_legacy_files[] =3D { + { + .name =3D "usage_in_bytes", + .private =3D MEMFILE_PRIVATE(_MEM, RES_USAGE), + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "max_usage_in_bytes", + .private =3D MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE), + .write =3D mem_cgroup_reset, + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "limit_in_bytes", + .private =3D MEMFILE_PRIVATE(_MEM, RES_LIMIT), + .write =3D mem_cgroup_write, + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "soft_limit_in_bytes", + .private =3D MEMFILE_PRIVATE(_MEM, RES_SOFT_LIMIT), + .write =3D mem_cgroup_write, + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "failcnt", + .private =3D MEMFILE_PRIVATE(_MEM, RES_FAILCNT), + .write =3D mem_cgroup_reset, + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "stat", + .seq_show =3D memory_stat_show, + }, + { + .name =3D "force_empty", + .write =3D mem_cgroup_force_empty_write, + }, + { + .name =3D "use_hierarchy", + .write_u64 =3D mem_cgroup_hierarchy_write, + .read_u64 =3D mem_cgroup_hierarchy_read, + }, + { + .name =3D "cgroup.event_control", /* XXX: for compat */ + .write =3D memcg_write_event_control, + .flags =3D CFTYPE_NO_PREFIX | CFTYPE_WORLD_WRITABLE, + }, + { + .name =3D "swappiness", + .read_u64 =3D mem_cgroup_swappiness_read, + .write_u64 =3D mem_cgroup_swappiness_write, + }, + { + .name =3D "move_charge_at_immigrate", + .read_u64 =3D mem_cgroup_move_charge_read, + .write_u64 =3D mem_cgroup_move_charge_write, + }, + { + .name =3D "oom_control", + .seq_show =3D mem_cgroup_oom_control_read, + .write_u64 =3D mem_cgroup_oom_control_write, + }, + { + .name =3D "pressure_level", + .seq_show =3D mem_cgroup_dummy_seq_show, + }, +#ifdef CONFIG_NUMA + { + .name =3D "numa_stat", + .seq_show =3D memcg_numa_stat_show, + }, +#endif + { + .name =3D "kmem.limit_in_bytes", + .private =3D MEMFILE_PRIVATE(_KMEM, RES_LIMIT), + .write =3D mem_cgroup_write, + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "kmem.usage_in_bytes", + .private =3D MEMFILE_PRIVATE(_KMEM, RES_USAGE), + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "kmem.failcnt", + .private =3D MEMFILE_PRIVATE(_KMEM, RES_FAILCNT), + .write =3D mem_cgroup_reset, + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "kmem.max_usage_in_bytes", + .private =3D MEMFILE_PRIVATE(_KMEM, RES_MAX_USAGE), + .write =3D mem_cgroup_reset, + .read_u64 =3D mem_cgroup_read_u64, + }, +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) + { + .name =3D 
"kmem.slabinfo", + .seq_show =3D mem_cgroup_slab_show, + }, +#endif + { + .name =3D "kmem.tcp.limit_in_bytes", + .private =3D MEMFILE_PRIVATE(_TCP, RES_LIMIT), + .write =3D mem_cgroup_write, + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "kmem.tcp.usage_in_bytes", + .private =3D MEMFILE_PRIVATE(_TCP, RES_USAGE), + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "kmem.tcp.failcnt", + .private =3D MEMFILE_PRIVATE(_TCP, RES_FAILCNT), + .write =3D mem_cgroup_reset, + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "kmem.tcp.max_usage_in_bytes", + .private =3D MEMFILE_PRIVATE(_TCP, RES_MAX_USAGE), + .write =3D mem_cgroup_reset, + .read_u64 =3D mem_cgroup_read_u64, + }, + { }, /* terminate */ +}; + +struct cftype memsw_files[] =3D { + { + .name =3D "memsw.usage_in_bytes", + .private =3D MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE), + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "memsw.max_usage_in_bytes", + .private =3D MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE), + .write =3D mem_cgroup_reset, + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "memsw.limit_in_bytes", + .private =3D MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT), + .write =3D mem_cgroup_write, + .read_u64 =3D mem_cgroup_read_u64, + }, + { + .name =3D "memsw.failcnt", + .private =3D MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT), + .write =3D mem_cgroup_reset, + .read_u64 =3D mem_cgroup_read_u64, + }, + { }, /* terminate */ +}; + static int __init memcg1_init(void) { int node; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 972c493a8ae3..7be4670d9abb 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -3,6 +3,8 @@ #ifndef __MM_MEMCONTROL_V1_H #define __MM_MEMCONTROL_V1_H =20 +#include + void memcg1_update_tree(struct mem_cgroup *memcg, int nid); void memcg1_remove_from_trees(struct mem_cgroup *memcg); =20 @@ -34,12 +36,6 @@ int memcg1_can_attach(struct cgroup_taskset *tset); void memcg1_cancel_attach(struct cgroup_taskset *tset); void memcg1_move_task(void); =20 -struct cftype; -u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, - struct cftype *cft); -int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val); - /* * Per memcg event counter is incremented at every pagein/pageout. With TH= P, * it will be incremented by the number of pages. 
This counter is used @@ -86,11 +82,28 @@ enum res_type { bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, enum mem_cgroup_events_target target); unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap); -ssize_t memcg_write_event_control(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off); =20 bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); void memcg1_oom_recover(struct mem_cgroup *memcg); =20 +void drain_all_stock(struct mem_cgroup *root_memcg); +unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, + unsigned int lru_mask, bool tree); +unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, + int nid, unsigned int lru_mask, + bool tree); + +unsigned long memcg_events(struct mem_cgroup *memcg, int event); +unsigned long memcg_events_local(struct mem_cgroup *memcg, int event); +unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx); +unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item); +unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int = item); +int memory_stat_show(struct seq_file *m, void *v); + +void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s); + +extern struct cftype memsw_files[]; +extern struct cftype mem_cgroup_legacy_files[]; + #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 37e0af5b26f3..c7341e811945 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -96,10 +96,6 @@ static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq); #define THRESHOLDS_EVENTS_TARGET 128 #define SOFTLIMIT_EVENTS_TARGET 1024 =20 -#define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val)) -#define MEMFILE_TYPE(val) ((val) >> 16 & 0xffff) -#define MEMFILE_ATTR(val) ((val) & 0xffff) - static inline bool task_is_dying(void) { return tsk_is_oom_victim(current) || fatal_signal_pending(current) || @@ -676,7 +672,7 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum m= emcg_stat_item idx, } =20 /* idx can be of type enum memcg_stat_item or node_stat_item. 
*/ -static unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int = idx) +unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx) { long x; int i =3D memcg_stats_index(idx); @@ -825,7 +821,7 @@ void __count_memcg_events(struct mem_cgroup *memcg, enu= m vm_event_item idx, memcg_stats_unlock(); } =20 -static unsigned long memcg_events(struct mem_cgroup *memcg, int event) +unsigned long memcg_events(struct mem_cgroup *memcg, int event) { int i =3D memcg_events_index(event); =20 @@ -835,7 +831,7 @@ static unsigned long memcg_events(struct mem_cgroup *me= mcg, int event) return READ_ONCE(memcg->vmstats->events[i]); } =20 -static unsigned long memcg_events_local(struct mem_cgroup *memcg, int even= t) +unsigned long memcg_events_local(struct mem_cgroup *memcg, int event) { int i =3D memcg_events_index(event); =20 @@ -1420,15 +1416,13 @@ static int memcg_page_state_output_unit(int item) } } =20 -static inline unsigned long memcg_page_state_output(struct mem_cgroup *mem= cg, - int item) +unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item) { return memcg_page_state(memcg, item) * memcg_page_state_output_unit(item); } =20 -static inline unsigned long memcg_page_state_local_output( - struct mem_cgroup *memcg, int item) +unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int = item) { return memcg_page_state_local(memcg, item) * memcg_page_state_output_unit(item); @@ -1487,8 +1481,6 @@ static void memcg_stat_format(struct mem_cgroup *memc= g, struct seq_buf *s) WARN_ON_ONCE(seq_buf_has_overflowed(s)); } =20 -static void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s= ); - static void memory_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) { if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) @@ -1861,7 +1853,7 @@ static void refill_stock(struct mem_cgroup *memcg, un= signed int nr_pages) * Drains all per-CPU charge caches for given root_memcg resp. subtree * of the hierarchy under it. */ -static void drain_all_stock(struct mem_cgroup *root_memcg) +void drain_all_stock(struct mem_cgroup *root_memcg) { int cpu, curcpu; =20 @@ -3115,120 +3107,6 @@ void split_page_memcg(struct page *head, int old_or= der, int new_order) css_get_many(&memcg->css, old_nr / new_nr - 1); } =20 - -static DEFINE_MUTEX(memcg_max_mutex); - -static int mem_cgroup_resize_max(struct mem_cgroup *memcg, - unsigned long max, bool memsw) -{ - bool enlarge =3D false; - bool drained =3D false; - int ret; - bool limits_invariant; - struct page_counter *counter =3D memsw ? &memcg->memsw : &memcg->memory; - - do { - if (signal_pending(current)) { - ret =3D -EINTR; - break; - } - - mutex_lock(&memcg_max_mutex); - /* - * Make sure that the new limit (memsw or memory limit) doesn't - * break our basic invariant rule memory.max <=3D memsw.max. - */ - limits_invariant =3D memsw ? max >=3D READ_ONCE(memcg->memory.max) : - max <=3D memcg->memsw.max; - if (!limits_invariant) { - mutex_unlock(&memcg_max_mutex); - ret =3D -EINVAL; - break; - } - if (max > counter->max) - enlarge =3D true; - ret =3D page_counter_set_max(counter, max); - mutex_unlock(&memcg_max_mutex); - - if (!ret) - break; - - if (!drained) { - drain_all_stock(memcg); - drained =3D true; - continue; - } - - if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, - memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, NULL)) { - ret =3D -EBUSY; - break; - } - } while (true); - - if (!ret && enlarge) - memcg1_oom_recover(memcg); - - return ret; -} - -/* - * Reclaims as many pages from the given memcg as possible. 
- * - * Caller is responsible for holding css reference for memcg. - */ -static int mem_cgroup_force_empty(struct mem_cgroup *memcg) -{ - int nr_retries =3D MAX_RECLAIM_RETRIES; - - /* we call try-to-free pages for make this cgroup empty */ - lru_add_drain_all(); - - drain_all_stock(memcg); - - /* try to free all pages in this cgroup */ - while (nr_retries && page_counter_read(&memcg->memory)) { - if (signal_pending(current)) - return -EINTR; - - if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, - MEMCG_RECLAIM_MAY_SWAP, NULL)) - nr_retries--; - } - - return 0; -} - -static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of, - char *buf, size_t nbytes, - loff_t off) -{ - struct mem_cgroup *memcg =3D mem_cgroup_from_css(of_css(of)); - - if (mem_cgroup_is_root(memcg)) - return -EINVAL; - return mem_cgroup_force_empty(memcg) ?: nbytes; -} - -static u64 mem_cgroup_hierarchy_read(struct cgroup_subsys_state *css, - struct cftype *cft) -{ - return 1; -} - -static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - if (val =3D=3D 1) - return 0; - - pr_warn_once("Non-hierarchical mode is deprecated. " - "Please report your usecase to linux-mm@kvack.org if you " - "depend on this functionality.\n"); - - return -EINVAL; -} - unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) { unsigned long val; @@ -3251,67 +3129,6 @@ unsigned long mem_cgroup_usage(struct mem_cgroup *me= mcg, bool swap) return val; } =20 -enum { - RES_USAGE, - RES_LIMIT, - RES_MAX_USAGE, - RES_FAILCNT, - RES_SOFT_LIMIT, -}; - -static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css, - struct cftype *cft) -{ - struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); - struct page_counter *counter; - - switch (MEMFILE_TYPE(cft->private)) { - case _MEM: - counter =3D &memcg->memory; - break; - case _MEMSWAP: - counter =3D &memcg->memsw; - break; - case _KMEM: - counter =3D &memcg->kmem; - break; - case _TCP: - counter =3D &memcg->tcpmem; - break; - default: - BUG(); - } - - switch (MEMFILE_ATTR(cft->private)) { - case RES_USAGE: - if (counter =3D=3D &memcg->memory) - return (u64)mem_cgroup_usage(memcg, false) * PAGE_SIZE; - if (counter =3D=3D &memcg->memsw) - return (u64)mem_cgroup_usage(memcg, true) * PAGE_SIZE; - return (u64)page_counter_read(counter) * PAGE_SIZE; - case RES_LIMIT: - return (u64)counter->max * PAGE_SIZE; - case RES_MAX_USAGE: - return (u64)counter->watermark * PAGE_SIZE; - case RES_FAILCNT: - return counter->failcnt; - case RES_SOFT_LIMIT: - return (u64)READ_ONCE(memcg->soft_limit) * PAGE_SIZE; - default: - BUG(); - } -} - -/* - * This function doesn't do anything useful. Its only job is to provide a = read - * handler for a file so that cgroup_file_mode() will add read permissions. - */ -static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m, - __always_unused void *v) -{ - return -EINVAL; -} - #ifdef CONFIG_MEMCG_KMEM static int memcg_online_kmem(struct mem_cgroup *memcg) { @@ -3373,139 +3190,9 @@ static void memcg_offline_kmem(struct mem_cgroup *m= emcg) } #endif /* CONFIG_MEMCG_KMEM */ =20 -static int memcg_update_tcp_max(struct mem_cgroup *memcg, unsigned long ma= x) -{ - int ret; - - mutex_lock(&memcg_max_mutex); - - ret =3D page_counter_set_max(&memcg->tcpmem, max); - if (ret) - goto out; - - if (!memcg->tcpmem_active) { - /* - * The active flag needs to be written after the static_key - * update. This is what guarantees that the socket activation - * function is the last one to run. 
See mem_cgroup_sk_alloc() - * for details, and note that we don't mark any socket as - * belonging to this memcg until that flag is up. - * - * We need to do this, because static_keys will span multiple - * sites, but we can't control their order. If we mark a socket - * as accounted, but the accounting functions are not patched in - * yet, we'll lose accounting. - * - * We never race with the readers in mem_cgroup_sk_alloc(), - * because when this value change, the code to process it is not - * patched in yet. - */ - static_branch_inc(&memcg_sockets_enabled_key); - memcg->tcpmem_active =3D true; - } -out: - mutex_unlock(&memcg_max_mutex); - return ret; -} - -/* - * The user of this function is... - * RES_LIMIT. - */ -static ssize_t mem_cgroup_write(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off) -{ - struct mem_cgroup *memcg =3D mem_cgroup_from_css(of_css(of)); - unsigned long nr_pages; - int ret; - - buf =3D strstrip(buf); - ret =3D page_counter_memparse(buf, "-1", &nr_pages); - if (ret) - return ret; - - switch (MEMFILE_ATTR(of_cft(of)->private)) { - case RES_LIMIT: - if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */ - ret =3D -EINVAL; - break; - } - switch (MEMFILE_TYPE(of_cft(of)->private)) { - case _MEM: - ret =3D mem_cgroup_resize_max(memcg, nr_pages, false); - break; - case _MEMSWAP: - ret =3D mem_cgroup_resize_max(memcg, nr_pages, true); - break; - case _KMEM: - pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. " - "Writing any value to this file has no effect. " - "Please report your usecase to linux-mm@kvack.org if you " - "depend on this functionality.\n"); - ret =3D 0; - break; - case _TCP: - ret =3D memcg_update_tcp_max(memcg, nr_pages); - break; - } - break; - case RES_SOFT_LIMIT: - if (IS_ENABLED(CONFIG_PREEMPT_RT)) { - ret =3D -EOPNOTSUPP; - } else { - WRITE_ONCE(memcg->soft_limit, nr_pages); - ret =3D 0; - } - break; - } - return ret ?: nbytes; -} - -static ssize_t mem_cgroup_reset(struct kernfs_open_file *of, char *buf, - size_t nbytes, loff_t off) -{ - struct mem_cgroup *memcg =3D mem_cgroup_from_css(of_css(of)); - struct page_counter *counter; - - switch (MEMFILE_TYPE(of_cft(of)->private)) { - case _MEM: - counter =3D &memcg->memory; - break; - case _MEMSWAP: - counter =3D &memcg->memsw; - break; - case _KMEM: - counter =3D &memcg->kmem; - break; - case _TCP: - counter =3D &memcg->tcpmem; - break; - default: - BUG(); - } - - switch (MEMFILE_ATTR(of_cft(of)->private)) { - case RES_MAX_USAGE: - page_counter_reset_watermark(counter); - break; - case RES_FAILCNT: - counter->failcnt =3D 0; - break; - default: - BUG(); - } - - return nbytes; -} - -#ifdef CONFIG_NUMA - -#define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE)) -#define LRU_ALL_ANON (BIT(LRU_INACTIVE_ANON) | BIT(LRU_ACTIVE_ANON)) -#define LRU_ALL ((1 << NR_LRU_LISTS) - 1) - -static unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, - int nid, unsigned int lru_mask, bool tree) +unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, + int nid, unsigned int lru_mask, + bool tree) { struct lruvec *lruvec =3D mem_cgroup_lruvec(memcg, NODE_DATA(nid)); unsigned long nr =3D 0; @@ -3524,9 +3211,8 @@ static unsigned long mem_cgroup_node_nr_lru_pages(str= uct mem_cgroup *memcg, return nr; } =20 -static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, - unsigned int lru_mask, - bool tree) +unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, + unsigned int lru_mask, bool tree) { unsigned long nr =3D 0; 
enum lru_list lru; @@ -3542,221 +3228,6 @@ static unsigned long mem_cgroup_nr_lru_pages(struct= mem_cgroup *memcg, return nr; } =20 -static int memcg_numa_stat_show(struct seq_file *m, void *v) -{ - struct numa_stat { - const char *name; - unsigned int lru_mask; - }; - - static const struct numa_stat stats[] =3D { - { "total", LRU_ALL }, - { "file", LRU_ALL_FILE }, - { "anon", LRU_ALL_ANON }, - { "unevictable", BIT(LRU_UNEVICTABLE) }, - }; - const struct numa_stat *stat; - int nid; - struct mem_cgroup *memcg =3D mem_cgroup_from_seq(m); - - mem_cgroup_flush_stats(memcg); - - for (stat =3D stats; stat < stats + ARRAY_SIZE(stats); stat++) { - seq_printf(m, "%s=3D%lu", stat->name, - mem_cgroup_nr_lru_pages(memcg, stat->lru_mask, - false)); - for_each_node_state(nid, N_MEMORY) - seq_printf(m, " N%d=3D%lu", nid, - mem_cgroup_node_nr_lru_pages(memcg, nid, - stat->lru_mask, false)); - seq_putc(m, '\n'); - } - - for (stat =3D stats; stat < stats + ARRAY_SIZE(stats); stat++) { - - seq_printf(m, "hierarchical_%s=3D%lu", stat->name, - mem_cgroup_nr_lru_pages(memcg, stat->lru_mask, - true)); - for_each_node_state(nid, N_MEMORY) - seq_printf(m, " N%d=3D%lu", nid, - mem_cgroup_node_nr_lru_pages(memcg, nid, - stat->lru_mask, true)); - seq_putc(m, '\n'); - } - - return 0; -} -#endif /* CONFIG_NUMA */ - -static const unsigned int memcg1_stats[] =3D { - NR_FILE_PAGES, - NR_ANON_MAPPED, -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - NR_ANON_THPS, -#endif - NR_SHMEM, - NR_FILE_MAPPED, - NR_FILE_DIRTY, - NR_WRITEBACK, - WORKINGSET_REFAULT_ANON, - WORKINGSET_REFAULT_FILE, -#ifdef CONFIG_SWAP - MEMCG_SWAP, - NR_SWAPCACHE, -#endif -}; - -static const char *const memcg1_stat_names[] =3D { - "cache", - "rss", -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - "rss_huge", -#endif - "shmem", - "mapped_file", - "dirty", - "writeback", - "workingset_refault_anon", - "workingset_refault_file", -#ifdef CONFIG_SWAP - "swap", - "swapcached", -#endif -}; - -/* Universal VM events cgroup1 shows, original sort order */ -static const unsigned int memcg1_events[] =3D { - PGPGIN, - PGPGOUT, - PGFAULT, - PGMAJFAULT, -}; - -static void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) -{ - unsigned long memory, memsw; - struct mem_cgroup *mi; - unsigned int i; - - BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) !=3D ARRAY_SIZE(memcg1_stats)); - - mem_cgroup_flush_stats(memcg); - - for (i =3D 0; i < ARRAY_SIZE(memcg1_stats); i++) { - unsigned long nr; - - nr =3D memcg_page_state_local_output(memcg, memcg1_stats[i]); - seq_buf_printf(s, "%s %lu\n", memcg1_stat_names[i], nr); - } - - for (i =3D 0; i < ARRAY_SIZE(memcg1_events); i++) - seq_buf_printf(s, "%s %lu\n", vm_event_name(memcg1_events[i]), - memcg_events_local(memcg, memcg1_events[i])); - - for (i =3D 0; i < NR_LRU_LISTS; i++) - seq_buf_printf(s, "%s %lu\n", lru_list_name(i), - memcg_page_state_local(memcg, NR_LRU_BASE + i) * - PAGE_SIZE); - - /* Hierarchical information */ - memory =3D memsw =3D PAGE_COUNTER_MAX; - for (mi =3D memcg; mi; mi =3D parent_mem_cgroup(mi)) { - memory =3D min(memory, READ_ONCE(mi->memory.max)); - memsw =3D min(memsw, READ_ONCE(mi->memsw.max)); - } - seq_buf_printf(s, "hierarchical_memory_limit %llu\n", - (u64)memory * PAGE_SIZE); - seq_buf_printf(s, "hierarchical_memsw_limit %llu\n", - (u64)memsw * PAGE_SIZE); - - for (i =3D 0; i < ARRAY_SIZE(memcg1_stats); i++) { - unsigned long nr; - - nr =3D memcg_page_state_output(memcg, memcg1_stats[i]); - seq_buf_printf(s, "total_%s %llu\n", memcg1_stat_names[i], - (u64)nr); - } - - for (i =3D 0; i < 
ARRAY_SIZE(memcg1_events); i++) - seq_buf_printf(s, "total_%s %llu\n", - vm_event_name(memcg1_events[i]), - (u64)memcg_events(memcg, memcg1_events[i])); - - for (i =3D 0; i < NR_LRU_LISTS; i++) - seq_buf_printf(s, "total_%s %llu\n", lru_list_name(i), - (u64)memcg_page_state(memcg, NR_LRU_BASE + i) * - PAGE_SIZE); - -#ifdef CONFIG_DEBUG_VM - { - pg_data_t *pgdat; - struct mem_cgroup_per_node *mz; - unsigned long anon_cost =3D 0; - unsigned long file_cost =3D 0; - - for_each_online_pgdat(pgdat) { - mz =3D memcg->nodeinfo[pgdat->node_id]; - - anon_cost +=3D mz->lruvec.anon_cost; - file_cost +=3D mz->lruvec.file_cost; - } - seq_buf_printf(s, "anon_cost %lu\n", anon_cost); - seq_buf_printf(s, "file_cost %lu\n", file_cost); - } -#endif -} - -static u64 mem_cgroup_swappiness_read(struct cgroup_subsys_state *css, - struct cftype *cft) -{ - struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); - - return mem_cgroup_swappiness(memcg); -} - -static int mem_cgroup_swappiness_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); - - if (val > MAX_SWAPPINESS) - return -EINVAL; - - if (!mem_cgroup_is_root(memcg)) - WRITE_ONCE(memcg->swappiness, val); - else - WRITE_ONCE(vm_swappiness, val); - - return 0; -} - -static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v) -{ - struct mem_cgroup *memcg =3D mem_cgroup_from_seq(sf); - - seq_printf(sf, "oom_kill_disable %d\n", READ_ONCE(memcg->oom_kill_disable= )); - seq_printf(sf, "under_oom %d\n", (bool)memcg->under_oom); - seq_printf(sf, "oom_kill %lu\n", - atomic_long_read(&memcg->memory_events[MEMCG_OOM_KILL])); - return 0; -} - -static int mem_cgroup_oom_control_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); - - /* cannot set to root cgroup and only 0 and 1 are allowed */ - if (mem_cgroup_is_root(memcg) || !((val =3D=3D 0) || (val =3D=3D 1))) - return -EINVAL; - - WRITE_ONCE(memcg->oom_kill_disable, val); - if (!val) - memcg1_oom_recover(memcg); - - return 0; -} - #ifdef CONFIG_CGROUP_WRITEBACK =20 #include @@ -3970,147 +3441,6 @@ static void memcg_wb_domain_size_changed(struct mem= _cgroup *memcg) =20 #endif /* CONFIG_CGROUP_WRITEBACK */ =20 -#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) -static int mem_cgroup_slab_show(struct seq_file *m, void *p) -{ - /* - * Deprecated. - * Please, take a look at tools/cgroup/memcg_slabinfo.py . 
- */ - return 0; -} -#endif - -static int memory_stat_show(struct seq_file *m, void *v); - -static struct cftype mem_cgroup_legacy_files[] =3D { - { - .name =3D "usage_in_bytes", - .private =3D MEMFILE_PRIVATE(_MEM, RES_USAGE), - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "max_usage_in_bytes", - .private =3D MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE), - .write =3D mem_cgroup_reset, - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "limit_in_bytes", - .private =3D MEMFILE_PRIVATE(_MEM, RES_LIMIT), - .write =3D mem_cgroup_write, - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "soft_limit_in_bytes", - .private =3D MEMFILE_PRIVATE(_MEM, RES_SOFT_LIMIT), - .write =3D mem_cgroup_write, - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "failcnt", - .private =3D MEMFILE_PRIVATE(_MEM, RES_FAILCNT), - .write =3D mem_cgroup_reset, - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "stat", - .seq_show =3D memory_stat_show, - }, - { - .name =3D "force_empty", - .write =3D mem_cgroup_force_empty_write, - }, - { - .name =3D "use_hierarchy", - .write_u64 =3D mem_cgroup_hierarchy_write, - .read_u64 =3D mem_cgroup_hierarchy_read, - }, - { - .name =3D "cgroup.event_control", /* XXX: for compat */ - .write =3D memcg_write_event_control, - .flags =3D CFTYPE_NO_PREFIX | CFTYPE_WORLD_WRITABLE, - }, - { - .name =3D "swappiness", - .read_u64 =3D mem_cgroup_swappiness_read, - .write_u64 =3D mem_cgroup_swappiness_write, - }, - { - .name =3D "move_charge_at_immigrate", - .read_u64 =3D mem_cgroup_move_charge_read, - .write_u64 =3D mem_cgroup_move_charge_write, - }, - { - .name =3D "oom_control", - .seq_show =3D mem_cgroup_oom_control_read, - .write_u64 =3D mem_cgroup_oom_control_write, - }, - { - .name =3D "pressure_level", - .seq_show =3D mem_cgroup_dummy_seq_show, - }, -#ifdef CONFIG_NUMA - { - .name =3D "numa_stat", - .seq_show =3D memcg_numa_stat_show, - }, -#endif - { - .name =3D "kmem.limit_in_bytes", - .private =3D MEMFILE_PRIVATE(_KMEM, RES_LIMIT), - .write =3D mem_cgroup_write, - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "kmem.usage_in_bytes", - .private =3D MEMFILE_PRIVATE(_KMEM, RES_USAGE), - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "kmem.failcnt", - .private =3D MEMFILE_PRIVATE(_KMEM, RES_FAILCNT), - .write =3D mem_cgroup_reset, - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "kmem.max_usage_in_bytes", - .private =3D MEMFILE_PRIVATE(_KMEM, RES_MAX_USAGE), - .write =3D mem_cgroup_reset, - .read_u64 =3D mem_cgroup_read_u64, - }, -#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) - { - .name =3D "kmem.slabinfo", - .seq_show =3D mem_cgroup_slab_show, - }, -#endif - { - .name =3D "kmem.tcp.limit_in_bytes", - .private =3D MEMFILE_PRIVATE(_TCP, RES_LIMIT), - .write =3D mem_cgroup_write, - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "kmem.tcp.usage_in_bytes", - .private =3D MEMFILE_PRIVATE(_TCP, RES_USAGE), - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "kmem.tcp.failcnt", - .private =3D MEMFILE_PRIVATE(_TCP, RES_FAILCNT), - .write =3D mem_cgroup_reset, - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "kmem.tcp.max_usage_in_bytes", - .private =3D MEMFILE_PRIVATE(_TCP, RES_MAX_USAGE), - .write =3D mem_cgroup_reset, - .read_u64 =3D mem_cgroup_read_u64, - }, - { }, /* terminate */ -}; - /* * Private memory cgroup IDR * @@ -4902,7 +4232,7 @@ static int memory_events_local_show(struct seq_file *= m, void *v) return 0; } =20 -static int memory_stat_show(struct seq_file *m, void *v) +int 
memory_stat_show(struct seq_file *m, void *v) { struct mem_cgroup *memcg =3D mem_cgroup_from_seq(m); char *buf =3D kmalloc(PAGE_SIZE, GFP_KERNEL); @@ -6133,33 +5463,6 @@ static struct cftype swap_files[] =3D { { } /* terminate */ }; =20 -static struct cftype memsw_files[] =3D { - { - .name =3D "memsw.usage_in_bytes", - .private =3D MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE), - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "memsw.max_usage_in_bytes", - .private =3D MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE), - .write =3D mem_cgroup_reset, - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "memsw.limit_in_bytes", - .private =3D MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT), - .write =3D mem_cgroup_write, - .read_u64 =3D mem_cgroup_read_u64, - }, - { - .name =3D "memsw.failcnt", - .private =3D MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT), - .write =3D mem_cgroup_reset, - .read_u64 =3D mem_cgroup_read_u64, - }, - { }, /* terminate */ -}; - #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) /** * obj_cgroup_may_zswap - check if this cgroup can zswap --=20 2.45.2 From nobody Sat Feb 7 20:06:57 2026 Received: from out-189.mta0.migadu.com (out-189.mta0.migadu.com [91.218.175.189]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BBF284D112 for ; Tue, 25 Jun 2024 00:59:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.189 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719277193; cv=none; b=cpIDSfl9i+qz7TknWHm+j332eXBt3sedTQmZmNndHJYy9TQFbNyXX4lRFPCPyrn5TwmUtxZcpkrGaSe4jcXMsO9e0pdCE3m8CLVwH+8LvCDvyJwptkzdXrU4cE+qT2BxwHNGlZHCwJ/eaOEDMl8My4WolIOdHJA1J9FEpX2ACqI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719277193; c=relaxed/simple; bh=Pz7OL3KQbrtV83W/iSFgHROsNnyda+pJCYcSrpY3Zt8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=qJDwIJkU4DJuwn8Cqpw83vTxdbVTuZ2Moqilesnn9R8F1kclBtSi5zgogBe2MuvF/rHp8WW6nImX67IwoCJoO6wPfRh2yM+3ZqM2wh035dxE33j0ISn9PVUa+S/1hFMw6HNksUI0KZZzw3765zxMyc4puwF4eaUnmsI8/TSziDo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=PlGf6lnG; arc=none smtp.client-ip=91.218.175.189 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="PlGf6lnG" X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277190; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=i6OtryeQEKVoJSpeL/4FN+6EUWrG1/pihifLngpHrqA=; b=PlGf6lnGIQXNmfnq2Lb4Tkf4shhTkmXLgelP2ND3Ym1XBUrQgwU2RRuocobaKhgUDIawY9 y/qwRqVY0IWNNQJ78VvWbqijJuUQHx0Fx3ff0YC5Hnm0PZzzLh8G6Fp4u+r+ci0iSk4sBa ygiGKYxBilotTIeiKryEUQaV54ffcHg= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org 
From nobody Sat Feb 7 20:06:57 2026
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>,
    Shakeel Butt <shakeel.butt@linux.dev>, Muchun Song <muchun.song@linux.dev>,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
    Roman Gushchin <roman.gushchin@linux.dev>
Subject: [PATCH v2 11/14] mm: memcg: make memcg1_update_tree() static
Date: Mon, 24 Jun 2024 17:59:03 -0700
Message-ID: <20240625005906.106920-12-roman.gushchin@linux.dev>
In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev>
References: <20240625005906.106920-1-roman.gushchin@linux.dev>

memcg1_update_tree() is no longer used outside of mm/memcontrol-v1.c,
so make it static and remove its declaration from the header file.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Suggested-by: Matthew Wilcox (Oracle)
---
 mm/memcontrol-v1.c | 2 +-
 mm/memcontrol-v1.h | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 1b7337d0170d..f89de413004b 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -201,7 +201,7 @@ static unsigned long soft_limit_excess(struct mem_cgroup *memcg)
 	return excess;
 }
 
-void memcg1_update_tree(struct mem_cgroup *memcg, int nid)
+static void memcg1_update_tree(struct mem_cgroup *memcg, int nid)
 {
 	unsigned long excess;
 	struct mem_cgroup_per_node *mz;
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 7be4670d9abb..7d6ac4a4fb36 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -5,7 +5,6 @@
 
 #include
 
-void memcg1_update_tree(struct mem_cgroup *memcg, int nid);
 void memcg1_remove_from_trees(struct mem_cgroup *memcg);
 
 static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg)
-- 
2.45.2
From nobody Sat Feb 7 20:06:57 2026
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>,
    Shakeel Butt <shakeel.butt@linux.dev>, Muchun Song <muchun.song@linux.dev>,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
    Roman Gushchin <roman.gushchin@linux.dev>
Subject: [PATCH v2 12/14] mm: memcg: group cgroup v1 memcg related declarations
Date: Mon, 24 Jun 2024 17:59:04 -0700
Message-ID: <20240625005906.106920-13-roman.gushchin@linux.dev>
In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev>
References: <20240625005906.106920-1-roman.gushchin@linux.dev>

Group all cgroup v1-related declarations at the end of memcontrol.h and
mm/memcontrol-v1.h, with the intention of later putting them all under a
config option. This should also make the code easier to follow and maintain.
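To illustrate where this is heading, the intended end state looks roughly like the following sketch (the option is introduced as CONFIG_MEMCG_V1 in patch 13 of this series; only the first declaration and its stub are shown):

    /* Cgroup v1-related declarations */

    #ifdef CONFIG_MEMCG_V1
    unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
                                            gfp_t gfp_mask,
                                            unsigned long *total_scanned);
    /* ... more v1-only declarations ... */
    #else /* CONFIG_MEMCG_V1 */
    static inline
    unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
                                            gfp_t gfp_mask,
                                            unsigned long *total_scanned)
    {
            return 0;	/* no v1 soft limits: nothing to reclaim */
    }
    /* ... matching no-op stubs ... */
    #endif /* CONFIG_MEMCG_V1 */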
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Suggested-by: Matthew Wilcox (Oracle)
---
 include/linux/memcontrol.h | 144 +++++++++++++++++++------------------
 mm/memcontrol-v1.h         |  89 ++++++++++++-----------
 2 files changed, 123 insertions(+), 110 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 588179d29849..a70d64ed04f5 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -950,39 +950,13 @@ static inline void mem_cgroup_exit_user_fault(void)
 	current->in_user_fault = 0;
 }
 
-static inline bool task_in_memcg_oom(struct task_struct *p)
-{
-	return p->memcg_in_oom;
-}
-
-bool mem_cgroup_oom_synchronize(bool wait);
 struct mem_cgroup *mem_cgroup_get_oom_group(struct task_struct *victim,
 					    struct mem_cgroup *oom_domain);
 void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
 
-void folio_memcg_lock(struct folio *folio);
-void folio_memcg_unlock(struct folio *folio);
-
 void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 		       int val);
 
-/* try to stablize folio_memcg() for all the pages in a memcg */
-static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
-{
-	rcu_read_lock();
-
-	if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account))
-		return true;
-
-	rcu_read_unlock();
-	return false;
-}
-
-static inline void mem_cgroup_unlock_pages(void)
-{
-	rcu_read_unlock();
-}
-
 /* idx can be of type enum memcg_stat_item or node_stat_item */
 static inline void mod_memcg_state(struct mem_cgroup *memcg,
 				   enum memcg_stat_item idx, int val)
@@ -1109,10 +1083,6 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
 
 void split_page_memcg(struct page *head, int old_order, int new_order);
 
-unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
-					gfp_t gfp_mask,
-					unsigned long *total_scanned);
-
 #else /* CONFIG_MEMCG */
 
 #define MEM_CGROUP_ID_SHIFT	0
@@ -1423,26 +1393,6 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 {
 }
 
-static inline void folio_memcg_lock(struct folio *folio)
-{
-}
-
-static inline void folio_memcg_unlock(struct folio *folio)
-{
-}
-
-static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
-{
-	/* to match folio_memcg_rcu() */
-	rcu_read_lock();
-	return true;
-}
-
-static inline void mem_cgroup_unlock_pages(void)
-{
-	rcu_read_unlock();
-}
-
 static inline void mem_cgroup_handle_over_high(gfp_t gfp_mask)
 {
 }
@@ -1455,16 +1405,6 @@ static inline void mem_cgroup_exit_user_fault(void)
 {
 }
 
-static inline bool task_in_memcg_oom(struct task_struct *p)
-{
-	return false;
-}
-
-static inline bool mem_cgroup_oom_synchronize(bool wait)
-{
-	return false;
-}
-
 static inline struct mem_cgroup *mem_cgroup_get_oom_group(
 	struct task_struct *victim, struct mem_cgroup *oom_domain)
 {
@@ -1558,14 +1498,6 @@ void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx)
 static inline void split_page_memcg(struct page *head, int old_order, int new_order)
 {
 }
-
-static inline
-unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
-					gfp_t gfp_mask,
-					unsigned long *total_scanned)
-{
-	return 0;
-}
 #endif /* CONFIG_MEMCG */
 
 /*
@@ -1916,4 +1848,80 @@ static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
 }
 #endif
 
+
+/* Cgroup v1-related declarations */
+
+#ifdef CONFIG_MEMCG
+unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
+					gfp_t gfp_mask,
+					unsigned long *total_scanned);
+
+bool mem_cgroup_oom_synchronize(bool wait);
+
+static inline bool task_in_memcg_oom(struct task_struct *p)
+{
+	return p->memcg_in_oom;
+}
+
+void folio_memcg_lock(struct folio *folio);
+void folio_memcg_unlock(struct folio *folio);
+
+/* try to stablize folio_memcg() for all the pages in a memcg */
+static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
+{
+	rcu_read_lock();
+
+	if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account))
+		return true;
+
+	rcu_read_unlock();
+	return false;
+}
+
+static inline void mem_cgroup_unlock_pages(void)
+{
+	rcu_read_unlock();
+}
+
+#else /* CONFIG_MEMCG */
+static inline
+unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
+					gfp_t gfp_mask,
+					unsigned long *total_scanned)
+{
+	return 0;
+}
+
+static inline void folio_memcg_lock(struct folio *folio)
+{
+}
+
+static inline void folio_memcg_unlock(struct folio *folio)
+{
+}
+
+static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
+{
+	/* to match folio_memcg_rcu() */
+	rcu_read_lock();
+	return true;
+}
+
+static inline void mem_cgroup_unlock_pages(void)
+{
+	rcu_read_unlock();
+}
+
+static inline bool task_in_memcg_oom(struct task_struct *p)
+{
+	return false;
+}
+
+static inline bool mem_cgroup_oom_synchronize(bool wait)
+{
+	return false;
+}
+
+#endif /* CONFIG_MEMCG */
+
 #endif /* _LINUX_MEMCONTROL_H */
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 7d6ac4a4fb36..89d420793048 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -5,15 +5,9 @@
 
 #include
 
-void memcg1_remove_from_trees(struct mem_cgroup *memcg);
-
-static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg)
-{
-	WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX);
-}
+/* Cgroup v1 and v2 common declarations */
 
 void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages);
-void memcg1_check_events(struct mem_cgroup *memcg, int nid);
 int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		     unsigned int nr_pages);
 
@@ -29,30 +23,6 @@ static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n);
 void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n);
 
-bool memcg1_wait_acct_move(struct mem_cgroup *memcg);
-struct cgroup_taskset;
-int memcg1_can_attach(struct cgroup_taskset *tset);
-void memcg1_cancel_attach(struct cgroup_taskset *tset);
-void memcg1_move_task(void);
-
-/*
- * Per memcg event counter is incremented at every pagein/pageout. With THP,
- * it will be incremented by the number of pages. This counter is used
- * to trigger some periodic events. This is straightforward and better
- * than using jiffies etc. to handle periodic memcg event.
- */
-enum mem_cgroup_events_target {
-	MEM_CGROUP_TARGET_THRESH,
-	MEM_CGROUP_TARGET_SOFTLIMIT,
-	MEM_CGROUP_NTARGETS,
-};
-
-/* Whether legacy memory+swap accounting is active */
-static bool do_memsw_account(void)
-{
-	return !cgroup_subsys_on_dfl(memory_cgrp_subsys);
-}
-
 /*
  * Iteration constructs for visiting all cgroups (under a tree). If
If * loops are exited prematurely (break), mem_cgroup_iter_break() must @@ -68,24 +38,28 @@ static bool do_memsw_account(void) iter !=3D NULL; \ iter =3D mem_cgroup_iter(NULL, iter, NULL)) =20 -void memcg1_css_offline(struct mem_cgroup *memcg); +/* Whether legacy memory+swap accounting is active */ +static bool do_memsw_account(void) +{ + return !cgroup_subsys_on_dfl(memory_cgrp_subsys); +} =20 -/* for encoding cft->private value on file */ -enum res_type { - _MEM, - _MEMSWAP, - _KMEM, - _TCP, +/* + * Per memcg event counter is incremented at every pagein/pageout. With TH= P, + * it will be incremented by the number of pages. This counter is used + * to trigger some periodic events. This is straightforward and better + * than using jiffies etc. to handle periodic memcg event. + */ +enum mem_cgroup_events_target { + MEM_CGROUP_TARGET_THRESH, + MEM_CGROUP_TARGET_SOFTLIMIT, + MEM_CGROUP_NTARGETS, }; =20 bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, enum mem_cgroup_events_target target); unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap); =20 -bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); -void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); -void memcg1_oom_recover(struct mem_cgroup *memcg); - void drain_all_stock(struct mem_cgroup *root_memcg); unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, unsigned int lru_mask, bool tree); @@ -100,6 +74,37 @@ unsigned long memcg_page_state_output(struct mem_cgroup= *memcg, int item); unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int = item); int memory_stat_show(struct seq_file *m, void *v); =20 +/* Cgroup v1-specific declarations */ + +void memcg1_remove_from_trees(struct mem_cgroup *memcg); + +static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) +{ + WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); +} + +bool memcg1_wait_acct_move(struct mem_cgroup *memcg); + +struct cgroup_taskset; +int memcg1_can_attach(struct cgroup_taskset *tset); +void memcg1_cancel_attach(struct cgroup_taskset *tset); +void memcg1_move_task(void); +void memcg1_css_offline(struct mem_cgroup *memcg); + +/* for encoding cft->private value on file */ +enum res_type { + _MEM, + _MEMSWAP, + _KMEM, + _TCP, +}; + +bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); +void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); +void memcg1_oom_recover(struct mem_cgroup *memcg); + +void memcg1_check_events(struct mem_cgroup *memcg, int nid); + void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s); =20 extern struct cftype memsw_files[]; --=20 2.45.2 From nobody Sat Feb 7 20:06:57 2026 Received: from out-181.mta0.migadu.com (out-181.mta0.migadu.com [91.218.175.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E82B361FE5 for ; Tue, 25 Jun 2024 00:59:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719277198; cv=none; b=MQat1TtTqnrM97We6ViDK3GvYFglyhZXLJdyt1Ll5x4UhbTEhvkUtUMe3LUb2HyWxehmondSXLx3UI9tFFc9t6yrPh3e13q2uq52BnF8l4LUz8LV6WOguvztVlkonZ2SgwcNHdTe4vJ74moSVIHdoXuhezDiwqv3aLGGcQG5loI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719277198; c=relaxed/simple; bh=DXbBSXMMQBlOnmcmQav3CtTvcCGhGgqyFQgW+B/FlMw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: 
From nobody Sat Feb 7 20:06:57 2026
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>,
    Shakeel Butt <shakeel.butt@linux.dev>, Muchun Song <muchun.song@linux.dev>,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
    Roman Gushchin <roman.gushchin@linux.dev>
Subject: [PATCH v2 13/14] mm: memcg: put cgroup v1-related members of task_struct under config option
Date: Mon, 24 Jun 2024 17:59:05 -0700
Message-ID: <20240625005906.106920-14-roman.gushchin@linux.dev>
In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev>
References: <20240625005906.106920-1-roman.gushchin@linux.dev>

Guard the cgroup v1-related members of task_struct behind the new
CONFIG_MEMCG_V1 config option, so that users who have adopted cgroup v2
don't waste memory on fields that are never accessed.
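The task_struct side of the change follows the same guarding pattern as the header reshuffle (an illustrative sketch only, using the memcg_in_oom field read by task_in_memcg_oom() in patch 12; the exact hunk may differ):

    #ifdef CONFIG_MEMCG_V1
            /* v1-only memcg OOM state, see task_in_memcg_oom(): */
            struct mem_cgroup               *memcg_in_oom;
    #endif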
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Suggested-by: Matthew Wilcox (Oracle)
---
 include/linux/memcontrol.h |  6 +++---
 init/Kconfig               |  9 +++++++++
 mm/Makefile                |  3 ++-
 mm/memcontrol-v1.h         | 21 ++++++++++++++++++++-
 mm/memcontrol.c            | 10 +++++++---
 5 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a70d64ed04f5..796cfa842346 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1851,7 +1851,7 @@ static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
 
 /* Cgroup v1-related declarations */
 
-#ifdef CONFIG_MEMCG
+#ifdef CONFIG_MEMCG_V1
 unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
 					gfp_t gfp_mask,
 					unsigned long *total_scanned);
@@ -1883,7 +1883,7 @@ static inline void mem_cgroup_unlock_pages(void)
 	rcu_read_unlock();
 }
 
-#else /* CONFIG_MEMCG */
+#else /* CONFIG_MEMCG_V1 */
 static inline
 unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
 					gfp_t gfp_mask,
@@ -1922,6 +1922,6 @@ static inline bool mem_cgroup_oom_synchronize(bool wait)
 	return false;
 }
 
-#endif /* CONFIG_MEMCG */
+#endif /* CONFIG_MEMCG_V1 */
 
 #endif /* _LINUX_MEMCONTROL_H */
diff --git a/init/Kconfig b/init/Kconfig
index febdea2afc3b..5191b6435b4e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -969,6 +969,15 @@ config MEMCG
 	help
 	  Provides control over the memory footprint of tasks in a cgroup.
 
+config MEMCG_V1
+	bool "Legacy memory controller"
+	depends on MEMCG
+	default n
+	help
+	  Legacy cgroup v1 memory controller.
+
+	  Say N if unsure.
+
 config MEMCG_KMEM
 	bool
 	depends on MEMCG
diff --git a/mm/Makefile b/mm/Makefile
index 124d4dea2035..d2915f8c9dc0 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -96,7 +96,8 @@ obj-$(CONFIG_NUMA) += memory-tiers.o
 obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
 obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
-obj-$(CONFIG_MEMCG) += memcontrol.o memcontrol-v1.o vmpressure.o
+obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o
+obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
 ifdef CONFIG_SWAP
 obj-$(CONFIG_MEMCG) += swap_cgroup.o
 endif
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 89d420793048..64b053d7f131 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -75,7 +75,7 @@ unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item);
 int memory_stat_show(struct seq_file *m, void *v);
 
 /* Cgroup v1-specific declarations */
-
+#ifdef CONFIG_MEMCG_V1
 void memcg1_remove_from_trees(struct mem_cgroup *memcg);
 
 static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg)
@@ -110,4 +110,23 @@ void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s);
 
 extern struct cftype memsw_files[];
 extern struct cftype mem_cgroup_legacy_files[];
 
+#else /* CONFIG_MEMCG_V1 */
+
+static inline void memcg1_remove_from_trees(struct mem_cgroup *memcg) {}
+static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) {}
+static inline bool memcg1_wait_acct_move(struct mem_cgroup *memcg) { return false; }
+static inline void memcg1_css_offline(struct mem_cgroup *memcg) {}
+
+static inline bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked) { return true; }
+static inline void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked) {}
+static inline void memcg1_oom_recover(struct mem_cgroup *memcg) {}
+
+static inline void memcg1_check_events(struct mem_cgroup *memcg, int nid) {}
+
+static inline void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) {}
+
+extern struct cftype memsw_files[];
+extern struct cftype mem_cgroup_legacy_files[];
+#endif /* CONFIG_MEMCG_V1 */
+
 #endif /* __MM_MEMCONTROL_V1_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c7341e811945..d2e1f8baeae8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4471,18 +4471,20 @@ struct cgroup_subsys memory_cgrp_subsys = {
 	.css_free = mem_cgroup_css_free,
 	.css_reset = mem_cgroup_css_reset,
 	.css_rstat_flush = mem_cgroup_css_rstat_flush,
-	.can_attach = memcg1_can_attach,
 #if defined(CONFIG_LRU_GEN) || defined(CONFIG_MEMCG_KMEM)
 	.attach = mem_cgroup_attach,
 #endif
-	.cancel_attach = memcg1_cancel_attach,
-	.post_attach = memcg1_move_task,
 #ifdef CONFIG_MEMCG_KMEM
 	.fork = mem_cgroup_fork,
 	.exit = mem_cgroup_exit,
 #endif
 	.dfl_cftypes = memory_files,
+#ifdef CONFIG_MEMCG_V1
+	.can_attach = memcg1_can_attach,
+	.cancel_attach = memcg1_cancel_attach,
+	.post_attach = memcg1_move_task,
 	.legacy_cftypes = mem_cgroup_legacy_files,
+#endif
 	.early_init = 0,
 };
 
@@ -5653,7 +5655,9 @@ static int __init mem_cgroup_swap_init(void)
 		return 0;
 
 	WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, swap_files));
+#ifdef CONFIG_MEMCG_V1
 	WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys, memsw_files));
+#endif
 #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
 	WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, zswap_files));
 #endif
-- 
2.45.2
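With this in place, a cgroup v2-only deployment can leave the legacy controller out entirely; a hypothetical .config fragment (not part of the patch):

    CONFIG_MEMCG=y
    # CONFIG_MEMCG_V1 is not set

Per the mm/Makefile change above, memcontrol-v1.o is then simply not built, and the stubs in mm/memcontrol-v1.h satisfy the remaining call sites.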
From nobody Sat Feb 7 20:06:57 2026
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>,
    Shakeel Butt <shakeel.butt@linux.dev>, Muchun Song <muchun.song@linux.dev>,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
    Roman Gushchin <roman.gushchin@linux.dev>
Subject: [PATCH v2 14/14] MAINTAINERS: add mm/memcontrol-v1.c/h to the list of maintained files
Date: Mon, 24 Jun 2024 17:59:06 -0700
Message-ID: <20240625005906.106920-15-roman.gushchin@linux.dev>
In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev>
References: <20240625005906.106920-1-roman.gushchin@linux.dev>

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Suggested-by: Matthew Wilcox (Oracle)
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7ad96cbb9f28..52a4089746b3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5582,6 +5582,8 @@ L:	linux-mm@kvack.org
 S:	Maintained
 F:	include/linux/memcontrol.h
 F:	mm/memcontrol.c
+F:	mm/memcontrol-v1.c
+F:	mm/memcontrol-v1.h
 F:	mm/swap_cgroup.c
 F:	samples/cgroup/*
 F:	tools/testing/selftests/cgroup/memcg_protection.m
-- 
2.45.2