From nobody Mon Nov 25 21:18:54 2024 Received: from out-180.mta1.migadu.com (out-180.mta1.migadu.com [95.215.58.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E3E04189F5F for ; Thu, 24 Oct 2024 06:57:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729753051; cv=none; b=rZ0I5Xnc3fN8d1ySi2z32f6ASoNqYjGgNZAMXqVqtX40+PrvfjNRqRAA3vNhrFsJPDO2ztAX/f8oJ4EiJH4pjtZ34Jc7SaJYSg5E4SAsqDByJEinTcwBER7gMkBxuigLYd63z3VoRIn3xMOCy9XwBtXdXk9hvqtGeEm4fDJwC5Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729753051; c=relaxed/simple; bh=EGJzMDMe+0j7metEGVCFtC9+944ca/RANDlJZlL8+f4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DCjKcOJ67hG867QmicETVdWfIh4qPc7sDt/5AEea2Smvkcna3A1JPVRRpvB9+Hw+FuA8Nbb6xR47xpT1VkSB2iMYnvQnDMwTYkLVHDBmp9+lav/MdzKAmjH2Pa9FaJtV4uofum8sTd6Mc3J1J5CMWR4cNIEbYiIOK6amsyV1gF0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=Kjf84Jp7; arc=none smtp.client-ip=95.215.58.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="Kjf84Jp7" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1729753046; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Mgr83r2IKcrYLJOy8T73HRhPqfYNucsjFTt8Ge3VU38=; b=Kjf84Jp7OIqdpzZVceEWDUi0+bwEUOmndVPpmZE1fiaOTfTCdfeHWmTPGRpku1xwgY8bp8 U60q2aXQgrip6fBZ1ZVJImWWEZbroG3Iwi8NEQxuJGcqyR2Kvar9Vp7hYbCR8eFZY9lXDT grcTd/KAaZVZ1samkosAq9DTt8eUDtA= From: Shakeel Butt To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Hugh Dickins , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Meta kernel team Subject: [RFC PATCH 1/3] memcg-v1: fully deprecate move_charge_at_immigrate Date: Wed, 23 Oct 2024 23:57:10 -0700 Message-ID: <20241024065712.1274481-2-shakeel.butt@linux.dev> In-Reply-To: <20241024065712.1274481-1-shakeel.butt@linux.dev> References: <20241024065712.1274481-1-shakeel.butt@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Proceed with the complete deprecation of memcg v1's charge moving feature. The deprecation warning has been in the kernel for almost two years and has been ported to all stable kernel since. Now is the time to fully deprecate this feature. Signed-off-by: Shakeel Butt Reviewed-by: Roman Gushchin --- .../admin-guide/cgroup-v1/memory.rst | 82 +------------------ mm/memcontrol-v1.c | 14 +--- 2 files changed, 5 insertions(+), 91 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation= /admin-guide/cgroup-v1/memory.rst index 270501db9f4e..286d16fc22eb 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -90,9 +90,7 @@ Brief summary of control files. used. memory.swappiness set/show swappiness parameter of vmscan (See sysctl's vm.swappiness) - memory.move_charge_at_immigrate set/show controls of moving charges - This knob is deprecated and shouldn't= be - used. + memory.move_charge_at_immigrate This knob is deprecated. memory.oom_control set/show oom controls. This knob is deprecated and shouldn't= be used. @@ -243,10 +241,6 @@ behind this approach is that a cgroup that aggressivel= y uses a shared page will eventually get charged for it (once it is uncharged from the cgroup that brought it in -- this will happen on memory pressure). =20 -But see :ref:`section 8.2 ` when moving a -task to another cgroup, its pages may be recharged to the new cgroup, if -move_charge_at_immigrate has been chosen. - 2.4 Swap Extension -------------------------------------- =20 @@ -756,78 +750,8 @@ If we want to change this to 1G, we can at any time us= e:: =20 THIS IS DEPRECATED! =20 -It's expensive and unreliable! It's better practice to launch workload -tasks directly from inside their target cgroup. Use dedicated workload -cgroups to allow fine-grained policy adjustments without having to -move physical pages between control domains. - -Users can move charges associated with a task along with task migration, t= hat -is, uncharge task's pages from the old cgroup and charge them to the new c= group. -This feature is not supported in !CONFIG_MMU environments because of lack = of -page tables. - -8.1 Interface -------------- - -This feature is disabled by default. It can be enabled (and disabled again= ) by -writing to memory.move_charge_at_immigrate of the destination cgroup. - -If you want to enable it:: - - # echo (some positive value) > memory.move_charge_at_immigrate - -.. note:: - Each bits of move_charge_at_immigrate has its own meaning about what= type - of charges should be moved. See :ref:`section 8.2 - ` for details. - -.. note:: - Charges are moved only when you move mm->owner, in other words, - a leader of a thread group. - -.. note:: - If we cannot find enough space for the task in the destination cgrou= p, we - try to make space by reclaiming memory. Task migration may fail if we - cannot make enough space. - -.. note:: - It can take several seconds if you move charges much. - -And if you want disable it again:: - - # echo 0 > memory.move_charge_at_immigrate - -.. _cgroup-v1-memory-movable-charges: - -8.2 Type of charges which can be moved --------------------------------------- - -Each bit in move_charge_at_immigrate has its own meaning about what type of -charges should be moved. But in any case, it must be noted that an account= of -a page or a swap can be moved only when it is charged to the task's current -(old) memory cgroup. - -+---+---------------------------------------------------------------------= -----+ -|bit| what type of charges would be moved ? = | -+=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D+ -| 0 | A charge of an anonymous page (or swap of it) used by the target tas= k. | -| | You must enable Swap Extension (see 2.4) to enable move of swap char= ges. | -+---+---------------------------------------------------------------------= -----+ -| 1 | A charge of file pages (normal file, tmpfs file (e.g. ipc shared mem= ory) | -| | and swaps of tmpfs file) mmapped by the target task. Unlike the case= of | -| | anonymous pages, file pages (and swaps) in the range mmapped by the = task | -| | will be moved even if the task hasn't done page fault, i.e. they mig= ht | -| | not be the task's "RSS", but other task's "RSS" that maps the same f= ile. | -| | The mapcount of the page is ignored (the page can be moved independe= nt | -| | of the mapcount). You must enable Swap Extension (see 2.4) to = | -| | enable move of swap charges. = | -+---+---------------------------------------------------------------------= -----+ - -8.3 TODO --------- - -- All of moving charge operations are done under cgroup_mutex. It's not go= od - behavior to hold the mutex too long, so we may need some trick. +Reading memory.move_charge_at_immigrate will always return 0 and writing +to it will always return -EINVAL. =20 9. Memory thresholds =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 81d8819f13cd..8f88540f0159 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -593,7 +593,7 @@ static inline int mem_cgroup_move_swap_account(swp_entr= y_t entry, static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, struct cftype *cft) { - return mem_cgroup_from_css(css)->move_charge_at_immigrate; + return 0; } =20 #ifdef CONFIG_MMU @@ -606,17 +606,7 @@ static int mem_cgroup_move_charge_write(struct cgroup_= subsys_state *css, "Please report your usecase to linux-mm@kvack.org if you " "depend on this functionality.\n"); =20 - if (val & ~MOVE_MASK) - return -EINVAL; - - /* - * No kind of locking is needed in here, because ->can_attach() will - * check this value once in the beginning of the process, and then carry - * on with stale data. This means that changes to this value will only - * affect task migrations starting after the change. - */ - memcg->move_charge_at_immigrate =3D val; - return 0; + return -EINVAL; } #else static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, --=20 2.43.5 From nobody Mon Nov 25 21:18:54 2024 Received: from out-185.mta1.migadu.com (out-185.mta1.migadu.com [95.215.58.185]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 396FC18B485 for ; Thu, 24 Oct 2024 06:57:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.185 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729753055; cv=none; b=lqf28hIeKx0cD6DRqgIFP39OSfdxYVxvhwC8BQO4Jnc18x52Ef9WKywR+WIWzLhaYhamWEWjK+LzHH3xy65mST6eNnSu7speBfhByM8mbMT9O6OWgGIjOxJOcv/b5yTTXnDmKTr09H0Fcoq/RpnYEFfkqpQI9ugVVuE/9/TGyOo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729753055; c=relaxed/simple; bh=SP6kqGEClpAVz3xBrmMF3iu5f9JdPPs6CePFL20E/4o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KbaMRaxLW5AJZugLafPavZpnpZgFxo2xzfOUSVuf8MAvo0Q/WqElkPiOtS8YikS40YZXiTAQ7SgXWt+vjrFeB2fWKpyzrJPNSMmTyAIgvifCWG9wcaA9FVIsCtPj82YmxZj8l+uDfh0Tdx44gjNJXLpPyJkIcq5sGtgBG6WxwKA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=kr8TwIjj; arc=none smtp.client-ip=95.215.58.185 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="kr8TwIjj" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1729753049; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/acFZbfQOGBuIQn7AgFY5qft19AmPwKwRE4xsta9KDQ=; b=kr8TwIjjysJk0D0i+qgOPwjxaZ0GfOAd6XR74iEyTBJCoSwPR5h+/tQKVJ7ksnRxyMaR20 B6xuRuRXEjln8npO2o9amqtsOoS5zB/0xMzkt46bRiQMs8WSBfq+q2OdBhJb3LMJ4DS3Qw nzuTOGUH6gc2pz44z0DaRc55qzyQuCE= From: Shakeel Butt To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Hugh Dickins , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Meta kernel team Subject: [RFC PATCH 2/3] memcg-v1: remove charge move code Date: Wed, 23 Oct 2024 23:57:11 -0700 Message-ID: <20241024065712.1274481-3-shakeel.butt@linux.dev> In-Reply-To: <20241024065712.1274481-1-shakeel.butt@linux.dev> References: <20241024065712.1274481-1-shakeel.butt@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" The memcg-v1 charge move feature has been deprecated completely and let's remove the relevant code as well. Signed-off-by: Shakeel Butt Acked-by: Michal Hocko Reviewed-by: Roman Gushchin --- include/linux/memcontrol.h | 5 - mm/memcontrol-v1.c | 864 ------------------------------------- mm/memcontrol-v1.h | 6 - mm/memcontrol.c | 9 - 4 files changed, 884 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 524006313b0d..798db70b0a30 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -299,11 +299,6 @@ struct mem_cgroup { /* For oom notifier event fd */ struct list_head oom_notify; =20 - /* - * Should we move charges of a task when a task is moved into this - * mem_cgroup ? And what type of charges should we move ? - */ - unsigned long move_charge_at_immigrate; /* taken only while moving_account > 0 */ spinlock_t move_lock; unsigned long move_lock_flags; diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 8f88540f0159..79339cb65b9d 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -40,31 +40,6 @@ static struct mem_cgroup_tree soft_limit_tree __read_mos= tly; #define MEM_CGROUP_MAX_RECLAIM_LOOPS 100 #define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS 2 =20 -/* Stuffs for move charges at task migration. */ -/* - * Types of charges to be moved. - */ -#define MOVE_ANON 0x1ULL -#define MOVE_FILE 0x2ULL -#define MOVE_MASK (MOVE_ANON | MOVE_FILE) - -/* "mc" and its members are protected by cgroup_mutex */ -static struct move_charge_struct { - spinlock_t lock; /* for from, to */ - struct mm_struct *mm; - struct mem_cgroup *from; - struct mem_cgroup *to; - unsigned long flags; - unsigned long precharge; - unsigned long moved_charge; - unsigned long moved_swap; - struct task_struct *moving_task; /* a task moving charges */ - wait_queue_head_t waitq; /* a waitq for other context */ -} mc =3D { - .lock =3D __SPIN_LOCK_UNLOCKED(mc.lock), - .waitq =3D __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq), -}; - /* for OOM */ struct mem_cgroup_eventfd_list { struct list_head list; @@ -426,51 +401,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgd= at, int order, return nr_reclaimed; } =20 -/* - * A routine for checking "mem" is under move_account() or not. - * - * Checking a cgroup is mc.from or mc.to or under hierarchy of - * moving cgroups. This is for waiting at high-memory pressure - * caused by "move". - */ -static bool mem_cgroup_under_move(struct mem_cgroup *memcg) -{ - struct mem_cgroup *from; - struct mem_cgroup *to; - bool ret =3D false; - /* - * Unlike task_move routines, we access mc.to, mc.from not under - * mutual exclusion by cgroup_mutex. Here, we take spinlock instead. - */ - spin_lock(&mc.lock); - from =3D mc.from; - to =3D mc.to; - if (!from) - goto unlock; - - ret =3D mem_cgroup_is_descendant(from, memcg) || - mem_cgroup_is_descendant(to, memcg); -unlock: - spin_unlock(&mc.lock); - return ret; -} - -bool memcg1_wait_acct_move(struct mem_cgroup *memcg) -{ - if (mc.moving_task && current !=3D mc.moving_task) { - if (mem_cgroup_under_move(memcg)) { - DEFINE_WAIT(wait); - prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE); - /* moving charge context might have finished. */ - if (mc.moving_task) - schedule(); - finish_wait(&mc.waitq, &wait); - return true; - } - } - return false; -} - /** * folio_memcg_lock - Bind a folio to its memcg. * @folio: The folio. @@ -552,44 +482,6 @@ void folio_memcg_unlock(struct folio *folio) __folio_memcg_unlock(folio_memcg(folio)); } =20 -#ifdef CONFIG_SWAP -/** - * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's recor= d. - * @entry: swap entry to be moved - * @from: mem_cgroup which the entry is moved from - * @to: mem_cgroup which the entry is moved to - * - * It succeeds only when the swap_cgroup's record for this entry is the sa= me - * as the mem_cgroup's id of @from. - * - * Returns 0 on success, -EINVAL on failure. - * - * The caller must have charged to @to, IOW, called page_counter_charge() = about - * both res and memsw, and called css_get(). - */ -static int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) -{ - unsigned short old_id, new_id; - - old_id =3D mem_cgroup_id(from); - new_id =3D mem_cgroup_id(to); - - if (swap_cgroup_cmpxchg(entry, old_id, new_id) =3D=3D old_id) { - mod_memcg_state(from, MEMCG_SWAP, -1); - mod_memcg_state(to, MEMCG_SWAP, 1); - return 0; - } - return -EINVAL; -} -#else -static inline int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) -{ - return -EINVAL; -} -#endif - static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, struct cftype *cft) { @@ -600,8 +492,6 @@ static u64 mem_cgroup_move_charge_read(struct cgroup_su= bsys_state *css, static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, struct cftype *cft, u64 val) { - struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); - pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecat= ed. " "Please report your usecase to linux-mm@kvack.org if you " "depend on this functionality.\n"); @@ -616,760 +506,6 @@ static int mem_cgroup_move_charge_write(struct cgroup= _subsys_state *css, } #endif =20 -#ifdef CONFIG_MMU -/* Handlers for move charge at task migration. */ -static int mem_cgroup_do_precharge(unsigned long count) -{ - int ret; - - /* Try a single bulk charge without reclaim first, kswapd may wake */ - ret =3D try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count); - if (!ret) { - mc.precharge +=3D count; - return ret; - } - - /* Try charges one by one with reclaim, but do not retry */ - while (count--) { - ret =3D try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1); - if (ret) - return ret; - mc.precharge++; - cond_resched(); - } - return 0; -} - -union mc_target { - struct folio *folio; - swp_entry_t ent; -}; - -enum mc_target_type { - MC_TARGET_NONE =3D 0, - MC_TARGET_PAGE, - MC_TARGET_SWAP, - MC_TARGET_DEVICE, -}; - -static struct page *mc_handle_present_pte(struct vm_area_struct *vma, - unsigned long addr, pte_t ptent) -{ - struct page *page =3D vm_normal_page(vma, addr, ptent); - - if (!page) - return NULL; - if (PageAnon(page)) { - if (!(mc.flags & MOVE_ANON)) - return NULL; - } else { - if (!(mc.flags & MOVE_FILE)) - return NULL; - } - get_page(page); - - return page; -} - -#if defined(CONFIG_SWAP) || defined(CONFIG_DEVICE_PRIVATE) -static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, - pte_t ptent, swp_entry_t *entry) -{ - struct page *page =3D NULL; - swp_entry_t ent =3D pte_to_swp_entry(ptent); - - if (!(mc.flags & MOVE_ANON)) - return NULL; - - /* - * Handle device private pages that are not accessible by the CPU, but - * stored as special swap entries in the page table. - */ - if (is_device_private_entry(ent)) { - page =3D pfn_swap_entry_to_page(ent); - if (!get_page_unless_zero(page)) - return NULL; - return page; - } - - if (non_swap_entry(ent)) - return NULL; - - /* - * Because swap_cache_get_folio() updates some statistics counter, - * we call find_get_page() with swapper_space directly. - */ - page =3D find_get_page(swap_address_space(ent), swap_cache_index(ent)); - entry->val =3D ent.val; - - return page; -} -#else -static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, - pte_t ptent, swp_entry_t *entry) -{ - return NULL; -} -#endif - -static struct page *mc_handle_file_pte(struct vm_area_struct *vma, - unsigned long addr, pte_t ptent) -{ - unsigned long index; - struct folio *folio; - - if (!vma->vm_file) /* anonymous vma */ - return NULL; - if (!(mc.flags & MOVE_FILE)) - return NULL; - - /* folio is moved even if it's not RSS of this task(page-faulted). */ - /* shmem/tmpfs may report page out on swap: account for that too. */ - index =3D linear_page_index(vma, addr); - folio =3D filemap_get_incore_folio(vma->vm_file->f_mapping, index); - if (IS_ERR(folio)) - return NULL; - return folio_file_page(folio, index); -} - -static void memcg1_check_events(struct mem_cgroup *memcg, int nid); -static void memcg1_charge_statistics(struct mem_cgroup *memcg, int nr_page= s); - -/** - * mem_cgroup_move_account - move account of the folio - * @folio: The folio. - * @compound: charge the page as compound or small page - * @from: mem_cgroup which the folio is moved from. - * @to: mem_cgroup which the folio is moved to. @from !=3D @to. - * - * The folio must be locked and not on the LRU. - * - * This function doesn't do "charge" to new cgroup and doesn't do "uncharg= e" - * from old cgroup. - */ -static int mem_cgroup_move_account(struct folio *folio, - bool compound, - struct mem_cgroup *from, - struct mem_cgroup *to) -{ - struct lruvec *from_vec, *to_vec; - struct pglist_data *pgdat; - unsigned int nr_pages =3D compound ? folio_nr_pages(folio) : 1; - int nid, ret; - - VM_BUG_ON(from =3D=3D to); - VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); - VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); - VM_BUG_ON(compound && !folio_test_large(folio)); - - ret =3D -EINVAL; - if (folio_memcg(folio) !=3D from) - goto out; - - pgdat =3D folio_pgdat(folio); - from_vec =3D mem_cgroup_lruvec(from, pgdat); - to_vec =3D mem_cgroup_lruvec(to, pgdat); - - folio_memcg_lock(folio); - - if (folio_test_anon(folio)) { - if (folio_mapped(folio)) { - __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages); - __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages); - if (folio_test_pmd_mappable(folio)) { - __mod_lruvec_state(from_vec, NR_ANON_THPS, - -nr_pages); - __mod_lruvec_state(to_vec, NR_ANON_THPS, - nr_pages); - } - } - } else { - __mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages); - __mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages); - - if (folio_test_swapbacked(folio)) { - __mod_lruvec_state(from_vec, NR_SHMEM, -nr_pages); - __mod_lruvec_state(to_vec, NR_SHMEM, nr_pages); - } - - if (folio_mapped(folio)) { - __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages); - __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages); - } - - if (folio_test_dirty(folio)) { - struct address_space *mapping =3D folio_mapping(folio); - - if (mapping_can_writeback(mapping)) { - __mod_lruvec_state(from_vec, NR_FILE_DIRTY, - -nr_pages); - __mod_lruvec_state(to_vec, NR_FILE_DIRTY, - nr_pages); - } - } - } - -#ifdef CONFIG_SWAP - if (folio_test_swapcache(folio)) { - __mod_lruvec_state(from_vec, NR_SWAPCACHE, -nr_pages); - __mod_lruvec_state(to_vec, NR_SWAPCACHE, nr_pages); - } -#endif - if (folio_test_writeback(folio)) { - __mod_lruvec_state(from_vec, NR_WRITEBACK, -nr_pages); - __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages); - } - - /* - * All state has been migrated, let's switch to the new memcg. - * - * It is safe to change page's memcg here because the page - * is referenced, charged, isolated, and locked: we can't race - * with (un)charging, migration, LRU putback, or anything else - * that would rely on a stable page's memory cgroup. - * - * Note that folio_memcg_lock is a memcg lock, not a page lock, - * to save space. As soon as we switch page's memory cgroup to a - * new memcg that isn't locked, the above state can change - * concurrently again. Make sure we're truly done with it. - */ - smp_mb(); - - css_get(&to->css); - css_put(&from->css); - - folio->memcg_data =3D (unsigned long)to; - - __folio_memcg_unlock(from); - - ret =3D 0; - nid =3D folio_nid(folio); - - local_irq_disable(); - memcg1_charge_statistics(to, nr_pages); - memcg1_check_events(to, nid); - memcg1_charge_statistics(from, -nr_pages); - memcg1_check_events(from, nid); - local_irq_enable(); -out: - return ret; -} - -/** - * get_mctgt_type - get target type of moving charge - * @vma: the vma the pte to be checked belongs - * @addr: the address corresponding to the pte to be checked - * @ptent: the pte to be checked - * @target: the pointer the target page or swap ent will be stored(can be = NULL) - * - * Context: Called with pte lock held. - * Return: - * * MC_TARGET_NONE - If the pte is not a target for move charge. - * * MC_TARGET_PAGE - If the page corresponding to this pte is a target for - * move charge. If @target is not NULL, the folio is stored in target->f= olio - * with extra refcnt taken (Caller should release it). - * * MC_TARGET_SWAP - If the swap entry corresponding to this pte is a - * target for charge migration. If @target is not NULL, the entry is - * stored in target->ent. - * * MC_TARGET_DEVICE - Like MC_TARGET_PAGE but page is device memory and - * thus not on the lru. For now such page is charged like a regular page - * would be as it is just special memory taking the place of a regular p= age. - * See Documentations/vm/hmm.txt and include/linux/hmm.h - */ -static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, - unsigned long addr, pte_t ptent, union mc_target *target) -{ - struct page *page =3D NULL; - struct folio *folio; - enum mc_target_type ret =3D MC_TARGET_NONE; - swp_entry_t ent =3D { .val =3D 0 }; - - if (pte_present(ptent)) - page =3D mc_handle_present_pte(vma, addr, ptent); - else if (pte_none_mostly(ptent)) - /* - * PTE markers should be treated as a none pte here, separated - * from other swap handling below. - */ - page =3D mc_handle_file_pte(vma, addr, ptent); - else if (is_swap_pte(ptent)) - page =3D mc_handle_swap_pte(vma, ptent, &ent); - - if (page) - folio =3D page_folio(page); - if (target && page) { - if (!folio_trylock(folio)) { - folio_put(folio); - return ret; - } - /* - * page_mapped() must be stable during the move. This - * pte is locked, so if it's present, the page cannot - * become unmapped. If it isn't, we have only partial - * control over the mapped state: the page lock will - * prevent new faults against pagecache and swapcache, - * so an unmapped page cannot become mapped. However, - * if the page is already mapped elsewhere, it can - * unmap, and there is nothing we can do about it. - * Alas, skip moving the page in this case. - */ - if (!pte_present(ptent) && page_mapped(page)) { - folio_unlock(folio); - folio_put(folio); - return ret; - } - } - - if (!page && !ent.val) - return ret; - if (page) { - /* - * Do only loose check w/o serialization. - * mem_cgroup_move_account() checks the page is valid or - * not under LRU exclusion. - */ - if (folio_memcg(folio) =3D=3D mc.from) { - ret =3D MC_TARGET_PAGE; - if (folio_is_device_private(folio) || - folio_is_device_coherent(folio)) - ret =3D MC_TARGET_DEVICE; - if (target) - target->folio =3D folio; - } - if (!ret || !target) { - if (target) - folio_unlock(folio); - folio_put(folio); - } - } - /* - * There is a swap entry and a page doesn't exist or isn't charged. - * But we cannot move a tail-page in a THP. - */ - if (ent.val && !ret && (!page || !PageTransCompound(page)) && - mem_cgroup_id(mc.from) =3D=3D lookup_swap_cgroup_id(ent)) { - ret =3D MC_TARGET_SWAP; - if (target) - target->ent =3D ent; - } - return ret; -} - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -/* - * We don't consider PMD mapped swapping or file mapped pages because THP = does - * not support them for now. - * Caller should make sure that pmd_trans_huge(pmd) is true. - */ -static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) -{ - struct page *page =3D NULL; - struct folio *folio; - enum mc_target_type ret =3D MC_TARGET_NONE; - - if (unlikely(is_swap_pmd(pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmd)); - return ret; - } - page =3D pmd_page(pmd); - VM_BUG_ON_PAGE(!page || !PageHead(page), page); - folio =3D page_folio(page); - if (!(mc.flags & MOVE_ANON)) - return ret; - if (folio_memcg(folio) =3D=3D mc.from) { - ret =3D MC_TARGET_PAGE; - if (target) { - folio_get(folio); - if (!folio_trylock(folio)) { - folio_put(folio); - return MC_TARGET_NONE; - } - target->folio =3D folio; - } - } - return ret; -} -#else -static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct= *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) -{ - return MC_TARGET_NONE; -} -#endif - -static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, - unsigned long addr, unsigned long end, - struct mm_walk *walk) -{ - struct vm_area_struct *vma =3D walk->vma; - pte_t *pte; - spinlock_t *ptl; - - ptl =3D pmd_trans_huge_lock(pmd, vma); - if (ptl) { - /* - * Note their can not be MC_TARGET_DEVICE for now as we do not - * support transparent huge page with MEMORY_DEVICE_PRIVATE but - * this might change. - */ - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) =3D=3D MC_TARGET_PAGE) - mc.precharge +=3D HPAGE_PMD_NR; - spin_unlock(ptl); - return 0; - } - - pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (!pte) - return 0; - for (; addr !=3D end; pte++, addr +=3D PAGE_SIZE) - if (get_mctgt_type(vma, addr, ptep_get(pte), NULL)) - mc.precharge++; /* increment precharge temporarily */ - pte_unmap_unlock(pte - 1, ptl); - cond_resched(); - - return 0; -} - -static const struct mm_walk_ops precharge_walk_ops =3D { - .pmd_entry =3D mem_cgroup_count_precharge_pte_range, - .walk_lock =3D PGWALK_RDLOCK, -}; - -static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm) -{ - unsigned long precharge; - - mmap_read_lock(mm); - walk_page_range(mm, 0, ULONG_MAX, &precharge_walk_ops, NULL); - mmap_read_unlock(mm); - - precharge =3D mc.precharge; - mc.precharge =3D 0; - - return precharge; -} - -static int mem_cgroup_precharge_mc(struct mm_struct *mm) -{ - unsigned long precharge =3D mem_cgroup_count_precharge(mm); - - VM_BUG_ON(mc.moving_task); - mc.moving_task =3D current; - return mem_cgroup_do_precharge(precharge); -} - -/* cancels all extra charges on mc.from and mc.to, and wakes up all waiter= s. */ -static void __mem_cgroup_clear_mc(void) -{ - struct mem_cgroup *from =3D mc.from; - struct mem_cgroup *to =3D mc.to; - - /* we must uncharge all the leftover precharges from mc.to */ - if (mc.precharge) { - mem_cgroup_cancel_charge(mc.to, mc.precharge); - mc.precharge =3D 0; - } - /* - * we didn't uncharge from mc.from at mem_cgroup_move_account(), so - * we must uncharge here. - */ - if (mc.moved_charge) { - mem_cgroup_cancel_charge(mc.from, mc.moved_charge); - mc.moved_charge =3D 0; - } - /* we must fixup refcnts and charges */ - if (mc.moved_swap) { - /* uncharge swap account from the old cgroup */ - if (!mem_cgroup_is_root(mc.from)) - page_counter_uncharge(&mc.from->memsw, mc.moved_swap); - - mem_cgroup_id_put_many(mc.from, mc.moved_swap); - - /* - * we charged both to->memory and to->memsw, so we - * should uncharge to->memory. - */ - if (!mem_cgroup_is_root(mc.to)) - page_counter_uncharge(&mc.to->memory, mc.moved_swap); - - mc.moved_swap =3D 0; - } - memcg1_oom_recover(from); - memcg1_oom_recover(to); - wake_up_all(&mc.waitq); -} - -static void mem_cgroup_clear_mc(void) -{ - struct mm_struct *mm =3D mc.mm; - - /* - * we must clear moving_task before waking up waiters at the end of - * task migration. - */ - mc.moving_task =3D NULL; - __mem_cgroup_clear_mc(); - spin_lock(&mc.lock); - mc.from =3D NULL; - mc.to =3D NULL; - mc.mm =3D NULL; - spin_unlock(&mc.lock); - - mmput(mm); -} - -int memcg1_can_attach(struct cgroup_taskset *tset) -{ - struct cgroup_subsys_state *css; - struct mem_cgroup *memcg =3D NULL; /* unneeded init to make gcc happy */ - struct mem_cgroup *from; - struct task_struct *leader, *p; - struct mm_struct *mm; - unsigned long move_flags; - int ret =3D 0; - - /* charge immigration isn't supported on the default hierarchy */ - if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) - return 0; - - /* - * Multi-process migrations only happen on the default hierarchy - * where charge immigration is not used. Perform charge - * immigration if @tset contains a leader and whine if there are - * multiple. - */ - p =3D NULL; - cgroup_taskset_for_each_leader(leader, css, tset) { - WARN_ON_ONCE(p); - p =3D leader; - memcg =3D mem_cgroup_from_css(css); - } - if (!p) - return 0; - - /* - * We are now committed to this value whatever it is. Changes in this - * tunable will only affect upcoming migrations, not the current one. - * So we need to save it, and keep it going. - */ - move_flags =3D READ_ONCE(memcg->move_charge_at_immigrate); - if (!move_flags) - return 0; - - from =3D mem_cgroup_from_task(p); - - VM_BUG_ON(from =3D=3D memcg); - - mm =3D get_task_mm(p); - if (!mm) - return 0; - /* We move charges only when we move a owner of the mm */ - if (mm->owner =3D=3D p) { - VM_BUG_ON(mc.from); - VM_BUG_ON(mc.to); - VM_BUG_ON(mc.precharge); - VM_BUG_ON(mc.moved_charge); - VM_BUG_ON(mc.moved_swap); - - spin_lock(&mc.lock); - mc.mm =3D mm; - mc.from =3D from; - mc.to =3D memcg; - mc.flags =3D move_flags; - spin_unlock(&mc.lock); - /* We set mc.moving_task later */ - - ret =3D mem_cgroup_precharge_mc(mm); - if (ret) - mem_cgroup_clear_mc(); - } else { - mmput(mm); - } - return ret; -} - -void memcg1_cancel_attach(struct cgroup_taskset *tset) -{ - if (mc.to) - mem_cgroup_clear_mc(); -} - -static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, - unsigned long addr, unsigned long end, - struct mm_walk *walk) -{ - int ret =3D 0; - struct vm_area_struct *vma =3D walk->vma; - pte_t *pte; - spinlock_t *ptl; - enum mc_target_type target_type; - union mc_target target; - struct folio *folio; - - ptl =3D pmd_trans_huge_lock(pmd, vma); - if (ptl) { - if (mc.precharge < HPAGE_PMD_NR) { - spin_unlock(ptl); - return 0; - } - target_type =3D get_mctgt_type_thp(vma, addr, *pmd, &target); - if (target_type =3D=3D MC_TARGET_PAGE) { - folio =3D target.folio; - if (folio_isolate_lru(folio)) { - if (!mem_cgroup_move_account(folio, true, - mc.from, mc.to)) { - mc.precharge -=3D HPAGE_PMD_NR; - mc.moved_charge +=3D HPAGE_PMD_NR; - } - folio_putback_lru(folio); - } - folio_unlock(folio); - folio_put(folio); - } else if (target_type =3D=3D MC_TARGET_DEVICE) { - folio =3D target.folio; - if (!mem_cgroup_move_account(folio, true, - mc.from, mc.to)) { - mc.precharge -=3D HPAGE_PMD_NR; - mc.moved_charge +=3D HPAGE_PMD_NR; - } - folio_unlock(folio); - folio_put(folio); - } - spin_unlock(ptl); - return 0; - } - -retry: - pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (!pte) - return 0; - for (; addr !=3D end; addr +=3D PAGE_SIZE) { - pte_t ptent =3D ptep_get(pte++); - bool device =3D false; - swp_entry_t ent; - - if (!mc.precharge) - break; - - switch (get_mctgt_type(vma, addr, ptent, &target)) { - case MC_TARGET_DEVICE: - device =3D true; - fallthrough; - case MC_TARGET_PAGE: - folio =3D target.folio; - /* - * We can have a part of the split pmd here. Moving it - * can be done but it would be too convoluted so simply - * ignore such a partial THP and keep it in original - * memcg. There should be somebody mapping the head. - */ - if (folio_test_large(folio)) - goto put; - if (!device && !folio_isolate_lru(folio)) - goto put; - if (!mem_cgroup_move_account(folio, false, - mc.from, mc.to)) { - mc.precharge--; - /* we uncharge from mc.from later. */ - mc.moved_charge++; - } - if (!device) - folio_putback_lru(folio); -put: /* get_mctgt_type() gets & locks the page */ - folio_unlock(folio); - folio_put(folio); - break; - case MC_TARGET_SWAP: - ent =3D target.ent; - if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { - mc.precharge--; - mem_cgroup_id_get_many(mc.to, 1); - /* we fixup other refcnts and charges later. */ - mc.moved_swap++; - } - break; - default: - break; - } - } - pte_unmap_unlock(pte - 1, ptl); - cond_resched(); - - if (addr !=3D end) { - /* - * We have consumed all precharges we got in can_attach(). - * We try charge one by one, but don't do any additional - * charges to mc.to if we have failed in charge once in attach() - * phase. - */ - ret =3D mem_cgroup_do_precharge(1); - if (!ret) - goto retry; - } - - return ret; -} - -static const struct mm_walk_ops charge_walk_ops =3D { - .pmd_entry =3D mem_cgroup_move_charge_pte_range, - .walk_lock =3D PGWALK_RDLOCK, -}; - -static void mem_cgroup_move_charge(void) -{ - lru_add_drain_all(); - /* - * Signal folio_memcg_lock() to take the memcg's move_lock - * while we're moving its pages to another memcg. Then wait - * for already started RCU-only updates to finish. - */ - atomic_inc(&mc.from->moving_account); - synchronize_rcu(); -retry: - if (unlikely(!mmap_read_trylock(mc.mm))) { - /* - * Someone who are holding the mmap_lock might be waiting in - * waitq. So we cancel all extra charges, wake up all waiters, - * and retry. Because we cancel precharges, we might not be able - * to move enough charges, but moving charge is a best-effort - * feature anyway, so it wouldn't be a big problem. - */ - __mem_cgroup_clear_mc(); - cond_resched(); - goto retry; - } - /* - * When we have consumed all precharges and failed in doing - * additional charge, the page walk just aborts. - */ - walk_page_range(mc.mm, 0, ULONG_MAX, &charge_walk_ops, NULL); - mmap_read_unlock(mc.mm); - atomic_dec(&mc.from->moving_account); -} - -void memcg1_move_task(void) -{ - if (mc.to) { - mem_cgroup_move_charge(); - mem_cgroup_clear_mc(); - } -} - -#else /* !CONFIG_MMU */ -int memcg1_can_attach(struct cgroup_taskset *tset) -{ - return 0; -} -void memcg1_cancel_attach(struct cgroup_taskset *tset) -{ -} -void memcg1_move_task(void) -{ -} -#endif - static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap) { struct mem_cgroup_threshold_ary *t; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index c0672e25bcdb..0e3b82951d91 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -80,12 +80,7 @@ static inline void memcg1_soft_limit_reset(struct mem_cg= roup *memcg) WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); } =20 -bool memcg1_wait_acct_move(struct mem_cgroup *memcg); - struct cgroup_taskset; -int memcg1_can_attach(struct cgroup_taskset *tset); -void memcg1_cancel_attach(struct cgroup_taskset *tset); -void memcg1_move_task(void); void memcg1_css_offline(struct mem_cgroup *memcg); =20 /* for encoding cft->private value on file */ @@ -130,7 +125,6 @@ static inline void memcg1_free_events(struct mem_cgroup= *memcg) {} static inline void memcg1_memcg_init(struct mem_cgroup *memcg) {} static inline void memcg1_remove_from_trees(struct mem_cgroup *memcg) {} static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) {} -static inline bool memcg1_wait_acct_move(struct mem_cgroup *memcg) { retur= n false; } static inline void memcg1_css_offline(struct mem_cgroup *memcg) {} =20 static inline bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *lock= ed) { return true; } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 5c3a8629ef3e..94279b9c766a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2242,12 +2242,6 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t= gfp_mask, */ if (nr_reclaimed && nr_pages <=3D (1 << PAGE_ALLOC_COSTLY_ORDER)) goto retry; - /* - * At task move, charge accounts can be doubly counted. So, it's - * better to wait until the end of task_move if something is going on. - */ - if (memcg1_wait_acct_move(mem_over_limit)) - goto retry; =20 if (nr_retries--) goto retry; @@ -4439,9 +4433,6 @@ struct cgroup_subsys memory_cgrp_subsys =3D { .exit =3D mem_cgroup_exit, .dfl_cftypes =3D memory_files, #ifdef CONFIG_MEMCG_V1 - .can_attach =3D memcg1_can_attach, - .cancel_attach =3D memcg1_cancel_attach, - .post_attach =3D memcg1_move_task, .legacy_cftypes =3D mem_cgroup_legacy_files, #endif .early_init =3D 0, --=20 2.43.5 From nobody Mon Nov 25 21:18:54 2024 Received: from out-172.mta1.migadu.com (out-172.mta1.migadu.com [95.215.58.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 28A0A1A38E3 for ; Thu, 24 Oct 2024 06:57:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729753059; cv=none; b=Ce0e9sFvgjgRlOW2SSLt8j4Ekmi1dkqjiwLkw7XjVKyZ4VTglCsjr+NlezhilgV1iAqd2KOxfR5bd55yw4rolp1DFOkKXTKFAiDiRwWK4tgK8cTRAdb87GOp8ieHsYsVIh0aR9YxWxFRYuqv50twF1E+rUa+S9Gdkl1XLvpReY8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729753059; c=relaxed/simple; bh=JxFRtVDkVwNX/nKJf6rxYe1v7biKRdps+Stcu5DCQeg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=juvTpM576sNMnBk/7Sq0gvedF21PNOKYBikORPB54kqpsmXJEa1jndxvJwZhBIbWfnt98BYyc8pA2rGnIvYjkWe7kPmgVCNilG2beqLpCdiIo9VtmsmXSWE3nLJ0lR5DOui/vHyp+WVdlKL4tlgIy2j5tvQUoCKvaWmaZhARpAI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=fk7R4GrF; arc=none smtp.client-ip=95.215.58.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="fk7R4GrF" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1729753054; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Jjur1v507XZcfB3R788/dravrRJmQ/huQN6yJVE4iP0=; b=fk7R4GrFjC4AEwUgyhAI4/ezHAGxfXFZ+MU7L3jao8pJ4Xf0Og82QbElOyqjOPrwGb/ltR eKBUJSbYRCd1woigqV+GtSCQkASBvINFSwREEpvEcn/gQLDvqA/6LpcDn4S6yQ7O40XAe4 CNwvb5Pz9Zcv+8dThKfAXFFb/17J/Uw= From: Shakeel Butt To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Hugh Dickins , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Meta kernel team Subject: [RFC PATCH 3/3] memcg-v1: remove memcg move locking code Date: Wed, 23 Oct 2024 23:57:12 -0700 Message-ID: <20241024065712.1274481-4-shakeel.butt@linux.dev> In-Reply-To: <20241024065712.1274481-1-shakeel.butt@linux.dev> References: <20241024065712.1274481-1-shakeel.butt@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" The memcg v1's charge move feature has been deprecated. There is no need to have any locking or protection against the moving charge. Let's proceed to remove all the locking code related to charge moving. Signed-off-by: Shakeel Butt Reviewed-by: Roman Gushchin --- fs/buffer.c | 5 --- include/linux/memcontrol.h | 54 ------------------------- mm/filemap.c | 1 - mm/memcontrol-v1.c | 82 -------------------------------------- mm/memcontrol.c | 5 --- mm/page-writeback.c | 21 ++-------- mm/rmap.c | 1 - mm/vmscan.c | 11 ----- 8 files changed, 3 insertions(+), 177 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 1fc9a50def0b..88e765b0699f 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -736,15 +736,12 @@ bool block_dirty_folio(struct address_space *mapping,= struct folio *folio) * Lock out page's memcg migration to keep PageDirty * synchronized with per-memcg dirty page counters. */ - folio_memcg_lock(folio); newly_dirty =3D !folio_test_set_dirty(folio); spin_unlock(&mapping->i_private_lock); =20 if (newly_dirty) __folio_mark_dirty(folio, mapping, 1); =20 - folio_memcg_unlock(folio); - if (newly_dirty) __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); =20 @@ -1194,13 +1191,11 @@ void mark_buffer_dirty(struct buffer_head *bh) struct folio *folio =3D bh->b_folio; struct address_space *mapping =3D NULL; =20 - folio_memcg_lock(folio); if (!folio_test_set_dirty(folio)) { mapping =3D folio->mapping; if (mapping) __folio_mark_dirty(folio, mapping, 0); } - folio_memcg_unlock(folio); if (mapping) __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); } diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 798db70b0a30..932534291ca2 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -299,20 +299,10 @@ struct mem_cgroup { /* For oom notifier event fd */ struct list_head oom_notify; =20 - /* taken only while moving_account > 0 */ - spinlock_t move_lock; - unsigned long move_lock_flags; - /* Legacy tcp memory accounting */ bool tcpmem_active; int tcpmem_pressure; =20 - /* - * set > 0 if pages under this cgroup are moving to other cgroup. - */ - atomic_t moving_account; - struct task_struct *move_lock_task; - /* List of events which userspace want to receive */ struct list_head event_list; spinlock_t event_list_lock; @@ -428,9 +418,7 @@ static inline struct obj_cgroup *__folio_objcg(struct f= olio *folio) * * - the folio lock * - LRU isolation - * - folio_memcg_lock() * - exclusive reference - * - mem_cgroup_trylock_pages() * * For a kmem folio a caller should hold an rcu read lock to protect memcg * associated with a kmem folio from being released. @@ -499,9 +487,7 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct= folio *folio) * * - the folio lock * - LRU isolation - * - lock_folio_memcg() * - exclusive reference - * - mem_cgroup_trylock_pages() * * For a kmem folio a caller should hold an rcu read lock to protect memcg * associated with a kmem folio from being released. @@ -1873,26 +1859,6 @@ static inline bool task_in_memcg_oom(struct task_str= uct *p) return p->memcg_in_oom; } =20 -void folio_memcg_lock(struct folio *folio); -void folio_memcg_unlock(struct folio *folio); - -/* try to stablize folio_memcg() for all the pages in a memcg */ -static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg) -{ - rcu_read_lock(); - - if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account)) - return true; - - rcu_read_unlock(); - return false; -} - -static inline void mem_cgroup_unlock_pages(void) -{ - rcu_read_unlock(); -} - static inline void mem_cgroup_enter_user_fault(void) { WARN_ON(current->in_user_fault); @@ -1914,26 +1880,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *p= gdat, int order, return 0; } =20 -static inline void folio_memcg_lock(struct folio *folio) -{ -} - -static inline void folio_memcg_unlock(struct folio *folio) -{ -} - -static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg) -{ - /* to match folio_memcg_rcu() */ - rcu_read_lock(); - return true; -} - -static inline void mem_cgroup_unlock_pages(void) -{ - rcu_read_unlock(); -} - static inline bool task_in_memcg_oom(struct task_struct *p) { return false; diff --git a/mm/filemap.c b/mm/filemap.c index 630a1c431ea1..e582a1545d2a 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -119,7 +119,6 @@ * ->i_pages lock (folio_remove_rmap_pte->set_page_dirty) * bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty) * ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty) - * ->memcg->move_lock (folio_remove_rmap_pte->folio_memcg_lock) * bdi.wb->list_lock (zap_pte_range->set_page_dirty) * ->inode->i_lock (zap_pte_range->set_page_dirty) * ->private_lock (zap_pte_range->block_dirty_folio) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 79339cb65b9d..b14f01a93d0c 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -401,87 +401,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgd= at, int order, return nr_reclaimed; } =20 -/** - * folio_memcg_lock - Bind a folio to its memcg. - * @folio: The folio. - * - * This function prevents unlocked LRU folios from being moved to - * another cgroup. - * - * It ensures lifetime of the bound memcg. The caller is responsible - * for the lifetime of the folio. - */ -void folio_memcg_lock(struct folio *folio) -{ - struct mem_cgroup *memcg; - unsigned long flags; - - /* - * The RCU lock is held throughout the transaction. The fast - * path can get away without acquiring the memcg->move_lock - * because page moving starts with an RCU grace period. - */ - rcu_read_lock(); - - if (mem_cgroup_disabled()) - return; -again: - memcg =3D folio_memcg(folio); - if (unlikely(!memcg)) - return; - -#ifdef CONFIG_PROVE_LOCKING - local_irq_save(flags); - might_lock(&memcg->move_lock); - local_irq_restore(flags); -#endif - - if (atomic_read(&memcg->moving_account) <=3D 0) - return; - - spin_lock_irqsave(&memcg->move_lock, flags); - if (memcg !=3D folio_memcg(folio)) { - spin_unlock_irqrestore(&memcg->move_lock, flags); - goto again; - } - - /* - * When charge migration first begins, we can have multiple - * critical sections holding the fast-path RCU lock and one - * holding the slowpath move_lock. Track the task who has the - * move_lock for folio_memcg_unlock(). - */ - memcg->move_lock_task =3D current; - memcg->move_lock_flags =3D flags; -} - -static void __folio_memcg_unlock(struct mem_cgroup *memcg) -{ - if (memcg && memcg->move_lock_task =3D=3D current) { - unsigned long flags =3D memcg->move_lock_flags; - - memcg->move_lock_task =3D NULL; - memcg->move_lock_flags =3D 0; - - spin_unlock_irqrestore(&memcg->move_lock, flags); - } - - rcu_read_unlock(); -} - -/** - * folio_memcg_unlock - Release the binding between a folio and its memcg. - * @folio: The folio. - * - * This releases the binding created by folio_memcg_lock(). This does - * not change the accounting of this folio to its memcg, but it does - * permit others to change it. - */ -void folio_memcg_unlock(struct folio *folio) -{ - __folio_memcg_unlock(folio_memcg(folio)); -} - static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, struct cftype *cft) { @@ -1187,7 +1106,6 @@ void memcg1_memcg_init(struct mem_cgroup *memcg) { INIT_LIST_HEAD(&memcg->oom_notify); mutex_init(&memcg->thresholds_lock); - spin_lock_init(&memcg->move_lock); INIT_LIST_HEAD(&memcg->event_list); spin_lock_init(&memcg->event_list_lock); } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 94279b9c766a..3c223aaeb6af 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1189,7 +1189,6 @@ void lruvec_memcg_debug(struct lruvec *lruvec, struct= folio *folio) * These functions are safe to use under any of the following conditions: * - folio locked * - folio_test_lru false - * - folio_memcg_lock() * - folio frozen (refcount of 0) * * Return: The lruvec this folio is on with its lock held. @@ -1211,7 +1210,6 @@ struct lruvec *folio_lruvec_lock(struct folio *folio) * These functions are safe to use under any of the following conditions: * - folio locked * - folio_test_lru false - * - folio_memcg_lock() * - folio frozen (refcount of 0) * * Return: The lruvec this folio is on with its lock held and interrupts @@ -1235,7 +1233,6 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *fo= lio) * These functions are safe to use under any of the following conditions: * - folio locked * - folio_test_lru false - * - folio_memcg_lock() * - folio frozen (refcount of 0) * * Return: The lruvec this folio is on with its lock held and interrupts @@ -2375,9 +2372,7 @@ static void commit_charge(struct folio *folio, struct= mem_cgroup *memcg) * * - the page lock * - LRU isolation - * - folio_memcg_lock() * - exclusive reference - * - mem_cgroup_trylock_pages() */ folio->memcg_data =3D (unsigned long)memcg; } diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 1d7179aba8e3..e33727dd6b47 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2743,8 +2743,6 @@ EXPORT_SYMBOL(noop_dirty_folio); /* * Helper function for set_page_dirty family. * - * Caller must hold folio_memcg_lock(). - * * NOTE: This relies on being atomic wrt interrupts. */ static void folio_account_dirtied(struct folio *folio, @@ -2776,8 +2774,6 @@ static void folio_account_dirtied(struct folio *folio, =20 /* * Helper function for deaccounting dirty page without writeback. - * - * Caller must hold folio_memcg_lock(). */ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb) { @@ -2795,9 +2791,8 @@ void folio_account_cleaned(struct folio *folio, struc= t bdi_writeback *wb) * If warn is true, then emit a warning if the folio is not uptodate and h= as * not been truncated. * - * The caller must hold folio_memcg_lock(). It is the caller's - * responsibility to prevent the folio from being truncated while - * this function is in progress, although it may have been truncated + * It is the caller's responsibility to prevent the folio from being trunc= ated + * while this function is in progress, although it may have been truncated * before this function is called. Most callers have the folio locked. * A few have the folio blocked from truncation through other means (e.g. * zap_vma_pages() has it mapped and is holding the page table lock). @@ -2841,14 +2836,10 @@ void __folio_mark_dirty(struct folio *folio, struct= address_space *mapping, */ bool filemap_dirty_folio(struct address_space *mapping, struct folio *foli= o) { - folio_memcg_lock(folio); - if (folio_test_set_dirty(folio)) { - folio_memcg_unlock(folio); + if (folio_test_set_dirty(folio)) return false; - } =20 __folio_mark_dirty(folio, mapping, !folio_test_private(folio)); - folio_memcg_unlock(folio); =20 if (mapping->host) { /* !PageAnon && !swapper_space */ @@ -2975,14 +2966,12 @@ void __folio_cancel_dirty(struct folio *folio) struct bdi_writeback *wb; struct wb_lock_cookie cookie =3D {}; =20 - folio_memcg_lock(folio); wb =3D unlocked_inode_to_wb_begin(inode, &cookie); =20 if (folio_test_clear_dirty(folio)) folio_account_cleaned(folio, wb); =20 unlocked_inode_to_wb_end(inode, &cookie); - folio_memcg_unlock(folio); } else { folio_clear_dirty(folio); } @@ -3093,7 +3082,6 @@ bool __folio_end_writeback(struct folio *folio) struct address_space *mapping =3D folio_mapping(folio); bool ret; =20 - folio_memcg_lock(folio); if (mapping && mapping_use_writeback_tags(mapping)) { struct inode *inode =3D mapping->host; struct backing_dev_info *bdi =3D inode_to_bdi(inode); @@ -3124,7 +3112,6 @@ bool __folio_end_writeback(struct folio *folio) lruvec_stat_mod_folio(folio, NR_WRITEBACK, -nr); zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr); node_stat_mod_folio(folio, NR_WRITTEN, nr); - folio_memcg_unlock(folio); =20 return ret; } @@ -3137,7 +3124,6 @@ void __folio_start_writeback(struct folio *folio, boo= l keep_write) =20 VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio); =20 - folio_memcg_lock(folio); if (mapping && mapping_use_writeback_tags(mapping)) { XA_STATE(xas, &mapping->i_pages, folio_index(folio)); struct inode *inode =3D mapping->host; @@ -3178,7 +3164,6 @@ void __folio_start_writeback(struct folio *folio, boo= l keep_write) =20 lruvec_stat_mod_folio(folio, NR_WRITEBACK, nr); zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr); - folio_memcg_unlock(folio); =20 access_ret =3D arch_make_folio_accessible(folio); /* diff --git a/mm/rmap.c b/mm/rmap.c index 4785a693857a..c6c4d4ea29a7 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -32,7 +32,6 @@ * swap_lock (in swap_duplicate, swap_info_get) * mmlist_lock (in mmput, drain_mmlist and others) * mapping->private_lock (in block_dirty_folio) - * folio_lock_memcg move_lock (in block_dirty_folio) * i_pages lock (widely used) * lruvec->lru_lock (in folio_lruvec_lock_irq) * inode->i_lock (in set_page_dirty's __mark_inode_dir= ty) diff --git a/mm/vmscan.c b/mm/vmscan.c index 29c098790b01..fd7171658b63 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3662,10 +3662,6 @@ static void walk_mm(struct mm_struct *mm, struct lru= _gen_mm_walk *walk) if (walk->seq !=3D max_seq) break; =20 - /* folio_update_gen() requires stable folio_memcg() */ - if (!mem_cgroup_trylock_pages(memcg)) - break; - /* the caller might be holding the lock for write */ if (mmap_read_trylock(mm)) { err =3D walk_page_range(mm, walk->next_addr, ULONG_MAX, &mm_walk_ops, w= alk); @@ -3673,8 +3669,6 @@ static void walk_mm(struct mm_struct *mm, struct lru_= gen_mm_walk *walk) mmap_read_unlock(mm); } =20 - mem_cgroup_unlock_pages(); - if (walk->batched) { spin_lock_irq(&lruvec->lru_lock); reset_batch_size(walk); @@ -4096,10 +4090,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk= *pvmw) } } =20 - /* folio_update_gen() requires stable folio_memcg() */ - if (!mem_cgroup_trylock_pages(memcg)) - return true; - arch_enter_lazy_mmu_mode(); =20 pte -=3D (addr - start) / PAGE_SIZE; @@ -4144,7 +4134,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk = *pvmw) } =20 arch_leave_lazy_mmu_mode(); - mem_cgroup_unlock_pages(); =20 /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) --=20 2.43.5