From: Johannes Weiner
To: Andrew Morton
Cc: Linus Torvalds, Hugh Dickins, Shakeel Butt, Michal Hocko,
    linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/3] mm: memcontrol: skip moving non-present pages that are mapped elsewhere
Date: Tue, 6 Dec 2022 18:13:39 +0100
Message-Id: <20221206171340.139790-2-hannes@cmpxchg.org>
In-Reply-To: <20221206171340.139790-1-hannes@cmpxchg.org>
References: <20221206171340.139790-1-hannes@cmpxchg.org>

During charge moving, the pte lock and the page lock cover nearly all
cases of stabilizing page_mapped(). The only exception is when we're
looking at a non-present pte and find a page in the page cache or in the
swapcache: if the page is mapped elsewhere, it can become unmapped
outside of our control. For this reason, rmap needs lock_page_memcg().

We don't like cgroup-specific locks in generic MM code - especially in
performance-critical MM code - and for a legacy feature that's unlikely
to have many users left - if any.

So remove the exception. Arguably that's better semantics anyway: the
page is shared, and another process seems to be the more active user.

Once we stop moving such pages, rmap doesn't need lock_page_memcg()
anymore. The next patch will remove it.

Suggested-by: Hugh Dickins
Signed-off-by: Johannes Weiner
Acked-by: Hugh Dickins
Acked-by: Michal Hocko
Acked-by: Shakeel Butt
---
 mm/memcontrol.c | 52 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 48c44229cf47..b696354c1b21 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5681,7 +5681,7 @@ static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
  * @from: mem_cgroup which the page is moved from.
  * @to: mem_cgroup which the page is moved to. @from != @to.
  *
- * The caller must make sure the page is not on LRU (isolate_page() is useful.)
+ * The page must be locked and not on the LRU.
  *
  * This function doesn't do "charge" to new cgroup and doesn't do "uncharge"
  * from old cgroup.
@@ -5698,20 +5698,13 @@ static int mem_cgroup_move_account(struct page *page,
         int nid, ret;
 
         VM_BUG_ON(from == to);
+        VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
         VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
         VM_BUG_ON(compound && !folio_test_large(folio));
 
-        /*
-         * Prevent mem_cgroup_migrate() from looking at
-         * page's memory cgroup of its source page while we change it.
-         */
-        ret = -EBUSY;
-        if (!folio_trylock(folio))
-                goto out;
-
         ret = -EINVAL;
         if (folio_memcg(folio) != from)
-                goto out_unlock;
+                goto out;
 
         pgdat = folio_pgdat(folio);
         from_vec = mem_cgroup_lruvec(from, pgdat);
@@ -5798,8 +5791,6 @@ static int mem_cgroup_move_account(struct page *page,
         mem_cgroup_charge_statistics(from, -nr_pages);
         memcg_check_events(from, nid);
         local_irq_enable();
-out_unlock:
-        folio_unlock(folio);
 out:
         return ret;
 }
@@ -5848,6 +5839,29 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
         else if (is_swap_pte(ptent))
                 page = mc_handle_swap_pte(vma, ptent, &ent);
 
+        if (target && page) {
+                if (!trylock_page(page)) {
+                        put_page(page);
+                        return ret;
+                }
+                /*
+                 * page_mapped() must be stable during the move. This
+                 * pte is locked, so if it's present, the page cannot
+                 * become unmapped. If it isn't, we have only partial
+                 * control over the mapped state: the page lock will
+                 * prevent new faults against pagecache and swapcache,
+                 * so an unmapped page cannot become mapped. However,
+                 * if the page is already mapped elsewhere, it can
+                 * unmap, and there is nothing we can do about it.
+                 * Alas, skip moving the page in this case.
+                 */
+                if (!pte_present(ptent) && page_mapped(page)) {
+                        unlock_page(page);
+                        put_page(page);
+                        return ret;
+                }
+        }
+
         if (!page && !ent.val)
                 return ret;
         if (page) {
@@ -5864,8 +5878,11 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
                         if (target)
                                 target->page = page;
                 }
-                if (!ret || !target)
+                if (!ret || !target) {
+                        if (target)
+                                unlock_page(page);
                         put_page(page);
+                }
         }
         /*
          * There is a swap entry and a page doesn't exist or isn't charged.
@@ -5905,6 +5922,10 @@ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
                 ret = MC_TARGET_PAGE;
                 if (target) {
                         get_page(page);
+                        if (!trylock_page(page)) {
+                                put_page(page);
+                                return MC_TARGET_NONE;
+                        }
                         target->page = page;
                 }
         }
@@ -6143,6 +6164,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
                                 }
                                 putback_lru_page(page);
                         }
+                        unlock_page(page);
                         put_page(page);
                 } else if (target_type == MC_TARGET_DEVICE) {
                         page = target.page;
@@ -6151,6 +6173,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
                                 mc.precharge -= HPAGE_PMD_NR;
                                 mc.moved_charge += HPAGE_PMD_NR;
                         }
+                        unlock_page(page);
                         put_page(page);
                 }
                 spin_unlock(ptl);
@@ -6193,7 +6216,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
                         }
                         if (!device)
                                 putback_lru_page(page);
-put:                    /* get_mctgt_type() gets the page */
+put:                    /* get_mctgt_type() gets & locks the page */
+                        unlock_page(page);
                         put_page(page);
                         break;
                 case MC_TARGET_SWAP:
-- 
2.38.1
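
The rule the patch above adds to get_mctgt_type() - take the page lock
with a non-blocking trylock and skip any page whose mapped state cannot
be pinned down - is an instance of a general trylock-and-skip pattern.
As a rough userspace analogy (illustrative only: the struct, the field
names and the single-process setup below are assumptions, not kernel
code), a mover walks a set of objects, trylocks each one, and only acts
on those whose state it can stabilize:

/*
 * Userspace analogy of the trylock-and-skip pattern: act only on
 * objects whose lock can be taken without blocking and whose state
 * can be stabilized; leave everything else in place.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct object {
        pthread_mutex_t lock;
        bool mapped_elsewhere;  /* state the mover cannot control */
        int charge;
};

int main(void)
{
        struct object objs[3] = {
                { PTHREAD_MUTEX_INITIALIZER, false, 10 },
                { PTHREAD_MUTEX_INITIALIZER, true,  20 },  /* unstable */
                { PTHREAD_MUTEX_INITIALIZER, false, 30 },  /* contended */
        };
        int moved = 0;

        /* Simulate another user holding objs[2]'s lock. */
        pthread_mutex_lock(&objs[2].lock);

        for (int i = 0; i < 3; i++) {
                if (pthread_mutex_trylock(&objs[i].lock) != 0) {
                        printf("object %d: lock contended, skipped\n", i);
                        continue;       /* never block; leave it behind */
                }
                if (objs[i].mapped_elsewhere) {
                        /* mapped state cannot be stabilized: skip */
                        printf("object %d: unstable, skipped\n", i);
                } else {
                        moved += objs[i].charge;        /* "move" the charge */
                        objs[i].charge = 0;
                        printf("object %d: moved\n", i);
                }
                pthread_mutex_unlock(&objs[i].lock);
        }
        printf("total charge moved: %d\n", moved);
        pthread_mutex_unlock(&objs[2].lock);
        return 0;
}

As in the kernel change, the mover never blocks other users of an
object and only ever operates on state it has actually locked down;
contended or unstable objects are simply left where they are.
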
From: Johannes Weiner
To: Andrew Morton
Cc: Linus Torvalds, Hugh Dickins, Shakeel Butt, Michal Hocko,
    linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 2/3] mm: rmap: remove lock_page_memcg()
Date: Tue, 6 Dec 2022 18:13:40 +0100
Message-Id: <20221206171340.139790-3-hannes@cmpxchg.org>
In-Reply-To: <20221206171340.139790-1-hannes@cmpxchg.org>
References: <20221206171340.139790-1-hannes@cmpxchg.org>

The previous patch made sure charge moving only touches pages for
which page_mapped() is stable. lock_page_memcg() is no longer needed.

Signed-off-by: Johannes Weiner
Acked-by: Hugh Dickins
Acked-by: Michal Hocko
Acked-by: Shakeel Butt
---
 mm/rmap.c | 26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index b616870a09be..32e48b1c5847 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1222,9 +1222,6 @@ void page_add_anon_rmap(struct page *page,
         bool compound = flags & RMAP_COMPOUND;
         bool first = true;
 
-        if (unlikely(PageKsm(page)))
-                lock_page_memcg(page);
-
         /* Is page being mapped by PTE? Is this its first map to be added? */
         if (likely(!compound)) {
                 first = atomic_inc_and_test(&page->_mapcount);
@@ -1262,15 +1259,14 @@ void page_add_anon_rmap(struct page *page,
         if (nr)
                 __mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
 
-        if (unlikely(PageKsm(page)))
-                unlock_page_memcg(page);
-
-        /* address might be in next vma when migration races vma_adjust */
-        else if (first)
-                __page_set_anon_rmap(page, vma, address,
-                                     !!(flags & RMAP_EXCLUSIVE));
-        else
-                __page_check_anon_rmap(page, vma, address);
+        if (likely(!PageKsm(page))) {
+                /* address might be in next vma when migration races vma_adjust */
+                if (first)
+                        __page_set_anon_rmap(page, vma, address,
+                                             !!(flags & RMAP_EXCLUSIVE));
+                else
+                        __page_check_anon_rmap(page, vma, address);
+        }
 
         mlock_vma_page(page, vma, compound);
 }
@@ -1329,7 +1325,6 @@ void page_add_file_rmap(struct page *page,
         bool first;
 
         VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
-        lock_page_memcg(page);
 
         /* Is page being mapped by PTE? Is this its first map to be added? */
         if (likely(!compound)) {
@@ -1365,7 +1360,6 @@ void page_add_file_rmap(struct page *page,
                                         NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
         if (nr)
                 __mod_lruvec_page_state(page, NR_FILE_MAPPED, nr);
-        unlock_page_memcg(page);
 
         mlock_vma_page(page, vma, compound);
 }
@@ -1394,8 +1388,6 @@ void page_remove_rmap(struct page *page,
                 return;
         }
 
-        lock_page_memcg(page);
-
         /* Is page being unmapped by PTE? Is this its last map to be removed? */
         if (likely(!compound)) {
                 last = atomic_add_negative(-1, &page->_mapcount);
@@ -1451,8 +1443,6 @@ void page_remove_rmap(struct page *page,
          * and remember that it's only reliable while mapped.
          */
 
-        unlock_page_memcg(page);
-
         munlock_vma_page(page, vma, compound);
 }
 
-- 
2.38.1

From: Johannes Weiner
To: Andrew Morton
Cc: Linus Torvalds, Hugh Dickins, Shakeel Butt, Michal Hocko,
    linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 3/3] mm: memcontrol: deprecate charge moving
Date: Tue, 6 Dec 2022 18:13:41 +0100
Message-Id: <20221206171340.139790-4-hannes@cmpxchg.org>
In-Reply-To: <20221206171340.139790-1-hannes@cmpxchg.org>
References: <20221206171340.139790-1-hannes@cmpxchg.org>

Charge moving mode in cgroup1 allows memory to follow tasks as they
migrate between cgroups. This is, and always has been, a questionable
thing to do - for several reasons.

First, it's expensive. Pages need to be identified, locked and isolated
from various MM operations, and reassigned, one by one.

Second, it's unreliable. Once pages are charged to a cgroup, there isn't
always a clear owner task anymore. Cache isn't moved at all, for
example. Mapped memory is moved - but if trylocking or isolating a page
fails, it's arbitrarily left behind. Frequent moving between domains may
leave a task's memory scattered all over the place.

Third, it isn't really needed. Launcher tasks can kick off workload
tasks directly in their target cgroup. Using dedicated per-workload
groups allows fine-grained policy adjustments - no need to move tasks
and their physical pages between control domains.

The feature was never forward-ported to cgroup2, and it hasn't been
missed.

Despite it being a niche usecase, the maintenance overhead of supporting
it is enormous. Because pages are moved while they are live and subject
to various MM operations, the synchronization rules are complicated.
There are lock_page_memcg() calls in MM and FS code, which non-cgroup
people don't understand. In some cases we've been able to shift code and
cgroup API calls around such that we can rely on native locking as much
as possible. But that's fragile, and sometimes we need to hold MM locks
for longer than we otherwise would (pte lock e.g.).

Mark the feature deprecated. Hopefully we can remove it soon.

Signed-off-by: Johannes Weiner
Acked-by: Hugh Dickins
Acked-by: Michal Hocko
Acked-by: Shakeel Butt
---
 Documentation/admin-guide/cgroup-v1/memory.rst | 11 ++++++++++-
 mm/memcontrol.c                                |  4 ++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 60370f2c67b9..87d7877b98ec 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -86,6 +86,8 @@ Brief summary of control files.
  memory.swappiness                   set/show swappiness parameter of vmscan
                                      (See sysctl's vm.swappiness)
  memory.move_charge_at_immigrate     set/show controls of moving charges
+                                     This knob is deprecated and shouldn't be
+                                     used.
  memory.oom_control                  set/show oom controls.
  memory.numa_stat                    show the number of memory usage per numa
                                      node
@@ -717,9 +719,16 @@ Soft limits can be setup by using the following commands (in this example we
 It is recommended to set the soft limit always below the hard limit,
 otherwise the hard limit will take precedence.
 
-8. Move charges at task migration
+8. Move charges at task migration (DEPRECATED!)
 =================================
 
+THIS IS DEPRECATED!
+
+It's expensive and unreliable! It's better practice to launch workload
+tasks directly from inside their target cgroup. Use dedicated workload
+cgroups to allow fine-grained policy adjustments without having to
+move physical pages between control domains.
+
 Users can move charges associated with a task along with task migration, that
 is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
 This feature is not supported in !CONFIG_MMU environments because of lack of
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b696354c1b21..e650a38d9a90 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3919,6 +3919,10 @@ static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
 {
         struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
+        pr_warn_once("Cgroup memory moving is deprecated. "
+                     "Please report your usecase to linux-mm@kvack.org if you "
+                     "depend on this functionality.\n");
+
         if (val & ~MOVE_MASK)
                 return -EINVAL;
 
-- 
2.38.1
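
The alternative recommended in the commit message and in the
documentation hunk above - start workload tasks inside their target
cgroup instead of moving charges after the fact - needs nothing more
from a launcher than writing the child's PID to the destination group's
cgroup.procs file before exec. A minimal userspace sketch follows; the
cgroup mount point, group path and workload command are assumptions for
illustration, and error handling is trimmed to the essentials:

/*
 * Hypothetical launcher sketch: fork a child, attach it to the target
 * cgroup by writing its PID to cgroup.procs, then exec the workload.
 * The default path below assumes a cgroup1 memory hierarchy mounted at
 * /sys/fs/cgroup/memory with a pre-created "workload" group; pass a
 * different directory as argv[1] for other setups (cgroup2 works the
 * same way).
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

static int enter_cgroup(const char *cgroup_dir, pid_t pid)
{
        char path[4096];
        FILE *f;

        snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir);
        f = fopen(path, "w");
        if (!f)
                return -1;
        fprintf(f, "%d\n", (int)pid);
        return fclose(f);
}

int main(int argc, char *argv[])
{
        const char *cgroup_dir = argc > 1 ? argv[1] :
                                 "/sys/fs/cgroup/memory/workload";
        pid_t pid = fork();

        if (pid < 0) {
                perror("fork");
                return 1;
        }
        if (pid == 0) {
                /*
                 * Child: enter the group first, so every subsequent
                 * allocation is charged to the target cgroup.
                 */
                if (enter_cgroup(cgroup_dir, getpid()) != 0) {
                        perror("enter_cgroup");
                        _exit(1);
                }
                execlp("sleep", "sleep", "10", (char *)NULL);
                perror("execlp");
                _exit(1);
        }
        waitpid(pid, NULL, 0);
        return 0;
}

Because the workload never runs outside its destination group, there is
nothing to migrate afterwards, which is what makes
memory.move_charge_at_immigrate safe to retire.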