From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
 hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 01/28] mm: memcontrol: remove dead code of checking parent memory cgroup
Date: Tue, 15 Apr 2025 10:45:05 +0800
Message-Id: <20250415024532.26632-2-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The no-hierarchy mode has been deprecated since commit bef8620cd8e0
("mm: memcg: deprecate the non-hierarchical mode"). As a result,
parent_mem_cgroup() will not return NULL except when passed the root
memcg, and the root memcg cannot be offlined. Hence, it is safe to remove
the check on the value returned by parent_mem_cgroup(). Remove the
corresponding dead code.

Signed-off-by: Muchun Song
Acked-by: Roman Gushchin
Acked-by: Johannes Weiner
---
 mm/memcontrol.c | 5 -----
 mm/shrinker.c   | 6 +-----
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 421740f1bcdc..61488e45cab2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3196,9 +3196,6 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
 		return;
 
 	parent = parent_mem_cgroup(memcg);
-	if (!parent)
-		parent = root_mem_cgroup;
-
 	memcg_reparent_list_lrus(memcg, parent);
 
 	/*
@@ -3489,8 +3486,6 @@ struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg)
 			break;
 		}
 		memcg = parent_mem_cgroup(memcg);
-		if (!memcg)
-			memcg = root_mem_cgroup;
 	}
 	return memcg;
 }
diff --git a/mm/shrinker.c b/mm/shrinker.c
index 4a93fd433689..e8e092a2f7f4 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -286,14 +286,10 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
 {
 	int nid, index, offset;
 	long nr;
-	struct mem_cgroup *parent;
+	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
 	struct shrinker_info *child_info, *parent_info;
 	struct shrinker_info_unit *child_unit, *parent_unit;
 
-	parent = parent_mem_cgroup(memcg);
-	if (!parent)
-		parent = root_mem_cgroup;
-
 	/* Prevent from concurrent shrinker_info expand */
 	mutex_lock(&shrinker_mutex);
 	for_each_node(nid) {
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
 hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 02/28] mm: memcontrol: use folio_memcg_charged() to avoid potential rcu lock holding
Date: Tue, 15 Apr 2025 10:45:06 +0800
Message-Id: <20250415024532.26632-3-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>

If a folio isn't charged to the memory cgroup, holding an rcu read lock
is needless. Users only want to know its charge status, so use
folio_memcg_charged() here.

Signed-off-by: Muchun Song
---
 mm/memcontrol.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 61488e45cab2..0fc76d50bc23 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -797,20 +797,17 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 void __lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
 			     int val)
 {
-	struct mem_cgroup *memcg;
 	pg_data_t *pgdat = folio_pgdat(folio);
 	struct lruvec *lruvec;
 
-	rcu_read_lock();
-	memcg = folio_memcg(folio);
-	/* Untracked pages have no memcg, no lruvec. Update only the node */
-	if (!memcg) {
-		rcu_read_unlock();
+	if (!folio_memcg_charged(folio)) {
+		/* Untracked pages have no memcg, no lruvec. Update only the node */
 		__mod_node_page_state(pgdat, idx, val);
 		return;
 	}
 
-	lruvec = mem_cgroup_lruvec(memcg, pgdat);
+	rcu_read_lock();
+	lruvec = mem_cgroup_lruvec(folio_memcg(folio), pgdat);
 	__mod_lruvec_state(lruvec, idx, val);
 	rcu_read_unlock();
 }
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
 hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 03/28] mm: workingset: use folio_lruvec() in workingset_refault()
Date: Tue, 15 Apr 2025 10:45:07 +0800
Message-Id: <20250415024532.26632-4-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>

Use folio_lruvec() to simplify the code.

Signed-off-by: Muchun Song
Acked-by: Johannes Weiner
---
 mm/workingset.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index 4841ae8af411..ebafc0eaafba 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -534,8 +534,6 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset,
 void workingset_refault(struct folio *folio, void *shadow)
 {
 	bool file = folio_is_file_lru(folio);
-	struct pglist_data *pgdat;
-	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 	bool workingset;
 	long nr;
@@ -557,10 +555,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 	 * locked to guarantee folio_memcg() stability throughout.
 	 */
 	nr = folio_nr_pages(folio);
-	memcg = folio_memcg(folio);
-	pgdat = folio_pgdat(folio);
-	lruvec = mem_cgroup_lruvec(memcg, pgdat);
-
+	lruvec = folio_lruvec(folio);
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
 
 	if (!workingset_test_recent(shadow, file, &workingset, true))
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
 hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 04/28] mm: rename unlock_page_lruvec_irq and its variants
Date: Tue, 15 Apr 2025 10:45:08 +0800
Message-Id: <20250415024532.26632-5-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>

It is inappropriate to use folio_lruvec_lock() variants in conjunction with
unlock_page_lruvec() variants, as this involves the inconsistent operation of
locking a folio while unlocking a page. To rectify this, the functions
unlock_page_lruvec{_irq, _irqrestore} are renamed to lruvec_unlock{_irq,
_irqrestore}.

Signed-off-by: Muchun Song
Acked-by: Roman Gushchin
Acked-by: Johannes Weiner
---
 include/linux/memcontrol.h | 10 +++++-----
 mm/compaction.c            | 14 +++++++-------
 mm/huge_memory.c           |  2 +-
 mm/mlock.c                 |  2 +-
 mm/swap.c                  | 12 ++++++------
 mm/vmscan.c                |  4 ++--
 6 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 53364526d877..a045819bcf40 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1510,17 +1510,17 @@ static inline struct lruvec *parent_lruvec(struct lruvec *lruvec)
 	return mem_cgroup_lruvec(memcg, lruvec_pgdat(lruvec));
 }
 
-static inline void unlock_page_lruvec(struct lruvec *lruvec)
+static inline void lruvec_unlock(struct lruvec *lruvec)
 {
 	spin_unlock(&lruvec->lru_lock);
 }
 
-static inline void unlock_page_lruvec_irq(struct lruvec *lruvec)
+static inline void lruvec_unlock_irq(struct lruvec *lruvec)
 {
 	spin_unlock_irq(&lruvec->lru_lock);
 }
 
-static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec,
+static inline void lruvec_unlock_irqrestore(struct lruvec *lruvec,
 		unsigned long flags)
 {
 	spin_unlock_irqrestore(&lruvec->lru_lock, flags);
@@ -1542,7 +1542,7 @@ static inline struct lruvec *folio_lruvec_relock_irq(struct folio *folio,
 		if (folio_matches_lruvec(folio, locked_lruvec))
 			return locked_lruvec;
 
-		unlock_page_lruvec_irq(locked_lruvec);
+		lruvec_unlock_irq(locked_lruvec);
 	}
 
 	return folio_lruvec_lock_irq(folio);
@@ -1556,7 +1556,7 @@ static inline void folio_lruvec_relock_irqsave(struct folio *folio,
 		if (folio_matches_lruvec(folio, *lruvecp))
 			return;
 
-		unlock_page_lruvec_irqrestore(*lruvecp, *flags);
+		lruvec_unlock_irqrestore(*lruvecp, *flags);
 	}
 
 	*lruvecp = folio_lruvec_lock_irqsave(folio, flags);
diff --git a/mm/compaction.c b/mm/compaction.c
index 139f00c0308a..ce45d633ddad 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -946,7 +946,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		 */
 		if (!(low_pfn % COMPACT_CLUSTER_MAX)) {
 			if (locked) {
-				unlock_page_lruvec_irqrestore(locked, flags);
+				lruvec_unlock_irqrestore(locked, flags);
 				locked = NULL;
 			}
 
@@ -997,7 +997,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			}
 			/* for alloc_contig case */
 			if (locked) {
-				unlock_page_lruvec_irqrestore(locked, flags);
+				lruvec_unlock_irqrestore(locked, flags);
 				locked = NULL;
 			}
 
@@ -1089,7 +1089,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			if (unlikely(__PageMovable(page)) &&
 					!PageIsolated(page)) {
 				if (locked) {
-					unlock_page_lruvec_irqrestore(locked, flags);
+					lruvec_unlock_irqrestore(locked, flags);
 					locked = NULL;
 				}
 
@@ -1194,7 +1194,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		/* If we already hold the lock, we can skip some rechecking */
 		if (lruvec != locked) {
 			if (locked)
-				unlock_page_lruvec_irqrestore(locked, flags);
+				lruvec_unlock_irqrestore(locked, flags);
 
 			compact_lock_irqsave(&lruvec->lru_lock, &flags, cc);
 			locked = lruvec;
@@ -1262,7 +1262,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 isolate_fail_put:
 		/* Avoid potential deadlock in freeing page under lru_lock */
 		if (locked) {
-			unlock_page_lruvec_irqrestore(locked, flags);
+			lruvec_unlock_irqrestore(locked, flags);
 			locked = NULL;
 		}
 		folio_put(folio);
@@ -1278,7 +1278,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		 */
 		if (nr_isolated) {
 			if (locked) {
-				unlock_page_lruvec_irqrestore(locked, flags);
+				lruvec_unlock_irqrestore(locked, flags);
 				locked = NULL;
 			}
 			putback_movable_pages(&cc->migratepages);
@@ -1310,7 +1310,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 isolate_abort:
 	if (locked)
-		unlock_page_lruvec_irqrestore(locked, flags);
+		lruvec_unlock_irqrestore(locked, flags);
 	if (folio) {
 		folio_set_lru(folio);
 		folio_put(folio);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2a47682d1ab7..df66aa4bc4c2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3605,7 +3605,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 	folio_ref_unfreeze(origin_folio, 1 +
 		((mapping || swap_cache) ? folio_nr_pages(origin_folio) : 0));
 
-	unlock_page_lruvec(lruvec);
+	lruvec_unlock(lruvec);
 
 	if (swap_cache)
 		xa_unlock(&swap_cache->i_pages);
diff --git a/mm/mlock.c b/mm/mlock.c
index 3cb72b579ffd..86cad963edb7 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -205,7 +205,7 @@ static void mlock_folio_batch(struct folio_batch *fbatch)
 	}
 
 	if (lruvec)
-		unlock_page_lruvec_irq(lruvec);
+		lruvec_unlock_irq(lruvec);
 	folios_put(fbatch);
 }
 
diff --git a/mm/swap.c b/mm/swap.c
index 77b2d5997873..ee19e171857d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -91,7 +91,7 @@ static void page_cache_release(struct folio *folio)
 
 	__page_cache_release(folio, &lruvec, &flags);
 	if (lruvec)
-		unlock_page_lruvec_irqrestore(lruvec, flags);
+		lruvec_unlock_irqrestore(lruvec, flags);
 }
 
 void __folio_put(struct folio *folio)
@@ -171,7 +171,7 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 	}
 
 	if (lruvec)
-		unlock_page_lruvec_irqrestore(lruvec, flags);
+		lruvec_unlock_irqrestore(lruvec, flags);
 	folios_put(fbatch);
 }
 
@@ -343,7 +343,7 @@ void folio_activate(struct folio *folio)
 
 	lruvec = folio_lruvec_lock_irq(folio);
 	lru_activate(lruvec, folio);
-	unlock_page_lruvec_irq(lruvec);
+	lruvec_unlock_irq(lruvec);
 	folio_set_lru(folio);
 }
 #endif
@@ -953,7 +953,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 
 		if (folio_is_zone_device(folio)) {
 			if (lruvec) {
-				unlock_page_lruvec_irqrestore(lruvec, flags);
+				lruvec_unlock_irqrestore(lruvec, flags);
 				lruvec = NULL;
 			}
 			if (folio_ref_sub_and_test(folio, nr_refs))
@@ -967,7 +967,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 		/* hugetlb has its own memcg */
 		if (folio_test_hugetlb(folio)) {
 			if (lruvec) {
-				unlock_page_lruvec_irqrestore(lruvec, flags);
+				lruvec_unlock_irqrestore(lruvec, flags);
 				lruvec = NULL;
 			}
 			free_huge_folio(folio);
@@ -981,7 +981,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 		j++;
 	}
 	if (lruvec)
-		unlock_page_lruvec_irqrestore(lruvec, flags);
+		lruvec_unlock_irqrestore(lruvec, flags);
 	if (!j) {
 		folio_batch_reinit(folios);
 		return;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..a76b3cee043d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1847,7 +1847,7 @@ bool folio_isolate_lru(struct folio *folio)
 		folio_get(folio);
 		lruvec = folio_lruvec_lock_irq(folio);
 		lruvec_del_folio(lruvec, folio);
-		unlock_page_lruvec_irq(lruvec);
+		lruvec_unlock_irq(lruvec);
 		ret = true;
 	}
 
@@ -7681,7 +7681,7 @@ void check_move_unevictable_folios(struct folio_batch *fbatch)
 	if (lruvec) {
 		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
 		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
-		unlock_page_lruvec_irq(lruvec);
+		lruvec_unlock_irq(lruvec);
 	} else if (pgscanned) {
 		count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
 	}
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com [209.85.214.169]) (using TLSv1.2
with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 41805210F49 for ; Tue, 15 Apr 2025 02:46:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685184; cv=none; b=gVWVU2Raok9g1almwidKIzaAM0Lb9C3Sqm5YPMhJOiCoYcXgfE5QHXSs2ZAkz1CVsz6MinEqmpqJj0me++Ymnq0BA2DAd0WFwGm7ilIl7OYPgT6T0sGsL9f3Rf+LuFuvmxcCRI7qMaEzNyocglv5Ei7V97i/oa9zE5K0BQYnw1I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685184; c=relaxed/simple; bh=hQOJab4kVrFQvrssWHzPZOIBnHti4kboAWzr5RL2/NE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=hyIlJeUmaLDCrID+In5YMwVbl945yeKHDzqHdsuKNEvMRbcAVtuGMMIGg/gcUNTZwVALUClthtBSzjau4yLsAiHTu/Wheaq4dUyLMCwNB6977Uoi0dHkSPeR5UuAlwCv2zY0CPjuSc+egjkRKaTgXOIKK5K6sUIHfNrc1V34q4Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=iPUM5rEr; arc=none smtp.client-ip=209.85.214.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="iPUM5rEr" Received: by mail-pl1-f169.google.com with SMTP id d9443c01a7336-2240b4de12bso68172205ad.2 for ; Mon, 14 Apr 2025 19:46:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1744685182; x=1745289982; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date 
:message-id:reply-to; bh=HknBZVJEEwldsJF0essNl26exDom/3YV3flikzCNEqI=; b=iPUM5rEr/GGKMpqJ3O3V/CgT/8xGPNFm2D9Dk5Lbl1S/JDn2b0Vrqrrupmn6ScBgUc OQAn+tNj965RFcbbOSd4/0kXpvNZPsECiPL3P0DSPLTgsYDHTGC4CdXhmMwiSpq+ngEl fxT+GlQguoMrEnDoDZidaIidCX0Ep2qbB2eZEgOB8p25eIQu35U0TfrgBTA5IlgJPlna vzAIKjRLlUTGzqYm3sDpspbZ5E0S+yytWuhm4ixC/7ev9nZuQMDdFFquok8F7nM1wPNG uyKoh+MFLq80mCig10xfzbILSxSRsCeoo4LUIy+fPV05O4agQzsex/S1h0A9t6NZPaBm rd/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744685182; x=1745289982; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=HknBZVJEEwldsJF0essNl26exDom/3YV3flikzCNEqI=; b=h6f4u4swVJuz5SaCAolL6CrI4K0IwioncyP/ZQpgpQHHR2dPV817oWjJt+BE3J7XsA HfIh17KclkTKYT957Cb0ILhKetaclu493H1XeMPvAgjYM1z9XXXPJqw+ZuYBD4iH8u9B wWwDWpHCJt4SsIwpIGC/BAkwe3v3EcKjrxr+rwnsti5CE/ClP5acbxvKUNlbHpW1JOGL UEtEeRe3/qtqMBN+7T3ZpNufODxyP9sw67qcRMe3fFwYvSBxW+krYd6YZ822a/c2CTDb XnIWnmE0ci8dRVkDKX9zvyG4DW1hCwb20Fd8GWL8Y48f8bwqtQeaRkW0QZtUEOQyp8Kk cqlA== X-Gm-Message-State: AOJu0YzpJ/x9eov6NIT3Qw/wz9ohkEQ6ParYT8NeuFlHfz/aLE2hIZq+ RbK+FrV0gh9Agx3GM6y3snVHshf8FkGRRPhIK2NPbLpRFtb1qpEN1f4GJbFpMj4= X-Gm-Gg: ASbGncuAf/2fTu5GCmjD7U5EEFxaDvrLoRC5DDSYu7x3xCEbgePhzT3bnxBEbdj9Typ X0klMXtW4UtmahLaKJS9WRkzwHfnG5XNfgus2A/dhNh4RjWWzeCaoVPqSgn+eVi6XhHozxu+WHE xZCtdiVhtYRzfOobewVzS8gG/QguxPScVIewaAo+CZjCLJxXdUjhfoomyS9FmeM2Kl8q0mBvwYQ hJcCaMVOxY/UVse46zTVXZmvtRH0GuRLr1W3GyCR8c14gCo76K6njcJVJZ0hTOCXNuXzdiB9B4f 5PNF/7DXMKaE6acUGxM1i7mZt4C7bQ1Y6ooEl1DOmhTytyWlkvQ1VGUfF1OIyW6FkrzEHA40 X-Google-Smtp-Source: AGHT+IF9USZi2XhF57yhEEp9osrqi/g5DdA8y8Taoi4iL31Fzh+/3YgFBVfEBC6Bksa7uUzxyxne4g== X-Received: by 2002:a17:903:3c44:b0:223:f408:c3e2 with SMTP id d9443c01a7336-22bea4b6136mr205639305ad.14.1744685182592; Mon, 14 Apr 2025 19:46:22 -0700 (PDT) Received: from PXLDJ45XCM.bytedance.net ([61.213.176.5]) by smtp.gmail.com with ESMTPSA id 
d9443c01a7336-22ac7ccac49sm106681185ad.217.2025.04.14.19.46.17 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 14 Apr 2025 19:46:22 -0700 (PDT) From: Muchun Song To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song Subject: [PATCH RFC 05/28] mm: thp: replace folio_memcg() with folio_memcg_charged() Date: Tue, 15 Apr 2025 10:45:09 +0800 Message-Id: <20250415024532.26632-6-songmuchun@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com> References: <20250415024532.26632-1-songmuchun@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" folio_memcg_charged() is intended for use when the user is unconcerned about the returned memcg pointer. It is more efficient than folio_memcg(). Therefore, replace folio_memcg() with folio_memcg_charged(). 
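As a rough illustration of why a "charged?" predicate can be cheaper than fetching the memcg pointer, here is a minimal user-space sketch. Every name in it (model_folio, model_memcg, model_objcg, the tag bits) is hypothetical and only models the idea; it is not the kernel's definition of folio_memcg() or folio_memcg_charged().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; these are NOT the kernel's definitions. */
struct model_memcg { int id; };
struct model_objcg { struct model_memcg *memcg; };

#define MODEL_DATA_KMEM  0x1UL	/* assumed tag: value points at an objcg */
#define MODEL_FLAGS_MASK 0x3UL	/* assumed: low two bits carry tags */

/* The tag bits are only free if the pointed-to types are 4-byte aligned. */
static_assert(_Alignof(struct model_memcg) >= 4, "low two bits must be free");
static_assert(_Alignof(struct model_objcg) >= 4, "low two bits must be free");

struct model_folio { uintptr_t memcg_data; };

/* Cheap predicate: one load and one compare, no pointer decoding. */
static inline bool model_folio_charged(const struct model_folio *folio)
{
	return folio->memcg_data != 0;
}

/* Full lookup: strip the tag bits and possibly follow one more pointer. */
static inline struct model_memcg *
model_folio_memcg(const struct model_folio *folio)
{
	uintptr_t ptr = folio->memcg_data & ~MODEL_FLAGS_MASK;

	if (folio->memcg_data & MODEL_DATA_KMEM)
		return ptr ? ((struct model_objcg *)ptr)->memcg : NULL;
	return (struct model_memcg *)ptr;
}
```

A caller that only asserts "this folio is charged", as the WARN_ON_ONCE() in this patch does, needs nothing beyond the predicate.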
Signed-off-by: Muchun Song Acked-by: Johannes Weiner --- mm/huge_memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index df66aa4bc4c2..a81e89987ca2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -4048,7 +4048,7 @@ bool __folio_unqueue_deferred_split(struct folio *fol= io) bool unqueued =3D false; =20 WARN_ON_ONCE(folio_ref_count(folio)); - WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg(folio)); + WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio)); =20 ds_queue =3D get_deferred_split_queue(folio); spin_lock_irqsave(&ds_queue->split_queue_lock, flags); --=20 2.20.1 From nobody Fri Dec 19 16:05:22 2025 Received: from mail-pl1-f180.google.com (mail-pl1-f180.google.com [209.85.214.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ACAE6212FAB for ; Tue, 15 Apr 2025 02:46:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685190; cv=none; b=KPl/EN9mAEbdHvKkhJB/b+8MixNbUKIXsb5Lt4ujEXcm5Ax7pLnrtWt4tZbmW9PG+tnzjMjEYcfXpR2uRdQ94X1yoCjlMJYXr+IDhwPsxuXIw5ZHaXlJOypd90N/d058v+VJM92NBJsqwZ0xX17Ns2yv8aXKmZDzz7mLCKDSyE8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685190; c=relaxed/simple; bh=KSYsL4yQvpm1WsPFuKmKy8oc/G+fxWcGcjCz7Keh7Sw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=gB2yqsK9h8mKhqz3mq6QRaJpfOkquS7qiQTPoj+2k4P08IPdx23kp0Nb4osfxOSdhMXZGBgOOA+8E5VJdTYA7vp1jlTdZiQE1THy1UNNayXZXeSYcIer5rwa2KuPpsPtsePvqh34csu/GuRhWZNmjf3WS0+nwuOx+x8qtWoYVXk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com 
header.b=e3BqFIzY; arc=none smtp.client-ip=209.85.214.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="e3BqFIzY" Received: by mail-pl1-f180.google.com with SMTP id d9443c01a7336-22928d629faso48589725ad.3 for ; Mon, 14 Apr 2025 19:46:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1744685188; x=1745289988; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=SVQwnJXLTzf5IbKZLnXsEzmufmqfYmZThSJSPJR5bRg=; b=e3BqFIzY01CWcNOElO1Yr2R4D2FKQ+U1QdgGNlKvNqjFd1Xe86t8eEkXlYQu8vGDEq KJgDGrTkb7iRM6bgHh92YcVjzjXvO7pmB9WlJEwrUY4BPLTz1JcCxu2jIsrVSvf5XWC/ 5NiokEcZ+pAiM5etuNhtXzv2TYVB/FIgvt0O9pjGG61Cxl7jSouBCFlRsX8xEY97H2D3 jSxa25Ixfxjf4coBLNoBp1kwTprrp39A/g8pCE0l0fhKq8w3W51d75lGnSNZg+ZPBves 7yEdalLPe6lvUbGtRuHbOEejt/2KUhAGOJ6t6E067mS/Sqt8zaP5Y2fxf7T3sUy9ijUM 9tAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744685188; x=1745289988; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SVQwnJXLTzf5IbKZLnXsEzmufmqfYmZThSJSPJR5bRg=; b=QzGD9FjjAsArZK6eLX5/eAwgn31nUQLZ2LZu5VQGENEztSUu8SfDza8z9pmXd3tyoa SiCCH0PMbeOmhXLLJK4brO0z53f3RbkOeI5B7Yf2B+CA21bY3/DQpaoBT0rmbjpdobG0 q8R/aQUYjEPi6j9IG5wGNuOsj+Au/nCCS+jo7csptECByop7I/mhZ6D6raaeKXS1EHMw u0IVbwGksDNo6xlaXKY+WG3WdiioihJhC42Gn0vplQxFlEUPLIN1GpRC7W72YNhCFhIb vZ590HdZUS5cMpsOYQL4qwyliKdOZ6O1GnNSvPVwbUa0vprF5EE8WBvPvxMHlHLkU2vD PFEw== X-Gm-Message-State: AOJu0YytYHuETkUEVnS0Yq4JIGCQ6sLSwdiZwKBxBFbAlEmSaHJV006e 
gMK8NnzUQYGW4LhRI0e4Q9i9vaUYG7j3zuJDnw3DHdIgDub31KGMFmhgvS/7XFM= X-Gm-Gg: ASbGncuFJOz+YWnhPseD/fO3YyTeGUURgAy+p5G47qSs9xj2sYXVcHJ4LcdKBGxn451 SQ98yLyIsBzvrLI1O+GgwvrTNPIGNFILMMCs8+iS+5nHBuBxoKoFZwYFpr286rilgjqGA1cmAO4 L8gNVcd4KVy/f2Ob5ePOwOfoyjI7h9Wg+73JI4018FxPUUpvKnKLp3dwhpmLUacbS7SNnHU7+OF I0a5Kethza9q77OmeKGdqzlWJgiYtjypICSmBw5F5M3eHDz+3RDQwk/J/ftSHlgIvFohfCz9sbb qgjORlPPJhILbboof4uq9S2PIAmwgjxuDVaHmQvoCkhPReye4U/8yKMIc446G2golPQJi4ex X-Google-Smtp-Source: AGHT+IH0uXKr0KJ6xOw01dclaGRwapxC+cZ57mqT8VHJVXzMXdBqWXQQ8E+M2n4/a2Vb6fIXE8woqw== X-Received: by 2002:a17:903:98b:b0:224:1781:a950 with SMTP id d9443c01a7336-22bea4aba54mr197670155ad.14.1744685187749; Mon, 14 Apr 2025 19:46:27 -0700 (PDT) Received: from PXLDJ45XCM.bytedance.net ([61.213.176.5]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-22ac7ccac49sm106681185ad.217.2025.04.14.19.46.22 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 14 Apr 2025 19:46:27 -0700 (PDT) From: Muchun Song To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song Subject: [PATCH RFC 06/28] mm: thp: introduce folio_split_queue_lock and its variants Date: Tue, 15 Apr 2025 10:45:10 +0800 Message-Id: <20250415024532.26632-7-songmuchun@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com> References: <20250415024532.26632-1-songmuchun@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In future memcg removal, the binding 
between a folio and a memcg may change, making the split lock within the memcg unstable when held. A new approach is required to reparent the split queue to its parent. This patch starts introducing a unified way to acquire the split lock for future work. It's a code-only refactoring with no functional changes. Signed-off-by: Muchun Song Acked-by: Johannes Weiner --- include/linux/memcontrol.h | 10 ++++ mm/huge_memory.c | 100 +++++++++++++++++++++++++++---------- 2 files changed, 83 insertions(+), 27 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index a045819bcf40..bb4f203733f3 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1639,6 +1639,11 @@ int alloc_shrinker_info(struct mem_cgroup *memcg); void free_shrinker_info(struct mem_cgroup *memcg); void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id); void reparent_shrinker_deferred(struct mem_cgroup *memcg); + +static inline int shrinker_id(struct shrinker *shrinker) +{ + return shrinker->id; +} #else #define mem_cgroup_sockets_enabled 0 static inline void mem_cgroup_sk_alloc(struct sock *sk) { }; @@ -1652,6 +1657,11 @@ static inline void set_shrinker_bit(struct mem_cgrou= p *memcg, int nid, int shrinker_id) { } + +static inline int shrinker_id(struct shrinker *shrinker) +{ + return -1; +} #endif =20 #ifdef CONFIG_MEMCG diff --git a/mm/huge_memory.c b/mm/huge_memory.c index a81e89987ca2..70820fa75c1f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1059,26 +1059,75 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_s= truct *vma) =20 #ifdef CONFIG_MEMCG static inline -struct deferred_split *get_deferred_split_queue(struct folio *folio) +struct mem_cgroup *folio_split_queue_memcg(struct folio *folio, + struct deferred_split *queue) +{ + if (mem_cgroup_disabled()) + return NULL; + if (&NODE_DATA(folio_nid(folio))->deferred_split_queue =3D=3D queue) + return NULL; + return container_of(queue, struct mem_cgroup, 
deferred_split_queue); +} + +static inline struct deferred_split *folio_memcg_split_queue(struct folio = *folio) { struct mem_cgroup *memcg =3D folio_memcg(folio); - struct pglist_data *pgdat =3D NODE_DATA(folio_nid(folio)); =20 - if (memcg) - return &memcg->deferred_split_queue; - else - return &pgdat->deferred_split_queue; + return memcg ? &memcg->deferred_split_queue : NULL; } #else static inline -struct deferred_split *get_deferred_split_queue(struct folio *folio) +struct mem_cgroup *folio_split_queue_memcg(struct folio *folio, + struct deferred_split *queue) { - struct pglist_data *pgdat =3D NODE_DATA(folio_nid(folio)); + return NULL; +} =20 - return &pgdat->deferred_split_queue; +static inline struct deferred_split *folio_memcg_split_queue(struct folio = *folio) +{ + return NULL; } #endif =20 +static struct deferred_split *folio_split_queue(struct folio *folio) +{ + struct deferred_split *queue =3D folio_memcg_split_queue(folio); + + return queue ? : &NODE_DATA(folio_nid(folio))->deferred_split_queue; +} + +static struct deferred_split *folio_split_queue_lock(struct folio *folio) +{ + struct deferred_split *queue; + + queue =3D folio_split_queue(folio); + spin_lock(&queue->split_queue_lock); + + return queue; +} + +static struct deferred_split * +folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags) +{ + struct deferred_split *queue; + + queue =3D folio_split_queue(folio); + spin_lock_irqsave(&queue->split_queue_lock, *flags); + + return queue; +} + +static inline void split_queue_unlock(struct deferred_split *queue) +{ + spin_unlock(&queue->split_queue_lock); +} + +static inline void split_queue_unlock_irqrestore(struct deferred_split *qu= eue, + unsigned long flags) +{ + spin_unlock_irqrestore(&queue->split_queue_lock, flags); +} + static inline bool is_transparent_hugepage(const struct folio *folio) { if (!folio_test_large(folio)) @@ -3723,7 +3772,7 @@ static int __folio_split(struct folio *folio, unsigne= d int new_order, struct page 
*split_at, struct page *lock_at, struct list_head *list, bool uniform_split) { - struct deferred_split *ds_queue =3D get_deferred_split_queue(folio); + struct deferred_split *ds_queue; XA_STATE(xas, &folio->mapping->i_pages, folio->index); bool is_anon =3D folio_test_anon(folio); struct address_space *mapping =3D NULL; @@ -3857,7 +3906,7 @@ static int __folio_split(struct folio *folio, unsigne= d int new_order, } =20 /* Prevent deferred_split_scan() touching ->_refcount */ - spin_lock(&ds_queue->split_queue_lock); + ds_queue =3D folio_split_queue_lock(folio); if (folio_ref_freeze(folio, 1 + extra_pins)) { if (folio_order(folio) > 1 && !list_empty(&folio->_deferred_list)) { @@ -3875,7 +3924,7 @@ static int __folio_split(struct folio *folio, unsigne= d int new_order, */ list_del_init(&folio->_deferred_list); } - spin_unlock(&ds_queue->split_queue_lock); + split_queue_unlock(ds_queue); if (mapping) { int nr =3D folio_nr_pages(folio); =20 @@ -3896,7 +3945,7 @@ static int __folio_split(struct folio *folio, unsigne= d int new_order, split_at, lock_at, list, end, &xas, mapping, uniform_split); } else { - spin_unlock(&ds_queue->split_queue_lock); + split_queue_unlock(ds_queue); fail: if (mapping) xas_unlock(&xas); @@ -4050,8 +4099,7 @@ bool __folio_unqueue_deferred_split(struct folio *fol= io) WARN_ON_ONCE(folio_ref_count(folio)); WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio)); =20 - ds_queue =3D get_deferred_split_queue(folio); - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); + ds_queue =3D folio_split_queue_lock_irqsave(folio, &flags); if (!list_empty(&folio->_deferred_list)) { ds_queue->split_queue_len--; if (folio_test_partially_mapped(folio)) { @@ -4062,7 +4110,7 @@ bool __folio_unqueue_deferred_split(struct folio *fol= io) list_del_init(&folio->_deferred_list); unqueued =3D true; } - spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + split_queue_unlock_irqrestore(ds_queue, flags); =20 return unqueued; /* useful for debug 
warnings */ } @@ -4070,10 +4118,7 @@ bool __folio_unqueue_deferred_split(struct folio *fo= lio) /* partially_mapped=3Dfalse won't clear PG_partially_mapped folio flag */ void deferred_split_folio(struct folio *folio, bool partially_mapped) { - struct deferred_split *ds_queue =3D get_deferred_split_queue(folio); -#ifdef CONFIG_MEMCG - struct mem_cgroup *memcg =3D folio_memcg(folio); -#endif + struct deferred_split *ds_queue; unsigned long flags; =20 /* @@ -4096,7 +4141,7 @@ void deferred_split_folio(struct folio *folio, bool p= artially_mapped) if (folio_test_swapcache(folio)) return; =20 - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); + ds_queue =3D folio_split_queue_lock_irqsave(folio, &flags); if (partially_mapped) { if (!folio_test_partially_mapped(folio)) { folio_set_partially_mapped(folio); @@ -4111,15 +4156,16 @@ void deferred_split_folio(struct folio *folio, bool= partially_mapped) VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio); } if (list_empty(&folio->_deferred_list)) { + struct mem_cgroup *memcg; + + memcg =3D folio_split_queue_memcg(folio, ds_queue); list_add_tail(&folio->_deferred_list, &ds_queue->split_queue); ds_queue->split_queue_len++; -#ifdef CONFIG_MEMCG if (memcg) set_shrinker_bit(memcg, folio_nid(folio), - deferred_split_shrinker->id); -#endif + shrinker_id(deferred_split_shrinker)); } - spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + split_queue_unlock_irqrestore(ds_queue, flags); } =20 static unsigned long deferred_split_count(struct shrinker *shrink, @@ -4202,7 +4248,7 @@ static unsigned long deferred_split_scan(struct shrin= ker *shrink, if (!--sc->nr_to_scan) break; } - spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + split_queue_unlock_irqrestore(ds_queue, flags); =20 list_for_each_entry_safe(folio, next, &list, _deferred_list) { bool did_split =3D false; @@ -4251,7 +4297,7 @@ static unsigned long deferred_split_scan(struct shrin= ker *shrink, spin_lock_irqsave(&ds_queue->split_queue_lock, 
flags); list_splice_tail(&list, &ds_queue->split_queue); ds_queue->split_queue_len -=3D removed; - spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + split_queue_unlock_irqrestore(ds_queue, flags); =20 if (prev) folio_put(prev); --=20 2.20.1 From nobody Fri Dec 19 16:05:22 2025 Received: from mail-pg1-f172.google.com (mail-pg1-f172.google.com [209.85.215.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BB866221733 for ; Tue, 15 Apr 2025 02:46:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685195; cv=none; b=dqzsMNnDLBnb8XJAY3B8SHe4eV8RVgSrbOUFXNzqSbUxtsb3li9AHrRrXdhOiJ5eTreMna4JqhfOAo4pi7mPU/Ns+pPq/D3u6nzgHKfEgjI/NVlm4YyDdlknzWZWgmfICnZUaGE6MQ0Us9uNHGDDbwdyuHlNkC80ozv6JZpS0F8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685195; c=relaxed/simple; bh=84sEftSYIXbatP8ktiNMwP5R5mvWYXBwXyaqvwHswlo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=i+khv8PKyOIpRpwviI3jYVhWC/IjE94H74zWEqnFWktydL0M7k1OLWhdS1TrWsgs/3GHiutI08yWVt4go5ispS08HojF3WoEU2JzDURMwwvxxWG/Ag6m/nzvDIYpCnuqp1hYM5vYoQy/9lNeGN/C0guGiNGh/KEIMjAxRxcTrgY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=OMZsgufo; arc=none smtp.client-ip=209.85.215.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="OMZsgufo" Received: by 
mail-pg1-f172.google.com with SMTP id 41be03b00d2f7-af93cd64ef3so3462906a12.2 for ; Mon, 14 Apr 2025 19:46:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1744685193; x=1745289993; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=y8vA4hA6IVI4psjQZDVnpF3qlQM7CohLDDJ5h+RUQ50=; b=OMZsgufow1ebs6FLwZbHco8mCG+Z/vxDvHw0KkvQ3xivgPSthk+CN5l9KHYZ7XZiZl 8R+p6fs7nxU8PVB7EKILd01yWYrBFb68n4BENYPDMTaecjExbkITfQ/XJnF+qVU6DuaA M9KzQmKibOvJImTZbQhxr0DkElSQjAOaizZQtFPKAKMleloGYmSMDV1EojFHuA0usGeA qQXQ5to4ik4KXD6DhQvhEGewtkkGQLD8eh8ET9eX07c0q0xXOzBYMxQRrMtzCzNI7O60 1KSBrnNuioqwFfTvt6lxuFc7qkeQUtfH6KeBZQT16P9nDRNO688RVYvYs+YMjozHO/Hs RjjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744685193; x=1745289993; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=y8vA4hA6IVI4psjQZDVnpF3qlQM7CohLDDJ5h+RUQ50=; b=sNt/uIAPhdXy5/eiVhfeG4WvJU4mE/pmdZiKbg1EDm91/ijfwL8we9vQcMvSMKCdY4 YMpe94/mIqFfx/2tqccZIMKcQzZJSSAINitbcD8OnZrjv1cJ6Cknq1ZKCzsF5R3FI317 y5uBk5UQP47lgtJ5J4VVymXPgE1GyW0j3ls9rOZAQ8OqqaqNfouqkAMvNkKmzdf1rfsw 2Vl8NWNZNqRWEyyoQU92fs/U0tO7s9nemH0Cx3vPLF1X8QxOa00VBJkHIc8EDUas6tG/ RFplgxKWCnfI4vs0QF5mRT1uJy/491AMustLs2EryFpBSAdXvFlr4N9XeqhlZoSsxM58 GjIA== X-Gm-Message-State: AOJu0Yy9rKesGZsgurhiPv/l0Qa1XqWu4Q9mDUnIaPuCK9Zq6VO6L1GR JYqmcf65akZxECD3LeVD1Gxl/lWhcCenN5W8dBldt9KJNcvNw7lWzxztTI8DfU0= X-Gm-Gg: ASbGncvn9V/c9DuJQVXhaw8cfXCWdkvRYJ5xy9zRBiqXl2AYziYvG/RHkQ0UjwUjz94 6gIspC/tR98L1+3K0EFAOSNTAWFsbwKS0wRhubvg0avG68bxf2Ceek3q9kMQlJBrYX8LPPDoRT9 OYNrQJ9IR4hZ8pVIkJw6Xz08P8X0LUxWq7fZUHExl3nDlb4jJPj/6lHKvkmi052SXsGaju4myiT 3LQ1cEzuKvoG79hO6JKwSw9llxG+pbrV/lgJtXwTqc2kVoKokzt2HL13UC4DHxs7RzQs/8a6Ag6 
i7UIE9dDggc6tKov8NL/JsRWy4mPamB0Sb0F8IrpadNCFzUBtGkedGln4kqV/la9lXmCyJxx X-Google-Smtp-Source: AGHT+IGN/ZCUfpQSbaBS4W8bBOfntfSIoehjtu5uWTeww1FUk2wFfobrLS/nbSO2R96mT1mo6Vfdog== X-Received: by 2002:a17:902:e552:b0:223:66bc:f1de with SMTP id d9443c01a7336-22bea4afc0fmr177104965ad.21.1744685192967; Mon, 14 Apr 2025 19:46:32 -0700 (PDT) Received: from PXLDJ45XCM.bytedance.net ([61.213.176.5]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-22ac7ccac49sm106681185ad.217.2025.04.14.19.46.28 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 14 Apr 2025 19:46:32 -0700 (PDT) From: Muchun Song To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song Subject: [PATCH RFC 07/28] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan() Date: Tue, 15 Apr 2025 10:45:11 +0800 Message-Id: <20250415024532.26632-8-songmuchun@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com> References: <20250415024532.26632-1-songmuchun@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The maintenance of the folio->_deferred_list is intricate because it's reused in a local list. Here are some peculiarities: 1) When a folio is removed from its split queue and added to a local on-stack list in deferred_split_scan(), the ->split_queue_len isn't updated, leading to an inconsistency between it and the actual number of folios in the split queue. 
2) When the folio is split via split_folio() later, it's removed from the local list while holding the split queue lock. At this time, this lock protects the local list, not the split queue. 3) To handle the race with a third party freeing or migrating the preceding folio, we must ensure there's always one safe folio (with a raised refcount) before it, by delaying that folio's folio_put(). More details can be found in commit e66f3185fa04. It's rather tricky. We can use the folio_batch infrastructure to handle this more cleanly. In this case, ->split_queue_len will be consistent with the real number of folios in the split queue. If list_empty(&folio->_deferred_list) returns false, it's clear the folio must be in its split queue (not in a local list anymore). In the future, we aim to reparent LRU folios during memcg offline to eliminate dying memory cgroups. This patch prepares for using folio_split_queue_lock_irqsave() because the folio's memcg may change by then. Signed-off-by: Muchun Song --- mm/huge_memory.c | 69 +++++++++++++++++++++--------------------------- 1 file changed, 30 insertions(+), 39 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 70820fa75c1f..d2bc943a40e8 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -4220,40 +4220,47 @@ static unsigned long deferred_split_scan(struct shr= inker *shrink, struct pglist_data *pgdata =3D NODE_DATA(sc->nid); struct deferred_split *ds_queue =3D &pgdata->deferred_split_queue; unsigned long flags; - LIST_HEAD(list); - struct folio *folio, *next, *prev =3D NULL; - int split =3D 0, removed =3D 0; + struct folio *folio, *next; + int split =3D 0, i; + struct folio_batch fbatch; + bool done; =20 #ifdef CONFIG_MEMCG if (sc->memcg) ds_queue =3D &sc->memcg->deferred_split_queue; #endif - + folio_batch_init(&fbatch); +retry: + done =3D true; spin_lock_irqsave(&ds_queue->split_queue_lock, flags); /* Take pin on all head pages to avoid freeing them under us */ list_for_each_entry_safe(folio, next, &ds_queue->split_queue, 
_deferred_list) { if (folio_try_get(folio)) { - list_move(&folio->_deferred_list, &list); - } else { + folio_batch_add(&fbatch, folio); + } else if (folio_test_partially_mapped(folio)) { /* We lost race with folio_put() */ - if (folio_test_partially_mapped(folio)) { - folio_clear_partially_mapped(folio); - mod_mthp_stat(folio_order(folio), - MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); - } - list_del_init(&folio->_deferred_list); - ds_queue->split_queue_len--; + folio_clear_partially_mapped(folio); + mod_mthp_stat(folio_order(folio), + MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); } + list_del_init(&folio->_deferred_list); + ds_queue->split_queue_len--; if (!--sc->nr_to_scan) break; + if (folio_batch_space(&fbatch) =3D=3D 0) { + done =3D false; + break; + } } split_queue_unlock_irqrestore(ds_queue, flags); =20 - list_for_each_entry_safe(folio, next, &list, _deferred_list) { + for (i =3D 0; i < folio_batch_count(&fbatch); i++) { bool did_split =3D false; bool underused =3D false; + struct deferred_split *fqueue; =20 + folio =3D fbatch.folios[i]; if (!folio_test_partially_mapped(folio)) { underused =3D thp_underused(folio); if (!underused) @@ -4269,39 +4276,23 @@ static unsigned long deferred_split_scan(struct shr= inker *shrink, } folio_unlock(folio); next: + if (did_split || !folio_test_partially_mapped(folio)) + continue; /* - * split_folio() removes folio from list on success. * Only add back to the queue if folio is partially mapped. * If thp_underused returns false, or if split_folio fails * in the case it was underused, then consider it used and * don't add it back to split_queue. 
*/ - if (did_split) { - ; /* folio already removed from list */ - } else if (!folio_test_partially_mapped(folio)) { - list_del_init(&folio->_deferred_list); - removed++; - } else { - /* - * That unlocked list_del_init() above would be unsafe, - * unless its folio is separated from any earlier folios - * left on the list (which may be concurrently unqueued) - * by one safe folio with refcount still raised. - */ - swap(folio, prev); - } - if (folio) - folio_put(folio); + fqueue =3D folio_split_queue_lock_irqsave(folio, &flags); + list_add_tail(&folio->_deferred_list, &fqueue->split_queue); + fqueue->split_queue_len++; + split_queue_unlock_irqrestore(fqueue, flags); } + folios_put(&fbatch); =20 - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); - list_splice_tail(&list, &ds_queue->split_queue); - ds_queue->split_queue_len -=3D removed; - split_queue_unlock_irqrestore(ds_queue, flags); - - if (prev) - folio_put(prev); - + if (!done) + goto retry; /* * Stop shrinker if we didn't split any page, but the queue is empty. * This can happen if pages were freed under us. 
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com,
 apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 08/28] mm: vmscan: refactor move_folios_to_lru()
Date: Tue, 15 Apr 2025 10:45:12 +0800
Message-Id: <20250415024532.26632-9-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

In a subsequent patch, we'll reparent the LRU folios. The folios that are
moved to the appropriate LRU list can undergo reparenting during the
move_folios_to_lru() process. Hence, it's incorrect for the caller to hold
a lruvec lock. Instead, we should utilize the more general interface of
folio_lruvec_relock_irq() to obtain the correct lruvec lock.

This patch involves only code refactoring and doesn't introduce any
functional changes.

Signed-off-by: Muchun Song
Acked-by: Johannes Weiner
---
 mm/vmscan.c | 51 +++++++++++++++++++++++++--------------------------
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a76b3cee043d..eac5e6e70660 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1901,24 +1901,27 @@ static bool too_many_isolated(struct pglist_data *pgdat, int file,
 /*
  * move_folios_to_lru() moves folios from private @list to appropriate LRU list.
  *
- * Returns the number of pages moved to the given lruvec.
+ * Returns the number of pages moved to the appropriate lruvec.
+ *
+ * Note: The caller must not hold any lruvec lock.
  */
-static unsigned int move_folios_to_lru(struct lruvec *lruvec,
-		struct list_head *list)
+static unsigned int move_folios_to_lru(struct list_head *list)
 {
 	int nr_pages, nr_moved = 0;
+	struct lruvec *lruvec = NULL;
 	struct folio_batch free_folios;
 
 	folio_batch_init(&free_folios);
 	while (!list_empty(list)) {
 		struct folio *folio = lru_to_folio(list);
 
+		lruvec = folio_lruvec_relock_irq(folio, lruvec);
 		VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
 		list_del(&folio->lru);
 		if (unlikely(!folio_evictable(folio))) {
-			spin_unlock_irq(&lruvec->lru_lock);
+			lruvec_unlock_irq(lruvec);
 			folio_putback_lru(folio);
-			spin_lock_irq(&lruvec->lru_lock);
+			lruvec = NULL;
 			continue;
 		}
 
@@ -1940,19 +1943,15 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
 
 			folio_unqueue_deferred_split(folio);
 			if (folio_batch_add(&free_folios, folio) == 0) {
-				spin_unlock_irq(&lruvec->lru_lock);
+				lruvec_unlock_irq(lruvec);
 				mem_cgroup_uncharge_folios(&free_folios);
 				free_unref_folios(&free_folios);
-				spin_lock_irq(&lruvec->lru_lock);
+				lruvec = NULL;
 			}
 
 			continue;
 		}
 
-		/*
-		 * All pages were isolated from the same lruvec (and isolation
-		 * inhibits memcg migration).
-		 */
 		VM_BUG_ON_FOLIO(!folio_matches_lruvec(folio, lruvec), folio);
 		lruvec_add_folio(lruvec, folio);
 		nr_pages = folio_nr_pages(folio);
@@ -1961,11 +1960,12 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
 			workingset_age_nonresident(lruvec, nr_pages);
 	}
 
+	if (lruvec)
+		lruvec_unlock_irq(lruvec);
+
 	if (free_folios.nr) {
-		spin_unlock_irq(&lruvec->lru_lock);
 		mem_cgroup_uncharge_folios(&free_folios);
 		free_unref_folios(&free_folios);
-		spin_lock_irq(&lruvec->lru_lock);
 	}
 
 	return nr_moved;
@@ -2033,9 +2033,9 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 
 	nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
 
-	spin_lock_irq(&lruvec->lru_lock);
-	move_folios_to_lru(lruvec, &folio_list);
+	move_folios_to_lru(&folio_list);
 
+	local_irq_disable();
 	__mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc),
 					stat.nr_demoted);
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
@@ -2044,7 +2044,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	__count_vm_events(item, nr_reclaimed);
 	__count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
 	__count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
-	spin_unlock_irq(&lruvec->lru_lock);
+	local_irq_enable();
 
 	lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed);
 
@@ -2183,16 +2183,15 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	/*
 	 * Move folios back to the lru list.
 	 */
-	spin_lock_irq(&lruvec->lru_lock);
-
-	nr_activate = move_folios_to_lru(lruvec, &l_active);
-	nr_deactivate = move_folios_to_lru(lruvec, &l_inactive);
+	nr_activate = move_folios_to_lru(&l_active);
+	nr_deactivate = move_folios_to_lru(&l_inactive);
 
+	local_irq_disable();
 	__count_vm_events(PGDEACTIVATE, nr_deactivate);
 	__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
 
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
-	spin_unlock_irq(&lruvec->lru_lock);
+	local_irq_enable();
 
 	if (nr_rotated)
 		lru_note_cost(lruvec, file, 0, nr_rotated);
@@ -4723,14 +4722,15 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 		set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active));
 	}
 
-	spin_lock_irq(&lruvec->lru_lock);
-
-	move_folios_to_lru(lruvec, &list);
+	move_folios_to_lru(&list);
 
+	local_irq_disable();
 	walk = current->reclaim_state->mm_walk;
 	if (walk && walk->batched) {
 		walk->lruvec = lruvec;
+		spin_lock(&lruvec->lru_lock);
 		reset_batch_size(walk);
+		spin_unlock(&lruvec->lru_lock);
 	}
 
 	__mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc),
@@ -4741,8 +4741,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	__count_vm_events(item, reclaimed);
 	__count_memcg_events(memcg, item, reclaimed);
 	__count_vm_events(PGSTEAL_ANON + type, reclaimed);
-
-	spin_unlock_irq(&lruvec->lru_lock);
+	local_irq_enable();
 
 	list_splice_init(&clean, &list);
 
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com,
 apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 09/28] mm: memcontrol: allocate object cgroup for non-kmem case
Date: Tue, 15 Apr 2025 10:45:13 +0800
Message-Id: <20250415024532.26632-10-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Pagecache pages are charged at allocation time and hold a reference to the
original memory cgroup until reclaimed. Depending on memory pressure, page
sharing patterns between different cgroups and cgroup creation/destruction
rates, many dying memory cgroups can be pinned by pagecache pages, reducing
page reclaim efficiency and wasting memory. Converting LRU folios and most
other raw memory cgroup pins to the object cgroup direction can fix this
long-living problem.

As a result, the objcg infrastructure is no longer solely applicable to the
kmem case. In this patch, we extend the scope of the objcg infrastructure
beyond the kmem case, enabling LRU folios to reuse it for folio charging
purposes.

It should be noted that LRU folios are not accounted for at the root level,
yet the folio->memcg_data points to the root_mem_cgroup. Hence, the
folio->memcg_data of LRU folios always points to a valid pointer. However,
the root_mem_cgroup does not possess an object cgroup. Therefore, we also
allocate an object cgroup for the root_mem_cgroup.

Signed-off-by: Muchun Song
---
 mm/memcontrol.c | 50 +++++++++++++++++++++++--------------------------
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0fc76d50bc23..a6362d11b46c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -193,10 +193,10 @@ static struct obj_cgroup *obj_cgroup_alloc(void)
 	return objcg;
 }
 
-static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
-				  struct mem_cgroup *parent)
+static void memcg_reparent_objcgs(struct mem_cgroup *memcg)
 {
 	struct obj_cgroup *objcg, *iter;
+	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
 
 	objcg = rcu_replace_pointer(memcg->objcg, NULL, true);
 
@@ -3156,30 +3156,17 @@ unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
 	return val;
 }
 
-static int memcg_online_kmem(struct mem_cgroup *memcg)
+static void memcg_online_kmem(struct mem_cgroup *memcg)
 {
-	struct obj_cgroup *objcg;
-
 	if (mem_cgroup_kmem_disabled())
-		return 0;
+		return;
 
 	if (unlikely(mem_cgroup_is_root(memcg)))
-		return 0;
-
-	objcg = obj_cgroup_alloc();
-	if (!objcg)
-		return -ENOMEM;
-
-	objcg->memcg = memcg;
-	rcu_assign_pointer(memcg->objcg, objcg);
-	obj_cgroup_get(objcg);
-	memcg->orig_objcg = objcg;
+		return;
 
 	static_branch_enable(&memcg_kmem_online_key);
 
 	memcg->kmemcg_id = memcg->id.id;
-
-	return 0;
 }
 
 static void memcg_offline_kmem(struct mem_cgroup *memcg)
@@ -3194,12 +3181,6 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
 
 	parent = parent_mem_cgroup(memcg);
 	memcg_reparent_list_lrus(memcg, parent);
-
-	/*
-	 * Objcg's reparenting must be after list_lru's, make sure list_lru
-	 * helpers won't use parent's list_lru until child is drained.
-	 */
-	memcg_reparent_objcgs(memcg, parent);
 }
 
 #ifdef CONFIG_CGROUP_WRITEBACK
@@ -3711,9 +3692,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+	struct obj_cgroup *objcg;
 
-	if (memcg_online_kmem(memcg))
-		goto remove_id;
+	memcg_online_kmem(memcg);
 
 	/*
 	 * A memcg must be visible for expand_shrinker_info()
@@ -3723,6 +3704,15 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	if (alloc_shrinker_info(memcg))
 		goto offline_kmem;
 
+	objcg = obj_cgroup_alloc();
+	if (!objcg)
+		goto free_shrinker;
+
+	objcg->memcg = memcg;
+	rcu_assign_pointer(memcg->objcg, objcg);
+	obj_cgroup_get(objcg);
+	memcg->orig_objcg = objcg;
+
 	if (unlikely(mem_cgroup_is_root(memcg)) && !mem_cgroup_disabled())
 		queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
 				   FLUSH_TIME);
@@ -3745,9 +3735,10 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	xa_store(&mem_cgroup_ids, memcg->id.id, memcg, GFP_KERNEL);
 
 	return 0;
+free_shrinker:
+	free_shrinker_info(memcg);
 offline_kmem:
 	memcg_offline_kmem(memcg);
-remove_id:
 	mem_cgroup_id_remove(memcg);
 	return -ENOMEM;
 }
@@ -3764,6 +3755,11 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	zswap_memcg_offline_cleanup(memcg);
 
 	memcg_offline_kmem(memcg);
+	/*
+	 * Objcg's reparenting must be after list_lru's above, make sure list_lru
+	 * helpers won't use parent's list_lru until child is drained.
+	 */
+	memcg_reparent_objcgs(memcg);
 	reparent_shrinker_deferred(memcg);
 	wb_memcg_offline(memcg);
 	lru_gen_offline_memcg(memcg);
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com,
 apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 10/28] mm: memcontrol: return root object cgroup for root memory cgroup
Date: Tue, 15 Apr 2025 10:45:14 +0800
Message-Id: <20250415024532.26632-11-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Memory cgroup functions such as get_mem_cgroup_from_folio() and
get_mem_cgroup_from_mm() return a valid memory cgroup pointer, even for the
root memory cgroup. In contrast, the situation for object cgroups has been
different. Previously, the root object cgroup couldn't be returned because
it didn't exist. Now that a valid root object cgroup exists, for the sake
of consistency, it's necessary to align the behavior of
object-cgroup-related operations with that of memory cgroup APIs.

Signed-off-by: Muchun Song
---
 include/linux/memcontrol.h | 29 ++++++++++++++++++-------
 mm/memcontrol.c            | 44 ++++++++++++++++++++------------------
 mm/percpu.c                |  2 +-
 3 files changed, 45 insertions(+), 30 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bb4f203733f3..e74922d5755d 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -319,6 +319,7 @@ struct mem_cgroup {
 #define MEMCG_CHARGE_BATCH 64U
 
 extern struct mem_cgroup *root_mem_cgroup;
+extern struct obj_cgroup *root_obj_cgroup;
 
 enum page_memcg_data_flags {
 	/* page->memcg_data is a pointer to an slabobj_ext vector */
@@ -528,6 +529,11 @@ static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
 	return (memcg == root_mem_cgroup);
 }
 
+static inline bool obj_cgroup_is_root(const struct obj_cgroup *objcg)
+{
+	return objcg == root_obj_cgroup;
+}
+
 static inline bool mem_cgroup_disabled(void)
 {
 	return !cgroup_subsys_enabled(memory_cgrp_subsys);
@@ -752,23 +758,26 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
 
 static inline bool obj_cgroup_tryget(struct obj_cgroup *objcg)
 {
+	if (obj_cgroup_is_root(objcg))
+		return true;
 	return percpu_ref_tryget(&objcg->refcnt);
 }
 
-static inline void obj_cgroup_get(struct obj_cgroup *objcg)
+static inline void obj_cgroup_get_many(struct obj_cgroup *objcg,
+				       unsigned long nr)
 {
-	percpu_ref_get(&objcg->refcnt);
+	if (!obj_cgroup_is_root(objcg))
+		percpu_ref_get_many(&objcg->refcnt, nr);
 }
 
-static inline void obj_cgroup_get_many(struct obj_cgroup *objcg,
-				       unsigned long nr)
+static inline void obj_cgroup_get(struct obj_cgroup *objcg)
 {
-	percpu_ref_get_many(&objcg->refcnt, nr);
+	obj_cgroup_get_many(objcg, 1);
 }
 
 static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 {
-	if (objcg)
+	if (objcg && !obj_cgroup_is_root(objcg))
 		percpu_ref_put(&objcg->refcnt);
 }
 
@@ -1101,6 +1110,11 @@ static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
 	return true;
 }
 
+static inline bool obj_cgroup_is_root(const struct obj_cgroup *objcg)
+{
+	return true;
+}
+
 static inline bool mem_cgroup_disabled(void)
 {
 	return true;
@@ -1684,8 +1698,7 @@ static inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 {
 	struct obj_cgroup *objcg = current_obj_cgroup();
 
-	if (objcg)
-		obj_cgroup_get(objcg);
+	obj_cgroup_get(objcg);
 
 	return objcg;
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a6362d11b46c..4aadc1b87db3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -81,6 +81,7 @@ struct cgroup_subsys memory_cgrp_subsys __read_mostly;
 EXPORT_SYMBOL(memory_cgrp_subsys);
 
 struct mem_cgroup *root_mem_cgroup __read_mostly;
+struct obj_cgroup *root_obj_cgroup __read_mostly;
 
 /* Active memory cgroup to use from an interrupt context */
 DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg);
@@ -2525,15 +2526,14 @@ struct mem_cgroup *mem_cgroup_from_slab_obj(void *p)
 
 static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
 {
-	struct obj_cgroup *objcg = NULL;
+	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
+		struct obj_cgroup *objcg = rcu_dereference(memcg->objcg);
 
-	for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg)) {
-		objcg = rcu_dereference(memcg->objcg);
 		if (likely(objcg && obj_cgroup_tryget(objcg)))
-			break;
-		objcg = NULL;
+			return objcg;
 	}
-	return objcg;
+
+	return NULL;
 }
 
 static struct obj_cgroup *current_objcg_update(void)
@@ -2604,18 +2604,17 @@ __always_inline struct obj_cgroup *current_obj_cgroup(void)
 		 * Objcg reference is kept by the task, so it's safe
 		 * to use the objcg by the current task.
 		 */
-		return objcg;
+		return objcg ? : root_obj_cgroup;
 	}
 
 	memcg = this_cpu_read(int_active_memcg);
 	if (unlikely(memcg))
 		goto from_memcg;
 
-	return NULL;
+	return root_obj_cgroup;
 
 from_memcg:
-	objcg = NULL;
-	for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg)) {
+	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
 		/*
 		 * Memcg pointer is protected by scope (see set_active_memcg())
 		 * and is pinning the corresponding objcg, so objcg can't go
@@ -2624,10 +2623,10 @@ __always_inline struct obj_cgroup *current_obj_cgroup(void)
 		 */
 		objcg = rcu_dereference_check(memcg->objcg, 1);
 		if (likely(objcg))
-			break;
+			return objcg;
 	}
 
-	return objcg;
+	return root_obj_cgroup;
 }
 
 struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio)
@@ -2641,14 +2640,8 @@ struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio)
 		objcg = __folio_objcg(folio);
 		obj_cgroup_get(objcg);
 	} else {
-		struct mem_cgroup *memcg;
-
 		rcu_read_lock();
-		memcg = __folio_memcg(folio);
-		if (memcg)
-			objcg = __get_obj_cgroup_from_memcg(memcg);
-		else
-			objcg = NULL;
+		objcg = __get_obj_cgroup_from_memcg(__folio_memcg(folio));
 		rcu_read_unlock();
 	}
 	return objcg;
@@ -2733,7 +2726,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 	int ret = 0;
 
 	objcg = current_obj_cgroup();
-	if (objcg) {
+	if (!obj_cgroup_is_root(objcg)) {
 		ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
 		if (!ret) {
 			obj_cgroup_get(objcg);
@@ -3036,7 +3029,7 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 	 * obj_cgroup_get() is used to get a permanent reference.
 	 */
 	objcg = current_obj_cgroup();
-	if (!objcg)
+	if (obj_cgroup_is_root(objcg))
 		return true;
 
 	/*
@@ -3708,6 +3701,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	if (!objcg)
 		goto free_shrinker;
 
+	if (unlikely(mem_cgroup_is_root(memcg)))
+		root_obj_cgroup = objcg;
+
 	objcg->memcg = memcg;
 	rcu_assign_pointer(memcg->objcg, objcg);
 	obj_cgroup_get(objcg);
@@ -5302,6 +5298,9 @@ void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size)
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return;
 
+	if (obj_cgroup_is_root(objcg))
+		return;
+
 	VM_WARN_ON_ONCE(!(current->flags & PF_MEMALLOC));
 
 	/* PF_MEMALLOC context, charging must succeed */
@@ -5329,6 +5328,9 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return;
 
+	if (obj_cgroup_is_root(objcg))
+		return;
+
 	obj_cgroup_uncharge(objcg, size);
 
 	rcu_read_lock();
diff --git a/mm/percpu.c b/mm/percpu.c
index b35494c8ede2..3e54c6fca9bd 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1616,7 +1616,7 @@ static bool pcpu_memcg_pre_alloc_hook(size_t size, gfp_t gfp,
 		return true;
 
 	objcg = current_obj_cgroup();
-	if (!objcg)
+	if (obj_cgroup_is_root(objcg))
 		return true;
 
 	if (obj_cgroup_charge(objcg, gfp, pcpu_obj_full_size(size)))
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com,
 apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 11/28] mm: memcontrol: prevent memory cgroup release in get_mem_cgroup_from_folio()
Date: Tue, 15 Apr 2025 10:45:15 +0800
Message-Id: <20250415024532.26632-12-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu read
lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in get_mem_cgroup_from_folio().

This serves as a preparatory measure for the reparenting of the LRU pages.

Signed-off-by: Muchun Song --- mm/memcontrol.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4aadc1b87db3..4802ce1f49a4 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -983,14 +983,19 @@ struct mem_cgroup *get_mem_cgroup_from_current(void) */ struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio) { - struct mem_cgroup *memcg =3D folio_memcg(folio); + struct mem_cgroup *memcg; =20 if (mem_cgroup_disabled()) return NULL; =20 + if (!folio_memcg_charged(folio)) + return root_mem_cgroup; + rcu_read_lock(); - if (!memcg || WARN_ON_ONCE(!css_tryget(&memcg->css))) - memcg =3D root_mem_cgroup; +retry: + memcg =3D folio_memcg(folio); + if (unlikely(!css_tryget(&memcg->css))) + goto retry; rcu_read_unlock(); return memcg; } --=20 2.20.1 From nobody Fri Dec 19 16:05:22 2025 Received: from mail-pl1-f178.google.com (mail-pl1-f178.google.com [209.85.214.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5CE8C218AB3 for ; Tue, 15 Apr 2025 02:47:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685221; cv=none; b=hxfNxwQ90vwYWBmZsmtLxMG86xA7YLb36pFwaIHDMw01pHUXAsQn8SrmvpHoUwVYllqItnUbVA62oH0QyHfj90r2TyRdsyhEm8d6yp0p8v3eu93WnVTugccNXywnRN880IK+4i5yq8QRn/BHJDp0RQ1il1hFg+K8m9SiLrj8bS0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685221; c=relaxed/simple; bh=8KxT6StpABCnUuNQIAeV85jX8gee7ZWYN/zcNFzGT4s=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=fLeWiZuAbZH2x7Nrxcpin5CsWPweD7kqgiTQHNvycTJSBxeqk35yysVvOHdBwDFSB+LUi+7RPUiFDDHQvn2yBgi0xCLSdijvjDJGKB1xr9EzXXqZ/7psl7J+93BUrBj0+vf37fyqPqO9pilp8XtuJN6/kbWjOJ2VnAd5Eedn/AQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass 
(p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=hA4TWiGa; arc=none smtp.client-ip=209.85.214.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="hA4TWiGa" Received: by mail-pl1-f178.google.com with SMTP id d9443c01a7336-2264aefc45dso74528075ad.0 for ; Mon, 14 Apr 2025 19:47:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1744685219; x=1745290019; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Ubou74n5eLeoB4dEz2IgWAHxjcUZtEwt6QUiSiKGpOQ=; b=hA4TWiGaBdT+RW5VnjUM6tC4UrKaXvoCPUwBytpgIhn/60hJksyYg3EemIvna66WTJ 4oQ2MyLj1R4QC6qwP1kYEoaaIFtxDafQCqNH0qYeYf7d26FaHhm5g3FtJZCkEeHnYzUu jhaiLwX0sANJm6szjo3VJkePNru5Y5H11vsAzCBrmnjKETwNr48YWjW9A3n8EJghXGX1 iKdJilALnEpKuIayKEAHdS4dvHPG+iMpW1zynIzZnF5ZCQoNbNmJLFJzrgxQy2OP1k8N ZZv7lz3CEus9KZnw8uPutC61wd+ZcPGVh67pTvso5bQfsq69/7G6eF4ZUDIGhFiD1XBL fhyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744685219; x=1745290019; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Ubou74n5eLeoB4dEz2IgWAHxjcUZtEwt6QUiSiKGpOQ=; b=SJZcur1OYx9/F8ZF7NWLxWIiLQezBF5xazLlpwY3y08pMOVe9+tQ0qx3G3y4YqmfLU I/chue1/0WAiYzSyCMSItBP5HtIWeIym2KnrT3weZH+Jg1dTh+KazI6KZLO3/V42RC4l oHg52nPrGTwXt14EOqK+QSrzMMBnUUlcip48mvz5HXvz6rTPgNNXiGEa64c1aP9eVUdY T9+L4lc+STaaSLMCYRm1/8bTjxLeB5cJNKjWN/6B95UYMYqFyW1odsgiBuwvBSRApmOJ 
WYoOyeqPovIeD5RdHXaNHqd3zBdxSuRYXINwOA+MVDS7uVpDJ7nTdahWWx8tFKe3nSgK YpZA== X-Gm-Message-State: AOJu0YzAKog62sxPqbPUA8CjL8kl6HspJyB1aXDBr5hKTv/SFFYJdQw/ hUAJByjxwQqF4h1lHYpUFhxmmnY5UPrkJueZVJxgtDTmAXB6gaMrbxrWTKIOwE4= X-Gm-Gg: ASbGncv0fQYvQpU1R6lLX95MFciRoWvjdz70JfV2mphw9BH+47JxZp3GtGDRPyFBSEl r5yf85GPziL6SP6geLHbzzUJGvkQJNiy0bxx2xMlr3w/nSoirKpkbUnaW1zJvdkiNPxroHP+Uj1 e+KG4Bc4UIqBpez5HGl4l7GwZpG4MLtPViPFjmtN35V33mVnaa3cIMmH7BtdzHAUINUTV4wGu4K 2CmWzSv865Ubh1TfAGzOJpbEfrEFSHO9t2/RBFXEDZx2oYNc7RI+xc7Tna9nEvDzo/k5ZEBUYBZ nORdgrwPMrt5VH6CWZXbhS17/L+kM3mnvlmp+blPBf/Rk104C1pt/jFAXrdgPEJbcBHET2Qv X-Google-Smtp-Source: AGHT+IFRGP6yJdpVqApS9S/aczCeDW7cEdxeVDyn5gX3MHZ0LzmgIuZG2begu5ay7gyJ3d0hK4lbIw== X-Received: by 2002:a17:902:db0e:b0:224:23be:c569 with SMTP id d9443c01a7336-22bea4adf49mr221463895ad.22.1744685219616; Mon, 14 Apr 2025 19:46:59 -0700 (PDT) Received: from PXLDJ45XCM.bytedance.net ([61.213.176.5]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-22ac7ccac49sm106681185ad.217.2025.04.14.19.46.54 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 14 Apr 2025 19:46:59 -0700 (PDT) From: Muchun Song To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song Subject: [PATCH RFC 12/28] buffer: prevent memory cgroup release in folio_alloc_buffers() Date: Tue, 15 Apr 2025 10:45:16 +0800 Message-Id: <20250415024532.26632-13-songmuchun@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com> References: <20250415024532.26632-1-songmuchun@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In the near future, a folio will no longer pin its corresponding memory cgroup. To ensure safety, it will only be appropriate to hold the rcu read lock or acquire a reference to the memory cgroup returned by folio_memcg(), thereby preventing it from being released. In the current patch, the function get_mem_cgroup_from_folio() is employed to safeguard against the release of the memory cgroup. This serves as a preparatory measure for the reparenting of the LRU pages. Signed-off-by: Muchun Song --- fs/buffer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index c7abb4a029dc..d8dca9bf5e38 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -914,8 +914,7 @@ struct buffer_head *folio_alloc_buffers(struct folio *f= olio, unsigned long size, long offset; struct mem_cgroup *memcg, *old_memcg; =20 - /* The folio lock pins the memcg */ - memcg =3D folio_memcg(folio); + memcg =3D get_mem_cgroup_from_folio(folio); old_memcg =3D set_active_memcg(memcg); =20 head =3D NULL; @@ -936,6 +935,7 @@ struct buffer_head *folio_alloc_buffers(struct folio *f= olio, unsigned long size, } out: set_active_memcg(old_memcg); + mem_cgroup_put(memcg); return head; /* * In case anything failed, we just free everything we got. 
--=20 2.20.1 From nobody Fri Dec 19 16:05:22 2025 Received: from mail-pl1-f170.google.com (mail-pl1-f170.google.com [209.85.214.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B093D2192EA for ; Tue, 15 Apr 2025 02:47:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685228; cv=none; b=QpogfhVO/2oceNqawosJbGiqJX67EYMeAMWuEkYPEji6vXmu+yKxYMdzQ0TqNDMWpfAukSqgVyY3qZFqsY1fOh4qGj9xFSvCSqvpPwBfz5mRa87c2oUMp8RGOepnuiRFW8QEXjGUy6tM8QNFJcMX+Pt0TRL+aNBJFLncuOVM0eY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685228; c=relaxed/simple; bh=siUlYavAtRsgutfSnr0p3JN+rwISg9BNgAgZ+S/EI4c=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Lq65hCykgg3tmEErQlaSVoiTh6GoCSe9jVRAV0OLczrZ284y66ifUPNDQLeS85zUi72rCASSpTfC/UQMg1tV4BOnXMVVOvOZK/l/krusyBPcUE5ipzsV11iFtszGIxC/nuSeiZoZKpDA2iRmIUQknFBRh68uIK+wOY0Iv+0EWq4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=Zge6eyoQ; arc=none smtp.client-ip=209.85.214.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="Zge6eyoQ" Received: by mail-pl1-f170.google.com with SMTP id d9443c01a7336-225df540edcso59344645ad.0 for ; Mon, 14 Apr 2025 19:47:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1744685226; x=1745290026; 
darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=obht/6Kk0NnhGpMOiil8TzYARUGhrq+7EhlNkpnMaGs=; b=Zge6eyoQtTo+KWQwgZv3uVtbLgNpcW4KVrD4GgWi6glKPs3KIc5PvRSHHUYmwT0XSo Y8X02l5DH0DdSgBqtrVb0Ox8zH7aUPiU7I1DrL/ZYfRBHRJv9Pi8FdUJTAEG5+V7G4uR MdD4swA667z2qYYiU5jEpwHJu9xI4o+f0JBn0l9/6qF7FrtmTw8jTDLS0do14sUDf+Qj DTzDsja0xDwavFmNj/ZIsvhBE+10Yp7pyHOEg7sZ0KirT3ETo5nQSoeHFjkrpIE8VmEH PzBqApdV158/x5lb4qNkBMGgdu6uSqV7ZX0nDEu4Os4ykgutjbvzpzR+4hHwNDSjGx0a O/EQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744685226; x=1745290026; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=obht/6Kk0NnhGpMOiil8TzYARUGhrq+7EhlNkpnMaGs=; b=Gs1lUFAgvSKfWwtn8YJk0SKVVn9OLXVx8dHQVbQ5Ca3qUA7C+bHTtWRA4bbihkOdFE MjJmnNI1rieoCpZqjZ8+5uJnt8xJjCBOgN/0WRRNvXm6sCViHGCtqdvxzqIBdXzcM1eO 7YnAGY5ZMTYqMq8Z/ozViOkt7xx7MlGCWqML1y190qfNkQKw+FmsFcWal3mO+UZ+wJ/c cgDco8xe/6W93Ln1842dk7tPmBAaU/TS3gtcxyADWK0e7y438j/kZ5vbOEuzGGgCkc0S NgChdZs1/WeWzORKI9lCtRsiFkwBpmF40ZriTmH8P+/6GM6wXYj6CrikfpQOa3tdxAX0 A1vQ== X-Gm-Message-State: AOJu0Yxl8CsuZCOG5eJqSAZeMF88BNLL9XlSkRN0gS7tYfYtDB0vP2Ut /LMVnpke21TAR8OGT0UmQJpBGJYymZLVHl24kxpi977eHkz/0HAYJgP0pD4KWNs= X-Gm-Gg: ASbGncvhmDT7LMwYl8tKZdoASxsld/o5W5bvJz+cxctyC6bW9BPwIPaGl5Elp0kl/Xn l5cMh/E+uhYg9pdUnSoKV2hhZUtYHM6TcbdHmv36p1HdgR5lOEe3zLUPfS/VJoWFNgJF+Ntn1yI UjedFsA0L/ue/0goJfCrc5dMGdxnFHIuhytWV+cxt8zZLeXA+Z83rRZNEvpsvMleyes95pDp2ty uhRM/eaE3dTKmPPXuNY8gISStRHJCre03e8M0pXApuxfsiVvlt/2NqK0GjEAiS1XFxb6hEaPt1Y KY6CWFjp7uSwlk/O1XQha5vGIFqYo2XWPQbVW+CTzCKiF5xbDdQPLZ5Lb4gbXJ50lG7m/DjYPLH wp47goKE= X-Google-Smtp-Source: AGHT+IF1G++5FxihLJJNREqKO+5yf6OFv8YN+5YM0gPXaE7ee12PkODsP2jTee4zjVH6VXKddsDutQ== X-Received: by 2002:a17:902:d490:b0:226:3392:3704 with SMTP id 
d9443c01a7336-22c24984d71mr24086475ad.12.1744685225787; Mon, 14 Apr 2025 19:47:05 -0700 (PDT) Received: from PXLDJ45XCM.bytedance.net ([61.213.176.5]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-22ac7ccac49sm106681185ad.217.2025.04.14.19.46.59 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 14 Apr 2025 19:47:05 -0700 (PDT) From: Muchun Song To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song Subject: [PATCH RFC 13/28] writeback: prevent memory cgroup release in writeback module Date: Tue, 15 Apr 2025 10:45:17 +0800 Message-Id: <20250415024532.26632-14-songmuchun@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com> References: <20250415024532.26632-1-songmuchun@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In the near future, a folio will no longer pin its corresponding memory cgroup. To ensure safety, it will only be appropriate to hold the rcu read lock or acquire a reference to the memory cgroup returned by folio_memcg(), thereby preventing it from being released. In the current patch, the function get_mem_cgroup_css_from_folio() and the rcu read lock are employed to safeguard against the release of the memory cgroup. This serves as a preparatory measure for the reparenting of the LRU pages. 
Signed-off-by: Muchun Song --- fs/fs-writeback.c | 22 +++++++++++----------- include/linux/memcontrol.h | 9 +++++++-- include/trace/events/writeback.h | 3 +++ mm/memcontrol.c | 14 ++++++++------ 4 files changed, 29 insertions(+), 19 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index cc57367fb641..e3561d486bdb 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -269,15 +269,13 @@ void __inode_attach_wb(struct inode *inode, struct fo= lio *folio) if (inode_cgwb_enabled(inode)) { struct cgroup_subsys_state *memcg_css; =20 - if (folio) { - memcg_css =3D mem_cgroup_css_from_folio(folio); - wb =3D wb_get_create(bdi, memcg_css, GFP_ATOMIC); - } else { - /* must pin memcg_css, see wb_get_create() */ + /* must pin memcg_css, see wb_get_create() */ + if (folio) + memcg_css =3D get_mem_cgroup_css_from_folio(folio); + else memcg_css =3D task_get_css(current, memory_cgrp_id); - wb =3D wb_get_create(bdi, memcg_css, GFP_ATOMIC); - css_put(memcg_css); - } + wb =3D wb_get_create(bdi, memcg_css, GFP_ATOMIC); + css_put(memcg_css); } =20 if (!wb) @@ -929,16 +927,16 @@ void wbc_account_cgroup_owner(struct writeback_contro= l *wbc, struct folio *folio if (!wbc->wb || wbc->no_cgroup_owner) return; =20 - css =3D mem_cgroup_css_from_folio(folio); + css =3D get_mem_cgroup_css_from_folio(folio); /* dead cgroups shouldn't contribute to inode ownership arbitration */ if (!(css->flags & CSS_ONLINE)) - return; + goto out; =20 id =3D css->id; =20 if (id =3D=3D wbc->wb_id) { wbc->wb_bytes +=3D bytes; - return; + goto out; } =20 if (id =3D=3D wbc->wb_lcand_id) @@ -951,6 +949,8 @@ void wbc_account_cgroup_owner(struct writeback_control = *wbc, struct folio *folio wbc->wb_tcand_bytes +=3D bytes; else wbc->wb_tcand_bytes -=3D min(bytes, wbc->wb_tcand_bytes); +out: + css_put(css); } EXPORT_SYMBOL_GPL(wbc_account_cgroup_owner); =20 diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index e74922d5755d..a9ef2087c735 100644 --- a/include/linux/memcontrol.h +++ 
b/include/linux/memcontrol.h @@ -874,7 +874,7 @@ static inline bool mm_match_cgroup(struct mm_struct *mm, return match; } =20 -struct cgroup_subsys_state *mem_cgroup_css_from_folio(struct folio *folio); +struct cgroup_subsys_state *get_mem_cgroup_css_from_folio(struct folio *fo= lio); ino_t page_cgroup_ino(struct page *page); =20 static inline bool mem_cgroup_online(struct mem_cgroup *memcg) @@ -1594,9 +1594,14 @@ static inline void mem_cgroup_track_foreign_dirty(st= ruct folio *folio, if (mem_cgroup_disabled()) return; =20 + if (!folio_memcg_charged(folio)) + return; + + rcu_read_lock(); memcg =3D folio_memcg(folio); - if (unlikely(memcg && &memcg->css !=3D wb->memcg_css)) + if (unlikely(&memcg->css !=3D wb->memcg_css)) mem_cgroup_track_foreign_dirty_slowpath(folio, wb); + rcu_read_unlock(); } =20 void mem_cgroup_flush_foreign(struct bdi_writeback *wb); diff --git a/include/trace/events/writeback.h b/include/trace/events/writeb= ack.h index 0ff388131fc9..99665c79856b 100644 --- a/include/trace/events/writeback.h +++ b/include/trace/events/writeback.h @@ -266,7 +266,10 @@ TRACE_EVENT(track_foreign_dirty, __entry->ino =3D inode ? 
inode->i_ino : 0; __entry->memcg_id =3D wb->memcg_css->id; __entry->cgroup_ino =3D __trace_wb_assign_cgroup(wb); + + rcu_read_lock(); __entry->page_cgroup_ino =3D cgroup_ino(folio_memcg(folio)->css.cgroup); + rcu_read_unlock(); ), =20 TP_printk("bdi %s[%llu]: ino=3D%lu memcg_id=3D%u cgroup_ino=3D%lu page_cg= roup_ino=3D%lu", diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4802ce1f49a4..09ecb5cb78f2 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -229,7 +229,7 @@ DEFINE_STATIC_KEY_FALSE(memcg_bpf_enabled_key); EXPORT_SYMBOL(memcg_bpf_enabled_key); =20 /** - * mem_cgroup_css_from_folio - css of the memcg associated with a folio + * get_mem_cgroup_css_from_folio - acquire a css of the memcg associated w= ith a folio * @folio: folio of interest * * If memcg is bound to the default hierarchy, css of the memcg associated @@ -239,14 +239,16 @@ EXPORT_SYMBOL(memcg_bpf_enabled_key); * If memcg is bound to a traditional hierarchy, the css of root_mem_cgroup * is returned. */ -struct cgroup_subsys_state *mem_cgroup_css_from_folio(struct folio *folio) +struct cgroup_subsys_state *get_mem_cgroup_css_from_folio(struct folio *fo= lio) { - struct mem_cgroup *memcg =3D folio_memcg(folio); + struct mem_cgroup *memcg; =20 - if (!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) - memcg =3D root_mem_cgroup; + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return &root_mem_cgroup->css; =20 - return &memcg->css; + memcg =3D get_mem_cgroup_from_folio(folio); + + return memcg ? 
&memcg->css : &root_mem_cgroup->css; } =20 /** --=20 2.20.1 From nobody Fri Dec 19 16:05:22 2025 Received: from mail-pl1-f177.google.com (mail-pl1-f177.google.com [209.85.214.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BB1BA219A95 for ; Tue, 15 Apr 2025 02:47:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685234; cv=none; b=oYB5MbQWhjU6/bSRX8bdVPqsk+r/bzAf0POk5PYdp+si4WC+V0bbjfs7rEek+78EnzoUhv/1H1x3iZmN7WWQVi4GdOfw45Rpdv75zmKRYuxfayEeBvF7TV8tzjr5cF7WVjbDa07Glt876wVZI5FH3SaTjvSgflN/4MZIKwAKrcI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685234; c=relaxed/simple; bh=HuKS97oFEuuG+wDz4Y7AeIlRQjvWQlQVDlFIdpSjOu0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=B5unS+G/dwTWrO3BE6G25OX1IGu5t//qqJreP/cqsXaLEIuUDin6Nug8hPG7KpdvvFQTW5/zMOTbbF/vwugCURu9Y0Ri5JVLde6Zu7r6wf9DRgyU9ndoDRtjf1vN+jomKNBll4Z6cKJM3zMED1JsFztIairRn+HjUnQCe1yWoAw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=JZHSpRdH; arc=none smtp.client-ip=209.85.214.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="JZHSpRdH" Received: by mail-pl1-f177.google.com with SMTP id d9443c01a7336-2279915e06eso53263835ad.1 for ; Mon, 14 Apr 2025 19:47:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; 
s=google; t=1744685232; x=1745290032; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=xg/Q9PeJ3a0fPq4jQ400Y/OgBu3oFR0wAT2gCDp4D5Q=; b=JZHSpRdHY8uZnr7Uac6F8fijK+voc0kTBhBrjdqHflYMBvGApQUQUWAD2KfKT7C/f9 wldL4VW9zYPnaOLjIO/Gm1H8DKySvpx2sYl/6PT6GgBmFXBFLmpG10R/TKWcKJj/IpW/ ZKcJdnfeevCyReCdlKNKb76+XoEpdkO+XqZCDKukOPbpadaD8TbjByKMk/a25Kn7Nb6J ENePShNAPX2HcnstFCNwSete79gBxnKjQxWGIWcEbdQyt5nkueyce7Nw65z/pWcNH8Wp YgievKHSiTlOco3M/QbvMaMQjFX0ArQLzZdNt1oNWVcKpEk0ZcMhmlfobNv6japNXc0g Jt/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744685232; x=1745290032; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xg/Q9PeJ3a0fPq4jQ400Y/OgBu3oFR0wAT2gCDp4D5Q=; b=YyjvlU2srvMOLHUyRAAK4kbXCE7jWfX1ybhm2CsOs+Fb88H0og1pdtCDq/UQZjDnud mVDYXTibxuN/LrQ3VFwANrlFYS/Vq8zmimLKhVVEOZ1WlqJ7iPzqHYVG0s0CHtSoVDKS XcqmWZvKy52DOr7HFe5CPTIN2Q+xvxSBo1NM5dcEq8ia36qSKEFYBtRexOVnwTKLDrGv 1FDs1tcxncni58nmeX/eHCLYWhO7awla6TN7xyg5rXKHA+gB7Ov6GtRjQBKI44HgAgmZ MRfM+PXdR7bkNBc3ZqFwgWSRD5zDA7obKRFYCIapXRacX3w2QdYKuG0ePRGcicwetDuP JDag== X-Gm-Message-State: AOJu0Yz3NDMoaw1S3Jx0O397SO++WaC1TOy0E0MKzHl3Bs+7DwH8WVJn iA9JhDBwiEGvISP4+Xw/994oomV+TnlnHUTpbrAF+lcw5NKnbAdlQwGec85vCyM= X-Gm-Gg: ASbGnct/7dGwHE09tfqtzQ40fFCqfkV0d6z4WmEGngzktUuNtCRrUMIpyXALPh4ORd6 9x+oh5RoNKNtdviNoXkZ+e1Wck1daUtaQpqAZbcLyLY/84Hfr3IojwRUgg6ZXwdA/nFiBgIgp6C npTPAbtrsUwUuHyTd9bbUBxCBhkqoDFD+WKXg6G+6HCm29lVucceTrj2sM2RErXastF0h5AjAyI e5T/fEooSF8KTj/esJSsaiK0cQK2ryOqqFMMh+XPglVYxTL1EIpOQUbMYkCCh4OVc/N2m1YiAAF 5cdcumfRzJ+vbnPsMecN9l+NJrJrUWHGOXDBI/2FVRJ1ypPVbXwBzAM56TCek0zWCuE62v4JOK7 +gno9YyE= X-Google-Smtp-Source: AGHT+IEzkWcKa2ct6W+m99WRXOgiOEymvJ30CBNBOfWjsGr2/iLsdVh+Li4JvemGEyYH2iRVm2MWrA== X-Received: by 2002:a17:902:d485:b0:21f:6fb9:9299 with SMTP id 
d9443c01a7336-22bea4bf561mr197033565ad.27.1744685231866; Mon, 14 Apr 2025 19:47:11 -0700 (PDT) Received: from PXLDJ45XCM.bytedance.net ([61.213.176.5]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-22ac7ccac49sm106681185ad.217.2025.04.14.19.47.06 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 14 Apr 2025 19:47:11 -0700 (PDT) From: Muchun Song To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song Subject: [PATCH RFC 14/28] mm: memcontrol: prevent memory cgroup release in count_memcg_folio_events() Date: Tue, 15 Apr 2025 10:45:18 +0800 Message-Id: <20250415024532.26632-15-songmuchun@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com> References: <20250415024532.26632-1-songmuchun@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In the near future, a folio will no longer pin its corresponding memory cgroup. To ensure safety, it will only be appropriate to hold the rcu read lock or acquire a reference to the memory cgroup returned by folio_memcg(), thereby preventing it from being released. In the current patch, the rcu read lock is employed to safeguard against the release of the memory cgroup in count_memcg_folio_events(). This serves as a preparatory measure for the reparenting of the LRU pages. 
Signed-off-by: Muchun Song --- include/linux/memcontrol.h | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index a9ef2087c735..01239147eb11 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -978,10 +978,15 @@ static inline void count_memcg_events(struct mem_cgro= up *memcg, static inline void count_memcg_folio_events(struct folio *folio, enum vm_event_item idx, unsigned long nr) { - struct mem_cgroup *memcg =3D folio_memcg(folio); + struct mem_cgroup *memcg; =20 - if (memcg) - count_memcg_events(memcg, idx, nr); + if (!folio_memcg_charged(folio)) + return; + + rcu_read_lock(); + memcg =3D folio_memcg(folio); + count_memcg_events(memcg, idx, nr); + rcu_read_unlock(); } =20 static inline void count_memcg_events_mm(struct mm_struct *mm, --=20 2.20.1 From nobody Fri Dec 19 16:05:22 2025 Received: from mail-pl1-f179.google.com (mail-pl1-f179.google.com [209.85.214.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4483721ABC6 for ; Tue, 15 Apr 2025 02:47:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685240; cv=none; b=f7abS8/LM18odO38GG5JIAPLMeEJh0dE5aDlcy6cvBNXmTLhjJ85pOh/FXuy7nPoaYrXKAC7TpnrpUXgn2RUqYuJhVNzcUbbLq94EmT8vREdmL7FTw3cM6qNFryziwG7XjlqDXGlz8/9zUIfAtqkSMAyARoeZGsvO6IZyNsuMLk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685240; c=relaxed/simple; bh=ExRJlPFqRIPBVplhInN1J+JPDFU1q83KrxCLYkfL8vo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ROJ1/bb8pbVC1dmn3qqZTmLyTbXLqsVoa0vFDt1+Hw87yn2JzQVioV/f7wkw3sy18XQdPgwgKY/LZQa8Cch9Zg2dffGFWFmoo47+WMuAEjgZzjnAl+sLSD+qReU4MoX0qXErHdj15rbbUHj6JbD/DEme2MoOalIcb5k6RfMJS4Y= ARC-Authentication-Results: 
i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=kIprf5Nn; arc=none smtp.client-ip=209.85.214.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="kIprf5Nn" Received: by mail-pl1-f179.google.com with SMTP id d9443c01a7336-227d6b530d8so48305725ad.3 for ; Mon, 14 Apr 2025 19:47:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1744685237; x=1745290037; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=048a/COuUtmsXsXzHcibcjt0GyskIDx7SCGdBM7o194=; b=kIprf5NnfVr8zwBg2zKmZeWU3jQXAheHtu+IwoDTWZkhXhH0QfBkcKAEd3K5xLaFpQ +BDoi0eGkMluXMR9jag9hEjS/2G+8SFsbqQXhSa0iPne6Qr1fWzwH5ulU0w72AMTlkQT 3tCF05TcFQd2Q7uRsSPQkK5aEEqLNYM+QJmo9xZfqMQPCSsIDnq4pD6waZEYPouCQiKn aMI9Wg9iD5TLpj9b+fEJCR/XnFwjc96qtaDjqvP4QBV4vs/gUPWzMtQXm9H7HXsXcz6E AN+97XoF/ousv0dN6MaD3PdlxmsUr16nX40tqbBwnoB7zB9deyiS78YQFgeYBayqtynY Cw0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744685237; x=1745290037; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=048a/COuUtmsXsXzHcibcjt0GyskIDx7SCGdBM7o194=; b=f4NKgsXikm+Jmzswjg2PZ+uAAgu1iL01KhBxl84y0pcoDg0tYXzt2gNp4Dvl4Qpbk8 ymaOYwxf9ZrgtE79eJvsm2kGEYbN2C2M0khp8dP/vfHISd1xOkLzZSjssnC2XXnuWpsi hzrGjc+5Fq3J32vKaAlxViu94ahH/4V8A8FOggpxwdAyHAIt2VVul51jG0Ge8yhjRtiK 
GsY1VrHlULxtvOWkTZp+ZN70SuQKJzN+gq4OQleHZUV6ae+7Apl+1BrIe4/TL28AD/S+ hYIDsA0N9lWJHk6sQOfpAH+oajD58BHYV4y04MOPP9xf770n6YUUm0/01Kf7zrLpcCJl NrJg== X-Gm-Message-State: AOJu0YzxWpc7MZSlE6vPZVfScDEXv1UnjbrlXfNAL9E8m5M83sCn667h s/oSt500uO1wxeDbcyAtTnUj2nL+WxSgej2avkTKUmlDFgH0swroLWOxmSqYENo= X-Gm-Gg: ASbGncshKjrnJK1N1lftUGt+n6Y/a62GO6b6c/oSjwZ1Xd6BNfHuhrEz4flI+B+5iTG EaDgxmHwB39AqsljH/EpYbVkVDvunZ/fK4LD4DXxghDYW1JPTDZWcVWP0sEBAQYzq5iuYl72vdH 2x7LXyEImmUVhmHQ1WROqKdWXF43RiDDaU+2FRsPV9l7nyYGTzhzAdUea5N3QSE0qu8CLRNo3rx 72Ev0IJ4UBu/tGlePgZ9cJ7AMEQbKaISTdliU5HlmLnhX9Muk7/jz6zII4PuO72rZSmverJ4iqx mC9B9qNPI0+2qAdSKJCpnkwQrnJjF6uYTx8FiTJbxl+5Pj0u+jNiQCYEQ4W3K9qKNDCPUzQ9 X-Google-Smtp-Source: AGHT+IH8mFGGtZG1eAt8IdVT2T6wuIxSclSFtYUMLbYnFcscVwUk704iXPYP1CTuJRMCglCtOnp9BA== X-Received: by 2002:a17:902:ccc5:b0:221:7b4a:476c with SMTP id d9443c01a7336-22bea4ab854mr201964985ad.18.1744685237616; Mon, 14 Apr 2025 19:47:17 -0700 (PDT) Received: from PXLDJ45XCM.bytedance.net ([61.213.176.5]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-22ac7ccac49sm106681185ad.217.2025.04.14.19.47.12 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 14 Apr 2025 19:47:17 -0700 (PDT) From: Muchun Song To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song Subject: [PATCH RFC 15/28] mm: page_io: prevent memory cgroup release in page_io module Date: Tue, 15 Apr 2025 10:45:19 +0800 Message-Id: <20250415024532.26632-16-songmuchun@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com> References: <20250415024532.26632-1-songmuchun@bytedance.com> Precedence: bulk 
X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In the near future, a folio will no longer pin its corresponding memory cgroup. To ensure safety, it will only be appropriate to hold the rcu read lock or acquire a reference to the memory cgroup returned by folio_memcg(), thereby preventing it from being released. In the current patch, the rcu read lock is employed to safeguard against the release of the memory cgroup in swap_writepage() and bio_associate_blkg_from_page(). This serves as a preparatory measure for the reparenting of the LRU pages. Signed-off-by: Muchun Song --- mm/page_io.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 4bce19df557b..5894e2ff97ef 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -280,10 +280,14 @@ int swap_writepage(struct page *page, struct writebac= k_control *wbc) folio_unlock(folio); return 0; } + + rcu_read_lock(); if (!mem_cgroup_zswap_writeback_enabled(folio_memcg(folio))) { + rcu_read_unlock(); folio_mark_dirty(folio); return AOP_WRITEPAGE_ACTIVATE; } + rcu_read_unlock(); =20 __swap_writepage(folio, wbc); return 0; @@ -308,11 +312,11 @@ static void bio_associate_blkg_from_page(struct bio *= bio, struct folio *folio) struct cgroup_subsys_state *css; struct mem_cgroup *memcg; =20 - memcg =3D folio_memcg(folio); - if (!memcg) + if (!folio_memcg_charged(folio)) return; =20 rcu_read_lock(); + memcg =3D folio_memcg(folio); css =3D cgroup_e_css(memcg->css.cgroup, &io_cgrp_subsys); bio_associate_blkg_from_css(bio, css); rcu_read_unlock(); --=20 2.20.1 From nobody Fri Dec 19 16:05:22 2025 Received: from mail-pl1-f180.google.com (mail-pl1-f180.google.com [209.85.214.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 16/28] mm: migrate: prevent memory cgroup release in folio_migrate_mapping()
Date: Tue, 15 Apr 2025 10:45:20 +0800
Message-Id: <20250415024532.26632-17-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>

In the near future, a folio will no longer pin its corresponding
memory cgroup. To ensure safety, it will only be appropriate to
hold the rcu read lock or acquire a reference to the memory cgroup
returned by folio_memcg(), thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard
against the release of the memory cgroup in folio_migrate_mapping().

This serves as a preparatory measure for the reparenting of the
LRU pages.

Signed-off-by: Muchun Song
---
 mm/migrate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index f3ee6d8d5e2e..2ff1eaf39a9e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -565,6 +565,7 @@ static int __folio_migrate_mapping(struct address_space *mapping,
 		struct lruvec *old_lruvec, *new_lruvec;
 		struct mem_cgroup *memcg;
 
+		rcu_read_lock();
 		memcg = folio_memcg(folio);
 		old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
 		new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
@@ -592,6 +593,7 @@ static int __folio_migrate_mapping(struct address_space *mapping,
 			__mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
 			__mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr);
 		}
+		rcu_read_unlock();
 	}
 	local_irq_enable();
 
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 17/28] mm: mglru: prevent memory cgroup release in mglru
Date: Tue, 15 Apr 2025 10:45:21 +0800
Message-Id: <20250415024532.26632-18-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>

In the near future, a folio will no longer pin its corresponding
memory cgroup. To ensure safety, it will only be appropriate to
hold the rcu read lock or acquire a reference to the memory cgroup
returned by folio_memcg(), thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard
against the release of the memory cgroup in mglru.

This serves as a preparatory measure for the reparenting of the
LRU pages.

Signed-off-by: Muchun Song
---
 mm/vmscan.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index eac5e6e70660..fbba14094c6d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3451,8 +3451,10 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
 	if (folio_nid(folio) != pgdat->node_id)
 		return NULL;
 
+	rcu_read_lock();
 	if (folio_memcg(folio) != memcg)
-		return NULL;
+		folio = NULL;
+	rcu_read_unlock();
 
 	return folio;
 }
@@ -4194,10 +4196,10 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	unsigned long addr = pvmw->address;
 	struct vm_area_struct *vma = pvmw->vma;
 	struct folio *folio = pfn_folio(pvmw->pfn);
-	struct mem_cgroup *memcg = folio_memcg(folio);
+	struct mem_cgroup *memcg;
 	struct pglist_data *pgdat = folio_pgdat(folio);
-	struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
-	struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
+	struct lruvec *lruvec;
+	struct lru_gen_mm_state *mm_state;
 	DEFINE_MAX_SEQ(lruvec);
 	int gen = lru_gen_from_seq(max_seq);
 
@@ -4234,6 +4236,11 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		}
 	}
 
+	rcu_read_lock();
+	memcg = folio_memcg(folio);
+	lruvec = mem_cgroup_lruvec(memcg, pgdat);
+	mm_state = get_mm_state(lruvec);
+
 	arch_enter_lazy_mmu_mode();
 
 	pte -= (addr - start) / PAGE_SIZE;
@@ -4270,6 +4277,8 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 
 	arch_leave_lazy_mmu_mode();
 
+	rcu_read_unlock();
+
 	/* feedback from rmap walkers to page table walkers */
 	if (mm_state && suitable_to_scan(i, young))
 		update_bloom_filter(mm_state, max_seq, pvmw->pmd);
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 18/28] mm: memcontrol: prevent memory cgroup release in mem_cgroup_swap_full()
Date: Tue, 15 Apr 2025 10:45:22 +0800
Message-Id: <20250415024532.26632-19-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>

In the near future, a folio will no longer pin its corresponding
memory cgroup. To ensure safety, it will only be appropriate to
hold the rcu read lock or acquire a reference to the memory cgroup
returned by folio_memcg(), thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard
against the release of the memory cgroup in mem_cgroup_swap_full().

This serves as a preparatory measure for the reparenting of the
LRU pages.

Signed-off-by: Muchun Song
---
 mm/memcontrol.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 09ecb5cb78f2..694f19017699 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5097,17 +5097,21 @@ bool mem_cgroup_swap_full(struct folio *folio)
 	if (do_memsw_account())
 		return false;
 
-	memcg = folio_memcg(folio);
-	if (!memcg)
+	if (!folio_memcg_charged(folio))
 		return false;
 
+	rcu_read_lock();
+	memcg = folio_memcg(folio);
 	for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg)) {
 		unsigned long usage = page_counter_read(&memcg->swap);
 
 		if (usage * 2 >= READ_ONCE(memcg->swap.high) ||
-		    usage * 2 >= READ_ONCE(memcg->swap.max))
+		    usage * 2 >= READ_ONCE(memcg->swap.max)) {
+			rcu_read_unlock();
 			return true;
+		}
 	}
+	rcu_read_unlock();
 
 	return false;
 }
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 19/28] mm: workingset: prevent memory cgroup release in lru_gen_eviction()
Date: Tue, 15 Apr 2025 10:45:23 +0800
Message-Id: <20250415024532.26632-20-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>

In the near future, a folio will no longer pin its corresponding
memory cgroup. To ensure safety, it will only be appropriate to
hold the rcu read lock or acquire a reference to the memory cgroup
returned by folio_memcg(), thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard
against the release of the memory cgroup in lru_gen_eviction().

This serves as a preparatory measure for the reparenting of the
LRU pages.

Signed-off-by: Muchun Song
---
 mm/workingset.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index ebafc0eaafba..e14b9e33f161 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -241,11 +241,14 @@ static void *lru_gen_eviction(struct folio *folio)
 	int refs = folio_lru_refs(folio);
 	bool workingset = folio_test_workingset(folio);
 	int tier = lru_tier_from_refs(refs, workingset);
-	struct mem_cgroup *memcg = folio_memcg(folio);
+	struct mem_cgroup *memcg;
 	struct pglist_data *pgdat = folio_pgdat(folio);
+	unsigned short memcg_id;
 
 	BUILD_BUG_ON(LRU_GEN_WIDTH + LRU_REFS_WIDTH > BITS_PER_LONG - EVICTION_SHIFT);
 
+	rcu_read_lock();
+	memcg = folio_memcg(folio);
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	lrugen = &lruvec->lrugen;
 	min_seq = READ_ONCE(lrugen->min_seq[type]);
@@ -253,8 +256,10 @@ static void *lru_gen_eviction(struct folio *folio)
 
 	hist = lru_hist_from_seq(min_seq);
 	atomic_long_add(delta, &lrugen->evicted[hist][type][tier]);
+	memcg_id = mem_cgroup_id(memcg);
+	rcu_read_unlock();
 
-	return pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset);
+	return pack_shadow(memcg_id, pgdat, token, workingset);
 }
 
 /*
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 20/28] mm: workingset: prevent lruvec release in workingset_refault()
Date: Tue, 15 Apr 2025 10:45:24 +0800
Message-Id: <20250415024532.26632-21-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>

In the near future, a folio will no longer pin its corresponding
memory cgroup. So an lruvec returned by folio_lruvec() could be
released without the rcu read lock or a reference to its memory
cgroup.

In the current patch, the rcu read lock is employed to safeguard
against the release of the lruvec in workingset_refault().

This serves as a preparatory measure for the reparenting of the
LRU pages.
Signed-off-by: Muchun Song
---
 mm/workingset.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e14b9e33f161..ef89d18cb8cf 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -560,11 +560,12 @@ void workingset_refault(struct folio *folio, void *shadow)
 	 * locked to guarantee folio_memcg() stability throughout.
 	 */
 	nr = folio_nr_pages(folio);
+	rcu_read_lock();
 	lruvec = folio_lruvec(folio);
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
 
 	if (!workingset_test_recent(shadow, file, &workingset, true))
-		return;
+		goto out;
 
 	folio_set_active(folio);
 	workingset_age_nonresident(lruvec, nr);
@@ -580,6 +581,8 @@ void workingset_refault(struct folio *folio, void *shadow)
 		lru_note_cost_refault(folio);
 		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + file, nr);
 	}
+out:
+	rcu_read_unlock();
 }
 
 /**
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
 hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 21/28] mm: zswap: prevent lruvec release in
 zswap_folio_swapin()
Date: Tue, 15 Apr 2025 10:45:25 +0800
Message-Id: <20250415024532.26632-22-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>

In the near future, a folio will no longer pin its corresponding
memory cgroup. So an lruvec returned by folio_lruvec() could be
released without the rcu read lock or a reference to its memory
cgroup.

In the current patch, the rcu read lock is employed to safeguard
against the release of the lruvec in zswap_folio_swapin().

This serves as a preparatory measure for the reparenting of the
LRU pages.

Signed-off-by: Muchun Song
Acked-by: Nhat Pham
Reviewed-by: Chengming Zhou
---
 mm/zswap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/zswap.c b/mm/zswap.c
index 204fb59da33c..4a41c2371f3d 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -752,8 +752,10 @@ void zswap_folio_swapin(struct folio *folio)
 	struct lruvec *lruvec;
 
 	if (folio) {
+		rcu_read_lock();
 		lruvec = folio_lruvec(folio);
 		atomic_long_inc(&lruvec->zswap_lruvec_state.nr_disk_swapins);
+		rcu_read_unlock();
 	}
 }
 
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
 hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 22/28] mm: swap: prevent lruvec release in swap module
Date: Tue, 15 Apr 2025 10:45:26 +0800
Message-Id: <20250415024532.26632-23-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>

In the near future, a folio will no longer pin its corresponding
memory cgroup. So an lruvec returned by folio_lruvec() could be
released without the rcu read lock or a reference to its memory
cgroup.

In the current patch, the rcu read lock is employed to safeguard
against the release of the lruvec in lru_note_cost_refault() and
lru_gen_clear_refs().

This serves as a preparatory measure for the reparenting of the
LRU pages.

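The lru_gen_clear_refs() change follows a common read-side pattern:
copy the needed field while the object is protected, drop the
protection, and then work only on the private copy. A minimal
userspace sketch of that shape follows. Everything in it is a
hypothetical stand-in rather than kernel API: the mock_rcu_*()
counters, struct mock_lruvec, mock_gen_from_seq(), and the assumed
generation count MOCK_NR_GENS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are not the kernel primitives. */
static int mock_rcu_depth;

static void mock_rcu_read_lock(void)
{
	mock_rcu_depth++;
}

static void mock_rcu_read_unlock(void)
{
	assert(mock_rcu_depth > 0);	/* catch an unbalanced unlock */
	mock_rcu_depth--;
}

#define MOCK_NR_GENS 4U	/* assumed generation count, illustration only */

struct mock_lruvec {
	unsigned long min_seq;
};

/* Stand-in for lru_gen_from_seq(): map a sequence number to a generation. */
static int mock_gen_from_seq(unsigned long seq)
{
	return (int)(seq % MOCK_NR_GENS);
}

static bool clear_refs_sketch(const struct mock_lruvec *lruvec, int gen)
{
	unsigned long seq;

	if (gen < 0)
		return true;

	mock_rcu_read_lock();
	/* Copy the field while the object is guaranteed to stay alive. */
	seq = *(const volatile unsigned long *)&lruvec->min_seq;
	mock_rcu_read_unlock();

	/* Only the private copy is used from here on, never lruvec itself. */
	return gen == mock_gen_from_seq(seq);
}
```

Keeping the critical section down to the single load means the
comparison itself never touches memory that might already have been
released.
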
Signed-off-by: Muchun Song
---
 mm/swap.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index ee19e171857d..fbf887578dbe 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -291,8 +291,10 @@ void lru_note_cost(struct lruvec *lruvec, bool file,
 
 void lru_note_cost_refault(struct folio *folio)
 {
+	rcu_read_lock();
 	lru_note_cost(folio_lruvec(folio), folio_is_file_lru(folio),
 		      folio_nr_pages(folio), 0);
+	rcu_read_unlock();
 }
 
 static void lru_activate(struct lruvec *lruvec, struct folio *folio)
@@ -406,18 +408,20 @@ static void lru_gen_inc_refs(struct folio *folio)
 
 static bool lru_gen_clear_refs(struct folio *folio)
 {
-	struct lru_gen_folio *lrugen;
 	int gen = folio_lru_gen(folio);
 	int type = folio_is_file_lru(folio);
+	unsigned long seq;
 
 	if (gen < 0)
 		return true;
 
 	set_mask_bits(&folio->flags, LRU_REFS_FLAGS | BIT(PG_workingset), 0);
 
-	lrugen = &folio_lruvec(folio)->lrugen;
+	rcu_read_lock();
+	seq = READ_ONCE(folio_lruvec(folio)->lrugen.min_seq[type]);
+	rcu_read_unlock();
 	/* whether can do without shuffling under the LRU lock */
-	return gen == lru_gen_from_seq(READ_ONCE(lrugen->min_seq[type]));
+	return gen == lru_gen_from_seq(seq);
 }
 
 #else /* !CONFIG_LRU_GEN */
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
 hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 23/28] mm: workingset: prevent lruvec release in
 workingset_activation()
Date: Tue, 15 Apr 2025 10:45:27 +0800
Message-Id: <20250415024532.26632-24-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>

In the near future, a folio will no longer pin its corresponding
memory cgroup. So an lruvec returned by folio_lruvec() could be
released without the rcu read lock or a reference to its memory
cgroup.

In the current patch, the rcu read lock is employed to safeguard
against the release of the lruvec in workingset_activation().

This serves as a preparatory measure for the reparenting of the
LRU pages.

Signed-off-by: Muchun Song
---
 mm/workingset.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index ef89d18cb8cf..ec625eb7db69 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -595,8 +595,11 @@ void workingset_activation(struct folio *folio)
 	 * Filter non-memcg pages here, e.g. unmap can call
 	 * mark_page_accessed() on VDSO pages.
 	 */
-	if (mem_cgroup_disabled() || folio_memcg_charged(folio))
+	if (mem_cgroup_disabled() || folio_memcg_charged(folio)) {
+		rcu_read_lock();
 		workingset_age_nonresident(folio_lruvec(folio), folio_nr_pages(folio));
+		rcu_read_unlock();
+	}
 }
 
 /*
-- 
2.20.1

From nobody Fri Dec 19 16:05:22 2025
Received: by mail-pl1-f178.google.com with SMTP id d9443c01a7336-225477548e1so48384885ad.0 for ; Mon, 14 Apr 2025 19:48:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1744685286; x=1745290086; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=CpSUB3YDfgeYGmfO/fpEZlFvo0fqwlF+YsT5eAHFtoQ=; b=fouCe7N9Fz6S2r/9QDtKPmxM5j/H8o76UNwIAh43qVFeIbNUlI9zSEBT0Qke6SWmvi y9ulJIYDhG2ZIkdxn1EURA/fmi3IQ+lGY3RoJmgU1bchVzvt8YP+S92z+O1Zu94rOkM0 1EDcYUyFu+tkZ6j3v2TLsJoJdkhycE+wkYj0dB5qqVp1dH6usqDA8zJLZsoyuDyS+Kgw r5zjcV9dQr9JB2l4YzBI2xtAq6abontUycqAujCzsggEZ+2UTgOjtqfPmTK9SLNt6i4i zPN2xo9e+rfjkxnwSP1SgMkHmVq/HUXgmQcqzzCqbxEEJ/phW0F5QEDaA8oueSyXhK6u 8N+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744685286; x=1745290086; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CpSUB3YDfgeYGmfO/fpEZlFvo0fqwlF+YsT5eAHFtoQ=; b=qObGa0IX2wKz3TngAk6q+ugwsK8gZPbzivAMRIt7OocBHrttTtlQIXzVs5hHFi1UWa hOf9THxcDv2d8pf4zPPjhj07CmQr1JQUpXSE7GDvvnwJpkS3yXkA8RphP972472pzP0T fn+hehvl8TOgD1su+TN3GHLpxyuCl47f46WY5+6h6OAK98FML9NZqhUIIHAhtvn6B2N7 XFxEMCygeWHwbCxv9ZND0Ik/nu8JPayHdOgRy+BVGYHCyB4PjjQUTTjFZVea0v31/vfT f1Im2YW8k5eDX+CXux3Cs+fCmqKckdncgqaSEtYEKtxyg0anfVTyngmub3WL3mLDCys3 yvSQ== X-Gm-Message-State: AOJu0YzH0jNb8Gs11YZgkfkohk+84wOO0ACcPf7eh9nMCH9umcyNN/yZ cpzQFRlFdO5I478046AFOxxaMdI3xOe/Rb61tfNmFGxI6tqU/RqaiCp7O1gnQgE= X-Gm-Gg: ASbGnctmvWHoPMzEqoZclpxddLcpjECKLAZop/njssKMCildoVnDwAWlgq06rXUwKOQ Jmr7aSgJJMguMwV96wZ9FAGn4UgzKeXStVS97BRKCQNq5lQ/OE0iwv84VUNQhtU7h5oL6WDxQzP qFO5HSaix78dCW4yQyPd7FV/dp57Em92c5yl9vm2FSYYb1poIdZj9evSVLN1TY5A3j1kNn3ZUIW EbZIXDgQsbDKA9X96Ya0oETClQ4YzstftYU8ecn6So0QVbWHs0EBRj/XKYVGUDDzW3wBN6uVqQd 
f/+dyeqD0sDClVYHru51AllaN4BGbJv06wYPsU2Rrm502TbWR2Vw+OTe3Ghala31yuR9eRlIo2W BbH386WU= X-Google-Smtp-Source: AGHT+IHgqNDuzvTzNM38ht/Apai59kRy7TIM0/jwQoLjF9IBxI/3XCoQow31pfiiq7WdKcqDsqh+0g== X-Received: by 2002:a17:902:f78a:b0:224:1eab:97b5 with SMTP id d9443c01a7336-22bea49530dmr219890795ad.1.1744685285756; Mon, 14 Apr 2025 19:48:05 -0700 (PDT) Received: from PXLDJ45XCM.bytedance.net ([61.213.176.5]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-22ac7ccac49sm106681185ad.217.2025.04.14.19.48.00 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 14 Apr 2025 19:48:05 -0700 (PDT) From: Muchun Song To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song Subject: [PATCH RFC 24/28] mm: memcontrol: prepare for reparenting LRU pages for lruvec lock Date: Tue, 15 Apr 2025 10:45:28 +0800 Message-Id: <20250415024532.26632-25-songmuchun@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com> References: <20250415024532.26632-1-songmuchun@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The following diagram illustrates how to ensure the safety of the folio lruvec lock when LRU folios undergo reparenting. In the folio_lruvec_lock(folio) function: ``` rcu_read_lock(); retry: lruvec =3D folio_lruvec(folio); /* There is a possibility of folio reparenting at this point. 
*/ spin_lock(&lruvec->lru_lock); if (unlikely(lruvec_memcg(lruvec) !=3D folio_memcg(folio))) { /* * The wrong lruvec lock was acquired, and a retry is required. * This is because the folio resides on the parent memcg lruvec * list. */ spin_unlock(&lruvec->lru_lock); goto retry; } /* Reaching here indicates that folio_memcg() is stable. */ ``` In the memcg_reparent_objcgs(memcg) function: ``` spin_lock(&lruvec->lru_lock); spin_lock(&lruvec_parent->lru_lock); /* Transfer folios from the lruvec list to the parent's. */ spin_unlock(&lruvec_parent->lru_lock); spin_unlock(&lruvec->lru_lock); ``` After acquiring the lruvec lock, it is necessary to verify whether the folio has been reparented. If reparenting has occurred, the new lruvec lock must be reacquired. During the LRU folio reparenting process, the lruvec lock will also be acquired (this will be implemented in a subsequent patch). Therefore, folio_memcg() remains unchanged while the lruvec lock is held. Given that lruvec_memcg(lruvec) is always equal to folio_memcg(folio) after the lruvec lock is acquired, the lruvec_memcg_debug() check is redundant. Hence, it is removed. This patch serves as a preparation for the reparenting of LRU folios. Signed-off-by: Muchun Song --- include/linux/memcontrol.h | 23 ++++++----------- mm/compaction.c | 29 ++++++++++++++++----- mm/memcontrol.c | 53 +++++++++++++++++++------------------- 3 files changed, 58 insertions(+), 47 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 01239147eb11..27b23e464229 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -719,7 +719,11 @@ static inline struct lruvec *mem_cgroup_lruvec(struct = mem_cgroup *memcg, * folio_lruvec - return lruvec for isolating/putting an LRU folio * @folio: Pointer to the folio. * - * This function relies on folio->mem_cgroup being stable. + * The user should hold an rcu read lock to protect lruvec associated with + * the folio from being released. 
But it does not prevent binding stability + * between the folio and the returned lruvec from being changed to its par= ent + * or ancestor (e.g. like folio_lruvec_lock() does that holds LRU lock to + * prevent the change). */ static inline struct lruvec *folio_lruvec(struct folio *folio) { @@ -742,15 +746,6 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *fol= io); struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio, unsigned long *flags); =20 -#ifdef CONFIG_DEBUG_VM -void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio); -#else -static inline -void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio) -{ -} -#endif - static inline struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){ return css ? container_of(css, struct mem_cgroup, css) : NULL; @@ -1211,11 +1206,6 @@ static inline struct lruvec *folio_lruvec(struct fol= io *folio) return &pgdat->__lruvec; } =20 -static inline -void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio) -{ -} - static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memc= g) { return NULL; @@ -1532,17 +1522,20 @@ static inline struct lruvec *parent_lruvec(struct l= ruvec *lruvec) static inline void lruvec_unlock(struct lruvec *lruvec) { spin_unlock(&lruvec->lru_lock); + rcu_read_unlock(); } =20 static inline void lruvec_unlock_irq(struct lruvec *lruvec) { spin_unlock_irq(&lruvec->lru_lock); + rcu_read_unlock(); } =20 static inline void lruvec_unlock_irqrestore(struct lruvec *lruvec, unsigned long flags) { spin_unlock_irqrestore(&lruvec->lru_lock, flags); + rcu_read_unlock(); } =20 /* Test requires a stable folio->memcg binding, see folio_memcg() */ diff --git a/mm/compaction.c b/mm/compaction.c index ce45d633ddad..4abd1481d5de 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -551,6 +551,24 @@ static bool compact_lock_irqsave(spinlock_t *lock, uns= igned long *flags, return true; } =20 +static struct lruvec * +compact_folio_lruvec_lock_irqsave(struct 
folio *folio, unsigned long *flag= s, + struct compact_control *cc) +{ + struct lruvec *lruvec; + + rcu_read_lock(); +retry: + lruvec =3D folio_lruvec(folio); + compact_lock_irqsave(&lruvec->lru_lock, flags, cc); + if (unlikely(lruvec_memcg(lruvec) !=3D folio_memcg(folio))) { + spin_unlock_irqrestore(&lruvec->lru_lock, *flags); + goto retry; + } + + return lruvec; +} + /* * Compaction requires the taking of some coarse locks that are potentially * very heavily contended. The lock should be periodically unlocked to avo= id @@ -872,7 +890,7 @@ isolate_migratepages_block(struct compact_control *cc, = unsigned long low_pfn, { pg_data_t *pgdat =3D cc->zone->zone_pgdat; unsigned long nr_scanned =3D 0, nr_isolated =3D 0; - struct lruvec *lruvec; + struct lruvec *lruvec =3D NULL; unsigned long flags =3D 0; struct lruvec *locked =3D NULL; struct folio *folio =3D NULL; @@ -1189,18 +1207,17 @@ isolate_migratepages_block(struct compact_control *= cc, unsigned long low_pfn, if (!folio_test_clear_lru(folio)) goto isolate_fail_put; =20 - lruvec =3D folio_lruvec(folio); + if (locked) + lruvec =3D folio_lruvec(folio); =20 /* If we already hold the lock, we can skip some rechecking */ - if (lruvec !=3D locked) { + if (lruvec !=3D locked || !locked) { if (locked) lruvec_unlock_irqrestore(locked, flags); =20 - compact_lock_irqsave(&lruvec->lru_lock, &flags, cc); + lruvec =3D compact_folio_lruvec_lock_irqsave(folio, &flags, cc); locked =3D lruvec; =20 - lruvec_memcg_debug(lruvec, folio); - /* * Try get exclusive access under lock. 
If marked for * skip, the scan is aborted unless the current context diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 694f19017699..1f0c6e7b69cc 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1196,23 +1196,6 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg, } } =20 -#ifdef CONFIG_DEBUG_VM -void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio) -{ - struct mem_cgroup *memcg; - - if (mem_cgroup_disabled()) - return; - - memcg =3D folio_memcg(folio); - - if (!memcg) - VM_BUG_ON_FOLIO(!mem_cgroup_is_root(lruvec_memcg(lruvec)), folio); - else - VM_BUG_ON_FOLIO(lruvec_memcg(lruvec) !=3D memcg, folio); -} -#endif - /** * folio_lruvec_lock - Lock the lruvec for a folio. * @folio: Pointer to the folio. @@ -1222,14 +1205,20 @@ void lruvec_memcg_debug(struct lruvec *lruvec, stru= ct folio *folio) * - folio_test_lru false * - folio frozen (refcount of 0) * - * Return: The lruvec this folio is on with its lock held. + * Return: The lruvec this folio is on with its lock held and rcu read loc= k held. */ struct lruvec *folio_lruvec_lock(struct folio *folio) { - struct lruvec *lruvec =3D folio_lruvec(folio); + struct lruvec *lruvec; =20 + rcu_read_lock(); +retry: + lruvec =3D folio_lruvec(folio); spin_lock(&lruvec->lru_lock); - lruvec_memcg_debug(lruvec, folio); + if (unlikely(lruvec_memcg(lruvec) !=3D folio_memcg(folio))) { + spin_unlock(&lruvec->lru_lock); + goto retry; + } =20 return lruvec; } @@ -1244,14 +1233,20 @@ struct lruvec *folio_lruvec_lock(struct folio *foli= o) * - folio frozen (refcount of 0) * * Return: The lruvec this folio is on with its lock held and interrupts - * disabled. + * disabled and rcu read lock held. 
*/ struct lruvec *folio_lruvec_lock_irq(struct folio *folio) { - struct lruvec *lruvec = folio_lruvec(folio); + struct lruvec *lruvec; + rcu_read_lock(); +retry: + lruvec = folio_lruvec(folio); spin_lock_irq(&lruvec->lru_lock); - lruvec_memcg_debug(lruvec, folio); + if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) { + spin_unlock_irq(&lruvec->lru_lock); + goto retry; + } return lruvec; } @@ -1267,15 +1262,21 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio) * - folio frozen (refcount of 0) * * Return: The lruvec this folio is on with its lock held and interrupts - * disabled. + * disabled and rcu read lock held. */ struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio, unsigned long *flags) { - struct lruvec *lruvec = folio_lruvec(folio); + struct lruvec *lruvec; + rcu_read_lock(); +retry: + lruvec = folio_lruvec(folio); spin_lock_irqsave(&lruvec->lru_lock, *flags); - lruvec_memcg_debug(lruvec, folio); + if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) { + spin_unlock_irqrestore(&lruvec->lru_lock, *flags); + goto retry; + } return lruvec; } -- 2.20.1 From nobody Fri Dec 19 16:05:22 2025 Received: from mail-pg1-f172.google.com (mail-pg1-f172.google.com [209.85.215.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 95505235340 for ; Tue, 15 Apr 2025 02:48:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685293; cv=none; b=SGf112IKbbRG6vc16B7pcHj+KwJslOpqT0IwHDLMlovCPUonmd8ni0NWuyQ0Ehf1xC5UP8LM/FUOeFvTfntXHQ/98qYgezF7iRQho1UFLJ643Uh13cAevQN8/v0JO9XlMUT+ihE44kBsLt9hajf+N6XXLJzjlo0MWR2eDlZ54YI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685293; c=relaxed/simple; bh=b6WgT5AaIyycVV1DlaIxlbeB9KrtvSXL+2l330r8XW4=;
h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Aqz6HPyZ+0K0G1TsLi+SXMktWCOKAqUdJU6ob6c6Mv5FSQi0PZlT6MJF3BS4mB/cLzcQFSsxYRImFU9uuvdR0K9qsrCORBJe4GdhIixiWZp3At2W4xnNvGhK2xCN4S8HMa3S/pqyq6QRapq+V/Ukn5wWsowR27KylYtvX8UwgDM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=iPMPiQYU; arc=none smtp.client-ip=209.85.215.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="iPMPiQYU" Received: by mail-pg1-f172.google.com with SMTP id 41be03b00d2f7-b09c090c97eso838938a12.1 for ; Mon, 14 Apr 2025 19:48:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1744685291; x=1745290091; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=vzey8goXEog+ZffEsOrcu6z/kqiEK8l86D0gwmtPc7Q=; b=iPMPiQYUKy0rh8fGJ73HTK8N8RjpFAX/nGm9ovu690ajo+aQyVkKd39wsEVWKum1Fe lc4kjWq2aN7cDDV9FEbs0AT2/dkgz09I/9nOhSdObb5OxYfHRGUsj7TC1L6FLTtSEUJ9 FO03ofZIgWYDHSs9vdOmk6GZccUo7v/p8jRcyfuuOvvty9LgHMGi7XUxy+ZZ2SZziYqj jxyIQFrvlRtT7MlgTZbpIE49zGmEdx2WBpMo/JbeOr2M7xPdEoW3ECPLdPL5+aToRfYn SiUBlV4B9bfo4imXtCJ1dicbs17gh0tQ4a0y1qZUGj80D6aB6XOAeMAA317LXGZ9Biwg AyYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744685291; x=1745290091; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=vzey8goXEog+ZffEsOrcu6z/kqiEK8l86D0gwmtPc7Q=; b=L+Wtmj9CDuV+DxjfxqeOGnBKmPL++punUREfcG4SbNbtD8zrXcB+XJZ/cX7wMKu0qc PmbpNjC4wiW3j93yICBMsbYShj+/6lbsdzsfMBHzx86dfKxbf0/UqnF23obZ1i/LvK1R 5SYYo+KVjmMiLWdrlsqACkHlg2glvj4phVXoBjbZp9g8vuj/660ACexEt8Tr0tyQW/xa xaj/RiyI2OZA1W6dNwYZvubpeVc6QbiecZ1v0DcvbjAkFzq2s58pFuWnF08qrhgVFWlx 5s+OZcJbwh7kCcshwu48MWakLI41oLK133inppeuCmpgu1n26Gs6d/mZzmacKd8NbYeJ Vd1Q== X-Gm-Message-State: AOJu0YzUfL2IJOfddlnx6i2Cmb50EGa3b+RBug3d4To1UVHwnzF26Fy2 RdhuL4IthSxPSw2/WWIqz7AdZdVAg3YvChmurlipjf7UyE0uMkfR7p/LY/uWBj0= X-Gm-Gg: ASbGncvBGlNJneLa9rW2fwQERNPl74JTdmC9lp6dPHDS1p39p57JZESiyZsWEMLzxcW 4E3FDURIfOCRdS3Bh7mAsqymDdsGHNJQPvM4LAm1/wNRq2d79jRhvA36VIzwOJIzf8MClbrJaop kEYuoSXUu8ApveZsGsaN9dAIH0ieQNCmsPl7kN3Ym2R/nXn1INUtRVRlUJWfFev5CbDCSTj2QRq 2VZVvWq8MyIkJnzqh08XlkuXBvfiWZc+wjrH2A2+3XBdDRoaw2dLCbCK3hza1pnS0HwgArhKssr Tz55iEEcGT1zrNHZM26atoPAdJrmhAAaHvDpSVxkkB7NehIF6czwEhL7xk2WF6nTqQANZvmFQtl 3kTWHe30= X-Google-Smtp-Source: AGHT+IGKYbRxT9nU7SMJ0Zxzh0coEyzVb1MW3k+CSg2YANyZzMz0FPRUg+d3lPphVCyM8bkhwJIXdw== X-Received: by 2002:a17:902:eb8a:b0:220:c813:dfce with SMTP id d9443c01a7336-22bea4f273amr208688465ad.39.1744685290927; Mon, 14 Apr 2025 19:48:10 -0700 (PDT) Received: from PXLDJ45XCM.bytedance.net ([61.213.176.5]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-22ac7ccac49sm106681185ad.217.2025.04.14.19.48.06 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 14 Apr 2025 19:48:10 -0700 (PDT) From: Muchun Song To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song Subject: [PATCH RFC 25/28] mm: thp: prepare for reparenting LRU pages for split queue lock Date: Tue, 15 Apr 2025 
10:45:29 +0800 Message-Id: <20250415024532.26632-26-songmuchun@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com> References: <20250415024532.26632-1-songmuchun@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" As with the lruvec lock, we adopt the same strategy to keep the split queue lock safe while LRU folios are being reparented. Signed-off-by: Muchun Song --- mm/huge_memory.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d2bc943a40e8..813334994f84 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1100,8 +1100,14 @@ static struct deferred_split *folio_split_queue_lock(struct folio *folio) { struct deferred_split *queue; + rcu_read_lock(); +retry: queue = folio_split_queue(folio); spin_lock(&queue->split_queue_lock); + if (unlikely(folio_split_queue_memcg(folio, queue) != folio_memcg(folio))) { + spin_unlock(&queue->split_queue_lock); + goto retry; + } return queue; } @@ -1111,8 +1117,14 @@ folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags) { struct deferred_split *queue; + rcu_read_lock(); +retry: queue = folio_split_queue(folio); spin_lock_irqsave(&queue->split_queue_lock, *flags); + if (unlikely(folio_split_queue_memcg(folio, queue) != folio_memcg(folio))) { + spin_unlock_irqrestore(&queue->split_queue_lock, *flags); + goto retry; + } return queue; } @@ -1120,12 +1132,14 @@ folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags) static inline void split_queue_unlock(struct deferred_split *queue) { spin_unlock(&queue->split_queue_lock); + rcu_read_unlock(); } static inline void split_queue_unlock_irqrestore(struct
deferred_split *queue, unsigned long flags) { spin_unlock_irqrestore(&queue->split_queue_lock, flags); + rcu_read_unlock(); } static inline bool is_transparent_hugepage(const struct folio *folio) -- 2.20.1 From nobody Fri Dec 19 16:05:22 2025 Received: from mail-pl1-f174.google.com (mail-pl1-f174.google.com [209.85.214.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE66714830F for ; Tue, 15 Apr 2025 02:48:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685299; cv=none; b=rP2u6C/LjNEp72/NToIDsUp5YSzBN5NbnYiV8Vfc+KsHw5kpxv9ADtomzaCK0B6NhkRSZxuvj78mUfPHA7nAbm54hyoNxn8naCx0lrghdoydYkOTwGW0XDyC1Q65oD3MI2JM9qRRC9zRnmYF8O1DWllH/jM++mCUVVa2YeJoLDU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685299; c=relaxed/simple; bh=fGt3vwXzojBpwJ5OPFRNuUOpmtvk4WgFiUPe9G5ZOto=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=SPB7qQldRVMjhBZ8B3xiYbi3mdlDZZMJ37TrAyyXopJO9VgQ74qb068Kdsn098cKLmZCaC3EOXkPBS0mxHxoEzKfs028UsozpsqydgneyrxzIgbLhkNx8Z5Qd8kY5hyXxofEL4BvfBBzULXzHXbBgeCDFXRUHzOc3b5KGo5Zzps= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=Eym07reE; arc=none smtp.client-ip=209.85.214.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="Eym07reE" Received: by mail-pl1-f174.google.com with SMTP id
d9443c01a7336-225df540edcso59350615ad.0 for ; Mon, 14 Apr 2025 19:48:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1744685296; x=1745290096; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=chHLX5G0loxHSohYhxyRHh7WVRRPsaZ78MDusal3hIs=; b=Eym07reEy2An2zdgKEO46ZR05K09ZnxkS0OjAKkBhzf6zxDEKDa9zQtmqAFjSLgn55 nIGEpTSNqgAdWyL9sPkCE8UqrWMGel5iBICE/Tx2TGA4JN5sX8uXToExhEsKIgdbb8pP Je00YfR3G/qUdSJnOAQoVbikc970FT6PENU0p0RtKnrYXrZpOI2IVhUZEcOTZb7K9KPH 8jF7RhIB+zzWNlpbeXaQgK8+CNCcrGQRYiPUMNO7uIxTr/zmwaJbedKu/kj3d9pqETZj JrBfm/U3zLrZdyfectktdZKdZzNGCJ3jN10DY/U922Op7bDL4Q8Vo6mcRg50WydkwiLQ fxgw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744685296; x=1745290096; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=chHLX5G0loxHSohYhxyRHh7WVRRPsaZ78MDusal3hIs=; b=DWgLhRwgQwcoodu9H2rJn8+mz6OlT6hFrdAO4qVDU4swOH2w9Fsfz3SK9YoyvgHE4d uV76S3zEwEXO4uTliJDYW5kubdiIzi+7XZwyMMpI1zjbGG/MtAQvhqjs6iA+RNwnWwfX Vt8XjX6o1XwupnKNfJZRZ3q2PjqTeWlFZEfAUJyrSyzQytxXwsmEA+kSHDrk4QN5RWY0 GacPQ0djh4z94xAnR2nrpY701VkhSGMSijb40Q+mX1+aNG1VC1y+Af4qtiP5LIpmRw0J Nwt755kCawW4xlNXe+0B2r1sqmcSbU479usrwzfcTvQxT0dZ7FFw36fdYgVgBFMofPTG kzJw== X-Gm-Message-State: AOJu0YzgrK0xsG8+pStNLZh1qYoJ3iuZ+iUUwt50c4AyaaIU2XX5xNwq WQZQjujLPOdLOhoHe5sK+29R50k9pOWfbrlkqQSizFWcuL9QuC7aK4N+2vLszog= X-Gm-Gg: ASbGncuqTYhJhZzDC6fJSnIfPZBWBfr8sYpW7MpChWTzufTa7hl4Ix38Pyu3kY7nNlI bUKv/Ut70c1+YMdI4EinGnlMfxKaq1k9VQCoSaLW9HHNxYoOZ1yn1NYyoNFndiOLRcH1vEpD2kZ SZfw1dQHjjlBzvLtYzFOpX7iBgefZCdQqA+2Oo7x/vWLKNHW09ahnVCAmAqTl10vYnYkElUhsW+ VRLpA3m3sCcLsaBzKaVX0/oJtF0WO92fsrVcDdu82hAc4ZbilVAnJyd86kSNLg5yUgFZZ75NL00 0JVR1ufkcQUEp0YWIOy+dmczmKKnzf4p/u+np+5EpVS2CJ8TFS1CygpQGRotuirI7efaCGBo X-Google-Smtp-Source: 
AGHT+IHfdYPb0mTyQBEggcIEtJXMdJHS98DVkuG5qNfvvUIjwBW9edwqUb5MNEzj4f0Vk/I+xU8TBA== X-Received: by 2002:a17:903:3203:b0:215:58be:3349 with SMTP id d9443c01a7336-22c24987312mr27942235ad.14.1744685296075; Mon, 14 Apr 2025 19:48:16 -0700 (PDT) Received: from PXLDJ45XCM.bytedance.net ([61.213.176.5]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-22ac7ccac49sm106681185ad.217.2025.04.14.19.48.11 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 14 Apr 2025 19:48:15 -0700 (PDT) From: Muchun Song To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song Subject: [PATCH RFC 26/28] mm: memcontrol: introduce memcg_reparent_ops Date: Tue, 15 Apr 2025 10:45:30 +0800 Message-Id: <20250415024532.26632-27-songmuchun@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com> References: <20250415024532.26632-1-songmuchun@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In the previous patch, we established a method to ensure the safety of the lruvec lock and the split queue lock during the reparenting of LRU folios. The process involves the following steps: memcg_reparent_objcgs(memcg) 1) lock // lruvec belongs to memcg and lruvec_parent belongs to parent memcg. spin_lock(&lruvec->lru_lock); spin_lock(&lruvec_parent->lru_lock); 2) relocate from current memcg to its parent // Move all the pages from the lruvec list to the parent lruvec list.
3) unlock spin_unlock(&lruvec_parent->lru_lock); spin_unlock(&lruvec->lru_lock); In addition to the folio lruvec lock, the deferred split queue lock (specific to THP) also requires a similar approach. Therefore, we abstract the three essential steps from the memcg_reparent_objcgs() function. memcg_reparent_objcgs(memcg) 1) lock memcg_reparent_ops->lock(memcg, parent); 2) relocate memcg_reparent_ops->relocate(memcg, parent); 3) unlock memcg_reparent_ops->unlock(memcg, parent); Currently, two distinct locks (the lruvec lock and the deferred split queue lock) need to use this infrastructure. In the subsequent patch, we will employ these APIs to ensure the safety of these locks during the reparenting of LRU folios. Signed-off-by: Muchun Song --- include/linux/memcontrol.h | 20 ++++++++++++ mm/memcontrol.c | 62 ++++++++++++++++++++++++++++++-------- 2 files changed, 69 insertions(+), 13 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 27b23e464229..0e450623f8fa 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -311,6 +311,26 @@ struct mem_cgroup { struct mem_cgroup_per_node *nodeinfo[]; }; +struct memcg_reparent_ops { + /* + * Note that interrupt is disabled before calling those callbacks, + * so the interrupt should remain disabled when leaving those callbacks. + */ + void (*lock)(struct mem_cgroup *src, struct mem_cgroup *dst); + void (*relocate)(struct mem_cgroup *src, struct mem_cgroup *dst); + void (*unlock)(struct mem_cgroup *src, struct mem_cgroup *dst); +}; + +#define DEFINE_MEMCG_REPARENT_OPS(name) \ + const struct memcg_reparent_ops memcg_##name##_reparent_ops = { \ + .lock = name##_reparent_lock, \ + .relocate = name##_reparent_relocate, \ + .unlock = name##_reparent_unlock, \ + } + +#define DECLARE_MEMCG_REPARENT_OPS(name) \ + extern const struct memcg_reparent_ops memcg_##name##_reparent_ops + /* * size of first charge trial.
* TODO: maybe necessary to use big numbers in big irons or dynamic based of the diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 1f0c6e7b69cc..3fac51179186 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -194,24 +194,60 @@ static struct obj_cgroup *obj_cgroup_alloc(void) return objcg; } -static void memcg_reparent_objcgs(struct mem_cgroup *memcg) +static void objcg_reparent_lock(struct mem_cgroup *src, struct mem_cgroup *dst) +{ + spin_lock(&objcg_lock); +} + +static void objcg_reparent_relocate(struct mem_cgroup *src, struct mem_cgroup *dst) { struct obj_cgroup *objcg, *iter; - struct mem_cgroup *parent = parent_mem_cgroup(memcg); - objcg = rcu_replace_pointer(memcg->objcg, NULL, true); + objcg = rcu_replace_pointer(src->objcg, NULL, true); + /* 1) Ready to reparent active objcg. */ + list_add(&objcg->list, &src->objcg_list); + /* 2) Reparent active objcg and already reparented objcgs to dst. */ + list_for_each_entry(iter, &src->objcg_list, list) + WRITE_ONCE(iter->memcg, dst); + /* 3) Move already reparented objcgs to the dst's list */ + list_splice(&src->objcg_list, &dst->objcg_list); +} - spin_lock_irq(&objcg_lock); +static void objcg_reparent_unlock(struct mem_cgroup *src, struct mem_cgroup *dst) +{ + spin_unlock(&objcg_lock); +} - /* 1) Ready to reparent active objcg. */ - list_add(&objcg->list, &memcg->objcg_list); - /* 2) Reparent active objcg and already reparented objcgs to parent.
*/ - list_for_each_entry(iter, &memcg->objcg_list, list) - WRITE_ONCE(iter->memcg, parent); - /* 3) Move already reparented objcgs to the parent's list */ - list_splice(&memcg->objcg_list, &parent->objcg_list); - - spin_unlock_irq(&objcg_lock); +static DEFINE_MEMCG_REPARENT_OPS(objcg); + +static const struct memcg_reparent_ops *memcg_reparent_ops[] = { + &memcg_objcg_reparent_ops, +}; + +#define DEFINE_MEMCG_REPARENT_FUNC(phase) \ + static void memcg_reparent_##phase(struct mem_cgroup *src, \ + struct mem_cgroup *dst) \ + { \ + int i; \ + \ + for (i = 0; i < ARRAY_SIZE(memcg_reparent_ops); i++) \ + memcg_reparent_ops[i]->phase(src, dst); \ + } + +DEFINE_MEMCG_REPARENT_FUNC(lock) +DEFINE_MEMCG_REPARENT_FUNC(relocate) +DEFINE_MEMCG_REPARENT_FUNC(unlock) + +static void memcg_reparent_objcgs(struct mem_cgroup *src) +{ + struct mem_cgroup *dst = parent_mem_cgroup(src); + struct obj_cgroup *objcg = rcu_dereference_protected(src->objcg, true); + + local_irq_disable(); + memcg_reparent_lock(src, dst); + memcg_reparent_relocate(src, dst); + memcg_reparent_unlock(src, dst); + local_irq_enable(); percpu_ref_kill(&objcg->refcnt); } -- 2.20.1 From nobody Fri Dec 19 16:05:22 2025 Received: from mail-pj1-f51.google.com (mail-pj1-f51.google.com [209.85.216.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4FC062356A6 for ; Tue, 15 Apr 2025 02:48:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685304; cv=none; b=P0snPEl3zs0rOHU8WglJ6QXZ3pFkxu9gvGtvyUIv2bgbbeB4eCVb8Pwg87BIwMo0Hze6Psv5S3mxroRRLmD+ptaGVO8IcNuxC5iUSKu4lVcwwID45I4PbeuKRTAb73t7y0OJUKJ5KEghHxNAwd+KQlyRDRN+YC///zHMZ0S/RRw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685304; c=relaxed/simple;
bh=H0r0s+aNVtABcImPElCvxEcapJXkAeLu6EgaDJbCxCA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=XOUGpG/X/XGGTbVqXvH3L6mLS51HU3KsNeL5mKWwBUKQt5JwPAomJXpj8ZUvuXqGANSzzdDGt+KEe/yyPUdgBGDNbxOblJnkmI7AgyyL1Vpqt7mIVxQHI4F0PqGxHWXAjVBh4Emn7pIVxG1iTvUP977IfDv7mSrXYBlDR9zGIik= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=kNei1fmG; arc=none smtp.client-ip=209.85.216.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="kNei1fmG" Received: by mail-pj1-f51.google.com with SMTP id 98e67ed59e1d1-2ff85fec403so5674787a91.1 for ; Mon, 14 Apr 2025 19:48:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1744685301; x=1745290101; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=/5hAkTeajZjUlhSwkD9bqooCoCo2zpB/VByckb93fwU=; b=kNei1fmGe10RFBA5v5bcBI9E8P9ZVE4NpwYjRNcsOPkDdAJK5g8MiIk9/ESvHmbXek QnEaS7uQDNVLv9quBlm9CP8plUikUG4DQIO3KlVoDfhGmrebuJWVIQQM4V3nH2i73DCt 2oUBPFzb+W0TcbESzzaLVDEUbjm/7hJeSJ4717axwQySJQKMzxwWz6uVdAt3ANlh12Bd +vdKhRIHVoG+jrzeLSzbQCXIw8vL41YojA+5tklb1AEbGYgjFmjf2ZFul4Nw8SUTnesZ 40bxlC2VpWmqKO29rtvyMy0vtjaYKpVGABh5l+NFkdp/OwlUI17OLbVKxRyD2w7PPxlt y++g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744685301; x=1745290101; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc 
:subject:date:message-id:reply-to; bh=/5hAkTeajZjUlhSwkD9bqooCoCo2zpB/VByckb93fwU=; b=RlIcKio0TrdrleChfo01B5gL6zFNwUuWge0nOAggdcp6Mr/31N5DTbOPmf3Klhr6V+ 0tyIS1AC7az/rZfKpGZnvSMuq8QsoK9B3OOOj3XW2QZGUqAKE2qN4fs6jKqx5XXfESFo Cv7TDc4npnf4uK+dHbobP4SSSAw5QAE0rJk5AKju9DhXLdf0dobHWMQnFzdSdFCu00xy lmLjLKpXf//2UNeOKaeT+2XeN/oP0ycjQ/dUMpB9Tr8Ny5nAYqr3C/UhMgshLWqzB522 ds4YobxQH3gjqgiGcTEimmVRAuczHk6IOTjJjXZCEwU19+CI0Jg/Un8P0LO93bwleXuK RTsA== X-Gm-Message-State: AOJu0YxjRJP130wUxYHN0OJTJL7Tti8b4k0QB5Q865C8Y+nMJtp4b4ed NR23ucrFYrPhJEX1tq6w0YvyhucoySDN6aAB0Wibv3jitovu6MbDexNHxVgizjo= X-Gm-Gg: ASbGncv/bzOwx8xBExnByzuCB2pEeMgrLt1U7fkGWUZH3MwSMyya77zPmWcVmr3iIDY 0YGCeTTwZll1I5ixTFMBpy4a0s8IMuSUoK9P/wWChNPpy7m4w6uHjs0yM9flMLfmtdd3DQAZw0z cHflA89DMKpWLrxIGX/zXXa5HO5qIOHu3h7g9BbebIygyu9cok86u8SRx4qrN3qfTnV0wErNrts TD3PFpKEXU865Ia91xG7/7UA5+Ft738CrjzFFKvyJIf8BISZ10Q7DIWLwKENfrFZTcuoqYBIwD7 8UCKFZPzmf+RskyZZIlZx4oEy8xyL3SnXTkKbA1Af4S9nI2uCLrosdiZtHabkrEiXd5LWbmM X-Google-Smtp-Source: AGHT+IHCiWoDN+bE/e12Bjpe86h+z5c2l97IYpJx1vPLZfIiCxp0ezSHDN9cZFtMGTkakWuJkAZr7g== X-Received: by 2002:a17:90b:2dcd:b0:306:b593:455e with SMTP id 98e67ed59e1d1-3084f306f84mr2571647a91.1.1744685301315; Mon, 14 Apr 2025 19:48:21 -0700 (PDT) Received: from PXLDJ45XCM.bytedance.net ([61.213.176.5]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-22ac7ccac49sm106681185ad.217.2025.04.14.19.48.16 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 14 Apr 2025 19:48:20 -0700 (PDT) From: Muchun Song To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song Subject: [PATCH RFC 27/28] mm: memcontrol: eliminate the problem of dying memory cgroup for LRU 
folios Date: Tue, 15 Apr 2025 10:45:31 +0800 Message-Id: <20250415024532.26632-28-songmuchun@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com> References: <20250415024532.26632-1-songmuchun@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Pagecache pages are charged at allocation time and hold a reference to the original memory cgroup until reclaimed. Depending on memory pressure, page sharing patterns between different cgroups and cgroup creation/destruction rates, many dying memory cgroups can be pinned by pagecache pages, reducing page reclaim efficiency and wasting memory. Converting LRU folios and most other raw memory cgroup pins to use object cgroups fixes this long-standing problem. In the end, folio->memcg_data of LRU folios and kmem folios will always point to an object cgroup, and folio->memcg_data of slab folios will point to a vector of object cgroups. Signed-off-by: Muchun Song --- include/linux/memcontrol.h | 78 +++++-------- mm/huge_memory.c | 33 ++++++ mm/memcontrol-v1.c | 15 ++- mm/memcontrol.c | 228 +++++++++++++++++++++++++------------ 4 files changed, 222 insertions(+), 132 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 0e450623f8fa..7b1279963c0c 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -368,9 +368,6 @@ enum objext_flags { #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1) #ifdef CONFIG_MEMCG - -static inline bool folio_memcg_kmem(struct folio *folio); - /* * After the initialization objcg->memcg is always pointing at * a valid memcg, but can be atomically swapped to the parent memcg.
@@ -384,43 +381,19 @@ static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg) } /* - * __folio_memcg - Get the memory cgroup associated with a non-kmem folio - * @folio: Pointer to the folio. - * - * Returns a pointer to the memory cgroup associated with the folio, - * or NULL. This function assumes that the folio is known to have a - * proper memory cgroup pointer. It's not safe to call this function - * against some type of folios, e.g. slab folios or ex-slab folios or - * kmem folios. - */ -static inline struct mem_cgroup *__folio_memcg(struct folio *folio) -{ - unsigned long memcg_data = folio->memcg_data; - - VM_BUG_ON_FOLIO(folio_test_slab(folio), folio); - VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio); - VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio); - - return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK); -} - -/* - * __folio_objcg - get the object cgroup associated with a kmem folio. + * folio_objcg - get the object cgroup associated with a folio. * @folio: Pointer to the folio. * * Returns a pointer to the object cgroup associated with the folio, * or NULL. This function assumes that the folio is known to have a - * proper object cgroup pointer. It's not safe to call this function - * against some type of folios, e.g. slab folios or ex-slab folios or - * LRU folios. + * proper object cgroup pointer. */ -static inline struct obj_cgroup *__folio_objcg(struct folio *folio) +static inline struct obj_cgroup *folio_objcg(struct folio *folio) { unsigned long memcg_data = folio->memcg_data; VM_BUG_ON_FOLIO(folio_test_slab(folio), folio); VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio); - VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio); return (struct obj_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK); } @@ -434,21 +407,31 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio) * proper memory cgroup pointer.
It's not safe to call this function * against some type of folios, e.g. slab folios or ex-slab folios. * - * For a non-kmem folio any of the following ensures folio and memcg binding - * stability: + * For a folio any of the following ensures folio and objcg binding stability: * * - the folio lock * - LRU isolation * - exclusive reference * - * For a kmem folio a caller should hold an rcu read lock to protect memcg - * associated with a kmem folio from being released. + * Based on the stable binding of folio and objcg, for a folio any of the + * following ensures folio and memcg binding stability: + * + * - cgroup_mutex + * - the lruvec lock + * - the split queue lock (only THP page) + * + * If the caller only want to ensure that the page counters of memcg are + * updated correctly, ensure that the binding stability of folio and objcg + * is sufficient. + * + * Note: The caller should hold an rcu read lock or cgroup_mutex to protect + * memcg associated with a folio from being released. */ static inline struct mem_cgroup *folio_memcg(struct folio *folio) { - if (folio_memcg_kmem(folio)) - return obj_cgroup_memcg(__folio_objcg(folio)); - return __folio_memcg(folio); + struct obj_cgroup *objcg = folio_objcg(folio); + + return objcg ? obj_cgroup_memcg(objcg) : NULL; } /* @@ -472,15 +455,10 @@ static inline bool folio_memcg_charged(struct folio *folio) * has an associated memory cgroup pointer or an object cgroups vector or * an object cgroup. * - * For a non-kmem folio any of the following ensures folio and memcg binding - * stability: + * The page and objcg or memcg binding rules can refer to folio_memcg(). * - * - the folio lock - * - LRU isolation - * - exclusive reference - * - * For a kmem folio a caller should hold an rcu read lock to protect memcg - * associated with a kmem folio from being released. + * A caller should hold an rcu read lock to protect memcg associated with a + * page from being released.
*/ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio) { @@ -489,18 +467,14 @@ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio) * for slabs, READ_ONCE() should be used here. */ unsigned long memcg_data = READ_ONCE(folio->memcg_data); + struct obj_cgroup *objcg; if (memcg_data & MEMCG_DATA_OBJEXTS) return NULL; - if (memcg_data & MEMCG_DATA_KMEM) { - struct obj_cgroup *objcg; - - objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK); - return obj_cgroup_memcg(objcg); - } + objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK); - return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK); + return objcg ? obj_cgroup_memcg(objcg) : NULL; } static inline struct mem_cgroup *page_memcg_check(struct page *page) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 813334994f84..0236020de5b3 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1075,6 +1075,39 @@ static inline struct deferred_split *folio_memcg_split_queue(struct folio *folio return memcg ?
&memcg->deferred_split_queue : NULL; } + +static void thp_sq_reparent_lock(struct mem_cgroup *src, struct mem_cgroup *dst) +{ + spin_lock(&src->deferred_split_queue.split_queue_lock); + spin_lock_nested(&dst->deferred_split_queue.split_queue_lock, + SINGLE_DEPTH_NESTING); +} + +static void thp_sq_reparent_relocate(struct mem_cgroup *src, struct mem_cgroup *dst) +{ + int nid; + struct deferred_split *src_queue, *dst_queue; + + src_queue = &src->deferred_split_queue; + dst_queue = &dst->deferred_split_queue; + + if (!src_queue->split_queue_len) + return; + + list_splice_tail_init(&src_queue->split_queue, &dst_queue->split_queue); + dst_queue->split_queue_len += src_queue->split_queue_len; + src_queue->split_queue_len = 0; + + for_each_node(nid) + set_shrinker_bit(dst, nid, deferred_split_shrinker->id); +} + +static void thp_sq_reparent_unlock(struct mem_cgroup *src, struct mem_cgroup *dst) +{ + spin_unlock(&dst->deferred_split_queue.split_queue_lock); + spin_unlock(&src->deferred_split_queue.split_queue_lock); +} +DEFINE_MEMCG_REPARENT_OPS(thp_sq); #else static inline struct mem_cgroup *folio_split_queue_memcg(struct folio *folio, diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 8660908850dc..fb060e5c28ca 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -591,6 +591,7 @@ void memcg1_commit_charge(struct folio *folio, struct mem_cgroup *memcg) void memcg1_swapout(struct folio *folio, swp_entry_t entry) { struct mem_cgroup *memcg, *swap_memcg; + struct obj_cgroup *objcg; unsigned int nr_entries; VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); @@ -602,12 +603,13 @@ void memcg1_swapout(struct folio *folio, swp_entry_t entry) if (!do_memsw_account()) return; - memcg = folio_memcg(folio); - - VM_WARN_ON_ONCE_FOLIO(!memcg, folio); - if (!memcg) + objcg = folio_objcg(folio); + VM_WARN_ON_ONCE_FOLIO(!objcg, folio); + if (!objcg) return; + rcu_read_lock(); + memcg = obj_cgroup_memcg(objcg); /* * In case the memcg owning
these pages has been offlined and doesn't * have an ID allocated to it anymore, charge the closest online @@ -625,7 +627,7 @@ void memcg1_swapout(struct folio *folio, swp_entry_t en= try) folio_unqueue_deferred_split(folio); folio->memcg_data =3D 0; =20 - if (!mem_cgroup_is_root(memcg)) + if (!obj_cgroup_is_root(objcg)) page_counter_uncharge(&memcg->memory, nr_entries); =20 if (memcg !=3D swap_memcg) { @@ -646,7 +648,8 @@ void memcg1_swapout(struct folio *folio, swp_entry_t en= try) preempt_enable_nested(); memcg1_check_events(memcg, folio_nid(folio)); =20 - css_put(&memcg->css); + rcu_read_unlock(); + obj_cgroup_put(objcg); } =20 /* diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3fac51179186..1381a9e97ec5 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -220,8 +220,78 @@ static void objcg_reparent_unlock(struct mem_cgroup *s= rc, struct mem_cgroup *dst =20 static DEFINE_MEMCG_REPARENT_OPS(objcg); =20 +static void lruvec_reparent_lock(struct mem_cgroup *src, struct mem_cgroup= *dst) +{ + int nid, nest =3D 0; + + for_each_node(nid) { + spin_lock_nested(&mem_cgroup_lruvec(src, + NODE_DATA(nid))->lru_lock, nest++); + spin_lock_nested(&mem_cgroup_lruvec(dst, + NODE_DATA(nid))->lru_lock, nest++); + } +} + +static void lruvec_reparent_lru(struct lruvec *src, struct lruvec *dst, + enum lru_list lru) +{ + int zid; + struct mem_cgroup_per_node *mz_src, *mz_dst; + + mz_src =3D container_of(src, struct mem_cgroup_per_node, lruvec); + mz_dst =3D container_of(dst, struct mem_cgroup_per_node, lruvec); + + if (lru !=3D LRU_UNEVICTABLE) + list_splice_tail_init(&src->lists[lru], &dst->lists[lru]); + + for (zid =3D 0; zid < MAX_NR_ZONES; zid++) { + mz_dst->lru_zone_size[zid][lru] +=3D mz_src->lru_zone_size[zid][lru]; + mz_src->lru_zone_size[zid][lru] =3D 0; + } +} + +static void lruvec_reparent_relocate(struct mem_cgroup *src, struct mem_cg= roup *dst) +{ + int nid; + + for_each_node(nid) { + enum lru_list lru; + struct lruvec *src_lruvec, *dst_lruvec; + + src_lruvec =3D 
mem_cgroup_lruvec(src, NODE_DATA(nid)); + dst_lruvec =3D mem_cgroup_lruvec(dst, NODE_DATA(nid)); + + dst_lruvec->anon_cost +=3D src_lruvec->anon_cost; + dst_lruvec->file_cost +=3D src_lruvec->file_cost; + + for_each_lru(lru) + lruvec_reparent_lru(src_lruvec, dst_lruvec, lru); + } +} + +static void lruvec_reparent_unlock(struct mem_cgroup *src, struct mem_cgro= up *dst) +{ + int nid; + + for_each_node(nid) { + spin_unlock(&mem_cgroup_lruvec(dst, NODE_DATA(nid))->lru_lock); + spin_unlock(&mem_cgroup_lruvec(src, NODE_DATA(nid))->lru_lock); + } +} + +static DEFINE_MEMCG_REPARENT_OPS(lruvec); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +DECLARE_MEMCG_REPARENT_OPS(thp_sq); +#endif + +/* The lock order depends on the order of elements in this array. */ static const struct memcg_reparent_ops *memcg_reparent_ops[] =3D { &memcg_objcg_reparent_ops, + &memcg_lruvec_reparent_ops, +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + &memcg_thp_sq_reparent_ops, +#endif }; =20 #define DEFINE_MEMCG_REPARENT_FUNC(phase) \ @@ -1018,6 +1088,8 @@ struct mem_cgroup *get_mem_cgroup_from_current(void) /** * get_mem_cgroup_from_folio - Obtain a reference on a given folio's memcg. * @folio: folio from which memcg should be extracted. + * + * See folio_memcg() for the rules that bind a page to an objcg or memcg. 
*/ struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio) { @@ -2489,17 +2561,17 @@ static inline int try_charge(struct mem_cgroup *mem= cg, gfp_t gfp_mask, return try_charge_memcg(memcg, gfp_mask, nr_pages); } =20 -static void commit_charge(struct folio *folio, struct mem_cgroup *memcg) +static void commit_charge(struct folio *folio, struct obj_cgroup *objcg) { VM_BUG_ON_FOLIO(folio_memcg_charged(folio), folio); /* - * Any of the following ensures page's memcg stability: + * Any of the following ensures page's objcg stability: * * - the page lock * - LRU isolation * - exclusive reference */ - folio->memcg_data =3D (unsigned long)memcg; + folio->memcg_data =3D (unsigned long)objcg; } =20 static inline void __mod_objcg_mlstate(struct obj_cgroup *objcg, @@ -2580,6 +2652,17 @@ static struct obj_cgroup *__get_obj_cgroup_from_memc= g(struct mem_cgroup *memcg) return NULL; } =20 +static inline struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgro= up *memcg) +{ + struct obj_cgroup *objcg; + + rcu_read_lock(); + objcg =3D __get_obj_cgroup_from_memcg(memcg); + rcu_read_unlock(); + + return objcg; +} + static struct obj_cgroup *current_objcg_update(void) { struct mem_cgroup *memcg; @@ -2677,17 +2760,10 @@ struct obj_cgroup *get_obj_cgroup_from_folio(struct= folio *folio) { struct obj_cgroup *objcg; =20 - if (!memcg_kmem_online()) - return NULL; - - if (folio_memcg_kmem(folio)) { - objcg =3D __folio_objcg(folio); + objcg =3D folio_objcg(folio); + if (objcg) obj_cgroup_get(objcg); - } else { - rcu_read_lock(); - objcg =3D __get_obj_cgroup_from_memcg(__folio_memcg(folio)); - rcu_read_unlock(); - } + return objcg; } =20 @@ -3168,7 +3244,7 @@ void folio_split_memcg_refs(struct folio *folio, unsi= gned old_order, return; =20 new_refs =3D (1 << (old_order - new_order)) - 1; - css_get_many(&__folio_memcg(folio)->css, new_refs); + obj_cgroup_get_many(folio_objcg(folio), new_refs); } =20 unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) @@ -4616,16 
+4692,20 @@ void mem_cgroup_calculate_protection(struct mem_cgr= oup *root, static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg, gfp_t gfp) { - int ret; - - ret =3D try_charge(memcg, gfp, folio_nr_pages(folio)); - if (ret) - goto out; + int ret =3D 0; + struct obj_cgroup *objcg; =20 - css_get(&memcg->css); - commit_charge(folio, memcg); + objcg =3D get_obj_cgroup_from_memcg(memcg); + /* Do not account at the root objcg level. */ + if (!obj_cgroup_is_root(objcg)) + ret =3D try_charge(memcg, gfp, folio_nr_pages(folio)); + if (ret) { + obj_cgroup_put(objcg); + return ret; + } + commit_charge(folio, objcg); memcg1_commit_charge(folio, memcg); -out: + return ret; } =20 @@ -4711,7 +4791,7 @@ int mem_cgroup_swapin_charge_folio(struct folio *foli= o, struct mm_struct *mm, } =20 struct uncharge_gather { - struct mem_cgroup *memcg; + struct obj_cgroup *objcg; unsigned long nr_memory; unsigned long pgpgout; unsigned long nr_kmem; @@ -4725,60 +4805,54 @@ static inline void uncharge_gather_clear(struct unc= harge_gather *ug) =20 static void uncharge_batch(const struct uncharge_gather *ug) { + struct mem_cgroup *memcg; + + rcu_read_lock(); + memcg =3D obj_cgroup_memcg(ug->objcg); if (ug->nr_memory) { - page_counter_uncharge(&ug->memcg->memory, ug->nr_memory); + page_counter_uncharge(&memcg->memory, ug->nr_memory); if (do_memsw_account()) - page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory); + page_counter_uncharge(&memcg->memsw, ug->nr_memory); if (ug->nr_kmem) { - mod_memcg_state(ug->memcg, MEMCG_KMEM, -ug->nr_kmem); - memcg1_account_kmem(ug->memcg, -ug->nr_kmem); + mod_memcg_state(memcg, MEMCG_KMEM, -ug->nr_kmem); + memcg1_account_kmem(memcg, -ug->nr_kmem); } - memcg1_oom_recover(ug->memcg); + memcg1_oom_recover(memcg); } =20 - memcg1_uncharge_batch(ug->memcg, ug->pgpgout, ug->nr_memory, ug->nid); + memcg1_uncharge_batch(memcg, ug->pgpgout, ug->nr_memory, ug->nid); + rcu_read_unlock(); =20 /* drop reference from uncharge_folio */ - 
css_put(&ug->memcg->css); + obj_cgroup_put(ug->objcg); } =20 static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) { long nr_pages; - struct mem_cgroup *memcg; struct obj_cgroup *objcg; =20 VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); =20 /* * Nobody should be changing or seriously looking at - * folio memcg or objcg at this point, we have fully - * exclusive access to the folio. + * folio objcg at this point, we have fully exclusive + * access to the folio. */ - if (folio_memcg_kmem(folio)) { - objcg =3D __folio_objcg(folio); - /* - * This get matches the put at the end of the function and - * kmem pages do not hold memcg references anymore. - */ - memcg =3D get_mem_cgroup_from_objcg(objcg); - } else { - memcg =3D __folio_memcg(folio); - } - - if (!memcg) + objcg =3D folio_objcg(folio); + if (!objcg) return; =20 - if (ug->memcg !=3D memcg) { - if (ug->memcg) { + if (ug->objcg !=3D objcg) { + if (ug->objcg) { uncharge_batch(ug); uncharge_gather_clear(ug); } - ug->memcg =3D memcg; + ug->objcg =3D objcg; ug->nid =3D folio_nid(folio); =20 - /* pairs with css_put in uncharge_batch */ - css_get(&memcg->css); + /* pairs with obj_cgroup_put in uncharge_batch */ + obj_cgroup_get(objcg); } =20 nr_pages =3D folio_nr_pages(folio); @@ -4786,20 +4860,17 @@ static void uncharge_folio(struct folio *folio, str= uct uncharge_gather *ug) if (folio_memcg_kmem(folio)) { ug->nr_memory +=3D nr_pages; ug->nr_kmem +=3D nr_pages; - - folio->memcg_data =3D 0; - obj_cgroup_put(objcg); } else { /* LRU pages aren't accounted at the root level */ - if (!mem_cgroup_is_root(memcg)) + if (!obj_cgroup_is_root(objcg)) ug->nr_memory +=3D nr_pages; ug->pgpgout++; =20 WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); - folio->memcg_data =3D 0; } =20 - css_put(&memcg->css); + folio->memcg_data =3D 0; + obj_cgroup_put(objcg); } =20 void __mem_cgroup_uncharge(struct folio *folio) @@ -4823,7 +4894,7 @@ void __mem_cgroup_uncharge_folios(struct folio_batch = *folios) 
uncharge_gather_clear(&ug); for (i =3D 0; i < folios->nr; i++) uncharge_folio(folios->folios[i], &ug); - if (ug.memcg) + if (ug.objcg) uncharge_batch(&ug); } =20 @@ -4840,6 +4911,7 @@ void __mem_cgroup_uncharge_folios(struct folio_batch = *folios) void mem_cgroup_replace_folio(struct folio *old, struct folio *new) { struct mem_cgroup *memcg; + struct obj_cgroup *objcg; long nr_pages =3D folio_nr_pages(new); =20 VM_BUG_ON_FOLIO(!folio_test_locked(old), old); @@ -4854,21 +4926,24 @@ void mem_cgroup_replace_folio(struct folio *old, st= ruct folio *new) if (folio_memcg_charged(new)) return; =20 - memcg =3D folio_memcg(old); - VM_WARN_ON_ONCE_FOLIO(!memcg, old); - if (!memcg) + objcg =3D folio_objcg(old); + VM_WARN_ON_ONCE_FOLIO(!objcg, old); + if (!objcg) return; =20 + rcu_read_lock(); + memcg =3D obj_cgroup_memcg(objcg); /* Force-charge the new page. The old one will be freed soon */ - if (!mem_cgroup_is_root(memcg)) { + if (!obj_cgroup_is_root(objcg)) { page_counter_charge(&memcg->memory, nr_pages); if (do_memsw_account()) page_counter_charge(&memcg->memsw, nr_pages); } =20 - css_get(&memcg->css); - commit_charge(new, memcg); + obj_cgroup_get(objcg); + commit_charge(new, objcg); memcg1_commit_charge(new, memcg); + rcu_read_unlock(); } =20 /** @@ -4884,7 +4959,7 @@ void mem_cgroup_replace_folio(struct folio *old, stru= ct folio *new) */ void mem_cgroup_migrate(struct folio *old, struct folio *new) { - struct mem_cgroup *memcg; + struct obj_cgroup *objcg; =20 VM_BUG_ON_FOLIO(!folio_test_locked(old), old); VM_BUG_ON_FOLIO(!folio_test_locked(new), new); @@ -4895,18 +4970,18 @@ void mem_cgroup_migrate(struct folio *old, struct f= olio *new) if (mem_cgroup_disabled()) return; =20 - memcg =3D folio_memcg(old); + objcg =3D folio_objcg(old); /* - * Note that it is normal to see !memcg for a hugetlb folio. + * Note that it is normal to see !objcg for a hugetlb folio. * For e.g, it could have been allocated when memory_hugetlb_accounting * was not selected. 
*/ - VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(old) && !memcg, old); - if (!memcg) + VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(old) && !objcg, old); + if (!objcg) return; =20 - /* Transfer the charge and the css ref */ - commit_charge(new, memcg); + /* Transfer the charge and the objcg ref */ + commit_charge(new, objcg); =20 /* Warning should never happen, so don't worry about refcount non-0 */ WARN_ON_ONCE(folio_unqueue_deferred_split(old)); @@ -5049,22 +5124,27 @@ int __mem_cgroup_try_charge_swap(struct folio *foli= o, swp_entry_t entry) unsigned int nr_pages =3D folio_nr_pages(folio); struct page_counter *counter; struct mem_cgroup *memcg; + struct obj_cgroup *objcg; =20 if (do_memsw_account()) return 0; =20 - memcg =3D folio_memcg(folio); - - VM_WARN_ON_ONCE_FOLIO(!memcg, folio); - if (!memcg) + objcg =3D folio_objcg(folio); + VM_WARN_ON_ONCE_FOLIO(!objcg, folio); + if (!objcg) return 0; =20 + rcu_read_lock(); + memcg =3D obj_cgroup_memcg(objcg); if (!entry.val) { memcg_memory_event(memcg, MEMCG_SWAP_FAIL); + rcu_read_unlock(); return 0; } =20 memcg =3D mem_cgroup_id_get_online(memcg); + /* memcg is pinned by memcg ID. 
*/ + rcu_read_unlock(); =20 if (!mem_cgroup_is_root(memcg) && !page_counter_try_charge(&memcg->swap, nr_pages, &counter)) { --=20 2.20.1 From nobody Fri Dec 19 16:05:22 2025 Received: from mail-pj1-f43.google.com (mail-pj1-f43.google.com [209.85.216.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 57B702356C8 for ; Tue, 15 Apr 2025 02:48:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.43 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685309; cv=none; b=nxVHbPUab/0OkR64iNjsPTAcufPiUHbLuM1IJzCBmqISNUlbO1GePEirMFLRa5TkXKgu1h6zB1EwshoFRtdlnrzPD2C9srKM8Zwyr4ErTKET0Mj2pWefciAwCBSarX9x85r5IXxyw0fYWFKCBp1JusgDUqcnzMLi0xRfreeiWrw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744685309; c=relaxed/simple; bh=qfxs19SK10slyUO0O8nR8PbPqoZ9Hjoq9tzPkm4Mjx8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=gm3Y3MzccnrEeZ8q3nJZaY/AVSV8/oZiEvKzsN10c776mkZY9xYtHv7ERT1AShlu5eWSJD2ogQDyJLKUT5pmLUxK4fkLggCLGnnTfoChQLWCnYr8Jic+QKPjEjYbJJN2COJLPrIKHsIwd4ZVEZ8DjEMVae7ref9/3gs3tYHLoIo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=kPovSJD3; arc=none smtp.client-ip=209.85.216.43 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="kPovSJD3" Received: by mail-pj1-f43.google.com with SMTP id 98e67ed59e1d1-30820167b47so4172843a91.0 for ; Mon, 14 Apr 2025 19:48:27 -0700 (PDT) 
X-Received: by 2002:a17:90b:28d0:b0:2ef:ad48:7175 with SMTP id 98e67ed59e1d1-3084f3d2a3amr2508095a91.15.1744685306495; Mon, 14 Apr 2025 19:48:26 -0700 (PDT) Received: from PXLDJ45XCM.bytedance.net ([61.213.176.5]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-22ac7ccac49sm106681185ad.217.2025.04.14.19.48.21 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 14 Apr 2025 19:48:26 -0700 (PDT) From: Muchun Song To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Muchun Song Subject: [PATCH RFC 28/28] mm: lru: add VM_WARN_ON_ONCE_FOLIO to lru maintenance helpers Date: Tue, 15 Apr 2025 10:45:32 +0800 Message-Id: <20250415024532.26632-29-songmuchun@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com> References: <20250415024532.26632-1-songmuchun@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" We must ensure the folio is deleted from or added to the correct lruvec list. So, add VM_WARN_ON_ONCE_FOLIO() to catch invalid users. The VM_BUG_ON_FOLIO() in move_folios_to_lru() can be removed as lruvec_add_folio() will perform the necessary check. 
Signed-off-by: Muchun Song Acked-by: Roman Gushchin --- include/linux/mm_inline.h | 6 ++++++ mm/vmscan.c | 1 - 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index f9157a0c42a5..f36491c42ace 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -341,6 +341,8 @@ void lruvec_add_folio(struct lruvec *lruvec, struct fol= io *folio) { enum lru_list lru =3D folio_lru_list(folio); =20 + VM_WARN_ON_ONCE_FOLIO(!folio_matches_lruvec(folio, lruvec), folio); + if (lru_gen_add_folio(lruvec, folio, false)) return; =20 @@ -355,6 +357,8 @@ void lruvec_add_folio_tail(struct lruvec *lruvec, struc= t folio *folio) { enum lru_list lru =3D folio_lru_list(folio); =20 + VM_WARN_ON_ONCE_FOLIO(!folio_matches_lruvec(folio, lruvec), folio); + if (lru_gen_add_folio(lruvec, folio, true)) return; =20 @@ -369,6 +373,8 @@ void lruvec_del_folio(struct lruvec *lruvec, struct fol= io *folio) { enum lru_list lru =3D folio_lru_list(folio); =20 + VM_WARN_ON_ONCE_FOLIO(!folio_matches_lruvec(folio, lruvec), folio); + if (lru_gen_del_folio(lruvec, folio, false)) return; =20 diff --git a/mm/vmscan.c b/mm/vmscan.c index fbba14094c6d..a59268bf4112 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1952,7 +1952,6 @@ static unsigned int move_folios_to_lru(struct list_he= ad *list) continue; } =20 - VM_BUG_ON_FOLIO(!folio_matches_lruvec(folio, lruvec), folio); lruvec_add_folio(lruvec, folio); nr_pages =3D folio_nr_pages(folio); nr_moved +=3D nr_pages; --=20 2.20.1
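
For readers following the last two patches, here is a minimal userspace sketch of why the folio_matches_lruvec() check matters once folios point at an objcg rather than a memcg. It is a toy model under stated assumptions, not kernel code: the struct layouts and the helper names folio_memcg_model(), folio_matches_model() and reparent_model() are hypothetical, a single counter stands in for the per-node LRU lists and lru_zone_size, and locking, RCU and refcounting are left out entirely.

```c
/* Toy userspace model; every name here is illustrative, not a kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct memcg {
	struct memcg *parent;
	long lru_size;		/* stands in for the LRU lists and lru_zone_size */
};

struct objcg {
	struct memcg *memcg;	/* repointed when the owning memcg goes away */
};

struct folio {
	struct objcg *objcg;	/* models folio->memcg_data holding an objcg */
};

/* Resolve the current owner through the objcg indirection. */
static struct memcg *folio_memcg_model(const struct folio *folio)
{
	return folio->objcg ? folio->objcg->memcg : NULL;
}

/* Models the kind of check the new VM_WARN_ON_ONCE_FOLIO() performs. */
static bool folio_matches_model(const struct folio *folio,
				const struct memcg *memcg)
{
	return folio_memcg_model(folio) == memcg;
}

/*
 * Models the relocate phase: move the LRU accounting to the parent, then
 * repoint the objcg. The folio itself is never touched.
 */
static void reparent_model(struct memcg *src, struct objcg *objcg)
{
	struct memcg *dst = src->parent;

	assert(dst != NULL);	/* assumption: src always has a parent */
	dst->lru_size += src->lru_size;
	src->lru_size = 0;
	objcg->memcg = dst;
}
```

In this model a caller that cached the child before reparent_model() ran would afterwards fail folio_matches_model() against it, which is the class of stale-lruvec user the added warnings are meant to catch.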