From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com,
    roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev,
    david@redhat.com, lorenzo.stoakes@oracle.com, ziy@nvidia.com,
    baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com, npache@redhat.com,
    ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
    lance.yang@linux.dev, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    Muchun Song, Qi Zheng
Subject: [PATCH 1/4] mm: thp: replace folio_memcg() with folio_memcg_charged()
Date: Fri, 19 Sep 2025 11:46:32 +0800
Message-ID: <35038de39456f827c88766b90a1cec93cf151ef2.1758253018.git.zhengqi.arch@bytedance.com>

From: Muchun Song

folio_memcg_charged() is intended for use when the caller does not care
about the returned memcg pointer, only whether the folio is charged at
all. It is more efficient than folio_memcg(), so replace folio_memcg()
with folio_memcg_charged() here.
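
For context, a simplified sketch of why the boolean helper is cheaper,
modeled loosely on include/linux/memcontrol.h (the exact upstream
definitions differ in detail across kernel versions, so treat this as an
illustration, not the verbatim source):

	/* Illustration only -- simplified from include/linux/memcontrol.h. */
	static inline bool folio_memcg_charged(struct folio *folio)
	{
		/* One load and one compare: no flag decoding needed. */
		return folio->memcg_data != 0;
	}

	static inline struct mem_cgroup *folio_memcg(struct folio *folio)
	{
		/*
		 * Must decode ->memcg_data: mask off the low flag bits and,
		 * for kmem folios, follow the obj_cgroup pointer to the
		 * owning memcg. That is wasted work when the caller only
		 * needs a yes/no answer.
		 */
		if (folio_memcg_kmem(folio))
			return obj_cgroup_memcg(__folio_objcg(folio));
		return __folio_memcg(folio);
	}
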
Signed-off-by: Muchun Song
Acked-by: Johannes Weiner
Signed-off-by: Qi Zheng
Acked-by: David Hildenbrand
Acked-by: Shakeel Butt
---
 mm/huge_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5acca24bbabbe..582628ddf3f33 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4014,7 +4014,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 	bool unqueued = false;

 	WARN_ON_ONCE(folio_ref_count(folio));
-	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg(folio));
+	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));

 	ds_queue = get_deferred_split_queue(folio);
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants
Date: Fri, 19 Sep 2025 11:46:33 +0800

From: Muchun Song

During the upcoming memcg removal, the binding between a folio and its
memcg may change, which makes a split lock reached through the memcg
unstable while it is held. A way to reparent a split queue to its parent
memcg will be required. As a first step, this patch introduces a unified
way to acquire the split lock. It is a code-only refactoring with no
functional changes.
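
The caller-visible shape of the change, using the names introduced in the
diff below:

	/* Before: resolve the queue, then take the lock in a second step. */
	ds_queue = get_deferred_split_queue(folio);
	spin_lock(&ds_queue->split_queue_lock);
	/* ... queue manipulation ... */
	spin_unlock(&ds_queue->split_queue_lock);

	/*
	 * After: one helper resolves the queue and locks it, returning the
	 * locked queue. Once split queues become reparentable, the helper
	 * will be able to revalidate the folio->memcg binding under the
	 * lock and retry -- something the two-step pattern cannot do.
	 */
	ds_queue = folio_split_queue_lock(folio);
	/* ... queue manipulation ... */
	split_queue_unlock(ds_queue);
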
Signed-off-by: Muchun Song
Acked-by: Johannes Weiner
Signed-off-by: Qi Zheng
Acked-by: David Hildenbrand
Acked-by: Shakeel Butt
---
 include/linux/memcontrol.h | 10 +++++
 mm/huge_memory.c           | 89 ++++++++++++++++++++++++++------------
 2 files changed, 71 insertions(+), 28 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 16fe0306e50ea..99876af13c315 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1662,6 +1662,11 @@ int alloc_shrinker_info(struct mem_cgroup *memcg);
 void free_shrinker_info(struct mem_cgroup *memcg);
 void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id);
 void reparent_shrinker_deferred(struct mem_cgroup *memcg);
+
+static inline int shrinker_id(struct shrinker *shrinker)
+{
+	return shrinker->id;
+}
 #else
 #define mem_cgroup_sockets_enabled 0

@@ -1693,6 +1698,11 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
 				    int nid, int shrinker_id)
 {
 }
+
+static inline int shrinker_id(struct shrinker *shrinker)
+{
+	return -1;
+}
 #endif

 #ifdef CONFIG_MEMCG
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 582628ddf3f33..d34516a22f5bb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1078,26 +1078,62 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)

 #ifdef CONFIG_MEMCG
 static inline
-struct deferred_split *get_deferred_split_queue(struct folio *folio)
+struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
+					   struct deferred_split *queue)
 {
-	struct mem_cgroup *memcg = folio_memcg(folio);
-	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
-
-	if (memcg)
-		return &memcg->deferred_split_queue;
-	else
-		return &pgdat->deferred_split_queue;
+	if (mem_cgroup_disabled())
+		return NULL;
+	if (&NODE_DATA(folio_nid(folio))->deferred_split_queue == queue)
+		return NULL;
+	return container_of(queue, struct mem_cgroup, deferred_split_queue);
 }
 #else
 static inline
-struct deferred_split *get_deferred_split_queue(struct folio *folio)
+struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
+					   struct deferred_split *queue)
 {
-	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
-
-	return &pgdat->deferred_split_queue;
+	return NULL;
 }
 #endif

+static struct deferred_split *folio_split_queue_lock(struct folio *folio)
+{
+	struct mem_cgroup *memcg;
+	struct deferred_split *queue;
+
+	memcg = folio_memcg(folio);
+	queue = memcg ? &memcg->deferred_split_queue :
+			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
+	spin_lock(&queue->split_queue_lock);
+
+	return queue;
+}
+
+static struct deferred_split *
+folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
+{
+	struct mem_cgroup *memcg;
+	struct deferred_split *queue;
+
+	memcg = folio_memcg(folio);
+	queue = memcg ? &memcg->deferred_split_queue :
+			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
+	spin_lock_irqsave(&queue->split_queue_lock, *flags);
+
+	return queue;
+}
+
+static inline void split_queue_unlock(struct deferred_split *queue)
+{
+	spin_unlock(&queue->split_queue_lock);
+}
+
+static inline void split_queue_unlock_irqrestore(struct deferred_split *queue,
+						 unsigned long flags)
+{
+	spin_unlock_irqrestore(&queue->split_queue_lock, flags);
+}
+
 static inline bool is_transparent_hugepage(const struct folio *folio)
 {
 	if (!folio_test_large(folio))
@@ -3579,7 +3615,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		struct page *split_at, struct page *lock_at,
 		struct list_head *list, bool uniform_split)
 {
-	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
+	struct deferred_split *ds_queue;
 	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
 	struct folio *end_folio = folio_next(folio);
 	bool is_anon = folio_test_anon(folio);
@@ -3718,7 +3754,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	}

 	/* Prevent deferred_split_scan() touching ->_refcount */
-	spin_lock(&ds_queue->split_queue_lock);
+	ds_queue = folio_split_queue_lock(folio);
 	if (folio_ref_freeze(folio, 1 + extra_pins)) {
 		struct swap_cluster_info *ci = NULL;
 		struct lruvec *lruvec;
@@ -3740,7 +3776,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		 */
 		list_del_init(&folio->_deferred_list);
 	}
-	spin_unlock(&ds_queue->split_queue_lock);
+	split_queue_unlock(ds_queue);
 	if (mapping) {
 		int nr = folio_nr_pages(folio);

@@ -3835,7 +3871,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		if (ci)
 			swap_cluster_unlock(ci);
 	} else {
-		spin_unlock(&ds_queue->split_queue_lock);
+		split_queue_unlock(ds_queue);
 		ret = -EAGAIN;
 	}
 fail:
@@ -4016,8 +4052,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 	WARN_ON_ONCE(folio_ref_count(folio));
 	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));

-	ds_queue = get_deferred_split_queue(folio);
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
 	if (!list_empty(&folio->_deferred_list)) {
 		ds_queue->split_queue_len--;
 		if (folio_test_partially_mapped(folio)) {
@@ -4028,7 +4063,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 		list_del_init(&folio->_deferred_list);
 		unqueued = true;
 	}
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
+	split_queue_unlock_irqrestore(ds_queue, flags);

 	return unqueued;	/* useful for debug warnings */
 }
@@ -4036,10 +4071,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 /* partially_mapped=false won't clear PG_partially_mapped folio flag */
 void deferred_split_folio(struct folio *folio, bool partially_mapped)
 {
-	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
-#ifdef CONFIG_MEMCG
-	struct mem_cgroup *memcg = folio_memcg(folio);
-#endif
+	struct deferred_split *ds_queue;
 	unsigned long flags;

 	/*
@@ -4062,7 +4094,7 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 	if (folio_test_swapcache(folio))
 		return;

-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
 	if (partially_mapped) {
 		if (!folio_test_partially_mapped(folio)) {
 			folio_set_partially_mapped(folio);
@@ -4077,15 +4109,16 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 		VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
 	}
 	if (list_empty(&folio->_deferred_list)) {
+		struct mem_cgroup *memcg;
+
+		memcg = folio_split_queue_memcg(folio, ds_queue);
 		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
 		ds_queue->split_queue_len++;
-#ifdef CONFIG_MEMCG
 		if (memcg)
 			set_shrinker_bit(memcg, folio_nid(folio),
-					 deferred_split_shrinker->id);
-#endif
+					 shrinker_id(deferred_split_shrinker));
 	}
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
+	split_queue_unlock_irqrestore(ds_queue, flags);
 }

 static unsigned long deferred_split_count(struct shrinker *shrink,
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH 3/4] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan()
Date: Fri, 19 Sep 2025 11:46:34 +0800
Message-ID: <3db5da29d767162a006a562963eb52df9ce45a51.1758253018.git.zhengqi.arch@bytedance.com>

From: Muchun Song

The maintenance of folio->_deferred_list is intricate because the same
list field is reused for a local, on-stack list. Some peculiarities:

1) When a folio is removed from its split queue and added to a local
   on-stack list in deferred_split_scan(), ->split_queue_len is not
   updated, leaving it inconsistent with the actual number of folios
   on the split queue.

2) When the folio is later split via split_folio(), it is removed from
   the local list while the split queue lock is held. At that point the
   lock protects the local list, not the split queue.

3) To handle the race with a third party freeing or migrating the
   preceding folio, we must ensure there is always one safe folio
   (with a raised refcount) ahead of it, by delaying its folio_put().
   See commit e66f3185fa04 ("mm/thp: fix deferred split queue not
   partially_mapped") for more details.

All of this is rather tricky. The folio_batch infrastructure handles it
cleanly: ->split_queue_len stays consistent with the real number of
folios on the split queue, and whenever list_empty(&folio->_deferred_list)
returns false, the folio must be on its split queue (never on a local
list).

In the future, we will reparent LRU folios during memcg offline in order
to eliminate dying memory cgroups, which requires reparenting the split
queue to its parent first. So this patch also prepares for the use of
folio_split_queue_lock_irqsave(), since the memcg may change by then.
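
Reduced to its control flow, the reworked deferred_split_scan() looks
roughly like this (a sketch of the diff below with the per-folio handling
elided, not a drop-in replacement):

	struct folio_batch fbatch;
	bool done;
	int i;

	folio_batch_init(&fbatch);
retry:
	done = true;
	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	/*
	 * Unhook folios one at a time, pinning each with folio_try_get().
	 * Every unhooked folio is counted off split_queue_len right here,
	 * so the counter never goes out of sync. If the batch fills up
	 * before the scan target is reached, stop early with done = false.
	 */
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

	for (i = 0; i < folio_batch_count(&fbatch); i++) {
		/* Split the pinned folios without holding the queue lock;
		 * requeue only those that are still partially mapped. */
	}
	folios_put(&fbatch);	/* drop the pins, reinitialize the batch */

	if (!done)
		goto retry;
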
Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
---
 mm/huge_memory.c | 88 +++++++++++++++++++++++-------------
 1 file changed, 42 insertions(+), 46 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d34516a22f5bb..ab16da21c94e0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3760,21 +3760,22 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		struct lruvec *lruvec;
 		int expected_refs;

-		if (folio_order(folio) > 1 &&
-		    !list_empty(&folio->_deferred_list)) {
-			ds_queue->split_queue_len--;
+		if (folio_order(folio) > 1) {
+			if (!list_empty(&folio->_deferred_list)) {
+				ds_queue->split_queue_len--;
+				/*
+				 * Reinitialize page_deferred_list after removing the
+				 * page from the split_queue, otherwise a subsequent
+				 * split will see list corruption when checking the
+				 * page_deferred_list.
+				 */
+				list_del_init(&folio->_deferred_list);
+			}
 			if (folio_test_partially_mapped(folio)) {
 				folio_clear_partially_mapped(folio);
 				mod_mthp_stat(folio_order(folio),
 					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 			}
-			/*
-			 * Reinitialize page_deferred_list after removing the
-			 * page from the split_queue, otherwise a subsequent
-			 * split will see list corruption when checking the
-			 * page_deferred_list.
-			 */
-			list_del_init(&folio->_deferred_list);
 		}
 		split_queue_unlock(ds_queue);
 		if (mapping) {
@@ -4173,40 +4174,48 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	struct pglist_data *pgdata = NODE_DATA(sc->nid);
 	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
 	unsigned long flags;
-	LIST_HEAD(list);
-	struct folio *folio, *next, *prev = NULL;
-	int split = 0, removed = 0;
+	struct folio *folio, *next;
+	int split = 0, i;
+	struct folio_batch fbatch;
+	bool done;

 #ifdef CONFIG_MEMCG
 	if (sc->memcg)
 		ds_queue = &sc->memcg->deferred_split_queue;
 #endif

+	folio_batch_init(&fbatch);
+retry:
+	done = true;
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	/* Take pin on all head pages to avoid freeing them under us */
 	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
 				 _deferred_list) {
 		if (folio_try_get(folio)) {
-			list_move(&folio->_deferred_list, &list);
-		} else {
+			folio_batch_add(&fbatch, folio);
+		} else if (folio_test_partially_mapped(folio)) {
 			/* We lost race with folio_put() */
-			if (folio_test_partially_mapped(folio)) {
-				folio_clear_partially_mapped(folio);
-				mod_mthp_stat(folio_order(folio),
-					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
-			}
-			list_del_init(&folio->_deferred_list);
-			ds_queue->split_queue_len--;
+			folio_clear_partially_mapped(folio);
+			mod_mthp_stat(folio_order(folio),
+				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 		}
+		list_del_init(&folio->_deferred_list);
+		ds_queue->split_queue_len--;
 		if (!--sc->nr_to_scan)
 			break;
+		if (folio_batch_space(&fbatch) == 0) {
+			done = false;
+			break;
+		}
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

-	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+	for (i = 0; i < folio_batch_count(&fbatch); i++) {
 		bool did_split = false;
 		bool underused = false;
+		struct deferred_split *fqueue;

+		folio = fbatch.folios[i];
 		if (!folio_test_partially_mapped(folio)) {
 			/*
 			 * See try_to_map_unused_to_zeropage(): we cannot
@@ -4229,38 +4238,25 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		}
 		folio_unlock(folio);
 next:
+		if (did_split || !folio_test_partially_mapped(folio))
+			continue;
 		/*
-		 * split_folio() removes folio from list on success.
 		 * Only add back to the queue if folio is partially mapped.
 		 * If thp_underused returns false, or if split_folio fails
 		 * in the case it was underused, then consider it used and
 		 * don't add it back to split_queue.
 		 */
-		if (did_split) {
-			; /* folio already removed from list */
-		} else if (!folio_test_partially_mapped(folio)) {
-			list_del_init(&folio->_deferred_list);
-			removed++;
-		} else {
-			/*
-			 * That unlocked list_del_init() above would be unsafe,
-			 * unless its folio is separated from any earlier folios
-			 * left on the list (which may be concurrently unqueued)
-			 * by one safe folio with refcount still raised.
-			 */
-			swap(folio, prev);
+		fqueue = folio_split_queue_lock_irqsave(folio, &flags);
+		if (list_empty(&folio->_deferred_list)) {
+			list_add_tail(&folio->_deferred_list, &fqueue->split_queue);
+			fqueue->split_queue_len++;
 		}
-		if (folio)
-			folio_put(folio);
+		split_queue_unlock_irqrestore(fqueue, flags);
 	}
+	folios_put(&fbatch);

-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	list_splice_tail(&list, &ds_queue->split_queue);
-	ds_queue->split_queue_len -= removed;
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
-
-	if (prev)
-		folio_put(prev);
+	if (!done)
+		goto retry;

 	/*
 	 * Stop shrinker if we didn't split any page, but the queue is empty.
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH 4/4] mm: thp: reparent the split queue during memcg offline
Date: Fri, 19 Sep 2025 11:46:35 +0800

In the future, we will reparent LRU folios during memcg offline in order
to eliminate dying memory cgroups, which requires reparenting the split
queue to its parent. Similar to the list_lru case, the split queue is
relatively independent and does not need to be reparented together with
the objcg and the LRU folios (which would require holding the objcg lock
and the lru lock). So let's apply the same mechanism as list_lru and
reparent the split queue separately when the memcg goes offline.
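
The core of the scheme, distilled from the locking hunks below: an
offlined memcg's queue is drained and flagged as dying under its own
lock, so any subsequent locker that sees the flag walks up the hierarchy
and retries:

	memcg = folio_memcg(folio);
retry:
	queue = memcg ? &memcg->deferred_split_queue :
			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
	spin_lock(&queue->split_queue_lock);
	if (unlikely(queue->is_dying)) {
		/*
		 * reparent_deferred_split_queue() spliced this queue into
		 * its parent's queue under this same lock, so the folio
		 * cannot be here anymore; retry against the parent.
		 */
		spin_unlock(&queue->split_queue_lock);
		memcg = parent_mem_cgroup(memcg);
		goto retry;
	}
	/* The queue is live and locked; if the folio is queued, it is here. */
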
charset="utf-8" In the future, we will reparent LRU folios during memcg offline to eliminate dying memory cgroups, which requires reparenting the split queue to its parent. Similar to list_lru, the split queue is relatively independent and does not need to be reparented along with objcg and LRU folios (holding objcg lock and lru lock). So let's apply the same mechanism as list_lru to reparent the split queue separately when memcg is offine. Signed-off-by: Qi Zheng --- include/linux/huge_mm.h | 1 + include/linux/mmzone.h | 1 + mm/huge_memory.c | 39 +++++++++++++++++++++++++++++++++++++++ mm/memcontrol.c | 1 + mm/mm_init.c | 1 + 5 files changed, 43 insertions(+) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index f327d62fc9852..3215a35a20411 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -417,6 +417,7 @@ static inline int split_huge_page(struct page *page) return split_huge_page_to_list_to_order(page, NULL, ret); } void deferred_split_folio(struct folio *folio, bool partially_mapped); +void reparent_deferred_split_queue(struct mem_cgroup *memcg); =20 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, bool freeze); diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 7fb7331c57250..f3eb81fee056a 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1346,6 +1346,7 @@ struct deferred_split { spinlock_t split_queue_lock; struct list_head split_queue; unsigned long split_queue_len; + bool is_dying; }; #endif =20 diff --git a/mm/huge_memory.c b/mm/huge_memory.c index ab16da21c94e0..72e78d22ec4b2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1102,9 +1102,15 @@ static struct deferred_split *folio_split_queue_lock= (struct folio *folio) struct deferred_split *queue; =20 memcg =3D folio_memcg(folio); +retry: queue =3D memcg ? &memcg->deferred_split_queue : &NODE_DATA(folio_nid(folio))->deferred_split_queue; spin_lock(&queue->split_queue_lock); + if (unlikely(queue->is_dying =3D=3D true)) { + spin_unlock(&queue->split_queue_lock); + memcg =3D parent_mem_cgroup(memcg); + goto retry; + } =20 return queue; } @@ -1116,9 +1122,15 @@ folio_split_queue_lock_irqsave(struct folio *folio, = unsigned long *flags) struct deferred_split *queue; =20 memcg =3D folio_memcg(folio); +retry: queue =3D memcg ? 
 			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
 	spin_lock_irqsave(&queue->split_queue_lock, *flags);
+	if (unlikely(queue->is_dying == true)) {
+		spin_unlock_irqrestore(&queue->split_queue_lock, *flags);
+		memcg = parent_mem_cgroup(memcg);
+		goto retry;
+	}

 	return queue;
 }
@@ -4267,6 +4279,33 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	return split;
 }

+void reparent_deferred_split_queue(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+	struct deferred_split *ds_queue = &memcg->deferred_split_queue;
+	struct deferred_split *parent_ds_queue = &parent->deferred_split_queue;
+	int nid;
+
+	spin_lock_irq(&ds_queue->split_queue_lock);
+	spin_lock_nested(&parent_ds_queue->split_queue_lock, SINGLE_DEPTH_NESTING);
+
+	if (!ds_queue->split_queue_len)
+		goto unlock;
+
+	list_splice_tail_init(&ds_queue->split_queue, &parent_ds_queue->split_queue);
+	parent_ds_queue->split_queue_len += ds_queue->split_queue_len;
+	ds_queue->split_queue_len = 0;
+	/* Mark the ds_queue dead */
+	ds_queue->is_dying = true;
+
+	for_each_node(nid)
+		set_shrinker_bit(parent, nid, shrinker_id(deferred_split_shrinker));
+
+unlock:
+	spin_unlock(&parent_ds_queue->split_queue_lock);
+	spin_unlock_irq(&ds_queue->split_queue_lock);
+}
+
 #ifdef CONFIG_DEBUG_FS
 static void split_huge_pages_all(void)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e090f29eb03bd..d03da72e7585d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3887,6 +3887,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	zswap_memcg_offline_cleanup(memcg);

 	memcg_offline_kmem(memcg);
+	reparent_deferred_split_queue(memcg);
 	reparent_shrinker_deferred(memcg);
 	wb_memcg_offline(memcg);
 	lru_gen_offline_memcg(memcg);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 3db2dea7db4c5..cbda5c2ee3241 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1387,6 +1387,7 @@ static void pgdat_init_split_queue(struct pglist_data *pgdat)
 	spin_lock_init(&ds_queue->split_queue_lock);
 	INIT_LIST_HEAD(&ds_queue->split_queue);
 	ds_queue->split_queue_len = 0;
+	ds_queue->is_dying = false;
 }
 #else
 static void pgdat_init_split_queue(struct pglist_data *pgdat) {}
-- 
2.20.1