From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@redhat.com, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song, Qi Zheng
Subject: [PATCH v2 1/4] mm: thp: replace folio_memcg() with folio_memcg_charged()
Date: Tue, 23 Sep 2025 17:16:22 +0800
Message-ID: <0ac716fb7fea89ada92ad544f88ca546e43d1f29.1758618527.git.zhengqi.arch@bytedance.com>

From: Muchun Song

folio_memcg_charged() is intended for use when the caller does not need the returned memcg pointer. It is more efficient than folio_memcg(). Therefore, replace folio_memcg() with folio_memcg_charged().
Signed-off-by: Muchun Song
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
Acked-by: David Hildenbrand
Signed-off-by: Qi Zheng
Reviewed-by: Roman Gushchin
---
 mm/huge_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5acca24bbabbe..582628ddf3f33 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4014,7 +4014,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 	bool unqueued = false;
 
 	WARN_ON_ONCE(folio_ref_count(folio));
-	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg(folio));
+	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));
 
 	ds_queue = get_deferred_split_queue(folio);
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-- 
2.20.1
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@redhat.com, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song, Qi Zheng
Subject: [PATCH v2 2/4] mm: thp: introduce folio_split_queue_lock and its variants
Date: Tue, 23 Sep 2025 17:16:23 +0800

From: Muchun Song

In future memcg removal, the binding between a folio and a memcg may change, making the split lock within the memcg unstable when held. A new approach is required to reparent the split queue to its parent. This patch starts introducing a unified way to acquire the split lock for future work.

It's a code-only refactoring with no functional changes.

Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Acked-by: Johannes Weiner
Reviewed-by: Zi Yan
Acked-by: Shakeel Butt
Acked-by: David Hildenbrand
---
 include/linux/memcontrol.h |  10 ++++
 mm/huge_memory.c           | 104 ++++++++++++++++++++++++++++---------
 2 files changed, 89 insertions(+), 25 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 16fe0306e50ea..99876af13c315 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1662,6 +1662,11 @@ int alloc_shrinker_info(struct mem_cgroup *memcg);
 void free_shrinker_info(struct mem_cgroup *memcg);
 void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id);
 void reparent_shrinker_deferred(struct mem_cgroup *memcg);
+
+static inline int shrinker_id(struct shrinker *shrinker)
+{
+	return shrinker->id;
+}
 #else
 #define mem_cgroup_sockets_enabled 0
 
@@ -1693,6 +1698,11 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
 					    int nid, int shrinker_id)
 {
 }
+
+static inline int shrinker_id(struct shrinker *shrinker)
+{
+	return -1;
+}
 #endif
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 582628ddf3f33..2f41b8f0d4871 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1078,26 +1078,83 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 
 #ifdef CONFIG_MEMCG
 static inline
-struct deferred_split *get_deferred_split_queue(struct folio *folio)
+struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
+					   struct deferred_split *queue)
 {
-	struct mem_cgroup *memcg = folio_memcg(folio);
-	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
+	if (mem_cgroup_disabled())
+		return NULL;
+	if (&NODE_DATA(folio_nid(folio))->deferred_split_queue == queue)
+		return NULL;
+	return container_of(queue, struct mem_cgroup, deferred_split_queue);
+}
 
-	if (memcg)
-		return &memcg->deferred_split_queue;
-	else
-		return &pgdat->deferred_split_queue;
+static struct deferred_split *folio_split_queue_lock(struct folio *folio)
+{
+	struct mem_cgroup *memcg;
+	struct deferred_split *queue;
+
+	memcg = folio_memcg(folio);
+	queue = memcg ? &memcg->deferred_split_queue :
+			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
+	spin_lock(&queue->split_queue_lock);
+
+	return queue;
+}
+
+static struct deferred_split *
+folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
+{
+	struct mem_cgroup *memcg;
+	struct deferred_split *queue;
+
+	memcg = folio_memcg(folio);
+	queue = memcg ? &memcg->deferred_split_queue :
+			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
+	spin_lock_irqsave(&queue->split_queue_lock, *flags);
+
+	return queue;
 }
 #else
 static inline
-struct deferred_split *get_deferred_split_queue(struct folio *folio)
+struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
+					   struct deferred_split *queue)
+{
+	return NULL;
+}
+
+static struct deferred_split *folio_split_queue_lock(struct folio *folio)
 {
 	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
+	struct deferred_split *queue = &pgdat->deferred_split_queue;
+
+	spin_lock(&queue->split_queue_lock);
 
-	return &pgdat->deferred_split_queue;
+	return queue;
+}
+
+static struct deferred_split *
+folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
+{
+	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
+	struct deferred_split *queue = &pgdat->deferred_split_queue;
+
+	spin_lock_irqsave(&queue->split_queue_lock, *flags);
+
+	return queue;
 }
 #endif
 
+static inline void split_queue_unlock(struct deferred_split *queue)
+{
+	spin_unlock(&queue->split_queue_lock);
+}
+
+static inline void split_queue_unlock_irqrestore(struct deferred_split *queue,
+						 unsigned long flags)
+{
+	spin_unlock_irqrestore(&queue->split_queue_lock, flags);
+}
+
 static inline bool is_transparent_hugepage(const struct folio *folio)
 {
 	if (!folio_test_large(folio))
@@ -3579,7 +3636,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		struct page *split_at, struct page *lock_at,
 		struct list_head *list, bool uniform_split)
 {
-	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
+	struct deferred_split *ds_queue;
 	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
 	struct folio *end_folio = folio_next(folio);
 	bool is_anon = folio_test_anon(folio);
@@ -3718,7 +3775,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	}
 
 	/* Prevent deferred_split_scan() touching ->_refcount */
-	spin_lock(&ds_queue->split_queue_lock);
+	ds_queue = folio_split_queue_lock(folio);
 	if (folio_ref_freeze(folio, 1 + extra_pins)) {
 		struct swap_cluster_info *ci = NULL;
 		struct lruvec *lruvec;
@@ -3740,7 +3797,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 			 */
 			list_del_init(&folio->_deferred_list);
 		}
-		spin_unlock(&ds_queue->split_queue_lock);
+		split_queue_unlock(ds_queue);
 		if (mapping) {
 			int nr = folio_nr_pages(folio);
 
@@ -3835,7 +3892,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		if (ci)
 			swap_cluster_unlock(ci);
 	} else {
-		spin_unlock(&ds_queue->split_queue_lock);
+		split_queue_unlock(ds_queue);
 		ret = -EAGAIN;
 	}
fail:
@@ -4016,8 +4073,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 	WARN_ON_ONCE(folio_ref_count(folio));
 	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));
 
-	ds_queue = get_deferred_split_queue(folio);
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
 	if (!list_empty(&folio->_deferred_list)) {
 		ds_queue->split_queue_len--;
 		if (folio_test_partially_mapped(folio)) {
@@ -4028,7 +4084,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 		list_del_init(&folio->_deferred_list);
 		unqueued = true;
 	}
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
+	split_queue_unlock_irqrestore(ds_queue, flags);
 
 	return unqueued;	/* useful for debug warnings */
 }
@@ -4036,10 +4092,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 /* partially_mapped=false won't clear PG_partially_mapped folio flag */
 void deferred_split_folio(struct folio *folio, bool partially_mapped)
 {
-	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
-#ifdef CONFIG_MEMCG
-	struct mem_cgroup *memcg = folio_memcg(folio);
-#endif
+	struct deferred_split *ds_queue;
 	unsigned long flags;
 
 	/*
@@ -4062,7 +4115,7 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 	if (folio_test_swapcache(folio))
 		return;
 
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
 	if (partially_mapped) {
 		if (!folio_test_partially_mapped(folio)) {
 			folio_set_partially_mapped(folio);
@@ -4077,15 +4130,16 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 		VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
 	}
 	if (list_empty(&folio->_deferred_list)) {
+		struct mem_cgroup *memcg;
+
+		memcg = folio_split_queue_memcg(folio, ds_queue);
 		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
 		ds_queue->split_queue_len++;
-#ifdef CONFIG_MEMCG
 		if (memcg)
 			set_shrinker_bit(memcg, folio_nid(folio),
-					 deferred_split_shrinker->id);
-#endif
+					 shrinker_id(deferred_split_shrinker));
 	}
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
+	split_queue_unlock_irqrestore(ds_queue, flags);
 }
 
 static unsigned long deferred_split_count(struct shrinker *shrink,
-- 
2.20.1
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@redhat.com, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song, Qi Zheng
Subject: [PATCH v2 3/4] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan()
Date: Tue, 23 Sep 2025 17:16:24 +0800
Message-ID: <782da2d3eca63d9bf152c58c6733c4e16b06b740.1758618527.git.zhengqi.arch@bytedance.com>

From: Muchun Song

The maintenance of folio->_deferred_list is intricate because it is reused for a local list. Some peculiarities:

1) When a folio is removed from its split queue and added to a local on-stack list in deferred_split_scan(), ->split_queue_len is not updated, leaving it inconsistent with the actual number of folios in the split queue.

2) When the folio is split via split_folio() later, it is removed from the local list while holding the split queue lock. At that point this lock protects the local list, not the split queue.

3) To handle the race with a third party freeing or migrating the preceding folio, we must ensure there is always one safe (with raised refcount) folio before it, by delaying its folio_put(). More details can be found in commit e66f3185fa04 ("mm/thp: fix deferred split queue not partially_mapped"). It is rather tricky.

We can use the folio_batch infrastructure to handle this cleanly. With it, ->split_queue_len stays consistent with the real number of folios in the split queue. If list_empty(&folio->_deferred_list) returns false, it is clear the folio must be in its split queue (not on a local list anymore).

In the future, we will reparent LRU folios during memcg offline to eliminate dying memory cgroups, which requires reparenting the split queue to its parent first.
So this patch prepares for using folio_split_queue_lock_irqsave(), since the memcg may have changed by then.

Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Acked-by: David Hildenbrand
Reviewed-by: Zi Yan
---
 mm/huge_memory.c | 84 ++++++++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 46 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2f41b8f0d4871..48b51e6230a67 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3781,21 +3781,22 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		struct lruvec *lruvec;
 		int expected_refs;
 
-		if (folio_order(folio) > 1 &&
-		    !list_empty(&folio->_deferred_list)) {
-			ds_queue->split_queue_len--;
+		if (folio_order(folio) > 1) {
+			if (!list_empty(&folio->_deferred_list)) {
+				ds_queue->split_queue_len--;
+				/*
+				 * Reinitialize page_deferred_list after removing the
+				 * page from the split_queue, otherwise a subsequent
+				 * split will see list corruption when checking the
+				 * page_deferred_list.
+				 */
+				list_del_init(&folio->_deferred_list);
+			}
 			if (folio_test_partially_mapped(folio)) {
 				folio_clear_partially_mapped(folio);
 				mod_mthp_stat(folio_order(folio),
 					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 			}
-			/*
-			 * Reinitialize page_deferred_list after removing the
-			 * page from the split_queue, otherwise a subsequent
-			 * split will see list corruption when checking the
-			 * page_deferred_list.
-			 */
-			list_del_init(&folio->_deferred_list);
 		}
 		split_queue_unlock(ds_queue);
 		if (mapping) {
@@ -4194,40 +4195,44 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	struct pglist_data *pgdata = NODE_DATA(sc->nid);
 	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
 	unsigned long flags;
-	LIST_HEAD(list);
-	struct folio *folio, *next, *prev = NULL;
-	int split = 0, removed = 0;
+	struct folio *folio, *next;
+	int split = 0, i;
+	struct folio_batch fbatch;
 
 #ifdef CONFIG_MEMCG
 	if (sc->memcg)
 		ds_queue = &sc->memcg->deferred_split_queue;
 #endif
 
+	folio_batch_init(&fbatch);
+retry:
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	/* Take pin on all head pages to avoid freeing them under us */
 	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
 				 _deferred_list) {
 		if (folio_try_get(folio)) {
-			list_move(&folio->_deferred_list, &list);
-		} else {
+			folio_batch_add(&fbatch, folio);
+		} else if (folio_test_partially_mapped(folio)) {
 			/* We lost race with folio_put() */
-			if (folio_test_partially_mapped(folio)) {
-				folio_clear_partially_mapped(folio);
-				mod_mthp_stat(folio_order(folio),
-					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
-			}
-			list_del_init(&folio->_deferred_list);
-			ds_queue->split_queue_len--;
+			folio_clear_partially_mapped(folio);
+			mod_mthp_stat(folio_order(folio),
+				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 		}
+		list_del_init(&folio->_deferred_list);
+		ds_queue->split_queue_len--;
 		if (!--sc->nr_to_scan)
 			break;
+		if (!folio_batch_space(&fbatch))
+			break;
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
-	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+	for (i = 0; i < folio_batch_count(&fbatch); i++) {
 		bool did_split = false;
 		bool underused = false;
+		struct deferred_split *fqueue;
 
+		folio = fbatch.folios[i];
 		if (!folio_test_partially_mapped(folio)) {
 			/*
 			 * See try_to_map_unused_to_zeropage(): we cannot
@@ -4250,38 +4255,25 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		}
 		folio_unlock(folio);
next:
+		if (did_split || !folio_test_partially_mapped(folio))
+			continue;
 		/*
-		 * split_folio() removes folio from list on success.
 		 * Only add back to the queue if folio is partially mapped.
 		 * If thp_underused returns false, or if split_folio fails
 		 * in the case it was underused, then consider it used and
 		 * don't add it back to split_queue.
 		 */
-		if (did_split) {
-			; /* folio already removed from list */
-		} else if (!folio_test_partially_mapped(folio)) {
-			list_del_init(&folio->_deferred_list);
-			removed++;
-		} else {
-			/*
-			 * That unlocked list_del_init() above would be unsafe,
-			 * unless its folio is separated from any earlier folios
-			 * left on the list (which may be concurrently unqueued)
-			 * by one safe folio with refcount still raised.
-			 */
-			swap(folio, prev);
+		fqueue = folio_split_queue_lock_irqsave(folio, &flags);
+		if (list_empty(&folio->_deferred_list)) {
+			list_add_tail(&folio->_deferred_list, &fqueue->split_queue);
+			fqueue->split_queue_len++;
 		}
-		if (folio)
-			folio_put(folio);
+		split_queue_unlock_irqrestore(fqueue, flags);
 	}
+	folios_put(&fbatch);
 
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	list_splice_tail(&list, &ds_queue->split_queue);
-	ds_queue->split_queue_len -= removed;
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
-
-	if (prev)
-		folio_put(prev);
+	if (sc->nr_to_scan)
+		goto retry;
 
 	/*
	 * Stop shrinker if we didn't split any page, but the queue is empty.
--=20 2.20.1 From nobody Thu Oct 2 03:30:40 2025 Received: from mail-pf1-f177.google.com (mail-pf1-f177.google.com [209.85.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E4CA1320CC9 for ; Tue, 23 Sep 2025 09:17:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758619042; cv=none; b=nRuNCvuuyBxdNSbd7RqvMfqz56vnqk94O5g9X9/wxxPp/P8F0koyKVgAqnXn91yVdoBtBh7+ORJJFI+csQ/UrcWQ9sRIIV59F0ma1bivgImtf/2sVX8QICH07BlXhB/P66M84c5mz14OEKUko3CYa9hCK0R0Id6vcZeklyO85RY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758619042; c=relaxed/simple; bh=10Y+XNOiXyb/vrv5K9CM6W3Fnz9B0Mh4n4ASqtZuho0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jUR4hgiZxXSmeH8bQ6cqR5fOLNBfjy2xpUNhoVPSGu4HjXFZmWsp3T/4zgL+Ysvvxj1O/l0DZa7h6oO5VAcTcjPw7IN/xQfdr5SP7x36lemo9IPYFgmsbQ+L5x/KRKf4CdfvlH7OxFyEbMXCVCXI/NEVFug5SdSfyr2EoNXBagw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=XKsXqBuj; arc=none smtp.client-ip=209.85.210.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="XKsXqBuj" Received: by mail-pf1-f177.google.com with SMTP id d2e1a72fcca58-76e4fc419a9so5343099b3a.0 for ; Tue, 23 Sep 2025 02:17:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1758619040; x=1759223840; 
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com,
    roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev,
    david@redhat.com, lorenzo.stoakes@oracle.com, ziy@nvidia.com,
    harry.yoo@oracle.com, baolin.wang@linux.alibaba.com,
    Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
    dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev,
    akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    Qi Zheng
Subject: [PATCH v2 4/4] mm: thp: reparent the split queue during memcg offline
Date: Tue, 23 Sep 2025 17:16:25 +0800
Message-ID: <55370bda7b2df617033ac12116c1712144bb7591.1758618527.git.zhengqi.arch@bytedance.com>

In the future, we will reparent LRU folios during memcg offline to
eliminate dying memory cgroups, which requires reparenting the split
queue to its parent. Similar to list_lru, the split queue is relatively
independent and does not need to be reparented along with the objcg and
LRU folios (which would require holding the objcg lock and lru lock).
So let's apply the same mechanism as list_lru and reparent the split
queue separately when the memcg is offlined.
Signed-off-by: Qi Zheng
---
 include/linux/huge_mm.h |  2 ++
 include/linux/mmzone.h  |  1 +
 mm/huge_memory.c        | 39 +++++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c         |  1 +
 mm/mm_init.c            |  1 +
 5 files changed, 44 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f327d62fc9852..a0d4b751974d2 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -417,6 +417,7 @@ static inline int split_huge_page(struct page *page)
 	return split_huge_page_to_list_to_order(page, NULL, ret);
 }
 void deferred_split_folio(struct folio *folio, bool partially_mapped);
+void reparent_deferred_split_queue(struct mem_cgroup *memcg);
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long address, bool freeze);
@@ -611,6 +612,7 @@ static inline int try_folio_split(struct folio *folio, struct page *page,
 }
 
 static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
+static inline void reparent_deferred_split_queue(struct mem_cgroup *memcg) {}
 #define split_huge_pmd(__vma, __pmd, __address)	\
 	do { } while (0)
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7fb7331c57250..f3eb81fee056a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1346,6 +1346,7 @@ struct deferred_split {
 	spinlock_t split_queue_lock;
 	struct list_head split_queue;
 	unsigned long split_queue_len;
+	bool is_dying;
 };
 #endif
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 48b51e6230a67..de7806f759cba 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1094,9 +1094,15 @@ static struct deferred_split *folio_split_queue_lock(struct folio *folio)
 	struct deferred_split *queue;
 
 	memcg = folio_memcg(folio);
+retry:
 	queue = memcg ? &memcg->deferred_split_queue :
 			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
 	spin_lock(&queue->split_queue_lock);
+	if (unlikely(queue->is_dying == true)) {
+		spin_unlock(&queue->split_queue_lock);
+		memcg = parent_mem_cgroup(memcg);
+		goto retry;
+	}
 
 	return queue;
 }
@@ -1108,9 +1114,15 @@ folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
 	struct deferred_split *queue;
 
 	memcg = folio_memcg(folio);
+retry:
 	queue = memcg ? &memcg->deferred_split_queue :
 			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
 	spin_lock_irqsave(&queue->split_queue_lock, *flags);
+	if (unlikely(queue->is_dying == true)) {
+		spin_unlock_irqrestore(&queue->split_queue_lock, *flags);
+		memcg = parent_mem_cgroup(memcg);
+		goto retry;
+	}
 
 	return queue;
 }
@@ -4284,6 +4296,33 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	return split;
 }
 
+void reparent_deferred_split_queue(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+	struct deferred_split *ds_queue = &memcg->deferred_split_queue;
+	struct deferred_split *parent_ds_queue = &parent->deferred_split_queue;
+	int nid;
+
+	spin_lock_irq(&ds_queue->split_queue_lock);
+	spin_lock_nested(&parent_ds_queue->split_queue_lock, SINGLE_DEPTH_NESTING);
+
+	if (!ds_queue->split_queue_len)
+		goto unlock;
+
+	list_splice_tail_init(&ds_queue->split_queue, &parent_ds_queue->split_queue);
+	parent_ds_queue->split_queue_len += ds_queue->split_queue_len;
+	ds_queue->split_queue_len = 0;
+	/* Mark the ds_queue dead */
+	ds_queue->is_dying = true;
+
+	for_each_node(nid)
+		set_shrinker_bit(parent, nid, shrinker_id(deferred_split_shrinker));
+
+unlock:
+	spin_unlock(&parent_ds_queue->split_queue_lock);
+	spin_unlock_irq(&ds_queue->split_queue_lock);
+}
+
 #ifdef CONFIG_DEBUG_FS
 static void split_huge_pages_all(void)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e090f29eb03bd..d03da72e7585d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3887,6 +3887,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	zswap_memcg_offline_cleanup(memcg);
 
 	memcg_offline_kmem(memcg);
+	reparent_deferred_split_queue(memcg);
 	reparent_shrinker_deferred(memcg);
 	wb_memcg_offline(memcg);
 	lru_gen_offline_memcg(memcg);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 3db2dea7db4c5..cbda5c2ee3241 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1387,6 +1387,7 @@ static void pgdat_init_split_queue(struct pglist_data *pgdat)
 	spin_lock_init(&ds_queue->split_queue_lock);
 	INIT_LIST_HEAD(&ds_queue->split_queue);
 	ds_queue->split_queue_len = 0;
+	ds_queue->is_dying = false;
}
 #else
 static void pgdat_init_split_queue(struct pglist_data *pgdat) {}
-- 
2.20.1