From: Qi Zheng
Subject: [PATCH v3 1/4] mm: thp: replace folio_memcg() with folio_memcg_charged()
Date: Sun, 28 Sep 2025 19:16:59 +0800
Message-ID: <488cc8d44ba9ef1ec8ed2b32e7267d83cfd5736d.1759056506.git.zhengqi.arch@bytedance.com>

From: Muchun Song

folio_memcg_charged() is intended for callers that only need to know
whether a folio is charged to a memcg at all, not which memcg it is
charged to, and it is more efficient than folio_memcg(). Therefore,
replace folio_memcg() with folio_memcg_charged() in the check below.
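
For reference, this is why the charged check is cheaper (an illustrative
sketch only -- the real definitions live in include/linux/memcontrol.h):

	/*
	 * Sketch, not the exact upstream code: a "charged?" test only
	 * has to look at the raw folio->memcg_data word, whereas
	 * folio_memcg() must also decode that word (e.g. follow the
	 * objcg indirection for kmem folios) before it can return a
	 * struct mem_cgroup pointer.
	 */
	static inline bool folio_memcg_charged_sketch(struct folio *folio)
	{
		return folio->memcg_data != 0;
	}

A check that only asks "is this folio charged at all?", like the
WARN_ON_ONCE() below, has no reason to pay for the full pointer lookup.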

Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
Acked-by: David Hildenbrand
Reviewed-by: Roman Gushchin
---
 mm/huge_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1b81680b4225f..6db24b3a57005 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4014,7 +4014,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 	bool unqueued = false;
 
 	WARN_ON_ONCE(folio_ref_count(folio));
-	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg(folio));
+	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));
 
 	ds_queue = get_deferred_split_queue(folio);
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH v3 2/4] mm: thp: introduce folio_split_queue_lock and its variants
Date: Sun, 28 Sep 2025 19:17:00 +0800

From: Muchun Song

In the future memcg-removal work, the binding between a folio and a
memcg may change, so a split-queue lock that lives inside the memcg is
unstable while held. A new approach is required: reparent the split
queue to the memcg's parent. This patch starts by introducing a unified
way to acquire the split lock, in preparation for that work.

It's a code-only refactoring with no functional changes.
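
A caller-side sketch of what the helpers replace (both shapes are taken
from the diff below; surrounding logic elided):

	/* Before: look up the queue, then lock it by hand. */
	ds_queue = get_deferred_split_queue(folio);
	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	/* ... manipulate folio->_deferred_list ... */
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

	/* After: one helper picks the queue and takes its lock. */
	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
	/* ... manipulate folio->_deferred_list ... */
	split_queue_unlock_irqrestore(ds_queue, flags);

Folding the queue lookup into the lock acquisition is what later allows
the chosen queue to be revalidated under the lock (see patch 4/4).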

Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Acked-by: Johannes Weiner
Reviewed-by: Zi Yan
Acked-by: Shakeel Butt
Acked-by: David Hildenbrand
---
 include/linux/memcontrol.h |  10 ++++
 mm/huge_memory.c           | 104 ++++++++++++++++++++++++++++---------
 2 files changed, 89 insertions(+), 25 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 16fe0306e50ea..99876af13c315 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1662,6 +1662,11 @@ int alloc_shrinker_info(struct mem_cgroup *memcg);
 void free_shrinker_info(struct mem_cgroup *memcg);
 void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id);
 void reparent_shrinker_deferred(struct mem_cgroup *memcg);
+
+static inline int shrinker_id(struct shrinker *shrinker)
+{
+	return shrinker->id;
+}
 #else
 #define mem_cgroup_sockets_enabled 0
 
@@ -1693,6 +1698,11 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
 				    int nid, int shrinker_id)
 {
 }
+
+static inline int shrinker_id(struct shrinker *shrinker)
+{
+	return -1;
+}
 #endif
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6db24b3a57005..0ac3b97177b7f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1078,26 +1078,83 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 
 #ifdef CONFIG_MEMCG
 static inline
-struct deferred_split *get_deferred_split_queue(struct folio *folio)
+struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
+					   struct deferred_split *queue)
 {
-	struct mem_cgroup *memcg = folio_memcg(folio);
-	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
+	if (mem_cgroup_disabled())
+		return NULL;
+	if (&NODE_DATA(folio_nid(folio))->deferred_split_queue == queue)
+		return NULL;
+	return container_of(queue, struct mem_cgroup, deferred_split_queue);
+}
 
-	if (memcg)
-		return &memcg->deferred_split_queue;
-	else
-		return &pgdat->deferred_split_queue;
+static struct deferred_split *folio_split_queue_lock(struct folio *folio)
+{
+	struct mem_cgroup *memcg;
+	struct deferred_split *queue;
+
+	memcg = folio_memcg(folio);
+	queue = memcg ? &memcg->deferred_split_queue :
+			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
+	spin_lock(&queue->split_queue_lock);
+
+	return queue;
+}
+
+static struct deferred_split *
+folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
+{
+	struct mem_cgroup *memcg;
+	struct deferred_split *queue;
+
+	memcg = folio_memcg(folio);
+	queue = memcg ? &memcg->deferred_split_queue :
+			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
+	spin_lock_irqsave(&queue->split_queue_lock, *flags);
+
+	return queue;
 }
 #else
 static inline
-struct deferred_split *get_deferred_split_queue(struct folio *folio)
+struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
+					   struct deferred_split *queue)
+{
+	return NULL;
+}
+
+static struct deferred_split *folio_split_queue_lock(struct folio *folio)
 {
 	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
+	struct deferred_split *queue = &pgdat->deferred_split_queue;
+
+	spin_lock(&queue->split_queue_lock);
 
-	return &pgdat->deferred_split_queue;
+	return queue;
+}
+
+static struct deferred_split *
+folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
+{
+	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
+	struct deferred_split *queue = &pgdat->deferred_split_queue;
+
+	spin_lock_irqsave(&queue->split_queue_lock, *flags);
+
+	return queue;
 }
 #endif
 
+static inline void split_queue_unlock(struct deferred_split *queue)
+{
+	spin_unlock(&queue->split_queue_lock);
+}
+
+static inline void split_queue_unlock_irqrestore(struct deferred_split *queue,
+						 unsigned long flags)
+{
+	spin_unlock_irqrestore(&queue->split_queue_lock, flags);
+}
+
 static inline bool is_transparent_hugepage(const struct folio *folio)
 {
 	if (!folio_test_large(folio))
@@ -3579,7 +3636,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		struct page *split_at, struct page *lock_at,
 		struct list_head *list, bool uniform_split)
 {
-	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
+	struct deferred_split *ds_queue;
 	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
 	struct folio *end_folio = folio_next(folio);
 	bool is_anon = folio_test_anon(folio);
@@ -3718,7 +3775,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	}
 
 	/* Prevent deferred_split_scan() touching ->_refcount */
-	spin_lock(&ds_queue->split_queue_lock);
+	ds_queue = folio_split_queue_lock(folio);
 	if (folio_ref_freeze(folio, 1 + extra_pins)) {
 		struct swap_cluster_info *ci = NULL;
 		struct lruvec *lruvec;
@@ -3740,7 +3797,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 			 */
 			list_del_init(&folio->_deferred_list);
 		}
-		spin_unlock(&ds_queue->split_queue_lock);
+		split_queue_unlock(ds_queue);
 		if (mapping) {
 			int nr = folio_nr_pages(folio);
 
@@ -3835,7 +3892,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		if (ci)
 			swap_cluster_unlock(ci);
 	} else {
-		spin_unlock(&ds_queue->split_queue_lock);
+		split_queue_unlock(ds_queue);
 		ret = -EAGAIN;
 	}
 fail:
@@ -4016,8 +4073,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 	WARN_ON_ONCE(folio_ref_count(folio));
 	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));
 
-	ds_queue = get_deferred_split_queue(folio);
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
 	if (!list_empty(&folio->_deferred_list)) {
 		ds_queue->split_queue_len--;
 		if (folio_test_partially_mapped(folio)) {
@@ -4028,7 +4084,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 		list_del_init(&folio->_deferred_list);
 		unqueued = true;
 	}
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
+	split_queue_unlock_irqrestore(ds_queue, flags);
 
 	return unqueued;	/* useful for debug warnings */
 }
@@ -4036,10 +4092,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 /* partially_mapped=false won't clear PG_partially_mapped folio flag */
 void deferred_split_folio(struct folio *folio, bool partially_mapped)
 {
-	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
-#ifdef CONFIG_MEMCG
-	struct mem_cgroup *memcg = folio_memcg(folio);
-#endif
+	struct deferred_split *ds_queue;
 	unsigned long flags;
 
 	/*
@@ -4062,7 +4115,7 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 	if (folio_test_swapcache(folio))
 		return;
 
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
 	if (partially_mapped) {
 		if (!folio_test_partially_mapped(folio)) {
 			folio_set_partially_mapped(folio);
@@ -4077,15 +4130,16 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 		VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
 	}
 	if (list_empty(&folio->_deferred_list)) {
+		struct mem_cgroup *memcg;
+
+		memcg = folio_split_queue_memcg(folio, ds_queue);
 		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
 		ds_queue->split_queue_len++;
-#ifdef CONFIG_MEMCG
 		if (memcg)
 			set_shrinker_bit(memcg, folio_nid(folio),
-					 deferred_split_shrinker->id);
-#endif
+					 shrinker_id(deferred_split_shrinker));
 	}
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
+	split_queue_unlock_irqrestore(ds_queue, flags);
 }
 
 static unsigned long deferred_split_count(struct shrinker *shrink,
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH v3 3/4] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan()
Date: Sun, 28 Sep 2025 19:17:01 +0800
Message-ID: <43dc58065a4905cdcc02b3e755f3fa9d3fec350b.1759056506.git.zhengqi.arch@bytedance.com>

From: Muchun Song

The maintenance of folio->_deferred_list is intricate because the list
entry is reused for a local list.
Here are some peculiarities:

1) When a folio is removed from its split queue and added to a local
   on-stack list in deferred_split_scan(), ->split_queue_len isn't
   updated, leading to an inconsistency between it and the actual
   number of folios in the split queue.

2) When the folio is split via split_folio() later, it's removed from
   the local list while holding the split queue lock. At this point,
   the lock protects the local list, not the split queue.

3) To handle the race with a third party freeing or migrating the
   preceding folio, we must ensure there's always one safe folio
   (with its refcount raised) ahead of it, by delaying its
   folio_put(). More details can be found in commit e66f3185fa04
   ("mm/thp: fix deferred split queue not partially_mapped"). It's
   rather tricky.

We can use the folio_batch infrastructure to handle this cleanly. With
it, ->split_queue_len stays consistent with the real number of folios
in the split queue: if list_empty(&folio->_deferred_list) returns
false, the folio must be in its split queue (and not in a local list
anymore).

In the future, we will reparent LRU folios during memcg offline to
eliminate dying memory cgroups, which requires reparenting the split
queue to its parent first. So this patch prepares for using
folio_split_queue_lock_irqsave() as the memcg may change then.
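
For reference, the folio_batch pattern this switches to (a simplified
sketch of the shape used in the diff below; ds_queue and flags are as
in deferred_split_scan(), and a folio_batch holds a small fixed number
of folio pointers on the stack):

	struct folio_batch fbatch;
	struct folio *folio, *next;
	int i;

	folio_batch_init(&fbatch);
	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
				 _deferred_list) {
		/* Fully unqueue under the lock, pinning the folios we keep. */
		if (folio_try_get(folio))
			folio_batch_add(&fbatch, folio);
		list_del_init(&folio->_deferred_list);
		ds_queue->split_queue_len--;
		if (!folio_batch_space(&fbatch))
			break;		/* batch full: process it, then retry */
	}
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

	for (i = 0; i < folio_batch_count(&fbatch); i++)
		/* work on fbatch.folios[i] with the lock dropped */;

	folios_put(&fbatch);		/* drop all pinned references at once */

Because every folio is unqueued (and ->split_queue_len decremented)
before the lock is released, the counter always matches the list, and
folios_put() replaces the delayed-folio_put() "prev" trick.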

Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Zi Yan
Acked-by: David Hildenbrand
---
 mm/huge_memory.c | 84 ++++++++++++++++++++++--------------------------
 1 file changed, 38 insertions(+), 46 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0ac3b97177b7f..bb32091e3133e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3781,21 +3781,22 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		struct lruvec *lruvec;
 		int expected_refs;
 
-		if (folio_order(folio) > 1 &&
-		    !list_empty(&folio->_deferred_list)) {
-			ds_queue->split_queue_len--;
+		if (folio_order(folio) > 1) {
+			if (!list_empty(&folio->_deferred_list)) {
+				ds_queue->split_queue_len--;
+				/*
+				 * Reinitialize page_deferred_list after removing the
+				 * page from the split_queue, otherwise a subsequent
+				 * split will see list corruption when checking the
+				 * page_deferred_list.
+				 */
+				list_del_init(&folio->_deferred_list);
+			}
 			if (folio_test_partially_mapped(folio)) {
 				folio_clear_partially_mapped(folio);
 				mod_mthp_stat(folio_order(folio),
 					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 			}
-			/*
-			 * Reinitialize page_deferred_list after removing the
-			 * page from the split_queue, otherwise a subsequent
-			 * split will see list corruption when checking the
-			 * page_deferred_list.
-			 */
-			list_del_init(&folio->_deferred_list);
 		}
 		split_queue_unlock(ds_queue);
 		if (mapping) {
@@ -4185,40 +4186,44 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	struct pglist_data *pgdata = NODE_DATA(sc->nid);
 	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
 	unsigned long flags;
-	LIST_HEAD(list);
-	struct folio *folio, *next, *prev = NULL;
-	int split = 0, removed = 0;
+	struct folio *folio, *next;
+	int split = 0, i;
+	struct folio_batch fbatch;
 
 #ifdef CONFIG_MEMCG
 	if (sc->memcg)
 		ds_queue = &sc->memcg->deferred_split_queue;
 #endif
 
+	folio_batch_init(&fbatch);
+retry:
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	/* Take pin on all head pages to avoid freeing them under us */
 	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
 				 _deferred_list) {
 		if (folio_try_get(folio)) {
-			list_move(&folio->_deferred_list, &list);
-		} else {
+			folio_batch_add(&fbatch, folio);
+		} else if (folio_test_partially_mapped(folio)) {
 			/* We lost race with folio_put() */
-			if (folio_test_partially_mapped(folio)) {
-				folio_clear_partially_mapped(folio);
-				mod_mthp_stat(folio_order(folio),
-					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
-			}
-			list_del_init(&folio->_deferred_list);
-			ds_queue->split_queue_len--;
+			folio_clear_partially_mapped(folio);
+			mod_mthp_stat(folio_order(folio),
+				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 		}
+		list_del_init(&folio->_deferred_list);
+		ds_queue->split_queue_len--;
 		if (!--sc->nr_to_scan)
 			break;
+		if (!folio_batch_space(&fbatch))
+			break;
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
-	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+	for (i = 0; i < folio_batch_count(&fbatch); i++) {
 		bool did_split = false;
 		bool underused = false;
+		struct deferred_split *fqueue;
 
+		folio = fbatch.folios[i];
 		if (!folio_test_partially_mapped(folio)) {
 			/*
 			 * See try_to_map_unused_to_zeropage(): we cannot
@@ -4241,38 +4246,25 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		}
 		folio_unlock(folio);
 next:
+		if (did_split || !folio_test_partially_mapped(folio))
+			continue;
 		/*
-		 * split_folio() removes folio from list on success.
 		 * Only add back to the queue if folio is partially mapped.
 		 * If thp_underused returns false, or if split_folio fails
 		 * in the case it was underused, then consider it used and
 		 * don't add it back to split_queue.
 		 */
-		if (did_split) {
-			; /* folio already removed from list */
-		} else if (!folio_test_partially_mapped(folio)) {
-			list_del_init(&folio->_deferred_list);
-			removed++;
-		} else {
-			/*
-			 * That unlocked list_del_init() above would be unsafe,
-			 * unless its folio is separated from any earlier folios
-			 * left on the list (which may be concurrently unqueued)
-			 * by one safe folio with refcount still raised.
-			 */
-			swap(folio, prev);
+		fqueue = folio_split_queue_lock_irqsave(folio, &flags);
+		if (list_empty(&folio->_deferred_list)) {
+			list_add_tail(&folio->_deferred_list, &fqueue->split_queue);
+			fqueue->split_queue_len++;
 		}
-		if (folio)
-			folio_put(folio);
+		split_queue_unlock_irqrestore(fqueue, flags);
 	}
+	folios_put(&fbatch);
 
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	list_splice_tail(&list, &ds_queue->split_queue);
-	ds_queue->split_queue_len -= removed;
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
-
-	if (prev)
-		folio_put(prev);
+	if (sc->nr_to_scan)
+		goto retry;
 
 	/*
 	 * Stop shrinker if we didn't split any page, but the queue is empty.
-- 
2.20.1

From: Qi Zheng
Subject: [PATCH v3 4/4] mm: thp: reparent the split queue during memcg offline
Date: Sun, 28 Sep 2025 19:45:08 +0800
Message-ID: <2ddd0c184829e65c5b3afa34e93599783e7af3d4.1759056506.git.zhengqi.arch@bytedance.com>

From: Qi Zheng

Similar to list_lru, the split queue is relatively independent and does
not need to be reparented along with the objcg and LRU folios (which
requires holding the objcg lock and lru lock).
So let's apply the same mechanism as list_lru and reparent the split
queue separately when the memcg is offlined. This is also a preparation
for reparenting LRU folios.
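
The heart of the scheme is the lock-then-revalidate idiom added to the
split-queue lock helpers (an annotated sketch of the hunks below; the
declarations are as in folio_split_queue_lock()):

	memcg = folio_memcg(folio);
retry:
	queue = memcg ? &memcg->deferred_split_queue :
			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
	spin_lock(&queue->split_queue_lock);
	if (unlikely(memcg && css_is_dying(&memcg->css))) {
		/*
		 * This memcg is on its way out and its queue is (or soon
		 * will be) spliced into its parent's, so queue the folio
		 * there instead. The walk terminates: the root memcg is
		 * never dying, and a NULL memcg selects the node queue.
		 */
		spin_unlock(&queue->split_queue_lock);
		memcg = parent_mem_cgroup(memcg);
		goto retry;
	}
	return queue;

Rechecking css_is_dying() only after the lock is held is what closes
the race with reparent_deferred_split_queue(), which splices the list
while holding the same lock: anyone locking the queue after the splice
sees the dying flag and walks up, so no folio can be queued on an
already-emptied queue.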

Signed-off-by: Qi Zheng
---
 include/linux/huge_mm.h |  4 ++++
 mm/huge_memory.c        | 46 +++++++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c         |  1 +
 3 files changed, 51 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f327d62fc9852..0c211dcbb0ec1 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -417,6 +417,9 @@ static inline int split_huge_page(struct page *page)
 	return split_huge_page_to_list_to_order(page, NULL, ret);
 }
 void deferred_split_folio(struct folio *folio, bool partially_mapped);
+#ifdef CONFIG_MEMCG
+void reparent_deferred_split_queue(struct mem_cgroup *memcg);
+#endif
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long address, bool freeze);
@@ -611,6 +614,7 @@ static inline int try_folio_split(struct folio *folio, struct page *page,
 }
 
 static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
+static inline void reparent_deferred_split_queue(struct mem_cgroup *memcg) {}
 #define split_huge_pmd(__vma, __pmd, __address)	\
 	do { } while (0)
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bb32091e3133e..5fc0caca71de0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1094,9 +1094,22 @@ static struct deferred_split *folio_split_queue_lock(struct folio *folio)
 	struct deferred_split *queue;
 
 	memcg = folio_memcg(folio);
+retry:
 	queue = memcg ? &memcg->deferred_split_queue :
 			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
 	spin_lock(&queue->split_queue_lock);
+	/*
+	 * Notice:
+	 * 1. The memcg could be NULL if cgroup_disable=memory is set.
+	 * 2. There is a period between setting CSS_DYING and reparenting
+	 *    deferred split queue, and during this period the THPs in the
+	 *    deferred split queue will be hidden from the shrinker side.
+	 */
+	if (unlikely(memcg && css_is_dying(&memcg->css))) {
+		spin_unlock(&queue->split_queue_lock);
+		memcg = parent_mem_cgroup(memcg);
+		goto retry;
+	}
 
 	return queue;
 }
@@ -1108,9 +1121,15 @@ folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
 	struct deferred_split *queue;
 
 	memcg = folio_memcg(folio);
+retry:
 	queue = memcg ? &memcg->deferred_split_queue :
 			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
 	spin_lock_irqsave(&queue->split_queue_lock, *flags);
+	if (unlikely(memcg && css_is_dying(&memcg->css))) {
+		spin_unlock_irqrestore(&queue->split_queue_lock, *flags);
+		memcg = parent_mem_cgroup(memcg);
+		goto retry;
+	}
 
 	return queue;
 }
@@ -4275,6 +4294,33 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	return split;
 }
 
+#ifdef CONFIG_MEMCG
+void reparent_deferred_split_queue(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+	struct deferred_split *ds_queue = &memcg->deferred_split_queue;
+	struct deferred_split *parent_ds_queue = &parent->deferred_split_queue;
+	int nid;
+
+	spin_lock_irq(&ds_queue->split_queue_lock);
+	spin_lock_nested(&parent_ds_queue->split_queue_lock, SINGLE_DEPTH_NESTING);
+
+	if (!ds_queue->split_queue_len)
+		goto unlock;
+
+	list_splice_tail_init(&ds_queue->split_queue, &parent_ds_queue->split_queue);
+	parent_ds_queue->split_queue_len += ds_queue->split_queue_len;
+	ds_queue->split_queue_len = 0;
+
+	for_each_node(nid)
+		set_shrinker_bit(parent, nid, shrinker_id(deferred_split_shrinker));
+
+unlock:
+	spin_unlock(&parent_ds_queue->split_queue_lock);
+	spin_unlock_irq(&ds_queue->split_queue_lock);
+}
+#endif
+
 #ifdef CONFIG_DEBUG_FS
 static void split_huge_pages_all(void)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e090f29eb03bd..d03da72e7585d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3887,6 +3887,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	zswap_memcg_offline_cleanup(memcg);
 
 	memcg_offline_kmem(memcg);
+	reparent_deferred_split_queue(memcg);
 	reparent_shrinker_deferred(memcg);
 	wb_memcg_offline(memcg);
 	lru_gen_offline_memcg(memcg);
-- 
2.20.1