Date: Thu, 11 Dec 2025 11:28:21 +0800 (CST)
From: "David Wang" <00107082@163.com>
To: "Mal Haak"
Cc: linux-kernel@vger.kernel.org, surenb@google.com, xiubli@redhat.com, idryomov@gmail.com, ceph-devel@vger.kernel.org
Subject: Re: Possible memory leak in 6.17.7
In-Reply-To: <20251210234318.5d8c2d68@xps15mal>
References: <20251110182008.71e0858b@xps15mal> <20251208110829.11840-1-00107082@163.com> <20251209090831.13c7a639@xps15mal> <17469653.4a75.19b01691299.Coremail.00107082@163.com> <20251210234318.5d8c2d68@xps15mal>
Message-ID: <2a9ba88e.3aa6.19b0b73dd4e.Coremail.00107082@163.com>

At 2025-12-10 21:43:18, "Mal Haak" wrote:
>On Tue, 9 Dec 2025 12:40:21 +0800 (CST)
>"David Wang" <00107082@163.com> wrote:
>
>> At 2025-12-09 07:08:31, "Mal Haak" wrote:
>> >On Mon, 8 Dec 2025 19:08:29 +0800
>> >David Wang <00107082@163.com> wrote:
>> >
>> >> On Mon, 10 Nov 2025 18:20:08 +1000
>> >> Mal Haak wrote:
>> >> > Hello,
>> >> >
>> >> > I have found a memory leak in 6.17.7 but I am unsure how to
>> >> > track it down effectively.
>> >> >
>> >>
>> >> I think the `memory allocation profiling` feature can help.
>> >> https://docs.kernel.org/mm/allocation-profiling.html
>> >>
>> >> You would need to build a kernel with
>> >> CONFIG_MEM_ALLOC_PROFILING=y
>> >> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>> >>
>> >> And check /proc/allocinfo for the suspicious allocations which take
>> >> more memory than expected.
>> >>
>> >> (I once caught an nvidia driver memory leak.)
>> >>
>> >> FYI
>> >> David
>> >>
>> >
>> >Thank you for your suggestion. I have some results.
>> >
>> >Ran the rsync workload for about 9 hours. It started to look like it
>> >was happening.
>> ># smem -pw
>> >Area                           Used      Cache   Noncache
>> >firmware/hardware             0.00%      0.00%      0.00%
>> >kernel image                  0.00%      0.00%      0.00%
>> >kernel dynamic memory        80.46%     65.80%     14.66%
>> >userspace memory              0.35%      0.16%      0.19%
>> >free memory                  19.19%     19.19%      0.00%
>> ># sort -g /proc/allocinfo|tail|numfmt --to=iec
>> >  22M     5609 mm/memory.c:1190 func:folio_prealloc
>> >  23M     1932 fs/xfs/xfs_buf.c:226 [xfs] func:xfs_buf_alloc_backing_mem
>> >  24M    24135 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc
>> >  27M     6693 mm/memory.c:1192 func:folio_prealloc
>> >  58M    14784 mm/page_ext.c:271 func:alloc_page_ext
>> > 258M      129 mm/khugepaged.c:1069 func:alloc_charge_folio
>> > 430M   770788 lib/xarray.c:378 func:xas_alloc
>> > 545M    36444 mm/slub.c:3059 func:alloc_slab_page
>> > 9.8G  2563617 mm/readahead.c:189 func:ractl_alloc_folio
>> >  20G  5164004 mm/filemap.c:2012 func:__filemap_get_folio
>> >
>> >So I stopped the workload and dropped caches to confirm.
>> >
>> ># echo 3 > /proc/sys/vm/drop_caches
>> ># smem -pw
>> >Area                           Used      Cache   Noncache
>> >firmware/hardware             0.00%      0.00%      0.00%
>> >kernel image                  0.00%      0.00%      0.00%
>> >kernel dynamic memory        33.45%      0.09%     33.36%
>> >userspace memory              0.36%      0.16%      0.19%
>> >free memory                  66.20%     66.20%      0.00%
>> ># sort -g /proc/allocinfo|tail|numfmt --to=iec
>> >  12M     2987 mm/execmem.c:41 func:execmem_vmalloc
>> >  12M        3 kernel/dma/pool.c:96 func:atomic_pool_expand
>> >  13M      751 mm/slub.c:3061 func:alloc_slab_page
>> >  16M        8 mm/khugepaged.c:1069 func:alloc_charge_folio
>> >  18M     4355 mm/memory.c:1190 func:folio_prealloc
>> >  24M     6119 mm/memory.c:1192 func:folio_prealloc
>> >  58M    14784 mm/page_ext.c:271 func:alloc_page_ext
>> >  61M    15448 mm/readahead.c:189 func:ractl_alloc_folio
>> >  79M     6726 mm/slub.c:3059 func:alloc_slab_page
>> >  11G  2674488 mm/filemap.c:2012 func:__filemap_get_folio

Maybe narrowing down the "Noncache" callers of __filemap_get_folio would help clarify things.
(It could be designed that way, and need some route other than dropping caches to release the memory, just a guess....)

If you want, you can modify the code to split the accounting for __filemap_get_folio according to its callers.
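The reason this split works: with CONFIG_MEM_ALLOC_PROFILING, an allocation is charged to the alloc_tag defined at the spot where the alloc_hooks() macro expands. If __filemap_get_folio becomes a macro wrapping a _noprof function, that spot is each caller's own file:line rather than mm/filemap.c:2012, so /proc/allocinfo gets one entry per caller. Roughly the idea (a simplified sketch of alloc_hooks() from include/linux/alloc_tag.h; the real macro also checks mem_alloc_profiling_enabled() and the details differ between kernel versions):

#define alloc_hooks(_do_alloc)						\
({									\
	DEFINE_ALLOC_TAG(_tag);	/* static tag recording the expansion site's file:line */ \
	struct alloc_tag *_old = alloc_tag_save(&_tag);	/* make _tag current */ \
	typeof(_do_alloc) _res = _do_alloc;	/* pages allocated here get charged to _tag */ \
	alloc_tag_restore(&_tag, _old);					\
	_res;								\
})

With the draft patch below applied, the big mm/filemap.c:2012 line should split into one entry per __filemap_get_folio caller, which should make it clearer which path (e.g. the cephfs read path vs. other page cache users) is holding the noncache folios.
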
Following is a draft patch (based on v6.18):

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 09b581c1d878..ba8c659a6ae3 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -753,7 +753,11 @@ static inline fgf_t fgf_set_order(size_t size)
 }
 
 void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
-struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
+
+#define __filemap_get_folio(...) \
+	alloc_hooks(__filemap_get_folio_noprof(__VA_ARGS__))
+
+struct folio *__filemap_get_folio_noprof(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp);
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp);
diff --git a/mm/filemap.c b/mm/filemap.c
index 024b71da5224..e1c1c26d7cb3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1938,7 +1938,7 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index)
  *
  * Return: The found folio or an ERR_PTR() otherwise.
  */
-struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
+struct folio *__filemap_get_folio_noprof(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp)
 {
 	struct folio *folio;
@@ -2009,7 +2009,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 			err = -ENOMEM;
 			if (order > min_order)
 				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
-			folio = filemap_alloc_folio(alloc_gfp, order);
+			folio = filemap_alloc_folio_noprof(alloc_gfp, order);
 			if (!folio)
 				continue;
 
@@ -2056,7 +2056,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 	folio_clear_dropbehind(folio);
 	return folio;
 }
-EXPORT_SYMBOL(__filemap_get_folio);
+EXPORT_SYMBOL(__filemap_get_folio_noprof);
 
 static inline struct folio *find_get_entry(struct xa_state *xas, pgoff_t max,
 		xa_mark_t mark)

FYI
David

>> >
>> >So if I'm reading this correctly something is causing folios to collect
>> >and not be able to be freed?
>>
>> CC cephfs, maybe someone there can make an easier reading of this
>> folio usage
>>
>>
>> >
>> >Also it's clear that some of the folios are counting as cache and
>> >some aren't.
>> >
>> >Like I said, 6.17 and 6.18 both have the issue. 6.12 does not. I'm now
>> >going to manually walk through previous kernel releases and find
>> >where it first starts happening, purely because I'm having issues
>> >building earlier kernels due to rust stuff and other python
>> >incompatibilities, making a git-bisect a bit of fun.
>> >
>> >I'll do it the packages way until I get closer, then solve the build
>> >issues.
>> >
>> >Thanks,
>> >Mal
>> >
>Thanks David.
>
>I've contacted the ceph developers as well.
>
>There was a suggestion it was due to the change from, to quote,
>folio.free() to folio.put(), or something like this.
>
>The change happened around 6.14/6.15.
>
>I've found an easier reproducer.
>
>There has been a suggestion that perhaps the ceph team might not fix
>this, as "you can just reboot before the machine becomes unstable" and
>"Nobody else has encountered this bug".
>
>I'll leave that to other people to make a call on, but I'd assume the
>lack of reports is due to the fact that most stable distros are still
>on a far too early kernel and/or are using the fuse driver with k8s.
>
>Anyway, thanks for your assistance.