Date: Thu, 11 Dec 2025 11:28:21 +0800 (CST)
From: "David Wang" <00107082@163.com>
To: "Mal Haak"
Cc: linux-kernel@vger.kernel.org, surenb@google.com, xiubli@redhat.com, idryomov@gmail.com, ceph-devel@vger.kernel.org
Subject: Re: Possible memory leak in 6.17.7
In-Reply-To: <20251210234318.5d8c2d68@xps15mal>
References: <20251110182008.71e0858b@xps15mal> <20251208110829.11840-1-00107082@163.com> <20251209090831.13c7a639@xps15mal> <17469653.4a75.19b01691299.Coremail.00107082@163.com> <20251210234318.5d8c2d68@xps15mal>
Message-ID: <2a9ba88e.3aa6.19b0b73dd4e.Coremail.00107082@163.com>

At 2025-12-10 21:43:18, "Mal Haak" wrote:
>On Tue, 9 Dec 2025 12:40:21 +0800 (CST)
>"David Wang" <00107082@163.com> wrote:
>
>> At 2025-12-09 07:08:31, "Mal Haak" wrote:
>> >On Mon, 8 Dec 2025 19:08:29 +0800
>> >David Wang <00107082@163.com> wrote:
>> >
>> >> On Mon, 10 Nov 2025 18:20:08 +1000
>> >> Mal Haak wrote:
>> >> > Hello,
>> >> >
>> >> > I have found a memory leak in 6.17.7 but I am unsure how to
>> >> > track it down effectively.
>> >> >
>> >>
>> >> I think the `memory allocation profiling` feature can help.
>> >> https://docs.kernel.org/mm/allocation-profiling.html
>> >>
>> >> You would need to build a kernel with
>> >> CONFIG_MEM_ALLOC_PROFILING=y
>> >> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>> >>
>> >> And check /proc/allocinfo for the suspicious allocations which take
>> >> more memory than expected.
>> >>
>> >> (I once caught an nvidia driver memory leak.)
>> >>
>> >> FYI
>> >> David
>> >>
>> >
>> >Thank you for your suggestion. I have some results.
>> >
>> >Ran the rsync workload for about 9 hours. It started to look like it
>> >was happening.
>> ># smem -pw
>> >Area                           Used      Cache   Noncache
>> >firmware/hardware             0.00%      0.00%      0.00%
>> >kernel image                  0.00%      0.00%      0.00%
>> >kernel dynamic memory        80.46%     65.80%     14.66%
>> >userspace memory              0.35%      0.16%      0.19%
>> >free memory                  19.19%     19.19%      0.00%
>> ># sort -g /proc/allocinfo|tail|numfmt --to=iec
>> >  22M     5609 mm/memory.c:1190 func:folio_prealloc
>> >  23M     1932 fs/xfs/xfs_buf.c:226 [xfs] func:xfs_buf_alloc_backing_mem
>> >  24M    24135 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc
>> >  27M     6693 mm/memory.c:1192 func:folio_prealloc
>> >  58M    14784 mm/page_ext.c:271 func:alloc_page_ext
>> > 258M      129 mm/khugepaged.c:1069 func:alloc_charge_folio
>> > 430M   770788 lib/xarray.c:378 func:xas_alloc
>> > 545M    36444 mm/slub.c:3059 func:alloc_slab_page
>> > 9.8G  2563617 mm/readahead.c:189 func:ractl_alloc_folio
>> >  20G  5164004 mm/filemap.c:2012 func:__filemap_get_folio
>> >
>> >So I stopped the workload and dropped caches to confirm.
>> >
>> ># echo 3 > /proc/sys/vm/drop_caches
>> ># smem -pw
>> >Area                           Used      Cache   Noncache
>> >firmware/hardware             0.00%      0.00%      0.00%
>> >kernel image                  0.00%      0.00%      0.00%
>> >kernel dynamic memory        33.45%      0.09%     33.36%
>> >userspace memory              0.36%      0.16%      0.19%
>> >free memory                  66.20%     66.20%      0.00%
>> ># sort -g /proc/allocinfo|tail|numfmt --to=iec
>> >  12M     2987 mm/execmem.c:41 func:execmem_vmalloc
>> >  12M        3 kernel/dma/pool.c:96 func:atomic_pool_expand
>> >  13M      751 mm/slub.c:3061 func:alloc_slab_page
>> >  16M        8 mm/khugepaged.c:1069 func:alloc_charge_folio
>> >  18M     4355 mm/memory.c:1190 func:folio_prealloc
>> >  24M     6119 mm/memory.c:1192 func:folio_prealloc
>> >  58M    14784 mm/page_ext.c:271 func:alloc_page_ext
>> >  61M    15448 mm/readahead.c:189 func:ractl_alloc_folio
>> >  79M     6726 mm/slub.c:3059 func:alloc_slab_page
>> >  11G  2674488 mm/filemap.c:2012 func:__filemap_get_folio

Maybe narrowing down the "Noncache" callers of __filemap_get_folio would help clarify things.
(It could be designed that way, and need some route other than dropping caches to release the memory, just a guess....)

If you want, you can modify the code to split the accounting for __filemap_get_folio according to its callers.
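The reason this split works: with CONFIG_MEM_ALLOC_PROFILING, an allocation is charged to the alloc_tag defined at the spot where the alloc_hooks() macro expands. If __filemap_get_folio becomes a macro wrapping a _noprof function, that spot is each caller's own file:line rather than mm/filemap.c:2012, so /proc/allocinfo gets one entry per caller. Roughly the idea (a simplified sketch of alloc_hooks() from include/linux/alloc_tag.h; the real macro also checks mem_alloc_profiling_enabled() and the details differ between kernel versions):

#define alloc_hooks(_do_alloc)						\
({									\
	DEFINE_ALLOC_TAG(_tag);	/* static tag recording the expansion site's file:line */ \
	struct alloc_tag *_old = alloc_tag_save(&_tag);	/* make _tag current */ \
	typeof(_do_alloc) _res = _do_alloc;	/* pages allocated here get charged to _tag */ \
	alloc_tag_restore(&_tag, _old);					\
	_res;								\
})

With the draft patch below applied, the big mm/filemap.c:2012 line should split into one entry per __filemap_get_folio caller, which should make it clearer which path (e.g. the cephfs read path vs. other page cache users) is holding the noncache folios.
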
Following is a draft patch (based on v6.18):

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 09b581c1d878..ba8c659a6ae3 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -753,7 +753,11 @@ static inline fgf_t fgf_set_order(size_t size)
 }
 
 void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
-struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
+
+#define __filemap_get_folio(...) \
+	alloc_hooks(__filemap_get_folio_noprof(__VA_ARGS__))
+
+struct folio *__filemap_get_folio_noprof(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp);
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp);
diff --git a/mm/filemap.c b/mm/filemap.c
index 024b71da5224..e1c1c26d7cb3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1938,7 +1938,7 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index)
  *
  * Return: The found folio or an ERR_PTR() otherwise.
  */
-struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
+struct folio *__filemap_get_folio_noprof(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp)
 {
 	struct folio *folio;
@@ -2009,7 +2009,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 			err = -ENOMEM;
 			if (order > min_order)
 				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
-			folio = filemap_alloc_folio(alloc_gfp, order);
+			folio = filemap_alloc_folio_noprof(alloc_gfp, order);
 			if (!folio)
 				continue;
 
@@ -2056,7 +2056,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 	folio_clear_dropbehind(folio);
 	return folio;
 }
-EXPORT_SYMBOL(__filemap_get_folio);
+EXPORT_SYMBOL(__filemap_get_folio_noprof);
 
 static inline struct folio *find_get_entry(struct xa_state *xas, pgoff_t max,
 		xa_mark_t mark)

FYI
David

>> >
>> >So if I'm reading this correctly something is causing folios to collect
>> >and not be able to be freed?
>>
>> CC cephfs, maybe someone there can make an easier reading of this
>> folio usage
>>
>>
>> >
>> >Also it's clear that some of the folios are counting as cache and
>> >some aren't.
>> >
>> >Like I said, 6.17 and 6.18 both have the issue. 6.12 does not. I'm now
>> >going to manually walk through previous kernel releases and find
>> >where it first starts happening, purely because I'm having issues
>> >building earlier kernels due to rust stuff and other python
>> >incompatibilities, making a git-bisect a bit of fun.
>> >
>> >I'll do it the packages way until I get closer, then solve the build
>> >issues.
>> >
>> >Thanks,
>> >Mal
>> >
>Thanks David.
>
>I've contacted the ceph developers as well.
>
>There was a suggestion it was due to the change from, to quote,
>folio.free() to folio.put(), or something like this.
>
>The change happened around 6.14/6.15.
>
>I've found an easier reproducer.
>
>There has been a suggestion that perhaps the ceph team might not fix
>this, as "you can just reboot before the machine becomes unstable" and
>"Nobody else has encountered this bug".
>
>I'll leave that to other people to make a call on, but I'd assume the
>lack of reports is due to the fact that most stable distros are still
>on a far too early kernel and/or are using the fuse driver with k8s.
>
>Anyway, thanks for your assistance.