[PATCH] fs/kernfs: raise sb->maxbytes to MAX_LFS_FILESIZE

Jane Chu posted 1 patch 2 months, 3 weeks ago
fs/kernfs/mount.c | 1 +
1 file changed, 1 insertion(+)
Posted by Jane Chu 2 months, 3 weeks ago
On an ARM64 A1 system, it's possible to have physical memory span
up to the 64T boundary, like below

$ lsmem -b -r -n -o range,size
0x0000000080000000-0x00000000bfffffff 1073741824
0x0000080000000000-0x000008007fffffff 2147483648
0x00000800c0000000-0x0000087fffffffff 546534588416
0x0000400000000000-0x00004000bfffffff 3221225472
0x0000400100000000-0x0000407fffffffff 545460846592

So it's time to extend /sys/kernel/mm/page_idle/bitmap to be able
to account for >2G number of pages, by raising the kernfs file size
limit.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 fs/kernfs/mount.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 76eaf64b9d9e..3ac52e141766 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -298,6 +298,7 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k
 	if (info->root->flags & KERNFS_ROOT_SUPPORT_EXPORTOP)
 		sb->s_export_op = &kernfs_export_ops;
 	sb->s_time_gran = 1;
+	sb->s_maxbytes  = MAX_LFS_FILESIZE;
 
 	/* sysfs dentries and inodes don't require IO to create */
 	sb->s_shrink->seeks = 0;
-- 
2.43.5
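
For context, here is a rough userspace model of why the one-line change matters. It assumes that a superblock which never sets s_maxbytes keeps the VFS default of MAX_NON_LFS, i.e. (1UL<<31) - 1, that MAX_LFS_FILESIZE is LLONG_MAX on 64-bit kernels, and that an absolute seek past s_maxbytes is rejected with -EINVAL. The MODEL_* names and model_seek_set() are made up for this illustration and are not kernel symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed values, mirroring include/linux/fs.h on a 64-bit kernel. */
#define MODEL_MAX_NON_LFS      ((int64_t)((1ULL << 31) - 1))
#define MODEL_MAX_LFS_FILESIZE INT64_MAX

/*
 * Userspace model (not kernel code) of the bound check applied to an
 * absolute seek: the new position is accepted only if it lies within
 * [0, s_maxbytes]. Returns the new position, or -EINVAL.
 */
static int64_t model_seek_set(int64_t offset, int64_t s_maxbytes)
{
	assert(s_maxbytes >= 0);

	if (offset < 0 || offset > s_maxbytes)
		return -EINVAL;
	return offset;
}
```

Under these assumptions, model_seek_set(1LL << 31, MODEL_MAX_NON_LFS) yields -EINVAL, while the same offset with MODEL_MAX_LFS_FILESIZE is accepted.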
Re: [PATCH] fs/kernfs: raise sb->maxbytes to MAX_LFS_FILESIZE
Posted by Greg KH 2 months, 2 weeks ago
On Tue, Nov 11, 2025 at 01:26:06PM -0700, Jane Chu wrote:
> On an ARM64 A1 system, it's possible to have physical memory span
> up to the 64T boundary, like below
> 
> $ lsmem -b -r -n -o range,size
> 0x0000000080000000-0x00000000bfffffff 1073741824
> 0x0000080000000000-0x000008007fffffff 2147483648
> 0x00000800c0000000-0x0000087fffffffff 546534588416
> 0x0000400000000000-0x00004000bfffffff 3221225472
> 0x0000400100000000-0x0000407fffffffff 545460846592
> 
> So it's time to extend /sys/kernel/mm/page_idle/bitmap to be able
> to account for >2G number of pages, by raising the kernfs file size
> limit.

Wait, we are having sysfs files that are bigger than >2G?  Which files
exactly?

> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>  fs/kernfs/mount.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
> index 76eaf64b9d9e..3ac52e141766 100644
> --- a/fs/kernfs/mount.c
> +++ b/fs/kernfs/mount.c
> @@ -298,6 +298,7 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k
>  	if (info->root->flags & KERNFS_ROOT_SUPPORT_EXPORTOP)
>  		sb->s_export_op = &kernfs_export_ops;
>  	sb->s_time_gran = 1;
> +	sb->s_maxbytes  = MAX_LFS_FILESIZE;

What is the default setting for s_maxbytes today?

thanks,

greg k-h
Re: [PATCH] fs/kernfs: raise sb->maxbytes to MAX_LFS_FILESIZE
Posted by jane.chu@oracle.com 2 months, 2 weeks ago
Hi, Greg,

On 11/24/2025 8:17 AM, Greg KH wrote:
> On Tue, Nov 11, 2025 at 01:26:06PM -0700, Jane Chu wrote:
>> On an ARM64 A1 system, it's possible to have physical memory span
>> up to the 64T boundary, like below
>>
>> $ lsmem -b -r -n -o range,size
>> 0x0000000080000000-0x00000000bfffffff 1073741824
>> 0x0000080000000000-0x000008007fffffff 2147483648
>> 0x00000800c0000000-0x0000087fffffffff 546534588416
>> 0x0000400000000000-0x00004000bfffffff 3221225472
>> 0x0000400100000000-0x0000407fffffffff 545460846592
>>
>> So it's time to extend /sys/kernel/mm/page_idle/bitmap to be able
>> to account for >2G number of pages, by raising the kernfs file size
>> limit.
> 
> Wait, we are having sysfs files that are bigger than >2G?  Which files
> exactly?

This file:  /sys/kernel/mm/page_idle/bitmap
that tracks idle pages, 1 bit per page.

Because of the above memory span, even though the system has less than
64TiB of memory, we still need to be able to seek beyond the 2GiB point
in the /sys/kernel/mm/page_idle/bitmap file.
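
To make the arithmetic concrete, here is a small C sketch. It relies on the layout described in the idle page tracking documentation (an array of 8-byte words, with the page at PFN i mapped to bit i % 64 of word i / 64). The 4KiB page size used in the note below and the helper names bitmap_offset() and bitmap_bit() are assumptions made for illustration; they are not stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Byte offset, within the bitmap file, of the 8-byte word that holds
 * the idle bit for the page containing phys_addr.
 */
static uint64_t bitmap_offset(uint64_t phys_addr, uint64_t page_size)
{
	assert(page_size != 0);

	uint64_t pfn = phys_addr / page_size;

	return (pfn / 64) * 8;
}

/* Bit position of that page inside its 8-byte word. */
static unsigned int bitmap_bit(uint64_t phys_addr, uint64_t page_size)
{
	assert(page_size != 0);

	return (unsigned int)((phys_addr / page_size) % 64);
}
```

With 4KiB pages, bitmap_offset(0x0000400000000000, 4096) is 0x80000000, which is exactly the 2GiB offset used in the dd test, while the range that ends at 0x0000087fffffffff stays below that point.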

without fix:
--------------
2 Gb
$ sudo dd if=/sys/kernel/mm/page_idle/bitmap of=/dev/null bs=8 skip=$((2*1024*1024*1024/8)) count=1
dd: /sys/kernel/mm/page_idle/bitmap: cannot skip: Invalid argument  <--
0+0 records in
0+0 records out
0 bytes copied, 0.00017564 s, 0.0 kB/s  <--

with fix:
------------
2 Gb
$ sudo dd if=/sys/kernel/mm/page_idle/bitmap of=/dev/null bs=8 skip=$((2*1024*1024*1024/8)) count=1
dd: /sys/kernel/mm/page_idle/bitmap: cannot skip to specified offset <-- ignore
1+0 records in
1+0 records out
8 bytes copied, 0.000165122 s, 48.4 kB/s  <--
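
The same check can be done without dd. The following is a minimal sketch that seeks to a byte offset and reads one 8-byte word, mirroring `dd bs=8 skip=N count=1`. read_bitmap_word() is a hypothetical helper name, and mapping a short read to -EIO is a choice made here; the sketch has not been run against a kernel with or without the patch.

```c
#define _FILE_OFFSET_BITS 64	/* keep off_t 64-bit on 32-bit userspace too */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Seek to `offset` in `path` and read one 8-byte word into *word.
 * Returns 0 on success, a negative errno if open/lseek/read fails,
 * or -EIO if fewer than 8 bytes could be read.
 */
static int read_bitmap_word(const char *path, off_t offset, uint64_t *word)
{
	int fd, err = 0;
	ssize_t n;

	assert(word != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
		err = -errno;	/* the "cannot skip" case reported by dd */
	} else {
		n = read(fd, word, sizeof(*word));
		if (n < 0)
			err = -errno;
		else if (n != (ssize_t)sizeof(*word))
			err = -EIO;
	}

	close(fd);
	return err;
}
```

Calling read_bitmap_word("/sys/kernel/mm/page_idle/bitmap", (off_t)1 << 31, &w) as root would then be expected to return -EINVAL where dd reports "cannot skip: Invalid argument", and 0 where dd copies 8 bytes.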

thanks,
-jane

> 
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>>   fs/kernfs/mount.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
>> index 76eaf64b9d9e..3ac52e141766 100644
>> --- a/fs/kernfs/mount.c
>> +++ b/fs/kernfs/mount.c
>> @@ -298,6 +298,7 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k
>>   	if (info->root->flags & KERNFS_ROOT_SUPPORT_EXPORTOP)
>>   		sb->s_export_op = &kernfs_export_ops;
>>   	sb->s_time_gran = 1;
>> +	sb->s_maxbytes  = MAX_LFS_FILESIZE;
> 
> What is the default setting for s_maxbytes today?
> 
> thanks,
> 
> greg k-h
Re: [PATCH] fs/kernfs: raise sb->maxbytes to MAX_LFS_FILESIZE
Posted by Greg KH 2 months, 2 weeks ago
On Mon, Nov 24, 2025 at 09:06:21AM -0800, jane.chu@oracle.com wrote:
> Hi, Greg,
> 
> On 11/24/2025 8:17 AM, Greg KH wrote:
> > On Tue, Nov 11, 2025 at 01:26:06PM -0700, Jane Chu wrote:
> > > On an ARM64 A1 system, it's possible to have physical memory span
> > > up to the 64T boundary, like below
> > > 
> > > $ lsmem -b -r -n -o range,size
> > > 0x0000000080000000-0x00000000bfffffff 1073741824
> > > 0x0000080000000000-0x000008007fffffff 2147483648
> > > 0x00000800c0000000-0x0000087fffffffff 546534588416
> > > 0x0000400000000000-0x00004000bfffffff 3221225472
> > > 0x0000400100000000-0x0000407fffffffff 545460846592
> > > 
> > > So it's time to extend /sys/kernel/mm/page_idle/bitmap to be able
> > > to account for >2G number of pages, by raising the kernfs file size
> > > limit.
> > 
> > Wait, we are having sysfs files that are bigger than >2G?  Which files
> > exactly?
> 
> This file:  /sys/kernel/mm/page_idle/bitmap
> that tracks idle pages, 1 bit per page.

Why is that a sysfs file and not a debugfs file?

> Because of the above memory span, even though the system has less than
> 64TiB of memory, we still need to be able to seek beyond the 2GiB point
> in the /sys/kernel/mm/page_idle/bitmap file.

What uses this file?  It's not on my systems, what arch uses it?

thanks,

greg k-h
Re: [PATCH] fs/kernfs: raise sb->maxbytes to MAX_LFS_FILESIZE
Posted by jane.chu@oracle.com 2 months, 2 weeks ago
On 11/24/2025 9:27 AM, Greg KH wrote:
> On Mon, Nov 24, 2025 at 09:06:21AM -0800, jane.chu@oracle.com wrote:
>> Hi, Greg,
>>
>> On 11/24/2025 8:17 AM, Greg KH wrote:
>>> On Tue, Nov 11, 2025 at 01:26:06PM -0700, Jane Chu wrote:
>>>> On an ARM64 A1 system, it's possible to have physical memory span
>>>> up to the 64T boundary, like below
>>>>
>>>> $ lsmem -b -r -n -o range,size
>>>> 0x0000000080000000-0x00000000bfffffff 1073741824
>>>> 0x0000080000000000-0x000008007fffffff 2147483648
>>>> 0x00000800c0000000-0x0000087fffffffff 546534588416
>>>> 0x0000400000000000-0x00004000bfffffff 3221225472
>>>> 0x0000400100000000-0x0000407fffffffff 545460846592
>>>>
>>>> So it's time to extend /sys/kernel/mm/page_idle/bitmap to be able
>>>> to account for >2G number of pages, by raising the kernfs file size
>>>> limit.
>>>
>>> Wait, we are having sysfs files that are bigger than >2G?  Which files
>>> exactly?
>>
>> This file:  /sys/kernel/mm/page_idle/bitmap
>> that tracks idle pages, 1 bit per page.
> 
> Why is that a sysfs file and not a debugfs file?

The bitmap file was introduced by
   33c3fc71c8cf ("mm: introduce idle page tracking")
for idle page tracking.

See also
   https://docs.kernel.org/admin-guide/mm/idle_page_tracking.html

> 
>> Because of the above memory span, even though the system has less than
>> 64TiB of memory, we still need to be able to seek beyond the 2GiB point
>> in the /sys/kernel/mm/page_idle/bitmap file.
> 
> What uses this file?  It's not on my systems, what arch uses it?

Our use case is for production, not for debugging purposes.
The file is present on my dual-socket Intel Ice Lake system w/ Linux
v6.12.x; the issue was originally reported on an ARM64 A1 system.

thanks,
-jane

> 
> thanks,
> 
> greg k-h