[v1] ext4: enable block size larger than page size

[PATCH 25/25] ext4: enable block size larger than page size

Posted by libaokun@huaweicloud.com 3 months, 2 weeks ago

From: Baokun Li <libaokun1@huawei.com>

Since the block device (see commit 3c20917120ce ("block/bdev: enable large
folio support for large logical block sizes")) and the page cache (see commit
ab95d23bab220ef8 ("filemap: allocate mapping_min_order folios in the page
cache")) can now enforce a minimum order when allocating folios,
and ext4 has supported large folios since commit 7ac67301e82f ("ext4: enable
large folio for regular file"), add support for block_size > PAGE_SIZE
in ext4.

set_blocksize() -> bdev_validate_blocksize() already validates the block
size, so ext4_load_super() does not need to perform additional checks.

Here we only need to enable large folio by default when s_min_folio_order
is greater than 0 and add the FS_LBS bit to fs_flags.

In addition, mark this feature as experimental.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 3 +++
 fs/ext4/super.c | 6 +++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 04f9380d4211..ba6cf05860ae 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5146,6 +5146,9 @@ static bool ext4_should_enable_large_folio(struct inode *inode)
 	if (!ext4_test_mount_flag(sb, EXT4_MF_LARGE_FOLIO))
 		return false;
 
+	if (EXT4_SB(sb)->s_min_folio_order)
+		return true;
+
 	if (!S_ISREG(inode->i_mode))
 		return false;
 	if (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA))
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index fdc006a973aa..4c0bd79bdf68 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5053,6 +5053,9 @@ static int ext4_check_large_folio(struct super_block *sb)
 		return -EINVAL;
 	}
 
+	if (sb->s_blocksize > PAGE_SIZE)
+		ext4_msg(sb, KERN_NOTICE, "EXPERIMENTAL bs(%lu) > ps(%lu) enabled.",
+			 sb->s_blocksize, PAGE_SIZE);
 	return 0;
 }
 
@@ -7432,7 +7435,8 @@ static struct file_system_type ext4_fs_type = {
 	.init_fs_context	= ext4_init_fs_context,
 	.parameters		= ext4_param_specs,
 	.kill_sb		= ext4_kill_sb,
-	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME,
+	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME |
+				  FS_LBS,
 };
 MODULE_ALIAS_FS("ext4");
 
-- 
2.46.1

Re: [PATCH 25/25] ext4: enable block size larger than page size

Posted by Jan Kara 3 months ago

On Sat 25-10-25 11:22:21, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> Since the block device (see commit 3c20917120ce ("block/bdev: enable large
> folio support for large logical block sizes")) and the page cache (see commit
> ab95d23bab220ef8 ("filemap: allocate mapping_min_order folios in the page
> cache")) can now enforce a minimum order when allocating folios,
> and ext4 has supported large folios since commit 7ac67301e82f ("ext4: enable
> large folio for regular file"), add support for block_size > PAGE_SIZE
> in ext4.
> 
> set_blocksize() -> bdev_validate_blocksize() already validates the block
> size, so ext4_load_super() does not need to perform additional checks.
> 
> Here we only need to enable large folio by default when s_min_folio_order
> is greater than 0 and add the FS_LBS bit to fs_flags.
> 
> In addition, mark this feature as experimental.
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

...

> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 04f9380d4211..ba6cf05860ae 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5146,6 +5146,9 @@ static bool ext4_should_enable_large_folio(struct inode *inode)
>  	if (!ext4_test_mount_flag(sb, EXT4_MF_LARGE_FOLIO))
>  		return false;
>  
> +	if (EXT4_SB(sb)->s_min_folio_order)
> +		return true;
> +

But now files with the data journalling flag enabled will get large folios,
possibly significantly greater than the blocksize. I don't think there's a
fundamental reason why data journalling doesn't work with large folios, the
only thing that's likely going to break is that credit estimates will go
through the roof if there are too many blocks per folio. But that can be
handled by setting max folio order to be equal to min folio order when
journalling data for the inode.

It is a bit scary to be modifying max folio order in
ext4_change_inode_journal_flag() but I guess less scary than setting new
aops and if we prune the whole page cache before touching the order and
inode flag, we should be safe (famous last words ;).

								Honza

>  	if (!S_ISREG(inode->i_mode))
>  		return false;
>  	if (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA))
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index fdc006a973aa..4c0bd79bdf68 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -5053,6 +5053,9 @@ static int ext4_check_large_folio(struct super_block *sb)
>  		return -EINVAL;
>  	}
>  
> +	if (sb->s_blocksize > PAGE_SIZE)
> +		ext4_msg(sb, KERN_NOTICE, "EXPERIMENTAL bs(%lu) > ps(%lu) enabled.",
> +			 sb->s_blocksize, PAGE_SIZE);
>  	return 0;
>  }
>  
> @@ -7432,7 +7435,8 @@ static struct file_system_type ext4_fs_type = {
>  	.init_fs_context	= ext4_init_fs_context,
>  	.parameters		= ext4_param_specs,
>  	.kill_sb		= ext4_kill_sb,
> -	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME,
> +	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME |
> +				  FS_LBS,
>  };
>  MODULE_ALIAS_FS("ext4");
>  
> -- 
> 2.46.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Re: [PATCH 25/25] ext4: enable block size larger than page size

Posted by Baokun Li 3 months ago

On 2025-11-05 18:14, Jan Kara wrote:
> On Sat 25-10-25 11:22:21, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> Since the block device (see commit 3c20917120ce ("block/bdev: enable large
>> folio support for large logical block sizes")) and the page cache (see commit
>> ab95d23bab220ef8 ("filemap: allocate mapping_min_order folios in the page
>> cache")) can now enforce a minimum order when allocating folios,
>> and ext4 has supported large folios since commit 7ac67301e82f ("ext4: enable
>> large folio for regular file"), add support for block_size > PAGE_SIZE
>> in ext4.
>>
>> set_blocksize() -> bdev_validate_blocksize() already validates the block
>> size, so ext4_load_super() does not need to perform additional checks.
>>
>> Here we only need to enable large folio by default when s_min_folio_order
>> is greater than 0 and add the FS_LBS bit to fs_flags.
>>
>> In addition, mark this feature as experimental.
>>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> ...
>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 04f9380d4211..ba6cf05860ae 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -5146,6 +5146,9 @@ static bool ext4_should_enable_large_folio(struct inode *inode)
>>  	if (!ext4_test_mount_flag(sb, EXT4_MF_LARGE_FOLIO))
>>  		return false;
>>  
>> +	if (EXT4_SB(sb)->s_min_folio_order)
>> +		return true;
>> +
> But now files with the data journalling flag enabled will get large folios,
> possibly significantly greater than the blocksize. I don't think there's a
> fundamental reason why data journalling doesn't work with large folios, the
> only thing that's likely going to break is that credit estimates will go
> through the roof if there are too many blocks per folio. But that can be
> handled by setting max folio order to be equal to min folio order when
> journalling data for the inode.
>
> It is a bit scary to be modifying max folio order in
> ext4_change_inode_journal_flag() but I guess less scary than setting new
> aops and if we prune the whole page cache before touching the order and
> inode flag, we should be safe (famous last words ;).
>
Good point! This looks feasible.

We just need to adjust the folio order range depending on whether the inode
journals data, and in ext4_inode_journal_mode() only ignore the inode’s
journal data flag when max_order > min_order.

I’ll make the adaptation and run some tests.
Thank you for your review!


Cheers,
Baokun

>
>>  	if (!S_ISREG(inode->i_mode))
>>  		return false;
>>  	if (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA))
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index fdc006a973aa..4c0bd79bdf68 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -5053,6 +5053,9 @@ static int ext4_check_large_folio(struct super_block *sb)
>>  		return -EINVAL;
>>  	}
>>  
>> +	if (sb->s_blocksize > PAGE_SIZE)
>> +		ext4_msg(sb, KERN_NOTICE, "EXPERIMENTAL bs(%lu) > ps(%lu) enabled.",
>> +			 sb->s_blocksize, PAGE_SIZE);
>>  	return 0;
>>  }
>>  
>> @@ -7432,7 +7435,8 @@ static struct file_system_type ext4_fs_type = {
>>  	.init_fs_context	= ext4_init_fs_context,
>>  	.parameters		= ext4_param_specs,
>>  	.kill_sb		= ext4_kill_sb,
>> -	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME,
>> +	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME |
>> +				  FS_LBS,
>>  };
>>  MODULE_ALIAS_FS("ext4");
>>  
>> -- 
>> 2.46.1
>>