[v2] ext4: enable block size larger than page size

[PATCH v2 21/24] ext4: make data=journal support large block size

Posted by libaokun@huaweicloud.com 3 months ago

From: Baokun Li <libaokun1@huawei.com>

Currently, ext4_set_inode_mapping_order() does not set max folio order
for files with the data journalling flag. For files that already have
large folios enabled, ext4_inode_journal_mode() ignores the data
journalling flag once max folio order is set.

This is not because data journalling cannot work with large folios, but
because credit estimates will go through the roof if there are too many
blocks per folio.

Since the real constraint is blocks-per-folio, to support data=journal
under LBS, we now set max folio order to be equal to min folio order for
files with the journalling flag. When LBS is disabled, the max folio order
remains unset as before.
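
To make the blocks-per-folio argument concrete, here is a minimal userspace
sketch of the arithmetic (not kernel code). It assumes the min folio order is
just large enough for one block to fit in a folio, and it mirrors the
EXT4_MAX_PAGECACHE_ORDER() expression from this patch. The sketch_* names and
the page_shift / pagecache_cap parameters are hypothetical stand-ins for
PAGE_SHIFT and MAX_PAGECACHE_ORDER, which depend on architecture and config.

```c
#include <assert.h>

/* Assumption: min folio order is just enough for one block to fit in a folio. */
static unsigned int sketch_min_order(unsigned int blkbits, unsigned int page_shift)
{
	return blkbits > page_shift ? blkbits - page_shift : 0;
}

/* Mirrors EXT4_MAX_PAGECACHE_ORDER(): umin(cap, 11 + blkbits - PAGE_SHIFT). */
static unsigned int sketch_max_order(unsigned int blkbits, unsigned int page_shift,
				     unsigned int pagecache_cap)
{
	unsigned int order;

	assert(11 + blkbits >= page_shift);
	order = 11 + blkbits - page_shift;
	return order < pagecache_cap ? order : pagecache_cap;
}

/* Number of filesystem blocks covered by one folio of the given order. */
static unsigned long sketch_blocks_per_folio(unsigned int order, unsigned int blkbits,
					     unsigned int page_shift)
{
	assert(order + page_shift >= blkbits);
	return 1UL << (order + page_shift - blkbits);
}
```

With 4 KiB pages (page_shift = 12) and 64 KiB blocks (blkbits = 16), the min
order is 4 and a min-order folio covers exactly one block. An order-9 folio
would cover 32 blocks, and an order-15 folio (if the cap allowed it) 2048.
Pinning max order to min order keeps a journalled folio at one block.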

Additionally, the mapping_large_folio_support() check (i.e. the max folio
order check) in ext4_inode_journal_mode() is removed, and the mapping order
is reset in ext4_change_inode_journal_flag().

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/ext4_jbd2.c |  3 +--
 fs/ext4/inode.c     | 14 ++++++++++----
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index a0e66bc10093..05e5946ed9b3 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -16,8 +16,7 @@ int ext4_inode_journal_mode(struct inode *inode)
 	    ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) ||
 	    test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
 	    (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) &&
-	    !test_opt(inode->i_sb, DELALLOC) &&
-	    !mapping_large_folio_support(inode->i_mapping))) {
+	    !test_opt(inode->i_sb, DELALLOC))) {
 		/* We do not support data journalling for encrypted data */
 		if (S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode))
 			return EXT4_INODE_ORDERED_DATA_MODE;  /* ordered */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 22d215f90c64..517701024d18 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5152,9 +5152,6 @@ static bool ext4_should_enable_large_folio(struct inode *inode)
 
 	if (!S_ISREG(inode->i_mode))
 		return false;
-	if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
-	    ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA))
-		return false;
 	if (ext4_has_feature_verity(sb))
 		return false;
 	if (ext4_has_feature_encrypt(sb))
@@ -5172,12 +5169,20 @@ static bool ext4_should_enable_large_folio(struct inode *inode)
 		umin(MAX_PAGECACHE_ORDER, (11 + (i)->i_blkbits - PAGE_SHIFT))
 void ext4_set_inode_mapping_order(struct inode *inode)
 {
+	u32 max_order;
+
 	if (!ext4_should_enable_large_folio(inode))
 		return;
 
+	if (test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
+	    ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA))
+		max_order = EXT4_SB(inode->i_sb)->s_min_folio_order;
+	else
+		max_order = EXT4_MAX_PAGECACHE_ORDER(inode);
+
 	mapping_set_folio_order_range(inode->i_mapping,
 				      EXT4_SB(inode->i_sb)->s_min_folio_order,
-				      EXT4_MAX_PAGECACHE_ORDER(inode));
+				      max_order);
 }
 
 struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
@@ -6585,6 +6590,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 		ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
 	}
 	ext4_set_aops(inode);
+	ext4_set_inode_mapping_order(inode);
 
 	jbd2_journal_unlock_updates(journal);
 	ext4_writepages_up_write(inode->i_sb, alloc_ctx);
-- 
2.46.1

Re: [PATCH v2 21/24] ext4: make data=journal support large block size

Posted by Jan Kara 3 months ago

On Fri 07-11-25 22:42:46, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> Currently, ext4_set_inode_mapping_order() does not set max folio order
> for files with the data journalling flag. For files that already have
> large folios enabled, ext4_inode_journal_mode() ignores the data
> journalling flag once max folio order is set.
> 
> This is not because data journalling cannot work with large folios, but
> because credit estimates will go through the roof if there are too many
> blocks per folio.
> 
> Since the real constraint is blocks-per-folio, to support data=journal
> under LBS, we now set max folio order to be equal to min folio order for
> files with the journalling flag. When LBS is disabled, the max folio order
> remains unset as before.
> 
> Additionally, the max_order check in ext4_inode_journal_mode() is removed,
> and mapping order is reset in ext4_change_inode_journal_flag().
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

...

> @@ -6585,6 +6590,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
>  		ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
>  	}
>  	ext4_set_aops(inode);
> +	ext4_set_inode_mapping_order(inode);
>  
>  	jbd2_journal_unlock_updates(journal);
>  	ext4_writepages_up_write(inode->i_sb, alloc_ctx);

I think more needs to be done here, because this way we could leave folios
in the page cache that are now larger than the max order. To simplify the
logic, I'd make the filemap_write_and_wait() call in
ext4_change_inode_journal_flag() unconditional and add a
truncate_pagecache() call there to evict all of the page cache before we
switch the inode's journalling mode.
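
The ordering concern can be illustrated with a toy userspace model (a sketch
only, not the kernel change itself). All model_* names are hypothetical; the
two helpers merely stand in for filemap_write_and_wait() and
truncate_pagecache(), and the mapping is reduced to a list of folio orders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_FOLIOS 8

/* Toy stand-in for an address_space: only tracks folio orders and dirtiness. */
struct model_mapping {
	unsigned int min_order;
	unsigned int max_order;
	size_t nr_folios;
	unsigned int order[MODEL_MAX_FOLIOS];
	bool dirty[MODEL_MAX_FOLIOS];
	unsigned int nr_written;	/* folios flushed by writeback */
};

/* Cache a folio; refused if its order is outside the current range. */
static bool model_add_folio(struct model_mapping *m, unsigned int order, bool dirty)
{
	if (m->nr_folios == MODEL_MAX_FOLIOS ||
	    order < m->min_order || order > m->max_order)
		return false;
	m->order[m->nr_folios] = order;
	m->dirty[m->nr_folios] = dirty;
	m->nr_folios++;
	return true;
}

/* Stand-in for filemap_write_and_wait(): flush every dirty folio. */
static void model_write_and_wait(struct model_mapping *m)
{
	for (size_t i = 0; i < m->nr_folios; i++) {
		if (m->dirty[i]) {
			m->dirty[i] = false;
			m->nr_written++;
		}
	}
}

/* Stand-in for truncate_pagecache(): drop all cached folios. */
static void model_truncate_pagecache(struct model_mapping *m)
{
	m->nr_folios = 0;
}

/* Switch journalling mode; evict selects the flush-and-truncate variant. */
static void model_change_journal_flag(struct model_mapping *m, bool journal_data,
				      unsigned int large_max_order, bool evict)
{
	if (evict) {
		model_write_and_wait(m);
		model_truncate_pagecache(m);
	}
	m->max_order = journal_data ? m->min_order : large_max_order;
}

/* True if a cached folio violates the mapping's current max order. */
static bool model_has_oversized_folio(const struct model_mapping *m)
{
	for (size_t i = 0; i < m->nr_folios; i++)
		if (m->order[i] > m->max_order)
			return true;
	return false;
}
```

In this model, switching to journalled data without eviction leaves an
order-9 folio behind while max order drops to 4. With eviction, the dirty
folio is written out first and the cache is empty when the new range takes
effect.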

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Re: [PATCH v2 21/24] ext4: make data=journal support large block size

Posted by Baokun Li 3 months ago

On 2025-11-10 17:48, Jan Kara wrote:
> On Fri 07-11-25 22:42:46, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> Currently, ext4_set_inode_mapping_order() does not set max folio order
>> for files with the data journalling flag. For files that already have
>> large folios enabled, ext4_inode_journal_mode() ignores the data
>> journalling flag once max folio order is set.
>>
>> This is not because data journalling cannot work with large folios, but
>> because credit estimates will go through the roof if there are too many
>> blocks per folio.
>>
>> Since the real constraint is blocks-per-folio, to support data=journal
>> under LBS, we now set max folio order to be equal to min folio order for
>> files with the journalling flag. When LBS is disabled, the max folio order
>> remains unset as before.
>>
>> Additionally, the max_order check in ext4_inode_journal_mode() is removed,
>> and mapping order is reset in ext4_change_inode_journal_flag().
>>
>> Suggested-by: Jan Kara <jack@suse.cz>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ...
>
>> @@ -6585,6 +6590,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
>>  		ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
>>  	}
>>  	ext4_set_aops(inode);
>> +	ext4_set_inode_mapping_order(inode);
>>  
>>  	jbd2_journal_unlock_updates(journal);
>>  	ext4_writepages_up_write(inode->i_sb, alloc_ctx);
> I think more needs to be done here, because this way we could leave folios
> in the page cache that are now larger than the max order. To simplify the
> logic, I'd make the filemap_write_and_wait() call in
> ext4_change_inode_journal_flag() unconditional and add a
> truncate_pagecache() call there to evict all of the page cache before we
> switch the inode's journalling mode.
>
> 								Honza

That makes sense. I forgot to truncate the old page cache here.

I will make the changes according to your suggestion in the next version.

Thank you for your advice!


Cheers,
Baokun