[PATCH v2 4/8] ext4/jbd2: convert jbd2_journal_blocks_per_page() to support large folio

Posted by Zhang Yi 7 months, 1 week ago
From: Zhang Yi <yi.zhang@huawei.com>

jbd2_journal_blocks_per_page() returns the number of blocks in a single
page. Rename it to jbd2_journal_blocks_per_folio() and make it return
the number of blocks in the largest supported folio, in preparation for
calculating the journal credits needed when allocating blocks within a
large folio in the writeback path.
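
For example, assuming 4k pages (PAGE_SHIFT == 12), a maximum folio order
of 9 (2M folios) and a 1k block size (s_blocksize_bits == 10), the helper
now returns

	1 << (12 + 9 - 10) = 2048

blocks, whereas the per-page variant returned 1 << (12 - 10) = 4.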

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/ext4_jbd2.h  | 4 ++--
 fs/ext4/inode.c      | 6 +++---
 fs/jbd2/journal.c    | 7 ++++---
 include/linux/jbd2.h | 2 +-
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 3221714d9901..63d17c5201b5 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -319,10 +319,10 @@ static inline int ext4_journal_ensure_credits(handle_t *handle, int credits,
 				revoke_creds, 0);
 }
 
-static inline int ext4_journal_blocks_per_page(struct inode *inode)
+static inline int ext4_journal_blocks_per_folio(struct inode *inode)
 {
 	if (EXT4_JOURNAL(inode) != NULL)
-		return jbd2_journal_blocks_per_page(inode);
+		return jbd2_journal_blocks_per_folio(inode);
 	return 0;
 }
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 573ae0b3be1d..ffbf444b56d4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2361,7 +2361,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
  */
 static int ext4_da_writepages_trans_blocks(struct inode *inode)
 {
-	int bpp = ext4_journal_blocks_per_page(inode);
+	int bpp = ext4_journal_blocks_per_folio(inode);
 
 	return ext4_meta_trans_blocks(inode,
 				MAX_WRITEPAGES_EXTENT_LEN + bpp - 1, bpp);
@@ -2439,7 +2439,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 	ext4_lblk_t lblk;
 	struct buffer_head *head;
 	handle_t *handle = NULL;
-	int bpp = ext4_journal_blocks_per_page(mpd->inode);
+	int bpp = ext4_journal_blocks_per_folio(mpd->inode);
 
 	if (mpd->wbc->sync_mode == WB_SYNC_ALL || mpd->wbc->tagged_writepages)
 		tag = PAGECACHE_TAG_TOWRITE;
@@ -5831,7 +5831,7 @@ static int ext4_meta_trans_blocks(struct inode *inode, int lblocks,
  */
 int ext4_writepage_trans_blocks(struct inode *inode)
 {
-	int bpp = ext4_journal_blocks_per_page(inode);
+	int bpp = ext4_journal_blocks_per_folio(inode);
 	int ret;
 
 	ret = ext4_meta_trans_blocks(inode, bpp, bpp);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 743a1d7633cd..ecf31af7d2fb 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -83,7 +83,7 @@ EXPORT_SYMBOL(jbd2_log_wait_commit);
 EXPORT_SYMBOL(jbd2_journal_start_commit);
 EXPORT_SYMBOL(jbd2_journal_force_commit_nested);
 EXPORT_SYMBOL(jbd2_journal_wipe);
-EXPORT_SYMBOL(jbd2_journal_blocks_per_page);
+EXPORT_SYMBOL(jbd2_journal_blocks_per_folio);
 EXPORT_SYMBOL(jbd2_journal_invalidate_folio);
 EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
 EXPORT_SYMBOL(jbd2_journal_force_commit);
@@ -2657,9 +2657,10 @@ void jbd2_journal_ack_err(journal_t *journal)
 	write_unlock(&journal->j_state_lock);
 }
 
-int jbd2_journal_blocks_per_page(struct inode *inode)
+int jbd2_journal_blocks_per_folio(struct inode *inode)
 {
-	return 1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits);
+	return 1 << (PAGE_SHIFT + mapping_max_folio_order(inode->i_mapping) -
+		     inode->i_sb->s_blocksize_bits);
 }
 
 /*
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 023e8abdb99a..ebbcdab474d5 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1723,7 +1723,7 @@ static inline int tid_geq(tid_t x, tid_t y)
 	return (difference >= 0);
 }
 
-extern int jbd2_journal_blocks_per_page(struct inode *inode);
+extern int jbd2_journal_blocks_per_folio(struct inode *inode);
 extern size_t journal_tag_bytes(journal_t *journal);
 
 static inline int jbd2_journal_has_csum_v2or3(journal_t *journal)
-- 
2.46.1
Re: [PATCH v2 4/8] ext4/jbd2: convert jbd2_journal_blocks_per_page() to support large folio
Posted by Jan Kara 7 months ago
On Mon 12-05-25 14:33:15, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> jbd2_journal_blocks_per_page() returns the number of blocks in a single
> page. Rename it to jbd2_journal_blocks_per_folio() and make it return
> the number of blocks in the largest supported folio, in preparation for
> calculating the journal credits needed when allocating blocks within a
> large folio in the writeback path.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
...
> @@ -2657,9 +2657,10 @@ void jbd2_journal_ack_err(journal_t *journal)
>  	write_unlock(&journal->j_state_lock);
>  }
>  
> -int jbd2_journal_blocks_per_page(struct inode *inode)
> +int jbd2_journal_blocks_per_folio(struct inode *inode)
>  {
> -	return 1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits);
> +	return 1 << (PAGE_SHIFT + mapping_max_folio_order(inode->i_mapping) -
> +		     inode->i_sb->s_blocksize_bits);
>  }

FWIW this will result in us reserving some 10k transaction credits for 1k
blocksize with maximum 2M folio size. That is going to create serious
pressure on the journalling machinery. For now I guess we are fine but
eventually we should rewrite how credits for writing out a folio are
computed to reduce this massive overestimation. It will be a bit tricky,
but we could always reserve credits for one or a couple of extents and try
to extend the transaction if we need more. The tricky part is doing the
partial folio writeout in case we cannot extend the transaction...
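
To put rough numbers on that (assuming 4k pages, so a 2M folio is order 9,
and 1k blocks):

	blocks per folio = 1 << (12 + 9 - 10) = 2048

and since ext4_writepage_trans_blocks() feeds that value into
ext4_meta_trans_blocks() as both the block and the extent count, the
worst-case estimate charges bitmap and group descriptor blocks for nearly
every block in the folio, so the reservation ends up in the thousands of
credits instead of the handful we reserved per page.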

								Honza
>  /*
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 023e8abdb99a..ebbcdab474d5 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1723,7 +1723,7 @@ static inline int tid_geq(tid_t x, tid_t y)
>  	return (difference >= 0);
>  }
>  
> -extern int jbd2_journal_blocks_per_page(struct inode *inode);
> +extern int jbd2_journal_blocks_per_folio(struct inode *inode);
>  extern size_t journal_tag_bytes(journal_t *journal);
>  
>  static inline int jbd2_journal_has_csum_v2or3(journal_t *journal)
> -- 
> 2.46.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
Re: [PATCH v2 4/8] ext4/jbd2: convert jbd2_journal_blocks_per_page() to support large folio
Posted by Zhang Yi 7 months ago
On 2025/5/20 4:16, Jan Kara wrote:
> On Mon 12-05-25 14:33:15, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> jbd2_journal_blocks_per_page() returns the number of blocks in a single
>> page. Rename it to jbd2_journal_blocks_per_folio() and make it return
>> the number of blocks in the largest supported folio, in preparation for
>> calculating the journal credits needed when allocating blocks within a
>> large folio in the writeback path.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ...
>> @@ -2657,9 +2657,10 @@ void jbd2_journal_ack_err(journal_t *journal)
>>  	write_unlock(&journal->j_state_lock);
>>  }
>>  
>> -int jbd2_journal_blocks_per_page(struct inode *inode)
>> +int jbd2_journal_blocks_per_folio(struct inode *inode)
>>  {
>> -	return 1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits);
>> +	return 1 << (PAGE_SHIFT + mapping_max_folio_order(inode->i_mapping) -
>> +		     inode->i_sb->s_blocksize_bits);
>>  }
> 
> FWIW this will result in us reserving some 10k transaction credits for 1k
> blocksize with maximum 2M folio size. That is going to create serious
> pressure on the journalling machinery. For now I guess we are fine

Oooh, indeed, you are right, thanks a lot for pointing this out. As you
mentioned in patch 5, the credits calculation I proposed was incorrect;
I thought it wouldn't require too many credits.

I believe it is risky to mount a filesystem with a small journal and a
large number of block groups. For example, if we build an image with a
1K block size and a 1MB journal on a 20GB disk (which contains 2,540
groups), it will require 2,263 credits, exceeding the available journal
space (a 1MB journal with 1K blocks has only 1,024 blocks in total).

For now, I'm going to disable large folio support on filesystems with
limited journal space, i.e., when the return value of
ext4_writepage_trans_blocks() is greater than
jbd2_max_user_trans_buffers(journal) / 2, make
ext4_should_enable_large_folio() return false. Thoughts?
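
Roughly something like the following (an untested sketch; whatever other
conditions ext4_should_enable_large_folio() needs to check are omitted,
and the exact threshold is still open):

	static bool ext4_should_enable_large_folio(struct inode *inode)
	{
		journal_t *journal = EXT4_JOURNAL(inode);

		/*
		 * Writing back a full large folio must never need more
		 * credits than a single handle can get from the journal.
		 */
		if (journal &&
		    ext4_writepage_trans_blocks(inode) >
		    jbd2_max_user_trans_buffers(journal) / 2)
			return false;

		return true;
	}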

> but
> eventually we should rewrite how credits for writing out a folio are
> computed to reduce this massive overestimation. It will be a bit tricky,
> but we could always reserve credits for one or a couple of extents and try
> to extend the transaction if we need more. The tricky part is doing the
> partial folio writeout in case we cannot extend the transaction...
> 

Yes, this is a feasible solution; however, in the long run I would prefer
to push forward the iomap conversion.

Thanks,
Yi.
Re: [PATCH v2 4/8] ext4/jbd2: convert jbd2_journal_blocks_per_page() to support large folio
Posted by Jan Kara 6 months, 4 weeks ago
On Tue 20-05-25 20:46:51, Zhang Yi wrote:
> On 2025/5/20 4:16, Jan Kara wrote:
> > On Mon 12-05-25 14:33:15, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> jbd2_journal_blocks_per_page() returns the number of blocks in a single
> >> page. Rename it to jbd2_journal_blocks_per_folio() and make it return
> >> the number of blocks in the largest supported folio, in preparation for
> >> calculating the journal credits needed when allocating blocks within a
> >> large folio in the writeback path.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> > ...
> >> @@ -2657,9 +2657,10 @@ void jbd2_journal_ack_err(journal_t *journal)
> >>  	write_unlock(&journal->j_state_lock);
> >>  }
> >>  
> >> -int jbd2_journal_blocks_per_page(struct inode *inode)
> >> +int jbd2_journal_blocks_per_folio(struct inode *inode)
> >>  {
> >> -	return 1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits);
> >> +	return 1 << (PAGE_SHIFT + mapping_max_folio_order(inode->i_mapping) -
> >> +		     inode->i_sb->s_blocksize_bits);
> >>  }
> > 
> > FWIW this will result in us reserving some 10k transaction credits for 1k
> > blocksize with maximum 2M folio size. That is going to create serious
> > pressure on the journalling machinery. For now I guess we are fine
> 
> Oooh, indeed, you are right, thanks a lot for pointing this out. As you
> mentioned in patch 5, the credits calculation I proposed was incorrect;
> I thought it wouldn't require too many credits.
> 
> I believe it is risky to mount a filesystem with a small journal and a
> large number of block groups. For example, if we build an image with a
> 1K block size and a 1MB journal on a 20GB disk (which contains 2,540
> groups), it will require 2,263 credits, exceeding the available journal
> space (a 1MB journal with 1K blocks has only 1,024 blocks in total).
> 
> For now, I'm going to disable large folio support on filesystems with
> limited journal space, i.e., when the return value of
> ext4_writepage_trans_blocks() is greater than
> jbd2_max_user_trans_buffers(journal) / 2, make
> ext4_should_enable_large_folio() return false. Thoughts?

Yep, looks like a good stopgap solution for now.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR