[v2] ext4: fix insufficient credits when writing back large folios

[PATCH v2 5/6] ext4/jbd2: reintroduce jbd2_journal_blocks_per_page()

Posted by Zhang Yi 8 months ago

From: Zhang Yi <yi.zhang@huawei.com>

This partially reverts commit d6bf294773a4 ("ext4/jbd2: convert
jbd2_journal_blocks_per_page() to support large folio"). This
jbd2_journal_blocks_per_folio() will lead to a significant
overestimation of journal credits. Since we still reserve credits for
one page and attempt to extend and restart handles during large folio
writebacks, so we should convert this helper back to
ext4_journal_blocks_per_page().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/ext4_jbd2.h  | 7 +++++++
 fs/ext4/inode.c      | 6 +++---
 fs/jbd2/journal.c    | 6 ++++++
 include/linux/jbd2.h | 1 +
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 63d17c5201b5..c0ee756cb34c 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -326,6 +326,13 @@ static inline int ext4_journal_blocks_per_folio(struct inode *inode)
 	return 0;
 }
 
+static inline int ext4_journal_blocks_per_page(struct inode *inode)
+{
+	if (EXT4_JOURNAL(inode) != NULL)
+		return jbd2_journal_blocks_per_page(inode);
+	return 0;
+}
+
 static inline int ext4_journal_force_commit(journal_t *journal)
 {
 	if (journal)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 67e37dd546eb..9835145b1b27 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2556,7 +2556,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
  */
 static int ext4_da_writepages_trans_blocks(struct inode *inode)
 {
-	int bpp = ext4_journal_blocks_per_folio(inode);
+	int bpp = ext4_journal_blocks_per_page(inode);
 
 	return ext4_meta_trans_blocks(inode,
 				MAX_WRITEPAGES_EXTENT_LEN + bpp - 1, bpp);
@@ -2634,7 +2634,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 	ext4_lblk_t lblk;
 	struct buffer_head *head;
 	handle_t *handle = NULL;
-	int bpp = ext4_journal_blocks_per_folio(mpd->inode);
+	int bpp = ext4_journal_blocks_per_page(mpd->inode);
 
 	if (mpd->wbc->sync_mode == WB_SYNC_ALL || mpd->wbc->tagged_writepages)
 		tag = PAGECACHE_TAG_TOWRITE;
@@ -6255,7 +6255,7 @@ int ext4_meta_trans_blocks(struct inode *inode, int lblocks, int pextents)
  */
 int ext4_writepage_trans_blocks(struct inode *inode)
 {
-	int bpp = ext4_journal_blocks_per_folio(inode);
+	int bpp = ext4_journal_blocks_per_page(inode);
 	int ret;
 
 	ret = ext4_meta_trans_blocks(inode, bpp, bpp);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index d480b94117cd..7fccb425907f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -84,6 +84,7 @@ EXPORT_SYMBOL(jbd2_journal_start_commit);
 EXPORT_SYMBOL(jbd2_journal_force_commit_nested);
 EXPORT_SYMBOL(jbd2_journal_wipe);
 EXPORT_SYMBOL(jbd2_journal_blocks_per_folio);
+EXPORT_SYMBOL(jbd2_journal_blocks_per_page);
 EXPORT_SYMBOL(jbd2_journal_invalidate_folio);
 EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
 EXPORT_SYMBOL(jbd2_journal_force_commit);
@@ -2661,6 +2662,11 @@ int jbd2_journal_blocks_per_folio(struct inode *inode)
 		     inode->i_sb->s_blocksize_bits);
 }
 
+int jbd2_journal_blocks_per_page(struct inode *inode)
+{
+	return 1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits);
+}
+
 /*
  * helper functions to deal with 32 or 64bit block numbers.
  */
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 43b9297fe8a7..f35369c104ba 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1724,6 +1724,7 @@ static inline int tid_geq(tid_t x, tid_t y)
 }
 
 extern int jbd2_journal_blocks_per_folio(struct inode *inode);
+extern int jbd2_journal_blocks_per_page(struct inode *inode);
 extern size_t journal_tag_bytes(journal_t *journal);
 
 static inline int jbd2_journal_has_csum_v2or3(journal_t *journal)
-- 
2.46.1

Re: [PATCH v2 5/6] ext4/jbd2: reintroduce jbd2_journal_blocks_per_page()

Posted by Jan Kara 7 months, 3 weeks ago

On Wed 11-06-25 19:16:24, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> This partially reverts commit d6bf294773a4 ("ext4/jbd2: convert
> jbd2_journal_blocks_per_page() to support large folio"). This
> jbd2_journal_blocks_per_folio() will lead to a significant
> overestimation of journal credits. Since we still reserve credits for
> one page and attempt to extend and restart handles during large folio
> writebacks, so we should convert this helper back to
> ext4_journal_blocks_per_page().
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Here I'm not decided. Does it make any particular sense to reserve credits
for one *page* worth of blocks when pages don't have any particular meaning
in our writeback code anymore? We could reserve credits just for one
physical extent and that should be enough. For blocksize == pagesize (most
common configs) this would be actually equivalent. If blocksize < pagesize,
this could force us to do some more writeback retries and thus get somewhat
higher writeback CPU overhead but do we really care for these configs?  It
is well possible I've overlooked something and someone will spot a
performance regression in practical setup with this in which case we'd have
to come up with something more clever but I think it's worth it to start
simple and complicate later.

								Honza

> ---
>  fs/ext4/ext4_jbd2.h  | 7 +++++++
>  fs/ext4/inode.c      | 6 +++---
>  fs/jbd2/journal.c    | 6 ++++++
>  include/linux/jbd2.h | 1 +
>  4 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index 63d17c5201b5..c0ee756cb34c 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -326,6 +326,13 @@ static inline int ext4_journal_blocks_per_folio(struct inode *inode)
>  	return 0;
>  }
>  
> +static inline int ext4_journal_blocks_per_page(struct inode *inode)
> +{
> +	if (EXT4_JOURNAL(inode) != NULL)
> +		return jbd2_journal_blocks_per_page(inode);
> +	return 0;
> +}
> +
>  static inline int ext4_journal_force_commit(journal_t *journal)
>  {
>  	if (journal)
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 67e37dd546eb..9835145b1b27 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2556,7 +2556,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
>   */
>  static int ext4_da_writepages_trans_blocks(struct inode *inode)
>  {
> -	int bpp = ext4_journal_blocks_per_folio(inode);
> +	int bpp = ext4_journal_blocks_per_page(inode);
>  
>  	return ext4_meta_trans_blocks(inode,
>  				MAX_WRITEPAGES_EXTENT_LEN + bpp - 1, bpp);
> @@ -2634,7 +2634,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>  	ext4_lblk_t lblk;
>  	struct buffer_head *head;
>  	handle_t *handle = NULL;
> -	int bpp = ext4_journal_blocks_per_folio(mpd->inode);
> +	int bpp = ext4_journal_blocks_per_page(mpd->inode);
>  
>  	if (mpd->wbc->sync_mode == WB_SYNC_ALL || mpd->wbc->tagged_writepages)
>  		tag = PAGECACHE_TAG_TOWRITE;
> @@ -6255,7 +6255,7 @@ int ext4_meta_trans_blocks(struct inode *inode, int lblocks, int pextents)
>   */
>  int ext4_writepage_trans_blocks(struct inode *inode)
>  {
> -	int bpp = ext4_journal_blocks_per_folio(inode);
> +	int bpp = ext4_journal_blocks_per_page(inode);
>  	int ret;
>  
>  	ret = ext4_meta_trans_blocks(inode, bpp, bpp);
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index d480b94117cd..7fccb425907f 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -84,6 +84,7 @@ EXPORT_SYMBOL(jbd2_journal_start_commit);
>  EXPORT_SYMBOL(jbd2_journal_force_commit_nested);
>  EXPORT_SYMBOL(jbd2_journal_wipe);
>  EXPORT_SYMBOL(jbd2_journal_blocks_per_folio);
> +EXPORT_SYMBOL(jbd2_journal_blocks_per_page);
>  EXPORT_SYMBOL(jbd2_journal_invalidate_folio);
>  EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
>  EXPORT_SYMBOL(jbd2_journal_force_commit);
> @@ -2661,6 +2662,11 @@ int jbd2_journal_blocks_per_folio(struct inode *inode)
>  		     inode->i_sb->s_blocksize_bits);
>  }
>  
> +int jbd2_journal_blocks_per_page(struct inode *inode)
> +{
> +	return 1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits);
> +}
> +
>  /*
>   * helper functions to deal with 32 or 64bit block numbers.
>   */
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 43b9297fe8a7..f35369c104ba 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1724,6 +1724,7 @@ static inline int tid_geq(tid_t x, tid_t y)
>  }
>  
>  extern int jbd2_journal_blocks_per_folio(struct inode *inode);
> +extern int jbd2_journal_blocks_per_page(struct inode *inode);
>  extern size_t journal_tag_bytes(journal_t *journal);
>  
>  static inline int jbd2_journal_has_csum_v2or3(journal_t *journal)
> -- 
> 2.46.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Re: [PATCH v2 5/6] ext4/jbd2: reintroduce jbd2_journal_blocks_per_page()

Posted by Zhang Yi 7 months, 3 weeks ago

On 2025/6/20 0:44, Jan Kara wrote:
> On Wed 11-06-25 19:16:24, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> This partially reverts commit d6bf294773a4 ("ext4/jbd2: convert
>> jbd2_journal_blocks_per_page() to support large folio"). This
>> jbd2_journal_blocks_per_folio() will lead to a significant
>> overestimation of journal credits. Since we still reserve credits for
>> one page and attempt to extend and restart handles during large folio
>> writebacks, so we should convert this helper back to
>> ext4_journal_blocks_per_page().
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> Here I'm not decided. Does it make any particular sense to reserve credits
> for one *page* worth of blocks when pages don't have any particular meaning
> in our writeback code anymore? We could reserve credits just for one
> physical extent and that should be enough.

Indeed, reserving credits for a single page is no longer suitable in the
currently folio based context. It do seems more appropriate to allocate
credits for a single extent.

> For blocksize == pagesize (most
> common configs) this would be actually equivalent. If blocksize < pagesize,
> this could force us to do some more writeback retries and thus get somewhat
> higher writeback CPU overhead but do we really care for these configs?  It
> is well possible I've overlooked something and someone will spot a
> performance regression in practical setup with this in which case we'd have
> to come up with something more clever but I think it's worth it to start
> simple and complicate later.

This can indeed be a problem if the file system is already fragmented
enough. However, thanks to the credits extension logic in
__ext4_journal_ensure_credits(), I suppose that on most file system images,
it will not trigger excessive retry operations. Besides, although there
might be some lock cost in jbd2_journal_extend(), I suppose it won't be a
big deal.

Perhaps we could reserve more credits through a complex formula at the
outset, which would lower the cost of expanding the credits. But I don't
think this will help much in reducing the number of retries, it may only
be helpful in extreme cases (the running transaction stats to commit, we
cannot extend it).

So, I think we can implement it by reserving for an extent for the time
being. Do you agree?

Thanks,
Yi.

> 
> 
>> ---
>>  fs/ext4/ext4_jbd2.h  | 7 +++++++
>>  fs/ext4/inode.c      | 6 +++---
>>  fs/jbd2/journal.c    | 6 ++++++
>>  include/linux/jbd2.h | 1 +
>>  4 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
>> index 63d17c5201b5..c0ee756cb34c 100644
>> --- a/fs/ext4/ext4_jbd2.h
>> +++ b/fs/ext4/ext4_jbd2.h
>> @@ -326,6 +326,13 @@ static inline int ext4_journal_blocks_per_folio(struct inode *inode)
>>  	return 0;
>>  }
>>  
>> +static inline int ext4_journal_blocks_per_page(struct inode *inode)
>> +{
>> +	if (EXT4_JOURNAL(inode) != NULL)
>> +		return jbd2_journal_blocks_per_page(inode);
>> +	return 0;
>> +}
>> +
>>  static inline int ext4_journal_force_commit(journal_t *journal)
>>  {
>>  	if (journal)
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 67e37dd546eb..9835145b1b27 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -2556,7 +2556,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
>>   */
>>  static int ext4_da_writepages_trans_blocks(struct inode *inode)
>>  {
>> -	int bpp = ext4_journal_blocks_per_folio(inode);
>> +	int bpp = ext4_journal_blocks_per_page(inode);
>>  
>>  	return ext4_meta_trans_blocks(inode,
>>  				MAX_WRITEPAGES_EXTENT_LEN + bpp - 1, bpp);
>> @@ -2634,7 +2634,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>>  	ext4_lblk_t lblk;
>>  	struct buffer_head *head;
>>  	handle_t *handle = NULL;
>> -	int bpp = ext4_journal_blocks_per_folio(mpd->inode);
>> +	int bpp = ext4_journal_blocks_per_page(mpd->inode);
>>  
>>  	if (mpd->wbc->sync_mode == WB_SYNC_ALL || mpd->wbc->tagged_writepages)
>>  		tag = PAGECACHE_TAG_TOWRITE;
>> @@ -6255,7 +6255,7 @@ int ext4_meta_trans_blocks(struct inode *inode, int lblocks, int pextents)
>>   */
>>  int ext4_writepage_trans_blocks(struct inode *inode)
>>  {
>> -	int bpp = ext4_journal_blocks_per_folio(inode);
>> +	int bpp = ext4_journal_blocks_per_page(inode);
>>  	int ret;
>>  
>>  	ret = ext4_meta_trans_blocks(inode, bpp, bpp);
>> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
>> index d480b94117cd..7fccb425907f 100644
>> --- a/fs/jbd2/journal.c
>> +++ b/fs/jbd2/journal.c
>> @@ -84,6 +84,7 @@ EXPORT_SYMBOL(jbd2_journal_start_commit);
>>  EXPORT_SYMBOL(jbd2_journal_force_commit_nested);
>>  EXPORT_SYMBOL(jbd2_journal_wipe);
>>  EXPORT_SYMBOL(jbd2_journal_blocks_per_folio);
>> +EXPORT_SYMBOL(jbd2_journal_blocks_per_page);
>>  EXPORT_SYMBOL(jbd2_journal_invalidate_folio);
>>  EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
>>  EXPORT_SYMBOL(jbd2_journal_force_commit);
>> @@ -2661,6 +2662,11 @@ int jbd2_journal_blocks_per_folio(struct inode *inode)
>>  		     inode->i_sb->s_blocksize_bits);
>>  }
>>  
>> +int jbd2_journal_blocks_per_page(struct inode *inode)
>> +{
>> +	return 1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits);
>> +}
>> +
>>  /*
>>   * helper functions to deal with 32 or 64bit block numbers.
>>   */
>> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
>> index 43b9297fe8a7..f35369c104ba 100644
>> --- a/include/linux/jbd2.h
>> +++ b/include/linux/jbd2.h
>> @@ -1724,6 +1724,7 @@ static inline int tid_geq(tid_t x, tid_t y)
>>  }
>>  
>>  extern int jbd2_journal_blocks_per_folio(struct inode *inode);
>> +extern int jbd2_journal_blocks_per_page(struct inode *inode);
>>  extern size_t journal_tag_bytes(journal_t *journal);
>>  
>>  static inline int jbd2_journal_has_csum_v2or3(journal_t *journal)
>> -- 
>> 2.46.1
>>

Re: [PATCH v2 5/6] ext4/jbd2: reintroduce jbd2_journal_blocks_per_page()

Posted by Jan Kara 7 months, 3 weeks ago

On Sat 21-06-25 15:46:40, Zhang Yi wrote:
> On 2025/6/20 0:44, Jan Kara wrote:
> > On Wed 11-06-25 19:16:24, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> This partially reverts commit d6bf294773a4 ("ext4/jbd2: convert
> >> jbd2_journal_blocks_per_page() to support large folio"). This
> >> jbd2_journal_blocks_per_folio() will lead to a significant
> >> overestimation of journal credits. Since we still reserve credits for
> >> one page and attempt to extend and restart handles during large folio
> >> writebacks, so we should convert this helper back to
> >> ext4_journal_blocks_per_page().
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> > 
> > Here I'm not decided. Does it make any particular sense to reserve credits
> > for one *page* worth of blocks when pages don't have any particular meaning
> > in our writeback code anymore? We could reserve credits just for one
> > physical extent and that should be enough.
> 
> Indeed, reserving credits for a single page is no longer suitable in the
> currently folio based context. It do seems more appropriate to allocate
> credits for a single extent.
> 
> > For blocksize == pagesize (most
> > common configs) this would be actually equivalent. If blocksize < pagesize,
> > this could force us to do some more writeback retries and thus get somewhat
> > higher writeback CPU overhead but do we really care for these configs?  It
> > is well possible I've overlooked something and someone will spot a
> > performance regression in practical setup with this in which case we'd have
> > to come up with something more clever but I think it's worth it to start
> > simple and complicate later.
> 
> This can indeed be a problem if the file system is already fragmented
> enough. However, thanks to the credits extension logic in
> __ext4_journal_ensure_credits(), I suppose that on most file system images,
> it will not trigger excessive retry operations. Besides, although there
> might be some lock cost in jbd2_journal_extend(), I suppose it won't be a
> big deal.
> 
> Perhaps we could reserve more credits through a complex formula at the
> outset, which would lower the cost of expanding the credits. But I don't
> think this will help much in reducing the number of retries, it may only
> be helpful in extreme cases (the running transaction stats to commit, we
> cannot extend it).
> 
> So, I think we can implement it by reserving for an extent for the time
> being. Do you agree?

Yes, I agree. After all this is easy to change if we find some real issues
with it.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[PATCH v2 1/6] ext4: move the calculation of wbc->nr_to_write to mpage_folio_done()
[PATCH v2 2/6] ext4: fix stale data if it bail out of the extents mapping loop
[PATCH v2 3/6] ext4: restart handle if credits are insufficient during allocating blocks
[PATCH v2 4/6] ext4: correct the reserved credits for extent conversion
[PATCH v2 5/6] ext4/jbd2: reintroduce jbd2_journal_blocks_per_page()
[PATCH v2 6/6] ext4: fix insufficient credits calculation in ext4_meta_trans_blocks()