From: Zhang Yi <yi.zhang@huawei.com>
After large folios are supported on ext4, writing back a sufficiently
large and discontinuous folio may consume a significant number of
journal credits, placing considerable strain on the journal. For
example, in a 20GB filesystem with 1K block size and 1MB journal size,
writing back a 2MB folio could require thousands of credits in the
worst-case scenario (when each block is discontinuous and distributed
across different block groups), potentially exceeding the journal size.
This issue can also occur in ext4_write_begin() and ext4_page_mkwrite()
when delalloc is not enabled.
Fix this by ensuring that there are sufficient journal credits before
allocating an extent in mpage_map_one_extent() and
ext4_block_write_begin(). If there are not enough credits, return
-EAGAIN, exit the current mapping loop, start a new handle and a new
transaction, and allocate blocks for this folio again in the next
iteration.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/ext4/inode.c | 40 +++++++++++++++++++++++++++++++++++-----
1 file changed, 35 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 31731a732df2..efe778aaf74b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -877,6 +877,25 @@ static void ext4_update_bh_state(struct buffer_head *bh, unsigned long flags)
} while (unlikely(!try_cmpxchg(&bh->b_state, &old_state, new_state)));
}
+/*
+ * Make sure that the current journal transaction has enough credits to map
+ * one extent. Return -EAGAIN if it cannot extend the current running
+ * transaction.
+ */
+static inline int ext4_journal_ensure_extent_credits(handle_t *handle,
+ struct inode *inode)
+{
+ int credits;
+ int ret;
+
+ if (!handle)
+ return 0;
+
+ credits = ext4_chunk_trans_blocks(inode, 1);
+ ret = __ext4_journal_ensure_credits(handle, credits, credits, 0);
+ return ret <= 0 ? ret : -EAGAIN;
+}
+
static int _ext4_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int flags)
{
@@ -1175,7 +1194,9 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
clear_buffer_new(bh);
if (!buffer_mapped(bh)) {
WARN_ON(bh->b_size != blocksize);
- err = get_block(inode, block, bh, 1);
+ err = ext4_journal_ensure_extent_credits(handle, inode);
+ if (!err)
+ err = get_block(inode, block, bh, 1);
if (err)
break;
if (buffer_new(bh)) {
@@ -1374,8 +1395,9 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
ext4_orphan_del(NULL, inode);
}
- if (ret == -ENOSPC &&
- ext4_should_retry_alloc(inode->i_sb, &retries))
+ if (ret == -EAGAIN ||
+ (ret == -ENOSPC &&
+ ext4_should_retry_alloc(inode->i_sb, &retries)))
goto retry_journal;
folio_put(folio);
return ret;
@@ -2323,6 +2345,11 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
int get_blocks_flags;
int err, dioread_nolock;
+ /* Make sure transaction has enough credits for this extent */
+ err = ext4_journal_ensure_extent_credits(handle, inode);
+ if (err < 0)
+ return err;
+
trace_ext4_da_write_pages_extent(inode, map);
/*
* Call ext4_map_blocks() to allocate any delayed allocation blocks, or
@@ -2450,7 +2477,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
* In the case of ENOSPC, if ext4_count_free_blocks()
* is non-zero, a commit should free up blocks.
*/
- if ((err == -ENOMEM) ||
+ if ((err == -ENOMEM) || (err == -EAGAIN) ||
(err == -ENOSPC && ext4_count_free_clusters(sb))) {
/*
* We may have already allocated extents for
@@ -2956,6 +2983,8 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
ret = 0;
continue;
}
+ if (ret == -EAGAIN)
+ ret = 0;
/* Fatal error - ENOMEM, EIO... */
if (ret)
break;
@@ -6734,7 +6763,8 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
retry_alloc:
/* Start jorunal and allocate blocks */
err = ext4_block_page_mkwrite(inode, folio, get_block);
- if (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
+ if (err == -EAGAIN ||
+ (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)))
goto retry_alloc;
out_ret:
ret = vmf_fs_error(err);
--
2.46.1
On Tue 01-07-25 21:06:30, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> After large folios are supported on ext4, writing back a sufficiently
> large and discontinuous folio may consume a significant number of
> journal credits, placing considerable strain on the journal. For
> example, in a 20GB filesystem with 1K block size and 1MB journal size,
> writing back a 2MB folio could require thousands of credits in the
> worst-case scenario (when each block is discontinuous and distributed
> across different block groups), potentially exceeding the journal size.
> This issue can also occur in ext4_write_begin() and ext4_page_mkwrite()
> when delalloc is not enabled.
>
> Fix this by ensuring that there are sufficient journal credits before
> allocating an extent in mpage_map_one_extent() and
> ext4_block_write_begin(). If there are not enough credits, return
> -EAGAIN, exit the current mapping loop, start a new handle and a new
> transaction, and allocate blocks for this folio again in the next
> iteration.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Very nice. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
One small comment below:
> +/*
> + * Make sure that the current journal transaction has enough credits to map
> + * one extent. Return -EAGAIN if it cannot extend the current running
> + * transaction.
> + */
> +static inline int ext4_journal_ensure_extent_credits(handle_t *handle,
> + struct inode *inode)
> +{
> + int credits;
> + int ret;
> +
> + if (!handle)
Shouldn't this rather be ext4_handle_valid(handle) to catch nojournal mode
properly?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On 2025/7/2 22:18, Jan Kara wrote:
> On Tue 01-07-25 21:06:30, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> [...]
>
> Very nice. Feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> One small comment below:
>
>> +/*
>> + * Make sure that the current journal transaction has enough credits to map
>> + * one extent. Return -EAGAIN if it cannot extend the current running
>> + * transaction.
>> + */
>> +static inline int ext4_journal_ensure_extent_credits(handle_t *handle,
>> + struct inode *inode)
>> +{
>> + int credits;
>> + int ret;
>> +
>> + if (!handle)
>
> Shouldn't this rather be ext4_handle_valid(handle) to catch nojournal mode
> properly?
>
__ext4_journal_ensure_credits() already calls ext4_handle_valid() to handle
nojournal mode, and the '!handle' check here is to handle the case where
ext4_block_write_begin() passes in a NULL 'handle'.
Thanks,
Yi.
On Thu 03-07-25 10:13:07, Zhang Yi wrote:
> On 2025/7/2 22:18, Jan Kara wrote:
> > On Tue 01-07-25 21:06:30, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> [...]
> >> + if (!handle)
> >
> > Shouldn't this rather be ext4_handle_valid(handle) to catch nojournal mode
> > properly?
> >
> __ext4_journal_ensure_credits() already calls ext4_handle_valid() to handle
> nojournal mode, and the '!handle' check here is to handle the case where
> ext4_block_write_begin() passes in a NULL 'handle'.
Ah, right. But then you don't need the test at all, do you? Anyway,
whatever you decide to do with this (or nothing) is fine by me.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On 2025/7/4 0:27, Jan Kara wrote:
> On Thu 03-07-25 10:13:07, Zhang Yi wrote:
>> On 2025/7/2 22:18, Jan Kara wrote:
>>> On Tue 01-07-25 21:06:30, Zhang Yi wrote:
>>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>>
>>>> [...]
>>>> + if (!handle)
>>>
>>> Shouldn't this rather be ext4_handle_valid(handle) to catch nojournal mode
>>> properly?
>>>
>> __ext4_journal_ensure_credits() already calls ext4_handle_valid() to handle
>> nojournal mode, and the '!handle' check here is to handle the case where
>> ext4_block_write_begin() passes in a NULL 'handle'.
>
> Ah, right. But then you don't need the test at all, do you? Anyway,
> whatever you decide to do with this (or nothing) is fine by me.
>
Yeah, removing this test is fine with me. I added it because the comment
in ext4_handle_valid() says "Do not use this for NULL handles."
I think it is best to follow this rule. :)
Best regards,
Yi.
On Fri 04-07-25 09:40:23, Zhang Yi wrote:
> On 2025/7/4 0:27, Jan Kara wrote:
> > On Thu 03-07-25 10:13:07, Zhang Yi wrote:
> >> On 2025/7/2 22:18, Jan Kara wrote:
> >>> On Tue 01-07-25 21:06:30, Zhang Yi wrote:
> >>>> From: Zhang Yi <yi.zhang@huawei.com>
> >>>>
> >>>> [...]
> >>>> + if (!handle)
> >>>
> >>> Shouldn't this rather be ext4_handle_valid(handle) to catch nojournal mode
> >>> properly?
> >>>
> >> __ext4_journal_ensure_credits() already calls ext4_handle_valid() to handle
> >> nojournal mode, and the '!handle' check here is to handle the case where
> >> ext4_block_write_begin() passes in a NULL 'handle'.
> >
> > Ah, right. But then you don't need the test at all, do you? Anyway,
> > whatever you decide to do with this (or nothing) is fine by me.
> >
>
> Yeah, removing this test is fine with me. I added it because the comment
> in ext4_handle_valid() says "Do not use this for NULL handles."
> I think it is best to follow this rule. :)
Right, I didn't read that comment :) So maybe the best fix will be just
adding a comment before the test like:
/* Called from ext4_da_write_begin() which has no handle started? */
if (!handle)
return 0;
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On 2025/7/4 16:18, Jan Kara wrote:
> On Fri 04-07-25 09:40:23, Zhang Yi wrote:
>> On 2025/7/4 0:27, Jan Kara wrote:
>>> On Thu 03-07-25 10:13:07, Zhang Yi wrote:
>>>> On 2025/7/2 22:18, Jan Kara wrote:
>>>>> On Tue 01-07-25 21:06:30, Zhang Yi wrote:
>>>>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>>>>
>>>>>> [...]
>>>>>> + if (!handle)
>>>>>
>>>>> Shouldn't this rather be ext4_handle_valid(handle) to catch nojournal mode
>>>>> properly?
>>>>>
>>>> __ext4_journal_ensure_credits() already calls ext4_handle_valid() to handle
>>>> nojournal mode, and the '!handle' check here is to handle the case where
>>>> ext4_block_write_begin() passes in a NULL 'handle'.
>>>
>>> Ah, right. But then you don't need the test at all, do you? Anyway,
>>> whatever you decide to do with this (or nothing) is fine by me.
>>>
>>
>> Yeah, removing this test is fine with me. I added it because the comment
>> in ext4_handle_valid() says "Do not use this for NULL handles."
>> I think it is best to follow this rule. :)
>
> Right, I didn't read that comment :) So maybe the best fix will be just
> adding a comment before the test like:
>
> /* Called from ext4_da_write_begin() which has no handle started? */
> if (!handle)
> return 0;
>
Sure, it looks good, will do.
Thanks,
Yi.