From: Zhang Yi <yi.zhang@huawei.com>
After large folios are supported on ext4, writing back a sufficiently
large and discontinuous folio may consume a significant number of
journal credits, placing considerable strain on the journal. For
example, in a 20GB filesystem with 1K block size and 1MB journal size,
writing back a 2MB folio could require thousands of credits in the
worst-case scenario (when each block is discontinuous and distributed
across different block groups), potentially exceeding the journal size.
This issue can also occur in ext4_write_begin() and ext4_page_mkwrite()
when delalloc is not enabled.
Fix this by ensuring that there are sufficient journal credits before
allocating an extent in mpage_map_one_extent() and
ext4_block_write_begin(). If there are not enough credits, return
-EAGAIN, exit the current mapping loop, start a new handle and a new
transaction, and allocate blocks for this folio again in the next
iteration.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/ext4/inode.c | 40 +++++++++++++++++++++++++++++++++++-----
1 file changed, 35 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 31731a732df2..efe778aaf74b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -877,6 +877,25 @@ static void ext4_update_bh_state(struct buffer_head *bh, unsigned long flags)
} while (unlikely(!try_cmpxchg(&bh->b_state, &old_state, new_state)));
}
+/*
+ * Make sure that the current journal transaction has enough credits to map
+ * one extent. Return -EAGAIN if it cannot extend the current running
+ * transaction.
+ */
+static inline int ext4_journal_ensure_extent_credits(handle_t *handle,
+ struct inode *inode)
+{
+ int credits;
+ int ret;
+
+ if (!handle)
+ return 0;
+
+ credits = ext4_chunk_trans_blocks(inode, 1);
+ ret = __ext4_journal_ensure_credits(handle, credits, credits, 0);
+ return ret <= 0 ? ret : -EAGAIN;
+}
+
static int _ext4_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int flags)
{
@@ -1175,7 +1194,9 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
clear_buffer_new(bh);
if (!buffer_mapped(bh)) {
WARN_ON(bh->b_size != blocksize);
- err = get_block(inode, block, bh, 1);
+ err = ext4_journal_ensure_extent_credits(handle, inode);
+ if (!err)
+ err = get_block(inode, block, bh, 1);
if (err)
break;
if (buffer_new(bh)) {
@@ -1374,8 +1395,9 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
ext4_orphan_del(NULL, inode);
}
- if (ret == -ENOSPC &&
- ext4_should_retry_alloc(inode->i_sb, &retries))
+ if (ret == -EAGAIN ||
+ (ret == -ENOSPC &&
+ ext4_should_retry_alloc(inode->i_sb, &retries)))
goto retry_journal;
folio_put(folio);
return ret;
@@ -2323,6 +2345,11 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
int get_blocks_flags;
int err, dioread_nolock;
+ /* Make sure transaction has enough credits for this extent */
+ err = ext4_journal_ensure_extent_credits(handle, inode);
+ if (err < 0)
+ return err;
+
trace_ext4_da_write_pages_extent(inode, map);
/*
* Call ext4_map_blocks() to allocate any delayed allocation blocks, or
@@ -2450,7 +2477,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
* In the case of ENOSPC, if ext4_count_free_blocks()
* is non-zero, a commit should free up blocks.
*/
- if ((err == -ENOMEM) ||
+ if ((err == -ENOMEM) || (err == -EAGAIN) ||
(err == -ENOSPC && ext4_count_free_clusters(sb))) {
/*
* We may have already allocated extents for
@@ -2956,6 +2983,8 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
ret = 0;
continue;
}
+ if (ret == -EAGAIN)
+ ret = 0;
/* Fatal error - ENOMEM, EIO... */
if (ret)
break;
@@ -6734,7 +6763,8 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
retry_alloc:
/* Start jorunal and allocate blocks */
err = ext4_block_page_mkwrite(inode, folio, get_block);
- if (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
+ if (err == -EAGAIN ||
+ (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)))
goto retry_alloc;
out_ret:
ret = vmf_fs_error(err);
--
2.46.1
On Tue 01-07-25 21:06:30, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> After large folios are supported on ext4, writing back a sufficiently
> large and discontinuous folio may consume a significant number of
> journal credits, placing considerable strain on the journal. For
> example, in a 20GB filesystem with 1K block size and 1MB journal size,
> writing back a 2MB folio could require thousands of credits in the
> worst-case scenario (when each block is discontinuous and distributed
> across different block groups), potentially exceeding the journal size.
> This issue can also occur in ext4_write_begin() and ext4_page_mkwrite()
> when delalloc is not enabled.
>
> Fix this by ensuring that there are sufficient journal credits before
> allocating an extent in mpage_map_one_extent() and
> ext4_block_write_begin(). If there are not enough credits, return
> -EAGAIN, exit the current mapping loop, start a new handle and a new
> transaction, and allocate blocks for this folio again in the next
> iteration.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Very nice. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
One small comment below:
> +/*
> + * Make sure that the current journal transaction has enough credits to map
> + * one extent. Return -EAGAIN if it cannot extend the current running
> + * transaction.
> + */
> +static inline int ext4_journal_ensure_extent_credits(handle_t *handle,
> + struct inode *inode)
> +{
> + int credits;
> + int ret;
> +
> + if (!handle)
Shouldn't this rather be ext4_handle_valid(handle) to catch nojournal mode
properly?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On 2025/7/2 22:18, Jan Kara wrote:
> On Tue 01-07-25 21:06:30, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> [...]
>
> Very nice. Feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> One small comment below:
>
>> +/*
>> + * Make sure that the current journal transaction has enough credits to map
>> + * one extent. Return -EAGAIN if it cannot extend the current running
>> + * transaction.
>> + */
>> +static inline int ext4_journal_ensure_extent_credits(handle_t *handle,
>> + struct inode *inode)
>> +{
>> + int credits;
>> + int ret;
>> +
>> + if (!handle)
>
> Shouldn't this rather be ext4_handle_valid(handle) to catch nojournal mode
> properly?
>
__ext4_journal_ensure_credits() already calls ext4_handle_valid() to handle
nojournal mode, and the '!handle' check here is to handle the case where
ext4_block_write_begin() passes in a NULL 'handle'.
Thanks,
Yi.
On Thu 03-07-25 10:13:07, Zhang Yi wrote:
> On 2025/7/2 22:18, Jan Kara wrote:
> > On Tue 01-07-25 21:06:30, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> [...]
> >> + if (!handle)
> >
> > Shouldn't this rather be ext4_handle_valid(handle) to catch nojournal mode
> > properly?
> >
> __ext4_journal_ensure_credits() already calls ext4_handle_valid() to handle
> nojournal mode, and the '!handle' check here is to handle the case where
> ext4_block_write_begin() passes in a NULL 'handle'.
Ah, right. But then you don't need the test at all, do you? Anyway,
whatever you decide to do with this (or nothing) is fine by me.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On 2025/7/4 0:27, Jan Kara wrote:
> On Thu 03-07-25 10:13:07, Zhang Yi wrote:
>> On 2025/7/2 22:18, Jan Kara wrote:
>>> On Tue 01-07-25 21:06:30, Zhang Yi wrote:
>>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>>
>>>> [...]
>>>> + if (!handle)
>>>
>>> Shouldn't this rather be ext4_handle_valid(handle) to catch nojournal mode
>>> properly?
>>>
>> __ext4_journal_ensure_credits() already calls ext4_handle_valid() to handle
>> nojournal mode, and the '!handle' check here is to handle the case where
>> ext4_block_write_begin() passes in a NULL 'handle'.
>
> Ah, right. But then you don't need the test at all, do you? Anyway,
> whatever you decide to do with this (or nothing) is fine by me.
>
Yeah, removing this test is fine with me. I added it because the comment
in ext4_handle_valid() says "Do not use this for NULL handles."
I think it is best to follow this rule. :)
Best regards,
Yi.
On Fri 04-07-25 09:40:23, Zhang Yi wrote:
> On 2025/7/4 0:27, Jan Kara wrote:
> > On Thu 03-07-25 10:13:07, Zhang Yi wrote:
> >> On 2025/7/2 22:18, Jan Kara wrote:
> >>> On Tue 01-07-25 21:06:30, Zhang Yi wrote:
> >>>> From: Zhang Yi <yi.zhang@huawei.com>
> >>>>
> >>>> [...]
> >>>> + if (!handle)
> >>>
> >>> Shouldn't this rather be ext4_handle_valid(handle) to catch nojournal mode
> >>> properly?
> >>>
> >> __ext4_journal_ensure_credits() already calls ext4_handle_valid() to handle
> >> nojournal mode, and the '!handle' check here is to handle the case where
> >> ext4_block_write_begin() passes in a NULL 'handle'.
> >
> > Ah, right. But then you don't need the test at all, do you? Anyway,
> > whatever you decide to do with this (or nothing) is fine by me.
> >
>
> Yeah, removing this test is fine with me. I added it because the comment
> in ext4_handle_valid() says "Do not use this for NULL handles."
> I think it is best to follow this rule. :)
Right, I didn't read that comment :) So maybe the best fix will be just
adding a comment before the test like:
/* Called from ext4_da_write_begin() which has no handle started? */
if (!handle)
return 0;
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On 2025/7/4 16:18, Jan Kara wrote:
> On Fri 04-07-25 09:40:23, Zhang Yi wrote:
>> On 2025/7/4 0:27, Jan Kara wrote:
>>> On Thu 03-07-25 10:13:07, Zhang Yi wrote:
>>>> On 2025/7/2 22:18, Jan Kara wrote:
>>>>> On Tue 01-07-25 21:06:30, Zhang Yi wrote:
>>>>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>>>>
>>>>>> [...]
>>>>>> + if (!handle)
>>>>>
>>>>> Shouldn't this rather be ext4_handle_valid(handle) to catch nojournal mode
>>>>> properly?
>>>>>
>>>> __ext4_journal_ensure_credits() already calls ext4_handle_valid() to handle
>>>> nojournal mode, and the '!handle' check here is to handle the case where
>>>> ext4_block_write_begin() passes in a NULL 'handle'.
>>>
>>> Ah, right. But then you don't need the test at all, do you? Anyway,
>>> whatever you decide to do with this (or nothing) is fine by me.
>>>
>>
>> Yeah, removing this test is fine with me. I added it because the comment
>> in ext4_handle_valid() says "Do not use this for NULL handles."
>> I think it is best to follow this rule. :)
>
> Right, I didn't read that comment :) So maybe the best fix will be just
> adding a comment before the test like:
>
> /* Called from ext4_da_write_begin() which has no handle started? */
> if (!handle)
> return 0;
>
Sure, it looks good, will do.
Thanks,
Yi.