fs/ext4/inode.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
From: Baolin Liu <liubaolin@kylinos.cn>
Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate
buffers in ext4_journalled_zero_new_buffers"), a new assertion failure
occurred under a old kernel(ext3, data=journal, pagesize=64k) with
corresponding ported patches:
================================================================
Call trace:
__ext4_handle_dirty_metadata+0x320/0x7e8
write_end_fn+0x78/0x178
ext4_journalled_zero_new_buffers+0xd0/0x2c8
ext4_block_write_begin+0x850/0xc00
ext4_write_begin+0x334/0xc68
generic_perform_write+0x1a4/0x380
ext4_buffered_write_iter+0x180/0x370
ext4_file_write_iter+0x194/0xfc0
new_sync_write+0x338/0x4b8
__vfs_write+0xc4/0xe8
vfs_write+0x12c/0x3d0
ksys_write+0xf4/0x230
sys_write+0x34/0x48
el0_svc_naked+0x44/0x48
================================================================
which was caused by bh dirting without calling
do_journal_get_write_access().
In the loop for all bhs of a page in ext4_block_write_begin(),
when a err occurred, it will jump out of loop.
But that will leaves some bhs being processed and some not,
which will lead to the asserion failure in calling write_end_fn().
To fixed that, get write access for the rest unprocessed bhs, just
as what write_end_fn do.
Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers")
Reported-and-tested-by: Zhi Long <longzhi@sangfor.com.cn>
Suggested-by: Shida Zhang <zhangshida@kylinos.cn>
Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
---
fs/ext4/inode.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 54bdd4884fe6..a72f951288e4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
err = -EIO;
}
if (unlikely(err)) {
- if (should_journal_data)
+ if (should_journal_data) {
+ if (bh != head || !block_start) {
+ do {
+ block_end = block_start + bh->b_size;
+
+ if (buffer_new(bh))
+ if (block_end > from && block_start < to)
+ do_journal_get_write_access(handle,
+ inode, bh);
+
+ block_start = block_end;
+ bh = bh->b_this_page;
+ } while (bh != head);
+ }
+
ext4_journalled_zero_new_buffers(handle, inode, folio,
from, to);
+ }
else
folio_zero_new_buffers(folio, from, to);
} else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
--
2.39.2
On Thu 10-10-24 10:58:55, Baolin Liu wrote:
> From: Baolin Liu <liubaolin@kylinos.cn>
>
> Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate
> buffers in ext4_journalled_zero_new_buffers"), a new assertion failure
> occurred under a old kernel(ext3, data=journal, pagesize=64k) with
> corresponding ported patches:
...
> which was caused by bh dirting without calling
> do_journal_get_write_access().
>
> In the loop for all bhs of a page in ext4_block_write_begin(),
> when a err occurred, it will jump out of loop.
> But that will leaves some bhs being processed and some not,
> which will lead to the asserion failure in calling write_end_fn().
Thanks for the patch but I don't understand one thing here: For
ext4_journalled_zero_new_buffers() to call write_end_fn() the buffer must
have buffer_new flag set. That flag can get set only by ext4_get_block()
function when it succeeds in which case we also call
do_journal_get_write_access(). So how is it possible that buffer_new was
set on a buffer on which we didn't call do_journal_get_write_access()? This
indicates there may be some deeper problem hidden. How exactly did you
trigger this problem?
Honza
>
> To fixed that, get write access for the rest unprocessed bhs, just
> as what write_end_fn do.
>
> Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers")
> Reported-and-tested-by: Zhi Long <longzhi@sangfor.com.cn>
> Suggested-by: Shida Zhang <zhangshida@kylinos.cn>
> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
> ---
> fs/ext4/inode.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 54bdd4884fe6..a72f951288e4 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
> err = -EIO;
> }
> if (unlikely(err)) {
> - if (should_journal_data)
> + if (should_journal_data) {
> + if (bh != head || !block_start) {
> + do {
> + block_end = block_start + bh->b_size;
> +
> + if (buffer_new(bh))
> + if (block_end > from && block_start < to)
> + do_journal_get_write_access(handle,
> + inode, bh);
> +
> + block_start = block_end;
> + bh = bh->b_this_page;
> + } while (bh != head);
> + }
> +
> ext4_journalled_zero_new_buffers(handle, inode, folio,
> from, to);
> + }
> else
> folio_zero_new_buffers(folio, from, to);
> } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
> Greetings,
> This problem is reproduced by our customer using their own testing tool “run_bug”.
> When I consulted with a client, the testing tool “run_bug” used a variety of background programs to benchmark
> (including memory pressure, cpu pressure, file cycle manipulation, fsstress Stress testing tool, postmark program,and so on).
> The recurrence probability is relatively low.
>
> In response to your query, in ext4_block_write_begin, the new state will be clear before get block,
> and the bh that failed get_block will not be set to new.
> However, when the page size is greater than the block size, a page will contain multiple bh.
> bh->b_this_page is a circular list for managing all bh on the same page.
> After get_block jumps out of the for loop, then bh->b_this_page is not processed by clear new in the for loop.
> So after calling ext4_journalled_zero_new_buffers,
> The ext4_journalled_zero_new_buffers function will determine all bh of the same page and call write_end_fn if they are in new state,
> get_block returns err's bh->b_this_page and circular list other unhandled bh if the state was previously set to new.
> Because bh not get access, the corresponding transaction is not placed in jh->b_transaction, resulting in a crash.
>
> Therefore, the patch processing method I submit is to make unprocessed bh determines if it is in new state and get access.
> There is another way to handle the remaining bh clear_buffer_new that is not processed.
> I personally recommend get access this way, the impact is small.
> Please guide the two processing methods, which one do you recommend?
在 2024/10/10 17:29, Jan Kara 写道:
> On Thu 10-10-24 10:58:55, Baolin Liu wrote:
>> From: Baolin Liu <liubaolin@kylinos.cn>
>>
>> Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate
>> buffers in ext4_journalled_zero_new_buffers"), a new assertion failure
>> occurred under a old kernel(ext3, data=journal, pagesize=64k) with
>> corresponding ported patches:
> ...
>> which was caused by bh dirting without calling
>> do_journal_get_write_access().
>>
>> In the loop for all bhs of a page in ext4_block_write_begin(),
>> when a err occurred, it will jump out of loop.
>> But that will leaves some bhs being processed and some not,
>> which will lead to the asserion failure in calling write_end_fn().
>
> Thanks for the patch but I don't understand one thing here: For
> ext4_journalled_zero_new_buffers() to call write_end_fn() the buffer must
> have buffer_new flag set. That flag can get set only by ext4_get_block()
> function when it succeeds in which case we also call
> do_journal_get_write_access(). So how is it possible that buffer_new was
> set on a buffer on which we didn't call do_journal_get_write_access()? This
> indicates there may be some deeper problem hidden. How exactly did you
> trigger this problem?
>
> Honza
>
>>
>> To fixed that, get write access for the rest unprocessed bhs, just
>> as what write_end_fn do.
>>
>> Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers")
>> Reported-and-tested-by: Zhi Long <longzhi@sangfor.com.cn>
>> Suggested-by: Shida Zhang <zhangshida@kylinos.cn>
>> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
>> ---
>> fs/ext4/inode.c | 17 ++++++++++++++++-
>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 54bdd4884fe6..a72f951288e4 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
>> err = -EIO;
>> }
>> if (unlikely(err)) {
>> - if (should_journal_data)
>> + if (should_journal_data) {
>> + if (bh != head || !block_start) {
>> + do {
>> + block_end = block_start + bh->b_size;
>> +
>> + if (buffer_new(bh))
>> + if (block_end > from && block_start < to)
>> + do_journal_get_write_access(handle,
>> + inode, bh);
>> +
>> + block_start = block_end;
>> + bh = bh->b_this_page;
>> + } while (bh != head);
>> + }
>> +
>> ext4_journalled_zero_new_buffers(handle, inode, folio,
>> from, to);
>> + }
>> else
>> folio_zero_new_buffers(folio, from, to);
>> } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
>> --
>> 2.39.2
>>
> Greetings,
> Regarding this issue,
> I was able to reproduce it quickly by injecting faults via module parameter passing in fsstest while testing simultaneously.
> And we tested that neither get access nor clear new would reproduce the issue after injecting faults.
> Could you please take a look at which approach, get access or clear new, is better?
>
> The fsstress testing and injection fault command are as follows:
> fsstress_arm -d "/fsstress_dir2/" -l 102400 -n 100 -p 128 -r -S -s 10 -c
> watch -n 1 "echo 1 > /sys/module/ext4/parameters/inject_fault"
>
> The injected fault test is modified as follows,
> where the module parameter inject_fault injects the fault,
> and the module parameter try_fix selects whether to handle the fault by getting access or clearing the new:
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index b231cd437..590f84391 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -50,6 +50,12 @@
>
> #define MPAGE_DA_EXTENT_TAIL 0x01
>
> +static int ext4_inject_fault __read_mostly;
> +module_param_named(inject_fault, ext4_inject_fault, int, 0644);
> +static int ext4_try_fix __read_mostly;
> +module_param_named(try_fix, ext4_try_fix, int, 0644);
> +
> +
> static void ext4_journalled_zero_new_buffers(handle_t *handle,
> struct page *page,
> unsigned int from, unsigned int to);
> @@ -1065,6 +1071,12 @@ int ext4_block_write_begin(handle_t *handle, struct page *page, loff_t pos, unsi
> clear_buffer_new(bh);
> if (!buffer_mapped(bh)) {
> WARN_ON(bh->b_size != blocksize);
> + if (unlikely(ext4_inject_fault)) {
> + ext4_inject_fault = 0;
> + ext4_warning(inode->i_sb, "XXX inject fault get_block return -ENOSPC\n");
> + err = -ENOSPC;
> + break;
> + }
> err = get_block(inode, block, bh, 1);
> if (err)
> break;
> @@ -1116,10 +1128,31 @@ int ext4_block_write_begin(handle_t *handle, struct page *page, loff_t pos, unsi
> err = -EIO;
> }
> if (unlikely(err))
> - if (should_journal_data)
> + if (should_journal_data) {
> + if(bh != head || !block_start) {
> + do {
> + block_end = block_start + bh->b_size;
> +
> + if (buffer_new(bh))
> + if (block_end > from && block_start < to) {
> + if (ext4_try_fix == 1) {
> + ext4_warning(inode->i_sb, "XXX try fix 1\n");
> + do_journal_get_write_access(handle,
> + bh);
> + } else if (ext4_try_fix == 2) {
> + ext4_warning(inode->i_sb, "XXX try fix 2\n");
> + clear_buffer_new(bh);
> + }
> + }
> +
> + block_start = block_end;
> + bh = bh->b_this_page;
> + } while (bh != head);
> + }
> +
> ext4_journalled_zero_new_buffers(handle, page, from,
> to);
> - else
> + } else
> page_zero_new_buffers(page, from, to);
> else if (decrypt)
> err = fscrypt_decrypt_page(page->mapping->host, page,
在 2024/10/11 14:18, liubaolin 写道:
>> Greetings,
>> This problem is reproduced by our customer using their own testing
>> tool “run_bug”.
>> When I consulted with a client, the testing tool “run_bug” used a
>> variety of background programs to benchmark (including memory
>> pressure, cpu pressure, file cycle manipulation, fsstress Stress
>> testing tool, postmark program,and so on).
>> The recurrence probability is relatively low.
>>
>> In response to your query, in ext4_block_write_begin, the new state
>> will be clear before get block, and the bh that failed get_block will
>> not be set to new.
>> However, when the page size is greater than the block size, a page
>> will contain multiple bh. bh->b_this_page is a circular list for
>> managing all bh on the same page.
>> After get_block jumps out of the for loop, then bh->b_this_page is not
>> processed by clear new in the for loop.
>> So after calling ext4_journalled_zero_new_buffers,
>> The ext4_journalled_zero_new_buffers function will determine all bh of
>> the same page and call write_end_fn if they are in new state,
>> get_block returns err's bh->b_this_page and circular list other
>> unhandled bh if the state was previously set to new.
>> Because bh not get access, the corresponding transaction is not placed
>> in jh->b_transaction, resulting in a crash.
>>
>> Therefore, the patch processing method I submit is to make unprocessed
>> bh determines if it is in new state and get access.
>> There is another way to handle the remaining bh clear_buffer_new that
>> is not processed.
>> I personally recommend get access this way, the impact is small.
>> Please guide the two processing methods, which one do you recommend?
>
>
>
> 在 2024/10/10 17:29, Jan Kara 写道:
>> On Thu 10-10-24 10:58:55, Baolin Liu wrote:
>>> From: Baolin Liu <liubaolin@kylinos.cn>
>>>
>>> Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate
>>> buffers in ext4_journalled_zero_new_buffers"), a new assertion failure
>>> occurred under a old kernel(ext3, data=journal, pagesize=64k) with
>>> corresponding ported patches:
>> ...
>>> which was caused by bh dirting without calling
>>> do_journal_get_write_access().
>>>
>>> In the loop for all bhs of a page in ext4_block_write_begin(),
>>> when a err occurred, it will jump out of loop.
>>> But that will leaves some bhs being processed and some not,
>>> which will lead to the asserion failure in calling write_end_fn().
>>
>> Thanks for the patch but I don't understand one thing here: For
>> ext4_journalled_zero_new_buffers() to call write_end_fn() the buffer must
>> have buffer_new flag set. That flag can get set only by ext4_get_block()
>> function when it succeeds in which case we also call
>> do_journal_get_write_access(). So how is it possible that buffer_new was
>> set on a buffer on which we didn't call do_journal_get_write_access()?
>> This
>> indicates there may be some deeper problem hidden. How exactly did you
>> trigger this problem?
>>
>> Honza
>>
>>>
>>> To fixed that, get write access for the rest unprocessed bhs, just
>>> as what write_end_fn do.
>>>
>>> Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in
>>> ext4_journalled_zero_new_buffers")
>>> Reported-and-tested-by: Zhi Long <longzhi@sangfor.com.cn>
>>> Suggested-by: Shida Zhang <zhangshida@kylinos.cn>
>>> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
>>> ---
>>> fs/ext4/inode.c | 17 ++++++++++++++++-
>>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>> index 54bdd4884fe6..a72f951288e4 100644
>>> --- a/fs/ext4/inode.c
>>> +++ b/fs/ext4/inode.c
>>> @@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle,
>>> struct folio *folio,
>>> err = -EIO;
>>> }
>>> if (unlikely(err)) {
>>> - if (should_journal_data)
>>> + if (should_journal_data) {
>>> + if (bh != head || !block_start) {
>>> + do {
>>> + block_end = block_start + bh->b_size;
>>> +
>>> + if (buffer_new(bh))
>>> + if (block_end > from && block_start < to)
>>> + do_journal_get_write_access(handle,
>>> + inode, bh);
>>> +
>>> + block_start = block_end;
>>> + bh = bh->b_this_page;
>>> + } while (bh != head);
>>> + }
>>> +
>>> ext4_journalled_zero_new_buffers(handle, inode, folio,
>>> from, to);
>>> + }
>>> else
>>> folio_zero_new_buffers(folio, from, to);
>>> } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
>>> --
>>> 2.39.2
>>>
>
© 2016 - 2026 Red Hat, Inc.