fs/ext4/inode.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
From: Baolin Liu <liubaolin@kylinos.cn>
Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate
buffers in ext4_journalled_zero_new_buffers"), a new assertion failure
occurred under a old kernel(ext3, data=journal, pagesize=64k) with
corresponding ported patches:
================================================================
Call trace:
__ext4_handle_dirty_metadata+0x320/0x7e8
write_end_fn+0x78/0x178
ext4_journalled_zero_new_buffers+0xd0/0x2c8
ext4_block_write_begin+0x850/0xc00
ext4_write_begin+0x334/0xc68
generic_perform_write+0x1a4/0x380
ext4_buffered_write_iter+0x180/0x370
ext4_file_write_iter+0x194/0xfc0
new_sync_write+0x338/0x4b8
__vfs_write+0xc4/0xe8
vfs_write+0x12c/0x3d0
ksys_write+0xf4/0x230
sys_write+0x34/0x48
el0_svc_naked+0x44/0x48
================================================================
which was caused by bh dirting without calling
do_journal_get_write_access().
In the loop for all bhs of a page in ext4_block_write_begin(),
when a err occurred, it will jump out of loop.
But that will leaves some bhs being processed and some not,
which will lead to the asserion failure in calling write_end_fn().
To fixed that, get write access for the rest unprocessed bhs, just
as what write_end_fn do.
Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers")
Reported-and-tested-by: Zhi Long <longzhi@sangfor.com.cn>
Suggested-by: Shida Zhang <zhangshida@kylinos.cn>
Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
---
fs/ext4/inode.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 54bdd4884fe6..a72f951288e4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
err = -EIO;
}
if (unlikely(err)) {
- if (should_journal_data)
+ if (should_journal_data) {
+ if (bh != head || !block_start) {
+ do {
+ block_end = block_start + bh->b_size;
+
+ if (buffer_new(bh))
+ if (block_end > from && block_start < to)
+ do_journal_get_write_access(handle,
+ inode, bh);
+
+ block_start = block_end;
+ bh = bh->b_this_page;
+ } while (bh != head);
+ }
+
ext4_journalled_zero_new_buffers(handle, inode, folio,
from, to);
+ }
else
folio_zero_new_buffers(folio, from, to);
} else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
--
2.39.2
On Thu 10-10-24 10:58:55, Baolin Liu wrote: > From: Baolin Liu <liubaolin@kylinos.cn> > > Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate > buffers in ext4_journalled_zero_new_buffers"), a new assertion failure > occurred under a old kernel(ext3, data=journal, pagesize=64k) with > corresponding ported patches: ... > which was caused by bh dirting without calling > do_journal_get_write_access(). > > In the loop for all bhs of a page in ext4_block_write_begin(), > when a err occurred, it will jump out of loop. > But that will leaves some bhs being processed and some not, > which will lead to the asserion failure in calling write_end_fn(). Thanks for the patch but I don't understand one thing here: For ext4_journalled_zero_new_buffers() to call write_end_fn() the buffer must have buffer_new flag set. That flag can get set only by ext4_get_block() function when it succeeds in which case we also call do_journal_get_write_access(). So how is it possible that buffer_new was set on a buffer on which we didn't call do_journal_get_write_access()? This indicates there may be some deeper problem hidden. How exactly did you trigger this problem? Honza > > To fixed that, get write access for the rest unprocessed bhs, just > as what write_end_fn do. > > Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers") > Reported-and-tested-by: Zhi Long <longzhi@sangfor.com.cn> > Suggested-by: Shida Zhang <zhangshida@kylinos.cn> > Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> > --- > fs/ext4/inode.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 54bdd4884fe6..a72f951288e4 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio, > err = -EIO; > } > if (unlikely(err)) { > - if (should_journal_data) > + if (should_journal_data) { > + if (bh != head || !block_start) { > + do { > + block_end = block_start + bh->b_size; > + > + if (buffer_new(bh)) > + if (block_end > from && block_start < to) > + do_journal_get_write_access(handle, > + inode, bh); > + > + block_start = block_end; > + bh = bh->b_this_page; > + } while (bh != head); > + } > + > ext4_journalled_zero_new_buffers(handle, inode, folio, > from, to); > + } > else > folio_zero_new_buffers(folio, from, to); > } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { > -- > 2.39.2 > -- Jan Kara <jack@suse.com> SUSE Labs, CR
> Greetings, > This problem is reproduced by our customer using their own testing tool “run_bug”. > When I consulted with a client, the testing tool “run_bug” used a variety of background programs to benchmark > (including memory pressure, cpu pressure, file cycle manipulation, fsstress Stress testing tool, postmark program,and so on). > The recurrence probability is relatively low. > > In response to your query, in ext4_block_write_begin, the new state will be clear before get block, > and the bh that failed get_block will not be set to new. > However, when the page size is greater than the block size, a page will contain multiple bh. > bh->b_this_page is a circular list for managing all bh on the same page. > After get_block jumps out of the for loop, then bh->b_this_page is not processed by clear new in the for loop. > So after calling ext4_journalled_zero_new_buffers, > The ext4_journalled_zero_new_buffers function will determine all bh of the same page and call write_end_fn if they are in new state, > get_block returns err's bh->b_this_page and circular list other unhandled bh if the state was previously set to new. > Because bh not get access, the corresponding transaction is not placed in jh->b_transaction, resulting in a crash. > > Therefore, the patch processing method I submit is to make unprocessed bh determines if it is in new state and get access. > There is another way to handle the remaining bh clear_buffer_new that is not processed. > I personally recommend get access this way, the impact is small. > Please guide the two processing methods, which one do you recommend? 在 2024/10/10 17:29, Jan Kara 写道: > On Thu 10-10-24 10:58:55, Baolin Liu wrote: >> From: Baolin Liu <liubaolin@kylinos.cn> >> >> Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate >> buffers in ext4_journalled_zero_new_buffers"), a new assertion failure >> occurred under a old kernel(ext3, data=journal, pagesize=64k) with >> corresponding ported patches: > ... >> which was caused by bh dirting without calling >> do_journal_get_write_access(). >> >> In the loop for all bhs of a page in ext4_block_write_begin(), >> when a err occurred, it will jump out of loop. >> But that will leaves some bhs being processed and some not, >> which will lead to the asserion failure in calling write_end_fn(). > > Thanks for the patch but I don't understand one thing here: For > ext4_journalled_zero_new_buffers() to call write_end_fn() the buffer must > have buffer_new flag set. That flag can get set only by ext4_get_block() > function when it succeeds in which case we also call > do_journal_get_write_access(). So how is it possible that buffer_new was > set on a buffer on which we didn't call do_journal_get_write_access()? This > indicates there may be some deeper problem hidden. How exactly did you > trigger this problem? > > Honza > >> >> To fixed that, get write access for the rest unprocessed bhs, just >> as what write_end_fn do. >> >> Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers") >> Reported-and-tested-by: Zhi Long <longzhi@sangfor.com.cn> >> Suggested-by: Shida Zhang <zhangshida@kylinos.cn> >> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> >> --- >> fs/ext4/inode.c | 17 ++++++++++++++++- >> 1 file changed, 16 insertions(+), 1 deletion(-) >> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >> index 54bdd4884fe6..a72f951288e4 100644 >> --- a/fs/ext4/inode.c >> +++ b/fs/ext4/inode.c >> @@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio, >> err = -EIO; >> } >> if (unlikely(err)) { >> - if (should_journal_data) >> + if (should_journal_data) { >> + if (bh != head || !block_start) { >> + do { >> + block_end = block_start + bh->b_size; >> + >> + if (buffer_new(bh)) >> + if (block_end > from && block_start < to) >> + do_journal_get_write_access(handle, >> + inode, bh); >> + >> + block_start = block_end; >> + bh = bh->b_this_page; >> + } while (bh != head); >> + } >> + >> ext4_journalled_zero_new_buffers(handle, inode, folio, >> from, to); >> + } >> else >> folio_zero_new_buffers(folio, from, to); >> } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { >> -- >> 2.39.2 >>
> Greetings, > Regarding this issue, > I was able to reproduce it quickly by injecting faults via module parameter passing in fsstest while testing simultaneously. > And we tested that neither get access nor clear new would reproduce the issue after injecting faults. > Could you please take a look at which approach, get access or clear new, is better? > > The fsstress testing and injection fault command are as follows: > fsstress_arm -d "/fsstress_dir2/" -l 102400 -n 100 -p 128 -r -S -s 10 -c > watch -n 1 "echo 1 > /sys/module/ext4/parameters/inject_fault" > > The injected fault test is modified as follows, > where the module parameter inject_fault injects the fault, > and the module parameter try_fix selects whether to handle the fault by getting access or clearing the new: > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index b231cd437..590f84391 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -50,6 +50,12 @@ > > #define MPAGE_DA_EXTENT_TAIL 0x01 > > +static int ext4_inject_fault __read_mostly; > +module_param_named(inject_fault, ext4_inject_fault, int, 0644); > +static int ext4_try_fix __read_mostly; > +module_param_named(try_fix, ext4_try_fix, int, 0644); > + > + > static void ext4_journalled_zero_new_buffers(handle_t *handle, > struct page *page, > unsigned int from, unsigned int to); > @@ -1065,6 +1071,12 @@ int ext4_block_write_begin(handle_t *handle, struct page *page, loff_t pos, unsi > clear_buffer_new(bh); > if (!buffer_mapped(bh)) { > WARN_ON(bh->b_size != blocksize); > + if (unlikely(ext4_inject_fault)) { > + ext4_inject_fault = 0; > + ext4_warning(inode->i_sb, "XXX inject fault get_block return -ENOSPC\n"); > + err = -ENOSPC; > + break; > + } > err = get_block(inode, block, bh, 1); > if (err) > break; > @@ -1116,10 +1128,31 @@ int ext4_block_write_begin(handle_t *handle, struct page *page, loff_t pos, unsi > err = -EIO; > } > if (unlikely(err)) > - if (should_journal_data) > + if (should_journal_data) { > + if(bh != head || !block_start) { > + do { > + block_end = block_start + bh->b_size; > + > + if (buffer_new(bh)) > + if (block_end > from && block_start < to) { > + if (ext4_try_fix == 1) { > + ext4_warning(inode->i_sb, "XXX try fix 1\n"); > + do_journal_get_write_access(handle, > + bh); > + } else if (ext4_try_fix == 2) { > + ext4_warning(inode->i_sb, "XXX try fix 2\n"); > + clear_buffer_new(bh); > + } > + } > + > + block_start = block_end; > + bh = bh->b_this_page; > + } while (bh != head); > + } > + > ext4_journalled_zero_new_buffers(handle, page, from, > to); > - else > + } else > page_zero_new_buffers(page, from, to); > else if (decrypt) > err = fscrypt_decrypt_page(page->mapping->host, page, 在 2024/10/11 14:18, liubaolin 写道: >> Greetings, >> This problem is reproduced by our customer using their own testing >> tool “run_bug”. >> When I consulted with a client, the testing tool “run_bug” used a >> variety of background programs to benchmark (including memory >> pressure, cpu pressure, file cycle manipulation, fsstress Stress >> testing tool, postmark program,and so on). >> The recurrence probability is relatively low. >> >> In response to your query, in ext4_block_write_begin, the new state >> will be clear before get block, and the bh that failed get_block will >> not be set to new. >> However, when the page size is greater than the block size, a page >> will contain multiple bh. bh->b_this_page is a circular list for >> managing all bh on the same page. >> After get_block jumps out of the for loop, then bh->b_this_page is not >> processed by clear new in the for loop. >> So after calling ext4_journalled_zero_new_buffers, >> The ext4_journalled_zero_new_buffers function will determine all bh of >> the same page and call write_end_fn if they are in new state, >> get_block returns err's bh->b_this_page and circular list other >> unhandled bh if the state was previously set to new. >> Because bh not get access, the corresponding transaction is not placed >> in jh->b_transaction, resulting in a crash. >> >> Therefore, the patch processing method I submit is to make unprocessed >> bh determines if it is in new state and get access. >> There is another way to handle the remaining bh clear_buffer_new that >> is not processed. >> I personally recommend get access this way, the impact is small. >> Please guide the two processing methods, which one do you recommend? > > > > 在 2024/10/10 17:29, Jan Kara 写道: >> On Thu 10-10-24 10:58:55, Baolin Liu wrote: >>> From: Baolin Liu <liubaolin@kylinos.cn> >>> >>> Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate >>> buffers in ext4_journalled_zero_new_buffers"), a new assertion failure >>> occurred under a old kernel(ext3, data=journal, pagesize=64k) with >>> corresponding ported patches: >> ... >>> which was caused by bh dirting without calling >>> do_journal_get_write_access(). >>> >>> In the loop for all bhs of a page in ext4_block_write_begin(), >>> when a err occurred, it will jump out of loop. >>> But that will leaves some bhs being processed and some not, >>> which will lead to the asserion failure in calling write_end_fn(). >> >> Thanks for the patch but I don't understand one thing here: For >> ext4_journalled_zero_new_buffers() to call write_end_fn() the buffer must >> have buffer_new flag set. That flag can get set only by ext4_get_block() >> function when it succeeds in which case we also call >> do_journal_get_write_access(). So how is it possible that buffer_new was >> set on a buffer on which we didn't call do_journal_get_write_access()? >> This >> indicates there may be some deeper problem hidden. How exactly did you >> trigger this problem? >> >> Honza >> >>> >>> To fixed that, get write access for the rest unprocessed bhs, just >>> as what write_end_fn do. >>> >>> Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in >>> ext4_journalled_zero_new_buffers") >>> Reported-and-tested-by: Zhi Long <longzhi@sangfor.com.cn> >>> Suggested-by: Shida Zhang <zhangshida@kylinos.cn> >>> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> >>> --- >>> fs/ext4/inode.c | 17 ++++++++++++++++- >>> 1 file changed, 16 insertions(+), 1 deletion(-) >>> >>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>> index 54bdd4884fe6..a72f951288e4 100644 >>> --- a/fs/ext4/inode.c >>> +++ b/fs/ext4/inode.c >>> @@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle, >>> struct folio *folio, >>> err = -EIO; >>> } >>> if (unlikely(err)) { >>> - if (should_journal_data) >>> + if (should_journal_data) { >>> + if (bh != head || !block_start) { >>> + do { >>> + block_end = block_start + bh->b_size; >>> + >>> + if (buffer_new(bh)) >>> + if (block_end > from && block_start < to) >>> + do_journal_get_write_access(handle, >>> + inode, bh); >>> + >>> + block_start = block_end; >>> + bh = bh->b_this_page; >>> + } while (bh != head); >>> + } >>> + >>> ext4_journalled_zero_new_buffers(handle, inode, folio, >>> from, to); >>> + } >>> else >>> folio_zero_new_buffers(folio, from, to); >>> } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { >>> -- >>> 2.39.2 >>> >
© 2016 - 2024 Red Hat, Inc.