From: Zhang Yi <yi.zhang@huawei.com>
This partially reverts commit d6bf294773a4 ("ext4/jbd2: convert
jbd2_journal_blocks_per_page() to support large folio"). This
jbd2_journal_blocks_per_folio() will lead to a significant
overestimation of journal credits. Since we still reserve credits for
one page and attempt to extend and restart handles during large folio
writebacks, so we should convert this helper back to
ext4_journal_blocks_per_page().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/ext4/ext4_jbd2.h | 7 +++++++
fs/ext4/inode.c | 6 +++---
fs/jbd2/journal.c | 6 ++++++
include/linux/jbd2.h | 1 +
4 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 63d17c5201b5..c0ee756cb34c 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -326,6 +326,13 @@ static inline int ext4_journal_blocks_per_folio(struct inode *inode)
return 0;
}
+static inline int ext4_journal_blocks_per_page(struct inode *inode)
+{
+ if (EXT4_JOURNAL(inode) != NULL)
+ return jbd2_journal_blocks_per_page(inode);
+ return 0;
+}
+
static inline int ext4_journal_force_commit(journal_t *journal)
{
if (journal)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 67e37dd546eb..9835145b1b27 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2556,7 +2556,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
*/
static int ext4_da_writepages_trans_blocks(struct inode *inode)
{
- int bpp = ext4_journal_blocks_per_folio(inode);
+ int bpp = ext4_journal_blocks_per_page(inode);
return ext4_meta_trans_blocks(inode,
MAX_WRITEPAGES_EXTENT_LEN + bpp - 1, bpp);
@@ -2634,7 +2634,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
ext4_lblk_t lblk;
struct buffer_head *head;
handle_t *handle = NULL;
- int bpp = ext4_journal_blocks_per_folio(mpd->inode);
+ int bpp = ext4_journal_blocks_per_page(mpd->inode);
if (mpd->wbc->sync_mode == WB_SYNC_ALL || mpd->wbc->tagged_writepages)
tag = PAGECACHE_TAG_TOWRITE;
@@ -6255,7 +6255,7 @@ int ext4_meta_trans_blocks(struct inode *inode, int lblocks, int pextents)
*/
int ext4_writepage_trans_blocks(struct inode *inode)
{
- int bpp = ext4_journal_blocks_per_folio(inode);
+ int bpp = ext4_journal_blocks_per_page(inode);
int ret;
ret = ext4_meta_trans_blocks(inode, bpp, bpp);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index d480b94117cd..7fccb425907f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -84,6 +84,7 @@ EXPORT_SYMBOL(jbd2_journal_start_commit);
EXPORT_SYMBOL(jbd2_journal_force_commit_nested);
EXPORT_SYMBOL(jbd2_journal_wipe);
EXPORT_SYMBOL(jbd2_journal_blocks_per_folio);
+EXPORT_SYMBOL(jbd2_journal_blocks_per_page);
EXPORT_SYMBOL(jbd2_journal_invalidate_folio);
EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
EXPORT_SYMBOL(jbd2_journal_force_commit);
@@ -2661,6 +2662,11 @@ int jbd2_journal_blocks_per_folio(struct inode *inode)
inode->i_sb->s_blocksize_bits);
}
+int jbd2_journal_blocks_per_page(struct inode *inode)
+{
+ return 1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits);
+}
+
/*
* helper functions to deal with 32 or 64bit block numbers.
*/
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 43b9297fe8a7..f35369c104ba 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1724,6 +1724,7 @@ static inline int tid_geq(tid_t x, tid_t y)
}
extern int jbd2_journal_blocks_per_folio(struct inode *inode);
+extern int jbd2_journal_blocks_per_page(struct inode *inode);
extern size_t journal_tag_bytes(journal_t *journal);
static inline int jbd2_journal_has_csum_v2or3(journal_t *journal)
--
2.46.1
On Wed 11-06-25 19:16:24, Zhang Yi wrote: > From: Zhang Yi <yi.zhang@huawei.com> > > This partially reverts commit d6bf294773a4 ("ext4/jbd2: convert > jbd2_journal_blocks_per_page() to support large folio"). This > jbd2_journal_blocks_per_folio() will lead to a significant > overestimation of journal credits. Since we still reserve credits for > one page and attempt to extend and restart handles during large folio > writebacks, so we should convert this helper back to > ext4_journal_blocks_per_page(). > > Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Here I'm not decided. Does it make any particular sense to reserve credits for one *page* worth of blocks when pages don't have any particular meaning in our writeback code anymore? We could reserve credits just for one physical extent and that should be enough. For blocksize == pagesize (most common configs) this would be actually equivalent. If blocksize < pagesize, this could force us to do some more writeback retries and thus get somewhat higher writeback CPU overhead but do we really care for these configs? It is well possible I've overlooked something and someone will spot a performance regression in practical setup with this in which case we'd have to come up with something more clever but I think it's worth it to start simple and complicate later. Honza > --- > fs/ext4/ext4_jbd2.h | 7 +++++++ > fs/ext4/inode.c | 6 +++--- > fs/jbd2/journal.c | 6 ++++++ > include/linux/jbd2.h | 1 + > 4 files changed, 17 insertions(+), 3 deletions(-) > > diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h > index 63d17c5201b5..c0ee756cb34c 100644 > --- a/fs/ext4/ext4_jbd2.h > +++ b/fs/ext4/ext4_jbd2.h > @@ -326,6 +326,13 @@ static inline int ext4_journal_blocks_per_folio(struct inode *inode) > return 0; > } > > +static inline int ext4_journal_blocks_per_page(struct inode *inode) > +{ > + if (EXT4_JOURNAL(inode) != NULL) > + return jbd2_journal_blocks_per_page(inode); > + return 0; > +} > + > static inline int ext4_journal_force_commit(journal_t *journal) > { > if (journal) > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 67e37dd546eb..9835145b1b27 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -2556,7 +2556,7 @@ static int mpage_map_and_submit_extent(handle_t *handle, > */ > static int ext4_da_writepages_trans_blocks(struct inode *inode) > { > - int bpp = ext4_journal_blocks_per_folio(inode); > + int bpp = ext4_journal_blocks_per_page(inode); > > return ext4_meta_trans_blocks(inode, > MAX_WRITEPAGES_EXTENT_LEN + bpp - 1, bpp); > @@ -2634,7 +2634,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) > ext4_lblk_t lblk; > struct buffer_head *head; > handle_t *handle = NULL; > - int bpp = ext4_journal_blocks_per_folio(mpd->inode); > + int bpp = ext4_journal_blocks_per_page(mpd->inode); > > if (mpd->wbc->sync_mode == WB_SYNC_ALL || mpd->wbc->tagged_writepages) > tag = PAGECACHE_TAG_TOWRITE; > @@ -6255,7 +6255,7 @@ int ext4_meta_trans_blocks(struct inode *inode, int lblocks, int pextents) > */ > int ext4_writepage_trans_blocks(struct inode *inode) > { > - int bpp = ext4_journal_blocks_per_folio(inode); > + int bpp = ext4_journal_blocks_per_page(inode); > int ret; > > ret = ext4_meta_trans_blocks(inode, bpp, bpp); > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c > index d480b94117cd..7fccb425907f 100644 > --- a/fs/jbd2/journal.c > +++ b/fs/jbd2/journal.c > @@ -84,6 +84,7 @@ EXPORT_SYMBOL(jbd2_journal_start_commit); > EXPORT_SYMBOL(jbd2_journal_force_commit_nested); > EXPORT_SYMBOL(jbd2_journal_wipe); > EXPORT_SYMBOL(jbd2_journal_blocks_per_folio); > +EXPORT_SYMBOL(jbd2_journal_blocks_per_page); > EXPORT_SYMBOL(jbd2_journal_invalidate_folio); > EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers); > EXPORT_SYMBOL(jbd2_journal_force_commit); > @@ -2661,6 +2662,11 @@ int jbd2_journal_blocks_per_folio(struct inode *inode) > inode->i_sb->s_blocksize_bits); > } > > +int jbd2_journal_blocks_per_page(struct inode *inode) > +{ > + return 1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits); > +} > + > /* > * helper functions to deal with 32 or 64bit block numbers. > */ > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h > index 43b9297fe8a7..f35369c104ba 100644 > --- a/include/linux/jbd2.h > +++ b/include/linux/jbd2.h > @@ -1724,6 +1724,7 @@ static inline int tid_geq(tid_t x, tid_t y) > } > > extern int jbd2_journal_blocks_per_folio(struct inode *inode); > +extern int jbd2_journal_blocks_per_page(struct inode *inode); > extern size_t journal_tag_bytes(journal_t *journal); > > static inline int jbd2_journal_has_csum_v2or3(journal_t *journal) > -- > 2.46.1 > -- Jan Kara <jack@suse.com> SUSE Labs, CR
On 2025/6/20 0:44, Jan Kara wrote: > On Wed 11-06-25 19:16:24, Zhang Yi wrote: >> From: Zhang Yi <yi.zhang@huawei.com> >> >> This partially reverts commit d6bf294773a4 ("ext4/jbd2: convert >> jbd2_journal_blocks_per_page() to support large folio"). This >> jbd2_journal_blocks_per_folio() will lead to a significant >> overestimation of journal credits. Since we still reserve credits for >> one page and attempt to extend and restart handles during large folio >> writebacks, so we should convert this helper back to >> ext4_journal_blocks_per_page(). >> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> > > Here I'm not decided. Does it make any particular sense to reserve credits > for one *page* worth of blocks when pages don't have any particular meaning > in our writeback code anymore? We could reserve credits just for one > physical extent and that should be enough. Indeed, reserving credits for a single page is no longer suitable in the currently folio based context. It do seems more appropriate to allocate credits for a single extent. > For blocksize == pagesize (most > common configs) this would be actually equivalent. If blocksize < pagesize, > this could force us to do some more writeback retries and thus get somewhat > higher writeback CPU overhead but do we really care for these configs? It > is well possible I've overlooked something and someone will spot a > performance regression in practical setup with this in which case we'd have > to come up with something more clever but I think it's worth it to start > simple and complicate later. This can indeed be a problem if the file system is already fragmented enough. However, thanks to the credits extension logic in __ext4_journal_ensure_credits(), I suppose that on most file system images, it will not trigger excessive retry operations. Besides, although there might be some lock cost in jbd2_journal_extend(), I suppose it won't be a big deal. Perhaps we could reserve more credits through a complex formula at the outset, which would lower the cost of expanding the credits. But I don't think this will help much in reducing the number of retries, it may only be helpful in extreme cases (the running transaction stats to commit, we cannot extend it). So, I think we can implement it by reserving for an extent for the time being. Do you agree? Thanks, Yi. > > >> --- >> fs/ext4/ext4_jbd2.h | 7 +++++++ >> fs/ext4/inode.c | 6 +++--- >> fs/jbd2/journal.c | 6 ++++++ >> include/linux/jbd2.h | 1 + >> 4 files changed, 17 insertions(+), 3 deletions(-) >> >> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h >> index 63d17c5201b5..c0ee756cb34c 100644 >> --- a/fs/ext4/ext4_jbd2.h >> +++ b/fs/ext4/ext4_jbd2.h >> @@ -326,6 +326,13 @@ static inline int ext4_journal_blocks_per_folio(struct inode *inode) >> return 0; >> } >> >> +static inline int ext4_journal_blocks_per_page(struct inode *inode) >> +{ >> + if (EXT4_JOURNAL(inode) != NULL) >> + return jbd2_journal_blocks_per_page(inode); >> + return 0; >> +} >> + >> static inline int ext4_journal_force_commit(journal_t *journal) >> { >> if (journal) >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >> index 67e37dd546eb..9835145b1b27 100644 >> --- a/fs/ext4/inode.c >> +++ b/fs/ext4/inode.c >> @@ -2556,7 +2556,7 @@ static int mpage_map_and_submit_extent(handle_t *handle, >> */ >> static int ext4_da_writepages_trans_blocks(struct inode *inode) >> { >> - int bpp = ext4_journal_blocks_per_folio(inode); >> + int bpp = ext4_journal_blocks_per_page(inode); >> >> return ext4_meta_trans_blocks(inode, >> MAX_WRITEPAGES_EXTENT_LEN + bpp - 1, bpp); >> @@ -2634,7 +2634,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) >> ext4_lblk_t lblk; >> struct buffer_head *head; >> handle_t *handle = NULL; >> - int bpp = ext4_journal_blocks_per_folio(mpd->inode); >> + int bpp = ext4_journal_blocks_per_page(mpd->inode); >> >> if (mpd->wbc->sync_mode == WB_SYNC_ALL || mpd->wbc->tagged_writepages) >> tag = PAGECACHE_TAG_TOWRITE; >> @@ -6255,7 +6255,7 @@ int ext4_meta_trans_blocks(struct inode *inode, int lblocks, int pextents) >> */ >> int ext4_writepage_trans_blocks(struct inode *inode) >> { >> - int bpp = ext4_journal_blocks_per_folio(inode); >> + int bpp = ext4_journal_blocks_per_page(inode); >> int ret; >> >> ret = ext4_meta_trans_blocks(inode, bpp, bpp); >> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c >> index d480b94117cd..7fccb425907f 100644 >> --- a/fs/jbd2/journal.c >> +++ b/fs/jbd2/journal.c >> @@ -84,6 +84,7 @@ EXPORT_SYMBOL(jbd2_journal_start_commit); >> EXPORT_SYMBOL(jbd2_journal_force_commit_nested); >> EXPORT_SYMBOL(jbd2_journal_wipe); >> EXPORT_SYMBOL(jbd2_journal_blocks_per_folio); >> +EXPORT_SYMBOL(jbd2_journal_blocks_per_page); >> EXPORT_SYMBOL(jbd2_journal_invalidate_folio); >> EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers); >> EXPORT_SYMBOL(jbd2_journal_force_commit); >> @@ -2661,6 +2662,11 @@ int jbd2_journal_blocks_per_folio(struct inode *inode) >> inode->i_sb->s_blocksize_bits); >> } >> >> +int jbd2_journal_blocks_per_page(struct inode *inode) >> +{ >> + return 1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits); >> +} >> + >> /* >> * helper functions to deal with 32 or 64bit block numbers. >> */ >> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h >> index 43b9297fe8a7..f35369c104ba 100644 >> --- a/include/linux/jbd2.h >> +++ b/include/linux/jbd2.h >> @@ -1724,6 +1724,7 @@ static inline int tid_geq(tid_t x, tid_t y) >> } >> >> extern int jbd2_journal_blocks_per_folio(struct inode *inode); >> +extern int jbd2_journal_blocks_per_page(struct inode *inode); >> extern size_t journal_tag_bytes(journal_t *journal); >> >> static inline int jbd2_journal_has_csum_v2or3(journal_t *journal) >> -- >> 2.46.1 >>
On Sat 21-06-25 15:46:40, Zhang Yi wrote: > On 2025/6/20 0:44, Jan Kara wrote: > > On Wed 11-06-25 19:16:24, Zhang Yi wrote: > >> From: Zhang Yi <yi.zhang@huawei.com> > >> > >> This partially reverts commit d6bf294773a4 ("ext4/jbd2: convert > >> jbd2_journal_blocks_per_page() to support large folio"). This > >> jbd2_journal_blocks_per_folio() will lead to a significant > >> overestimation of journal credits. Since we still reserve credits for > >> one page and attempt to extend and restart handles during large folio > >> writebacks, so we should convert this helper back to > >> ext4_journal_blocks_per_page(). > >> > >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> > > > > Here I'm not decided. Does it make any particular sense to reserve credits > > for one *page* worth of blocks when pages don't have any particular meaning > > in our writeback code anymore? We could reserve credits just for one > > physical extent and that should be enough. > > Indeed, reserving credits for a single page is no longer suitable in the > currently folio based context. It do seems more appropriate to allocate > credits for a single extent. > > > For blocksize == pagesize (most > > common configs) this would be actually equivalent. If blocksize < pagesize, > > this could force us to do some more writeback retries and thus get somewhat > > higher writeback CPU overhead but do we really care for these configs? It > > is well possible I've overlooked something and someone will spot a > > performance regression in practical setup with this in which case we'd have > > to come up with something more clever but I think it's worth it to start > > simple and complicate later. > > This can indeed be a problem if the file system is already fragmented > enough. However, thanks to the credits extension logic in > __ext4_journal_ensure_credits(), I suppose that on most file system images, > it will not trigger excessive retry operations. Besides, although there > might be some lock cost in jbd2_journal_extend(), I suppose it won't be a > big deal. > > Perhaps we could reserve more credits through a complex formula at the > outset, which would lower the cost of expanding the credits. But I don't > think this will help much in reducing the number of retries, it may only > be helpful in extreme cases (the running transaction stats to commit, we > cannot extend it). > > So, I think we can implement it by reserving for an extent for the time > being. Do you agree? Yes, I agree. After all this is easy to change if we find some real issues with it. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR
© 2016 - 2025 Red Hat, Inc.