From: Zhang Yi <yi.zhang@huawei.com>

Hello!

This series fixes a data corruption issue reported by Gao Xiang in
nojournal mode. The problem occurs after a metadata block is freed: it
can be immediately reallocated as a data block. However, the metadata
on this block may still be in the process of being written back, which
means the new data in this block could be overwritten by the stale
metadata, triggering data corruption. Please see the discussion with
Jan below for more details:

https://lore.kernel.org/linux-ext4/a9417096-9549-4441-9878-b1955b899b4e@huaweicloud.com/

Patch 1 strengthens the same case in ordered journal mode, theoretically
preventing the occurrence of stale data issues.
Patch 2 fixes this issue in nojournal mode.

Regards,
Yi.

Zhang Yi (2):
  jbd2: ensure that all ongoing I/O complete before freeing blocks
  ext4: wait for ongoing I/O to complete before freeing blocks

 fs/ext4/ext4_jbd2.c   | 11 +++++++++--
 fs/jbd2/transaction.c | 13 +++++++++----
 2 files changed, 18 insertions(+), 6 deletions(-)

-- 
2.46.1
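The race described in the cover letter can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions, not the kernel change itself: `ToyDisk`, `free_and_reuse`, and `run` are hypothetical names invented for illustration, and the model assumes the new data write completes before the stale metadata writeback does.

```python
# Toy user-space model of the stale-metadata race. NOT kernel code:
# every name here is hypothetical and exists only for illustration.

class ToyDisk:
    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks  # on-disk contents per block
        self.in_flight = []            # (block_no, payload) submitted, not yet completed

    def submit_write(self, block_no, payload):
        # The payload is captured at submission time and lands on disk later.
        self.in_flight.append((block_no, payload))

    def complete_io(self, block_no=None):
        # Complete pending writes for one block, or for all blocks if None.
        remaining = []
        for blk, payload in self.in_flight:
            if block_no is None or blk == block_no:
                self.blocks[blk] = payload
            else:
                remaining.append((blk, payload))
        self.in_flight = remaining


def free_and_reuse(disk, block_no, new_data, wait_before_free):
    if wait_before_free:
        # Modelled fix: drain writeback on the block before it is freed.
        disk.complete_io(block_no)
    # The block is freed and immediately reallocated as a data block.
    # Assumption: the new data write reaches the disk right away.
    disk.blocks[block_no] = new_data
    # Any metadata write still in flight completes only now.
    disk.complete_io()
    return disk.blocks[block_no]


def run(wait_before_free):
    disk = ToyDisk(8)
    disk.submit_write(3, b"stale-metadata")  # metadata writeback has started
    return free_and_reuse(disk, 3, b"new-file-data", wait_before_free)


print(run(False))  # stale metadata overwrites the new data
print(run(True))   # new data survives
```

Without the wait, the late completion of the metadata write clobbers the freshly written data; with the wait, nothing is left in flight when the block is reused.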
Hi Ted,

On 2025/9/16 17:33, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> Hello!
>
> This series fixes a data corruption issue reported by Gao Xiang in
> nojournal mode. The problem occurs after a metadata block is freed: it
> can be immediately reallocated as a data block. However, the metadata
> on this block may still be in the process of being written back, which
> means the new data in this block could be overwritten by the stale
> metadata, triggering data corruption. Please see the discussion with
> Jan below for more details:
>
> https://lore.kernel.org/linux-ext4/a9417096-9549-4441-9878-b1955b899b4e@huaweicloud.com/
>
> Patch 1 strengthens the same case in ordered journal mode, theoretically
> preventing the occurrence of stale data issues.
> Patch 2 fixes this issue in nojournal mode.

It seems this series has not been applied. Was it overlooked?

When ext4 nojournal mode is used, this is actually a very
serious bug, since data corruption can happen very easily
under specific conditions (we have a specific
environment which can reproduce the issue very quickly).

Also, it seems AWS folks reported this issue years ago
(2021). The phenomenon was almost the same, but the issue
still exists today:
https://lore.kernel.org/linux-ext4/20211108173520.xp6xphodfhcen2sy@u87e72aa3c6c25c.ant.amazon.com/

Some of our internal businesses rely on EXT4
no_journal mode, and when they upgraded the kernel from
4.19 to 5.10, they read corrupted data after
page cache memory was reclaimed (the on-disk
data had actually been corrupted even earlier).

So personally I wonder what the current status of
EXT4 no_journal mode is, since this issue has existed
for more than 5 years, yet some people need
an extent-enabled ext2 and so selected this mode.

We already released an announcement advising customers
not to use no_journal mode because it seems to lack
sufficient maintenance (yet many end users are interested
in this mode):
https://www.alibabacloud.com/help/en/alinux/support/data-corruption-risk-and-solution-in-ext4-nojounral-mode

Thanks,
Gao Xiang

>
> Regards,
> Yi.
>
> Zhang Yi (2):
>   jbd2: ensure that all ongoing I/O complete before freeing blocks
>   ext4: wait for ongoing I/O to complete before freeing blocks
>
>  fs/ext4/ext4_jbd2.c   | 11 +++++++++--
>  fs/jbd2/transaction.c | 13 +++++++++----
>  2 files changed, 18 insertions(+), 6 deletions(-)
>
Hi Ted!

I think this patch series has fallen through the cracks. Can you please
push it to Linus? Given there are real users hitting the data corruption,
we should do it soon (although it isn't a new issue so it isn't
supercritical).

On Thu 02-10-25 19:42:34, Gao Xiang wrote:
> On 2025/9/16 17:33, Zhang Yi wrote:
> > From: Zhang Yi <yi.zhang@huawei.com>
> >
> > Hello!
> >
> > This series fixes a data corruption issue reported by Gao Xiang in
> > nojournal mode. The problem occurs after a metadata block is freed: it
> > can be immediately reallocated as a data block. However, the metadata
> > on this block may still be in the process of being written back, which
> > means the new data in this block could be overwritten by the stale
> > metadata, triggering data corruption. Please see the discussion with
> > Jan below for more details:
> >
> > https://lore.kernel.org/linux-ext4/a9417096-9549-4441-9878-b1955b899b4e@huaweicloud.com/
> >
> > Patch 1 strengthens the same case in ordered journal mode, theoretically
> > preventing the occurrence of stale data issues.
> > Patch 2 fixes this issue in nojournal mode.
>
> It seems this series has not been applied. Was it overlooked?

Well, likely Ted just missed it when collecting patches for his PR.

> When ext4 nojournal mode is used, this is actually a very
> serious bug, since data corruption can happen very easily
> under specific conditions (we have a specific
> environment which can reproduce the issue very quickly).

This is good to know so that we can prioritize accordingly.

> Also, it seems AWS folks reported this issue years ago
> (2021). The phenomenon was almost the same, but the issue
> still exists today:
> https://lore.kernel.org/linux-ext4/20211108173520.xp6xphodfhcen2sy@u87e72aa3c6c25c.ant.amazon.com/

Likely yes, but back then we weren't able to figure out the root cause.

> Some of our internal businesses rely on EXT4
> no_journal mode, and when they upgraded the kernel from
> 4.19 to 5.10, they read corrupted data after
> page cache memory was reclaimed (the on-disk
> data had actually been corrupted even earlier).
>
> So personally I wonder what the current status of
> EXT4 no_journal mode is, since this issue has existed
> for more than 5 years, yet some people need
> an extent-enabled ext2 and so selected this mode.

The nojournal mode is fully supported. There are many enterprise customers
(mostly cloud vendors) that depend on it. Including Ted's employer ;)

> We already released an announcement advising customers
> not to use no_journal mode because it seems to lack
> sufficient maintenance (yet many end users are interested
> in this mode):
> https://www.alibabacloud.com/help/en/alinux/support/data-corruption-risk-and-solution-in-ext4-nojounral-mode

Well, it's good to be cautious but the reality is that data corruption
issues do happen from time to time. Both in nojournal mode and in normal
journalled mode. And this one has existed since nojournal mode was
first implemented. So it apparently requires rather specific conditions
to hit.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
Yes, sorry, this fell through the cracks. I just applied it and am running
it through tests.

					- Ted
Hi Jan,

On 2025/10/6 21:52, Jan Kara wrote:
> Hi Ted!
>
> I think this patch series has fallen through the cracks. Can you please
> push it to Linus? Given there are real users hitting the data corruption,
> we should do it soon (although it isn't a new issue so it isn't
> supercritical).

Thanks for the ping.

..

>> Some of our internal businesses rely on EXT4
>> no_journal mode, and when they upgraded the kernel from
>> 4.19 to 5.10, they read corrupted data after
>> page cache memory was reclaimed (the on-disk
>> data had actually been corrupted even earlier).
>>
>> So personally I wonder what the current status of
>> EXT4 no_journal mode is, since this issue has existed
>> for more than 5 years, yet some people need
>> an extent-enabled ext2 and so selected this mode.
>
> The nojournal mode is fully supported. There are many enterprise customers
> (mostly cloud vendors) that depend on it. Including Ted's employer ;)

.. yet honestly, this issue can be easily observed with no_journal +
memory pressure, and our new 5.10 kernel setup (previously 4.19) can
catch this issue very easily. When memory is sufficient, the valid
page cache can cover up this issue, but the on-disk data could still
be corrupted.

So we wonder how widely no_journal mode is deployed at scale today, and
whether those deployments run workloads under memory pressure.

>
>> We already released an announcement advising customers
>> not to use no_journal mode because it seems to lack
>> sufficient maintenance (yet many end users are interested
>> in this mode):
>> https://www.alibabacloud.com/help/en/alinux/support/data-corruption-risk-and-solution-in-ext4-nojounral-mode
>
> Well, it's good to be cautious but the reality is that data corruption
> issues do happen from time to time. Both in nojournal mode and in normal
> journalled mode. And this one has existed since nojournal mode was
> first implemented. So it apparently requires rather specific conditions
> to hit.

The original issue (the one fixed by Yi in 2019) existed for quite a
long time, and I think it was hard to reproduce (compared to this
one). However, the lack of clean_bdev_aliases() and
clean_bdev_bh_alias() introduced another serious regression, which has
existed since 2019 and can be easily reproduced on some specific VM
setups. Our workload also creates and deletes some small and big
files, and the data corruption is observable because some file data
ends up filled with extent-layout metadata, much like the earlier AWS
report.

Thanks,
Gao Xiang

>
> 								Honza
>
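The masking effect mentioned above (a valid page cache hiding on-disk corruption until memory is reclaimed) can be sketched with another toy model. This is purely illustrative and makes no claim about how the kernel page cache is implemented: `CachedBlockDevice`, `corrupt_on_disk`, and `demo` are hypothetical names, and the stale write is simply injected by hand.

```python
# Toy model of a read cache hiding on-disk corruption. NOT kernel code:
# every name here is hypothetical and exists only for illustration.

class CachedBlockDevice:
    def __init__(self):
        self.disk = {}        # block number -> bytes stored on disk
        self.page_cache = {}  # block number -> bytes cached in memory

    def write(self, block_no, data):
        # The data lands in the cache and (in this model) on disk at once.
        self.page_cache[block_no] = data
        self.disk[block_no] = data

    def corrupt_on_disk(self, block_no, stale):
        # Stand-in for a stale metadata write reaching the disk
        # without the cached copy being updated.
        self.disk[block_no] = stale

    def read(self, block_no):
        # Serve from the cache when possible, otherwise fill it from disk.
        if block_no not in self.page_cache:
            self.page_cache[block_no] = self.disk[block_no]
        return self.page_cache[block_no]

    def reclaim(self):
        # Memory pressure: drop every cached page.
        self.page_cache.clear()


def demo():
    dev = CachedBlockDevice()
    dev.write(7, b"good")
    dev.corrupt_on_disk(7, b"stale")
    before = dev.read(7)  # served from the still-valid cache
    dev.reclaim()
    after = dev.read(7)   # re-read from the corrupted disk
    return before, after


print(demo())
```

In this model the first read still returns the good data even though the disk copy is already wrong; only the read after reclaim exposes the corruption, which matches the observation that the problem surfaces under memory pressure.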