[PATCH 0/3] ocfs2: stop BUG_ON crashes in suballoc invalid-dinode paths

ZhengYuan Huang posted 3 patches 2 months, 1 week ago
fs/ocfs2/suballoc.c | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)
Posted by ZhengYuan Huang 2 months, 1 week ago
commit 10995aa2451a ("ocfs2: Morph the haphazard
OCFS2_IS_VALID_DINODE() checks.") converted several OCFS2 dinode
corruption checks from graceful error handling to BUG_ON() under the
assumption that every caller only sees validated inode buffers.

That assumption does not always hold for JBD-managed buffers. The common
inode read path can still hand suballoc code an invalid dinode, which turns
crafted filesystem corruption into a kernel panic instead of a normal OCFS2
filesystem error.

This series restores graceful corruption handling at the three
independently reachable BUG_ON() sites in fs/ocfs2/suballoc.c:

1. reserve_suballoc_bits()
2. claim_suballoc_bits()
3. _ocfs2_free_suballoc_bits()

The series is split per crash site so each patch fixes one bug. A broader
follow-up could harden structural validation for JBD-managed inode reads,
but that change touches a much wider read-side contract and is kept out of
scope here.
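
As an illustration of the pattern described above, here is a minimal
user-space sketch of turning a fatal assertion on an invalid dinode into
an ordinary error return. It is not the code from the patches: struct
demo_dinode, demo_dinode_is_valid() and demo_reserve_bits() are
hypothetical names, and the choice of -EIO is an assumption made only
for the sketch.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for an on-disk dinode. */
struct demo_dinode {
	char signature[8];
	unsigned int free_bits;
};

#define DEMO_INODE_SIGNATURE "INODE01"

/* Returns nonzero when the signature field matches, NUL included. */
static int demo_dinode_is_valid(const struct demo_dinode *di)
{
	return memcmp(di->signature, DEMO_INODE_SIGNATURE,
		      sizeof(DEMO_INODE_SIGNATURE)) == 0;
}

/*
 * Hypothetical reservation helper. Where a BUG_ON()-style check would
 * abort, this version reports the corruption and returns an error the
 * caller can propagate. -EIO is an assumed error code for the sketch.
 */
static int demo_reserve_bits(const struct demo_dinode *di,
			     unsigned int wanted)
{
	if (!demo_dinode_is_valid(di)) {
		fprintf(stderr, "demo: invalid dinode signature\n");
		return -EIO;
	}
	if (di->free_bits < wanted)
		return -ENOSPC;
	return 0;
}
```

A caller would check the return value and unwind normally, for example
treating a negative result from demo_reserve_bits(&di, 2) as a
filesystem error rather than a reason to halt the machine.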

ZhengYuan Huang (3):
  ocfs2: handle invalid dinode in reserve_suballoc_bits
  ocfs2: handle invalid dinode in claim_suballoc_bits
  ocfs2: handle invalid dinode in _ocfs2_free_suballoc_bits

 fs/ocfs2/suballoc.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

-- 
2.43.0
Re: [PATCH 0/3] ocfs2: stop BUG_ON crashes in suballoc invalid-dinode paths
Posted by Joseph Qi 2 months, 1 week ago

On 4/3/26 2:30 PM, ZhengYuan Huang wrote:
> commit 10995aa2451a ("ocfs2: Morph the haphazard
> OCFS2_IS_VALID_DINODE() checks.") converted several OCFS2 dinode
> corruption checks from graceful error handling to BUG_ON() under the
> assumption that every caller only sees validated inode buffers.
> 
> That assumption does not always hold for JBD-managed buffers. The common
> inode read path can still hand suballoc code an invalid dinode, which turns
> crafted filesystem corruption into a kernel panic instead of a normal OCFS2
> filesystem error.
> 

When an inode is first read from disk, ocfs2_validate_inode_block() is
called to validate it.
So this looks like a code bug in which the buffer is modified after
validation? If not, how does it happen?

Thanks,
Joseph

> This series restores graceful corruption handling at the three
> independently reachable BUG_ON() sites in fs/ocfs2/suballoc.c:
> 
> 1. reserve_suballoc_bits()
> 2. claim_suballoc_bits()
> 3. _ocfs2_free_suballoc_bits()
> 
> The series is split per crash site so each patch fixes one bug. A broader
> follow-up could harden structural validation for JBD-managed inode reads,
> but that change touches a much wider read-side contract and is kept out of
> scope here.
> 
> ZhengYuan Huang (3):
>   ocfs2: handle invalid dinode in reserve_suballoc_bits
>   ocfs2: handle invalid dinode in claim_suballoc_bits
>   ocfs2: handle invalid dinode in _ocfs2_free_suballoc_bits
> 
>  fs/ocfs2/suballoc.c | 33 +++++++++++++++++++++------------
>  1 file changed, 21 insertions(+), 12 deletions(-)
>
Re: [PATCH 0/3] ocfs2: stop BUG_ON crashes in suballoc invalid-dinode paths
Posted by ZhengYuan Huang 2 months ago
On Fri, Apr 3, 2026 at 5:30 PM Joseph Qi <joseph.qi@linux.alibaba.com> wrote:
> On 4/3/26 2:30 PM, ZhengYuan Huang wrote:
> > commit 10995aa2451a ("ocfs2: Morph the haphazard
> > OCFS2_IS_VALID_DINODE() checks.") converted several OCFS2 dinode
> > corruption checks from graceful error handling to BUG_ON() under the
> > assumption that every caller only sees validated inode buffers.
> >
> > That assumption does not always hold for JBD-managed buffers. The common
> > inode read path can still hand suballoc code an invalid dinode, which turns
> > crafted filesystem corruption into a kernel panic instead of a normal OCFS2
> > filesystem error.
> >
>
> When an inode is first read from disk, ocfs2_validate_inode_block() is
> called to validate it.
> So this looks like a code bug in which the buffer is modified after
> validation? If not, how does it happen?
>
> Thanks,
> Joseph

This bug was discovered by our fuzzing framework. The fuzzer mutates
filesystem metadata on disk to test filesystem robustness, but it does
not modify in-memory state.

Due to an unknown issue, the full crash log was truncated, so we
currently cannot deterministically reproduce the bug. We are still
working on reconstructing a reliable reproducer based on partial
traces.

From our current analysis, one possible explanation is that the
initial inode validation does not guarantee the buffer remains valid
for its entire lifetime:

On mount, OCFS2 loads local system inodes before journal replay, so
the allocator inode can be instantiated and validated first.
Afterwards, replay of a dirty journal writes filesystem blocks back
through jbd2 recovery using __getblk(j_fs_dev, blocknr) +
memcpy(nbh->b_data, ...) + mark_buffer_dirty(), which can overwrite
the same cached bh that was previously validated.

Later, OCFS2 rereads inode blocks through ocfs2_inode_lock paths, and
those read paths explicitly skip inode validation when buffer_jbd(bh)
is set. This is visible both in ocfs2_read_blocks_sync() and
ocfs2_read_blocks(), and the latter even documents that journal-held
buffers never get NeedsValidate set.

Normal allocator updates make these dinode buffers JBD-managed via
ocfs2_journal_access_di() -> jbd2_journal_get_write_access() ->
set_buffer_jbd(bh). So the bug is not that the very first read forgot
to validate; it is that a previously validated system-inode bh can be
changed later, and subsequent JBD-owned rereads bypass validation
before reaching the BUG_ON in suballoc.

In short, the initial validation is not missing. The problem appears
to be that nothing revalidates the buffer after a later modification
(e.g., by journal replay), because accesses to a buffer in JBD state
skip validation.
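
To make the suspected sequence concrete, here is a small user-space
model of it. It is only a sketch of the hypothesis above, not kernel
code: struct demo_bh and the demo_* helpers are hypothetical, the jbd
flag merely stands in for buffer_jbd(bh), and validating every read of
a non-JBD buffer is a simplification of the real read paths.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_SIG "INODE01"

/* Hypothetical stand-in for a cached buffer head. */
struct demo_bh {
	char data[8];	/* cached block contents (signature only) */
	bool jbd;	/* models buffer_jbd(bh) */
};

static bool demo_valid(const struct demo_bh *bh)
{
	return memcmp(bh->data, DEMO_SIG, sizeof(DEMO_SIG)) == 0;
}

/* Models a read path that skips validation for journal-held buffers. */
static int demo_read(const struct demo_bh *bh)
{
	if (bh->jbd)
		return 0;	/* returned as-is, never revalidated */
	return demo_valid(bh) ? 0 : -EIO;
}

/* Models journal access marking the buffer as JBD-managed. */
static void demo_journal_access(struct demo_bh *bh)
{
	bh->jbd = true;
}

/* Models replay copying a journaled block over the cached contents. */
static void demo_replay(struct demo_bh *bh, const char blk[8])
{
	memcpy(bh->data, blk, sizeof(bh->data));
}
```

In this model, a buffer that passes demo_read() once, is then
overwritten by demo_replay() with a bad block, and is later marked by
demo_journal_access() will still pass demo_read(), even though
demo_valid() reports that its contents are no longer a valid dinode.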

We are still investigating and will update if we manage to produce a
reliable reproducer.

Thanks,
ZhengYuan Huang