fs/xfs/libxfs/xfs_da_btree.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
The DA node verifier checks header fields such as level and count, but
it does not verify that the btree hash values are in non-decreasing
order.
DA node blocks are traversed by searching the btree array for the first
hash value that exceeds the lookup hash, which requires the hash values
to be ordered correctly.
Add a hash order check to xfs_da3_node_verify, similar to the existing
validation in xfs_dir3_leaf_check_int, so that misordered node
blocks are detected as metadata corruption.
Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
---
fs/xfs/libxfs/xfs_da_btree.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index ad801b7bd2dd..ef49df22899e 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -225,6 +225,7 @@ xfs_da3_node_verify(
struct xfs_da_intnode *hdr = bp->b_addr;
struct xfs_da3_icnode_hdr ichdr;
xfs_failaddr_t fa;
+ int i;
xfs_da3_node_hdr_from_disk(mp, &ichdr, hdr);
@@ -247,7 +248,12 @@ xfs_da3_node_verify(
ichdr.count > mp->m_attr_geo->node_ents)
return __this_address;
- /* XXX: hash order check? */
+ /* Check hash value order. */
+ for (i = 0; i + 1 < ichdr.count; i++) {
+ if (be32_to_cpu(ichdr.btree[i].hashval) >
+ be32_to_cpu(ichdr.btree[i + 1].hashval))
+ return __this_address;
+ }
return NULL;
}
--
2.50.1
Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284
Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705
On Wed, Apr 15, 2026 at 08:12:24AM +0100, Yuto Ohnuki wrote:
> The DA node verifier checks header fields such as level and count, but
> it does not verify that the btree hash values are in non-decreasing
> order.
>
> DA node blocks are traversed by searching the btree array for the first
> hash value that exceeds the lookup hash, which requires the hash values
> to be ordered correctly.
>
> Add a hash order check to xfs_da3_node_verify, similar to the existing
> validation in xfs_dir3_leaf_check_int, so that misordered node
> blocks are detected as metadata corruption.
>
> Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
> ---
> fs/xfs/libxfs/xfs_da_btree.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index ad801b7bd2dd..ef49df22899e 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -225,6 +225,7 @@ xfs_da3_node_verify(
> struct xfs_da_intnode *hdr = bp->b_addr;
> struct xfs_da3_icnode_hdr ichdr;
> xfs_failaddr_t fa;
> + int i;
>
> xfs_da3_node_hdr_from_disk(mp, &ichdr, hdr);
>
> @@ -247,7 +248,12 @@ xfs_da3_node_verify(
> ichdr.count > mp->m_attr_geo->node_ents)
> return __this_address;
>
> - /* XXX: hash order check? */
> + /* Check hash value order. */
> + for (i = 0; i + 1 < ichdr.count; i++) {
I guess that works, though I'd have iterated from i to ichdr.count and
done the test on the previous element (like scrub/dir.c does).
> + if (be32_to_cpu(ichdr.btree[i].hashval) >
> + be32_to_cpu(ichdr.btree[i + 1].hashval))
I think this might be one of the few places where doing kernel style
indentation of the if test flows better:
for (i = 1; i < ichdr.count; i++) {
if (be32_to_cpu(ichdr.btree[i - 1].hashval) >
be32_to_cpu(ichdr.btree[i].hashval))
The logic is solid though.
--D
> + return __this_address;
> + }
>
> return NULL;
> }
> --
> 2.50.1
>
>
>
>
> Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284
>
> Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705
>
>
>
>
On Wed, Apr 15, 2026 at 08:16:25AM -0700, Darrick J. Wong wrote: > On Wed, Apr 15, 2026 at 08:12:24AM +0100, Yuto Ohnuki wrote: > > The DA node verifier checks header fields such as level and count, but > > it does not verify that the btree hash values are in non-decreasing > > order. > > > > DA node blocks are traversed by searching the btree array for the first > > hash value that exceeds the lookup hash, which requires the hash values > > to be ordered correctly. > > > > Add a hash order check to xfs_da3_node_verify, similar to the existing > > validation in xfs_dir3_leaf_check_int, so that misordered node > > blocks are detected as metadata corruption. > > > > Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com> There were good reasons why this wasn't implemented originally but marked with an 'XXX: <question>' comment. It's not because it is hard to implement, but because it is an -expensive- check and it is has never been clear that the check would ever actually find corruptions in production systems. That is, any media level error that could corrupt a hash value will be caught by the CRCs, hence the only vector for a production system to experience an out of order hash value is a bug in the XFS kernel code. This situation has not changed in the past 12-13 years, so.... ... what's the CPU cost of doing this check? How much performance does this cost for cold cache directory traversal? What if we have 64kB directory blocks and a full dabtree node? i.e. this will iterate 64kB of 4 byte hash values one at time, so how much does that cost? What is the possible impacts of an out of order has entry? Lookups may not find the entry they are looking for, so files might be missing from directories. Even modifications will simply result in more out of order hash values or trip over out of order keys in path traversal which will trigger corruption reports and maybe shutdowns. I'm not sure there is anything else that can go wrong at runtime. Hence even fuzzers like syzbot creating out of order hash entries shouldn't result in crashes or other sorts of serious failures. In which case, how much extra protection against corruption based failures does runtime detection in production systems actually gain us? Is it worth the cost of running this verification on every directory node read and write? If the only real issue we need to defend against is software bugs, then we should be modelling the verification on xfs_dir3_data_check(). That's runtime dir3 data block check function that is only run on DEBUG kernels. It is placed in the data block modification code to catch corruptions immediately after the broken modification code has created them. This seems like a similar situation to me. -Dave. -- Dave Chinner dgc@kernel.org
On Thu, Apr 16, 2026 at 10:39:22AM +1000, Dave Chinner wrote: > There were good reasons why this wasn't implemented originally but > marked with an 'XXX: <question>' comment. It's not because it is > hard to implement, but because it is an -expensive- check and it is > has never been clear that the check would ever actually find > corruptions in production systems. > > That is, any media level error that could corrupt a hash value will > be caught by the CRCs, hence the only vector for a production system > to experience an out of order hash value is a bug in the XFS kernel > code. > > This situation has not changed in the past 12-13 years, so.... > > ... what's the CPU cost of doing this check? How much performance > does this cost for cold cache directory traversal? I confirmed that the hash order check does detect misordered entries as expected, but I have not measured the performance cost yet. Thank you for pointing that out. > What if we have 64kB directory blocks and a full dabtree node? i.e. > this will iterate 64kB of 4 byte hash values one at time, so how > much does that cost? > > What is the possible impacts of an out of order has entry? Lookups > may not find the entry they are looking for, so files might be > missing from directories. Even modifications will simply result in > more out of order hash values or trip over out of order keys in path > traversal which will trigger corruption reports and maybe shutdowns. > I'm not sure there is anything else that can go wrong at runtime. > > Hence even fuzzers like syzbot creating out of order hash entries > shouldn't result in crashes or other sorts of serious failures. > > In which case, how much extra protection against corruption based > failures does runtime detection in production systems actually gain > us? Is it worth the cost of running this verification on every > directory node read and write? > > If the only real issue we need to defend against is software bugs, > then we should be modelling the verification on > xfs_dir3_data_check(). That's runtime dir3 data block check > function that is only run on DEBUG kernels. It is placed in the data > block modification code to catch corruptions immediately after the > broken modification code has created them. This seems like a similar > situation to me. > > -Dave. > -- > Dave Chinner > dgc@kernel.org That makes sense. I'll rework this as a DEBUG-only check modelled on xfs_dir3_data_check(), and adopt Darrick's style suggestions as well. Thanks for the review. Yuto Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284 Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705
© 2016 - 2026 Red Hat, Inc.