fs/xfs/libxfs/xfs_da_btree.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
The xfs_da3_node_verify() function checks the integrity of directory
and attribute B-tree node blocks. However, it was missing a check to
ensure that the hash values of the btree entries within the node are
strictly increasing, as required by the B-tree structure.
Add a loop to iterate through the btree entries and verify that each
entry's hash value is greater than the previous one. If an
out-of-order hash value is detected, return failure to indicate
corruption.
This addresses the "XXX: hash order check?" comment and improves
corruption detection for DA node blocks.
Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
---
fs/xfs/libxfs/xfs_da_btree.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index 17d9e6154f1978ce5a5cb82176eea4d6b9cd768d..6c748911e54619c3ceae9b81f55cf61da6735f01 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -247,7 +247,16 @@ xfs_da3_node_verify(
ichdr.count > mp->m_attr_geo->node_ents)
return __this_address;
- /* XXX: hash order check? */
+ /* Check hash order */
+ uint32_t prev_hash = be32_to_cpu(ichdr.btree[0].hashval);
+
+ for (int i = 1; i < ichdr.count; i++) {
+ uint32_t curr_hash = be32_to_cpu(ichdr.btree[i].hashval);
+
+ if (curr_hash <= prev_hash)
+ return __this_address;
+ prev_hash = curr_hash;
+ }
return NULL;
}
---
base-commit: ecd5d67ad602c2c12e8709762717112ef0958767
change-id: 20250412-xfs-hash-check-be7397881a2c
Best regards,
--
Charalampos Mitrodimas <charmitro@posteo.net>
On Sat, Apr 12, 2025 at 08:03:57PM +0000, Charalampos Mitrodimas wrote:
> The xfs_da3_node_verify() function checks the integrity of directory
> and attribute B-tree node blocks. However, it was missing a check to
> ensure that the hash values of the btree entries within the node are
> strictly increasing, as required by the B-tree structure.
>
> Add a loop to iterate through the btree entries and verify that each
> entry's hash value is greater than the previous one. If an
> out-of-order hash value is detected, return failure to indicate
> corruption.
>
> This addresses the "XXX: hash order check?" comment and improves
> corruption detection for DA node blocks.
>
> Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
> ---
> fs/xfs/libxfs/xfs_da_btree.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index 17d9e6154f1978ce5a5cb82176eea4d6b9cd768d..6c748911e54619c3ceae9b81f55cf61da6735f01 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -247,7 +247,16 @@ xfs_da3_node_verify(
> ichdr.count > mp->m_attr_geo->node_ents)
> return __this_address;
>
> - /* XXX: hash order check? */
> + /* Check hash order */
> + uint32_t prev_hash = be32_to_cpu(ichdr.btree[0].hashval);
> +
> + for (int i = 1; i < ichdr.count; i++) {
> + uint32_t curr_hash = be32_to_cpu(ichdr.btree[i].hashval);
> +
> + if (curr_hash <= prev_hash)
> + return __this_address;
> + prev_hash = curr_hash;
> + }
Hmmm. Do you have any numbers related to the performance impact of this patch?
IIRC for very populated directories we can end up having many entries here. It's
not uncommon to have filesystems with millions of entries in a single directory.
Now we'll be looping over all those entries here during verification, which could
scale to many interactions on this loop.
I'm not sure if I'm right here, but this seems to add a big performance penalty
for directory writes, so I'm curious about the performance implications of this
patch.
>
> return NULL;
> }
>
> ---
> base-commit: ecd5d67ad602c2c12e8709762717112ef0958767
> change-id: 20250412-xfs-hash-check-be7397881a2c
>
> Best regards,
> --
> Charalampos Mitrodimas <charmitro@posteo.net>
>
On Wed, Apr 30, 2025 at 11:23:57AM +0200, Carlos Maiolino wrote:
> On Sat, Apr 12, 2025 at 08:03:57PM +0000, Charalampos Mitrodimas wrote:
> > The xfs_da3_node_verify() function checks the integrity of directory
> > and attribute B-tree node blocks. However, it was missing a check to
> > ensure that the hash values of the btree entries within the node are
> > strictly increasing, as required by the B-tree structure.
> >
> > Add a loop to iterate through the btree entries and verify that each
> > entry's hash value is greater than the previous one. If an
> > out-of-order hash value is detected, return failure to indicate
> > corruption.
> >
> > This addresses the "XXX: hash order check?" comment and improves
> > corruption detection for DA node blocks.
> >
> > Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
> > ---
> > fs/xfs/libxfs/xfs_da_btree.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> > index 17d9e6154f1978ce5a5cb82176eea4d6b9cd768d..6c748911e54619c3ceae9b81f55cf61da6735f01 100644
> > --- a/fs/xfs/libxfs/xfs_da_btree.c
> > +++ b/fs/xfs/libxfs/xfs_da_btree.c
> > @@ -247,7 +247,16 @@ xfs_da3_node_verify(
> > ichdr.count > mp->m_attr_geo->node_ents)
> > return __this_address;
> >
> > - /* XXX: hash order check? */
> > + /* Check hash order */
> > + uint32_t prev_hash = be32_to_cpu(ichdr.btree[0].hashval);
> > +
> > + for (int i = 1; i < ichdr.count; i++) {
> > + uint32_t curr_hash = be32_to_cpu(ichdr.btree[i].hashval);
> > +
> > + if (curr_hash <= prev_hash)
> > + return __this_address;
> > + prev_hash = curr_hash;
> > + }
>
> Hmmm. Do you have any numbers related to the performance impact of this patch?
>
> IIRC for very populated directories we can end up having many entries here. It's
> not uncommon to have filesystems with millions of entries in a single directory.
> Now we'll be looping over all those entries here during verification, which could
> scale to many interactions on this loop.
> I'm not sure if I'm right here, but this seems to add a big performance penalty
> for directory writes, so I'm curious about the performance implications of this
> patch.
It's only a single dabtree block, which will likely be warm in cache
due to the crc32c validation.
But if memory serves, one can create a large enough dir (or xattr)
structure such that a dabtree node gets written out with a bunch of
entries with the same hashval. That was the subject of the correction
made in commit b7b81f336ac02f ("xfs_repair: fix incorrect dabtree
hashval comparison") so I've been wondering if this passes the xfs/599
test? Or am I just being dumb?
--D
> >
> > return NULL;
> > }
> >
> > ---
> > base-commit: ecd5d67ad602c2c12e8709762717112ef0958767
> > change-id: 20250412-xfs-hash-check-be7397881a2c
> >
> > Best regards,
> > --
> > Charalampos Mitrodimas <charmitro@posteo.net>
> >
>
On Thu, May 01, 2025 at 07:12:49AM -0700, Darrick J. Wong wrote:
> On Wed, Apr 30, 2025 at 11:23:57AM +0200, Carlos Maiolino wrote:
> > On Sat, Apr 12, 2025 at 08:03:57PM +0000, Charalampos Mitrodimas wrote:
> > > The xfs_da3_node_verify() function checks the integrity of directory
> > > and attribute B-tree node blocks. However, it was missing a check to
> > > ensure that the hash values of the btree entries within the node are
> > > strictly increasing, as required by the B-tree structure.
> > >
> > > Add a loop to iterate through the btree entries and verify that each
> > > entry's hash value is greater than the previous one. If an
> > > out-of-order hash value is detected, return failure to indicate
> > > corruption.
> > >
> > > This addresses the "XXX: hash order check?" comment and improves
> > > corruption detection for DA node blocks.
> > >
> > > Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
> > > ---
> > > fs/xfs/libxfs/xfs_da_btree.c | 11 ++++++++++-
> > > 1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> > > index 17d9e6154f1978ce5a5cb82176eea4d6b9cd768d..6c748911e54619c3ceae9b81f55cf61da6735f01 100644
> > > --- a/fs/xfs/libxfs/xfs_da_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_da_btree.c
> > > @@ -247,7 +247,16 @@ xfs_da3_node_verify(
> > > ichdr.count > mp->m_attr_geo->node_ents)
> > > return __this_address;
> > >
> > > - /* XXX: hash order check? */
> > > + /* Check hash order */
> > > + uint32_t prev_hash = be32_to_cpu(ichdr.btree[0].hashval);
> > > +
> > > + for (int i = 1; i < ichdr.count; i++) {
> > > + uint32_t curr_hash = be32_to_cpu(ichdr.btree[i].hashval);
> > > +
> > > + if (curr_hash <= prev_hash)
> > > + return __this_address;
> > > + prev_hash = curr_hash;
> > > + }
> >
> > Hmmm. Do you have any numbers related to the performance impact of this patch?
> >
> > IIRC for very populated directories we can end up having many entries here. It's
> > not uncommon to have filesystems with millions of entries in a single directory.
> > Now we'll be looping over all those entries here during verification, which could
> > scale to many interactions on this loop.
> > I'm not sure if I'm right here, but this seems to add a big performance penalty
> > for directory writes, so I'm curious about the performance implications of this
> > patch.
>
> It's only a single dabtree block, which will likely be warm in cache
> due to the crc32c validation.
I assumed this could be called to verify a leaf block, a look at the code
seems that's not the case, so, this is perhaps harmless related to performance.
>
> But if memory serves, one can create a large enough dir (or xattr)
> structure such that a dabtree node gets written out with a bunch of
> entries with the same hashval. That was the subject of the correction
> made in commit b7b81f336ac02f ("xfs_repair: fix incorrect dabtree
> hashval comparison") so I've been wondering if this passes the xfs/599
> test? Or am I just being dumb?
>
> --D
>
> > >
> > > return NULL;
> > > }
> > >
> > > ---
> > > base-commit: ecd5d67ad602c2c12e8709762717112ef0958767
> > > change-id: 20250412-xfs-hash-check-be7397881a2c
> > >
> > > Best regards,
> > > --
> > > Charalampos Mitrodimas <charmitro@posteo.net>
> > >
> >
"Darrick J. Wong" <djwong@kernel.org> writes:
> On Wed, Apr 30, 2025 at 11:23:57AM +0200, Carlos Maiolino wrote:
>> On Sat, Apr 12, 2025 at 08:03:57PM +0000, Charalampos Mitrodimas wrote:
>> > The xfs_da3_node_verify() function checks the integrity of directory
>> > and attribute B-tree node blocks. However, it was missing a check to
>> > ensure that the hash values of the btree entries within the node are
>> > strictly increasing, as required by the B-tree structure.
>> >
>> > Add a loop to iterate through the btree entries and verify that each
>> > entry's hash value is greater than the previous one. If an
>> > out-of-order hash value is detected, return failure to indicate
>> > corruption.
>> >
>> > This addresses the "XXX: hash order check?" comment and improves
>> > corruption detection for DA node blocks.
>> >
>> > Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
>> > ---
>> > fs/xfs/libxfs/xfs_da_btree.c | 11 ++++++++++-
>> > 1 file changed, 10 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
>> > index 17d9e6154f1978ce5a5cb82176eea4d6b9cd768d..6c748911e54619c3ceae9b81f55cf61da6735f01 100644
>> > --- a/fs/xfs/libxfs/xfs_da_btree.c
>> > +++ b/fs/xfs/libxfs/xfs_da_btree.c
>> > @@ -247,7 +247,16 @@ xfs_da3_node_verify(
>> > ichdr.count > mp->m_attr_geo->node_ents)
>> > return __this_address;
>> >
>> > - /* XXX: hash order check? */
>> > + /* Check hash order */
>> > + uint32_t prev_hash = be32_to_cpu(ichdr.btree[0].hashval);
>> > +
>> > + for (int i = 1; i < ichdr.count; i++) {
>> > + uint32_t curr_hash = be32_to_cpu(ichdr.btree[i].hashval);
>> > +
>> > + if (curr_hash <= prev_hash)
>> > + return __this_address;
>> > + prev_hash = curr_hash;
>> > + }
>>
>> Hmmm. Do you have any numbers related to the performance impact of this patch?
>>
>> IIRC for very populated directories we can end up having many entries here. It's
>> not uncommon to have filesystems with millions of entries in a single directory.
>> Now we'll be looping over all those entries here during verification, which could
>> scale to many interactions on this loop.
>> I'm not sure if I'm right here, but this seems to add a big performance penalty
>> for directory writes, so I'm curious about the performance implications of this
>> patch.
>
> It's only a single dabtree block, which will likely be warm in cache
> due to the crc32c validation.
I ran a 60-second fio test that creates directories. Performance was not
significantly changed:
Before: read: IOPS=4809k, BW=18.3GiB/s (19.7GB/s)(1101GiB/60001msec)
After: read: IOPS=5121k, BW=19.5GiB/s (20.0GB/s)(1172GiB/60000msec)
But I'd welcome input on more targeted benchmarks if these aren't
representative.
>
> But if memory serves, one can create a large enough dir (or xattr)
> structure such that a dabtree node gets written out with a bunch of
> entries with the same hashval. That was the subject of the correction
> made in commit b7b81f336ac02f ("xfs_repair: fix incorrect dabtree
> hashval comparison") so I've been wondering if this passes the xfs/599
> test? Or am I just being dumb?
I've tested the patch with xfs/599 as you suggested, and found the
issue. The test fails with:
if (curr_hash <= prev_hash)
return __this_address;
But passes with:
if (curr_hash < prev_hash)
return __this_address;
XFS supports entries with identical hash values. This aligns with commit
b7b81f336ac02f ("xfs_repair: fix incorrect dabtree hashval comparison").
I'll send a v2 that checks for non-decreasing hash values (allowing
equality), rather than strictly increasing ones.
>
> --D
>
>> >
>> > return NULL;
>> > }
>> >
>> > ---
>> > base-commit: ecd5d67ad602c2c12e8709762717112ef0958767
>> > change-id: 20250412-xfs-hash-check-be7397881a2c
>> >
>> > Best regards,
>> > --
>> > Charalampos Mitrodimas <charmitro@posteo.net>
>> >
>>
"Darrick J. Wong" <djwong@kernel.org> writes:
> On Wed, Apr 30, 2025 at 11:23:57AM +0200, Carlos Maiolino wrote:
>> On Sat, Apr 12, 2025 at 08:03:57PM +0000, Charalampos Mitrodimas wrote:
>> > The xfs_da3_node_verify() function checks the integrity of directory
>> > and attribute B-tree node blocks. However, it was missing a check to
>> > ensure that the hash values of the btree entries within the node are
>> > strictly increasing, as required by the B-tree structure.
>> >
>> > Add a loop to iterate through the btree entries and verify that each
>> > entry's hash value is greater than the previous one. If an
>> > out-of-order hash value is detected, return failure to indicate
>> > corruption.
>> >
>> > This addresses the "XXX: hash order check?" comment and improves
>> > corruption detection for DA node blocks.
>> >
>> > Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
>> > ---
>> > fs/xfs/libxfs/xfs_da_btree.c | 11 ++++++++++-
>> > 1 file changed, 10 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
>> > index 17d9e6154f1978ce5a5cb82176eea4d6b9cd768d..6c748911e54619c3ceae9b81f55cf61da6735f01 100644
>> > --- a/fs/xfs/libxfs/xfs_da_btree.c
>> > +++ b/fs/xfs/libxfs/xfs_da_btree.c
>> > @@ -247,7 +247,16 @@ xfs_da3_node_verify(
>> > ichdr.count > mp->m_attr_geo->node_ents)
>> > return __this_address;
>> >
>> > - /* XXX: hash order check? */
>> > + /* Check hash order */
>> > + uint32_t prev_hash = be32_to_cpu(ichdr.btree[0].hashval);
>> > +
>> > + for (int i = 1; i < ichdr.count; i++) {
>> > + uint32_t curr_hash = be32_to_cpu(ichdr.btree[i].hashval);
>> > +
>> > + if (curr_hash <= prev_hash)
>> > + return __this_address;
>> > + prev_hash = curr_hash;
>> > + }
>>
>> Hmmm. Do you have any numbers related to the performance impact of this patch?
>>
>> IIRC for very populated directories we can end up having many entries here. It's
>> not uncommon to have filesystems with millions of entries in a single directory.
>> Now we'll be looping over all those entries here during verification, which could
>> scale to many interactions on this loop.
>> I'm not sure if I'm right here, but this seems to add a big performance penalty
>> for directory writes, so I'm curious about the performance implications of this
>> patch.
>
> It's only a single dabtree block, which will likely be warm in cache
> due to the crc32c validation.
Regardless, what is a good method of measuring the penalty, if any?
>
> But if memory serves, one can create a large enough dir (or xattr)
> structure such that a dabtree node gets written out with a bunch of
> entries with the same hashval. That was the subject of the correction
> made in commit b7b81f336ac02f ("xfs_repair: fix incorrect dabtree
> hashval comparison") so I've been wondering if this passes the xfs/599
> test? Or am I just being dumb?
I'll rebase (in case) give it a try over the next weekend and reach
back. AFAIR all tests where okay, but might gives us a hint if it is
failing now.
Thanks for the review Darrick and Carlos.
C. Mitrodimas
>
> --D
>
>> >
>> > return NULL;
>> > }
>> >
>> > ---
>> > base-commit: ecd5d67ad602c2c12e8709762717112ef0958767
>> > change-id: 20250412-xfs-hash-check-be7397881a2c
>> >
>> > Best regards,
>> > --
>> > Charalampos Mitrodimas <charmitro@posteo.net>
>> >
>>
On Sat, Apr 12, 2025 at 08:03:57PM +0000, Charalampos Mitrodimas wrote:
> The xfs_da3_node_verify() function checks the integrity of directory
> and attribute B-tree node blocks. However, it was missing a check to
> ensure that the hash values of the btree entries within the node are
> strictly increasing, as required by the B-tree structure.
>
> Add a loop to iterate through the btree entries and verify that each
> entry's hash value is greater than the previous one. If an
> out-of-order hash value is detected, return failure to indicate
> corruption.
>
> This addresses the "XXX: hash order check?" comment and improves
> corruption detection for DA node blocks.
>
> Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
> ---
> fs/xfs/libxfs/xfs_da_btree.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index 17d9e6154f1978ce5a5cb82176eea4d6b9cd768d..6c748911e54619c3ceae9b81f55cf61da6735f01 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -247,7 +247,16 @@ xfs_da3_node_verify(
> ichdr.count > mp->m_attr_geo->node_ents)
> return __this_address;
>
> - /* XXX: hash order check? */
> + /* Check hash order */
> + uint32_t prev_hash = be32_to_cpu(ichdr.btree[0].hashval);
> +
> + for (int i = 1; i < ichdr.count; i++) {
> + uint32_t curr_hash = be32_to_cpu(ichdr.btree[i].hashval);
> +
> + if (curr_hash <= prev_hash)
Does XFS support a directory with two names that hash to the same value?
--D
> + return __this_address;
> + prev_hash = curr_hash;
> + }
>
> return NULL;
> }
>
> ---
> base-commit: ecd5d67ad602c2c12e8709762717112ef0958767
> change-id: 20250412-xfs-hash-check-be7397881a2c
>
> Best regards,
> --
> Charalampos Mitrodimas <charmitro@posteo.net>
>
>
On 15/4/25 01:15, Darrick J. Wong wrote:
> On Sat, Apr 12, 2025 at 08:03:57PM +0000, Charalampos Mitrodimas wrote:
>> The xfs_da3_node_verify() function checks the integrity of directory
>> and attribute B-tree node blocks. However, it was missing a check to
>> ensure that the hash values of the btree entries within the node are
>> strictly increasing, as required by the B-tree structure.
>>
>> Add a loop to iterate through the btree entries and verify that each
>> entry's hash value is greater than the previous one. If an
>> out-of-order hash value is detected, return failure to indicate
>> corruption.
>>
>> This addresses the "XXX: hash order check?" comment and improves
>> corruption detection for DA node blocks.
>>
>> Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
>> ---
>> fs/xfs/libxfs/xfs_da_btree.c | 11 ++++++++++-
>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
>> index 17d9e6154f1978ce5a5cb82176eea4d6b9cd768d..6c748911e54619c3ceae9b81f55cf61da6735f01 100644
>> --- a/fs/xfs/libxfs/xfs_da_btree.c
>> +++ b/fs/xfs/libxfs/xfs_da_btree.c
>> @@ -247,7 +247,16 @@ xfs_da3_node_verify(
>> ichdr.count > mp->m_attr_geo->node_ents)
>> return __this_address;
>>
>> - /* XXX: hash order check? */
>> + /* Check hash order */
>> + uint32_t prev_hash = be32_to_cpu(ichdr.btree[0].hashval);
>> +
>> + for (int i = 1; i < ichdr.count; i++) {
>> + uint32_t curr_hash = be32_to_cpu(ichdr.btree[i].hashval);
>> +
>> + if (curr_hash <= prev_hash)
> Does XFS support a directory with two names that hash to the same value?
Hi Darrick,
I believe so, yes. These are handled at the leaf level by comparing the
actual names. This adds a check which ensures the intermediate node's
internal B-tree entries are strictly ordered by hash.
C. Mitrodimas
>
> --D
>
>> + return __this_address;
>> + prev_hash = curr_hash;
>> + }
>>
>> return NULL;
>> }
>>
>> ---
>> base-commit: ecd5d67ad602c2c12e8709762717112ef0958767
>> change-id: 20250412-xfs-hash-check-be7397881a2c
>>
>> Best regards,
>> --
>> Charalampos Mitrodimas <charmitro@posteo.net>
>>
>>
© 2016 - 2025 Red Hat, Inc.