[PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker

Posted by Teng Liu 1 month, 2 weeks ago
get_new_location() uses BUG_ON() to crash the kernel if the file extent
item it looks up has any of offset, compression, encryption, or
other_encoding set. The data reloc inode is only written by relocation's
own paths -- insert_prealloc_file_extent() and
insert_ordered_extent_file_extent() -- which always leave those four
fields at 0 (the data reloc inode is created with BTRFS_INODE_NOCOMPRESS,
and encryption/other_encoding are reserved-and-zero). Observing a
non-zero value therefore means the leaf decoded from disk does not match
what the kernel wrote, i.e. on-disk corruption. A malformed image can
reach this code via balance and panic the kernel.

Move the validation into tree-checker's check_extent_data_item(), where
the constraint is enforced when the leaf is read off disk rather than
after relocation has already started. The data reloc tree has a fixed
root id (BTRFS_DATA_RELOC_TREE_OBJECTID) recorded in the extent buffer
header, so check_extent_data_item() has all the information it needs to
apply this check on its own. Report violations via file_extent_err() and
print the four offending values.

In get_new_location() replace the BUG_ON() with an ASSERT().
The caller in replace_file_extents() already handles non-zero returns from
get_new_location() by breaking out of the loop without aborting the
transaction, so no caller changes are needed.

Suggested-by: Qu Wenruo <wqu@suse.com>
Suggested-by: David Sterba <dsterba@suse.com>
Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
---
Changes in v3:
 - Move the corruption check from relocation.c into tree-checker's
   check_extent_data_item(), per Qu and David. The data reloc tree's
   fixed objectid is recorded in the extent buffer header, so the
   check has all the context it needs at read time.
 - Use file_extent_err() and print offset/compression/encryption/
   other_encoding values, per Qu.
 - Replace the BUG_ON in get_new_location() with ASSERT() rather than
   -EUCLEAN, per David.

Changes in v2:
 - Pair the -EUCLEAN return with btrfs_print_leaf() and btrfs_err() so
   the offending leaf is dumped to dmesg, per Qu's review of v1.
 - Expand the changelog to argue why non-zero
   compression/encryption/other_encoding in the data reloc inode imply
   on-disk corruption.

 fs/btrfs/relocation.c   |  8 ++++----
 fs/btrfs/tree-checker.c | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 1c42c5180bdd..527d4dbfe31c 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -835,10 +835,10 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
 	fi = btrfs_item_ptr(leaf, path->slots[0],
 			    struct btrfs_file_extent_item);
 
-	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
-	       btrfs_file_extent_compression(leaf, fi) ||
-	       btrfs_file_extent_encryption(leaf, fi) ||
-	       btrfs_file_extent_other_encoding(leaf, fi));
+	ASSERT(!btrfs_file_extent_offset(leaf, fi) &&
+	       !btrfs_file_extent_compression(leaf, fi) &&
+	       !btrfs_file_extent_encryption(leaf, fi) &&
+	       !btrfs_file_extent_other_encoding(leaf, fi));
 
 	if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi))
 		return -EINVAL;
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 1f15d0793a9c..e4864f7a471e 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -296,6 +296,27 @@ static int check_extent_data_item(struct extent_buffer *leaf,
 		return 0;
 	}
 
+	/*
+	 * For the data reloc tree, file extent items are written by
+	 * relocation's own paths, which always leave offset, compression,
+	 * encryption and other_encoding as 0. Any non-zero value here means
+	 * the leaf decoded from disk does not match what the kernel wrote,
+	 * i.e. on-disk corruption.
+	 */
+	if (unlikely(btrfs_header_owner(leaf) == BTRFS_DATA_RELOC_TREE_OBJECTID &&
+		     (btrfs_file_extent_offset(leaf, fi) ||
+		      btrfs_file_extent_compression(leaf, fi) ||
+		      btrfs_file_extent_encryption(leaf, fi) ||
+		      btrfs_file_extent_other_encoding(leaf, fi)))) {
+		file_extent_err(leaf, slot,
+"invalid members for data reloc tree, offset=%llu compress=%u encryption=%u other_encoding=%u",
+				btrfs_file_extent_offset(leaf, fi),
+				btrfs_file_extent_compression(leaf, fi),
+				btrfs_file_extent_encryption(leaf, fi),
+				btrfs_file_extent_other_encoding(leaf, fi));
+		return -EUCLEAN;
+	}
+
 	/* Regular or preallocated extent has fixed item size */
 	if (unlikely(item_size != sizeof(*fi))) {
 		file_extent_err(leaf, slot,
-- 
2.54.0
Re: [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker
Posted by Johannes Thumshirn 1 month, 2 weeks ago
On 4/27/26 10:24 PM, Teng Liu wrote:
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 1c42c5180bdd..527d4dbfe31c 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -835,10 +835,10 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
>   	fi = btrfs_item_ptr(leaf, path->slots[0],
>   			    struct btrfs_file_extent_item);
>   
> -	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
> -	       btrfs_file_extent_compression(leaf, fi) ||
> -	       btrfs_file_extent_encryption(leaf, fi) ||
> -	       btrfs_file_extent_other_encoding(leaf, fi));
> +	ASSERT(!btrfs_file_extent_offset(leaf, fi) &&
> +	       !btrfs_file_extent_compression(leaf, fi) &&
> +	       !btrfs_file_extent_encryption(leaf, fi) &&
> +	       !btrfs_file_extent_other_encoding(leaf, fi));

Can you split that into multiple ASSERT()s? So we quickly see which one 
actually triggered.
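
For illustration, a minimal userspace sketch of that suggestion. It assumes a hypothetical `struct fe_fields` standing in for the on-disk accessors and uses plain `assert()` rather than the btrfs `ASSERT()` macro, so it is not the kernel code itself. With one assertion per member, the failure message names the specific expression that was false:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the four file extent item members. */
struct fe_fields {
	uint64_t offset;
	uint8_t compression;
	uint8_t encryption;
	uint16_t other_encoding;
};

/*
 * One assertion per member: a failure reports exactly which
 * expression was false instead of one combined condition.
 */
static void assert_reloc_fields_zero(const struct fe_fields *fe)
{
	assert(fe->offset == 0);
	assert(fe->compression == 0);
	assert(fe->encryption == 0);
	assert(fe->other_encoding == 0);
}
```

Calling `assert_reloc_fields_zero()` on an all-zero item returns normally; a non-zero member aborts at the assertion for that member.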
Re: [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker
Posted by Qu Wenruo 1 month, 2 weeks ago

On 2026/4/28 05:54, Teng Liu wrote:
> get_new_location() uses BUG_ON() to crash the kernel if the file extent
> item it looks up has any of offset, compression, encryption, or
> other_encoding set. The data reloc inode is only written by relocation's
> own paths -- insert_prealloc_file_extent() and
> insert_ordered_extent_file_extent() -- which always leave those four
> fields at 0 (the data reloc inode is created with BTRFS_INODE_NOCOMPRESS,
> and encryption/other_encoding are reserved-and-zero). Observing a
> non-zero value therefore means the leaf decoded from disk does not match
> what the kernel wrote, i.e. on-disk corruption. A malformed image can
> reach this code via balance and panic the kernel.
> 
> Move the validation into tree-checker's check_extent_data_item(), where
> the constraint is enforced when the leaf is read off disk rather than
> after relocation has already started. The data reloc tree has a fixed
> root id (BTRFS_DATA_RELOC_TREE_OBJECTID) recorded in the extent buffer
> header, so check_extent_data_item() has all the information it needs to
> apply this check on its own. Report violations via file_extent_err() and
> print the four offending values.
> 
> In get_new_location() replace the BUG_ON() with an ASSERT().
> The caller in replace_file_extents() already handles non-zero returns from
> get_new_location() by breaking out of the loop without aborting the
> transaction, so no caller changes are needed.
> 
> Suggested-by: Qu Wenruo <wqu@suse.com>
> Suggested-by: David Sterba <dsterba@suse.com>
> Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
> Signed-off-by: Teng Liu <27rabbitlt@gmail.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

And merged.

Thanks,
Qu

Re: [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker
Posted by Qu Wenruo 1 month, 2 weeks ago

On 2026/4/28 07:45, Qu Wenruo wrote:
> 
> 
> On 2026/4/28 05:54, Teng Liu wrote:
>> get_new_location() uses BUG_ON() to crash the kernel if the file extent
>> item it looks up has any of offset, compression, encryption, or
>> other_encoding set. The data reloc inode is only written by relocation's
>> own paths -- insert_prealloc_file_extent() and
>> insert_ordered_extent_file_extent() -- which always leave those four
>> fields at 0 (the data reloc inode is created with BTRFS_INODE_NOCOMPRESS,
>> and encryption/other_encoding are reserved-and-zero). Observing a
>> non-zero value therefore means the leaf decoded from disk does not match
>> what the kernel wrote, i.e. on-disk corruption. A malformed image can
>> reach this code via balance and panic the kernel.
>>
>> Move the validation into tree-checker's check_extent_data_item(), where
>> the constraint is enforced when the leaf is read off disk rather than
>> after relocation has already started. The data reloc tree has a fixed
>> root id (BTRFS_DATA_RELOC_TREE_OBJECTID) recorded in the extent buffer
>> header, so check_extent_data_item() has all the information it needs to
>> apply this check on its own. Report violations via file_extent_err() and
>> print the four offending values.
>>
>> In get_new_location() replace the BUG_ON() with an ASSERT().
>> The caller in replace_file_extents() already handles non-zero returns from
>> get_new_location() by breaking out of the loop without aborting the
>> transaction, so no caller changes are needed.
>>
>> Suggested-by: Qu Wenruo <wqu@suse.com>
>> Suggested-by: David Sterba <dsterba@suse.com>
>> Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
>> Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
> 
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> 
> And merged.

Unfortunately this check was triggered in the write-time tree-checker
during btrfs/061 runs on arm64 with 64K page size.

The offending file extent is as follows:

[  536.885066] 	item 69 key (258 EXTENT_DATA 4063232) itemoff 12400 itemsize 53
[  536.885067] 		generation 28 type 1
[  536.885067] 		extent data disk bytenr 10512723968 nr 36864
[  536.885068] 		extent data offset 24576 nr 12288 ram 36864
[  536.885069] 		extent compression 0

Note the offset is not zero, and the type is 1 which means it's a 
regular file extent.

So the check is causing false alerts.

Now the patch is reverted, and I'll spend more time digging into the case.
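
As a reading aid for the dump above, here is a small userspace sketch of the arithmetic. The struct and helper names are hypothetical and only the numbers come from the dump. It checks that offset plus num_bytes equals ram_bytes, i.e. that the item references the tail of a larger extent rather than an extent starting at offset 0:

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the leaf dump above (item 69). */
struct dumped_extent {
	uint64_t key_offset;      /* file offset from the item key */
	uint64_t disk_num_bytes;  /* "nr" on the disk bytenr line */
	uint64_t offset;          /* "extent data offset" */
	uint64_t num_bytes;       /* "nr" on the offset line */
	uint64_t ram_bytes;       /* "ram" */
};

static const struct dumped_extent item69 = {
	.key_offset = 4063232,
	.disk_num_bytes = 36864,
	.offset = 24576,
	.num_bytes = 12288,
	.ram_bytes = 36864,
};

/* True when the item references the tail of the extent. */
static int covers_tail(const struct dumped_extent *e)
{
	return e->offset != 0 && e->offset + e->num_bytes == e->ram_bytes;
}

/* File offset at which the whole extent would begin. */
static uint64_t extent_file_start(const struct dumped_extent *e)
{
	return e->key_offset - e->offset;
}
```

For this item, 24576 + 12288 = 36864, which matches ram_bytes, and the extent as a whole would begin at file offset 4063232 - 24576 = 4038656.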

Thanks,
Qu

Re: [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker
Posted by David Sterba 1 month, 2 weeks ago
On Tue, Apr 28, 2026 at 10:14:40AM +0930, Qu Wenruo wrote:
> 
> 
> On 2026/4/28 07:45, Qu Wenruo wrote:
> > 
> > 
> > On 2026/4/28 05:54, Teng Liu wrote:
> >> get_new_location() uses BUG_ON() to crash the kernel if the file extent
> >> item it looks up has any of offset, compression, encryption, or
> >> other_encoding set. The data reloc inode is only written by relocation's
> >> own paths -- insert_prealloc_file_extent() and
> >> insert_ordered_extent_file_extent() -- which always leave those four
> >> fields at 0 (the data reloc inode is created with BTRFS_INODE_NOCOMPRESS,
> >> and encryption/other_encoding are reserved-and-zero). Observing a
> >> non-zero value therefore means the leaf decoded from disk does not match
> >> what the kernel wrote, i.e. on-disk corruption. A malformed image can
> >> reach this code via balance and panic the kernel.
> >>
> >> Move the validation into tree-checker's check_extent_data_item(), where
> >> the constraint is enforced when the leaf is read off disk rather than
> >> after relocation has already started. The data reloc tree has a fixed
> >> root id (BTRFS_DATA_RELOC_TREE_OBJECTID) recorded in the extent buffer
> >> header, so check_extent_data_item() has all the information it needs to
> >> apply this check on its own. Report violations via file_extent_err() and
> >> print the four offending values.
> >>
> >> In get_new_location() replace the BUG_ON() with an ASSERT().
> >> The caller in replace_file_extents() already handles non-zero returns from
> >> get_new_location() by breaking out of the loop without aborting the
> >> transaction, so no caller changes are needed.
> >>
> >> Suggested-by: Qu Wenruo <wqu@suse.com>
> >> Suggested-by: David Sterba <dsterba@suse.com>
> >> Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
> >> Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
> >> Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
> > 
> > Reviewed-by: Qu Wenruo <wqu@suse.com>
> > 
> > And merged.
> 
> Unfortunately this check was triggered in the write-time tree-checker
> during btrfs/061 runs on arm64 with 64K page size.
> 
> The offending file extent is as follows:
> 
> [  536.885066] 	item 69 key (258 EXTENT_DATA 4063232) itemoff 12400 itemsize 53
> [  536.885067] 		generation 28 type 1
> [  536.885067] 		extent data disk bytenr 10512723968 nr 36864
> [  536.885068] 		extent data offset 24576 nr 12288 ram 36864
> [  536.885069] 		extent compression 0
> 
> Note the offset is not zero, and the type is 1 which means it's a 
> regular file extent.
> 
> So the check is causing false alerts.

I may have an idea.  The difference between the BUG_ON and the
tree-checker is the context where it's called. In relocation it's
somewhere in the middle and there are actions fixing up the offset. OTOH
when this is done in tree-checker the constraints are different.

  get_new_location() - verifies offset, compression, ...

The offset corresponds to 'bytenr' and is returned via *new_bytenr to
replace_file_extents() and then updated in the leaf

  btrfs_set_file_extent_disk_bytenr(leaf, fi, new_bytenr);

This eventually ends up in the pre-write check.
[PATCH v4] btrfs: validate data reloc tree file extent item members
Posted by Teng Liu 1 month ago
get_new_location() uses BUG_ON() to crash the kernel if the file extent
item it looks up has any of offset, compression, encryption, or
other_encoding set non-zero. The data reloc inode is only written by
relocation's own paths and the four fields are always 0 in what the
kernel writes:

  - insert_prealloc_file_extent() memsets the stack item to zero and
    only fills in type, disk_bytenr, disk_num_bytes and num_bytes, so
    offset/compression/encryption/other_encoding stay 0.
  - insert_ordered_extent_file_extent() copies oe->compress_type into
    the file extent's compression field, but the data reloc inode is
    created with BTRFS_INODE_NOCOMPRESS so compress_type is always 0;
    encryption and other_encoding are reserved-and-zero in btrfs.

A non-zero value here means the leaf decoded from disk does not match
what the kernel wrote, i.e. on-disk corruption. A malformed image
reaches this code via balance and panics the kernel.

A previous attempt to enforce all four constraints in tree-checker's
check_extent_data_item() was merged as commit 7d0ee95979e9 ("btrfs:
validate data reloc tree file extent item members in tree-checker")
and then reverted by commit 1c034697fcaa after btrfs/061 produced
false positives on arm64 with 64K pages. The reason: relocation
writeback legitimately produces REG file_extent_items with offset != 0
in the data reloc tree. When an ordered extent covers only the back
portion of an underlying PREALLOC (num_bytes < ram_bytes on the input
file_extent), insert_ordered_extent_file_extent() inserts a REG with

  offset    = oe->offset
  num_bytes = oe->num_bytes
  ram_bytes preserved from the original PREALLOC,

and this item can reach disk if a transaction commit fires while it
is present in the leaf.

The four fields belong in different layers:

  - compression, encryption and other_encoding are universal
    invariants for every item in the data reloc tree, regardless of
    cluster geometry. Enforce them in tree-checker's
    check_extent_data_item() so a corrupt leaf is rejected at read
    time.

  - offset is only an invariant at the cluster-boundary keys that
    get_new_location() searches (the key is computed as
    src_disk_bytenr - reloc_block_group_start). Partial-PREALLOC
    writebacks legitimately place REG items at non-boundary keys with
    offset != 0; tree-checker cannot reject these. The cluster-
    boundary item is always written by either
    insert_prealloc_file_extent() (offset=0 by memset) or by the
    front portion of a partial writeback (offset=0 by construction),
    so a non-zero offset there is corruption.

Enforce the universal invariants in check_extent_data_item() with a
file_extent_err() rejection. Convert the BUG_ON() in
get_new_location() to a -EUCLEAN return paired with btrfs_print_leaf()
and btrfs_err() so the offending leaf is logged. The caller in
replace_file_extents() already handles non-zero returns from
get_new_location() by breaking out of the loop without aborting the
transaction.
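
The split described above can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the patch itself: `struct fe_model` and all three helper names are hypothetical, and `EUCLEAN` is defined locally only in case the host headers lack it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value; defined here only for portability */
#endif

/* Hypothetical userspace model of a file extent item. */
struct fe_model {
	uint64_t offset;
	uint8_t compression;
	uint8_t encryption;
	uint16_t other_encoding;
};

/* Key offset searched in the data reloc inode for a source extent. */
static uint64_t reloc_key_offset(uint64_t src_disk_bytenr,
				 uint64_t reloc_block_group_start)
{
	return src_disk_bytenr - reloc_block_group_start;
}

/* Tree-wide rule: applies to every item in the data reloc tree. */
static int check_encoding_fields(const struct fe_model *fe)
{
	if (fe->compression || fe->encryption || fe->other_encoding)
		return -EUCLEAN;
	return 0;
}

/* Call-site rule: applies only to the item found at a cluster-boundary key. */
static int check_boundary_item(const struct fe_model *fe)
{
	if (fe->offset)
		return -EUCLEAN;
	return 0;
}
```

An item with offset 24576 and zeroed encoding fields passes `check_encoding_fields()` at any key, and is rejected by `check_boundary_item()` only if it is the item found at a boundary key.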

Suggested-by: Qu Wenruo <wqu@suse.com>
Suggested-by: David Sterba <dsterba@suse.com>
Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
---
Changes in v4:
 - Split the check by which layer the invariant holds in. Reject
   compression/encryption/other_encoding != 0 in tree-checker (true
   on-disk invariant for the entire data reloc tree). Keep the offset
   check at the call site in get_new_location() (true only at the
   cluster-boundary keys it searches; partial-PREALLOC writeback
   legitimately produces non-zero offset at non-boundary keys, which
   is why the v3 single-rule approach was reverted).
 - Suggested by Qu Wenruo in reply to v3:
   https://lore.kernel.org/linux-btrfs/20260427202822.278326-1-27rabbitlt@gmail.com/

Changes in v3:
 - Moved the entire four-field check from get_new_location() into
   tree-checker's check_extent_data_item(). Replaced BUG_ON() with
   ASSERT() in get_new_location(). Merged as 7d0ee95979e9 and
   reverted by 1c034697fcaa due to false positives in btrfs/061 on
   arm64 64K pages.

Changes in v2:
 - Pair the -EUCLEAN return with btrfs_print_leaf() and btrfs_err()
   so the offending leaf is dumped to dmesg, per Qu's v1 review:
   https://lore.kernel.org/linux-btrfs/6c54901d-5e07-4c46-9553-997b28c93b86@suse.com/
 - Expand the changelog to argue why non-zero compression/encryption/
   other_encoding in the data reloc inode imply on-disk corruption
   rather than a kernel bug.

 fs/btrfs/relocation.c   | 22 ++++++++++++++++++----
 fs/btrfs/tree-checker.c | 27 +++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 1c42c5180bdd..01977fa282db 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -814,6 +814,7 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
 			    u64 bytenr, u64 num_bytes)
 {
 	struct btrfs_root *root = BTRFS_I(reloc_inode)->root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
 	BTRFS_PATH_AUTO_FREE(path);
 	struct btrfs_file_extent_item *fi;
 	struct extent_buffer *leaf;
@@ -835,10 +836,23 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
 	fi = btrfs_item_ptr(leaf, path->slots[0],
 			    struct btrfs_file_extent_item);
 
-	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
-	       btrfs_file_extent_compression(leaf, fi) ||
-	       btrfs_file_extent_encryption(leaf, fi) ||
-	       btrfs_file_extent_other_encoding(leaf, fi));
+	/*
+	 * The cluster-boundary key searched above is always written by
+	 * relocation with offset 0: either by insert_prealloc_file_extent()
+	 * (memsets the stack item to 0) or by the front portion of a partial
+	 * writeback (offset=0 by construction). A non-zero value here means
+	 * the on-disk leaf does not match what relocation wrote, i.e.
+	 * corruption. The other encoding fields are caught earlier by
+	 * tree-checker's check_extent_data_item().
+	 */
+	if (unlikely(btrfs_file_extent_offset(leaf, fi))) {
+		btrfs_print_leaf(leaf);
+		btrfs_err(fs_info,
+"unexpected non-zero offset in file extent item for data reloc inode %llu key offset %llu offset %llu",
+			  btrfs_ino(BTRFS_I(reloc_inode)), bytenr,
+			  btrfs_file_extent_offset(leaf, fi));
+		return -EUCLEAN;
+	}
 
 	if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi))
 		return -EINVAL;
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 1f15d0793a9c..8fc919dc08d0 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -296,6 +296,33 @@ static int check_extent_data_item(struct extent_buffer *leaf,
 		return 0;
 	}
 
+	/*
+	 * For the data reloc tree, file extent items are written by
+	 * relocation's own paths. The data reloc inode is created with
+	 * BTRFS_INODE_NOCOMPRESS, so insert_ordered_extent_file_extent()
+	 * always leaves the compression field at 0. Encryption and
+	 * other_encoding are reserved-and-zero in btrfs. A non-zero value
+	 * for any of these means the leaf decoded from disk does not match
+	 * what the kernel wrote, i.e. on-disk corruption.
+	 *
+	 * The file_extent_item's offset field is NOT a universal invariant
+	 * here: partial-PREALLOC writebacks legitimately produce REG items
+	 * with non-zero offset at non-boundary keys. The offset check is
+	 * performed at the call site in get_new_location(), which only
+	 * inspects cluster-boundary keys where offset is always 0.
+	 */
+	if (unlikely(btrfs_header_owner(leaf) == BTRFS_DATA_RELOC_TREE_OBJECTID &&
+		     (btrfs_file_extent_compression(leaf, fi) ||
+		      btrfs_file_extent_encryption(leaf, fi) ||
+		      btrfs_file_extent_other_encoding(leaf, fi)))) {
+		file_extent_err(leaf, slot,
+"invalid encoding fields for data reloc tree, compression=%u encryption=%u other_encoding=%u",
+				btrfs_file_extent_compression(leaf, fi),
+				btrfs_file_extent_encryption(leaf, fi),
+				btrfs_file_extent_other_encoding(leaf, fi));
+		return -EUCLEAN;
+	}
+
 	/* Regular or preallocated extent has fixed item size */
 	if (unlikely(item_size != sizeof(*fi))) {
 		file_extent_err(leaf, slot,

base-commit: 6bf684b8823552b99c86bf791b22f622934ee771
-- 
2.54.0
Re: [PATCH v4] btrfs: validate data reloc tree file extent item members
Posted by David Sterba 3 weeks, 1 day ago
On Wed, May 13, 2026 at 01:35:44PM +0200, Teng Liu wrote:
> get_new_location() uses BUG_ON() to crash the kernel if the file extent
> item it looks up has any of offset, compression, encryption, or
> other_encoding set non-zero. The data reloc inode is only written by
> relocation's own paths and the four fields are always 0 in what the
> kernel writes:
> 
>   - insert_prealloc_file_extent() memsets the stack item to zero and
>     only fills in type, disk_bytenr, disk_num_bytes and num_bytes, so
>     offset/compression/encryption/other_encoding stay 0.
>   - insert_ordered_extent_file_extent() copies oe->compress_type into
>     the file extent's compression field, but the data reloc inode is
>     created with BTRFS_INODE_NOCOMPRESS so compress_type is always 0;
>     encryption and other_encoding are reserved-and-zero in btrfs.
> 
> A non-zero value here means the leaf decoded from disk does not match
> what the kernel wrote, i.e. on-disk corruption. A malformed image
> reaches this code via balance and panics the kernel.
> 
> A previous attempt to enforce all four constraints in tree-checker's
> check_extent_data_item() was merged as commit 7d0ee95979e9 ("btrfs:
> validate data reloc tree file extent item members in tree-checker")
> and then reverted by commit 1c034697fcaa after btrfs/061 produced
> false positives on arm64 with 64K pages. The reason: relocation
> writeback legitimately produces REG file_extent_items with offset != 0
> in the data reloc tree. When an ordered extent covers only the back
> portion of an underlying PREALLOC (num_bytes < ram_bytes on the input
> file_extent), insert_ordered_extent_file_extent() inserts a REG with
> 
>   offset    = oe->offset
>   num_bytes = oe->num_bytes
>   ram_bytes preserved from the original PREALLOC,
> 
> and this item can reach disk if a transaction commit fires while it
> is present in the leaf.
> 
> The four fields belong in different layers:
> 
>   - compression, encryption and other_encoding are universal
>     invariants for every item in the data reloc tree, regardless of
>     cluster geometry. Enforce them in tree-checker's
>     check_extent_data_item() so a corrupt leaf is rejected at read
>     time.
> 
>   - offset is only an invariant at the cluster-boundary keys that
>     get_new_location() searches (the key is computed as
>     src_disk_bytenr - reloc_block_group_start). Partial-PREALLOC
>     writebacks legitimately place REG items at non-boundary keys with
>     offset != 0; tree-checker cannot reject these. The cluster-
>     boundary item is always written by either
>     insert_prealloc_file_extent() (offset=0 by memset) or by the
>     front portion of a partial writeback (offset=0 by construction),
>     so a non-zero offset there is corruption.
> 
> Enforce the universal invariants in check_extent_data_item() with a
> file_extent_err() rejection. Convert the BUG_ON() in
> get_new_location() to a -EUCLEAN return paired with btrfs_print_leaf()
> and btrfs_err() so the offending leaf is logged. The caller in
> replace_file_extents() already handles non-zero returns from
> get_new_location() by breaking out of the loop without aborting the
> transaction.
> 
> Suggested-by: Qu Wenruo <wqu@suse.com>
> Suggested-by: David Sterba <dsterba@suse.com>
> Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
> Signed-off-by: Teng Liu <27rabbitlt@gmail.com>

Added to for-next, thanks.