[v2] btrfs: Sync read disk super and set block size

[PATCH v2] btrfs: Sync read disk super and set block size

Posted by Edward Adam Davis 4 weeks, 1 day ago

When the user performs a btrfs mount, the block device is not set
correctly. The user sets the block size of the block device to 0x4000
by executing the BLKBSZSET command.
Since the block size change also changes the mapping->flags value, this
further affects the result of the mapping_min_folio_order() calculation.

Let's analyze the following two scenarios:
Scenario 1: Without executing the BLKBSZSET command, the block size is
0x1000, and mapping_min_folio_order() returns 0;

Scenario 2: After executing the BLKBSZSET command, the block size is
0x4000, and mapping_min_folio_order() returns 2.

do_read_cache_folio() allocates a folio before the BLKBSZSET command
is executed. This results in the allocated folio having an order value
of 0. Later, after BLKBSZSET is executed, the block size increases to
0x4000, and the mapping_min_folio_order() calculation result becomes 2.
This leads to two undesirable consequences:
1. filemap_add_folio() triggers a VM_BUG_ON_FOLIO(folio_order(folio) <
mapping_min_folio_order(mapping)) assertion.
2. The syzbot report [1] shows a null pointer dereference in
create_empty_buffers() due to a buffer head allocation failure.

Synchronization should be established based on the inode between the
BLKBSZSET command and read cache page to prevent inconsistencies in
block size or mapping flags before and after folio allocation.

[1]
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:create_empty_buffers+0x4d/0x480 fs/buffer.c:1694
Call Trace:
 folio_create_buffers+0x109/0x150 fs/buffer.c:1802
 block_read_full_folio+0x14c/0x850 fs/buffer.c:2403
 filemap_read_folio+0xc8/0x2a0 mm/filemap.c:2496
 do_read_cache_folio+0x266/0x5c0 mm/filemap.c:4096
 do_read_cache_page mm/filemap.c:4162 [inline]
 read_cache_page_gfp+0x29/0x120 mm/filemap.c:4195
 btrfs_read_disk_super+0x192/0x500 fs/btrfs/volumes.c:1367

Reported-by: syzbot+b4a2af3000eaa84d95d5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b4a2af3000eaa84d95d5
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
v1 -> v2: replace inode lock with invalidate lock

 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 13c514684cfb..68ff166fe445 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1364,7 +1364,9 @@ struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev,
 				      (bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
 	}
 
+	filemap_invalidate_lock(mapping);
 	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
+	filemap_invalidate_unlock(mapping);
 	if (IS_ERR(page))
 		return ERR_CAST(page);
 
-- 
2.43.0

Re: [PATCH v2] btrfs: Sync read disk super and set block size

Posted by Qu Wenruo 4 weeks, 1 day ago


在 2026/1/9 23:32, Edward Adam Davis 写道:
> When the user performs a btrfs mount, the block device is not set
> correctly.

Explain "the block device is not set correctly", all your later commit 
message is showing that it's end user re-setting the device block size.
There is nothing "not set correctly".

> The user sets the block size of the block device to 0x4000
> by executing the BLKBSZSET command.
> Since the block size change also changes the mapping->flags value, this
> further affects the result of the mapping_min_folio_order() calculation.
> 
> Let's analyze the following two scenarios:
> Scenario 1: Without executing the BLKBSZSET command, the block size is
> 0x1000, and mapping_min_folio_order() returns 0;
> 
> Scenario 2: After executing the BLKBSZSET command, the block size is
> 0x4000, and mapping_min_folio_order() returns 2.
> 
> do_read_cache_folio() allocates a folio before the BLKBSZSET command
> is executed. This results in the allocated folio having an order value
> of 0. Later, after BLKBSZSET is executed, the block size increases to
> 0x4000, and the mapping_min_folio_order() calculation result becomes 2.
> This leads to two undesirable consequences:
> 1. filemap_add_folio() triggers a VM_BUG_ON_FOLIO(folio_order(folio) <
> mapping_min_folio_order(mapping)) assertion.
> 2. The syzbot report [1] shows a null pointer dereference in
> create_empty_buffers() due to a buffer head allocation failure.

Although I agree with the analyze so far, I didn't see we should 
continue stick to the old read_page_cache_gfp() call.

There is already an old series which will address it pretty well:

https://lore.kernel.org/linux-btrfs/cover.1752097916.git.wqu@suse.com/

Although at that time I'm not aware of the race between blocksize set 
and read_page_cache_gfp().

Since btrfs is the last one utilizing this interface, I think it's 
better to completely remove it other than bothering the extra locking 
requirement.

> 
> Synchronization should be established based on the inode between the
> BLKBSZSET command and read cache page to prevent inconsistencies in
> block size or mapping flags before and after folio allocation.
> 
> [1]
> KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> RIP: 0010:create_empty_buffers+0x4d/0x480 fs/buffer.c:1694
> Call Trace:
>   folio_create_buffers+0x109/0x150 fs/buffer.c:1802
>   block_read_full_folio+0x14c/0x850 fs/buffer.c:2403
>   filemap_read_folio+0xc8/0x2a0 mm/filemap.c:2496
>   do_read_cache_folio+0x266/0x5c0 mm/filemap.c:4096
>   do_read_cache_page mm/filemap.c:4162 [inline]
>   read_cache_page_gfp+0x29/0x120 mm/filemap.c:4195
>   btrfs_read_disk_super+0x192/0x500 fs/btrfs/volumes.c:1367
> 
> Reported-by: syzbot+b4a2af3000eaa84d95d5@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=b4a2af3000eaa84d95d5
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>

Anyway you won't reply to any review/comment, I doubt if you will change 
this time.


> ---
> v1 -> v2: replace inode lock with invalidate lock
> 
>   fs/btrfs/volumes.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 13c514684cfb..68ff166fe445 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1364,7 +1364,9 @@ struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev,
>   				      (bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
>   	}
>   
> +	filemap_invalidate_lock(mapping);
>   	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
> +	filemap_invalidate_unlock(mapping);
>   	if (IS_ERR(page))
>   		return ERR_CAST(page);
>

Re: [PATCH v2] btrfs: Sync read disk super and set block size

Posted by Filipe Manana 4 weeks, 1 day ago

On Fri, Jan 9, 2026 at 1:02 PM Edward Adam Davis <eadavis@qq.com> wrote:
>
> When the user performs a btrfs mount, the block device is not set
> correctly. The user sets the block size of the block device to 0x4000
> by executing the BLKBSZSET command.
> Since the block size change also changes the mapping->flags value, this
> further affects the result of the mapping_min_folio_order() calculation.
>
> Let's analyze the following two scenarios:
> Scenario 1: Without executing the BLKBSZSET command, the block size is
> 0x1000, and mapping_min_folio_order() returns 0;
>
> Scenario 2: After executing the BLKBSZSET command, the block size is
> 0x4000, and mapping_min_folio_order() returns 2.
>
> do_read_cache_folio() allocates a folio before the BLKBSZSET command
> is executed. This results in the allocated folio having an order value
> of 0. Later, after BLKBSZSET is executed, the block size increases to
> 0x4000, and the mapping_min_folio_order() calculation result becomes 2.
> This leads to two undesirable consequences:
> 1. filemap_add_folio() triggers a VM_BUG_ON_FOLIO(folio_order(folio) <
> mapping_min_folio_order(mapping)) assertion.
> 2. The syzbot report [1] shows a null pointer dereference in
> create_empty_buffers() due to a buffer head allocation failure.
>
> Synchronization should be established based on the inode between the
> BLKBSZSET command and read cache page to prevent inconsistencies in
> block size or mapping flags before and after folio allocation.
>
> [1]
> KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> RIP: 0010:create_empty_buffers+0x4d/0x480 fs/buffer.c:1694
> Call Trace:
>  folio_create_buffers+0x109/0x150 fs/buffer.c:1802
>  block_read_full_folio+0x14c/0x850 fs/buffer.c:2403
>  filemap_read_folio+0xc8/0x2a0 mm/filemap.c:2496
>  do_read_cache_folio+0x266/0x5c0 mm/filemap.c:4096
>  do_read_cache_page mm/filemap.c:4162 [inline]
>  read_cache_page_gfp+0x29/0x120 mm/filemap.c:4195
>  btrfs_read_disk_super+0x192/0x500 fs/btrfs/volumes.c:1367
>
> Reported-by: syzbot+b4a2af3000eaa84d95d5@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=b4a2af3000eaa84d95d5
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks, I've added it to the for-next branch with minor editing
(uncapitalize the first word in the subject and added some blank lines
for readability):

https://github.com/btrfs/linux/commits/for-next/

> ---
> v1 -> v2: replace inode lock with invalidate lock
>
>  fs/btrfs/volumes.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 13c514684cfb..68ff166fe445 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1364,7 +1364,9 @@ struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev,
>                                       (bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
>         }
>
> +       filemap_invalidate_lock(mapping);
>         page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
> +       filemap_invalidate_unlock(mapping);
>         if (IS_ERR(page))
>                 return ERR_CAST(page);
>
> --
> 2.43.0
>