[v4] btrfs: fix sleep from invalid context bug in update_qgroup_limit_item()

[PATCH v4 1/3] btrfs: add might_sleep() to some places in update_qgroup_limit_item()

Posted by ChenXiaoSong 3 years, 4 months ago

As the potential sleeping under spin lock is hard to spot, we should add
might_sleep() to some places.

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
---
 fs/btrfs/ctree.c  | 2 ++
 fs/btrfs/qgroup.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a9543f01184c..809053e9cfde 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1934,6 +1934,8 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	int min_write_lock_level;
 	int prev_cmp;
 
+	might_sleep();
+
 	lowest_level = p->lowest_level;
 	WARN_ON(lowest_level && ins_len > 0);
 	WARN_ON(p->nodes[0] != NULL);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9334c3157c22..d0480b9c6c86 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -779,6 +779,8 @@ static int update_qgroup_limit_item(struct btrfs_trans_handle *trans,
 	int ret;
 	int slot;
 
+	might_sleep();
+
 	key.objectid = 0;
 	key.type = BTRFS_QGROUP_LIMIT_KEY;
 	key.offset = qgroup->qgroupid;
-- 
2.31.1

Re: [PATCH v4 1/3] btrfs: add might_sleep() to some places in update_qgroup_limit_item()

Posted by Qu Wenruo 3 years, 4 months ago


On 2022/11/16 01:17, ChenXiaoSong wrote:
> As the potential sleeping under spin lock is hard to spot, we should add
> might_sleep() to some places.
> 
> Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>

Looks good.

We may want to add more in other locations, but this is really a good start.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/ctree.c  | 2 ++
>   fs/btrfs/qgroup.c | 2 ++
>   2 files changed, 4 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index a9543f01184c..809053e9cfde 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1934,6 +1934,8 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>   	int min_write_lock_level;
>   	int prev_cmp;
>   
> +	might_sleep();
> +
>   	lowest_level = p->lowest_level;
>   	WARN_ON(lowest_level && ins_len > 0);
>   	WARN_ON(p->nodes[0] != NULL);
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 9334c3157c22..d0480b9c6c86 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -779,6 +779,8 @@ static int update_qgroup_limit_item(struct btrfs_trans_handle *trans,
>   	int ret;
>   	int slot;
>   
> +	might_sleep();
> +
>   	key.objectid = 0;
>   	key.type = BTRFS_QGROUP_LIMIT_KEY;
>   	key.offset = qgroup->qgroupid;

Re: [PATCH v4 1/3] btrfs: add might_sleep() to some places in update_qgroup_limit_item()

Posted by ChenXiaoSong 3 years, 4 months ago

在 2022/11/16 6:48, Qu Wenruo 写道:
> Looks good.
> 
> We may want to add more in other locations, but this is really a good 
> start.
> 
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> 
> Thanks,
> Qu

If I just add might_sleep() in btrfs_alloc_path() and 
btrfs_search_slot(), is it reasonable?

Or just add might_sleep() to one place in update_qgroup_limit_item() ?

Re: [PATCH v4 1/3] btrfs: add might_sleep() to some places in update_qgroup_limit_item()

Posted by Qu Wenruo 3 years, 4 months ago


On 2022/11/16 16:09, ChenXiaoSong wrote:
> 在 2022/11/16 6:48, Qu Wenruo 写道:
>> Looks good.
>>
>> We may want to add more in other locations, but this is really a good 
>> start.
>>
>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>>
>> Thanks,
>> Qu
> 
> If I just add might_sleep() in btrfs_alloc_path() and 
> btrfs_search_slot(), is it reasonable?

Adding it to btrfs_search_slot() is definitely correct.

But why for btrfs_alloc_path()? Wouldn't kmem_cache_zalloc() itself 
already do the might_sleep_if() somewhere?

I just looked the call chain, and indeed it is doing the check already:

btrfs_alloc_path()
|- kmem_cache_zalloc()
    |- kmem_cache_alloc()
       |- __kmem_cache_alloc_lru()
          |- slab_alloc()
             |- slab_alloc_node()
                |- slab_pre_alloc_hook()
                   |- might_alloc()
                      |- might_sleep_if()

Thanks,
Qu

> 
> Or just add might_sleep() to one place in update_qgroup_limit_item() ?

Re: [PATCH v4 1/3] btrfs: add might_sleep() to some places in update_qgroup_limit_item()

Posted by David Sterba 3 years, 4 months ago

On Wed, Nov 16, 2022 at 04:43:50PM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/11/16 16:09, ChenXiaoSong wrote:
> > 在 2022/11/16 6:48, Qu Wenruo 写道:
> >> Looks good.
> >>
> >> We may want to add more in other locations, but this is really a good 
> >> start.
> >>
> >> Reviewed-by: Qu Wenruo <wqu@suse.com>
> >>
> >> Thanks,
> >> Qu
> > 
> > If I just add might_sleep() in btrfs_alloc_path() and 
> > btrfs_search_slot(), is it reasonable?
> 
> Adding it to btrfs_search_slot() is definitely correct.
> 
> But why for btrfs_alloc_path()? Wouldn't kmem_cache_zalloc() itself 
> already do the might_sleep_if() somewhere?
> 
> I just looked the call chain, and indeed it is doing the check already:
> 
> btrfs_alloc_path()
> |- kmem_cache_zalloc()
>     |- kmem_cache_alloc()
>        |- __kmem_cache_alloc_lru()
>           |- slab_alloc()
>              |- slab_alloc_node()
>                 |- slab_pre_alloc_hook()
>                    |- might_alloc()
>                       |- might_sleep_if()

The call chaing is unconditional so the check will always happen but the
condition itself in might_sleep_if does not recognize GFP_NOFS:

 34 static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 35 {
 36         return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
 37 }

#define GFP_NOFS        (__GFP_RECLAIM | __GFP_IO)

And I think the qgroup limit was exactly a spin lock over btrfs_path_alloc so
it did not help. An might_sleep() inside btrfs_path_alloc() is a very minimal
but reliable check we could add, the paths are used in many places so it would
increase the coverage.

Re: [PATCH v4 1/3] btrfs: add might_sleep() to some places in update_qgroup_limit_item()

Posted by Qu Wenruo 3 years, 4 months ago


On 2022/11/16 20:24, David Sterba wrote:
> On Wed, Nov 16, 2022 at 04:43:50PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2022/11/16 16:09, ChenXiaoSong wrote:
>>> 在 2022/11/16 6:48, Qu Wenruo 写道:
>>>> Looks good.
>>>>
>>>> We may want to add more in other locations, but this is really a good
>>>> start.
>>>>
>>>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>>>>
>>>> Thanks,
>>>> Qu
>>>
>>> If I just add might_sleep() in btrfs_alloc_path() and
>>> btrfs_search_slot(), is it reasonable?
>>
>> Adding it to btrfs_search_slot() is definitely correct.
>>
>> But why for btrfs_alloc_path()? Wouldn't kmem_cache_zalloc() itself
>> already do the might_sleep_if() somewhere?
>>
>> I just looked the call chain, and indeed it is doing the check already:
>>
>> btrfs_alloc_path()
>> |- kmem_cache_zalloc()
>>      |- kmem_cache_alloc()
>>         |- __kmem_cache_alloc_lru()
>>            |- slab_alloc()
>>               |- slab_alloc_node()
>>                  |- slab_pre_alloc_hook()
>>                     |- might_alloc()
>>                        |- might_sleep_if()
> 
> The call chaing is unconditional so the check will always happen but the
> condition itself in might_sleep_if does not recognize GFP_NOFS:
> 
>   34 static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
>   35 {
>   36         return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
>   37 }
> 
> #define GFP_NOFS        (__GFP_RECLAIM | __GFP_IO)
> 
> And I think the qgroup limit was exactly a spin lock over btrfs_path_alloc so
> it did not help. An might_sleep() inside btrfs_path_alloc() is a very minimal
> but reliable check we could add, the paths are used in many places so it would
> increase the coverage.

OK, then it makes sense now for btrfs_alloc_path().

But I still believe this looks like a bug in gfpflags_allow_blocking()...

Thanks,
Qu

Re: [PATCH v4 1/3] btrfs: add might_sleep() to some places in update_qgroup_limit_item()

Posted by David Sterba 3 years, 4 months ago

On Wed, Nov 16, 2022 at 01:17:07AM +0800, ChenXiaoSong wrote:
> As the potential sleeping under spin lock is hard to spot, we should add
> might_sleep() to some places.
> 
> Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
> ---
>  fs/btrfs/ctree.c  | 2 ++
>  fs/btrfs/qgroup.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index a9543f01184c..809053e9cfde 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1934,6 +1934,8 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  	int min_write_lock_level;
>  	int prev_cmp;
>  
> +	might_sleep();

This needs some explanation in the changelog, the reason was mentioned
in some past patch iteration that it's due to potential IO fi the blocks
are not cached.

> +
>  	lowest_level = p->lowest_level;
>  	WARN_ON(lowest_level && ins_len > 0);
>  	WARN_ON(p->nodes[0] != NULL);
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 9334c3157c22..d0480b9c6c86 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -779,6 +779,8 @@ static int update_qgroup_limit_item(struct btrfs_trans_handle *trans,
>  	int ret;
>  	int slot;
>  
> +	might_sleep();

This one is redundant, no? There's call to btrfs_search_slot a few lines
below.

> +
>  	key.objectid = 0;
>  	key.type = BTRFS_QGROUP_LIMIT_KEY;
>  	key.offset = qgroup->qgroupid;
> -- 
> 2.31.1
>