A problem was found in stable 5.10; the root cause is as follows.

In the use of the request_queue's q_usage_counter, blk_cleanup_queue()
uses "wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter))"
to wait for q_usage_counter to reach zero. However, if q_usage_counter
drops to zero quickly, percpu_ref_exit() runs and ref->data is freed,
and another process still inside percpu_ref_put() can then hit a
null-deref, as below:
CPU0                                    CPU1
blk_mq_destroy_queue
  blk_freeze_queue
    blk_mq_freeze_queue_wait
                                        scsi_end_request
                                          percpu_ref_get
                                          ...
                                          percpu_ref_put
                                            atomic_long_sub_and_test
blk_put_queue
  kobject_put
    kref_put
      blk_release_queue
        percpu_ref_exit
          ref->data -> NULL
                                            ref->data->release(ref) -> null-deref
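
To make the window concrete, here is a minimal sketch of the waiting
pattern (simplified, with an invented helper name; not the actual
block-layer code):

#include <linux/blkdev.h>
#include <linux/percpu-refcount.h>
#include <linux/wait.h>

/*
 * Simplified illustration only: the real path goes through
 * blk_cleanup_queue()/blk_release_queue(); the helper below is invented.
 */
static void example_teardown(struct request_queue *q)
{
	/* Returns as soon as the count is observed to be zero ... */
	wait_event(q->mq_freeze_wq,
		   percpu_ref_is_zero(&q->q_usage_counter));

	/*
	 * ... but a concurrent percpu_ref_put() may still sit between
	 * atomic_long_sub_and_test() and ref->data->release(ref), so
	 * freeing ref->data here leaves that putter dereferencing NULL.
	 */
	percpu_ref_exit(&q->q_usage_counter);
}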
As suggested by Ming Lei, fix it by reading the release method before
the reference count is decremented to zero.
Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
---
include/linux/percpu-refcount.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index d73a1c08c3e3..11e717c95acb 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -331,8 +331,11 @@ static inline void percpu_ref_put_many(struct percpu_ref *ref, unsigned long nr)
 
 	if (__ref_is_percpu(ref, &percpu_count))
 		this_cpu_sub(*percpu_count, nr);
-	else if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
-		ref->data->release(ref);
+	else {
+		percpu_ref_func_t *release = ref->data->release;
+		if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
+			release(ref);
+	}
 
 	rcu_read_unlock();
 }
--
2.31.1
Hello,
On Tue, Dec 06, 2022 at 05:09:39PM +0800, Zhong Jinghua wrote:
> A problem was found in stable 5.10; the root cause is as follows.
>
> In the use of the request_queue's q_usage_counter, blk_cleanup_queue()
> uses "wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter))"
> to wait for q_usage_counter to reach zero. However, if q_usage_counter
> drops to zero quickly, percpu_ref_exit() runs and ref->data is freed,
> and another process still inside percpu_ref_put() can then hit a
> null-deref, as below:
>
> CPU0                                    CPU1
> blk_mq_destroy_queue
>   blk_freeze_queue
>     blk_mq_freeze_queue_wait
>                                         scsi_end_request
>                                           percpu_ref_get
>                                           ...
>                                           percpu_ref_put
>                                             atomic_long_sub_and_test
> blk_put_queue
>   kobject_put
>     kref_put
>       blk_release_queue
>         percpu_ref_exit
>           ref->data -> NULL
>                                             ref->data->release(ref) -> null-deref
>
I remember thinking about this a while ago. I don't think this fix works
as nicely as it may seem. Please correct me if I'm wrong.
q->q_usage_counter has the oddity that the lifetime of the percpu_ref
object isn't managed by the release function. The freeing is handled by
a separate path where it depends on the percpu_ref hitting 0. So here we
have 2 concurrent paths racing to run with 1 destroying the object. We
probably need blk_release_queue() to wait on percpu_ref's release
finishing, not starting.
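
For illustration only, one hypothetical shape of that (the completion
field below is invented; nothing like it exists in the current code)
could be:

#include <linux/blkdev.h>
#include <linux/completion.h>
#include <linux/percpu-refcount.h>

/*
 * Hypothetical sketch of "wait for the release to finish, not start".
 * 'usage_counter_released' is an invented struct completion, assumed to
 * be initialized when the queue is allocated.
 */
static void example_usage_counter_release(struct percpu_ref *ref)
{
	struct request_queue *q =
		container_of(ref, struct request_queue, q_usage_counter);

	wake_up_all(&q->mq_freeze_wq);
	complete(&q->usage_counter_released);	/* final touch of the ref */
}

static void example_release_queue(struct request_queue *q)
{
	/* Only proceed once the last putter has fully run ->release() ... */
	wait_for_completion(&q->usage_counter_released);

	/* ... so freeing ref->data below cannot race with it. */
	percpu_ref_exit(&q->q_usage_counter);
}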
I think the above works in this specific case because there is a
call_rcu() in blk_release_queue(). If there wasn't a call_rcu(),
then by the same logic we could delay ref->data->release(ref) further
and that could potentially lead to a use after free.
Ideally, I think fixing the race in q->q_usage_counter's pattern is
better than masking it here as I think we're being saved by the
call_rcu() call further down the object release path.
Thanks,
Dennis
> As suggested by Ming Lei, fix it by reading the release method before
> the reference count is decremented to zero.
>
> Suggested-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
> ---
> include/linux/percpu-refcount.h | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> index d73a1c08c3e3..11e717c95acb 100644
> --- a/include/linux/percpu-refcount.h
> +++ b/include/linux/percpu-refcount.h
> @@ -331,8 +331,11 @@ static inline void percpu_ref_put_many(struct percpu_ref *ref, unsigned long nr)
>
>  	if (__ref_is_percpu(ref, &percpu_count))
>  		this_cpu_sub(*percpu_count, nr);
> -	else if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
> -		ref->data->release(ref);
> +	else {
> +		percpu_ref_func_t *release = ref->data->release;
> +		if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
> +			release(ref);
> +	}
>
>  	rcu_read_unlock();
>  }
> --
> 2.31.1
>
On Wed, Dec 7, 2022 at 9:08 AM Dennis Zhou <dennis@kernel.org> wrote:
>
> Hello,
>
> [...]
>
> Ideally, I think fixing the race in q->q_usage_counter's pattern is
> better than masking it here as I think we're being saved by the
> call_rcu() call further down the object release path.

The problem is actually in percpu_ref_is_zero(), which can return true
before ->release() is called. And any pattern of
wait_event(percpu_ref_is_zero) may imply such risk.

It may not be easy to fix this cleanly in the block layer, but it can be
solved in percpu-refcount simply by adding a ->release_lock (spinlock) to
the counter for draining atomic_long_sub_and_test() & ->release() in
percpu_ref_exit(). Or simply use percpu_ref_switch_lock.

Thanks,
Ming
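
A rough sketch of how that suggestion could look (function names invented;
in the current tree percpu_ref_switch_lock is local to
lib/percpu-refcount.c, so the atomic-mode put slow path would have to move
out of line, or a dedicated ->release_lock be added to the data struct):

#include <linux/percpu-refcount.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Stand-in for percpu_ref_switch_lock or a new ->release_lock. */
static DEFINE_SPINLOCK(example_release_lock);

static void example_put_slowpath(struct percpu_ref *ref, unsigned long nr)
{
	unsigned long flags;

	spin_lock_irqsave(&example_release_lock, flags);
	/* The final sub and the ->release() call form one critical section. */
	if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
		ref->data->release(ref);
	spin_unlock_irqrestore(&example_release_lock, flags);
}

static void example_exit(struct percpu_ref *ref)
{
	struct percpu_ref_data *data = ref->data;
	unsigned long flags;

	/*
	 * Taking the same lock drains any putter that has already dropped
	 * the count to zero but has not yet returned from ->release(), so
	 * freeing 'data' cannot race with that dereference.
	 */
	spin_lock_irqsave(&example_release_lock, flags);
	ref->data = NULL;
	spin_unlock_irqrestore(&example_release_lock, flags);

	kfree(data);
}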
Hi,

On 2022/12/07 9:05, Dennis Zhou wrote:
> Hello,
>
> [...]
>
> Ideally, I think fixing the race in q->q_usage_counter's pattern is
> better than masking it here as I think we're being saved by the
> call_rcu() call further down the object release path.

Agree. BTW, Wensheng used to send a patch to fix this in the block layer:
https://www.spinics.net/lists/kernel/msg4615696.html

Thanks,
Kuai