[PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

Vladimir Sementsov-Ogievskiy posted 1 patch 1 week ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20210406155114.1057355-1-vsementsov@virtuozzo.com
Maintainers: Eric Blake <eblake@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>, Max Reitz <mreitz@redhat.com>, Kevin Wolf <kwolf@redhat.com>
block/nbd.c | 11 +++++++++++
1 file changed, 11 insertions(+)

[PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

Posted by Vladimir Sementsov-Ogievskiy 1 week ago
If on nbd_close() we detach the thread (in
nbd_co_establish_connection_cancel() thr->state becomes
CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
s->connect_thread (which is set to NULL), as running thread may free it
at any time.

Still nbd_co_establish_connection() does exactly this: it saves
s->connect_thread to local variable (just for better code style) and
use it even after yield point, when thread may be already detached.

Fix that. Also check thr to be non-NULL on
nbd_co_establish_connection() start for safety.

After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
impossible in the second switch in nbd_co_establish_connection().
Still, don't add extra abort() just before the release. If it somehow
possible to reach this "case:" it won't hurt. Anyway, good refactoring
of all this reconnect mess will come soon.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

Hi all! I faced a crash, just running 277 iotest in a loop. I can't
reproduce it on master, it reproduces only on my branch with nbd
reconnect refactorings.

Still, it seems very possible that it may crash under some conditions.
So I propose this patch for 6.0. It's written so that it's obvious that
it will not hurt:

 pre-patch, on first hunk we'll just crash if thr is NULL,
 on second hunk it's safe to return -1, and using thr when
 s->connect_thread is already zeroed is obviously wrong.

 block/nbd.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index c26dc5a54f..1d4668d42d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
     BDRVNBDState *s = bs->opaque;
     NBDConnectThread *thr = s->connect_thread;
 
+    if (!thr) {
+        /* detached */
+        return -1;
+    }
+
     qemu_mutex_lock(&thr->mutex);
 
     switch (thr->state) {
@@ -486,6 +491,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
     s->wait_connect = true;
     qemu_coroutine_yield();
 
+    if (!s->connect_thread) {
+        /* detached */
+        return -1;
+    }
+    assert(thr == s->connect_thread);
+
     qemu_mutex_lock(&thr->mutex);
 
     switch (thr->state) {
-- 
2.29.2


Re: [PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

Posted by Max Reitz 21 hours ago
On 06.04.21 17:51, Vladimir Sementsov-Ogievskiy wrote:
> If on nbd_close() we detach the thread (in
> nbd_co_establish_connection_cancel() thr->state becomes
> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
> s->connect_thread (which is set to NULL), as running thread may free it
> at any time.
> 
> Still nbd_co_establish_connection() does exactly this: it saves
> s->connect_thread to local variable (just for better code style) and
> use it even after yield point, when thread may be already detached.
> 
> Fix that. Also check thr to be non-NULL on
> nbd_co_establish_connection() start for safety.
> 
> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
> impossible in the second switch in nbd_co_establish_connection().
> Still, don't add extra abort() just before the release. If it somehow
> possible to reach this "case:" it won't hurt. Anyway, good refactoring
> of all this reconnect mess will come soon.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
> reproduce it on master, it reproduces only on my branch with nbd
> reconnect refactorings.
> 
> Still, it seems very possible that it may crash under some conditions.
> So I propose this patch for 6.0. It's written so that it's obvious that
> it will not hurt:
> 
>   pre-patch, on first hunk we'll just crash if thr is NULL,
>   on second hunk it's safe to return -1, and using thr when
>   s->connect_thread is already zeroed is obviously wrong.
> 
>   block/nbd.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index c26dc5a54f..1d4668d42d 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
>       BDRVNBDState *s = bs->opaque;
>       NBDConnectThread *thr = s->connect_thread;
>   
> +    if (!thr) {
> +        /* detached */
> +        return -1;
> +    }
> +
>       qemu_mutex_lock(&thr->mutex);
>   
>       switch (thr->state) {

First, it is a bit strange not to set *errp in these cases.  I don’t 
think it’d make a difference anyway, but now that I look into it...  The 
error would be propagated to s->connect_err, but that isn’t used 
anywhere, so...  What’s the point?

Anyway.  What I really wanted to ask is: nbd_co_establish_connection() 
may create a thread, which accesses the thread.  Is that a problem? 
Like, can this happen:

- nbd_co_establish_connection(): thr->state := THREAD_RUNNING
- nbd_co_establish_connection(): thread is created
- nbd_co_establish_connection(): thr->mutex unlocked
- connect_thread_func(): thr->mutex locked
- connect_thread_func(): thr->state := something else
- connect_thread_func(): thr->mutex unlocked
- nbd_co_establish_connection(): yields
- (As I understood it, this yield leads to
    nbd_co_establish_connection_cancel() continuing)
- nbd_co_EC_cancel(): thr->mutex locked
- nbd_co_EC_cancel(): do_free true
- nbd_co_EC_cancel(): nbd_free_connect_thread()
- connect_thread_func(): nbd_free_connect_thread()

?

Max


Re: [PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

Posted by Vladimir Sementsov-Ogievskiy 20 hours ago
13.04.2021 14:53, Max Reitz wrote:
> On 06.04.21 17:51, Vladimir Sementsov-Ogievskiy wrote:
>> If on nbd_close() we detach the thread (in
>> nbd_co_establish_connection_cancel() thr->state becomes
>> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
>> s->connect_thread (which is set to NULL), as running thread may free it
>> at any time.
>>
>> Still nbd_co_establish_connection() does exactly this: it saves
>> s->connect_thread to local variable (just for better code style) and
>> use it even after yield point, when thread may be already detached.
>>
>> Fix that. Also check thr to be non-NULL on
>> nbd_co_establish_connection() start for safety.
>>
>> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
>> impossible in the second switch in nbd_co_establish_connection().
>> Still, don't add extra abort() just before the release. If it somehow
>> possible to reach this "case:" it won't hurt. Anyway, good refactoring
>> of all this reconnect mess will come soon.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>
>> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
>> reproduce it on master, it reproduces only on my branch with nbd
>> reconnect refactorings.
>>
>> Still, it seems very possible that it may crash under some conditions.
>> So I propose this patch for 6.0. It's written so that it's obvious that
>> it will not hurt:
>>
>>   pre-patch, on first hunk we'll just crash if thr is NULL,
>>   on second hunk it's safe to return -1, and using thr when
>>   s->connect_thread is already zeroed is obviously wrong.
>>
>>   block/nbd.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/block/nbd.c b/block/nbd.c
>> index c26dc5a54f..1d4668d42d 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
>>       BDRVNBDState *s = bs->opaque;
>>       NBDConnectThread *thr = s->connect_thread;
>> +    if (!thr) {
>> +        /* detached */
>> +        return -1;
>> +    }
>> +
>>       qemu_mutex_lock(&thr->mutex);
>>       switch (thr->state) {
> 
> First, it is a bit strange not to set *errp in these cases. 

Oops, right! ashamed)

> I don’t think it’d make a difference anyway, but now that I look into it...  The error would be propagated to s->connect_err, but that isn’t used anywhere, so...  What’s the point?

Yes it's unused.. That's to be improved later.

> 
> Anyway.  What I really wanted to ask is: nbd_co_establish_connection() may create a thread, which accesses the thread.  Is that a problem? Like, can this happen:
> 
> - nbd_co_establish_connection(): thr->state := THREAD_RUNNING
> - nbd_co_establish_connection(): thread is created
> - nbd_co_establish_connection(): thr->mutex unlocked
> - connect_thread_func(): thr->mutex locked
> - connect_thread_func(): thr->state := something else
> - connect_thread_func(): thr->mutex unlocked
> - nbd_co_establish_connection(): yields
> - (As I understood it, this yield leads to
>    nbd_co_establish_connection_cancel() continuing)
> - nbd_co_EC_cancel(): thr->mutex locked
> - nbd_co_EC_cancel(): do_free true
> - nbd_co_EC_cancel(): nbd_free_connect_thread()
> - connect_thread_func(): nbd_free_connect_thread()
> 

I think not. Here we are safe: connect_thread_func will only do free if thread in CONNECT_THREAD_RUNNING_DETACHED state. Thread becomes CONNECT_THREAD_RUNNING_DETACHED only on one code path in nbd_co_establish_connection_cancel(), and on that path do_free is false. OK, what you say is possible if we call nbd_co_establish_connection_cancel() twice with detach=true. But that should not be possible if it is called only from .bdrv_close (indirectly) of nbd driver.

The problem is that nbd_co_EC_cancel() may free the thread state, and then nbd_co_establish_connection() reuses saved thr local varible after yield. Still (as I noted before), I've never reproduced it on master, only on my branch with some modifications. Still it seems possible anyway.


-- 
Best regards,
Vladimir

Re: [PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

Posted by Max Reitz 19 hours ago
On 13.04.21 14:19, Vladimir Sementsov-Ogievskiy wrote:
> 13.04.2021 14:53, Max Reitz wrote:
>> On 06.04.21 17:51, Vladimir Sementsov-Ogievskiy wrote:
>>> If on nbd_close() we detach the thread (in
>>> nbd_co_establish_connection_cancel() thr->state becomes
>>> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
>>> s->connect_thread (which is set to NULL), as running thread may free it
>>> at any time.
>>>
>>> Still nbd_co_establish_connection() does exactly this: it saves
>>> s->connect_thread to local variable (just for better code style) and
>>> use it even after yield point, when thread may be already detached.
>>>
>>> Fix that. Also check thr to be non-NULL on
>>> nbd_co_establish_connection() start for safety.
>>>
>>> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
>>> impossible in the second switch in nbd_co_establish_connection().
>>> Still, don't add extra abort() just before the release. If it somehow
>>> possible to reach this "case:" it won't hurt. Anyway, good refactoring
>>> of all this reconnect mess will come soon.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>
>>> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
>>> reproduce it on master, it reproduces only on my branch with nbd
>>> reconnect refactorings.
>>>
>>> Still, it seems very possible that it may crash under some conditions.
>>> So I propose this patch for 6.0. It's written so that it's obvious that
>>> it will not hurt:
>>>
>>>   pre-patch, on first hunk we'll just crash if thr is NULL,
>>>   on second hunk it's safe to return -1, and using thr when
>>>   s->connect_thread is already zeroed is obviously wrong.
>>>
>>>   block/nbd.c | 11 +++++++++++
>>>   1 file changed, 11 insertions(+)
>>>
>>> diff --git a/block/nbd.c b/block/nbd.c
>>> index c26dc5a54f..1d4668d42d 100644
>>> --- a/block/nbd.c
>>> +++ b/block/nbd.c
>>> @@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState 
>>> *bs, Error **errp)
>>>       BDRVNBDState *s = bs->opaque;
>>>       NBDConnectThread *thr = s->connect_thread;
>>> +    if (!thr) {
>>> +        /* detached */
>>> +        return -1;
>>> +    }
>>> +
>>>       qemu_mutex_lock(&thr->mutex);
>>>       switch (thr->state) {
>>
>> First, it is a bit strange not to set *errp in these cases. 
> 
> Oops, right! ashamed)

OK, so who cares.  It wouldn’t do anything anyway.

Apart from that, all the changes do is to turn use after frees or 
immediate NULL dereferences into clean errors.  I can’t see any 
resources that should be cleaned up, so I hope Coverity won’t hate me 
for taking this patch.

And then we’ll see whether Peter will take the pull request...

Max


Re: [PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

Posted by Roman Kagan 2 days ago
On Tue, Apr 06, 2021 at 06:51:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> If on nbd_close() we detach the thread (in
> nbd_co_establish_connection_cancel() thr->state becomes
> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
> s->connect_thread (which is set to NULL), as running thread may free it
> at any time.
> 
> Still nbd_co_establish_connection() does exactly this: it saves
> s->connect_thread to local variable (just for better code style) and
> use it even after yield point, when thread may be already detached.
> 
> Fix that. Also check thr to be non-NULL on
> nbd_co_establish_connection() start for safety.
> 
> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
> impossible in the second switch in nbd_co_establish_connection().
> Still, don't add extra abort() just before the release. If it somehow
> possible to reach this "case:" it won't hurt. Anyway, good refactoring
> of all this reconnect mess will come soon.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
> reproduce it on master, it reproduces only on my branch with nbd
> reconnect refactorings.
> 
> Still, it seems very possible that it may crash under some conditions.
> So I propose this patch for 6.0. It's written so that it's obvious that
> it will not hurt:
> 
>  pre-patch, on first hunk we'll just crash if thr is NULL,
>  on second hunk it's safe to return -1, and using thr when
>  s->connect_thread is already zeroed is obviously wrong.
> 
>  block/nbd.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)

Can we please get it merged in 6.0 as it's a genuine crasher fix?

Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>

Re: [PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

Posted by Vladimir Sementsov-Ogievskiy 2 days ago
12.04.2021 11:45, Roman Kagan wrote:
> On Tue, Apr 06, 2021 at 06:51:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> If on nbd_close() we detach the thread (in
>> nbd_co_establish_connection_cancel() thr->state becomes
>> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
>> s->connect_thread (which is set to NULL), as running thread may free it
>> at any time.
>>
>> Still nbd_co_establish_connection() does exactly this: it saves
>> s->connect_thread to local variable (just for better code style) and
>> use it even after yield point, when thread may be already detached.
>>
>> Fix that. Also check thr to be non-NULL on
>> nbd_co_establish_connection() start for safety.
>>
>> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
>> impossible in the second switch in nbd_co_establish_connection().
>> Still, don't add extra abort() just before the release. If it somehow
>> possible to reach this "case:" it won't hurt. Anyway, good refactoring
>> of all this reconnect mess will come soon.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>
>> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
>> reproduce it on master, it reproduces only on my branch with nbd
>> reconnect refactorings.
>>
>> Still, it seems very possible that it may crash under some conditions.
>> So I propose this patch for 6.0. It's written so that it's obvious that
>> it will not hurt:
>>
>>   pre-patch, on first hunk we'll just crash if thr is NULL,
>>   on second hunk it's safe to return -1, and using thr when
>>   s->connect_thread is already zeroed is obviously wrong.
>>
>>   block/nbd.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
> 
> Can we please get it merged in 6.0 as it's a genuine crasher fix?
> 
> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
> 

Thanks! I'm applying it to my nbd branch and will send PULL request for rc3 today

-- 
Best regards,
Vladimir

Re: [PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

Posted by Vladimir Sementsov-Ogievskiy 1 week ago
06.04.2021 18:51, Vladimir Sementsov-Ogievskiy wrote:
> If on nbd_close() we detach the thread (in
> nbd_co_establish_connection_cancel() thr->state becomes
> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
> s->connect_thread (which is set to NULL), as running thread may free it
> at any time.
> 
> Still nbd_co_establish_connection() does exactly this: it saves
> s->connect_thread to local variable (just for better code style) and
> use it even after yield point, when thread may be already detached.
> 
> Fix that. Also check thr to be non-NULL on
> nbd_co_establish_connection() start for safety.
> 
> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
> impossible in the second switch in nbd_co_establish_connection().
> Still, don't add extra abort() just before the release. If it somehow
> possible to reach this "case:" it won't hurt. Anyway, good refactoring
> of all this reconnect mess will come soon.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
> ---
> 
> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
> reproduce it on master, it reproduces only on my branch with nbd
> reconnect refactorings.
> 
> Still, it seems very possible that it may crash under some conditions.
> So I propose this patch for 6.0. It's written so that it's obvious that
> it will not hurt:
> 
>   pre-patch, on first hunk we'll just crash if thr is NULL,
>   on second hunk it's safe to return -1, and using thr when
>   s->connect_thread is already zeroed is obviously wrong.

Ha, occasionally I reinvented what Roman already does in "[PATCH 1/7] block/nbd: avoid touching freed connect_thread".

My additional first hunk actually is not needed, as nbd_co_establish_connection is called after if (!nbd_clisent_connecting(s)) { return; }, so we should not be here after  nbd_co_establish_connection_cancel(bs, true); which is called with s->state set to NBD_CLIENT_QUIT.

So, it would be more honest to take Roman's patch "[PATCH 1/7] block/nbd: avoid touching freed connect_thread" :)

Eric, could you take a look? If there no more pending block patches, I can try to send pull-request myself

-- 
Best regards,
Vladimir

Re: [PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

Posted by Vladimir Sementsov-Ogievskiy 6 days, 13 hours ago
06.04.2021 19:20, Vladimir Sementsov-Ogievskiy wrote:
> 06.04.2021 18:51, Vladimir Sementsov-Ogievskiy wrote:
>> If on nbd_close() we detach the thread (in
>> nbd_co_establish_connection_cancel() thr->state becomes
>> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
>> s->connect_thread (which is set to NULL), as running thread may free it
>> at any time.
>>
>> Still nbd_co_establish_connection() does exactly this: it saves
>> s->connect_thread to local variable (just for better code style) and
>> use it even after yield point, when thread may be already detached.
>>
>> Fix that. Also check thr to be non-NULL on
>> nbd_co_establish_connection() start for safety.
>>
>> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
>> impossible in the second switch in nbd_co_establish_connection().
>> Still, don't add extra abort() just before the release. If it somehow
>> possible to reach this "case:" it won't hurt. Anyway, good refactoring
>> of all this reconnect mess will come soon.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
>> ---
>>
>> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
>> reproduce it on master, it reproduces only on my branch with nbd
>> reconnect refactorings.
>>
>> Still, it seems very possible that it may crash under some conditions.
>> So I propose this patch for 6.0. It's written so that it's obvious that
>> it will not hurt:
>>
>>   pre-patch, on first hunk we'll just crash if thr is NULL,
>>   on second hunk it's safe to return -1, and using thr when
>>   s->connect_thread is already zeroed is obviously wrong.
> 
> Ha, occasionally I reinvented what Roman already does in "[PATCH 1/7] block/nbd: avoid touching freed connect_thread".
> 
> My additional first hunk actually is not needed, as nbd_co_establish_connection is called after if (!nbd_clisent_connecting(s)) { return; }, so we should not be here after  nbd_co_establish_connection_cancel(bs, true); which is called with s->state set to NBD_CLIENT_QUIT.
> 
> So, it would be more honest to take Roman's patch "[PATCH 1/7] block/nbd: avoid touching freed connect_thread" :)

Still, I like my variant, because it make obvious that s->connect_thread may change only to NULL, not to some new pointer.

> 
> Eric, could you take a look? If there no more pending block patches, I can try to send pull-request myself
> 

Kevin, I see you've staged several patches for rc3.. This one is quite simple, could you add it too?

-- 
Best regards,
Vladimir