block/nbd.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
If on nbd_close() we detach the thread (in
nbd_co_establish_connection_cancel() thr->state becomes
CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
s->connect_thread (which is set to NULL), as running thread may free it
at any time.
Still nbd_co_establish_connection() does exactly this: it saves
s->connect_thread to local variable (just for better code style) and
use it even after yield point, when thread may be already detached.
Fix that. Also check thr to be non-NULL on
nbd_co_establish_connection() start for safety.
After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
impossible in the second switch in nbd_co_establish_connection().
Still, don't add extra abort() just before the release. If it somehow
possible to reach this "case:" it won't hurt. Anyway, good refactoring
of all this reconnect mess will come soon.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
Hi all! I faced a crash, just running 277 iotest in a loop. I can't
reproduce it on master, it reproduces only on my branch with nbd
reconnect refactorings.
Still, it seems very possible that it may crash under some conditions.
So I propose this patch for 6.0. It's written so that it's obvious that
it will not hurt:
pre-patch, on first hunk we'll just crash if thr is NULL,
on second hunk it's safe to return -1, and using thr when
s->connect_thread is already zeroed is obviously wrong.
block/nbd.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/block/nbd.c b/block/nbd.c
index c26dc5a54f..1d4668d42d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
BDRVNBDState *s = bs->opaque;
NBDConnectThread *thr = s->connect_thread;
+ if (!thr) {
+ /* detached */
+ return -1;
+ }
+
qemu_mutex_lock(&thr->mutex);
switch (thr->state) {
@@ -486,6 +491,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
s->wait_connect = true;
qemu_coroutine_yield();
+ if (!s->connect_thread) {
+ /* detached */
+ return -1;
+ }
+ assert(thr == s->connect_thread);
+
qemu_mutex_lock(&thr->mutex);
switch (thr->state) {
--
2.29.2
06.04.2021 18:51, Vladimir Sementsov-Ogievskiy wrote:
> If on nbd_close() we detach the thread (in
> nbd_co_establish_connection_cancel() thr->state becomes
> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
> s->connect_thread (which is set to NULL), as running thread may free it
> at any time.
>
> Still nbd_co_establish_connection() does exactly this: it saves
> s->connect_thread to local variable (just for better code style) and
> use it even after yield point, when thread may be already detached.
>
> Fix that. Also check thr to be non-NULL on
> nbd_co_establish_connection() start for safety.
>
> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
> impossible in the second switch in nbd_co_establish_connection().
> Still, don't add extra abort() just before the release. If it somehow
> possible to reach this "case:" it won't hurt. Anyway, good refactoring
> of all this reconnect mess will come soon.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
> ---
>
> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
> reproduce it on master, it reproduces only on my branch with nbd
> reconnect refactorings.
>
> Still, it seems very possible that it may crash under some conditions.
> So I propose this patch for 6.0. It's written so that it's obvious that
> it will not hurt:
>
> pre-patch, on first hunk we'll just crash if thr is NULL,
> on second hunk it's safe to return -1, and using thr when
> s->connect_thread is already zeroed is obviously wrong.
Ha, occasionally I reinvented what Roman already does in "[PATCH 1/7] block/nbd: avoid touching freed connect_thread".
My additional first hunk actually is not needed, as nbd_co_establish_connection is called after if (!nbd_clisent_connecting(s)) { return; }, so we should not be here after nbd_co_establish_connection_cancel(bs, true); which is called with s->state set to NBD_CLIENT_QUIT.
So, it would be more honest to take Roman's patch "[PATCH 1/7] block/nbd: avoid touching freed connect_thread" :)
Eric, could you take a look? If there no more pending block patches, I can try to send pull-request myself
--
Best regards,
Vladimir
06.04.2021 19:20, Vladimir Sementsov-Ogievskiy wrote:
> 06.04.2021 18:51, Vladimir Sementsov-Ogievskiy wrote:
>> If on nbd_close() we detach the thread (in
>> nbd_co_establish_connection_cancel() thr->state becomes
>> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
>> s->connect_thread (which is set to NULL), as running thread may free it
>> at any time.
>>
>> Still nbd_co_establish_connection() does exactly this: it saves
>> s->connect_thread to local variable (just for better code style) and
>> use it even after yield point, when thread may be already detached.
>>
>> Fix that. Also check thr to be non-NULL on
>> nbd_co_establish_connection() start for safety.
>>
>> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
>> impossible in the second switch in nbd_co_establish_connection().
>> Still, don't add extra abort() just before the release. If it somehow
>> possible to reach this "case:" it won't hurt. Anyway, good refactoring
>> of all this reconnect mess will come soon.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
>> ---
>>
>> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
>> reproduce it on master, it reproduces only on my branch with nbd
>> reconnect refactorings.
>>
>> Still, it seems very possible that it may crash under some conditions.
>> So I propose this patch for 6.0. It's written so that it's obvious that
>> it will not hurt:
>>
>> pre-patch, on first hunk we'll just crash if thr is NULL,
>> on second hunk it's safe to return -1, and using thr when
>> s->connect_thread is already zeroed is obviously wrong.
>
> Ha, occasionally I reinvented what Roman already does in "[PATCH 1/7] block/nbd: avoid touching freed connect_thread".
>
> My additional first hunk actually is not needed, as nbd_co_establish_connection is called after if (!nbd_clisent_connecting(s)) { return; }, so we should not be here after nbd_co_establish_connection_cancel(bs, true); which is called with s->state set to NBD_CLIENT_QUIT.
>
> So, it would be more honest to take Roman's patch "[PATCH 1/7] block/nbd: avoid touching freed connect_thread" :)
Still, I like my variant, because it make obvious that s->connect_thread may change only to NULL, not to some new pointer.
>
> Eric, could you take a look? If there no more pending block patches, I can try to send pull-request myself
>
Kevin, I see you've staged several patches for rc3.. This one is quite simple, could you add it too?
--
Best regards,
Vladimir
On 06.04.21 17:51, Vladimir Sementsov-Ogievskiy wrote:
> If on nbd_close() we detach the thread (in
> nbd_co_establish_connection_cancel() thr->state becomes
> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
> s->connect_thread (which is set to NULL), as running thread may free it
> at any time.
>
> Still nbd_co_establish_connection() does exactly this: it saves
> s->connect_thread to local variable (just for better code style) and
> use it even after yield point, when thread may be already detached.
>
> Fix that. Also check thr to be non-NULL on
> nbd_co_establish_connection() start for safety.
>
> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
> impossible in the second switch in nbd_co_establish_connection().
> Still, don't add extra abort() just before the release. If it somehow
> possible to reach this "case:" it won't hurt. Anyway, good refactoring
> of all this reconnect mess will come soon.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>
> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
> reproduce it on master, it reproduces only on my branch with nbd
> reconnect refactorings.
>
> Still, it seems very possible that it may crash under some conditions.
> So I propose this patch for 6.0. It's written so that it's obvious that
> it will not hurt:
>
> pre-patch, on first hunk we'll just crash if thr is NULL,
> on second hunk it's safe to return -1, and using thr when
> s->connect_thread is already zeroed is obviously wrong.
>
> block/nbd.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/block/nbd.c b/block/nbd.c
> index c26dc5a54f..1d4668d42d 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
> BDRVNBDState *s = bs->opaque;
> NBDConnectThread *thr = s->connect_thread;
>
> + if (!thr) {
> + /* detached */
> + return -1;
> + }
> +
> qemu_mutex_lock(&thr->mutex);
>
> switch (thr->state) {
First, it is a bit strange not to set *errp in these cases. I don’t
think it’d make a difference anyway, but now that I look into it... The
error would be propagated to s->connect_err, but that isn’t used
anywhere, so... What’s the point?
Anyway. What I really wanted to ask is: nbd_co_establish_connection()
may create a thread, which accesses the thread. Is that a problem?
Like, can this happen:
- nbd_co_establish_connection(): thr->state := THREAD_RUNNING
- nbd_co_establish_connection(): thread is created
- nbd_co_establish_connection(): thr->mutex unlocked
- connect_thread_func(): thr->mutex locked
- connect_thread_func(): thr->state := something else
- connect_thread_func(): thr->mutex unlocked
- nbd_co_establish_connection(): yields
- (As I understood it, this yield leads to
nbd_co_establish_connection_cancel() continuing)
- nbd_co_EC_cancel(): thr->mutex locked
- nbd_co_EC_cancel(): do_free true
- nbd_co_EC_cancel(): nbd_free_connect_thread()
- connect_thread_func(): nbd_free_connect_thread()
?
Max
13.04.2021 14:53, Max Reitz wrote:
> On 06.04.21 17:51, Vladimir Sementsov-Ogievskiy wrote:
>> If on nbd_close() we detach the thread (in
>> nbd_co_establish_connection_cancel() thr->state becomes
>> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
>> s->connect_thread (which is set to NULL), as running thread may free it
>> at any time.
>>
>> Still nbd_co_establish_connection() does exactly this: it saves
>> s->connect_thread to local variable (just for better code style) and
>> use it even after yield point, when thread may be already detached.
>>
>> Fix that. Also check thr to be non-NULL on
>> nbd_co_establish_connection() start for safety.
>>
>> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
>> impossible in the second switch in nbd_co_establish_connection().
>> Still, don't add extra abort() just before the release. If it somehow
>> possible to reach this "case:" it won't hurt. Anyway, good refactoring
>> of all this reconnect mess will come soon.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>
>> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
>> reproduce it on master, it reproduces only on my branch with nbd
>> reconnect refactorings.
>>
>> Still, it seems very possible that it may crash under some conditions.
>> So I propose this patch for 6.0. It's written so that it's obvious that
>> it will not hurt:
>>
>> pre-patch, on first hunk we'll just crash if thr is NULL,
>> on second hunk it's safe to return -1, and using thr when
>> s->connect_thread is already zeroed is obviously wrong.
>>
>> block/nbd.c | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/block/nbd.c b/block/nbd.c
>> index c26dc5a54f..1d4668d42d 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
>> BDRVNBDState *s = bs->opaque;
>> NBDConnectThread *thr = s->connect_thread;
>> + if (!thr) {
>> + /* detached */
>> + return -1;
>> + }
>> +
>> qemu_mutex_lock(&thr->mutex);
>> switch (thr->state) {
>
> First, it is a bit strange not to set *errp in these cases.
Oops, right! ashamed)
> I don’t think it’d make a difference anyway, but now that I look into it... The error would be propagated to s->connect_err, but that isn’t used anywhere, so... What’s the point?
Yes it's unused.. That's to be improved later.
>
> Anyway. What I really wanted to ask is: nbd_co_establish_connection() may create a thread, which accesses the thread. Is that a problem? Like, can this happen:
>
> - nbd_co_establish_connection(): thr->state := THREAD_RUNNING
> - nbd_co_establish_connection(): thread is created
> - nbd_co_establish_connection(): thr->mutex unlocked
> - connect_thread_func(): thr->mutex locked
> - connect_thread_func(): thr->state := something else
> - connect_thread_func(): thr->mutex unlocked
> - nbd_co_establish_connection(): yields
> - (As I understood it, this yield leads to
> nbd_co_establish_connection_cancel() continuing)
> - nbd_co_EC_cancel(): thr->mutex locked
> - nbd_co_EC_cancel(): do_free true
> - nbd_co_EC_cancel(): nbd_free_connect_thread()
> - connect_thread_func(): nbd_free_connect_thread()
>
I think not. Here we are safe: connect_thread_func will only do free if thread in CONNECT_THREAD_RUNNING_DETACHED state. Thread becomes CONNECT_THREAD_RUNNING_DETACHED only on one code path in nbd_co_establish_connection_cancel(), and on that path do_free is false. OK, what you say is possible if we call nbd_co_establish_connection_cancel() twice with detach=true. But that should not be possible if it is called only from .bdrv_close (indirectly) of nbd driver.
The problem is that nbd_co_EC_cancel() may free the thread state, and then nbd_co_establish_connection() reuses saved thr local varible after yield. Still (as I noted before), I've never reproduced it on master, only on my branch with some modifications. Still it seems possible anyway.
--
Best regards,
Vladimir
On 13.04.21 14:19, Vladimir Sementsov-Ogievskiy wrote:
> 13.04.2021 14:53, Max Reitz wrote:
>> On 06.04.21 17:51, Vladimir Sementsov-Ogievskiy wrote:
>>> If on nbd_close() we detach the thread (in
>>> nbd_co_establish_connection_cancel() thr->state becomes
>>> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
>>> s->connect_thread (which is set to NULL), as running thread may free it
>>> at any time.
>>>
>>> Still nbd_co_establish_connection() does exactly this: it saves
>>> s->connect_thread to local variable (just for better code style) and
>>> use it even after yield point, when thread may be already detached.
>>>
>>> Fix that. Also check thr to be non-NULL on
>>> nbd_co_establish_connection() start for safety.
>>>
>>> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
>>> impossible in the second switch in nbd_co_establish_connection().
>>> Still, don't add extra abort() just before the release. If it somehow
>>> possible to reach this "case:" it won't hurt. Anyway, good refactoring
>>> of all this reconnect mess will come soon.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>
>>> Hi all! I faced a crash, just running 277 iotest in a loop. I can't
>>> reproduce it on master, it reproduces only on my branch with nbd
>>> reconnect refactorings.
>>>
>>> Still, it seems very possible that it may crash under some conditions.
>>> So I propose this patch for 6.0. It's written so that it's obvious that
>>> it will not hurt:
>>>
>>> pre-patch, on first hunk we'll just crash if thr is NULL,
>>> on second hunk it's safe to return -1, and using thr when
>>> s->connect_thread is already zeroed is obviously wrong.
>>>
>>> block/nbd.c | 11 +++++++++++
>>> 1 file changed, 11 insertions(+)
>>>
>>> diff --git a/block/nbd.c b/block/nbd.c
>>> index c26dc5a54f..1d4668d42d 100644
>>> --- a/block/nbd.c
>>> +++ b/block/nbd.c
>>> @@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState
>>> *bs, Error **errp)
>>> BDRVNBDState *s = bs->opaque;
>>> NBDConnectThread *thr = s->connect_thread;
>>> + if (!thr) {
>>> + /* detached */
>>> + return -1;
>>> + }
>>> +
>>> qemu_mutex_lock(&thr->mutex);
>>> switch (thr->state) {
>>
>> First, it is a bit strange not to set *errp in these cases.
>
> Oops, right! ashamed)
OK, so who cares. It wouldn’t do anything anyway.
Apart from that, all the changes do is to turn use after frees or
immediate NULL dereferences into clean errors. I can’t see any
resources that should be cleaned up, so I hope Coverity won’t hate me
for taking this patch.
And then we’ll see whether Peter will take the pull request...
Max
On Tue, Apr 06, 2021 at 06:51:14PM +0300, Vladimir Sementsov-Ogievskiy wrote: > If on nbd_close() we detach the thread (in > nbd_co_establish_connection_cancel() thr->state becomes > CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use > s->connect_thread (which is set to NULL), as running thread may free it > at any time. > > Still nbd_co_establish_connection() does exactly this: it saves > s->connect_thread to local variable (just for better code style) and > use it even after yield point, when thread may be already detached. > > Fix that. Also check thr to be non-NULL on > nbd_co_establish_connection() start for safety. > > After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes > impossible in the second switch in nbd_co_establish_connection(). > Still, don't add extra abort() just before the release. If it somehow > possible to reach this "case:" it won't hurt. Anyway, good refactoring > of all this reconnect mess will come soon. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> > --- > > Hi all! I faced a crash, just running 277 iotest in a loop. I can't > reproduce it on master, it reproduces only on my branch with nbd > reconnect refactorings. > > Still, it seems very possible that it may crash under some conditions. > So I propose this patch for 6.0. It's written so that it's obvious that > it will not hurt: > > pre-patch, on first hunk we'll just crash if thr is NULL, > on second hunk it's safe to return -1, and using thr when > s->connect_thread is already zeroed is obviously wrong. > > block/nbd.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) Can we please get it merged in 6.0 as it's a genuine crasher fix? Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
12.04.2021 11:45, Roman Kagan wrote: > On Tue, Apr 06, 2021 at 06:51:14PM +0300, Vladimir Sementsov-Ogievskiy wrote: >> If on nbd_close() we detach the thread (in >> nbd_co_establish_connection_cancel() thr->state becomes >> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use >> s->connect_thread (which is set to NULL), as running thread may free it >> at any time. >> >> Still nbd_co_establish_connection() does exactly this: it saves >> s->connect_thread to local variable (just for better code style) and >> use it even after yield point, when thread may be already detached. >> >> Fix that. Also check thr to be non-NULL on >> nbd_co_establish_connection() start for safety. >> >> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes >> impossible in the second switch in nbd_co_establish_connection(). >> Still, don't add extra abort() just before the release. If it somehow >> possible to reach this "case:" it won't hurt. Anyway, good refactoring >> of all this reconnect mess will come soon. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> >> --- >> >> Hi all! I faced a crash, just running 277 iotest in a loop. I can't >> reproduce it on master, it reproduces only on my branch with nbd >> reconnect refactorings. >> >> Still, it seems very possible that it may crash under some conditions. >> So I propose this patch for 6.0. It's written so that it's obvious that >> it will not hurt: >> >> pre-patch, on first hunk we'll just crash if thr is NULL, >> on second hunk it's safe to return -1, and using thr when >> s->connect_thread is already zeroed is obviously wrong. >> >> block/nbd.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) > > Can we please get it merged in 6.0 as it's a genuine crasher fix? > > Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> > Thanks! I'm applying it to my nbd branch and will send PULL request for rc3 today -- Best regards, Vladimir
© 2016 - 2026 Red Hat, Inc.