block/nbd.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
If on nbd_close() we detach the thread (in
nbd_co_establish_connection_cancel() thr->state becomes
CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
s->connect_thread (which is set to NULL), as running thread may free it
at any time.
Still nbd_co_establish_connection() does exactly this: it saves
s->connect_thread to local variable (just for better code style) and
use it even after yield point, when thread may be already detached.
Fix that. Also check thr to be non-NULL on
nbd_co_establish_connection() start for safety.
After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
impossible in the second switch in nbd_co_establish_connection().
Still, don't add extra abort() just before the release. If it somehow
possible to reach this "case:" it won't hurt. Anyway, good refactoring
of all this reconnect mess will come soon.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
Hi all! I faced a crash, just running 277 iotest in a loop. I can't
reproduce it on master, it reproduces only on my branch with nbd
reconnect refactorings.
Still, it seems very possible that it may crash under some conditions.
So I propose this patch for 6.0. It's written so that it's obvious that
it will not hurt:
pre-patch, on first hunk we'll just crash if thr is NULL,
on second hunk it's safe to return -1, and using thr when
s->connect_thread is already zeroed is obviously wrong.
block/nbd.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/block/nbd.c b/block/nbd.c
index c26dc5a54f..1d4668d42d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
BDRVNBDState *s = bs->opaque;
NBDConnectThread *thr = s->connect_thread;
+ if (!thr) {
+ /* detached */
+ return -1;
+ }
+
qemu_mutex_lock(&thr->mutex);
switch (thr->state) {
@@ -486,6 +491,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
s->wait_connect = true;
qemu_coroutine_yield();
+ if (!s->connect_thread) {
+ /* detached */
+ return -1;
+ }
+ assert(thr == s->connect_thread);
+
qemu_mutex_lock(&thr->mutex);
switch (thr->state) {
--
2.29.2
06.04.2021 18:51, Vladimir Sementsov-Ogievskiy wrote: > If on nbd_close() we detach the thread (in > nbd_co_establish_connection_cancel() thr->state becomes > CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use > s->connect_thread (which is set to NULL), as running thread may free it > at any time. > > Still nbd_co_establish_connection() does exactly this: it saves > s->connect_thread to local variable (just for better code style) and > use it even after yield point, when thread may be already detached. > > Fix that. Also check thr to be non-NULL on > nbd_co_establish_connection() start for safety. > > After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes > impossible in the second switch in nbd_co_establish_connection(). > Still, don't add extra abort() just before the release. If it somehow > possible to reach this "case:" it won't hurt. Anyway, good refactoring > of all this reconnect mess will come soon. > > Signed-off-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> > --- > > Hi all! I faced a crash, just running 277 iotest in a loop. I can't > reproduce it on master, it reproduces only on my branch with nbd > reconnect refactorings. > > Still, it seems very possible that it may crash under some conditions. > So I propose this patch for 6.0. It's written so that it's obvious that > it will not hurt: > > pre-patch, on first hunk we'll just crash if thr is NULL, > on second hunk it's safe to return -1, and using thr when > s->connect_thread is already zeroed is obviously wrong. Ha, occasionally I reinvented what Roman already does in "[PATCH 1/7] block/nbd: avoid touching freed connect_thread". My additional first hunk actually is not needed, as nbd_co_establish_connection is called after if (!nbd_clisent_connecting(s)) { return; }, so we should not be here after nbd_co_establish_connection_cancel(bs, true); which is called with s->state set to NBD_CLIENT_QUIT. So, it would be more honest to take Roman's patch "[PATCH 1/7] block/nbd: avoid touching freed connect_thread" :) Eric, could you take a look? If there no more pending block patches, I can try to send pull-request myself -- Best regards, Vladimir
06.04.2021 19:20, Vladimir Sementsov-Ogievskiy wrote: > 06.04.2021 18:51, Vladimir Sementsov-Ogievskiy wrote: >> If on nbd_close() we detach the thread (in >> nbd_co_establish_connection_cancel() thr->state becomes >> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use >> s->connect_thread (which is set to NULL), as running thread may free it >> at any time. >> >> Still nbd_co_establish_connection() does exactly this: it saves >> s->connect_thread to local variable (just for better code style) and >> use it even after yield point, when thread may be already detached. >> >> Fix that. Also check thr to be non-NULL on >> nbd_co_establish_connection() start for safety. >> >> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes >> impossible in the second switch in nbd_co_establish_connection(). >> Still, don't add extra abort() just before the release. If it somehow >> possible to reach this "case:" it won't hurt. Anyway, good refactoring >> of all this reconnect mess will come soon. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> >> --- >> >> Hi all! I faced a crash, just running 277 iotest in a loop. I can't >> reproduce it on master, it reproduces only on my branch with nbd >> reconnect refactorings. >> >> Still, it seems very possible that it may crash under some conditions. >> So I propose this patch for 6.0. It's written so that it's obvious that >> it will not hurt: >> >> pre-patch, on first hunk we'll just crash if thr is NULL, >> on second hunk it's safe to return -1, and using thr when >> s->connect_thread is already zeroed is obviously wrong. > > Ha, occasionally I reinvented what Roman already does in "[PATCH 1/7] block/nbd: avoid touching freed connect_thread". > > My additional first hunk actually is not needed, as nbd_co_establish_connection is called after if (!nbd_clisent_connecting(s)) { return; }, so we should not be here after nbd_co_establish_connection_cancel(bs, true); which is called with s->state set to NBD_CLIENT_QUIT. > > So, it would be more honest to take Roman's patch "[PATCH 1/7] block/nbd: avoid touching freed connect_thread" :) Still, I like my variant, because it make obvious that s->connect_thread may change only to NULL, not to some new pointer. > > Eric, could you take a look? If there no more pending block patches, I can try to send pull-request myself > Kevin, I see you've staged several patches for rc3.. This one is quite simple, could you add it too? -- Best regards, Vladimir
On 06.04.21 17:51, Vladimir Sementsov-Ogievskiy wrote: > If on nbd_close() we detach the thread (in > nbd_co_establish_connection_cancel() thr->state becomes > CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use > s->connect_thread (which is set to NULL), as running thread may free it > at any time. > > Still nbd_co_establish_connection() does exactly this: it saves > s->connect_thread to local variable (just for better code style) and > use it even after yield point, when thread may be already detached. > > Fix that. Also check thr to be non-NULL on > nbd_co_establish_connection() start for safety. > > After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes > impossible in the second switch in nbd_co_establish_connection(). > Still, don't add extra abort() just before the release. If it somehow > possible to reach this "case:" it won't hurt. Anyway, good refactoring > of all this reconnect mess will come soon. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> > --- > > Hi all! I faced a crash, just running 277 iotest in a loop. I can't > reproduce it on master, it reproduces only on my branch with nbd > reconnect refactorings. > > Still, it seems very possible that it may crash under some conditions. > So I propose this patch for 6.0. It's written so that it's obvious that > it will not hurt: > > pre-patch, on first hunk we'll just crash if thr is NULL, > on second hunk it's safe to return -1, and using thr when > s->connect_thread is already zeroed is obviously wrong. > > block/nbd.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/block/nbd.c b/block/nbd.c > index c26dc5a54f..1d4668d42d 100644 > --- a/block/nbd.c > +++ b/block/nbd.c > @@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp) > BDRVNBDState *s = bs->opaque; > NBDConnectThread *thr = s->connect_thread; > > + if (!thr) { > + /* detached */ > + return -1; > + } > + > qemu_mutex_lock(&thr->mutex); > > switch (thr->state) { First, it is a bit strange not to set *errp in these cases. I don’t think it’d make a difference anyway, but now that I look into it... The error would be propagated to s->connect_err, but that isn’t used anywhere, so... What’s the point? Anyway. What I really wanted to ask is: nbd_co_establish_connection() may create a thread, which accesses the thread. Is that a problem? Like, can this happen: - nbd_co_establish_connection(): thr->state := THREAD_RUNNING - nbd_co_establish_connection(): thread is created - nbd_co_establish_connection(): thr->mutex unlocked - connect_thread_func(): thr->mutex locked - connect_thread_func(): thr->state := something else - connect_thread_func(): thr->mutex unlocked - nbd_co_establish_connection(): yields - (As I understood it, this yield leads to nbd_co_establish_connection_cancel() continuing) - nbd_co_EC_cancel(): thr->mutex locked - nbd_co_EC_cancel(): do_free true - nbd_co_EC_cancel(): nbd_free_connect_thread() - connect_thread_func(): nbd_free_connect_thread() ? Max
13.04.2021 14:53, Max Reitz wrote: > On 06.04.21 17:51, Vladimir Sementsov-Ogievskiy wrote: >> If on nbd_close() we detach the thread (in >> nbd_co_establish_connection_cancel() thr->state becomes >> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use >> s->connect_thread (which is set to NULL), as running thread may free it >> at any time. >> >> Still nbd_co_establish_connection() does exactly this: it saves >> s->connect_thread to local variable (just for better code style) and >> use it even after yield point, when thread may be already detached. >> >> Fix that. Also check thr to be non-NULL on >> nbd_co_establish_connection() start for safety. >> >> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes >> impossible in the second switch in nbd_co_establish_connection(). >> Still, don't add extra abort() just before the release. If it somehow >> possible to reach this "case:" it won't hurt. Anyway, good refactoring >> of all this reconnect mess will come soon. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> >> --- >> >> Hi all! I faced a crash, just running 277 iotest in a loop. I can't >> reproduce it on master, it reproduces only on my branch with nbd >> reconnect refactorings. >> >> Still, it seems very possible that it may crash under some conditions. >> So I propose this patch for 6.0. It's written so that it's obvious that >> it will not hurt: >> >> pre-patch, on first hunk we'll just crash if thr is NULL, >> on second hunk it's safe to return -1, and using thr when >> s->connect_thread is already zeroed is obviously wrong. >> >> block/nbd.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/block/nbd.c b/block/nbd.c >> index c26dc5a54f..1d4668d42d 100644 >> --- a/block/nbd.c >> +++ b/block/nbd.c >> @@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp) >> BDRVNBDState *s = bs->opaque; >> NBDConnectThread *thr = s->connect_thread; >> + if (!thr) { >> + /* detached */ >> + return -1; >> + } >> + >> qemu_mutex_lock(&thr->mutex); >> switch (thr->state) { > > First, it is a bit strange not to set *errp in these cases. Oops, right! ashamed) > I don’t think it’d make a difference anyway, but now that I look into it... The error would be propagated to s->connect_err, but that isn’t used anywhere, so... What’s the point? Yes it's unused.. That's to be improved later. > > Anyway. What I really wanted to ask is: nbd_co_establish_connection() may create a thread, which accesses the thread. Is that a problem? Like, can this happen: > > - nbd_co_establish_connection(): thr->state := THREAD_RUNNING > - nbd_co_establish_connection(): thread is created > - nbd_co_establish_connection(): thr->mutex unlocked > - connect_thread_func(): thr->mutex locked > - connect_thread_func(): thr->state := something else > - connect_thread_func(): thr->mutex unlocked > - nbd_co_establish_connection(): yields > - (As I understood it, this yield leads to > nbd_co_establish_connection_cancel() continuing) > - nbd_co_EC_cancel(): thr->mutex locked > - nbd_co_EC_cancel(): do_free true > - nbd_co_EC_cancel(): nbd_free_connect_thread() > - connect_thread_func(): nbd_free_connect_thread() > I think not. Here we are safe: connect_thread_func will only do free if thread in CONNECT_THREAD_RUNNING_DETACHED state. Thread becomes CONNECT_THREAD_RUNNING_DETACHED only on one code path in nbd_co_establish_connection_cancel(), and on that path do_free is false. OK, what you say is possible if we call nbd_co_establish_connection_cancel() twice with detach=true. But that should not be possible if it is called only from .bdrv_close (indirectly) of nbd driver. The problem is that nbd_co_EC_cancel() may free the thread state, and then nbd_co_establish_connection() reuses saved thr local varible after yield. Still (as I noted before), I've never reproduced it on master, only on my branch with some modifications. Still it seems possible anyway. -- Best regards, Vladimir
On 13.04.21 14:19, Vladimir Sementsov-Ogievskiy wrote: > 13.04.2021 14:53, Max Reitz wrote: >> On 06.04.21 17:51, Vladimir Sementsov-Ogievskiy wrote: >>> If on nbd_close() we detach the thread (in >>> nbd_co_establish_connection_cancel() thr->state becomes >>> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use >>> s->connect_thread (which is set to NULL), as running thread may free it >>> at any time. >>> >>> Still nbd_co_establish_connection() does exactly this: it saves >>> s->connect_thread to local variable (just for better code style) and >>> use it even after yield point, when thread may be already detached. >>> >>> Fix that. Also check thr to be non-NULL on >>> nbd_co_establish_connection() start for safety. >>> >>> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes >>> impossible in the second switch in nbd_co_establish_connection(). >>> Still, don't add extra abort() just before the release. If it somehow >>> possible to reach this "case:" it won't hurt. Anyway, good refactoring >>> of all this reconnect mess will come soon. >>> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> >>> --- >>> >>> Hi all! I faced a crash, just running 277 iotest in a loop. I can't >>> reproduce it on master, it reproduces only on my branch with nbd >>> reconnect refactorings. >>> >>> Still, it seems very possible that it may crash under some conditions. >>> So I propose this patch for 6.0. It's written so that it's obvious that >>> it will not hurt: >>> >>> pre-patch, on first hunk we'll just crash if thr is NULL, >>> on second hunk it's safe to return -1, and using thr when >>> s->connect_thread is already zeroed is obviously wrong. >>> >>> block/nbd.c | 11 +++++++++++ >>> 1 file changed, 11 insertions(+) >>> >>> diff --git a/block/nbd.c b/block/nbd.c >>> index c26dc5a54f..1d4668d42d 100644 >>> --- a/block/nbd.c >>> +++ b/block/nbd.c >>> @@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState >>> *bs, Error **errp) >>> BDRVNBDState *s = bs->opaque; >>> NBDConnectThread *thr = s->connect_thread; >>> + if (!thr) { >>> + /* detached */ >>> + return -1; >>> + } >>> + >>> qemu_mutex_lock(&thr->mutex); >>> switch (thr->state) { >> >> First, it is a bit strange not to set *errp in these cases. > > Oops, right! ashamed) OK, so who cares. It wouldn’t do anything anyway. Apart from that, all the changes do is to turn use after frees or immediate NULL dereferences into clean errors. I can’t see any resources that should be cleaned up, so I hope Coverity won’t hate me for taking this patch. And then we’ll see whether Peter will take the pull request... Max
On Tue, Apr 06, 2021 at 06:51:14PM +0300, Vladimir Sementsov-Ogievskiy wrote: > If on nbd_close() we detach the thread (in > nbd_co_establish_connection_cancel() thr->state becomes > CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use > s->connect_thread (which is set to NULL), as running thread may free it > at any time. > > Still nbd_co_establish_connection() does exactly this: it saves > s->connect_thread to local variable (just for better code style) and > use it even after yield point, when thread may be already detached. > > Fix that. Also check thr to be non-NULL on > nbd_co_establish_connection() start for safety. > > After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes > impossible in the second switch in nbd_co_establish_connection(). > Still, don't add extra abort() just before the release. If it somehow > possible to reach this "case:" it won't hurt. Anyway, good refactoring > of all this reconnect mess will come soon. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> > --- > > Hi all! I faced a crash, just running 277 iotest in a loop. I can't > reproduce it on master, it reproduces only on my branch with nbd > reconnect refactorings. > > Still, it seems very possible that it may crash under some conditions. > So I propose this patch for 6.0. It's written so that it's obvious that > it will not hurt: > > pre-patch, on first hunk we'll just crash if thr is NULL, > on second hunk it's safe to return -1, and using thr when > s->connect_thread is already zeroed is obviously wrong. > > block/nbd.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) Can we please get it merged in 6.0 as it's a genuine crasher fix? Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
12.04.2021 11:45, Roman Kagan wrote: > On Tue, Apr 06, 2021 at 06:51:14PM +0300, Vladimir Sementsov-Ogievskiy wrote: >> If on nbd_close() we detach the thread (in >> nbd_co_establish_connection_cancel() thr->state becomes >> CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use >> s->connect_thread (which is set to NULL), as running thread may free it >> at any time. >> >> Still nbd_co_establish_connection() does exactly this: it saves >> s->connect_thread to local variable (just for better code style) and >> use it even after yield point, when thread may be already detached. >> >> Fix that. Also check thr to be non-NULL on >> nbd_co_establish_connection() start for safety. >> >> After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes >> impossible in the second switch in nbd_co_establish_connection(). >> Still, don't add extra abort() just before the release. If it somehow >> possible to reach this "case:" it won't hurt. Anyway, good refactoring >> of all this reconnect mess will come soon. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> >> --- >> >> Hi all! I faced a crash, just running 277 iotest in a loop. I can't >> reproduce it on master, it reproduces only on my branch with nbd >> reconnect refactorings. >> >> Still, it seems very possible that it may crash under some conditions. >> So I propose this patch for 6.0. It's written so that it's obvious that >> it will not hurt: >> >> pre-patch, on first hunk we'll just crash if thr is NULL, >> on second hunk it's safe to return -1, and using thr when >> s->connect_thread is already zeroed is obviously wrong. >> >> block/nbd.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) > > Can we please get it merged in 6.0 as it's a genuine crasher fix? > > Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> > Thanks! I'm applying it to my nbd branch and will send PULL request for rc3 today -- Best regards, Vladimir
© 2016 - 2024 Red Hat, Inc.