From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Unregister the fd handler before we destroy the channel,
otherwise we've got a race where we might land in the
fd handler just as we're closing the device.
(The race is quite data dependent, you just have to have
the right set of devices for it to trigger).
Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/rdma.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/migration/rdma.c b/migration/rdma.c
index 9b2e7e10aa..54a3c11540 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
         rdma->connected = false;
     }
 
+    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
     g_free(rdma->dest_blocks);
     rdma->dest_blocks = NULL;
 
--
2.20.1
On Tue, Jan 22, 2019 at 05:31:11PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Unregister the fd handler before we destroy the channel,
> otherwise we've got a race where we might land in the
> fd handler just as we're closing the device.
>
> (The race is quite data dependent, you just have to have
> the right set of devices for it to trigger).
>
> Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(Could the crash have happened because the same fd number was reused after
the RDMA channel was destroyed? Then when the fd has an event, it'll
be delivered to rdma_cm_poll_handler() while the fd is not really the
RDMA channel handle any more.)

Reviewed-by: Peter Xu <peterx@redhat.com>

Regards,

--
Peter Xu
* Peter Xu (peterx@redhat.com) wrote:
> (Could the crash have happened because the same fd number was reused after
> the RDMA channel was destroyed? Then when the fd has an event, it'll
> be delivered to rdma_cm_poll_handler() while the fd is not really the
> RDMA channel handle any more.)

That's an interesting thought; I'd assumed it was just a race, but
being dependent on the fd numbering would explain why it was so
delicate to reproduce.

> Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks!

Dave

> Regards,
>
> --
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
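
The fd-reuse hypothesis above rests on a POSIX guarantee: a call that allocates a descriptor returns the lowest-numbered free one, so a number released by close() is normally the next one handed out. The following is only a minimal sketch of that behaviour in a single-threaded program. The function name fd_number_is_reused() is made up for illustration and has nothing to do with the QEMU code.

```c
#include <stdbool.h>
#include <unistd.h>

/*
 * Returns true if a descriptor number released by close() is handed
 * out again by the next descriptor allocation (here, dup()).
 * Hypothetical helper for illustration; assumes no other thread is
 * opening or closing descriptors at the same time.
 */
static bool fd_number_is_reused(void)
{
    int first[2];

    if (pipe(first) != 0) {
        return false;
    }

    /* The number a stale handler registration would still refer to. */
    int stale = first[0];
    close(first[0]);

    /* An unrelated new descriptor; POSIX says it gets the lowest free number. */
    int fresh = dup(first[1]);
    bool reused = (fresh == stale);

    if (fresh >= 0) {
        close(fresh);
    }
    close(first[1]);
    return reused;
}
```

If a handler were still registered for the old number, events on the new descriptor would be routed to it, which matches the scenario described in the question.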
* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Unregister the fd handler before we destroy the channel,
> otherwise we've got a race where we might land in the
> fd handler just as we're closing the device.
>
> (The race is quite data dependent, you just have to have
> the right set of devices for it to trigger).
>
> Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Queued

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 1/23/19 12:44 PM, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Queued

Did you fix the patch subject typo? "un(r)egister"
* Philippe Mathieu-Daudé (philmd@redhat.com) wrote:
> On 1/23/19 12:44 PM, Dr. David Alan Gilbert wrote:
> > Queued
>
> Did you fix the patch subject typo? "un(r)egister"

Yes; fortunately I spotted it during building the pull :-)

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Tue, 22 Jan 2019 at 19:08, Dr. David Alan Gilbert (git)
<dgilbert@redhat.com> wrote:
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 9b2e7e10aa..54a3c11540 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>          rdma->connected = false;
>      }
>
> +    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
>      g_free(rdma->dest_blocks);
>      rdma->dest_blocks = NULL;

Hi -- this patch makes Coverity complain (CID 1398634),
because here we use rdma->channel without checking whether it is NULL,
but later in the function we have an "if (rdma->channel)" test.
Should this code be conditional on rdma->channel being non-NULL,
or is the later test incorrect?

thanks
-- PMM
* Peter Maydell (peter.maydell@linaro.org) wrote:
> Hi -- this patch makes Coverity complain (CID 1398634),
> because here we use rdma->channel without checking whether it is NULL,
> but later in the function we have an "if (rdma->channel)" test.
> Should this code be conditional on rdma->channel being non-NULL,
> or is the later test incorrect?

Yes, it's got a point - I can seg that. I'll post a fix.

Dave

> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
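
One way to address the Coverity report, assuming the later "if (rdma->channel)" test is the correct one, is to make the unregister call conditional on the channel existing. The sketch below is not the fix that was posted. The demo_* types and functions are hypothetical stand-ins for RDMAContext, its channel and qemu_set_fd_handler(fd, NULL, NULL, NULL), reduced to what is needed to show the guard.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real QEMU/rdmacm types; only the
 * fields this sketch touches are modelled. */
struct demo_channel { int fd; };
struct demo_rdma { struct demo_channel *channel; };

/* Records the fd whose handler was cleared, so the effect is observable. */
static int last_unregistered_fd = -1;

/* Stand-in for qemu_set_fd_handler(fd, NULL, NULL, NULL). */
static void demo_clear_fd_handler(int fd)
{
    last_unregistered_fd = fd;
}

/*
 * Clear the fd handler only when a channel exists, so that a context
 * whose channel was never created (or already torn down) is not
 * dereferenced. Returns true if a handler was cleared.
 */
static bool demo_cleanup_unregister(struct demo_rdma *rdma)
{
    if (rdma->channel == NULL) {
        return false;
    }
    demo_clear_fd_handler(rdma->channel->fd);
    return true;
}
```

With a NULL channel the function returns without touching the pointer. With a valid channel the handler is cleared before any later teardown step could free the channel.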