[Qemu-devel] [RFC 0/2] memory/vfio: notify region_del() when unregister listeners

Peter Xu posted 2 patches 6 years, 2 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20180119084219.31187-1-peterx@redhat.com
Test checkpatch passed
Test docker-build@min-glib passed
Test docker-mingw@fedora passed
Test docker-quick@centos6 passed
Test ppc passed
Test s390x passed
There is a newer version of this series
hw/vfio/common.c | 16 ++++++++++++----
memory.c         | 24 ++++++++++++++++++++++++
2 files changed, 36 insertions(+), 4 deletions(-)
[Qemu-devel] [RFC 0/2] memory/vfio: notify region_del() when unregister listeners
Posted by Peter Xu 6 years, 2 months ago
I encountered an event loss problem during unplugging vfio devices:

  https://bugzilla.redhat.com/show_bug.cgi?id=1531393

I thought it should be a simple VT-d issue but I was wrong.  The whole
debugging leads me to these patches.

Basically I think what we missed is that when unregistering memory
listeners, we don't really call region_del() at all.  Instead we just
remove ourselves from the listener list.  IMHO that's not enough.  A
clean unregister should undo all possible changes that have done
during region_add().  That's patch 1.

Patch 2 fixes a vfio issue when patch 1 is applied.

I'm marking this change as RFC since it touches the core of memory
somehow, on which I am not 100% sure about.  E.g., I haven't tested
all the listener users, so I'm not sure whether it may broke any use
case.

But what I'm sure is that it passes the docker tests on
compiling/qtests, and it fixes the event loss that reported.

Let's see whether I can get some feedback first.  Please review.

Thanks.

Peter Xu (2):
  memory: do explicit cleanup when remove listeners
  vfio: listener unregister before unset container

 hw/vfio/common.c | 16 ++++++++++++----
 memory.c         | 24 ++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 4 deletions(-)

-- 
2.14.3


Re: [Qemu-devel] [RFC 0/2] memory/vfio: notify region_del() when unregister listeners
Posted by Paolo Bonzini 6 years, 2 months ago
On 19/01/2018 09:42, Peter Xu wrote:
> I encountered an event loss problem during unplugging vfio devices:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=1531393
> 
> I thought it should be a simple VT-d issue but I was wrong.  The whole
> debugging leads me to these patches.
> 
> Basically I think what we missed is that when unregistering memory
> listeners, we don't really call region_del() at all.  Instead we just
> remove ourselves from the listener list.  IMHO that's not enough.  A
> clean unregister should undo all possible changes that have done
> during region_add().  That's patch 1.
> 
> Patch 2 fixes a vfio issue when patch 1 is applied.

It makes sense, though of course patch 1 must come second for
bisectability.  Of the other listeners, most do not implement
region_del, but commit must be audited as well.  What matters is whether
the listener is unregistered, and only few are:

- kvm_arm_machine_init_done must unregister the listener after the
QSLIST_FOREACH_SAFE loop.

- Xen seems okay

- vhost needs to remove the memory_region_unref loop after unregistering
the listener - and this must be done at the same time as patch 1, not before

- virtio seems okay

Thanks,

Paolo

Re: [Qemu-devel] [RFC 0/2] memory/vfio: notify region_del() when unregister listeners
Posted by Peter Xu 6 years, 2 months ago
On Fri, Jan 19, 2018 at 12:41:01PM +0100, Paolo Bonzini wrote:
> On 19/01/2018 09:42, Peter Xu wrote:
> > I encountered an event loss problem during unplugging vfio devices:
> > 
> >   https://bugzilla.redhat.com/show_bug.cgi?id=1531393
> > 
> > I thought it should be a simple VT-d issue but I was wrong.  The whole
> > debugging leads me to these patches.
> > 
> > Basically I think what we missed is that when unregistering memory
> > listeners, we don't really call region_del() at all.  Instead we just
> > remove ourselves from the listener list.  IMHO that's not enough.  A
> > clean unregister should undo all possible changes that have done
> > during region_add().  That's patch 1.
> > 
> > Patch 2 fixes a vfio issue when patch 1 is applied.
> 
> It makes sense, though of course patch 1 must come second for
> bisectability.  Of the other listeners, most do not implement
> region_del, but commit must be audited as well.  What matters is whether
> the listener is unregistered, and only few are:
> 
> - kvm_arm_machine_init_done must unregister the listener after the
> QSLIST_FOREACH_SAFE loop.

I'll add another patch for this.  I noticed that we haven't really do
the list cleanup on kvm_devices_head.  Say, the items are still on the
list even after they are freed in kvm_arm_machine_init_done():

    QSLIST_FOREACH_SAFE(kd, &kvm_devices_head, entries, tkd) {
        if (kd->kda.addr != -1) {
            kvm_arm_set_device_addr(kd);
        }
        memory_region_unref(kd->mr);
        g_free(kd);
    }

I think it's pretty safe as long as we don't access kvm_devices_head
any more (that's what we did), but would it be better to clean the
list head too?  (CC PeterM too)

Anyway, I'll ignore this for my series and only do what's needed for
the memory listener fix.

> 
> - Xen seems okay
> 
> - vhost needs to remove the memory_region_unref loop after unregistering
> the listener - and this must be done at the same time as patch 1, not before

Noted.  I'll also fix the begin() missing problem you pointed out in
the other comment.  Thanks!

> 
> - virtio seems okay
> 
> Thanks,
> 
> Paolo

-- 
Peter Xu