[libvirt] [PATCH] qemu: Prevent isolation group-related guest disappearance

Andrea Bolognani posted 1 patch 6 years, 8 months ago
Failed in applying to current master (apply log)
src/qemu/qemu_domain_address.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
[libvirt] [PATCH] qemu: Prevent isolation group-related guest disappearance
Posted by Andrea Bolognani 6 years, 8 months ago
We can't retrieve the isolation group of a device that's
not present in the system. However, it's very common for
VFs to be created late in the boot, so they might not be
present yet when libvirtd starts, which would cause the
guests using them to disappear.

If a PCI address has already been set for the host device
in question, we can assume that it either existed at some
point in the past or the user is assigning addresses
manually: in both cases, we should not error out and just
ignore the (temporary) failure.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1484254

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
---
 src/qemu/qemu_domain_address.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c
index 16bf0cdf9..7f12f186b 100644
--- a/src/qemu/qemu_domain_address.c
+++ b/src/qemu/qemu_domain_address.c
@@ -1012,6 +1012,18 @@ qemuDomainFillDeviceIsolationGroup(virDomainDefPtr def,
         tmp = virPCIDeviceAddressGetIOMMUGroupNum(hostAddr);
 
         if (tmp < 0) {
+            /* If there's already a PCI address assigned to the device
+             * we move on instead of erroring out.
+             *
+             * This means we still don't allow non-existing host
+             * devices to be assigned to guests; however, if the host
+             * device existed at some point in the past but no longer
+             * does, which can happen very easily when dealing with VFs,
+             * we prevent the guest from disappearing and give the user
+             * an opportunity to edit its configuration */
+            if (virDeviceInfoPCIAddressPresent(info))
+                goto skip;
+
             virReportError(VIR_ERR_INTERNAL_ERROR,
                            _("Can't look up isolation group for host device "
                              "%04x:%02x:%02x.%x"),
-- 
2.13.5

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: Prevent isolation group-related guest disappearance
Posted by Martin Kletzander 6 years, 8 months ago
On Thu, Aug 24, 2017 at 05:12:20PM +0200, Andrea Bolognani wrote:
>We can't retrieve the isolation group of a device that's
>not present in the system. However, it's very common for
>VFs to be created late in the boot, so they might not be
>present yet when libvirtd starts, which would cause the
>guests using them to disappear.
>
>If a PCI address has already been set for the host device
>in question, we can assume that it either existed at some
>point in the past or the user is assigning addresses
>manually: in both cases, we should not error out and just
>ignore the (temporary) failure.
>
>Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1484254
>
>Signed-off-by: Andrea Bolognani <abologna@redhat.com>
>---
> src/qemu/qemu_domain_address.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>

Not an expert on this topic, but I did my due diligence and it makes
sense to me, so

Reviewed-by: Martin Kletzander <mkletzan@redhat.com>

if that's enough for you.

>diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c
>index 16bf0cdf9..7f12f186b 100644
>--- a/src/qemu/qemu_domain_address.c
>+++ b/src/qemu/qemu_domain_address.c
>@@ -1012,6 +1012,18 @@ qemuDomainFillDeviceIsolationGroup(virDomainDefPtr def,
>         tmp = virPCIDeviceAddressGetIOMMUGroupNum(hostAddr);
>
>         if (tmp < 0) {
>+            /* If there's already a PCI address assigned to the device
>+             * we move on instead of erroring out.
>+             *
>+             * This means we still don't allow non-existing host
>+             * devices to be assigned to guests; however, if the host
>+             * device existed at some point in the past but no longer
>+             * does, which can happen very easily when dealing with VFs,
>+             * we prevent the guest from disappearing and give the user
>+             * an opportunity to edit its configuration */
>+            if (virDeviceInfoPCIAddressPresent(info))
>+                goto skip;
>+
>             virReportError(VIR_ERR_INTERNAL_ERROR,
>                            _("Can't look up isolation group for host device "
>                              "%04x:%02x:%02x.%x"),
>--
>2.13.5
>
>--
>libvir-list mailing list
>libvir-list@redhat.com
>https://www.redhat.com/mailman/listinfo/libvir-list
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: Prevent isolation group-related guest disappearance
Posted by Peter Krempa 6 years, 8 months ago
On Fri, Aug 25, 2017 at 11:41:22 +0200, Martin Kletzander wrote:
> On Thu, Aug 24, 2017 at 05:12:20PM +0200, Andrea Bolognani wrote:
> > We can't retrieve the isolation group of a device that's
> > not present in the system. However, it's very common for
> > VFs to be created late in the boot, so they might not be
> > present yet when libvirtd starts, which would cause the
> > guests using them to disappear.
> > 
> > If a PCI address has already been set for the host device
> > in question, we can assume that it either existed at some
> > point in the past or the user is assigning addresses
> > manually: in both cases, we should not error out and just
> > ignore the (temporary) failure.
> > 
> > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1484254
> > 
> > Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> > ---
> > src/qemu/qemu_domain_address.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> > 
> 
> Not an expert on this topic, but I did my due diligence and it makes
> sense to me, so
> 
> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>

Disclamer: I'm not objecting to the ACK, it's clearly better than it
was.

> 
> if that's enough for you.
> 
> > diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c
> > index 16bf0cdf9..7f12f186b 100644
> > --- a/src/qemu/qemu_domain_address.c
> > +++ b/src/qemu/qemu_domain_address.c
> > @@ -1012,6 +1012,18 @@ qemuDomainFillDeviceIsolationGroup(virDomainDefPtr def,
> >         tmp = virPCIDeviceAddressGetIOMMUGroupNum(hostAddr);
> > 
> >         if (tmp < 0) {
> > +            /* If there's already a PCI address assigned to the device
> > +             * we move on instead of erroring out.
> > +             *
> > +             * This means we still don't allow non-existing host
> > +             * devices to be assigned to guests; however, if the host
> > +             * device existed at some point in the past but no longer
> > +             * does, which can happen very easily when dealing with VFs,
> > +             * we prevent the guest from disappearing and give the user
> > +             * an opportunity to edit its configuration */
> > +            if (virDeviceInfoPCIAddressPresent(info))
> > +                goto skip;
> > +
> >             virReportError(VIR_ERR_INTERNAL_ERROR,
> >                            _("Can't look up isolation group for host device "
> >                              "%04x:%02x:%02x.%x"),

Why on earth is this in the domain address callback? That means that
it's filled on parse. That's silly for a thing like the isolation group.

The isolation group is necessary only when you are going to start the VM
so only then it should be determined. That would prevent a bug like this
altogether.

Also there's PCI hotplug...
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: Prevent isolation group-related guest disappearance
Posted by Andrea Bolognani 6 years, 8 months ago
On Fri, 2017-08-25 at 12:12 +0200, Peter Krempa wrote:
> > > @@ -1012,6 +1012,18 @@ qemuDomainFillDeviceIsolationGroup(virDomainDefPtr def,
> > >         tmp = virPCIDeviceAddressGetIOMMUGroupNum(hostAddr);
> > > 
> > >         if (tmp < 0) {
> > > +            /* If there's already a PCI address assigned to the device
> > > +             * we move on instead of erroring out.
> > > +             *
> > > +             * This means we still don't allow non-existing host
> > > +             * devices to be assigned to guests; however, if the host
> > > +             * device existed at some point in the past but no longer
> > > +             * does, which can happen very easily when dealing with VFs,
> > > +             * we prevent the guest from disappearing and give the user
> > > +             * an opportunity to edit its configuration */
> > > +            if (virDeviceInfoPCIAddressPresent(info))
> > > +                goto skip;
> > > +
> > >             virReportError(VIR_ERR_INTERNAL_ERROR,
> > >                            _("Can't look up isolation group for host device "
> > >                              "%04x:%02x:%02x.%x"),
> 
> Why on earth is this in the domain address callback? That means that
> it's filled on parse. That's silly for a thing like the isolation group.
> 
> The isolation group is necessary only when you are going to start the VM
> so only then it should be determined. That would prevent a bug like this
> altogether.
> 
> Also there's PCI hotplug...

Well, the whole point of introducing isolation groups
was so that isolation constraint would be respected when
assigning PCI addresses, and guest startup is way too
late for that.

If host devices are not isolated correctly on pSeries
guests, depending on the situation the result can be
either degraded error handling (undesirable, but maybe
not too bad) or the guest failing to start (pretty bad).
So picking the right PCI addresses is critical, and we
need to know the isolation group in order to do it.

Note that these constraints only apply to pSeries, and
accordingly the code above only runs for such guests.


That said, I hadn't realized that configuring guests
to use devices that are not yet or no longer available
on the host was supposed to work, but since that's the
case the current behavior is basically a regression
because it prevents the users from doing something
that they would have been able to do in the past.

I'll post a v2 that changes the behavior so that we
fall back to the old ways (no isolation) for devices
that are not available on the host at the time the
guest is configured.

Thanks for pointing out my mistake :)

-- 
Andrea Bolognani / Red Hat / Virtualization

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list