[v1] don't kill qemu process on restart if networkNotify fails

[libvirt] [PATCH 2/2] network: better log message when network is inactive during reconnect

Posted by Laine Stump 8 years, 9 months ago

If the network isn't active during networkNotifyActualDevice(), we
would log an error message stating that the bridge device didn't
exist. This patch adds a check to see if the network is active, making
the logs more useful in the case that it isn't.

Partially resolves: https://bugzilla.redhat.com/1442700
---
 src/network/bridge_driver.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
index d2d8557..e06f81b 100644
--- a/src/network/bridge_driver.c
+++ b/src/network/bridge_driver.c
@@ -4676,6 +4676,13 @@ networkNotifyActualDevice(virDomainDefPtr dom,
     }
     netdef = network->def;
 
+    if (!virNetworkObjIsActive(network)) {
+        virReportError(VIR_ERR_OPERATION_INVALID,
+                       _("network '%s' is not active"),
+                       netdef->name);
+        goto error;
+    }
+
     /* if we're restarting libvirtd after an upgrade from a version
      * that didn't save bridge name in actualNetDef for
      * actualType==network, we need to copy it in so that it will be
-- 
2.9.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 2/2] network: better log message when network is inactive during reconnect

Posted by John Ferlan 8 years, 9 months ago


On 04/25/2017 12:34 PM, Laine Stump wrote:
> If the network isn't active during networkNotifyActualDevice(), we
> would log an error message stating that the bridge device didn't
> exist. This patch adds a check to see if the network is active, making
> the logs more useful in the case that it isn't.
> 
> Partially resolves: https://bugzilla.redhat.com/1442700
> ---
>  src/network/bridge_driver.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
> index d2d8557..e06f81b 100644
> --- a/src/network/bridge_driver.c
> +++ b/src/network/bridge_driver.c
> @@ -4676,6 +4676,13 @@ networkNotifyActualDevice(virDomainDefPtr dom,
>      }
>      netdef = network->def;
>  
> +    if (!virNetworkObjIsActive(network)) {
> +        virReportError(VIR_ERR_OPERATION_INVALID,
> +                       _("network '%s' is not active"),
> +                       netdef->name);
> +        goto error;
> +    }
> +

/me wonders whether this should just a goto cleanup - IOW: if the
network isn't active, so what, why error. Once someone attempts to start
it, they'll get errors I assume...

Not that goto error or cleanup matters since commit id '4fee4e0' changed
the goto cleanup to goto error and added:

+
+error:
+    goto cleanup;

I guess I don't have the answer readily available in my head as to how
much of the subsequent code would be called at network start time?

John

>      /* if we're restarting libvirtd after an upgrade from a version
>       * that didn't save bridge name in actualNetDef for
>       * actualType==network, we need to copy it in so that it will be
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 2/2] network: better log message when network is inactive during reconnect

Posted by Laine Stump 8 years, 9 months ago

On 04/26/2017 02:39 PM, John Ferlan wrote:
> 
> 
> On 04/25/2017 12:34 PM, Laine Stump wrote:
>> If the network isn't active during networkNotifyActualDevice(), we
>> would log an error message stating that the bridge device didn't
>> exist. This patch adds a check to see if the network is active, making
>> the logs more useful in the case that it isn't.
>>
>> Partially resolves: https://bugzilla.redhat.com/1442700
>> ---
>>  src/network/bridge_driver.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
>> index d2d8557..e06f81b 100644
>> --- a/src/network/bridge_driver.c
>> +++ b/src/network/bridge_driver.c
>> @@ -4676,6 +4676,13 @@ networkNotifyActualDevice(virDomainDefPtr dom,
>>      }
>>      netdef = network->def;
>>  
>> +    if (!virNetworkObjIsActive(network)) {
>> +        virReportError(VIR_ERR_OPERATION_INVALID,
>> +                       _("network '%s' is not active"),
>> +                       netdef->name);
>> +        goto error;
>> +    }
>> +
> 
> /me wonders whether this should just a goto cleanup - IOW: if the
> network isn't active, so what, why error. Once someone attempts to start
> it, they'll get errors I assume...
> 
> Not that goto error or cleanup matters since commit id '4fee4e0' changed
> the goto cleanup to goto error and added:
> 
> +
> +error:
> +    goto cleanup;

That's missing the point of that commit. "cleanup:" is there only as a
place for error: to goto the cleanup code that is common to both the
"success:" exit and the "error:" exit (and the three labels are all
there so that that this function is structured similarly to
networkAllocateActualDevice() - I wanted them to be as close to the same
as possible). In the body of the function you either declare the
allocate/notification a success (with "goto success") in which case the
connect count for the network goes up, or you declare it a failure (with
"goto error") in which case the connect count for the network remains
unchanged, but you never directly goto the noncommittal "cleanup:".

> 
> I guess I don't have the answer readily available in my head as to how
> much of the subsequent code would be called at network start time?

None of this is called when the network is started - we have no way for
the networkStart() function to learn which interfaces in which domains
are supposed to be connected to a particular network. That needs more
infrastructure change than I wanted to put in right now (we need some
sort of event or callback that will allow the network driver to notify
all active hypervisor drivers whenever a network is started (or
destroyed?), giving the hypervisor driver a chance to call
networkNotifyActualDevice() for any affected domain interface).

So have I explained myself well enough?

> 
> John
> 
>>      /* if we're restarting libvirtd after an upgrade from a version
>>       * that didn't save bridge name in actualNetDef for
>>       * actualType==network, we need to copy it in so that it will be
>>
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 2/2] network: better log message when network is inactive during reconnect

Posted by John Ferlan 8 years, 9 months ago


On 04/26/2017 05:57 PM, Laine Stump wrote:
> On 04/26/2017 02:39 PM, John Ferlan wrote:
>>
>>
>> On 04/25/2017 12:34 PM, Laine Stump wrote:
>>> If the network isn't active during networkNotifyActualDevice(), we
>>> would log an error message stating that the bridge device didn't
>>> exist. This patch adds a check to see if the network is active, making
>>> the logs more useful in the case that it isn't.
>>>
>>> Partially resolves: https://bugzilla.redhat.com/1442700
>>> ---
>>>  src/network/bridge_driver.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
>>> index d2d8557..e06f81b 100644
>>> --- a/src/network/bridge_driver.c
>>> +++ b/src/network/bridge_driver.c
>>> @@ -4676,6 +4676,13 @@ networkNotifyActualDevice(virDomainDefPtr dom,
>>>      }
>>>      netdef = network->def;
>>>  
>>> +    if (!virNetworkObjIsActive(network)) {
>>> +        virReportError(VIR_ERR_OPERATION_INVALID,
>>> +                       _("network '%s' is not active"),
>>> +                       netdef->name);
>>> +        goto error;
>>> +    }
>>> +
>>
>> /me wonders whether this should just a goto cleanup - IOW: if the
>> network isn't active, so what, why error. Once someone attempts to start
>> it, they'll get errors I assume...
>>
>> Not that goto error or cleanup matters since commit id '4fee4e0' changed
>> the goto cleanup to goto error and added:
>>
>> +
>> +error:
>> +    goto cleanup;
> 
> That's missing the point of that commit. "cleanup:" is there only as a
> place for error: to goto the cleanup code that is common to both the
> "success:" exit and the "error:" exit (and the three labels are all
> there so that that this function is structured similarly to
> networkAllocateActualDevice() - I wanted them to be as close to the same
> as possible). In the body of the function you either declare the
> allocate/notification a success (with "goto success") in which case the
> connect count for the network goes up, or you declare it a failure (with
> "goto error") in which case the connect count for the network remains
> unchanged, but you never directly goto the noncommittal "cleanup:".
> 

OK understood - it's just one of those odd looking constructs to have a
goto error and goto cleanup immediately following it, but your reasoning
for having the logic this way is understandable.

>>
>> I guess I don't have the answer readily available in my head as to how
>> much of the subsequent code would be called at network start time?
> 
> None of this is called when the network is started - we have no way for
> the networkStart() function to learn which interfaces in which domains
> are supposed to be connected to a particular network. That needs more
> infrastructure change than I wanted to put in right now (we need some
> sort of event or callback that will allow the network driver to notify
> all active hypervisor drivers whenever a network is started (or
> destroyed?), giving the hypervisor driver a chance to call
> networkNotifyActualDevice() for any affected domain interface).
> 
> So have I explained myself well enough?
> 

Sure, but for my context...

I guess I was putting myself in the mindset of a libvirtd reconnect
where "who knows" what was done with each network prior to reconnection
and this is the discovery of that. For some paths it's a hard death
issue - the physical connection had problems or went away. For other
paths it's more of a soft or state issue - someone stopped/destroyed,
but didn't undefine a network.  I was thinking more that in this latter
case, does an error make sense?

John

>>
>> John
>>
>>>      /* if we're restarting libvirtd after an upgrade from a version
>>>       * that didn't save bridge name in actualNetDef for
>>>       * actualType==network, we need to copy it in so that it will be
>>>
>>
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 2/2] network: better log message when network is inactive during reconnect

Posted by Laine Stump 8 years, 9 months ago

On 04/27/2017 09:06 AM, John Ferlan wrote:
> 
> 
> On 04/26/2017 05:57 PM, Laine Stump wrote:
>> On 04/26/2017 02:39 PM, John Ferlan wrote:
>>>
>>>
>>> On 04/25/2017 12:34 PM, Laine Stump wrote:
>>>> If the network isn't active during networkNotifyActualDevice(), we
>>>> would log an error message stating that the bridge device didn't
>>>> exist. This patch adds a check to see if the network is active, making
>>>> the logs more useful in the case that it isn't.
>>>>
>>>> Partially resolves: https://bugzilla.redhat.com/1442700
>>>> ---
>>>>  src/network/bridge_driver.c | 7 +++++++
>>>>  1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
>>>> index d2d8557..e06f81b 100644
>>>> --- a/src/network/bridge_driver.c
>>>> +++ b/src/network/bridge_driver.c
>>>> @@ -4676,6 +4676,13 @@ networkNotifyActualDevice(virDomainDefPtr dom,
>>>>      }
>>>>      netdef = network->def;
>>>>  
>>>> +    if (!virNetworkObjIsActive(network)) {
>>>> +        virReportError(VIR_ERR_OPERATION_INVALID,
>>>> +                       _("network '%s' is not active"),
>>>> +                       netdef->name);
>>>> +        goto error;
>>>> +    }
>>>> +
>>>
>>> /me wonders whether this should just a goto cleanup - IOW: if the
>>> network isn't active, so what, why error. Once someone attempts to start
>>> it, they'll get errors I assume...
>>>
>>> Not that goto error or cleanup matters since commit id '4fee4e0' changed
>>> the goto cleanup to goto error and added:
>>>
>>> +
>>> +error:
>>> +    goto cleanup;
>>
>> That's missing the point of that commit. "cleanup:" is there only as a
>> place for error: to goto the cleanup code that is common to both the
>> "success:" exit and the "error:" exit (and the three labels are all
>> there so that that this function is structured similarly to
>> networkAllocateActualDevice() - I wanted them to be as close to the same
>> as possible). In the body of the function you either declare the
>> allocate/notification a success (with "goto success") in which case the
>> connect count for the network goes up, or you declare it a failure (with
>> "goto error") in which case the connect count for the network remains
>> unchanged, but you never directly goto the noncommittal "cleanup:".
>>
> 
> OK understood - it's just one of those odd looking constructs to have a
> goto error and goto cleanup immediately following it, but your reasoning
> for having the logic this way is understandable.
> 
>>>
>>> I guess I don't have the answer readily available in my head as to how
>>> much of the subsequent code would be called at network start time?
>>
>> None of this is called when the network is started - we have no way for
>> the networkStart() function to learn which interfaces in which domains
>> are supposed to be connected to a particular network. That needs more
>> infrastructure change than I wanted to put in right now (we need some
>> sort of event or callback that will allow the network driver to notify
>> all active hypervisor drivers whenever a network is started (or
>> destroyed?), giving the hypervisor driver a chance to call
>> networkNotifyActualDevice() for any affected domain interface).
>>
>> So have I explained myself well enough?
>>
> 
> Sure, but for my context...
> 
> I guess I was putting myself in the mindset of a libvirtd reconnect
> where "who knows" what was done with each network prior to reconnection
> and this is the discovery of that. For some paths it's a hard death
> issue - the physical connection had problems or went away. For other
> paths it's more of a soft or state issue - someone stopped/destroyed,
> but didn't undefine a network.  I was thinking more that in this latter
> case, does an error make sense?

I think it makes sense to log an error, but it doesn't make sense to
kill the guest. I think it would only make sense to kill the guest if an
error condition led to 1) a guest that was completely non-functioning
and/or unreachable anyway, so it would eventually need to be killed
anyway no matter what you tried to do, or 2) the guest being left
exposed/vulnerable in some serious way. Neither of those is the case here.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list