[libvirt] [PATCH] qemu: Restore lost shutdown reason

John Ferlan posted 1 patch 5 years, 5 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20181018152853.5357-1-jferlan@redhat.com
Test syntax-check passed
There is a newer version of this series
src/qemu/qemu_process.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
[libvirt] [PATCH] qemu: Restore lost shutdown reason
Posted by John Ferlan 5 years, 5 months ago
When qemuProcessReconnectHelper was introduced (commit d38897a5d)
reconnection failure used VIR_DOMAIN_SHUTOFF_FAILED; however, that
was changed in commit bda2f17d to either VIR_DOMAIN_SHUTOFF_CRASHED
or VIR_DOMAIN_SHUTOFF_UNKNOWN.

When QEMU_CAPS_NO_SHUTDOWN checking was removed in commit fe35b1ad6
the conditional state was just left at VIR_DOMAIN_SHUTOFF_CRASHED.

Restore the logic adjustment using the same conditions as command
line building and alter the comment to describe the reasoning.

Signed-off-by: John Ferlan <jferlan@redhat.com>
---

 This was "discovered" while reviewing Nikolay's patch related
 to adding "DAEMON" as the shutdown reason:

https://www.redhat.com/archives/libvir-list/2018-October/msg00493.html

 While working through the iterations of that - figured I'd at
 least post this.


 src/qemu/qemu_process.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 3955eda17c..4a39111d51 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7985,11 +7985,16 @@ qemuProcessReconnect(void *opaque)
     if (virDomainObjIsActive(obj)) {
         /* We can't get the monitor back, so must kill the VM
          * to remove danger of it ending up running twice if
-         * user tries to start it again later
-         * If we couldn't get the monitor since QEMU supports
-         * no-shutdown, we can safely say that the domain
-         * crashed ... */
-        state = VIR_DOMAIN_SHUTOFF_CRASHED;
+         * user tries to start it again later.
+         *
+         * If we cannot get to the monitor when the QEMU command
+         * line used -no-shutdown, then we can safely say that the
+         * domain crashed; otherwise we don't really know. */
+        if (priv->monJSON && priv->allowReboot == VIR_TRISTATE_BOOL_YES)
+            state = VIR_DOMAIN_SHUTOFF_CRASHED;
+        else
+            state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
+
         /* If BeginJob failed, we jumped here without a job, let's hope another
          * thread didn't have a chance to start playing with the domain yet
          * (it's all we can do anyway).
-- 
2.17.2

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: Restore lost shutdown reason
Posted by Nikolay Shirokovskiy 5 years, 5 months ago

On 18.10.2018 18:28, John Ferlan wrote:
> When qemuProcessReconnectHelper was introduced (commit d38897a5d)
> reconnection failure used VIR_DOMAIN_SHUTOFF_FAILED; however, that
> was changed in commit bda2f17d to either VIR_DOMAIN_SHUTOFF_CRASHED
> or VIR_DOMAIN_SHUTOFF_UNKNOWN.
> 
> When QEMU_CAPS_NO_SHUTDOWN checking was removed in commit fe35b1ad6
> the conditional state was just left at VIR_DOMAIN_SHUTOFF_CRASHED.
> 
> Restore the logic adjustment using the same conditions as command
> line building and alter the comment to describe the reasoning.
> 
> Signed-off-by: John Ferlan <jferlan@redhat.com>
> ---
> 
>  This was "discovered" while reviewing Nikolay's patch related
>  to adding "DAEMON" as the shutdown reason:
> 
> https://www.redhat.com/archives/libvir-list/2018-October/msg00493.html
> 
>  While working through the iterations of that - figured I'd at
>  least post this.
> 
> 
>  src/qemu/qemu_process.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 3955eda17c..4a39111d51 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -7985,11 +7985,16 @@ qemuProcessReconnect(void *opaque)
>      if (virDomainObjIsActive(obj)) {
>          /* We can't get the monitor back, so must kill the VM
>           * to remove danger of it ending up running twice if
> -         * user tries to start it again later
> -         * If we couldn't get the monitor since QEMU supports
> -         * no-shutdown, we can safely say that the domain
> -         * crashed ... */
> -        state = VIR_DOMAIN_SHUTOFF_CRASHED;
> +         * user tries to start it again later.
> +         *
> +         * If we cannot get to the monitor when the QEMU command
> +         * line used -no-shutdown, then we can safely say that the
> +         * domain crashed; otherwise we don't really know. */
> +        if (priv->monJSON && priv->allowReboot == VIR_TRISTATE_BOOL_YES)
> +            state = VIR_DOMAIN_SHUTOFF_CRASHED;
> +        else
> +            state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
> +
>          /* If BeginJob failed, we jumped here without a job, let's hope another
>           * thread didn't have a chance to start playing with the domain yet
>           * (it's all we can do anyway).
> 

This is reasonable but not complete. This is only applied to case when we do connect(2)
to qemu and it is failed with ECONNREFUSED which means qemu process does not exists
(in which case it is either crashed if was configured with -no-shutdown or terminated
for unknown reason) or just closed monitor connection (in this case we are going
to terminate it by ourselves).

But there are other cases. For example we can jump to error path even before connect(2)
or after successfull connect. See discussion of such cases in [1].

[1] https://www.redhat.com/archives/libvir-list/2018-October/msg01031.html

Nikolay

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: Restore lost shutdown reason
Posted by John Ferlan 5 years, 4 months ago

On 10/22/18 3:36 AM, Nikolay Shirokovskiy wrote:
> 
> 
> On 18.10.2018 18:28, John Ferlan wrote:
>> When qemuProcessReconnectHelper was introduced (commit d38897a5d)
>> reconnection failure used VIR_DOMAIN_SHUTOFF_FAILED; however, that
>> was changed in commit bda2f17d to either VIR_DOMAIN_SHUTOFF_CRASHED
>> or VIR_DOMAIN_SHUTOFF_UNKNOWN.
>>
>> When QEMU_CAPS_NO_SHUTDOWN checking was removed in commit fe35b1ad6
>> the conditional state was just left at VIR_DOMAIN_SHUTOFF_CRASHED.
>>
>> Restore the logic adjustment using the same conditions as command
>> line building and alter the comment to describe the reasoning.
>>
>> Signed-off-by: John Ferlan <jferlan@redhat.com>
>> ---
>>
>>  This was "discovered" while reviewing Nikolay's patch related
>>  to adding "DAEMON" as the shutdown reason:
>>
>> https://www.redhat.com/archives/libvir-list/2018-October/msg00493.html
>>
>>  While working through the iterations of that - figured I'd at
>>  least post this.
>>
>>
>>  src/qemu/qemu_process.c | 15 ++++++++++-----
>>  1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index 3955eda17c..4a39111d51 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -7985,11 +7985,16 @@ qemuProcessReconnect(void *opaque)
>>      if (virDomainObjIsActive(obj)) {
>>          /* We can't get the monitor back, so must kill the VM
>>           * to remove danger of it ending up running twice if
>> -         * user tries to start it again later
>> -         * If we couldn't get the monitor since QEMU supports
>> -         * no-shutdown, we can safely say that the domain
>> -         * crashed ... */
>> -        state = VIR_DOMAIN_SHUTOFF_CRASHED;
>> +         * user tries to start it again later.
>> +         *
>> +         * If we cannot get to the monitor when the QEMU command
>> +         * line used -no-shutdown, then we can safely say that the
>> +         * domain crashed; otherwise we don't really know. */
>> +        if (priv->monJSON && priv->allowReboot == VIR_TRISTATE_BOOL_YES)
>> +            state = VIR_DOMAIN_SHUTOFF_CRASHED;
>> +        else
>> +            state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
>> +
>>          /* If BeginJob failed, we jumped here without a job, let's hope another
>>           * thread didn't have a chance to start playing with the domain yet
>>           * (it's all we can do anyway).
>>
> 
> This is reasonable but not complete. This is only applied to case when we do connect(2)

I understand your point; however, the goal of this patch wasn't to fix
the various other failure scenarios/reasons. It was to merely restore
the lost UNKNOWN reason in the event that either priv->monJSON == false
(unlikely) or priv->allowReboot != VIR_TRISTATE_BOOL_YES (possible).

Rather than copy-pasta, I'll create qemuDomainIsUsingNoShutdown(priv)
which will perform the condition checks for both command line generation
and reconnection failure paths.

If that's not desirable, then that's fine. I won't spend more cycles on
this.

John

> to qemu and it is failed with ECONNREFUSED which means qemu process does not exists
> (in which case it is either crashed if was configured with -no-shutdown or terminated
> for unknown reason) or just closed monitor connection (in this case we are going
> to terminate it by ourselves).
> 
> But there are other cases. For example we can jump to error path even before connect(2)
> or after successfull connect. See discussion of such cases in [1].
> 
> [1] https://www.redhat.com/archives/libvir-list/2018-October/msg01031.html
> 
> Nikolay
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: Restore lost shutdown reason
Posted by Nikolay Shirokovskiy 5 years, 4 months ago

On 01.11.2018 17:24, John Ferlan wrote:
> 
> 
> On 10/22/18 3:36 AM, Nikolay Shirokovskiy wrote:
>>
>>
>> On 18.10.2018 18:28, John Ferlan wrote:
>>> When qemuProcessReconnectHelper was introduced (commit d38897a5d)
>>> reconnection failure used VIR_DOMAIN_SHUTOFF_FAILED; however, that
>>> was changed in commit bda2f17d to either VIR_DOMAIN_SHUTOFF_CRASHED
>>> or VIR_DOMAIN_SHUTOFF_UNKNOWN.
>>>
>>> When QEMU_CAPS_NO_SHUTDOWN checking was removed in commit fe35b1ad6
>>> the conditional state was just left at VIR_DOMAIN_SHUTOFF_CRASHED.
>>>
>>> Restore the logic adjustment using the same conditions as command
>>> line building and alter the comment to describe the reasoning.
>>>
>>> Signed-off-by: John Ferlan <jferlan@redhat.com>
>>> ---
>>>
>>>  This was "discovered" while reviewing Nikolay's patch related
>>>  to adding "DAEMON" as the shutdown reason:
>>>
>>> https://www.redhat.com/archives/libvir-list/2018-October/msg00493.html
>>>
>>>  While working through the iterations of that - figured I'd at
>>>  least post this.
>>>
>>>
>>>  src/qemu/qemu_process.c | 15 ++++++++++-----
>>>  1 file changed, 10 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>>> index 3955eda17c..4a39111d51 100644
>>> --- a/src/qemu/qemu_process.c
>>> +++ b/src/qemu/qemu_process.c
>>> @@ -7985,11 +7985,16 @@ qemuProcessReconnect(void *opaque)
>>>      if (virDomainObjIsActive(obj)) {
>>>          /* We can't get the monitor back, so must kill the VM
>>>           * to remove danger of it ending up running twice if
>>> -         * user tries to start it again later
>>> -         * If we couldn't get the monitor since QEMU supports
>>> -         * no-shutdown, we can safely say that the domain
>>> -         * crashed ... */
>>> -        state = VIR_DOMAIN_SHUTOFF_CRASHED;
>>> +         * user tries to start it again later.
>>> +         *
>>> +         * If we cannot get to the monitor when the QEMU command
>>> +         * line used -no-shutdown, then we can safely say that the
>>> +         * domain crashed; otherwise we don't really know. */
>>> +        if (priv->monJSON && priv->allowReboot == VIR_TRISTATE_BOOL_YES)
>>> +            state = VIR_DOMAIN_SHUTOFF_CRASHED;
>>> +        else
>>> +            state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
>>> +
>>>          /* If BeginJob failed, we jumped here without a job, let's hope another
>>>           * thread didn't have a chance to start playing with the domain yet
>>>           * (it's all we can do anyway).
>>>
>>
>> This is reasonable but not complete. This is only applied to case when we do connect(2)
> 
> I understand your point; however, the goal of this patch wasn't to fix
> the various other failure scenarios/reasons. It was to merely restore
> the lost UNKNOWN reason in the event that either priv->monJSON == false
> (unlikely) or priv->allowReboot != VIR_TRISTATE_BOOL_YES (possible).
> 
> Rather than copy-pasta, I'll create qemuDomainIsUsingNoShutdown(priv)
> which will perform the condition checks for both command line generation
> and reconnection failure paths.

I'm not against this change. I hoped we can elaborate more presice solution
as discussed in https://www.redhat.com/archives/libvir-list/2018-October/msg00493.html

Nikolay

> 
> If that's not desirable, then that's fine. I won't spend more cycles on
> this.
> 
> John
> 
>> to qemu and it is failed with ECONNREFUSED which means qemu process does not exists
>> (in which case it is either crashed if was configured with -no-shutdown or terminated
>> for unknown reason) or just closed monitor connection (in this case we are going
>> to terminate it by ourselves).
>>
>> But there are other cases. For example we can jump to error path even before connect(2)
>> or after successfull connect. See discussion of such cases in [1].
>>
>> [1] https://www.redhat.com/archives/libvir-list/2018-October/msg01031.html
>>
>> Nikolay
>>

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list