Attaching a gdbserver implies that the qmp socket
should wait indefinitely for an answer from QEMU.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 python/qemu/machine.py        |  3 +++
 tests/qemu-iotests/iotests.py | 10 +++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 12752142c9..d6142271c2 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -409,6 +409,9 @@ def _launch(self) -> None:
                                           stderr=subprocess.STDOUT,
                                           shell=False,
                                           close_fds=False)
+
+        if 'gdbserver' in self._wrapper:
+            self._qmp_timer = None
         self._post_launch()

     def _early_cleanup(self) -> None:
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 05d0dc0751..380527245e 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -478,7 +478,10 @@ def log(msg: Msg,
 class Timeout:
     def __init__(self, seconds, errmsg="Timeout"):
-        self.seconds = seconds
+        if qemu_gdb:
+            self.seconds = 3000
+        else:
+            self.seconds = seconds
         self.errmsg = errmsg

     def __enter__(self):
         signal.signal(signal.SIGALRM, self.timeout)
@@ -684,6 +687,11 @@ def qmp_to_opts(self, obj):
             output_list += [key + '=' + obj[key]]
         return ','.join(output_list)

+    def get_qmp_events(self, wait: bool = False) -> List[QMPMessage]:
+        if qemu_gdb:
+            wait = 0.0
+        return super().get_qmp_events(wait=wait)
+
     def get_qmp_events_filtered(self, wait=60.0):
         result = []
         for ev in self.get_qmp_events(wait=wait):
--
2.30.2
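(For context: the timeout under discussion is the one the machine eventually
passes to the QMP socket's accept() call. A minimal sketch of the underlying
Python socket semantics, with illustrative names rather than QEMU's actual
internals:)

    import socket

    def accept_qmp_client(server_sock: socket.socket, timeout=15.0):
        # settimeout(None) puts the socket into blocking mode, so accept()
        # waits indefinitely; a float makes accept() raise socket.timeout
        # once the interval elapses.
        server_sock.settimeout(timeout)
        conn, _addr = server_sock.accept()
        return conn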
On 14.04.21 19:03, Emanuele Giuseppe Esposito wrote:
> Attaching a gdbserver implies that the qmp socket
> should wait indefinitely for an answer from QEMU.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>   python/qemu/machine.py        |  3 +++
>   tests/qemu-iotests/iotests.py | 10 +++++++++-
>   2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> index 12752142c9..d6142271c2 100644
> --- a/python/qemu/machine.py
> +++ b/python/qemu/machine.py
> @@ -409,6 +409,9 @@ def _launch(self) -> None:
>                                            stderr=subprocess.STDOUT,
>                                            shell=False,
>                                            close_fds=False)
> +
> +        if 'gdbserver' in self._wrapper:
> +            self._qmp_timer = None

Why doesn’t __init__() evaluate this? This here doesn’t feel like the
right place for it. If we want to evaluate it here, self._qmp_timer
shouldn’t exist, and instead the timeout should be a _post_launch()
parameter. (Which I would have nothing against, by the way.)

Also, mypy complains that this variable is a float, so iotest 297
(which runs mypy) fails.

>           self._post_launch()
>
>   def _early_cleanup(self) -> None:
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 05d0dc0751..380527245e 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -478,7 +478,10 @@ def log(msg: Msg,
>
>   class Timeout:
>       def __init__(self, seconds, errmsg="Timeout"):
> -        self.seconds = seconds
> +        if qemu_gdb:
> +            self.seconds = 3000
> +        else:
> +            self.seconds = seconds

We might as well completely disable the timeout then, that would be
more honest, I think. (I.e. to make __enter__ and __exit__ no-ops.)

>           self.errmsg = errmsg
>
>       def __enter__(self):
>           signal.signal(signal.SIGALRM, self.timeout)
> @@ -684,6 +687,11 @@ def qmp_to_opts(self, obj):
>               output_list += [key + '=' + obj[key]]
>           return ','.join(output_list)
>
> +    def get_qmp_events(self, wait: bool = False) -> List[QMPMessage]:
> +        if qemu_gdb:
> +            wait = 0.0

First, this is a bool. I can see that qmp.py expects a
Union[bool, float], but machine.py expects just a bool. (Also, mypy
complains that this specific `wait` here is a `bool`. You can see
that by running iotest 297.)

Second, I don’t understand this. If the caller wants to block waiting
on an event, then that should have nothing to do with whether we have
gdb running or not. As far as I understand, setting wait to 0.0 is the
same as wait = False, i.e. we don’t block and just return None
immediately if there is no pending event.

Max

> +        return super().get_qmp_events(wait=wait)
> +
>       def get_qmp_events_filtered(self, wait=60.0):
>           result = []
>           for ev in self.get_qmp_events(wait=wait):
>
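(A sketch of the "disable it entirely" variant suggested above, assuming the
rest of iotests.py's Timeout class keeps its upstream setitimer-based shape;
qemu_gdb stands for the module-level flag this series introduces, and the
environment lookup below is only an illustrative stand-in for it:)

    import os
    import signal

    qemu_gdb = bool(os.environ.get('GDB_OPTIONS'))  # stand-in for the flag

    class Timeout:
        def __init__(self, seconds, errmsg="Timeout"):
            self.seconds = seconds
            self.errmsg = errmsg

        def __enter__(self):
            # Under gdb, make the context manager a no-op instead of
            # arming the alarm with an arbitrarily large value.
            if not qemu_gdb:
                signal.signal(signal.SIGALRM, self.timeout)
                signal.setitimer(signal.ITIMER_REAL, self.seconds)
            return self

        def __exit__(self, exc_type, value, traceback):
            if not qemu_gdb:
                signal.setitimer(signal.ITIMER_REAL, 0)
            return False

        def timeout(self, signum, frame):
            raise Exception(self.errmsg)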
On 30/04/2021 13:59, Max Reitz wrote:
> On 14.04.21 19:03, Emanuele Giuseppe Esposito wrote:
>> Attaching a gdbserver implies that the qmp socket
>> should wait indefinitely for an answer from QEMU.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>   python/qemu/machine.py        |  3 +++
>>   tests/qemu-iotests/iotests.py | 10 +++++++++-
>>   2 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
>> index 12752142c9..d6142271c2 100644
>> --- a/python/qemu/machine.py
>> +++ b/python/qemu/machine.py
>> @@ -409,6 +409,9 @@ def _launch(self) -> None:
>>                                            stderr=subprocess.STDOUT,
>>                                            shell=False,
>>                                            close_fds=False)
>> +
>> +        if 'gdbserver' in self._wrapper:
>> +            self._qmp_timer = None
>
> Why doesn’t __init__() evaluate this? This here doesn’t feel like the
> right place for it. If we want to evaluate it here, self._qmp_timer
> shouldn’t exist, and instead the timeout should be a _post_launch()
> parameter. (Which I would have nothing against, by the way.)

Uhm.. I got another comment in a previous version where for the "event"
callbacks it was better a property than passing around a parameter.
Which I honestly agree.

What should __init__() do? The check here is to see if the invocation
has gdb (and a couple of patches ahead also valgrind), to remove the
timer.

If I understand what you mean, you want something like
def __init__(self, timer):

But from my understanding, the class hierarchy is:

QEMUMachine: in machine.py
QEMUQtestMachine(QEMUMachine): in qtest.py
VM(qtest.QEMUQtestMachine): in iotests.py

VM() is then invoked in each test. So this is not easily reachable
from check.py to pass the parameter into __init__.

>
> Also, mypy complains that this variable is a float, so iotest 297
> (which runs mypy) fails.

This and all mistakes of test 297 (mypy) are my fault: I did not have
pylint-3 installed, thus when testing it skipped the 297 test.

>
>>           self._post_launch()
>>
>>   def _early_cleanup(self) -> None:
>> diff --git a/tests/qemu-iotests/iotests.py
>> b/tests/qemu-iotests/iotests.py
>> index 05d0dc0751..380527245e 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -478,7 +478,10 @@ def log(msg: Msg,
>>
>>   class Timeout:
>>       def __init__(self, seconds, errmsg="Timeout"):
>> -        self.seconds = seconds
>> +        if qemu_gdb:
>> +            self.seconds = 3000
>> +        else:
>> +            self.seconds = seconds
>
> We might as well completely disable the timeout then, that would be
> more honest, I think. (I.e. to make __enter__ and __exit__ no-ops.)
>
>>           self.errmsg = errmsg
>>
>>       def __enter__(self):
>>           signal.signal(signal.SIGALRM, self.timeout)
>> @@ -684,6 +687,11 @@ def qmp_to_opts(self, obj):
>>               output_list += [key + '=' + obj[key]]
>>           return ','.join(output_list)
>>
>> +    def get_qmp_events(self, wait: bool = False) -> List[QMPMessage]:
>> +        if qemu_gdb:
>> +            wait = 0.0
>
> First, this is a bool. I can see that qmp.py expects a
> Union[bool, float], but machine.py expects just a bool. (Also, mypy
> complains that this specific `wait` here is a `bool`. You can see
> that by running iotest 297.)

I think here machine.py should reflect the qmp.py behavior and have a
Union[bool, float] too.

>
> Second, I don’t understand this. If the caller wants to block waiting
> on an event, then that should have nothing to do with whether we have
> gdb running or not. As far as I understand, setting wait to 0.0 is the
> same as wait = False, i.e. we don’t block and just return None
> immediately if there is no pending event.

You're right, this might not be needed here. The problem I had was
that calling gdb and pausing at a breakpoint or something for a while
would make the QMP socket time out, thus aborting the whole test. In
order to avoid that, I need to stop or delay timers.

I can't remember why I added this check here. At some point I am sure
the test was failing because of socket timeout expiration, but I
cannot reproduce the problem when commenting out this check above in
get_qmp_events. The other check in patch 3 should be enough.

Emanuele

>
> Max
>
>> +        return super().get_qmp_events(wait=wait)
>> +
>>       def get_qmp_events_filtered(self, wait=60.0):
>>           result = []
>>           for ev in self.get_qmp_events(wait=wait):
>>
>
On 30.04.21 23:03, Emanuele Giuseppe Esposito wrote:
>
>
> On 30/04/2021 13:59, Max Reitz wrote:
>> On 14.04.21 19:03, Emanuele Giuseppe Esposito wrote:
>>> Attaching a gdbserver implies that the qmp socket
>>> should wait indefinitely for an answer from QEMU.
>>>
>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>> ---
>>> python/qemu/machine.py | 3 +++
>>> tests/qemu-iotests/iotests.py | 10 +++++++++-
>>> 2 files changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
>>> index 12752142c9..d6142271c2 100644
>>> --- a/python/qemu/machine.py
>>> +++ b/python/qemu/machine.py
>>> @@ -409,6 +409,9 @@ def _launch(self) -> None:
>>> stderr=subprocess.STDOUT,
>>> shell=False,
>>> close_fds=False)
>>> +
>>> + if 'gdbserver' in self._wrapper:
>>> + self._qmp_timer = None
>>
>> Why doesn’t __init__() evaluate this? This here doesn’t feel like the
>> right place for it. If we want to evaluate it here, self._qmp_timer
>> shouldn’t exist, and instead the timeout should be a _post_launch()
>> parameter. (Which I would have nothing against, by the way.)
>
> Uhm.. I got another comment in a previous version where for the "event"
> callbacks it was better a property than passing around a parameter.
> Which I honestly agree.
I think that comment was in the sense of providing a default value,
which can be expressed by having a property that is set in __init__.
I don’t have anything against making this a property, but I also don’t
have anything against making it a _post_launch() parameter. I could
even live with both, i.e. set _qmp_timer to 15 in __init__, then have a
_post_launch parameter, and pass either self._qmp_timer or None if
self._wrapper includes 'gdbserver'.
What I do mind is that I don’t understand why the property is modified
here. The value of self._qmp_timer is supposed to be 15 by default and
None if self._wrapper includes 'gdbserver'. It should thus be changed
to None the moment self._wrapper is made to include 'gdbserver'.
Because self._wrapper is set only in __init__, this should happen in
__init__.
> What should __init__() do? The check here is to see if the invocation
> has gdb (and a couple of patches ahead also valgrind), to remove the timer.
> If I understand what you mean, you want something like
> def __init__(self, timer):
Oh, no. We can optionally do that perhaps later, but what I meant is
just to put this in __init__() (without adding any parameters to it):
self._qmp_timer = 15.0 if 'gdbserver' not in self._wrapper else None
I think self._qmp_timer should always reflect what timeout we are going
to use when a VM is launched. So if the conditions influencing the
timeout change, it should be updated immediately to reflect this. The
only condition we have right now is the content of self._wrapper, which
is only set in __init__, so self._qmp_timer should be set once in
__init__ and not changed afterwards.
That sounds academic, but imagine what would happen if we had a
set_qmp_timer() method: The timeout could be adjusted, but launch()
would just ignore it and update the property, even though the
conditions influencing the timeout didn’t change between
set_qmp_timer() and launch().
Or if we had a get_qmp_timer(): a caller would read a timeout of 15.0
before launch(), even though the timeout is going to be None.
Therefore, I think a property should not be updated just before it is
read, but instead when any condition that’s supposed to influence its
value changes.
I suggested making it a parameter because updating a property when
reading it sounds like it should be a parameter instead. I.e., one
would say
    def __init__(self):
        self._qmp_timeout_default = 15.0

    def post_launch(self, qmp_timeout):
        self._qmp.accept(qmp_timeout)

    def launch(self):
        ...
        qmp_timeout = None if 'gdbserver' in self._wrapper \
            else self._qmp_timeout_default
        self.post_launch(qmp_timeout)
Which is basically the structure your patch has, which gave me the idea.
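(A self-contained rendering of the __init__-based alternative described
above; the class skeleton and names below are illustrative, not the real
python/qemu/machine.py:)

    from typing import Optional, Sequence

    class MachineSketch:
        """Illustrative skeleton only, not the real QEMUMachine."""

        def __init__(self, wrapper: Sequence[str] = ()):
            self._wrapper = list(wrapper)
            # Decided once, in __init__, the moment self._wrapper is known:
            self._qmp_timer: Optional[float] = (
                None if 'gdbserver' in self._wrapper else 15.0
            )

        def post_launch_timeout(self) -> Optional[float]:
            # What _post_launch() would hand to self._qmp.accept(...).
            return self._qmp_timer

    assert MachineSketch(['gdbserver', 'localhost:1234']).post_launch_timeout() is None
    assert MachineSketch().post_launch_timeout() == 15.0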
[...]
>>> self._post_launch()
>>> def _early_cleanup(self) -> None:
>>> diff --git a/tests/qemu-iotests/iotests.py
>>> b/tests/qemu-iotests/iotests.py
>>> index 05d0dc0751..380527245e 100644
>>> --- a/tests/qemu-iotests/iotests.py
>>> +++ b/tests/qemu-iotests/iotests.py
[...]
>>> @@ -684,6 +687,11 @@ def qmp_to_opts(self, obj):
>>> output_list += [key + '=' + obj[key]]
>>> return ','.join(output_list)
>>> + def get_qmp_events(self, wait: bool = False) -> List[QMPMessage]:
>>> + if qemu_gdb:
>>> + wait = 0.0
[...]
>>
>> Second, I don’t understand this. If the caller wants to block waiting
>> on an event, then that should have nothing to do with whether we have
>> gdb running or not. As far as I understand, setting wait to 0.0 is
>> the same as wait = False, i.e. we don’t block and just return None
>> immediately if there is no pending event.
>
> You're right, this might not be needed here. The problem I had was that
> calling gdb and pausing at a breakpoint or something for a while would
> make the QMP socket time out, thus aborting the whole test. In order to
> avoid that, I need to stop or delay timers.
>
> I can't remember why I added this check here. At some point I am sure
> the test was failing because of socket timeout expiration, but I cannot
> reproduce the problem when commenting out this check above in
> get_qmp_events. The other check in patch 3 should be enough.
Hm, ok. I’d guessed that you intended the wait=0.0 or wait=False to
mean that we get an infinite timeout (i.e., no timeout), but that’s
exactly why I didn’t get it. wait=0.0 doesn’t give an infinite timeout,
but instead basically times out immediately.
Max
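(To make the wait convention concrete, a simplified sketch of how
qmp.py-style event polling treats the Union[bool, float] wait argument; no
socket I/O here, a queue stands in for pending QMP events:)

    import queue
    from typing import Optional, Union

    def pull_event_sketch(pending: 'queue.Queue',
                          wait: Union[bool, float] = False) -> Optional[dict]:
        # wait=False or 0.0 -> non-blocking: None if nothing is pending
        # wait=True         -> block with no timeout
        # wait=<float> > 0  -> block for at most that many seconds
        try:
            if not wait:  # False and 0.0 are both falsy
                return pending.get(block=False)
            timeout = None if wait is True else float(wait)
            return pending.get(block=True, timeout=timeout)
        except queue.Empty:
            return None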
On 5/3/21 11:02 AM, Max Reitz wrote:
> On 30.04.21 23:03, Emanuele Giuseppe Esposito wrote:
>>
>>
>> On 30/04/2021 13:59, Max Reitz wrote:
>>> On 14.04.21 19:03, Emanuele Giuseppe Esposito wrote:
>>>> Attaching a gdbserver implies that the qmp socket
>>>> should wait indefinitely for an answer from QEMU.
>>>>
>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>> ---
>>>>   python/qemu/machine.py        |  3 +++
>>>>   tests/qemu-iotests/iotests.py | 10 +++++++++-
>>>>   2 files changed, 12 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
>>>> index 12752142c9..d6142271c2 100644
>>>> --- a/python/qemu/machine.py
>>>> +++ b/python/qemu/machine.py
>>>> @@ -409,6 +409,9 @@ def _launch(self) -> None:
>>>>                                            stderr=subprocess.STDOUT,
>>>>                                            shell=False,
>>>>                                            close_fds=False)
>>>> +
>>>> +        if 'gdbserver' in self._wrapper:
>>>> +            self._qmp_timer = None
>>>
>>> Why doesn’t __init__() evaluate this? This here doesn’t feel like
>>> the right place for it. If we want to evaluate it here,
>>> self._qmp_timer shouldn’t exist, and instead the timeout should be a
>>> _post_launch() parameter. (Which I would have nothing against, by
>>> the way.)
>>
>> Uhm.. I got another comment in a previous version where for the
>> "event" callbacks it was better a property than passing around a
>> parameter. Which I honestly agree.
>
> I think that comment was in the sense of providing a default value,
> which can be expressed by having a property that is set in __init__.
>

My comment was along the lines that "_post_launch()" is behaving as an
event loop hook and not the sort of thing I want to pass parameters
to. It's a private method, so the only possibility for someone passing
a parameter to it is another class method anyway.

We have a hierarchy of things that depend on the Machine class and I
didn't want to start cascading optional parameters into the
subclasses. It was my intent that the information needed to run
_post_launch() correctly should be known by the state of the object --
which I think should be true anyway.

> I don’t have anything against making this a property, but I also don’t
> have anything against making it a _post_launch() parameter. I could
> even live with both, i.e. set _qmp_timer to 15 in __init__, then have
> a _post_launch parameter, and pass either self._qmp_timer or None if
> self._wrapper includes 'gdbserver'.
>
> What I do mind is that I don’t understand why the property is modified
> here. The value of self._qmp_timer is supposed to be 15 by default
> and None if self._wrapper includes 'gdbserver'. It should thus be
> changed to None the moment self._wrapper is made to include
> 'gdbserver'. Because self._wrapper is set only in __init__, this
> should happen in __init__.
>
>> What should __init__() do? The check here is to see if the invocation
>> has gdb (and a couple of patches ahead also valgrind), to remove the
>> timer.
>> If I understand what you mean, you want something like
>> def __init__(self, timer):
>
> Oh, no. We can optionally do that perhaps later, but what I meant is
> just to put this in __init__() (without adding any parameters to it):
>
> self._qmp_timer = 15.0 if 'gdbserver' not in self._wrapper else None
>
> I think self._qmp_timer should always reflect what timeout we are
> going to use when a VM is launched. So if the conditions influencing
> the timeout change, it should be updated immediately to reflect this.
> The only condition we have right now is the content of self._wrapper,
> which is only set in __init__, so self._qmp_timer should be set once
> in __init__ and not changed afterwards.
>
> That sounds academic, but imagine what would happen if we had a
> set_qmp_timer() method: The timeout could be adjusted, but launch()
> would just ignore it and update the property, even though the
> conditions influencing the timeout didn’t change between
> set_qmp_timer() and launch().
>
> Or if we had a get_qmp_timer(): a caller would read a timeout of 15.0
> before launch(), even though the timeout is going to be None.
>
> Therefore, I think a property should not be updated just before it is
> read, but instead when any condition that’s supposed to influence its
> value changes.
>

I agree with Max's reasoning here. I am also not a fan of squishing
magic into this class; changing class behavior based on introspection
of wrapper arguments feels like a layering violation.

Maybe what you want is a subclass or a wrapper class that knows how to
run QEMU using gdbserver, and changes some behaviors accordingly?

The factoring of Machine is quite bad already, admittedly, and is in
need of a good spit-shine. Too many init parameters, too many state
variables, too many methods that got patched in to support one
specific use-case at one point or another.

At a certain point, I begin to worry about how it's possible to audit
how all of these one-off features behave and interact. It's getting
complex. Is it time to dream up a refactoring for how the Machine
class behaves?

> I suggested making it a parameter because updating a property when
> reading it sounds like it should be a parameter instead. I.e., one
> would say
>
>     def __init__(self):
>         self._qmp_timeout_default = 15.0
>
>     def post_launch(self, qmp_timeout):
>         self._qmp.accept(qmp_timeout)
>
>     def launch(self):
>         ...
>         qmp_timeout = None if 'gdbserver' in self._wrapper \
>             else self._qmp_timeout_default
>         self.post_launch(qmp_timeout)
>
> Which is basically the structure your patch has, which gave me the
> idea.
>
> [...]
>
>>>>           self._post_launch()
>>>>   def _early_cleanup(self) -> None:
>>>> diff --git a/tests/qemu-iotests/iotests.py
>>>> b/tests/qemu-iotests/iotests.py
>>>> index 05d0dc0751..380527245e 100644
>>>> --- a/tests/qemu-iotests/iotests.py
>>>> +++ b/tests/qemu-iotests/iotests.py
>
> [...]
>
>>>> @@ -684,6 +687,11 @@ def qmp_to_opts(self, obj):
>>>>               output_list += [key + '=' + obj[key]]
>>>>           return ','.join(output_list)
>>>> +    def get_qmp_events(self, wait: bool = False) -> List[QMPMessage]:
>>>> +        if qemu_gdb:
>>>> +            wait = 0.0
>
> [...]
>
>>>
>>> Second, I don’t understand this. If the caller wants to block
>>> waiting on an event, then that should have nothing to do with
>>> whether we have gdb running or not. As far as I understand, setting
>>> wait to 0.0 is the same as wait = False, i.e. we don’t block and
>>> just return None immediately if there is no pending event.
>>
>> You're right, this might not be needed here. The problem I had was
>> that calling gdb and pausing at a breakpoint or something for a while
>> would make the QMP socket time out, thus aborting the whole test. In
>> order to avoid that, I need to stop or delay timers.
>>
>> I can't remember why I added this check here. At some point I am sure
>> the test was failing because of socket timeout expiration, but I
>> cannot reproduce the problem when commenting out this check above in
>> get_qmp_events. The other check in patch 3 should be enough.
>
> Hm, ok. I’d guessed that you intended the wait=0.0 or wait=False to
> mean that we get an infinite timeout (i.e., no timeout), but that’s
> exactly why I didn’t get it. wait=0.0 doesn’t give an infinite
> timeout, but instead basically times out immediately.
>
> Max

Well, I suppose if we don't need it, then that makes things easier
too :)
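(One possible shape for the subclass John suggests, as a purely hypothetical
sketch; no such class exists in QEMU, and the constructor signature and port
handling below are assumptions:)

    from qemu.machine import QEMUMachine

    class GdbServerMachine(QEMUMachine):  # hypothetical
        """Machine variant that knows it runs under gdbserver."""

        def __init__(self, binary, *args, gdb_port=1234, **kwargs):
            wrapper = ['gdbserver', f'localhost:{gdb_port}']
            super().__init__(binary, *args, wrapper=wrapper, **kwargs)
            # A user parked at a gdb breakpoint can stall QEMU for minutes,
            # so never time out waiting for the QMP connection.
            self._qmp_timer = None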