[Qemu-devel] [PATCH] cpus: Fix event order on resume of stopped guest

Markus Armbruster posted 1 patch 5 years, 11 months ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20180423084518.2426-1-armbru@redhat.com
Test checkpatch passed
Test docker-build@min-glib passed
Test docker-mingw@fedora passed
Test s390x passed
cpus.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
[Qemu-devel] [PATCH] cpus: Fix event order on resume of stopped guest
Posted by Markus Armbruster 5 years, 11 months ago
When resume of a stopped guest immediately runs into block device
errors, the BLOCK_IO_ERROR event is sent before the RESUME event.

Reproducer:

1. Create a scratch image
   $ dd if=/dev/zero of=scratch.img bs=1M count=100

   Size doesn't actually matter.

2. Prepare blkdebug configuration:

   $ cat >blkdebug.conf <<EOF
   [inject-error]
   event = "write_aio"
   errno = "5"
   EOF

   Note that errno 5 is EIO.

3. Run a guest with an additional scratch disk, i.e. with additional
   arguments
   -drive if=none,id=scratch-drive,format=raw,werror=stop,file=blkdebug:blkdebug.conf:scratch.img
   -device virtio-blk-pci,id=scratch,drive=scratch-drive

   The blkdebug part makes all writes to the scratch drive fail with
   EIO.  The werror=stop pauses the guest on write errors.

4. Connect to the QMP socket e.g. like this:
   $ socat UNIX:/your/qmp/socket READLINE,history=$HOME/.qmp_history,prompt='QMP> '

   Issue QMP command 'qmp_capabilities':
   QMP> { "execute": "qmp_capabilities" }

5. Boot the guest.

6. In the guest, write to the scratch disk, e.g. like this:

   # dd if=/dev/zero of=/dev/vdb count=1

   Do double-check the device specified with of= is actually the
   scratch device!

7. Issue QMP command 'cont':
   QMP> { "execute": "cont" }

After step 6, I get a BLOCK_IO_ERROR event followed by a STOP event.  Good.

After step 7, I get BLOCK_IO_ERROR, then RESUME, then STOP.  Not so
good; I'd expect RESUME, then BLOCK_IO_ERROR, then STOP.

The funny event order confuses libvirt: virsh -r domstate DOMAIN
--reason reports "paused (unknown)" rather than "paused (I/O error)".
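
To illustrate why the order matters to a management application, here
is a toy model in C of a client that derives the pause reason from the
event stream.  It is an assumption made for illustration only, not
libvirt's actual logic; the names tracker_feed() and replay() are
hypothetical.

```c
#include <string.h>

/* Hypothetical pause-reason tracker; NOT libvirt's actual code. */
enum pause_reason { REASON_NONE, REASON_UNKNOWN, REASON_IO_ERROR };

struct tracker {
    int io_error_pending;       /* set by BLOCK_IO_ERROR, cleared by RESUME */
    enum pause_reason reason;   /* reason recorded at the last STOP */
};

/* Feed one QMP event name to the tracker. */
static void tracker_feed(struct tracker *t, const char *event)
{
    if (strcmp(event, "BLOCK_IO_ERROR") == 0) {
        t->io_error_pending = 1;
    } else if (strcmp(event, "RESUME") == 0) {
        /* Guest runs again: forget what was learned while it was paused */
        t->io_error_pending = 0;
        t->reason = REASON_NONE;
    } else if (strcmp(event, "STOP") == 0) {
        t->reason = t->io_error_pending ? REASON_IO_ERROR : REASON_UNKNOWN;
    }
}

/* Replay a sequence of event names and return the final pause reason. */
static enum pause_reason replay(const char *const *events, int n)
{
    struct tracker t = { 0, REASON_NONE };
    int i;

    for (i = 0; i < n; i++) {
        tracker_feed(&t, events[i]);
    }
    return t.reason;
}
```

In this model, replaying BLOCK_IO_ERROR, RESUME, STOP ends with
REASON_UNKNOWN, because RESUME wipes the pending error before STOP
arrives.  Replaying RESUME, BLOCK_IO_ERROR, STOP ends with
REASON_IO_ERROR.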

The culprit is vm_prepare_start().

    /* Ensure that a STOP/RESUME pair of events is emitted if a
     * vmstop request was pending.  The BLOCK_IO_ERROR event, for
     * example, according to documentation is always followed by
     * the STOP event.
     */
    if (runstate_is_running()) {
        qapi_event_send_stop(&error_abort);
        res = -1;
    } else {
        replay_enable_events();
        cpu_enable_ticks();
        runstate_set(RUN_STATE_RUNNING);
        vm_state_notify(1, RUN_STATE_RUNNING);
    }

    /* We are sending this now, but the CPUs will be resumed shortly later */
    qapi_event_send_resume(&error_abort);
    return res;

When resuming a stopped guest, we take the else branch before we get
to sending RESUME.  vm_state_notify() runs virtio_vmstate_change(),
among other things.  This restarts I/O, triggering the BLOCK_IO_ERROR
event.

Reshuffle vm_prepare_start() to send the RESUME event earlier.
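
The effect of the reshuffle can be sketched with a toy model in C.
This is not QEMU code: emit() and notify_running() are hypothetical
stand-ins for sending a QMP event and for vm_state_notify(), under the
assumption that restarting I/O fails at once.  The later STOP event is
left out because it follows in both orders.

```c
#include <string.h>

/* Toy model, not QEMU code: events are appended to a text log. */
static char event_log[128];

static void emit(const char *event)
{
    if (event_log[0] != '\0') {
        strncat(event_log, ",", sizeof(event_log) - strlen(event_log) - 1);
    }
    strncat(event_log, event, sizeof(event_log) - strlen(event_log) - 1);
}

/* Stand-in for vm_state_notify(): restarted I/O fails immediately. */
static void notify_running(void)
{
    emit("BLOCK_IO_ERROR");
}

/* Order before the patch: state change handlers run before RESUME. */
static const char *resume_old_order(void)
{
    event_log[0] = '\0';
    notify_running();
    emit("RESUME");
    return event_log;
}

/* Order after the patch: RESUME is sent before the handlers run. */
static const char *resume_new_order(void)
{
    event_log[0] = '\0';
    emit("RESUME");
    notify_running();
    return event_log;
}
```

resume_old_order() returns "BLOCK_IO_ERROR,RESUME", and
resume_new_order() returns "RESUME,BLOCK_IO_ERROR".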

Fixes RHBZ 1566153.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 cpus.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/cpus.c b/cpus.c
index 38eba8bff3..398392bc3a 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2043,7 +2043,6 @@ int vm_stop(RunState state)
 int vm_prepare_start(void)
 {
     RunState requested;
-    int res = 0;
 
     qemu_vmstop_requested(&requested);
     if (runstate_is_running() && requested == RUN_STATE__MAX) {
@@ -2057,17 +2056,18 @@ int vm_prepare_start(void)
      */
     if (runstate_is_running()) {
         qapi_event_send_stop(&error_abort);
-        res = -1;
-    } else {
-        replay_enable_events();
-        cpu_enable_ticks();
-        runstate_set(RUN_STATE_RUNNING);
-        vm_state_notify(1, RUN_STATE_RUNNING);
+        qapi_event_send_resume(&error_abort);
+        return -1;
     }
 
     /* We are sending this now, but the CPUs will be resumed shortly later */
     qapi_event_send_resume(&error_abort);
-    return res;
+
+    replay_enable_events();
+    cpu_enable_ticks();
+    runstate_set(RUN_STATE_RUNNING);
+    vm_state_notify(1, RUN_STATE_RUNNING);
+    return 0;
 }
 
 void vm_start(void)
-- 
2.13.6


Re: [Qemu-devel] [PATCH] cpus: Fix event order on resume of stopped guest
Posted by Paolo Bonzini 5 years, 11 months ago
On 23/04/2018 10:45, Markus Armbruster wrote:
> When resume of a stopped guest immediately runs into block device
> errors, the BLOCK_IO_ERROR event is sent before the RESUME event.

Queued, thanks.

Paolo

Re: [Qemu-devel] [PATCH] cpus: Fix event order on resume of stopped guest
Posted by Kevin Wolf 5 years, 11 months ago
On 23.04.2018 at 10:45, Markus Armbruster wrote:
> When resume of a stopped guest immediately runs into block device
> errors, the BLOCK_IO_ERROR event is sent before the RESUME event.
> 
> 6. In the guest, write to the scratch disk, e.g. like this:
> 
>    # dd if=/dev/zero of=/dev/vdb count=1
> 
>    Do double-check the device specified with of= is actually the
>    scratch device!
> 
> 7. Issue QMP command 'cont':
>    QMP> { "execute": "cont" }
> 
> After step 6, I get a BLOCK_IO_ERROR event followed by a STOP event.  Good.
> 
> After step 7, I get BLOCK_IO_ERROR, then RESUME, then STOP.  Not so
> good; I'd expect RESUME, then BLOCK_IO_ERROR, then STOP.

Do you want to rephrase this in the form of a script for qemu-iotests?

I suppose the 'dd' line can be replaced by a 'qemu-io' monitor command.

Kevin

Re: [Qemu-devel] [PATCH] cpus: Fix event order on resume of stopped guest
Posted by Markus Armbruster 5 years, 11 months ago
Kevin Wolf <kwolf@redhat.com> writes:

> On 23.04.2018 at 10:45, Markus Armbruster wrote:
>> When resume of a stopped guest immediately runs into block device
>> errors, the BLOCK_IO_ERROR event is sent before the RESUME event.
>> 
>> After step 6, I get a BLOCK_IO_ERROR event followed by a STOP event.  Good.
>> 
>> After step 7, I get BLOCK_IO_ERROR, then RESUME, then STOP.  Not so
>> good; I'd expect RESUME, then BLOCK_IO_ERROR, then STOP.
>
> Do you want to rephrase this in the form of a script for qemu-iotests?
>
> I suppose the 'dd' line can be replaced by a 'qemu-io' monitor command.

Makes sense, but I'm pretty much a noob there.  Perhaps I can copy
an existing test and hack it up.  Which one would you recommend?

Re: [Qemu-devel] [PATCH] cpus: Fix event order on resume of stopped guest
Posted by Kevin Wolf 5 years, 11 months ago
On 23.04.2018 at 17:47, Markus Armbruster wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > On 23.04.2018 at 10:45, Markus Armbruster wrote:
> >> When resume of a stopped guest immediately runs into block device
> >> errors, the BLOCK_IO_ERROR event is sent before the RESUME event.
> >> 
> >> After step 6, I get a BLOCK_IO_ERROR event followed by a STOP event.  Good.
> >> 
> >> After step 7, I get BLOCK_IO_ERROR, then RESUME, then STOP.  Not so
> >> good; I'd expect RESUME, then BLOCK_IO_ERROR, then STOP.
> >
> > Do you want to rephrase this in the form of a script for qemu-iotests?
> >
> > I suppose the 'dd' line can be replaced by a 'qemu-io' monitor command.
> 
> Makes sense, but I'm pretty much a noob there.  Perhaps I can copy
> an existing test and hack it up.  Which one would you recommend?

Depends on how much control you actually need for this test. At first
sight, it might be enough to copy one of the tests implementing a
run_qemu() function. These are tests that do essentially this:

    qemu-system-x86_64 -qmp stdio <<EOF
    ...commands here....
    EOF

This is all you need if you don't have a reason to wait for or even
parse QMP results. (The results end up in stdout, so they are validated
with the usual diffing.)

If you need a bit more, copy one of the tests that use ./common.qemu.
This is a bit more complex but allows you to wait for expected QMP
results before you continue with the next action. Probably you don't
need this here.

(And if even that is not powerful enough, Python test cases with
iotests.py are what you want. Almost certainly overkill for this one.)

Kevin

Re: [Qemu-devel] [PATCH] cpus: Fix event order on resume of stopped guest
Posted by Markus Armbruster 5 years, 10 months ago
Kevin Wolf <kwolf@redhat.com> writes:

> On 23.04.2018 at 10:45, Markus Armbruster wrote:
>> When resume of a stopped guest immediately runs into block device
>> errors, the BLOCK_IO_ERROR event is sent before the RESUME event.
>> 
>> After step 6, I get a BLOCK_IO_ERROR event followed by a STOP event.  Good.
>> 
>> After step 7, I get BLOCK_IO_ERROR, then RESUME, then STOP.  Not so
>> good; I'd expect RESUME, then BLOCK_IO_ERROR, then STOP.
>
> Do you want to rephrase this in the form of a script for qemu-iotests?
>
> I suppose the 'dd' line can be replaced by a 'qemu-io' monitor command.

Uh, can it?  With qemu-io, the write doesn't stop the guest, because it
bypasses the device model, and thus blk_error_action().  I'm not aware
of ways to make qemu-iotests write via a device model.  I'm afraid we
need a full-fledged qtest.  Better ideas?

Re: [Qemu-devel] [PATCH] cpus: Fix event order on resume of stopped guest
Posted by Kevin Wolf 5 years, 10 months ago
On 03.05.2018 at 14:17, Markus Armbruster wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > On 23.04.2018 at 10:45, Markus Armbruster wrote:
> >> When resume of a stopped guest immediately runs into block device
> >> errors, the BLOCK_IO_ERROR event is sent before the RESUME event.
> >> 
> >> After step 6, I get a BLOCK_IO_ERROR event followed by a STOP event.  Good.
> >> 
> >> After step 7, I get BLOCK_IO_ERROR, then RESUME, then STOP.  Not so
> >> good; I'd expect RESUME, then BLOCK_IO_ERROR, then STOP.
> >
> > Do you want to rephrase this in the form of a script for qemu-iotests?
> >
> > I suppose the 'dd' line can be replaced by a 'qemu-io' monitor command.
> 
> Uh, can it?  With qemu-io, the write doesn't stop the guest, because it
> bypasses the device model, and thus blk_error_action().  I'm not aware
> of ways to make qemu-iotests write via a device model.  I'm afraid we
> need a full-fledged qtest.  Better ideas?

I'm afraid you're right. :-(

Did I ever mention that I don't really like having the werror logic in
the devices?

Kevin

Re: [Qemu-devel] [PATCH] cpus: Fix event order on resume of stopped guest
Posted by Markus Armbruster 5 years, 10 months ago
Kevin Wolf <kwolf@redhat.com> writes:

> On 03.05.2018 at 14:17, Markus Armbruster wrote:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > On 23.04.2018 at 10:45, Markus Armbruster wrote:
>> >> When resume of a stopped guest immediately runs into block device
>> >> errors, the BLOCK_IO_ERROR event is sent before the RESUME event.
>> >> 
>> >> After step 6, I get a BLOCK_IO_ERROR event followed by a STOP event.  Good.
>> >> 
>> >> After step 7, I get BLOCK_IO_ERROR, then RESUME, then STOP.  Not so
>> >> good; I'd expect RESUME, then BLOCK_IO_ERROR, then STOP.
>> >
>> > Do you want to rephrase this in the form of a script for qemu-iotests?
>> >
>> > I suppose the 'dd' line can be replaced by a 'qemu-io' monitor command.
>> 
>> Uh, can it?  With qemu-io, the write doesn't stop the guest, because it
>> bypasses the device model, and thus blk_error_action().  I'm not aware
>> of ways to make qemu-iotests write via a device model.  I'm afraid we
>> need a full-fledged qtest.  Better ideas?
>
> I'm afraid you're right. :-(
>
> Did I ever mention that I don't really like having the werror logic in
> the devices?

Only a few times :)

There's an explanation next to blk_error_action():

/* This is done by device models because, while the block layer knows
 * about the error, it does not know whether an operation comes from
 * the device or the block layer (from a job, for example).
 */
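
To make the difference between the two write paths concrete, here is
a toy model in C.  It is only a sketch under my own assumptions, not
QEMU code: every name in it (toy_vm, toy_block_write(),
toy_device_write(), toy_direct_write()) is hypothetical.  It shows a
device-model path that applies the werror policy on failure, and a
direct path that merely returns the error.

```c
/* Toy model, not QEMU code.  All names are hypothetical. */
enum werror_policy { WERROR_REPORT, WERROR_IGNORE, WERROR_STOP };

struct toy_vm {
    enum werror_policy werror;  /* policy configured for the drive */
    int stopped;                /* 1 once the guest has been paused */
};

/* Block layer: performs the write and only reports the result. */
static int toy_block_write(int inject_errno)
{
    return inject_errno ? -inject_errno : 0;
}

/* Device-model path: the device applies the werror policy on failure. */
static int toy_device_write(struct toy_vm *vm, int inject_errno)
{
    int ret = toy_block_write(inject_errno);

    if (ret < 0 && vm->werror == WERROR_STOP) {
        vm->stopped = 1;
    }
    return ret;
}

/*
 * Path that bypasses the device model: the error is returned to the
 * caller and the werror policy is never consulted.
 */
static int toy_direct_write(struct toy_vm *vm, int inject_errno)
{
    (void)vm;
    return toy_block_write(inject_errno);
}
```

With werror set to WERROR_STOP and an injected errno of 5,
toy_direct_write() returns -5 and leaves stopped at 0, while
toy_device_write() returns -5 and sets stopped to 1.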

Re: [Qemu-devel] [PATCH] cpus: Fix event order on resume of stopped guest
Posted by Paolo Bonzini 5 years, 10 months ago
On 03/05/2018 14:17, Markus Armbruster wrote:
>>> 4. Connect to the QMP socket e.g. like this:
>>>    $ socat UNIX:/your/qmp/socket READLINE,history=$HOME/.qmp_history,prompt='QMP> '
>>>
>>>    Issue QMP command 'qmp_capabilities':
>>>    QMP> { "execute": "qmp_capabilities" }
>>>
>>> 5. Boot the guest.
>>>
>>> 6. In the guest, write to the scratch disk, e.g. like this:
>>>
>>>    # dd if=/dev/zero of=/dev/vdb count=1
>>>
>>>    Do double-check the device specified with of= is actually the
>>>    scratch device!
>>>
>>> 7. Issue QMP command 'cont':
>>>    QMP> { "execute": "cont" }
>>>
>>> After step 6, I get a BLOCK_IO_ERROR event followed by a STOP event.  Good.
>>>
>>> After step 7, I get BLOCK_IO_ERROR, then RESUME, then STOP.  Not so
>>> good; I'd expect RESUME, then BLOCK_IO_ERROR, then STOP.
>> Do you want to rephrase this in the form of a script for qemu-iotests?
>>
>> I suppose the 'dd' line can be replaced by a 'qemu-io' monitor command.
> Uh, can it?  With qemu-io, the write doesn't stop the guest, because it
> bypasses the device model, and thus blk_error_action().  I'm not aware
> of ways to make qemu-iotests write via a device model.  I'm afraid we
> need a full-fledged qtest.  Better ideas?

Yeah, using virtio-blk-test sounds like a good idea.

Paolo

Re: [Qemu-devel] [PATCH] cpus: Fix event order on resume of stopped guest
Posted by Markus Armbruster 5 years, 10 months ago
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 03/05/2018 14:17, Markus Armbruster wrote:
>>>> 4. Connect to the QMP socket e.g. like this:
>>>>    $ socat UNIX:/your/qmp/socket READLINE,history=$HOME/.qmp_history,prompt='QMP> '
>>>>
>>>>    Issue QMP command 'qmp_capabilities':
>>>>    QMP> { "execute": "qmp_capabilities" }
>>>>
>>>> 5. Boot the guest.
>>>>
>>>> 6. In the guest, write to the scratch disk, e.g. like this:
>>>>
>>>>    # dd if=/dev/zero of=/dev/vdb count=1
>>>>
>>>>    Do double-check the device specified with of= is actually the
>>>>    scratch device!
>>>>
>>>> 7. Issue QMP command 'cont':
>>>>    QMP> { "execute": "cont" }
>>>>
>>>> After step 6, I get a BLOCK_IO_ERROR event followed by a STOP event.  Good.
>>>>
>>>> After step 7, I get BLOCK_IO_ERROR, then RESUME, then STOP.  Not so
>>>> good; I'd expect RESUME, then BLOCK_IO_ERROR, then STOP.
>>> Do you want to rephrase this in the form of a script for qemu-iotests?
>>>
>>> I suppose the 'dd' line can be replaced by a 'qemu-io' monitor command.
>> Uh, can it?  With qemu-io, the write doesn't stop the guest, because it
>> bypasses the device model, and thus blk_error_action().  I'm not aware
>> of ways to make qemu-iotests write via a device model.  I'm afraid we
>> need a full-fledged qtest.  Better ideas?
>
> Yeah, using virtio-blk-test sounds like a good idea.

Who's familiar with this test?  I'm not sure I can afford digging into
it myself right now...

The other devices supporting error actions are in hw/scsi/scsi-disk.c,
hw/ide/ahci.c and hw/ide/core.c.  Tests with matching names are
virtio-scsi-test.c, ahci-test.c, ide-test.c.