[Qemu-devel] [PATCH v2 0/2] migration: fix virtio-rng

Laurent Vivier posted 2 patches 6 years, 11 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170412135312.1686-1-lvivier@redhat.com
Test checkpatch passed
Test docker passed
Test s390x passed
hw/virtio/trace-events         |  3 +++
hw/virtio/virtio-rng.c         | 29 +++++++++++++++++++++++------
include/hw/virtio/virtio-rng.h |  2 ++
migration/migration.c          |  6 +++---
4 files changed, 31 insertions(+), 9 deletions(-)
[Qemu-devel] [PATCH v2 0/2] migration: fix virtio-rng
Posted by Laurent Vivier 6 years, 11 months ago
When post-copy migration is enabled, the destination
guest can ask for memory from the source when the
vmstate is restored.

In the case of virtio, a part of the virtqueue
is migrated by the vmstate structure (last_avail_idx)
another part is migrated inside the RAM (used_idx).
On the source side, the virtqueue can be modified
whereas the vmstate is already migrated, and the destination
side can ask for the value in RAM. In this case we have
an inconsistency that can generate this kind of error:
    "VQ 0 size 0x8 < last_avail_idx 0xa - used_idx 0"
in hw/virtio/virtio.c:2180, virtio_load().

This happens with virtio-rng as the chr_read()
function which modifies the virqueue is called
by the rng backend and the rng backend continues to
run while the migration is running and the CPU is stopped.

This series fixes this problem by ignoring chr_read()
calls while the CPU is stopped. The first patch of the
series fixes another problem triggered by this error
case: a use-after-free case.

The probability to have this problem is very low, as
generally the post-copy phase is very short, so the window
to modify the virtqueue while the vmstate has been sent
is very small... except if you are doing trans-continental
guest migration with high latency and post-copy phase that
can be run for minutes.

I've been able to reproduce the problem locally on a host,
by adding network latency with "tc". Another condition is
to have an rng daemon running in the guest to generate
events in the virtio-rng device.

v2:
- add a vm state change handler to restart the virtio-rng
  process when the CPU restarts (it also replaces
  the post_load function).

Laurent Vivier (2):
  migration: don't close a file descriptor while it can be in use
  virtio-rng: stop virtqueue while the CPU is stopped

 hw/virtio/trace-events         |  3 +++
 hw/virtio/virtio-rng.c         | 29 +++++++++++++++++++++++------
 include/hw/virtio/virtio-rng.h |  2 ++
 migration/migration.c          |  6 +++---
 4 files changed, 31 insertions(+), 9 deletions(-)

-- 
2.9.3


Re: [Qemu-devel] [PATCH v2 0/2] migration: fix virtio-rng
Posted by Stefan Hajnoczi 6 years, 11 months ago
On Wed, Apr 12, 2017 at 03:53:10PM +0200, Laurent Vivier wrote:
> When post-copy migration is enabled, the destination
> guest can ask for memory from the source when the
> vmstate is restored.
> 
> In the case of virtio, a part of the virtqueue
> is migrated by the vmstate structure (last_avail_idx)
> another part is migrated inside the RAM (used_idx).
> On the source side, the virtqueue can be modified
> whereas the vmstate is already migrated, and the destination
> side can ask for the value in RAM. In this case we have
> an inconsistency that can generate this kind of error:
>     "VQ 0 size 0x8 < last_avail_idx 0xa - used_idx 0"
> in hw/virtio/virtio.c:2180, virtio_load().
> 
> This happens with virtio-rng as the chr_read()
> function which modifies the virqueue is called
> by the rng backend and the rng backend continues to
> run while the migration is running and the CPU is stopped.
> 
> This series fixes this problem by ignoring chr_read()
> calls while the CPU is stopped. The first patch of the
> series fixes another problem triggered by this error
> case: a use-after-free case.
> 
> The probability to have this problem is very low, as
> generally the post-copy phase is very short, so the window
> to modify the virtqueue while the vmstate has been sent
> is very small... except if you are doing trans-continental
> guest migration with high latency and post-copy phase that
> can be run for minutes.
> 
> I've been able to reproduce the problem locally on a host,
> by adding network latency with "tc". Another condition is
> to have an rng daemon running in the guest to generate
> events in the virtio-rng device.
> 
> v2:
> - add a vm state change handler to restart the virtio-rng
>   process when the CPU restarts (it also replaces
>   the post_load function).
> 
> Laurent Vivier (2):
>   migration: don't close a file descriptor while it can be in use
>   virtio-rng: stop virtqueue while the CPU is stopped
> 
>  hw/virtio/trace-events         |  3 +++
>  hw/virtio/virtio-rng.c         | 29 +++++++++++++++++++++++------
>  include/hw/virtio/virtio-rng.h |  2 ++
>  migration/migration.c          |  6 +++---
>  4 files changed, 31 insertions(+), 9 deletions(-)
> 
> -- 
> 2.9.3
> 

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Re: [Qemu-devel] [PATCH v2 0/2] migration: fix virtio-rng
Posted by Amit Shah 6 years, 11 months ago
On (Wed) 12 Apr 2017 [15:53:10], Laurent Vivier wrote:
> When post-copy migration is enabled, the destination
> guest can ask for memory from the source when the
> vmstate is restored.
> 
> In the case of virtio, a part of the virtqueue
> is migrated by the vmstate structure (last_avail_idx)
> another part is migrated inside the RAM (used_idx).
> On the source side, the virtqueue can be modified
> whereas the vmstate is already migrated, and the destination
> side can ask for the value in RAM. In this case we have
> an inconsistency that can generate this kind of error:
>     "VQ 0 size 0x8 < last_avail_idx 0xa - used_idx 0"
> in hw/virtio/virtio.c:2180, virtio_load().
> 
> This happens with virtio-rng as the chr_read()
> function which modifies the virqueue is called
> by the rng backend and the rng backend continues to
> run while the migration is running and the CPU is stopped.
> 
> This series fixes this problem by ignoring chr_read()
> calls while the CPU is stopped. The first patch of the
> series fixes another problem triggered by this error
> case: a use-after-free case.
> 
> The probability to have this problem is very low, as
> generally the post-copy phase is very short, so the window
> to modify the virtqueue while the vmstate has been sent
> is very small... except if you are doing trans-continental
> guest migration with high latency and post-copy phase that
> can be run for minutes.
> 
> I've been able to reproduce the problem locally on a host,
> by adding network latency with "tc". Another condition is
> to have an rng daemon running in the guest to generate
> events in the virtio-rng device.

Acked-by: Amit Shah <amit@kernel.org>

		Amit
-- 
http://log.amitshah.net/