[PATCH v2 0/3] block/nbd: fix crashers in reconnect while migrating

Roman Kagan posted 3 patches 3 years, 3 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20210129073859.683063-1-rvkagan@yandex-team.ru
Maintainers: Eric Blake <eblake@redhat.com>, Kevin Wolf <kwolf@redhat.com>, Max Reitz <mreitz@redhat.com>
[PATCH v2 0/3] block/nbd: fix crashers in reconnect while migrating
Posted by Roman Kagan 3 years, 3 months ago
During the final phase of migration, the NBD reconnection logic may
encounter situations it does not expect during regular operation.

This series addresses some of those that make QEMU crash.  They are
reproducible when a VM with a secondary drive attached via NBD with a
non-zero "reconnect-delay" runs a stress load (fio with a big queue depth)
in the guest on that drive and is migrated (e.g. to a file), while the
NBD server is SIGKILL-ed and restarted every second.
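
For illustration only, the "SIGKILL-ed and restarted every second" part
of the scenario could be driven by something like the sketch below.  It
is not the script used for testing this series: the helper name
kill_restart_loop and its parameters are hypothetical, and the NBD
server command line is left to the caller rather than assumed here.

```python
import signal
import subprocess
import sys
import time


def kill_restart_loop(server_cmd, cycles, interval=1.0):
    """Start server_cmd, SIGKILL it after `interval` seconds, and repeat.

    server_cmd is the argv list of whatever NBD server is under test
    (an assumption of this sketch; it is not specified here).
    Returns the exit status of every killed instance.
    """
    statuses = []
    for _ in range(cycles):
        proc = subprocess.Popen(server_cmd)
        time.sleep(interval)
        # SIGKILL gives the server no chance to close the NBD session
        # cleanly, so the client sees an abrupt disconnect.
        proc.send_signal(signal.SIGKILL)
        statuses.append(proc.wait())
    return statuses
```

Calling kill_restart_loop(argv_of_the_server, cycles=60) while the guest
runs fio and the migration is in progress would approximate the setup
described above; the one-second default interval mirrors the restart
period mentioned there.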

See the individual patches for specific crash conditions and more
detailed explanations.

v1 -> v2:
- fix corrupted backtraces in log messages
- add r-b by Vladimir

Roman Kagan (3):
  block/nbd: only detach existing iochannel from aio_context
  block/nbd: only enter connection coroutine if it's present
  nbd: make nbd_read* return -EIO on error

 include/block/nbd.h |  7 ++++---
 block/nbd.c         | 25 +++++++++++++++++--------
 2 files changed, 21 insertions(+), 11 deletions(-)

-- 
2.29.2


Re: [PATCH v2 0/3] block/nbd: fix crashers in reconnect while migrating
Posted by Eric Blake 3 years, 2 months ago
On 1/29/21 1:38 AM, Roman Kagan wrote:
> During the final phase of migration the NBD reconnection logic may
> encounter situations it doesn't expect during regular operation.
> 
> This series addresses some of them that make qemu crash.  They are
> reproducible when a vm with a secondary drive attached via nbd with
> non-zero "reconnect-delay" runs a stress load (fio with big queue depth)
> in the guest on that drive and is migrated (e.g. to a file), while the
> nbd server is SIGKILL-ed and restarted every second.
> 
> See the individual patches for specific crash conditions and more
> detailed explanations.
> 
> v1 -> v2:
> - fix corrupted backtraces in log messages
> - add r-b by Vladimir
> 

Thanks, queuing through my NBD tree.

> Roman Kagan (3):
>   block/nbd: only detach existing iochannel from aio_context
>   block/nbd: only enter connection coroutine if it's present
>   nbd: make nbd_read* return -EIO on error
> 
>  include/block/nbd.h |  7 ++++---
>  block/nbd.c         | 25 +++++++++++++++++--------
>  2 files changed, 21 insertions(+), 11 deletions(-)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org