29.01.2021 10:35, Roman Kagan wrote:
> On Fri, Jan 29, 2021 at 08:51:39AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> 28.01.2021 23:14, Roman Kagan wrote:
>>> During the final phase of migration the NBD reconnection logic may
>>> encounter situations it doesn't expect during regular operation.
>>>
>>> This series addresses some of them that make qemu crash. They are
>>> reproducible when a vm with a secondary drive attached via nbd with
>>> non-zero "reconnect-delay" runs a stress load (fio with big queue depth)
>>> in the guest on that drive and is migrated (e.g. to a file), while the
>>> nbd server is SIGKILL-ed and restarted every second.
>>>
>>> See the individual patches for specific crash conditions and more
>>> detailed explanations.
>>>
>>> Roman Kagan (3):
>>> block/nbd: only detach existing iochannel from aio_context
>>> block/nbd: only enter connection coroutine if it's present
>>> nbd: make nbd_read* return -EIO on error
>>>
>>> include/block/nbd.h | 7 ++++---
>>> block/nbd.c | 25 +++++++++++++++++--------
>>> 2 files changed, 21 insertions(+), 11 deletions(-)
>>>
>>
>> Thanks a lot for fixing!
>>
>> Do you have some reproducer scripts? Could you post them or may be add
>> an iotest?
>
> I don't have it scripted, just ad hoc command lines. I'll look into
> making up a test. Can you perhaps suggest what existing test to base
> on?
>
For now reconnect feature is covered only by two tests tests/qemu-iotests/264 and tests/qemu-iotests/277.
Also note, that since "f203080bbd iotests: rewrite check into python" you should add new iotests with human-readable file names into tests/qemu-iotests/tests subdirectory. Also you don't need to update tests/qemu-iotests/group file (it's absent now), test groups are defined in tests themselves in a comment, like "# group: rw quick".
--
Best regards,
Vladimir