[v1] mirror: cancel nbd reconnect

[PATCH 00/11] mirror: cancel nbd reconnect

Posted by Vladimir Sementsov-Ogievskiy 5 years, 2 months ago

Hi all!

The problem

Assume we have mirror job with nbd target node with enabled reconnect.
Connection failed. So, all current requests to nbd node are waiting for
nbd driver to reconnect. And they will wait for reconnect-delay time
specified in nbd blockdev options. This timeout may be long enough, for
example, we in Virtuozzo use 300 seconds by default.

So, if at this moment user tries to cancel the job, job will wait for
its in-flight requests to finish up to 300 seconds. From the user point
of view, cancelling the job takes a long time. Bad.

Solution

Let's just cancel "waiting for reconnect in in-flight request coroutines"
on mirror (and backup) cancel. Welcome the series below.

Vladimir Sementsov-Ogievskiy (11):
  block: add new BlockDriver handler: bdrv_cancel_in_flight
  block/nbd: implement .bdrv_cancel_in_flight
  block/raw-format: implement .bdrv_cancel_in_flight handler
  job: add .cancel handler for the driver
  block/mirror: implement .cancel job handler
  iotests/264: fix style
  iotests/264: move to python unittest
  iotests.py: qemu_nbd_popen: remove pid file after use
  iotests/264: add mirror-cancel test-case
  block/backup: implement .cancel job handler
  iotests/264: add backup-cancel test-case

 include/block/block.h         |   3 +
 include/block/block_int.h     |   9 +++
 include/qemu/job.h            |   5 ++
 block/backup.c                |  10 +++
 block/io.c                    |  11 +++
 block/mirror.c                |   9 +++
 block/nbd.c                   |  15 ++++
 block/raw-format.c            |   6 ++
 job.c                         |   3 +
 tests/qemu-iotests/264        | 143 ++++++++++++++++++++++------------
 tests/qemu-iotests/264.out    |  20 ++---
 tests/qemu-iotests/iotests.py |   6 +-
 12 files changed, 173 insertions(+), 67 deletions(-)

-- 
2.21.3

Re: [PATCH 00/11] mirror: cancel nbd reconnect

Posted by Vladimir Sementsov-Ogievskiy 5 years, 1 month ago

ping

18.11.2020 21:04, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> The problem
> 
> Assume we have mirror job with nbd target node with enabled reconnect.
> Connection failed. So, all current requests to nbd node are waiting for
> nbd driver to reconnect. And they will wait for reconnect-delay time
> specified in nbd blockdev options. This timeout may be long enough, for
> example, we in Virtuozzo use 300 seconds by default.
> 
> So, if at this moment user tries to cancel the job, job will wait for
> its in-flight requests to finish up to 300 seconds. From the user point
> of view, cancelling the job takes a long time. Bad.
> 
> Solution
> 
> Let's just cancel "waiting for reconnect in in-flight request coroutines"
> on mirror (and backup) cancel. Welcome the series below.
> 
> Vladimir Sementsov-Ogievskiy (11):
>    block: add new BlockDriver handler: bdrv_cancel_in_flight
>    block/nbd: implement .bdrv_cancel_in_flight
>    block/raw-format: implement .bdrv_cancel_in_flight handler
>    job: add .cancel handler for the driver
>    block/mirror: implement .cancel job handler
>    iotests/264: fix style
>    iotests/264: move to python unittest
>    iotests.py: qemu_nbd_popen: remove pid file after use
>    iotests/264: add mirror-cancel test-case
>    block/backup: implement .cancel job handler
>    iotests/264: add backup-cancel test-case
> 
>   include/block/block.h         |   3 +
>   include/block/block_int.h     |   9 +++
>   include/qemu/job.h            |   5 ++
>   block/backup.c                |  10 +++
>   block/io.c                    |  11 +++
>   block/mirror.c                |   9 +++
>   block/nbd.c                   |  15 ++++
>   block/raw-format.c            |   6 ++
>   job.c                         |   3 +
>   tests/qemu-iotests/264        | 143 ++++++++++++++++++++++------------
>   tests/qemu-iotests/264.out    |  20 ++---
>   tests/qemu-iotests/iotests.py |   6 +-
>   12 files changed, 173 insertions(+), 67 deletions(-)
> 


-- 
Best regards,
Vladimir

Re: [PATCH 00/11] mirror: cancel nbd reconnect

Posted by Vladimir Sementsov-Ogievskiy 5 years, 1 month ago

ping ping

18.12.2020 14:05, Vladimir Sementsov-Ogievskiy wrote:
> ping
> 
> 18.11.2020 21:04, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> The problem
>>
>> Assume we have mirror job with nbd target node with enabled reconnect.
>> Connection failed. So, all current requests to nbd node are waiting for
>> nbd driver to reconnect. And they will wait for reconnect-delay time
>> specified in nbd blockdev options. This timeout may be long enough, for
>> example, we in Virtuozzo use 300 seconds by default.
>>
>> So, if at this moment user tries to cancel the job, job will wait for
>> its in-flight requests to finish up to 300 seconds. From the user point
>> of view, cancelling the job takes a long time. Bad.
>>
>> Solution
>>
>> Let's just cancel "waiting for reconnect in in-flight request coroutines"
>> on mirror (and backup) cancel. Welcome the series below.
>>
>> Vladimir Sementsov-Ogievskiy (11):
>>    block: add new BlockDriver handler: bdrv_cancel_in_flight
>>    block/nbd: implement .bdrv_cancel_in_flight
>>    block/raw-format: implement .bdrv_cancel_in_flight handler
>>    job: add .cancel handler for the driver
>>    block/mirror: implement .cancel job handler
>>    iotests/264: fix style
>>    iotests/264: move to python unittest
>>    iotests.py: qemu_nbd_popen: remove pid file after use
>>    iotests/264: add mirror-cancel test-case
>>    block/backup: implement .cancel job handler
>>    iotests/264: add backup-cancel test-case
>>
>>   include/block/block.h         |   3 +
>>   include/block/block_int.h     |   9 +++
>>   include/qemu/job.h            |   5 ++
>>   block/backup.c                |  10 +++
>>   block/io.c                    |  11 +++
>>   block/mirror.c                |   9 +++
>>   block/nbd.c                   |  15 ++++
>>   block/raw-format.c            |   6 ++
>>   job.c                         |   3 +
>>   tests/qemu-iotests/264        | 143 ++++++++++++++++++++++------------
>>   tests/qemu-iotests/264.out    |  20 ++---
>>   tests/qemu-iotests/iotests.py |   6 +-
>>   12 files changed, 173 insertions(+), 67 deletions(-)
>>
> 
> 


-- 
Best regards,
Vladimir

Re: [PATCH 00/11] mirror: cancel nbd reconnect

Posted by Eric Blake 5 years, 2 months ago

On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> The problem
> 
> Assume we have mirror job with nbd target node with enabled reconnect.
> Connection failed. So, all current requests to nbd node are waiting for
> nbd driver to reconnect. And they will wait for reconnect-delay time
> specified in nbd blockdev options. This timeout may be long enough, for
> example, we in Virtuozzo use 300 seconds by default.
> 
> So, if at this moment user tries to cancel the job, job will wait for
> its in-flight requests to finish up to 300 seconds. From the user point
> of view, cancelling the job takes a long time. Bad.
> 
> Solution
> 
> Let's just cancel "waiting for reconnect in in-flight request coroutines"
> on mirror (and backup) cancel. Welcome the series below.

Given that we're past -rc2, I think this is enough on the 'new feature'
side to defer into 6.0 rather than trying to claim as a bug-fix needed
for 5.2-rc3.

That said, the summary does make it sound like a worthwhile thing to add.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 00/11] mirror: cancel nbd reconnect

Posted by Vladimir Sementsov-Ogievskiy 5 years, 2 months ago

18.11.2020 21:19, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> The problem
>>
>> Assume we have mirror job with nbd target node with enabled reconnect.
>> Connection failed. So, all current requests to nbd node are waiting for
>> nbd driver to reconnect. And they will wait for reconnect-delay time
>> specified in nbd blockdev options. This timeout may be long enough, for
>> example, we in Virtuozzo use 300 seconds by default.
>>
>> So, if at this moment user tries to cancel the job, job will wait for
>> its in-flight requests to finish up to 300 seconds. From the user point
>> of view, cancelling the job takes a long time. Bad.
>>
>> Solution
>>
>> Let's just cancel "waiting for reconnect in in-flight request coroutines"
>> on mirror (and backup) cancel. Welcome the series below.
> 
> Given that we're past -rc2, I think this is enough on the 'new feature'
> side to defer into 6.0 rather than trying to claim as a bug-fix needed
> for 5.2-rc3.

Of course.

> 
> That said, the summary does make it sound like a worthwhile thing to add.
> 


-- 
Best regards,
Vladimir

Re: [PATCH 00/11] mirror: cancel nbd reconnect

Posted by Eric Blake 5 years ago

On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> The problem
> 
> Assume we have mirror job with nbd target node with enabled reconnect.
> Connection failed. So, all current requests to nbd node are waiting for
> nbd driver to reconnect. And they will wait for reconnect-delay time
> specified in nbd blockdev options. This timeout may be long enough, for
> example, we in Virtuozzo use 300 seconds by default.
> 
> So, if at this moment user tries to cancel the job, job will wait for
> its in-flight requests to finish up to 300 seconds. From the user point
> of view, cancelling the job takes a long time. Bad.
> 
> Solution
> 
> Let's just cancel "waiting for reconnect in in-flight request coroutines"
> on mirror (and backup) cancel. Welcome the series below.

Because of my question on 4/11, I did not queue the entire series yet.
But 6/11 was trivial enough to queue now.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 00/11] mirror: cancel nbd reconnect

Posted by Vladimir Sementsov-Ogievskiy 5 years ago

21.01.2021 05:14, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> The problem
>>
>> Assume we have mirror job with nbd target node with enabled reconnect.
>> Connection failed. So, all current requests to nbd node are waiting for
>> nbd driver to reconnect. And they will wait for reconnect-delay time
>> specified in nbd blockdev options. This timeout may be long enough, for
>> example, we in Virtuozzo use 300 seconds by default.
>>
>> So, if at this moment user tries to cancel the job, job will wait for
>> its in-flight requests to finish up to 300 seconds. From the user point
>> of view, cancelling the job takes a long time. Bad.
>>
>> Solution
>>
>> Let's just cancel "waiting for reconnect in in-flight request coroutines"
>> on mirror (and backup) cancel. Welcome the series below.
> 
> Because of my question on 4/11, I did not queue the entire series yet.
> But 6/11 was trivial enough to queue now.
> 

Thanks!

-- 
Best regards,
Vladimir