[Qemu-devel] [PATCH v3 00/11] NBD reconnect

Vladimir Sementsov-Ogievskiy posted 11 patches 5 years, 10 months ago
There is a newer version of this series
Posted by Vladimir Sementsov-Ogievskiy 5 years, 10 months ago
Hi all.

Here is NBD reconnect.
The feature is implemented inside the nbd-client driver and works as follows:

There are two parameters: reconnect-attempts and reconnect-timeout.
We try to reconnect both when the initial connection fails and when an
established connection is lost. All current and new I/O operations wait
until we have made reconnect-attempts attempts to reconnect. After that,
all requests fail with EIO, but we keep trying to reconnect.
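
To make the request handling described above concrete, here is a toy
model in Python. It is only a sketch of the semantics as stated in this
cover letter, not code from the series: the class and method names are
made up for illustration, and the delay between attempts
(reconnect-timeout) is left out.

```python
import errno


class ReconnectModel:
    """Toy model of the described semantics; all names are hypothetical."""

    def __init__(self, reconnect_attempts):
        self.reconnect_attempts = reconnect_attempts
        self.failed_attempts = 0
        self.connected = True

    def connection_lost(self):
        # The connection dropped: start counting failed reconnect attempts.
        self.connected = False
        self.failed_attempts = 0

    def attempt_reconnect(self, success):
        # Reconnecting is never abandoned; only the failure counter changes.
        if success:
            self.connected = True
            self.failed_attempts = 0
        else:
            self.failed_attempts += 1

    def request_outcome(self):
        # What happens to a current or new I/O request in this state.
        if self.connected:
            return "proceed"
        if self.failed_attempts < self.reconnect_attempts:
            return "wait"
        return -errno.EIO
```

With reconnect_attempts=2, a request waits through the first two failed
attempts, gets -EIO after that, and proceeds again once a later attempt
succeeds.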

v3:
06: fix build error in function 'nbd_co_send_request':
     error: 'i' may be used uninitialized in this function

v2 notes:
This is v2 of NBD reconnect, but it is very different from v1, so
forget about v1.
The series includes my "NBD reconnect: preliminary refactoring", with
changes in 05: leave asserts (Eric).

Vladimir Sementsov-Ogievskiy (11):
  block/nbd-client: split channel errors from export errors
  block/nbd: move connection code from block/nbd to block/nbd-client
  block/nbd-client: split connection from initialization
  block/nbd-client: fix nbd_reply_chunk_iter_receive
  block/nbd-client: don't check ioc
  block/nbd-client: move from quit to state
  block/nbd-client: rename read_reply_co to connection_co
  block/nbd-client: move connecting to connection_co
  block/nbd: add cmdline and qapi parameters for nbd reconnect
  block/nbd-client: nbd reconnect
  iotests: test nbd reconnect

 qapi/block-core.json          |  12 +-
 block/nbd-client.h            |  23 ++-
 block/nbd-client.c            | 429 ++++++++++++++++++++++++++++++------------
 block/nbd.c                   |  61 +++---
 tests/qemu-iotests/220        |  68 +++++++
 tests/qemu-iotests/220.out    |   7 +
 tests/qemu-iotests/group      |   1 +
 tests/qemu-iotests/iotests.py |   4 +
 8 files changed, 445 insertions(+), 160 deletions(-)
 create mode 100755 tests/qemu-iotests/220
 create mode 100644 tests/qemu-iotests/220.out

-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 00/11] NBD reconnect
Posted by Vladimir Sementsov-Ogievskiy 5 years, 9 months ago
Hi all.

Before implementing v4, I'd like to discuss some questions.

Our proposal for v4 is the following:

1. Don't reconnect in nbd_open. On open we make only one connection
attempt, and if it fails, open fails.
2. Don't make the timeout between attempts configurable. Instead, do the
following:
     1s timeout, then 2s, then 4s, 8s, 16s, and then always 16s until success
3. Configure only the time after a disconnect during which requests are
paused (after this time, if still not connected, they return EIO). Or
don't make it configurable for now and use a default of 5 minutes.
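
As a rough illustration of points 2 and 3, the delay schedule and the
pause window could be modelled like this. This is a Python sketch under
assumptions, not the planned implementation: the function names and the
parameters first, cap and pause_window are invented for the example; only
the values 1, 2, 4, 8, 16 and 5 minutes come from the proposal.

```python
import itertools


def reconnect_delays(first=1, cap=16):
    """Yield the delay in seconds before each reconnect attempt.

    The delay doubles after every failed attempt and stays at `cap`
    once it gets there.
    """
    delay = first
    while True:
        yield delay
        delay = min(delay * 2, cap)


def requests_paused(seconds_since_disconnect, pause_window=300):
    """True while requests should wait instead of failing with EIO.

    300 seconds is the 5 minute default from point 3.
    """
    return seconds_since_disconnect < pause_window


# First seven delays of the schedule from point 2.
print(list(itertools.islice(reconnect_delays(), 7)))
```

With these defaults, attempts happen 1, 3, 7, 15 and 31 seconds after
the disconnect, and every 16 seconds from then on.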

Any ideas?


09.06.2018 18:32, Vladimir Sementsov-Ogievskiy wrote:
> Hi all.
>
> [...]


-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v3 00/11] NBD reconnect
Posted by Eric Blake 5 years, 9 months ago
On 07/03/2018 08:46 AM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all.
> 
> Before implementing v4, I'd like to discuss some questions.
> 
> Our proposal for v4 is the following:
> 
> 1. Don't reconnect in nbd_open. On open we make only one connection
> attempt, and if it fails, open fails.
> 2. Don't make the timeout between attempts configurable. Instead, do the
> following:
>      1s timeout, then 2s, then 4s, 8s, 16s, and then always 16s until success
> 3. Configure only the time after a disconnect during which requests are
> paused (after this time, if still not connected, they return EIO). Or
> don't make it configurable for now and use a default of 5 minutes.
> 
> Any ideas?

I apologize that I haven't had time to review this series closely.  At 
this point, I think it's missed 3.0, and will have to be 3.1 material. 
Your proposal for exponential backoff on reconnect attempts makes sense; 
beyond that, I haven't reviewed closely enough to know if I have other 
suggestions on how many knobs to expose to the user, vs. how much to 
make automatic.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org