[PATCH 0/2] Make local migration with TAP network device possible

Andrey Ryabinin posted 2 patches 3 years, 7 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20220614111843.24960-1-arbn@yandex-team.com
Maintainers: "Marc-André Lureau" <marcandre.lureau@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Juan Quintela <quintela@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Jason Wang <jasowang@redhat.com>
chardev/char-socket.c |  3 ---
io/channel-socket.c   |  3 ---
migration/fd.c        |  2 ++
net/tap-bsd.c         |  5 +++++
net/tap-linux.c       | 12 ++++++++++++
net/tap-solaris.c     |  5 +++++
net/tap.c             |  2 +-
net/tap_int.h         |  1 +
8 files changed, 26 insertions(+), 7 deletions(-)
[PATCH 0/2] Make local migration with TAP network device possible
Posted by Andrey Ryabinin 3 years, 7 months ago
Hi

These couple patches aims to  make possible local migration (within one host)
on the same TAP device used by source and destination QEMU

The scenario looks like this
 1. Create TAP devices and pass file descriptors to source QEMU
 2. Launch destination QEMU (-incoming defer) and pass same descriptors to it.
 3. Start migration


Regarding the first patch: It makes possible to receive file descriptor in non-blocking
state. But I probably didn't cover all FD users which might need to set blocking state after
the patch. So I'm hopping for the hints where else, besides fd_start_incoming_migration()
I need to put qemu_socket_set_block() calls.


Andrey Ryabinin (2):
  chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors.
  tap: initialize TAPState->enabled according to the actual state of
    queue

 chardev/char-socket.c |  3 ---
 io/channel-socket.c   |  3 ---
 migration/fd.c        |  2 ++
 net/tap-bsd.c         |  5 +++++
 net/tap-linux.c       | 12 ++++++++++++
 net/tap-solaris.c     |  5 +++++
 net/tap.c             |  2 +-
 net/tap_int.h         |  1 +
 8 files changed, 26 insertions(+), 7 deletions(-)

-- 
2.35.1
Re: [PATCH 0/2] Make local migration with TAP network device possible
Posted by Stefan Hajnoczi 3 years, 7 months ago
On Tue, Jun 14, 2022 at 02:18:41PM +0300, Andrey Ryabinin wrote:
> Hi
> 
> These couple patches aims to  make possible local migration (within one host)
> on the same TAP device used by source and destination QEMU
> 
> The scenario looks like this
>  1. Create TAP devices and pass file descriptors to source QEMU
>  2. Launch destination QEMU (-incoming defer) and pass same descriptors to it.
>  3. Start migration
> 
> 
> Regarding the first patch: It makes possible to receive file descriptor in non-blocking
> state. But I probably didn't cover all FD users which might need to set blocking state after
> the patch. So I'm hopping for the hints where else, besides fd_start_incoming_migration()
> I need to put qemu_socket_set_block() calls.

Nice feature. I am worried that these patches are unsafe/incomplete
though.

Tap local migration isn't explicitly visible in the code. How will other
developers know the feature is there and how to avoid breaking it when
modifying the code? Maybe a migration test case, comments that explain
the rules about accessing the tap fd, and/or assertions?

How does this interact with hw/net/vhost_net.c, which uses tap_get_fd()
to borrow the fd? I guess the idea is that the source VM is paused and
no tap activity is expected. Then migration handover happens and the
destination VM starts running and is allowed to access the tap fd.
However, the source VM still has vhost_net with the tap fd set up. I
wonder if there is any issue with interference between the two vhost_net
instances?

These kinds of questions should be answered, mostly in the code but also
in the cover letter. It should be clear why this approach is correct.

Thanks,
Stefan

> 
> 
> Andrey Ryabinin (2):
>   chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors.
>   tap: initialize TAPState->enabled according to the actual state of
>     queue
> 
>  chardev/char-socket.c |  3 ---
>  io/channel-socket.c   |  3 ---
>  migration/fd.c        |  2 ++
>  net/tap-bsd.c         |  5 +++++
>  net/tap-linux.c       | 12 ++++++++++++
>  net/tap-solaris.c     |  5 +++++
>  net/tap.c             |  2 +-
>  net/tap_int.h         |  1 +
>  8 files changed, 26 insertions(+), 7 deletions(-)
> 
> -- 
> 2.35.1
> 
Re: [PATCH 0/2] Make local migration with TAP network device possible
Posted by Andrey Ryabinin 3 years, 7 months ago

On 6/15/22 16:10, Stefan Hajnoczi wrote:
> On Tue, Jun 14, 2022 at 02:18:41PM +0300, Andrey Ryabinin wrote:
>> Hi
>>
>> These couple patches aims to  make possible local migration (within one host)
>> on the same TAP device used by source and destination QEMU
>>
>> The scenario looks like this
>>  1. Create TAP devices and pass file descriptors to source QEMU
>>  2. Launch destination QEMU (-incoming defer) and pass same descriptors to it.
>>  3. Start migration
>>
>>
>> Regarding the first patch: It makes possible to receive file descriptor in non-blocking
>> state. But I probably didn't cover all FD users which might need to set blocking state after
>> the patch. So I'm hopping for the hints where else, besides fd_start_incoming_migration()
>> I need to put qemu_socket_set_block() calls.
> 
> Nice feature. I am worried that these patches are unsafe/incomplete
> though.
> 
> Tap local migration isn't explicitly visible in the code. How will other
> developers know the feature is there and how to avoid breaking it when
> modifying the code? Maybe a migration test case, comments that explain
> the rules about accessing the tap fd, and/or assertions?
> 

Yeah, I'm writing avocado test case, will add it in future next iterations.

> How does this interact with hw/net/vhost_net.c, which uses tap_get_fd()
> to borrow the fd? I guess the idea is that the source VM is paused and
> no tap activity is expected. Then migration handover happens and the
> destination VM starts running and is allowed to access the tap fd.
> However, the source VM still has vhost_net with the tap fd set up. I
> wonder if there is any issue with interference between the two vhost_net
> instances?
> 

I'll try to take a closer look at code, let's see if I can find any possible issues.
But I did some brief testing with vhost=on net device, and it find any problems.
The test was
1. launch pings from source VM to host
2. Migrate
3. Check at dest VM that there is no lost packets.


> These kinds of questions should be answered, mostly in the code but also
> in the cover letter. It should be clear why this approach is correct.
> 
> Thanks,
> Stefan
> 
>>
>>
>> Andrey Ryabinin (2):
>>   chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors.
>>   tap: initialize TAPState->enabled according to the actual state of
>>     queue
>>
>>  chardev/char-socket.c |  3 ---
>>  io/channel-socket.c   |  3 ---
>>  migration/fd.c        |  2 ++
>>  net/tap-bsd.c         |  5 +++++
>>  net/tap-linux.c       | 12 ++++++++++++
>>  net/tap-solaris.c     |  5 +++++
>>  net/tap.c             |  2 +-
>>  net/tap_int.h         |  1 +
>>  8 files changed, 26 insertions(+), 7 deletions(-)
>>
>> -- 
>> 2.35.1
>>
Re: [PATCH 0/2] Make local migration with TAP network device possible
Posted by Stefan Hajnoczi 3 years, 7 months ago
Thanks! I will be offline for the next week.

Feel free to progress this series without my review.

Stefan

On Fri, Jun 24, 2022, 11:53 Andrey Ryabinin <arbn@yandex-team.com> wrote:

>
>
> On 6/15/22 16:10, Stefan Hajnoczi wrote:
> > On Tue, Jun 14, 2022 at 02:18:41PM +0300, Andrey Ryabinin wrote:
> >> Hi
> >>
> >> These couple patches aims to  make possible local migration (within one
> host)
> >> on the same TAP device used by source and destination QEMU
> >>
> >> The scenario looks like this
> >>  1. Create TAP devices and pass file descriptors to source QEMU
> >>  2. Launch destination QEMU (-incoming defer) and pass same descriptors
> to it.
> >>  3. Start migration
> >>
> >>
> >> Regarding the first patch: It makes possible to receive file descriptor
> in non-blocking
> >> state. But I probably didn't cover all FD users which might need to set
> blocking state after
> >> the patch. So I'm hopping for the hints where else, besides
> fd_start_incoming_migration()
> >> I need to put qemu_socket_set_block() calls.
> >
> > Nice feature. I am worried that these patches are unsafe/incomplete
> > though.
> >
> > Tap local migration isn't explicitly visible in the code. How will other
> > developers know the feature is there and how to avoid breaking it when
> > modifying the code? Maybe a migration test case, comments that explain
> > the rules about accessing the tap fd, and/or assertions?
> >
>
> Yeah, I'm writing avocado test case, will add it in future next iterations.
>
> > How does this interact with hw/net/vhost_net.c, which uses tap_get_fd()
> > to borrow the fd? I guess the idea is that the source VM is paused and
> > no tap activity is expected. Then migration handover happens and the
> > destination VM starts running and is allowed to access the tap fd.
> > However, the source VM still has vhost_net with the tap fd set up. I
> > wonder if there is any issue with interference between the two vhost_net
> > instances?
> >
>
> I'll try to take a closer look at code, let's see if I can find any
> possible issues.
> But I did some brief testing with vhost=on net device, and it find any
> problems.
> The test was
> 1. launch pings from source VM to host
> 2. Migrate
> 3. Check at dest VM that there is no lost packets.
>
>
> > These kinds of questions should be answered, mostly in the code but also
> > in the cover letter. It should be clear why this approach is correct.
> >
> > Thanks,
> > Stefan
> >
> >>
> >>
> >> Andrey Ryabinin (2):
> >>   chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors.
> >>   tap: initialize TAPState->enabled according to the actual state of
> >>     queue
> >>
> >>  chardev/char-socket.c |  3 ---
> >>  io/channel-socket.c   |  3 ---
> >>  migration/fd.c        |  2 ++
> >>  net/tap-bsd.c         |  5 +++++
> >>  net/tap-linux.c       | 12 ++++++++++++
> >>  net/tap-solaris.c     |  5 +++++
> >>  net/tap.c             |  2 +-
> >>  net/tap_int.h         |  1 +
> >>  8 files changed, 26 insertions(+), 7 deletions(-)
> >>
> >> --
> >> 2.35.1
> >>
>
>
Re: [PATCH 0/2] Make local migration with TAP network device possible
Posted by Daniel P. Berrangé 3 years, 7 months ago
On Tue, Jun 14, 2022 at 02:18:41PM +0300, Andrey Ryabinin wrote:
> Hi
> 
> These couple patches aims to  make possible local migration (within one host)
> on the same TAP device used by source and destination QEMU
> 
> The scenario looks like this
>  1. Create TAP devices and pass file descriptors to source QEMU
>  2. Launch destination QEMU (-incoming defer) and pass same descriptors to it.
>  3. Start migration
> 
> 
> Regarding the first patch: It makes possible to receive file descriptor in non-blocking
> state. But I probably didn't cover all FD users which might need to set blocking state after
> the patch. So I'm hopping for the hints where else, besides fd_start_incoming_migration()
> I need to put qemu_socket_set_block() calls.

You'll need to check all callers of

qio_channel_readv_full
qio_channel_readv_full_all
qio_channel_readv_full_all_eof

and identify which pass a non-NULL parameter for 'fds'. If the caller
does NOT have a qemu_setnonblock call on the FD it gets back, then you
have to assume it is expecting it in blocking mode and so need to
add qemu_setblock


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|