[RFC PATCH v2 0/3] virtio-net: graceful drop of vhost for TAP

Yuri Benditovich posted 3 patches 3 years ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20210322122452.369750-1-yuri.benditovich@daynix.com
Maintainers: Jason Wang <jasowang@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>
hw/net/vhost_net.c         |  4 ++-
hw/net/virtio-net.c        | 51 ++++++++++++++++++++++++++++++++++++++
hw/virtio/virtio.c         |  8 ++++++
include/hw/virtio/virtio.h |  8 ++++++
include/net/net.h          |  1 +
5 files changed, 71 insertions(+), 1 deletion(-)
[RFC PATCH v2 0/3] virtio-net: graceful drop of vhost for TAP
Posted by Yuri Benditovich 3 years ago
Allow fallback to userspace only upon migration, only for specific features
and only if 'vhostforce' is not requested.

Changes from v1:
Patch 1 dropped (will be submitted in another series)
Added device callback in case the migration should fail due to missing features

Yuri Benditovich (3):
  net: add ability to hide (disable) vhost_net
  virtio: introduce 'missing_features_migrated' device callback
  virtio-net: implement missing_features_migrated callback

 hw/net/vhost_net.c         |  4 ++-
 hw/net/virtio-net.c        | 51 ++++++++++++++++++++++++++++++++++++++
 hw/virtio/virtio.c         |  8 ++++++
 include/hw/virtio/virtio.h |  8 ++++++
 include/net/net.h          |  1 +
 5 files changed, 71 insertions(+), 1 deletion(-)

-- 
2.26.2


Re: [RFC PATCH v2 0/3] virtio-net: graceful drop of vhost for TAP
Posted by Jason Wang 3 years ago
On 2021/3/22 8:24 PM, Yuri Benditovich wrote:
> Allow fallback to userspace only upon migration, only for specific features
> and only if 'vhostforce' is not requested.
>
> Changes from v1:
> Patch 1 dropped (will be submitted in another series)
> Added device callback in case the migration should fail due to missing features


Hi Yuri:

I had a quick glance at the series. One question: why do we need to do the
fallback only during load?

I think we should do it at device initialization. E.g. when vhost cannot
satisfy the requested features, we should disable vhost from that point on.

Thanks




Re: [RFC PATCH v2 0/3] virtio-net: graceful drop of vhost for TAP
Posted by Yuri Benditovich 3 years ago
Hi Jason,

This was discussed earlier on the previous series of patches.
https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01829.html
There were strong objections from both Daniel and Michael and I feel
that the series was rejected.
There was Michael's claim:
"We did what this patch is trying to change for years now, in
particular KVM also seems to happily disable CPU features not supported
by kernel so I wonder why we can't keep doing it, with tweaks for some
corner cases."
https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03187.html
And there was Michael's question:
"Can we limit the change to when a VM is migrated in?"
https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03163.html
So I'm trying to suggest another approach:
- In case of conflicting features (for example RSS and vhost), we in
qemu do not have enough information to prefer one over the other.
- If we drop to userspace in the first set_features, we say: "vhost is
less important than other requested features".
- This series keeps backward compatibility, i.e. if you start with
vhost and some features are not available, they are silently cleared.
- But if the features are available on the source machine, they are used.
- In case of migration this series says: "We prefer a successful
migration even if for that we need to drop to userspace".
- On migration back to the first system we again work with all the
features and with vhost, as all the features are available.
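
As a rough illustration of the points above (this is not code from the series; the type names, the function names and the `fallback_ok` mask are all hypothetical), the init-time and load-time decisions could be sketched in C roughly like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of checking migrated features on the destination. */
typedef enum {
    LOAD_KEEP_VHOST,        /* vhost covers every migrated feature */
    LOAD_DROP_TO_USERSPACE, /* hide vhost, userspace virtio-net takes over */
    LOAD_FAIL               /* the migrated features cannot be satisfied */
} LoadAction;

/* Hypothetical view of the destination backend. */
typedef struct {
    uint64_t vhost_features; /* features the destination vhost supports */
    uint64_t fallback_ok;    /* features for which dropping vhost is allowed */
    bool vhostforce;         /* user asked for vhost unconditionally */
} NetBackend;

/* Init time: keep today's behaviour and silently clear unsupported bits. */
static uint64_t offered_at_init(const NetBackend *b, uint64_t requested)
{
    assert(b != NULL);
    return requested & b->vhost_features;
}

/* Migration load: prefer a successful migration over keeping vhost. */
static LoadAction on_migration_load(const NetBackend *b, uint64_t migrated)
{
    assert(b != NULL);
    uint64_t missing = migrated & ~b->vhost_features;

    if (missing == 0) {
        return LOAD_KEEP_VHOST;
    }
    /* Fail if vhost is forced or a missing feature is not in the allowed set. */
    if (b->vhostforce || (missing & ~b->fallback_ok) != 0) {
        return LOAD_FAIL;
    }
    return LOAD_DROP_TO_USERSPACE;
}
```

The sketch assumes that "only for specific features" can be expressed as a bit mask, which is a simplification made for illustration.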

Thanks,
Yuri



On Thu, Mar 25, 2021 at 8:59 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/3/22 8:24 PM, Yuri Benditovich wrote:
> > Allow fallback to userspace only upon migration, only for specific features
> > and only if 'vhostforce' is not requested.
> >
> > Changes from v1:
> > Patch 1 dropped (will be submitted in another series)
> > Added device callback in case the migration should fail due to missing features
>
>
> Hi Yuri:
>
> I had a quick glance at the series. One question: why do we need to do the
> fallback only during load?
>
> I think we should do it at device initialization. E.g. when vhost cannot
> satisfy the requested features, we should disable vhost from that point on.
>
> Thanks

Re: [RFC PATCH v2 0/3] virtio-net: graceful drop of vhost for TAP
Posted by Jason Wang 3 years ago
On 2021/3/25 5:00 PM, Yuri Benditovich wrote:
> Hi Jason,
>
> This was discussed earlier on the previous series of patches.
> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01829.html
> There were strong objections from both Daniel and Michael and I feel
> that the series was rejected.
> There was Michael's claim:
> "We did what this patch is trying to change for years now, in
> particular KVM also seems to happily disable CPU features not supported
> by kernel so I wonder why we can't keep doing it, with tweaks for some
> corner cases."


So for CPU features, it works because the management layer has other tools to
deal with the cpuid. Management will then make sure that migration happens
only among hosts that are compatible with the same cpuid sets.

For vhost, we don't have such a capability; that's why I think we need
to have a fallback.


> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03187.html
> And it was Michael's question:
> "Can we limit the change to when a VM is migrated in?"
> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03163.html
> So I'm trying to suggest another approach:
> - In case of conflicting features (for example RSS and vhost), we in
> qemu do not have enough information to prefer one over the other.
> - If we drop to userspace in the first set_features we say: "vhost is
> less important than other requested features"
> - This series keeps backward compatibility, i.e. if you start with
> vhost and some features are not available - they are silently cleared.
> - But in case the features are available on source machine - they are used
> - In case of migration this series says: "We prefer successful
> migration even if for that we need to drop to userspace"
> - On the migration back to the 1st system we again work with all the
> features and with vhost as all the features are available.


One issue with this approach: consider that we have two drivers:

1) Driver A that supports split only
2) Driver B that supports packed

Consider that the src supports packed but the dest doesn't.

So switching from driver A to driver B works without migration. But if we
switch from driver A to B after migration, it won't work?

Thanks




Re: [RFC PATCH v2 0/3] virtio-net: graceful drop of vhost for TAP
Posted by Yuri Benditovich 3 years ago
On Fri, Mar 26, 2021 at 10:51 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/3/25 5:00 PM, Yuri Benditovich wrote:
> > Hi Jason,
> >
> > This was discussed earlier on the previous series of patches.
> > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01829.html
> > There were strong objections from both Daniel and Michael and I feel
> > that the series was rejected.
> > There was Michael's claim:
> > "We did what this patch is trying to change for years now, in
> > particular KVM also seems to happily disable CPU features not supported
> > by kernel so I wonder why we can't keep doing it, with tweaks for some
> > corner cases."
>
>
> So for CPU features, it works because the management layer has other tools to
> deal with the cpuid. Management will then make sure that migration happens
> only among hosts that are compatible with the same cpuid sets.
>
> For vhost, we don't have such a capability; that's why I think we need
> to have a fallback.
>
Hi Jason,
What, from your POV, was the result of the v1 discussion?
IMO, there was one critical comment that the patch does not address
'vhostforce' properly (indeed).
IMO, there were many comments from Daniel and Michael that in sum
say that this change is not what they would like.
If I'm mistaken please let me know.

I have no problem sending v3 = v1 + handling of 'vhostforce'.
If this is what you want, please let me know also.

>
> > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03187.html
> > And it was Michael's question:
> > "Can we limit the change to when a VM is migrated in?"
> > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03163.html
> > So I'm trying to suggest another approach:
> > - In case of conflicting features (for example RSS and vhost), we in
> > qemu do not have enough information to prefer one over the other.
> > - If we drop to userspace in the first set_features we say: "vhost is
> > less important than other requested features"
> > - This series keeps backward compatibility, i.e. if you start with
> > vhost and some features are not available - they are silently cleared.
> > - But in case the features are available on source machine - they are used
> > - In case of migration this series says: "We prefer successful
> > migration even if for that we need to drop to userspace"
> > - On the migration back to the 1st system we again work with all the
> > features and with vhost as all the features are available.
>
>
> One issue with this approach: consider that we have two drivers:
>
> 1) Driver A that supports split only
> 2) Driver B that supports packed
>
> Consider that the src supports packed but the dest doesn't.
>
> So switching from driver A to driver B works without migration. But if we
> switch from driver A to B after migration, it won't work?

I assume that both src and dest started with vhost=on.

As driver B supports both packed and split, you can switch from driver
A to driver B after migration, and driver B will work with split,
exactly as it does today.

The key question is what is more important: vhost, or features that
vhost does not support?
Current code says: vhost is always more important.
v1 patch says: features are always more important.
v2 patch says: vhost is more important at init time, features are more
important at migration time,
because we are able to drop vhost but we can't drop features when we
have a running driver.
Do you agree?
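
To make the comparison of the three behaviours concrete, here is a small C sketch. It is only an illustration of the statement above; the enum and function names are hypothetical and do not come from the patches:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three behaviours being compared. */
typedef enum { POLICY_CURRENT, POLICY_V1, POLICY_V2 } Policy;

/* When the conflict between vhost and a feature is being resolved. */
typedef enum { PHASE_INIT, PHASE_MIGRATION } Phase;

/*
 * Returns true if vhost is preferred over a feature it does not support,
 * false if the feature is preferred (i.e. vhost is dropped).
 */
static bool vhost_preferred(Policy policy, Phase phase)
{
    switch (policy) {
    case POLICY_CURRENT:
        return true;                /* vhost always wins */
    case POLICY_V1:
        return false;               /* features always win */
    case POLICY_V2:
        return phase == PHASE_INIT; /* vhost at init, features on migration */
    }
    assert(!"unknown policy");
    return true;
}
```

Only the v2 row depends on the phase, which is the whole difference between this series and v1.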


Re: [RFC PATCH v2 0/3] virtio-net: graceful drop of vhost for TAP
Posted by Jason Wang 3 years ago
On 2021/3/26 5:09 PM, Yuri Benditovich wrote:
> On Fri, Mar 26, 2021 at 10:51 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/3/25 5:00 PM, Yuri Benditovich wrote:
>>> Hi Jason,
>>>
>>> This was discussed earlier on the previous series of patches.
>>> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01829.html
>>> There were strong objections from both Daniel and Michael and I feel
>>> that the series was rejected.
>>> There was Michael's claim:
>>> "We did what this patch is trying to change for years now, in
>>> particular KVM also seems to happily disable CPU features not supported
>>> by kernel so I wonder why we can't keep doing it, with tweaks for some
>>> corner cases."
>>
>> So for CPU features, it works because the management layer has other tools to
>> deal with the cpuid. Management will then make sure that migration happens
>> only among hosts that are compatible with the same cpuid sets.
>>
>> For vhost, we don't have such a capability; that's why I think we need
>> to have a fallback.
>>
> Hi Jason,
> What, from your POV was the result of v1 discussion?


It looks to me we don't have an agreement on that, sorry.


> IMO, there was one critical comment that the patch does not address
> 'vhostforce' properly (indeed).
> IMO, there were many comments from Daniel and Michael that in sum
> say that this change is not what they would like.
> If I'm mistaken please let me know.


I think I will open a new thread and summarize the different approaches,
and then we can come to a conclusion.


>
> I have no problem sending v3 = v1 + handling of 'vhostforce'.
> If this is what you want, please let me know also.
>
>>> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03187.html
>>> And it was Michael's question:
>>> "Can we limit the change to when a VM is migrated in?"
>>> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03163.html
>>> So I'm trying to suggest another approach:
>>> - In case of conflicting features (for example RSS and vhost), we in
>>> qemu do not have enough information to prefer one over the other.
>>> - If we drop to userspace in the first set_features we say: "vhost is
>>> less important than other requested features"
>>> - This series keeps backward compatibility, i.e. if you start with
>>> vhost and some features are not available - they are silently cleared.
>>> - But in case the features are available on source machine - they are used
>>> - In case of migration this series says: "We prefer successful
>>> migration even if for that we need to drop to userspace"
>>> - On the migration back to the 1st system we again work with all the
>>> features and with vhost as all the features are available.
>>
>> One issue with this approach: consider that we have two drivers:
>>
>> 1) Driver A that supports split only
>> 2) Driver B that supports packed
>>
>> Consider that the src supports packed but the dest doesn't.
>>
>> So switching from driver A to driver B works without migration. But if we
>> switch from driver A to B after migration, it won't work?
> I assume that both src and dest started with vhost=on.
>
> As driver B supports both packed and split, you can switch from driver
> A to driver B after migration
> and driver B will work with split. Exactly as it does today.
>
> The key question is what is more important - vhost or features that
> vhost does not support?
> current code says: vhost is more important always
> v1 patch says: features are more important always.
> v2 patch says: vhost is more important at init time, features are more
> important at migration time.
> Because we are able to drop vhost but we can't drop features when we
> have a running driver.
> Do you agree?


I think what comes from the CLI is the most important. So if I understand
correctly:

- vhost=on means "turn on vhost when possible"; it implies that fallback
is allowed (we already have fallback code)
- vhostforce=on means "turn on vhost unconditionally"; it implies that we
can't do fallback

So my understanding is that:

- "vhost=on,packed=on": we can fall back to userspace but must keep the
packed virtqueue working
- "vhost=on,vhostforce=on,packed=on": we can't fall back and must keep
both vhost and the packed virtqueue working; if we can't, we need to fail
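
A minimal C sketch of this reading of the command line follows. It is not QEMU code: the struct, the enum, the function name and the `FEAT_PACKED` bit value are hypothetical and chosen only for illustration, and it assumes that vhostforce only matters when vhost is on:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative feature bit standing in for "packed=on". */
#define FEAT_PACKED (1ULL << 34)

typedef enum { BACKEND_VHOST, BACKEND_USERSPACE, BACKEND_ERROR } BackendChoice;

/* Hypothetical summary of the relevant command-line options. */
typedef struct {
    bool vhost;        /* vhost=on */
    bool vhostforce;   /* vhostforce=on */
    uint64_t required; /* features requested explicitly, e.g. packed */
} NetOpts;

/* Explicitly requested features are never dropped; only the backend may change. */
static BackendChoice choose_backend(const NetOpts *o, uint64_t vhost_features)
{
    assert(o != NULL);
    if (!o->vhost) {
        return BACKEND_USERSPACE;
    }
    if ((o->required & ~vhost_features) == 0) {
        return BACKEND_VHOST;     /* vhost can provide everything */
    }
    /* vhost cannot satisfy the request: fall back unless it is forced. */
    return o->vhostforce ? BACKEND_ERROR : BACKEND_USERSPACE;
}
```

With these rules, "vhost=on,packed=on" on a host whose vhost lacks packed yields the userspace backend, while adding vhostforce=on turns the same situation into an error.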

Thanks




Re: [RFC PATCH v2 0/3] virtio-net: graceful drop of vhost for TAP
Posted by Jason Wang 2 years, 12 months ago
On 2021/3/26 5:18 PM, Jason Wang wrote:
>>> ?
>> I assume that both src and dest started with vhost=on.
>>
>> As driver B supports both packed and split, you can switch from driver
>> A to driver B after migration
>> and driver B will work with split. Exactly as it does today.
>>
>> The key question is what is more important - vhost or features that
>> vhost does not support?
>> current code says: vhost is more important always
>> v1 patch says: features are more important always.
>> v2 patch says: vhost is more important at init time, features are more
>> important at migration time.
>> Because we are able to drop vhost but we can't drop features when we
>> have a running driver.
>> Do you agree?
>
>
> I think what comes from the CLI is the most important. So if I understand
> correctly:
>
> - vhost=on means "turn on vhost when possible"; it implies that fallback
> is allowed (we already have fallback code)
> - vhostforce=on means "turn on vhost unconditionally"; it implies that we
> can't do fallback
>
> So my understanding is that:
>
> - "vhost=on,packed=on": we can fall back to userspace but must keep the
> packed virtqueue working
> - "vhost=on,vhostforce=on,packed=on": we can't fall back and must keep
> both vhost and the packed virtqueue working; if we can't, we need to fail
>
> Thanks


Daniel and Michael, am I right here?

We need some input to move forward and fix the migration compatibility
issue.

Thanks