[PATCH v2 00/27] Virtio net failover fixes

Juan Quintela posted 27 patches 3 years, 5 months ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20201118083748.1328-1-quintela@redhat.com
Maintainers: Jason Wang <jasowang@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>, Eduardo Habkost <ehabkost@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>
include/hw/qdev-core.h         |  28 ++--
include/hw/virtio/virtio-net.h |   9 +-
hw/core/qdev.c                 |  19 +--
hw/net/virtio-net.c            | 298 +++++++++++++--------------------
migration/migration.c          |  13 ++
softmmu/qdev-monitor.c         |  43 ++---
6 files changed, 167 insertions(+), 243 deletions(-)
[PATCH v2 00/27] Virtio net failover fixes
Posted by Juan Quintela 3 years, 5 months ago
Hi

This is a big rework of the network failover setup.  General idea is:
* We don't cache the name of the primary/standby devices
  We have several problems there with stale pointers
* After this:
- We can always remove either the primary/standby devices without trouble
- Pluggin/unplugging works
- We go to device opts to see what the failover device are.
  Notice that we are plugging/unplugging the device, so it is not critical.
- Once there, I "fixed" managedsave for libvirt (now gives an error instead o=
f just hanging)
* Fields not cached anymore:
- primary_device_dict
- primary_device_opts
- standby_id
- primary_device_id
- primary_dev
* I renamed the should_be_hidden() callback to hide device, but if
  people preffer the old name I can leave it.
* Add (some) doc to some functions
* Remove almost 100 lines of code while fixing things.

Please review.

Later, Juan.

Juan Quintela (27):
  migration: Network Failover can't work with a paused guest
  failover: fix indentantion
  failover: Use always atomics for primary_should_be_hidden
  failover: primary bus is only used once, and where it is set
  failover: Remove unused parameter
  failover: Remove external partially_hotplugged property
  failover: qdev_device_add() returns err or dev set
  failover: Rename bool to failover_primary_hidden
  failover: g_strcmp0() knows how to handle NULL
  failover: Remove primary_device_opts
  failover: remove standby_id variable
  failover: Remove primary_device_dict
  failover: Remove memory leak
  failover: simplify virtio_net_find_primary()
  failover: should_be_hidden() should take a bool
  failover: Rename function to hide_device()
  failover: virtio_net_connect_failover_devices() does nothing
  failover: Rename to failover_find_primary_device()
  failover: simplify qdev_device_add() failover case
  failover: simplify qdev_device_add()
  failover: make sure that id always exist
  failover: remove failover_find_primary_device() error parameter
  failover: split failover_find_primary_device_id()
  failover: We don't need to cache primary_device_id anymore
  failover: Caller of this two functions already have primary_dev
  failover: simplify failover_unplug_primary
  failover: Remove primary_dev member

 include/hw/qdev-core.h         |  28 ++--
 include/hw/virtio/virtio-net.h |   9 +-
 hw/core/qdev.c                 |  19 +--
 hw/net/virtio-net.c            | 298 +++++++++++++--------------------
 migration/migration.c          |  13 ++
 softmmu/qdev-monitor.c         |  43 ++---
 6 files changed, 167 insertions(+), 243 deletions(-)

--=20
2.26.2



Re: [PATCH v2 00/27] Virtio net failover fixes
Posted by Michael S. Tsirkin 3 years, 5 months ago
On Wed, Nov 18, 2020 at 09:37:21AM +0100, Juan Quintela wrote:
> Hi
> 
> This is a big rework of the network failover setup.  General idea is:
> * We don't cache the name of the primary/standby devices
>   We have several problems there with stale pointers
> * After this:
> - We can always remove either the primary/standby devices without trouble
> - Pluggin/unplugging works
> - We go to device opts to see what the failover device are.
>   Notice that we are plugging/unplugging the device, so it is not critical.
> - Once there, I "fixed" managedsave for libvirt (now gives an error instead o=
> f just hanging)
> * Fields not cached anymore:
> - primary_device_dict
> - primary_device_opts
> - standby_id
> - primary_device_id
> - primary_dev
> * I renamed the should_be_hidden() callback to hide device, but if
>   people preffer the old name I can leave it.
> * Add (some) doc to some functions
> * Remove almost 100 lines of code while fixing things.
> 
> Please review.

OK that's great, any of this appropriate for 5.2?
The memory leak fix maybe?

> Later, Juan.
> 
> Juan Quintela (27):
>   migration: Network Failover can't work with a paused guest
>   failover: fix indentantion
>   failover: Use always atomics for primary_should_be_hidden
>   failover: primary bus is only used once, and where it is set
>   failover: Remove unused parameter
>   failover: Remove external partially_hotplugged property
>   failover: qdev_device_add() returns err or dev set
>   failover: Rename bool to failover_primary_hidden
>   failover: g_strcmp0() knows how to handle NULL
>   failover: Remove primary_device_opts
>   failover: remove standby_id variable
>   failover: Remove primary_device_dict
>   failover: Remove memory leak
>   failover: simplify virtio_net_find_primary()
>   failover: should_be_hidden() should take a bool
>   failover: Rename function to hide_device()
>   failover: virtio_net_connect_failover_devices() does nothing
>   failover: Rename to failover_find_primary_device()
>   failover: simplify qdev_device_add() failover case
>   failover: simplify qdev_device_add()
>   failover: make sure that id always exist
>   failover: remove failover_find_primary_device() error parameter
>   failover: split failover_find_primary_device_id()
>   failover: We don't need to cache primary_device_id anymore
>   failover: Caller of this two functions already have primary_dev
>   failover: simplify failover_unplug_primary
>   failover: Remove primary_dev member
> 
>  include/hw/qdev-core.h         |  28 ++--
>  include/hw/virtio/virtio-net.h |   9 +-
>  hw/core/qdev.c                 |  19 +--
>  hw/net/virtio-net.c            | 298 +++++++++++++--------------------
>  migration/migration.c          |  13 ++
>  softmmu/qdev-monitor.c         |  43 ++---
>  6 files changed, 167 insertions(+), 243 deletions(-)
> 
> --=20
> 2.26.2
> 


Re: [PATCH v2 00/27] Virtio net failover fixes
Posted by Juan Quintela 3 years, 4 months ago
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Nov 18, 2020 at 09:37:21AM +0100, Juan Quintela wrote:
>> Hi
>> 
>> This is a big rework of the network failover setup.  General idea is:
>> * We don't cache the name of the primary/standby devices
>>   We have several problems there with stale pointers
>> * After this:
>> - We can always remove either the primary/standby devices without trouble
>> - Pluggin/unplugging works
>> - We go to device opts to see what the failover device are.
>>   Notice that we are plugging/unplugging the device, so it is not critical.
>> - Once there, I "fixed" managedsave for libvirt (now gives an error instead o=
>> f just hanging)
>> * Fields not cached anymore:
>> - primary_device_dict
>> - primary_device_opts
>> - standby_id
>> - primary_device_id
>> - primary_dev
>> * I renamed the should_be_hidden() callback to hide device, but if
>>   people preffer the old name I can leave it.
>> * Add (some) doc to some functions
>> * Remove almost 100 lines of code while fixing things.
>> 
>> Please review.
>
> OK that's great, any of this appropriate for 5.2?
> The memory leak fix maybe?

1st one is also a fix, current code just hangs the guest.
Rest of things .... current code fails a lot, but we are too late on the
cycle.

Later, Juan.


>> Later, Juan.
>> 
>> Juan Quintela (27):
>>   migration: Network Failover can't work with a paused guest
>>   failover: fix indentantion
>>   failover: Use always atomics for primary_should_be_hidden
>>   failover: primary bus is only used once, and where it is set
>>   failover: Remove unused parameter
>>   failover: Remove external partially_hotplugged property
>>   failover: qdev_device_add() returns err or dev set
>>   failover: Rename bool to failover_primary_hidden
>>   failover: g_strcmp0() knows how to handle NULL
>>   failover: Remove primary_device_opts
>>   failover: remove standby_id variable
>>   failover: Remove primary_device_dict
>>   failover: Remove memory leak
>>   failover: simplify virtio_net_find_primary()
>>   failover: should_be_hidden() should take a bool
>>   failover: Rename function to hide_device()
>>   failover: virtio_net_connect_failover_devices() does nothing
>>   failover: Rename to failover_find_primary_device()
>>   failover: simplify qdev_device_add() failover case
>>   failover: simplify qdev_device_add()
>>   failover: make sure that id always exist
>>   failover: remove failover_find_primary_device() error parameter
>>   failover: split failover_find_primary_device_id()
>>   failover: We don't need to cache primary_device_id anymore
>>   failover: Caller of this two functions already have primary_dev
>>   failover: simplify failover_unplug_primary
>>   failover: Remove primary_dev member
>> 
>>  include/hw/qdev-core.h         |  28 ++--
>>  include/hw/virtio/virtio-net.h |   9 +-
>>  hw/core/qdev.c                 |  19 +--
>>  hw/net/virtio-net.c            | 298 +++++++++++++--------------------
>>  migration/migration.c          |  13 ++
>>  softmmu/qdev-monitor.c         |  43 ++---
>>  6 files changed, 167 insertions(+), 243 deletions(-)
>> 
>> --=20
>> 2.26.2
>> 


Re: [PATCH v2 00/27] Virtio net failover fixes
Posted by Michael S. Tsirkin 3 years, 4 months ago
On Wed, Dec 02, 2020 at 11:16:04AM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Nov 18, 2020 at 09:37:21AM +0100, Juan Quintela wrote:
> >> Hi
> >> 
> >> This is a big rework of the network failover setup.  General idea is:
> >> * We don't cache the name of the primary/standby devices
> >>   We have several problems there with stale pointers
> >> * After this:
> >> - We can always remove either the primary/standby devices without trouble
> >> - Pluggin/unplugging works
> >> - We go to device opts to see what the failover device are.
> >>   Notice that we are plugging/unplugging the device, so it is not critical.
> >> - Once there, I "fixed" managedsave for libvirt (now gives an error instead o=
> >> f just hanging)
> >> * Fields not cached anymore:
> >> - primary_device_dict
> >> - primary_device_opts
> >> - standby_id
> >> - primary_device_id
> >> - primary_dev
> >> * I renamed the should_be_hidden() callback to hide device, but if
> >>   people preffer the old name I can leave it.
> >> * Add (some) doc to some functions
> >> * Remove almost 100 lines of code while fixing things.
> >> 
> >> Please review.
> >
> > OK that's great, any of this appropriate for 5.2?
> > The memory leak fix maybe?
> 
> 1st one is also a fix, current code just hangs the guest.


Hmm it does but then proceeds when you unpause so I'm not sure
it's a good idea for 5.2. It's late in the cycle to try to
handle management bugs ...

> Rest of things .... current code fails a lot, but we are too late on the
> cycle.
> 
> Later, Juan.
> 
> 
> >> Later, Juan.
> >> 
> >> Juan Quintela (27):
> >>   migration: Network Failover can't work with a paused guest
> >>   failover: fix indentantion
> >>   failover: Use always atomics for primary_should_be_hidden
> >>   failover: primary bus is only used once, and where it is set
> >>   failover: Remove unused parameter
> >>   failover: Remove external partially_hotplugged property
> >>   failover: qdev_device_add() returns err or dev set
> >>   failover: Rename bool to failover_primary_hidden
> >>   failover: g_strcmp0() knows how to handle NULL
> >>   failover: Remove primary_device_opts
> >>   failover: remove standby_id variable
> >>   failover: Remove primary_device_dict
> >>   failover: Remove memory leak
> >>   failover: simplify virtio_net_find_primary()
> >>   failover: should_be_hidden() should take a bool
> >>   failover: Rename function to hide_device()
> >>   failover: virtio_net_connect_failover_devices() does nothing
> >>   failover: Rename to failover_find_primary_device()
> >>   failover: simplify qdev_device_add() failover case
> >>   failover: simplify qdev_device_add()
> >>   failover: make sure that id always exist
> >>   failover: remove failover_find_primary_device() error parameter
> >>   failover: split failover_find_primary_device_id()
> >>   failover: We don't need to cache primary_device_id anymore
> >>   failover: Caller of this two functions already have primary_dev
> >>   failover: simplify failover_unplug_primary
> >>   failover: Remove primary_dev member
> >> 
> >>  include/hw/qdev-core.h         |  28 ++--
> >>  include/hw/virtio/virtio-net.h |   9 +-
> >>  hw/core/qdev.c                 |  19 +--
> >>  hw/net/virtio-net.c            | 298 +++++++++++++--------------------
> >>  migration/migration.c          |  13 ++
> >>  softmmu/qdev-monitor.c         |  43 ++---
> >>  6 files changed, 167 insertions(+), 243 deletions(-)
> >> 
> >> --=20
> >> 2.26.2
> >> 


Re: [PATCH v2 00/27] Virtio net failover fixes
Posted by Michael S. Tsirkin 3 years, 4 months ago
On Wed, Nov 18, 2020 at 09:37:21AM +0100, Juan Quintela wrote:
> Hi

I tagged this for after the release. To help make sure this
is not lost pls ping me after the release. Thanks!


> This is a big rework of the network failover setup.  General idea is:
> * We don't cache the name of the primary/standby devices
>   We have several problems there with stale pointers
> * After this:
> - We can always remove either the primary/standby devices without trouble
> - Pluggin/unplugging works
> - We go to device opts to see what the failover device are.
>   Notice that we are plugging/unplugging the device, so it is not critical.
> - Once there, I "fixed" managedsave for libvirt (now gives an error instead o=
> f just hanging)
> * Fields not cached anymore:
> - primary_device_dict
> - primary_device_opts
> - standby_id
> - primary_device_id
> - primary_dev
> * I renamed the should_be_hidden() callback to hide device, but if
>   people preffer the old name I can leave it.
> * Add (some) doc to some functions
> * Remove almost 100 lines of code while fixing things.
> 
> Please review.
> 
> Later, Juan.
> 
> Juan Quintela (27):
>   migration: Network Failover can't work with a paused guest
>   failover: fix indentantion
>   failover: Use always atomics for primary_should_be_hidden
>   failover: primary bus is only used once, and where it is set
>   failover: Remove unused parameter
>   failover: Remove external partially_hotplugged property
>   failover: qdev_device_add() returns err or dev set
>   failover: Rename bool to failover_primary_hidden
>   failover: g_strcmp0() knows how to handle NULL
>   failover: Remove primary_device_opts
>   failover: remove standby_id variable
>   failover: Remove primary_device_dict
>   failover: Remove memory leak
>   failover: simplify virtio_net_find_primary()
>   failover: should_be_hidden() should take a bool
>   failover: Rename function to hide_device()
>   failover: virtio_net_connect_failover_devices() does nothing
>   failover: Rename to failover_find_primary_device()
>   failover: simplify qdev_device_add() failover case
>   failover: simplify qdev_device_add()
>   failover: make sure that id always exist
>   failover: remove failover_find_primary_device() error parameter
>   failover: split failover_find_primary_device_id()
>   failover: We don't need to cache primary_device_id anymore
>   failover: Caller of this two functions already have primary_dev
>   failover: simplify failover_unplug_primary
>   failover: Remove primary_dev member
> 
>  include/hw/qdev-core.h         |  28 ++--
>  include/hw/virtio/virtio-net.h |   9 +-
>  hw/core/qdev.c                 |  19 +--
>  hw/net/virtio-net.c            | 298 +++++++++++++--------------------
>  migration/migration.c          |  13 ++
>  softmmu/qdev-monitor.c         |  43 ++---
>  6 files changed, 167 insertions(+), 243 deletions(-)
> 
> --=20
> 2.26.2
>