[PATCH v2 00/13] resolve hangs/crashes on libvirtd shutdown

Nikolay Shirokovskiy posted 13 patches 3 years, 9 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/1595499253-584162-1-git-send-email-nshirokovskiy@virtuozzo.com
scripts/check-drivername.py   |   2 +
src/driver-state.h            |   8 ++++
src/libvirt.c                 |  42 ++++++++++++++++
src/libvirt_internal.h        |   2 +
src/libvirt_private.syms      |   4 ++
src/libvirt_remote.syms       |   2 +-
src/qemu/qemu_domain.c        |  18 +++++--
src/qemu/qemu_driver.c        |  32 +++++++++++++
src/qemu/qemu_process.c       |   3 --
src/remote/remote_daemon.c    |   6 +--
src/rpc/virnetdaemon.c        | 109 ++++++++++++++++++++++++++++++++++++------
src/rpc/virnetdaemon.h        |   8 +++-
src/rpc/virnetserver.c        |   8 ++++
src/rpc/virnetserver.h        |   1 +
src/rpc/virnetserverservice.c |   1 -
src/util/vireventthread.c     |   1 +
src/util/virthreadpool.c      |  65 +++++++++++++++++--------
src/util/virthreadpool.h      |   6 +--
18 files changed, 267 insertions(+), 51 deletions(-)
[PATCH v2 00/13] resolve hangs/crashes on libvirtd shutdown
Posted by Nikolay Shirokovskiy 3 years, 9 months ago
I keep qemu VM event loop exiting synchronously but add code to avoid deadlock
that can be caused by this approach. I guess it is worth having synchronous
exiting of threads in this case to avoid crashes.

Patches that are already positively reviewed has appropriate 'Reviewed-by' lines.

Changes from v1:
- rename stateShutdown to state stateShutdownPrepare
- introduce net daemon shutdown callbacks
- make some adjustments in terms of qemu per VM's event loop thread
  finishing
- factor out net server shutdown facilities into distinct patch
- increase shutdown timeout from 15s to 30s

Nikolay Shirokovskiy (13):
  libvirt: add stateShutdownPrepare/stateShutdownWait to drivers
  util: always initialize priority condition
  util: add stop/drain functions to thread pool
  rpc: don't unref service ref on socket behalf twice
  rpc: add virNetDaemonSetShutdownCallbacks
  rpc: add shutdown facilities to netserver
  rpc: finish all threads before exiting main loop
  qemu: don't shutdown event thread in monitor EOF callback
  vireventthread: exit thread synchronously on finalize
  qemu: avoid deadlock in qemuDomainObjStopWorker
  qemu: implement driver's shutdown/shutdown wait methods
  rpc: cleanup virNetDaemonClose method
  util: remove unused virThreadPoolNew macro

 scripts/check-drivername.py   |   2 +
 src/driver-state.h            |   8 ++++
 src/libvirt.c                 |  42 ++++++++++++++++
 src/libvirt_internal.h        |   2 +
 src/libvirt_private.syms      |   4 ++
 src/libvirt_remote.syms       |   2 +-
 src/qemu/qemu_domain.c        |  18 +++++--
 src/qemu/qemu_driver.c        |  32 +++++++++++++
 src/qemu/qemu_process.c       |   3 --
 src/remote/remote_daemon.c    |   6 +--
 src/rpc/virnetdaemon.c        | 109 ++++++++++++++++++++++++++++++++++++------
 src/rpc/virnetdaemon.h        |   8 +++-
 src/rpc/virnetserver.c        |   8 ++++
 src/rpc/virnetserver.h        |   1 +
 src/rpc/virnetserverservice.c |   1 -
 src/util/vireventthread.c     |   1 +
 src/util/virthreadpool.c      |  65 +++++++++++++++++--------
 src/util/virthreadpool.h      |   6 +--
 18 files changed, 267 insertions(+), 51 deletions(-)

-- 
1.8.3.1

Re: [PATCH v2 00/13] resolve hangs/crashes on libvirtd shutdown
Posted by Nikolay Shirokovskiy 3 years, 8 months ago
polite ping

On 23.07.2020 13:14, Nikolay Shirokovskiy wrote:
> I keep qemu VM event loop exiting synchronously but add code to avoid deadlock
> that can be caused by this approach. I guess it is worth having synchronous
> exiting of threads in this case to avoid crashes.
> 
> Patches that are already positively reviewed has appropriate 'Reviewed-by' lines.
> 
> Changes from v1:
> - rename stateShutdown to state stateShutdownPrepare
> - introduce net daemon shutdown callbacks
> - make some adjustments in terms of qemu per VM's event loop thread
>   finishing
> - factor out net server shutdown facilities into distinct patch
> - increase shutdown timeout from 15s to 30s
> 
> Nikolay Shirokovskiy (13):
>   libvirt: add stateShutdownPrepare/stateShutdownWait to drivers
>   util: always initialize priority condition
>   util: add stop/drain functions to thread pool
>   rpc: don't unref service ref on socket behalf twice
>   rpc: add virNetDaemonSetShutdownCallbacks
>   rpc: add shutdown facilities to netserver
>   rpc: finish all threads before exiting main loop
>   qemu: don't shutdown event thread in monitor EOF callback
>   vireventthread: exit thread synchronously on finalize
>   qemu: avoid deadlock in qemuDomainObjStopWorker
>   qemu: implement driver's shutdown/shutdown wait methods
>   rpc: cleanup virNetDaemonClose method
>   util: remove unused virThreadPoolNew macro
> 
>  scripts/check-drivername.py   |   2 +
>  src/driver-state.h            |   8 ++++
>  src/libvirt.c                 |  42 ++++++++++++++++
>  src/libvirt_internal.h        |   2 +
>  src/libvirt_private.syms      |   4 ++
>  src/libvirt_remote.syms       |   2 +-
>  src/qemu/qemu_domain.c        |  18 +++++--
>  src/qemu/qemu_driver.c        |  32 +++++++++++++
>  src/qemu/qemu_process.c       |   3 --
>  src/remote/remote_daemon.c    |   6 +--
>  src/rpc/virnetdaemon.c        | 109 ++++++++++++++++++++++++++++++++++++------
>  src/rpc/virnetdaemon.h        |   8 +++-
>  src/rpc/virnetserver.c        |   8 ++++
>  src/rpc/virnetserver.h        |   1 +
>  src/rpc/virnetserverservice.c |   1 -
>  src/util/vireventthread.c     |   1 +
>  src/util/virthreadpool.c      |  65 +++++++++++++++++--------
>  src/util/virthreadpool.h      |   6 +--
>  18 files changed, 267 insertions(+), 51 deletions(-)
> 

Re: [PATCH v2 00/13] resolve hangs/crashes on libvirtd shutdown
Posted by Nikolay Shirokovskiy 3 years, 7 months ago
one more time

Sorry if this is on somebody's yet to be reviewed patches stack. It is not
clear now in current workflow if this patch track lost or not.

On 06.08.2020 17:10, Nikolay Shirokovskiy wrote:
> polite ping
> 
> On 23.07.2020 13:14, Nikolay Shirokovskiy wrote:
>> I keep qemu VM event loop exiting synchronously but add code to avoid deadlock
>> that can be caused by this approach. I guess it is worth having synchronous
>> exiting of threads in this case to avoid crashes.
>>
>> Patches that are already positively reviewed has appropriate 'Reviewed-by' lines.
>>
>> Changes from v1:
>> - rename stateShutdown to state stateShutdownPrepare
>> - introduce net daemon shutdown callbacks
>> - make some adjustments in terms of qemu per VM's event loop thread
>>   finishing
>> - factor out net server shutdown facilities into distinct patch
>> - increase shutdown timeout from 15s to 30s
>>
>> Nikolay Shirokovskiy (13):
>>   libvirt: add stateShutdownPrepare/stateShutdownWait to drivers
>>   util: always initialize priority condition
>>   util: add stop/drain functions to thread pool
>>   rpc: don't unref service ref on socket behalf twice
>>   rpc: add virNetDaemonSetShutdownCallbacks
>>   rpc: add shutdown facilities to netserver
>>   rpc: finish all threads before exiting main loop
>>   qemu: don't shutdown event thread in monitor EOF callback
>>   vireventthread: exit thread synchronously on finalize
>>   qemu: avoid deadlock in qemuDomainObjStopWorker
>>   qemu: implement driver's shutdown/shutdown wait methods
>>   rpc: cleanup virNetDaemonClose method
>>   util: remove unused virThreadPoolNew macro
>>
>>  scripts/check-drivername.py   |   2 +
>>  src/driver-state.h            |   8 ++++
>>  src/libvirt.c                 |  42 ++++++++++++++++
>>  src/libvirt_internal.h        |   2 +
>>  src/libvirt_private.syms      |   4 ++
>>  src/libvirt_remote.syms       |   2 +-
>>  src/qemu/qemu_domain.c        |  18 +++++--
>>  src/qemu/qemu_driver.c        |  32 +++++++++++++
>>  src/qemu/qemu_process.c       |   3 --
>>  src/remote/remote_daemon.c    |   6 +--
>>  src/rpc/virnetdaemon.c        | 109 ++++++++++++++++++++++++++++++++++++------
>>  src/rpc/virnetdaemon.h        |   8 +++-
>>  src/rpc/virnetserver.c        |   8 ++++
>>  src/rpc/virnetserver.h        |   1 +
>>  src/rpc/virnetserverservice.c |   1 -
>>  src/util/vireventthread.c     |   1 +
>>  src/util/virthreadpool.c      |  65 +++++++++++++++++--------
>>  src/util/virthreadpool.h      |   6 +--
>>  18 files changed, 267 insertions(+), 51 deletions(-)
>>
> 

Re: [PATCH v2 00/13] resolve hangs/crashes on libvirtd shutdown
Posted by Nikolay Shirokovskiy 3 years, 7 months ago
Pushed now.  Thanx everyone for review.

Patch "[PATCH v2 05/13] rpc: add virNetDaemonSetShutdownCallbacks" does not have mantainer 
review but I guess it is ok as the patch is simple enough and it's API is used in other
patches of series.

I would also want to note that crashes are still possible because not all threads
are joined on shutdown in qemu driver and other drivers. For example in qemu driver we 
spawn a thread during fake reboot or on daemon startup we spawn threads for every
VM to reconnect. So I would like to continue to work on this issues. Is this considered
worth the effort? I realize these are rare corner cases, like daemon shutdown immediately
after daemon start or daemon shutdown coincide with some domain reboot etc.

Nikolay

On 23.07.2020 13:14, Nikolay Shirokovskiy wrote:
> I keep qemu VM event loop exiting synchronously but add code to avoid deadlock
> that can be caused by this approach. I guess it is worth having synchronous
> exiting of threads in this case to avoid crashes.
> 
> Patches that are already positively reviewed has appropriate 'Reviewed-by' lines.
> 
> Changes from v1:
> - rename stateShutdown to state stateShutdownPrepare
> - introduce net daemon shutdown callbacks
> - make some adjustments in terms of qemu per VM's event loop thread
>   finishing
> - factor out net server shutdown facilities into distinct patch
> - increase shutdown timeout from 15s to 30s
> 
> Nikolay Shirokovskiy (13):
>   libvirt: add stateShutdownPrepare/stateShutdownWait to drivers
>   util: always initialize priority condition
>   util: add stop/drain functions to thread pool
>   rpc: don't unref service ref on socket behalf twice
>   rpc: add virNetDaemonSetShutdownCallbacks
>   rpc: add shutdown facilities to netserver
>   rpc: finish all threads before exiting main loop
>   qemu: don't shutdown event thread in monitor EOF callback
>   vireventthread: exit thread synchronously on finalize
>   qemu: avoid deadlock in qemuDomainObjStopWorker
>   qemu: implement driver's shutdown/shutdown wait methods
>   rpc: cleanup virNetDaemonClose method
>   util: remove unused virThreadPoolNew macro
> 
>  scripts/check-drivername.py   |   2 +
>  src/driver-state.h            |   8 ++++
>  src/libvirt.c                 |  42 ++++++++++++++++
>  src/libvirt_internal.h        |   2 +
>  src/libvirt_private.syms      |   4 ++
>  src/libvirt_remote.syms       |   2 +-
>  src/qemu/qemu_domain.c        |  18 +++++--
>  src/qemu/qemu_driver.c        |  32 +++++++++++++
>  src/qemu/qemu_process.c       |   3 --
>  src/remote/remote_daemon.c    |   6 +--
>  src/rpc/virnetdaemon.c        | 109 ++++++++++++++++++++++++++++++++++++------
>  src/rpc/virnetdaemon.h        |   8 +++-
>  src/rpc/virnetserver.c        |   8 ++++
>  src/rpc/virnetserver.h        |   1 +
>  src/rpc/virnetserverservice.c |   1 -
>  src/util/vireventthread.c     |   1 +
>  src/util/virthreadpool.c      |  65 +++++++++++++++++--------
>  src/util/virthreadpool.h      |   6 +--
>  18 files changed, 267 insertions(+), 51 deletions(-)
> 

Re: [PATCH v2 00/13] resolve hangs/crashes on libvirtd shutdown
Posted by Daniel P. Berrangé 3 years, 7 months ago
On Mon, Sep 07, 2020 at 10:12:08AM +0300, Nikolay Shirokovskiy wrote:
> Pushed now.  Thanx everyone for review.
> 
> Patch "[PATCH v2 05/13] rpc: add virNetDaemonSetShutdownCallbacks" does not have mantainer 
> review but I guess it is ok as the patch is simple enough and it's API is used in other
> patches of series.
> 
> I would also want to note that crashes are still possible because not all threads
> are joined on shutdown in qemu driver and other drivers. For example in qemu driver we 
> spawn a thread during fake reboot or on daemon startup we spawn threads for every
> VM to reconnect. So I would like to continue to work on this issues. Is this considered
> worth the effort? I realize these are rare corner cases, like daemon shutdown immediately
> after daemon start or daemon shutdown coincide with some domain reboot etc.

Personally I wouldn't spend time on such edge cases, as I feel there are
probably worse problems in libvirt needing attention first, but if you
want to send patches we would of course review them.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|