[PATCH V2 0/4] Introduce Advanced Watch Dog module

Zhang Chen posted 4 patches 4 years, 4 months ago
Test asan passed
Test checkpatch passed
Test FreeBSD passed
Test docker-mingw@fedora passed
Test docker-clang@ubuntu passed
Test docker-quick@centos7 passed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20191101024850.20808-1-chen.zhang@intel.com
Maintainers: Jason Wang <jasowang@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
There is a newer version of this series
configure         |   9 +
net/Makefile.objs |   1 +
net/awd.c         | 491 ++++++++++++++++++++++++++++++++++++++++++++++
qemu-options.hx   |   6 +
vl.c              |   7 +
5 files changed, 514 insertions(+)
create mode 100644 net/awd.c
[PATCH V2 0/4] Introduce Advanced Watch Dog module
Posted by Zhang Chen 4 years, 4 months ago
From: Zhang Chen <chen.zhang@intel.com>

Advanced Watch Dog (AWD) is a universal monitoring module on the VMM side.
It can be used to detect a network outage (VMM to guest, VMM to VMM, or VMM
to another remote server) and then perform a previously configured
operation. The current AWD patch simply accepts any input as the signal to
refresh the watchdog timer; a specific interactive protocol could also be
defined here. For the output, the user can pre-write commands or messages in
the AWD opt-script.

We noticed that there is no way for VMMs to communicate directly with each
other. Some people may think we don't need such a thing, since upper-layer
software like OpenStack can handle it. But while working with real customers
we found that in some cases they need a lightweight and efficient mechanism
to solve practical problems, and OpenStack is too heavy for that.

For example, when AWD detects a lost connection with the paired node, it can
send a message to the admin, notify another VMM, or send a QMP command to
QEMU to perform an operation such as restarting the VM. It can also be used
to build a VMM heartbeat system. This gives users basic VM/host network
monitoring tools and a basic fault tolerance and recovery solution.
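
The timer behaviour described above (any input refreshes the timer, and a
pre-set action runs once the timeout passes with no input) can be sketched
as a toy model. This is only an illustration in Python, not the code in
net/awd.c: the class name, the method names and the on_expire callback are
hypothetical, and the callback merely stands in for running the opt-script.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WatchdogSketch:
    """Toy model of the described behaviour; NOT the net/awd.c implementation."""

    timeout_ms: int                  # assumed to play the role of timeout=...
    on_expire: Callable[[], None]    # hypothetical stand-in for the opt-script
    last_input_ms: int = 0           # time of the most recent refresh
    expired: bool = False

    def feed(self, now_ms: int, data: bytes) -> None:
        # Any non-empty input refreshes the timer; its content is ignored.
        if data and not self.expired:
            self.last_input_ms = now_ms

    def poll(self, now_ms: int) -> bool:
        # Run the pre-set action once if no input arrived within timeout_ms.
        if not self.expired and now_ms - self.last_input_ms >= self.timeout_ms:
            self.expired = True
            self.on_expire()
        return self.expired


if __name__ == "__main__":
    log = []
    wd = WatchdogSketch(timeout_ms=5000, on_expire=lambda: log.append("expired"))
    wd.feed(1000, b"any bytes at all")   # refresh at t = 1000 ms
    wd.poll(3000)                        # still within the timeout
    wd.poll(6000)                        # 5000 ms of silence: action runs
```

Time is passed in explicitly so the sketch stays deterministic; a real
implementation would presumably use a timer on the iothread instead.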

Demo usage (for COLO heartbeat service):

In primary node:

-chardev socket,id=h1,host=3.3.3.3,port=9009,server,nowait
-chardev socket,id=heartbeat0,host=3.3.3.3,port=4445
-object iothread,id=iothread2
-object advanced-watchdog,id=heart1,server=on,awd_node=h1,notification_node=heartbeat0,opt_script=colo_opt_script_path,iothread=iothread1,pulse_interval=1000,timeout=5000

In secondary node:

-monitor tcp::4445,server,nowait
-chardev socket,id=h1,host=3.3.3.3,port=9009,reconnect=1
-chardev socket,id=heart1,host=3.3.3.8,port=4445
-object iothread,id=iothread1
-object advanced-watchdog,id=heart1,server=off,awd_node=h1,notification_node=heart1,opt_script=colo_secondary_opt_script,iothread=iothread1,timeout=10000
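
Since the current patch treats any input as a refresh signal, a small
stand-in for the peer end of the awd_node connection may be handy for
experiments without a second QEMU. The sketch below is an assumption-laden
illustration in Python: send_pulses and its parameters are hypothetical
names, the newline payload is arbitrary, and the host and port would be
whatever the awd_node chardev listens on (3.3.3.3:9009 in the demo above).

```python
import socket
import time


def send_pulses(host: str, port: int, count: int, interval_s: float,
                payload: bytes = b"\n") -> int:
    """Connect once, write `payload` `count` times, return total bytes sent.

    Hypothetical helper: it assumes the listening side counts any received
    bytes as a watchdog refresh, as the cover letter describes.
    """
    sent = 0
    with socket.create_connection((host, port), timeout=5.0) as conn:
        for i in range(count):
            conn.sendall(payload)
            sent += len(payload)
            if i + 1 < count:
                time.sleep(interval_s)   # wait between pulses, not after the last
    return sent
```

For instance, send_pulses("3.3.3.3", 9009, count=10, interval_s=1.0) would
write one byte per second for roughly nine seconds and then close the
connection, after which the watchdog side should stop seeing input.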


V2:
 - Addressed Philippe's comments: added a configure selector for AWD.

Initial:
 - Initial version.

Zhang Chen (4):
  net/awd.c: Introduce Advanced Watch Dog module framework
  net/awd.c: Initailize input/output chardev
  net/awd.c: Load advanced watch dog worker thread job
  vl.c: Make Advanced Watch Dog delayed initialization

 configure         |   9 +
 net/Makefile.objs |   1 +
 net/awd.c         | 491 ++++++++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx   |   6 +
 vl.c              |   7 +
 5 files changed, 514 insertions(+)
 create mode 100644 net/awd.c

-- 
2.17.1


RE: [PATCH V2 0/4] Introduce Advanced Watch Dog module
Posted by Zhang, Chen 4 years, 4 months ago
Hi~ All~ 

Ping.... Anyone have time to review this series? I need more comments~

Thanks
Zhang Chen

> -----Original Message-----
> From: Zhang, Chen <chen.zhang@intel.com>
> Sent: Friday, November 1, 2019 10:49 AM
> To: Jason Wang <jasowang@redhat.com>; Paolo Bonzini
> <pbonzini@redhat.com>; Philippe Mathieu-Daudé <philmd@redhat.com>;
> qemu-dev <qemu-devel@nongnu.org>
> Cc: Zhang Chen <zhangckid@gmail.com>; Zhang, Chen
> <chen.zhang@intel.com>
> Subject: [PATCH V2 0/4] Introduce Advanced Watch Dog module
> 


Re: [PATCH V2 0/4] Introduce Advanced Watch Dog module
Posted by Markus Armbruster 4 years, 4 months ago
"Zhang, Chen" <chen.zhang@intel.com> writes:

> Hi~ All~ 
>
> Ping.... Anyone have time to review this series? I need more comments~

Any takers?


RE: [PATCH V2 0/4] Introduce Advanced Watch Dog module
Posted by Zhang, Chen 4 years, 4 months ago

> -----Original Message-----
> From: Markus Armbruster <armbru@redhat.com>
> Sent: Wednesday, November 27, 2019 11:49 PM
> To: Zhang, Chen <chen.zhang@intel.com>
> Cc: Jason Wang <jasowang@redhat.com>; Paolo Bonzini
> <pbonzini@redhat.com>; Philippe Mathieu-Daudé <philmd@redhat.com>;
> qemu-dev <qemu-devel@nongnu.org>; Zhang Chen <zhangckid@gmail.com>
> Subject: Re: [PATCH V2 0/4] Introduce Advanced Watch Dog module
> 
> "Zhang, Chen" <chen.zhang@intel.com> writes:
> 
> > Hi~ All~
> >
> > Ping.... Anyone have time to review this series? I need more comments~
> 
> Any takers?

Hi Markus,

Thank you for your attention.
This is a very simple module that handles tasks related to error detection and automatic processing.
I have written up the detailed reasons why we need it in real environments in the commit log.
Here is the latest patch:
https://lists.nongnu.org/archive/html/qemu-devel/2019-11/msg02872.html

Thanks
Zhang Chen