[PATCH v3 0/7] colo: Introduce resource agent and test suite/CI

Lukas Straub posted 7 patches 3 years, 8 months ago
Test docker-quick@centos7 failed
Test docker-mingw@fedora failed
Test checkpatch failed
Test FreeBSD failed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/cover.1596536719.git.lukasstraub2@web.de
Maintainers: Alberto Garcia <berto@igalia.com>, Max Reitz <mreitz@redhat.com>, Wainer dos Santos Moschetta <wainersm@redhat.com>, Cleber Rosa <crosa@redhat.com>, "Philippe Mathieu-Daudé" <philmd@redhat.com>, Kevin Wolf <kwolf@redhat.com>
There is a newer version of this series
[PATCH v3 0/7] colo: Introduce resource agent and test suite/CI
Posted by Lukas Straub 3 years, 8 months ago
Hello Everyone,
So here is v3. Patch 1 can already be merged independently of the others.
Please review.

Regards,
Lukas Straub

Based-on: <cover.1596528468.git.lukasstraub2@web.de>
"Introduce 'yank' oob qmp command to recover from hanging qemu"

Changes:

v3:
 -resource-agent: Determine the local qemu state by querying it directly via
  qmp instead of deriving it from the remote master-score
 -resource-agent: Add max_queue_size parameter for colo-compare
 -resource-agent: Fix monitor action on secondary returning an error during
  clean shutdown
 -resource-agent: Fix stop action setting master-score to 0 on primary during
  clean shutdown
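The first v3 change replaces the master-score heuristic with a direct QMP
query. The Python sketch below shows roughly what such a query involves: read
the QMP greeting, leave capability negotiation with qmp_capabilities, run
query-status, and skip asynchronous events while waiting for the reply. The
QMPClient class and the status-to-exit-code mapping in monitor_result() are
illustrative assumptions, not code from the resource agent; only the QMP
commands and the OCF exit code values are standard.

```python
import json
import socket

# Exit codes defined by the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


class QMPClient:
    """Minimal line-based QMP client over an already connected socket."""

    def __init__(self, sock: socket.socket):
        # Separate reader and writer wrappers so that writing never
        # discards input that has already been buffered for reading.
        self._rfile = sock.makefile("r", encoding="utf-8")
        self._wfile = sock.makefile("w", encoding="utf-8")
        greeting = self._read_message()
        if "QMP" not in greeting:
            raise RuntimeError("peer is not a QMP server")
        # Leave capability negotiation mode before issuing real commands.
        self.execute("qmp_capabilities")

    def _read_message(self):
        line = self._rfile.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        return json.loads(line)

    def execute(self, command, arguments=None):
        msg = {"execute": command}
        if arguments is not None:
            msg["arguments"] = arguments
        self._wfile.write(json.dumps(msg) + "\n")
        self._wfile.flush()
        while True:
            reply = self._read_message()
            if "event" in reply:
                continue  # asynchronous event, not the reply we wait for
            if "error" in reply:
                raise RuntimeError(reply["error"].get("desc", "QMP error"))
            return reply["return"]


def monitor_result(status):
    """Hypothetical mapping of query-status output to an OCF monitor code.

    A real agent additionally has to tell primary and secondary apart.
    """
    state = status.get("status")
    if state in ("running", "colo", "inmigrate", "paused"):
        return OCF_SUCCESS
    if state == "shutdown":
        return OCF_NOT_RUNNING
    return OCF_ERR_GENERIC
```

In an agent, the socket would typically be an AF_UNIX socket connected to the
local qemu's -qmp path; the path itself is deployment specific.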

v2:
 -use new yank api
 -drop disk_size parameter
 -introduce pick_qemu_util function and use it
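v2 adds a pick_qemu_util helper to avocado_qemu. Its actual lookup rules are
in patch 2; as a rough illustration only, a helper of this kind might prefer a
freshly built binary from the build directory and fall back to $PATH. The
build_dir parameter and the fallback order below are assumptions, not the
patch's code.

```python
import os
import shutil


def pick_qemu_util(util, build_dir=None):
    """Return a path to a QEMU utility such as "qemu-img", or None.

    Hypothetical sketch: an executable file in build_dir wins,
    otherwise the first match on $PATH is used.
    """
    if build_dir is not None:
        candidate = os.path.join(build_dir, util)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # shutil.which() returns None when nothing on $PATH matches.
    return shutil.which(util)
```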

Overview:

These patches introduce a resource agent for fully automatic management of colo
and a test suite, built on top of the resource agent, that tests colo
extensively.

Test suite features:
-Tests failover when the peer crashes or hangs, and failover during a checkpoint
-Tests the network using ssh and iperf3
-Quick test requires no special configuration
-Network test exercising colo-compare
-Stress test: continuous failover under network load

Resource agent features:
-Fully automatic management of colo
-Handles many failures: hanging/crashing qemu, replication error, disk error, ...
-Recovers from hanging qemu by using the "yank" oob command
-Tracks which node has up-to-date data
-Works well in clusters with more than 2 nodes
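Recovery from a hanging qemu relies on the out-of-band "yank" command from the
Based-on series. The Python sketch below only builds the two QMP messages
involved: a qmp_capabilities request that enables the "oob" capability, and an
"exec-oob" yank request. The argument layout follows the upstream yank command
(qapi/yank.json); the version in the Based-on series may differ, and the node
and chardev names in the usage are hypothetical.

```python
import json


def capabilities_with_oob():
    """qmp_capabilities request that also enables out-of-band execution."""
    return json.dumps({"execute": "qmp_capabilities",
                       "arguments": {"enable": ["oob"]}})


def yank_request(node_names=(), chardev_ids=(), migration=False,
                 request_id=None):
    """Build an out-of-band "yank" request.

    Each entry in "instances" names one object whose network connection
    qemu should forcibly shut down.
    """
    instances = [{"type": "block-node", "node-name": name}
                 for name in node_names]
    instances += [{"type": "chardev", "id": cid} for cid in chardev_ids]
    if migration:
        instances.append({"type": "migration"})
    # "exec-oob" rather than "execute": the monitor handles the command
    # out of band instead of queueing it behind in-band commands that
    # may be stuck in a hanging main loop.
    msg = {"exec-oob": "yank", "arguments": {"instances": instances}}
    if request_id is not None:
        # Out-of-band replies can overtake in-band ones; an "id" lets the
        # client match the reply to this request.
        msg["id"] = request_id
    return json.dumps(msg)
```

For example, yank_request(node_names=["colo-disk0"], migration=True,
request_id="y1") would yank a hypothetical replication NBD node and the
migration stream in one request.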

Run times on my laptop:
Quick test: 200s
Network test: 800s (tagged as slow)
Stress test: 1300s (tagged as slow)

For the last two tests, the test suite needs access to a network bridge to
properly test the network, so some parameters must be passed to the test
run. See tests/acceptance/colo.py for more information.

Regards,
Lukas Straub

Lukas Straub (7):
  block/quorum.c: stable children names
  avocado_qemu: Introduce pick_qemu_util to pick qemu utility binaries
  boot_linux.py: Use pick_qemu_util
  colo: Introduce resource agent
  colo: Introduce high-level test suite
  configure,Makefile: Install colo resource-agent
  MAINTAINERS: Add myself as maintainer for COLO resource agent

 MAINTAINERS                               |    6 +
 Makefile                                  |    5 +
 block/quorum.c                            |   20 +-
 configure                                 |   10 +
 scripts/colo-resource-agent/colo          | 1501 +++++++++++++++++++++
 scripts/colo-resource-agent/crm_master    |   44 +
 scripts/colo-resource-agent/crm_resource  |   12 +
 tests/acceptance/avocado_qemu/__init__.py |   15 +
 tests/acceptance/boot_linux.py            |   11 +-
 tests/acceptance/colo.py                  |  677 ++++++++++
 10 files changed, 2286 insertions(+), 15 deletions(-)
 create mode 100755 scripts/colo-resource-agent/colo
 create mode 100755 scripts/colo-resource-agent/crm_master
 create mode 100755 scripts/colo-resource-agent/crm_resource
 create mode 100644 tests/acceptance/colo.py

--
2.20.1
Re: [PATCH v3 0/7] colo: Introduce resource agent and test suite/CI
Posted by Lukas Straub 3 years, 8 months ago
On Tue, 4 Aug 2020 12:46:29 +0200
Lukas Straub <lukasstraub2@web.de> wrote:

> [...]

Ping...
Re: [PATCH v3 0/7] colo: Introduce resource agent and test suite/CI
Posted by Philippe Mathieu-Daudé 3 years, 8 months ago
On 8/18/20 2:27 PM, Lukas Straub wrote:
> On Tue, 4 Aug 2020 12:46:29 +0200
> Lukas Straub <lukasstraub2@web.de> wrote:
> 
>> [...]
> 
> Ping...
> 

Cleber, Wainer, can you have a look at tests/acceptance/colo.py please?


Re: [PATCH v3 0/7] colo: Introduce resource agent and test suite/CI
Posted by Lukas Straub 3 years, 8 months ago
On Tue, 18 Aug 2020 14:27:01 +0200
Lukas Straub <lukasstraub2@web.de> wrote:

> On Tue, 4 Aug 2020 12:46:29 +0200
> Lukas Straub <lukasstraub2@web.de> wrote:
> 
> > [...]
> 
> Ping...

Ping 2...

Kevin, can you already apply patch 1 "block/quorum.c: stable children names"? It resolves the following bug: https://bugs.launchpad.net/qemu/+bug/1881231

Regards,
Lukas Straub
Re: [PATCH v3 0/7] colo: Introduce resource agent and test suite/CI
Posted by Philippe Mathieu-Daudé 3 years, 7 months ago
Hi Wainer,

As Cleber is busy with Gating CI, can you
review tests/acceptance/colo.py please?

On 8/27/20 10:40 AM, Lukas Straub wrote:
> On Tue, 18 Aug 2020 14:27:01 +0200
> Lukas Straub <lukasstraub2@web.de> wrote:
> 
>> On Tue, 4 Aug 2020 12:46:29 +0200
>> Lukas Straub <lukasstraub2@web.de> wrote:
>>
>>> [...]
>>
>> Ping...
> 
> Ping 2...
> 
> Kevin, can you already apply patch 1 "block/quorum.c: stable children names"? It resolves the following bug: https://bugs.launchpad.net/qemu/+bug/1881231
> 
> Regards,
> Lukas Straub
>