[Xen-devel] [OSSTEST PATCH 00/21] Abandon jobs which are unreasonably delaying their flight

Ian Jackson posted 21 patches 5 years ago
[Xen-devel] [OSSTEST PATCH 00/21] Abandon jobs which are unreasonably delaying their flight
Posted by Ian Jackson 5 years ago
Sometimes we find ourselves seriously lacking the capacity to run
particular job(s).  The result can be that the whole system stands
mostly idle while a small proportion of the resources runs flat out
with a giant queue.

In this series we arrange for osstest to be able to spot this
happening, and automatically rebalance load by giving up earlier on the
jobs which are overly-contended.

There are some tuning parameters, of course.  To summarise, I have
chosen here to treat jobs as starved if (for example):
  We have completed 90% of the flight, and the remaining 10%
  is projected to take 5x as long as the first 90%.
(The "90%" is by number of jobs.)  See the patch
  starvation: Infrastructure for jobs which are delaying their flights
for details of the heuristic and its parameters.
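For illustration only, here is a minimal Python sketch of the example rule above. It is not the osstest code (osstest itself is written in Perl and Tcl), and it checks only the single 90%/5x example point rather than the general heuristic and its tuning parameters. The names FlightProgress and is_starved are hypothetical. The sketch also assumes that the caller already has an estimate of how long the unfinished jobs will take.

```python
from dataclasses import dataclass


@dataclass
class FlightProgress:
    # Hypothetical summary of a flight's state; not an osstest data structure.
    jobs_total: int
    jobs_completed: int
    elapsed_seconds: float              # time spent on the flight so far
    projected_remaining_seconds: float  # assumed estimate for the unfinished jobs


def is_starved(p: FlightProgress,
               done_threshold: float = 0.9,
               slowdown_factor: float = 5.0) -> bool:
    """Sketch of the example rule: once at least 90% of the jobs (by count)
    are complete, treat the rest as starved if they are projected to take
    more than 5x as long as the flight has taken so far."""
    if p.jobs_total <= 0:
        return False
    done_fraction = p.jobs_completed / p.jobs_total  # "90%" is by number of jobs
    if done_fraction < done_threshold:
        return False
    return p.projected_remaining_seconds > slowdown_factor * p.elapsed_seconds
```

With 90 of 100 jobs done after one hour, a projection of six more hours exceeds 5 x 1 hour, so `is_starved(FlightProgress(100, 90, 3600, 6 * 3600))` is true. A projection of four hours is not, and neither is any flight that is only 80% complete.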

When situations like this persist it will still be good to manually
balance the load by adjusting the job mix in submitted flights.  This
is because the starvation will not necessarily drop the same job in
subsequent flights on the same "branch", so starvation will impair the
regression detection.

Ian Jackson (21):
  ts-hosts-allocate-Executive: with -U, just append to the same logfile
  selecthost: Honour new $none_ok optional parameter
  ts-logs-capture: Do not try to capture logs of hosts not allocated
  alloc_resources: Support special abandonment values
  starvation: Teach sg-report-flight about starved step state
  starvation: Teach archaeologists about starved job state
  starvation: Teach ms-flights-summary about job state starved
  starvation: Teach sg-execute-flight about job state starved
  step handling: Preserve step states set by ts-* scripts
  TestSupport: Make "broken" print the actual job state
  JobDB::Executive: step_*: fix log messages to talk about "steps"
  starvation: Permit step_finish to set the state `starved'
  TestSupport: Make "broken" set the step state too
  tcl/JobDB-Executive: Do not squash "starved" status
  starvation: Propagate starved job status into dependent jobs
  ts-host-allocate-Executive: Break out $now and add a newline
  starvation: Use "starved" for hostalloc_maxwait_max
  starvation: Infrastructure for jobs which are delaying their flights
  starvation: Abandon jobs which are unreasonably delaying their flight
  hostalloc_maxwait_max: Use starvation most_optimistic
  starvation: Better logging/debugging output

 Osstest/Executive.pm         |  95 ++++++++++++++++++++++++++---
 Osstest/JobDB/Executive.pm   |   8 ++-
 Osstest/TestSupport.pm       |  24 ++++++--
 mg-hostalloc-starvation-demo |  53 ++++++++++++++++
 ms-flights-summary           |   9 +--
 sg-execute-flight            |   2 +-
 sg-report-flight             |  17 +++++-
 tcl/JobDB-Executive.tcl      |   6 +-
 ts-hosts-allocate-Executive  | 142 ++++++++++++++++++++++++++++++++++++++++---
 ts-logs-capture              |   7 ++-
 10 files changed, 328 insertions(+), 35 deletions(-)
 create mode 100755 mg-hostalloc-starvation-demo

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [OSSTEST PATCH 00/21] Abandon jobs which are unreasonably delaying their flight
Posted by Julien Grall 4 years, 11 months ago
Hi Ian,

On 4/18/19 5:31 PM, Ian Jackson wrote:
> Sometimes we find ourselves seriously lacking the capacity to run
> particular job(s).  The result can be that the whole system stands
> mostly idle while a small proportion of the resources runs flat out
> with a giant queue.
> 
> In this series we arrange for osstest to be able to spot this
> happening, and automatically rebalance load by giving up earlier on the
> jobs which are overly-contended.
> 
> There are some tuning parameters, of course.  To summarise, I have
> chosen here to treat jobs as starved if (for example):
>    We have completed 90% of the flight, and the remaining 10%
>    is projected to take 5x as long as the first 90%.
> (The "90%" is by number of jobs.)  See the patch
>    starvation: Infrastructure for jobs which are delaying their flights
> for details of the heuristic and its parameters.
> 
> When situations like this persist it will still be good to manually
> balance the load by adjusting the job mix in submitted flights.  This
> is because the starvation will not necessarily drop the same job in
> subsequent flights on the same "branch", so starvation will impair the
> regression detection.

As we discussed on IRC, I understand this will have an impact on Arm32 
testing. Do you have an estimate of how likely it is that tests will be skipped?

I am wondering whether we should discuss reducing the number of
tests run on Arm32. We did that in the past on Arm64 when we were
struggling with the broken laxton0/laxton1.

Cheers,

-- 
Julien Grall

Re: [Xen-devel] [OSSTEST PATCH 00/21] Abandon jobs which are unreasonably delaying their flight
Posted by Ian Jackson 4 years, 11 months ago
Julien Grall writes ("Re: [OSSTEST PATCH 00/21] Abandon jobs which are unreasonably delaying their flight"):
> As we discussed on IRC, I understand this will have an impact on Arm32 
> testing. Do you have an estimate of how likely it is that tests will be skipped?

Many, maybe most.  Very likely the smoke tests will continue to run.

> I am wondering whether we should discuss reducing the number of
> tests run on Arm32. We did that in the past on Arm64 when we were
> struggling with the broken laxton0/laxton1.

That is a sensible suggestion but before we do that kind of manual
rebalancing I would like to try moving at least the kernel builds, so
they run as amd64 cross builds.  I think that will free up a lot of
capacity.

In the meantime, are you happy with me pushing this series to osstest
pretest at some point when convenient?

Regards,
Ian.

Re: [Xen-devel] [OSSTEST PATCH 00/21] Abandon jobs which are unreasonably delaying their flight
Posted by Julien Grall 4 years, 11 months ago
Hi Ian,

On 29/04/2019 15:46, Ian Jackson wrote:
> Julien Grall writes ("Re: [OSSTEST PATCH 00/21] Abandon jobs which are unreasonably delaying their flight"):
>> As we discussed on IRC, I understand this will have an impact on Arm32
>> testing. Do you have an estimate of how likely it is that tests will be skipped?
> 
> Many, maybe most.  Very likely the smoke tests will continue to run.
> 
>> I am wondering whether we should discuss reducing the number of
>> tests run on Arm32. We did that in the past on Arm64 when we were
>> struggling with the broken laxton0/laxton1.
> 
> That is a sensible suggestion but before we do that kind of manual
> rebalancing I would like to try moving at least the kernel builds, so
> they run as amd64 cross builds.  I think that will free up a lot of
> capacity.
> 
> In the meantime, are you happy with me pushing this series to osstest
> pretest at some point when convenient?

I am happy with that. Let's see how many tests get dropped on Arm32 :).

Cheers,

-- 
Julien Grall
