Hi Ian,
On 4/18/19 5:31 PM, Ian Jackson wrote:
> Sometimes we find ourselves seriously lacking the capacity to run
> particular job(s). The result can be that the whole system stands
> mostly idle while a small proportion of the resources runs flat out
> with a giant queue.
>
> In this series we arrange for osstest to be able to spot this
> happening, and automatically rebalance load by giving up earlier on the
> jobs which are overly-contended.
>
> There are some tuning parameters, of course. To summarise, I have
> chosen here to treat jobs as starved if (for example):
> We have completed 90% of the flight, and the remaining 10%
> is projected to take 5x as long as the first 90%.
> (The "90%" is by number of jobs.) See the patch
> starvation: Infrastructure for jobs which are delaying their flights
> for details of the heuristic and its parameters.
>
> When situations like this persist it will still be good to manually
> balance the load by adjusting the job mix in submitted flights. This
> is because the starvation will not necessarily drop the same job in
> subsequent flights on the same "branch", so starvation will impair the
> regression detection.
As we discussed on IRC, I understand this will have an impact on Arm32
testing. Do you have an estimate of how likely the tests are to be skipped?
I am wondering whether we should discuss reducing the number of
tests run on Arm32. We did that in the past on Arm64 when we were
struggling with the broken laxton0/laxton1.
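To check my understanding of the example criterion quoted above (90% of
jobs complete, remainder projected to take 5x as long as the first 90%),
here is a rough sketch. It is not taken from the osstest patches: the
function name, its parameters, and the way the projection is supplied
are all assumptions made purely for illustration.

```python
def is_starved(jobs_done, jobs_total, elapsed_secs, projected_remaining_secs,
               done_fraction=0.9, slowdown_factor=5.0):
    """Hypothetical restatement of the example starvation criterion.

    A flight's remaining jobs count as starved when at least
    `done_fraction` of the jobs (by number) have completed and the
    remaining jobs are projected to take at least `slowdown_factor`
    times as long as the completed part took.
    """
    if jobs_total <= 0:
        return False
    # "The 90% is by number of jobs."
    if jobs_done / jobs_total < done_fraction:
        return False
    return projected_remaining_secs >= slowdown_factor * elapsed_secs


# 9 of 10 jobs done after 1 hour, last job projected to need 5 more hours.
print(is_starved(9, 10, 3600, 5 * 3600))
```

With these made-up numbers the remaining job would be treated as
starved, whereas a projection of one more hour would not. The real
heuristic and its parameters are described in the "starvation:
Infrastructure for jobs which are delaying their flights" patch.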
Cheers,
>
> Ian Jackson (21):
> ts-hosts-allocate-Executive: with -U, just append to the same logfile
> selecthost: Honour new $none_ok optional parameter
> ts-logs-capture: Do not try to capture logs of hosts not allocated
> alloc_resources: Support special abandonment values
> starvation: Teach sg-report-flight about starved step state
> starvation: Teach archaeologists about starved job state
> starvation: Teach ms-flights-summary about job state starved
> starvation: Teach sg-execute-flight about job state starved
> step handling: Preserve step states set by ts-* scripts
> TestSupport: Make "broken" print the actual job state
> JobDB::Executive: step_*: fix log messages to talk about "steps"
> starvation: Permit step_finish to set the state `starved'
> TestSupport: Make "broken" set the step state too
> tcl/JobDB-Executive: Do not squash "starved" status
> starvation: Propagate starved job status into dependent jobs
> ts-host-allocate-Executive: Break out $now and add a newline
> starvation: Use "starved" for hostalloc_maxwait_max
> starvation: Infrastructure for jobs which are delaying their flights
> starvation: Abandon jobs which are unreasonably delaying their flight
> hostalloc_maxwait_max: Use starvation most_optimistic
> starvation: Better logging/debugging output
>
> Osstest/Executive.pm | 95 ++++++++++++++++++++++++++---
> Osstest/JobDB/Executive.pm | 8 ++-
> Osstest/TestSupport.pm | 24 ++++++--
> mg-hostalloc-starvation-demo | 53 ++++++++++++++++
> ms-flights-summary | 9 +--
> sg-execute-flight | 2 +-
> sg-report-flight | 17 +++++-
> tcl/JobDB-Executive.tcl | 6 +-
> ts-hosts-allocate-Executive | 142 ++++++++++++++++++++++++++++++++++++++++---
> ts-logs-capture | 7 ++-
> 10 files changed, 328 insertions(+), 35 deletions(-)
> create mode 100755 mg-hostalloc-starvation-demo
>
--
Julien Grall