Stuff the two queries together: we use the firsty query as a WITH
clause. This is significantly faster, perhaps because the query
optimiser does a better job but probably just because it saves on
round trips.
No functional change.
Perf: subjectively this seemed to help when the cache was cold. Now I
have a warm cache and it doesn't seem to make much difference.
Perf: runtime of my test case now ~5-7s.
Example queries before (from the debugging output):
Query A part I:
SELECT f.flight AS flight,
j.job AS job,
f.started AS started,
j.status AS status
FROM flights f
JOIN jobs j USING (flight)
JOIN runvars r
ON f.flight=r.flight
AND r.name=?
WHERE j.job=r.job
AND f.blessing=?
AND f.branch=?
AND j.job=?
AND r.val=?
AND (j.status='pass' OR j.status='fail'
OR j.status='truncated'!)
AND f.started IS NOT NULL
AND f.started >= ?
ORDER BY f.started DESC
With bind variables:
"test-amd64-i386-xl-pvshim"
"guest-start"
Query B part I:
SELECT f.flight AS flight,
s.job AS job,
NULL as started,
NULL as status,
max(s.finished) AS max_finished
FROM steps s JOIN flights f
ON s.flight=f.flight
WHERE s.job=? AND f.blessing=? AND f.branch=?
AND s.finished IS NOT NULL
AND f.started IS NOT NULL
AND f.started >= ?
GROUP BY f.flight, s.job
ORDER BY max_finished DESC
With bind variables:
"test-armhf-armhf-libvirt"
'real'
"xen-unstable"
1594144469
Query common part II:
WITH tsteps AS
(
SELECT *
FROM steps
WHERE flight=? AND job=?
)
, tsteps2 AS
(
SELECT *
FROM tsteps
WHERE finished <=
(SELECT finished
FROM tsteps
WHERE tsteps.testid = ?)
)
SELECT (
SELECT max(finished)-min(started)
FROM tsteps2
) - (
SELECT sum(finished-started)
FROM tsteps2
WHERE step = 'ts-hosts-allocate'
)
AS duration
With bind variables from previous query, eg:
152045
"test-armhf-armhf-libvirt"
"guest-start.2"
After:
Query A (combined):
WITH f AS (
SELECT f.flight AS flight,
j.job AS job,
f.started AS started,
j.status AS status
FROM flights f
JOIN jobs j USING (flight)
JOIN runvars r
ON f.flight=r.flight
AND r.name=?
WHERE j.job=r.job
AND f.blessing=?
AND f.branch=?
AND j.job=?
AND r.val=?
AND (j.status='pass' OR j.status='fail'
OR j.status='truncated'!)
AND f.started IS NOT NULL
AND f.started >= ?
ORDER BY f.started DESC
)
SELECT flight, max_finished, job, started, status,
(
WITH tsteps AS
(
SELECT *
FROM steps
WHERE flight=f.flight AND job=f.job
)
, tsteps2 AS
(
SELECT *
FROM tsteps
WHERE finished <=
(SELECT finished
FROM tsteps
WHERE tsteps.testid = ?)
)
SELECT (
SELECT max(finished)-min(started)
FROM tsteps2
) - (
SELECT sum(finished-started)
FROM tsteps2
WHERE step = 'ts-hosts-allocate'
)
AS duration
) FROM f
Query B (combined):
WITH f AS (
SELECT f.flight AS flight,
s.job AS job,
NULL as started,
NULL as status,
max(s.finished) AS max_finished
FROM steps s JOIN flights f
ON s.flight=f.flight
WHERE s.job=? AND f.blessing=? AND f.branch=?
AND s.finished IS NOT NULL
AND f.started IS NOT NULL
AND f.started >= ?
GROUP BY f.flight, s.job
ORDER BY max_finished DESC
)
SELECT flight, max_finished, job, started, status,
(
WITH tsteps AS
(
SELECT *
FROM steps
WHERE flight=f.flight AND job=f.job
)
, tsteps2 AS
(
SELECT *
FROM tsteps
WHERE finished <=
(SELECT finished
FROM tsteps
WHERE tsteps.testid = ?)
)
SELECT (
SELECT max(finished)-min(started)
FROM tsteps2
) - (
SELECT sum(finished-started)
FROM tsteps2
WHERE step = 'ts-hosts-allocate'
)
AS duration
) FROM f
Diff for query A:
@@ -1,3 +1,4 @@
+ WITH f AS (
SELECT f.flight AS flight,
j.job AS job,
f.started AS started,
@@ -18,11 +19,14 @@
AND f.started >= ?
ORDER BY f.started DESC
+ )
+ SELECT flight, max_finished, job, started, status,
+ (
WITH tsteps AS
(
SELECT *
FROM steps
- WHERE flight=? AND job=?
+ WHERE flight=f.flight AND job=f.job
)
, tsteps2 AS
(
@@ -42,3 +46,5 @@
WHERE step = 'ts-hosts-allocate'
)
AS duration
+
+ ) FROM f
Diff for query B:
@@ -1,3 +1,4 @@
+ WITH f AS (
SELECT f.flight AS flight,
s.job AS job,
NULL as started,
@@ -12,11 +13,14 @@
GROUP BY f.flight, s.job
ORDER BY max_finished DESC
+ )
+ SELECT flight, max_finished, job, started, status,
+ (
WITH tsteps AS
(
SELECT *
FROM steps
- WHERE flight=? AND job=?
+ WHERE flight=f.flight AND job=f.job
)
, tsteps2 AS
(
@@ -36,3 +40,5 @@
WHERE step = 'ts-hosts-allocate'
)
AS duration
+
+ ) FROM f
CC: George Dunlap <George.Dunlap@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Osstest/Executive.pm | 31 ++++++++++++++++++++-----------
1 file changed, 20 insertions(+), 11 deletions(-)
diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm
index 621153ee..66c93ab9 100644
--- a/Osstest/Executive.pm
+++ b/Osstest/Executive.pm
@@ -1192,7 +1192,7 @@ END
(
SELECT *
FROM steps
- WHERE flight=? AND job=?
+ WHERE flight=f.flight AND job=f.job
)
END_ALWAYS
, tsteps2 AS
@@ -1216,9 +1216,20 @@ END_UPTOINCL
AS duration
END_ALWAYS
- my $recentflights_q= $dbh_tests->prepare($recentflights_qtxt);
- my $duration_anyref_q= $dbh_tests->prepare($duration_anyref_qtxt);
- my $duration_duration_q = $dbh_tests->prepare($duration_duration_qtxt);
+ my $prepare_combi = sub {
+ db_prepare(<<END);
+ WITH f AS (
+$_[0]
+ )
+ SELECT flight, max_finished, job, started, status,
+ (
+$duration_duration_qtxt
+ ) FROM f
+END
+ };
+
+ my $recentflights_q= $prepare_combi->($recentflights_qtxt);
+ my $duration_anyref_q= $prepare_combi->($duration_anyref_qtxt);
return sub {
my ($job, $hostidname, $onhost, $uptoincl_testid) = @_;
@@ -1239,14 +1250,16 @@ END_ALWAYS
$branch,
$job,
$onhost,
- $limit);
+ $limit,
+ @x_params);
$refs= $recentflights_q->fetchall_arrayref({});
$recentflights_q->finish();
$dbg->("SAME-HOST GOT ".scalar(@$refs));
}
if (!@$refs) {
- $duration_anyref_q->execute($job, $blessing, $branch, $limit);
+ $duration_anyref_q->execute($job, $blessing, $branch, $limit,
+ @x_params);
$refs= $duration_anyref_q->fetchall_arrayref({});
$duration_anyref_q->finish();
$dbg->("ANY-HOST GOT ".scalar(@$refs));
@@ -1259,11 +1272,7 @@ END_ALWAYS
my $duration_max= 0;
foreach my $ref (@$refs) {
- my @d_d_args = ($ref->{flight}, $job);
- push @d_d_args, @x_params;
- $duration_duration_q->execute(@d_d_args);
- my ($duration) = $duration_duration_q->fetchrow_array();
- $duration_duration_q->finish();
+ my ($duration) = $ref->{duration};
if ($duration) {
$dbg->("REF $ref->{flight} DURATION $duration ".
($ref->{status} // ''));
--
2.20.1
> On Jul 21, 2020, at 7:42 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote: > > Stuff the two queries together: we use the firsty query as a WITH > clause. This is significantly faster, perhaps because the query > optimiser does a better job but probably just because it saves on > round trips. > > No functional change. > > Perf: subjectively this seemed to help when the cache was cold. Now I > have a warm cache and it doesn't seem to make much difference. > > Perf: runtime of my test case now ~5-7s. > > Example queries before (from the debugging output): > > Query A part I: > > SELECT f.flight AS flight, > j.job AS job, > f.started AS started, > j.status AS status > FROM flights f > JOIN jobs j USING (flight) > JOIN runvars r > ON f.flight=r.flight > AND r.name=? > WHERE j.job=r.job Did these last two get mixed up? My limited experience w/ JOIN ON and WHERE would lead me to expect we’re joining on `f.flight=r.flight and r.job = j.job`, and having `r.name = ?` as part of the WHERE clause. I see it’s the same in the combined query as well. > AND f.blessing=? > AND f.branch=? > AND j.job=? > AND r.val=? > AND (j.status='pass' OR j.status='fail' > OR j.status='truncated'!) > AND f.started IS NOT NULL > AND f.started >= ? > ORDER BY f.started DESC > > With bind variables: > "test-amd64-i386-xl-pvshim" > "guest-start" > > Query B part I: > > SELECT f.flight AS flight, > s.job AS job, > NULL as started, > NULL as status, > max(s.finished) AS max_finished > FROM steps s JOIN flights f > ON s.flight=f.flight > WHERE s.job=? AND f.blessing=? AND f.branch=? > AND s.finished IS NOT NULL > AND f.started IS NOT NULL > AND f.started >= ? > GROUP BY f.flight, s.job > ORDER BY max_finished DESC > > With bind variables: > "test-armhf-armhf-libvirt" > 'real' > "xen-unstable" > 1594144469 > > Query common part II: > > WITH tsteps AS > ( > SELECT * > FROM steps > WHERE flight=? AND job=? > ) > , tsteps2 AS > ( > SELECT * > FROM tsteps > WHERE finished <= > (SELECT finished > FROM tsteps > WHERE tsteps.testid = ?) > ) > SELECT ( > SELECT max(finished)-min(started) > FROM tsteps2 > ) - ( > SELECT sum(finished-started) > FROM tsteps2 > WHERE step = 'ts-hosts-allocate' > ) > AS duration Er, wait — you were doing a separate `duration` query for each row of the previous query? Yeah, that sounds like it could be a lot of round trips. :-) > > With bind variables from previous query, eg: > 152045 > "test-armhf-armhf-libvirt" > "guest-start.2" > > After: > > Query A (combined): > > WITH f AS ( > SELECT f.flight AS flight, > j.job AS job, > f.started AS started, > j.status AS status > FROM flights f > JOIN jobs j USING (flight) > JOIN runvars r > ON f.flight=r.flight > AND r.name=? > WHERE j.job=r.job > AND f.blessing=? > AND f.branch=? > AND j.job=? > AND r.val=? > AND (j.status='pass' OR j.status='fail' > OR j.status='truncated'!) > AND f.started IS NOT NULL > AND f.started >= ? > ORDER BY f.started DESC > > ) > SELECT flight, max_finished, job, started, status, > ( > WITH tsteps AS > ( > SELECT * > FROM steps > WHERE flight=f.flight AND job=f.job > ) > , tsteps2 AS > ( > SELECT * > FROM tsteps > WHERE finished <= > (SELECT finished > FROM tsteps > WHERE tsteps.testid = ?) > ) > SELECT ( > SELECT max(finished)-min(started) > FROM tsteps2 > ) - ( > SELECT sum(finished-started) > FROM tsteps2 > WHERE step = 'ts-hosts-allocate' > ) > AS duration > > ) FROM f I mean, in both queries (A and B), the transform should basically result in the same thing happening, as far as I can tell. I can try to analyze the duration query and see if I can come up with any suggestions, but that would be a different patch anyway. -George
George Dunlap writes ("Re: [OSSTEST PATCH 14/14] duration_estimator: Move duration query loop into database"):
> > On Jul 21, 2020, at 7:42 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
...
> > Example queries before (from the debugging output):
> >
> > Query A part I:
> >
> > SELECT f.flight AS flight,
> > j.job AS job,
> > f.started AS started,
> > j.status AS status
> > FROM flights f
> > JOIN jobs j USING (flight)
> > JOIN runvars r
> > ON f.flight=r.flight
> > AND r.name=?
> > WHERE j.job=r.job
>
> Did these last two get mixed up? My limited experience w/ JOIN ON
> and WHERE would lead me to expect we’re joining on
> `f.flight=r.flight and r.job = j.job`, and having `r.name = ?` as
> part of the WHERE clause. I see it’s the same in the combined query
> as well.
Well spotted. However, actually, this makes no difference: with an
inner join, ON clauses are the same as WHERE clauses. It does seem
stylistically poor though, so I will add a commit to change it.
> > Query common part II:
> >
> > WITH tsteps AS
> > (
> > SELECT *
> > FROM steps
> > WHERE flight=? AND job=?
> > )
> > , tsteps2 AS
> > (
> > SELECT *
> > FROM tsteps
> > WHERE finished <=
> > (SELECT finished
> > FROM tsteps
> > WHERE tsteps.testid = ?)
> > )
> > SELECT (
> > SELECT max(finished)-min(started)
> > FROM tsteps2
> > ) - (
> > SELECT sum(finished-started)
> > FROM tsteps2
> > WHERE step = 'ts-hosts-allocate'
> > )
> > AS duration
>
> Er, wait — you were doing a separate `duration` query for each row of the previous query? Yeah, that sounds like it could be a lot of round trips. :-)
I was doing, yes. This code was not really very optimised.
> I mean, in both queries (A and B), the transform should basically result in the same thing happening, as far as I can tell.
Good, thanks.
> I can try to analyze the duration query and see if I can come up with any suggestions, but that would be a different patch anyway.
It's fast enough now :-).
Thanks,
Ian.
> On Jul 31, 2020, at 11:39 AM, Ian Jackson <ian.jackson@citrix.com> wrote:
>
> George Dunlap writes ("Re: [OSSTEST PATCH 14/14] duration_estimator: Move duration query loop into database"):
>>> On Jul 21, 2020, at 7:42 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
> ...
>>> Example queries before (from the debugging output):
>>>
>>> Query A part I:
>>>
>>> SELECT f.flight AS flight,
>>> j.job AS job,
>>> f.started AS started,
>>> j.status AS status
>>> FROM flights f
>>> JOIN jobs j USING (flight)
>>> JOIN runvars r
>>> ON f.flight=r.flight
>>> AND r.name=?
>>> WHERE j.job=r.job
>>
>> Did these last two get mixed up? My limited experience w/ JOIN ON
>> and WHERE would lead me to expect we’re joining on
>> `f.flight=r.flight and r.job = j.job`, and having `r.name = ?` as
>> part of the WHERE clause. I see it’s the same in the combined query
>> as well.
>
> Well spotted. However, actually, this makes no difference: with an
> inner join, ON clauses are the same as WHERE clauses. It does seem
> stylistically poor though, so I will add a commit to change it.
Yeah, in my tiny amount of experience with SQLite, putting this sort of restriction in WHERE rather than ON didn’t seem to make a practical difference; no doubt the query planner is smart enough to DTRT. But switching them should make it slightly easier for humans to parse, so is probably worth doing while you’re here, if you have a few spare cycles.
Thanks,
-George
© 2016 - 2026 Red Hat, Inc.