After we refactor this query then we can enable the index use.
(Both of these things together in this commit because I haven't perf
tested the version with just the refactoring.)
(We have provided an index that can answer this question really
quickly if a version is specified. But the query planner couldn't see
that because it works without seeing the bind variables, so doesn't
know that the value of name is going to be suitable for this index.)
* Convert the two EXISTS subqueries into JOIN/AND with a DISTINCT
clause naming the fields on flights, so as to replicate the previous
result rows. Then do $selection field last. The subquery is a
convenient way to let this do the previous thing for all the values
of $selection (including, notably, *).
* Add the additional AND clause for r.name, which has no logical
effect given the actual values of name, enabling the query planner
to use this index.
Perf: In my test case the sg-report-flight runtime is now ~8s. I am
reasonably confident that this will not make other use cases of this
code worse.
Perf: runtime of my test case now ~11s
Example query before (from the Perl DBI trace):
SELECT *
FROM flights f
WHERE
EXISTS (
SELECT 1
FROM runvars r
WHERE name=?
AND val=?
AND r.flight=f.flight
AND ( (CASE
WHEN (r.job) LIKE 'build-%-prev' THEN 'xprev'
WHEN ((r.job) LIKE 'build-%-freebsd'
AND 'x' = 'freebsdbuildjob') THEN 'DISCARD'
ELSE ''
END)
= '')
)
AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
AND (branch=?)
ORDER BY flight DESC
LIMIT 1
After:
SELECT *
FROM ( SELECT DISTINCT
flight, started, blessing, branch, intended
FROM flights f
JOIN runvars r USING (flight)
WHERE name=?
AND name LIKE 'revision_%'
AND val=?
AND r.flight=f.flight
AND ( (CASE
WHEN (r.job) LIKE 'build-%-prev' THEN 'xprev'
WHEN ((r.job) LIKE 'build-%-freebsd'
AND 'x' = 'freebsdbuildjob') THEN 'DISCARD'
ELSE ''
END)
= '')
AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
AND (branch=?)
) AS sub WHERE TRUE
ORDER BY flight DESC
LIMIT 1
In both cases with bind vars:
'revision_xen'
'165f3afbfc3db70fcfdccad07085cde0a03c858b'
"xen-unstable"
Diff to the example query:
@@ -1,10 +1,10 @@
SELECT *
+ FROM ( SELECT DISTINCT
+ flight, started, blessing, branch, intended
FROM flights f
- WHERE
- EXISTS (
- SELECT 1
- FROM runvars r
+ JOIN runvars r USING (flight)
WHERE name=?
+ AND name LIKE 'revision_%'
AND val=?
AND r.flight=f.flight
AND ( (CASE
@@ -14,8 +14,8 @@
ELSE ''
END)
= '')
- )
AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
AND (branch=?)
+) AS sub WHERE TRUE
ORDER BY flight DESC
LIMIT 1
CC: George Dunlap <George.Dunlap@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Osstest/Executive.pm | 20 ++++++++------------
schema/runvars-revision-index.sql | 2 +-
2 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm
index c3dc1261..c272e9f2 100644
--- a/Osstest/Executive.pm
+++ b/Osstest/Executive.pm
@@ -415,37 +415,32 @@ sub report__find_test ($$$$$$$) {
my $querytext = <<END;
SELECT $selection
- FROM flights f
- WHERE
+ FROM ( SELECT DISTINCT
+ flight, started, blessing, branch, intended
+ FROM flights f
END
if (defined $revision) {
if ($tree eq 'osstest') {
$querytext .= <<END;
- EXISTS (
- SELECT 1
- FROM flights_harness_touched t
+ JOIN flights_harness_touched t USING (flight)
WHERE t.harness=?
- AND t.flight=f.flight
- )
END
push @params, $revision;
} else {
$querytext .= <<END;
- EXISTS (
- SELECT 1
- FROM runvars r
+ JOIN runvars r USING (flight)
WHERE name=?
+ AND name LIKE 'revision_%'
AND val=?
AND r.flight=f.flight
AND ${\ main_revision_job_cond('r.job') }
- )
END
push @params, "revision_$tree", $revision;
}
} else {
$querytext .= <<END;
- TRUE
+ WHERE TRUE
END
}
@@ -460,6 +455,7 @@ END
END
push @params, @$branches;
+ $querytext .= ") AS sub WHERE TRUE\n";
$querytext .= $extracond;
$querytext .= $sortlimit;
diff --git a/schema/runvars-revision-index.sql b/schema/runvars-revision-index.sql
index a2e3be13..4c1aea66 100644
--- a/schema/runvars-revision-index.sql
+++ b/schema/runvars-revision-index.sql
@@ -1,4 +1,4 @@
--- ##OSSTEST## 008 Preparatory
+-- ##OSSTEST## 008 Needed
--
-- This index helps Executive::report__find_test find relevant flights.
--
2.20.1
> On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote: > > After we refactor this query then we can enable the index use. > (Both of these things together in this commit because I haven't perf > tested the version with just the refactoring.) > > (We have provided an index that can answer this question really > quickly if a version is specified. But the query planner couldn't see > that because it works without seeing the bind variables, so doesn't > know that the value of name is going to be suitable for this index.) > > * Convert the two EXISTS subqueries into JOIN/AND with a DISTINCT > clause naming the fields on flights, so as to replicate the previous > result rows. Then do $selection field last. The subquery is a > convenient way to let this do the previous thing for all the values > of $selection (including, notably, *). > > * Add the additional AND clause for r.name, which has no logical > effect given the actual values of name, enabling the query planner > to use this index. > > Perf: In my test case the sg-report-flight runtime is now ~8s. I am > reasonably confident that this will not make other use cases of this > code worse. > > Perf: runtime of my test case now ~11s > > Example query before (from the Perl DBI trace): > > SELECT * > FROM flights f > WHERE > EXISTS ( > SELECT 1 > FROM runvars r > WHERE name=? > AND val=? > AND r.flight=f.flight > AND ( (CASE > WHEN (r.job) LIKE 'build-%-prev' THEN 'xprev' > WHEN ((r.job) LIKE 'build-%-freebsd' > AND 'x' = 'freebsdbuildjob') THEN 'DISCARD' > ELSE '' > END) > = '') > ) > AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) > AND (branch=?) > ORDER BY flight DESC > LIMIT 1 So this says: Get me all the columns for the highest-numbered flight Where: There is at least one runvar for that flight has the specified $name and $value And the job is *not* like build-%-prev or build-%-freebsd The flight number (?) is <= 151903, and blessing = real For the specified $branch What’s the “TRUE and flight <= 151903” for? > > After: > > SELECT * > FROM ( SELECT DISTINCT > flight, started, blessing, branch, intended > FROM flights f > JOIN runvars r USING (flight) > WHERE name=? > AND name LIKE 'revision_%' > AND val=? > AND r.flight=f.flight > AND ( (CASE > WHEN (r.job) LIKE 'build-%-prev' THEN 'xprev' > WHEN ((r.job) LIKE 'build-%-freebsd' > AND 'x' = 'freebsdbuildjob') THEN 'DISCARD' > ELSE '' > END) > = '') > AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) > AND (branch=?) > ) AS sub WHERE TRUE > ORDER BY flight DESC > LIMIT 1 And this says (effectively) Get me <flight, started, blessing, branch, intended> From the highest-numbered flight Where That flight has a runvar with specified name and value The job *doesn’t* look like “build-%-prev” or “build-%-freebsd” flight & blessing as appropriate branch as specified. Isn’t the r.flight = f.flight redundant if we’re joining on flight? Also, in spite of the paragraph attempting to explain it, I’m afraid I don’t understand what the “AS sub WHERE TRUE” is for. But it looks like the new query should do the same thing as the old query, assuming that the columns from the subquery are all the columns that you need in the correct order. -George
George Dunlap writes ("Re: [OSSTEST PATCH 08/14] Executive: Use index for report__find_test"):
> > On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
> > Example query before (from the Perl DBI trace):
...
> So this says:
>
> Get me all the columns
> for the highest-numbered flight
> Where:
> There is at least one runvar for that flight has the specified $name and $value
> And the job is *not* like build-%-prev or build-%-freebsd
> The flight number (?) is <= 151903, and blessing = real
> For the specified $branch
Yes.
> What’s the “TRUE and flight <= 151903” for?
These queries are programmetically constructed. In this case, the
flight condition is not always there. My test case had a
--max-flight=151903 on the command line: this is a debugging option.
It avoids newly added stuff in the db confusing me and generally
disturbing things. This is implemented with a condition variable
which contains either "" or "and flight <= 151903". Doing it this way
simplifies the generation code.
> And this says (effectively)
>
> Get me <flight, started, blessing, branch, intended>
> From the highest-numbered flight
> Where
> That flight has a runvar with specified name and value
> The job *doesn’t* look like “build-%-prev” or “build-%-freebsd”
> flight & blessing as appropriate
> branch as specified.
I think so, yes.
> Isn’t the r.flight = f.flight redundant if we’re joining on flight?
Indeed it is. I guess I can add a patch at theend to delete that.
> Also, in spite of the paragraph attempting to explain it, I’m afraid
> I don’t understand what the “AS sub WHERE TRUE” is for.
The reason for the subquery is not evident in the SQL. It's because
of the Perl code which generates this query. The same code is used to
generate queries that start with things like
SELECT * ...
SELECT COUNT(*) AS count ...
The perl code gets told "*" or "COUNT(*) AS count". The call sites
that pass "*" expect to see fields from flights. It would be
possible to change "*" to the explicit field list everywhere, but
it was much easier to do it this way.
(The WHERE TRUE is another one of these stubs where a condition might
appear.)
> But it looks like the new query should do the same thing as the old
> query, assuming that the columns from the subquery are all the
> columns that you need in the correct order.
The subquery columns are precisely the columns currently existing in
he flights table.
Thanks,
Ian.
© 2016 - 2026 Red Hat, Inc.