Specifically, we narrow the initial query to flights which have at
least some job with the built_revision_foo we are looking for.
This condition is strictly broader than that implemented inside the
flight search loop, so there is no functional change.
Perf: runtime of my test case now ~300s-500s.
Example query before (from the Perl DBI trace):
SELECT * FROM (
SELECT flight, blessing FROM flights
WHERE (branch='xen-unstable')
AND EXISTS (SELECT 1
FROM jobs
WHERE jobs.flight = flights.flight
AND jobs.job = ?)
AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
ORDER BY flight DESC
LIMIT 1000
) AS sub
ORDER BY blessing ASC, flight DESC
With these bind variables:
"test-armhf-armhf-libvirt"
After:
SELECT * FROM (
SELECT DISTINCT flight, blessing
FROM flights
JOIN runvars r1 USING (flight)
WHERE (branch='xen-unstable')
AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
AND EXISTS (SELECT 1
FROM jobs
WHERE jobs.flight = flights.flight
AND jobs.job = ?)
AND r1.name LIKE 'built_revision_%'
AND r1.name = ?
AND r1.val= ?
ORDER BY flight DESC
LIMIT 1000
) AS sub
ORDER BY blessing ASC, flight DESC
With these bind variables:
"test-armhf-armhf-libvirt"
'built_revision_xen'
'165f3afbfc3db70fcfdccad07085cde0a03c858b'
Diff to the query:
SELECT * FROM (
- SELECT flight, blessing FROM flights
+ SELECT DISTINCT flight, blessing
+ FROM flights
+ JOIN runvars r1 USING (flight)
+
WHERE (branch='xen-unstable')
+ AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
AND EXISTS (SELECT 1
FROM jobs
WHERE jobs.flight = flights.flight
AND jobs.job = ?)
- AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
+ AND r1.name LIKE 'built_revision_%'
+ AND r1.name = ?
+ AND r1.val= ?
+
ORDER BY flight DESC
LIMIT 1000
) AS sub
CC: George Dunlap <George.Dunlap@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
schema/runvars-built-index.sql | 2 +-
sg-report-flight | 64 ++++++++++++++++++++++++++++++++--
2 files changed, 62 insertions(+), 4 deletions(-)
diff --git a/schema/runvars-built-index.sql b/schema/runvars-built-index.sql
index 94f85ed8..8582227e 100644
--- a/schema/runvars-built-index.sql
+++ b/schema/runvars-built-index.sql
@@ -1,4 +1,4 @@
--- ##OSSTEST## 007 Preparatory
+-- ##OSSTEST## 007 Needed
--
-- This index helps sg-report-flight find relevant flights.
diff --git a/sg-report-flight b/sg-report-flight
index 70def778..61aec7a8 100755
--- a/sg-report-flight
+++ b/sg-report-flight
@@ -185,19 +185,77 @@ END
if (defined $job) {
push @flightsq_params, $job;
$flightsq_jobcond = <<END;
- EXISTS (SELECT 1
+ AND EXISTS (SELECT 1
FROM jobs
WHERE jobs.flight = flights.flight
AND jobs.job = ?)
END
}
+ # We build a slightly complicated query to find possibly-relevant
+ # flights. A "possibly-relevant" flight is one which the main
+ # flight categorisation algorithm below (the loop over $tflight)
+ # *might* decide is of interest.
+ #
+ # That algorithm produces a table of which revision(s) of what
+ # %specver trees the build jobs for the relevant test job used.
+ # And then it insists (amongst other things) that for each such
+ # tree the revision in question appears.
+ #
+ # It only looks at build jobs within the flight. So any flight
+ # that the main algorithm finds interesting will have *some* job
+ # (in the same flight) mentioning that revision in a built
+ # revision runvar. So we can search the runvars table by its
+ # index on the revision.
+ #
+ # So we look for flights that have an appropriate entry in runvars
+ # for each %specver tree. We can do this by joining the runvar
+ # table once for each tree.
+ #
+ # The "osstest" tree is handled specially. as ever. (We use
+ # "r$ri" there too for orthogonality of the code, not because
+ # there could be multiple specifiations for the osstest revision.)
+ #
+ # This complex query is an optimisation: for correctness, we must
+ # still execute the full job-specific recursive examination, for
+ # each possibly-relevant flight - that's the $tflight loop body.
+
+ my $runvars_joins = '';
+ my $runvars_conds = '';
+ my $ri=0;
+ foreach my $tree (sort keys %{ $specver{$thisthat} }) {
+ $ri++;
+ if ($tree ne 'osstest') {
+ $runvars_joins .= <<END;
+ JOIN runvars r$ri USING (flight)
+END
+ $runvars_conds .= <<END;
+ AND r$ri.name LIKE 'built_revision_%'
+ AND r$ri.name = ?
+ AND r$ri.val= ?
+END
+ push @flightsq_params, "built_revision_$tree",
+ $specver{$thisthat}{$tree};
+ } else {
+ $runvars_joins .= <<END;
+ JOIN flights_harness_touched r$ri USING (flight)
+END
+ $runvars_conds .= <<END;
+ AND r$ri.harness= ?
+END
+ push @flightsq_params, $specver{$thisthat}{$tree};
+ }
+ }
+
my $flightsq= <<END;
SELECT * FROM (
- SELECT flight, blessing FROM flights
+ SELECT DISTINCT flight, blessing
+ FROM flights
+$runvars_joins
WHERE $branches_cond_q
- AND $flightsq_jobcond
AND $blessingscond
+$flightsq_jobcond
+$runvars_conds
ORDER BY flight DESC
LIMIT 1000
) AS sub
--
2.20.1
> On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote: > > Specifically, we narrow the initial query to flights which have at > least some job with the built_revision_foo we are looking for. > > This condition is strictly broader than that implemented inside the > flight search loop, so there is no functional change. > > Perf: runtime of my test case now ~300s-500s. > > Example query before (from the Perl DBI trace): > > SELECT * FROM ( > SELECT flight, blessing FROM flights > WHERE (branch='xen-unstable') > AND EXISTS (SELECT 1 > FROM jobs > WHERE jobs.flight = flights.flight > AND jobs.job = ?) > > AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) > ORDER BY flight DESC > LIMIT 1000 > ) AS sub > ORDER BY blessing ASC, flight DESC This one says: Find the 1000 most recent flights Where branch is "xen-unstable” one of its jobs is $job And blessing is “real” But why are we selecting ‘blessing’ from these, if we’ve specified that blessing = “real”? Isn’t that redundant? > > With these bind variables: > > "test-armhf-armhf-libvirt" > > After: > > SELECT * FROM ( > SELECT DISTINCT flight, blessing > FROM flights > JOIN runvars r1 USING (flight) > > WHERE (branch='xen-unstable') > AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) > AND EXISTS (SELECT 1 > FROM jobs > WHERE jobs.flight = flights.flight > AND jobs.job = ?) > > AND r1.name LIKE 'built_revision_%' > AND r1.name = ? > AND r1.val= ? > > ORDER BY flight DESC > LIMIT 1000 > ) AS sub > ORDER BY blessing ASC, flight DESC So this says: Find me the most 1000 recent flights Where: branch is “xen-unstable” flight <= 15903 blessing is “real” One of its jobs is $job It has a runvar matching given $name and $val And of course it uses the ’name LIKE ‘built_revision_%’ index. Still don’t understand the ’TRUE AND’ and ‘AS sub’ bits, but it looks to me like it’s substantially the same query, with additional $name = $val runvar restriction. And given that you say, "This condition is strictly broader than that implemented inside the flight search loop”, I take it that it’s again mainly to take advantage of the new index? -George
George Dunlap writes ("Re: [OSSTEST PATCH 04/14] sg-report-flight: Ask the db for flights of interest"):
> > On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
> > Example query before (from the Perl DBI trace):
> >
> > SELECT * FROM (
> > SELECT flight, blessing FROM flights
...
> > AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
...
> But why are we selecting ‘blessing’ from these, if we’ve specified that blessing = “real”? Isn’t that redundant?
That condition is programmatically constructed. Sometimes it will ask
for multiple different blessings and then it wants to know which.
> > After:
...
> So this says:
>
> Find me the most 1000 recent flights
> Where:
> branch is “xen-unstable”
> flight <= 15903
> blessing is “real”
> One of its jobs is $job
> It has a runvar matching given $name and $val
>
> And of course it uses the ’name LIKE ‘built_revision_%’ index.
Yes.
> Still don’t understand the ’TRUE AND’ and ‘AS sub’ bits, but it
> looks to me like it’s substantially the same query, with additional
> $name = $val runvar restriction.
That's my intent, ytes.
> And given that you say, "This condition is strictly broader than
> that implemented inside the flight search loop”, I take it that it’s
> again mainly to take advantage of the new index?
Right. The previous approach was "iterate over recent flights,
figure out precisely what they built, and decide if they meet the
(complex) requirements".
Now we only iterate over a subset of recent flights: those which have
at least one such runvar. The big commennt is meant to be a
demonstration that the "(complex) requirements" are a narrower
condition than the new condition on the initial flights query.
So I think the result is that it will look deeper into history, and be
faster, but not otherwise change its beaviour.
Ian.
© 2016 - 2026 Red Hat, Inc.