[OSSTEST PATCH 05/14] sg-report-flight: Use WITH to use best index use for $flightsq

Ian Jackson posted 14 patches 5 years, 6 months ago
[OSSTEST PATCH 05/14] sg-report-flight: Use WITH to use best index use for $flightsq
Posted by Ian Jackson 5 years, 6 months ago
While we're here, convert this EXISTS subquery to a JOIN.

Perf: runtime of my test case now ~200-300s.

Example query before (from the Perl DBI trace):

      SELECT * FROM (
        SELECT DISTINCT flight, blessing
             FROM flights
             JOIN runvars r1 USING (flight)

            WHERE (branch='xen-unstable')
              AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
                  AND EXISTS (SELECT 1
                            FROM jobs
                           WHERE jobs.flight = flights.flight
                             AND jobs.job = ?)

              AND r1.name LIKE 'built_revision_%'
              AND r1.name = ?
              AND r1.val= ?

            ORDER BY flight DESC
            LIMIT 1000
      ) AS sub
      ORDER BY blessing ASC, flight DESC

With bind variables:

     "test-armhf-armhf-libvirt"
     'built_revision_xen'
     '165f3afbfc3db70fcfdccad07085cde0a03c858b'

After:

      WITH sub AS (
        SELECT DISTINCT flight, blessing
             FROM flights
             JOIN runvars r1 USING (flight)

            WHERE (branch='xen-unstable')
              AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
              AND r1.name LIKE 'built_revision_%'
              AND r1.name = ?
              AND r1.val= ?

            ORDER BY flight DESC
            LIMIT 1000
      )
      SELECT *
        FROM sub
        JOIN jobs USING (flight)

       WHERE (1=1)
                  AND jobs.job = ?

      ORDER BY blessing ASC, flight DESC

With bind variables:

    'built_revision_xen'
    '165f3afbfc3db70fcfdccad07085cde0a03c858b'
    "test-armhf-armhf-libvirt"

Diff to the query:

-      SELECT * FROM (
+      WITH sub AS (
         SELECT DISTINCT flight, blessing
              FROM flights
              JOIN runvars r1 USING (flight)

             WHERE (branch='xen-unstable')
               AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
-                  AND EXISTS (SELECT 1
-                            FROM jobs
-                           WHERE jobs.flight = flights.flight
-                             AND jobs.job = ?)
-
               AND r1.name LIKE 'built_revision_%'
               AND r1.name = ?
               AND r1.val= ?

             ORDER BY flight DESC
             LIMIT 1000
-      ) AS sub
+      )
+      SELECT *
+        FROM sub
+        JOIN jobs USING (flight)
+
+       WHERE (1=1)
+                  AND jobs.job = ?
+
       ORDER BY blessing ASC, flight DESC

CC: George Dunlap <George.Dunlap@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 sg-report-flight | 39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/sg-report-flight b/sg-report-flight
index 61aec7a8..b5398573 100755
--- a/sg-report-flight
+++ b/sg-report-flight
@@ -180,18 +180,6 @@ END
         return undef;
     }
 
-    my @flightsq_params;
-    my $flightsq_jobcond='(1=1)';
-    if (defined $job) {
-	push @flightsq_params, $job;
-	$flightsq_jobcond = <<END;
-                  AND EXISTS (SELECT 1
-			    FROM jobs
-			   WHERE jobs.flight = flights.flight
-			     AND jobs.job = ?)
-END
-    }
-
     # We build a slightly complicated query to find possibly-relevant
     # flights.  A "possibly-relevant" flight is one which the main
     # flight categorisation algorithm below (the loop over $tflight)
@@ -220,6 +208,7 @@ END
     # still execute the full job-specific recursive examination, for
     # each possibly-relevant flight - that's the $tflight loop body.
 
+    my @flightsq_params;
     my $runvars_joins = '';
     my $runvars_conds = '';
     my $ri=0;
@@ -247,18 +236,38 @@ END
       }
     }
 
+    my $flightsq_jobs_join = '';
+    my $flightsq_jobcond = '';
+    if (defined $job) {
+	push @flightsq_params, $job;
+	$flightsq_jobs_join = <<END;
+        JOIN jobs USING (flight)
+END
+	$flightsq_jobcond = <<END;
+                  AND jobs.job = ?
+END
+    }
+
+    # In psql 9.6 this WITH clause makes postgresql do the flights
+    # query first.  This is good because our built revision index finds
+    # relevant flights very quickly.  Without this, postgresql seems
+    # to like to scan the jobs table.
     my $flightsq= <<END;
-      SELECT * FROM (
+      WITH sub AS (
         SELECT DISTINCT flight, blessing
              FROM flights
 $runvars_joins
             WHERE $branches_cond_q
               AND $blessingscond
-$flightsq_jobcond
 $runvars_conds
             ORDER BY flight DESC
             LIMIT 1000
-      ) AS sub
+      )
+      SELECT *
+        FROM sub
+$flightsq_jobs_join
+       WHERE (1=1)
+$flightsq_jobcond
       ORDER BY blessing ASC, flight DESC
 END
     $flightsq= db_prepare($flightsq);
-- 
2.20.1


Re: [OSSTEST PATCH 05/14] sg-report-flight: Use WITH to use best index use for $flightsq
Posted by George Dunlap 5 years, 6 months ago

> On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
> 
> While we're here, convert this EXISTS subquery to a JOIN.
> 
> Perf: runtime of my test case now ~200-300s.
> 
> Example query before (from the Perl DBI trace):
> 
>      SELECT * FROM (
>        SELECT DISTINCT flight, blessing
>             FROM flights
>             JOIN runvars r1 USING (flight)
> 
>            WHERE (branch='xen-unstable')
>              AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
>                  AND EXISTS (SELECT 1
>                            FROM jobs
>                           WHERE jobs.flight = flights.flight
>                             AND jobs.job = ?)
> 
>              AND r1.name LIKE 'built_revision_%'
>              AND r1.name = ?
>              AND r1.val= ?
> 
>            ORDER BY flight DESC
>            LIMIT 1000
>      ) AS sub
>      ORDER BY blessing ASC, flight DESC
> 
> With bind variables:
> 
>     "test-armhf-armhf-libvirt"
>     'built_revision_xen'
>     '165f3afbfc3db70fcfdccad07085cde0a03c858b'
> 
> After:
> 
>      WITH sub AS (
>        SELECT DISTINCT flight, blessing
>             FROM flights
>             JOIN runvars r1 USING (flight)
> 
>            WHERE (branch='xen-unstable')
>              AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
>              AND r1.name LIKE 'built_revision_%'
>              AND r1.name = ?
>              AND r1.val= ?
> 
>            ORDER BY flight DESC
>            LIMIT 1000
>      )
>      SELECT *
>        FROM sub
>        JOIN jobs USING (flight)
> 
>       WHERE (1=1)
>                  AND jobs.job = ?
> 
>      ORDER BY blessing ASC, flight DESC

I was wondering if it would be useful converting this to a join would be useful. :-)

Again, not sure what the “(1=1) AND” bit is for; something to poke the query planner somehow?

The main thing I see here is that there’s nothing *in the query* that guarantees you won’t get multiple flights if there are multiple jobs for that flight whose ‘job’ value; but given the naming scheme so far, I’m guessing job is unique…?  As long as there’s something else preventing duplication I think it’s fine.

 -George

Re: [OSSTEST PATCH 05/14] sg-report-flight: Use WITH to use best index use for $flightsq
Posted by Ian Jackson 5 years, 6 months ago
George Dunlap writes ("Re: [OSSTEST PATCH 05/14] sg-report-flight: Use WITH to use best index use for $flightsq"):
> On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
> > After:
> >      WITH sub AS (
> >        SELECT DISTINCT flight, blessing
> >             FROM flights
> >             JOIN runvars r1 USING (flight)
> > 
> >            WHERE (branch='xen-unstable')
> >              AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
> >              AND r1.name LIKE 'built_revision_%'
> >              AND r1.name = ?
> >              AND r1.val= ?
> > 
> >            ORDER BY flight DESC
> >            LIMIT 1000
> >      )
> >      SELECT *
> >        FROM sub
> >        JOIN jobs USING (flight)
> > 
> >       WHERE (1=1)
> >                  AND jobs.job = ?
> > 
> >      ORDER BY blessing ASC, flight DESC
> 
> I was wondering if it would be useful converting this to a join would be useful. :-)
...
> The main thing I see here is that there’s nothing *in the query*
> that guarantees you won’t get multiple flights if there are multiple
> jobs for that flight whose ‘job’ value; but given the naming scheme
> so far, I’m guessing job is unique…?  As long as there’s something
> else preventing duplication I think it’s fine.

(flight,job) is the primary key for the jobs table.

I can probably produce a schema dump if that would make reading this
stuff easier.

Ian.