Right now we're dividing the jobs into three stages: prebuild, which
includes DCO checking as well as building artifacts such as the
website, and native_build/cross_build, which do exactly what you'd
expect based on their names.
This organization is nice from the logical point of view, but results
in poor utilization of the available CI resources: in particular, the
fact that cross_build jobs can only start after all native_build jobs
have finished means that if even a single one of the latter takes a
bit longer the pipeline will stall, and with native builds taking
anywhere from less than 10 minutes to more than 20, this happens all
the time.
Building artifacts in a separate pipeline stage also doesn't have any
advantages, and only delays further stages by a couple of minutes.
The only job that really makes sense in its own stage is the DCO
check, because it's extremely fast (less than 1 minute) and, if that
fails, we can avoid kicking off all other jobs.
Reducing the number of stages results in significant speedups:
specifically, going from three stages to two stages reduces the
overall completion time for a full CI pipeline from ~45 minutes[1]
to ~30 minutes[2].
[1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
[2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
---
.gitlab-ci.yml | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 7a8142b506..8d9313e415 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -2,9 +2,8 @@ variables:
GIT_DEPTH: 100
stages:
- - prebuild
- - native_build
- - cross_build
+ - sanity_checks
+ - builds
.script_variables: &script_variables |
export MAKEFLAGS="-j$(getconf _NPROCESSORS_ONLN)"
@@ -17,7 +16,7 @@ stages:
# Default native build jobs that are always run
.native_build_default_job_template: &native_build_default_job_definition
- stage: native_build
+ stage: builds
cache:
paths:
- ccache/
@@ -42,7 +41,7 @@ stages:
# system other than Linux. These jobs will only run if the required
# setup has been performed on the GitLab account (see ci/README.rst).
.cirrus_build_default_job_template: &cirrus_build_default_job_definition
- stage: native_build
+ stage: builds
image: registry.gitlab.com/libvirt/libvirt-ci/cirrus-run:master
script:
- cirrus-run ci/cirrus/$NAME.yml.j2
@@ -64,7 +63,7 @@ stages:
# Default cross build jobs that are always run
.cross_build_default_job_template: &cross_build_default_job_definition
- stage: cross_build
+ stage: builds
cache:
paths:
- ccache/
@@ -194,7 +193,7 @@ mingw64-fedora-rawhide:
# be deployed to the web root:
# https://gitlab.com/libvirt/libvirt/-/jobs/artifacts/master/download?job=website
website:
- stage: prebuild
+ stage: builds
before_script:
- *script_variables
script:
@@ -216,7 +215,7 @@ website:
codestyle:
- stage: prebuild
+ stage: builds
before_script:
- *script_variables
script:
@@ -231,7 +230,7 @@ codestyle:
# for translation usage:
# https://gitlab.com/libvirt/libvirt/-/jobs/artifacts/master/download?job=potfile
potfile:
- stage: prebuild
+ stage: builds
only:
- master
before_script:
@@ -259,7 +258,7 @@ potfile:
# this test on developer's personal forks from which
# merge requests are submitted
check-dco:
- stage: prebuild
+ stage: sanity_checks
image: registry.gitlab.com/libvirt/libvirt-ci/check-dco:master
script:
- /check-dco
--
2.25.4
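For reference, the two-stage layout that the diff above produces boils down to something like this. This is a condensed, hedged sketch: `check-dco` is taken from the real file, but `example-build` is a hypothetical job name standing in for the native/cross/website/codestyle/potfile jobs the diff actually moves.

```yaml
# Condensed sketch of the resulting .gitlab-ci.yml layout: a fast
# sanity_checks stage gates the pipeline, then every build-type job
# shares a single builds stage so none of them waits on the others.
stages:
  - sanity_checks
  - builds

check-dco:
  stage: sanity_checks
  image: registry.gitlab.com/libvirt/libvirt-ci/check-dco:master
  script:
    - /check-dco

# Hypothetical job for illustration; the real jobs come from the
# templates modified in the diff above.
example-build:
  stage: builds
  script:
    - make
```

Because `check-dco` finishes in under a minute, the gating stage costs almost nothing, while all the expensive jobs start as soon as runners are available.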
On Wed, Jun 10, 2020 at 13:33:01 +0200, Andrea Bolognani wrote:

[...]

> Building artifacts in a separate pipeline stage also doesn't have any
> advantages, and only delays further stages by a couple of minutes.
> The only job that really makes sense in its own stage is the DCO
> check, because it's extremely fast (less than 1 minute) and, if that
> fails, we can avoid kicking off all other jobs.

On the contrary, I think that the DCO check should run after builds,
as it usually forces users to add a sign-off just to bypass that check
when they want to sanity-check their series.

Since the lack of a sign-off can effectively be used to mark a patch
that is not ready to be pushed but still needs a build check, this
adds a pointless hurdle to using the CI and also removes one of the
meaningful uses of having a sign-off checker.
On Wed, Jun 10, 2020 at 01:51:29PM +0200, Peter Krempa wrote:
> On Wed, Jun 10, 2020 at 13:33:01 +0200, Andrea Bolognani wrote:
>
> [...]
>
> > Building artifacts in a separate pipeline stage also doesn't have any
> > advantages, and only delays further stages by a couple of minutes.
> > The only job that really makes sense in its own stage is the DCO
> > check, because it's extremely fast (less than 1 minute) and, if that
> > fails, we can avoid kicking off all other jobs.
>
> On the contrary, I think that the DCO check should run after builds,
> as it usually forces users to add a sign-off just to bypass that
> check when they want to sanity-check their series.

A missing sign-off is quite common for new contributors, so it was put
as the first check so that they get quick notification of this
mistake.

> Since the lack of a sign-off can effectively be used to mark a patch
> that is not ready to be pushed but still needs a build check, this
> adds a pointless hurdle to using the CI and also removes one of the
> meaningful uses of having a sign-off checker.

That kind of usage of the sign-off is not really required in a merge
request workflow. You won't typically open a merge request in the
first place if the code isn't ready, and if you do, there's an
explicit "WIP" flag for merge requests to achieve this. Once
libvirt.git uses merge requests, we will fully block all ability to
push directly to git.

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On a Wednesday in 2020, Daniel P. Berrangé wrote:
> On Wed, Jun 10, 2020 at 01:51:29PM +0200, Peter Krempa wrote:
> > On Wed, Jun 10, 2020 at 13:33:01 +0200, Andrea Bolognani wrote:
> >
> > [...]
> >
> > > Building artifacts in a separate pipeline stage also doesn't have any
> > > advantages, and only delays further stages by a couple of minutes.
> > > The only job that really makes sense in its own stage is the DCO
> > > check, because it's extremely fast (less than 1 minute) and, if that
> > > fails, we can avoid kicking off all other jobs.
> >
> > On the contrary, I think that the DCO check should run after builds,
> > as it usually forces users to add a sign-off just to bypass that
> > check when they want to sanity-check their series.
>
> A missing sign-off is quite common for new contributors, so it was put
> as the first check so that they get quick notification of this
> mistake.
>
> > Since the lack of a sign-off can effectively be used to mark a patch
> > that is not ready to be pushed but still needs a build check, this
> > adds a pointless hurdle to using the CI and also removes one of the
> > meaningful uses of having a sign-off checker.
>
> That kind of usage of the sign-off is not really required in a merge
> request workflow. You won't typically open a merge request in the
> first place if the code isn't ready, and if you do, there's an
> explicit "WIP" flag for merge requests to achieve this. Once
> libvirt.git uses merge requests, we will fully block all ability to
> push directly to git.

I think we have a long way to go until merge requests are usable
without pushing directly to git.

Jano
On Wed, Jun 10, 2020 at 01:33:01PM +0200, Andrea Bolognani wrote:
> Right now we're dividing the jobs into three stages: prebuild, which
> includes DCO checking as well as building artifacts such as the
> website, and native_build/cross_build, which do exactly what you'd
> expect based on their names.
>
> This organization is nice from the logical point of view, but results
> in poor utilization of the available CI resources: in particular, the
> fact that cross_build jobs can only start after all native_build jobs
> have finished means that if even a single one of the latter takes a
> bit longer the pipeline will stall, and with native builds taking
> anywhere from less than 10 minutes to more than 20, this happens all
> the time.
>
> Building artifacts in a separate pipeline stage also doesn't have any
> advantages, and only delays further stages by a couple of minutes.
> The only job that really makes sense in its own stage is the DCO
> check, because it's extremely fast (less than 1 minute) and, if that
> fails, we can avoid kicking off all other jobs.
>
> Reducing the number of stages results in significant speedups:
> specifically, going from three stages to two stages reduces the
> overall completion time for a full CI pipeline from ~45 minutes[1]
> to ~30 minutes[2].
>
> [1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
> [2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173
>
> Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> ---
>  .gitlab-ci.yml | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Wed, Jun 10, 2020 at 01:33:01PM +0200, Andrea Bolognani wrote:
> Right now we're dividing the jobs into three stages: prebuild, which
> includes DCO checking as well as building artifacts such as the
> website, and native_build/cross_build, which do exactly what you'd
> expect based on their names.
>
> This organization is nice from the logical point of view, but results
> in poor utilization of the available CI resources: in particular, the
> fact that cross_build jobs can only start after all native_build jobs
> have finished means that if even a single one of the latter takes a
> bit longer the pipeline will stall, and with native builds taking
> anywhere from less than 10 minutes to more than 20, this happens all
> the time.
>
> Building artifacts in a separate pipeline stage also doesn't have any
> advantages, and only delays further stages by a couple of minutes.
> The only job that really makes sense in its own stage is the DCO
> check, because it's extremely fast (less than 1 minute) and, if that
> fails, we can avoid kicking off all other jobs.

The advantage of using stages is that it makes it easy to see at a
glance where the pipeline was failing.

> Reducing the number of stages results in significant speedups:
> specifically, going from three stages to two stages reduces the
> overall completion time for a full CI pipeline from ~45 minutes[1]
> to ~30 minutes[2].
>
> [1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
> [2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173

I don't think this time comparison is showing a genuine difference.

If we look at the original staged pipeline, every single individual
job took much longer than every individual job in the simplified
pipeline. I think the difference in job times accounts for most
(possibly all) of the difference in the pipelines' time.

If we look at the history of libvirt pipelines:

  https://gitlab.com/libvirt/libvirt/pipelines

the vast majority of the time we're completing in 30 minutes or
less already.

If you want to demonstrate a time improvement from these merged
stages, then run 20 pipelines over a couple of days and show
that they're consistently better than what we see already, and
not just a reflection of the CI infra load at a point in time.

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Wed, Jun 10, 2020 at 01:14:51PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 10, 2020 at 01:33:01PM +0200, Andrea Bolognani wrote:
> > Right now we're dividing the jobs into three stages: prebuild, which
> > includes DCO checking as well as building artifacts such as the
> > website, and native_build/cross_build, which do exactly what you'd
> > expect based on their names.
> >
> > This organization is nice from the logical point of view, but results
> > in poor utilization of the available CI resources: in particular, the
> > fact that cross_build jobs can only start after all native_build jobs
> > have finished means that if even a single one of the latter takes a
> > bit longer the pipeline will stall, and with native builds taking
> > anywhere from less than 10 minutes to more than 20, this happens all
> > the time.
> >
> > Building artifacts in a separate pipeline stage also doesn't have any
> > advantages, and only delays further stages by a couple of minutes.
> > The only job that really makes sense in its own stage is the DCO
> > check, because it's extremely fast (less than 1 minute) and, if that
> > fails, we can avoid kicking off all other jobs.
>
> The advantage of using stages is that it makes it easy to see at a
> glance where the pipeline was failing.
>
> > Reducing the number of stages results in significant speedups:
> > specifically, going from three stages to two stages reduces the
> > overall completion time for a full CI pipeline from ~45 minutes[1]
> > to ~30 minutes[2].
> >
> > [1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
> > [2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173
>
> I don't think this time comparison is showing a genuine difference.
>
> If we look at the original staged pipeline, every single individual
> job took much longer than every individual job in the simplified
> pipeline. I think the difference in job times accounts for most
> (possibly all) of the difference in the pipelines' time.
>
> If we look at the history of libvirt pipelines:
>
>   https://gitlab.com/libvirt/libvirt/pipelines
>
> the vast majority of the time we're completing in 30 minutes or
> less already.
>
> If you want to demonstrate a time improvement from these merged
> stages, then run 20 pipelines over a couple of days and show
> that they're consistently better than what we see already, and
> not just a reflection of the CI infra load at a point in time.

Also remember that we're using ccache, so slower builds may just be a
reflection of ccache having a low hit rate - a sequence of repeated
builds of the same branch should identify if that's the case.

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
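One way to gather the hit-rate data Daniel suggests is to zero the ccache counters before each build and dump them afterwards, so every job log records its own hit/miss ratio. A hedged sketch of such a job fragment (`example-build` is a hypothetical job name; `--zero-stats` and `--show-stats` are standard ccache options):

```yaml
# Illustrative job fragment: reset ccache statistics before the build
# and print them afterwards, making cold-cache jobs easy to spot.
example-build:
  stage: builds
  cache:
    paths:
      - ccache/
  before_script:
    - export CCACHE_DIR="$PWD/ccache"
    - ccache --zero-stats
  script:
    - make
  after_script:
    - ccache --show-stats
```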
On Wed, 2020-06-10 at 13:31 +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 10, 2020 at 01:14:51PM +0100, Daniel P. Berrangé wrote:
> > On Wed, Jun 10, 2020 at 01:33:01PM +0200, Andrea Bolognani wrote:
> > > Building artifacts in a separate pipeline stage also doesn't have any
> > > advantages, and only delays further stages by a couple of minutes.
> > > The only job that really makes sense in its own stage is the DCO
> > > check, because it's extremely fast (less than 1 minute) and, if that
> > > fails, we can avoid kicking off all other jobs.
> >
> > The advantage of using stages is that it makes it easy to see at a
> > glance where the pipeline was failing.

Ultimately you'll need to drill down to the actual failure, though,
so the only situation in which it would really provide value is if
for some reason *all* cross builds failed at once, which is not
something that happens frequently enough to optimize for.

> > > Reducing the number of stages results in significant speedups:
> > > specifically, going from three stages to two stages reduces the
> > > overall completion time for a full CI pipeline from ~45 minutes[1]
> > > to ~30 minutes[2].
> > >
> > > [1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
> > > [2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173
> >
> > I don't think this time comparison is showing a genuine difference.
> >
> > If we look at the original staged pipeline, every single individual
> > job took much longer than every individual job in the simplified
> > pipeline. I think the difference in job times accounts for most
> > (possibly all) of the difference in the pipelines' time.
> >
> > If we look at the history of libvirt pipelines:
> >
> >   https://gitlab.com/libvirt/libvirt/pipelines
> >
> > the vast majority of the time we're completing in 30 minutes or
> > less already.

That was before introducing FreeBSD builds, which for whatever reason
take a significantly longer time: the last couple of jobs both took
50+ minutes. Installing packages is very inefficient, it would seem.

Either way, even looking at earlier jobs, it seems clear that we
leave compute time on the table: for the last 10 jobs before adding
FreeBSD, we have

  Longest job | Shortest job
  ------------ -------------
        21:20 | 12:12
        16:11 | 09:04
        21:31 | 13:40
        16:32 | 08:28
        14:53 | 08:16
        16:01 | 07:59
        16:17 | 08:40
        15:30 | 08:49
        15:12 | 09:11
        16:20 | 08:34

which means the pipeline is stalled for at least 5-8 minutes each
time. That's time that we could use to run builds, but we just sit
idle and wait instead. The difference becomes even bigger with
FreeBSD in the mix.

Even from a more semantic point of view, pipeline stages exist to
implement dependencies between jobs: a good example is our container
build jobs, which of course need to happen *before* the build job
that uses that container can start. There are no dependencies
whatsoever between native builds and cross builds.

> > If you want to demonstrate a time improvement from these merged
> > stages, then run 20 pipelines over a couple of days and show
> > that they're consistently better than what we see already, and
> > not just a reflection of the CI infra load at a point in time.

I could do that, sure, it just seems like a waste of shared runner
CPU time...

> Also remember that we're using ccache, so slower builds may just be
> a reflection of ccache having a low hit rate - a sequence of
> repeated builds of the same branch should identify if that's the
> case.

I've been running builds pretty much non-stop over the past few days,
and since the cache is keyed off the job's name there should be no
significant skew caused by this.

-- 
Andrea Bolognani / Red Hat / Virtualization
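The stall estimate above can be checked directly from the table: with a stage barrier, the runner slot freed by the shortest native job sits idle until the longest one finishes, so (longest - shortest) approximates the per-pipeline stall. A quick back-of-the-envelope check, using only the durations quoted in the table:

```shell
# For each pipeline in the table, compute (longest - shortest) native
# job time in seconds and report it as MmSSs, plus the overall average.
out=$(awk -F'[: ]+' '{
    stall = ($1 * 60 + $2) - ($3 * 60 + $4)
    total += stall
    printf "stall: %dm%02ds\n", stall / 60, stall % 60
}
END { printf "average: %dm%02ds\n", total / NR / 60, (total / NR) % 60 }' <<'EOF'
21:20 12:12
16:11 09:04
21:31 13:40
16:32 08:28
14:53 08:16
16:01 07:59
16:17 08:40
15:30 08:49
15:12 09:11
16:20 08:34
EOF
)
printf '%s\n' "$out"   # per-pipeline stalls range from ~6 to ~9 minutes
```

This puts the idle window at roughly 6-9 minutes per pipeline (about 7.5 on average), consistent with the "at least 5-8 minutes" figure.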
On Wed, Jun 10, 2020 at 05:15:55PM +0200, Andrea Bolognani wrote:
> On Wed, 2020-06-10 at 13:31 +0100, Daniel P. Berrangé wrote:
> > On Wed, Jun 10, 2020 at 01:14:51PM +0100, Daniel P. Berrangé wrote:
> > > On Wed, Jun 10, 2020 at 01:33:01PM +0200, Andrea Bolognani wrote:
> > > > Building artifacts in a separate pipeline stage also doesn't have any
> > > > advantages, and only delays further stages by a couple of minutes.
> > > > The only job that really makes sense in its own stage is the DCO
> > > > check, because it's extremely fast (less than 1 minute) and, if that
> > > > fails, we can avoid kicking off all other jobs.
> > >
> > > The advantage of using stages is that it makes it easy to see at a
> > > glance where the pipeline was failing.
>
> Ultimately you'll need to drill down to the actual failure, though,
> so the only situation in which it would really provide value is if
> for some reason *all* cross builds failed at once, which is not
> something that happens frequently enough to optimize for.
>
> > > > Reducing the number of stages results in significant speedups:
> > > > specifically, going from three stages to two stages reduces the
> > > > overall completion time for a full CI pipeline from ~45 minutes[1]
> > > > to ~30 minutes[2].
> > > >
> > > > [1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
> > > > [2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173
> > >
> > > I don't think this time comparison is showing a genuine difference.
> > >
> > > If we look at the original staged pipeline, every single individual
> > > job took much longer than every individual job in the simplified
> > > pipeline. I think the difference in job times accounts for most
> > > (possibly all) of the difference in the pipelines' time.
> > >
> > > If we look at the history of libvirt pipelines:
> > >
> > >   https://gitlab.com/libvirt/libvirt/pipelines
> > >
> > > the vast majority of the time we're completing in 30 minutes or
> > > less already.
>
> That was before introducing FreeBSD builds, which for whatever reason
> take a significantly longer time: the last couple of jobs both took
> 50+ minutes. Installing packages is very inefficient, it would seem.

Oh dear, yeah, I missed that it introduced FreeBSD.

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|