[RFC PATCH 0/1] ci: Speed up container stage

Fabiano Rosas posted 1 patch 1 year, 2 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20230223142154.31975-1-farosas@suse.de
Maintainers: "Alex Bennée" <alex.bennee@linaro.org>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Thomas Huth <thuth@redhat.com>, Wainer dos Santos Moschetta <wainersm@redhat.com>, Beraldo Leal <bleal@redhat.com>
There is a newer version of this series
.gitlab-ci.d/container-template.yml | 1 +
1 file changed, 1 insertion(+)
[RFC PATCH 0/1] ci: Speed up container stage
Posted by Fabiano Rosas 1 year, 2 months ago
I'm not sure if this was discussed previously, but I noticed we're not
pulling the images we push to the registry at every pipeline run.

I would expect we don't actually need to rebuild container images at
_every_ pipeline run, so I propose we add a "docker pull" to the
container templates. We already have that for the docker-edk2|opensbi
images.

Some containers can take a long time to build (14 mins) and pulling
the image first without building can cut the time to about 3
mins. With this we can save almost 2h of cumulative CI time per
pipeline run:

| master   | pull-only |  diff    | container
| 0:02:34  | 0:02:09   | 00:00:25 |  alpha-debian-cross-container
| 0:04:45  | 0:02:40   | 00:02:05 |  amd64-alpine-container
| 0:09:51  | 0:02:56   | 00:06:55 |  amd64-centos8-container
| 0:07:21  | 0:02:49   | 00:04:32 |  amd64-debian-container
| 0:06:00  | 0:02:37   | 00:03:23 |  amd64-debian-cross-container
| 0:14:22  | 0:03:41   | 00:10:41 |  amd64-debian-user-cross-container
| 0:10:14  | 0:03:24   | 00:06:50 |  amd64-fedora-container
| 0:12:09  | 0:02:49   | 00:09:20 |  amd64-opensuse-leap-container
| 0:07:33  | 0:02:45   | 00:04:48 |  amd64-ubuntu2004-container
| 0:08:28  | 0:03:07   | 00:05:21 |  arm64-debian-cross-container
| 0:04:27  | 0:02:58   | 00:01:29 |  armel-debian-cross-container
| 0:08:01  | 0:02:55   | 00:05:06 |  armhf-debian-cross-container
| 0:03:33  | 0:02:18   | 00:01:15 |  cris-fedora-cross-container
| 0:00:28  | 0:00:28   | 00:00:00 |  docker-edk2
| 0:00:25  | 0:00:28   |-00:00:03 |  docker-opensbi
| 0:08:34  | 0:03:10   | 00:05:24 |  hexagon-cross-container
| 0:02:34  | 0:02:08   | 00:00:26 |  hppa-debian-cross-container
| 0:04:50  | 0:02:28   | 00:02:22 |  i386-fedora-cross-container
| 0:02:36  | 0:02:12   | 00:00:24 |  m68k-debian-cross-container
| 0:02:40  | 0:02:09   | 00:00:31 |  mips-debian-cross-container
| 0:02:38  | 0:02:09   | 00:00:29 |  mips64-debian-cross-container
| 0:04:28  | 0:02:48   | 00:01:40 |  mips64el-debian-cross-container
| 0:07:07  | 0:02:46   | 00:04:21 |  mipsel-debian-cross-container
| 0:03:51  | 0:02:21   | 00:01:30 |  powerpc-test-cross-container
| 0:08:52  | 0:03:00   | 00:05:52 |  ppc64el-debian-cross-container
| 0:06:07  | 0:02:49   | 00:03:18 |  python-container
| 0:04:37  | 0:02:26   | 00:02:11 |  riscv64-debian-cross-container
| 0:02:39  | 0:02:08   | 00:00:31 |  riscv64-debian-test-cross-container
| 0:08:03  | 0:03:00   | 00:05:03 |  s390x-debian-cross-container
| 0:02:34  | 0:02:08   | 00:00:26 |  sh4-debian-cross-container
| 0:02:37  | 0:02:09   | 00:00:28 |  sparc64-debian-cross-container
| 0:04:25  | 0:02:17   | 00:02:08 |  tricore-debian-cross-container
| 0:12:51  | 0:03:27   | 00:09:24 |  win32-fedora-cross-container
| 0:11:16  | 0:03:29   | 00:07:47 |  win64-fedora-cross-container
| 0:03:28  | 0:02:20   | 00:01:08 |  xtensa-debian-cross-container
                       | 01:57:30 |

We would need to devise a mechanism (not included here) to force the
re-build of the container images when needed, perhaps an environment
variable or even a whole new "container build" stage before the
"container" stage.

What do you think?

Fabiano Rosas (1):
  ci: Attempt to pull container images before building

 .gitlab-ci.d/container-template.yml | 1 +
 1 file changed, 1 insertion(+)

-- 
2.35.3
Re: [RFC PATCH 0/1] ci: Speed up container stage
Posted by Daniel P. Berrangé 1 year, 2 months ago
On Thu, Feb 23, 2023 at 11:21:53AM -0300, Fabiano Rosas wrote:
> I'm not sure if this was discussed previously, but I noticed we're not
> pulling the images we push to the registry at every pipeline run.
> 
> I would expect we don't actually need to rebuild container images at
> _every_ pipeline run, so I propose we add a "docker pull" to the
> container templates. We already have that for the docker-edk2|opensbi
> images.
> 
> Some containers can take a long time to build (14 mins) and pulling
> the image first without building can cut the time to about 3
> mins. With this we can save almost 2h of cumulative CI time per
> pipeline run:

The docker.py script that we're invoking is already pulling the
image itself eg to pick a random recent job:

  https://gitlab.com/qemu-project/qemu/-/jobs/3806090058

We can see

  $ ./tests/docker/docker.py --engine docker build -t "qemu/$NAME" -f "tests/docker/dockerfiles/$NAME.docker" -r $CI_REGISTRY/qemu-project/qemu 03:54
  Using default tag: latest
  latest: Pulling from qemu-project/qemu/qemu/debian-arm64-cross
  bb263680fed1: Pulling fs layer
  ...snip...

none the less it still went ahead and rebuilt the image from scratch
so something is going wrong here. I don't know why your change adding
an extra 'docker pull' would have any effect, given we're already
pulling, so I wonder if that's just coincidental apparent change
due to the initial state of your fork's container registery.

Whenever I look at this I end up wishing out docker.py didn't exist
and that we could just directly do

  - docker pull "$TAG"
  - docker build --cache-from "$TAG" --tag "$TAG" -f "tests/docker/$NAME.docker"

as that sould be sufficient to build the image with caching.

> We would need to devise a mechanism (not included here) to force the
> re-build of the container images when needed, perhaps an environment
> variable or even a whole new "container build" stage before the
> "container" stage.
> 
> What do you think?

We definitely want the rebuild to be cached. So whatever is
broken in that regard needs fixing, as this used to work AFAIK.


Ideally we would skip the container stage entirely for any
pull request that did NOT include changes to the dockerfile.

The problem is that the way we're using gitlab doesn't let
that work well. We need to setup rules based on filepath.
Such rules are totally unreliable for push events in
practice, because they only evaluate the delta between what
you just pushed and what was already available on the server.
This does not match the content of the pull request, it might
be just a subset.

If we had subsystem maintainers opening a merge request for
their submission, then we could reliably write rules based
on what files are changed by the pull request, and entirely
skip the containers stage most of the time, which would be
an even bigger saving.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [RFC PATCH 0/1] ci: Speed up container stage
Posted by Fabiano Rosas 1 year, 2 months ago
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Feb 23, 2023 at 11:21:53AM -0300, Fabiano Rosas wrote:
>> I'm not sure if this was discussed previously, but I noticed we're not
>> pulling the images we push to the registry at every pipeline run.
>> 
>> I would expect we don't actually need to rebuild container images at
>> _every_ pipeline run, so I propose we add a "docker pull" to the
>> container templates. We already have that for the docker-edk2|opensbi
>> images.
>> 
>> Some containers can take a long time to build (14 mins) and pulling
>> the image first without building can cut the time to about 3
>> mins. With this we can save almost 2h of cumulative CI time per
>> pipeline run:
>
> The docker.py script that we're invoking is already pulling the
> image itself eg to pick a random recent job:
>
>   https://gitlab.com/qemu-project/qemu/-/jobs/3806090058
>
> We can see
>
>   $ ./tests/docker/docker.py --engine docker build -t "qemu/$NAME" -f "tests/docker/dockerfiles/$NAME.docker" -r $CI_REGISTRY/qemu-project/qemu 03:54
>   Using default tag: latest
>   latest: Pulling from qemu-project/qemu/qemu/debian-arm64-cross
>   bb263680fed1: Pulling fs layer
>   ...snip...

Ah right, so this is different for user's pipelines because the push at
the end of the build goes to the user's registry:

registry.gitlab.com/farosas/qemu/qemu/debian-arm64-cross

So we're fetching from one place and pushing to a different one. That is
why I see the improvement.

> none the less it still went ahead and rebuilt the image from scratch

It seems docker.py does not see that we're trying to build a tag that is
already there. Could this be due to --cache-from being "disabled"?

        if ("docker" in self._command and
            "TRAVIS" not in os.environ and
            "GITLAB_CI" not in os.environ):
            os.environ["DOCKER_BUILDKIT"] = "1"
            self._buildkit = True
        else:
            self._buildkit = False


> so something is going wrong here. I don't know why your change adding
> an extra 'docker pull' would have any effect, given we're already
> pulling, so I wonder if that's just coincidental apparent change
> due to the initial state of your fork's container registery.
>
> Whenever I look at this I end up wishing out docker.py didn't exist
> and that we could just directly do
>
>   - docker pull "$TAG"
>   - docker build --cache-from "$TAG" --tag "$TAG" -f "tests/docker/$NAME.docker"

I see that in the docker-openbsi image we do just that.

> as that sould be sufficient to build the image with caching.
>
>> We would need to devise a mechanism (not included here) to force the
>> re-build of the container images when needed, perhaps an environment
>> variable or even a whole new "container build" stage before the
>> "container" stage.
>> 
>> What do you think?
>
> We definitely want the rebuild to be cached. So whatever is
> broken in that regard needs fixing, as this used to work AFAIK.
>
>
> Ideally we would skip the container stage entirely for any
> pull request that did NOT include changes to the dockerfile.

Agreed.
Re: [RFC PATCH 0/1] ci: Speed up container stage
Posted by Alex Bennée 1 year, 2 months ago
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Feb 23, 2023 at 11:21:53AM -0300, Fabiano Rosas wrote:
>> I'm not sure if this was discussed previously, but I noticed we're not
>> pulling the images we push to the registry at every pipeline run.
>> 
>> I would expect we don't actually need to rebuild container images at
>> _every_ pipeline run, so I propose we add a "docker pull" to the
>> container templates. We already have that for the docker-edk2|opensbi
>> images.
>> 
>> Some containers can take a long time to build (14 mins) and pulling
>> the image first without building can cut the time to about 3
>> mins. With this we can save almost 2h of cumulative CI time per
>> pipeline run:
>
> The docker.py script that we're invoking is already pulling the
> image itself eg to pick a random recent job:
>
>   https://gitlab.com/qemu-project/qemu/-/jobs/3806090058
>
> We can see
>
>   $ ./tests/docker/docker.py --engine docker build -t "qemu/$NAME" -f
> "tests/docker/dockerfiles/$NAME.docker" -r
> $CI_REGISTRY/qemu-project/qemu 03:54
>   Using default tag: latest
>   latest: Pulling from qemu-project/qemu/qemu/debian-arm64-cross
>   bb263680fed1: Pulling fs layer
>   ...snip...
>
> none the less it still went ahead and rebuilt the image from scratch
> so something is going wrong here. I don't know why your change adding
> an extra 'docker pull' would have any effect, given we're already
> pulling, so I wonder if that's just coincidental apparent change
> due to the initial state of your fork's container registery.
>
> Whenever I look at this I end up wishing out docker.py didn't exist
> and that we could just directly do
>
>   - docker pull "$TAG"
>   - docker build --cache-from "$TAG" --tag "$TAG" -f "tests/docker/$NAME.docker"
>
> as that sould be sufficient to build the image with caching.

I think we should be ready to do that now as we have flattened all our
dockerfiles. The only other thing that docker.py does is nicely add a
final step for the current user so you can ensure all files generated in
docker cross compile images are still readable on the host.

>> We would need to devise a mechanism (not included here) to force the
>> re-build of the container images when needed, perhaps an environment
>> variable or even a whole new "container build" stage before the
>> "container" stage.
>> 
>> What do you think?
>
> We definitely want the rebuild to be cached. So whatever is
> broken in that regard needs fixing, as this used to work AFAIK.
>
>
> Ideally we would skip the container stage entirely for any
> pull request that did NOT include changes to the dockerfile.

That would be ideal.

> The problem is that the way we're using gitlab doesn't let
> that work well. We need to setup rules based on filepath.
> Such rules are totally unreliable for push events in
> practice, because they only evaluate the delta between what
> you just pushed and what was already available on the server.
> This does not match the content of the pull request, it might
> be just a subset.
>
> If we had subsystem maintainers opening a merge request for
> their submission, then we could reliably write rules based
> on what files are changed by the pull request, and entirely
> skip the containers stage most of the time, which would be
> an even bigger saving.

Our first tentative steps away from an email process?

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
Re: [RFC PATCH 0/1] ci: Speed up container stage
Posted by Fabiano Rosas 1 year, 2 months ago
Hi Alex,

> Daniel P. Berrangé <berrange@redhat.com> writes:
>
>> On Thu, Feb 23, 2023 at 11:21:53AM -0300, Fabiano Rosas wrote:
>>> I'm not sure if this was discussed previously, but I noticed we're not
>>> pulling the images we push to the registry at every pipeline run.
>>> 
>>> I would expect we don't actually need to rebuild container images at
>>> _every_ pipeline run, so I propose we add a "docker pull" to the
>>> container templates. We already have that for the docker-edk2|opensbi
>>> images.
>>> 
>>> Some containers can take a long time to build (14 mins) and pulling
>>> the image first without building can cut the time to about 3
>>> mins. With this we can save almost 2h of cumulative CI time per
>>> pipeline run:
>>
>> The docker.py script that we're invoking is already pulling the
>> image itself eg to pick a random recent job:
>>
>>   https://gitlab.com/qemu-project/qemu/-/jobs/3806090058
>>
>> We can see
>>
>>   $ ./tests/docker/docker.py --engine docker build -t "qemu/$NAME" -f
>> "tests/docker/dockerfiles/$NAME.docker" -r
>> $CI_REGISTRY/qemu-project/qemu 03:54
>>   Using default tag: latest
>>   latest: Pulling from qemu-project/qemu/qemu/debian-arm64-cross
>>   bb263680fed1: Pulling fs layer
>>   ...snip...
>>
>> none the less it still went ahead and rebuilt the image from scratch
>> so something is going wrong here. I don't know why your change adding
>> an extra 'docker pull' would have any effect, given we're already
>> pulling, so I wonder if that's just coincidental apparent change
>> due to the initial state of your fork's container registery.
>>
>> Whenever I look at this I end up wishing out docker.py didn't exist
>> and that we could just directly do
>>
>>   - docker pull "$TAG"
>>   - docker build --cache-from "$TAG" --tag "$TAG" -f "tests/docker/$NAME.docker"
>>
>> as that sould be sufficient to build the image with caching.
>
> I think we should be ready to do that now as we have flattened all our
> dockerfiles. The only other thing that docker.py does is nicely add a
> final step for the current user so you can ensure all files generated in
> docker cross compile images are still readable on the host.
>

Just so you know this command line worked:

docker build --cache-from $TAG --tag $TAG --build-arg BUILDKIT_INLINE_CACHE=1 \
  -f "tests/docker/dockerfiles/$NAME.docker" "."

building the cache: https://gitlab.com/farosas/qemu/-/jobs/3825838177
using the cache:    https://gitlab.com/farosas/qemu/-/jobs/3825926944

But we might still have this issue:

commit 6ddc3dc7a882f2e7200fa7fecf505a8d0d8bbea9
Author: Daniel P. Berrangé <berrange@redhat.com>
Date:   Fri Jul 9 15:29:35 2021 +0100

    tests/docker: don't use BUILDKIT in GitLab either
    
    Using BUILDKIT breaks with certain container registries such as CentOS,
    with docker build reporting an error such as
    
      failed to solve with frontend dockerfile.v0:
      failed to build LLB: failed to load cache key:
      unexpected status code
      https://registry.centos.org/v2/centos/manifests/7:
      403 Forbidden

We might need to go the route of skipping the docker build when the
docker pull succeeds.
Re: [RFC PATCH 0/1] ci: Speed up container stage
Posted by Daniel P. Berrangé 1 year, 2 months ago
On Thu, Feb 23, 2023 at 05:30:58PM -0300, Fabiano Rosas wrote:
> 
> Hi Alex,
> 
> > Daniel P. Berrangé <berrange@redhat.com> writes:
> >
> >> On Thu, Feb 23, 2023 at 11:21:53AM -0300, Fabiano Rosas wrote:
> >>> I'm not sure if this was discussed previously, but I noticed we're not
> >>> pulling the images we push to the registry at every pipeline run.
> >>> 
> >>> I would expect we don't actually need to rebuild container images at
> >>> _every_ pipeline run, so I propose we add a "docker pull" to the
> >>> container templates. We already have that for the docker-edk2|opensbi
> >>> images.
> >>> 
> >>> Some containers can take a long time to build (14 mins) and pulling
> >>> the image first without building can cut the time to about 3
> >>> mins. With this we can save almost 2h of cumulative CI time per
> >>> pipeline run:
> >>
> >> The docker.py script that we're invoking is already pulling the
> >> image itself eg to pick a random recent job:
> >>
> >>   https://gitlab.com/qemu-project/qemu/-/jobs/3806090058
> >>
> >> We can see
> >>
> >>   $ ./tests/docker/docker.py --engine docker build -t "qemu/$NAME" -f
> >> "tests/docker/dockerfiles/$NAME.docker" -r
> >> $CI_REGISTRY/qemu-project/qemu 03:54
> >>   Using default tag: latest
> >>   latest: Pulling from qemu-project/qemu/qemu/debian-arm64-cross
> >>   bb263680fed1: Pulling fs layer
> >>   ...snip...
> >>
> >> none the less it still went ahead and rebuilt the image from scratch
> >> so something is going wrong here. I don't know why your change adding
> >> an extra 'docker pull' would have any effect, given we're already
> >> pulling, so I wonder if that's just coincidental apparent change
> >> due to the initial state of your fork's container registery.
> >>
> >> Whenever I look at this I end up wishing out docker.py didn't exist
> >> and that we could just directly do
> >>
> >>   - docker pull "$TAG"
> >>   - docker build --cache-from "$TAG" --tag "$TAG" -f "tests/docker/$NAME.docker"
> >>
> >> as that sould be sufficient to build the image with caching.
> >
> > I think we should be ready to do that now as we have flattened all our
> > dockerfiles. The only other thing that docker.py does is nicely add a
> > final step for the current user so you can ensure all files generated in
> > docker cross compile images are still readable on the host.
> >
> 
> Just so you know this command line worked:
> 
> docker build --cache-from $TAG --tag $TAG --build-arg BUILDKIT_INLINE_CACHE=1 \
>   -f "tests/docker/dockerfiles/$NAME.docker" "."
> 
> building the cache: https://gitlab.com/farosas/qemu/-/jobs/3825838177
> using the cache:    https://gitlab.com/farosas/qemu/-/jobs/3825926944
> 
> But we might still have this issue:
> 
> commit 6ddc3dc7a882f2e7200fa7fecf505a8d0d8bbea9
> Author: Daniel P. Berrangé <berrange@redhat.com>
> Date:   Fri Jul 9 15:29:35 2021 +0100
> 
>     tests/docker: don't use BUILDKIT in GitLab either
>     
>     Using BUILDKIT breaks with certain container registries such as CentOS,
>     with docker build reporting an error such as
>     
>       failed to solve with frontend dockerfile.v0:
>       failed to build LLB: failed to load cache key:
>       unexpected status code
>       https://registry.centos.org/v2/centos/manifests/7:
>       403 Forbidden
> 
> We might need to go the route of skipping the docker build when the
> docker pull succeeds.

I don't recall what made QEMU special in that respect, but we never
set this variable in any libvirt project and didn't see any obvious
problems. So I'm inclined to say it was something the docker.py
does that's caused the problem

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|