.gitlab-ci.d/containers.yml | 7 ---- .gitlab-ci.yml | 74 +++++++++++++++++++++++++++++++++---- 2 files changed, 66 insertions(+), 15 deletions(-)
This series fixes a problem with our gitlab CI rules that cause container builds to be skipped. See the commit description in the first patch for the details on this problem. The overall result of this series though is a small increase in overall pipeline time. Previously - When container jobs are skipped: approx 1hr 5 mins - When container jobs are run, cached by docker: approx 1hr 15 mins - When container jobs are run, not cached by docker: approx 1hr 30 mins With this series applied the first scenario no longer exists, so all piplines are either 1hr 15 or 1hr 30 depending on whether the container phase is skipped. On the plus side the builds are more reliable as we're actually building container images at correct times. There is still a race condition though where build jobs can run with the wrong containers. This happens if you push two different branches to gitlab with different docker file content. If the container jobs for the 2nd branch finish before the 1st branch runs its build jobs, the 1st branch can end up using containers fro the second branch. The only fix to truely fix that would be to stop using "latest" docker tag and always use a tag based on the branch name. This would mean we build up a growing set of docker images in the gitlab registry. At least this series is much more correct that what exists in git currently. Daniel P. Berrangé (3): gitlab: always build container images gitlab: add fine grained job deps for all build jobs gitlab: fix inconsistent indentation .gitlab-ci.d/containers.yml | 7 ---- .gitlab-ci.yml | 74 +++++++++++++++++++++++++++++++++---- 2 files changed, 66 insertions(+), 15 deletions(-) -- 2.29.2
On Mon, Feb 08, 2021 at 04:33:36PM +0000, Daniel P. Berrangé wrote: > This series fixes a problem with our gitlab CI rules that cause > container builds to be skipped. See the commit description in the > first patch for the details on this problem. > > The overall result of this series though is a small increase in overall > pipeline time. > > Previously > > - When container jobs are skipped: approx 1hr 5 mins > - When container jobs are run, cached by docker: approx 1hr 15 mins > - When container jobs are run, not cached by docker: approx 1hr 30 mins > > With this series applied the first scenario no longer exists, so > all piplines are either 1hr 15 or 1hr 30 depending on whether the > container phase is skipped. I mean to say the biggest problem I see is the cross-win64-system job. This consumes 1 hour 5 minutes all on its own. It is at least 15 minutes longer that every other job AFAICT. So no matter how well we parallelize stuff, 1 hr 5 is a hard lower limit on pipeline duration right now. We might want to consider how to split the win64 job or cut down what it does in some way ? Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 2/8/21 6:22 PM, Daniel P. Berrangé wrote: > On Mon, Feb 08, 2021 at 04:33:36PM +0000, Daniel P. Berrangé wrote: >> This series fixes a problem with our gitlab CI rules that cause >> container builds to be skipped. See the commit description in the >> first patch for the details on this problem. >> >> The overall result of this series though is a small increase in overall >> pipeline time. >> >> Previously >> >> - When container jobs are skipped: approx 1hr 5 mins >> - When container jobs are run, cached by docker: approx 1hr 15 mins >> - When container jobs are run, not cached by docker: approx 1hr 30 mins >> >> With this series applied the first scenario no longer exists, so >> all piplines are either 1hr 15 or 1hr 30 depending on whether the >> container phase is skipped. > > I mean to say the biggest problem I see is the cross-win64-system > job. This consumes 1 hour 5 minutes all on its own. It is at least > 15 minutes longer that every other job AFAICT. So no matter how > well we parallelize stuff, 1 hr 5 is a hard lower limit on pipeline > duration right now. > > We might want to consider how to split the win64 job or cut down > what it does in some way ? When the win64 build was added (on Debian), it was to mostly to cover the WHPX. Later we moved mingw jobs to Fedora. I just checked and WHPX is no more built, and nobody complained, so it is not relevant anymore. I don't mind much what you do with the Gitlab win64 job, as this config is better covered on Cirrus. I'd like to keep the win32 job because it has been useful to catch 32-bit issues. Regards, Phil.
On Mon, Feb 08, 2021 at 07:08:39PM +0100, Philippe Mathieu-Daudé wrote: > On 2/8/21 6:22 PM, Daniel P. Berrangé wrote: > > On Mon, Feb 08, 2021 at 04:33:36PM +0000, Daniel P. Berrangé wrote: > >> This series fixes a problem with our gitlab CI rules that cause > >> container builds to be skipped. See the commit description in the > >> first patch for the details on this problem. > >> > >> The overall result of this series though is a small increase in overall > >> pipeline time. > >> > >> Previously > >> > >> - When container jobs are skipped: approx 1hr 5 mins > >> - When container jobs are run, cached by docker: approx 1hr 15 mins > >> - When container jobs are run, not cached by docker: approx 1hr 30 mins > >> > >> With this series applied the first scenario no longer exists, so > >> all piplines are either 1hr 15 or 1hr 30 depending on whether the > >> container phase is skipped. > > > > I mean to say the biggest problem I see is the cross-win64-system > > job. This consumes 1 hour 5 minutes all on its own. It is at least > > 15 minutes longer that every other job AFAICT. So no matter how > > well we parallelize stuff, 1 hr 5 is a hard lower limit on pipeline > > duration right now. > > > > We might want to consider how to split the win64 job or cut down > > what it does in some way ? > > When the win64 build was added (on Debian), it was to mostly to cover > the WHPX. Later we moved mingw jobs to Fedora. I just checked and > WHPX is no more built, and nobody complained, so it is not relevant > anymore. > > I don't mind much what you do with the Gitlab win64 job, as this config > is better covered on Cirrus. I'd like to keep the win32 job because it > has been useful to catch 32-bit issues. I'm not suggesting we remove it. Most developers won't setup Cirrus CI, so will only run GitLab CI jobs. IME it is good to have both win32 and win64 coverage because things do break differently on them - especially if you use bad printf format strings that are not independant of host word size Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Am 08.02.21 um 19:12 schrieb Daniel P. Berrangé: > On Mon, Feb 08, 2021 at 07:08:39PM +0100, Philippe Mathieu-Daudé wrote: >> On 2/8/21 6:22 PM, Daniel P. Berrangé wrote: >>> On Mon, Feb 08, 2021 at 04:33:36PM +0000, Daniel P. Berrangé wrote: >>>> This series fixes a problem with our gitlab CI rules that cause >>>> container builds to be skipped. See the commit description in the >>>> first patch for the details on this problem. >>>> >>>> The overall result of this series though is a small increase in overall >>>> pipeline time. >>>> >>>> Previously >>>> >>>> - When container jobs are skipped: approx 1hr 5 mins >>>> - When container jobs are run, cached by docker: approx 1hr 15 mins >>>> - When container jobs are run, not cached by docker: approx 1hr 30 mins >>>> >>>> With this series applied the first scenario no longer exists, so >>>> all piplines are either 1hr 15 or 1hr 30 depending on whether the >>>> container phase is skipped. >>> I mean to say the biggest problem I see is the cross-win64-system >>> job. This consumes 1 hour 5 minutes all on its own. It is at least >>> 15 minutes longer that every other job AFAICT. So no matter how >>> well we parallelize stuff, 1 hr 5 is a hard lower limit on pipeline >>> duration right now. >>> >>> We might want to consider how to split the win64 job or cut down >>> what it does in some way ? >> When the win64 build was added (on Debian), it was to mostly to cover >> the WHPX. Later we moved mingw jobs to Fedora. I just checked and >> WHPX is no more built, and nobody complained, so it is not relevant >> anymore. >> >> I don't mind much what you do with the Gitlab win64 job, as this config >> is better covered on Cirrus. I'd like to keep the win32 job because it >> has been useful to catch 32-bit issues. > I'm not suggesting we remove it. Most developers won't setup Cirrus CI, > so will only run GitLab CI jobs. IME it is good to have both win32 > and win64 coverage because things do break differently on them - especially > if you use bad printf format strings that are not independant of host > word size I would not say that something is not relevant just because nobody complains. Nobody would miss any CI if everything were always fine. So people would miss WHPX CI as soon as there are changes (which happen infrequently) and something breaks. WHPX should be covered by the w64 build, and as many as possible other features where I currently see a "NO" in the configure output as well. Nevertheless I don't think that each CI job must run frequently. Each run not only costs time, but also energy, and contributes to our climate change. I think that for the win32 and win64 jobs it would be sufficient to run them weekly, maybe even alternating if that is possible. Stefan
On 09/02/2021 07.01, Stefan Weil wrote: > Am 08.02.21 um 19:12 schrieb Daniel P. Berrangé: > >> On Mon, Feb 08, 2021 at 07:08:39PM +0100, Philippe Mathieu-Daudé wrote: >>> On 2/8/21 6:22 PM, Daniel P. Berrangé wrote: >>>> On Mon, Feb 08, 2021 at 04:33:36PM +0000, Daniel P. Berrangé wrote: >>>>> This series fixes a problem with our gitlab CI rules that cause >>>>> container builds to be skipped. See the commit description in the >>>>> first patch for the details on this problem. >>>>> >>>>> The overall result of this series though is a small increase in overall >>>>> pipeline time. >>>>> >>>>> Previously >>>>> >>>>> - When container jobs are skipped: approx 1hr 5 mins >>>>> - When container jobs are run, cached by docker: approx 1hr 15 mins >>>>> - When container jobs are run, not cached by docker: approx 1hr 30 mins >>>>> >>>>> With this series applied the first scenario no longer exists, so >>>>> all piplines are either 1hr 15 or 1hr 30 depending on whether the >>>>> container phase is skipped. >>>> I mean to say the biggest problem I see is the cross-win64-system >>>> job. This consumes 1 hour 5 minutes all on its own. It is at least >>>> 15 minutes longer that every other job AFAICT. So no matter how >>>> well we parallelize stuff, 1 hr 5 is a hard lower limit on pipeline >>>> duration right now. >>>> >>>> We might want to consider how to split the win64 job or cut down >>>> what it does in some way ? >>> When the win64 build was added (on Debian), it was to mostly to cover >>> the WHPX. Later we moved mingw jobs to Fedora. I just checked and >>> WHPX is no more built, and nobody complained, so it is not relevant >>> anymore. >>> >>> I don't mind much what you do with the Gitlab win64 job, as this config >>> is better covered on Cirrus. I'd like to keep the win32 job because it >>> has been useful to catch 32-bit issues. >> I'm not suggesting we remove it. Most developers won't setup Cirrus CI, >> so will only run GitLab CI jobs. IME it is good to have both win32 >> and win64 coverage because things do break differently on them - especially >> if you use bad printf format strings that are not independant of host >> word size > > > I would not say that something is not relevant just because nobody > complains. Nobody would miss any CI if everything were always fine. So > people would miss WHPX CI as soon as there are changes (which happen > infrequently) and something breaks. WHPX should be covered by the w64 build, > and as many as possible other features where I currently see a "NO" in the > configure output as well. Yes, I agree, we should add WHPX there again. Could somebody please check whether the headers are already available in the latest Fedora? Then we might simply switch the container to use the latest version of Fedora instead. Otherwise we should install the headers manually there, like we did in commit d3dd34a1e5e134 for the now-removed Debian container. > Nevertheless I don't think that each CI job must run frequently. Each run > not only costs time, but also energy, and contributes to our climate change. > > I think that for the win32 and win64 jobs it would be sufficient to run them > weekly, maybe even alternating if that is possible. Maybe it would be sufficient to run those jobs only on tags (so that they get checked for pull requests) and on the master and staging branch? Thomas
On Tue, Feb 09, 2021 at 07:01:49AM +0100, Stefan Weil wrote: > Am 08.02.21 um 19:12 schrieb Daniel P. Berrangé: > > > On Mon, Feb 08, 2021 at 07:08:39PM +0100, Philippe Mathieu-Daudé wrote: > > > On 2/8/21 6:22 PM, Daniel P. Berrangé wrote: > > > > On Mon, Feb 08, 2021 at 04:33:36PM +0000, Daniel P. Berrangé wrote: > > > > > This series fixes a problem with our gitlab CI rules that cause > > > > > container builds to be skipped. See the commit description in the > > > > > first patch for the details on this problem. > > > > > > > > > > The overall result of this series though is a small increase in overall > > > > > pipeline time. > > > > > > > > > > Previously > > > > > > > > > > - When container jobs are skipped: approx 1hr 5 mins > > > > > - When container jobs are run, cached by docker: approx 1hr 15 mins > > > > > - When container jobs are run, not cached by docker: approx 1hr 30 mins > > > > > > > > > > With this series applied the first scenario no longer exists, so > > > > > all piplines are either 1hr 15 or 1hr 30 depending on whether the > > > > > container phase is skipped. > > > > I mean to say the biggest problem I see is the cross-win64-system > > > > job. This consumes 1 hour 5 minutes all on its own. It is at least > > > > 15 minutes longer that every other job AFAICT. So no matter how > > > > well we parallelize stuff, 1 hr 5 is a hard lower limit on pipeline > > > > duration right now. > > > > > > > > We might want to consider how to split the win64 job or cut down > > > > what it does in some way ? > > > When the win64 build was added (on Debian), it was to mostly to cover > > > the WHPX. Later we moved mingw jobs to Fedora. I just checked and > > > WHPX is no more built, and nobody complained, so it is not relevant > > > anymore. > > > > > > I don't mind much what you do with the Gitlab win64 job, as this config > > > is better covered on Cirrus. I'd like to keep the win32 job because it > > > has been useful to catch 32-bit issues. > > I'm not suggesting we remove it. Most developers won't setup Cirrus CI, > > so will only run GitLab CI jobs. IME it is good to have both win32 > > and win64 coverage because things do break differently on them - especially > > if you use bad printf format strings that are not independant of host > > word size > > > I would not say that something is not relevant just because nobody > complains. Nobody would miss any CI if everything were always fine. So > people would miss WHPX CI as soon as there are changes (which happen > infrequently) and something breaks. WHPX should be covered by the w64 build, > and as many as possible other features where I currently see a "NO" in the > configure output as well. > > Nevertheless I don't think that each CI job must run frequently. Each run > not only costs time, but also energy, and contributes to our climate change. > > I think that for the win32 and win64 jobs it would be sufficient to run them > weekly, maybe even alternating if that is possible. I think that would be a major step backwards. Not only do we need these jobs to be gating for merges into git master to prevent regressions getting merged, but we want want to stop contributors and subsystem maintainers sending broken patch series. Not running them every time will loose these protections. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 2/8/21 5:33 PM, Daniel P. Berrangé wrote: > This series fixes a problem with our gitlab CI rules that cause > container builds to be skipped. See the commit description in the > first patch for the details on this problem. ... > There is still a race condition though where build jobs can run > with the wrong containers. This happens if you push two different > branches to gitlab with different docker file content. If the > container jobs for the 2nd branch finish before the 1st > branch runs its build jobs, the 1st branch can end up using > containers fro the second branch. The only fix to truely fix > that would be to stop using "latest" docker tag and always > use a tag based on the branch name. This would mean we build > up a growing set of docker images in the gitlab registry. OK this indeed describes the problem I'm facing. > At least this series is much more correct that what exists in > git currently. Good, I'll test it then. Regards, Phil.
© 2016 - 2024 Red Hat, Inc.