[v1] automation: introduce an x86_64 Dom0/DomU test

[PATCH 3/3] automation: add a QEMU based x86_64 Dom0/DomU test

Posted by Stefano Stabellini 4 years, 3 months ago

From: Stefano Stabellini <stefano.stabellini@xilinx.com>

Introduce a test based on QEMU to run Xen, Dom0 and start a DomU.
This is similar to the existing qemu-alpine-arm64.sh script and test.
The only differences are:
- use Debian's qemu-system-x86_64 (on ARM we build our own)
- use ipxe instead of u-boot and ImageBuilder

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
---
 automation/gitlab-ci/test.yaml           | 24 +++++++
 automation/scripts/qemu-alpine-x86_64.sh | 92 ++++++++++++++++++++++++
 2 files changed, 116 insertions(+)
 create mode 100644 automation/scripts/qemu-alpine-x86_64.sh

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 91a10febbf..c1d67ec4b5 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -47,6 +47,30 @@ qemu-alpine-arm64-gcc:
     - /^coverity-tested\/.*/
     - /^stable-.*/
 
+qemu-alpine-x86_64-gcc:
+  stage: test
+  image: registry.gitlab.com/xen-project/xen/${CONTAINER}
+  variables:
+    CONTAINER: debian:unstable
+  script:
+    - ./automation/scripts/qemu-alpine-x86_64.sh 2>&1 | tee qemu-smoke-arm64.log
+  dependencies:
+    - alpine-3.12-gcc
+    - alpine-3.12-rootfs-export
+    - kernel-5.10.74-export
+  artifacts:
+    paths:
+      - smoke.serial
+      - '*.log'
+    when: always
+  tags:
+    - x86_64
+  except:
+    - master
+    - smoke
+    - /^coverity-tested\/.*/
+    - /^stable-.*/
+
 qemu-smoke-arm64-gcc:
   stage: test
   image: registry.gitlab.com/xen-project/xen/${CONTAINER}
diff --git a/automation/scripts/qemu-alpine-x86_64.sh b/automation/scripts/qemu-alpine-x86_64.sh
new file mode 100644
index 0000000000..41b05210d6
--- /dev/null
+++ b/automation/scripts/qemu-alpine-x86_64.sh
@@ -0,0 +1,92 @@
+#!/bin/bash
+
+set -ex
+
+apt-get -qy update
+apt-get -qy install --no-install-recommends qemu-system-x86 \
+                                            cpio \
+                                            curl \
+                                            busybox-static
+
+# DomU Busybox
+cd binaries
+mkdir -p initrd
+mkdir -p initrd/bin
+mkdir -p initrd/sbin
+mkdir -p initrd/etc
+mkdir -p initrd/dev
+mkdir -p initrd/proc
+mkdir -p initrd/sys
+mkdir -p initrd/lib
+mkdir -p initrd/var
+mkdir -p initrd/mnt
+cp /bin/busybox initrd/bin/busybox
+initrd/bin/busybox --install initrd/bin
+echo "#!/bin/sh
+
+mount -t proc proc /proc
+mount -t sysfs sysfs /sys
+mount -t devtmpfs devtmpfs /dev
+/bin/sh" > initrd/init
+chmod +x initrd/init
+cd initrd
+find . | cpio --create --format='newc' | gzip > ../initrd.cpio.gz
+cd ..
+
+mkdir -p rootfs
+cd rootfs
+tar xvzf ../initrd.tar.gz
+mkdir proc
+mkdir run
+mkdir srv
+mkdir sys
+rm var/run
+cp -ar ../dist/install/* .
+mv ../initrd.cpio.gz ./root
+cp ../bzImage ./root
+echo "name=\"test\"
+memory=512
+vcpus=1
+kernel=\"/root/bzImage\"
+ramdisk=\"/root/initrd.cpio.gz\"
+extra=\"console=hvc0 root=/dev/ram0 rdinit=/bin/sh\"
+" > root/test.cfg
+echo "#!/bin/bash
+
+export LD_LIBRARY_PATH=/usr/local/lib
+bash /etc/init.d/xencommons start
+
+xl list
+
+xl create -c /root/test.cfg
+
+" > etc/local.d/xen.start
+chmod +x etc/local.d/xen.start
+echo "rc_verbose=yes" >> etc/rc.conf
+find . |cpio -H newc -o|gzip > ../xen-rootfs.cpio.gz
+cd ../..
+
+cat >> binaries/pxelinux.0 <<- EOF
+#!ipxe
+
+kernel xen console=com1
+module bzImage console=hvc0
+module xen-rootfs.cpio.gz
+boot
+EOF
+
+# Run the test
+rm -f smoke.serial
+set +e
+timeout -k 1 720 \
+qemu-system-x86_64 \
+    -cpu qemu64,+svm \
+    -m 2G -smp 2 \
+    -monitor none -serial stdio \
+    -nographic \
+    -device virtio-net-pci,netdev=n0 \
+    -netdev user,id=n0,tftp=binaries,bootfile=/pxelinux.0 |& tee smoke.serial
+
+set -e
+(grep -q "Domain-0" smoke.serial && grep -q "BusyBox" smoke.serial) || exit 1
+exit 0
-- 
2.17.1

Re: [PATCH 3/3] automation: add a QEMU based x86_64 Dom0/DomU test

Posted by Anthony PERARD 4 years, 3 months ago

On Thu, Oct 21, 2021 at 04:08:39PM -0700, Stefano Stabellini wrote:
> diff --git a/automation/scripts/qemu-alpine-x86_64.sh b/automation/scripts/qemu-alpine-x86_64.sh
> new file mode 100644
> index 0000000000..41b05210d6
> --- /dev/null
> +++ b/automation/scripts/qemu-alpine-x86_64.sh
> @@ -0,0 +1,92 @@
> +#!/bin/bash
> +
> +set -ex
> +
> +apt-get -qy update
> +apt-get -qy install --no-install-recommends qemu-system-x86 \
> +                                            cpio \
> +                                            curl \
> +                                            busybox-static

Please, don't install packages during the CI job. If you need new
packages, update the container.

That said, "curl" doesn't seems to be needed.

> +# DomU Busybox
> +cd binaries
> +mkdir -p initrd
> +mkdir -p initrd/bin
> +mkdir -p initrd/sbin
> +mkdir -p initrd/etc
> +mkdir -p initrd/dev
> +mkdir -p initrd/proc
> +mkdir -p initrd/sys
> +mkdir -p initrd/lib
> +mkdir -p initrd/var
> +mkdir -p initrd/mnt
> +cp /bin/busybox initrd/bin/busybox
> +initrd/bin/busybox --install initrd/bin
> +echo "#!/bin/sh
> +
> +mount -t proc proc /proc
> +mount -t sysfs sysfs /sys
> +mount -t devtmpfs devtmpfs /dev
> +/bin/sh" > initrd/init
> +chmod +x initrd/init
> +cd initrd
> +find . | cpio --create --format='newc' | gzip > ../initrd.cpio.gz

This isn't confusing at all, depending on the tool used to make an
archive, the resulting initrd has a different purpose :-).
    initrd.tar.gz -> dom0
    initrd.cpio.gz -> domU

> +cd ..

Maybe add a comment here saying that we are now preparing dom0 root
filesystem? (as there is one for domu)

> +mkdir -p rootfs
> +cd rootfs
> +tar xvzf ../initrd.tar.gz
> +mkdir proc
> +mkdir run
> +mkdir srv
> +mkdir sys
> +rm var/run
> +cp -ar ../dist/install/* .
> +mv ../initrd.cpio.gz ./root
> +cp ../bzImage ./root
> +echo "name=\"test\"
> +memory=512
> +vcpus=1
> +kernel=\"/root/bzImage\"
> +ramdisk=\"/root/initrd.cpio.gz\"
> +extra=\"console=hvc0 root=/dev/ram0 rdinit=/bin/sh\"
> +" > root/test.cfg
> +echo "#!/bin/bash
> +

Maybe add `set -x` ?

> +export LD_LIBRARY_PATH=/usr/local/lib
> +bash /etc/init.d/xencommons start
> +
> +xl list
> +
> +xl create -c /root/test.cfg
> +
> +" > etc/local.d/xen.start
> +chmod +x etc/local.d/xen.start
> +echo "rc_verbose=yes" >> etc/rc.conf
> +find . |cpio -H newc -o|gzip > ../xen-rootfs.cpio.gz
> +cd ../..
> +
> +cat >> binaries/pxelinux.0 <<- EOF

So, I've look at <<- meaning as I never used that before and it seems to
remove all leading tab chr from the input, yet they are no tab in the
input. So maybe use just <<EOF instead, without the dash -.

> +#!ipxe
> +
> +kernel xen console=com1
> +module bzImage console=hvc0
> +module xen-rootfs.cpio.gz
> +boot
> +EOF

Thanks,

-- 
Anthony PERARD

Re: [PATCH 3/3] automation: add a QEMU based x86_64 Dom0/DomU test

Posted by Stefano Stabellini 4 years, 3 months ago

On Fri, 22 Oct 2021, Anthony PERARD wrote:
> On Thu, Oct 21, 2021 at 04:08:39PM -0700, Stefano Stabellini wrote:
> > diff --git a/automation/scripts/qemu-alpine-x86_64.sh b/automation/scripts/qemu-alpine-x86_64.sh
> > new file mode 100644
> > index 0000000000..41b05210d6
> > --- /dev/null
> > +++ b/automation/scripts/qemu-alpine-x86_64.sh
> > @@ -0,0 +1,92 @@
> > +#!/bin/bash
> > +
> > +set -ex
> > +
> > +apt-get -qy update
> > +apt-get -qy install --no-install-recommends qemu-system-x86 \
> > +                                            cpio \
> > +                                            curl \
> > +                                            busybox-static
> 
> Please, don't install packages during the CI job. If you need new
> packages, update the container.

The container used to run this script is the one used for the Xen build:
automation/build/debian/unstable.dockerfile, AKA
registry.gitlab.com/xen-project/xen/debian:unstable. We don't have a
specific container for the sole purpose of running tests.

Thus, I could add qemu-system-x86 to
automation/build/debian/unstable.dockerfile, but then we increase the
size of the debian unstable build container needlessly for all related
build jobs.

Or we could add one more special container just for running tests, but
then it is one more container to maintain and keep up-to-date.

This is why I chose as a compromise to keep the underlying container as
is, and just apt-get the extra 3-4 packages here. It is the same thing
we do on ARM: automation/scripts/qemu-alpine-arm64.sh. Also keep in mind
that this job is run in the "test" step where we have far fewer jobs at
the moment and the runners are not busy. (It would be different in the
"build" step where we have many jobs.)

I am not entirely sure what is the best solution overall, but for this
series at this stage I would prefer to keep the same strategy used for
the ARM tests (i.e. reuse the debian unstable build container and
apt-get the few missing packages.) If we do change the way we do it, I
would rather change both x86 and ARM at the same time.

> That said, "curl" doesn't seems to be needed.

Yeah I'll take away curl.

> > +# DomU Busybox
> > +cd binaries
> > +mkdir -p initrd
> > +mkdir -p initrd/bin
> > +mkdir -p initrd/sbin
> > +mkdir -p initrd/etc
> > +mkdir -p initrd/dev
> > +mkdir -p initrd/proc
> > +mkdir -p initrd/sys
> > +mkdir -p initrd/lib
> > +mkdir -p initrd/var
> > +mkdir -p initrd/mnt
> > +cp /bin/busybox initrd/bin/busybox
> > +initrd/bin/busybox --install initrd/bin
> > +echo "#!/bin/sh
> > +
> > +mount -t proc proc /proc
> > +mount -t sysfs sysfs /sys
> > +mount -t devtmpfs devtmpfs /dev
> > +/bin/sh" > initrd/init
> > +chmod +x initrd/init
> > +cd initrd
> > +find . | cpio --create --format='newc' | gzip > ../initrd.cpio.gz
> 
> This isn't confusing at all, depending on the tool used to make an
> archive, the resulting initrd has a different purpose :-).
>     initrd.tar.gz -> dom0
>     initrd.cpio.gz -> domU
> 
> > +cd ..
> 
> Maybe add a comment here saying that we are now preparing dom0 root
> filesystem? (as there is one for domu)

Sure I can add a comment

> > +mkdir -p rootfs
> > +cd rootfs
> > +tar xvzf ../initrd.tar.gz
> > +mkdir proc
> > +mkdir run
> > +mkdir srv
> > +mkdir sys
> > +rm var/run
> > +cp -ar ../dist/install/* .
> > +mv ../initrd.cpio.gz ./root
> > +cp ../bzImage ./root
> > +echo "name=\"test\"
> > +memory=512
> > +vcpus=1
> > +kernel=\"/root/bzImage\"
> > +ramdisk=\"/root/initrd.cpio.gz\"
> > +extra=\"console=hvc0 root=/dev/ram0 rdinit=/bin/sh\"
> > +" > root/test.cfg
> > +echo "#!/bin/bash
> > +
> 
> Maybe add `set -x` ?

OK

> > +export LD_LIBRARY_PATH=/usr/local/lib
> > +bash /etc/init.d/xencommons start
> > +
> > +xl list
> > +
> > +xl create -c /root/test.cfg
> > +
> > +" > etc/local.d/xen.start
> > +chmod +x etc/local.d/xen.start
> > +echo "rc_verbose=yes" >> etc/rc.conf
> > +find . |cpio -H newc -o|gzip > ../xen-rootfs.cpio.gz
> > +cd ../..
> > +
> > +cat >> binaries/pxelinux.0 <<- EOF
> 
> So, I've look at <<- meaning as I never used that before and it seems to
> remove all leading tab chr from the input, yet they are no tab in the
> input. So maybe use just <<EOF instead, without the dash -.

OK

> > +#!ipxe
> > +
> > +kernel xen console=com1
> > +module bzImage console=hvc0
> > +module xen-rootfs.cpio.gz
> > +boot
> > +EOF

Re: [PATCH 3/3] automation: add a QEMU based x86_64 Dom0/DomU test

Posted by Anthony PERARD 4 years, 3 months ago

On Fri, Oct 22, 2021 at 01:05:35PM -0700, Stefano Stabellini wrote:
> On Fri, 22 Oct 2021, Anthony PERARD wrote:
> > On Thu, Oct 21, 2021 at 04:08:39PM -0700, Stefano Stabellini wrote:
> > > diff --git a/automation/scripts/qemu-alpine-x86_64.sh b/automation/scripts/qemu-alpine-x86_64.sh
> > > new file mode 100644
> > > index 0000000000..41b05210d6
> > > --- /dev/null
> > > +++ b/automation/scripts/qemu-alpine-x86_64.sh
> > > @@ -0,0 +1,92 @@
> > > +#!/bin/bash
> > > +
> > > +set -ex
> > > +
> > > +apt-get -qy update
> > > +apt-get -qy install --no-install-recommends qemu-system-x86 \
> > > +                                            cpio \
> > > +                                            curl \
> > > +                                            busybox-static
> > 
> > Please, don't install packages during the CI job. If you need new
> > packages, update the container.
> 
> The container used to run this script is the one used for the Xen build:
> automation/build/debian/unstable.dockerfile, AKA
> registry.gitlab.com/xen-project/xen/debian:unstable. We don't have a
> specific container for the sole purpose of running tests.

I've added qemu to our debian:stretch container recently, in order to
run the "qemu-smoke-*" tests without running apt commands. Unless more
recent software are needed, you could base the "qemu-alpine-x86*" test
on debian:stretch which might only then missing cpio and busybox. Update
of the stretch container should go smoothly as it has been done
recently.

> Thus, I could add qemu-system-x86 to
> automation/build/debian/unstable.dockerfile, but then we increase the
> size of the debian unstable build container needlessly for all related
> build jobs.

There is something I'm missing, how is it a problem to have a container
that is a bit bigger? What sort of problem could we have to deal with?
On the other hand, doing more task, installing software, downloading
anything from the internet, makes the job much less reliable. It
increase the risk of a failure do to external factors and it takes more
time to run the test.

> Or we could add one more special container just for running tests, but
> then it is one more container to maintain and keep up-to-date.
> 
> This is why I chose as a compromise to keep the underlying container as
> is, and just apt-get the extra 3-4 packages here. It is the same thing
> we do on ARM: automation/scripts/qemu-alpine-arm64.sh. Also keep in mind
> that this job is run in the "test" step where we have far fewer jobs at
> the moment and the runners are not busy. (It would be different in the
> "build" step where we have many jobs.)

I don't really see any difference between a "test" job and a "build"
jobs, both kind use the same resource/runner.

On that note, they're is something I've learned recently: "test" job
don't even have to wait for all "build" job to complete, they can run in
parallel. We would just need to replace "dependencies" by "needs":
    https://docs.gitlab.com/ee/ci/yaml/index.html#needs
But that could be an improvement for an other time and only a side note
for the patch.

> I am not entirely sure what is the best solution overall, but for this
> series at this stage I would prefer to keep the same strategy used for
> the ARM tests (i.e. reuse the debian unstable build container and
> apt-get the few missing packages.) If we do change the way we do it, I
> would rather change both x86 and ARM at the same time.

I'm pretty sure the best strategy would be to do as little as possible
during a job, download as little as possible and possibly cache as much
as possible and do as much as possible ahead of time. Feel free to
change the Arm test, but I don't think it is necessary to change the Arm
test at the same time as introducing an x86 test.

Cheers,

-- 
Anthony PERARD

Re: [PATCH 3/3] automation: add a QEMU based x86_64 Dom0/DomU test

Posted by Stefano Stabellini 4 years, 3 months ago

On Mon, 25 Oct 2021, Anthony PERARD wrote:
> On Fri, Oct 22, 2021 at 01:05:35PM -0700, Stefano Stabellini wrote:
> > On Fri, 22 Oct 2021, Anthony PERARD wrote:
> > > On Thu, Oct 21, 2021 at 04:08:39PM -0700, Stefano Stabellini wrote:
> > > > diff --git a/automation/scripts/qemu-alpine-x86_64.sh b/automation/scripts/qemu-alpine-x86_64.sh
> > > > new file mode 100644
> > > > index 0000000000..41b05210d6
> > > > --- /dev/null
> > > > +++ b/automation/scripts/qemu-alpine-x86_64.sh
> > > > @@ -0,0 +1,92 @@
> > > > +#!/bin/bash
> > > > +
> > > > +set -ex
> > > > +
> > > > +apt-get -qy update
> > > > +apt-get -qy install --no-install-recommends qemu-system-x86 \
> > > > +                                            cpio \
> > > > +                                            curl \
> > > > +                                            busybox-static
> > > 
> > > Please, don't install packages during the CI job. If you need new
> > > packages, update the container.
> > 
> > The container used to run this script is the one used for the Xen build:
> > automation/build/debian/unstable.dockerfile, AKA
> > registry.gitlab.com/xen-project/xen/debian:unstable. We don't have a
> > specific container for the sole purpose of running tests.
> 
> I've added qemu to our debian:stretch container recently, in order to
> run the "qemu-smoke-*" tests without running apt commands. Unless more
> recent software are needed, you could base the "qemu-alpine-x86*" test
> on debian:stretch which might only then missing cpio and busybox. Update
> of the stretch container should go smoothly as it has been done
> recently.

I am happy to use debian:stretch in this patch series. I'll make the
change. That only leaves cpio and busybox-static to apt-get which are
small. In general, I am not too happy about increasing the size of build
containers with testing binaries so I would rather not increase it any
further, see below.

> > Thus, I could add qemu-system-x86 to
> > automation/build/debian/unstable.dockerfile, but then we increase the
> > size of the debian unstable build container needlessly for all related
> > build jobs.
> 
> There is something I'm missing, how is it a problem to have a container
> that is a bit bigger? What sort of problem could we have to deal with?

It takes time to clone the container in the gitlab-ci, the bigger the
container the more time it takes. It is fetched over the network. Now we
are fetching qemu (as part of the container) 10 times during the build
although it is not needed.

There is a compromise to be made. Let's say the QEMU package is 100MB.
If we add QEMU to debian:stretch we are going to increase the overall
data usage by about 1GB, which means we are waiting to fetch 1GB of
additional data during the pipeline.

If we needed QEMU only once, then we could just apt-get it and we would
only fetch 100MB once. Although using apt-get will be a bit slower
because the deb needs to be unpacked, etc. I think that if we used QEMU
only in one test then it would be faster to apt-get it.

However in our case we are using QEMU 4 times during the tests, so
adding QEMU to debian:stretch probably made things a bit faster or at
least not slower.

Does it make sense?

This is why we should have specialized containers for the tests instead
of adding test packages to the build containers. However, today all
containers are updated manually so it would be best to make containers
automatically updatable first.  Until then I think it is OK to apt-get
stuff.

> On the other hand, doing more task, installing software, downloading
> anything from the internet, makes the job much less reliable. It
> increase the risk of a failure do to external factors and it takes more
> time to run the test.

That is fair but it is more a theoretical problem than a real issue at
the moment. In reality I haven't seen Debian's apt-get failing even a
single time in our gitlab-ci (it failed because our debian:unstable
container went out of date but that's our fault).

But we do have a severe problem at the moment with external sources: our
"git clones" keep failing during the build on x86. That is definitely
something worth improving (see my other email thread on the subject) and
it is the main problem affecting gitlab-ci at the moment, I keep having
to restart jobs almost daily to get the overall pipeline to "pass".

If you have any ideas on how to stop fetching things using "git" from
external repositories in gitlab-ci that would be fantastic :-)
The only thing I could think of to "fix it" is moving all external repos
to gitlab repositories mirrors.

> > Or we could add one more special container just for running tests, but
> > then it is one more container to maintain and keep up-to-date.
> > 
> > This is why I chose as a compromise to keep the underlying container as
> > is, and just apt-get the extra 3-4 packages here. It is the same thing
> > we do on ARM: automation/scripts/qemu-alpine-arm64.sh. Also keep in mind
> > that this job is run in the "test" step where we have far fewer jobs at
> > the moment and the runners are not busy. (It would be different in the
> > "build" step where we have many jobs.)
> 
> I don't really see any difference between a "test" job and a "build"
> jobs, both kind use the same resource/runner.
> 
> On that note, they're is something I've learned recently: "test" job
> don't even have to wait for all "build" job to complete, they can run in
> parallel. We would just need to replace "dependencies" by "needs":
>     https://docs.gitlab.com/ee/ci/yaml/index.html#needs
> But that could be an improvement for an other time and only a side note
> for the patch.

That is really cool! I didn't know about it. Yes, currently there is a
big distinction between build and test jobs because build jobs are the
bottleneck and test jobs don't start before all the build jobs finish.

> > I am not entirely sure what is the best solution overall, but for this
> > series at this stage I would prefer to keep the same strategy used for
> > the ARM tests (i.e. reuse the debian unstable build container and
> > apt-get the few missing packages.) If we do change the way we do it, I
> > would rather change both x86 and ARM at the same time.
> 
> I'm pretty sure the best strategy would be to do as little as possible
> during a job, download as little as possible and possibly cache as much
> as possible and do as much as possible ahead of time. Feel free to
> change the Arm test, but I don't think it is necessary to change the Arm
> test at the same time as introducing an x86 test.

I agree.

At the same time it would be nice to follow the same strategy between
x86 and ARM going forward: if one optimization is made for one, it is
also made for the other.

[PATCH 1/3] automation: add x86_64 alpine 3.12 test-artifact
[PATCH 2/3] automation: Linux 5.10.74 test-artifact
[PATCH 3/3] automation: add a QEMU based x86_64 Dom0/DomU test