.gitlab-ci.d: Define new clang sanitizer CI jobs

[RFC PATCH] .gitlab-ci.d: Define new clang sanitizer CI jobs

Posted by Peter Maydell 2 weeks, 6 days ago

Define new CI jobs to build QEMU with all the clang sanitizers
enabled and run "make check".  The split jobs are patterned on
the existing build-cfi-aarch64 / check-cfi-aarch64 ones.

This patch is RFC because the resulting test job times out.  I'm
posting it for suggestions on how we want to try to do this, because
I do think that now we're very nearly at "clean pass on sanitizers"
that's worth defending in CI. (We ought also to be able to do
something similar for "make check-functional", but that has an even
longer running time, and in my local testing it also seems to be flaky
for reasons unrelated to sanitizer failures, which I haven't tried
to track down.)

The build job is fine, but the "make check" job times out long before
it finishes: on the general gitlab runners that my personal QEMU
gitlab repo gets access to, it completed only 102/591 tests within
the hour:
    https://gitlab.com/pm215/qemu/-/jobs/14179464576

We could have multiple versions that split up the targets, but it
would be nice if we could build once and have multiple test jobs;
especially as we move more files into "build-once" having multiple
build jobs split by target means we're building those common files
more often than we need to.  But I don't think "make check" has a way
to say "limit by target".

Note: to get all the tests to pass you'll need a few patches that are
on list but still awaiting review/commit (one sparc patch, three mips
patches, Marc-André's fix for tpm), plus a patch to
lsan_suppressions.txt that silences warnings about target/riscv
leaking of hashtables.  You can find those (plus this patch) in
 https://gitlab.com/pm215/qemu/-/commits/asan-fails
(the branch as of commit a987eab2ed is what I tested with).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 .gitlab-ci.d/buildtest.yml | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 4b1949a3a5..9c73b1a646 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -434,6 +434,32 @@ clang-system:
     TARGETS: alpha-softmmu arm-softmmu m68k-softmmu mips64-softmmu s390x-softmmu
     MAKE_CHECK_ARGS: check-qtest check-tcg
 
+# We split "build" and "check" here to keep the job times down
+# below the timeout
+build-clang-sanitizer:
+  extends:
+    - .native_build_job_template
+    - .native_build_artifact_template
+  needs:
+    - job: amd64-fedora-container
+  variables:
+    IMAGE: fedora
+    CONFIGURE_ARGS: --cc=clang --cxx=clang++ --enable-ubsan --enable-asan
+      --extra-cflags=-fno-sanitize-recover=undefined
+    MAKE_CHECK_ARGS: check-build
+
+check-clang-sanitizer:
+  extends: .native_test_job_template
+  needs:
+    - job: build-clang-sanitizer
+      artifacts: true
+  variables:
+    IMAGE: fedora
+    MAKE_CHECK_ARGS: check-qtest check-tcg
+    LSAN_OPTIONS: suppressions=../scripts/lsan_suppressions.txt
+    ASAN_OPTIONS: fast_unwind_on_malloc=0
+    TIMEOUT_MULTIPLIER: 3
+
 clang-user:
   extends: .native_build_job_template
   needs:
-- 
2.43.0

Re: [RFC PATCH] .gitlab-ci.d: Define new clang sanitizer CI jobs

Posted by Pierrick Bouvier 2 weeks, 6 days ago

On 5/5/2026 5:27 AM, Peter Maydell wrote:
> Define new CI jobs to build QEMU with all the clang sanitizers
> enabled and run "make check".  The split jobs are patterned on
> the existing build-cfi-aarch64 / check-cfi-aarch64 ones.
> 
> This patch is RFC because the resulting test job times out.  I'm
> posting it for suggestions on how we want to try to do this, because
> I do think that now we're very nearly at "clean pass on sanitizers"
> that's worth defending in CI. (We ought also to be able to do
> something similar for "make check-functional", but that has an even
> longer running time, and in my local testing it also seems to be flaky
> for reasons unrelated to sanitizer failures, which I haven't tried
> to track down.)
> 
> The build job is fine, but the "make check" job times out long before
> it finishes: on the general gitlab runners that my personal QEMU
> gitlab repo gets access to, it completed only 102/591 tests within
> the hour:
>     https://gitlab.com/pm215/qemu/-/jobs/14179464576
> 
> We could have multiple versions that split up the targets, but it
> would be nice if we could build once and have multiple test jobs;
> especially as we move more files into "build-once" having multiple
> build jobs split by target means we're building those common files
> more often than we need to.  But I don't think "make check" has a way
> to say "limit by target".
>

To parallelize, how about structuring things like this?
1. Build (asan) for a single (small) target to build common code only
     ||
   cache for ccache
     ||
2. Build (asan) for each base architecture reusing cache.
   This ensures common code does not have to be recompiled.
3. Run all tests specific to each base architecture.
   - check-tcg
   +
   - func-quick+func-*
   - func-thorough+func-*
   - qtest+qtest-*

Also, we can run all generic tests with a job 1.b, running after 1:
- decodetree
- qapi-schema+qapi-frontend
- qapi-schema+qapi-interop
- softfloat-slow+softfloat-ops-slow+slow
- softfloat+softfloat-compare
- softfloat+softfloat-conv
- softfloat+softfloat-ops
- tracetool
- unit
- unit+qga

The only thing left would be block tests, which are deactivated for asan
builds:
- block
- block-slow+slow
- block-thorough+thorough

Tests and categories can be found from:
$ ./build/pyvenv/bin/meson test -C build --list --setup thorough
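
As a rough illustration of the split above, the suite labels could be
sorted into per-architecture, generic and block groups. This is only a
sketch: the classify() and group_suites() helpers are hypothetical, and
"qtest+qtest-aarch64" and "func-quick+func-arm" are assumed instances of
the wildcard patterns listed above. It works on a plain list of labels
and does not parse real "meson test --list" output; check-tcg is listed
separately above and is not handled here.

```python
from collections import defaultdict

# Label prefixes that are followed by a target name.
# These mirror the per-architecture patterns listed in the mail.
ARCH_PREFIXES = ("func-quick+func-", "func-thorough+func-", "qtest+qtest-")


def classify(label):
    """Return (group, arch) for one suite label; arch is None unless per-arch."""
    for prefix in ARCH_PREFIXES:
        if label.startswith(prefix):
            return "arch", label[len(prefix):]
    # The first "+"-separated component names the main suite.
    if label.split("+")[0].startswith("block"):
        return "block", None
    return "generic", None


def group_suites(labels):
    """Group labels into per-arch lists plus 'generic' and 'block' lists."""
    per_arch = defaultdict(list)
    others = {"generic": [], "block": []}
    for label in labels:
        group, arch = classify(label)
        if group == "arch":
            per_arch[arch].append(label)
        else:
            others[group].append(label)
    return dict(per_arch), others


if __name__ == "__main__":
    sample = [
        "qtest+qtest-aarch64",  # assumed instance of qtest+qtest-*
        "func-quick+func-arm",  # assumed instance of func-quick+func-*
        "unit+qga",
        "decodetree",
        "block-slow+slow",
    ]
    per_arch, others = group_suites(sample)
    print(per_arch)
    print(others)
```

Each key of per_arch would then map to one test job in step 3, the
"generic" list to job 1.b, and the "block" list would stay unused for
asan builds.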

With this, we could even drop all the other testing jobs, since the asan
ones would be the only reference we need.
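
For a rough sense of how many parallel test jobs a split would need,
the numbers quoted above (102 of 591 tests within the hour) can be
extrapolated. This is a back-of-the-envelope sketch only: it assumes a
one-hour job window and that all tests take about the same time, which
is certainly not true in practice, and estimate_jobs() is a hypothetical
helper.

```python
import math


def estimate_jobs(done, total, window_hours=1.0):
    """Linearly extrapolate from the tests completed in one job window.

    Returns (hours for a single serial run, number of jobs of
    window_hours each needed to cover that time).
    """
    rate = done / window_hours       # tests completed per hour
    hours_needed = total / rate      # serial runtime at that rate
    jobs = math.ceil(hours_needed / window_hours)
    return hours_needed, jobs


if __name__ == "__main__":
    hours, jobs = estimate_jobs(102, 591)
    print(f"about {hours:.1f} hours serially, so roughly {jobs} one-hour jobs")
```

Under those assumptions the full run needs a little under six hours, so
something on the order of six test jobs, before allowing any headroom
for slow tests or slower runners.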

> Note: to get all the tests to pass you'll need a few patches that are
> on list but still awaiting review/commit (one sparc patch, three mips
> patches, Marc-André's fix for tpm), plus a patch to
> lsan_suppressions.txt that silences warnings about target/riscv
> leaking of hashtables.  You can find those (plus this patch) in
>  https://gitlab.com/pm215/qemu/-/commits/asan-fails
> (the branch as of commit a987eab2ed is what I tested with).
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  .gitlab-ci.d/buildtest.yml | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> index 4b1949a3a5..9c73b1a646 100644
> --- a/.gitlab-ci.d/buildtest.yml
> +++ b/.gitlab-ci.d/buildtest.yml
> @@ -434,6 +434,32 @@ clang-system:
>      TARGETS: alpha-softmmu arm-softmmu m68k-softmmu mips64-softmmu s390x-softmmu
>      MAKE_CHECK_ARGS: check-qtest check-tcg
>  
> +# We split "build" and "check" here to keep the job times down
> +# below the timeout
> +build-clang-sanitizer:
> +  extends:
> +    - .native_build_job_template
> +    - .native_build_artifact_template
> +  needs:
> +    - job: amd64-fedora-container
> +  variables:
> +    IMAGE: fedora
> +    CONFIGURE_ARGS: --cc=clang --cxx=clang++ --enable-ubsan --enable-asan
> +      --extra-cflags=-fno-sanitize-recover=undefined
> +    MAKE_CHECK_ARGS: check-build
> +
> +check-clang-sanitizer:
> +  extends: .native_test_job_template
> +  needs:
> +    - job: build-clang-sanitizer
> +      artifacts: true
> +  variables:
> +    IMAGE: fedora
> +    MAKE_CHECK_ARGS: check-qtest check-tcg
> +    LSAN_OPTIONS: suppressions=../scripts/lsan_suppressions.txt
> +    ASAN_OPTIONS: fast_unwind_on_malloc=0
> +    TIMEOUT_MULTIPLIER: 3
> +
>  clang-user:
>    extends: .native_build_job_template
>    needs:

Re: [RFC PATCH] .gitlab-ci.d: Define new clang sanitizer CI jobs

Posted by Pierrick Bouvier 2 weeks, 6 days ago

On 5/5/2026 12:29 PM, Pierrick Bouvier wrote:
> On 5/5/2026 5:27 AM, Peter Maydell wrote:
>> Define new CI jobs to build QEMU with all the clang sanitizers
>> enabled and run "make check".  The split jobs are patterned on
>> the existing build-cfi-aarch64 / check-cfi-aarch64 ones.
>>
>> This patch is RFC because the resulting test job times out.  I'm
>> posting it for suggestions on how we want to try to do this, because
>> I do think that now we're very nearly at "clean pass on sanitizers"
>> that's worth defending in CI. (We ought also to be able to do
>> something similar for "make check-functional", but that has an even
>> longer running time, and in my local testing it also seems to be flaky
>> for reasons unrelated to sanitizer failures, which I haven't tried
>> to track down.)
>>
>> The build job is fine, but the "make check" job times out long before
>> it finishes: on the general gitlab runners that my personal QEMU
>> gitlab repo gets access to, it completed only 102/591 tests within
>> the hour:
>>     https://gitlab.com/pm215/qemu/-/jobs/14179464576
>>
>> We could have multiple versions that split up the targets, but it
>> would be nice if we could build once and have multiple test jobs;
>> especially as we move more files into "build-once" having multiple
>> build jobs split by target means we're building those common files
>> more often than we need to.  But I don't think "make check" has a way
>> to say "limit by target".
>>
> 
> To parallelize, how about structuring things like this?
> 1. Build (asan) for a single (small) target to build common code only
>      ||
>    cache for ccache
>      ||
> 2. Build (asan) for each base architecture reusing cache.
>    This ensures common code does not have to be recompiled.
> 3. Run all tests specific to each base architecture.
>    - check-tcg
>    +
>    - func-quick+func-*
>    - func-thorough+func-*
>    - qtest+qtest-*
>

Steps 2 and 3 would be run in parallel (per base arch), in case that
wasn't clear from my original message.

> Also, we can run all generic tests with a job 1.b, running after 1:
> - decodetree
> - qapi-schema+qapi-frontend
> - qapi-schema+qapi-interop
> - softfloat-slow+softfloat-ops-slow+slow
> - softfloat+softfloat-compare
> - softfloat+softfloat-conv
> - softfloat+softfloat-ops
> - tracetool
> - unit
> - unit+qga
> 
> The only thing left would be block tests, which are deactivated for asan
> builds:
> - block
> - block-slow+slow
> - block-thorough+thorough
> 
> Tests and categories can be found from:
> $ ./build/pyvenv/bin/meson test -C build --list --setup thorough
> 
> With this, we could even drop all the other testing jobs, since the asan
> ones would be the only reference we need.
> 
>> Note: to get all the tests to pass you'll need a few patches that are
>> on list but still awaiting review/commit (one sparc patch, three mips
>> patches, Marc-André's fix for tpm), plus a patch to
>> lsan_suppressions.txt that silences warnings about target/riscv
>> leaking of hashtables.  You can find those (plus this patch) in
>>  https://gitlab.com/pm215/qemu/-/commits/asan-fails
>> (the branch as of commit a987eab2ed is what I tested with).
>>
>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>> ---
>>  .gitlab-ci.d/buildtest.yml | 26 ++++++++++++++++++++++++++
>>  1 file changed, 26 insertions(+)
>>
>> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
>> index 4b1949a3a5..9c73b1a646 100644
>> --- a/.gitlab-ci.d/buildtest.yml
>> +++ b/.gitlab-ci.d/buildtest.yml
>> @@ -434,6 +434,32 @@ clang-system:
>>      TARGETS: alpha-softmmu arm-softmmu m68k-softmmu mips64-softmmu s390x-softmmu
>>      MAKE_CHECK_ARGS: check-qtest check-tcg
>>  
>> +# We split "build" and "check" here to keep the job times down
>> +# below the timeout
>> +build-clang-sanitizer:
>> +  extends:
>> +    - .native_build_job_template
>> +    - .native_build_artifact_template
>> +  needs:
>> +    - job: amd64-fedora-container
>> +  variables:
>> +    IMAGE: fedora
>> +    CONFIGURE_ARGS: --cc=clang --cxx=clang++ --enable-ubsan --enable-asan
>> +      --extra-cflags=-fno-sanitize-recover=undefined
>> +    MAKE_CHECK_ARGS: check-build
>> +
>> +check-clang-sanitizer:
>> +  extends: .native_test_job_template
>> +  needs:
>> +    - job: build-clang-sanitizer
>> +      artifacts: true
>> +  variables:
>> +    IMAGE: fedora
>> +    MAKE_CHECK_ARGS: check-qtest check-tcg
>> +    LSAN_OPTIONS: suppressions=../scripts/lsan_suppressions.txt
>> +    ASAN_OPTIONS: fast_unwind_on_malloc=0
>> +    TIMEOUT_MULTIPLIER: 3
>> +
>>  clang-user:
>>    extends: .native_build_job_template
>>    needs:
>