From: Rayhan Faizel
To: devel@lists.libvirt.org
CC: Rayhan Faizel
Subject: [PATCH 14/14] docs: Document the fuzzers
Date: Mon, 19 Aug 2024 21:39:52 +0530
Message-Id: <20240819160952.351383-15-rayhan.faizel@gmail.com>
In-Reply-To: <20240819160952.351383-1-rayhan.faizel@gmail.com>
References: <20240819160952.351383-1-rayhan.faizel@gmail.com>

Document the fuzzers in two ways.

1. Explain the high level working of the fuzzers under docs/kbase.
2. Add README to explain general setup of the fuzzer and its usage.

Signed-off-by: Rayhan Faizel
---
 docs/kbase/index.rst                 |   3 +
 docs/kbase/internals/meson.build     |   1 +
 docs/kbase/internals/xml-fuzzing.rst | 120 ++++++++++++++++++++++++
 tests/fuzz/README.rst                | 131 +++++++++++++++++++++++++++
 4 files changed, 255 insertions(+)
 create mode 100644 docs/kbase/internals/xml-fuzzing.rst
 create mode 100644 tests/fuzz/README.rst

diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst
index e51b35cbfc..9cf6268800 100644
--- a/docs/kbase/index.rst
+++ b/docs/kbase/index.rst
@@ -116,3 +116,6 @@ Internals
 
 `QEMU monitor event handling `__
    Brief outline how events emitted by qemu on the monitor are handlded.
+
+`XML Fuzzing `__
+   How the structure-aware XML fuzzers work.
diff --git a/docs/kbase/internals/meson.build b/docs/kbase/internals/meson.build
index f1e9122f8f..86b6639419 100644
--- a/docs/kbase/internals/meson.build
+++ b/docs/kbase/internals/meson.build
@@ -9,6 +9,7 @@ docs_kbase_internals_files = [
   'qemu-migration',
   'qemu-threads',
   'rpc',
+  'xml-fuzzing',
 ]
 
 
diff --git a/docs/kbase/internals/xml-fuzzing.rst b/docs/kbase/internals/xml-fuzzing.rst
new file mode 100644
index 0000000000..85f565fda5
--- /dev/null
+++ b/docs/kbase/internals/xml-fuzzing.rst
@@ -0,0 +1,120 @@
+===================
+Libvirt XML fuzzing
+===================
+
+XML fuzzing is done using libFuzzer and libprotobuf-mutator. Plain byte-level
+fuzzing is ineffective for XML, as it is a highly structured format.
+Structure-aware fuzzing is implemented using libprotobuf-mutator, which
+mutates protobuf inputs. The protobufs serve as an intermediate format and
+are serialized to XML.
+
+Protobuf to XML representation
+==============================
+
+A protobuf definition written to fuzz libvirt XML formats may resemble the
+following.
+
+::
+
+    message MainObj {
+        message SomeTagMessage {
+            optional uint32 A_number = 1;
+            optional DummyString A_name = 2;
+
+            enum typeEnum {
+                typeA = 0;
+                typeB = 1;
+                typeC = 2;
+            }
+
+            optional typeEnum A_type = 3;
+
+            message InnerTagMessage {
+                optional uint32 A_number = 1;
+            }
+
+            repeated InnerTagMessage T_innertag = 4;
+
+            message SecondInnerTagMessage {
+                optional uint32 V_value = 1;
+            }
+            optional SecondInnerTagMessage T_secondinner = 5;
+        }
+
+        optional SomeTagMessage T_sometag = 1;
+    }
+
+* Fields starting with ``T_`` represent XML tags. Their types are protobuf messages
+  which may further contain other protobuf-defined XML tags or attributes.
+
+* Fields starting with ``A_`` represent XML attributes. Most of the time,
+  they use one of the primitive datatypes (e.g. ``uint32``, ``bool``, ``enum``, etc.) available in protobuf.
+
+  * If the attribute can take multiple data types, it is encapsulated in a ``oneof`` statement.
+    The field name also has a prefix of ``A_OPTXX_`` where ``XX`` is a number from 0 to 99.
+  * If the attribute name contains special characters, the real name is stored in
+    ``libvirt::real_name``, a custom option that extends ``FieldOptions``.
+  * If an enum value contains special characters, the real value is stored in
+    ``libvirt::real_value``, a custom option that extends ``EnumValueOptions``.
+
+* Fields starting with ``V_`` represent raw text in XML.
+
+  * If ``T_`` and ``V_`` fields are defined in the same message, the ``V_``
+    field is preferred only if it is set; otherwise the ``T_`` fields are
+    processed as usual.
+  * ``V_`` fields can take on the same datatypes as ``A_`` fields.
+
+* ``repeated`` is used to allow multiple XML tags of the same name.
+
+``A_`` fields must always precede ``V_`` and ``T_`` fields. Likewise, ``V_``
+fields must precede ``T_`` fields if any.
+
+When fuzzing with the above protobuf definition, one possible protobuf to XML
+serialization is
+
+::
+
+
+
+
+    1241232
+
+
+Custom Protobuf Datatypes
+-------------------------
+
+Sometimes, primitive data types or enums are not enough to encode the
+desired attribute values, especially if they themselves are structured. In such
+cases, the fields are represented by a handwritten protobuf message defined in
+``xml_domain_datatypes.proto``. To serialize these messages to XML attribute
+values, custom handlers are defined in ``proto_custom_datatypes.cc``.
+
+This is useful for data types such as IP addresses, MAC addresses, target
+device names, etc.
+
+Protobuf generation
+===================
+
+``proto`` files are automatically generated at compile time using the script
+``relaxng_to_proto.py``. The script parses RELAX NG schemas to generate a protobuf
+file containing fields and messages representing all the defined XML tags and
+attributes.
+
+The script tries to infer the correct datatype of each XML attribute.
+On its own, however, it can only infer the general datatype or enum values
+of an attribute, not its constraints or regex patterns. Override tables
+are provided to improve on that.
+
+Fuzzer Harnesses
+================
+
+Driver-specific harnesses generally reuse the existing test driver setup
+as well as other existing test utilities under ``tests/``. Harnesses are
+available for the following drivers:
+
+* QEMU XML Domain
+* QEMU XML Hotplug
+* CH XML Domain
+* VMX XML Domain
+* libXL XML Domain
+* NWFilter XML
diff --git a/tests/fuzz/README.rst b/tests/fuzz/README.rst
new file mode 100644
index 0000000000..d92cdc94d7
--- /dev/null
+++ b/tests/fuzz/README.rst
@@ -0,0 +1,131 @@
+=======
+Fuzzing
+=======
+
+The XML fuzzing project was built as part of Google Summer of Code 2024.
+The fuzzing project aims to find edge-case XML configurations that may crash
+libvirt during parsing. The libvirt domain XML format has a highly structured
+grammar, so plain byte-level fuzzing is ineffective. We use a combination
+of libFuzzer and libprotobuf-mutator to perform structure-aware fuzzing of
+various libvirt XML formats. The XML is represented through an intermediate
+protobuf that is mutated by libprotobuf-mutator. This protobuf is automatically
+generated by a Python script, ``relaxng_to_proto.py``, which parses RELAX NG
+schemas.
+
+Currently, we fuzz the following:
+
+* QEMU XML Domain (qemu_xml_domain_fuzz, qemu_xml_domain_fuzz_disk, qemu_xml_domain_fuzz_interface)
+* QEMU XML Hotplug (qemu_xml_hotplug_fuzz)
+* CH XML Domain (ch_xml_domain_fuzz)
+* VMX XML Domain (vmx_xml_domain_fuzz)
+* libXL XML Domain (libxl_xml_domain_fuzz)
+* NWFilter XML (xml_nwfilter_fuzz)
+
+libprotobuf-mutator
+===================
+
+libprotobuf-mutator is the core of our fuzzing methodology: it is what
+allows us to perform grammar-aware fuzzing of the XML format in the first
+place. However, its setup is a bit involved. The general build and install
+instructions can be found at
+https://github.com/google/libprotobuf-mutator/blob/master/README.md
+but they may need tweaking depending on the distro. One of the biggest
+problems is that most distros ship very outdated versions of protobuf,
+which cause various build and linkage issues with the mutator.
+
+- If you are on a rolling release distro, the system package can likely be
+  used as-is. However, you may need to pass ``-std=c++17`` in ``CXXFLAGS``
+  and ``-Wl,--copy-dt-needed-entries`` in ``LDFLAGS``.
+- For every other distro with old protobuf installations, you can supply
+  ``-DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON`` during libprotobuf-mutator
+  setup. After this, provide ``-Dexternal_protobuf_dir=`` to the libvirt
+  meson setup, pointing to the ``external.protobuf`` directory generated
+  during libprotobuf-mutator compilation.
+- On some distros like Fedora which predominantly use PIC compiled
+  libraries, you may need to pass ``-fPIC`` in ``CFLAGS/CXXFLAGS`` or you
+  will encounter relocation errors during libvirt compilation.
+
+Setup
+=====
+
+::
+
+    env CC=clang CXX=clang++ \
+        meson setup build -Dsystem=true -Ddriver_qemu=enabled -Db_lundef=false \
+                          -Db_sanitize=address,undefined -Dfuzz=enabled -Dexternal_protobuf_dir=
+
+- This command line will introduce LLVM SanitizerCoverage across all
+  object files.
+- libFuzzer is supported only with clang/clang++.
+- To use an external protobuf dependency, use
+  ``-Dexternal_protobuf_dir=``. If your system has a new enough protobuf
+  dependency, you can ignore this.
+- ``b_sanitize`` is not compulsory but it does improve the odds of the fuzzer
+  finding interesting test cases. It is recommended to pass
+  ``address,undefined`` to enable both ASAN and UBSan. Note that ASAN will
+  slow execution down by a factor of 2 on average.
+- You can set ``b_sanitize`` to ``thread`` to enable TSAN, which is especially
+  useful for fuzzing race conditions in the ``qemu_xml_hotplug_fuzz`` fuzzer.
+
+NOTE: This has only been tested on x86_64 and aarch64 Linux, but should work
+identically on other architectures and possibly even other UNIX-based OSes
+(BSD, macOS, etc.).
+
+Usage
+=====
+
+Run ``./tests/fuzz/run_fuzz ``.
+
+If the fuzzer finds a crashing test case, it will dump a separate file in your
+working directory. Run
+``./tests/fuzz/run_fuzz --testcase `` to reproduce the crash.
+More options to configure the fuzzer can be found with the ``-h`` flag. To save
+or load a corpus, add ``--corpus ``.
+
+To merge or minimize corpora, run::
+
+    ./tests/fuzz/run_fuzz --libfuzzer-options="-merge=1 "
+
+Notable options are listed below.
+
+- ``--arch``: Set architecture of the domain XML to fuzz.
+- ``-j, --jobs``: Run parallel fuzzing workers using either ``jobs`` or
+  ``fork`` based on ``--parallel-mode``. E.g.:
+  ``./tests/fuzz/run_fuzz qemu_xml_domain_fuzz -j8 --parallel-mode fork``.
+- ``--dump-xml``: Print all fuzzed XMLs (useful for debugging reproducers).
+- ``--format-xml``: Exercise the format function on XML domain fuzzers.
+- ``--corpus``: Save or load an on-disk corpus.
+- ``--libfuzzer-options``: Pass additional libFuzzer flags as documented in
+  https://llvm.org/docs/LibFuzzer.html#options.
+
+Coverage Report
+===============
+
+- libvirt supports instrumenting builds with gcov for coverage data collection
+  using ``-Dtest_coverage=true``.
+::
+
+    ./tests/fuzz/run_fuzz --total_time= --corpus=
+    ./tests/fuzz/run_fuzz --corpus= --libfuzzer-options="-runs=0"
+    find -name '*.gcda' -exec llvm-cov gcov {} \; # Run in build directory
+    gcovr --gcov-executable "llvm-cov gcov" --html-details coverage.html -r
+
+- Alternatively, we can use clang profile coverage instrumentation
+  enabled with ``-Dtest_coverage_clang=true``.
+::
+
+    ./tests/fuzz/run_fuzz --total_time= --corpus=
+    ./tests/fuzz/run_fuzz --corpus= --llvm-profile-file=coverage.profraw
+    llvm-profdata merge coverage.profraw -output coverage.profdata
+    llvm-cov show --instr-profile coverage.profdata --sources --format html > coverage.html
+
+Tips
+====
+
+- libFuzzer will try to pass comparison checks using its internal TORC
+  (Table of Recent Comparisons), but this can easily get overwhelmed in the
+  case of libvirt due to its code being quite complex. You can alleviate
+  this to some extent by passing ``--use-value-profile`` to the fuzzer.
+- If you want the fuzzer to proceed even after encountering a crash,
+  add ``-j --parallel-mode=fork``. Do note that memory usage
+  grows with each additional parallel fuzzing worker.
-- 
2.34.1
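
As an illustration of the ``T_``/``A_``/``V_`` field naming convention documented in xml-fuzzing.rst, the following is a minimal Python sketch of how a message following that convention could be turned into XML. It is not the serializer added by this series: the ``to_xml`` helper and the use of plain dicts in place of protobuf messages are hypothetical, and ``oneof`` attributes as well as the ``libvirt::real_name`` and ``libvirt::real_value`` options are left out.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xml(tag, fields):
    """Serialize one message (a dict of prefixed fields) to an XML element.

    Hypothetical stand-in for a protobuf message: keys use the A_/V_/T_
    prefixes, and a value of None means the optional field is unset.
    """
    attrs = []      # collected from A_ fields
    text = None     # taken from a V_ field, if one is set
    children = []   # built from T_ fields
    for name, value in fields.items():
        if value is None:
            continue  # unset optional field
        prefix, real_name = name[:2], name[2:]
        if prefix == "A_":
            attrs.append(f" {real_name}={quoteattr(str(value))}")
        elif prefix == "V_":
            text = escape(str(value))
        elif prefix == "T_":
            # A repeated field is modelled as a list of messages.
            items = value if isinstance(value, list) else [value]
            children.extend(to_xml(real_name, item) for item in items)
    head = f"<{tag}{''.join(attrs)}"
    if text is not None:
        # A set V_ field takes precedence over T_ fields in the same message.
        return f"{head}>{text}</{tag}>"
    if children:
        return f"{head}>{''.join(children)}</{tag}>"
    return f"{head}/>"


# Made-up values shaped like the SomeTagMessage definition.
message = {
    "A_number": 7,
    "A_type": "typeA",
    "T_innertag": [{"A_number": 1}, {"A_number": 2}],
    "T_secondinner": {"V_value": 1241232},
}
xml = to_xml("sometag", message)
print(xml)
```

The sketch keeps the documented precedence rule (a set ``V_`` field wins over ``T_`` fields) but does not enforce the required field ordering; it simply sorts fields into attributes, text and child tags by prefix.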