From nobody Mon May 25 05:54:39 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, gregkh@linuxfoundation.org,
 kstewart@linuxfoundation.org, maximilian.huber@tngtech.com,
 Luis Augenstein
Subject: [PATCH v7 01/15] scripts/sbom: add documentation
Date: Mon, 18 May 2026 08:20:48 +0200
Message-ID: <20260518062102.2051814-2-luis.augenstein@tngtech.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com>
References: <20260518062102.2051814-1-luis.augenstein@tngtech.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Luis Augenstein

Assisted-by: Cursor:claude-sonnet-4-5
Assisted-by: OpenCode:GLM-4-7
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 Documentation/tools/index.rst     |   1 +
 Documentation/tools/sbom/sbom.rst | 206 ++++++++++++++++++++++++++++++
 2 files changed, 207 insertions(+)
 create mode 100644 Documentation/tools/sbom/sbom.rst

diff --git a/Documentation/tools/index.rst b/Documentation/tools/index.rst
index 5f2f63bcb28..1adf4a6f909 100644
--- a/Documentation/tools/index.rst
+++ b/Documentation/tools/index.rst
@@ -13,3 +13,4 @@ more additions are needed here:
    rtla/index
    rv/index
    python
+   sbom/sbom
diff --git a/Documentation/tools/sbom/sbom.rst b/Documentation/tools/sbom/sbom.rst
new file mode 100644
index 00000000000..029b08b6ad8
--- /dev/null
+++ b/Documentation/tools/sbom/sbom.rst
@@ -0,0 +1,206 @@
+.. SPDX-License-Identifier: GPL-2.0-only OR MIT
+.. Copyright (C) 2025 TNG Technology Consulting GmbH
+
+KernelSbom
+==========
+
+Introduction
+------------
+
+KernelSbom is a Python script ``scripts/sbom/sbom.py`` that can be
+executed after a successful kernel build. When invoked, KernelSbom
+analyzes all files involved in the build and generates Software Bill of
+Materials (SBOM) documents in SPDX 3.0.1 format.
+The generated SBOM documents capture:
+
+* **Final output artifacts**, typically the kernel image and modules
+* **All source files** that contributed to the build with metadata
+  and licensing information
+* **Details of the build process**, including intermediate artifacts
+  and the build commands linking source files to the final output
+  artifacts
+
+KernelSbom was originally developed in the
+`KernelSbom repository `_.
+
+Requirements
+------------
+
+Python 3.10 or later. No libraries or other dependencies are required.
+
+Basic Usage
+-----------
+
+Run the ``make sbom`` target.
+For example::
+
+    $ make defconfig O=kernel_build
+    $ make sbom O=kernel_build -j$(nproc)
+
+This will trigger a kernel build. After all build outputs have been
+generated, KernelSbom produces three SPDX documents in the root
+directory of the object tree:
+
+* ``sbom-source.spdx.json``
+  Describes all source files involved in the build and
+  associates each file with its corresponding license expression.
+
+* ``sbom-output.spdx.json``
+  Captures all final build outputs (kernel image and ``.ko`` module files)
+  and includes build metadata such as environment variables and
+  a hash of the ``.config`` file used for the build.
+
+* ``sbom-build.spdx.json``
+  Imports files from the source and output documents and describes every
+  intermediate build artifact. For each artifact, it records the exact
+  build command used and establishes the relationship between
+  input files and generated outputs.
+
+When invoking the sbom target, it is recommended to perform
+out-of-tree builds using ``O=``. KernelSbom classifies files as
+source files when they are located in the source tree and not in the
+object tree. For in-tree builds, where the source and object trees are
+the same directory, this distinction can no longer be made reliably.
+In that case, KernelSbom does not generate a dedicated source SBOM.
+Instead, source files are included in the build SBOM.
+
+Standalone Usage
+----------------
+
+KernelSbom can also be used as a standalone script to generate
+SPDX documents for specific build outputs. For example, after a
+successful x86 kernel build, KernelSbom can generate SPDX documents
+for the ``bzImage`` kernel image::
+
+    $ SRCARCH=x86 python3 scripts/sbom/sbom.py \
+        --src-tree . \
+        --obj-tree ./kernel_build \
+        --roots arch/x86/boot/bzImage \
+        --generate-spdx \
+        --generate-used-files \
+        --prettify-json \
+        --debug
+
+Note that when KernelSbom is invoked outside of the ``make`` process,
+the environment variables used during compilation are not available and
+therefore cannot be included in the generated SPDX documents. It is
+recommended to set at least the ``SRCARCH`` environment variable to the
+architecture for which the build was performed.
+
+For a full list of command-line options, run::
+
+    $ python3 scripts/sbom/sbom.py --help
+
+Output Format
+-------------
+
+KernelSbom generates documents conforming to the
+`SPDX 3.0.1 specification `_
+serialized as JSON-LD.
+
+To reduce file size, the output documents use the JSON-LD ``@context``
+to define custom prefixes for ``spdxId`` values. While this is compliant
+with the SPDX specification, only a limited number of tools in the
+current SPDX ecosystem support custom JSON-LD contexts. To use such
+tools with the generated documents, the custom JSON-LD context must
+be expanded before providing the documents.
+See https://lists.spdx.org/g/Spdx-tech/message/6064 for more information.
+
+How it Works
+------------
+
+KernelSbom operates in two major phases:
+
+1. **Generate the cmd graph**, a directed acyclic dependency graph.
+2. **Generate SPDX documents** based on the cmd graph.
+
+KernelSbom begins from the root artifacts specified by the user, e.g.,
+``arch/x86/boot/bzImage``. For each root artifact, it collects all
+dependencies required to build that artifact. The dependencies come
+from multiple sources:
+
+* **.cmd files**: The primary source is the ``.cmd`` file of the
+  generated artifact, e.g., ``arch/x86/boot/.bzImage.cmd``. These files
+  contain the exact command used to build the artifact and often include
+  an explicit list of input dependencies. By parsing the ``.cmd``
+  file, the full list of dependencies can be obtained.
+
+* **.incbin statements**: The second source is the include-binary
+  ``.incbin`` statements in ``.S`` assembly files.
+
+* **Hardcoded dependencies**: Unfortunately, not all build dependencies
+  can be found via ``.cmd`` files and ``.incbin`` statements. Some build
+  dependencies are directly defined in Makefiles or Kbuild files.
+  Parsing these files is considered too complex for the scope of this
+  project. Instead, the remaining gaps of the graph are filled using a
+  list of manually defined dependencies, see
+  ``scripts/sbom/sbom/cmd_graph/hardcoded_dependencies.py``. This list is
+  known to be incomplete. However, analysis of the cmd graph indicates
+  ~99% completeness. For more information about the completeness analysis,
+  see `KernelSbom #95 `_.
+
+Given the list of dependency files, KernelSbom recursively processes
+each file, expanding the dependency chain all the way to the version
+controlled source files. The result is a complete dependency graph
+where nodes represent files, and edges represent "file A was used to
+build file B" relationships.
+
+Using the cmd graph, KernelSbom produces three SPDX documents.
+For every file in the graph, KernelSbom:
+
+* Parses ``SPDX-License-Identifier`` headers,
+* Computes file hashes,
+* Estimates the file type based on extension and path,
+* Records build relationships between files.
+
+Each root output file is additionally associated with an SPDX Package
+element that captures version information, license data, and copyright.
+
+Advanced Usage
+--------------
+
+Including Kernel Modules
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The list of all ``.ko`` kernel modules produced during a build can be
+extracted from the ``modules.order`` file within the object tree.
+For example::
+
+    $ echo "arch/x86/boot/bzImage" > sbom-roots.txt
+    $ sed 's/\.o$/.ko/' ./kernel_build/modules.order >> sbom-roots.txt
+
+Then use the generated roots file::
+
+    $ SRCARCH=x86 python3 scripts/sbom/sbom.py \
+        --src-tree . \
+        --obj-tree ./kernel_build \
+        --roots-file sbom-roots.txt \
+        --generate-spdx
+
+Equal Source and Object Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When the source tree and object tree are identical (for example, when
+building in-tree), source files can no longer be reliably distinguished
+from generated files.
+In this scenario, KernelSbom does not produce a dedicated
+``sbom-source.spdx.json`` document. Instead, both source files and build
+artifacts are included together in ``sbom-build.spdx.json``, and
+``sbom.used-files.txt`` lists all files referenced in the build document.
+
+Unknown Build Commands
+~~~~~~~~~~~~~~~~~~~~~~
+
+Because the kernel supports a wide range of configurations and versions,
+KernelSbom may encounter build commands in ``.cmd`` files that it does
+not yet support. By default, KernelSbom will fail if an unknown build
+command is encountered.
+
+If you still wish to generate SPDX documents despite unsupported
+commands, you can use the ``--do-not-fail-on-unknown-build-command``
+option. KernelSbom will continue and produce the documents, although
+the resulting SBOM will be incomplete.
+
+This option should only be used when the missing portion of the
+dependency graph is small and an incomplete SBOM is acceptable for
+your use case.
-- 
2.43.0

From nobody Mon May 25 05:54:39 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, gregkh@linuxfoundation.org,
 kstewart@linuxfoundation.org, maximilian.huber@tngtech.com,
 Luis Augenstein
Subject: [PATCH v7 02/15] scripts/sbom: integrate script in make process
Date: Mon, 18 May 2026 08:20:49 +0200
Message-ID: <20260518062102.2051814-3-luis.augenstein@tngtech.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com>
References: <20260518062102.2051814-1-luis.augenstein@tngtech.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Luis Augenstein

Integrate the SBOM script into the kernel build process.

Assisted-by: Cursor:claude-sonnet-4-5
Assisted-by: OpenCode:GLM-4-7
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 .gitignore           |  1 +
 MAINTAINERS          |  6 ++++++
 Makefile             | 20 ++++++++++++++++++--
 scripts/sbom/sbom.py | 16 ++++++++++++++++
 4 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 scripts/sbom/sbom.py

diff --git a/.gitignore b/.gitignore
index 3044b9590f0..f0d35a9d591 100644
--- a/.gitignore
+++ b/.gitignore
@@ -49,6 +49,7 @@
 *.s
 *.so
 *.so.dbg
+*.spdx.json
 *.su
 *.symtypes
 *.tab.[ch]
diff --git a/MAINTAINERS b/MAINTAINERS
index 6aa3fe2ee1b..3dd2ce9ef0c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23903,6 +23903,12 @@ R:	Marc Murphy
 S:	Supported
 F:	arch/arm/boot/dts/ti/omap/am335x-sancloud*
 
+SBOM
+M:	Luis Augenstein
+M:	Maximilian Huber
+S:	Maintained
+F:	scripts/sbom/
+
 SC1200 WDT DRIVER
 M:	Zwane Mwaikambo
 S:	Maintained
diff --git a/Makefile b/Makefile
index b7b80e84e1e..36f43a9e2ae 100644
--- a/Makefile
+++ b/Makefile
@@ -787,7 +787,7 @@ endif
 # in addition to whatever we do anyway.
 # Just "make" or "make all" shall build modules as well
 
-ifneq ($(filter all modules nsdeps compile_commands.json clang-%,$(MAKECMDGOALS)),)
+ifneq ($(filter all modules nsdeps compile_commands.json clang-% sbom,$(MAKECMDGOALS)),)
   KBUILD_MODULES := y
 endif
 
@@ -1692,7 +1692,7 @@ CLEAN_FILES += vmlinux.symvers modules-only.symvers \
 	       modules.builtin.ranges vmlinux.o.map vmlinux.unstripped \
 	       compile_commands.json rust/test \
 	       rust-project.json .vmlinux.objs .vmlinux.export.c \
-	       .builtin-dtbs-list .builtin-dtbs.S
+	       .builtin-dtbs-list .builtin-dtbs.S sbom-*.spdx.json
 
 # Directories & files removed with 'make mrproper'
 MRPROPER_FILES += include/config include/generated \
@@ -1811,6 +1811,7 @@ help:
 	@echo ''
 	@echo 'Tools:'
 	@echo ' nsdeps - Generate missing symbol namespace dependencies'
+	@echo ' sbom - Generate Software Bill of Materials'
 	@echo ''
 	@echo 'Kernel selftest:'
 	@echo ' kselftest - Build and run kernel selftest'
@@ -2197,6 +2198,21 @@ nsdeps: export KBUILD_NSDEPS=1
 nsdeps: modules
 	$(Q)$(CONFIG_SHELL) $(srctree)/scripts/nsdeps
 
+# Script to generate .spdx.json SBOM documents describing the build
+# ---------------------------------------------------------------------------
+
+ifdef building_out_of_srctree
+sbom_targets := sbom-source.spdx.json
+endif
+sbom_targets += sbom-build.spdx.json sbom-output.spdx.json
+quiet_cmd_sbom = GEN $(sbom_targets)
+      cmd_sbom = printf "%s\n" "$(KBUILD_IMAGE)" >"$(tmp-target)"; \
+                 $(if $(CONFIG_MODULES),sed 's/\.o$$/.ko/' $(objtree)/modules.order >> "$(tmp-target)";) \
+                 $(PYTHON3) $(srctree)/scripts/sbom/sbom.py;
+PHONY += sbom
+sbom: $(notdir $(KBUILD_IMAGE)) include/generated/autoconf.h $(if $(CONFIG_MODULES),modules modules.order)
+	$(call cmd,sbom)
+
 # Clang Tooling
 # ---------------------------------------------------------------------------
 
diff --git a/scripts/sbom/sbom.py b/scripts/sbom/sbom.py
new file mode 100644
index 00000000000..9c2e4c7f17c
--- /dev/null
+++ b/scripts/sbom/sbom.py
@@ -0,0 +1,16 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+"""
+Compute software bill of materials in SPDX format describing a kernel build.
+"""
+
+
+def main():
+    pass
+
+
+# Call main method
+if __name__ == "__main__":
+    main()
-- 
2.43.0

From nobody Mon May 25 05:54:39 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, gregkh@linuxfoundation.org,
 kstewart@linuxfoundation.org, maximilian.huber@tngtech.com,
 Luis Augenstein
Subject: [PATCH v7 03/15] scripts/sbom: setup sbom logging
Date: Mon, 18 May 2026 08:20:50 +0200
Message-ID: <20260518062102.2051814-4-luis.augenstein@tngtech.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com>
References: <20260518062102.2051814-1-luis.augenstein@tngtech.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Luis Augenstein

Add logging infrastructure for warnings and errors.
Errors and warnings are accumulated and summarized at the end.

Assisted-by: Cursor:claude-sonnet-4-5
Assisted-by: OpenCode:GLM-4-7
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 scripts/sbom/sbom.py              | 26 ++++++++-
 scripts/sbom/sbom/__init__.py     |  0
 scripts/sbom/sbom/config.py       | 46 +++++++++++++++
 scripts/sbom/sbom/sbom_logging.py | 94 +++++++++++++++++++++++++++++++
 4 files changed, 165 insertions(+), 1 deletion(-)
 create mode 100644 scripts/sbom/sbom/__init__.py
 create mode 100644 scripts/sbom/sbom/config.py
 create mode 100644 scripts/sbom/sbom/sbom_logging.py

diff --git a/scripts/sbom/sbom.py b/scripts/sbom/sbom.py
index 9c2e4c7f17c..3bd466720b0 100644
--- a/scripts/sbom/sbom.py
+++ b/scripts/sbom/sbom.py
@@ -6,9 +6,33 @@
 Compute software bill of materials in SPDX format describing a kernel build.
 """
 
+import logging
+import sys
+import sbom.sbom_logging as sbom_logging
+from sbom.config import get_config
+
+
+def _exit_with_summary(write_output_on_error: bool = False) -> None:
+    warning_summary = sbom_logging.summarize_warnings()
+    error_summary = sbom_logging.summarize_errors()
+    if warning_summary:
+        logging.warning(warning_summary)
+    if error_summary:
+        logging.error(error_summary)
+        sys.exit(1)
+
 
 def main():
-    pass
+    # Read config
+    config = get_config()
+
+    # Configure logging
+    logging.basicConfig(
+        level=logging.DEBUG if config.debug else logging.INFO,
+        format="[%(levelname)s] %(message)s",
+    )
+
+    _exit_with_summary(config.write_output_on_error)
 
 
 # Call main method
diff --git a/scripts/sbom/sbom/__init__.py b/scripts/sbom/sbom/__init__.py
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/scripts/sbom/sbom/config.py b/scripts/sbom/sbom/config.py
new file mode 100644
index 00000000000..c1ac9ad5737
--- /dev/null
+++ b/scripts/sbom/sbom/config.py
@@ -0,0 +1,46 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+import argparse
+from dataclasses import dataclass
+
+
+@dataclass
+class KernelSbomConfig:
+    debug: bool
+    """Whether to enable debug logging."""
+
+
+def _parse_cli_arguments(parser: argparse.ArgumentParser) -> dict[str, bool]:
+    """
+    Parse command-line arguments using argparse.
+
+    Returns:
+        Dictionary of parsed arguments.
+    """
+    parser.add_argument(
+        "--debug",
+        action="store_true",
+        default=False,
+        help="Enable debug logs (default: False)",
+    )
+
+    args = vars(parser.parse_args())
+    return args
+
+
+def get_config() -> KernelSbomConfig:
+    """
+    Parse command-line arguments and construct the configuration object.
+
+    Returns:
+        KernelSbomConfig: Configuration object with all settings for SBOM generation.
+ """ + parser =3D argparse.ArgumentParser( + description=3D"Generate SPDX SBOM documents for kernel builds", + ) + args =3D _parse_cli_arguments(parser) + + debug =3D args["debug"] + + return KernelSbomConfig(debug=3Ddebug) diff --git a/scripts/sbom/sbom/sbom_logging.py b/scripts/sbom/sbom/sbom_log= ging.py new file mode 100644 index 00000000000..fbc53cc77ef --- /dev/null +++ b/scripts/sbom/sbom/sbom_logging.py @@ -0,0 +1,94 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import logging +import inspect +from typing import Literal + + +MessageTemplate =3D str + + +class MessageLogger: + """Logger that suppresses repeated messages and stores a summary of al= l logged messages.""" + + _messages: dict[MessageTemplate, list[str]] + _message_counts: dict[MessageTemplate, int] + _repeated_logs_limit: int + """Maximum number of repeated messages of the same type to log before = suppressing further output.""" + + def __init__(self, level: Literal["error", "warning"], repeated_logs_l= imit: int =3D 3) -> None: + self._level =3D level + self._messages =3D {} + self._message_counts =3D {} + self._repeated_logs_limit =3D repeated_logs_limit + + def log(self, template: MessageTemplate, /, **kwargs: str) -> None: + """Log a message based on a template and optional variables. 
Examp= le: `log("Missing {path}", path=3Dstr(p))`.""" + message =3D template + for key, value in kwargs.items(): + message =3D message.replace("{" + key + "}", value) + if template not in self._messages: + self._messages[template] =3D [] + self._message_counts[template] =3D 0 + self._message_counts[template] +=3D 1 + if self._message_counts[template] <=3D self._repeated_logs_limit: + if self._level =3D=3D "error": + logging.error(message) + elif self._level =3D=3D "warning": + logging.warning(message) + self._messages[template].append(message) + + def get_summary(self) -> str: + if len(self._messages) =3D=3D 0: + return "" + summary: list[str] =3D [f"Summarize {self._level}s:"] + for template, messages in self._messages.items(): + for message in messages: + summary.append(message) + n_suppressed_messages =3D self._message_counts[template] - sel= f._repeated_logs_limit + if n_suppressed_messages > 0: + instances =3D "instance" if n_suppressed_messages =3D=3D 1= else "instances" + summary.append(f"... 
(Found {n_suppressed_messages} more {= instances} of this {self._level})") + return "\n".join(summary) + + def has_messages(self) -> bool: + return len(self._message_counts) > 0 + + +_warning_logger: MessageLogger +_error_logger: MessageLogger + + +def warning(msg_template: MessageTemplate, /, **kwargs: str) -> None: + _warning_logger.log(msg_template, **kwargs) + + +def error(msg_template: MessageTemplate, /, **kwargs: str) -> None: + frame =3D inspect.currentframe() + caller_frame =3D frame.f_back if frame else None + info =3D inspect.getframeinfo(caller_frame) if caller_frame else None + if info: + msg_template =3D f'File "{info.filename}", line {info.lineno}, in = {info.function}\n{msg_template}' + _error_logger.log(msg_template, **kwargs) + + +def summarize_warnings() -> str: + return _warning_logger.get_summary() + + +def summarize_errors() -> str: + return _error_logger.get_summary() + + +def has_errors() -> bool: + return _error_logger.has_messages() + + +def init() -> None: + global _warning_logger, _error_logger + _warning_logger =3D MessageLogger("warning") + _error_logger =3D MessageLogger("error") + + +init() --=20 2.43.0
From nobody Mon May 25 05:54:39 2026 From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v7 04/15] scripts/sbom: add command parsers Date: Mon, 18 May 2026 08:20:51 +0200 Message-ID: <20260518062102.2051814-5-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com> References: <20260518062102.2051814-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"
From: Luis Augenstein Implement savedcmd_parser module for extracting input files from kernel build commands. 
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- .../cmd_graph/savedcmd_parser/__init__.py | 6 + .../command_parser_registry.py | 516 ++++++++++++++++++ .../savedcmd_parser/command_splitter.py | 128 +++++ .../savedcmd_parser/savedcmd_parser.py | 67 +++ .../cmd_graph/savedcmd_parser/tokenizer.py | 92 ++++ scripts/sbom/sbom/environment.py | 192 +++++++ 6 files changed, 1001 insertions(+) create mode 100644 scripts/sbom/sbom/cmd_graph/savedcmd_parser/__init__.py create mode 100644 scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_par= ser_registry.py create mode 100644 scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_spl= itter.py create mode 100644 scripts/sbom/sbom/cmd_graph/savedcmd_parser/savedcmd_pa= rser.py create mode 100644 scripts/sbom/sbom/cmd_graph/savedcmd_parser/tokenizer.py create mode 100644 scripts/sbom/sbom/environment.py diff --git a/scripts/sbom/sbom/cmd_graph/savedcmd_parser/__init__.py b/scri= pts/sbom/sbom/cmd_graph/savedcmd_parser/__init__.py new file mode 100644 index 00000000000..d13876af4df --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/__init__.py @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from sbom.cmd_graph.savedcmd_parser.savedcmd_parser import parse_inputs_fr= om_commands + +__all__ =3D ["parse_inputs_from_commands"] diff --git a/scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_parser_reg= istry.py b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_parser_regis= try.py new file mode 100644 index 00000000000..a48040b2c13 --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_parser_registry.py @@ -0,0 +1,516 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import re +import shlex +from typing import Callable, Iterator + +import 
sbom.sbom_logging as sbom_logging +from sbom.environment import Environment +from sbom.cmd_graph.savedcmd_parser.command_splitter import IfBlock, split= _commands +from sbom.cmd_graph.savedcmd_parser.tokenizer import ( + CmdParsingError, + Option, + Positional, + tokenize_single_command, + tokenize_single_command_positionals_only, +) +from sbom.path_utils import PathStr + +CommandParser =3D Callable[[str], list[PathStr]] +CommandParserRegistryEntry =3D tuple[re.Pattern[str], CommandParser] + + +def _parse_dd_command(command: str) -> list[PathStr]: + match =3D re.match(r"dd.*?if=3D(\S+)", command) + if match: + return [match.group(1)] + return [] + + +def _parse_cat_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["cat", input1, input2, ...] + return [p for p in positionals[1:]] + + +def _parse_compound_command(command: str) -> list[PathStr]: + compound_command_parsers: list[CommandParserRegistryEntry] =3D [ + (re.compile(r"dd\b"), _parse_dd_command), + (re.compile(r"cat.*?\|"), lambda c: _parse_cat_command(c.split("|"= )[0])), + (re.compile(r"cat\b[^|>]*$"), _parse_cat_command), + (re.compile(r"echo\b"), _parse_noop), + (re.compile(r"\S+=3D"), _parse_noop), + (re.compile(r"printf\b"), _parse_noop), + (re.compile(r"sed\b"), _parse_sed_command), + ( + re.compile(r"(.*/)scripts/bin2c\s*<"), + lambda c: [input] if (input :=3D c.split("<")[1].split(">")[0]= .strip()) !=3D "/dev/null" else [], + ), + (re.compile(r"^:$"), _parse_noop), + ] + + match =3D re.match(r"\s*[\(\{](.*)[\)\}]\s*>", command, re.DOTALL) + if match is None: + raise CmdParsingError("No inner commands found for compound comman= d") + input_files: list[PathStr] =3D [] + inner_commands =3D split_commands(match.group(1)) + for inner_command in inner_commands: + if isinstance(inner_command, IfBlock): + sbom_logging.error( + "Skip parsing inner command {inner_command} of compound co= mmand because IfBlock is not 
supported", + inner_command=3Dinner_command, + ) + continue + + parser =3D next((parser for pattern, parser in compound_command_pa= rsers if pattern.match(inner_command)), None) + if parser is None: + sbom_logging.error( + "Skip parsing inner command {inner_command} of compound co= mmand because no matching parser was found", + inner_command=3Dinner_command, + ) + continue + try: + input_files +=3D parser(inner_command) + except (CmdParsingError, IndexError) as e: + sbom_logging.error( + "Skip parsing inner command {inner_command} of compound co= mmand because of command parsing error: {error_message}", + inner_command=3Dinner_command, + error_message=3Dstr(e), + ) + return input_files + + +def _parse_objcopy_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command, flag_options=3D["-S= ", "-w"]) + positionals =3D [part.value for part in command_parts if isinstance(pa= rt, Positional)] + # expect positionals to be ['objcopy', input_file] or ['objcopy', inpu= t_file, output_file] + return [positionals[1]] + + +def _parse_link_vmlinux_command(command: str) -> list[PathStr]: + """ + For simplicity we do not parse the `scripts/link-vmlinux.sh` script. + Instead the `vmlinux.a` dependency is just hardcoded for now. + """ + return ["vmlinux.a"] + + +def _parse_cp_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["cp", input1, ..., destination] + return positionals[1:-1] + + +def _parse_noop(command: str) -> list[PathStr]: + """ + No-op parser for commands with no input files (e.g., 'rm', 'true'). + Returns an empty list. + """ + return [] + + +def _parse_ar_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ['ar', flags, output, input1, input2, ...] + flags =3D positionals[1] + if "r" not in flags: + # 'r' option indicates that new files are added to the archive. 
+ # If this option is missing we won't find any relevant input files. + return [] + return positionals[3:] + + +def _parse_ar_piped_xargs_command(command: str) -> list[PathStr]: + printf_command, _ =3D command.split("|", 1) + positionals =3D tokenize_single_command_positionals_only(printf_comman= d.strip()) + # expect positionals to be ['printf', '{prefix_path}%s ', input1, inpu= t2, ...] + prefix_path =3D positionals[1].removesuffix("%s ") + return [f"{prefix_path}{filename}" for filename in positionals[2:]] + + +def _parse_gcc_or_clang_command(command: str) -> list[PathStr]: + parts =3D shlex.split(command) + # compile mode: expect last positional argument ending in a source fil= e extension to be the input file + for part in reversed(parts): + if not part.startswith("-") and any(part.endswith(suffix) for suff= ix in [".c", ".S", ".dts"]): + return [part] + + # linking mode: expect all .o files to be the inputs + return [p for p in parts if p.endswith(".o")] + + +def _parse_rustc_command(command: str) -> list[PathStr]: + parts =3D shlex.split(command) + # expect last positional argument ending in `.rs` to be the input file + for part in reversed(parts): + if not part.startswith("-") and part.endswith(".rs"): + return [part] + raise CmdParsingError("Could not find .rs input source file") + + +def _parse_rustdoc_command(command: str) -> list[PathStr]: + parts =3D shlex.split(command) + # expect last positional argument ending in `.rs` to be the input file + for part in reversed(parts): + if not part.startswith("-") and part.endswith(".rs"): + return [part] + raise CmdParsingError("Could not find .rs input source file") + + +def _parse_syscallhdr_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command.strip(), flag_option= s=3D["--emit-nr"]) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["sh", path/to/syscallhdr.sh, input, output] + return [positionals[2]] + + +def 
_parse_syscalltbl_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command.strip()) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["sh", path/to/syscalltbl.sh, input, output] + return [positionals[2]] + + +def _parse_mkcapflags_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["sh", path/to/mkcapflags.sh, output, input= 1, input2] + return [positionals[3], positionals[4]] + + +def _parse_orc_hash_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["sh", path/to/orc_hash.sh, '<', input, '>'= , output] + return [positionals[3]] + + +def _parse_xen_hypercalls_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["sh", path/to/xen-hypercalls.sh, output, i= nput1, input2, ...] + return positionals[3:] + + +def _parse_gen_initramfs_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["sh", path/to/gen_initramfs.sh, input1, in= put2, ...] 
+ return positionals[2:] + + +def _parse_vdso2c_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ['vdso2c', raw_input, stripped_input, outpu= t] + return [positionals[1], positionals[2]] + + +def _parse_vdsomunge_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ['vdsomunge', input, output] + return [positionals[1]] + + +def _parse_ld_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command( + command=3Dcommand.strip(), + flag_options=3D[ + "-shared", + "--no-undefined", + "--eh-frame-hdr", + "-Bsymbolic", + "-r", + "--no-ld-generated-unwind-info", + "--no-dynamic-linker", + "-pie", + "--no-dynamic-linker--whole-archive", + "--whole-archive", + "--no-whole-archive", + "--start-group", + "--end-group", + ], + ) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["ld", input1, input2, ...] + return positionals[1:] + + +def _parse_sed_command(command: str) -> list[PathStr]: + command_parts =3D shlex.split(command) + # expect command parts to be ["sed", *, input] + input =3D command_parts[-1] + if input =3D=3D "/dev/null": + return [] + return [input] + + +def _parse_awk(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command) + options =3D [p for p in command_parts if isinstance(p, Option)] + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + has_script_file =3D any(p.name =3D=3D "-f" for p in options) + # With -f option: expect ["awk", input1, input2, ...] + # Without -f option: expect ["awk", inline_program, input1, input2, ..= .] 
+ return positionals[1:] if has_script_file else positionals[2:] + + +def _parse_nm_piped_command(command: str) -> list[PathStr]: + nm_command, _ =3D command.split("|", 1) + command_parts =3D tokenize_single_command( + command=3Dnm_command.strip(), + flag_options=3D["-p", "--defined-only"], + ) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["nm", input1, input2, ...] + return [p for p in positionals[1:]] + + +def _parse_pnm_to_logo_command(command: str) -> list[PathStr]: + command_parts =3D shlex.split(command) + # expect command parts to be ["pnmtologo", , input] + return [command_parts[-1]] + + +def _parse_relacheck(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["relacheck", input, log_reference] + return [positionals[1]] + + +def _parse_gen_hyprel_command(command: str) -> list[PathStr]: + gen_hyprel_command, _ =3D command.split(">", 1) + command_parts =3D shlex.split(gen_hyprel_command) + # expect command_parts to be ["gen-hyprel", input] + return [command_parts[1]] + + +def _parse_perl_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command.strip= ()) + # expect positionals to be ["perl", input] + return [positionals[1]] + + +def _parse_strip_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command, flag_options=3D["--= strip-debug"]) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["strip", input1, input2, ...] 
+ return positionals[1:] + + +def _parse_mkpiggy_command(command: str) -> list[PathStr]: + mkpiggy_command, _ =3D command.split(">", 1) + positionals =3D tokenize_single_command_positionals_only(mkpiggy_comma= nd) + # expect positionals to be ["mkpiggy", input] + return [positionals[1]] + + +def _parse_relocs_command(command: str) -> list[PathStr]: + if ">" not in command: + # Only consider relocs commands that redirect output to a file. + # If there's no redirection, we assume it produces no output file = and therefore has no input we care about. + return [] + relocs_command, _ =3D command.split(">", 1) + command_parts =3D shlex.split(relocs_command) + # expect command_parts to be ["relocs", options, input] + return [command_parts[-1]] + + +def _parse_mk_elfconfig_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["mk_elfconfig", "<", input, ">", output] + return [positionals[2]] + + +def _parse_flex_command(command: str) -> list[PathStr]: + parts =3D shlex.split(command) + # expect last positional argument ending in `.l` to be the input file + for part in reversed(parts): + if not part.startswith("-") and part.endswith(".l"): + return [part] + raise CmdParsingError("Could not find .l input source file in command") + + +def _parse_bison_command(command: str) -> list[PathStr]: + parts =3D shlex.split(command) + # expect last positional argument ending in `.y` to be the input file + for part in reversed(parts): + if not part.startswith("-") and part.endswith(".y"): + return [part] + raise CmdParsingError("Could not find .y input source file in command") + + +def _parse_tools_build_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["tools/build", "input1", "input2", "input3= ", "output"] + return positionals[1:-1] + + +def _parse_extract_cert_command(command: str) -> list[PathStr]: + 
command_parts =3D shlex.split(command) + # expect command parts to be [path/to/extract-cert, input, output] + input =3D command_parts[1] + if not input: + return [] + return [input] + + +def _parse_dtc_command(command: str) -> list[PathStr]: + wno_flags =3D [command_part for command_part in shlex.split(command) i= f command_part.startswith("-Wno-")] + command_parts =3D tokenize_single_command(command, flag_options=3Dwno_= flags) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be [path/to/dtc, input] + return [positionals[1]] + + +def _parse_bindgen_command(command: str) -> list[PathStr]: + command_parts =3D shlex.split(command) + header_file_input_paths =3D [part for part in command_parts if part.en= dswith(".h")] + return header_file_input_paths + + +def _parse_gen_header(command: str) -> list[PathStr]: + command_parts =3D shlex.split(command) + # expect command parts to be ["python3", path/to/gen_header.py, ..., = "--xml", input] + i =3D next((i for i, token in enumerate(command_parts) if token =3D=3D= "--xml"), None) + if i is None: + raise CmdParsingError(f"Expected --xml input file in gen_header c= ommand but got {command}") + return [command_parts[i + 1]] + +def _parse_mkuboot_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command) + # mkuboot.sh passes all args to mkimage; -d specifies the data/input i= mage file + for part in command_parts: + if isinstance(part, Option) and part.name =3D=3D "-d" and part.val= ue is not None: + return [part.value] + raise CmdParsingError("Could not find -d (data file) option in mkuboot= .sh command") + + +def _parse_syscallnr_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command.strip()) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["sh", path/to/syscallnr.sh, input, output] + return [positionals[2]] + + +def 
_parse_gen_kernel_hwcaps_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command.strip(), flag_option= s=3D["-e"]) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["sh", path/to/gen-kernel-hwcaps.sh, input] + return [positionals[2]] + + +class CommandParserRegistry: + """ + Registry mapping command patterns to their input-file parsers. + """ + + def __init__(self, entries: list[CommandParserRegistryEntry]) -> None: + self._entries =3D entries + + def __iter__(self) -> Iterator[CommandParserRegistryEntry]: + return iter(self._entries) + + @staticmethod + def create() -> "CommandParserRegistry": + def env_or_default_pattern(env_value: str | None, default_pattern:= str) -> str: + if env_value is None or not env_value.strip(): + return default_pattern + return rf"(?:{re.escape(env_value.strip())}|{default_pattern})" + + cc_pattern =3D env_or_default_pattern(Environment.CC(), r"([^\s]+-= )?(gcc|clang)") + ld_pattern =3D env_or_default_pattern(Environment.LD(), r"([^\s]+-= )?ld") + ar_pattern =3D env_or_default_pattern(Environment.AR(), r"([^\s]+-= )?ar") + nm_pattern =3D env_or_default_pattern(Environment.NM(), r"([^\s]+-= )?nm") + objcopy_pattern =3D env_or_default_pattern(Environment.OBJCOPY(), = r"([^\s]+-)?objcopy") + strip_pattern =3D env_or_default_pattern(Environment.STRIP(), r"([= ^\s]+-)?strip") + + entries: list[CommandParserRegistryEntry] =3D [ + # Compound commands + (re.compile(r"\(.*?\)\s*>", re.DOTALL), _parse_compound_comman= d), + (re.compile(r"\{.*?\}\s*>", re.DOTALL), _parse_compound_comman= d), + # Standard Unix utilities and system tools + (re.compile(r"^rm\b"), _parse_noop), + (re.compile(r"^mkdir\b"), _parse_noop), + (re.compile(r"^touch\b"), _parse_noop), + (re.compile(r"^cp\b"), _parse_cp_command), + (re.compile(r"^truncate\b"), _parse_noop), + (re.compile(r"^cat\b.*?[\|>]"), lambda c: _parse_cat_command(c= .split("|")[0].split(">")[0])), + 
(re.compile(r"^echo[^|]*$"), _parse_noop), + (re.compile(r"^sed.*?>"), lambda c: _parse_sed_command(c.split= (">")[0])), + (re.compile(r"^sed\b"), _parse_noop), + (re.compile(r"^awk.*?<.*?>"), lambda c: [c.split("<")[1].split= (">")[0]]), + (re.compile(r"^awk.*?>"), lambda c: _parse_awk(c.split(">")[0]= )), + (re.compile(r"^(/bin/)?true\b"), _parse_noop), + (re.compile(r"^(/bin/)?false\b"), _parse_noop), + (re.compile(r"^openssl\s+req.*?-new.*?-keyout"), _parse_noop), + # Compilers and code generators + # (C/LLVM toolchain, Rust, Flex/Bison, Bindgen, Perl, etc.) + ( + re.compile(rf"^{cc_pattern}\b"), + lambda command: _parse_gcc_or_clang_command(re.sub(rf"^{cc= _pattern}\b", "gcc", command, count=3D1)), + ), + ( + re.compile(rf"^{ld_pattern}\b"), + lambda command: _parse_ld_command(re.sub(rf"^{ld_pattern}\= b", "ld", command, count=3D1)), + ), + ( + re.compile(rf"^printf\b.*\| xargs {ar_pattern}\b"), + lambda command: _parse_ar_piped_xargs_command( + re.sub(rf"xargs {ar_pattern}\b", "xargs ar", command, = count=3D1) + ), + ), + ( + re.compile(rf"^{ar_pattern}\b"), + lambda command: _parse_ar_command(re.sub(rf"^{ar_pattern}\= b", "ar", command, count=3D1)), + ), + ( + re.compile(rf"^{nm_pattern}\b.*?\|"), + lambda command: _parse_nm_piped_command(re.sub(rf"^{nm_pat= tern}\b", "nm", command, count=3D1)), + ), + ( + re.compile(rf"^{objcopy_pattern}\b"), + lambda command: _parse_objcopy_command(re.sub(rf"^{objcopy= _pattern}\b", "objcopy", command, count=3D1)), + ), + ( + re.compile(rf"^{strip_pattern}\b"), + lambda command: _parse_strip_command(re.sub(rf"^{strip_pat= tern}\b", "strip", command, count=3D1)), + ), + (re.compile(r".*?rustc\b"), _parse_rustc_command), + (re.compile(r".*?rustdoc\b"), _parse_rustdoc_command), + (re.compile(r"^flex\b"), _parse_flex_command), + (re.compile(r"^bison\b"), _parse_bison_command), + (re.compile(r"^bindgen\b"), _parse_bindgen_command), + (re.compile(r"^perl\b"), _parse_perl_command), + # Kernel-specific build scripts and tools + 
(re.compile(r"^(.*/)?link-vmlinux\.sh\b"), _parse_link_vmlinux= _command), + (re.compile(r"sh (.*/)?syscallhdr\.sh\b"), _parse_syscallhdr_c= ommand), + (re.compile(r"sh (.*/)?syscalltbl\.sh\b"), _parse_syscalltbl_c= ommand), + (re.compile(r"sh (.*/)?mkcapflags\.sh\b"), _parse_mkcapflags_c= ommand), + (re.compile(r"sh (.*/)?orc_hash\.sh\b"), _parse_orc_hash_comma= nd), + (re.compile(r"sh (.*/)?xen-hypercalls\.sh\b"), _parse_xen_hype= rcalls_command), + (re.compile(r"sh (.*/)?gen_initramfs\.sh\b"), _parse_gen_initr= amfs_command), + (re.compile(r"sh (.*/)?checkundef\.sh\b"), _parse_noop), + (re.compile(r"(bash|sh) (.*/)?mkuboot\.sh\b"), _parse_mkuboot_= command), + (re.compile(r"sh (.*/)?syscallnr\.sh\b"), _parse_syscallnr_com= mand), + (re.compile(r"(/bin/)?sh (.*/)?gen-kernel-hwcaps\.sh\b"), lamb= da c: _parse_gen_kernel_hwcaps_command(c.split(">")[0])), + (re.compile(r"(.*/)?vdso2c\b"), _parse_vdso2c_command), + (re.compile(r"(.*/)?vdsomunge\b"), _parse_vdsomunge_command), + (re.compile(r"^(.*/)?mkpiggy.*?>"), _parse_mkpiggy_command), + (re.compile(r"^(.*/)?relocs\b"), _parse_relocs_command), + (re.compile(r"^(.*/)?mk_elfconfig.*?<.*?>"), _parse_mk_elfconf= ig_command), + (re.compile(r"^(.*/)?tools/build\b"), _parse_tools_build_comma= nd), + (re.compile(r"^(.*/)?certs/extract-cert"), _parse_extract_cert= _command), + (re.compile(r"^(.*/)?scripts/dtc/dtc\b"), _parse_dtc_command), + (re.compile(r"^(.*/)?pnmtologo\b"), _parse_pnm_to_logo_command= ), + (re.compile(r"^(.*/)?kernel/pi/relacheck"), _parse_relacheck), + (re.compile(r"^(.*/)?gen-hyprel\b"), _parse_gen_hyprel_command= ), + (re.compile(r"^drivers/gpu/drm/radeon/mkregtable"), lambda c: = [c.split(" ")[1]]), + (re.compile(r"(.*/)?genheaders\b"), _parse_noop), + (re.compile(r"^(.*/)?mkcpustr\s+>"), _parse_noop), + (re.compile(r"^(.*/)polgen\b"), _parse_noop), + (re.compile(r"make -f .*/arch/x86/Makefile\.postlink"), _parse= _noop), + (re.compile(r"^(.*/)?raid6/mktables\s+>"), _parse_noop), + 
(re.compile(r"^(.*/)?objtool\b"), _parse_noop), + (re.compile(r"^(.*/)?module/gen_test_kallsyms.sh"), _parse_noo= p), + (re.compile(r"^(.*/)?gen_header.py"), _parse_gen_header), + (re.compile(r"^(.*/)?scripts/rustdoc_test_gen"), _parse_noop), + ] + return CommandParserRegistry(entries) diff --git a/scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_splitter.p= y b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_splitter.py new file mode 100644 index 00000000000..4749f4bd669 --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_splitter.py @@ -0,0 +1,128 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import re +from dataclasses import dataclass + + +# If Block pattern to match a simple, single-level if-then-fi block. Neste= d If blocks are not supported. +IF_BLOCK_PATTERN =3D re.compile( + r""" + ^if(.*?);\s* # Match 'if ;' (non-greedy) + then(.*?);\s* # Match 'then ;' (non-greedy) + fi\b # Match 'fi' + """, + re.VERBOSE, +) + + +@dataclass +class IfBlock: + condition: str + then_statement: str + + +def _unwrap_outer_parentheses(s: str) -> str: + s =3D s.strip() + if not (s.startswith("(") and s.endswith(")")): + return s + + count =3D 0 + for i, char in enumerate(s): + if char =3D=3D "(": + count +=3D 1 + elif char =3D=3D ")": + count -=3D 1 + # If count is 0 before the end, outer parentheses don't match + if count =3D=3D 0 and i !=3D len(s) - 1: + return s + + # outer parentheses do match, unwrap once + return _unwrap_outer_parentheses(s[1:-1]) + + +def _find_first_top_level_command_separator( + commands: str, separators: list[str] =3D [";", "&&"] +) -> tuple[int | None, int | None]: + def is_escaped(index: int) -> bool: + preceding =3D commands[:index] + return (len(preceding) - len(preceding.rstrip("\\"))) % 2 =3D=3D 1 + + in_single_quote =3D False + in_double_quote =3D False + in_curly_braces =3D 0 + in_braces =3D 0 + for i, char in enumerate(commands): + if char =3D=3D "'" and 
not in_double_quote and not is_escaped(i): + # Toggle single quote state (unless inside double quotes or es= caped) + in_single_quote =3D not in_single_quote + elif char =3D=3D '"' and not in_single_quote and not is_escaped(i): + # Toggle double quote state (unless inside single quotes or es= caped) + in_double_quote =3D not in_double_quote + + if in_single_quote or in_double_quote: + continue + + # Toggle braces state + if char =3D=3D "{": + in_curly_braces +=3D 1 + if char =3D=3D "}": + in_curly_braces -=3D 1 + + if char =3D=3D "(": + in_braces +=3D 1 + if char =3D=3D ")": + in_braces -=3D 1 + + if in_curly_braces > 0 or in_braces > 0: + continue + + # return found separator position and separator length + for separator in separators: + if commands[i : i + len(separator)] =3D=3D separator: + return i, len(separator) + + return None, None + + +def split_commands(commands: str) -> list[str | IfBlock]: + """ + Splits a string of command-line commands into individual parts. + + This function handles: + - Top-level command separators (e.g., `;` and `&&`) to split multiple = commands. + - Conditional if-blocks, returning them as `IfBlock` instances. + - Preserves the order of commands and trims whitespace. + + Args: + commands (str): The raw command string. + + Returns: + list[str | IfBlock]: A list of single commands or `IfBlock` object= s. 
+ """ + single_commands: list[str | IfBlock] =3D [] + remaining_commands =3D _unwrap_outer_parentheses(commands) + while len(remaining_commands) > 0: + remaining_commands =3D remaining_commands.strip() + + # if block + matched_if =3D IF_BLOCK_PATTERN.match(remaining_commands) + if matched_if: + condition, then_statement =3D matched_if.groups() + single_commands.append(IfBlock(condition.strip(), then_stateme= nt.strip())) + full_matched =3D matched_if.group(0) + remaining_commands =3D remaining_commands.removeprefix(full_ma= tched).lstrip("; \n") + continue + + # command until next separator + separator_position, separator_length =3D _find_first_top_level_com= mand_separator(remaining_commands) + if separator_position is not None and separator_length is not None: + single_commands.append(remaining_commands[:separator_position]= .strip()) + remaining_commands =3D remaining_commands[separator_position += separator_length :].strip() + continue + + # single last command + single_commands.append(remaining_commands) + break + + return single_commands diff --git a/scripts/sbom/sbom/cmd_graph/savedcmd_parser/savedcmd_parser.py= b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/savedcmd_parser.py new file mode 100644 index 00000000000..6a7ea4787aa --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/savedcmd_parser.py @@ -0,0 +1,67 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import sbom.sbom_logging as sbom_logging +from sbom.cmd_graph.savedcmd_parser.command_splitter import IfBlock, split= _commands +from sbom.cmd_graph.savedcmd_parser.command_parser_registry import Command= ParserRegistry +from sbom.cmd_graph.savedcmd_parser.tokenizer import CmdParsingError +from sbom.path_utils import PathStr + +DEFAULT_COMMAND_PARSER_REGISTRY =3D CommandParserRegistry.create() + + +def parse_inputs_from_commands( + commands: str, + fail_on_unknown_build_command: bool, + registry: CommandParserRegistry | None =3D None, 
+) -> list[PathStr]: + """ + Extract input files referenced in a set of command-line commands. + + Args: + commands (str): Command line expression to parse. + fail_on_unknown_build_command (bool): Whether to fail if an unknow= n build command is encountered. If False, errors are logged as warnings. + registry (CommandParserRegistry | None): Registry of single comman= d parsers. + + Returns: + list[PathStr]: List of input file paths required by the commands. + """ + + def log_error_or_warning(message: str, /, **kwargs: str) -> None: + if fail_on_unknown_build_command: + sbom_logging.error(message, **kwargs) + else: + sbom_logging.warning(message, **kwargs) + + if registry is None: + registry =3D DEFAULT_COMMAND_PARSER_REGISTRY + + input_files: list[PathStr] =3D [] + for single_command in split_commands(commands): + if isinstance(single_command, IfBlock): + inputs =3D parse_inputs_from_commands(single_command.then_stat= ement, fail_on_unknown_build_command, registry) + if inputs: + log_error_or_warning( + "Skipped parsing command {then_statement} because inpu= t files in IfBlock 'then' statement are not supported", + then_statement=3Dsingle_command.then_statement, + ) + continue + + matched_parser =3D next((parser for pattern, parser in registry if= pattern.match(single_command)), None) + if matched_parser is None: + log_error_or_warning( + "Skipped parsing command {single_command} because no match= ing parser was found", + single_command=3Dsingle_command, + ) + continue + try: + inputs =3D matched_parser(single_command) + input_files.extend(inputs) + except (CmdParsingError, IndexError) as e: + log_error_or_warning( + "Skipped parsing command {single_command} because of comma= nd parsing error: {error_message}", + single_command=3Dsingle_command, + error_message=3Dstr(e), + ) + + return [input.strip().rstrip("/") for input in input_files] diff --git a/scripts/sbom/sbom/cmd_graph/savedcmd_parser/tokenizer.py b/scr= 
ipts/sbom/sbom/cmd_graph/savedcmd_parser/tokenizer.py new file mode 100644 index 00000000000..1bf081f40be --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/tokenizer.py @@ -0,0 +1,92 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import re +import shlex +from dataclasses import dataclass +from typing import Union + + +class CmdParsingError(Exception): + pass + + +@dataclass +class Option: + name: str + value: str | None =3D None + + +@dataclass +class Positional: + value: str + + +_SUBCOMMAND_PATTERN =3D re.compile(r"\$\$\(([^()]*)\)") +"""Pattern to match $$(...) blocks""" + + +def tokenize_single_command(command: str, flag_options: list[str] | None = =3D None) -> list[Union[Option, Positional]]: + """ + Parse a shell command into a list of Options and Positionals. + - Positional: the command and any positional arguments. + - Options: handles flags and options with values provided as space-sep= arated, or equals-sign + (e.g., '--opt val', '--opt=3Dval', '--flag'). + + Args: + command: Command line string. + flag_options: Options that are flags without values (e.g., '--verb= ose'). + + Returns: + List of `Option` and `Positional` objects in command order. + """ + + # Wrap all $$(...) blocks in double quotes to prevent shlex from spli= tting them. 
+ command_with_protected_subcommands =3D _SUBCOMMAND_PATTERN.sub(lambda = m: f'"$$({m.group(1)})"', command) + tokens =3D shlex.split(command_with_protected_subcommands) + + parsed: list[Option | Positional] =3D [] + i =3D 0 + while i < len(tokens): + token =3D tokens[i] + + # Positional + if not token.startswith("-"): + parsed.append(Positional(token)) + i +=3D 1 + continue + + # Option without value (--flag) + if (token.startswith("-") and i + 1 < len(tokens) and tokens[i + 1= ].startswith("-")) or ( + flag_options and token in flag_options + ): + parsed.append(Option(name=3Dtoken)) + i +=3D 1 + continue + + # Option with equals sign (--opt=3Dval) + if "=3D" in token: + name, value =3D token.split("=3D", 1) + parsed.append(Option(name=3Dname, value=3Dvalue)) + i +=3D 1 + continue + + # Option with space-separated value (--opt val) + if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"): + parsed.append(Option(name=3Dtoken, value=3Dtokens[i + 1])) + i +=3D 2 + continue + + raise CmdParsingError(f"Unrecognized token: {token} in command {co= mmand}") + + return parsed + + +def tokenize_single_command_positionals_only(command: str) -> list[str]: + command_parts =3D tokenize_single_command(command) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + if len(positionals) !=3D len(command_parts): + raise CmdParsingError( + f"Invalid command format: expected positional arguments only b= ut got options in command {command}." 
+ ) + return positionals diff --git a/scripts/sbom/sbom/environment.py b/scripts/sbom/sbom/environme= nt.py new file mode 100644 index 00000000000..4304066fe97 --- /dev/null +++ b/scripts/sbom/sbom/environment.py @@ -0,0 +1,192 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import os + +KERNEL_BUILD_VARIABLES_ALLOWLIST =3D [ + "AFLAGS_KERNEL", + "AFLAGS_MODULE", + "AR", + "ARCH", + "ARCH_CORE", + "ARCH_DRIVERS", + "ARCH_LIB", + "AWK", + "BASH", + "BINDGEN", + "BITS", + "CC", + "CC_FLAGS_FPU", + "CC_FLAGS_NO_FPU", + "CFLAGS_GCOV", + "CFLAGS_KERNEL", + "CFLAGS_MODULE", + "CHECK", + "CHECKFLAGS", + "CLIPPY_CONF_DIR", + "CONFIG_SHELL", + "CPP", + "CROSS_COMPILE", + "CURDIR", + "GNUMAKEFLAGS", + "HOSTCC", + "HOSTCXX", + "HOSTPKG_CONFIG", + "HOSTRUSTC", + "INSTALLKERNEL", + "INSTALL_DTBS_PATH", + "INSTALL_HDR_PATH", + "INSTALL_PATH", + "KBUILD_AFLAGS", + "KBUILD_AFLAGS_KERNEL", + "KBUILD_AFLAGS_MODULE", + "KBUILD_BUILTIN", + "KBUILD_CFLAGS", + "KBUILD_CFLAGS_KERNEL", + "KBUILD_CFLAGS_MODULE", + "KBUILD_CHECKSRC", + "KBUILD_CLIPPY", + "KBUILD_CPPFLAGS", + "KBUILD_EXTMOD", + "KBUILD_EXTRA_WARN", + "KBUILD_HOSTCFLAGS", + "KBUILD_HOSTCXXFLAGS", + "KBUILD_HOSTLDFLAGS", + "KBUILD_HOSTLDLIBS", + "KBUILD_HOSTRUSTFLAGS", + "KBUILD_IMAGE", + "KBUILD_LDFLAGS", + "KBUILD_LDFLAGS_MODULE", + "KBUILD_LDS", + "KBUILD_MODULES", + "KBUILD_PROCMACROLDFLAGS", + "KBUILD_RUSTFLAGS", + "KBUILD_RUSTFLAGS_KERNEL", + "KBUILD_RUSTFLAGS_MODULE", + "KBUILD_USERCFLAGS", + "KBUILD_USERLDFLAGS", + "KBUILD_VERBOSE", + "KBUILD_VMLINUX_LIBS", + "KBZIP2", + "KCONFIG_CONFIG", + "KERNELDOC", + "KERNELRELEASE", + "KERNELVERSION", + "KGZIP", + "KLZOP", + "LC_COLLATE", + "LC_NUMERIC", + "LD", + "LDFLAGS_MODULE", + "LEX", + "LINUXINCLUDE", + "LZ4", + "LZMA", + "MAKE", + "MAKEFILES", + "MAKEFILE_LIST", + "MAKEFLAGS", + "MAKELEVEL", + "MAKEOVERRIDES", + "MAKE_COMMAND", + "MAKE_HOST", + "MAKE_TERMERR", + "MAKE_TERMOUT", + "MAKE_VERSION", + "MFLAGS", + 
"MODLIB", + "NM", + "NOSTDINC_FLAGS", + "O", + "OBJCOPY", + "OBJCOPYFLAGS", + "OBJDUMP", + "PAHOLE", + "PATCHLEVEL", + "PERL", + "PYTHON3", + "Q", + "RCS_FIND_IGNORE", + "READELF", + "REALMODE_CFLAGS", + "RESOLVE_BTFIDS", + "RETHUNK_CFLAGS", + "RETHUNK_RUSTFLAGS", + "RETPOLINE_CFLAGS", + "RETPOLINE_RUSTFLAGS", + "RETPOLINE_VDSO_CFLAGS", + "RUSTC", + "RUSTC_BOOTSTRAP", + "RUSTC_OR_CLIPPY", + "RUSTC_OR_CLIPPY_QUIET", + "RUSTDOC", + "RUSTFLAGS_KERNEL", + "RUSTFLAGS_MODULE", + "RUSTFMT", + "SRCARCH", + "STRIP", + "SUBLEVEL", + "SUFFIXES", + "TAR", + "UTS_MACHINE", + "VERSION", + "VPATH", + "XZ", + "YACC", + "ZSTD", + "building_out_of_srctree", + "cross_compiling", + "objtree", + "quiet", + "rust_common_flags", + "srcroot", + "srctree", + "sub_make_done", + "subdir", +] + + +class Environment: + """ + Read-only accessor for kernel build environment variables. + """ + + @classmethod + def KERNEL_BUILD_VARIABLES(cls) -> dict[str, str]: + return { + name: value.strip() + for name in KERNEL_BUILD_VARIABLES_ALLOWLIST + if (value :=3D os.getenv(name)) is not None and value.strip() + } + + @classmethod + def ARCH(cls) -> str | None: + return os.getenv("ARCH") + + @classmethod + def SRCARCH(cls) -> str | None: + return os.getenv("SRCARCH") + + @classmethod + def CC(cls) -> str | None: + return os.getenv("CC") + + @classmethod + def LD(cls) -> str | None: + return os.getenv("LD") + + @classmethod + def AR(cls) -> str | None: + return os.getenv("AR") + + @classmethod + def NM(cls) -> str | None: + return os.getenv("NM") + + @classmethod + def OBJCOPY(cls) -> str | None: + return os.getenv("OBJCOPY") + + @classmethod + def STRIP(cls) -> str | None: + return os.getenv("STRIP") --=20 2.43.0 From nobody Mon May 25 05:54:39 2026 Received: from mailgw02.zimbra-vnc.de (mailgw02.zimbra-vnc.de [148.251.102.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ABBFF38A722; Mon, 18 
May 2026 06:21:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.102.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779085293; cv=none; b=cuSeLRVXM55s2yCNsdX3GPqAk4ASWopHVMiYfRI4FUhfuO/rP3wKrffHh/FKsJvfyn3aDa+qxxFYJlWZs4ItcAWST/zGqzT1hghogGlnbFmBHtjUvHjNW5AWahx3SGzuLaMFDRMqUWVguBcIFCzp7omt8/BcRhKbGLuDUV/bCR0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779085293; c=relaxed/simple; bh=PRg89BIoaeBLewECeAhmZoBCmS8fwXzrLVsL8nN/byY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WaeMT/WurcOhRwJvx4+jXxN9qTqyM20dX9ZAAuUGUW6QMtQpejpZpGOtEOcaRChcqUac8KynaBf9aNQO3NuJCY790du8Uu37m0Gng4C3EUkoq5h2bnKv1f3CW3owVKqoMeOTu0zVKPoavmeW5CdZmnWlUgsDi6X0mYIgpH7gniY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=aBIsc1Hf; arc=none smtp.client-ip=148.251.102.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="aBIsc1Hf" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw02.zimbra-vnc.de (Postfix) with ESMTPS id 2C494200C8; Mon, 18 May 2026 08:21:18 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 8375E1F8989; Mon, 18 May 2026 08:21:17 +0200 (CEST) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id qGMzBWTURLza; Mon, 18 May 2026 08:21:15 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz 
(Postfix) with ESMTP id 574BF1F89A8; Mon, 18 May 2026 08:21:15 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 574BF1F89A8 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1779085275; bh=VFYQlOPQ63OilccrhlGPeS+7nKmGq2PrdKFW/66zatk=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=aBIsc1HfAt7D+fnJcMXCl19I8XDvQwnLZKtKoXxUvdQStTex34TCtftNbKUktM1B5 Dgp9jvbqTEwRLWl9Mryut1Dj+ALfyDVYHTDDxACrbJr1O5Pp++pIN5VAr4Hz0g3MbE /5biFzVrNtX+ZJLdaEfK6bQ3o72QZn7SikU+FCDt5cAnpZeqdVe7kFY8EHt1cuPyvh iVTmKmiuxEDoWhjiQWfaEpAmFOdGPb7vxLfzMfITYKxoYUIwjdgaRXTs+dSSzoWrod xSDZkcg5LhQCGZ7CWnmxqhrdMuA+J1xY/MPtsNtuVphFWjYLFdjqw6ay3KOXc1hJRR TAWp4mSE6k1Pg== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id 0kI8EUgMs7Du; Mon, 18 May 2026 08:21:15 +0200 (CEST) Received: from luis-Precision-5480.. (ipservice-092-209-239-167.092.209.pools.vodafone-ip.de [92.209.239.167]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id E9DA21FAD27; Mon, 18 May 2026 08:21:14 +0200 (CEST) From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v7 05/15] scripts/sbom: add cmd graph generation Date: Mon, 18 May 2026 08:20:52 +0200 Message-ID: <20260518062102.2051814-6-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com> References: <20260518062102.2051814-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Implement command graph 
generation by parsing .cmd files to build a dependency graph. Add CmdGraph, CmdGraphNode, and .cmd file parsing. Supports generating a flat list of used source files via the --generate-used-files cli argument. Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- Makefile | 6 +- scripts/sbom/sbom.py | 39 +++++ scripts/sbom/sbom/cmd_graph/__init__.py | 7 + scripts/sbom/sbom/cmd_graph/cmd_file.py | 162 ++++++++++++++++++ scripts/sbom/sbom/cmd_graph/cmd_graph.py | 46 +++++ scripts/sbom/sbom/cmd_graph/cmd_graph_node.py | 111 ++++++++++++ scripts/sbom/sbom/cmd_graph/deps_parser.py | 52 ++++++ scripts/sbom/sbom/config.py | 149 +++++++++++++++- scripts/sbom/sbom/path_utils.py | 22 +++ 9 files changed, 591 insertions(+), 3 deletions(-) create mode 100644 scripts/sbom/sbom/cmd_graph/__init__.py create mode 100644 scripts/sbom/sbom/cmd_graph/cmd_file.py create mode 100644 scripts/sbom/sbom/cmd_graph/cmd_graph.py create mode 100644 scripts/sbom/sbom/cmd_graph/cmd_graph_node.py create mode 100644 scripts/sbom/sbom/cmd_graph/deps_parser.py create mode 100644 scripts/sbom/sbom/path_utils.py diff --git a/Makefile b/Makefile index 36f43a9e2ae..5cae3679343 100644 --- a/Makefile +++ b/Makefile @@ -2208,7 +2208,11 @@ sbom_targets +=3D sbom-build.spdx.json sbom-output.s= pdx.json quiet_cmd_sbom =3D GEN $(sbom_targets) cmd_sbom =3D printf "%s\n" "$(KBUILD_IMAGE)" >"$(tmp-target)"; \ $(if $(CONFIG_MODULES),sed 's/\.o$$/.ko/' $(objtree)/modu= les.order >> "$(tmp-target)";) \ - $(PYTHON3) $(srctree)/scripts/sbom/sbom.py; + $(PYTHON3) $(srctree)/scripts/sbom/sbom.py \ + --src-tree $(abspath $(srctree)) \ + --obj-tree $(abspath $(objtree)) \ + --roots-file "$(tmp-target)" \ + --output-directory $(abspath $(objtree)); PHONY +=3D sbom sbom: $(notdir $(KBUILD_IMAGE)) include/generated/autoconf.h $(if $(CONFIG= _MODULES),modules modules.order) $(call cmd,sbom) diff --git 
a/scripts/sbom/sbom.py b/scripts/sbom/sbom.py index 3bd466720b0..d700e4f294f 100644 --- a/scripts/sbom/sbom.py +++ b/scripts/sbom/sbom.py @@ -7,9 +7,13 @@ Compute software bill of materials in SPDX format describi= ng a kernel build. """ =20 import logging +import os import sys +import time import sbom.sbom_logging as sbom_logging from sbom.config import get_config +from sbom.path_utils import is_relative_to +from sbom.cmd_graph import CmdGraph =20 =20 def _exit_with_summary(write_output_on_error: bool =3D False) -> None: @@ -19,6 +23,11 @@ def _exit_with_summary(write_output_on_error: bool =3D F= alse) -> None: logging.warning(warning_summary) if error_summary: logging.error(error_summary) + if not write_output_on_error: + logging.info( + "Use --write-output-on-error to generate output documents = even when errors occur. " + "Note that in this case the generated documents may be inc= omplete." + ) sys.exit(1) =20 =20 @@ -32,6 +41,36 @@ def main(): format=3D"[%(levelname)s] %(message)s", ) =20 + # Build cmd graph + logging.debug("Start building cmd graph") + start_time =3D time.time() + cmd_graph =3D CmdGraph.create(config.root_paths, config) + logging.debug(f"Built cmd graph in {time.time() - start_time} seconds") + + # Save used files document + if config.generate_used_files: + if config.src_tree =3D=3D config.obj_tree: + logging.info( + f"Extracting all files from the cmd graph to {config.used_= files_file_name} " + "instead of only source files because source files cannot = be " + "reliably classified when the source and object trees are = identical.", + ) + used_files =3D [os.path.relpath(node.absolute_path, config.src= _tree) for node in cmd_graph] + logging.debug(f"Found {len(used_files)} files in cmd graph.") + else: + used_files =3D [ + os.path.relpath(node.absolute_path, config.src_tree) + for node in cmd_graph + if is_relative_to(node.absolute_path, config.src_tree) + and not is_relative_to(node.absolute_path, config.obj_tree) + ] + logging.debug(f"Found 
{len(used_files)} source files in cmd gr= aph") + if not sbom_logging.has_errors() or config.write_output_on_error: + used_files_path =3D os.path.join(config.output_directory, conf= ig.used_files_file_name) + with open(used_files_path, "w", encoding=3D"utf-8") as f: + f.write("\n".join(str(file_path) for file_path in used_fil= es)) + logging.debug(f"Successfully saved {used_files_path}") + _exit_with_summary(config.write_output_on_error) =20 =20 diff --git a/scripts/sbom/sbom/cmd_graph/__init__.py b/scripts/sbom/sbom/cm= d_graph/__init__.py new file mode 100644 index 00000000000..9d661a5c3d9 --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/__init__.py @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from .cmd_graph import CmdGraph +from .cmd_graph_node import CmdGraphNode, CmdGraphNodeConfig + +__all__ =3D ["CmdGraph", "CmdGraphNode", "CmdGraphNodeConfig"] diff --git a/scripts/sbom/sbom/cmd_graph/cmd_file.py b/scripts/sbom/sbom/cm= d_graph/cmd_file.py new file mode 100644 index 00000000000..dcd63e284a3 --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/cmd_file.py @@ -0,0 +1,162 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import os +import re +from dataclasses import dataclass, field +from sbom.cmd_graph.deps_parser import parse_cmd_file_deps +from sbom.cmd_graph.savedcmd_parser import parse_inputs_from_commands +import sbom.sbom_logging as sbom_logging +from sbom.path_utils import PathStr + +SAVEDCMD_PATTERN =3D re.compile(r"^(saved)?cmd_.*?:=3D\s*(?P<full_command>.+)$") +SOURCE_PATTERN =3D re.compile(r"^source.*?:=3D\s*(?P<source_file>.+)$") + + +@dataclass +class CmdFile: + cmd_file_path: PathStr + savedcmd: str + source: PathStr | None =3D None + deps: list[str] =3D field(default_factory=3Dlist) + make_rules: list[str] =3D field(default_factory=3Dlist) + + @classmethod + def create(cls, cmd_file_path: PathStr) -> "CmdFile | None": + """ + Parses a .cmd file. 
+ .cmd files are assumed to have one of the following structures: + 1. Full Cmd File + (saved)?cmd_ :=3D + source_ :=3D + deps_ :=3D \ + + :=3D $(deps_) + $(deps_): + + 2. Command Only Cmd File + (saved)?cmd_ :=3D + + 3. Single Dependency Cmd File + (saved)?cmd_ :=3D + : + + Args: + cmd_file_path (Path): absolute Path to a .cmd file + + Returns: + cmd_file (CmdFile): Parsed cmd file. + """ + with open(cmd_file_path, "rt", encoding=3D"utf-8") as f: + lines =3D [line.strip() for line in f.readlines() if line.stri= p() !=3D "" and not line.startswith("#")] + + # savedcmd + match =3D SAVEDCMD_PATTERN.match(lines[0] if lines else "") + if match is None: + sbom_logging.error( + "Skip parsing '{cmd_file_path}' because no 'savedcmd_' com= mand was found.", cmd_file_path=3Dcmd_file_path + ) + return None + savedcmd =3D match.group("full_command") + + # Command Only Cmd File + if len(lines) =3D=3D 1: + return CmdFile(cmd_file_path, savedcmd) + + # Single Dependency Cmd File + if len(lines) =3D=3D 2: + parts =3D lines[1].split(":", 1) + if len(parts) !=3D 2: + sbom_logging.error( + "Skip parsing '{cmd_file_path}'. Expected dependency l= ine ': ' but got {second_line}", cmd_file_path=3Dcmd_fi= le_path, second_line=3Dlines[1] + ) + return None + dep =3D parts[1].strip() + return CmdFile(cmd_file_path, savedcmd, deps=3D[dep]) + + # Full Cmd File + # source + line1 =3D SOURCE_PATTERN.match(lines[1]) + if line1 is None: + sbom_logging.error( + "Skip parsing '{cmd_file_path}' because no 'source_' entry= was found.", cmd_file_path=3Dcmd_file_path + ) + return CmdFile(cmd_file_path, savedcmd) + source =3D line1.group("source_file") + + # deps + deps: list[str] =3D [] + i =3D 3 # lines[2] includes the variable assignment but no actual= dependency, so we need to start at lines[3]. 
+ while i < len(lines): + if not lines[i].endswith("\\"): + break + deps.append(lines[i][:-1].strip()) + i +=3D 1 + + # make_rules + make_rules =3D lines[i:] + + return CmdFile(cmd_file_path, savedcmd, source, deps, make_rules) + + def get_dependencies( + self: "CmdFile", target_path: PathStr, obj_tree: PathStr, fail_on_= unknown_build_command: bool + ) -> list[PathStr]: + """ + Parses all dependencies required to build a target file from its c= md file. + + Args: + target_path: path to the target file relative to `obj_tree`. + obj_tree: absolute path to the object tree. + fail_on_unknown_build_command: Whether to fail if an unknown b= uild command is encountered. + + Returns: + list[PathStr]: dependency file paths relative to `obj_tree`. + """ + input_files: list[PathStr] =3D [ + str(p) for p in parse_inputs_from_commands(self.savedcmd, fail= _on_unknown_build_command) + ] + if self.deps: + input_files +=3D [str(p) for p in parse_cmd_file_deps(self.dep= s)] + input_files =3D _expand_resolve_files(input_files, obj_tree) + + cmd_file_dependencies: list[PathStr] =3D [] + for input_file in input_files: + # input files are either absolute or relative to the object tr= ee + if os.path.isabs(input_file): + input_file =3D os.path.relpath(input_file, obj_tree) + if input_file =3D=3D target_path: + # Skip target file to prevent cycles. This is necessary be= cause some multi stage commands first create an output and then pass it as = input to the next command, e.g., objcopy. + continue + cmd_file_dependencies.append(input_file) + unique_cmd_file_dependencies =3D list(dict.fromkeys(cmd_file_depen= dencies)) + return unique_cmd_file_dependencies + + +def _expand_resolve_files(input_files: list[PathStr], obj_tree: PathStr) -= > list[PathStr]: + """ + Expands resolve files which may reference additional files via '@' not= ation. 
+ + Args: + input_files (list[PathStr]): List of file paths relative to the ob= ject tree, where paths starting with '@' refer to files + containing further file paths, each o= n a separate line. + obj_tree: Absolute path to the root of the object tree. + + Returns: + list[PathStr]: Flattened list of all input file paths, with any ne= sted '@' file references resolved recursively. + """ + expanded_input_files: list[PathStr] =3D [] + for input_file in input_files: + if not input_file.startswith("@"): + expanded_input_files.append(input_file) + continue + resolve_file_path =3D os.path.join(obj_tree, input_file.removepref= ix("@")) + if not os.path.exists(resolve_file_path): + sbom_logging.error( + "Skip resolving '{resolve_file_path}' because the response= file does not exist.", + resolve_file_path=3Dresolve_file_path, + ) + continue + with open(resolve_file_path, "rt", encoding=3D"utf-8") as f: + resolve_file_content =3D [line_stripped for line in f.readline= s() if (line_stripped :=3D line.strip())] + expanded_input_files +=3D _expand_resolve_files(resolve_file_conte= nt, obj_tree) + return expanded_input_files diff --git a/scripts/sbom/sbom/cmd_graph/cmd_graph.py b/scripts/sbom/sbom/c= md_graph/cmd_graph.py new file mode 100644 index 00000000000..2f57965237f --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/cmd_graph.py @@ -0,0 +1,46 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from collections import deque +from dataclasses import dataclass, field +from typing import Iterator + +from sbom.cmd_graph.cmd_graph_node import CmdGraphNode, CmdGraphNodeConfig +from sbom.path_utils import PathStr + + +@dataclass +class CmdGraph: + """Directed acyclic graph of build dependencies primarily inferred fro= m .cmd files produced during kernel builds""" + + roots: list[CmdGraphNode] =3D field(default_factory=3Dlist) + + @classmethod + def create(cls, root_paths: list[PathStr], config: CmdGraphNodeConfig)= -> 
"CmdGraph": + """ + Recursively builds a dependency graph starting from `root_paths`. + Dependencies are mainly discovered by parsing the `.cmd` files. + + Args: + root_paths (list[PathStr]): List of paths to root outputs rela= tive to obj_tree + config (CmdGraphNodeConfig): Configuration options + + Returns: + CmdGraph: A graph of all build dependencies for the given root= files. + """ + node_cache: dict[PathStr, CmdGraphNode] =3D {} + root_nodes =3D [CmdGraphNode.create(root_path, config, node_cache)= for root_path in root_paths] + return CmdGraph(root_nodes) + + def __iter__(self) -> Iterator[CmdGraphNode]: + """Traverse the graph in breadth-first order, yielding each unique= node.""" + visited: set[PathStr] =3D set() + node_stack: deque[CmdGraphNode] =3D deque(self.roots) + while len(node_stack) > 0: + node =3D node_stack.popleft() + if node.absolute_path in visited: + continue + + visited.add(node.absolute_path) + node_stack.extend(node.children) + yield node diff --git a/scripts/sbom/sbom/cmd_graph/cmd_graph_node.py b/scripts/sbom/s= bom/cmd_graph/cmd_graph_node.py new file mode 100644 index 00000000000..7dde1c28eef --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/cmd_graph_node.py @@ -0,0 +1,111 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass, field +import logging +import os +from typing import Iterator, Protocol + +from sbom import sbom_logging +from sbom.cmd_graph.cmd_file import CmdFile +from sbom.path_utils import PathStr, has_link, is_relative_to + + +class CmdGraphNodeConfig(Protocol): + obj_tree: PathStr + src_tree: PathStr + fail_on_unknown_build_command: bool + + +@dataclass +class CmdGraphNode: + """A node in the cmd graph representing a single file and its dependen= cies.""" + + absolute_path: PathStr + """Absolute path to the file this node represents.""" + + cmd_file: CmdFile | None =3D None + """Parsed .cmd file describing how the file at absolute_path 
was built= , or None if not available.""" + + cmd_file_dependencies: list["CmdGraphNode"] =3D field(default_factory= =3Dlist) + + @property + def children(self) -> Iterator["CmdGraphNode"]: + seen: set[PathStr] =3D set() + for node in self.cmd_file_dependencies: + if node.absolute_path not in seen: + seen.add(node.absolute_path) + yield node + + @classmethod + def create( + cls, + target_path: PathStr, + config: CmdGraphNodeConfig, + cache: dict[PathStr, "CmdGraphNode"] | None =3D None, + depth: int =3D 0, + ) -> "CmdGraphNode": + """ + Recursively builds a dependency graph starting from `target_path`. + Dependencies are mainly discovered by parsing the `..cmd` file. + + Args: + target_path: Path to the target file relative to obj_tree. + config: Config options + cache: Tracks processed nodes to prevent cycles. + depth: Internal parameter to track the current recursion depth. + + Returns: + CmdGraphNode: cmd graph node representing the target file + """ + if cache is None: + cache =3D {} + + target_path_absolute =3D ( + os.path.realpath(p) + if has_link(p:=3Dos.path.join(config.obj_tree, target_path)) + else os.path.normpath(p) + ) + + if target_path_absolute in cache: + return cache[target_path_absolute] + + if depth =3D=3D 0: + logging.debug(f"Build node: {target_path}") + + cmd_file_path =3D _to_cmd_path(target_path_absolute) + cmd_file =3D CmdFile.create(cmd_file_path) if os.path.exists(cmd_f= ile_path) else None + node =3D CmdGraphNode(target_path_absolute, cmd_file) + cache[target_path_absolute] =3D node + + if not os.path.exists(target_path_absolute): + error_or_warning =3D ( + sbom_logging.error + if is_relative_to(target_path_absolute, config.obj_tree) + or is_relative_to(target_path_absolute, config.src_tree) + else sbom_logging.warning + ) + error_or_warning( + "Skip parsing '{target_path_absolute}' because file does n= ot exist", + target_path_absolute=3Dtarget_path_absolute, + ) + return node + + # Search for dependencies to add to the graph as child 
nodes. Chil= d paths are always relative to the output tree. + def _build_child_node(child_path: PathStr) -> "CmdGraphNode": + return CmdGraphNode.create(child_path, config, cache, depth + = 1) + + if cmd_file is not None: + node.cmd_file_dependencies =3D [ + _build_child_node(cmd_file_dependency_path) + for cmd_file_dependency_path in cmd_file.get_dependencies( + target_path, config.obj_tree, config.fail_on_unknown_b= uild_command + ) + ] + + return node + + +def _to_cmd_path(path: PathStr) -> PathStr: + name =3D os.path.basename(path) + return path.removesuffix(name) + f".{name}.cmd" diff --git a/scripts/sbom/sbom/cmd_graph/deps_parser.py b/scripts/sbom/sbom= /cmd_graph/deps_parser.py new file mode 100644 index 00000000000..6a2d92f0778 --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/deps_parser.py @@ -0,0 +1,52 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import re +import sbom.sbom_logging as sbom_logging +from sbom.path_utils import PathStr + +# Match dependencies on config files +# Example match: "$(wildcard include/config/CONFIG_SOMETHING)" +CONFIG_PATTERN =3D re.compile(r"\$\(wildcard (include/config/[^)]+)\)") + +# Match dependencies on the objtool binary +# Example match: "$(wildcard ./tools/objtool/objtool)" +OBJTOOL_PATTERN =3D re.compile(r"\$\(wildcard \./tools/objtool/objtool\)") + +# Match any Makefile wildcard reference +# Example match: "$(wildcard path/to/file)" +WILDCARD_PATTERN =3D re.compile(r"\$\(wildcard (?P<path>[^)]+)\)") + +# Match ordinary paths: +# - ^(\/)?: Optionally starts with a '/' +# - (([\w\-\.,+~=3D@ ]*)\/)*: Zero or more directory levels +# - [\w\-\.,+~=3D@ ]+$: Path component (file or directory) +# Example matches: "/foo/bar.c", "dir1/dir2/file.txt", "plainfile" +VALID_PATH_PATTERN =3D re.compile(r"^(\/)?(([\w\-\.,+~=3D@ ]*)\/)*[\w\-\.,= +~=3D@ ]+$") + + +def parse_cmd_file_deps(deps: list[str]) -> list[PathStr]: + """ + Parse dependency strings of a .cmd file and return 
valid input file pa= ths. + + Args: + deps: List of dependency strings as found in `.cmd` files. + + Returns: + input_files: List of input file paths + """ + input_files: list[PathStr] =3D [] + for dep in deps: + dep =3D dep.strip() + match dep: + case _ if CONFIG_PATTERN.match(dep) or OBJTOOL_PATTERN.match(d= ep): + # config paths like include/config/ should no= t be included in the graph + continue + case _ if match :=3D WILDCARD_PATTERN.match(dep): + path =3D match.group("path") + input_files.append(path) + case _ if VALID_PATH_PATTERN.match(dep): + input_files.append(dep) + case _: + sbom_logging.error("Skip parsing dependency {dep} because = of unrecognized format", dep=3Ddep) + return input_files diff --git a/scripts/sbom/sbom/config.py b/scripts/sbom/sbom/config.py index c1ac9ad5737..b8c1a2b404d 100644 --- a/scripts/sbom/sbom/config.py +++ b/scripts/sbom/sbom/config.py @@ -3,21 +3,88 @@ =20 import argparse from dataclasses import dataclass +import os +from typing import Any +from sbom.path_utils import PathStr =20 =20 @dataclass class KernelSbomConfig: + src_tree: PathStr + """Absolute path to the Linux kernel source directory.""" + + obj_tree: PathStr + """Absolute path to the build output directory.""" + + root_paths: list[PathStr] + """List of paths to root outputs (relative to obj_tree) to base the SB= OM on.""" + + generate_used_files: bool + """Whether to generate a flat list of all source files used in the bui= ld. 
+ If False, no used-files document is created.""" + + used_files_file_name: str + """If `generate_used_files` is True, specifies the file name for the u= sed-files document.""" + + output_directory: PathStr + """Path to the directory where the generated output documents will be = saved.""" + debug: bool """Whether to enable debug logging.""" =20 + fail_on_unknown_build_command: bool + """Whether to fail if an unknown build command is encountered in a .cm= d file.""" + + write_output_on_error: bool + """Whether to write output documents even if errors occur.""" + =20 -def _parse_cli_arguments(parser: argparse.ArgumentParser) -> dict[str, boo= l]: +def _parse_cli_arguments(parser: argparse.ArgumentParser) -> dict[str, Any= ]: """ Parse command-line arguments using argparse. =20 Returns: Dictionary of parsed arguments. """ + parser.add_argument( + "--src-tree", + default=3D"../linux", + help=3D"Path to the kernel source tree (default: ../linux)", + ) + parser.add_argument( + "--obj-tree", + default=3D"../linux/kernel_build", + help=3D"Path to the build output directory (default: ../linux/kern= el_build)", + ) + group =3D parser.add_mutually_exclusive_group(required=3DTrue) + group.add_argument( + "--roots", + nargs=3D"+", + help=3D"Space-separated list of paths relative to obj-tree for whi= ch the SBOM will be created.\n" + "Cannot be used together with --roots-file.", + ) + group.add_argument( + "--roots-file", + help=3D"Path to a file containing the root paths (one per line). C= annot be used together with --roots.", + ) + parser.add_argument( + "--generate-used-files", + action=3D"store_true", + default=3DFalse, + help=3D( + "Whether to create the sbom.used-files.txt file, a flat list o= f all " + "source files used for the kernel build.\n" + "If src-tree and obj-tree are equal it is not possible to reli= ably " + "classify source files.\n" + "In this case sbom.used-files.txt will contain all files used = for the " + "kernel build including all build artifacts. 
(default: False)" + ), + ) + parser.add_argument( + "--output-directory", + default=3D".", + help=3D"Path to the directory where the generated output documents= will be stored (default: .)", + ) parser.add_argument( "--debug", action=3D"store_true", @@ -25,6 +92,28 @@ def _parse_cli_arguments(parser: argparse.ArgumentParser= ) -> dict[str, bool]: help=3D"Enable debug logs (default: False)", ) =20 + # Error handling settings + parser.add_argument( + "--do-not-fail-on-unknown-build-command", + action=3D"store_true", + default=3DFalse, + help=3D( + "Do not fail if an unknown build command is encountered in a .cmd file.\n" + "If set, such errors are logged as warnings instead. (default: False)" + ), + ) + parser.add_argument( + "--write-output-on-error", + action=3D"store_true", + default=3DFalse, + help=3D( + "Write output documents even if errors occur. The resulting do= cuments " + "may be incomplete.\n" + "A summary of warnings and errors can be found in the 'comment= ' property " + "of the CreationInfo element. (default: False)" + ), + ) + args =3D vars(parser.parse_args()) return args =20 @@ -37,10 +126,66 @@ def get_config() -> KernelSbomConfig: KernelSbomConfig: Configuration object with all settings for SBOM = generation. 
""" parser =3D argparse.ArgumentParser( + formatter_class=3Dargparse.RawTextHelpFormatter, description=3D"Generate SPDX SBOM documents for kernel builds", ) args =3D _parse_cli_arguments(parser) =20 + # Extract and validate cli arguments + src_tree =3D os.path.realpath(args["src_tree"]) + obj_tree =3D os.path.realpath(args["obj_tree"]) + root_paths =3D [] + if args["roots_file"]: + with open(args["roots_file"], "rt", encoding=3D"utf-8") as f: + root_paths =3D [root.strip() for root in f.readlines()] + if len(root_paths) =3D=3D 0: + parser.error("--roots-file must contain at least one path") + else: + root_paths =3D args["roots"] + _validate_path_arguments(parser, src_tree, obj_tree, root_paths) + + generate_used_files =3D args["generate_used_files"] + output_directory =3D os.path.realpath(args["output_directory"]) debug =3D args["debug"] =20 - return KernelSbomConfig(debug=3Ddebug) + fail_on_unknown_build_command =3D not args["do_not_fail_on_unknown_bui= ld_command"] + write_output_on_error =3D args["write_output_on_error"] + + # Hardcoded config + used_files_file_name =3D "sbom.used-files.txt" + + return KernelSbomConfig( + src_tree=3Dsrc_tree, + obj_tree=3Dobj_tree, + root_paths=3Droot_paths, + generate_used_files=3Dgenerate_used_files, + used_files_file_name=3Dused_files_file_name, + output_directory=3Doutput_directory, + debug=3Ddebug, + fail_on_unknown_build_command=3Dfail_on_unknown_build_command, + write_output_on_error=3Dwrite_output_on_error, + ) + + +def _validate_path_arguments( + parser: argparse.ArgumentParser, + src_tree: PathStr, + obj_tree: PathStr, + root_paths: list[PathStr], +) -> None: + """ + Validate that the provided paths exist. + + Args: + parser: The argument parser, used to emit well-formatted error mes= sages. + src_tree: Absolute path to the source tree. + obj_tree: Absolute path to the object tree. + root_paths: List of root paths relative to obj_tree. 
+    """
+    if not os.path.exists(src_tree):
+        parser.error(f"--src-tree {src_tree} does not exist")
+    if not os.path.exists(obj_tree):
+        parser.error(f"--obj-tree {obj_tree} does not exist")
+    for root_path in root_paths:
+        if not os.path.isfile(root_path_absolute := os.path.join(obj_tree, root_path)):
+            parser.error(f"path to root artifact {root_path_absolute} is not a file")
diff --git a/scripts/sbom/sbom/path_utils.py b/scripts/sbom/sbom/path_utils.py
new file mode 100644
index 00000000000..29820046dc8
--- /dev/null
+++ b/scripts/sbom/sbom/path_utils.py
@@ -0,0 +1,22 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+import os
+from functools import lru_cache
+
+PathStr = str
+"""Filesystem path represented as a plain string for better performance than pathlib.Path."""
+
+
+def is_relative_to(path: PathStr, base: PathStr) -> bool:
+    return os.path.commonpath([path, base]) == base
+
+@lru_cache(maxsize=None)
+def has_link(path: PathStr) -> bool:
+    """Returns True if path or any of its ancestor directories is a symlink. Results are cached to avoid duplicate lstat syscalls."""
+    if os.path.islink(path):
+        return True
+    parent = os.path.dirname(path)
+    if parent == path:
+        return False
+    return has_link(parent)
-- 
2.43.0

From nobody Mon May 25 05:54:39 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, gregkh@linuxfoundation.org,
 kstewart@linuxfoundation.org, maximilian.huber@tngtech.com,
 Luis Augenstein
Subject: [PATCH v7 06/15] scripts/sbom: add additional dependency sources for cmd graph
Date: Mon, 18 May 2026 08:20:53 +0200
Message-ID: <20260518062102.2051814-7-luis.augenstein@tngtech.com>
In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com>
References: <20260518062102.2051814-1-luis.augenstein@tngtech.com>

From: Luis Augenstein

Add hardcoded dependencies and .incbin directive parsing to discover
dependencies not tracked by .cmd files.
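
For illustration only, the .incbin scan can be sketched in a few lines.
This is not the code from the patch: it works on an in-memory string
rather than on a file, find_incbin_paths and the sample assembly text
are hypothetical, and only the regular expression mirrors the pattern
added in incbin_parser.py.

```python
import re

# Same shape as the pattern in incbin_parser.py: optional leading
# whitespace, the directive, then a double-quoted path.
INCBIN_PATTERN = re.compile(r'\s*\.incbin\s+"(?P<path>[^"]+)"')


def find_incbin_paths(assembly_source: str) -> list[str]:
    """Return the quoted path of every .incbin directive in the text."""
    return [match.group("path") for match in INCBIN_PATTERN.finditer(assembly_source)]


# Made-up assembly snippet, used only to exercise the pattern.
SAMPLE = '''
    .section .rodata
blob_start:
    .incbin "firmware/blob.bin"
blob_end:
    .incbin "certs/signing_key.x509"
'''

print(find_incbin_paths(SAMPLE))
```

Each path found this way would become one extra child node in the cmd
graph, next to the dependencies read from the .cmd file.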

Assisted-by: Cursor:claude-sonnet-4-5
Assisted-by: OpenCode:GLM-4-7
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 scripts/sbom/sbom/cmd_graph/cmd_graph_node.py | 33 ++++++-
 .../sbom/cmd_graph/hardcoded_dependencies.py  | 87 +++++++++++++++++++
 scripts/sbom/sbom/cmd_graph/incbin_parser.py  | 42 +++++++++
 3 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 scripts/sbom/sbom/cmd_graph/hardcoded_dependencies.py
 create mode 100644 scripts/sbom/sbom/cmd_graph/incbin_parser.py

diff --git a/scripts/sbom/sbom/cmd_graph/cmd_graph_node.py b/scripts/sbom/sbom/cmd_graph/cmd_graph_node.py
index 7dde1c28eef..61f3a8140ce 100644
--- a/scripts/sbom/sbom/cmd_graph/cmd_graph_node.py
+++ b/scripts/sbom/sbom/cmd_graph/cmd_graph_node.py
@@ -2,15 +2,24 @@
 # Copyright (C) 2025 TNG Technology Consulting GmbH
 
 from dataclasses import dataclass, field
+from itertools import chain
 import logging
 import os
 from typing import Iterator, Protocol
 
 from sbom import sbom_logging
 from sbom.cmd_graph.cmd_file import CmdFile
+from sbom.cmd_graph.hardcoded_dependencies import get_hardcoded_dependencies
+from sbom.cmd_graph.incbin_parser import parse_incbin_statements
 from sbom.path_utils import PathStr, has_link, is_relative_to
 
 
+@dataclass
+class IncbinDependency:
+    node: "CmdGraphNode"
+    full_statement: str
+
+
 class CmdGraphNodeConfig(Protocol):
     obj_tree: PathStr
     src_tree: PathStr
@@ -28,11 +37,17 @@ class CmdGraphNode:
     """Parsed .cmd file describing how the file at absolute_path was built, or None if not available."""
 
     cmd_file_dependencies: list["CmdGraphNode"] = field(default_factory=list)
+    incbin_dependencies: list[IncbinDependency] = field(default_factory=list)
+    hardcoded_dependencies: list["CmdGraphNode"] = field(default_factory=list)
 
     @property
     def children(self) -> Iterator["CmdGraphNode"]:
         seen: set[PathStr] = set()
-        for node in self.cmd_file_dependencies:
+        for node in chain(
+            self.cmd_file_dependencies,
+            (dep.node for dep in self.incbin_dependencies),
+            self.hardcoded_dependencies,
+        ):
             if node.absolute_path not in seen:
                 seen.add(node.absolute_path)
                 yield node
@@ -95,6 +110,13 @@ class CmdGraphNode:
         def _build_child_node(child_path: PathStr) -> "CmdGraphNode":
             return CmdGraphNode.create(child_path, config, cache, depth + 1)
 
+        node.hardcoded_dependencies = [
+            _build_child_node(hardcoded_dependency_path)
+            for hardcoded_dependency_path in get_hardcoded_dependencies(
+                target_path_absolute, config.obj_tree, config.src_tree
+            )
+        ]
+
         if cmd_file is not None:
             node.cmd_file_dependencies = [
                 _build_child_node(cmd_file_dependency_path)
@@ -103,6 +125,15 @@ class CmdGraphNode:
                 )
             ]
 
+        if node.absolute_path.endswith(".S"):
+            node.incbin_dependencies = [
+                IncbinDependency(
+                    node=_build_child_node(incbin_statement.path),
+                    full_statement=incbin_statement.full_statement,
+                )
+                for incbin_statement in parse_incbin_statements(node.absolute_path)
+            ]
+
         return node
 
 
diff --git a/scripts/sbom/sbom/cmd_graph/hardcoded_dependencies.py b/scripts/sbom/sbom/cmd_graph/hardcoded_dependencies.py
new file mode 100644
index 00000000000..2eb04d30f4e
--- /dev/null
+++ b/scripts/sbom/sbom/cmd_graph/hardcoded_dependencies.py
@@ -0,0 +1,87 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+import os
+from typing import Callable
+import sbom.sbom_logging as sbom_logging
+from sbom.path_utils import PathStr, is_relative_to
+from sbom.environment import Environment
+
+HARDCODED_DEPENDENCIES: dict[str, list[str]] = {
+    # defined in linux/Kbuild
+    "include/generated/rq-offsets.h": ["kernel/sched/rq-offsets.s"],
+    "kernel/sched/rq-offsets.s": ["include/generated/asm-offsets.h"],
+    "include/generated/bounds.h": ["kernel/bounds.s"],
+    "include/generated/asm-offsets.h": ["arch/{arch}/kernel/asm-offsets.s"],
+}
+"""
+Maps file paths to the list of dependencies required to build them
+which are not tracked by the .cmd dependency mechanism.
+Paths are relative to either the source tree or the object tree.
+"""
+
+def get_hardcoded_dependencies(path: PathStr, obj_tree: PathStr, src_tree: PathStr) -> list[PathStr]:
+    """
+    Some files in the kernel build process are not tracked by the .cmd dependency mechanism.
+    Parsing these dependencies programmatically is too complex for the scope of this project.
+    Therefore, this function provides manually defined dependencies to be added to the build graph.
+
+    Args:
+        path: absolute path to a file within the src tree or object tree.
+        obj_tree: absolute Path to the base directory of the object tree.
+        src_tree: absolute Path to the `linux` source directory.
+
+    Returns:
+        list[PathStr]: A list of dependency file paths (relative to the object tree) required to build the file at the given path.
+    """
+    if is_relative_to(path, obj_tree):
+        path = os.path.relpath(path, obj_tree)
+    elif is_relative_to(path, src_tree):
+        path = os.path.relpath(path, src_tree)
+
+    if path not in HARDCODED_DEPENDENCIES:
+        return []
+
+    template_variables: dict[str, Callable[[], str | None]] = {
+        "arch": lambda: _get_arch(path),
+    }
+
+    dependencies: list[PathStr] = []
+    for dependency_template in HARDCODED_DEPENDENCIES[path]:
+        dependency = _evaluate_template(dependency_template, template_variables)
+        if dependency is None:
+            continue
+        if os.path.exists(os.path.join(obj_tree, dependency)):
+            dependencies.append(dependency)
+        elif os.path.exists(dependency_absolute := os.path.join(src_tree, dependency)):
+            dependencies.append(os.path.relpath(dependency_absolute, obj_tree))
+        else:
+            sbom_logging.error(
+                "Skip hardcoded dependency '{dependency}' for '{path}' because the dependency lies neither in the src tree nor the object tree.",
+                dependency=dependency,
+                path=path,
+            )
+
+    return dependencies
+
+
+def _evaluate_template(template: str, variables: dict[str, Callable[[], str | None]]) -> str | None:
+    for key, value_function in variables.items():
+        template_key = "{" + key + "}"
+        if template_key in template:
+            value = value_function()
+            if value is None:
+                return None
+            template = template.replace(template_key, value)
+    return template
+
+
+def _get_arch(path: PathStr):
+    srcarch = Environment.SRCARCH()
+    if srcarch is None:
+        sbom_logging.error(
+            "Skipped architecture specific hardcoded dependency for '{path}' because the SRCARCH environment variable was not set.",
+            path=path,
+        )
+        return None
+    return srcarch
diff --git a/scripts/sbom/sbom/cmd_graph/incbin_parser.py b/scripts/sbom/sbom/cmd_graph/incbin_parser.py
new file mode 100644
index 00000000000..ca289c2b888
--- /dev/null
+++ b/scripts/sbom/sbom/cmd_graph/incbin_parser.py
@@ -0,0 +1,42 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from dataclasses import dataclass
+import re
+
+from sbom.path_utils import PathStr
+
+INCBIN_PATTERN = re.compile(r'\s*\.incbin\s+"(?P<path>[^"]+)"')
+"""Regex pattern for matching `.incbin "<path>"` statements."""
+
+
+@dataclass
+class IncbinStatement:
+    """A parsed `.incbin "<path>"` directive."""
+
+    path: PathStr
+    """path to the file referenced by the `.incbin` directive."""
+
+    full_statement: str
+    """Full `.incbin "<path>"` statement as it originally appeared in the file."""
+
+
+def parse_incbin_statements(absolute_path: PathStr) -> list[IncbinStatement]:
+    """
+    Parses `.incbin` directives from an `.S` assembly file.
+
+    Args:
+        absolute_path: Absolute path to the `.S` assembly file.
+
+    Returns:
+        list[IncbinStatement]: Parsed `.incbin` statements.
+    """
+    with open(absolute_path, "rt", encoding="utf-8") as f:
+        content = f.read()
+    return [
+        IncbinStatement(
+            path=match.group("path"),
+            full_statement=match.group(0).strip(),
+        )
+        for match in INCBIN_PATTERN.finditer(content)
+    ]
-- 
2.43.0

From nobody Mon May 25 05:54:39 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, gregkh@linuxfoundation.org,
 kstewart@linuxfoundation.org, maximilian.huber@tngtech.com,
 Luis Augenstein
Subject: [PATCH v7 07/15] scripts/sbom: add SPDX classes
Date: Mon, 18 May 2026 08:20:54 +0200
Message-ID: <20260518062102.2051814-8-luis.augenstein@tngtech.com>
In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com>
References: <20260518062102.2051814-1-luis.augenstein@tngtech.com>

From: Luis Augenstein

Implement Python dataclasses to model the SPDX classes required
within an SPDX document. The class and property names are consistent
with the SPDX 3.0.1 specification.
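
As a rough sketch of the modelling approach (not the classes added by
this patch): a dataclass carries a fixed `type` marker plus optional
properties, and serialization skips properties that were never set.
DemoElement and the example identifier are hypothetical; the classes in
the patch additionally use kw_only dataclasses and inheritance.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class DemoElement:
    """Hypothetical stand-in for an SPDX element, for illustration only."""

    spdxId: str
    name: Optional[str] = None
    comment: Optional[str] = None
    verifiedUsing: list = field(default_factory=list)
    # Fixed per class, so it cannot be passed to the constructor.
    type: str = field(init=False, default="Element")

    def to_dict(self) -> dict[str, Any]:
        """Serialize to a plain dict, skipping None, "" and [] values."""
        result: dict[str, Any] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == [] or value == "":
                continue
            result[f.name] = value
        return result


element = DemoElement(spdxId="urn:example:0", name="vmlinux")
print(element.to_dict())
```

Here `comment` and `verifiedUsing` are left out of the resulting
dictionary because they were never set, which keeps the serialized
document free of empty properties.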

Assisted-by: Cursor:claude-sonnet-4-5
Assisted-by: OpenCode:GLM-4-7
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 scripts/sbom/sbom/spdx/__init__.py        |   7 +
 scripts/sbom/sbom/spdx/build.py           |  17 +++
 scripts/sbom/sbom/spdx/core.py            | 170 ++++++++++++++++++++++
 scripts/sbom/sbom/spdx/serialization.py   |  62 ++++++++
 scripts/sbom/sbom/spdx/simplelicensing.py |  20 +++
 scripts/sbom/sbom/spdx/software.py        |  69 +++++++++
 scripts/sbom/sbom/spdx/spdxId.py          |  36 +++++
 7 files changed, 381 insertions(+)
 create mode 100644 scripts/sbom/sbom/spdx/__init__.py
 create mode 100644 scripts/sbom/sbom/spdx/build.py
 create mode 100644 scripts/sbom/sbom/spdx/core.py
 create mode 100644 scripts/sbom/sbom/spdx/serialization.py
 create mode 100644 scripts/sbom/sbom/spdx/simplelicensing.py
 create mode 100644 scripts/sbom/sbom/spdx/software.py
 create mode 100644 scripts/sbom/sbom/spdx/spdxId.py

diff --git a/scripts/sbom/sbom/spdx/__init__.py b/scripts/sbom/sbom/spdx/__init__.py
new file mode 100644
index 00000000000..4097b59f8f1
--- /dev/null
+++ b/scripts/sbom/sbom/spdx/__init__.py
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from .spdxId import SpdxId, SpdxIdGenerator
+from .serialization import JsonLdSpdxDocument
+
+__all__ = ["JsonLdSpdxDocument", "SpdxId", "SpdxIdGenerator"]
diff --git a/scripts/sbom/sbom/spdx/build.py b/scripts/sbom/sbom/spdx/build.py
new file mode 100644
index 00000000000..a39ec9c09b1
--- /dev/null
+++ b/scripts/sbom/sbom/spdx/build.py
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from dataclasses import dataclass, field
+from sbom.spdx.core import DictionaryEntry, Element, Hash
+
+
+@dataclass(kw_only=True)
+class Build(Element):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Build/Classes/Build/"""
+
+    type: str = field(init=False, default="build_Build")
+    build_buildType: str
+    build_buildId: str
+    build_environment: list[DictionaryEntry] = field(default_factory=list)
+    build_configSourceUri: list[str] = field(default_factory=list)
+    build_configSourceDigest: list[Hash] = field(default_factory=list)
diff --git a/scripts/sbom/sbom/spdx/core.py b/scripts/sbom/sbom/spdx/core.py
new file mode 100644
index 00000000000..7eb376a1cd8
--- /dev/null
+++ b/scripts/sbom/sbom/spdx/core.py
@@ -0,0 +1,170 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from dataclasses import dataclass, field
+
+from typing import Any, Literal
+from sbom.spdx.spdxId import SpdxId
+
+SPDX_SPEC_VERSION = "3.0.1"
+
+ExternalIdentifierType = Literal["email", "gitoid", "urlScheme"]
+HashAlgorithm = Literal["sha256", "sha512"]
+ProfileIdentifierType = Literal["core", "software", "build", "lite", "simpleLicensing"]
+RelationshipType = Literal[
+    "contains",
+    "generates",
+    "hasDeclaredLicense",
+    "hasInput",
+    "hasOutput",
+    "ancestorOf",
+    "hasDistributionArtifact",
+    "dependsOn",
+]
+RelationshipCompleteness = Literal["complete", "incomplete", "noAssertion"]
+
+
+@dataclass
+class SpdxObject:
+    def to_dict(self) -> dict[str, Any]:
+        def _to_dict(v: Any):
+            return v.to_dict() if hasattr(v, "to_dict") else v
+
+        d: dict[str, Any] = {}
+        for field_name in self.__dataclass_fields__:
+            value = getattr(self, field_name)
+            if value is None or value == [] or value == "":
+                continue
+
+            if isinstance(value, Element):
+                d[field_name] = value.spdxId
+            elif isinstance(value, list) and len(value) > 0 and isinstance(value[0], Element):  # type: ignore
+                value: list[Element] = value
+                d[field_name] = [v.spdxId for v in value]
+            else:
+                d[field_name] = [_to_dict(v) for v in value] if isinstance(value, list) else _to_dict(value)  # type: ignore
+        return d
+
+
+@dataclass(kw_only=True)
+class IntegrityMethod(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/IntegrityMethod/"""
+
+
+@dataclass(kw_only=True)
+class Hash(IntegrityMethod):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Hash/"""
+
+    type: str = field(init=False, default="Hash")
+    hashValue: str
+    algorithm: HashAlgorithm
+
+
+@dataclass(kw_only=True)
+class Element(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Element/"""
+
+    type: str = field(init=False, default="Element")
+    spdxId: SpdxId
+    creationInfo: str = "_:creationinfo"
+    name: str | None = None
+    verifiedUsing: list[Hash] = field(default_factory=list)
+    comment: str | None = None
+
+
+@dataclass(kw_only=True)
+class ExternalMap(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/ExternalMap/"""
+
+    type: str = field(init=False, default="ExternalMap")
+    externalSpdxId: SpdxId
+
+
+@dataclass(kw_only=True)
+class NamespaceMap(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/NamespaceMap/"""
+
+    type: str = field(init=False, default="NamespaceMap")
+    prefix: str
+    namespace: str
+
+
+@dataclass(kw_only=True)
+class ElementCollection(Element):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/ElementCollection/"""
+
+    type: str = field(init=False, default="ElementCollection")
+    element: list[Element] = field(default_factory=list)
+    rootElement: list[Element] = field(default_factory=list)
+    profileConformance: list[ProfileIdentifierType] = field(default_factory=list)
+
+
+@dataclass(kw_only=True)
+class SpdxDocument(ElementCollection):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/SpdxDocument/"""
+
+    type: str = field(init=False, default="SpdxDocument")
+    import_: list[ExternalMap] = field(default_factory=list)
+    namespaceMap: list[NamespaceMap] = field(default_factory=list)
+
+    def to_dict(self) -> dict[str, Any]:
+        return {("import" if k == "import_" else k): v for k, v in super().to_dict().items()}
+
+
+@dataclass(kw_only=True)
+class Agent(Element):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Agent/"""
+
+    type: str = field(init=False, default="Agent")
+
+
+@dataclass(kw_only=True)
+class SoftwareAgent(Agent):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/SoftwareAgent/"""
+
+    type: str = field(init=False, default="SoftwareAgent")
+
+
+@dataclass(kw_only=True)
+class CreationInfo(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/CreationInfo/"""
+
+    type: str = field(init=False, default="CreationInfo")
+    id: SpdxId = "_:creationinfo"
+    specVersion: str = SPDX_SPEC_VERSION
+    createdBy: list[Agent]
+    created: str
+    comment: str | None = None
+
+    def to_dict(self) -> dict[str, Any]:
+        return {("@id" if k == "id" else k): v for k, v in super().to_dict().items()}
+
+
+@dataclass(kw_only=True)
+class Relationship(Element):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Relationship/"""
+
+    type: str = field(init=False, default="Relationship")
+    relationshipType: RelationshipType
+    from_: Element  # underscore because 'from' is a reserved keyword
+    to: list[Element]
+    completeness: RelationshipCompleteness | None = None
+
+    def to_dict(self) -> dict[str, Any]:
+        return {("from" if k == "from_" else k): v for k, v in super().to_dict().items()}
+
+
+@dataclass(kw_only=True)
+class Artifact(Element):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Artifact/"""
+
+    type: str = field(init=False, default="Artifact")
+
+
+@dataclass(kw_only=True)
+class DictionaryEntry(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/DictionaryEntry/"""
+
+    type: str = field(init=False, default="DictionaryEntry")
+    key: str
+    value: str
diff --git a/scripts/sbom/sbom/spdx/serialization.py b/scripts/sbom/sbom/spdx/serialization.py
new file mode 100644
index 00000000000..b4df7d368d4
--- /dev/null
+++ b/scripts/sbom/sbom/spdx/serialization.py
@@ -0,0 +1,62 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+import json
+from typing import Any
+from sbom.path_utils import PathStr
+from sbom.spdx.core import SPDX_SPEC_VERSION, SpdxDocument, SpdxObject
+
+
+class JsonLdSpdxDocument:
+    """Represents an SPDX document in JSON-LD format for serialization."""
+
+    graph: list[SpdxObject]
+
+    def __init__(self, graph: list[SpdxObject]) -> None:
+        """
+        Initialize a JSON-LD SPDX document from a graph of SPDX objects.
+        The graph must contain a single SpdxDocument element.
+
+        Args:
+            graph: List of SPDX objects representing the complete SPDX document.
+        """
+        self.graph = graph
+
+    @property
+    def context(self) -> list[str | dict[str, str]]:
+        spdx_document = next(element for element in self.graph if isinstance(element, SpdxDocument))
+        return [
+            f"https://spdx.org/rdf/{SPDX_SPEC_VERSION}/spdx-context.jsonld",
+            {ns.prefix: ns.namespace for ns in spdx_document.namespaceMap},
+        ]
+
+    def to_dict(self) -> dict[str, Any]:
+        """
+        Convert the SPDX document to a dictionary representation suitable for JSON serialization.
+
+        Returns:
+            Dictionary with @context and @graph keys following JSON-LD format.
+        """
+        def _item_to_dict(item: SpdxObject) -> dict:
+            d = item.to_dict()
+            if isinstance(item, SpdxDocument):
+                d.pop("namespaceMap", None)
+            return d
+        return {
+            "@context": self.context,
+            "@graph": [_item_to_dict(item) for item in self.graph],
+        }
+
+    def save(self, path: PathStr, prettify: bool) -> None:
+        """
+        Save the SPDX document to a JSON file.
+
+        Args:
+            path: File path where the document will be saved.
+            prettify: Whether to pretty-print the JSON with indentation.
+ """ + with open(path, "w", encoding=3D"utf-8") as f: + if prettify: + json.dump(self.to_dict(), f, indent=3D2) + else: + json.dump(self.to_dict(), f, separators=3D(",", ":")) diff --git a/scripts/sbom/sbom/spdx/simplelicensing.py b/scripts/sbom/sbom/= spdx/simplelicensing.py new file mode 100644 index 00000000000..750ddd24ad8 --- /dev/null +++ b/scripts/sbom/sbom/spdx/simplelicensing.py @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass, field +from sbom.spdx.core import Element + + +@dataclass(kw_only=3DTrue) +class AnyLicenseInfo(Element): + """https://spdx.github.io/spdx-spec/v3.0.1/model/SimpleLicensing/Class= es/AnyLicenseInfo/""" + + type: str =3D field(init=3DFalse, default=3D"simplelicensing_AnyLicens= eInfo") + + +@dataclass(kw_only=3DTrue) +class LicenseExpression(AnyLicenseInfo): + """https://spdx.github.io/spdx-spec/v3.0.1/model/SimpleLicensing/Class= es/LicenseExpression/""" + + type: str =3D field(init=3DFalse, default=3D"simplelicensing_LicenseEx= pression") + simplelicensing_licenseExpression: str diff --git a/scripts/sbom/sbom/spdx/software.py b/scripts/sbom/sbom/spdx/so= ftware.py new file mode 100644 index 00000000000..2f46de7c316 --- /dev/null +++ b/scripts/sbom/sbom/spdx/software.py @@ -0,0 +1,69 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass, field +from typing import Literal +from sbom.spdx.core import Artifact, ElementCollection, IntegrityMethod + + +SbomType =3D Literal["source", "build"] +FileKindType =3D Literal["file", "directory"] +SoftwarePurpose =3D Literal[ + "source", + "archive", + "library", + "file", + "data", + "configuration", + "executable", + "module", + "application", + "documentation", + "other", +] +ContentIdentifierType =3D Literal["gitoid", "swhid"] + + +@dataclass(kw_only=3DTrue) +class Sbom(ElementCollection): + 
"""https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/Sbom= /""" + + type: str =3D field(init=3DFalse, default=3D"software_Sbom") + software_sbomType: list[SbomType] =3D field(default_factory=3Dlist) + + +@dataclass(kw_only=3DTrue) +class ContentIdentifier(IntegrityMethod): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/Cont= entIdentifier/""" + + type: str =3D field(init=3DFalse, default=3D"software_ContentIdentifie= r") + software_contentIdentifierType: ContentIdentifierType + software_contentIdentifierValue: str + + +@dataclass(kw_only=3DTrue) +class SoftwareArtifact(Artifact): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/Soft= wareArtifact/""" + + type: str =3D field(init=3DFalse, default=3D"software_Artifact") + software_primaryPurpose: SoftwarePurpose | None =3D None + software_copyrightText: str | None =3D None + software_contentIdentifier: list[ContentIdentifier] =3D field(default_= factory=3Dlist) + + +@dataclass(kw_only=3DTrue) +class Package(SoftwareArtifact): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/Pack= age/""" + + type: str =3D field(init=3DFalse, default=3D"software_Package") + name: str # type: ignore + software_packageVersion: str | None =3D None + + +@dataclass(kw_only=3DTrue) +class File(SoftwareArtifact): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/File= /""" + + type: str =3D field(init=3DFalse, default=3D"software_File") + name: str # type: ignore + software_fileKind: FileKindType | None =3D None diff --git a/scripts/sbom/sbom/spdx/spdxId.py b/scripts/sbom/sbom/spdx/spdx= Id.py new file mode 100644 index 00000000000..589e85c5f70 --- /dev/null +++ b/scripts/sbom/sbom/spdx/spdxId.py @@ -0,0 +1,36 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from itertools import count +from typing import Iterator + +SpdxId =3D str + + +class SpdxIdGenerator: + _namespace: str + _prefix: str | 
None =3D None + _counter: Iterator[int] + + def __init__(self, namespace: str, prefix: str | None =3D None) -> Non= e: + """ + Initialize the SPDX ID generator with a namespace. + + Args: + namespace: The full namespace to use for generated IDs. + prefix: Optional. If provided, generated IDs will use this pre= fix instead of the full namespace. + """ + self._namespace =3D namespace + self._prefix =3D prefix + self._counter =3D count(0) + + def generate(self) -> SpdxId: + return f"{f'{self._prefix}:' if self._prefix else self._namespace}= {next(self._counter)}" + + @property + def prefix(self) -> str | None: + return self._prefix + + @property + def namespace(self) -> str: + return self._namespace --=20 2.43.0
From nobody Mon May 25 05:54:39 2026 From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v7 08/15] scripts/sbom: add JSON-LD serialization Date: Mon, 18 May 2026 08:20:55 +0200 Message-ID: <20260518062102.2051814-9-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com> References: <20260518062102.2051814-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Add infrastructure to serialize an SPDX graph as a JSON-LD document. NamespaceMaps in the SPDX document are converted to custom prefixes in the @context field of the JSON-LD output. The SBOM tool uses NamespaceMaps solely to shorten SPDX IDs, avoiding repetition of full namespace URIs by using short prefixes. 
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- Makefile | 3 +- scripts/sbom/sbom.py | 56 +++++++++++++++++++ scripts/sbom/sbom/config.py | 56 +++++++++++++++++++ scripts/sbom/sbom/spdx_graph/__init__.py | 7 +++ .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 36 ++++++++++++ .../sbom/sbom/spdx_graph/spdx_graph_model.py | 36 ++++++++++++ 6 files changed, 193 insertions(+), 1 deletion(-) create mode 100644 scripts/sbom/sbom/spdx_graph/__init__.py create mode 100644 scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py create mode 100644 scripts/sbom/sbom/spdx_graph/spdx_graph_model.py diff --git a/Makefile b/Makefile index 5cae3679343..c121283231d 100644 --- a/Makefile +++ b/Makefile @@ -2212,7 +2212,8 @@ quiet_cmd_sbom =3D GEN $(sbom_targets) --src-tree $(abspath $(srctree)) \ --obj-tree $(abspath $(objtree)) \ --roots-file "$(tmp-target)" \ - --output-directory $(abspath $(objtree)); + --output-directory $(abspath $(objtree)) \ + --generate-spdx; PHONY +=3D sbom sbom: $(notdir $(KBUILD_IMAGE)) include/generated/autoconf.h $(if $(CONFIG= _MODULES),modules modules.order) $(call cmd,sbom) diff --git a/scripts/sbom/sbom.py b/scripts/sbom/sbom.py index d700e4f294f..764175b9c89 100644 --- a/scripts/sbom/sbom.py +++ b/scripts/sbom/sbom.py @@ -6,13 +6,18 @@ Compute software bill of materials in SPDX format describing a kernel buil= d. 
""" =20 +import json import logging import os import sys import time +import uuid import sbom.sbom_logging as sbom_logging from sbom.config import get_config from sbom.path_utils import is_relative_to +from sbom.spdx import JsonLdSpdxDocument, SpdxIdGenerator +from sbom.spdx.core import CreationInfo, SpdxDocument +from sbom.spdx_graph import SpdxIdGeneratorCollection, build_spdx_graphs from sbom.cmd_graph import CmdGraph =20 =20 @@ -71,6 +76,57 @@ def main(): f.write("\n".join(str(file_path) for file_path in used_fil= es)) logging.debug(f"Successfully saved {used_files_path}") =20 + if config.generate_spdx is False: + _exit_with_summary(config.write_output_on_error) + return + + # Build SPDX Documents + logging.debug("Start generating SPDX graph based on cmd graph") + start_time =3D time.time() + + # The real uuid will be generated based on the content of the SPDX gra= phs + # to ensure that the same SPDX document is always assigned the same uu= id. + PLACEHOLDER_UUID =3D "00000000-0000-0000-0000-000000000000" + spdx_id_base_namespace =3D f"{config.spdxId_prefix}{PLACEHOLDER_UUID}/" + spdx_id_generators =3D SpdxIdGeneratorCollection( + base=3DSpdxIdGenerator(prefix=3D"p", namespace=3Dspdx_id_base_name= space), + source=3DSpdxIdGenerator(prefix=3D"s", namespace=3Df"{spdx_id_base= _namespace}source/"), + build=3DSpdxIdGenerator(prefix=3D"b", namespace=3Df"{spdx_id_base_= namespace}build/"), + output=3DSpdxIdGenerator(prefix=3D"o", namespace=3Df"{spdx_id_base= _namespace}output/"), + ) + + spdx_graphs =3D build_spdx_graphs( + cmd_graph, + spdx_id_generators, + config, + ) + spdx_id_uuid =3D uuid.uuid5( + uuid.NAMESPACE_URL, + "".join( + json.dumps(element.to_dict()) for spdx_graph in spdx_graphs.va= lues() for element in spdx_graph.to_list() + ), + ) + logging.debug(f"Generated SPDX graph in {time.time() - start_time} sec= onds") + + if not sbom_logging.has_errors() or config.write_output_on_error: + for kernel_sbom_kind, spdx_graph in spdx_graphs.items(): + 
spdx_graph_objects =3D spdx_graph.to_list() + # Add warning and error summary to creation info comment + creation_info =3D next(element for element in spdx_graph_objec= ts if isinstance(element, CreationInfo)) + creation_info.comment =3D "\n".join([ + sbom_logging.summarize_warnings(), + sbom_logging.summarize_errors(), + ]).strip() + # Replace Placeholder uuid with real uuid for spdxIds + spdx_document =3D next(element for element in spdx_graph_objec= ts if isinstance(element, SpdxDocument)) + for namespaceMap in spdx_document.namespaceMap: + namespaceMap.namespace =3D namespaceMap.namespace.replace(= PLACEHOLDER_UUID, str(spdx_id_uuid)) + # Serialize SPDX graph to JSON-LD + spdx_doc =3D JsonLdSpdxDocument(graph=3Dspdx_graph_objects) + save_path =3D os.path.join(config.output_directory, config.spd= x_file_names[kernel_sbom_kind]) + spdx_doc.save(save_path, config.prettify_json) + logging.debug(f"Successfully saved {save_path}") + _exit_with_summary(config.write_output_on_error) =20 =20 diff --git a/scripts/sbom/sbom/config.py b/scripts/sbom/sbom/config.py index b8c1a2b404d..98c7d939364 100644 --- a/scripts/sbom/sbom/config.py +++ b/scripts/sbom/sbom/config.py @@ -3,11 +3,18 @@ =20 import argparse from dataclasses import dataclass +from enum import Enum import os from typing import Any from sbom.path_utils import PathStr =20 =20 +class KernelSpdxDocumentKind(Enum): + SOURCE =3D "source" + BUILD =3D "build" + OUTPUT =3D "output" + + @dataclass class KernelSbomConfig: src_tree: PathStr @@ -19,6 +26,13 @@ class KernelSbomConfig: root_paths: list[PathStr] """List of paths to root outputs (relative to obj_tree) to base the SB= OM on.""" =20 + generate_spdx: bool + """Whether to generate SPDX SBOM documents. 
If False, no SPDX files ar= e created.""" + + spdx_file_names: dict[KernelSpdxDocumentKind, str] + """If `generate_spdx` is True, defines the file names for each SPDX SB= OM kind + (source, build, output) to store on disk.""" + generate_used_files: bool """Whether to generate a flat list of all source files used in the bui= ld. If False, no used-files document is created.""" @@ -38,6 +52,12 @@ class KernelSbomConfig: write_output_on_error: bool """Whether to write output documents even if errors occur.""" =20 + spdxId_prefix: str + """Prefix to use for all SPDX element IDs.""" + + prettify_json: bool + """Whether to pretty-print generated SPDX JSON documents.""" + =20 def _parse_cli_arguments(parser: argparse.ArgumentParser) -> dict[str, Any= ]: """ @@ -67,6 +87,15 @@ def _parse_cli_arguments(parser: argparse.ArgumentParser= ) -> dict[str, Any]: "--roots-file", help=3D"Path to a file containing the root paths (one per line). C= annot be used together with --roots.", ) + parser.add_argument( + "--generate-spdx", + action=3D"store_true", + default=3DFalse, + help=3D( + "Whether to create sbom-source.spdx.json, sbom-build.spdx.json= and " + "sbom-output.spdx.json documents (default: False)" + ), + ) parser.add_argument( "--generate-used-files", action=3D"store_true", @@ -114,6 +143,20 @@ def _parse_cli_arguments(parser: argparse.ArgumentPars= er) -> dict[str, Any]: ), ) =20 + # SPDX specific options + spdx_group =3D parser.add_argument_group("SPDX options", "Options for = customizing SPDX document generation") + spdx_group.add_argument( + "--spdxId-prefix", + default=3D"urn:spdx.dev:", + help=3D"The prefix to use for all spdxId properties. 
(default: urn= :spdx.dev:)", + ) + spdx_group.add_argument( + "--prettify-json", + action=3D"store_true", + default=3DFalse, + help=3D"Whether to pretty print the generated spdx.json documents = (default: False)", + ) + args =3D vars(parser.parse_args()) return args =20 @@ -144,6 +187,7 @@ def get_config() -> KernelSbomConfig: root_paths =3D args["roots"] _validate_path_arguments(parser, src_tree, obj_tree, root_paths) =20 + generate_spdx =3D args["generate_spdx"] generate_used_files =3D args["generate_used_files"] output_directory =3D os.path.realpath(args["output_directory"]) debug =3D args["debug"] @@ -151,19 +195,31 @@ def get_config() -> KernelSbomConfig: fail_on_unknown_build_command =3D not args["do_not_fail_on_unknown_bui= ld_command"] write_output_on_error =3D args["write_output_on_error"] =20 + spdxId_prefix =3D args["spdxId_prefix"] + prettify_json =3D args["prettify_json"] + # Hardcoded config + spdx_file_names =3D { + KernelSpdxDocumentKind.SOURCE: "sbom-source.spdx.json", + KernelSpdxDocumentKind.BUILD: "sbom-build.spdx.json", + KernelSpdxDocumentKind.OUTPUT: "sbom-output.spdx.json", + } used_files_file_name =3D "sbom.used-files.txt" =20 return KernelSbomConfig( src_tree=3Dsrc_tree, obj_tree=3Dobj_tree, root_paths=3Droot_paths, + generate_spdx=3Dgenerate_spdx, + spdx_file_names=3Dspdx_file_names, generate_used_files=3Dgenerate_used_files, used_files_file_name=3Dused_files_file_name, output_directory=3Doutput_directory, debug=3Ddebug, fail_on_unknown_build_command=3Dfail_on_unknown_build_command, write_output_on_error=3Dwrite_output_on_error, + spdxId_prefix=3DspdxId_prefix, + prettify_json=3Dprettify_json, ) =20 =20 diff --git a/scripts/sbom/sbom/spdx_graph/__init__.py b/scripts/sbom/sbom/s= pdx_graph/__init__.py new file mode 100644 index 00000000000..3557b1d51bf --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/__init__.py @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from 
.build_spdx_graphs import build_spdx_graphs +from .spdx_graph_model import SpdxIdGeneratorCollection + +__all__ =3D ["build_spdx_graphs", "SpdxIdGeneratorCollection"] diff --git a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py b/scripts/sb= om/sbom/spdx_graph/build_spdx_graphs.py new file mode 100644 index 00000000000..bb3db4e423d --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -0,0 +1,36 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + + +from typing import Protocol + +from sbom.config import KernelSpdxDocumentKind +from sbom.cmd_graph import CmdGraph +from sbom.path_utils import PathStr +from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection + + +class SpdxGraphConfig(Protocol): + obj_tree: PathStr + src_tree: PathStr + + +def build_spdx_graphs( + cmd_graph: CmdGraph, + spdx_id_generators: SpdxIdGeneratorCollection, + config: SpdxGraphConfig, +) -> dict[KernelSpdxDocumentKind, SpdxGraph]: + """ + Builds SPDX graphs (output, source, and build) based on a cmd dependen= cy graph. + If the source and object trees are identical, no dedicated source grap= h can be created. + In that case the source files are added to the build graph instead. + + Args: + cmd_graph: The dependency graph of a kernel build. + spdx_id_generators: Collection of SPDX ID generators. + config: Configuration options. 
+ + Returns: + Dictionary of SPDX graphs + """ + return {} diff --git a/scripts/sbom/sbom/spdx_graph/spdx_graph_model.py b/scripts/sbo= m/sbom/spdx_graph/spdx_graph_model.py new file mode 100644 index 00000000000..682194d4362 --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/spdx_graph_model.py @@ -0,0 +1,36 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from sbom.spdx.core import CreationInfo, SoftwareAgent, SpdxDocument, Spdx= Object +from sbom.spdx.software import Sbom +from sbom.spdx.spdxId import SpdxIdGenerator + + +@dataclass +class SpdxGraph: + """Represents the complete graph of a single SPDX document.""" + + spdx_document: SpdxDocument + agent: SoftwareAgent + creation_info: CreationInfo + sbom: Sbom + + def to_list(self) -> list[SpdxObject]: + return [ + self.spdx_document, + self.agent, + self.creation_info, + self.sbom, + *self.sbom.element, + ] + + +@dataclass +class SpdxIdGeneratorCollection: + """Holds SPDX ID generators for different document types to ensure glo= bally unique SPDX IDs.""" + + base: SpdxIdGenerator + source: SpdxIdGenerator + build: SpdxIdGenerator + output: SpdxIdGenerator --=20 2.43.0
From nobody Mon May 25 05:54:39 2026 From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v7 09/15] scripts/sbom: add shared SPDX elements Date: Mon, 18 May 2026 08:20:56 +0200 Message-ID: <20260518062102.2051814-10-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com> References: <20260518062102.2051814-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Implement shared SPDX elements used in all three documents. 
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- scripts/sbom/sbom/config.py | 9 ++++++ .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 5 ++- .../sbom/spdx_graph/shared_spdx_elements.py | 32 +++++++++++++++++++ 3 files changed, 45 insertions(+), 1 deletion(-) create mode 100644 scripts/sbom/sbom/spdx_graph/shared_spdx_elements.py diff --git a/scripts/sbom/sbom/config.py b/scripts/sbom/sbom/config.py index 98c7d939364..b1dd30790f5 100644 --- a/scripts/sbom/sbom/config.py +++ b/scripts/sbom/sbom/config.py @@ -3,6 +3,7 @@ =20 import argparse from dataclasses import dataclass +from datetime import datetime, timezone from enum import Enum import os from typing import Any @@ -52,6 +53,9 @@ class KernelSbomConfig: write_output_on_error: bool """Whether to write output documents even if errors occur.""" =20 + created: datetime + """Datetime to use for the SPDX created property of the CreationInfo e= lement.""" + spdxId_prefix: str """Prefix to use for all SPDX element IDs.""" =20 @@ -195,6 +199,10 @@ def get_config() -> KernelSbomConfig: fail_on_unknown_build_command =3D not args["do_not_fail_on_unknown_bui= ld_command"] write_output_on_error =3D args["write_output_on_error"] =20 + created =3D datetime.fromtimestamp( + max([os.path.getmtime(os.path.join(obj_tree, root_path)) for root_= path in root_paths]), + tz=3Dtimezone.utc, + ) spdxId_prefix =3D args["spdxId_prefix"] prettify_json =3D args["prettify_json"] =20 @@ -218,6 +226,7 @@ def get_config() -> KernelSbomConfig: debug=3Ddebug, fail_on_unknown_build_command=3Dfail_on_unknown_build_command, write_output_on_error=3Dwrite_output_on_error, + created=3Dcreated, spdxId_prefix=3DspdxId_prefix, prettify_json=3Dprettify_json, ) diff --git a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py b/scripts/sb= om/sbom/spdx_graph/build_spdx_graphs.py index bb3db4e423d..9c47258a31c 100644 --- 
a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -1,18 +1,20 @@ # SPDX-License-Identifier: GPL-2.0-only OR MIT # Copyright (C) 2025 TNG Technology Consulting GmbH =20 - +from datetime import datetime from typing import Protocol =20 from sbom.config import KernelSpdxDocumentKind from sbom.cmd_graph import CmdGraph from sbom.path_utils import PathStr from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection +from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements =20 =20 class SpdxGraphConfig(Protocol): obj_tree: PathStr src_tree: PathStr + created: datetime =20 =20 def build_spdx_graphs( @@ -33,4 +35,5 @@ def build_spdx_graphs( Returns: Dictionary of SPDX graphs """ + shared_elements =3D SharedSpdxElements.create(spdx_id_generators.base,= config.created) return {} diff --git a/scripts/sbom/sbom/spdx_graph/shared_spdx_elements.py b/scripts= /sbom/sbom/spdx_graph/shared_spdx_elements.py new file mode 100644 index 00000000000..115e8778a46 --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/shared_spdx_elements.py @@ -0,0 +1,32 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from datetime import datetime, timezone +from sbom.spdx.core import CreationInfo, SoftwareAgent +from sbom.spdx.spdxId import SpdxIdGenerator + + +@dataclass(frozen=3DTrue) +class SharedSpdxElements: + agent: SoftwareAgent + creation_info: CreationInfo + + @classmethod + def create(cls, spdx_id_generator: SpdxIdGenerator, created: datetime)= -> "SharedSpdxElements": + """ + Creates shared SPDX elements used across multiple documents. + + Args: + spdx_id_generator: Generator for creating SPDX IDs. + created: SPDX 'created' property used for the creation info. + + Returns: + SharedSpdxElements with agent and creation info. 
+ """ + agent =3D SoftwareAgent( + spdxId=3Dspdx_id_generator.generate(), + name=3D"KernelSbom", + ) + creation_info =3D CreationInfo(createdBy=3D[agent], created=3Dcrea= ted.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")) + return SharedSpdxElements(agent=3Dagent, creation_info=3Dcreation_= info) --=20 2.43.0
From nobody Mon May 25 05:54:39 2026 From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v7 10/15] scripts/sbom: collect file metadata Date: Mon, 18 May 2026 08:20:57 +0200 Message-ID: <20260518062102.2051814-11-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com> References: <20260518062102.2051814-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Implement the kernel_file module that collects file metadata, including license identifier for source files, SHA-256 hash, Git blob object ID, an estimation of the file type, and whether files belong to the source, build, or output SBOM. 
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 2 + scripts/sbom/sbom/spdx_graph/kernel_file.py | 315 ++++++++++++++++++ 2 files changed, 317 insertions(+) create mode 100644 scripts/sbom/sbom/spdx_graph/kernel_file.py diff --git a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py b/scripts/sb= om/sbom/spdx_graph/build_spdx_graphs.py index 9c47258a31c..0f95f99d560 100644 --- a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -7,6 +7,7 @@ from typing import Protocol from sbom.config import KernelSpdxDocumentKind from sbom.cmd_graph import CmdGraph from sbom.path_utils import PathStr +from sbom.spdx_graph.kernel_file import KernelFileCollection from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements =20 @@ -36,4 +37,5 @@ def build_spdx_graphs( Dictionary of SPDX graphs """ shared_elements =3D SharedSpdxElements.create(spdx_id_generators.base,= config.created) + kernel_files =3D KernelFileCollection.create(cmd_graph, config.obj_tre= e, config.src_tree, spdx_id_generators) return {} diff --git a/scripts/sbom/sbom/spdx_graph/kernel_file.py b/scripts/sbom/sbo= m/spdx_graph/kernel_file.py new file mode 100644 index 00000000000..505f25f66eb --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/kernel_file.py @@ -0,0 +1,315 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from enum import Enum +import hashlib +import os +import re +from sbom.cmd_graph import CmdGraph +from sbom.path_utils import PathStr, is_relative_to +from sbom.spdx import SpdxId, SpdxIdGenerator +from sbom.spdx.core import Hash +from sbom.spdx.software import ContentIdentifier, File, SoftwarePurpose 
+import sbom.sbom_logging as sbom_logging +from sbom.spdx_graph.spdx_graph_model import SpdxIdGeneratorCollection + + +class KernelFileLocation(Enum): + """Represents the location of a file relative to the source/object tre= es.""" + + SOURCE_TREE =3D "source_tree" + """File is located in the source tree.""" + OBJ_TREE =3D "obj_tree" + """File is located in the object tree.""" + EXTERNAL =3D "external" + """File is located outside both source and object trees.""" + BOTH =3D "both" + """File is located in a folder that is both source and object tree.""" + + +@dataclass +class KernelFile: + """kernel-specific metadata used to generate an SPDX File element.""" + + absolute_path: PathStr + """Absolute path of the file.""" + file_location: KernelFileLocation + """Location of the file relative to the source/object trees.""" + name: str + """Name of the file element. Should be relative to the source tree if + file_location equals SOURCE_TREE and relative to the object tree if + file_location equals OBJ_TREE. 
If file_location equals EXTERNAL, the + absolute path is used.""" + license_identifier: str | None + """SPDX license ID if file_location equals SOURCE_TREE or BOTH; otherw= ise None.""" + spdx_id_generator: SpdxIdGenerator + """Generator for the SPDX ID of the file element.""" + + _spdx_file_element: File | None =3D None + + @classmethod + def create( + cls, + absolute_path: PathStr, + obj_tree: PathStr, + src_tree: PathStr, + spdx_id_generators: SpdxIdGeneratorCollection, + is_output: bool, + ) -> "KernelFile": + is_in_obj_tree =3D is_relative_to(absolute_path, obj_tree) + is_in_src_tree =3D is_relative_to(absolute_path, src_tree) + + # file element name should be relative to output or src tree if po= ssible + if not is_in_src_tree and not is_in_obj_tree: + file_element_name =3D str(absolute_path) + file_location =3D KernelFileLocation.EXTERNAL + spdx_id_generator =3D spdx_id_generators.source if src_tree != =3D obj_tree else spdx_id_generators.build + elif is_in_src_tree and src_tree =3D=3D obj_tree: + file_element_name =3D os.path.relpath(absolute_path, obj_tree) + file_location =3D KernelFileLocation.BOTH + spdx_id_generator =3D spdx_id_generators.output if is_output e= lse spdx_id_generators.build + elif is_in_obj_tree: + file_element_name =3D os.path.relpath(absolute_path, obj_tree) + file_location =3D KernelFileLocation.OBJ_TREE + spdx_id_generator =3D spdx_id_generators.output if is_output e= lse spdx_id_generators.build + else: + file_element_name =3D os.path.relpath(absolute_path, src_tree) + file_location =3D KernelFileLocation.SOURCE_TREE + spdx_id_generator =3D spdx_id_generators.source + + # parse spdx license identifier + license_identifier =3D ( + _parse_spdx_license_identifier(absolute_path) + if file_location =3D=3D KernelFileLocation.SOURCE_TREE or file= _location =3D=3D KernelFileLocation.BOTH + else None + ) + + return KernelFile( + absolute_path, + file_location, + file_element_name, + license_identifier, + spdx_id_generator, + ) + + @property 
+ def spdx_file_element(self) -> File: + if self._spdx_file_element is None: + self._spdx_file_element =3D _build_file_element( + self.absolute_path, + self.name, + self.spdx_id_generator.generate(), + self.file_location, + ) + return self._spdx_file_element + + +@dataclass +class KernelFileCollection: + """Collection of kernel files.""" + + source: dict[PathStr, KernelFile] + build: dict[PathStr, KernelFile] + output: dict[PathStr, KernelFile] + external: dict[PathStr, KernelFile] + + @classmethod + def create( + cls, + cmd_graph: CmdGraph, + obj_tree: PathStr, + src_tree: PathStr, + spdx_id_generators: SpdxIdGeneratorCollection, + ) -> "KernelFileCollection": + source: dict[PathStr, KernelFile] =3D {} + build: dict[PathStr, KernelFile] =3D {} + output: dict[PathStr, KernelFile] =3D {} + external: dict[PathStr, KernelFile] =3D {} + root_node_paths =3D {node.absolute_path for node in cmd_graph.root= s} + for node in cmd_graph: + is_root =3D node.absolute_path in root_node_paths + kernel_file =3D KernelFile.create( + node.absolute_path, + obj_tree, + src_tree, + spdx_id_generators, + is_root, + ) + if is_root: + output[kernel_file.absolute_path] =3D kernel_file + elif kernel_file.file_location =3D=3D KernelFileLocation.SOURC= E_TREE: + source[kernel_file.absolute_path] =3D kernel_file + elif kernel_file.file_location =3D=3D KernelFileLocation.EXTER= NAL: + external[kernel_file.absolute_path] =3D kernel_file + else: + build[kernel_file.absolute_path] =3D kernel_file + + return KernelFileCollection(source, build, output, external) + + def to_dict(self) -> dict[PathStr, KernelFile]: + return {**self.source, **self.build, **self.output, **self.externa= l} + + +def _build_file_element(absolute_path: PathStr, name: str, spdx_id: SpdxId= , file_location: KernelFileLocation) -> File: + verifiedUsing: list[Hash] =3D [] + content_identifier: list[ContentIdentifier] =3D [] + if os.path.isfile(absolute_path): + verifiedUsing =3D [Hash(algorithm=3D"sha256", hashValue=3D_sha256(= 
absolute_path))] + content_identifier =3D [ + ContentIdentifier( + software_contentIdentifierType=3D"gitoid", + software_contentIdentifierValue=3D_git_blob_oid(absolute_p= ath), + ) + ] + elif file_location =3D=3D KernelFileLocation.EXTERNAL: + sbom_logging.warning( + "Cannot compute hash for {absolute_path} because file does not= exist.", + absolute_path=3Dabsolute_path, + ) + else: + sbom_logging.error( + "Cannot compute hash for {absolute_path} because file does not= exist.", + absolute_path=3Dabsolute_path, + ) + + # primary purpose + primary_purpose =3D _get_primary_purpose(absolute_path) + + return File( + spdxId=3Dspdx_id, + name=3Dname, + verifiedUsing=3DverifiedUsing, + software_primaryPurpose=3Dprimary_purpose, + software_contentIdentifier=3Dcontent_identifier, + ) + + +def _sha256(file_path: PathStr, chunk_size: int =3D 1 << 20) -> str: + """Compute the SHA-256 hex digest of a file, reading it in chunks of c= hunk_size bytes.""" + h =3D hashlib.sha256() + with open(file_path, "rb") as f: + for chunk in iter(lambda: f.read(chunk_size), b""): + h.update(chunk) + return h.hexdigest() + + +def _git_blob_oid(file_path: str, chunk_size: int =3D 1 << 20) -> str: + """Compute the Git blob object ID (SHA-1 hex) for a file, like `git ha= sh-object`, reading it in chunks of chunk_size bytes.""" + h =3D hashlib.sha1() + h.update(f"blob {os.path.getsize(file_path)}\0".encode()) + with open(file_path, "rb") as f: + for chunk in iter(lambda: f.read(chunk_size), b""): + h.update(chunk) + return h.hexdigest() + + +# REUSE-IgnoreStart +SPDX_LICENSE_IDENTIFIER_PATTERN =3D re.compile( + r"SPDX-License-Identifier:" # literal tag + r"\s*" # optional whitespace after colon + r"(?P<id>.*?)" # license expression (non-greedy, stops = before terminator) + r"(?:\s*" # optional whitespace before terminator = (not captured) + r"(-->|\*/|$))", # terminator: XML "-->", C-style "*/", o= r end of line + re.MULTILINE, # match end of each line, not just end o= f string +) +# REUSE-IgnoreEnd + 
+ +def _parse_spdx_license_identifier(absolute_path: str, max_bytes: int =3D = 512) -> str | None: + """ + Extracts the SPDX-License-Identifier from the beginning of a source fi= le. + + Args: + absolute_path: Path to the source file. + max_bytes: Maximum number of bytes to scan for the license identif= ier. + + Returns: + The license identifier string (e.g., 'GPL-2.0-only') if found, oth= erwise None. + """ + try: + with open(absolute_path, "r", encoding=3D"utf-8") as f: + match =3D SPDX_LICENSE_IDENTIFIER_PATTERN.search(f.read(max_by= tes)) + if match: + return match.group("id") + except (UnicodeDecodeError, OSError): + return None + return None + + +def _get_primary_purpose(absolute_path: PathStr) -> SoftwarePurpose | None: + def ends_with(suffixes: list[str]) -> bool: + return any(absolute_path.endswith(suffix) for suffix in suffixes) + + def includes_path_segments(path_segments: list[str]) -> bool: + return any(segment in absolute_path for segment in path_segments) + + # Source code + if ends_with([".c", ".h", ".S", ".s", ".rs", ".pl", "gen_smb1_mapping"= , "gen_smb2_mapping"]): + return "source" + + # Libraries + if ends_with([".a", ".so", ".so.raw", ".rlib"]): + return "library" + + # Archives + if ends_with([".xz", ".cpio", ".gz", ".tar", ".zip", "piggy_data"]): + return "archive" + + # Applications + if ends_with(["bzImage", "Image", ".efi"]): + return "application" + + # Executables / machine code + if ends_with([".bin", ".elf", "vmlinux", "vmlinux.unstripped", "vmlinu= z", "bpfilter_umh"]): + return "executable" + + # Kernel modules + if ends_with([".ko"]): + return "module" + + # Data files + if ends_with( + [ + ".tbl", + ".relocs", + ".rmeta", + ".in", + ".dbg", + ".x509", + ".pbm", + ".ppm", + ".dtb", + ".uc", + ".inc", + ".dts", + ".dtsi", + ".dtbo", + ".xml", + ".ro", + "initramfs_inc_data", + "default_cpio_list", + "x509_certificate_list", + "utf8data.c_shipped", + "blacklist_hash_list", + "x509_revocation_list", + "cpucaps", + "sysreg", + 
"mach-types", + ] + ) or includes_path_segments(["drivers/gpu/drm/radeon/reg_srcs/"]): + return "data" + + # Configuration files + if ends_with([".pem", ".key", ".conf", ".config", ".cfg", ".bconf"]): + return "configuration" + + # Documentation + if ends_with([".md"]): + return "documentation" + + # Other / miscellaneous + if ends_with([".o", ".tmp"]): + return "other" + + sbom_logging.warning("Could not infer primary purpose for {absolute_pa= th}", absolute_path=3Dabsolute_path) --=20 2.43.0 From nobody Mon May 25 05:54:39 2026 Received: from mailgw02.zimbra-vnc.de (mailgw02.zimbra-vnc.de [148.251.102.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5B4D03DEAFC; Mon, 18 May 2026 06:21:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.102.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779085303; cv=none; b=hUwC94DqnA0JwbsPyha8ywaGwXXutmQp5BuxrHvVT7KyU4CkxyHofRE44VMuFI4M+bxWXCimxNa7nJMsOnsF/p8r7vWhIopM41AL6XPd7+5PyC5gC+0qZVwS6qE9JREzVUdXmLSlet52N9KjLSFfnnDA5Mva/oG82hyQxvFPY8w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779085303; c=relaxed/simple; bh=b50ToFXRoxWRScu/nVpsrCRytSXn8cked0GlXqekENk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=D/PrUif9sxNj4ktD7Fkscr/Aiw62k9t2ZmNrI7qG9iMOLjJAfTx7ZZCoaG5cK8TOh50XCQhycG07tTEjij2Yff5thc9Z+pwlJBTOw6qZDRpqCwtAqAHCaTLYR/D1rBgyTQzQaVAIAEZzDyXkJNlyVKkIV2Vvo3UJTe8Oumj8h3g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=JCA+TGVg; arc=none smtp.client-ip=148.251.102.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com 
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="JCA+TGVg" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw02.zimbra-vnc.de (Postfix) with ESMTPS id 8C892200CB; Mon, 18 May 2026 08:21:25 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 51D841F89A8; Mon, 18 May 2026 08:21:25 +0200 (CEST) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id 2wcHQGMPHUWS; Mon, 18 May 2026 08:21:24 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 4769C1FACCB; Mon, 18 May 2026 08:21:24 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 4769C1FACCB DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1779085284; bh=zdzp9hvABx1s2HkSN8Ee7hXMOVVuxyPBI/EdI7HyFIw=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=JCA+TGVgqyzOQAXZulM6gG7kGTlKEl9OpDtZATACvu229PUleOBJSTwPsZRzvAvsY PmMEy8CihWb1sY3m/N4uBgFYTyCdq6HkoiwdJ/339dTZ3UrmGHf28I3FGiRr5NXsuT E7BPSeWTyz8u6tvB3y979yKmm5HGDJfbxZkly+ehyCTT1HDiTfsBhoz1Bpf+4J24AD kdneGCN1LGCMhrlF0m8wxKHWR0g/DjjpYCdkGrwb4tweTKPfc///bG4DZGTTsR66MC qTF2Mi8fWhNHjJwcXzkZFk2cb45jkrH8la1rantjQHYG/68DdezOKOLBv/inpfjlDS GI6s7eaIEOalg== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id JjyWTsNDfRec; Mon, 18 May 2026 08:21:24 +0200 (CEST) Received: from luis-Precision-5480.. 
(ipservice-092-209-239-167.092.209.pools.vodafone-ip.de [92.209.239.167]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id DB8FB1F89A8; Mon, 18 May 2026 08:21:23 +0200 (CEST) From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v7 11/15] scripts/sbom: add SPDX output graph Date: Mon, 18 May 2026 08:20:58 +0200 Message-ID: <20260518062102.2051814-12-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com> References: <20260518062102.2051814-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Implement the SPDX output graph which contains the distributable build outputs and high level metadata about the build. 
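As a rough illustration of the shape this graph takes (a minimal sketch under stated assumptions, not the code added by this patch): each root file gets one package, linked to it by a hasDistributionArtifact relationship, and every package points at a single shared license expression via hasDeclaredLicense. The IdGenerator, Element and Relationship classes and the build_output_elements helper below are simplified stand-ins invented for this example; they are not the classes from sbom.spdx.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class IdGenerator:
    """Hypothetical stand-in for an SPDX ID generator (not sbom.spdx.SpdxIdGenerator)."""

    prefix: str
    _counter: count = field(default_factory=count)

    def generate(self) -> str:
        # Each call yields a fresh, unique ID under the given prefix.
        return f"{self.prefix}{next(self._counter)}"


@dataclass
class Element:
    """Hypothetical stand-in for a File, Package or LicenseExpression element."""

    spdx_id: str
    name: str


@dataclass
class Relationship:
    """Hypothetical stand-in for an SPDX Relationship element."""

    spdx_id: str
    relationship_type: str
    from_: Element
    to: list


def build_output_elements(root_names, license_expression, ids):
    """Pair every root file with a package and one shared license element."""
    files = [Element(ids.generate(), name) for name in root_names]
    # Assumption for this sketch: the package is named after the file's basename.
    packages = [Element(ids.generate(), name.rsplit("/", 1)[-1]) for name in root_names]
    license_element = Element(ids.generate(), license_expression)
    # zip() keeps package i aligned with file i.
    artifact_rels = [
        Relationship(ids.generate(), "hasDistributionArtifact", package, [file])
        for package, file in zip(packages, files)
    ]
    license_rels = [
        Relationship(ids.generate(), "hasDeclaredLicense", package, [license_element])
        for package in packages
    ]
    return files, packages, license_element, artifact_rels, license_rels
```

Creating the license element once and referencing it from every package keeps the number of elements small when many root files share the same license.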
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- Makefile | 5 +- scripts/sbom/sbom/config.py | 64 ++++++ .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 18 +- .../sbom/sbom/spdx_graph/spdx_output_graph.py | 187 ++++++++++++++++++ 4 files changed, 272 insertions(+), 2 deletions(-) create mode 100644 scripts/sbom/sbom/spdx_graph/spdx_output_graph.py diff --git a/Makefile b/Makefile index c121283231d..f88386c4db6 100644 --- a/Makefile +++ b/Makefile @@ -2213,7 +2213,10 @@ quiet_cmd_sbom =3D GEN $(sbom_targets) --obj-tree $(abspath $(objtree)) \ --roots-file "$(tmp-target)" \ --output-directory $(abspath $(objtree)) \ - --generate-spdx; + --generate-spdx \ + --package-license "GPL-2.0 WITH Linux-syscall-note" \ + --package-version "$(KERNELVERSION)" \ + --write-output-on-error; PHONY +=3D sbom sbom: $(notdir $(KBUILD_IMAGE)) include/generated/autoconf.h $(if $(CONFIG= _MODULES),modules modules.order) $(call cmd,sbom) diff --git a/scripts/sbom/sbom/config.py b/scripts/sbom/sbom/config.py index b1dd30790f5..6811f782943 100644 --- a/scripts/sbom/sbom/config.py +++ b/scripts/sbom/sbom/config.py @@ -59,6 +59,21 @@ class KernelSbomConfig: spdxId_prefix: str """Prefix to use for all SPDX element IDs.""" =20 + build_type: str + """SPDX buildType property to use for all Build elements.""" + + build_id: str | None + """SPDX buildId property to use for all Build elements.""" + + package_license: str + """License expression applied to all SPDX Packages.""" + + package_version: str | None + """Version string applied to all SPDX Packages.""" + + package_copyright_text: str | None + """Copyright text applied to all SPDX Packages.""" + prettify_json: bool """Whether to pretty-print generated SPDX JSON documents.""" =20 @@ -154,6 +169,40 @@ def _parse_cli_arguments(parser: argparse.ArgumentPars= er) -> dict[str, Any]: default=3D"urn:spdx.dev:", help=3D"The prefix to 
use for all spdxId properties. (default: urn= :spdx.dev:)", ) + spdx_group.add_argument( + "--build-type", + default=3D"urn:spdx.dev:Kbuild", + help=3D"The SPDX buildType property to use for all Build elements.= (default: urn:spdx.dev:Kbuild)", + ) + spdx_group.add_argument( + "--build-id", + default=3DNone, + help=3D"The SPDX buildId property to use for all Build elements.\n" + "If not provided the spdxId of the high level Build element is use= d as the buildId. (default: None)", + ) + spdx_group.add_argument( + "--package-license", + default=3D"NOASSERTION", + help=3D( + "The SPDX licenseExpression property to use for the LicenseExp= ression " + "linked to all SPDX Package elements. (default: NOASSERTION)" + ), + ) + spdx_group.add_argument( + "--package-version", + default=3DNone, + help=3D"The SPDX packageVersion property to use for all SPDX Packa= ge elements. (default: None)", + ) + spdx_group.add_argument( + "--package-copyright-text", + default=3DNone, + help=3D( + "The SPDX copyrightText property to use for all SPDX Package e= lements.\n" + "If not specified, and if a COPYING file exists in the source = tree,\n" + "the package-copyright-text is set to the content of this file= . 
" + "(default: None)" + ), + ) spdx_group.add_argument( "--prettify-json", action=3D"store_true", @@ -204,6 +253,16 @@ def get_config() -> KernelSbomConfig: tz=3Dtimezone.utc, ) spdxId_prefix =3D args["spdxId_prefix"] + build_type =3D args["build_type"] + build_id =3D args["build_id"] + package_license =3D args["package_license"] + package_version =3D args["package_version"] if args["package_version"]= is not None else None + package_copyright_text: str | None =3D None + if args["package_copyright_text"] is not None: + package_copyright_text =3D args["package_copyright_text"] + elif os.path.isfile(copying_path :=3D os.path.join(src_tree, "COPYING"= )): + with open(copying_path, "r", encoding=3D"utf-8") as f: + package_copyright_text =3D f.read() prettify_json =3D args["prettify_json"] =20 # Hardcoded config @@ -228,6 +287,11 @@ def get_config() -> KernelSbomConfig: write_output_on_error=3Dwrite_output_on_error, created=3Dcreated, spdxId_prefix=3DspdxId_prefix, + build_type=3Dbuild_type, + build_id=3Dbuild_id, + package_license=3Dpackage_license, + package_version=3Dpackage_version, + package_copyright_text=3Dpackage_copyright_text, prettify_json=3Dprettify_json, ) =20 diff --git a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py b/scripts/sb= om/sbom/spdx_graph/build_spdx_graphs.py index 0f95f99d560..2af0fbe6cdb 100644 --- a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -10,12 +10,18 @@ from sbom.path_utils import PathStr from sbom.spdx_graph.kernel_file import KernelFileCollection from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements +from sbom.spdx_graph.spdx_output_graph import SpdxOutputGraph =20 =20 class SpdxGraphConfig(Protocol): obj_tree: PathStr src_tree: PathStr created: datetime + build_type: str + build_id: str | None + package_license: str + package_version: str | None + 
package_copyright_text: str | None =20 =20 def build_spdx_graphs( @@ -38,4 +44,14 @@ def build_spdx_graphs( """ shared_elements =3D SharedSpdxElements.create(spdx_id_generators.base,= config.created) kernel_files =3D KernelFileCollection.create(cmd_graph, config.obj_tre= e, config.src_tree, spdx_id_generators) - return {} + output_graph =3D SpdxOutputGraph.create( + root_files=3Dlist(kernel_files.output.values()), + shared_elements=3Dshared_elements, + spdx_id_generators=3Dspdx_id_generators, + config=3Dconfig, + ) + spdx_graphs: dict[KernelSpdxDocumentKind, SpdxGraph] =3D { + KernelSpdxDocumentKind.OUTPUT: output_graph, + } + + return spdx_graphs diff --git a/scripts/sbom/sbom/spdx_graph/spdx_output_graph.py b/scripts/sb= om/sbom/spdx_graph/spdx_output_graph.py new file mode 100644 index 00000000000..ff9b2c31fb0 --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/spdx_output_graph.py @@ -0,0 +1,187 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +import os +from typing import Protocol +from sbom.environment import Environment +from sbom.path_utils import PathStr +from sbom.spdx.build import Build +from sbom.spdx.core import DictionaryEntry, NamespaceMap, Relationship, Sp= dxDocument +from sbom.spdx.simplelicensing import LicenseExpression +from sbom.spdx.software import File, Package, Sbom +from sbom.spdx.spdxId import SpdxIdGenerator +from sbom.spdx_graph.kernel_file import KernelFile +from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements +from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection + + +class SpdxOutputGraphConfig(Protocol): + obj_tree: PathStr + src_tree: PathStr + build_type: str + build_id: str | None + package_license: str + package_version: str | None + package_copyright_text: str | None + + +@dataclass +class SpdxOutputGraph(SpdxGraph): + """SPDX graph representing distributable output files""" + + 
high_level_build_element: Build + + @classmethod + def create( + cls, + root_files: list[KernelFile], + shared_elements: SharedSpdxElements, + spdx_id_generators: SpdxIdGeneratorCollection, + config: SpdxOutputGraphConfig, + ) -> "SpdxOutputGraph": + """ + Args: + root_files: List of distributable output files which act as ro= ots + of the dependency graph. + shared_elements: Shared SPDX elements used across multiple doc= uments. + spdx_id_generators: Collection of SPDX ID generators. + config: Configuration options. + + Returns: + SpdxOutputGraph: The SPDX output graph. + """ + # SpdxDocument + spdx_document =3D SpdxDocument( + spdxId=3Dspdx_id_generators.output.generate(), + profileConformance=3D["core", "software", "build", "simpleLice= nsing"], + namespaceMap=3D[ + NamespaceMap(prefix=3Dgenerator.prefix, namespace=3Dgenera= tor.namespace) + for generator in [spdx_id_generators.output, spdx_id_gener= ators.base] + if generator.prefix is not None + ], + ) + + # Sbom + sbom =3D Sbom( + spdxId=3Dspdx_id_generators.output.generate(), + software_sbomType=3D["build"], + ) + + # High-level Build elements + config_source_element =3D KernelFile.create( + absolute_path=3Dos.path.join(config.obj_tree, ".config"), + obj_tree=3Dconfig.obj_tree, + src_tree=3Dconfig.src_tree, + spdx_id_generators=3Dspdx_id_generators, + is_output=3DTrue, + ).spdx_file_element + high_level_build_element, high_level_build_element_hasOutput_relat= ionship =3D _high_level_build_elements( + config.build_type, + config.build_id, + config_source_element, + spdx_id_generators.output, + ) + + # Root file elements + root_file_elements: list[File] =3D [file.spdx_file_element for fil= e in root_files] + + # Package elements + package_elements =3D [ + Package( + spdxId=3Dspdx_id_generators.output.generate(), + name=3D_get_package_name(file.name), + software_packageVersion=3Dconfig.package_version, + software_copyrightText=3Dconfig.package_copyright_text, + comment=3Df"Architecture=3D{arch}" if (arch :=3D 
Environme= nt.ARCH() or Environment.SRCARCH()) else None, + software_primaryPurpose=3Dfile.software_primaryPurpose, + ) + for file in root_file_elements + ] + package_hasDistributionArtifact_file_relationships =3D [ + Relationship( + spdxId=3Dspdx_id_generators.output.generate(), + relationshipType=3D"hasDistributionArtifact", + from_=3Dpackage, + to=3D[file], + ) + for package, file in zip(package_elements, root_file_elements) + ] + package_license_expression =3D LicenseExpression( + spdxId=3Dspdx_id_generators.output.generate(), + simplelicensing_licenseExpression=3Dconfig.package_license, + ) + package_hasDeclaredLicense_relationships =3D [ + Relationship( + spdxId=3Dspdx_id_generators.output.generate(), + relationshipType=3D"hasDeclaredLicense", + from_=3Dpackage, + to=3D[package_license_expression], + ) + for package in package_elements + ] + + # Update relationships + spdx_document.rootElement =3D [sbom] + + sbom.rootElement =3D [*package_elements] + sbom.element =3D [ + config_source_element, + high_level_build_element, + high_level_build_element_hasOutput_relationship, + *root_file_elements, + *package_elements, + *package_hasDistributionArtifact_file_relationships, + package_license_expression, + *package_hasDeclaredLicense_relationships, + ] + + high_level_build_element_hasOutput_relationship.to =3D [*root_file= _elements] + + output_graph =3D SpdxOutputGraph( + spdx_document, + shared_elements.agent, + shared_elements.creation_info, + sbom, + high_level_build_element, + ) + return output_graph + + +def _get_package_name(filename: str) -> str: + """ + Generates a SPDX package name from a filename. + Kernel images (bzImage, Image) get a descriptive name, others use the = basename of the file. 
+ """ + KERNEL_FILENAMES =3D ["bzImage", "Image"] + basename =3D os.path.basename(filename) + return f"Linux Kernel ({basename})" if basename in KERNEL_FILENAMES el= se basename + + +def _high_level_build_elements( + build_type: str, + build_id: str | None, + config_source_element: File, + spdx_id_generator: SpdxIdGenerator, +) -> tuple[Build, Relationship]: + build_spdxId =3D spdx_id_generator.generate() + high_level_build_element =3D Build( + spdxId=3Dbuild_spdxId, + build_buildType=3Dbuild_type, + build_buildId=3Dbuild_id if build_id is not None else build_spdxId, + build_environment=3D[ + DictionaryEntry(key=3Dkey, value=3Dvalue) + for key, value in Environment.KERNEL_BUILD_VARIABLES().items() + if value + ], + build_configSourceUri=3D[config_source_element.spdxId], + build_configSourceDigest=3Dconfig_source_element.verifiedUsing, + ) + + high_level_build_element_hasOutput_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"hasOutput", + from_=3Dhigh_level_build_element, + to=3D[], + ) + return high_level_build_element, high_level_build_element_hasOutput_re= lationship --=20 2.43.0 From nobody Mon May 25 05:54:39 2026 Received: from mailgw02.zimbra-vnc.de (mailgw02.zimbra-vnc.de [148.251.102.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 58D5F3CE080; Mon, 18 May 2026 06:21:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.102.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779085303; cv=none; b=NisE/riB5FdIQZOGgUHWlFcj2XwwC2Hn1/6r3rbepzuaihilpAnzNOFlp3DVQ+MXV9nob2mrvaslnWzPjheSLCv5qTebOvvbWg3X99Uh6UUUvAGqh5kre7Nl6u70g8ksoyGQoeNnT+6h5PiTWMM84kthuAVrObLEy5oLretqSdQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779085303; c=relaxed/simple; bh=WXtM7VisArFeZ1mgK5qX0yEHrfpWEnhcJjRq16D2Es4=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Qg82kyhDcnoGGPxy/jttzmGitGoO4PkqFGH1z8sTAUXv1hKRInaa3RTwv7upKjV1+DH8mPQIa9b+WZ3yeWQB40goa3K2XDwJcED3T/QEaIzYeFyLosHrJmRTpOjhZ2DuzZLQTTSnyxqOBSiAlBxZl40hgob1b0bCQCfaUd4wc6I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=IzaN5FgW; arc=none smtp.client-ip=148.251.102.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="IzaN5FgW" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw02.zimbra-vnc.de (Postfix) with ESMTPS id CC174200CC; Mon, 18 May 2026 08:21:26 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 808A71F89A8; Mon, 18 May 2026 08:21:26 +0200 (CEST) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id ajOon2JiPSeo; Mon, 18 May 2026 08:21:25 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id D292A1F8989; Mon, 18 May 2026 08:21:25 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz D292A1F8989 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1779085285; bh=irArMQU4LwVWrBnLTa5TS4whXCpE8NI54UJeNCuL+Aw=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=IzaN5FgWF5LL+4ChErowx6RmQrDVRp6jTFVbiJJY6igEKk78mukpiugNSW4eAEHfg K04cJxMOQFtkWidx+2YidIr+xJTvgizgbCgWfo8Rba/cHAhKVGhZ3EiVwRKwx0DLiY 
fzrJ4qbW3yRZ3r4UQOh7DjWo+278aITJ+bwjoAYzw4of8f/E80XSeqWC/1v3r7bd5S OpDQEt/YHBkv+Y02s56XL91GJkkOOOqZw4BA1VlAzx5xfpj19CqD0abuQX2PNOSjF8 GGv0NP2REUPQ5L3/D5jCe2Yba5rnAMAahMYR9kGYHZt51SNlYd2SQGFngIt5Q0/022 9SOur3kuKiriw== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id iZPJ9I4qb5bK; Mon, 18 May 2026 08:21:25 +0200 (CEST) Received: from luis-Precision-5480.. (ipservice-092-209-239-167.092.209.pools.vodafone-ip.de [92.209.239.167]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id 816371FACCB; Mon, 18 May 2026 08:21:25 +0200 (CEST) From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v7 12/15] scripts/sbom: add SPDX source graph Date: Mon, 18 May 2026 08:20:59 +0200 Message-ID: <20260518062102.2051814-13-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com> References: <20260518062102.2051814-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Implement the SPDX source graph which contains all source files involved during the build, along with the licensing information for each file. 
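To make the licensing part concrete, here is a minimal sketch (an illustration under stated assumptions, not the code added by this patch) of how per-file license identifiers can be collapsed into one license expression per distinct identifier, with one link per licensed file. SourceFile and group_license_links are hypothetical names invented for this example, and the "LicenseExpression-N" IDs are placeholders rather than real SPDX IDs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceFile:
    """Hypothetical stand-in for a source file with a parsed license tag."""

    path: str
    license_identifier: Optional[str]


def group_license_links(files):
    """Return (identifier -> expression ID, [(file path, expression ID), ...])."""
    expression_ids = {}
    for file in files:
        identifier = file.license_identifier
        # Files without a license tag are skipped; duplicates reuse the first ID.
        if identifier is not None and identifier not in expression_ids:
            expression_ids[identifier] = f"LicenseExpression-{len(expression_ids)}"
    # One link per licensed file, pointing at the shared expression.
    links = [
        (file.path, expression_ids[file.license_identifier])
        for file in files
        if file.license_identifier is not None
    ]
    return expression_ids, links
```

With this grouping, the number of license expressions grows with the number of distinct identifiers rather than with the number of source files.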
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 9 ++ .../sbom/sbom/spdx_graph/spdx_source_graph.py | 130 ++++++++++++++++++ 2 files changed, 139 insertions(+) create mode 100644 scripts/sbom/sbom/spdx_graph/spdx_source_graph.py diff --git a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py b/scripts/sb= om/sbom/spdx_graph/build_spdx_graphs.py index 2af0fbe6cdb..f2567d44960 100644 --- a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -10,6 +10,7 @@ from sbom.path_utils import PathStr from sbom.spdx_graph.kernel_file import KernelFileCollection from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements +from sbom.spdx_graph.spdx_source_graph import SpdxSourceGraph from sbom.spdx_graph.spdx_output_graph import SpdxOutputGraph =20 =20 @@ -54,4 +55,12 @@ def build_spdx_graphs( KernelSpdxDocumentKind.OUTPUT: output_graph, } =20 + if len(kernel_files.source) > 0: + spdx_graphs[KernelSpdxDocumentKind.SOURCE] =3D SpdxSourceGraph.cre= ate( + source_files=3Dlist(kernel_files.source.values()), + external_files=3Dlist(kernel_files.external.values()), + shared_elements=3Dshared_elements, + spdx_id_generators=3Dspdx_id_generators, + ) + return spdx_graphs diff --git a/scripts/sbom/sbom/spdx_graph/spdx_source_graph.py b/scripts/sb= om/sbom/spdx_graph/spdx_source_graph.py new file mode 100644 index 00000000000..90880212ded --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/spdx_source_graph.py @@ -0,0 +1,130 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from sbom.spdx import SpdxIdGenerator +from sbom.spdx.core import Element, NamespaceMap, Relationship, SpdxDocume= nt 
+from sbom.spdx.simplelicensing import LicenseExpression +from sbom.spdx.software import File, Sbom +from sbom.spdx_graph.kernel_file import KernelFile +from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements +from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection + + +@dataclass +class SpdxSourceGraph(SpdxGraph): + """SPDX graph representing source files""" + + @classmethod + def create( + cls, + source_files: list[KernelFile], + external_files: list[KernelFile], + shared_elements: SharedSpdxElements, + spdx_id_generators: SpdxIdGeneratorCollection, + ) -> "SpdxSourceGraph": + """ + Args: + source_files: List of files within the kernel source tree. + external_files: Files outside both source and object trees. + shared_elements: Shared SPDX elements used across multiple doc= uments. + spdx_id_generators: Collection of SPDX ID generators. + + Returns: + SpdxSourceGraph: The SPDX source graph. + """ + # SpdxDocument + source_spdx_document =3D SpdxDocument( + spdxId=3Dspdx_id_generators.source.generate(), + profileConformance=3D["core", "software", "simpleLicensing"], + namespaceMap=3D[ + NamespaceMap(prefix=3Dgenerator.prefix, namespace=3Dgenera= tor.namespace) + for generator in [spdx_id_generators.source, spdx_id_gener= ators.base] + if generator.prefix is not None + ], + ) + + # Sbom + source_sbom =3D Sbom( + spdxId=3Dspdx_id_generators.source.generate(), + software_sbomType=3D["source"], + ) + + # Src Tree Elements + src_tree_element =3D File( + spdxId=3Dspdx_id_generators.source.generate(), + name=3D"$(src_tree)", + software_fileKind=3D"directory", + ) + src_tree_contains_relationship =3D Relationship( + spdxId=3Dspdx_id_generators.source.generate(), + relationshipType=3D"contains", + from_=3Dsrc_tree_element, + to=3D[], + ) + + # Source file elements + source_file_elements: list[Element] =3D [file.spdx_file_element fo= r file in source_files] + external_file_elements: list[Element] =3D [file.spdx_file_element = for file 
in external_files] + + # Source file license elements + source_file_license_identifiers, source_file_license_relationships= =3D source_file_license_elements( + source_files, spdx_id_generators.source + ) + + # Update relationships + source_spdx_document.rootElement =3D [source_sbom] + source_sbom.rootElement =3D [src_tree_element] + source_sbom.element =3D [ + src_tree_element, + src_tree_contains_relationship, + *source_file_elements, + *external_file_elements, + *source_file_license_identifiers, + *source_file_license_relationships, + ] + src_tree_contains_relationship.to =3D source_file_elements + + source_graph =3D SpdxSourceGraph( + source_spdx_document, + shared_elements.agent, + shared_elements.creation_info, + source_sbom, + ) + return source_graph + + +def source_file_license_elements( + source_files: list[KernelFile], spdx_id_generator: SpdxIdGenerator +) -> tuple[list[LicenseExpression], list[Relationship]]: + """ + Creates SPDX license expressions and links them to the given source fi= les + via hasDeclaredLicense relationships. + + Args: + source_files: List of files within the kernel source tree. + spdx_id_generator: Generator for unique SPDX IDs. + + Returns: + Tuple of (license expressions, hasDeclaredLicense relationships). 
+ """ + license_expressions: dict[str, LicenseExpression] =3D {} + for file in source_files: + if file.license_identifier is None or file.license_identifier in l= icense_expressions: + continue + license_expressions[file.license_identifier] =3D LicenseExpression( + spdxId=3Dspdx_id_generator.generate(), + simplelicensing_licenseExpression=3Dfile.license_identifier, + ) + + source_file_license_relationships =3D [ + Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"hasDeclaredLicense", + from_=3Dfile.spdx_file_element, + to=3D[license_expressions[file.license_identifier]], + ) + for file in source_files + if file.license_identifier is not None + ] + return ([*license_expressions.values()], source_file_license_relations= hips) --=20 2.43.0 From nobody Mon May 25 05:54:39 2026 Received: from mailgw02.zimbra-vnc.de (mailgw02.zimbra-vnc.de [148.251.102.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 44C2E3DDDCD; Mon, 18 May 2026 06:21:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.102.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779085304; cv=none; b=peg7NqAneyJf0b35ENM5n8AOaVR5L3SCErMokMelKh3wQCfxuvAihXdIkQjAkIvUmOT62RCWPAAhQxfQRRkgqeuLhDdHdXE/utStl9MGRvLHhWhctuwfoieLxMpb6S+UrB8a+LNgga2kxS3GWzhMrzc0yZzwid07MavbNtxZolI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779085304; c=relaxed/simple; bh=+v4Q6VCD4q79YgY9ngs/9H9auNUc3ghr/wPj0DcRo6g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=J6qBDzIcwM6iZGLLBDvY5XYonREfaD2bAsUEbjfYlitBRGFem21q00PHh7LVnE16gOKwizs1Uggn+xKRAmvklblMR4xwWrpvieMSCrrkA+ssMJKlthyARO7pCMdyfMk2/n/hAJzQ8j0+LRWiu1Bwzj9rH9UkXbzrOaOmPLBJ9sE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass 
smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=Huw8/KiC; arc=none smtp.client-ip=148.251.102.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="Huw8/KiC" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw02.zimbra-vnc.de (Postfix) with ESMTPS id D7E79200C6; Mon, 18 May 2026 08:21:28 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 2B68A1F8989; Mon, 18 May 2026 08:21:28 +0200 (CEST) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id vRYrDHPwlS2J; Mon, 18 May 2026 08:21:27 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 3C4B11F89A8; Mon, 18 May 2026 08:21:27 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 3C4B11F89A8 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1779085287; bh=XEgu4LUSVjfZgcscCYcZaHyuAzW000Lwm3mdeHvwOPE=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=Huw8/KiCqeXLXdCl2/ARDiGYf76i/NDmS/+rcvhK3a/YhRWUCj9XxSLZjyBCTuQu2 7WGBxRQk8HIiqE9T3AwIAuKUXKgz/EjOVuqWvZ5l0D8wh+N923atEGfS5oFJaZO3pI AHIo8E9L7ah7qzqZQ2cxHFhDeLESGnwAr2/Dp2f7nYOBjMIbNcxiOtvaniuehiyliJ 7O2DO7G78XAFUbrN1UUEcai7qqb/zgauQ+yJNV4RlW7AGwxv49JhNX5OWoMqK33znc 3zjmeiGA9ZNNcpnBVnIltq7pU3Mzo8BLkHT7HUs8agr3dQELBf5+Oh4v++5gD8F1RV gDQvcmNvFQ7WQ== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id VTlSXSMZ_0cU; Mon, 18 May 2026 
08:21:27 +0200 (CEST) Received: from luis-Precision-5480.. (ipservice-092-209-239-167.092.209.pools.vodafone-ip.de [92.209.239.167]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id DC17D1F8989; Mon, 18 May 2026 08:21:26 +0200 (CEST) From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v7 13/15] scripts/sbom: add SPDX build graph Date: Mon, 18 May 2026 08:21:00 +0200 Message-ID: <20260518062102.2051814-14-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com> References: <20260518062102.2051814-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Implement the SPDX build graph to describe the relationships between source files in the source SBOM and output files in the output SBOM. 
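
For illustration only, and not part of this patch, the shape of the build
graph can be sketched with plain dataclasses. The Node and Edge types and the
link_build() helper below are hypothetical stand-ins assumed for this example;
the real element classes live in sbom.spdx and differ in detail. The sketch
only mirrors the idea that each build step gets a hasOutput relationship to
its output file and, when inputs are known, a hasInput relationship to them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for an SPDX File or Build element."""

    spdx_id: str


@dataclass
class Edge:
    """Hypothetical stand-in for an SPDX Relationship element."""

    relationship_type: str
    from_: Node
    to: List[Node] = field(default_factory=list)


def link_build(build: Node, inputs: List[Node], output: Node) -> List[Edge]:
    """Connect one build step to its input files and its output file."""
    edges: List[Edge] = []
    if inputs:
        # A step without known inputs gets no hasInput edge.
        edges.append(Edge("hasInput", build, list(inputs)))
    edges.append(Edge("hasOutput", build, [output]))
    return edges


# Illustrative IDs only: the input would be described in the source SBOM,
# the output in the output SBOM, and the step itself in the build SBOM.
source = Node("source/foo.c")
output = Node("output/foo.o")
step = Node("build/compile-foo")
edges = link_build(step, [source], output)
```

Keeping the edges in the build document and only referencing the file
elements is what lets the three documents stay separate while still forming
one connected graph.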
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 17 + .../sbom/sbom/spdx_graph/spdx_build_graph.py | 318 ++++++++++++++++++ 2 files changed, 335 insertions(+) create mode 100644 scripts/sbom/sbom/spdx_graph/spdx_build_graph.py diff --git a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py b/scripts/sb= om/sbom/spdx_graph/build_spdx_graphs.py index f2567d44960..ee24e9eaf60 100644 --- a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -4,6 +4,7 @@ from datetime import datetime from typing import Protocol =20 +import logging from sbom.config import KernelSpdxDocumentKind from sbom.cmd_graph import CmdGraph from sbom.path_utils import PathStr @@ -11,6 +12,7 @@ from sbom.spdx_graph.kernel_file import KernelFileCollect= ion from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements from sbom.spdx_graph.spdx_source_graph import SpdxSourceGraph +from sbom.spdx_graph.spdx_build_graph import SpdxBuildGraph from sbom.spdx_graph.spdx_output_graph import SpdxOutputGraph =20 =20 @@ -62,5 +64,20 @@ def build_spdx_graphs( shared_elements=3Dshared_elements, spdx_id_generators=3Dspdx_id_generators, ) + else: + logging.info( + "Skipped creating a dedicated source SBOM because source files= cannot be " + "reliably classified when the source and object trees are iden= tical. " + "Added source files to the build SBOM instead." 
+ ) + + build_graph =3D SpdxBuildGraph.create( + cmd_graph, + kernel_files, + shared_elements, + output_graph.high_level_build_element, + spdx_id_generators, + ) + spdx_graphs[KernelSpdxDocumentKind.BUILD] =3D build_graph =20 return spdx_graphs diff --git a/scripts/sbom/sbom/spdx_graph/spdx_build_graph.py b/scripts/sbo= m/sbom/spdx_graph/spdx_build_graph.py new file mode 100644 index 00000000000..4d738bc3b3e --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/spdx_build_graph.py @@ -0,0 +1,318 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from typing import Mapping +from sbom.cmd_graph import CmdGraph +from sbom.path_utils import PathStr +from sbom.spdx import SpdxIdGenerator +from sbom.spdx.build import Build +from sbom.spdx.core import ExternalMap, NamespaceMap, Relationship, SpdxDo= cument +from sbom.spdx.software import File, Sbom +from sbom.spdx_graph.kernel_file import KernelFileCollection +from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements +from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection +from sbom.spdx_graph.spdx_source_graph import source_file_license_elements + + +@dataclass +class SpdxBuildGraph(SpdxGraph): + """SPDX graph representing build dependencies connecting source files = and + distributable output files""" + + @classmethod + def create( + cls, + cmd_graph: CmdGraph, + kernel_files: KernelFileCollection, + shared_elements: SharedSpdxElements, + high_level_build_element: Build, + spdx_id_generators: SpdxIdGeneratorCollection, + ) -> "SpdxBuildGraph": + if len(kernel_files.source) > 0: + return _create_spdx_build_graph( + cmd_graph, + kernel_files, + shared_elements, + high_level_build_element, + spdx_id_generators, + ) + else: + return _create_spdx_build_graph_with_mixed_sources( + cmd_graph, + kernel_files, + shared_elements, + high_level_build_element, + spdx_id_generators, + ) + + +def 
_create_spdx_build_graph( + cmd_graph: CmdGraph, + kernel_files: KernelFileCollection, + shared_elements: SharedSpdxElements, + high_level_build_element: Build, + spdx_id_generators: SpdxIdGeneratorCollection, +) -> SpdxBuildGraph: + """ + Creates an SPDX build graph where source and output files are referenc= ed + from external documents. + + Args: + cmd_graph: The dependency graph of a kernel build. + kernel_files: Collection of categorized kernel files involved in t= he build. + shared_elements: SPDX elements shared across multiple documents. + high_level_build_element: The high-level Build element referenced = by the build graph. + spdx_id_generators: Collection of generators for SPDX element IDs. + + Returns: + SpdxBuildGraph: The SPDX build graph connecting source files and d= istributable output files. + """ + # SpdxDocument + build_spdx_document =3D SpdxDocument( + spdxId=3Dspdx_id_generators.build.generate(), + profileConformance=3D["core", "software", "build"], + namespaceMap=3D[ + NamespaceMap(prefix=3Dgenerator.prefix, namespace=3Dgenerator.= namespace) + for generator in [ + spdx_id_generators.build, + spdx_id_generators.source, + spdx_id_generators.output, + spdx_id_generators.base, + ] + if generator.prefix is not None + ], + ) + + # Sbom + build_sbom =3D Sbom( + spdxId=3Dspdx_id_generators.build.generate(), + software_sbomType=3D["build"], + ) + + # Src and object tree elements + obj_tree_element =3D File( + spdxId=3Dspdx_id_generators.build.generate(), + name=3D"$(obj_tree)", + software_fileKind=3D"directory", + ) + obj_tree_contains_relationship =3D Relationship( + spdxId=3Dspdx_id_generators.build.generate(), + relationshipType=3D"contains", + from_=3Dobj_tree_element, + to=3D[], + ) + + # File elements + build_file_elements =3D [file.spdx_file_element for file in kernel_fil= es.build.values()] + file_relationships =3D _file_relationships( + cmd_graph=3Dcmd_graph, + file_elements=3D{key: file.spdx_file_element for key, file in kern= 
el_files.to_dict().items()}, + high_level_build_element=3Dhigh_level_build_element, + spdx_id_generator=3Dspdx_id_generators.build, + ) + + # Update relationships + build_spdx_document.rootElement =3D [build_sbom] + + build_spdx_document.import_ =3D [ + *( + ExternalMap(externalSpdxId=3Dfile.spdx_file_element.spdxId) + for file in (*kernel_files.source.values(), *kernel_files.exte= rnal.values()) + ), + ExternalMap(externalSpdxId=3Dhigh_level_build_element.spdxId), + *(ExternalMap(externalSpdxId=3Dfile.spdx_file_element.spdxId) for = file in kernel_files.output.values()), + ] + + build_sbom.rootElement =3D [obj_tree_element] + build_sbom.element =3D [ + obj_tree_element, + obj_tree_contains_relationship, + *build_file_elements, + *file_relationships, + ] + + obj_tree_contains_relationship.to =3D [ + *build_file_elements, + *(file.spdx_file_element for file in kernel_files.output.values()), + ] + + # create Spdx graphs + build_graph =3D SpdxBuildGraph( + build_spdx_document, + shared_elements.agent, + shared_elements.creation_info, + build_sbom, + ) + return build_graph + + +def _create_spdx_build_graph_with_mixed_sources( + cmd_graph: CmdGraph, + kernel_files: KernelFileCollection, + shared_elements: SharedSpdxElements, + high_level_build_element: Build, + spdx_id_generators: SpdxIdGeneratorCollection, +) -> SpdxBuildGraph: + """ + Creates an SPDX build graph where only output files are referenced from + an external document. Source files are included directly in the build = graph. + + Args: + cmd_graph: The dependency graph of a kernel build. + kernel_files: Collection of categorized kernel files involved in t= he build. + shared_elements: SPDX elements shared across multiple documents. + high_level_build_element: The high-level Build element referenced = by the build graph. + spdx_id_generators: Collection of generators for SPDX element IDs. + + Returns: + SpdxBuildGraph: The SPDX build graph connecting source files and d= istributable output files. 
+ """ + # SpdxDocument + build_spdx_document =3D SpdxDocument( + spdxId=3Dspdx_id_generators.build.generate(), + profileConformance=3D["core", "software", "build"], + namespaceMap=3D[ + NamespaceMap(prefix=3Dgenerator.prefix, namespace=3Dgenerator.= namespace) + for generator in [ + spdx_id_generators.build, + spdx_id_generators.output, + spdx_id_generators.base, + ] + if generator.prefix is not None + ], + ) + + # Sbom + build_sbom =3D Sbom( + spdxId=3Dspdx_id_generators.build.generate(), + software_sbomType=3D["build"], + ) + + # File elements + build_file_elements =3D [file.spdx_file_element for file in kernel_fil= es.build.values()] + external_file_elements =3D [file.spdx_file_element for file in kernel_= files.external.values()] + file_relationships =3D _file_relationships( + cmd_graph=3Dcmd_graph, + file_elements=3D{key: file.spdx_file_element for key, file in kern= el_files.to_dict().items()}, + high_level_build_element=3Dhigh_level_build_element, + spdx_id_generator=3Dspdx_id_generators.build, + ) + + # Source file license elements + source_file_license_identifiers, source_file_license_relationships =3D= source_file_license_elements( + list(kernel_files.build.values()), spdx_id_generators.build + ) + + # Update relationships + build_spdx_document.rootElement =3D [build_sbom] + root_file_elements =3D [file.spdx_file_element for file in kernel_file= s.output.values()] + build_spdx_document.import_ =3D [ + ExternalMap(externalSpdxId=3Dhigh_level_build_element.spdxId), + *(ExternalMap(externalSpdxId=3Dfile.spdxId) for file in root_file_= elements), + ] + + build_sbom.rootElement =3D [*root_file_elements] + build_sbom.element =3D [ + *build_file_elements, + *external_file_elements, + *source_file_license_identifiers, + *source_file_license_relationships, + *file_relationships, + ] + + build_graph =3D SpdxBuildGraph( + build_spdx_document, + shared_elements.agent, + shared_elements.creation_info, + build_sbom, + ) + return build_graph + + +def 
_file_relationships( + cmd_graph: CmdGraph, + file_elements: Mapping[PathStr, File], + high_level_build_element: Build, + spdx_id_generator: SpdxIdGenerator, +) -> list[Build | Relationship]: + """ + Construct SPDX Build and Relationship elements representing dependency + relationships in the cmd graph. + + Args: + cmd_graph: The dependency graph of a kernel build. + file_elements: Mapping of filesystem paths (PathStr) to their + corresponding SPDX File elements. + high_level_build_element: The SPDX Build element representing the = overall build process/root. + spdx_id_generator: Generator for unique SPDX IDs. + + Returns: + list[Build | Relationship]: List of SPDX Build and Relationship el= ements + """ + high_level_build_ancestorOf_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"ancestorOf", + from_=3Dhigh_level_build_element, + completeness=3D"complete", + to=3D[], + ) + + # Create a relationship between each node (output file) + # and its children (input files) + build_and_relationship_elements: list[Build | Relationship] =3D [high_= level_build_ancestorOf_relationship] + for node in cmd_graph: + # .cmd file dependencies + if node.cmd_file is not None: + build_element =3D Build( + spdxId=3Dspdx_id_generator.generate(), + build_buildType=3Dhigh_level_build_element.build_buildType, + build_buildId=3Dhigh_level_build_element.build_buildId, + comment=3Dnode.cmd_file.savedcmd, + ) + build_and_relationship_elements.append(build_element) + + if node.cmd_file_dependencies: + hasInput_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"hasInput", + from_=3Dbuild_element, + to=3D[file_elements[dep.absolute_path] for dep in node= .cmd_file_dependencies], + ) + build_and_relationship_elements.append(hasInput_relationsh= ip) + + hasOutput_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"hasOutput", + from_=3Dbuild_element, + 
to=3D[file_elements[node.absolute_path]], + ) + build_and_relationship_elements.append(hasOutput_relationship) + + high_level_build_ancestorOf_relationship.to.append(build_eleme= nt) + + # incbin dependencies + if len(node.incbin_dependencies) > 0: + incbin_dependsOn_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"dependsOn", + comment=3D"\n".join([incbin_dependency.full_statement for = incbin_dependency in node.incbin_dependencies]), + from_=3Dfile_elements[node.absolute_path], + to=3D[ + file_elements[incbin_dependency.node.absolute_path] + for incbin_dependency in node.incbin_dependencies + ], + ) + build_and_relationship_elements.append(incbin_dependsOn_relati= onship) + + # hardcoded dependencies + if len(node.hardcoded_dependencies) > 0: + hardcoded_dependency_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"dependsOn", + from_=3Dfile_elements[node.absolute_path], + to=3D[file_elements[n.absolute_path] for n in node.hardcod= ed_dependencies], + ) + build_and_relationship_elements.append(hardcoded_dependency_re= lationship) + + return build_and_relationship_elements --=20 2.43.0 From nobody Mon May 25 05:54:39 2026 Received: from mailgw02.zimbra-vnc.de (mailgw02.zimbra-vnc.de [148.251.102.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C7A2B3DF019; Mon, 18 May 2026 06:21:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.102.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779085307; cv=none; b=aQFo7hPfcpLwnRRf3uLPtmN9uTjYLVPCAJ9pu0UbbKfY7MrybCNoNn/Nnqof0t1MitrsectfPyUErV6UTnLZn8hyR8vb2J2R2YDTPnMJ942gK/Mrc8nRORg3Eski8eWAWCA5ErQtIqBWVvnGUE+j5Ss7nZtm/C64oewLzy7OSA0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779085307; c=relaxed/simple; 
bh=4NmpOJ6YtHDRQBnpBkU+Dlewh2FQzXPfieP9CTF3b1U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KS9rSP4wmHHWMJ10CPTWa//vaQ1ohySFcrGLIkd1TgqbsDKl2Q7XyCdltYKzl0AcWkYi8Spkdr1epYkUsAj3fa28q8ozESP+5Ajk3msFuBkm7wRp8oDffMp+N9tSQDfMpnYcF60rpbCVXXbFDD7hlrknQXcJAvKhMthQ1n55/vk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=QZIYOBTh; arc=none smtp.client-ip=148.251.102.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="QZIYOBTh" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw02.zimbra-vnc.de (Postfix) with ESMTPS id 63EA0200CE; Mon, 18 May 2026 08:21:31 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 2E48A1F89A8; Mon, 18 May 2026 08:21:31 +0200 (CEST) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id TU_fzY04TXDm; Mon, 18 May 2026 08:21:28 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id CB77A1FACCB; Mon, 18 May 2026 08:21:28 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz CB77A1FACCB DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1779085288; bh=FA3MeeJSL2kCLS6ILJNS/yPLeIWjDcAPSqTtxKoAquQ=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=QZIYOBThdj6xmLdCYkCWQLZlHjXjPk4snBsEYohXmdb7UlYpkzdCOMKNETqTT3bSF 1ThhdhOw9S2Q5Hu5xQLH+UzRlc1c1TAE9TGEozZWOMilpTX+JRds2isKjvct9aV5IX 
rJl8OkKsHqNKyj2A1C/hQSo1F9BC3kftz79Yw8CyIv1L72cLzerCKs0VEV33Hknq+/ sbJOZfiHmieCqSqYdpyE0WPKq7GtPRXJ48jFFC94YUFLj2yQf6NAw0w6I46M5d8Owu xVZ4evrD+rFLh0XLX6Gqn8INfbotTsA95Z2EKovN2cSF8b9DHOZmc1aDXTPKTKQ70J iaTXm3+iTT6nA== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id ZnIJfU4LlwOX; Mon, 18 May 2026 08:21:28 +0200 (CEST) Received: from luis-Precision-5480.. (ipservice-092-209-239-167.092.209.pools.vodafone-ip.de [92.209.239.167]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id 6E6C41F89A8; Mon, 18 May 2026 08:21:28 +0200 (CEST) From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v7 14/15] scripts/sbom: add unit tests for command parsers Date: Mon, 18 May 2026 08:21:01 +0200 Message-ID: <20260518062102.2051814-15-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com> References: <20260518062102.2051814-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Add unit tests to verify that command parsers correctly extract input files from build commands. 
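
As a rough illustration of what "extract input files from build commands"
means, here is a toy extractor for plain cat commands. It is a sketch under
stated assumptions, not the parser added by this series: toy_cat_inputs() is
a hypothetical name, and it handles only a single cat invocation followed by
an optional pipe or output redirection.

```python
import shlex


def toy_cat_inputs(command: str) -> list[str]:
    """Return the files read by a simple `cat a b > out` style command."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "cat":
        # Anything other than cat is out of scope for this toy example.
        return []
    inputs: list[str] = []
    for token in tokens[1:]:
        # Stop at the first pipe or output redirection: everything after it
        # belongs to another command or names the output file.
        if token == "|" or token.startswith(">"):
            break
        if not token.startswith("-"):
            inputs.append(token)
    return inputs


redirected = toy_cat_inputs("cat ../fs/unicode/utf8data.c_shipped > fs/unicode/utf8data.c")
piped = toy_cat_inputs("cat a.bin a.relocs | gzip -n -f -9 > a.bin.gz")
```

The tests in this patch follow the same pattern at a larger scale: each case
pairs one saved command string with the space-separated list of inputs the
parser is expected to report.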
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- scripts/sbom/tests/__init__.py | 0 scripts/sbom/tests/cmd_graph/__init__.py | 0 .../tests/cmd_graph/test_savedcmd_parser.py | 443 ++++++++++++++++++ 3 files changed, 443 insertions(+) create mode 100644 scripts/sbom/tests/__init__.py create mode 100644 scripts/sbom/tests/cmd_graph/__init__.py create mode 100644 scripts/sbom/tests/cmd_graph/test_savedcmd_parser.py diff --git a/scripts/sbom/tests/__init__.py b/scripts/sbom/tests/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/scripts/sbom/tests/cmd_graph/__init__.py b/scripts/sbom/tests/= cmd_graph/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/scripts/sbom/tests/cmd_graph/test_savedcmd_parser.py b/scripts= /sbom/tests/cmd_graph/test_savedcmd_parser.py new file mode 100644 index 00000000000..a061a748e1b --- /dev/null +++ b/scripts/sbom/tests/cmd_graph/test_savedcmd_parser.py @@ -0,0 +1,443 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import os +import unittest +from unittest.mock import patch + +from sbom.cmd_graph.savedcmd_parser import parse_inputs_from_commands +from sbom.cmd_graph.savedcmd_parser.command_parser_registry import Command= ParserRegistry +import sbom.sbom_logging as sbom_logging + + +class TestSavedCmdParser(unittest.TestCase): + def _assert_parsing(self, cmd: str, expected: str, registry: CommandPa= rserRegistry | None =3D None) -> None: + sbom_logging.init() + parsed =3D parse_inputs_from_commands(cmd, fail_on_unknown_build_c= ommand=3DFalse, registry=3Dregistry) + target =3D [] if expected =3D=3D "" else expected.split(" ") + self.assertEqual(parsed, target) + errors =3D sbom_logging._error_logger._message_counts # type: igno= re + self.assertEqual(errors, {}) + + # Compound command tests + def test_dd_cat(self): 
+ cmd =3D "(dd if=3Darch/x86/boot/setup.bin bs=3D4k conv=3Dsync stat= us=3Dnone; cat arch/x86/boot/vmlinux.bin) >arch/x86/boot/bzImage" + expected =3D "arch/x86/boot/setup.bin arch/x86/boot/vmlinux.bin" + self._assert_parsing(cmd, expected) + + def test_manual_file_creation(self): + cmd =3D """{ symbase=3D__dtbo_overlay_bad_unresolved; echo '$(poun= d)include '; echo '.section .rodata,"a"'; echo '= .balign STRUCT_ALIGNMENT'; echo ".global $${symbase}_begin"; echo "$${symba= se}_begin:"; echo '.incbin "drivers/of/unittest-data/overlay_bad_unresolved= .dtbo" '; echo ".global $${symbase}_end"; echo "$${symbase}_end:"; echo '.b= align STRUCT_ALIGNMENT'; } > drivers/of/unittest-data/overlay_bad_unresolve= d.dtbo.S""" + expected =3D "" + self._assert_parsing(cmd, expected) + + def test_cat_xz_wrap(self): + cmd =3D "{ cat arch/x86/boot/compressed/vmlinux.bin | sh ../script= s/xz_wrap.sh; printf \\130\\064\\024\\000; } > arch/x86/boot/compressed/vml= inux.bin.xz" + expected =3D "arch/x86/boot/compressed/vmlinux.bin" + self._assert_parsing(cmd, expected) + + def test_printf_sed(self): + cmd =3D r"""{ printf 'static char tomoyo_builtin_profile[] __init= data =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/\(.*\)/\t"\1\\n"/' = -- /dev/null; printf '\t"";\n'; printf 'static char tomoyo_builtin_excepti= on_policy[] __initdata =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/\= (.*\)/\t"\1\\n"/' -- ../security/tomoyo/policy/exception_policy.conf.defaul= t; printf '\t"";\n'; printf 'static char tomoyo_builtin_domain_policy[] __= initdata =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/\(.*\)/\t"\1\\n= "/' -- /dev/null; printf '\t"";\n'; printf 'static char tomoyo_builtin_man= ager[] __initdata =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/\(.*\)= /\t"\1\\n"/' -- /dev/null; printf '\t"";\n'; printf 'static char tomoyo_bu= iltin_stat[] __initdata =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/= \(.*\)/\t"\1\\n"/' -- /dev/null; printf '\t"";\n'; } > 
security/tomoyo/buil= tin-policy.h""" + expected =3D "../security/tomoyo/policy/exception_policy.conf.defa= ult" + self._assert_parsing(cmd, expected) + + def test_bin2c_echo(self): + cmd =3D """(echo "static char tomoyo_builtin_profile[] __initdata = =3D"; ./scripts/bin2c security/tomoyo/builtin-policy= .h""" + expected =3D "../security/tomoyo/policy/exception_policy.conf.defa= ult" + self._assert_parsing(cmd, expected) + + def test_cat_colon(self): + cmd =3D "{ cat init/modules.order; cat usr/modules.order; ca= t arch/x86/modules.order; cat arch/x86/boot/startup/modules.order; cat = kernel/modules.order; cat certs/modules.order; cat mm/modules.order; = cat fs/modules.order; cat ipc/modules.order; cat security/modules.order= ; cat crypto/modules.order; cat block/modules.order; cat io_uring/mod= ules.order; cat lib/modules.order; cat arch/x86/lib/modules.order; ca= t drivers/modules.order; cat sound/modules.order; cat samples/modules.o= rder; cat net/modules.order; cat virt/modules.order; cat arch/x86/pci= /modules.order; cat arch/x86/power/modules.order; cat arch/x86/video/mo= dules.order; :; } > modules.order" + expected =3D "init/modules.order usr/modules.order arch/x86/module= s.order arch/x86/boot/startup/modules.order kernel/modules.order certs/modu= les.order mm/modules.order fs/modules.order ipc/modules.order security/modu= les.order crypto/modules.order block/modules.order io_uring/modules.order l= ib/modules.order arch/x86/lib/modules.order drivers/modules.order sound/mod= ules.order samples/modules.order net/modules.order virt/modules.order arch/= x86/pci/modules.order arch/x86/power/modules.order arch/x86/video/modules.o= rder" + self._assert_parsing(cmd, expected) + + def test_cat_zstd(self): + cmd =3D "{ cat arch/x86/boot/compressed/vmlinux.bin arch/x86/boot/= compressed/vmlinux.relocs | zstd -22 --ultra; printf \\340\\362\\066\\003; = } > arch/x86/boot/compressed/vmlinux.bin.zst" + expected =3D "arch/x86/boot/compressed/vmlinux.bin arch/x86/boot/c= 
ompressed/vmlinux.relocs" + self._assert_parsing(cmd, expected) + + # cat command tests + def test_cat_redirect(self): + cmd =3D "cat ../fs/unicode/utf8data.c_shipped > fs/unicode/utf8dat= a.c" + expected =3D "../fs/unicode/utf8data.c_shipped" + self._assert_parsing(cmd, expected) + + def test_cat_piped(self): + cmd =3D "cat arch/x86/boot/compressed/vmlinux.bin arch/x86/boot/co= mpressed/vmlinux.relocs | gzip -n -f -9 > arch/x86/boot/compressed/vmlinux.= bin.gz" + expected =3D "arch/x86/boot/compressed/vmlinux.bin arch/x86/boot/c= ompressed/vmlinux.relocs" + self._assert_parsing(cmd, expected) + + # sed command tests + def test_sed(self): + cmd =3D "sed -n 's/.*define *BLIST_\\([A-Z0-9_]*\\) *.*/BLIST_FLAG= _NAME(\\1),/p' ../include/scsi/scsi_devinfo.h > drivers/scsi/scsi_devinfo_t= bl.c" + expected =3D "../include/scsi/scsi_devinfo.h" + self._assert_parsing(cmd, expected) + + # awk command tests + def test_awk(self): + cmd =3D "awk -f ../arch/arm64/tools/gen-cpucaps.awk ../arch/arm64/= tools/cpucaps > arch/arm64/include/generated/asm/cpucap-defs.h" + expected =3D "../arch/arm64/tools/cpucaps" + self._assert_parsing(cmd, expected) + + def test_awk_with_input_redirection(self): + cmd =3D "awk -v N=3D1 -f ../lib/raid6/unroll.awk < ../lib/raid6/in= t.uc > lib/raid6/int1.c" + expected =3D "../lib/raid6/int.uc" + self._assert_parsing(cmd, expected) + + # openssl command tests + def test_openssl(self): + cmd =3D "openssl req -new -nodes -utf8 -sha256 -days 36500 -batch = -x509 -config certs/x509.genkey -outform PEM -out certs/signing_key.pem -ke= yout certs/signing_key.pem 2>&1" + expected =3D "" + self._assert_parsing(cmd, expected) + + # gcc/clang command tests + def test_gcc(self): + cmd =3D ( + "gcc -Wp,-MMD,arch/x86/pci/.i386.o.d -nostdinc -I../arch/x86/i= nclude -I./arch/x86/include/generated -I../include -I./include -I../arch/x8= 6/include/uapi -I./arch/x86/include/generated/uapi -I../include/uapi -I./in= clude/generated/uapi -include 
../include/linux/compiler-version.h -include = ../include/linux/kconfig.h -include ../include/linux/compiler_types.h -D__K= ERNEL__ -fmacro-prefix-map=3D../=3D -Werror -std=3Dgnu11 -fshort-wchar -fun= signed-char -fno-common -fno-PIE -fno-strict-aliasing -mno-sse -mno-mmx -mn= o-sse2 -mno-3dnow -mno-avx -fcf-protection=3Dbranch -fno-jump-tables -m64 -= falign-jumps=3D1 -falign-loops=3D1 -mno-80387 -mno-fp-ret-in-387 -mpreferre= d-stack-boundary=3D3 -mskip-rax-setup -march=3Dx86-64 -mtune=3Dgeneric -mno= -red-zone -mcmodel=3Dkernel -mstack-protector-guard-reg=3Dgs -mstack-protec= tor-guard-symbol=3D__ref_stack_chk_guard -Wno-sign-compare -fno-asynchronou= s-unwind-tables -mindirect-branch=3Dthunk-extern -mindirect-branch-register= -mindirect-branch-cs-prefix -mfunction-return=3Dthunk-extern -fno-jump-tab= les -fpatchable-function-entry=3D16,16 -fno-delete-null-pointer-checks -O2 = -fno-allow-store-data-races -fstack-protector-strong -fomit-frame-pointer -= fno-stack-clash-protection -falign-functions=3D16 -fno-strict-overflow -fno= -stack-check -fconserve-stack -fno-builtin-wcslen -Wall -Wextra -Wundef -We= rror=3Dimplicit-function-declaration -Werror=3Dimplicit-int -Werror=3Dretur= n-type -Werror=3Dstrict-prototypes -Wno-format-security -Wno-trigraphs -Wno= -frame-address -Wno-address-of-packed-member -Wmissing-declarations -Wmissi= ng-prototypes -Wframe-larger-than=3D2048 -Wno-main -Wvla-larger-than=3D1 -W= no-pointer-sign -Wcast-function-type -Wno-array-bounds -Wno-stringop-overfl= ow -Wno-alloc-size-larger-than -Wimplicit-fallthrough=3D5 -Werror=3Ddate-ti= me -Werror=3Dincompatible-pointer-types -Werror=3Ddesignated-init -Wenum-co= nversion -Wunused -Wno-unused-but-set-variable -Wno-unused-const-variable -= Wno-packed-not-aligned -Wno-format-overflow -Wno-format-truncation -Wno-str= ingop-truncation -Wno-override-init -Wno-missing-field-initializers -Wno-ty= pe-limits -Wno-shift-negative-value -Wno-maybe-uninitialized -Wno-sign-comp= are -Wno-unused-parameter 
-I../arch/x86/pci -Iarch/x86/pci -DKBUILD_MODF= ILE=3D" + "arch/x86/pci/i386" + " -DKBUILD_BASENAME=3D" + "i386" + " -DKBUILD_MODNAME=3D" + "i386" + " -D__KBUILD_MODNAME=3Dkmod_i386 -c -o arch/x86/pci/i386.o ../= arch/x86/pci/i386.c " + ) + expected =3D "../arch/x86/pci/i386.c" + self._assert_parsing(cmd, expected) + + def test_gcc_linking(self): + cmd =3D "gcc -o arch/x86/tools/relocs arch/x86/tools/relocs_32.o= arch/x86/tools/relocs_64.o arch/x86/tools/relocs_common.o" + expected =3D "arch/x86/tools/relocs_32.o arch/x86/tools/relocs_64.= o arch/x86/tools/relocs_common.o" + self._assert_parsing(cmd, expected) + + def test_gcc_without_compile_flag(self): + cmd =3D "gcc -Wp,-MMD,arch/x86/boot/compressed/.mkpiggy.d -Wall -W= missing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=3Dgnu1= 1 -I ../scripts/include -I../tools/include -I arch/x86/boot/compressed = -o arch/x86/boot/compressed/mkpiggy ../arch/x86/boot/compressed/mkpiggy.c" + expected =3D "../arch/x86/boot/compressed/mkpiggy.c" + self._assert_parsing(cmd, expected) + + def test_gcc_with_env_override(self): + with patch.dict(os.environ, {"CC": "ccache gcc"}): + registry =3D CommandParserRegistry.create() + cmd =3D "gcc -o arch/x86/tools/relocs arch/x86/tools/relocs_= 32.o arch/x86/tools/relocs_64.o arch/x86/tools/relocs_common.o" + expected =3D "arch/x86/tools/relocs_32.o arch/x86/tools/relocs= _64.o arch/x86/tools/relocs_common.o" + self._assert_parsing(cmd, expected, registry) + self._assert_parsing(f"ccache {cmd}", expected, registry) + + def test_gcc_dts_preprocessing(self): + cmd =3D "gcc -E -Wp,-MMD,drivers/of/.empty_root.dtb.d.pre.tmp -nos= tdinc -I ../scripts/dtc/include-prefixes -undef -D__DTS__ -x assembler-with= -cpp -o drivers/of/.empty_root.dtb.dts.tmp ../drivers/of/empty_root.dts" + expected =3D "../drivers/of/empty_root.dts" + self._assert_parsing(cmd, expected) + + def test_clang(self): + cmd =3D """clang -Wp,-MMD,arch/x86/entry/.entry_64_compat.o.d -nos= tdinc 
-I../arch/x86/include -I./arch/x86/include/generated -I../include -I.= /include -I../arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I.= ./include/uapi -I./include/generated/uapi -include ../include/linux/compile= r-version.h -include ../include/linux/kconfig.h -D__KERNEL__ --target=3Dx86= _64-linux-gnu -fintegrated-as -Werror=3Dunknown-warning-option -Werror=3Dig= nored-optimization-argument -Werror=3Doption-ignored -Werror=3Dunused-comma= nd-line-argument -fmacro-prefix-map=3D../=3D -Werror -D__ASSEMBLY__ -fno-PI= E -m64 -I../arch/x86/entry -Iarch/x86/entry -DKBUILD_MODFILE=3D'"arch/x8= 6/entry/entry_64_compat"' -DKBUILD_MODNAME=3D'"entry_64_compat"' -D__KBUILD= _MODNAME=3Dkmod_entry_64_compat -c -o arch/x86/entry/entry_64_compat.o ../a= rch/x86/entry/entry_64_compat.S""" + expected =3D "../arch/x86/entry/entry_64_compat.S" + self._assert_parsing(cmd, expected) + + # ld command tests + def test_ld(self): + cmd =3D r'ld -o arch/x86/entry/vdso/vdso64.so.dbg -shared --hash-s= tyle=3Dboth --build-id=3Dsha1 --no-undefined --eh-frame-hdr -Bsymbolic -z = noexecstack -m elf_x86_64 -soname linux-vdso.so.1 -z max-page-size=3D4096 -= T arch/x86/entry/vdso/vdso.lds arch/x86/entry/vdso/vdso-note.o arch/x86/ent= ry/vdso/vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/= vgetrandom.o arch/x86/entry/vdso/vgetrandom-chacha.o; if readelf -rW arch/x= 86/entry/vdso/vdso64.so.dbg | grep -v _NONE | grep -q " R_\w*_"; then (echo= >&2 "arch/x86/entry/vdso/vdso64.so.dbg: dynamic relocations are not suppor= ted"; rm -f arch/x86/entry/vdso/vdso64.so.dbg; /bin/false); fi' + expected =3D "arch/x86/entry/vdso/vdso-note.o arch/x86/entry/vdso/= vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/vgetrand= om.o arch/x86/entry/vdso/vgetrandom-chacha.o" + self._assert_parsing(cmd, expected) + + def test_ld_with_env_override(self): + with patch.dict(os.environ, {"LD": "some-tool ld"}): + registry =3D CommandParserRegistry.create() + cmd =3D r'ld -o 
arch/x86/entry/vdso/vdso64.so.dbg -shared --ha= sh-style=3Dboth --build-id=3Dsha1 --no-undefined --eh-frame-hdr -Bsymbolic= -z noexecstack -m elf_x86_64 -soname linux-vdso.so.1 -z max-page-size=3D40= 96 -T arch/x86/entry/vdso/vdso.lds arch/x86/entry/vdso/vdso-note.o arch/x86= /entry/vdso/vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/v= dso/vgetrandom.o arch/x86/entry/vdso/vgetrandom-chacha.o; if readelf -rW ar= ch/x86/entry/vdso/vdso64.so.dbg | grep -v _NONE | grep -q " R_\w*_"; then (= echo >&2 "arch/x86/entry/vdso/vdso64.so.dbg: dynamic relocations are not su= pported"; rm -f arch/x86/entry/vdso/vdso64.so.dbg; /bin/false); fi' + expected =3D "arch/x86/entry/vdso/vdso-note.o arch/x86/entry/v= dso/vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/vget= random.o arch/x86/entry/vdso/vgetrandom-chacha.o" + self._assert_parsing(cmd, expected, registry) + self._assert_parsing(f"some-tool {cmd}", expected, registry) + + def test_ld_whole_archive(self): + cmd =3D "ld -m elf_x86_64 -z noexecstack -r -o vmlinux.o --whole= -archive vmlinux.a --no-whole-archive --start-group --end-group" + expected =3D "vmlinux.a" + self._assert_parsing(cmd, expected) + + def test_ld_with_at_symbol(self): + cmd =3D "ld.lld -m elf_x86_64 -z noexecstack -r -o fs/efivarfs/e= fivarfs.o @fs/efivarfs/efivarfs.mod ; ./tools/objtool/objtool --hacks=3Dju= mp_label --hacks=3Dnoinstr --hacks=3Dskylake --ibt --orc --retpoline --reth= unk --static-call --uaccess --prefix=3D16 --link --module fs/efivarfs/efi= varfs.o" + expected =3D "@fs/efivarfs/efivarfs.mod" + self._assert_parsing(cmd, expected) + + def test_ld_if_objdump(self): + cmd =3D """ld -o arch/x86/entry/vdso/vdso64.so.dbg -shared --hash-= style=3Dboth --build-id=3Dsha1 --eh-frame-hdr -Bsymbolic -z noexecstack -m= elf_x86_64 -soname linux-vdso.so.1 --no-undefined -z max-page-size=3D4096 = -T arch/x86/entry/vdso/vdso.lds arch/x86/entry/vdso/vdso-note.o arch/x86/en= try/vdso/vclock_gettime.o 
arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso= /vsgx.o && sh ./arch/x86/entry/vdso/checkundef.sh 'nm' 'arch/x86/entry/vdso= /vdso64.so.dbg'; if objdump -R arch/x86/entry/vdso/vdso64.so.dbg | grep -E = -h "R_X86_64_JUMP_SLOT|R_X86_64_GLOB_DAT|R_X86_64_RELATIVE| R_386_GLOB_DAT|= R_386_JMP_SLOT|R_386_RELATIVE"; then (echo >&2 "arch/x86/entry/vdso/vdso64.= so.dbg: dynamic relocations are not supported"; rm -f arch/x86/entry/vdso/v= dso64.so.dbg; /bin/false); fi""" + expected =3D "arch/x86/entry/vdso/vdso-note.o arch/x86/entry/vdso/= vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/vsgx.o" + self._assert_parsing(cmd, expected) + + # printf | xargs ar command tests + def test_ar_printf(self): + cmd =3D 'rm -f built-in.a; printf "./%s " init/built-in.a usr/bui= lt-in.a arch/x86/built-in.a arch/x86/boot/startup/built-in.a kernel/built-i= n.a certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a security/bu= ilt-in.a crypto/built-in.a block/built-in.a io_uring/built-in.a lib/built-i= n.a arch/x86/lib/built-in.a drivers/built-in.a sound/built-in.a net/built-i= n.a virt/built-in.a arch/x86/pci/built-in.a arch/x86/power/built-in.a arch/= x86/video/built-in.a | xargs ar cDPrST built-in.a' + expected =3D "./init/built-in.a ./usr/built-in.a ./arch/x86/built-= in.a ./arch/x86/boot/startup/built-in.a ./kernel/built-in.a ./certs/built-i= n.a ./mm/built-in.a ./fs/built-in.a ./ipc/built-in.a ./security/built-in.a = ./crypto/built-in.a ./block/built-in.a ./io_uring/built-in.a ./lib/built-in= .a ./arch/x86/lib/built-in.a ./drivers/built-in.a ./sound/built-in.a ./net/= built-in.a ./virt/built-in.a ./arch/x86/pci/built-in.a ./arch/x86/power/bui= lt-in.a ./arch/x86/video/built-in.a" + self._assert_parsing(cmd, expected) + + def test_ar_printf_nested(self): + cmd =3D 'rm -f arch/x86/pci/built-in.a; printf "arch/x86/pci/%s "= i386.o init.o mmconfig_64.o direct.o mmconfig-shared.o fixup.o acpi.o lega= cy.o irq.o common.o early.o bus_numa.o amd_bus.o | xargs ar 
cDPrST arch/x86= /pci/built-in.a' + expected =3D "arch/x86/pci/i386.o arch/x86/pci/init.o arch/x86/pci= /mmconfig_64.o arch/x86/pci/direct.o arch/x86/pci/mmconfig-shared.o arch/x8= 6/pci/fixup.o arch/x86/pci/acpi.o arch/x86/pci/legacy.o arch/x86/pci/irq.o = arch/x86/pci/common.o arch/x86/pci/early.o arch/x86/pci/bus_numa.o arch/x86= /pci/amd_bus.o" + self._assert_parsing(cmd, expected) + + # ar command tests + def test_ar_reordering(self): + cmd =3D "rm -f vmlinux.a; ar cDPrST vmlinux.a built-in.a lib/lib.= a arch/x86/lib/lib.a; ar mPiT $$(ar t vmlinux.a | sed -n 1p) vmlinux.a $$(a= r t vmlinux.a | grep -F -f ../scripts/head-object-list.txt)" + expected =3D "built-in.a lib/lib.a arch/x86/lib/lib.a" + self._assert_parsing(cmd, expected) + + def test_ar_default(self): + cmd =3D "rm -f lib/lib.a; ar cDPrsT lib/lib.a lib/argv_split.o lib= /bug.o lib/buildid.o lib/clz_tab.o lib/cmdline.o lib/cpumask.o lib/ctype.o = lib/dec_and_lock.o lib/decompress.o lib/decompress_bunzip2.o lib/decompress= _inflate.o lib/decompress_unlz4.o lib/decompress_unlzma.o lib/decompress_un= lzo.o lib/decompress_unxz.o lib/decompress_unzstd.o lib/dump_stack.o lib/ea= rlycpio.o lib/extable.o lib/flex_proportions.o lib/idr.o lib/iomem_copy.o l= ib/irq_regs.o lib/is_single_threaded.o lib/klist.o lib/kobject.o lib/kobjec= t_uevent.o lib/logic_pio.o lib/maple_tree.o lib/memcat_p.o lib/nmi_backtrac= e.o lib/objpool.o lib/plist.o lib/radix-tree.o lib/ratelimit.o lib/rbtree.o= lib/seq_buf.o lib/siphash.o lib/string.o lib/sys_info.o lib/timerqueue.o l= ib/union_find.o lib/vsprintf.o lib/win_minmax.o lib/xarray.o" + expected =3D "lib/argv_split.o lib/bug.o lib/buildid.o lib/clz_tab= .o lib/cmdline.o lib/cpumask.o lib/ctype.o lib/dec_and_lock.o lib/decompres= s.o lib/decompress_bunzip2.o lib/decompress_inflate.o lib/decompress_unlz4.= o lib/decompress_unlzma.o lib/decompress_unlzo.o lib/decompress_unxz.o lib/= decompress_unzstd.o lib/dump_stack.o lib/earlycpio.o lib/extable.o lib/flex= _proportions.o 
lib/idr.o lib/iomem_copy.o lib/irq_regs.o lib/is_single_thre= aded.o lib/klist.o lib/kobject.o lib/kobject_uevent.o lib/logic_pio.o lib/m= aple_tree.o lib/memcat_p.o lib/nmi_backtrace.o lib/objpool.o lib/plist.o li= b/radix-tree.o lib/ratelimit.o lib/rbtree.o lib/seq_buf.o lib/siphash.o lib= /string.o lib/sys_info.o lib/timerqueue.o lib/union_find.o lib/vsprintf.o l= ib/win_minmax.o lib/xarray.o" + self._assert_parsing(cmd, expected) + + def test_ar_llvm(self): + cmd =3D "llvm-ar mPiT $$(llvm-ar t vmlinux.a | sed -n 1p) vmlinux.= a $$(llvm-ar t vmlinux.a | grep -F -f ../scripts/head-object-list.txt)" + expected =3D "" + self._assert_parsing(cmd, expected) + + # nm command tests + def test_nm(self): + cmd =3D """llvm-nm -p --defined-only rust/core.o | awk '$$2~/(T|R|= D|B)/ && $$3!~/__(pfx|cfi|odr_asan)/ { printf "EXPORT_SYMBOL_RUST_GPL(%s);\= n",$$3 }' > rust/exports_core_generated.h""" + expected =3D "rust/core.o" + self._assert_parsing(cmd, expected) + + def test_nm_vmlinux(self): + cmd =3D r"nm vmlinux | sed -n -e 's/^\([0-9a-fA-F]*\) [ABbCDGRSTtV= W] \(_text\|__start_rodata\|__bss_start\|_end\)$/#define VO_\2 _AC(0x\1,UL)= /p' > arch/x86/boot/voffset.h" + expected =3D "vmlinux" + self._assert_parsing(cmd, expected) + + # objcopy command tests + def test_objcopy(self): + cmd =3D "objcopy --remove-section=3D'.rel*' --remove-section=3D!'.= rel*.dyn' vmlinux.unstripped vmlinux" + expected =3D "vmlinux.unstripped" + self._assert_parsing(cmd, expected) + + def test_objcopy_llvm(self): + cmd =3D "llvm-objcopy --remove-section=3D'.rel*' --remove-section= =3D!'.rel*.dyn' vmlinux.unstripped vmlinux" + expected =3D "vmlinux.unstripped" + self._assert_parsing(cmd, expected) + + # strip command tests + def test_strip(self): + cmd =3D "strip --strip-debug -o drivers/firmware/efi/libstub/mem.s= tub.o drivers/firmware/efi/libstub/mem.o" + expected =3D "drivers/firmware/efi/libstub/mem.o" + self._assert_parsing(cmd, expected) + + # cp command tests + def test_cp_truncate(self): 
+ cmd =3D "cp arch/arm64/boot/Image arch/arm64/boot/vmlinux.bin; tru= ncate -s $$(hexdump -s16 -n4 -e '\"%u\"' arch/arm64/boot/Image) arch/arm64/= boot/vmlinux.bin" + expected =3D "arch/arm64/boot/Image" + self._assert_parsing(cmd, expected) + + # rustc command tests + def test_rustc(self): + cmd =3D """OBJTREE=3D/workspace/linux/kernel_build rustc -Zbinary_= dep_depinfo=3Dy -Astable_features -Dnon_ascii_idents -Dunsafe_op_in_unsafe_= fn -Wmissing_docs -Wrust_2018_idioms -Wclippy::all -Wclippy::as_ptr_cast_mu= t -Wclippy::as_underscore -Wclippy::cast_lossless -Wclippy::ignored_unit_pa= tterns -Wclippy::mut_mut -Wclippy::needless_bitwise_bool -Aclippy::needless= _lifetimes -Wclippy::no_mangle_with_rust_abi -Wclippy::ptr_as_ptr -Wclippy:= :ptr_cast_constness -Wclippy::ref_as_ptr -Wclippy::undocumented_unsafe_bloc= ks -Wclippy::unnecessary_safety_comment -Wclippy::unnecessary_safety_doc -W= rustdoc::missing_crate_level_docs -Wrustdoc::unescaped_backticks -Cpanic=3D= abort -Cembed-bitcode=3Dn -Clto=3Dn -Cforce-unwind-tables=3Dn -Ccodegen-uni= ts=3D1 -Csymbol-mangling-version=3Dv0 -Crelocation-model=3Dstatic -Zfunctio= n-sections=3Dn -Wclippy::float_arithmetic --target=3D./scripts/target.json = -Ctarget-feature=3D-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2 -Zcf-= protection=3Dbranch -Zno-jump-tables -Ctarget-cpu=3Dx86-64 -Ztune-cpu=3Dgen= eric -Cno-redzone=3Dy -Ccode-model=3Dkernel -Zfunction-return=3Dthunk-exter= n -Zpatchable-function-entry=3D16,16 -Copt-level=3D2 -Cdebug-assertions=3Dn= -Coverflow-checks=3Dy -Dwarnings @./include/generated/rustc_cfg --edition= =3D2021 --cfg no_fp_fmt_parse --emit=3Ddep-info=3Drust/.core.o.d --emit=3Do= bj=3Drust/core.o --emit=3Dmetadata=3Drust/libcore.rmeta --crate-type rlib -= L./rust --crate-name core /usr/lib/rust-1.84/lib/rustlib/src/rust/library/c= ore/src/lib.rs --sysroot=3D/dev/null ;llvm-objcopy --redefine-sym __addsf3= =3D__rust__addsf3 --redefine-sym __eqsf2=3D__rust__eqsf2 --redefine-sym __e= 
xtendsfdf2=3D__rust__extendsfdf2 --redefine-sym __gesf2=3D__rust__gesf2 --r= edefine-sym __lesf2=3D__rust__lesf2 --redefine-sym __ltsf2=3D__rust__ltsf2 = --redefine-sym __mulsf3=3D__rust__mulsf3 --redefine-sym __nesf2=3D__rust__n= esf2 --redefine-sym __truncdfsf2=3D__rust__truncdfsf2 --redefine-sym __unor= dsf2=3D__rust__unordsf2 --redefine-sym __adddf3=3D__rust__adddf3 --redefine= -sym __eqdf2=3D__rust__eqdf2 --redefine-sym __ledf2=3D__rust__ledf2 --redef= ine-sym __ltdf2=3D__rust__ltdf2 --redefine-sym __muldf3=3D__rust__muldf3 --= redefine-sym __unorddf2=3D__rust__unorddf2 --redefine-sym __muloti4=3D__rus= t__muloti4 --redefine-sym __multi3=3D__rust__multi3 --redefine-sym __udivmo= dti4=3D__rust__udivmodti4 --redefine-sym __udivti3=3D__rust__udivti3 --rede= fine-sym __umodti3=3D__rust__umodti3 rust/core.o""" + expected =3D "/usr/lib/rust-1.84/lib/rustlib/src/rust/library/core= /src/lib.rs rust/core.o" + self._assert_parsing(cmd, expected) + + # rustdoc command tests + def test_rustdoc(self): + cmd =3D """OBJTREE=3D/workspace/linux/kernel_build rustdoc --test = --edition=3D2021 -Zbinary_dep_depinfo=3Dy -Astable_features -Dnon_ascii_ide= nts -Dunsafe_op_in_unsafe_fn -Wmissing_docs -Wrust_2018_idioms -Wunreachabl= e_pub -Wclippy::all -Wclippy::as_ptr_cast_mut -Wclippy::as_underscore -Wcli= ppy::cast_lossless -Wclippy::ignored_unit_patterns -Wclippy::mut_mut -Wclip= py::needless_bitwise_bool -Aclippy::needless_lifetimes -Wclippy::no_mangle_= with_rust_abi -Wclippy::ptr_as_ptr -Wclippy::ptr_cast_constness -Wclippy::r= ef_as_ptr -Wclippy::undocumented_unsafe_blocks -Wclippy::unnecessary_safety= _comment -Wclippy::unnecessary_safety_doc -Wrustdoc::missing_crate_level_do= cs -Wrustdoc::unescaped_backticks -Cpanic=3Dabort -Cembed-bitcode=3Dn -Clto= =3Dn -Cforce-unwind-tables=3Dn -Ccodegen-units=3D1 -Csymbol-mangling-versio= n=3Dv0 -Crelocation-model=3Dstatic -Zfunction-sections=3Dn -Wclippy::float_= arithmetic --target=3Daarch64-unknown-none -Ctarget-feature=3D"-neon" 
-Cfor= ce-unwind-tables=3Dn -Zbranch-protection=3Dpac-ret -Copt-level=3D2 -Cdebug-= assertions=3Dy -Coverflow-checks=3Dy -Dwarnings -Cforce-frame-pointers=3Dy = -Zsanitizer=3Dkernel-address -Zsanitizer-recover=3Dkernel-address -Cllvm-ar= gs=3D-asan-mapping-offset=3D0xdfff800000000000 -Cpasses=3Dsancov-module -Cl= lvm-args=3D-sanitizer-coverage-level=3D3 -Cllvm-args=3D-sanitizer-coverage-= trace-pc -Cllvm-args=3D-sanitizer-coverage-trace-compares @./include/genera= ted/rustc_cfg -L./rust --extern ffi --extern pin_init --extern kernel --ext= ern build_error --extern macros --extern bindings --extern uapi --no-run --= crate-name kernel -Zunstable-options --sysroot=3D/dev/null --test-builder = ./scripts/rustdoc_test_builder ../rust/kernel/lib.rs >/dev/null""" + expected =3D "../rust/kernel/lib.rs" + self._assert_parsing(cmd, expected) + + def test_rustdoc_test_gen(self): + cmd =3D "./scripts/rustdoc_test_gen" + expected =3D "" + self._assert_parsing(cmd, expected) + + # flex command tests + def test_flex(self): + cmd =3D "flex -oscripts/kconfig/lexer.lex.c -L ../scripts/kconfig/= lexer.l" + expected =3D "../scripts/kconfig/lexer.l" + self._assert_parsing(cmd, expected) + + # bison command tests + def test_bison(self): + cmd =3D "bison -o scripts/kconfig/parser.tab.c --defines=3Dscripts= /kconfig/parser.tab.h -t -l ../scripts/kconfig/parser.y" + expected =3D "../scripts/kconfig/parser.y" + self._assert_parsing(cmd, expected) + + # bindgen command tests + def test_bindgen(self): + cmd =3D ( + "bindgen ../rust/bindings/bindings_helper.h " + "--blocklist-type __kernel_s?size_t --blocklist-type __kernel_= ptrdiff_t " + "--opaque-type xregs_state --opaque-type desc_struct --no-doc-= comments " + "--rust-target 1.68 --use-core --with-derive-default -o rust/b= indings/bindings_generated.rs " + "-- -Wp,-MMD,rust/bindings/.bindings_generated.rs.d -nostdinc = -I../arch/x86/include " + "-include ../include/linux/compiler-version.h -D__KERNEL__ -fi= ntegrated-as -fno-builtin 
-DMODULE; " + "sed -Ei 's/pub const RUST_CONST_HELPER_([a-zA-Z0-9_]*)/pub co= nst \\1/g' rust/bindings/bindings_generated.rs" + ) + expected =3D "../rust/bindings/bindings_helper.h ../include/linux/= compiler-version.h" + self._assert_parsing(cmd, expected) + + # perl command tests + def test_perl(self): + cmd =3D "perl ../lib/crypto/x86/poly1305-x86_64-cryptogams.pl > li= b/crypto/x86/poly1305-x86_64-cryptogams.S" + expected =3D "../lib/crypto/x86/poly1305-x86_64-cryptogams.pl" + self._assert_parsing(cmd, expected) + + # link-vmlinux.sh command tests + def test_link_vmlinux(self): + cmd =3D '../scripts/link-vmlinux.sh "ld" "-m elf_x86_64 -z noexecs= tack" "-z max-page-size=3D0x200000 --build-id=3Dsha1 --orphan-handling=3Der= ror --emit-relocs --discard-none" "vmlinux.unstripped"; true' + expected =3D "vmlinux.a" + self._assert_parsing(cmd, expected) + + def test_link_vmlinux_postlink(self): + cmd =3D '../scripts/link-vmlinux.sh "ld" "-m elf_x86_64 -z noexecs= tack --no-warn-rwx-segments" "--emit-relocs --discard-none -z max-page-size= =3D0x200000 --build-id=3Dsha1 -X --orphan-handling=3Derror"; make -f ../ar= ch/x86/Makefile.postlink vmlinux' + expected =3D "vmlinux.a" + self._assert_parsing(cmd, expected) + + # syscallhdr.sh command tests + def test_syscallhdr(self): + cmd =3D "sh ../scripts/syscallhdr.sh --abis common,64 --emit-nr = ../arch/x86/entry/syscalls/syscall_64.tbl arch/x86/include/generated/uapi/a= sm/unistd_64.h" + expected =3D "../arch/x86/entry/syscalls/syscall_64.tbl" + self._assert_parsing(cmd, expected) + + # syscalltbl.sh command tests + def test_syscalltbl(self): + cmd =3D "sh ../scripts/syscalltbl.sh --abis common,64 ../arch/x86/= entry/syscalls/syscall_64.tbl arch/x86/include/generated/asm/syscalls_64.h" + expected =3D "../arch/x86/entry/syscalls/syscall_64.tbl" + self._assert_parsing(cmd, expected) + + # mkcapflags.sh command tests + def test_mkcapflags(self): + cmd =3D "sh ../arch/x86/kernel/cpu/mkcapflags.sh arch/x86/kernel/c= 
pu/capflags.c ../arch/x86/kernel/cpu/../../include/asm/cpufeatures.h ../arc= h/x86/kernel/cpu/../../include/asm/vmxfeatures.h ../arch/x86/kernel/cpu/mkc= apflags.sh FORCE" + expected =3D "../arch/x86/kernel/cpu/../../include/asm/cpufeatures= .h ../arch/x86/kernel/cpu/../../include/asm/vmxfeatures.h" + self._assert_parsing(cmd, expected) + + # orc_hash.sh command tests + def test_orc_hash(self): + cmd =3D "mkdir -p arch/x86/include/generated/asm/; sh ../scripts/o= rc_hash.sh < ../arch/x86/include/asm/orc_types.h > arch/x86/include/generat= ed/asm/orc_hash.h" + expected =3D "../arch/x86/include/asm/orc_types.h" + self._assert_parsing(cmd, expected) + + # xen-hypercalls.sh command tests + def test_xen_hypercalls(self): + cmd =3D "sh '../scripts/xen-hypercalls.sh' arch/x86/include/genera= ted/asm/xen-hypercalls.h ../include/xen/interface/xen-mca.h ../include/xen/= interface/xen.h ../include/xen/interface/xenpmu.h" + expected =3D "../include/xen/interface/xen-mca.h ../include/xen/in= terface/xen.h ../include/xen/interface/xenpmu.h" + self._assert_parsing(cmd, expected) + + # gen_initramfs.sh command tests + def test_gen_initramfs(self): + cmd =3D "sh ../usr/gen_initramfs.sh -o usr/initramfs_data.cpio -l = usr/.initramfs_data.cpio.d ../usr/default_cpio_list" + expected =3D "../usr/default_cpio_list" + self._assert_parsing(cmd, expected) + + # mkuboot.sh command tests + def test_mkuboot(self): + cmd =3D "bash ../scripts/mkuboot.sh -A arm -O linux -C none -T ker= nel -a 0x8000 -e 0x8000 -n 'Linux-6.15.0' -d arch/arm/boot/zImage arch/arm/= boot/uImage" + expected =3D "arch/arm/boot/zImage" + self._assert_parsing(cmd, expected) + + # syscallnr.sh command tests + def test_syscallnr(self): + cmd =3D "sh ../arch/arm/tools/syscallnr.sh ../arch/arm/tools/sysca= ll.tbl arch/arm/include/generated/asm/unistd-nr.h" + expected =3D "../arch/arm/tools/syscall.tbl" + self._assert_parsing(cmd, expected) + + # gen-kernel-hwcaps.sh command tests + def test_gen_kernel_hwcaps(self): + cmd =3D 
"/bin/sh -e ../arch/arm64/tools/gen-kernel-hwcaps.sh ../ar= ch/arm64/include/uapi/asm/hwcap.h > arch/arm64/include/generated/asm/kernel= -hwcap.h" + expected =3D "../arch/arm64/include/uapi/asm/hwcap.h" + self._assert_parsing(cmd, expected) + + # vdso2c command tests + def test_vdso2c(self): + cmd =3D "arch/x86/entry/vdso/vdso2c arch/x86/entry/vdso/vdso64.so.= dbg arch/x86/entry/vdso/vdso64.so arch/x86/entry/vdso/vdso-image-64.c" + expected =3D "arch/x86/entry/vdso/vdso64.so.dbg arch/x86/entry/vds= o/vdso64.so" + self._assert_parsing(cmd, expected) + + # vdsomunge command tests + def test_vdsomunge(self): + cmd =3D "arch/arm64/kernel/vdso32/../../../arm/vdso/vdsomunge arch= /arm64/kernel/vdso32/vdso.so.raw arch/arm64/kernel/vdso32/vdso32.so.dbg" + expected =3D "arch/arm64/kernel/vdso32/vdso.so.raw" + self._assert_parsing(cmd, expected) + + # mkpiggy command tests + def test_mkpiggy(self): + cmd =3D "arch/x86/boot/compressed/mkpiggy arch/x86/boot/compressed= /vmlinux.bin.gz > arch/x86/boot/compressed/piggy.S" + expected =3D "arch/x86/boot/compressed/vmlinux.bin.gz" + self._assert_parsing(cmd, expected) + + # relocs command tests + def test_relocs(self): + cmd =3D "arch/x86/tools/relocs vmlinux.unstripped > arch/x86/boot/= compressed/vmlinux.relocs;arch/x86/tools/relocs --abs-relocs vmlinux.unstri= pped" + expected =3D "vmlinux.unstripped" + self._assert_parsing(cmd, expected) + + def test_relocs_with_realmode(self): + cmd =3D ( + "arch/x86/tools/relocs --realmode arch/x86/realmode/rm/realmod= e.elf > arch/x86/realmode/rm/realmode.relocs" + ) + expected =3D "arch/x86/realmode/rm/realmode.elf" + self._assert_parsing(cmd, expected) + + # mk_elfconfig command tests + def test_mk_elfconfig(self): + cmd =3D "scripts/mod/mk_elfconfig < scripts/mod/empty.o > scripts/= mod/elfconfig.h" + expected =3D "scripts/mod/empty.o" + self._assert_parsing(cmd, expected) + + # tools/build command tests + def test_build(self): + cmd =3D "arch/x86/boot/tools/build arch/x86/boot/setup.bin 
arch/x8= 6/boot/vmlinux.bin arch/x86/boot/zoffset.h arch/x86/boot/bzImage" + expected =3D "arch/x86/boot/setup.bin arch/x86/boot/vmlinux.bin ar= ch/x86/boot/zoffset.h" + self._assert_parsing(cmd, expected) + + # extract-cert command tests + def test_extract_cert(self): + cmd =3D 'certs/extract-cert "" certs/signing_key.x509' + expected =3D "" + self._assert_parsing(cmd, expected) + + # dtc command tests + def test_dtc_cat(self): + cmd =3D "./scripts/dtc/dtc -o drivers/of/empty_root.dtb -b 0 -i../= drivers/of/ -i../scripts/dtc/include-prefixes -Wno-unique_unit_address -Wno= -unit_address_vs_reg -Wno-avoid_unnecessary_addr_size -Wno-alias_paths -Wno= -graph_child_address -Wno-simple_bus_reg -d drivers/of/.empty_root.dtb.d.= dtc.tmp drivers/of/.empty_root.dtb.dts.tmp ; cat drivers/of/.empty_root.dtb= .d.pre.tmp drivers/of/.empty_root.dtb.d.dtc.tmp > drivers/of/.empty_root.dt= b.d" + expected =3D "drivers/of/.empty_root.dtb.dts.tmp drivers/of/.empty= _root.dtb.d.pre.tmp drivers/of/.empty_root.dtb.d.dtc.tmp" + self._assert_parsing(cmd, expected) + + # pnmtologo command tests + def test_pnmtologo(self): + cmd =3D "drivers/video/logo/pnmtologo -t clut224 -n logo_linux_clu= t224 -o drivers/video/logo/logo_linux_clut224.c ../drivers/video/logo/logo_= linux_clut224.ppm" + expected =3D "../drivers/video/logo/logo_linux_clut224.ppm" + self._assert_parsing(cmd, expected) + + # relacheck command tests + def test_relacheck(self): + cmd =3D "arch/arm64/kernel/pi/relacheck arch/arm64/kernel/pi/idreg= -override.pi.o arch/arm64/kernel/pi/idreg-override.o" + expected =3D "arch/arm64/kernel/pi/idreg-override.pi.o" + self._assert_parsing(cmd, expected) + + # gen-hyprel command tests + def test_gen_hyprel(self): + cmd =3D "arch/arm64/kvm/hyp/nvhe/gen-hyprel arch/arm64/kvm/hyp/nvh= e/kvm_nvhe.tmp.o > arch/arm64/kvm/hyp/nvhe/hyp-reloc.S" + expected =3D "arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o" + self._assert_parsing(cmd, expected) + + # mkregtable command tests + def test_mkregtable(self): 
+ cmd =3D "drivers/gpu/drm/radeon/mkregtable ../drivers/gpu/drm/rade= on/reg_srcs/r100 > drivers/gpu/drm/radeon/r100_reg_safe.h" + expected =3D "../drivers/gpu/drm/radeon/reg_srcs/r100" + self._assert_parsing(cmd, expected) + + # genheaders command tests + def test_genheaders(self): + cmd =3D "security/selinux/genheaders security/selinux/flask.h secu= rity/selinux/av_permissions.h" + expected =3D "" + self._assert_parsing(cmd, expected) + + # mkcpustr command tests + def test_mkcpustr(self): + cmd =3D "arch/x86/boot/mkcpustr > arch/x86/boot/cpustr.h" + expected =3D "" + self._assert_parsing(cmd, expected) + + # polgen command tests + def test_polgen(self): + cmd =3D "scripts/ipe/polgen/polgen security/ipe/boot_policy.c" + expected =3D "" + self._assert_parsing(cmd, expected) + + # gen_header.py command tests + def test_gen_header(self): + cmd =3D "mkdir -p drivers/gpu/drm/msm/generated && python3 ../driv= ers/gpu/drm/msm/registers/gen_header.py --no-validate --rnn ../drivers/gpu/= drm/msm/registers --xml ../drivers/gpu/drm/msm/registers/adreno/a2xx.xml c-= defines > drivers/gpu/drm/msm/generated/a2xx.xml.h" + expected =3D "../drivers/gpu/drm/msm/registers/adreno/a2xx.xml" + self._assert_parsing(cmd, expected) + + +if __name__ =3D=3D "__main__": + unittest.main() --=20 2.43.0

From nobody Mon May 25 05:54:39 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein
Subject: [PATCH v7 15/15] scripts/sbom: add unit tests for SPDX-License-Identifier parsing
Date: Mon, 18 May 2026 08:21:02 +0200
Message-ID: <20260518062102.2051814-16-luis.augenstein@tngtech.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260518062102.2051814-1-luis.augenstein@tngtech.com>
References: <20260518062102.2051814-1-luis.augenstein@tngtech.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Luis Augenstein

Verify that SPDX-License-Identifier headers at the top of source files are parsed correctly.
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- scripts/sbom/tests/spdx_graph/__init__.py | 0 .../sbom/tests/spdx_graph/test_kernel_file.py | 35 +++++++++++++++++++ 2 files changed, 35 insertions(+) create mode 100644 scripts/sbom/tests/spdx_graph/__init__.py create mode 100644 scripts/sbom/tests/spdx_graph/test_kernel_file.py diff --git a/scripts/sbom/tests/spdx_graph/__init__.py b/scripts/sbom/tests= /spdx_graph/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/scripts/sbom/tests/spdx_graph/test_kernel_file.py b/scripts/sb= om/tests/spdx_graph/test_kernel_file.py new file mode 100644 index 00000000000..35a63a768ba --- /dev/null +++ b/scripts/sbom/tests/spdx_graph/test_kernel_file.py @@ -0,0 +1,35 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import unittest +from pathlib import Path +import tempfile +from sbom.spdx_graph.kernel_file import _parse_spdx_license_identifier # = type: ignore + + +class TestKernelFile(unittest.TestCase): + def setUp(self): + self.tmpdir =3D tempfile.TemporaryDirectory() + self.src_tree =3D Path(self.tmpdir.name) + + def tearDown(self): + self.tmpdir.cleanup() + + def test_parse_spdx_license_identifier(self): + # REUSE-IgnoreStart + test_cases: list[tuple[str, str | None]] =3D [ + ("/* SPDX-License-Identifier: MIT*/", "MIT"), + ("// SPDX-License-Identifier: GPL-2.0-only", "GPL-2.0-only"), + ("# SPDX-License-Identifier: GPL-2.0-only", "GPL-2.0-only"), + ("#!/bin/bash\n# SPDX-License-Identifier: GPL-2.0-only", "GPL-= 2.0-only"), + ("/* SPDX-License-Identifier: GPL-2.0-or-later OR MIT */", "GP= L-2.0-or-later OR MIT"), + ("/* SPDX-License-Identifier: Apache-2.0 */\n extra text", "Ap= ache-2.0"), + ("", "GPL-2.0"), + ("int main() { return 0; }", None), + ] + # REUSE-IgnoreEnd + + for i, (file_content, expected_identifier) in 
enumerate(test_cases= ): + file_path =3D self.src_tree / f"file_{i}.c" + file_path.write_text(file_content) + self.assertEqual(_parse_spdx_license_identifier(str(file_path)= ), expected_identifier) --=20 2.43.0
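
For readers following the test above without the rest of the series at hand: the sketch below shows one way a parser with this behaviour could be written. It is not the code from sbom/spdx_graph/kernel_file.py. The function name parse_spdx_license_identifier_sketch, the max_lines cutoff and the regular expression are all assumptions made for illustration, and the real _parse_spdx_license_identifier may scan the file differently.

```python
import re
from typing import Optional

# Hypothetical pattern, not taken from the patch series: capture everything
# after the tag up to an optional closing comment marker at end of line.
_SPDX_TAG = re.compile(
    r"SPDX-License-Identifier:\s*(?P<expr>.+?)\s*(?:\*/|-->)?\s*$"
)


def parse_spdx_license_identifier_sketch(
    path: str, max_lines: int = 5
) -> Optional[str]:
    """Return the SPDX expression found near the top of the file, or None.

    Only the first max_lines lines are inspected (an assumed limit), so a
    shebang line before the tag is tolerated while tags buried deep in the
    file are ignored.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        # zip() stops once range() is exhausted, bounding the scan.
        for _, line in zip(range(max_lines), handle):
            match = _SPDX_TAG.search(line)
            if match:
                return match.group("expr")
    return None
```

Matching lazily up to the end of the line keeps compound expressions such as "GPL-2.0-or-later OR MIT" intact while still dropping a trailing "*/", including the case where it follows the identifier without a space.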