From nobody Sat Feb 7 08:07:12 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B1E1226299; Mon, 26 Jan 2026 19:35:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456119; cv=none; b=l4GJIpm+zwvXv0qybymzKfMlxxmheIkGiLH8ynxu0/rCAE7uG6u4k+CKSnBq+S0Rwb74Y/7Z8eYhT9ldd3UxKQ6b0JOBlt/ChhtiCI/fXeXk74aRyRkuhDEAXVRGNnVq7zmHgB5BrzYRoIz2AJMYVpe0DTiCefX9Buw6H6/3+so= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456119; c=relaxed/simple; bh=HPwbpQJewBfarJi6Oe11Ivllziu/z1HySqMAh1lkh3s=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=m/VX2I5MiaELalWW4G/pWSyKfSgDe3GnP77fSr71wM9sl+1IJum/iE/Arezpq9ekcfIiN+uEDLHL5SPfsH3PX1GKP0aO8hNkd+yhBsBbOqmUGEbPmOLB4/9XH4Yvp7y2mKGR2RWv7x/s2xE+twv0QEqk1rxspVlzOKvhED/LkxQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=KVVwCvV0; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="KVVwCvV0" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id B50DA3FAF1; Mon, 26 Jan 2026 20:35:14 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 6F1B41FA3F3; Mon, 
26 Jan 2026 20:35:14 +0100 (CET) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id AHxlyhGeMcnR; Mon, 26 Jan 2026 20:35:13 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 4B36B1FA3D7; Mon, 26 Jan 2026 20:35:13 +0100 (CET) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 4B36B1FA3D7 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1769456113; bh=xrDNGWAMIo2IlFRuDSPTQuz4GyM6E65erWVEihXsGTw=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=KVVwCvV0z4tvrQ/+XxMtcJYPWubTEPf3xjgCrzNp/0cNuTd4LK0zOI7u3g/9y6C2z wGI9xXAwvM5MfMkTXKvjhYYw+UZ23XyTntMYYI/oAO5wo2kSsEMYm4HNMY2bD+c+cP /maP0BnLDVjCnEEPDanq4mrz7JLguSdm1/bUdF83/gxB50ROypBTpnj/g0FQ2nvv/X DWUdpIZzOHHqzZ3pXipFZNEs4u9oo6pdVk3hW0NMuHkaYNXzT4xrv9ENAZBY04vDCD ILuE2JnVCxwFt7lbmqDhWwXDI/2XdFmJHVgqv4QQJuyFrmwnCjR3WCQ0C+WW5dfxQ/ WqE31tO1HyO6A== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id nIEHU4PFHAJe; Mon, 26 Jan 2026 20:35:13 +0100 (CET) Received: from DESKTOP-0O0JV6I.localdomain (ipservice-092-208-231-176.092.208.pools.vodafone-ip.de [92.208.231.176]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id BE6D01FA3F3; Mon, 26 Jan 2026 20:35:12 +0100 (CET) From: Luis Augenstein To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v3 01/14] tools/sbom: integrate tool in make process Date: Mon, 26 Jan 2026 20:32:51 +0100 Message-Id: <20260126193304.320916-2-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com> References: 
<20260126193304.320916-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" integrate SBOM tool into the kernel build process. Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- .gitignore | 1 + MAINTAINERS | 6 ++ Makefile | 15 +++- tools/Makefile | 3 +- tools/sbom/Makefile | 36 ++++++++ tools/sbom/README | 207 ++++++++++++++++++++++++++++++++++++++++++++ tools/sbom/sbom.py | 16 ++++ 7 files changed, 281 insertions(+), 3 deletions(-) create mode 100644 tools/sbom/Makefile create mode 100644 tools/sbom/README create mode 100644 tools/sbom/sbom.py diff --git a/.gitignore b/.gitignore index 3a7241c941f5..f3372f15eb1b 100644 --- a/.gitignore +++ b/.gitignore @@ -48,6 +48,7 @@ *.s *.so *.so.dbg +*.spdx.json *.su *.symtypes *.tab.[ch] diff --git a/MAINTAINERS b/MAINTAINERS index f1b020588597..03d7d93d8e63 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -23365,6 +23365,12 @@ R: Marc Murphy S: Supported F: arch/arm/boot/dts/ti/omap/am335x-sancloud* =20 +SBOM +M: Luis Augenstein +M: Maximilian Huber +S: Maintained +F: tools/sbom/ + SC1200 WDT DRIVER M: Zwane Mwaikambo S: Maintained diff --git a/Makefile b/Makefile index 9d38125263fb..7892c2725849 100644 --- a/Makefile +++ b/Makefile @@ -772,7 +772,7 @@ endif # in addition to whatever we do anyway. 
# Just "make" or "make all" shall build modules as well =20 -ifneq ($(filter all modules nsdeps compile_commands.json clang-%,$(MAKECMD= GOALS)),) +ifneq ($(filter all modules nsdeps compile_commands.json clang-% sbom,$(MA= KECMDGOALS)),) KBUILD_MODULES :=3D y endif =20 @@ -1457,6 +1457,17 @@ prepare: tools/bpf/resolve_btfids endif endif =20 +PHONY +=3D sbom +sbom: all + $(Q)mkdir -p $(objtree)/tools + $(Q)$(MAKE) \ + O=3D$(abspath $(objtree)) \ + subdir=3Dtools \ + -C $(srctree)/tools/ \ + sbom \ + srctree=3D$(abspath $(srctree)) \ + CONFIG_MODULES=3D$(CONFIG_MODULES) + # The tools build system is not a part of Kbuild and tends to introduce # its own unique issues. If you need to integrate a new tool into Kbuild, # please consider locating that tool outside the tools/ tree and using the @@ -1612,7 +1623,7 @@ CLEAN_FILES +=3D vmlinux.symvers modules-only.symvers= \ modules.builtin.ranges vmlinux.o.map vmlinux.unstripped \ compile_commands.json rust/test \ rust-project.json .vmlinux.objs .vmlinux.export.c \ - .builtin-dtbs-list .builtin-dtb.S + .builtin-dtbs-list .builtin-dtb.S sbom-*.spdx.json =20 # Directories & files removed with 'make mrproper' MRPROPER_FILES +=3D include/config include/generated \ diff --git a/tools/Makefile b/tools/Makefile index cb40961a740f..7b4b1c96dcd5 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -27,6 +27,7 @@ help: @echo ' nolibc - nolibc headers testing and installation' @echo ' objtool - an ELF object analysis tool' @echo ' perf - Linux performance measurement and analy= sis tool' + @echo ' sbom - SBOM generation tool' @echo ' selftests - various kernel selftests' @echo ' sched_ext - sched_ext example schedulers' @echo ' bootconfig - boot config tool' @@ -70,7 +71,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) =20 -counter dma firewire hv guest bootconfig spi usb virtio mm bpf iio gpio ob= jtool leds wmi firmware debugging tracing: FORCE +counter dma firewire hv guest bootconfig spi usb virtio mm bpf iio gpio ob= 
jtool leds wmi firmware debugging tracing sbom: FORCE $(call descend,$@) =20 bpf/%: FORCE diff --git a/tools/sbom/Makefile b/tools/sbom/Makefile new file mode 100644 index 000000000000..90ae42dd28ee --- /dev/null +++ b/tools/sbom/Makefile @@ -0,0 +1,36 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +SBOM_SOURCE_FILE :=3D $(objtree)/sbom-source.spdx.json +SBOM_BUILD_FILE :=3D $(objtree)/sbom-build.spdx.json +SBOM_OUTPUT_FILE :=3D $(objtree)/sbom-output.spdx.json +SBOM_ROOTS_FILE :=3D $(objtree)/sbom-roots.txt + + +ifeq ($(srctree),$(objtree)) + SBOM_TARGETS :=3D $(SBOM_BUILD_FILE) $(SBOM_OUTPUT_FILE) +else + SBOM_TARGETS :=3D $(SBOM_SOURCE_FILE) $(SBOM_BUILD_FILE) $(SBOM_OUTPUT= _FILE) +endif + +SBOM_DEPS :=3D $(objtree)/$(KBUILD_IMAGE) $(objtree)/include/generated/aut= oconf.h +ifdef CONFIG_MODULES + SBOM_DEPS +=3D $(objtree)/modules.order +endif + +sbom: $(SBOM_TARGETS) + @: + +$(SBOM_TARGETS) &: $(SBOM_DEPS) + @echo " GEN $(notdir $(SBOM_TARGETS))" + + @printf "%s\n" "$(KBUILD_IMAGE)" > $(SBOM_ROOTS_FILE) + @if [ "$(CONFIG_MODULES)" =3D "y" ]; then \ + sed 's/\.o$$/.ko/' $(objtree)/modules.order >> $(SBOM_ROOTS_FILE); \ + fi + + @python3 sbom.py + + @rm $(SBOM_ROOTS_FILE) + +.PHONY: sbom diff --git a/tools/sbom/README b/tools/sbom/README new file mode 100644 index 000000000000..080d315acd2c --- /dev/null +++ b/tools/sbom/README @@ -0,0 +1,207 @@ + + +KernelSbom +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Introduction +------------ + +KernelSbom is a Python script (`sbom.py`) that can be executed after a +successful kernel build. When invoked, KernelSbom analyzes all files +involved in the build and generates Software Bill of Materials (SBOM) +documents in SPDX 3.0.1 format. 
+The generated SBOM documents capture: +- **Final output artifacts**, typically the kernel image and modules +- **All source files** that contributed to the build with metadata + and licensing information +- **Details of the build process**, including intermediate artifacts + and the build commands linking source files to the final output + artifacts + +KernelSbom was originally developed in the +[KernelSbom repository](https://github.com/TNG/KernelSbom). + +Requirements +------------ + +Python 3.10 or later. No libraries or other dependencies are required. + +Basic Usage +----------- + +Run the `make sbom` target. +For example: + + $ make defconfig O=3Dkernel_build + $ make sbom O=3Dkernel_build -j$(nproc) + +This will trigger a kernel build. After all build outputs have been +generated, KernelSbom produces three SPDX documents in the root +directory of the object tree: + +- `sbom-source.spdx.json` + Describes all source files involved in the build and + associates each file with its corresponding license expression. + +- `sbom-output.spdx.json` + Captures all final build outputs (kernel image and `.ko` module files) + and includes build metadata such as environment variables and + a hash of the `.config` file used for the build. + +- `sbom-build.spdx.json` + Imports files from the source and output documents and describes every + intermediate build artifact. For each artifact, it records the exact + build command used and establishes the relationship between + input files and generated outputs. + +When enabling the KernelSbom tool, it is recommended to perform +out-of-tree builds using `O=3D`. KernelSbom classifies files as +source files when they are located in the source tree and not in the +object tree. For in-tree builds, where the source and object trees are +the same directory, this distinction can no longer be made reliably. +In that case, KernelSbom does not generate a dedicated source SBOM. +Instead, source files are included in the build SBOM. 
+ +Standalone Usage +---------------- + +KernelSbom can also be used as a standalone script to generate +SPDX documents for specific build outputs. For example, after a +successful x86 kernel build, KernelSbom can generate SPDX documents +for the `bzImage` kernel image: + + $ SRCARCH=3Dx86 python3 tools/sbom/sbom.py \ + --src-tree . \ + --obj-tree ./kernel_build \ + --roots arch/x86/boot/bzImage \ + --generate-spdx \ + --generate-used-files \ + --prettify-json \ + --debug + +Note that when KernelSbom is invoked outside of the `make` process, +the environment variables used during compilation are not available and +therefore cannot be included in the generated SPDX documents. It is +recommended to set at least the `SRCARCH` environment variable to the +architecture for which the build was performed. + +For a full list of command-line options, run: + + $ python3 tools/sbom/sbom.py --help + +Output Format +------------- + +KernelSbom generates documents conforming to the +[SPDX 3.0.1 specification](https://spdx.github.io/spdx-spec/v3.0.1/) +serialized as JSON-LD. + +To reduce file size, the output documents use the JSON-LD `@context` +to define custom prefixes for `spdxId` values. While this is compliant +with the SPDX specification, only a limited number of tools in the +current SPDX ecosystem support custom JSON-LD contexts. To use such +tools with the generated documents, the custom JSON-LD context must +be expanded before providing the documents. +See https://lists.spdx.org/g/Spdx-tech/message/6064 for more information. + +How it Works +------------ + +KernelSbom operates in two major phases: +1. **Generate the cmd graph**, an acyclic directed dependency graph. +2. **Generate SPDX documents** based on the cmd graph. + +KernelSbom begins from the root artifacts specified by the user, e.g., +`arch/x86/boot/bzImage`. For each root artifact, it collects all +dependencies required to build that artifact. 
The dependencies come +from multiple sources: + +- **`.cmd` files**: The primary source is the `.cmd` file of the + generated artifact, e.g., `arch/x86/boot/.bzImage.cmd`. These files + contain the exact command used to build the artifact and often include + an explicit list of input dependencies. By parsing the `.cmd` file, + the full list of dependencies can be obtained. + +- **`.incbin` statements**: The second source consists of include binary + `.incbin` statements in `.S` assembly files. + +- **Hardcoded dependencies**: Unfortunately, not all build dependencies + can be found via `.cmd` files and `.incbin` statements. Some build + dependencies are directly defined in Makefiles or Kbuild files. + Parsing these files is considered too complex for the scope of this + project. Instead, the remaining gaps of the graph are filled using a + list of manually defined dependencies, see + `sbom/cmd_graph/hardcoded_dependencies.py`. This list is known to be + incomplete. However, analysis of the cmd graph indicates ~99% + completeness. For more information about the completeness analysis, + see [KernelSbom #95](https://github.com/TNG/KernelSbom/issues/95). + +Given the list of dependency files, KernelSbom recursively processes +each file, expanding the dependency chain all the way to the version +controlled source files. The result is a complete dependency graph +where nodes represent files, and edges represent "file A was used to +build file B" relationships. + +Using the cmd graph, KernelSbom produces three SPDX documents. +For every file in the graph, KernelSbom: + +- Parses `SPDX-License-Identifier` headers, +- Computes file hashes, +- Estimates the file type based on extension and path, +- Records build relationships between files. + +Each root output file is additionally associated with an SPDX Package +element that captures version information, license data, and copyright. 
+ +Advanced Usage +-------------- + +Including Kernel Modules +------------------------ + +The list of all `.ko` kernel modules produced during a build can be +extracted from the `modules.order` file within the object tree. +For example: + + $ echo "arch/x86/boot/bzImage" > sbom-roots.txt + $ sed 's/\.o$/.ko/' ./kernel_build/modules.order >> sbom-roots.txt + +Then use the generated roots file: + + $ SRCARCH=3Dx86 python3 tools/sbom/sbom.py \ + --src-tree . \ + --obj-tree ./kernel_build \ + --roots-file sbom-roots.txt \ + --generate-spdx + + +Equal Source and Object Trees +------------------------------ + +When the source tree and object tree are identical (for example, when +building in-tree), source files can no longer be reliably distinguished +from generated files. +In this scenario, KernelSbom does not produce a dedicated +`sbom-source.spdx.json` document. Instead, both source files and build +artifacts are included together in `sbom-build.spdx.json`, and +`sbom.used-files.txt` lists all files referenced in the build document. + +Unknown Build Commands +---------------------- + +Because the kernel supports a wide range of configurations and versions, +KernelSbom may encounter build commands in `.cmd` files that it does +not yet support. By default, KernelSbom will fail if an unknown build +command is encountered. + +If you still wish to generate SPDX documents despite unsupported +commands, you can use the `--do-not-fail-on-unknown-build-command` +option. KernelSbom will continue and produce the documents, although +the resulting SBOM will be incomplete. + +This option should only be used when the missing portion of the +dependency graph is small and an incomplete SBOM is acceptable for +your use case. 
diff --git a/tools/sbom/sbom.py b/tools/sbom/sbom.py new file mode 100644 index 000000000000..9c2e4c7f17ce --- /dev/null +++ b/tools/sbom/sbom.py @@ -0,0 +1,16 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +""" +Compute software bill of materials in SPDX format describing a kernel buil= d. +""" + + +def main(): + pass + + +# Call main method +if __name__ =3D=3D "__main__": + main() --=20 2.34.1 From nobody Sat Feb 7 08:07:12 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B5B8F1C862F; Mon, 26 Jan 2026 19:35:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456123; cv=none; b=BfU9UUGqc+PUEQ7qvFGynAj0uN4FEaWWNcuZoAMtxzYhlqbDkgsXogANNjbYXNeosirOzpznLgn9rITbNKpp7zoT5dY8WQ76dCnbiJiZDKk7jr3tCuH0557V+PbH0ExVefMwvSriRcP2mAeTotb06qqiogNwbJu7pwlLHAVlv3E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456123; c=relaxed/simple; bh=MmdU7mAXsERDhcftLhJk8wzfCW9FoyJkMAZGJQzgPTs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ZnM+ZOXMZNM3eO338jkUHdI587z+ExBROn/9MYiF9pH7OS9IWIviPGQ3Pi2U1ebcKq5xvACP2CXbL62qcCJjBxNq0JCTMh95InH1W8k5u2ih8xi5gE8OdIxlIJfQyqVaLcLaVC82JkUXrKbePo332Vy/kNObDtLClu9jUfvwVRg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=K+vfmhHp; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: 
smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="K+vfmhHp" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id 1A7593FAF1; Mon, 26 Jan 2026 20:35:20 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id D416D1FA856; Mon, 26 Jan 2026 20:35:19 +0100 (CET) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id WrjxAdDRCDnt; Mon, 26 Jan 2026 20:35:16 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 452BB1FA8D1; Mon, 26 Jan 2026 20:35:16 +0100 (CET) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 452BB1FA8D1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1769456116; bh=mCCGYzs2YjzxAtiQeTqaCe3JRrDhM42tNDPHzxjwJ4A=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=K+vfmhHpqs66BZgWsZYPrhlfFHbm6WT8K9zkSkq16kHVi1QioD3hBxQ8Sm6t3Pblq tsKNpj5bcNvEYishXFMh31xzvUP+ubnv1PpYHe9qHHVMIFJ155OR8dUJLnHEhGb+AZ 0laKnAHwsI51YudnuyRKYPTpuMLps9MAA0UrTUQDpQ8ZI2IGRoTGacNnm+NmvzedBk ZMQX0QZETcGJ0pz22v9Xt+uwPQB3M4/7039Tigo70KKR88eyotUmK7L1dP3LXhcvWp X7Md/YzSGVjRvJoiRAxWcR5nRRvkizlqOPVuxNwzDuzNY/LPDc5koKsQoMx1aEyjDM aQludpxpIm26Q== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id 8Xt-r6ncad-a; Mon, 26 Jan 2026 20:35:15 +0100 (CET) Received: from DESKTOP-0O0JV6I.localdomain (ipservice-092-208-231-176.092.208.pools.vodafone-ip.de [92.208.231.176]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id 7D9041FA8BF; Mon, 26 Jan 2026 20:35:15 +0100 (CET) From: Luis Augenstein To: nathan@kernel.org, 
nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v3 02/14] tools/sbom: setup sbom logging Date: Mon, 26 Jan 2026 20:32:52 +0100 Message-Id: <20260126193304.320916-3-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com> References: <20260126193304.320916-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add logging infrastructure for warnings and errors. Errors and warnings are accumulated and summarized in the end. Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- tools/sbom/sbom.py | 24 ++++++++- tools/sbom/sbom/__init__.py | 0 tools/sbom/sbom/config.py | 47 ++++++++++++++++++ tools/sbom/sbom/sbom_logging.py | 88 +++++++++++++++++++++++++++++++++ 4 files changed, 158 insertions(+), 1 deletion(-) create mode 100644 tools/sbom/sbom/__init__.py create mode 100644 tools/sbom/sbom/config.py create mode 100644 tools/sbom/sbom/sbom_logging.py diff --git a/tools/sbom/sbom.py b/tools/sbom/sbom.py index 9c2e4c7f17ce..c7f23d6eb300 100644 --- a/tools/sbom/sbom.py +++ b/tools/sbom/sbom.py @@ -6,9 +6,31 @@ Compute software bill of materials in SPDX format describing a kernel buil= d. 
""" =20 +import logging +import sys +import sbom.sbom_logging as sbom_logging +from sbom.config import get_config + =20 def main(): - pass + # Read config + config =3D get_config() + + # Configure logging + logging.basicConfig( + level=3Dlogging.DEBUG if config.debug else logging.INFO, + format=3D"[%(levelname)s] %(message)s", + ) + + # Report collected warnings and errors in case of failure + warning_summary =3D sbom_logging.summarize_warnings() + error_summary =3D sbom_logging.summarize_errors() + + if warning_summary: + logging.warning(warning_summary) + if error_summary: + logging.error(error_summary) + sys.exit(1) =20 =20 # Call main method diff --git a/tools/sbom/sbom/__init__.py b/tools/sbom/sbom/__init__.py new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/tools/sbom/sbom/config.py b/tools/sbom/sbom/config.py new file mode 100644 index 000000000000..3dc569ae0c43 --- /dev/null +++ b/tools/sbom/sbom/config.py @@ -0,0 +1,47 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import argparse +from dataclasses import dataclass + + +@dataclass +class KernelSbomConfig: + debug: bool + """Whether to enable debug logging.""" + + +def _parse_cli_arguments() -> dict[str, bool]: + """ + Parse command-line arguments using argparse. + + Returns: + Dictionary of parsed arguments. + """ + parser =3D argparse.ArgumentParser( + description=3D"Generate SPDX SBOM documents for kernel builds", + ) + parser.add_argument( + "--debug", + action=3D"store_true", + default=3DFalse, + help=3D"Enable debug logs (default: False)", + ) + + args =3D vars(parser.parse_args()) + return args + + +def get_config() -> KernelSbomConfig: + """ + Parse command-line arguments and construct the configuration object. + + Returns: + KernelSbomConfig: Configuration object with all settings for SBOM = generation. 
+ """ + # Parse cli arguments + args =3D _parse_cli_arguments() + + debug =3D args["debug"] + + return KernelSbomConfig(debug=3Ddebug) diff --git a/tools/sbom/sbom/sbom_logging.py b/tools/sbom/sbom/sbom_logging= .py new file mode 100644 index 000000000000..3460c4d84626 --- /dev/null +++ b/tools/sbom/sbom/sbom_logging.py @@ -0,0 +1,88 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import logging +import inspect +from typing import Any, Literal + + +class MessageLogger: + """Logger that prints the first occurrence of each message immediately + and keeps track of repeated messages for a final summary.""" + + messages: dict[str, list[str]] + repeated_logs_limit: int + """Maximum number of repeated messages of the same type to log before = suppressing further output.""" + + def __init__(self, level: Literal["error", "warning"], repeated_logs_l= imit: int =3D 3) -> None: + self._level =3D level + self.messages =3D {} + self.repeated_logs_limit =3D repeated_logs_limit + + def log(self, template: str, /, **kwargs: Any) -> None: + """Log a message based on a template and optional variables.""" + message =3D template.format(**kwargs) + if template not in self.messages: + self.messages[template] =3D [] + if len(self.messages[template]) < self.repeated_logs_limit: + if self._level =3D=3D "error": + logging.error(message) + elif self._level =3D=3D "warning": + logging.warning(message) + self.messages[template].append(message) + + def get_summary(self) -> str: + """Return summary of collected messages.""" + if len(self.messages) =3D=3D 0: + return "" + summary: list[str] =3D [f"Summarize {self._level}s:"] + for msgs in self.messages.values(): + for i, msg in enumerate(msgs): + if i < self.repeated_logs_limit: + summary.append(msg) + continue + summary.append( + f"... 
(Found {len(msgs) - i} more {'instances' if (len= (msgs) - i) !=3D 1 else 'instance'} of this {self._level})" + ) + break + return "\n".join(summary) + + +_warning_logger: MessageLogger +_error_logger: MessageLogger + + +def warning(msg_template: str, /, **kwargs: Any) -> None: + """Log a warning message.""" + _warning_logger.log(msg_template, **kwargs) + + +def error(msg_template: str, /, **kwargs: Any) -> None: + """Log an error message including file, line, and function context.""" + frame =3D inspect.currentframe() + caller_frame =3D frame.f_back if frame else None + info =3D inspect.getframeinfo(caller_frame) if caller_frame else None + if info: + msg_template =3D f'File "{info.filename}", line {info.lineno}, in = {info.function}\n{msg_template}' + _error_logger.log(msg_template, **kwargs) + + +def summarize_warnings() -> str: + return _warning_logger.get_summary() + + +def summarize_errors() -> str: + return _error_logger.get_summary() + + +def has_errors() -> bool: + return len(_error_logger.messages) > 0 + + +def init() -> None: + global _warning_logger, _error_logger + _warning_logger =3D MessageLogger("warning") + _error_logger =3D MessageLogger("error") + + +init() --=20 2.34.1 From nobody Sat Feb 7 08:07:12 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5EF1328640C; Mon, 26 Jan 2026 19:35:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456137; cv=none; b=rg5rVsc13Cx4rUlVcJlN3DsbFJTjbfxr0qX45jaPj+7rLHQnPrgeQrCZc1JDWZ+a50yXsqSNNUMUxlEqnzqbDROq34xNBaNMUjPQPonFYQAjeLK1Mj9gnqu9c6wz2u5D9dG2XtV7JDtjn1onOlxvjN+Rgx/Iz6ANLVm5WenhK44= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456137; 
c=relaxed/simple; bh=vxsj0vozrmNCJZJnpHlDEszqHMXArMmk6Orxrb2wBJg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=corQEGAi8wFvL+ub/JXjQGiDGt3qVET+ZihcsJIbNZ1X2G3+G0jQauLdB/uYV5KcSoRKRIH91yt5dlvtFN7q3tLwOeLZHoRF/iv8/KCPL8iKVXJQhtc8BlALiV0sYBgPLa4wl9Y0fKXVdX2kdq2HGXD7sZQm1W49rJGbBSF5+Lg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=O13lEm39; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="O13lEm39" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id EDDC03FAF5; Mon, 26 Jan 2026 20:35:27 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id B02E21FA707; Mon, 26 Jan 2026 20:35:27 +0100 (CET) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id dwcgiL3BFOS8; Mon, 26 Jan 2026 20:35:24 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 5BA8D1FA3D7; Mon, 26 Jan 2026 20:35:20 +0100 (CET) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 5BA8D1FA3D7 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1769456120; bh=X7VLnM545K9h8NsVTCbW97cPRbf+gkr53bpvW4mQFr0=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=O13lEm39MgMiW2tA1LnWLSudXbmyp6heILzVsC0VD2fRwsCFLzmEg+e1mhaPRjDwa 
mtH7k6zVqPd+Vn0+dlUJFfAKJYlej9lLoDbrSvp9MY0ZY3ubLM31w6mqqCrhv3RAI7 kNOOvUeCsCIxiNED86fvKG7YKZHK9baezOOxWuvjrfaMhKWcG6gW27cb5FiGjG8aI3 LnNc08SKZpEylDLnozQZ80PPnrqOzqtnXW2kMoO8YA/eW/7/b83vvdBfLMjDa2LYd5 BZigaixSiKLXxQl9W2og2AGpNPRgCaHkiwvavTZRvkRRk5QQL/6omE/9ZhWZuNETii PnlvXI7kpsVUw== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id QcsnSUKZfzHC; Mon, 26 Jan 2026 20:35:19 +0100 (CET) Received: from DESKTOP-0O0JV6I.localdomain (ipservice-092-208-231-176.092.208.pools.vodafone-ip.de [92.208.231.176]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id 2CEFB1FA8EF; Mon, 26 Jan 2026 20:35:17 +0100 (CET) From: Luis Augenstein To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v3 03/14] tools/sbom: add command parsers Date: Mon, 26 Jan 2026 20:32:53 +0100 Message-Id: <20260126193304.320916-4-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com> References: <20260126193304.320916-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement savedcmd_parser module for extracting input files from kernel build commands. 
Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- tools/sbom/sbom/cmd_graph/savedcmd_parser.py | 664 +++++++++++++++++++ 1 file changed, 664 insertions(+) create mode 100644 tools/sbom/sbom/cmd_graph/savedcmd_parser.py diff --git a/tools/sbom/sbom/cmd_graph/savedcmd_parser.py b/tools/sbom/sbom= /cmd_graph/savedcmd_parser.py new file mode 100644 index 000000000000..d72f781b4498 --- /dev/null +++ b/tools/sbom/sbom/cmd_graph/savedcmd_parser.py @@ -0,0 +1,664 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import re +import shlex +from dataclasses import dataclass +from typing import Any, Callable, Union +import sbom.sbom_logging as sbom_logging +from sbom.path_utils import PathStr + + +class CmdParsingError(Exception): + def __init__(self, message: str): + super().__init__(message) + self.message =3D message + + +@dataclass +class Option: + name: str + value: str | None =3D None + + +@dataclass +class Positional: + value: str + + +_SUBCOMMAND_PATTERN =3D re.compile(r"\$\$\(([^()]*)\)") +"""Pattern to match $$(...) blocks""" + + +def _tokenize_single_command(command: str, flag_options: list[str] | None = =3D None) -> list[Union[Option, Positional]]: + """ + Parse a shell command into a list of Options and Positionals. + - Positional: the command and any positional arguments. + - Options: handles flags and options with values provided as space-sep= arated, or equals-sign + (e.g., '--opt val', '--opt=3Dval', '--flag'). + + Args: + command: Command line string. + flag_options: Options that are flags without values (e.g., '--verb= ose'). + + Returns: + List of `Option` and `Positional` objects in command order. + """ + + # Wrap all $$(...) blocks in double quotes to prevent shlex from spli= tting them. 
+    command_with_protected_subcommands = _SUBCOMMAND_PATTERN.sub(lambda m: f'"$$({m.group(1)})"', command)
+    tokens = shlex.split(command_with_protected_subcommands)
+
+    parsed: list[Option | Positional] = []
+    i = 0
+    while i < len(tokens):
+        token = tokens[i]
+
+        # Positional
+        if not token.startswith("-"):
+            parsed.append(Positional(token))
+            i += 1
+            continue
+
+        # Option without value (--flag)
+        if (token.startswith("-") and i + 1 < len(tokens) and tokens[i + 1].startswith("-")) or (
+            flag_options and token in flag_options
+        ):
+            parsed.append(Option(name=token))
+            i += 1
+            continue
+
+        # Option with equals sign (--opt=val)
+        if "=" in token:
+            name, value = token.split("=", 1)
+            parsed.append(Option(name=name, value=value))
+            i += 1
+            continue
+
+        # Option with space-separated value (--opt val)
+        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
+            parsed.append(Option(name=token, value=tokens[i + 1]))
+            i += 2
+            continue
+
+        raise CmdParsingError(f"Unrecognized token: {token} in command {command}")
+
+    return parsed
+
+
+def _tokenize_single_command_positionals_only(command: str) -> list[str]:
+    command_parts = _tokenize_single_command(command)
+    positionals = [p.value for p in command_parts if isinstance(p, Positional)]
+    if len(positionals) != len(command_parts):
+        raise CmdParsingError(
+            f"Invalid command format: expected positional arguments only but got options in command {command}."
+        )
+    return positionals
+
+
+def _parse_dd_command(command: str) -> list[PathStr]:
+    match = re.match(r"dd.*?if=(\S+)", command)
+    if match:
+        return [match.group(1)]
+    return []
+
+
+def _parse_cat_command(command: str) -> list[PathStr]:
+    positionals = _tokenize_single_command_positionals_only(command)
+    # expect positionals to be ["cat", input1, input2, ...]
+    return [p for p in positionals[1:]]
+
+
+def _parse_compound_command(command: str) -> list[PathStr]:
+    compound_command_parsers: list[tuple[re.Pattern[str], Callable[[str], list[PathStr]]]] = [
+        (re.compile(r"dd\b"), _parse_dd_command),
+        (re.compile(r"cat.*?\|"), lambda c: _parse_cat_command(c.split("|")[0])),
+        (re.compile(r"cat\b[^|>]*$"), _parse_cat_command),
+        (re.compile(r"echo\b"), _parse_noop),
+        (re.compile(r"\S+="), _parse_noop),
+        (re.compile(r"printf\b"), _parse_noop),
+        (re.compile(r"sed\b"), _parse_sed_command),
+        (
+            re.compile(r"(.*/)scripts/bin2c\s*<"),
+            lambda c: [input] if (input := c.split("<")[1].strip()) != "/dev/null" else [],
+        ),
+        (re.compile(r"^:$"), _parse_noop),
+    ]
+
+    match = re.match(r"\s*[\(\{](.*)[\)\}]\s*>", command, re.DOTALL)
+    if match is None:
+        raise CmdParsingError("No inner commands found for compound command")
+    input_files: list[PathStr] = []
+    inner_commands = _split_commands(match.group(1))
+    for inner_command in inner_commands:
+        if isinstance(inner_command, IfBlock):
+            sbom_logging.error(
+                "Skip parsing inner command {inner_command} of compound command because IfBlock is not supported",
+                inner_command=inner_command,
+            )
+            continue
+
+        parser = next((parser for pattern, parser in compound_command_parsers if pattern.match(inner_command)), None)
+        if parser is None:
+            sbom_logging.error(
+                "Skip parsing inner command {inner_command} of compound command because no matching parser was found",
+                inner_command=inner_command,
+            )
+            continue
+        try:
+            input_files += parser(inner_command)
+        except CmdParsingError as e:
+            sbom_logging.error(
+                "Skip parsing inner command {inner_command} of compound command because of command parsing error: {error_message}",
+                inner_command=inner_command,
+                error_message=e.message,
+            )
+    return input_files
+
+
+def _parse_objcopy_command(command: str) -> list[PathStr]:
+    command_parts = _tokenize_single_command(command, flag_options=["-S", "-w"])
+    positionals = [part.value for part in command_parts if isinstance(part, Positional)]
+    # expect positionals to be ['objcopy', input_file] or ['objcopy', input_file, output_file]
+    if not (len(positionals) == 2 or len(positionals) == 3):
+        raise CmdParsingError(
+            f"Invalid objcopy command format: expected 2 or 3 positional arguments, got {len(positionals)} ({positionals})"
+        )
+    return [positionals[1]]
+
+
+def _parse_link_vmlinux_command(command: str) -> list[PathStr]:
+    """
+    For simplicity we do not parse the `scripts/link-vmlinux.sh` script.
+    Instead the `vmlinux.a` dependency is just hardcoded for now.
+    """
+    return ["vmlinux.a"]
+
+
+def _parse_noop(command: str) -> list[PathStr]:
+    """
+    No-op parser for commands with no input files (e.g., 'rm', 'true').
+    Returns an empty list.
+    """
+    return []
+
+
+def _parse_ar_command(command: str) -> list[PathStr]:
+    positionals = _tokenize_single_command_positionals_only(command)
+    # expect positionals to be ['ar', flags, output, input1, input2, ...]
+    flags = positionals[1]
+    if "r" not in flags:
+        # 'r' option indicates that new files are added to the archive.
+        # If this option is missing we won't find any relevant input files.
+        return []
+    return positionals[3:]
+
+
+def _parse_ar_piped_xargs_command(command: str) -> list[PathStr]:
+    printf_command, _ = command.split("|", 1)
+    positionals = _tokenize_single_command_positionals_only(printf_command.strip())
+    # expect positionals to be ['printf', '{prefix_path}%s ', input1, input2, ...]
+    prefix_path = positionals[1].rstrip("%s ")
+    return [f"{prefix_path}{filename}" for filename in positionals[2:]]
+
+
+def _parse_gcc_or_clang_command(command: str) -> list[PathStr]:
+    parts = shlex.split(command)
+    # compile mode: expect last positional argument ending in `.c` or `.S` to be the input file
+    for part in reversed(parts):
+        if not part.startswith("-") and any(part.endswith(suffix) for suffix in [".c", ".S"]):
+            return [part]
+
+    # linking mode: expect all .o files to be the inputs
+    return [p for p in parts if p.endswith(".o")]
+
+
+def _parse_rustc_command(command: str) -> list[PathStr]:
+    parts = shlex.split(command)
+    # expect last positional argument ending in `.rs` to be the input file
+    for part in reversed(parts):
+        if not part.startswith("-") and part.endswith(".rs"):
+            return [part]
+    raise CmdParsingError("Could not find .rs input source file")
+
+
+def _parse_rustdoc_command(command: str) -> list[PathStr]:
+    parts = shlex.split(command)
+    # expect last positional argument ending in `.rs` to be the input file
+    for part in reversed(parts):
+        if not part.startswith("-") and part.endswith(".rs"):
+            return [part]
+    raise CmdParsingError("Could not find .rs input source file")
+
+
+def _parse_syscallhdr_command(command: str) -> list[PathStr]:
+    command_parts = _tokenize_single_command(command.strip(), flag_options=["--emit-nr"])
+    positionals = [p.value for p in command_parts if isinstance(p, Positional)]
+    # expect positionals to be ["sh", path/to/syscallhdr.sh, input, output]
+    return [positionals[2]]
+
+
+def _parse_syscalltbl_command(command: str) -> list[PathStr]:
+    command_parts = _tokenize_single_command(command.strip())
+    positionals = [p.value for p in command_parts if isinstance(p, Positional)]
+    # expect positionals to be ["sh", path/to/syscalltbl.sh, input, output]
+    return [positionals[2]]
+
+
+def _parse_mkcapflags_command(command: str) -> list[PathStr]:
+    positionals = _tokenize_single_command_positionals_only(command)
+    # expect positionals to be ["sh", path/to/mkcapflags.sh, output, input1, input2]
+    return [positionals[3], positionals[4]]
+
+
+def _parse_orc_hash_command(command: str) -> list[PathStr]:
+    positionals = _tokenize_single_command_positionals_only(command)
+    # expect positionals to be ["sh", path/to/orc_hash.sh, '<', input, '>', output]
+    return [positionals[3]]
+
+
+def _parse_xen_hypercalls_command(command: str) -> list[PathStr]:
+    positionals = _tokenize_single_command_positionals_only(command)
+    # expect positionals to be ["sh", path/to/xen-hypercalls.sh, output, input1, input2, ...]
+    return positionals[3:]
+
+
+def _parse_gen_initramfs_command(command: str) -> list[PathStr]:
+    command_parts = _tokenize_single_command(command)
+    positionals = [p.value for p in command_parts if isinstance(p, Positional)]
+    # expect positionals to be ["sh", path/to/gen_initramfs.sh, input1, input2, ...]
+    return positionals[2:]
+
+
+def _parse_vdso2c_command(command: str) -> list[PathStr]:
+    positionals = _tokenize_single_command_positionals_only(command)
+    # expect positionals to be ['vdso2c', raw_input, stripped_input, output]
+    return [positionals[1], positionals[2]]
+
+
+def _parse_ld_command(command: str) -> list[PathStr]:
+    command_parts = _tokenize_single_command(
+        command=command.strip(),
+        flag_options=[
+            "-shared",
+            "--no-undefined",
+            "--eh-frame-hdr",
+            "-Bsymbolic",
+            "-r",
+            "--no-ld-generated-unwind-info",
+            "--no-dynamic-linker",
+            "-pie",
+            "--no-dynamic-linker--whole-archive",
+            "--whole-archive",
+            "--no-whole-archive",
+            "--start-group",
+            "--end-group",
+        ],
+    )
+    positionals = [p.value for p in command_parts if isinstance(p, Positional)]
+    # expect positionals to be ["ld", input1, input2, ...]
+    return positionals[1:]
+
+
+def _parse_sed_command(command: str) -> list[PathStr]:
+    command_parts = shlex.split(command)
+    # expect command parts to be ["sed", *, input]
+    input = command_parts[-1]
+    if input == "/dev/null":
+        return []
+    return [input]
+
+
+def _parse_awk(command: str) -> list[PathStr]:
+    command_parts = _tokenize_single_command(command)
+    positionals = [p.value for p in command_parts if isinstance(p, Positional)]
+    # expect positionals to be ["awk", input1, input2, ...]
+    return positionals[1:]
+
+
+def _parse_nm_piped_command(command: str) -> list[PathStr]:
+    nm_command, _ = command.split("|", 1)
+    command_parts = _tokenize_single_command(
+        command=nm_command.strip(),
+        flag_options=["p", "--defined-only"],
+    )
+    positionals = [p.value for p in command_parts if isinstance(p, Positional)]
+    # expect positionals to be ["nm", input1, input2, ...]
+    return [p for p in positionals[1:]]
+
+
+def _parse_pnm_to_logo_command(command: str) -> list[PathStr]:
+    command_parts = shlex.split(command)
+    # expect command parts to be ["pnmtologo", , input]
+    return [command_parts[-1]]
+
+
+def _parse_relacheck(command: str) -> list[PathStr]:
+    positionals = _tokenize_single_command_positionals_only(command)
+    # expect positionals to be ["relacheck", input, log_reference]
+    return [positionals[1]]
+
+
+def _parse_perl_command(command: str) -> list[PathStr]:
+    positionals = _tokenize_single_command_positionals_only(command.strip())
+    # expect positionals to be ["perl", input]
+    return [positionals[1]]
+
+
+def _parse_strip_command(command: str) -> list[PathStr]:
+    command_parts = _tokenize_single_command(command, flag_options=["--strip-debug"])
+    positionals = [p.value for p in command_parts if isinstance(p, Positional)]
+    # expect positionals to be ["strip", input1, input2, ...]
+    return positionals[1:]
+
+
+def _parse_mkpiggy_command(command: str) -> list[PathStr]:
+    mkpiggy_command, _ = command.split(">", 1)
+    positionals = _tokenize_single_command_positionals_only(mkpiggy_command)
+    # expect positionals to be ["mkpiggy", input]
+    return [positionals[1]]
+
+
+def _parse_relocs_command(command: str) -> list[PathStr]:
+    if ">" not in command:
+        # Only consider relocs commands that redirect output to a file.
+        # If there's no redirection, we assume it produces no output file and therefore has no input we care about.
+        return []
+    relocs_command, _ = command.split(">", 1)
+    command_parts = shlex.split(relocs_command)
+    # expect command_parts to be ["relocs", options, input]
+    return [command_parts[-1]]
+
+
+def _parse_mk_elfconfig_command(command: str) -> list[PathStr]:
+    positionals = _tokenize_single_command_positionals_only(command)
+    # expect positionals to be ["mk_elfconfig", "<", input, ">", output]
+    return [positionals[2]]
+
+
+def _parse_flex_command(command: str) -> list[PathStr]:
+    parts = shlex.split(command)
+    # expect last positional argument ending in `.l` to be the input file
+    for part in reversed(parts):
+        if not part.startswith("-") and part.endswith(".l"):
+            return [part]
+    raise CmdParsingError("Could not find .l input source file in command")
+
+
+def _parse_bison_command(command: str) -> list[PathStr]:
+    parts = shlex.split(command)
+    # expect last positional argument ending in `.y` to be the input file
+    for part in reversed(parts):
+        if not part.startswith("-") and part.endswith(".y"):
+            return [part]
+    raise CmdParsingError("Could not find .y input source file in command")
+
+
+def _parse_tools_build_command(command: str) -> list[PathStr]:
+    positionals = _tokenize_single_command_positionals_only(command)
+    # expect positionals to be ["tools/build", "input1", "input2", "input3", "output"]
+    return positionals[1:-1]
+
+
+def _parse_extract_cert_command(command: str) -> list[PathStr]:
+    command_parts = shlex.split(command)
+    # expect command parts to be [path/to/extract-cert, input, output]
+    input = command_parts[1]
+    if not input:
+        return []
+    return [input]
+
+
+def _parse_dtc_command(command: str) -> list[PathStr]:
+    wno_flags = [command_part for command_part in shlex.split(command) if command_part.startswith("-Wno-")]
+    command_parts = _tokenize_single_command(command, flag_options=wno_flags)
+    positionals = [p.value for p in command_parts if isinstance(p, Positional)]
+    # expect positionals to be [path/to/dtc, input]
+    return [positionals[1]]
+
+
+def _parse_bindgen_command(command: str) -> list[PathStr]:
+    command_parts = shlex.split(command)
+    header_file_input_paths = [part for part in command_parts if part.endswith(".h")]
+    return header_file_input_paths
+
+
+def _parse_gen_header(command: str) -> list[PathStr]:
+    command_parts = shlex.split(command)
+    # expect command parts to be ["python3", path/to/gen_headers.py, ..., "--xml", input]
+    i = next(i for i, token in enumerate(command_parts) if token == "--xml")
+    return [command_parts[i + 1]]
+
+
+# Command parser registry
+SINGLE_COMMAND_PARSERS: list[tuple[re.Pattern[str], Callable[[str], list[PathStr]]]] = [
+    # Compound commands
+    (re.compile(r"\(.*?\)\s*>", re.DOTALL), _parse_compound_command),
+    (re.compile(r"\{.*?\}\s*>", re.DOTALL), _parse_compound_command),
+    # Standard Unix utilities and system tools
+    (re.compile(r"^rm\b"), _parse_noop),
+    (re.compile(r"^mkdir\b"), _parse_noop),
+    (re.compile(r"^touch\b"), _parse_noop),
+    (re.compile(r"^cat\b.*?[\|>]"), lambda c: _parse_cat_command(c.split("|")[0].split(">")[0])),
+    (re.compile(r"^echo[^|]*$"), _parse_noop),
+    (re.compile(r"^sed.*?>"), lambda c: _parse_sed_command(c.split(">")[0])),
+    (re.compile(r"^sed\b"), _parse_noop),
+    (re.compile(r"^awk.*?<.*?>"), lambda c: [c.split("<")[1].split(">")[0]]),
+    (re.compile(r"^awk.*?>"), lambda c: _parse_awk(c.split(">")[0])),
+    (re.compile(r"^(/bin/)?true\b"), _parse_noop),
+    (re.compile(r"^(/bin/)?false\b"), _parse_noop),
+    (re.compile(r"^openssl\s+req.*?-new.*?-keyout"), _parse_noop),
+    # Compilers and code generators
+    # (C/LLVM toolchain, Rust, Flex/Bison, Bindgen, Perl, etc.)
+    (re.compile(r"^([^\s]+-)?(gcc|clang)\b"), _parse_gcc_or_clang_command),
+    (re.compile(r"^([^\s]+-)?ld(\.bfd)?\b"), _parse_ld_command),
+    (re.compile(r"^printf\b.*\| xargs ([^\s]+-)?ar\b"), _parse_ar_piped_xargs_command),
+    (re.compile(r"^([^\s]+-)?ar\b"), _parse_ar_command),
+    (re.compile(r"^([^\s]+-)?nm\b.*?\|"), _parse_nm_piped_command),
+    (re.compile(r"^([^\s]+-)?objcopy\b"), _parse_objcopy_command),
+    (re.compile(r"^([^\s]+-)?strip\b"), _parse_strip_command),
+    (re.compile(r".*?rustc\b"), _parse_rustc_command),
+    (re.compile(r".*?rustdoc\b"), _parse_rustdoc_command),
+    (re.compile(r"^flex\b"), _parse_flex_command),
+    (re.compile(r"^bison\b"), _parse_bison_command),
+    (re.compile(r"^bindgen\b"), _parse_bindgen_command),
+    (re.compile(r"^perl\b"), _parse_perl_command),
+    # Kernel-specific build scripts and tools
+    (re.compile(r"^(.*/)?link-vmlinux\.sh\b"), _parse_link_vmlinux_command),
+    (re.compile(r"sh (.*/)?syscallhdr\.sh\b"), _parse_syscallhdr_command),
+    (re.compile(r"sh (.*/)?syscalltbl\.sh\b"), _parse_syscalltbl_command),
+    (re.compile(r"sh (.*/)?mkcapflags\.sh\b"), _parse_mkcapflags_command),
+    (re.compile(r"sh (.*/)?orc_hash\.sh\b"), _parse_orc_hash_command),
+    (re.compile(r"sh (.*/)?xen-hypercalls\.sh\b"), _parse_xen_hypercalls_command),
+    (re.compile(r"sh (.*/)?gen_initramfs\.sh\b"), _parse_gen_initramfs_command),
+    (re.compile(r"sh (.*/)?checkundef\.sh\b"), _parse_noop),
+    (re.compile(r"(.*/)?vdso2c\b"), _parse_vdso2c_command),
+    (re.compile(r"^(.*/)?mkpiggy.*?>"), _parse_mkpiggy_command),
+    (re.compile(r"^(.*/)?relocs\b"), _parse_relocs_command),
+    (re.compile(r"^(.*/)?mk_elfconfig.*?<.*?>"), _parse_mk_elfconfig_command),
+    (re.compile(r"^(.*/)?tools/build\b"), _parse_tools_build_command),
+    (re.compile(r"^(.*/)?certs/extract-cert"), _parse_extract_cert_command),
+    (re.compile(r"^(.*/)?scripts/dtc/dtc\b"), _parse_dtc_command),
+    (re.compile(r"^(.*/)?pnmtologo\b"), _parse_pnm_to_logo_command),
+    (re.compile(r"^(.*/)?kernel/pi/relacheck"), _parse_relacheck),
+    (re.compile(r"^drivers/gpu/drm/radeon/mkregtable"), lambda c: [c.split(" ")[1]]),
+    (re.compile(r"(.*/)?genheaders\b"), _parse_noop),
+    (re.compile(r"^(.*/)?mkcpustr\s+>"), _parse_noop),
+    (re.compile(r"^(.*/)polgen\b"), _parse_noop),
+    (re.compile(r"make -f .*/arch/x86/Makefile\.postlink"), _parse_noop),
+    (re.compile(r"^(.*/)?raid6/mktables\s+>"), _parse_noop),
+    (re.compile(r"^(.*/)?objtool\b"), _parse_noop),
+    (re.compile(r"^(.*/)?module/gen_test_kallsyms.sh"), _parse_noop),
+    (re.compile(r"^(.*/)?gen_header.py"), _parse_gen_header),
+    (re.compile(r"^(.*/)?scripts/rustdoc_test_gen"), _parse_noop),
+]
+
+
+# If Block pattern to match a simple, single-level if-then-fi block. Nested If blocks are not supported.
+IF_BLOCK_PATTERN = re.compile(
+    r"""
+    ^if(.*?);\s*      # Match 'if ;' (non-greedy)
+    then(.*?);\s*     # Match 'then ;' (non-greedy)
+    fi\b              # Match 'fi'
+    """,
+    re.VERBOSE,
+)
+
+
+@dataclass
+class IfBlock:
+    condition: str
+    then_statement: str
+
+
+def _unwrap_outer_parentheses(s: str) -> str:
+    s = s.strip()
+    if not (s.startswith("(") and s.endswith(")")):
+        return s
+
+    count = 0
+    for i, char in enumerate(s):
+        if char == "(":
+            count += 1
+        elif char == ")":
+            count -= 1
+        # If count is 0 before the end, outer parentheses don't match
+        if count == 0 and i != len(s) - 1:
+            return s
+
+    # outer parentheses do match, unwrap once
+    return _unwrap_outer_parentheses(s[1:-1])
+
+
+def _find_first_top_level_command_separator(
+    commands: str, separators: list[str] = [";", "&&"]
+) -> tuple[int | None, int | None]:
+    in_single_quote = False
+    in_double_quote = False
+    in_curly_braces = 0
+    in_braces = 0
+    for i, char in enumerate(commands):
+        if char == "'" and not in_double_quote:
+            # Toggle single quote state (unless inside double quotes)
+            in_single_quote = not in_single_quote
+        elif char == '"' and not in_single_quote:
+            # Toggle double quote state (unless inside single quotes)
+            in_double_quote = not in_double_quote
+
+        if in_single_quote or in_double_quote:
+            continue
+
+        # Toggle braces state
+        if char == "{":
+            in_curly_braces += 1
+        if char == "}":
+            in_curly_braces -= 1
+
+        if char == "(":
+            in_braces += 1
+        if char == ")":
+            in_braces -= 1
+
+        if in_curly_braces > 0 or in_braces > 0:
+            continue
+
+        # return found separator position and separator length
+        for separator in separators:
+            if commands[i : i + len(separator)] == separator:
+                return i, len(separator)
+
+    return None, None
+
+
+def _split_commands(commands: str) -> list[str | IfBlock]:
+    """
+    Splits a string of command-line commands into individual parts.
+
+    This function handles:
+    - Top-level command separators (e.g., `;` and `&&`) to split multiple commands.
+    - Conditional if-blocks, returning them as `IfBlock` instances.
+    - Preserves the order of commands and trims whitespace.
+
+    Args:
+        commands (str): The raw command string.
+
+    Returns:
+        list[str | IfBlock]: A list of single commands or `IfBlock` objects.
+    """
+    single_commands: list[str | IfBlock] = []
+    remaining_commands = _unwrap_outer_parentheses(commands)
+    while len(remaining_commands) > 0:
+        remaining_commands = remaining_commands.strip()
+
+        # if block
+        matched_if = IF_BLOCK_PATTERN.match(remaining_commands)
+        if matched_if:
+            condition, then_statement = matched_if.groups()
+            single_commands.append(IfBlock(condition.strip(), then_statement.strip()))
+            full_matched = matched_if.group(0)
+            remaining_commands = remaining_commands.removeprefix(full_matched).lstrip("; \n")
+            continue
+
+        # command until next separator
+        separator_position, separator_length = _find_first_top_level_command_separator(remaining_commands)
+        if separator_position is not None and separator_length is not None:
+            single_commands.append(remaining_commands[:separator_position].strip())
+            remaining_commands = remaining_commands[separator_position + separator_length :].strip()
+            continue
+
+        # single last command
+        single_commands.append(remaining_commands)
+        break
+
+    return single_commands
+
+
+def parse_inputs_from_commands(commands: str, fail_on_unknown_build_command: bool) -> list[PathStr]:
+    """
+    Extract input files referenced in a set of command-line commands.
+
+    Args:
+        commands (str): Command line expression to parse.
+        fail_on_unknown_build_command (bool): Whether to fail if an unknown build command is encountered. If False, errors are logged as warnings.
+
+    Returns:
+        list[PathStr]: List of input file paths required by the commands.
+    """
+
+    def log_error_or_warning(message: str, /, **kwargs: Any) -> None:
+        if fail_on_unknown_build_command:
+            sbom_logging.error(message, **kwargs)
+        else:
+            sbom_logging.warning(message, **kwargs)
+
+    input_files: list[PathStr] = []
+    for single_command in _split_commands(commands):
+        if isinstance(single_command, IfBlock):
+            inputs = parse_inputs_from_commands(single_command.then_statement, fail_on_unknown_build_command)
+            if inputs:
+                log_error_or_warning(
+                    "Skipped parsing command {then_statement} because input files in IfBlock 'then' statement are not supported",
+                    then_statement=single_command.then_statement,
+                )
+            continue
+
+        matched_parser = next(
+            (parser for pattern, parser in SINGLE_COMMAND_PARSERS if pattern.match(single_command)), None
+        )
+        if matched_parser is None:
+            log_error_or_warning(
+                "Skipped parsing command {single_command} because no matching parser was found",
+                single_command=single_command,
+            )
+            continue
+        try:
+            inputs = matched_parser(single_command)
+            input_files.extend(inputs)
+        except CmdParsingError as e:
+            log_error_or_warning(
+                "Skipped parsing command {single_command} because of command parsing error: {error_message}",
+                single_command=single_command,
+                error_message=e.message,
+            )
+
+    return [input.strip().rstrip("/") for input in input_files]
-- 
2.34.1

From nobody Sat Feb 7 08:07:12 2026
Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7DECE2517AA;
 Mon, 26 Jan 2026 19:35:32 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456135; cv=none;
b=KcKZnhF8ge/uKkOLe5zUg3KBZYUAQIi0fDw6NtgrctVHf/5uMxfMxs+kkeGBU+snIjLJtNjynHV8Y3rJtJ0qtfK2mqG9YU9+Fdt4P0mmBoeZ0En427A4yjuneN9y+Bma/JrNPZInJJ9vcbaWSEhD9ip8MvH5zcEJ4UOH+rgqkyw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456135; c=relaxed/simple; bh=Xu/IUnze8TQ/78kQCbfV4jDLL+6e3iVprsBZngcPcVQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=o2WlpPYAVvC4N1tonGXUAzSr3eCUoGGof7mf93KsnK8ZTbRlPn8RQ8ZsiycPxCEjcrwafvtDh1/mmiU25HcXbFNhg8g4nNG7WdLilCzRAC4Gmy5Qrnxiv3GcW53FOg+qSWpugyoXqE6mP0YHjm4efCrSt63y7L28PF7Fp9T4FSY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=QuPhty8E; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="QuPhty8E" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id EF3CA3FAF6; Mon, 26 Jan 2026 20:35:27 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id CDA301FA3D7; Mon, 26 Jan 2026 20:35:27 +0100 (CET) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id 4RP3If5P0ONE; Mon, 26 Jan 2026 20:35:24 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id EAFAE1FA8BB; Mon, 26 Jan 2026 20:35:20 +0100 (CET) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz EAFAE1FA8BB DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; 
 s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1769456121;
 bh=preD10lEhwJad2/t0Ogmn0P5D6/nocJXkrcJcwNs5yA=;
 h=From:To:Subject:Date:Message-Id:MIME-Version;
 b=QuPhty8EGmO0iAsNov7TZWJpmVMjFwW1vArVCEnxZicZAyrYf1t9LizSfqZDjcZRk
 4R10NL5ppzhluZHRPluENIafAvBZ1L/NGUp4N4v25A9/r4YFXmvKRY1o4nXDjWJCmu
 lgCoWlflSccMc/KGIds6T9ONF33rZbcsdb558K86WAkk8RDHQW0JWMQuKGY56c6g5S
 w3bJPEVFHi+ToGooIVCX+KK1+u0qXXbDQYNCgKLCPEyRabps22rL8VScNPfGnWOeFK
 /u03o2RpvHdh1oN3IKcrS425xa6dljVhecUQDNtm2W8duTLpLW01k619tsVl082lsJ
 kj+dlHE9xUPfw==
X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz
Received: from zmproxy.tng.vnc.biz ([127.0.0.1])
 by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026)
 with ESMTP id fvqBmmrN7ZPv; Mon, 26 Jan 2026 20:35:20 +0100 (CET)
Received: from DESKTOP-0O0JV6I.localdomain (ipservice-092-208-231-176.092.208.pools.vodafone-ip.de [92.208.231.176])
 by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id D91A11FA90B;
 Mon, 26 Jan 2026 20:35:18 +0100 (CET)
From: Luis Augenstein
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, gregkh@linuxfoundation.org,
 maximilian.huber@tngtech.com, Luis Augenstein
Subject: [PATCH v3 04/14] tools/sbom: add cmd graph generation
Date: Mon, 26 Jan 2026 20:32:54 +0100
Message-Id: <20260126193304.320916-5-luis.augenstein@tngtech.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com>
References: <20260126193304.320916-1-luis.augenstein@tngtech.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Implement command graph generation by parsing .cmd files to build a
dependency graph. Add CmdGraph, CmdGraphNode, and .cmd file parsing.
Supports generating a flat list of used source files via the
--generate-used-files cli argument.
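
The used-files selection for separate source and object trees can be
sketched as follows. This is an illustration only, not the code in this
patch: is_inside and used_source_files are hypothetical helpers, and the
sketch assumes absolute POSIX-style paths and an object tree that
differs from the source tree.

```python
import os


def is_inside(path: str, root: str) -> bool:
    """Return True if `path` equals `root` or lies somewhere below it."""
    path, root = os.path.abspath(path), os.path.abspath(root)
    return os.path.commonpath([path, root]) == root


def used_source_files(paths: list[str], src_tree: str, obj_tree: str) -> list[str]:
    """Keep files below src_tree that are not below obj_tree, relative to src_tree."""
    return [
        os.path.relpath(p, src_tree)
        for p in paths
        if is_inside(p, src_tree) and not is_inside(p, obj_tree)
    ]
```

Given a source file, a generated object file and a system header, only
the source file survives the filter. The object-tree check matters when
the object tree is a subdirectory of the source tree, since generated
files would otherwise be listed as sources.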
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 tools/sbom/Makefile                         |   6 +-
 tools/sbom/sbom.py                          |  39 +++++
 tools/sbom/sbom/cmd_graph/__init__.py       |   7 +
 tools/sbom/sbom/cmd_graph/cmd_file.py       | 149 ++++++++++++++++++++
 tools/sbom/sbom/cmd_graph/cmd_graph.py      |  46 ++++++
 tools/sbom/sbom/cmd_graph/cmd_graph_node.py | 120 ++++++++++++++++
 tools/sbom/sbom/cmd_graph/deps_parser.py    |  52 +++++++
 tools/sbom/sbom/config.py                   | 147 ++++++++++++++++++-
 8 files changed, 563 insertions(+), 3 deletions(-)
 create mode 100644 tools/sbom/sbom/cmd_graph/__init__.py
 create mode 100644 tools/sbom/sbom/cmd_graph/cmd_file.py
 create mode 100644 tools/sbom/sbom/cmd_graph/cmd_graph.py
 create mode 100644 tools/sbom/sbom/cmd_graph/cmd_graph_node.py
 create mode 100644 tools/sbom/sbom/cmd_graph/deps_parser.py

diff --git a/tools/sbom/Makefile b/tools/sbom/Makefile
index 90ae42dd28ee..cc4a632533ba 100644
--- a/tools/sbom/Makefile
+++ b/tools/sbom/Makefile
@@ -29,7 +29,11 @@ $(SBOM_TARGETS) &: $(SBOM_DEPS)
 		sed 's/\.o$$/.ko/' $(objtree)/modules.order >> $(SBOM_ROOTS_FILE); \
 	fi
 
-	@python3 sbom.py
+	@python3 sbom.py \
+		--src-tree $(srctree) \
+		--obj-tree $(objtree) \
+		--roots-file $(SBOM_ROOTS_FILE) \
+		--output-directory $(objtree)
 
 	@rm $(SBOM_ROOTS_FILE)
 
diff --git a/tools/sbom/sbom.py b/tools/sbom/sbom.py
index c7f23d6eb300..25d912a282de 100644
--- a/tools/sbom/sbom.py
+++ b/tools/sbom/sbom.py
@@ -7,9 +7,13 @@ Compute software bill of materials in SPDX format describing a kernel build.
 """
 
 import logging
+import os
 import sys
+import time
 import sbom.sbom_logging as sbom_logging
 from sbom.config import get_config
+from sbom.path_utils import is_relative_to
+from sbom.cmd_graph import CmdGraph
 
 
 def main():
@@ -22,6 +26,36 @@ def main():
         format="[%(levelname)s] %(message)s",
     )
 
+    # Build cmd graph
+    logging.debug("Start building cmd graph")
+    start_time = time.time()
+    cmd_graph = CmdGraph.create(config.root_paths, config)
+    logging.debug(f"Built cmd graph in {time.time() - start_time} seconds")
+
+    # Save used files document
+    if config.generate_used_files:
+        if config.src_tree == config.obj_tree:
+            logging.info(
+                f"Extracting all files from the cmd graph to {config.used_files_file_name} "
+                "instead of only source files because source files cannot be "
+                "reliably classified when the source and object trees are identical.",
+            )
+            used_files = [os.path.relpath(node.absolute_path, config.src_tree) for node in cmd_graph]
+            logging.debug(f"Found {len(used_files)} files in cmd graph.")
+        else:
+            used_files = [
+                os.path.relpath(node.absolute_path, config.src_tree)
+                for node in cmd_graph
+                if is_relative_to(node.absolute_path, config.src_tree)
+                and not is_relative_to(node.absolute_path, config.obj_tree)
+            ]
+            logging.debug(f"Found {len(used_files)} source files in cmd graph")
+        if not sbom_logging.has_errors() or config.write_output_on_error:
+            used_files_path = os.path.join(config.output_directory, config.used_files_file_name)
+            with open(used_files_path, "w", encoding="utf-8") as f:
+                f.write("\n".join(str(file_path) for file_path in used_files))
+            logging.debug(f"Successfully saved {used_files_path}")
+
     # Report collected warnings and errors in case of failure
     warning_summary = sbom_logging.summarize_warnings()
     error_summary = sbom_logging.summarize_errors()
@@ -30,6 +64,11 @@ def main():
         logging.warning(warning_summary)
     if error_summary:
         logging.error(error_summary)
+        if not config.write_output_on_error:
+            logging.info(
+                "Use --write-output-on-error to generate output documents even when errors occur. "
+                "Note that in this case the generated SPDX documents may be incomplete."
+            )
         sys.exit(1)
 
 
diff --git a/tools/sbom/sbom/cmd_graph/__init__.py b/tools/sbom/sbom/cmd_graph/__init__.py
new file mode 100644
index 000000000000..9d661a5c3d93
--- /dev/null
+++ b/tools/sbom/sbom/cmd_graph/__init__.py
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from .cmd_graph import CmdGraph
+from .cmd_graph_node import CmdGraphNode, CmdGraphNodeConfig
+
+__all__ = ["CmdGraph", "CmdGraphNode", "CmdGraphNodeConfig"]
diff --git a/tools/sbom/sbom/cmd_graph/cmd_file.py b/tools/sbom/sbom/cmd_graph/cmd_file.py
new file mode 100644
index 000000000000..d85ef5de0c26
--- /dev/null
+++ b/tools/sbom/sbom/cmd_graph/cmd_file.py
@@ -0,0 +1,149 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+import os
+import re
+from dataclasses import dataclass, field
+from sbom.cmd_graph.deps_parser import parse_cmd_file_deps
+from sbom.cmd_graph.savedcmd_parser import parse_inputs_from_commands
+import sbom.sbom_logging as sbom_logging
+from sbom.path_utils import PathStr
+
+SAVEDCMD_PATTERN = re.compile(r"^(saved)?cmd_.*?:=\s*(?P<full_command>.+)$")
+SOURCE_PATTERN = re.compile(r"^source.*?:=\s*(?P<source_file>.+)$")
+
+
+@dataclass
+class CmdFile:
+    cmd_file_path: PathStr
+    savedcmd: str
+    source: PathStr | None = None
+    deps: list[str] = field(default_factory=list[str])
+    make_rules: list[str] = field(default_factory=list[str])
+
+    @classmethod
+    def create(cls, cmd_file_path: PathStr) -> "CmdFile | None":
+        """
+        Parses a .cmd file.
+        .cmd files are assumed to have one of the following structures:
+        1. Full Cmd File
+            (saved)?cmd_ :=
+            source_ :=
+            deps_ := \
+
+            := $(deps_)
+            $(deps_):
+
+        2. Command Only Cmd File
+            (saved)?cmd_ :=
+
+        3. Single Dependency Cmd File
+            (saved)?cmd_ :=
+            :=
+
+        Args:
+            cmd_file_path (PathStr): absolute path to a .cmd file
+
+        Returns:
+            cmd_file (CmdFile): Parsed cmd file.
+        """
+        with open(cmd_file_path, "rt") as f:
+            lines = [line.strip() for line in f.readlines() if line.strip() != "" and not line.startswith("#")]
+
+        # savedcmd
+        match = SAVEDCMD_PATTERN.match(lines[0])
+        if match is None:
+            sbom_logging.error(
+                "Skip parsing '{cmd_file_path}' because no 'savedcmd_' command was found.", cmd_file_path=cmd_file_path
+            )
+            return None
+        savedcmd = match.group("full_command")
+
+        # Command Only Cmd File
+        if len(lines) == 1:
+            return CmdFile(cmd_file_path, savedcmd)
+
+        # Single Dependency Cmd File
+        if len(lines) == 2:
+            dep = lines[1].split(":")[1].strip()
+            return CmdFile(cmd_file_path, savedcmd, deps=[dep])
+
+        # Full Cmd File
+        # source
+        line1 = SOURCE_PATTERN.match(lines[1])
+        if line1 is None:
+            sbom_logging.error(
+                "Skip parsing '{cmd_file_path}' because no 'source_' entry was found.", cmd_file_path=cmd_file_path
+            )
+            return CmdFile(cmd_file_path, savedcmd)
+        source = line1.group("source_file")
+
+        # deps
+        deps: list[str] = []
+        i = 3  # lines[2] includes the variable assignment but no actual dependency, so we need to start at lines[3].
+        while i < len(lines):
+            if not lines[i].endswith("\\"):
+                break
+            deps.append(lines[i][:-1].strip())
+            i += 1
+
+        # make_rules
+        make_rules = lines[i:]
+
+        return CmdFile(cmd_file_path, savedcmd, source, deps, make_rules)
+
+    def get_dependencies(
+        self: "CmdFile", target_path: PathStr, obj_tree: PathStr, fail_on_unknown_build_command: bool
+    ) -> list[PathStr]:
+        """
+        Parses all dependencies required to build a target file from its cmd file.
+
+        Args:
+            target_path: path to the target file relative to `obj_tree`.
+            obj_tree: absolute path to the object tree.
+ fail_on_unknown_build_command: Whether to fail if an unknown b= uild command is encountered. + + Returns: + list[PathStr]: dependency file paths relative to `obj_tree`. + """ + input_files: list[PathStr] =3D [ + str(p) for p in parse_inputs_from_commands(self.savedcmd, fail= _on_unknown_build_command) + ] + if self.deps: + input_files +=3D [str(p) for p in parse_cmd_file_deps(self.dep= s)] + input_files =3D _expand_resolve_files(input_files, obj_tree) + + cmd_file_dependencies: list[PathStr] =3D [] + for input_file in input_files: + # input files are either absolute or relative to the object tr= ee + if os.path.isabs(input_file): + input_file =3D os.path.relpath(input_file, obj_tree) + if input_file =3D=3D target_path: + # Skip target file to prevent cycles. This is necessary be= cause some multi stage commands first create an output and then pass it as = input to the next command, e.g., objcopy. + continue + cmd_file_dependencies.append(input_file) + + return cmd_file_dependencies + + +def _expand_resolve_files(input_files: list[PathStr], obj_tree: PathStr) -= > list[PathStr]: + """ + Expands resolve files which may reference additional files via '@' not= ation. + + Args: + input_files (list[PathStr]): List of file paths relative to the ob= ject tree, where paths starting with '@' refer to files + containing further file paths, each o= n a separate line. + obj_tree: Absolute path to the root of the object tree. + + Returns: + list[PathStr]: Flattened list of all input file paths, with any ne= sted '@' file references resolved recursively. 
+ """ + expanded_input_files: list[PathStr] =3D [] + for input_file in input_files: + if not input_file.startswith("@"): + expanded_input_files.append(input_file) + continue + with open(os.path.join(obj_tree, input_file.lstrip("@")), "rt") as= f: + resolve_file_content =3D [line_stripped for line in f.readline= s() if (line_stripped :=3D line.strip())] + expanded_input_files +=3D _expand_resolve_files(resolve_file_conte= nt, obj_tree) + return expanded_input_files diff --git a/tools/sbom/sbom/cmd_graph/cmd_graph.py b/tools/sbom/sbom/cmd_g= raph/cmd_graph.py new file mode 100644 index 000000000000..cad54243ff3f --- /dev/null +++ b/tools/sbom/sbom/cmd_graph/cmd_graph.py @@ -0,0 +1,46 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from collections import deque +from dataclasses import dataclass, field +from typing import Iterator + +from sbom.cmd_graph.cmd_graph_node import CmdGraphNode, CmdGraphNodeConfig +from sbom.path_utils import PathStr + + +@dataclass +class CmdGraph: + """Directed acyclic graph of build dependencies primarily inferred fro= m .cmd files produced during kernel builds""" + + roots: list[CmdGraphNode] =3D field(default_factory=3Dlist[CmdGraphNod= e]) + + @classmethod + def create(cls, root_paths: list[PathStr], config: CmdGraphNodeConfig)= -> "CmdGraph": + """ + Recursively builds a dependency graph starting from `root_paths`. + Dependencies are mainly discovered by parsing the `.cmd` files. + + Args: + root_paths (list[PathStr]): List of paths to root outputs rela= tive to obj_tree + config (CmdGraphNodeConfig): Configuration options + + Returns: + CmdGraph: A graph of all build dependencies for the given root= files. 
+ """ + node_cache: dict[PathStr, CmdGraphNode] =3D {} + root_nodes =3D [CmdGraphNode.create(root_path, config, node_cache)= for root_path in root_paths] + return CmdGraph(root_nodes) + + def __iter__(self) -> Iterator[CmdGraphNode]: + """Traverse the graph in breadth-first order, yielding each unique= node.""" + visited: set[PathStr] =3D set() + node_stack: deque[CmdGraphNode] =3D deque(self.roots) + while len(node_stack) > 0: + node =3D node_stack.popleft() + if node.absolute_path in visited: + continue + + visited.add(node.absolute_path) + node_stack.extend(node.children) + yield node diff --git a/tools/sbom/sbom/cmd_graph/cmd_graph_node.py b/tools/sbom/sbom/= cmd_graph/cmd_graph_node.py new file mode 100644 index 000000000000..fdaed0f0ccba --- /dev/null +++ b/tools/sbom/sbom/cmd_graph/cmd_graph_node.py @@ -0,0 +1,120 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass, field +from itertools import chain +import logging +import os +from typing import Iterator, Protocol + +from sbom import sbom_logging +from sbom.cmd_graph.cmd_file import CmdFile +from sbom.path_utils import PathStr, is_relative_to + + +@dataclass +class IncbinDependency: + node: "CmdGraphNode" + full_statement: str + + +class CmdGraphNodeConfig(Protocol): + obj_tree: PathStr + src_tree: PathStr + fail_on_unknown_build_command: bool + + +@dataclass +class CmdGraphNode: + """A node in the cmd graph representing a single file and its dependen= cies.""" + + absolute_path: PathStr + """Absolute path to the file this node represents.""" + + cmd_file: CmdFile | None =3D None + """Parsed .cmd file describing how the file at absolute_path was built= , or None if not available.""" + + cmd_file_dependencies: list["CmdGraphNode"] =3D field(default_factory= =3Dlist["CmdGraphNode"]) + incbin_dependencies: list[IncbinDependency] =3D field(default_factory= =3Dlist[IncbinDependency]) + hardcoded_dependencies: 
list["CmdGraphNode"] =3D field(default_factory= =3Dlist["CmdGraphNode"]) + + @property + def children(self) -> Iterator["CmdGraphNode"]: + seen: set[PathStr] =3D set() + for node in chain( + self.cmd_file_dependencies, + (dep.node for dep in self.incbin_dependencies), + self.hardcoded_dependencies, + ): + if node.absolute_path not in seen: + seen.add(node.absolute_path) + yield node + + @classmethod + def create( + cls, + target_path: PathStr, + config: CmdGraphNodeConfig, + cache: dict[PathStr, "CmdGraphNode"] | None =3D None, + depth: int =3D 0, + ) -> "CmdGraphNode": + """ + Recursively builds a dependency graph starting from `target_path`. + Dependencies are mainly discovered by parsing the `..cmd` file. + + Args: + target_path: Path to the target file relative to obj_tree. + config: Config options + cache: Tracks processed nodes to prevent cycles. + depth: Internal parameter to track the current recursion depth. + + Returns: + CmdGraphNode: cmd graph node representing the target file + """ + if cache is None: + cache =3D {} + + target_path_absolute =3D ( + os.path.realpath(p) + if os.path.islink(p :=3D os.path.join(config.obj_tree, target_= path)) + else os.path.normpath(p) + ) + + if target_path_absolute in cache: + return cache[target_path_absolute] + + if depth =3D=3D 0: + logging.debug(f"Build node: {target_path}") + + cmd_file_path =3D _to_cmd_path(target_path_absolute) + cmd_file =3D CmdFile.create(cmd_file_path) if os.path.exists(cmd_f= ile_path) else None + node =3D CmdGraphNode(target_path_absolute, cmd_file) + cache[target_path_absolute] =3D node + + if not os.path.exists(target_path_absolute): + error_or_warning =3D ( + sbom_logging.error + if is_relative_to(target_path_absolute, config.obj_tree) + or is_relative_to(target_path_absolute, config.src_tree) + else sbom_logging.warning + ) + error_or_warning( + "Skip parsing '{target_path_absolute}' because file does n= ot exist", + target_path_absolute=3Dtarget_path_absolute, + ) + return node + + if 
cmd_file is not None: + node.cmd_file_dependencies = [ + CmdGraphNode.create(cmd_file_dependency_path, config, cache, depth + 1) + for cmd_file_dependency_path in cmd_file.get_dependencies( + target_path, config.obj_tree, config.fail_on_unknown_build_command + ) + ] + + return node + + +def _to_cmd_path(path: PathStr) -> PathStr: + name = os.path.basename(path) + return path.removesuffix(name) + f".{name}.cmd" diff --git a/tools/sbom/sbom/cmd_graph/deps_parser.py b/tools/sbom/sbom/cmd_graph/deps_parser.py new file mode 100644 index 000000000000..fb3ccdd415b5 --- /dev/null +++ b/tools/sbom/sbom/cmd_graph/deps_parser.py @@ -0,0 +1,52 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import re +import sbom.sbom_logging as sbom_logging +from sbom.path_utils import PathStr + +# Match dependencies on config files +# Example match: "$(wildcard include/config/CONFIG_SOMETHING)" +CONFIG_PATTERN = re.compile(r"\$\(wildcard (include/config/[^)]+)\)") + +# Match dependencies on the objtool binary +# Example match: "$(wildcard ./tools/objtool/objtool)" +OBJTOOL_PATTERN = re.compile(r"\$\(wildcard \./tools/objtool/objtool\)") + +# Match any Makefile wildcard reference +# Example match: "$(wildcard path/to/file)" +WILDCARD_PATTERN = re.compile(r"\$\(wildcard (?P<path>[^)]+)\)") + +# Match ordinary paths: +# - ^(\/)?: Optionally starts with a '/' +# - (([\w\-\., ]*)\/)*: Zero or more directory levels +# - [\w\-\., ]+$: Path component (file or directory) +# Example matches: "/foo/bar.c", "dir1/dir2/file.txt", "plainfile" +VALID_PATH_PATTERN = re.compile(r"^(\/)?(([\w\-\., ]*)\/)*[\w\-\., ]+$") + + +def parse_cmd_file_deps(deps: list[str]) -> list[PathStr]: + """ + Parse dependency strings of a .cmd file and return valid input file paths. + + Args: + deps: List of dependency strings as found in `.cmd` files.
+ + Returns: + input_files: List of input file paths + """ + input_files: list[PathStr] =3D [] + for dep in deps: + dep =3D dep.strip() + match dep: + case _ if CONFIG_PATTERN.match(dep) or OBJTOOL_PATTERN.match(d= ep): + # config paths like include/config/ should no= t be included in the graph + continue + case _ if match :=3D WILDCARD_PATTERN.match(dep): + path =3D match.group("path") + input_files.append(path) + case _ if VALID_PATH_PATTERN.match(dep): + input_files.append(dep) + case _: + sbom_logging.error("Skip parsing dependency {dep} because = of unrecognized format", dep=3Ddep) + return input_files diff --git a/tools/sbom/sbom/config.py b/tools/sbom/sbom/config.py index 3dc569ae0c43..39e556a4c53b 100644 --- a/tools/sbom/sbom/config.py +++ b/tools/sbom/sbom/config.py @@ -3,15 +3,43 @@ =20 import argparse from dataclasses import dataclass +import os +from typing import Any +from sbom.path_utils import PathStr =20 =20 @dataclass class KernelSbomConfig: + src_tree: PathStr + """Absolute path to the Linux kernel source directory.""" + + obj_tree: PathStr + """Absolute path to the build output directory.""" + + root_paths: list[PathStr] + """List of paths to root outputs (relative to obj_tree) to base the SB= OM on.""" + + generate_used_files: bool + """Whether to generate a flat list of all source files used in the bui= ld. 
+ If False, no used-files document is created.""" + + used_files_file_name: str + """If `generate_used_files` is True, specifies the file name for the u= sed-files document.""" + + output_directory: PathStr + """Path to the directory where the generated output documents will be = saved.""" + debug: bool """Whether to enable debug logging.""" =20 + fail_on_unknown_build_command: bool + """Whether to fail if an unknown build command is encountered in a .cm= d file.""" + + write_output_on_error: bool + """Whether to write output documents even if errors occur.""" + =20 -def _parse_cli_arguments() -> dict[str, bool]: +def _parse_cli_arguments() -> dict[str, Any]: """ Parse command-line arguments using argparse. =20 @@ -19,8 +47,49 @@ def _parse_cli_arguments() -> dict[str, bool]: Dictionary of parsed arguments. """ parser =3D argparse.ArgumentParser( + formatter_class=3Dargparse.RawTextHelpFormatter, description=3D"Generate SPDX SBOM documents for kernel builds", ) + parser.add_argument( + "--src-tree", + default=3D"../linux", + help=3D"Path to the kernel source tree (default: ../linux)", + ) + parser.add_argument( + "--obj-tree", + default=3D"../linux/kernel_build", + help=3D"Path to the build output directory (default: ../linux/kern= el_build)", + ) + group =3D parser.add_mutually_exclusive_group(required=3DTrue) + group.add_argument( + "--roots", + nargs=3D"+", + default=3D"arch/x86/boot/bzImage", + help=3D"Space-separated list of paths relative to obj-tree for whi= ch the SBOM will be created.\n" + "Cannot be used together with --roots-file. (default: arch/x86/boo= t/bzImage)", + ) + group.add_argument( + "--roots-file", + help=3D"Path to a file containing the root paths (one per line). 
C= annot be used together with --roots.", + ) + parser.add_argument( + "--generate-used-files", + action=3D"store_true", + default=3DFalse, + help=3D( + "Whether to create the sbom.used-files.txt file, a flat list o= f all " + "source files used for the kernel build.\n" + "If src-tree and obj-tree are equal it is not possible to reli= ably " + "classify source files.\n" + "In this case sbom.used-files.txt will contain all files used = for the " + "kernel build including all build artifacts. (default: False)" + ), + ) + parser.add_argument( + "--output-directory", + default=3D".", + help=3D"Path to the directory where the generated output documents= will be stored (default: .)", + ) parser.add_argument( "--debug", action=3D"store_true", @@ -28,6 +97,28 @@ def _parse_cli_arguments() -> dict[str, bool]: help=3D"Enable debug logs (default: False)", ) =20 + # Error handling settings + parser.add_argument( + "--do-not-fail-on-unknown-build-command", + action=3D"store_true", + default=3DFalse, + help=3D( + "Whether to fail if an unknown build command is encountered in= a .cmd file.\n" + "If set to True, errors are logged as warnings instead. (defau= lt: False)" + ), + ) + parser.add_argument( + "--write-output-on-error", + action=3D"store_true", + default=3DFalse, + help=3D( + "Write output documents even if errors occur. The resulting do= cuments " + "may be incomplete.\n" + "A summary of warnings and errors can be found in the 'comment= ' property " + "of the CreationInfo element. 
(default: False)" + ), + ) + args =3D vars(parser.parse_args()) return args =20 @@ -42,6 +133,58 @@ def get_config() -> KernelSbomConfig: # Parse cli arguments args =3D _parse_cli_arguments() =20 + # Extract and validate cli arguments + src_tree =3D os.path.realpath(args["src_tree"]) + obj_tree =3D os.path.realpath(args["obj_tree"]) + root_paths =3D [] + if args["roots_file"]: + with open(args["roots_file"], "rt") as f: + root_paths =3D [root.strip() for root in f.readlines()] + else: + root_paths =3D args["roots"] + _validate_path_arguments(src_tree, obj_tree, root_paths) + + generate_used_files =3D args["generate_used_files"] + output_directory =3D os.path.realpath(args["output_directory"]) debug =3D args["debug"] =20 - return KernelSbomConfig(debug=3Ddebug) + fail_on_unknown_build_command =3D not args["do_not_fail_on_unknown_bui= ld_command"] + write_output_on_error =3D args["write_output_on_error"] + + # Hardcoded config + used_files_file_name =3D "sbom.used-files.txt" + + return KernelSbomConfig( + src_tree=3Dsrc_tree, + obj_tree=3Dobj_tree, + root_paths=3Droot_paths, + generate_used_files=3Dgenerate_used_files, + used_files_file_name=3Dused_files_file_name, + output_directory=3Doutput_directory, + debug=3Ddebug, + fail_on_unknown_build_command=3Dfail_on_unknown_build_command, + write_output_on_error=3Dwrite_output_on_error, + ) + + +def _validate_path_arguments(src_tree: PathStr, obj_tree: PathStr, root_pa= ths: list[PathStr]) -> None: + """ + Validate that the provided paths exist. + + Args: + src_tree: Absolute path to the source tree. + obj_tree: Absolute path to the object tree. + root_paths: List of root paths relative to obj_tree. + + Raises: + argparse.ArgumentTypeError: If any of the paths don't exist. 
+ """ + if not os.path.exists(src_tree): + raise argparse.ArgumentTypeError(f"--src-tree {src_tree} does not = exist") + if not os.path.exists(obj_tree): + raise argparse.ArgumentTypeError(f"--obj-tree {obj_tree} does not = exist") + for root_path in root_paths: + if not os.path.exists(os.path.join(obj_tree, root_path)): + raise argparse.ArgumentTypeError( + f"path to root artifact {os.path.join(obj_tree, root_path)= } does not exist" + ) --=20 2.34.1 From nobody Sat Feb 7 08:07:12 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F087326299; Mon, 26 Jan 2026 19:35:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456130; cv=none; b=Cnrdvqgo2gYegz6Xnh2UlIcHdTSGDNj2/DJx4QREKFeIuMgeZ9xYaCE85uALaGpZLye5yfYlCh4f2GqA/JdktKvV3tWW5Y+qMaxS+iPWIVQsU1m/MaQYYhy79uuzkZ/3pvzK5PqMUD0y4zSKL0VEiLB0Y44ROATC5g5Iiqj64ik= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456130; c=relaxed/simple; bh=a1CLXFbQ5+y3Pm8Po8T4c1o5FI27N7Dg2WpWMiQme0I=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=K+sQava1CBlIDnQtP4OnPdkC6RV1kcy08Fm6eulV+evkc69wom+Sp25xn04C6Lr2KEM8/MOoJW+purwQ41kV/uFqkn38lvDDMltNZoo08jSMLLOv/ZDS+IfhsKKosgE1LjQL8cidHObaOvhoqJXBbzQOyGNsLWA6NZKRFUcYaUk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=ji3kX5Z3; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=tngtech.com From: Luis Augenstein To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org,
linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v3 05/14] tools/sbom: add additional dependency sources for cmd graph Date: Mon, 26 Jan 2026 20:32:55 +0100 Message-Id: <20260126193304.320916-6-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com> References: <20260126193304.320916-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add hardcoded dependencies and .incbin directive parsing to discover dependencies not tracked by .cmd files. Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- tools/sbom/sbom/cmd_graph/cmd_graph_node.py | 24 +++++- .../sbom/cmd_graph/hardcoded_dependencies.py | 83 +++++++++++++++++++ tools/sbom/sbom/cmd_graph/incbin_parser.py | 42 ++++++++++ tools/sbom/sbom/environment.py | 14 ++++ 4 files changed, 162 insertions(+), 1 deletion(-) create mode 100644 tools/sbom/sbom/cmd_graph/hardcoded_dependencies.py create mode 100644 tools/sbom/sbom/cmd_graph/incbin_parser.py create mode 100644 tools/sbom/sbom/environment.py diff --git a/tools/sbom/sbom/cmd_graph/cmd_graph_node.py b/tools/sbom/sbom/= cmd_graph/cmd_graph_node.py index fdaed0f0ccba..feacdbf76955 100644 --- a/tools/sbom/sbom/cmd_graph/cmd_graph_node.py +++ b/tools/sbom/sbom/cmd_graph/cmd_graph_node.py @@ -9,6 +9,8 @@ from typing import Iterator, Protocol =20 from sbom import sbom_logging from sbom.cmd_graph.cmd_file import CmdFile +from sbom.cmd_graph.hardcoded_dependencies import get_hardcoded_dependenci= es +from sbom.cmd_graph.incbin_parser import parse_incbin_statements from sbom.path_utils import PathStr, is_relative_to =20 =20 @@ -104,14 +106,34 @@ class CmdGraphNode: ) return 
node =20 + # Search for dependencies to add to the graph as child nodes. Chil= d paths are always relative to the output tree. + def _build_child_node(child_path: PathStr) -> "CmdGraphNode": + return CmdGraphNode.create(child_path, config, cache, depth + = 1) + + node.hardcoded_dependencies =3D [ + _build_child_node(hardcoded_dependency_path) + for hardcoded_dependency_path in get_hardcoded_dependencies( + target_path_absolute, config.obj_tree, config.src_tree + ) + ] + if cmd_file is not None: node.cmd_file_dependencies =3D [ - CmdGraphNode.create(cmd_file_dependency_path, config, cach= e, depth + 1) + _build_child_node(cmd_file_dependency_path) for cmd_file_dependency_path in cmd_file.get_dependencies( target_path, config.obj_tree, config.fail_on_unknown_b= uild_command ) ] =20 + if node.absolute_path.endswith(".S"): + node.incbin_dependencies =3D [ + IncbinDependency( + node=3D_build_child_node(incbin_statement.path), + full_statement=3Dincbin_statement.full_statement, + ) + for incbin_statement in parse_incbin_statements(node.absol= ute_path) + ] + return node =20 =20 diff --git a/tools/sbom/sbom/cmd_graph/hardcoded_dependencies.py b/tools/sb= om/sbom/cmd_graph/hardcoded_dependencies.py new file mode 100644 index 000000000000..a5977f14ae49 --- /dev/null +++ b/tools/sbom/sbom/cmd_graph/hardcoded_dependencies.py @@ -0,0 +1,83 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import os +from typing import Callable +import sbom.sbom_logging as sbom_logging +from sbom.path_utils import PathStr, is_relative_to +from sbom.environment import Environment + +HARDCODED_DEPENDENCIES: dict[str, list[str]] =3D { + # defined in linux/Kbuild + "include/generated/rq-offsets.h": ["kernel/sched/rq-offsets.s"], + "kernel/sched/rq-offsets.s": ["include/generated/asm-offsets.h"], + "include/generated/bounds.h": ["kernel/bounds.s"], + "include/generated/asm-offsets.h": ["arch/{arch}/kernel/asm-offsets.s"= ], +} + + +def 
get_hardcoded_dependencies(path: PathStr, obj_tree: PathStr, src_tree: PathStr) -> list[PathStr]: + """ + Some files in the kernel build process are not tracked by the .cmd dependency mechanism. + Parsing these dependencies programmatically is too complex for the scope of this project. + Therefore, this function provides manually defined dependencies to be added to the build graph. + + Args: + path: absolute path to a file within the src tree or object tree. + obj_tree: absolute path to the base directory of the object tree. + src_tree: absolute path to the `linux` source directory. + + Returns: + list[PathStr]: A list of dependency file paths (relative to the object tree) required to build the file at the given path. + """ + if is_relative_to(path, obj_tree): + path = os.path.relpath(path, obj_tree) + elif is_relative_to(path, src_tree): + path = os.path.relpath(path, src_tree) + + if path not in HARDCODED_DEPENDENCIES: + return [] + + template_variables: dict[str, Callable[[], str | None]] = { + "arch": lambda: _get_arch(path), + } + + dependencies: list[PathStr] = [] + for dependency_template in HARDCODED_DEPENDENCIES[path]: + dependency = _evaluate_template(dependency_template, template_variables) + if dependency is None: + continue + if os.path.exists(os.path.join(obj_tree, dependency)): + dependencies.append(dependency) + elif os.path.exists(os.path.join(src_tree, dependency)): + dependencies.append(os.path.relpath(os.path.join(src_tree, dependency), obj_tree)) + else: + sbom_logging.error( + "Skip hardcoded dependency '{dependency}' for '{path}' because the dependency lies neither in the src tree nor the object tree.", + dependency=dependency, + path=path, + ) + + return dependencies + + +def _evaluate_template(template: str, variables: dict[str, Callable[[], str | None]]) -> str | None: + for key, value_function in variables.items(): + template_key = "{" + key + "}" + if template_key in template: + value = value_function() + if value is None: +
return None + template = template.replace(template_key, value) + return template + + +def _get_arch(path: PathStr): + srcarch = Environment.SRCARCH() + if srcarch is None: + sbom_logging.error( + "Skipped architecture specific hardcoded dependency for '{path}' because the SRCARCH environment variable was not set.", + path=path, + ) + return None + return srcarch diff --git a/tools/sbom/sbom/cmd_graph/incbin_parser.py b/tools/sbom/sbom/cmd_graph/incbin_parser.py new file mode 100644 index 000000000000..130f9520837d --- /dev/null +++ b/tools/sbom/sbom/cmd_graph/incbin_parser.py @@ -0,0 +1,42 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +import re + +from sbom.path_utils import PathStr + +INCBIN_PATTERN = re.compile(r'\s*\.incbin\s+"(?P<path>[^"]+)"') +"""Regex pattern for matching `.incbin ""` statements.""" + + +@dataclass +class IncbinStatement: + """A parsed `.incbin ""` directive.""" + + path: PathStr + """path to the file referenced by the `.incbin` directive.""" + + full_statement: str + """Full `.incbin ""` statement as it originally appeared in the file.""" + + +def parse_incbin_statements(absolute_path: PathStr) -> list[IncbinStatement]: + """ + Parses `.incbin` directives from an `.S` assembly file. + + Args: + absolute_path: Absolute path to the `.S` assembly file. + + Returns: + list[IncbinStatement]: Parsed `.incbin` statements.
+ """ + with open(absolute_path, "rt") as f: + content =3D f.read() + return [ + IncbinStatement( + path=3Dmatch.group("path"), + full_statement=3Dmatch.group(0).strip(), + ) + for match in INCBIN_PATTERN.finditer(content) + ] diff --git a/tools/sbom/sbom/environment.py b/tools/sbom/sbom/environment.py new file mode 100644 index 000000000000..b3fb2f0ba61d --- /dev/null +++ b/tools/sbom/sbom/environment.py @@ -0,0 +1,14 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import os + + +class Environment: + """ + Read-only accessor for kernel build environment variables. + """ + + @classmethod + def SRCARCH(cls) -> str | None: + return os.getenv("SRCARCH") --=20 2.34.1 From nobody Sat Feb 7 08:07:12 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B945A285C9F; Mon, 26 Jan 2026 19:35:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456138; cv=none; b=g+/MA5poSRGycNWJEQc475SYxCuQcVqB9KoDMyzyGz2JffGTgTWMfNv5TD/YTXjWmySwWtaZBEOHcuJgBNrJPLQ/BbvAOUZUNkfZRgkVpequw9ALVColmXyMVa6VN7wI1A5YBopoeTGANkqV6WXGkBQQCVXFuaJV0OJTIUDyY7s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456138; c=relaxed/simple; bh=z6SyEuqZWISo2mKzCJ/Rs7e+ADbOnLnBKeF6iFIeX3g=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=BDIJPhX+5dOrEsa8E75f+dLbX5AO7vwCHd5AYeGlZd7Je8QiyOzczc6YQGf3XCIqaYvxQqVLNXyzXclleZ7y0o/yTLj1lhQpTxGzZgDjoxvXFYW0t7I1lougtzMQjf1eYVO+CgEXiLvqLCgNP8GH9a2tbexGT8xOVFAuemldUXM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit 
key) header.d=tngtech.com header.i=@tngtech.com header.b=VOYA7nhx; arc=none smtp.client-ip=148.251.101.236 Received: from
DESKTOP-0O0JV6I.localdomain (ipservice-092-208-231-176.092.208.pools.vodafone-ip.de [92.208.231.176]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id 0F8EB1FA8C7; Mon, 26 Jan 2026 20:35:22 +0100 (CET) From: Luis Augenstein To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v3 06/14] tools/sbom: add SPDX classes Date: Mon, 26 Jan 2026 20:32:56 +0100 Message-Id: <20260126193304.320916-7-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com> References: <20260126193304.320916-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement Python dataclasses to model the SPDX classes required within an SPDX document. The class and property names are consistent with the SPDX 3.0.1 specification. 
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 tools/sbom/sbom/spdx/__init__.py        |   7 +
 tools/sbom/sbom/spdx/build.py           |  17 +++
 tools/sbom/sbom/spdx/core.py            | 182 ++++++++++++++++++++++++
 tools/sbom/sbom/spdx/serialization.py   |  56 ++++++++
 tools/sbom/sbom/spdx/simplelicensing.py |  20 +++
 tools/sbom/sbom/spdx/software.py        |  71 +++++++++
 tools/sbom/sbom/spdx/spdxId.py          |  36 +++++
 7 files changed, 389 insertions(+)
 create mode 100644 tools/sbom/sbom/spdx/__init__.py
 create mode 100644 tools/sbom/sbom/spdx/build.py
 create mode 100644 tools/sbom/sbom/spdx/core.py
 create mode 100644 tools/sbom/sbom/spdx/serialization.py
 create mode 100644 tools/sbom/sbom/spdx/simplelicensing.py
 create mode 100644 tools/sbom/sbom/spdx/software.py
 create mode 100644 tools/sbom/sbom/spdx/spdxId.py

diff --git a/tools/sbom/sbom/spdx/__init__.py b/tools/sbom/sbom/spdx/__init__.py
new file mode 100644
index 000000000000..4097b59f8f17
--- /dev/null
+++ b/tools/sbom/sbom/spdx/__init__.py
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from .spdxId import SpdxId, SpdxIdGenerator
+from .serialization import JsonLdSpdxDocument
+
+__all__ = ["JsonLdSpdxDocument", "SpdxId", "SpdxIdGenerator"]
diff --git a/tools/sbom/sbom/spdx/build.py b/tools/sbom/sbom/spdx/build.py
new file mode 100644
index 000000000000..180a8f1e8bd3
--- /dev/null
+++ b/tools/sbom/sbom/spdx/build.py
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from dataclasses import dataclass, field
+from sbom.spdx.core import DictionaryEntry, Element, Hash
+
+
+@dataclass(kw_only=True)
+class Build(Element):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Build/Classes/Build/"""
+
+    type: str = field(init=False, default="build_Build")
+    build_buildType: str
+    build_buildId: str
+    build_environment: list[DictionaryEntry] = field(default_factory=list[DictionaryEntry])
+    build_configSourceUri: list[str] = field(default_factory=list[str])
+    build_configSourceDigest: list[Hash] = field(default_factory=list[Hash])
diff --git a/tools/sbom/sbom/spdx/core.py b/tools/sbom/sbom/spdx/core.py
new file mode 100644
index 000000000000..c5de9194bb89
--- /dev/null
+++ b/tools/sbom/sbom/spdx/core.py
@@ -0,0 +1,182 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+from typing import Any, Literal
+from sbom.spdx.spdxId import SpdxId
+
+SPDX_SPEC_VERSION = "3.0.1"
+
+ExternalIdentifierType = Literal["email", "gitoid", "urlScheme"]
+HashAlgorithm = Literal["sha256", "sha512"]
+ProfileIdentifierType = Literal["core", "software", "build", "lite", "simpleLicensing"]
+RelationshipType = Literal[
+    "contains",
+    "generates",
+    "hasDeclaredLicense",
+    "hasInput",
+    "hasOutput",
+    "ancestorOf",
+    "hasDistributionArtifact",
+    "dependsOn",
+]
+RelationshipCompleteness = Literal["complete", "incomplete", "noAssertion"]
+
+
+@dataclass
+class SpdxObject:
+    def to_dict(self) -> dict[str, Any]:
+        def _to_dict(v: Any):
+            return v.to_dict() if hasattr(v, "to_dict") else v
+
+        d: dict[str, Any] = {}
+        for field_name in self.__dataclass_fields__:
+            value = getattr(self, field_name)
+            if not value:
+                continue
+
+            if isinstance(value, Element):
+                d[field_name] = value.spdxId
+            elif isinstance(value, list) and len(value) > 0 and isinstance(value[0], Element):  # type: ignore
+                value: list[Element] = value
+                d[field_name] = [v.spdxId for v in value]
+            else:
+                d[field_name] = [_to_dict(v) for v in value] if isinstance(value, list) else _to_dict(value)  # type: ignore
+        return d
+
+
+@dataclass(kw_only=True)
+class IntegrityMethod(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/IntegrityMethod/"""
+
+
+@dataclass(kw_only=True)
+class Hash(IntegrityMethod):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Hash/"""
+
+    type: str = field(init=False, default="Hash")
+    hashValue: str
+    algorithm: HashAlgorithm
+
+
+@dataclass(kw_only=True)
+class Element(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Element/"""
+
+    type: str = field(init=False, default="Element")
+    spdxId: SpdxId
+    creationInfo: str = "_:creationinfo"
+    name: str | None = None
+    verifiedUsing: list[Hash] = field(default_factory=list[Hash])
+    comment: str | None = None
+
+
+@dataclass(kw_only=True)
+class ExternalMap(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/ExternalMap/"""
+
+    type: str = field(init=False, default="ExternalMap")
+    externalSpdxId: SpdxId
+
+
+@dataclass(kw_only=True)
+class NamespaceMap(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/NamespaceMap/"""
+
+    type: str = field(init=False, default="NamespaceMap")
+    prefix: str
+    namespace: str
+
+
+@dataclass(kw_only=True)
+class ElementCollection(Element):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/ElementCollection/"""
+
+    type: str = field(init=False, default="ElementCollection")
+    element: list[Element] = field(default_factory=list[Element])
+    rootElement: list[Element] = field(default_factory=list[Element])
+    profileConformance: list[ProfileIdentifierType] = field(default_factory=list[ProfileIdentifierType])
+
+
+@dataclass(kw_only=True)
+class SpdxDocument(ElementCollection):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/SpdxDocument/"""
+
+    type: str = field(init=False, default="SpdxDocument")
+    import_: list[ExternalMap] = field(default_factory=list[ExternalMap])
+    namespaceMap: list[NamespaceMap] = field(default_factory=list[NamespaceMap])
+
+    def to_dict(self) -> dict[str, Any]:
+        return {("import" if k == "import_" else k): v for k, v in super().to_dict().items()}
+
+
+@dataclass(kw_only=True)
+class ExternalIdentifier(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/ExternalIdentifier/"""
+
+    type: str = field(init=False, default="ExternalIdentifier")
+    externalIdentifierType: ExternalIdentifierType
+    identifier: str
+
+
+@dataclass(kw_only=True)
+class Agent(Element):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Agent/"""
+
+    type: str = field(init=False, default="Agent")
+    externalIdentifier: list[ExternalIdentifier] = field(default_factory=list[ExternalIdentifier])
+
+
+@dataclass(kw_only=True)
+class SoftwareAgent(Agent):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/SoftwareAgent/"""
+
+    type: str = field(init=False, default="SoftwareAgent")
+
+
+@dataclass(kw_only=True)
+class CreationInfo(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/CreationInfo/"""
+
+    type: str = field(init=False, default="CreationInfo")
+    id: SpdxId = "_:creationinfo"
+    specVersion: str = SPDX_SPEC_VERSION
+    createdBy: list[Agent]
+    created: str = field(default_factory=lambda: datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
+    comment: str | None = None
+
+    def to_dict(self) -> dict[str, Any]:
+        return {("@id" if k == "id" else k): v for k, v in super().to_dict().items()}
+
+
+@dataclass(kw_only=True)
+class Relationship(Element):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Relationship/"""
+
+    type: str = field(init=False, default="Relationship")
+    relationshipType: RelationshipType
+    from_: Element  # underscore because 'from' is a reserved keyword
+    to: list[Element]
+    completeness: RelationshipCompleteness | None = None
+
+    def to_dict(self) -> dict[str, Any]:
+        return {("from" if k == "from_" else k): v for k, v in super().to_dict().items()}
+
+
+@dataclass(kw_only=True)
+class Artifact(Element):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Artifact/"""
+
+    type: str = field(init=False, default="Artifact")
+    builtTime: str | None = None
+    originatedBy: list[Agent] = field(default_factory=list[Agent])
+
+
+@dataclass(kw_only=True)
+class DictionaryEntry(SpdxObject):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/DictionaryEntry/"""
+
+    type: str = field(init=False, default="DictionaryEntry")
+    key: str
+    value: str
diff --git a/tools/sbom/sbom/spdx/serialization.py b/tools/sbom/sbom/spdx/serialization.py
new file mode 100644
index 000000000000..c830d6b3cf19
--- /dev/null
+++ b/tools/sbom/sbom/spdx/serialization.py
@@ -0,0 +1,56 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+import json
+from typing import Any
+from sbom.path_utils import PathStr
+from sbom.spdx.core import SPDX_SPEC_VERSION, SpdxDocument, SpdxObject
+
+
+class JsonLdSpdxDocument:
+    """Represents an SPDX document in JSON-LD format for serialization."""
+
+    context: list[str | dict[str, str]]
+    graph: list[SpdxObject]
+
+    def __init__(self, graph: list[SpdxObject]) -> None:
+        """
+        Initialize a JSON-LD SPDX document from a graph of SPDX objects.
+        The graph must contain a single SpdxDocument element.
+
+        Args:
+            graph: List of SPDX objects representing the complete SPDX document.
+        """
+        self.graph = graph
+        spdx_document = next(element for element in graph if isinstance(element, SpdxDocument))
+        self.context = [
+            f"https://spdx.org/rdf/{SPDX_SPEC_VERSION}/spdx-context.jsonld",
+            {namespaceMap.prefix: namespaceMap.namespace for namespaceMap in spdx_document.namespaceMap},
+        ]
+        spdx_document.namespaceMap = []
+
+    def to_dict(self) -> dict[str, Any]:
+        """
+        Convert the SPDX document to a dictionary representation suitable for JSON serialization.
+
+        Returns:
+            Dictionary with @context and @graph keys following JSON-LD format.
+        """
+        return {
+            "@context": self.context,
+            "@graph": [item.to_dict() for item in self.graph],
+        }
+
+    def save(self, path: PathStr, prettify: bool) -> None:
+        """
+        Save the SPDX document to a JSON file.
+
+        Args:
+            path: File path where the document will be saved.
+            prettify: Whether to pretty-print the JSON with indentation.
+        """
+        with open(path, "w", encoding="utf-8") as f:
+            if prettify:
+                json.dump(self.to_dict(), f, indent=2)
+            else:
+                json.dump(self.to_dict(), f, separators=(",", ":"))
diff --git a/tools/sbom/sbom/spdx/simplelicensing.py b/tools/sbom/sbom/spdx/simplelicensing.py
new file mode 100644
index 000000000000..750ddd24ad89
--- /dev/null
+++ b/tools/sbom/sbom/spdx/simplelicensing.py
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from dataclasses import dataclass, field
+from sbom.spdx.core import Element
+
+
+@dataclass(kw_only=True)
+class AnyLicenseInfo(Element):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/SimpleLicensing/Classes/AnyLicenseInfo/"""
+
+    type: str = field(init=False, default="simplelicensing_AnyLicenseInfo")
+
+
+@dataclass(kw_only=True)
+class LicenseExpression(AnyLicenseInfo):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/SimpleLicensing/Classes/LicenseExpression/"""
+
+    type: str = field(init=False, default="simplelicensing_LicenseExpression")
+    simplelicensing_licenseExpression: str
diff --git a/tools/sbom/sbom/spdx/software.py b/tools/sbom/sbom/spdx/software.py
new file mode 100644
index 000000000000..208e0168b939
--- /dev/null
+++ b/tools/sbom/sbom/spdx/software.py
@@ -0,0 +1,71 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from dataclasses import dataclass, field
+from typing import Literal
+from sbom.spdx.core import Artifact, ElementCollection, IntegrityMethod
+
+
+SbomType = Literal["source", "build"]
+FileKindType = Literal["file", "directory"]
+SoftwarePurpose = Literal[
+    "source",
+    "archive",
+    "library",
+    "file",
+    "data",
+    "configuration",
+    "executable",
+    "module",
+    "application",
+    "documentation",
+    "other",
+]
+ContentIdentifierType = Literal["gitoid", "swhid"]
+
+
+@dataclass(kw_only=True)
+class Sbom(ElementCollection):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/Sbom/"""
+
+    type: str = field(init=False, default="software_Sbom")
+    software_sbomType: list[SbomType] = field(default_factory=list[SbomType])
+
+
+@dataclass(kw_only=True)
+class ContentIdentifier(IntegrityMethod):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/ContentIdentifier/"""
+
+    type: str = field(init=False, default="software_ContentIdentifier")
+    software_contentIdentifierType: ContentIdentifierType
+    software_contentIdentifierValue: str
+
+
+@dataclass(kw_only=True)
+class SoftwareArtifact(Artifact):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/SoftwareArtifact/"""
+
+    type: str = field(init=False, default="software_Artifact")
+    software_primaryPurpose: SoftwarePurpose | None = None
+    software_additionalPurpose: list[SoftwarePurpose] = field(default_factory=list[SoftwarePurpose])
+    software_copyrightText: str | None = None
+    software_contentIdentifier: list[ContentIdentifier] = field(default_factory=list[ContentIdentifier])
+
+
+@dataclass(kw_only=True)
+class Package(SoftwareArtifact):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/Package/"""
+
+    type: str = field(init=False, default="software_Package")
+    name: str  # type: ignore
+    software_packageVersion: str | None = None
+    software_downloadLocation: str | None = None
+
+
+@dataclass(kw_only=True)
+class File(SoftwareArtifact):
+    """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/File/"""
+
+    type: str = field(init=False, default="software_File")
+    name: str  # type: ignore
+    software_fileKind: FileKindType | None = None
diff --git a/tools/sbom/sbom/spdx/spdxId.py b/tools/sbom/sbom/spdx/spdxId.py
new file mode 100644
index 000000000000..589e85c5f706
--- /dev/null
+++ b/tools/sbom/sbom/spdx/spdxId.py
@@ -0,0 +1,36 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from itertools import count
+from typing import Iterator
+
+SpdxId = str
+
+
+class SpdxIdGenerator:
+    _namespace: str
+    _prefix: str | None = None
+    _counter: Iterator[int]
+
+    def __init__(self, namespace: str, prefix: str | None = None) -> None:
+        """
+        Initialize the SPDX ID generator with a namespace.
+
+        Args:
+            namespace: The full namespace to use for generated IDs.
+            prefix: Optional. If provided, generated IDs will use this prefix instead of the full namespace.
+        """
+        self._namespace = namespace
+        self._prefix = prefix
+        self._counter = count(0)
+
+    def generate(self) -> SpdxId:
+        return f"{f'{self._prefix}:' if self._prefix else self._namespace}{next(self._counter)}"
+
+    @property
+    def prefix(self) -> str | None:
+        return self._prefix
+
+    @property
+    def namespace(self) -> str:
+        return self._namespace
-- 
2.34.1

From nobody Sat Feb 7 08:07:12 2026
From: Luis Augenstein
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, gregkh@linuxfoundation.org,
 maximilian.huber@tngtech.com, Luis Augenstein
Subject: [PATCH v3 07/14] tools/sbom: add JSON-LD serialization
Date: Mon, 26 Jan 2026 20:32:57 +0100
Message-Id: <20260126193304.320916-8-luis.augenstein@tngtech.com>
In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com>
References: <20260126193304.320916-1-luis.augenstein@tngtech.com>

Add infrastructure to serialize an SPDX graph as a JSON-LD document.

NamespaceMaps in the SPDX document are converted to custom prefixes in
the @context field of the JSON-LD output. The SBOM tool uses
NamespaceMaps solely to shorten SPDX IDs, avoiding repetition of full
namespace URIs by using short prefixes.

Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 tools/sbom/Makefile                           |  3 +-
 tools/sbom/sbom.py                            | 52 +++++++++++++++++
 tools/sbom/sbom/config.py                     | 56 +++++++++++++++++++
 tools/sbom/sbom/path_utils.py                 | 11 ++++
 tools/sbom/sbom/spdx_graph/__init__.py        |  7 +++
 .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 36 ++++++++++++
 .../sbom/sbom/spdx_graph/spdx_graph_model.py  | 36 ++++++++++++
 7 files changed, 200 insertions(+), 1 deletion(-)
 create mode 100644 tools/sbom/sbom/path_utils.py
 create mode 100644 tools/sbom/sbom/spdx_graph/__init__.py
 create mode 100644 tools/sbom/sbom/spdx_graph/build_spdx_graphs.py
 create mode 100644 tools/sbom/sbom/spdx_graph/spdx_graph_model.py

diff --git a/tools/sbom/Makefile b/tools/sbom/Makefile
index cc4a632533ba..1fef44cede46 100644
--- a/tools/sbom/Makefile
+++ b/tools/sbom/Makefile
@@ -33,7 +33,8 @@ $(SBOM_TARGETS) &: $(SBOM_DEPS)
 		--src-tree $(srctree) \
 		--obj-tree $(objtree) \
 		--roots-file $(SBOM_ROOTS_FILE) \
-		--output-directory $(objtree)
+		--output-directory $(objtree) \
+		--generate-spdx
 
 	@rm $(SBOM_ROOTS_FILE)
 
diff --git a/tools/sbom/sbom.py b/tools/sbom/sbom.py
index 25d912a282de..426521ade460 100644
--- a/tools/sbom/sbom.py
+++ b/tools/sbom/sbom.py
@@ -6,13 +6,18 @@
 Compute software bill of materials in SPDX format describing a kernel build.
 """
 
+import json
 import logging
 import os
 import sys
 import time
+import uuid
 import sbom.sbom_logging as sbom_logging
 from sbom.config import get_config
 from sbom.path_utils import is_relative_to
+from sbom.spdx import JsonLdSpdxDocument, SpdxIdGenerator
+from sbom.spdx.core import CreationInfo, SpdxDocument
+from sbom.spdx_graph import SpdxIdGeneratorCollection, build_spdx_graphs
 from sbom.cmd_graph import CmdGraph
 
 
@@ -56,10 +61,57 @@ def main():
             f.write("\n".join(str(file_path) for file_path in used_files))
         logging.debug(f"Successfully saved {used_files_path}")
 
+    if config.generate_spdx is False:
+        return
+
+    # Build SPDX Documents
+    logging.debug("Start generating SPDX graph based on cmd graph")
+    start_time = time.time()
+
+    # The real uuid will be generated based on the content of the SPDX graphs
+    # to ensure that the same SPDX document is always assigned the same uuid.
+    PLACEHOLDER_UUID = "00000000-0000-0000-0000-000000000000"
+    spdx_id_base_namespace = f"{config.spdxId_prefix}{PLACEHOLDER_UUID}/"
+    spdx_id_generators = SpdxIdGeneratorCollection(
+        base=SpdxIdGenerator(prefix="p", namespace=spdx_id_base_namespace),
+        source=SpdxIdGenerator(prefix="s", namespace=f"{spdx_id_base_namespace}source/"),
+        build=SpdxIdGenerator(prefix="b", namespace=f"{spdx_id_base_namespace}build/"),
+        output=SpdxIdGenerator(prefix="o", namespace=f"{spdx_id_base_namespace}output/"),
+    )
+
+    spdx_graphs = build_spdx_graphs(
+        cmd_graph,
+        spdx_id_generators,
+        config,
+    )
+    spdx_id_uuid = uuid.uuid5(
+        uuid.NAMESPACE_URL,
+        "".join(
+            json.dumps(element.to_dict()) for spdx_graph in spdx_graphs.values() for element in spdx_graph.to_list()
+        ),
+    )
+    logging.debug(f"Generated SPDX graph in {time.time() - start_time} seconds")
+
     # Report collected warnings and errors in case of failure
     warning_summary = sbom_logging.summarize_warnings()
     error_summary = sbom_logging.summarize_errors()
 
+    if not sbom_logging.has_errors() or config.write_output_on_error:
+        for kernel_sbom_kind, spdx_graph in spdx_graphs.items():
+            spdx_graph_objects = spdx_graph.to_list()
+            # Add warning and error summary to creation info comment
+            creation_info = next(element for element in spdx_graph_objects if isinstance(element, CreationInfo))
+            creation_info.comment = "\n".join([warning_summary, error_summary]).strip()
+            # Replace Placeholder uuid with real uuid for spdxIds
+            spdx_document = next(element for element in spdx_graph_objects if isinstance(element, SpdxDocument))
+            for namespaceMap in spdx_document.namespaceMap:
+                namespaceMap.namespace = namespaceMap.namespace.replace(PLACEHOLDER_UUID, str(spdx_id_uuid))
+            # Serialize SPDX graph to JSON-LD
+            spdx_doc = JsonLdSpdxDocument(graph=spdx_graph_objects)
+            save_path = os.path.join(config.output_directory, config.spdx_file_names[kernel_sbom_kind])
+            spdx_doc.save(save_path, config.prettify_json)
+            logging.debug(f"Successfully saved {save_path}")
+
     if warning_summary:
         logging.warning(warning_summary)
     if error_summary:
diff --git a/tools/sbom/sbom/config.py b/tools/sbom/sbom/config.py
index 39e556a4c53b..0985457c3cae 100644
--- a/tools/sbom/sbom/config.py
+++ b/tools/sbom/sbom/config.py
@@ -3,11 +3,18 @@
 
 import argparse
 from dataclasses import dataclass
+from enum import Enum
 import os
 from typing import Any
 from sbom.path_utils import PathStr
 
 
+class KernelSpdxDocumentKind(Enum):
+    SOURCE = "source"
+    BUILD = "build"
+    OUTPUT = "output"
+
+
 @dataclass
 class KernelSbomConfig:
     src_tree: PathStr
@@ -19,6 +26,13 @@ class KernelSbomConfig:
     root_paths: list[PathStr]
     """List of paths to root outputs (relative to obj_tree) to base the SBOM on."""
 
+    generate_spdx: bool
+    """Whether to generate SPDX SBOM documents. If False, no SPDX files are created."""
+
+    spdx_file_names: dict[KernelSpdxDocumentKind, str]
+    """If `generate_spdx` is True, defines the file names for each SPDX SBOM kind
+    (source, build, output) to store on disk."""
+
     generate_used_files: bool
     """Whether to generate a flat list of all source files used in the build.
     If False, no used-files document is created."""
@@ -38,6 +52,12 @@ class KernelSbomConfig:
     write_output_on_error: bool
     """Whether to write output documents even if errors occur."""
 
+    spdxId_prefix: str
+    """Prefix to use for all SPDX element IDs."""
+
+    prettify_json: bool
+    """Whether to pretty-print generated SPDX JSON documents."""
+
 
 def _parse_cli_arguments() -> dict[str, Any]:
     """
@@ -72,6 +92,15 @@ def _parse_cli_arguments() -> dict[str, Any]:
         "--roots-file",
         help="Path to a file containing the root paths (one per line). Cannot be used together with --roots.",
     )
+    parser.add_argument(
+        "--generate-spdx",
+        action="store_true",
+        default=False,
+        help=(
+            "Whether to create sbom-source.spdx.json, sbom-build.spdx.json and "
+            "sbom-output.spdx.json documents (default: False)"
+        ),
+    )
     parser.add_argument(
         "--generate-used-files",
         action="store_true",
@@ -119,6 +148,20 @@ def _parse_cli_arguments() -> dict[str, Any]:
         ),
     )
 
+    # SPDX specific options
+    spdx_group = parser.add_argument_group("SPDX options", "Options for customizing SPDX document generation")
+    spdx_group.add_argument(
+        "--spdxId-prefix",
+        default="urn:spdx.dev:",
+        help="The prefix to use for all spdxId properties. (default: urn:spdx.dev:)",
+    )
+    spdx_group.add_argument(
+        "--prettify-json",
+        action="store_true",
+        default=False,
+        help="Whether to pretty print the generated spdx.json documents (default: False)",
+    )
+
     args = vars(parser.parse_args())
     return args
 
@@ -144,6 +187,7 @@ def get_config() -> KernelSbomConfig:
     root_paths = args["roots"]
     _validate_path_arguments(src_tree, obj_tree, root_paths)
 
+    generate_spdx = args["generate_spdx"]
     generate_used_files = args["generate_used_files"]
     output_directory = os.path.realpath(args["output_directory"])
     debug = args["debug"]
@@ -151,19 +195,31 @@ def get_config() -> KernelSbomConfig:
     fail_on_unknown_build_command = not args["do_not_fail_on_unknown_build_command"]
     write_output_on_error = args["write_output_on_error"]
 
+    spdxId_prefix = args["spdxId_prefix"]
+    prettify_json = args["prettify_json"]
+
     # Hardcoded config
+    spdx_file_names = {
+        KernelSpdxDocumentKind.SOURCE: "sbom-source.spdx.json",
+        KernelSpdxDocumentKind.BUILD: "sbom-build.spdx.json",
+        KernelSpdxDocumentKind.OUTPUT: "sbom-output.spdx.json",
+    }
     used_files_file_name = "sbom.used-files.txt"
 
     return KernelSbomConfig(
         src_tree=src_tree,
         obj_tree=obj_tree,
         root_paths=root_paths,
+        generate_spdx=generate_spdx,
+        spdx_file_names=spdx_file_names,
         generate_used_files=generate_used_files,
         used_files_file_name=used_files_file_name,
         output_directory=output_directory,
         debug=debug,
         fail_on_unknown_build_command=fail_on_unknown_build_command,
         write_output_on_error=write_output_on_error,
+        spdxId_prefix=spdxId_prefix,
+        prettify_json=prettify_json,
     )
 
 
diff --git a/tools/sbom/sbom/path_utils.py b/tools/sbom/sbom/path_utils.py
new file mode 100644
index 000000000000..d28d67b25398
--- /dev/null
+++ b/tools/sbom/sbom/path_utils.py
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+import os
+
+PathStr = str
+"""Filesystem path represented as a plain string for better performance than pathlib.Path."""
+
+
+def is_relative_to(path: PathStr, base: PathStr) -> bool:
+    return os.path.commonpath([path, base]) == base
diff --git a/tools/sbom/sbom/spdx_graph/__init__.py b/tools/sbom/sbom/spdx_graph/__init__.py
new file mode 100644
index 000000000000..3557b1d51bf9
--- /dev/null
+++ b/tools/sbom/sbom/spdx_graph/__init__.py
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from .build_spdx_graphs import build_spdx_graphs
+from .spdx_graph_model import SpdxIdGeneratorCollection
+
+__all__ = ["build_spdx_graphs", "SpdxIdGeneratorCollection"]
diff --git a/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py b/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py
new file mode 100644
index 000000000000..bb3db4e423da
--- /dev/null
+++ b/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py
@@ -0,0 +1,36 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+
+from typing import Protocol
+
+from sbom.config import KernelSpdxDocumentKind
+from sbom.cmd_graph import CmdGraph
+from sbom.path_utils import PathStr
+from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCollection
+
+
+class SpdxGraphConfig(Protocol):
+    obj_tree: PathStr
+    src_tree: PathStr
+
+
+def build_spdx_graphs(
+    cmd_graph: CmdGraph,
+    spdx_id_generators: SpdxIdGeneratorCollection,
+    config: SpdxGraphConfig,
+) -> dict[KernelSpdxDocumentKind, SpdxGraph]:
+    """
+    Builds SPDX graphs (output, source, and build) based on a cmd dependency graph.
+    If the source and object trees are identical, no dedicated source graph can be created.
+    In that case the source files are added to the build graph instead.
+
+    Args:
+        cmd_graph: The dependency graph of a kernel build.
+        spdx_id_generators: Collection of SPDX ID generators.
+        config: Configuration options.
+
+    Returns:
+        Dictionary of SPDX graphs
+    """
+    return {}
diff --git a/tools/sbom/sbom/spdx_graph/spdx_graph_model.py b/tools/sbom/sbom/spdx_graph/spdx_graph_model.py
new file mode 100644
index 000000000000..682194d4362a
--- /dev/null
+++ b/tools/sbom/sbom/spdx_graph/spdx_graph_model.py
@@ -0,0 +1,36 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from dataclasses import dataclass
+from sbom.spdx.core import CreationInfo, SoftwareAgent, SpdxDocument, SpdxObject
+from sbom.spdx.software import Sbom
+from sbom.spdx.spdxId import SpdxIdGenerator
+
+
+@dataclass
+class SpdxGraph:
+    """Represents the complete graph of a single SPDX document."""
+
+    spdx_document: SpdxDocument
+    agent: SoftwareAgent
+    creation_info: CreationInfo
+    sbom: Sbom
+
+    def to_list(self) -> list[SpdxObject]:
+        return [
+            self.spdx_document,
+            self.agent,
+            self.creation_info,
+            self.sbom,
+            *self.sbom.element,
+        ]
+
+
+@dataclass
+class SpdxIdGeneratorCollection:
+    """Holds SPDX ID generators for different document types to ensure globally unique SPDX IDs."""
+
+    base: SpdxIdGenerator
+    source: SpdxIdGenerator
+    build: SpdxIdGenerator
+    output: SpdxIdGenerator
-- 
2.34.1

From nobody Sat Feb 7 08:07:12 2026
From: Luis Augenstein
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, gregkh@linuxfoundation.org,
 maximilian.huber@tngtech.com, Luis Augenstein
Subject: [PATCH v3 08/14] tools/sbom: add shared SPDX elements
Date: Mon, 26 Jan 2026 20:32:58 +0100
Message-Id: <20260126193304.320916-9-luis.augenstein@tngtech.com>
In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com>
References: <20260126193304.320916-1-luis.augenstein@tngtech.com>
MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement shared SPDX elements used in all three documents. Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- tools/sbom/sbom/config.py | 25 +++++++++++++++ .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 5 ++- .../sbom/spdx_graph/shared_spdx_elements.py | 32 +++++++++++++++++++ 3 files changed, 61 insertions(+), 1 deletion(-) create mode 100644 tools/sbom/sbom/spdx_graph/shared_spdx_elements.py diff --git a/tools/sbom/sbom/config.py b/tools/sbom/sbom/config.py index 0985457c3cae..9278e2be7cb2 100644 --- a/tools/sbom/sbom/config.py +++ b/tools/sbom/sbom/config.py @@ -3,6 +3,7 @@ =20 import argparse from dataclasses import dataclass +from datetime import datetime from enum import Enum import os from typing import Any @@ -52,6 +53,9 @@ class KernelSbomConfig: write_output_on_error: bool """Whether to write output documents even if errors occur.""" =20 + created: datetime + """Datetime to use for the SPDX created property of the CreationInfo e= lement.""" + spdxId_prefix: str """Prefix to use for all SPDX element IDs.""" =20 @@ -150,6 +154,16 @@ def _parse_cli_arguments() -> dict[str, Any]: =20 # SPDX specific options spdx_group =3D parser.add_argument_group("SPDX options", "Options for = customizing SPDX document generation") + spdx_group.add_argument( + "--created", + default=3DNone, + help=3D( + "The SPDX created property to use for the CreationInfo element= in " + "ISO format (YYYY-MM-DD [HH:MM:SS]).\n" + "If not provided the last modification time of the first root = output " + "is used. 
(default: None)" + ), + ) spdx_group.add_argument( "--spdxId-prefix", default=3D"urn:spdx.dev:", @@ -195,6 +209,16 @@ def get_config() -> KernelSbomConfig: fail_on_unknown_build_command =3D not args["do_not_fail_on_unknown_bui= ld_command"] write_output_on_error =3D args["write_output_on_error"] =20 + if args["created"] is None: + created =3D datetime.fromtimestamp(os.path.getmtime(os.path.join(o= bj_tree, root_paths[0]))) + else: + try: + created =3D datetime.fromisoformat(args["created"]) + except ValueError: + raise argparse.ArgumentTypeError( + f"Invalid date format for argument '--created': '{args['cr= eated']}'. " + "Expected ISO format (YYYY-MM-DD [HH:MM:SS])." + ) spdxId_prefix =3D args["spdxId_prefix"] prettify_json =3D args["prettify_json"] =20 @@ -218,6 +242,7 @@ def get_config() -> KernelSbomConfig: debug=3Ddebug, fail_on_unknown_build_command=3Dfail_on_unknown_build_command, write_output_on_error=3Dwrite_output_on_error, + created=3Dcreated, spdxId_prefix=3DspdxId_prefix, prettify_json=3Dprettify_json, ) diff --git a/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py b/tools/sbom/s= bom/spdx_graph/build_spdx_graphs.py index bb3db4e423da..9c47258a31c6 100644 --- a/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ b/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -1,18 +1,20 @@ # SPDX-License-Identifier: GPL-2.0-only OR MIT # Copyright (C) 2025 TNG Technology Consulting GmbH =20 - +from datetime import datetime from typing import Protocol =20 from sbom.config import KernelSpdxDocumentKind from sbom.cmd_graph import CmdGraph from sbom.path_utils import PathStr from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection +from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements =20 =20 class SpdxGraphConfig(Protocol): obj_tree: PathStr src_tree: PathStr + created: datetime =20 =20 def build_spdx_graphs( @@ -33,4 +35,5 @@ def build_spdx_graphs( Returns: Dictionary of SPDX graphs """ + shared_elements =3D 
SharedSpdxElements.create(spdx_id_generators.base,= config.created) return {} diff --git a/tools/sbom/sbom/spdx_graph/shared_spdx_elements.py b/tools/sbo= m/sbom/spdx_graph/shared_spdx_elements.py new file mode 100644 index 000000000000..0c83428f4c70 --- /dev/null +++ b/tools/sbom/sbom/spdx_graph/shared_spdx_elements.py @@ -0,0 +1,32 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from datetime import datetime +from sbom.spdx.core import CreationInfo, SoftwareAgent +from sbom.spdx.spdxId import SpdxIdGenerator + + +@dataclass(frozen=3DTrue) +class SharedSpdxElements: + agent: SoftwareAgent + creation_info: CreationInfo + + @classmethod + def create(cls, spdx_id_generator: SpdxIdGenerator, created: datetime)= -> "SharedSpdxElements": + """ + Creates shared SPDX elements used across multiple documents. + + Args: + spdx_id_generator: Generator for creating SPDX IDs. + created: SPDX 'created' property used for the creation info. + + Returns: + SharedSpdxElements with agent and creation info. 
+ """ + agent =3D SoftwareAgent( + spdxId=3Dspdx_id_generator.generate(), + name=3D"KernelSbom", + ) + creation_info =3D CreationInfo(createdBy=3D[agent], created=3Dcrea= ted.strftime("%Y-%m-%dT%H:%M:%SZ")) + return SharedSpdxElements(agent=3Dagent, creation_info=3Dcreation_= info) --=20 2.34.1 From nobody Sat Feb 7 08:07:12 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2A7B7279DB4; Mon, 26 Jan 2026 19:35:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456138; cv=none; b=j1a8pS5LOcqorCF7unIzRDZW2tlHiMX5v8PfY3RE2k32aT+9CnNt5a+uJDTa5S6TLIVzsiB/iU3KPvr7SSYORdjDT+Oo+ZUp44I1H4cXrtz8J7zy3U1ZIu/tGUivYAW1NlznxgBbSwQ/sqHsde8Jx4wz+aJIBCZjAUqg9ntHZLo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456138; c=relaxed/simple; bh=43IjMcgeRygQOUsk3eXgdsrwNkFJC4lZjRbyJSdH1f8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=DS82QG/mqU40HUODl+3yJPL972huRzgekV1pvEdF/snO526ytW90ivTHtkzdbdXDaujAutbBb66tJdp0I4iiRHn/ijxg57cG7YebT6HVMkmAPAvk/jDWNFwDd06M9+oy9378cVzsHyxfyaPg8UZ9zNq9w0KczeFyScFss/UNDbA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=I2H/Vwck; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com 
header.b="I2H/Vwck" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id 46DA43FAF1; Mon, 26 Jan 2026 20:35:30 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 274CC1F8839; Mon, 26 Jan 2026 20:35:28 +0100 (CET) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id HxzUARiz2WLU; Mon, 26 Jan 2026 20:35:26 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id C22161FA6D0; Mon, 26 Jan 2026 20:35:26 +0100 (CET) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz C22161FA6D0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1769456126; bh=U14Rf/iGMHZtXdPtnYpV+SUKdW39dNHsGWJZ6qEpyDc=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=I2H/VwckyYi/qDoUukmR4SyiW94zcEWMuawpzVPkm6R/kz0hdjndcSLqnlTRQZZc3 kr9m6/Yeu7DD/gFo5P/Xl6md0xtOgjtTaOboT41aitLhBy0TesUqqr2SzebrgVjUSo Xh5KcxTjWykvU2CllY1sdYWSi3g7bksjyUbl2cv42+LbvyVPXPrrP34WslA7Dfh6oR MNI1swwmQKqsXWJxaW2t4+X1q9Oeyk4J1tVXlQFLS7NaupP4p2508yeGEQBqTqv7pM gR/U4S3Cs4mYCGXzARUM6NLfAx8AahZ3MsqEboatKvmlViJHDzBbq4nhRvpX6xATiv ofMps5C1Ionfg== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id 2W-7no0dTjbl; Mon, 26 Jan 2026 20:35:26 +0100 (CET) Received: from DESKTOP-0O0JV6I.localdomain (ipservice-092-208-231-176.092.208.pools.vodafone-ip.de [92.208.231.176]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id 33D5C1F8839; Mon, 26 Jan 2026 20:35:26 +0100 (CET) From: Luis Augenstein To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: 
[PATCH v3 09/14] tools/sbom: collect file metadata Date: Mon, 26 Jan 2026 20:32:59 +0100 Message-Id: <20260126193304.320916-10-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com> References: <20260126193304.320916-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement the kernel_file module that collects file metadata, including license identifier for source files, SHA-256 hash, Git blob object ID, an estimation of the file type, and whether files belong to the source, build, or output SBOM. Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 2 + tools/sbom/sbom/spdx_graph/kernel_file.py | 310 ++++++++++++++++++ 2 files changed, 312 insertions(+) create mode 100644 tools/sbom/sbom/spdx_graph/kernel_file.py diff --git a/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py b/tools/sbom/s= bom/spdx_graph/build_spdx_graphs.py index 9c47258a31c6..0f95f99d560a 100644 --- a/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ b/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -7,6 +7,7 @@ from typing import Protocol from sbom.config import KernelSpdxDocumentKind from sbom.cmd_graph import CmdGraph from sbom.path_utils import PathStr +from sbom.spdx_graph.kernel_file import KernelFileCollection from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements =20 @@ -36,4 +37,5 @@ def build_spdx_graphs( Dictionary of SPDX graphs """ shared_elements =3D SharedSpdxElements.create(spdx_id_generators.base,= config.created) + kernel_files =3D KernelFileCollection.create(cmd_graph, config.obj_tre= e, config.src_tree, spdx_id_generators) 
return {} diff --git a/tools/sbom/sbom/spdx_graph/kernel_file.py b/tools/sbom/sbom/sp= dx_graph/kernel_file.py new file mode 100644 index 000000000000..84582567bc4d --- /dev/null +++ b/tools/sbom/sbom/spdx_graph/kernel_file.py @@ -0,0 +1,310 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from enum import Enum +import hashlib +import os +import re +from sbom.cmd_graph import CmdGraph +from sbom.path_utils import PathStr, is_relative_to +from sbom.spdx import SpdxId, SpdxIdGenerator +from sbom.spdx.core import Hash +from sbom.spdx.software import ContentIdentifier, File, SoftwarePurpose +import sbom.sbom_logging as sbom_logging +from sbom.spdx_graph.spdx_graph_model import SpdxIdGeneratorCollection + + +class KernelFileLocation(Enum): + """Represents the location of a file relative to the source/object tre= es.""" + + SOURCE_TREE =3D "source_tree" + """File is located in the source tree.""" + OBJ_TREE =3D "obj_tree" + """File is located in the object tree.""" + EXTERNAL =3D "external" + """File is located outside both source and object trees.""" + BOTH =3D "both" + """File is located in a folder that is both source and object tree.""" + + +@dataclass +class KernelFile: + """kernel-specific metadata used to generate an SPDX File element.""" + + absolute_path: PathStr + """Absolute path of the file.""" + file_location: KernelFileLocation + """Location of the file relative to the source/object trees.""" + name: str + """Name of the file element. Should be relative to the source tree if + file_location equals SOURCE_TREE and relative to the object tree if + file_location equals OBJ_TREE. 
If file_location equals EXTERNAL, the + absolute path is used.""" + license_identifier: str | None + """SPDX license ID if file_location equals SOURCE_TREE or BOTH; otherw= ise None.""" + spdx_id_generator: SpdxIdGenerator + """Generator for the SPDX ID of the file element.""" + + _spdx_file_element: File | None =3D None + + @classmethod + def create( + cls, + absolute_path: PathStr, + obj_tree: PathStr, + src_tree: PathStr, + spdx_id_generators: SpdxIdGeneratorCollection, + is_output: bool, + ) -> "KernelFile": + is_in_obj_tree =3D is_relative_to(absolute_path, obj_tree) + is_in_src_tree =3D is_relative_to(absolute_path, src_tree) + + # file element name should be relative to output or src tree if po= ssible + if not is_in_src_tree and not is_in_obj_tree: + file_element_name =3D str(absolute_path) + file_location =3D KernelFileLocation.EXTERNAL + spdx_id_generator =3D spdx_id_generators.build + elif is_in_src_tree and src_tree =3D=3D obj_tree: + file_element_name =3D os.path.relpath(absolute_path, obj_tree) + file_location =3D KernelFileLocation.BOTH + spdx_id_generator =3D spdx_id_generators.output if is_output e= lse spdx_id_generators.build + elif is_in_obj_tree: + file_element_name =3D os.path.relpath(absolute_path, obj_tree) + file_location =3D KernelFileLocation.OBJ_TREE + spdx_id_generator =3D spdx_id_generators.output if is_output e= lse spdx_id_generators.build + else: + file_element_name =3D os.path.relpath(absolute_path, src_tree) + file_location =3D KernelFileLocation.SOURCE_TREE + spdx_id_generator =3D spdx_id_generators.source + + # parse spdx license identifier + license_identifier =3D ( + _parse_spdx_license_identifier(absolute_path) + if file_location =3D=3D KernelFileLocation.SOURCE_TREE or file= _location =3D=3D KernelFileLocation.BOTH + else None + ) + + return KernelFile( + absolute_path, + file_location, + file_element_name, + license_identifier, + spdx_id_generator, + ) + + @property + def spdx_file_element(self) -> File: + if 
self._spdx_file_element is None: + self._spdx_file_element =3D _build_file_element( + self.absolute_path, + self.name, + self.spdx_id_generator.generate(), + self.file_location, + ) + return self._spdx_file_element + + +@dataclass +class KernelFileCollection: + """Collection of kernel files.""" + + source: dict[PathStr, KernelFile] + build: dict[PathStr, KernelFile] + output: dict[PathStr, KernelFile] + + @classmethod + def create( + cls, + cmd_graph: CmdGraph, + obj_tree: PathStr, + src_tree: PathStr, + spdx_id_generators: SpdxIdGeneratorCollection, + ) -> "KernelFileCollection": + source: dict[PathStr, KernelFile] =3D {} + build: dict[PathStr, KernelFile] =3D {} + output: dict[PathStr, KernelFile] =3D {} + root_node_paths =3D {node.absolute_path for node in cmd_graph.root= s} + for node in cmd_graph: + is_root =3D node.absolute_path in root_node_paths + kernel_file =3D KernelFile.create( + node.absolute_path, + obj_tree, + src_tree, + spdx_id_generators, + is_root, + ) + if is_root: + output[kernel_file.absolute_path] =3D kernel_file + elif kernel_file.file_location =3D=3D KernelFileLocation.SOURC= E_TREE: + source[kernel_file.absolute_path] =3D kernel_file + else: + build[kernel_file.absolute_path] =3D kernel_file + + return KernelFileCollection(source, build, output) + + def to_dict(self) -> dict[PathStr, KernelFile]: + return {**self.source, **self.build, **self.output} + + +def _build_file_element(absolute_path: PathStr, name: str, spdx_id: SpdxId= , file_location: KernelFileLocation) -> File: + verifiedUsing: list[Hash] =3D [] + content_identifier: list[ContentIdentifier] =3D [] + if os.path.exists(absolute_path): + verifiedUsing =3D [Hash(algorithm=3D"sha256", hashValue=3D_sha256(= absolute_path))] + content_identifier =3D [ + ContentIdentifier( + software_contentIdentifierType=3D"gitoid", + software_contentIdentifierValue=3D_git_blob_oid(absolute_p= ath), + ) + ] + elif file_location =3D=3D KernelFileLocation.EXTERNAL: + sbom_logging.warning( + "Cannot 
compute hash for {absolute_path} because file does not= exist.", + absolute_path=3Dabsolute_path, + ) + else: + sbom_logging.error( + "Cannot compute hash for {absolute_path} because file does not= exist.", + absolute_path=3Dabsolute_path, + ) + + # primary purpose + primary_purpose =3D _get_primary_purpose(absolute_path) + + return File( + spdxId=3Dspdx_id, + name=3Dname, + verifiedUsing=3DverifiedUsing, + software_primaryPurpose=3Dprimary_purpose, + software_contentIdentifier=3Dcontent_identifier, + ) + + +def _sha256(path: PathStr) -> str: + """Compute the SHA-256 hash of a file.""" + with open(path, "rb") as f: + data =3D f.read() + return hashlib.sha256(data).hexdigest() + + +def _git_blob_oid(file_path: str) -> str: + """ + Compute the Git blob object ID (SHA-1) for a file, like `git hash-obje= ct`. + + Args: + file_path: Path to the file. + + Returns: + SHA-1 hash (hex) of the Git blob object. + """ + with open(file_path, "rb") as f: + content =3D f.read() + header =3D f"blob {len(content)}\0".encode() + store =3D header + content + sha1_hash =3D hashlib.sha1(store).hexdigest() + return sha1_hash + + +# REUSE-IgnoreStart +SPDX_LICENSE_IDENTIFIER_PATTERN =3D re.compile(r"SPDX-License-Identifier:\= s*(?P<id>.*?)(?:\s*(\*/|$))") +# REUSE-IgnoreEnd + + +def _parse_spdx_license_identifier(absolute_path: str, max_lines: int =3D = 5) -> str | None: + """ + Extracts the SPDX-License-Identifier from the first few lines of a sou= rce file. + + Args: + absolute_path: Path to the source file. + max_lines: Number of lines to scan from the top (default: 5). + + Returns: + The license identifier string (e.g., 'GPL-2.0-only') if found, oth= erwise None.
+ """ + try: + with open(absolute_path, "r") as f: + for _ in range(max_lines): + match =3D SPDX_LICENSE_IDENTIFIER_PATTERN.search(f.readlin= e()) + if match: + return match.group("id") + except (UnicodeDecodeError, OSError): + return None + return None + + +def _get_primary_purpose(absolute_path: PathStr) -> SoftwarePurpose | None: + def ends_with(suffixes: list[str]) -> bool: + return any(absolute_path.endswith(suffix) for suffix in suffixes) + + def includes_path_segments(path_segments: list[str]) -> bool: + return any(segment in absolute_path for segment in path_segments) + + # Source code + if ends_with([".c", ".h", ".S", ".s", ".rs", ".pl"]): + return "source" + + # Libraries + if ends_with([".a", ".so", ".rlib"]): + return "library" + + # Archives + if ends_with([".xz", ".cpio", ".gz", ".tar", ".zip"]): + return "archive" + + # Applications + if ends_with(["bzImage", "Image"]): + return "application" + + # Executables / machine code + if ends_with([".bin", ".elf", "vmlinux", "vmlinux.unstripped", "bpfilt= er_umh"]): + return "executable" + + # Kernel modules + if ends_with([".ko"]): + return "module" + + # Data files + if ends_with( + [ + ".tbl", + ".relocs", + ".rmeta", + ".in", + ".dbg", + ".x509", + ".pbm", + ".ppm", + ".dtb", + ".uc", + ".inc", + ".dts", + ".dtsi", + ".dtbo", + ".xml", + ".ro", + "initramfs_inc_data", + "default_cpio_list", + "x509_certificate_list", + "utf8data.c_shipped", + "blacklist_hash_list", + "x509_revocation_list", + "cpucaps", + "sysreg", + ] + ) or includes_path_segments(["drivers/gpu/drm/radeon/reg_srcs/"]): + return "data" + + # Configuration files + if ends_with([".pem", ".key", ".conf", ".config", ".cfg", ".bconf"]): + return "configuration" + + # Documentation + if ends_with([".md"]): + return "documentation" + + # Other / miscellaneous + if ends_with([".o", ".tmp"]): + return "other" + + sbom_logging.warning("Could not infer primary purpose for {absolute_pa= th}", absolute_path=3Dabsolute_path) --=20 2.34.1 From nobody 
Sat Feb 7 08:07:12 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2A747229B36; Mon, 26 Jan 2026 19:35:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456139; cv=none; b=osR5yEHD0x7iL8qLvIGnKA3GVbnvqh7T3qS5OQvY2ytKCmUC6b3xr952UTw4uLUWyll08mIMvY0/p/1M0iMPqwAb5XVQcn+gbBuT2M3wvXqPrSaQyy6TQTj0wwPM4xCXv8EvFI26ItzdeZaVmd0QxylU3PggwWA9JcpD448upR0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456139; c=relaxed/simple; bh=ev8XL8ZNYC6JNi4L/WfEXdlmFxXSAxKCD94oN038FKA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=uOXN6mbk8nl+Cl4tGsgcVotp4nWmgx+E74sfwGDVEyn101a2wssNe6c23W+ULls4GrMJRWIxJXa3IbRdil8gP9OFNyG1sBGhK703iKn2SPwKsKCbJLeKVPlFo1umqqEQet2jI6ZYJYuUp+FoN53poXMwePPxtQGXo3QB5ZfKY04= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=V8k0BROA; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="V8k0BROA" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id 3D1CB3FAF8; Mon, 26 Jan 2026 20:35:31 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 0CBB01FA3D7; Mon, 26 Jan 2026 
20:35:29 +0100 (CET) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id rB-vb42oj8TP; Mon, 26 Jan 2026 20:35:28 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 0AEED1FA856; Mon, 26 Jan 2026 20:35:28 +0100 (CET) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 0AEED1FA856 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1769456128; bh=g+vgk61S7/+nivn1tWNWrgvbdtOC+/JQih3qv+IeF20=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=V8k0BROAtZ9LB2iRlP0VaLPb+pLUvwlagyFFuwT1OIB1NFV1xoCRjnRXogfJZO57P igEogjHT0x9E0loO/jeAMFA1uFpu52qTVG1R8/3ITP5s0GBttOf5KjcHJJZhfNzBTt 3fuPSDTRTG0eXPlU0S540wHbfmfINIoPNwek76K97KIa7BAA+/CJ1rBgk6iHOnD+UD p9lKdQrmy2kXeyaePA+WesEeJIr1S4NsNvvAwgQWhBVEGB6lDALXFza2TXOvi/3fUF 5gwbK50KbwJSEAAziujsXys189JL6LZsH+9sWv2J4zcDM1p+gFi1kn/JTXIMb+a/7F 4w3Tztm5ieFIQ== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id xa-ptY-lJmHK; Mon, 26 Jan 2026 20:35:27 +0100 (CET) Received: from DESKTOP-0O0JV6I.localdomain (ipservice-092-208-231-176.092.208.pools.vodafone-ip.de [92.208.231.176]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id A03781FA703; Mon, 26 Jan 2026 20:35:27 +0100 (CET) From: Luis Augenstein To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v3 10/14] tools/sbom: add SPDX output graph Date: Mon, 26 Jan 2026 20:33:00 +0100 Message-Id: <20260126193304.320916-11-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com> References: <20260126193304.320916-1-luis.augenstein@tngtech.com> 
Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement the SPDX output graph which contains the distributable build outputs and high level metadata about the build. Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- tools/sbom/Makefile | 4 +- tools/sbom/sbom/config.py | 64 ++++++ tools/sbom/sbom/environment.py | 150 ++++++++++++++ .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 18 +- .../sbom/sbom/spdx_graph/spdx_output_graph.py | 188 ++++++++++++++++++ 5 files changed, 422 insertions(+), 2 deletions(-) create mode 100644 tools/sbom/sbom/spdx_graph/spdx_output_graph.py diff --git a/tools/sbom/Makefile b/tools/sbom/Makefile index 1fef44cede46..38268ee1b1bd 100644 --- a/tools/sbom/Makefile +++ b/tools/sbom/Makefile @@ -34,7 +34,9 @@ $(SBOM_TARGETS) &: $(SBOM_DEPS) --obj-tree $(objtree) \ --roots-file $(SBOM_ROOTS_FILE) \ --output-directory $(objtree) \ - --generate-spdx + --generate-spdx \ + --package-license "GPL-2.0 WITH Linux-syscall-note" \ + --package-version "$(KERNELVERSION)" =20 @rm $(SBOM_ROOTS_FILE) =20 diff --git a/tools/sbom/sbom/config.py b/tools/sbom/sbom/config.py index 9278e2be7cb2..de57d9d94edb 100644 --- a/tools/sbom/sbom/config.py +++ b/tools/sbom/sbom/config.py @@ -59,6 +59,21 @@ class KernelSbomConfig: spdxId_prefix: str """Prefix to use for all SPDX element IDs.""" =20 + build_type: str + """SPDX buildType property to use for all Build elements.""" + + build_id: str | None + """SPDX buildId property to use for all Build elements.""" + + package_license: str + """License expression applied to all SPDX Packages.""" + + package_version: str | None + """Version string applied to all SPDX Packages.""" + + package_copyright_text: str | None + """Copyright text applied to all SPDX Packages.""" + prettify_json: bool """Whether to pretty-print 
generated SPDX JSON documents.""" =20 @@ -169,6 +184,40 @@ def _parse_cli_arguments() -> dict[str, Any]: default=3D"urn:spdx.dev:", help=3D"The prefix to use for all spdxId properties. (default: urn= :spdx.dev:)", ) + spdx_group.add_argument( + "--build-type", + default=3D"urn:spdx.dev:Kbuild", + help=3D"The SPDX buildType property to use for all Build elements.= (default: urn:spdx.dev:Kbuild)", + ) + spdx_group.add_argument( + "--build-id", + default=3DNone, + help=3D"The SPDX buildId property to use for all Build elements.\n" + "If not provided the spdxId of the high level Build element is use= d as the buildId. (default: None)", + ) + spdx_group.add_argument( + "--package-license", + default=3D"NOASSERTION", + help=3D( + "The SPDX licenseExpression property to use for the LicenseExp= ression " + "linked to all SPDX Package elements. (default: NOASSERTION)" + ), + ) + spdx_group.add_argument( + "--package-version", + default=3DNone, + help=3D"The SPDX packageVersion property to use for all SPDX Packa= ge elements. (default: None)", + ) + spdx_group.add_argument( + "--package-copyright-text", + default=3DNone, + help=3D( + "The SPDX copyrightText property to use for all SPDX Package e= lements.\n" + "If not specified, and if a COPYING file exists in the source = tree,\n" + "the package-copyright-text is set to the content of this file= . " + "(default: None)" + ), + ) spdx_group.add_argument( "--prettify-json", action=3D"store_true", @@ -220,6 +269,16 @@ def get_config() -> KernelSbomConfig: "Expected ISO format (YYYY-MM-DD [HH:MM:SS])." 
) spdxId_prefix =3D args["spdxId_prefix"] + build_type =3D args["build_type"] + build_id =3D args["build_id"] + package_license =3D args["package_license"] + package_version =3D args["package_version"] if args["package_version"]= is not None else None + package_copyright_text: str | None =3D None + if args["package_copyright_text"] is not None: + package_copyright_text =3D args["package_copyright_text"] + elif os.path.isfile(copying_path :=3D os.path.join(src_tree, "COPYING"= )): + with open(copying_path, "r") as f: + package_copyright_text =3D f.read() prettify_json =3D args["prettify_json"] =20 # Hardcoded config @@ -244,6 +303,11 @@ def get_config() -> KernelSbomConfig: write_output_on_error=3Dwrite_output_on_error, created=3Dcreated, spdxId_prefix=3DspdxId_prefix, + build_type=3Dbuild_type, + build_id=3Dbuild_id, + package_license=3Dpackage_license, + package_version=3Dpackage_version, + package_copyright_text=3Dpackage_copyright_text, prettify_json=3Dprettify_json, ) =20 diff --git a/tools/sbom/sbom/environment.py b/tools/sbom/sbom/environment.py index b3fb2f0ba61d..f3a54bd613f9 100644 --- a/tools/sbom/sbom/environment.py +++ b/tools/sbom/sbom/environment.py @@ -3,12 +3,162 @@ =20 import os =20 +KERNEL_BUILD_VARIABLES_ALLOWLIST =3D [ + "AFLAGS_KERNEL", + "AFLAGS_MODULE", + "AR", + "ARCH", + "ARCH_CORE", + "ARCH_DRIVERS", + "ARCH_LIB", + "AWK", + "BASH", + "BINDGEN", + "BITS", + "CC", + "CC_FLAGS_FPU", + "CC_FLAGS_NO_FPU", + "CFLAGS_GCOV", + "CFLAGS_KERNEL", + "CFLAGS_MODULE", + "CHECK", + "CHECKFLAGS", + "CLIPPY_CONF_DIR", + "CONFIG_SHELL", + "CPP", + "CROSS_COMPILE", + "CURDIR", + "GNUMAKEFLAGS", + "HOSTCC", + "HOSTCXX", + "HOSTPKG_CONFIG", + "HOSTRUSTC", + "INSTALLKERNEL", + "INSTALL_DTBS_PATH", + "INSTALL_HDR_PATH", + "INSTALL_PATH", + "KBUILD_AFLAGS", + "KBUILD_AFLAGS_KERNEL", + "KBUILD_AFLAGS_MODULE", + "KBUILD_BUILTIN", + "KBUILD_CFLAGS", + "KBUILD_CFLAGS_KERNEL", + "KBUILD_CFLAGS_MODULE", + "KBUILD_CHECKSRC", + "KBUILD_CLIPPY", + "KBUILD_CPPFLAGS", + 
"KBUILD_EXTMOD", + "KBUILD_EXTRA_WARN", + "KBUILD_HOSTCFLAGS", + "KBUILD_HOSTCXXFLAGS", + "KBUILD_HOSTLDFLAGS", + "KBUILD_HOSTLDLIBS", + "KBUILD_HOSTRUSTFLAGS", + "KBUILD_IMAGE", + "KBUILD_LDFLAGS", + "KBUILD_LDFLAGS_MODULE", + "KBUILD_LDS", + "KBUILD_MODULES", + "KBUILD_PROCMACROLDFLAGS", + "KBUILD_RUSTFLAGS", + "KBUILD_RUSTFLAGS_KERNEL", + "KBUILD_RUSTFLAGS_MODULE", + "KBUILD_USERCFLAGS", + "KBUILD_USERLDFLAGS", + "KBUILD_VERBOSE", + "KBUILD_VMLINUX_LIBS", + "KBZIP2", + "KCONFIG_CONFIG", + "KERNELDOC", + "KERNELRELEASE", + "KERNELVERSION", + "KGZIP", + "KLZOP", + "LC_COLLATE", + "LC_NUMERIC", + "LD", + "LDFLAGS_MODULE", + "LEX", + "LINUXINCLUDE", + "LZ4", + "LZMA", + "MAKE", + "MAKEFILES", + "MAKEFILE_LIST", + "MAKEFLAGS", + "MAKELEVEL", + "MAKEOVERRIDES", + "MAKE_COMMAND", + "MAKE_HOST", + "MAKE_TERMERR", + "MAKE_TERMOUT", + "MAKE_VERSION", + "MFLAGS", + "MODLIB", + "NM", + "NOSTDINC_FLAGS", + "O", + "OBJCOPY", + "OBJCOPYFLAGS", + "OBJDUMP", + "PAHOLE", + "PATCHLEVEL", + "PERL", + "PYTHON3", + "Q", + "RCS_FIND_IGNORE", + "READELF", + "REALMODE_CFLAGS", + "RESOLVE_BTFIDS", + "RETHUNK_CFLAGS", + "RETHUNK_RUSTFLAGS", + "RETPOLINE_CFLAGS", + "RETPOLINE_RUSTFLAGS", + "RETPOLINE_VDSO_CFLAGS", + "RUSTC", + "RUSTC_BOOTSTRAP", + "RUSTC_OR_CLIPPY", + "RUSTC_OR_CLIPPY_QUIET", + "RUSTDOC", + "RUSTFLAGS_KERNEL", + "RUSTFLAGS_MODULE", + "RUSTFMT", + "SRCARCH", + "STRIP", + "SUBLEVEL", + "SUFFIXES", + "TAR", + "UTS_MACHINE", + "VERSION", + "VPATH", + "XZ", + "YACC", + "ZSTD", + "building_out_of_srctree", + "cross_compiling", + "objtree", + "quiet", + "rust_common_flags", + "srcroot", + "srctree", + "sub_make_done", + "subdir", +] + =20 class Environment: """ Read-only accessor for kernel build environment variables. 
""" =20 + @classmethod + def KERNEL_BUILD_VARIABLES(cls) -> dict[str, str | None]: + return {name: os.getenv(name) for name in KERNEL_BUILD_VARIABLES_A= LLOWLIST} + + @classmethod + def ARCH(cls) -> str | None: + return os.getenv("ARCH") + @classmethod def SRCARCH(cls) -> str | None: return os.getenv("SRCARCH") diff --git a/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py b/tools/sbom/s= bom/spdx_graph/build_spdx_graphs.py index 0f95f99d560a..2af0fbe6cdbe 100644 --- a/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ b/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -10,12 +10,18 @@ from sbom.path_utils import PathStr from sbom.spdx_graph.kernel_file import KernelFileCollection from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements +from sbom.spdx_graph.spdx_output_graph import SpdxOutputGraph =20 =20 class SpdxGraphConfig(Protocol): obj_tree: PathStr src_tree: PathStr created: datetime + build_type: str + build_id: str | None + package_license: str + package_version: str | None + package_copyright_text: str | None =20 =20 def build_spdx_graphs( @@ -38,4 +44,14 @@ def build_spdx_graphs( """ shared_elements =3D SharedSpdxElements.create(spdx_id_generators.base,= config.created) kernel_files =3D KernelFileCollection.create(cmd_graph, config.obj_tre= e, config.src_tree, spdx_id_generators) - return {} + output_graph =3D SpdxOutputGraph.create( + root_files=3Dlist(kernel_files.output.values()), + shared_elements=3Dshared_elements, + spdx_id_generators=3Dspdx_id_generators, + config=3Dconfig, + ) + spdx_graphs: dict[KernelSpdxDocumentKind, SpdxGraph] =3D { + KernelSpdxDocumentKind.OUTPUT: output_graph, + } + + return spdx_graphs diff --git a/tools/sbom/sbom/spdx_graph/spdx_output_graph.py b/tools/sbom/s= bom/spdx_graph/spdx_output_graph.py new file mode 100644 index 000000000000..1ae0f935e0b9 --- /dev/null +++ b/tools/sbom/sbom/spdx_graph/spdx_output_graph.py @@ -0,0 
+1,188 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +import os +from typing import Protocol +from sbom.environment import Environment +from sbom.path_utils import PathStr +from sbom.spdx.build import Build +from sbom.spdx.core import DictionaryEntry, NamespaceMap, Relationship, Sp= dxDocument +from sbom.spdx.simplelicensing import LicenseExpression +from sbom.spdx.software import File, Package, Sbom +from sbom.spdx.spdxId import SpdxIdGenerator +from sbom.spdx_graph.kernel_file import KernelFile +from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements +from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection + + +class SpdxOutputGraphConfig(Protocol): + obj_tree: PathStr + src_tree: PathStr + build_type: str + build_id: str | None + package_license: str + package_version: str | None + package_copyright_text: str | None + + +@dataclass +class SpdxOutputGraph(SpdxGraph): + """SPDX graph representing distributable output files""" + + high_level_build_element: Build + + @classmethod + def create( + cls, + root_files: list[KernelFile], + shared_elements: SharedSpdxElements, + spdx_id_generators: SpdxIdGeneratorCollection, + config: SpdxOutputGraphConfig, + ) -> "SpdxOutputGraph": + """ + Args: + root_files: List of distributable output files which act as ro= ots + of the dependency graph. + shared_elements: Shared SPDX elements used across multiple doc= uments. + spdx_id_generators: Collection of SPDX ID generators. + config: Configuration options. + + Returns: + SpdxOutputGraph: The SPDX output graph. 
+ """ + # SpdxDocument + spdx_document =3D SpdxDocument( + spdxId=3Dspdx_id_generators.output.generate(), + profileConformance=3D["core", "software", "build", "simpleLice= nsing"], + namespaceMap=3D[ + NamespaceMap(prefix=3Dgenerator.prefix, namespace=3Dgenera= tor.namespace) + for generator in [spdx_id_generators.output, spdx_id_gener= ators.base] + if generator.prefix is not None + ], + ) + + # Sbom + sbom =3D Sbom( + spdxId=3Dspdx_id_generators.output.generate(), + software_sbomType=3D["build"], + ) + + # High-level Build elements + config_source_element =3D KernelFile.create( + absolute_path=3Dos.path.join(config.obj_tree, ".config"), + obj_tree=3Dconfig.obj_tree, + src_tree=3Dconfig.src_tree, + spdx_id_generators=3Dspdx_id_generators, + is_output=3DTrue, + ).spdx_file_element + high_level_build_element, high_level_build_element_hasOutput_relat= ionship =3D _high_level_build_elements( + config.build_type, + config.build_id, + config_source_element, + spdx_id_generators.output, + ) + + # Root file elements + root_file_elements: list[File] =3D [file.spdx_file_element for fil= e in root_files] + + # Package elements + package_elements =3D [ + Package( + spdxId=3Dspdx_id_generators.output.generate(), + name=3D_get_package_name(file.name), + software_packageVersion=3Dconfig.package_version, + software_copyrightText=3Dconfig.package_copyright_text, + originatedBy=3D[shared_elements.agent], + comment=3Df"Architecture=3D{arch}" if (arch :=3D Environme= nt.ARCH() or Environment.SRCARCH()) else None, + software_primaryPurpose=3Dfile.software_primaryPurpose, + ) + for file in root_file_elements + ] + package_hasDistributionArtifact_file_relationships =3D [ + Relationship( + spdxId=3Dspdx_id_generators.output.generate(), + relationshipType=3D"hasDistributionArtifact", + from_=3Dpackage, + to=3D[file], + ) + for package, file in zip(package_elements, root_file_elements) + ] + package_license_expression =3D LicenseExpression( + spdxId=3Dspdx_id_generators.output.generate(), 
+ simplelicensing_licenseExpression=3Dconfig.package_license, + ) + package_hasDeclaredLicense_relationships =3D [ + Relationship( + spdxId=3Dspdx_id_generators.output.generate(), + relationshipType=3D"hasDeclaredLicense", + from_=3Dpackage, + to=3D[package_license_expression], + ) + for package in package_elements + ] + + # Update relationships + spdx_document.rootElement =3D [sbom] + + sbom.rootElement =3D [*package_elements] + sbom.element =3D [ + config_source_element, + high_level_build_element, + high_level_build_element_hasOutput_relationship, + *root_file_elements, + *package_elements, + *package_hasDistributionArtifact_file_relationships, + package_license_expression, + *package_hasDeclaredLicense_relationships, + ] + + high_level_build_element_hasOutput_relationship.to =3D [*root_file= _elements] + + output_graph =3D SpdxOutputGraph( + spdx_document, + shared_elements.agent, + shared_elements.creation_info, + sbom, + high_level_build_element, + ) + return output_graph + + +def _get_package_name(filename: str) -> str: + """ + Generates an SPDX package name from a filename. + Kernel images (bzImage, Image) get a descriptive name; others use the = basename of the file.
+ """ + KERNEL_FILENAMES =3D ["bzImage", "Image"] + basename =3D os.path.basename(filename) + return f"Linux Kernel ({basename})" if basename in KERNEL_FILENAMES el= se basename + + +def _high_level_build_elements( + build_type: str, + build_id: str | None, + config_source_element: File, + spdx_id_generator: SpdxIdGenerator, +) -> tuple[Build, Relationship]: + build_spdxId =3D spdx_id_generator.generate() + high_level_build_element =3D Build( + spdxId=3Dbuild_spdxId, + build_buildType=3Dbuild_type, + build_buildId=3Dbuild_id if build_id is not None else build_spdxId, + build_environment=3D[ + DictionaryEntry(key=3Dkey, value=3Dvalue) + for key, value in Environment.KERNEL_BUILD_VARIABLES().items() + if value + ], + build_configSourceUri=3D[config_source_element.spdxId], + build_configSourceDigest=3Dconfig_source_element.verifiedUsing, + ) + + high_level_build_element_hasOutput_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"hasOutput", + from_=3Dhigh_level_build_element, + to=3D[], + ) + return high_level_build_element, high_level_build_element_hasOutput_re= lationship --=20 2.34.1 From nobody Sat Feb 7 08:07:12 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 314DE28643C; Mon, 26 Jan 2026 19:35:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456139; cv=none; b=nX+M8pJkyoA1m+rhIaC77U9V7IpWXg/Om2yHtJx2dpl5f0Yy3FHWO2K+Y/SuOFGk4N6oqmadqx9DuO0QqKfvPpa+ReqmY62/Ff6r+XwByS+4VStxJojE83KkDBeNLXdUKGMknNLMz+FJGalo6MN0hhOBqr1yyiIZZ3GS0qt8RyU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456139; c=relaxed/simple; bh=cFNZRlIxT/34gp+dzxMA1w3STtnbLv+SCBygP3i6rMc=; 
h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=mQ15+O+pwDJSf52uXV/1BVrdFaZvcln0oh9uWr/9Z0Dys5dmrrp8EG7ehC7dYvRWDYdxpmRSzRLdLgXIuNEcaBma4qezWLfac4c9oyvFXThRwYj6NMxumGbkBHJroghElTIw0DJasPJ+YVAmSDKgKqCYmwqd9cWcyhHm7zHU5Qc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=RllAyyu0; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="RllAyyu0" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id 45D743FAF9; Mon, 26 Jan 2026 20:35:31 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id EC1431FA7B4; Mon, 26 Jan 2026 20:35:29 +0100 (CET) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id 2QXBqrsOoFCz; Mon, 26 Jan 2026 20:35:29 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 45A251FA703; Mon, 26 Jan 2026 20:35:29 +0100 (CET) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 45A251FA703 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1769456129; bh=dZL1JaPMNMR+B1ET/+Ix+NpqGkk7K7L1IuuIEqfp/f0=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=RllAyyu08j3M9etTa7cZxkZHdKWEcGeuTFMs0AsjSLqInN5TFubJLiTiA4np0TvVq eEaiKZSxRvWcFsdyZRtgYXQ0HIAqxVMoEBGgHvik7Ga0ZY5vJ0FoQxDBj7a+b5hluq 
cDRshHaBuqQPUzyk1KWUmu8U4cjbeIXpKFVMnCvL7iSUDcnNapBEY1zUOzrv8zY9bg /kt7b38xad7R2gD5kdjrKwxSDxQQgiVAfhX8iE4SQNwt2OWY9pXrFVaIoqBFvyZXXM 7PkcEvJFHMJMRMlA2u1XmmCKCsIFzwBZ+KyEHdjUrlqO0p++FiuJJe5KqvHCBqLrUq 3+4KrHwT8GikQ== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id SN6NnRY4-hPo; Mon, 26 Jan 2026 20:35:29 +0100 (CET) Received: from DESKTOP-0O0JV6I.localdomain (ipservice-092-208-231-176.092.208.pools.vodafone-ip.de [92.208.231.176]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id CF7CE1FA6D0; Mon, 26 Jan 2026 20:35:28 +0100 (CET) From: Luis Augenstein To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v3 11/14] tools/sbom: add SPDX source graph Date: Mon, 26 Jan 2026 20:33:01 +0100 Message-Id: <20260126193304.320916-12-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com> References: <20260126193304.320916-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement the SPDX source graph which contains all source files involved during the build, along with the licensing information for each file. 
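
As a rough illustration of the licensing part (a minimal sketch, not the code in this patch: the two dataclasses, link_licenses(), the file names and the urn:example: IDs are hypothetical stand-ins for the sbom.spdx classes), each distinct license identifier gets one LicenseExpression element, and every file that carries an identifier is linked to it through a hasDeclaredLicense relationship:

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class LicenseExpression:
    """Hypothetical stand-in for the SPDX LicenseExpression element."""
    spdx_id: str
    expression: str


@dataclass
class Relationship:
    """Hypothetical stand-in for an SPDX Relationship element."""
    spdx_id: str
    relationship_type: str
    from_: str
    to: list[str]


def link_licenses(files: dict[str, Optional[str]]):
    """Turn {file name: license identifier or None} into SPDX-like elements."""
    ids = (f"urn:example:source-{n}" for n in count())
    expressions: dict[str, LicenseExpression] = {}
    relationships: list[Relationship] = []
    for name, identifier in files.items():
        if identifier is None:
            continue  # no license identifier known: nothing to declare
        if identifier not in expressions:
            # First time this identifier is seen: create the shared element.
            expressions[identifier] = LicenseExpression(next(ids), identifier)
        relationships.append(
            Relationship(next(ids), "hasDeclaredLicense", name,
                         [expressions[identifier].spdx_id])
        )
    return list(expressions.values()), relationships


# Illustrative input only; the names and identifiers are made up.
expressions, relationships = link_licenses({
    "a.c": "GPL-2.0-only",
    "b.c": "GPL-2.0",
    "c.S": None,
    "d.c": "GPL-2.0-only",
})
```

Sharing one element per identifier keeps the document small when many files use the same license. Files without an identifier simply get no hasDeclaredLicense relationship.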
Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 8 ++ .../sbom/sbom/spdx_graph/spdx_source_graph.py | 126 ++++++++++++++++++ 2 files changed, 134 insertions(+) create mode 100644 tools/sbom/sbom/spdx_graph/spdx_source_graph.py diff --git a/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py b/tools/sbom/s= bom/spdx_graph/build_spdx_graphs.py index 2af0fbe6cdbe..a61257a905f3 100644 --- a/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ b/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -10,6 +10,7 @@ from sbom.path_utils import PathStr from sbom.spdx_graph.kernel_file import KernelFileCollection from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements +from sbom.spdx_graph.spdx_source_graph import SpdxSourceGraph from sbom.spdx_graph.spdx_output_graph import SpdxOutputGraph =20 =20 @@ -54,4 +55,11 @@ def build_spdx_graphs( KernelSpdxDocumentKind.OUTPUT: output_graph, } =20 + if len(kernel_files.source) > 0: + spdx_graphs[KernelSpdxDocumentKind.SOURCE] =3D SpdxSourceGraph.cre= ate( + source_files=3Dlist(kernel_files.source.values()), + shared_elements=3Dshared_elements, + spdx_id_generators=3Dspdx_id_generators, + ) + return spdx_graphs diff --git a/tools/sbom/sbom/spdx_graph/spdx_source_graph.py b/tools/sbom/s= bom/spdx_graph/spdx_source_graph.py new file mode 100644 index 000000000000..16176c4ea5ee --- /dev/null +++ b/tools/sbom/sbom/spdx_graph/spdx_source_graph.py @@ -0,0 +1,126 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from sbom.spdx import SpdxIdGenerator +from sbom.spdx.core import Element, NamespaceMap, Relationship, SpdxDocume= nt +from sbom.spdx.simplelicensing import LicenseExpression +from sbom.spdx.software import File, Sbom +from sbom.spdx_graph.kernel_file import 
KernelFile +from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements +from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection + + +@dataclass +class SpdxSourceGraph(SpdxGraph): + """SPDX graph representing source files""" + + @classmethod + def create( + cls, + source_files: list[KernelFile], + shared_elements: SharedSpdxElements, + spdx_id_generators: SpdxIdGeneratorCollection, + ) -> "SpdxSourceGraph": + """ + Args: + source_files: List of files within the kernel source tree. + shared_elements: Shared SPDX elements used across multiple doc= uments. + spdx_id_generators: Collection of SPDX ID generators. + + Returns: + SpdxSourceGraph: The SPDX source graph. + """ + # SpdxDocument + source_spdx_document =3D SpdxDocument( + spdxId=3Dspdx_id_generators.source.generate(), + profileConformance=3D["core", "software", "simpleLicensing"], + namespaceMap=3D[ + NamespaceMap(prefix=3Dgenerator.prefix, namespace=3Dgenera= tor.namespace) + for generator in [spdx_id_generators.source, spdx_id_gener= ators.base] + if generator.prefix is not None + ], + ) + + # Sbom + source_sbom =3D Sbom( + spdxId=3Dspdx_id_generators.source.generate(), + software_sbomType=3D["source"], + ) + + # Src Tree Elements + src_tree_element =3D File( + spdxId=3Dspdx_id_generators.source.generate(), + name=3D"$(src_tree)", + software_fileKind=3D"directory", + ) + src_tree_contains_relationship =3D Relationship( + spdxId=3Dspdx_id_generators.source.generate(), + relationshipType=3D"contains", + from_=3Dsrc_tree_element, + to=3D[], + ) + + # Source file elements + source_file_elements: list[Element] =3D [file.spdx_file_element fo= r file in source_files] + + # Source file license elements + source_file_license_identifiers, source_file_license_relationships= =3D source_file_license_elements( + source_files, spdx_id_generators.source + ) + + # Update relationships + source_spdx_document.rootElement =3D [source_sbom] + source_sbom.rootElement =3D [src_tree_element] + 
source_sbom.element =3D [ + src_tree_element, + src_tree_contains_relationship, + *source_file_elements, + *source_file_license_identifiers, + *source_file_license_relationships, + ] + src_tree_contains_relationship.to =3D source_file_elements + + source_graph =3D SpdxSourceGraph( + source_spdx_document, + shared_elements.agent, + shared_elements.creation_info, + source_sbom, + ) + return source_graph + + +def source_file_license_elements( + source_files: list[KernelFile], spdx_id_generator: SpdxIdGenerator +) -> tuple[list[LicenseExpression], list[Relationship]]: + """ + Creates SPDX license expressions and links them to the given source fi= les + via hasDeclaredLicense relationships. + + Args: + source_files: List of files within the kernel source tree. + spdx_id_generator: Generator for unique SPDX IDs. + + Returns: + Tuple of (license expressions, hasDeclaredLicense relationships). + """ + license_expressions: dict[str, LicenseExpression] =3D {} + for file in source_files: + if file.license_identifier is None or file.license_identifier in l= icense_expressions: + continue + license_expressions[file.license_identifier] =3D LicenseExpression( + spdxId=3Dspdx_id_generator.generate(), + simplelicensing_licenseExpression=3Dfile.license_identifier, + ) + + source_file_license_relationships =3D [ + Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"hasDeclaredLicense", + from_=3Dfile.spdx_file_element, + to=3D[license_expressions[file.license_identifier]], + ) + for file in source_files + if file.license_identifier is not None + ] + return ([*license_expressions.values()], source_file_license_relations= hips) --=20 2.34.1 From nobody Sat Feb 7 08:07:12 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 75A70286425; Mon, 26 Jan 2026 19:35:37 +0000 (UTC) 
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456140; cv=none; b=i7jubmVNeUHfR8/6JEi9bQl67YibZ5/D/IbD2RpLHQTOg9d+PxSJoCYa6CWU6wCoNRHbZRSIoEqJ8HLQnXcWtCaa8NJVvj4StOlcROWZhSBhzzW7EHLa+U+rKn4uYeQF1LllZF/m1sQIzBovVmG4KIosKFnFG+dk+IT/nk6cVcw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769456140; c=relaxed/simple; bh=XzlBJG6ZF/AW0sS1Z+v0jcaFU8Ly1nZZ/ZCbwTQXCYI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Uv9/kPTdSnX5+McFiSLwMtj0sP1c4kv09o2O82f/ViHrY6yexiNtRreYVu8/EfwwB7oym4g53Zuiz2A/drFhM01S79zbkA67+ZWHXt7LyCadQo6ITOWKq/NA9PuFz4oK9jtkJ/LVDouqbcyhiwQL0wr11VZXpx85X/u/8EyeKbA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=Tj0OnGGa; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="Tj0OnGGa" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id DF4B63FAFA; Mon, 26 Jan 2026 20:35:32 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 31A621FA6D0; Mon, 26 Jan 2026 20:35:31 +0100 (CET) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id QoSGWLsXx7oa; Mon, 26 Jan 2026 20:35:30 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 49FC51FA703; 
Mon, 26 Jan 2026 20:35:30 +0100 (CET) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 49FC51FA703 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1769456130; bh=UZCgv6iVYDQqxFrj1J/PBaDCRPPwWQoDBttlPqiTYXk=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=Tj0OnGGaKsMv0W07naknhcSsRjP5wslLutWPfj+xMto/SgkeZ67chEcUx4Os4lefx NzcIpvMVWcw/DJPYfNALKE7NlLEjKkkr3M+KZRAWJR5jOmDvNzgHoaS/n8R2Zr5dMv EddVuoyYbvtLltl+F+Uvj4SxFkcpxCYcgoKn8b5d62J1HPp0TynVzyMrZDq+793Zka KbMg9nWiYB2vjMI2MVnjpR+3drnaFU9Q5/8rLZUIiTOwfTlNXKLMIF7su/FFAfsyzt igWsf8fx7CkuCG2eaOu46iHa9ftlvzGKkLd6y+ZuMHfQS+U0voCcY/iJvDeq2NLGB1 tydDqFDbPwGIA== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id 204wHnoxRIWu; Mon, 26 Jan 2026 20:35:30 +0100 (CET) Received: from DESKTOP-0O0JV6I.localdomain (ipservice-092-208-231-176.092.208.pools.vodafone-ip.de [92.208.231.176]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id E90431FA6D0; Mon, 26 Jan 2026 20:35:29 +0100 (CET) From: Luis Augenstein To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v3 12/14] tools/sbom: add SPDX build graph Date: Mon, 26 Jan 2026 20:33:02 +0100 Message-Id: <20260126193304.320916-13-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com> References: <20260126193304.320916-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement the SPDX build graph to describe the relationships between source files in the source SBOM and 
output files in the output SBOM. Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 17 + .../sbom/sbom/spdx_graph/spdx_build_graph.py | 317 ++++++++++++++++++ 2 files changed, 334 insertions(+) create mode 100644 tools/sbom/sbom/spdx_graph/spdx_build_graph.py diff --git a/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py b/tools/sbom/s= bom/spdx_graph/build_spdx_graphs.py index a61257a905f3..eecc52156449 100644 --- a/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ b/tools/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -4,6 +4,7 @@ from datetime import datetime from typing import Protocol =20 +import logging from sbom.config import KernelSpdxDocumentKind from sbom.cmd_graph import CmdGraph from sbom.path_utils import PathStr @@ -11,6 +12,7 @@ from sbom.spdx_graph.kernel_file import KernelFileCollect= ion from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements from sbom.spdx_graph.spdx_source_graph import SpdxSourceGraph +from sbom.spdx_graph.spdx_build_graph import SpdxBuildGraph from sbom.spdx_graph.spdx_output_graph import SpdxOutputGraph =20 =20 @@ -61,5 +63,20 @@ def build_spdx_graphs( shared_elements=3Dshared_elements, spdx_id_generators=3Dspdx_id_generators, ) + else: + logging.info( + "Skipped creating a dedicated source SBOM because source files= cannot be " + "reliably classified when the source and object trees are iden= tical. " + "Added source files to the build SBOM instead." 
+ ) + + build_graph =3D SpdxBuildGraph.create( + cmd_graph, + kernel_files, + shared_elements, + output_graph.high_level_build_element, + spdx_id_generators, + ) + spdx_graphs[KernelSpdxDocumentKind.BUILD] =3D build_graph =20 return spdx_graphs diff --git a/tools/sbom/sbom/spdx_graph/spdx_build_graph.py b/tools/sbom/sb= om/spdx_graph/spdx_build_graph.py new file mode 100644 index 000000000000..2956800fa9ed --- /dev/null +++ b/tools/sbom/sbom/spdx_graph/spdx_build_graph.py @@ -0,0 +1,317 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from typing import Mapping +from sbom.cmd_graph import CmdGraph +from sbom.path_utils import PathStr +from sbom.spdx import SpdxIdGenerator +from sbom.spdx.build import Build +from sbom.spdx.core import ExternalMap, NamespaceMap, Relationship, SpdxDo= cument +from sbom.spdx.software import File, Sbom +from sbom.spdx_graph.kernel_file import KernelFileCollection +from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements +from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection +from sbom.spdx_graph.spdx_source_graph import source_file_license_elements + + +@dataclass +class SpdxBuildGraph(SpdxGraph): + """SPDX graph representing build dependencies connecting source files = and + distributable output files""" + + @classmethod + def create( + cls, + cmd_graph: CmdGraph, + kernel_files: KernelFileCollection, + shared_elements: SharedSpdxElements, + high_level_build_element: Build, + spdx_id_generators: SpdxIdGeneratorCollection, + ) -> "SpdxBuildGraph": + if len(kernel_files.source) > 0: + return _create_spdx_build_graph( + cmd_graph, + kernel_files, + shared_elements, + high_level_build_element, + spdx_id_generators, + ) + else: + return _create_spdx_build_graph_with_mixed_sources( + cmd_graph, + kernel_files, + shared_elements, + high_level_build_element, + spdx_id_generators, + ) + + +def 
_create_spdx_build_graph( + cmd_graph: CmdGraph, + kernel_files: KernelFileCollection, + shared_elements: SharedSpdxElements, + high_level_build_element: Build, + spdx_id_generators: SpdxIdGeneratorCollection, +) -> SpdxBuildGraph: + """ + Creates an SPDX build graph where source and output files are referenc= ed + from external documents. + + Args: + cmd_graph: The dependency graph of a kernel build. + kernel_files: Collection of categorized kernel files involved in t= he build. + shared_elements: SPDX elements shared across multiple documents. + high_level_build_element: The high-level Build element referenced = by the build graph. + spdx_id_generators: Collection of generators for SPDX element IDs. + + Returns: + SpdxBuildGraph: The SPDX build graph connecting source files and d= istributable output files. + """ + # SpdxDocument + build_spdx_document =3D SpdxDocument( + spdxId=3Dspdx_id_generators.build.generate(), + profileConformance=3D["core", "software", "build"], + namespaceMap=3D[ + NamespaceMap(prefix=3Dgenerator.prefix, namespace=3Dgenerator.= namespace) + for generator in [ + spdx_id_generators.build, + spdx_id_generators.source, + spdx_id_generators.output, + spdx_id_generators.base, + ] + if generator.prefix is not None + ], + ) + + # Sbom + build_sbom =3D Sbom( + spdxId=3Dspdx_id_generators.build.generate(), + software_sbomType=3D["build"], + ) + + # Src and object tree elements + obj_tree_element =3D File( + spdxId=3Dspdx_id_generators.build.generate(), + name=3D"$(obj_tree)", + software_fileKind=3D"directory", + ) + obj_tree_contains_relationship =3D Relationship( + spdxId=3Dspdx_id_generators.build.generate(), + relationshipType=3D"contains", + from_=3Dobj_tree_element, + to=3D[], + ) + + # File elements + build_file_elements =3D [file.spdx_file_element for file in kernel_fil= es.build.values()] + file_relationships =3D _file_relationships( + cmd_graph=3Dcmd_graph, + file_elements=3D{key: file.spdx_file_element for key, file in kern= 
el_files.to_dict().items()}, + high_level_build_element=3Dhigh_level_build_element, + spdx_id_generator=3Dspdx_id_generators.build, + ) + + # Update relationships + build_spdx_document.rootElement =3D [build_sbom] + + build_spdx_document.import_ =3D [ + *( + ExternalMap(externalSpdxId=3Dfile_element.spdx_file_element.sp= dxId) + for file_element in kernel_files.source.values() + ), + ExternalMap(externalSpdxId=3Dhigh_level_build_element.spdxId), + *(ExternalMap(externalSpdxId=3Dfile.spdx_file_element.spdxId) for = file in kernel_files.output.values()), + ] + + build_sbom.rootElement =3D [obj_tree_element] + build_sbom.element =3D [ + obj_tree_element, + obj_tree_contains_relationship, + *build_file_elements, + *file_relationships, + ] + + obj_tree_contains_relationship.to =3D [ + *build_file_elements, + *(file.spdx_file_element for file in kernel_files.output.values()), + ] + + # create Spdx graphs + build_graph =3D SpdxBuildGraph( + build_spdx_document, + shared_elements.agent, + shared_elements.creation_info, + build_sbom, + ) + return build_graph + + +def _create_spdx_build_graph_with_mixed_sources( + cmd_graph: CmdGraph, + kernel_files: KernelFileCollection, + shared_elements: SharedSpdxElements, + high_level_build_element: Build, + spdx_id_generators: SpdxIdGeneratorCollection, +) -> SpdxBuildGraph: + """ + Creates an SPDX build graph where only output files are referenced from + an external document. Source files are included directly in the build = graph. + + Args: + cmd_graph: The dependency graph of a kernel build. + kernel_files: Collection of categorized kernel files involved in t= he build. + shared_elements: SPDX elements shared across multiple documents. + high_level_build_element: The high-level Build element referenced = by the build graph. + spdx_id_generators: Collection of generators for SPDX element IDs. + + Returns: + SpdxBuildGraph: The SPDX build graph connecting source files and d= istributable output files. 
+ """ + # SpdxDocument + build_spdx_document =3D SpdxDocument( + spdxId=3Dspdx_id_generators.build.generate(), + profileConformance=3D["core", "software", "build"], + namespaceMap=3D[ + NamespaceMap(prefix=3Dgenerator.prefix, namespace=3Dgenerator.= namespace) + for generator in [ + spdx_id_generators.build, + spdx_id_generators.output, + spdx_id_generators.base, + ] + if generator.prefix is not None + ], + ) + + # Sbom + build_sbom =3D Sbom( + spdxId=3Dspdx_id_generators.build.generate(), + software_sbomType=3D["build"], + ) + + # File elements + build_file_elements =3D [file.spdx_file_element for file in kernel_fil= es.build.values()] + file_relationships =3D _file_relationships( + cmd_graph=3Dcmd_graph, + file_elements=3D{key: file.spdx_file_element for key, file in kern= el_files.to_dict().items()}, + high_level_build_element=3Dhigh_level_build_element, + spdx_id_generator=3Dspdx_id_generators.build, + ) + + # Source file license elements + source_file_license_identifiers, source_file_license_relationships =3D= source_file_license_elements( + list(kernel_files.build.values()), spdx_id_generators.build + ) + + # Update relationships + build_spdx_document.rootElement =3D [build_sbom] + root_file_elements =3D [file.spdx_file_element for file in kernel_file= s.output.values()] + build_spdx_document.import_ =3D [ + ExternalMap(externalSpdxId=3Dhigh_level_build_element.spdxId), + *(ExternalMap(externalSpdxId=3Dfile.spdxId) for file in root_file_= elements), + ] + + build_sbom.rootElement =3D [*root_file_elements] + build_sbom.element =3D [ + *build_file_elements, + *source_file_license_identifiers, + *source_file_license_relationships, + *file_relationships, + ] + + build_graph =3D SpdxBuildGraph( + build_spdx_document, + shared_elements.agent, + shared_elements.creation_info, + build_sbom, + ) + return build_graph + + +def _file_relationships( + cmd_graph: CmdGraph, + file_elements: Mapping[PathStr, File], + high_level_build_element: Build, + spdx_id_generator: 
SpdxIdGenerator, +) -> list[Build | Relationship]: + """ + Construct SPDX Build and Relationship elements representing dependency + relationships in the cmd graph. + + Args: + cmd_graph: The dependency graph of a kernel build. + file_elements: Mapping of filesystem paths (PathStr) to their + corresponding SPDX File elements. + high_level_build_element: The SPDX Build element representing the = overall build process/root. + spdx_id_generator: Generator for unique SPDX IDs. + + Returns: + list[Build | Relationship]: List of SPDX Build and Relationship el= ements + """ + high_level_build_ancestorOf_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"ancestorOf", + from_=3Dhigh_level_build_element, + completeness=3D"complete", + to=3D[], + ) + + # Create a relationship between each node (output file) + # and its children (input files) + build_and_relationship_elements: list[Build | Relationship] =3D [high_= level_build_ancestorOf_relationship] + for node in cmd_graph: + if next(node.children, None) is None: + continue + + # .cmd file dependencies + if node.cmd_file is not None: + build_element =3D Build( + spdxId=3Dspdx_id_generator.generate(), + build_buildType=3Dhigh_level_build_element.build_buildType, + build_buildId=3Dhigh_level_build_element.build_buildId, + comment=3Dnode.cmd_file.savedcmd, + ) + hasInput_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"hasInput", + from_=3Dbuild_element, + to=3D[file_elements[child_node.absolute_path] for child_no= de in node.children], + ) + hasOutput_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"hasOutput", + from_=3Dbuild_element, + to=3D[file_elements[node.absolute_path]], + ) + build_and_relationship_elements +=3D [ + build_element, + hasInput_relationship, + hasOutput_relationship, + ] + high_level_build_ancestorOf_relationship.to.append(build_eleme= nt) + + # incbin dependencies + if 
len(node.incbin_dependencies) > 0: + incbin_dependsOn_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"dependsOn", + comment=3D"\n".join([incbin_dependency.full_statement for = incbin_dependency in node.incbin_dependencies]), + from_=3Dfile_elements[node.absolute_path], + to=3D[ + file_elements[incbin_dependency.node.absolute_path] + for incbin_dependency in node.incbin_dependencies + ], + ) + build_and_relationship_elements.append(incbin_dependsOn_relati= onship) + + # hardcoded dependencies + if len(node.hardcoded_dependencies) > 0: + hardcoded_dependency_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"dependsOn", + from_=3Dfile_elements[node.absolute_path], + to=3D[file_elements[n.absolute_path] for n in node.hardcod= ed_dependencies], + ) + build_and_relationship_elements.append(hardcoded_dependency_re= lationship) + + return build_and_relationship_elements --=20 2.34.1
From nobody Sat Feb 7 08:07:12 2026 From: Luis Augenstein To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v3 13/14] tools/sbom: add unit tests for command parsers Date: Mon, 26 Jan 2026 20:33:03 +0100 Message-Id: <20260126193304.320916-14-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com> References: <20260126193304.320916-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add unit tests to verify that command parsers correctly extract input files from build commands.
Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- tools/sbom/tests/__init__.py | 0 tools/sbom/tests/cmd_graph/__init__.py | 0 .../tests/cmd_graph/test_savedcmd_parser.py | 383 ++++++++++++++++++ 3 files changed, 383 insertions(+) create mode 100644 tools/sbom/tests/__init__.py create mode 100644 tools/sbom/tests/cmd_graph/__init__.py create mode 100644 tools/sbom/tests/cmd_graph/test_savedcmd_parser.py diff --git a/tools/sbom/tests/__init__.py b/tools/sbom/tests/__init__.py new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/tools/sbom/tests/cmd_graph/__init__.py b/tools/sbom/tests/cmd_= graph/__init__.py new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/tools/sbom/tests/cmd_graph/test_savedcmd_parser.py b/tools/sbo= m/tests/cmd_graph/test_savedcmd_parser.py new file mode 100644 index 000000000000..9409bc65ee25 --- /dev/null +++ b/tools/sbom/tests/cmd_graph/test_savedcmd_parser.py @@ -0,0 +1,383 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import unittest + +from sbom.cmd_graph.savedcmd_parser import parse_inputs_from_commands +import sbom.sbom_logging as sbom_logging + + +class TestSavedCmdParser(unittest.TestCase): + def _assert_parsing(self, cmd: str, expected: str) -> None: + sbom_logging.init() + parsed =3D parse_inputs_from_commands(cmd, fail_on_unknown_build_c= ommand=3DFalse) + target =3D [] if expected =3D=3D "" else expected.split(" ") + self.assertEqual(parsed, target) + errors =3D sbom_logging._error_logger.messages # type: ignore + self.assertEqual(errors, {}) + + # Compound command tests + def test_dd_cat(self): + cmd =3D "(dd if=3Darch/x86/boot/setup.bin bs=3D4k conv=3Dsync stat= us=3Dnone; cat arch/x86/boot/vmlinux.bin) >arch/x86/boot/bzImage" + expected =3D "arch/x86/boot/setup.bin arch/x86/boot/vmlinux.bin" + self._assert_parsing(cmd, expected) + + def test_manual_file_creation(self): + cmd =3D """{ 
symbase=3D__dtbo_overlay_bad_unresolved; echo '$(poun= d)include '; echo '.section .rodata,"a"'; echo '= .balign STRUCT_ALIGNMENT'; echo ".global $${symbase}_begin"; echo "$${symba= se}_begin:"; echo '.incbin "drivers/of/unittest-data/overlay_bad_unresolved= .dtbo" '; echo ".global $${symbase}_end"; echo "$${symbase}_end:"; echo '.b= align STRUCT_ALIGNMENT'; } > drivers/of/unittest-data/overlay_bad_unresolve= d.dtbo.S""" + expected =3D "" + self._assert_parsing(cmd, expected) + + def test_cat_xz_wrap(self): + cmd =3D "{ cat arch/x86/boot/compressed/vmlinux.bin | sh ../script= s/xz_wrap.sh; printf \\130\\064\\024\\000; } > arch/x86/boot/compressed/vml= inux.bin.xz" + expected =3D "arch/x86/boot/compressed/vmlinux.bin" + self._assert_parsing(cmd, expected) + + def test_printf_sed(self): + cmd =3D r"""{ printf 'static char tomoyo_builtin_profile[] __init= data =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/\(.*\)/\t"\1\\n"/' = -- /dev/null; printf '\t"";\n'; printf 'static char tomoyo_builtin_excepti= on_policy[] __initdata =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/\= (.*\)/\t"\1\\n"/' -- ../security/tomoyo/policy/exception_policy.conf.defaul= t; printf '\t"";\n'; printf 'static char tomoyo_builtin_domain_policy[] __= initdata =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/\(.*\)/\t"\1\\n= "/' -- /dev/null; printf '\t"";\n'; printf 'static char tomoyo_builtin_man= ager[] __initdata =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/\(.*\)= /\t"\1\\n"/' -- /dev/null; printf '\t"";\n'; printf 'static char tomoyo_bu= iltin_stat[] __initdata =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/= \(.*\)/\t"\1\\n"/' -- /dev/null; printf '\t"";\n'; } > security/tomoyo/buil= tin-policy.h""" + expected =3D "../security/tomoyo/policy/exception_policy.conf.defa= ult" + self._assert_parsing(cmd, expected) + + def test_bin2c_echo(self): + cmd =3D """(echo "static char tomoyo_builtin_profile[] __initdata = =3D"; ./scripts/bin2c security/tomoyo/builtin-policy= .h""" + 
expected =3D "../security/tomoyo/policy/exception_policy.conf.defa= ult" + self._assert_parsing(cmd, expected) + + def test_cat_colon(self): + cmd =3D "{ cat init/modules.order; cat usr/modules.order; ca= t arch/x86/modules.order; cat arch/x86/boot/startup/modules.order; cat = kernel/modules.order; cat certs/modules.order; cat mm/modules.order; = cat fs/modules.order; cat ipc/modules.order; cat security/modules.order= ; cat crypto/modules.order; cat block/modules.order; cat io_uring/mod= ules.order; cat lib/modules.order; cat arch/x86/lib/modules.order; ca= t drivers/modules.order; cat sound/modules.order; cat samples/modules.o= rder; cat net/modules.order; cat virt/modules.order; cat arch/x86/pci= /modules.order; cat arch/x86/power/modules.order; cat arch/x86/video/mo= dules.order; :; } > modules.order" + expected =3D "init/modules.order usr/modules.order arch/x86/module= s.order arch/x86/boot/startup/modules.order kernel/modules.order certs/modu= les.order mm/modules.order fs/modules.order ipc/modules.order security/modu= les.order crypto/modules.order block/modules.order io_uring/modules.order l= ib/modules.order arch/x86/lib/modules.order drivers/modules.order sound/mod= ules.order samples/modules.order net/modules.order virt/modules.order arch/= x86/pci/modules.order arch/x86/power/modules.order arch/x86/video/modules.o= rder" + self._assert_parsing(cmd, expected) + + def test_cat_zstd(self): + cmd =3D "{ cat arch/x86/boot/compressed/vmlinux.bin arch/x86/boot/= compressed/vmlinux.relocs | zstd -22 --ultra; printf \\340\\362\\066\\003; = } > arch/x86/boot/compressed/vmlinux.bin.zst" + expected =3D "arch/x86/boot/compressed/vmlinux.bin arch/x86/boot/c= ompressed/vmlinux.relocs" + self._assert_parsing(cmd, expected) + + # cat command tests + def test_cat_redirect(self): + cmd =3D "cat ../fs/unicode/utf8data.c_shipped > fs/unicode/utf8dat= a.c" + expected =3D "../fs/unicode/utf8data.c_shipped" + self._assert_parsing(cmd, expected) + + def test_cat_piped(self): + 
cmd =3D "cat arch/x86/boot/compressed/vmlinux.bin arch/x86/boot/co= mpressed/vmlinux.relocs | gzip -n -f -9 > arch/x86/boot/compressed/vmlinux.= bin.gz" + expected =3D "arch/x86/boot/compressed/vmlinux.bin arch/x86/boot/c= ompressed/vmlinux.relocs" + self._assert_parsing(cmd, expected) + + # sed command tests + def test_sed(self): + cmd =3D "sed -n 's/.*define *BLIST_\\([A-Z0-9_]*\\) *.*/BLIST_FLAG= _NAME(\\1),/p' ../include/scsi/scsi_devinfo.h > drivers/scsi/scsi_devinfo_t= bl.c" + expected =3D "../include/scsi/scsi_devinfo.h" + self._assert_parsing(cmd, expected) + + # awk command tests + def test_awk(self): + cmd =3D "awk -f ../arch/arm64/tools/gen-cpucaps.awk ../arch/arm64/= tools/cpucaps > arch/arm64/include/generated/asm/cpucap-defs.h" + expected =3D "../arch/arm64/tools/cpucaps" + self._assert_parsing(cmd, expected) + + def test_awk_with_input_redirection(self): + cmd =3D "awk -v N=3D1 -f ../lib/raid6/unroll.awk < ../lib/raid6/in= t.uc > lib/raid6/int1.c" + expected =3D "../lib/raid6/int.uc" + self._assert_parsing(cmd, expected) + + # openssl command tests + def test_openssl(self): + cmd =3D "openssl req -new -nodes -utf8 -sha256 -days 36500 -batch = -x509 -config certs/x509.genkey -outform PEM -out certs/signing_key.pem -ke= yout certs/signing_key.pem 2>&1" + expected =3D "" + self._assert_parsing(cmd, expected) + + # gcc/clang command tests + def test_gcc(self): + cmd =3D ( + "gcc -Wp,-MMD,arch/x86/pci/.i386.o.d -nostdinc -I../arch/x86/i= nclude -I./arch/x86/include/generated -I../include -I./include -I../arch/x8= 6/include/uapi -I./arch/x86/include/generated/uapi -I../include/uapi -I./in= clude/generated/uapi -include ../include/linux/compiler-version.h -include = ../include/linux/kconfig.h -include ../include/linux/compiler_types.h -D__K= ERNEL__ -fmacro-prefix-map=3D../=3D -Werror -std=3Dgnu11 -fshort-wchar -fun= signed-char -fno-common -fno-PIE -fno-strict-aliasing -mno-sse -mno-mmx -mn= o-sse2 -mno-3dnow -mno-avx -fcf-protection=3Dbranch 
-fno-jump-tables -m64 -= falign-jumps=3D1 -falign-loops=3D1 -mno-80387 -mno-fp-ret-in-387 -mpreferre= d-stack-boundary=3D3 -mskip-rax-setup -march=3Dx86-64 -mtune=3Dgeneric -mno= -red-zone -mcmodel=3Dkernel -mstack-protector-guard-reg=3Dgs -mstack-protec= tor-guard-symbol=3D__ref_stack_chk_guard -Wno-sign-compare -fno-asynchronou= s-unwind-tables -mindirect-branch=3Dthunk-extern -mindirect-branch-register= -mindirect-branch-cs-prefix -mfunction-return=3Dthunk-extern -fno-jump-tab= les -fpatchable-function-entry=3D16,16 -fno-delete-null-pointer-checks -O2 = -fno-allow-store-data-races -fstack-protector-strong -fomit-frame-pointer -= fno-stack-clash-protection -falign-functions=3D16 -fno-strict-overflow -fno= -stack-check -fconserve-stack -fno-builtin-wcslen -Wall -Wextra -Wundef -We= rror=3Dimplicit-function-declaration -Werror=3Dimplicit-int -Werror=3Dretur= n-type -Werror=3Dstrict-prototypes -Wno-format-security -Wno-trigraphs -Wno= -frame-address -Wno-address-of-packed-member -Wmissing-declarations -Wmissi= ng-prototypes -Wframe-larger-than=3D2048 -Wno-main -Wvla-larger-than=3D1 -W= no-pointer-sign -Wcast-function-type -Wno-array-bounds -Wno-stringop-overfl= ow -Wno-alloc-size-larger-than -Wimplicit-fallthrough=3D5 -Werror=3Ddate-ti= me -Werror=3Dincompatible-pointer-types -Werror=3Ddesignated-init -Wenum-co= nversion -Wunused -Wno-unused-but-set-variable -Wno-unused-const-variable -= Wno-packed-not-aligned -Wno-format-overflow -Wno-format-truncation -Wno-str= ingop-truncation -Wno-override-init -Wno-missing-field-initializers -Wno-ty= pe-limits -Wno-shift-negative-value -Wno-maybe-uninitialized -Wno-sign-comp= are -Wno-unused-parameter -I../arch/x86/pci -Iarch/x86/pci -DKBUILD_MODF= ILE=3D" + "arch/x86/pci/i386" + " -DKBUILD_BASENAME=3D" + "i386" + " -DKBUILD_MODNAME=3D" + "i386" + " -D__KBUILD_MODNAME=3Dkmod_i386 -c -o arch/x86/pci/i386.o ../= arch/x86/pci/i386.c " + ) + expected =3D "../arch/x86/pci/i386.c" + self._assert_parsing(cmd, expected) + + def 
test_gcc_linking(self): + cmd =3D "gcc -o arch/x86/tools/relocs arch/x86/tools/relocs_32.o= arch/x86/tools/relocs_64.o arch/x86/tools/relocs_common.o" + expected =3D "arch/x86/tools/relocs_32.o arch/x86/tools/relocs_64.= o arch/x86/tools/relocs_common.o" + self._assert_parsing(cmd, expected) + + def test_gcc_without_compile_flag(self): + cmd =3D "gcc -Wp,-MMD,arch/x86/boot/compressed/.mkpiggy.d -Wall -W= missing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=3Dgnu1= 1 -I ../scripts/include -I../tools/include -I arch/x86/boot/compressed = -o arch/x86/boot/compressed/mkpiggy ../arch/x86/boot/compressed/mkpiggy.c" + expected =3D "../arch/x86/boot/compressed/mkpiggy.c" + self._assert_parsing(cmd, expected) + + def test_clang(self): + cmd =3D """clang -Wp,-MMD,arch/x86/entry/.entry_64_compat.o.d -nos= tdinc -I../arch/x86/include -I./arch/x86/include/generated -I../include -I.= /include -I../arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I.= ./include/uapi -I./include/generated/uapi -include ../include/linux/compile= r-version.h -include ../include/linux/kconfig.h -D__KERNEL__ --target=3Dx86= _64-linux-gnu -fintegrated-as -Werror=3Dunknown-warning-option -Werror=3Dig= nored-optimization-argument -Werror=3Doption-ignored -Werror=3Dunused-comma= nd-line-argument -fmacro-prefix-map=3D../=3D -Werror -D__ASSEMBLY__ -fno-PI= E -m64 -I../arch/x86/entry -Iarch/x86/entry -DKBUILD_MODFILE=3D'"arch/x8= 6/entry/entry_64_compat"' -DKBUILD_MODNAME=3D'"entry_64_compat"' -D__KBUILD= _MODNAME=3Dkmod_entry_64_compat -c -o arch/x86/entry/entry_64_compat.o ../a= rch/x86/entry/entry_64_compat.S""" + expected =3D "../arch/x86/entry/entry_64_compat.S" + self._assert_parsing(cmd, expected) + + # ld command tests + def test_ld(self): + cmd =3D 'ld -o arch/x86/entry/vdso/vdso64.so.dbg -shared --hash-st= yle=3Dboth --build-id=3Dsha1 --no-undefined --eh-frame-hdr -Bsymbolic -z n= oexecstack -m elf_x86_64 -soname linux-vdso.so.1 -z max-page-size=3D4096 -T= 
arch/x86/entry/vdso/vdso.lds arch/x86/entry/vdso/vdso-note.o arch/x86/entr= y/vdso/vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/v= getrandom.o arch/x86/entry/vdso/vgetrandom-chacha.o; if readelf -rW arch/x8= 6/entry/vdso/vdso64.so.dbg | grep -v _NONE | grep -q " R_\w*_"; then (echo = >&2 "arch/x86/entry/vdso/vdso64.so.dbg: dynamic relocations are not support= ed"; rm -f arch/x86/entry/vdso/vdso64.so.dbg; /bin/false); fi' # type: ign= ore + expected =3D "arch/x86/entry/vdso/vdso-note.o arch/x86/entry/vdso/= vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/vgetrand= om.o arch/x86/entry/vdso/vgetrandom-chacha.o" + self._assert_parsing(cmd, expected) + + def test_ld_whole_archive(self): + cmd =3D "ld -m elf_x86_64 -z noexecstack -r -o vmlinux.o --whole= -archive vmlinux.a --no-whole-archive --start-group --end-group" + expected =3D "vmlinux.a" + self._assert_parsing(cmd, expected) + + def test_ld_with_at_symbol(self): + cmd =3D "ld.lld -m elf_x86_64 -z noexecstack -r -o fs/efivarfs/e= fivarfs.o @fs/efivarfs/efivarfs.mod ; ./tools/objtool/objtool --hacks=3Dju= mp_label --hacks=3Dnoinstr --hacks=3Dskylake --ibt --orc --retpoline --reth= unk --static-call --uaccess --prefix=3D16 --link --module fs/efivarfs/efi= varfs.o" + expected =3D "@fs/efivarfs/efivarfs.mod" + self._assert_parsing(cmd, expected) + + def test_ld_if_objdump(self): + cmd =3D """ld -o arch/x86/entry/vdso/vdso64.so.dbg -shared --hash-= style=3Dboth --build-id=3Dsha1 --eh-frame-hdr -Bsymbolic -z noexecstack -m= elf_x86_64 -soname linux-vdso.so.1 --no-undefined -z max-page-size=3D4096 = -T arch/x86/entry/vdso/vdso.lds arch/x86/entry/vdso/vdso-note.o arch/x86/en= try/vdso/vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso= /vsgx.o && sh ./arch/x86/entry/vdso/checkundef.sh 'nm' 'arch/x86/entry/vdso= /vdso64.so.dbg'; if objdump -R arch/x86/entry/vdso/vdso64.so.dbg | grep -E = -h "R_X86_64_JUMP_SLOT|R_X86_64_GLOB_DAT|R_X86_64_RELATIVE| R_386_GLOB_DAT|= 
R_386_JMP_SLOT|R_386_RELATIVE"; then (echo >&2 "arch/x86/entry/vdso/vdso64.= so.dbg: dynamic relocations are not supported"; rm -f arch/x86/entry/vdso/v= dso64.so.dbg; /bin/false); fi""" + expected =3D "arch/x86/entry/vdso/vdso-note.o arch/x86/entry/vdso/= vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/vsgx.o" + self._assert_parsing(cmd, expected) + + # printf | xargs ar command tests + def test_ar_printf(self): + cmd =3D 'rm -f built-in.a; printf "./%s " init/built-in.a usr/bui= lt-in.a arch/x86/built-in.a arch/x86/boot/startup/built-in.a kernel/built-i= n.a certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a security/bu= ilt-in.a crypto/built-in.a block/built-in.a io_uring/built-in.a lib/built-i= n.a arch/x86/lib/built-in.a drivers/built-in.a sound/built-in.a net/built-i= n.a virt/built-in.a arch/x86/pci/built-in.a arch/x86/power/built-in.a arch/= x86/video/built-in.a | xargs ar cDPrST built-in.a' + expected =3D "./init/built-in.a ./usr/built-in.a ./arch/x86/built-= in.a ./arch/x86/boot/startup/built-in.a ./kernel/built-in.a ./certs/built-i= n.a ./mm/built-in.a ./fs/built-in.a ./ipc/built-in.a ./security/built-in.a = ./crypto/built-in.a ./block/built-in.a ./io_uring/built-in.a ./lib/built-in= .a ./arch/x86/lib/built-in.a ./drivers/built-in.a ./sound/built-in.a ./net/= built-in.a ./virt/built-in.a ./arch/x86/pci/built-in.a ./arch/x86/power/bui= lt-in.a ./arch/x86/video/built-in.a" + self._assert_parsing(cmd, expected) + + def test_ar_printf_nested(self): + cmd =3D 'rm -f arch/x86/pci/built-in.a; printf "arch/x86/pci/%s "= i386.o init.o mmconfig_64.o direct.o mmconfig-shared.o fixup.o acpi.o lega= cy.o irq.o common.o early.o bus_numa.o amd_bus.o | xargs ar cDPrST arch/x86= /pci/built-in.a' + expected =3D "arch/x86/pci/i386.o arch/x86/pci/init.o arch/x86/pci= /mmconfig_64.o arch/x86/pci/direct.o arch/x86/pci/mmconfig-shared.o arch/x8= 6/pci/fixup.o arch/x86/pci/acpi.o arch/x86/pci/legacy.o arch/x86/pci/irq.o = arch/x86/pci/common.o 
arch/x86/pci/early.o arch/x86/pci/bus_numa.o arch/x86= /pci/amd_bus.o" + self._assert_parsing(cmd, expected) + + # ar command tests + def test_ar_reordering(self): + cmd =3D "rm -f vmlinux.a; ar cDPrST vmlinux.a built-in.a lib/lib.= a arch/x86/lib/lib.a; ar mPiT $$(ar t vmlinux.a | sed -n 1p) vmlinux.a $$(a= r t vmlinux.a | grep -F -f ../scripts/head-object-list.txt)" + expected =3D "built-in.a lib/lib.a arch/x86/lib/lib.a" + self._assert_parsing(cmd, expected) + + def test_ar_default(self): + cmd =3D "rm -f lib/lib.a; ar cDPrsT lib/lib.a lib/argv_split.o lib= /bug.o lib/buildid.o lib/clz_tab.o lib/cmdline.o lib/cpumask.o lib/ctype.o = lib/dec_and_lock.o lib/decompress.o lib/decompress_bunzip2.o lib/decompress= _inflate.o lib/decompress_unlz4.o lib/decompress_unlzma.o lib/decompress_un= lzo.o lib/decompress_unxz.o lib/decompress_unzstd.o lib/dump_stack.o lib/ea= rlycpio.o lib/extable.o lib/flex_proportions.o lib/idr.o lib/iomem_copy.o l= ib/irq_regs.o lib/is_single_threaded.o lib/klist.o lib/kobject.o lib/kobjec= t_uevent.o lib/logic_pio.o lib/maple_tree.o lib/memcat_p.o lib/nmi_backtrac= e.o lib/objpool.o lib/plist.o lib/radix-tree.o lib/ratelimit.o lib/rbtree.o= lib/seq_buf.o lib/siphash.o lib/string.o lib/sys_info.o lib/timerqueue.o l= ib/union_find.o lib/vsprintf.o lib/win_minmax.o lib/xarray.o" + expected =3D "lib/argv_split.o lib/bug.o lib/buildid.o lib/clz_tab= .o lib/cmdline.o lib/cpumask.o lib/ctype.o lib/dec_and_lock.o lib/decompres= s.o lib/decompress_bunzip2.o lib/decompress_inflate.o lib/decompress_unlz4.= o lib/decompress_unlzma.o lib/decompress_unlzo.o lib/decompress_unxz.o lib/= decompress_unzstd.o lib/dump_stack.o lib/earlycpio.o lib/extable.o lib/flex= _proportions.o lib/idr.o lib/iomem_copy.o lib/irq_regs.o lib/is_single_thre= aded.o lib/klist.o lib/kobject.o lib/kobject_uevent.o lib/logic_pio.o lib/m= aple_tree.o lib/memcat_p.o lib/nmi_backtrace.o lib/objpool.o lib/plist.o li= b/radix-tree.o lib/ratelimit.o lib/rbtree.o lib/seq_buf.o 
lib/siphash.o lib= /string.o lib/sys_info.o lib/timerqueue.o lib/union_find.o lib/vsprintf.o l= ib/win_minmax.o lib/xarray.o" + self._assert_parsing(cmd, expected) + + def test_ar_llvm(self): + cmd =3D "llvm-ar mPiT $$(llvm-ar t vmlinux.a | sed -n 1p) vmlinux.= a $$(llvm-ar t vmlinux.a | grep -F -f ../scripts/head-object-list.txt)" + expected =3D "" + self._assert_parsing(cmd, expected) + + # nm command tests + def test_nm(self): + cmd =3D """llvm-nm -p --defined-only rust/core.o | awk '$$2~/(T|R|= D|B)/ && $$3!~/__(pfx|cfi|odr_asan)/ { printf "EXPORT_SYMBOL_RUST_GPL(%s);\= n",$$3 }' > rust/exports_core_generated.h""" + expected =3D "rust/core.o" + self._assert_parsing(cmd, expected) + + def test_nm_vmlinux(self): + cmd =3D r"nm vmlinux | sed -n -e 's/^\([0-9a-fA-F]*\) [ABbCDGRSTtV= W] \(_text\|__start_rodata\|__bss_start\|_end\)$/#define VO_\2 _AC(0x\1,UL)= /p' > arch/x86/boot/voffset.h" + expected =3D "vmlinux" + self._assert_parsing(cmd, expected) + + # objcopy command tests + def test_objcopy(self): + cmd =3D "objcopy --remove-section=3D'.rel*' --remove-section=3D!'.= rel*.dyn' vmlinux.unstripped vmlinux" + expected =3D "vmlinux.unstripped" + self._assert_parsing(cmd, expected) + + def test_objcopy_llvm(self): + cmd =3D "llvm-objcopy --remove-section=3D'.rel*' --remove-section= =3D!'.rel*.dyn' vmlinux.unstripped vmlinux" + expected =3D "vmlinux.unstripped" + self._assert_parsing(cmd, expected) + + # strip command tests + def test_strip(self): + cmd =3D "strip --strip-debug -o drivers/firmware/efi/libstub/mem.s= tub.o drivers/firmware/efi/libstub/mem.o" + expected =3D "drivers/firmware/efi/libstub/mem.o" + self._assert_parsing(cmd, expected) + + # rustc command tests + def test_rustc(self): + cmd =3D """OBJTREE=3D/workspace/linux/kernel_build rustc -Zbinary_= dep_depinfo=3Dy -Astable_features -Dnon_ascii_idents -Dunsafe_op_in_unsafe_= fn -Wmissing_docs -Wrust_2018_idioms -Wclippy::all -Wclippy::as_ptr_cast_mu= t -Wclippy::as_underscore -Wclippy::cast_lossless 
-Wclippy::ignored_unit_pa= tterns -Wclippy::mut_mut -Wclippy::needless_bitwise_bool -Aclippy::needless= _lifetimes -Wclippy::no_mangle_with_rust_abi -Wclippy::ptr_as_ptr -Wclippy:= :ptr_cast_constness -Wclippy::ref_as_ptr -Wclippy::undocumented_unsafe_bloc= ks -Wclippy::unnecessary_safety_comment -Wclippy::unnecessary_safety_doc -W= rustdoc::missing_crate_level_docs -Wrustdoc::unescaped_backticks -Cpanic=3D= abort -Cembed-bitcode=3Dn -Clto=3Dn -Cforce-unwind-tables=3Dn -Ccodegen-uni= ts=3D1 -Csymbol-mangling-version=3Dv0 -Crelocation-model=3Dstatic -Zfunctio= n-sections=3Dn -Wclippy::float_arithmetic --target=3D./scripts/target.json = -Ctarget-feature=3D-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2 -Zcf-= protection=3Dbranch -Zno-jump-tables -Ctarget-cpu=3Dx86-64 -Ztune-cpu=3Dgen= eric -Cno-redzone=3Dy -Ccode-model=3Dkernel -Zfunction-return=3Dthunk-exter= n -Zpatchable-function-entry=3D16,16 -Copt-level=3D2 -Cdebug-assertions=3Dn= -Coverflow-checks=3Dy -Dwarnings @./include/generated/rustc_cfg --edition= =3D2021 --cfg no_fp_fmt_parse --emit=3Ddep-info=3Drust/.core.o.d --emit=3Do= bj=3Drust/core.o --emit=3Dmetadata=3Drust/libcore.rmeta --crate-type rlib -= L./rust --crate-name core /usr/lib/rust-1.84/lib/rustlib/src/rust/library/c= ore/src/lib.rs --sysroot=3D/dev/null ;llvm-objcopy --redefine-sym __addsf3= =3D__rust__addsf3 --redefine-sym __eqsf2=3D__rust__eqsf2 --redefine-sym __e= xtendsfdf2=3D__rust__extendsfdf2 --redefine-sym __gesf2=3D__rust__gesf2 --r= edefine-sym __lesf2=3D__rust__lesf2 --redefine-sym __ltsf2=3D__rust__ltsf2 = --redefine-sym __mulsf3=3D__rust__mulsf3 --redefine-sym __nesf2=3D__rust__n= esf2 --redefine-sym __truncdfsf2=3D__rust__truncdfsf2 --redefine-sym __unor= dsf2=3D__rust__unordsf2 --redefine-sym __adddf3=3D__rust__adddf3 --redefine= -sym __eqdf2=3D__rust__eqdf2 --redefine-sym __ledf2=3D__rust__ledf2 --redef= ine-sym __ltdf2=3D__rust__ltdf2 --redefine-sym __muldf3=3D__rust__muldf3 --= redefine-sym __unorddf2=3D__rust__unorddf2 
--redefine-sym __muloti4=__rust__muloti4 --redefine-sym __multi3=__rust__multi3 --redefine-sym __udivmodti4=__rust__udivmodti4 --redefine-sym __udivti3=__rust__udivti3 --redefine-sym __umodti3=__rust__umodti3 rust/core.o"""
+        expected = "/usr/lib/rust-1.84/lib/rustlib/src/rust/library/core/src/lib.rs rust/core.o"
+        self._assert_parsing(cmd, expected)
+
+    # rustdoc command tests
+    def test_rustdoc(self):
+        cmd = """OBJTREE=/workspace/linux/kernel_build rustdoc --test --edition=2021 -Zbinary_dep_depinfo=y -Astable_features -Dnon_ascii_idents -Dunsafe_op_in_unsafe_fn -Wmissing_docs -Wrust_2018_idioms -Wunreachable_pub -Wclippy::all -Wclippy::as_ptr_cast_mut -Wclippy::as_underscore -Wclippy::cast_lossless -Wclippy::ignored_unit_patterns -Wclippy::mut_mut -Wclippy::needless_bitwise_bool -Aclippy::needless_lifetimes -Wclippy::no_mangle_with_rust_abi -Wclippy::ptr_as_ptr -Wclippy::ptr_cast_constness -Wclippy::ref_as_ptr -Wclippy::undocumented_unsafe_blocks -Wclippy::unnecessary_safety_comment -Wclippy::unnecessary_safety_doc -Wrustdoc::missing_crate_level_docs -Wrustdoc::unescaped_backticks -Cpanic=abort -Cembed-bitcode=n -Clto=n -Cforce-unwind-tables=n -Ccodegen-units=1 -Csymbol-mangling-version=v0 -Crelocation-model=static -Zfunction-sections=n -Wclippy::float_arithmetic --target=aarch64-unknown-none -Ctarget-feature="-neon" -Cforce-unwind-tables=n -Zbranch-protection=pac-ret -Copt-level=2 -Cdebug-assertions=y -Coverflow-checks=y -Dwarnings -Cforce-frame-pointers=y -Zsanitizer=kernel-address -Zsanitizer-recover=kernel-address -Cllvm-args=-asan-mapping-offset=0xdfff800000000000 -Cpasses=sancov-module -Cllvm-args=-sanitizer-coverage-level=3 -Cllvm-args=-sanitizer-coverage-trace-pc -Cllvm-args=-sanitizer-coverage-trace-compares @./include/generated/rustc_cfg -L./rust --extern ffi --extern pin_init --extern kernel --extern build_error --extern macros --extern bindings --extern uapi --no-run --crate-name kernel -Zunstable-options --sysroot=/dev/null --test-builder ./scripts/rustdoc_test_builder ../rust/kernel/lib.rs >/dev/null"""
+        expected = "../rust/kernel/lib.rs"
+        self._assert_parsing(cmd, expected)
+
+    def test_rustdoc_test_gen(self):
+        cmd = "./scripts/rustdoc_test_gen"
+        expected = ""
+        self._assert_parsing(cmd, expected)
+
+    # flex command tests
+    def test_flex(self):
+        cmd = "flex -oscripts/kconfig/lexer.lex.c -L ../scripts/kconfig/lexer.l"
+        expected = "../scripts/kconfig/lexer.l"
+        self._assert_parsing(cmd, expected)
+
+    # bison command tests
+    def test_bison(self):
+        cmd = "bison -o scripts/kconfig/parser.tab.c --defines=scripts/kconfig/parser.tab.h -t -l ../scripts/kconfig/parser.y"
+        expected = "../scripts/kconfig/parser.y"
+        self._assert_parsing(cmd, expected)
+
+    # bindgen command tests
+    def test_bindgen(self):
+        cmd = (
+            "bindgen ../rust/bindings/bindings_helper.h "
+            "--blocklist-type __kernel_s?size_t --blocklist-type __kernel_ptrdiff_t "
+            "--opaque-type xregs_state --opaque-type desc_struct --no-doc-comments "
+            "--rust-target 1.68 --use-core --with-derive-default -o rust/bindings/bindings_generated.rs "
+            "-- -Wp,-MMD,rust/bindings/.bindings_generated.rs.d -nostdinc -I../arch/x86/include "
+            "-include ../include/linux/compiler-version.h -D__KERNEL__ -fintegrated-as -fno-builtin -DMODULE; "
+            "sed -Ei 's/pub const RUST_CONST_HELPER_([a-zA-Z0-9_]*)/pub const \\1/g' rust/bindings/bindings_generated.rs"
+        )
+        expected = "../rust/bindings/bindings_helper.h ../include/linux/compiler-version.h"
+        self._assert_parsing(cmd, expected)
+
+    # perl command tests
+    def test_perl(self):
+        cmd = "perl ../lib/crypto/x86/poly1305-x86_64-cryptogams.pl > lib/crypto/x86/poly1305-x86_64-cryptogams.S"
+        expected = "../lib/crypto/x86/poly1305-x86_64-cryptogams.pl"
+        self._assert_parsing(cmd, expected)
+
+    # link-vmlinux.sh command tests
+    def test_link_vmlinux(self):
+        cmd = '../scripts/link-vmlinux.sh "ld" "-m elf_x86_64 -z noexecstack" "-z max-page-size=0x200000 --build-id=sha1 --orphan-handling=error --emit-relocs --discard-none" "vmlinux.unstripped";  true'
+        expected = "vmlinux.a"
+        self._assert_parsing(cmd, expected)
+
+    def test_link_vmlinux_postlink(self):
+        cmd = '../scripts/link-vmlinux.sh "ld" "-m elf_x86_64 -z noexecstack --no-warn-rwx-segments" "--emit-relocs --discard-none -z max-page-size=0x200000 --build-id=sha1 -X --orphan-handling=error";  make -f ../arch/x86/Makefile.postlink vmlinux'
+        expected = "vmlinux.a"
+        self._assert_parsing(cmd, expected)
+
+    # syscallhdr.sh command tests
+    def test_syscallhdr(self):
+        cmd = "sh ../scripts/syscallhdr.sh --abis common,64 --emit-nr ../arch/x86/entry/syscalls/syscall_64.tbl arch/x86/include/generated/uapi/asm/unistd_64.h"
+        expected = "../arch/x86/entry/syscalls/syscall_64.tbl"
+        self._assert_parsing(cmd, expected)
+
+    # syscalltbl.sh command tests
+    def test_syscalltbl(self):
+        cmd = "sh ../scripts/syscalltbl.sh --abis common,64 ../arch/x86/entry/syscalls/syscall_64.tbl arch/x86/include/generated/asm/syscalls_64.h"
+        expected = "../arch/x86/entry/syscalls/syscall_64.tbl"
+        self._assert_parsing(cmd, expected)
+
+    # mkcapflags.sh command tests
+    def test_mkcapflags(self):
+        cmd = "sh ../arch/x86/kernel/cpu/mkcapflags.sh arch/x86/kernel/cpu/capflags.c ../arch/x86/kernel/cpu/../../include/asm/cpufeatures.h ../arch/x86/kernel/cpu/../../include/asm/vmxfeatures.h ../arch/x86/kernel/cpu/mkcapflags.sh FORCE"
+        expected = "../arch/x86/kernel/cpu/../../include/asm/cpufeatures.h ../arch/x86/kernel/cpu/../../include/asm/vmxfeatures.h"
+        self._assert_parsing(cmd, expected)
+
+    # orc_hash.sh command tests
+    def test_orc_hash(self):
+        cmd = "mkdir -p arch/x86/include/generated/asm/; sh ../scripts/orc_hash.sh < ../arch/x86/include/asm/orc_types.h > arch/x86/include/generated/asm/orc_hash.h"
+        expected = "../arch/x86/include/asm/orc_types.h"
+        self._assert_parsing(cmd, expected)
+
+    # xen-hypercalls.sh command tests
+    def test_xen_hypercalls(self):
+        cmd = "sh '../scripts/xen-hypercalls.sh' arch/x86/include/generated/asm/xen-hypercalls.h ../include/xen/interface/xen-mca.h ../include/xen/interface/xen.h ../include/xen/interface/xenpmu.h"
+        expected = "../include/xen/interface/xen-mca.h ../include/xen/interface/xen.h ../include/xen/interface/xenpmu.h"
+        self._assert_parsing(cmd, expected)
+
+    # gen_initramfs.sh command tests
+    def test_gen_initramfs(self):
+        cmd = "sh ../usr/gen_initramfs.sh -o usr/initramfs_data.cpio -l usr/.initramfs_data.cpio.d ../usr/default_cpio_list"
+        expected = "../usr/default_cpio_list"
+        self._assert_parsing(cmd, expected)
+
+    # vdso2c command tests
+    def test_vdso2c(self):
+        cmd = "arch/x86/entry/vdso/vdso2c arch/x86/entry/vdso/vdso64.so.dbg arch/x86/entry/vdso/vdso64.so arch/x86/entry/vdso/vdso-image-64.c"
+        expected = "arch/x86/entry/vdso/vdso64.so.dbg arch/x86/entry/vdso/vdso64.so"
+        self._assert_parsing(cmd, expected)
+
+    # mkpiggy command tests
+    def test_mkpiggy(self):
+        cmd = "arch/x86/boot/compressed/mkpiggy arch/x86/boot/compressed/vmlinux.bin.gz > arch/x86/boot/compressed/piggy.S"
+        expected = "arch/x86/boot/compressed/vmlinux.bin.gz"
+        self._assert_parsing(cmd, expected)
+
+    # relocs command tests
+    def test_relocs(self):
+        cmd = "arch/x86/tools/relocs vmlinux.unstripped > arch/x86/boot/compressed/vmlinux.relocs;arch/x86/tools/relocs --abs-relocs vmlinux.unstripped"
+        expected = "vmlinux.unstripped"
+        self._assert_parsing(cmd, expected)
+
+    def test_relocs_with_realmode(self):
+        cmd = (
+            "arch/x86/tools/relocs --realmode arch/x86/realmode/rm/realmode.elf > arch/x86/realmode/rm/realmode.relocs"
+        )
+        expected = "arch/x86/realmode/rm/realmode.elf"
+        self._assert_parsing(cmd, expected)
+
+    # mk_elfconfig command tests
+    def test_mk_elfconfig(self):
+        cmd = "scripts/mod/mk_elfconfig < scripts/mod/empty.o > scripts/mod/elfconfig.h"
+        expected = "scripts/mod/empty.o"
+        self._assert_parsing(cmd, expected)
+
+    # tools/build command tests
+    def test_build(self):
+        cmd = "arch/x86/boot/tools/build arch/x86/boot/setup.bin arch/x86/boot/vmlinux.bin arch/x86/boot/zoffset.h arch/x86/boot/bzImage"
+        expected = "arch/x86/boot/setup.bin arch/x86/boot/vmlinux.bin arch/x86/boot/zoffset.h"
+        self._assert_parsing(cmd, expected)
+
+    # extract-cert command tests
+    def test_extract_cert(self):
+        cmd = 'certs/extract-cert "" certs/signing_key.x509'
+        expected = ""
+        self._assert_parsing(cmd, expected)
+
+    # dtc command tests
+    def test_dtc_cat(self):
+        cmd = "./scripts/dtc/dtc -o drivers/of/empty_root.dtb -b 0 -i../drivers/of/ -i../scripts/dtc/include-prefixes -Wno-unique_unit_address -Wno-unit_address_vs_reg -Wno-avoid_unnecessary_addr_size -Wno-alias_paths -Wno-graph_child_address -Wno-simple_bus_reg -d drivers/of/.empty_root.dtb.d.dtc.tmp drivers/of/.empty_root.dtb.dts.tmp ; cat drivers/of/.empty_root.dtb.d.pre.tmp drivers/of/.empty_root.dtb.d.dtc.tmp > drivers/of/.empty_root.dtb.d"
+        expected = "drivers/of/.empty_root.dtb.dts.tmp drivers/of/.empty_root.dtb.d.pre.tmp drivers/of/.empty_root.dtb.d.dtc.tmp"
+        self._assert_parsing(cmd, expected)
+
+    # pnmtologo command tests
+    def test_pnmtologo(self):
+        cmd = "drivers/video/logo/pnmtologo -t clut224 -n logo_linux_clut224 -o drivers/video/logo/logo_linux_clut224.c ../drivers/video/logo/logo_linux_clut224.ppm"
+        expected = "../drivers/video/logo/logo_linux_clut224.ppm"
+        self._assert_parsing(cmd, expected)
+
+    # relacheck command tests
+    def test_relacheck(self):
+        cmd = "arch/arm64/kernel/pi/relacheck arch/arm64/kernel/pi/idreg-override.pi.o arch/arm64/kernel/pi/idreg-override.o"
+        expected = "arch/arm64/kernel/pi/idreg-override.pi.o"
+        self._assert_parsing(cmd, expected)
+
+    # mkregtable command tests
+    def test_mkregtable(self):
+        cmd = "drivers/gpu/drm/radeon/mkregtable ../drivers/gpu/drm/radeon/reg_srcs/r100 > drivers/gpu/drm/radeon/r100_reg_safe.h"
+        expected = "../drivers/gpu/drm/radeon/reg_srcs/r100"
+        self._assert_parsing(cmd, expected)
+
+    # genheaders command tests
+    def test_genheaders(self):
+        cmd = "security/selinux/genheaders security/selinux/flask.h security/selinux/av_permissions.h"
+        expected = ""
+        self._assert_parsing(cmd, expected)
+
+    # mkcpustr command tests
+    def test_mkcpustr(self):
+        cmd = "arch/x86/boot/mkcpustr > arch/x86/boot/cpustr.h"
+        expected = ""
+        self._assert_parsing(cmd, expected)
+
+    # polgen command tests
+    def test_polgen(self):
+        cmd = "scripts/ipe/polgen/polgen security/ipe/boot_policy.c"
+        expected = ""
+        self._assert_parsing(cmd, expected)
+
+    # gen_header.py command tests
+    def test_gen_header(self):
+        cmd = "mkdir -p drivers/gpu/drm/msm/generated && python3 ../drivers/gpu/drm/msm/registers/gen_header.py --no-validate --rnn ../drivers/gpu/drm/msm/registers --xml ../drivers/gpu/drm/msm/registers/adreno/a2xx.xml c-defines > drivers/gpu/drm/msm/generated/a2xx.xml.h"
+        expected = "../drivers/gpu/drm/msm/registers/adreno/a2xx.xml"
+        self._assert_parsing(cmd, expected)
+
+
+if __name__ == "__main__":
+    unittest.main()
-- 
2.34.1

From nobody Sat Feb 7 08:07:12 2026
From: Luis Augenstein
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein
Subject: [PATCH v3 14/14] tools/sbom: add unit tests for SPDX-License-Identifier parsing
Date: Mon, 26 Jan 2026 20:33:04 +0100
Message-Id: <20260126193304.320916-15-luis.augenstein@tngtech.com>
In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com>
References: <20260126193304.320916-1-luis.augenstein@tngtech.com>

Verify that SPDX-License-Identifier headers at the top of source files
are parsed correctly.

Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 tools/sbom/tests/spdx_graph/__init__.py       |  0
 .../sbom/tests/spdx_graph/test_kernel_file.py | 32 +++++++++++++++++++
 2 files changed, 32 insertions(+)
 create mode 100644 tools/sbom/tests/spdx_graph/__init__.py
 create mode 100644 tools/sbom/tests/spdx_graph/test_kernel_file.py

diff --git a/tools/sbom/tests/spdx_graph/__init__.py b/tools/sbom/tests/spdx_graph/__init__.py
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/tools/sbom/tests/spdx_graph/test_kernel_file.py b/tools/sbom/tests/spdx_graph/test_kernel_file.py
new file mode 100644
index 000000000000..bc44e7a97d2a
--- /dev/null
+++ b/tools/sbom/tests/spdx_graph/test_kernel_file.py
@@ -0,0 +1,32 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+import unittest
+from pathlib import Path
+import tempfile
+from sbom.spdx_graph.kernel_file import _parse_spdx_license_identifier  # type: ignore
+
+
+class TestKernelFile(unittest.TestCase):
+    def setUp(self):
+        self.tmpdir = tempfile.TemporaryDirectory()
+        self.src_tree = Path(self.tmpdir.name)
+
+    def tearDown(self):
+        self.tmpdir.cleanup()
+
+    def test_parse_spdx_license_identifier(self):
+        # REUSE-IgnoreStart
+        test_cases: list[tuple[str, str | None]] = [
+            ("/* SPDX-License-Identifier: MIT*/", "MIT"),
+            ("// SPDX-License-Identifier: GPL-2.0-only", "GPL-2.0-only"),
+            ("/* SPDX-License-Identifier: GPL-2.0-or-later OR MIT */", "GPL-2.0-or-later OR MIT"),
+            ("/* SPDX-License-Identifier: Apache-2.0 */\n extra text", "Apache-2.0"),
+            ("int main() { return 0; }", None),
+        ]
+        # REUSE-IgnoreEnd
+
+        for i, (file_content, expected_identifier) in enumerate(test_cases):
+            file_path = self.src_tree / f"file_{i}.c"
+            file_path.write_text(file_content)
+            self.assertEqual(_parse_spdx_license_identifier(str(file_path)), expected_identifier)
-- 
2.34.1
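
Note for readers without the rest of the series at hand: the behaviour pinned down by the five test cases in test_kernel_file.py can be sketched roughly as below. This is only a minimal illustration and not the implementation in tools/sbom/sbom/spdx_graph/kernel_file.py, which is not part of this message. The function name parse_spdx_license_identifier_sketch, the regular expression, and the decision to inspect only the first line of the file are all assumptions inferred from the test inputs.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: the tag sits on the first line of the file, optionally inside a
# C-style comment; a trailing "*/" is not part of the license expression.
_SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(?P<id>.*?)\s*(?:\*/)?\s*$")


def parse_spdx_license_identifier_sketch(path: str) -> Optional[str]:
    """Return the SPDX license expression on the first line of `path`, or None."""
    with Path(path).open(encoding="utf-8", errors="replace") as f:
        first_line = f.readline()
    match = _SPDX_TAG.search(first_line)
    if match is None:
        return None
    # An empty expression (tag present, nothing after it) is treated as "no license".
    return match.group("id") or None
```

Reading only the first line keeps the sketch cheap on large source files. A real parser may need to look further, for example past a shebang line, which this sketch does not handle.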