From nobody Sat Apr 18 04:54:51 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A7CBF3DA7D1; Fri, 10 Apr 2026 21:31:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856715; cv=none; b=fgExA1zy7phM6suJIMlP5f8DBEdLbpzw5VRlfoASroqrfPFk45sZNoqxqGBiHfFc1yECv36lnsDEli9aC1641WqSnjJDaFsH6bMmtV6StoCaTFRPj90S/ofKBgRd2If/Bdw4dZbFIVObvxjrUZvNqTVEMywcZutVCL82fbeBajM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856715; c=relaxed/simple; bh=BhDM3wxbRQR+9889hYbygD2Zy9Xs083yLTH4wRhGWIM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DHveKY6stbcCX0OBi74rfI1yWH/3MG9kIhMnFymHc67jVhDzHM3PvhbVUfrB5p2vlZdkGol1YAEF5PWcPS2izaHZKRO+/zzbl4QQ4uDnBiPmCGcmbfIbNz03fqXD2CJm+xPAo9X5JYlrci+c4AanQGNHAF4iF8ebS3beT5UAKgQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=dspNtTCh; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="dspNtTCh" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id D68323FAEE; Fri, 10 Apr 2026 23:23:29 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id A0B561FA89E; 
Fri, 10 Apr 2026 23:23:29 +0200 (CEST) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id iWZ7UxsKFQkL; Fri, 10 Apr 2026 23:23:28 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id C2EE41FAA6D; Fri, 10 Apr 2026 23:23:28 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz C2EE41FAA6D DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1775856208; bh=D+eLvV0iz2LovY/WMJKzo0PTVmPQCEG6btwOjsAZPIc=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=dspNtTCh3o/YK3dAuCW8XiEv1+9UQymuozb0INoao0AU3SMwQervNPrOx5lutPvra o26u4Iamgii8gY8OwAGMmLIF6Bl6ukHUfcVPlAbNy+GIRlMPoTDq6pv4E/xr4KrLMi z0tw7C3LbisJ+fUXTZxX/VRg0y0LB1P9yUQgfniRYBUteAa5YPfNHKpsDQonjMiIyt UYLD07y8++4d7SsHwjukQMVDJ/DjTeHyF+NqudOIm1l4F5YIlNMlCmnDs2XJmq5+GD LzbUpIYFrG832f5htFiIevFIGixpKAAKPu/V7io/iLrCpJzIxEq9XbkMt8W3NFq+iR N2MzAWngD8XRg== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id bVyjP1DtlvoJ; Fri, 10 Apr 2026 23:23:28 +0200 (CEST) Received: from luis-Precision-5480.. 
(ipservice-092-209-239-167.092.209.pools.vodafone-ip.de [92.209.239.167]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id 5F2651FA89E; Fri, 10 Apr 2026 23:23:28 +0200 (CEST)
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein
Subject: [PATCH v5 01/15] scripts/sbom: add documentation
Date: Fri, 10 Apr 2026 23:22:41 +0200
Message-ID: <20260410212255.9883-2-luis.augenstein@tngtech.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com>
References: <20260410212255.9883-1-luis.augenstein@tngtech.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Luis Augenstein

Assisted-by: Cursor:claude-sonnet-4-5
Assisted-by: OpenCode:GLM-4-7
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 Documentation/tools/index.rst     |   1 +
 Documentation/tools/sbom/sbom.rst | 206 ++++++++++++++++++++++++++++++
 2 files changed, 207 insertions(+)
 create mode 100644 Documentation/tools/sbom/sbom.rst

diff --git a/Documentation/tools/index.rst b/Documentation/tools/index.rst
index 5f2f63bcb28..1adf4a6f909 100644
--- a/Documentation/tools/index.rst
+++ b/Documentation/tools/index.rst
@@ -13,3 +13,4 @@ more additions are needed here:
    rtla/index
    rv/index
    python
+   sbom/sbom
diff --git a/Documentation/tools/sbom/sbom.rst b/Documentation/tools/sbom/sbom.rst
new file mode 100644
index 00000000000..029b08b6ad8
--- /dev/null
+++ b/Documentation/tools/sbom/sbom.rst
@@ -0,0 +1,206 @@
+.. SPDX-License-Identifier: GPL-2.0-only OR MIT
+.. Copyright (C) 2025 TNG Technology Consulting GmbH
+
+KernelSbom
+==========
+
+Introduction
+------------
+
+KernelSbom is a Python script ``scripts/sbom/sbom.py`` that can be
+executed after a successful kernel build. When invoked, KernelSbom
+analyzes all files involved in the build and generates Software Bill of
+Materials (SBOM) documents in SPDX 3.0.1 format.
+The generated SBOM documents capture:
+
+* **Final output artifacts**, typically the kernel image and modules
+* **All source files** that contributed to the build with metadata
+  and licensing information
+* **Details of the build process**, including intermediate artifacts
+  and the build commands linking source files to the final output
+  artifacts
+
+KernelSbom is originally developed in the
+`KernelSbom repository `_.
+
+Requirements
+------------
+
+Python 3.10 or later. No libraries or other dependencies are required.
+
+Basic Usage
+-----------
+
+Run the ``make sbom`` target.
+For example::
+
+    $ make defconfig O=kernel_build
+    $ make sbom O=kernel_build -j$(nproc)
+
+This will trigger a kernel build. After all build outputs have been
+generated, KernelSbom produces three SPDX documents in the root
+directory of the object tree:
+
+* ``sbom-source.spdx.json``
+  Describes all source files involved in the build and
+  associates each file with its corresponding license expression.
+
+* ``sbom-output.spdx.json``
+  Captures all final build outputs (kernel image and ``.ko`` module files)
+  and includes build metadata such as environment variables and
+  a hash of the ``.config`` file used for the build.
+
+* ``sbom-build.spdx.json``
+  Imports files from the source and output documents and describes every
+  intermediate build artifact. For each artifact, it records the exact
+  build command used and establishes the relationship between
+  input files and generated outputs.
+
+When invoking the sbom target, it is recommended to perform
+out-of-tree builds using ``O=``. KernelSbom classifies files as
+source files when they are located in the source tree and not in the
+object tree. For in-tree builds, where the source and object trees are
+the same directory, this distinction can no longer be made reliably.
+In that case, KernelSbom does not generate a dedicated source SBOM.
+Instead, source files are included in the build SBOM.
+
+Standalone Usage
+----------------
+
+KernelSbom can also be used as a standalone script to generate
+SPDX documents for specific build outputs. For example, after a
+successful x86 kernel build, KernelSbom can generate SPDX documents
+for the ``bzImage`` kernel image::
+
+    $ SRCARCH=x86 python3 scripts/sbom/sbom.py \
+        --src-tree . \
+        --obj-tree ./kernel_build \
+        --roots arch/x86/boot/bzImage \
+        --generate-spdx \
+        --generate-used-files \
+        --prettify-json \
+        --debug
+
+Note that when KernelSbom is invoked outside of the ``make`` process,
+the environment variables used during compilation are not available and
+therefore cannot be included in the generated SPDX documents. It is
+recommended to set at least the ``SRCARCH`` environment variable to the
+architecture for which the build was performed.
+
+For a full list of command-line options, run::
+
+    $ python3 scripts/sbom/sbom.py --help
+
+Output Format
+-------------
+
+KernelSbom generates documents conforming to the
+`SPDX 3.0.1 specification `_
+serialized as JSON-LD.
+
+To reduce file size, the output documents use the JSON-LD ``@context``
+to define custom prefixes for ``spdxId`` values. While this is compliant
+with the SPDX specification, only a limited number of tools in the
+current SPDX ecosystem support custom JSON-LD contexts. To use such
+tools with the generated documents, the custom JSON-LD context must
+be expanded before providing the documents.
+See https://lists.spdx.org/g/Spdx-tech/message/6064 for more information.
+
+How it Works
+------------
+
+KernelSbom operates in two major phases:
+
+1. **Generate the cmd graph**, an acyclic directed dependency graph.
+2. **Generate SPDX documents** based on the cmd graph.
+
+KernelSbom begins from the root artifacts specified by the user, e.g.,
+``arch/x86/boot/bzImage``. For each root artifact, it collects all
+dependencies required to build that artifact. The dependencies come
+from multiple sources:
+
+* **.cmd files**: The primary source is the ``.cmd`` file of the
+  generated artifact, e.g., ``arch/x86/boot/.bzImage.cmd``. These files
+  contain the exact command used to build the artifact and often include
+  an explicit list of input dependencies. By parsing the ``.cmd``
+  file, the full list of dependencies can be obtained.
+
+* **.incbin statements**: The second source are include binary
+  ``.incbin`` statements in ``.S`` assembly files.
+
+* **Hardcoded dependencies**: Unfortunately, not all build dependencies
+  can be found via ``.cmd`` files and ``.incbin`` statements. Some build
+  dependencies are directly defined in Makefiles or Kbuild files.
+  Parsing these files is considered too complex for the scope of this
+  project. Instead, the remaining gaps of the graph are filled using a
+  list of manually defined dependencies, see
+  ``scripts/sbom/sbom/cmd_graph/hardcoded_dependencies.py``. This list is
+  known to be incomplete. However, analysis of the cmd graph indicates a
+  ~99% completeness. For more information about the completeness analysis,
+  see `KernelSbom #95 `_.
+
+Given the list of dependency files, KernelSbom recursively processes
+each file, expanding the dependency chain all the way to the version
+controlled source files. The result is a complete dependency graph
+where nodes represent files, and edges represent "file A was used to
+build file B" relationships.
+
+Using the cmd graph, KernelSbom produces three SPDX documents.
+For every file in the graph, KernelSbom:
+
+* Parses ``SPDX-License-Identifier`` headers,
+* Computes file hashes,
+* Estimates the file type based on extension and path,
+* Records build relationships between files.
+
+Each root output file is additionally associated with an SPDX Package
+element that captures version information, license data, and copyright.
+
+Advanced Usage
+--------------
+
+Including Kernel Modules
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The list of all ``.ko`` kernel modules produced during a build can be
+extracted from the ``modules.order`` file within the object tree.
+For example::
+
+    $ echo "arch/x86/boot/bzImage" > sbom-roots.txt
+    $ sed 's/\.o$/.ko/' ./kernel_build/modules.order >> sbom-roots.txt
+
+Then use the generated roots file::
+
+    $ SRCARCH=x86 python3 scripts/sbom/sbom.py \
+        --src-tree . \
+        --obj-tree ./kernel_build \
+        --roots-file sbom-roots.txt \
+        --generate-spdx
+
+Equal Source and Object Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When the source tree and object tree are identical (for example, when
+building in-tree), source files can no longer be reliably distinguished
+from generated files.
+In this scenario, KernelSbom does not produce a dedicated
+``sbom-source.spdx.json`` document. Instead, both source files and build
+artifacts are included together in ``sbom-build.spdx.json``, and
+``sbom.used-files.txt`` lists all files referenced in the build document.
+
+Unknown Build Commands
+~~~~~~~~~~~~~~~~~~~~~~
+
+Because the kernel supports a wide range of configurations and versions,
+KernelSbom may encounter build commands in ``.cmd`` files that it does
+not yet support. By default, KernelSbom will fail if an unknown build
+command is encountered.
+
+If you still wish to generate SPDX documents despite unsupported
+commands, you can use the ``--do-not-fail-on-unknown-build-command``
+option. KernelSbom will continue and produce the documents, although
+the resulting SBOM will be incomplete.
+
+This option should only be used when the missing portion of the
+dependency graph is small and an incomplete SBOM is acceptable for
+your use case.
-- 
2.43.0

From nobody Sat Apr 18 04:54:51 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein
Subject: [PATCH v5 02/15] scripts/sbom: integrate script in make process
Date: Fri, 10 Apr 2026 23:22:42 +0200
Message-ID: <20260410212255.9883-3-luis.augenstein@tngtech.com>
In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com>
References: <20260410212255.9883-1-luis.augenstein@tngtech.com>

From: Luis Augenstein

integrate SBOM script into the kernel build process.

Assisted-by: Cursor:claude-sonnet-4-5
Assisted-by: OpenCode:GLM-4-7
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 .gitignore           |  1 +
 MAINTAINERS          |  6 ++++++
 Makefile             | 20 ++++++++++++++++++--
 scripts/sbom/sbom.py | 16 ++++++++++++++++
 4 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 scripts/sbom/sbom.py

diff --git a/.gitignore b/.gitignore
index 3a7241c941f..f3372f15eb1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -48,6 +48,7 @@
 *.s
 *.so
 *.so.dbg
+*.spdx.json
 *.su
 *.symtypes
 *.tab.[ch]
diff --git a/MAINTAINERS b/MAINTAINERS
index c3fe46d7c4b..419a1f70a3a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23657,6 +23657,12 @@ R: Marc Murphy
 S:	Supported
 F:	arch/arm/boot/dts/ti/omap/am335x-sancloud*
 
+SBOM
+M:	Luis Augenstein
+M:	Maximilian Huber
+S:	Maintained
+F:	scripts/sbom/
+
 SC1200 WDT DRIVER
 M:	Zwane Mwaikambo
 S:	Maintained
diff --git a/Makefile b/Makefile
index 4f54c568563..06d1ccd9b96 100644
--- a/Makefile
+++ b/Makefile
@@ -777,7 +777,7 @@ endif
 # in addition to whatever we do anyway.
 # Just "make" or "make all" shall build modules as well
 
-ifneq ($(filter all modules nsdeps compile_commands.json clang-%,$(MAKECMDGOALS)),)
+ifneq ($(filter all modules nsdeps compile_commands.json clang-% sbom,$(MAKECMDGOALS)),)
   KBUILD_MODULES := y
 endif
 
@@ -1654,7 +1654,7 @@ CLEAN_FILES += vmlinux.symvers modules-only.symvers \
                modules.builtin.ranges vmlinux.o.map vmlinux.unstripped \
                compile_commands.json rust/test \
                rust-project.json .vmlinux.objs .vmlinux.export.c \
-               .builtin-dtbs-list .builtin-dtbs.S
+               .builtin-dtbs-list .builtin-dtbs.S sbom-*.spdx.json
 
 # Directories & files removed with 'make mrproper'
 MRPROPER_FILES += include/config include/generated \
@@ -1773,6 +1773,7 @@ help:
 	@echo ''
 	@echo 'Tools:'
 	@echo ' nsdeps - Generate missing symbol namespace dependencies'
+	@echo ' sbom - Generate Software Bill of Materials'
 	@echo ''
 	@echo 'Kernel selftest:'
 	@echo ' kselftest - Build and run kernel selftest'
@@ -2159,6 +2160,21 @@ nsdeps: export KBUILD_NSDEPS=1
 nsdeps: modules
 	$(Q)$(CONFIG_SHELL) $(srctree)/scripts/nsdeps
 
+# Script to generate .spdx.json SBOM documents describing the build
+# ---------------------------------------------------------------------------
+
+ifdef building_out_of_srctree
+sbom_targets := sbom-source.spdx.json
+endif
+sbom_targets += sbom-build.spdx.json sbom-output.spdx.json
+quiet_cmd_sbom = GEN $(sbom_targets)
+      cmd_sbom = printf "%s\n" "$(KBUILD_IMAGE)" >"$(tmp-target)"; \
+                 $(if $(CONFIG_MODULES),sed 's/\.o$$/.ko/' $(objtree)/modules.order >> "$(tmp-target)";) \
+                 $(PYTHON3) $(srctree)/scripts/sbom/sbom.py;
+PHONY += sbom
+sbom: $(notdir $(KBUILD_IMAGE)) include/generated/autoconf.h $(if $(CONFIG_MODULES),modules modules.order)
+	$(call cmd,sbom)
+
 # Clang Tooling
 # ---------------------------------------------------------------------------
 
diff --git a/scripts/sbom/sbom.py b/scripts/sbom/sbom.py
new file mode 100644
index 00000000000..9c2e4c7f17c
--- /dev/null
+++ b/scripts/sbom/sbom.py
@@ -0,0 +1,16 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+"""
+Compute software bill of materials in SPDX format describing a kernel build.
+"""
+
+
+def main():
+    pass
+
+
+# Call main method
+if __name__ == "__main__":
+    main()
-- 
2.43.0

From nobody Sat Apr 18 04:54:51 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein
Subject: [PATCH v5 03/15] scripts/sbom: setup sbom logging
Date: Fri, 10 Apr 2026 23:22:43 +0200
Message-ID: <20260410212255.9883-4-luis.augenstein@tngtech.com>
In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com>
References: <20260410212255.9883-1-luis.augenstein@tngtech.com>

From: Luis Augenstein

Add logging infrastructure for warnings and errors.
Errors and warnings are accumulated and summarized in the end.

Assisted-by: Cursor:claude-sonnet-4-5
Assisted-by: OpenCode:GLM-4-7
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 scripts/sbom/sbom.py              | 24 ++++++++-
 scripts/sbom/sbom/__init__.py     |  0
 scripts/sbom/sbom/config.py       | 47 +++++++++++++++++
 scripts/sbom/sbom/sbom_logging.py | 88 +++++++++++++++++++++++++++++++
 4 files changed, 158 insertions(+), 1 deletion(-)
 create mode 100644 scripts/sbom/sbom/__init__.py
 create mode 100644 scripts/sbom/sbom/config.py
 create mode 100644 scripts/sbom/sbom/sbom_logging.py

diff --git a/scripts/sbom/sbom.py b/scripts/sbom/sbom.py
index 9c2e4c7f17c..c7f23d6eb30 100644
--- a/scripts/sbom/sbom.py
+++ b/scripts/sbom/sbom.py
@@ -6,9 +6,31 @@
 Compute software bill of materials in SPDX format describing a kernel build.
 """
 
+import logging
+import sys
+import sbom.sbom_logging as sbom_logging
+from sbom.config import get_config
+
 
 def main():
-    pass
+    # Read config
+    config = get_config()
+
+    # Configure logging
+    logging.basicConfig(
+        level=logging.DEBUG if config.debug else logging.INFO,
+        format="[%(levelname)s] %(message)s",
+    )
+
+    # Report collected warnings and errors in case of failure
+    warning_summary = sbom_logging.summarize_warnings()
+    error_summary = sbom_logging.summarize_errors()
+
+    if warning_summary:
+        logging.warning(warning_summary)
+    if error_summary:
+        logging.error(error_summary)
+        sys.exit(1)
 
 
 # Call main method
diff --git a/scripts/sbom/sbom/__init__.py b/scripts/sbom/sbom/__init__.py
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/scripts/sbom/sbom/config.py b/scripts/sbom/sbom/config.py
new file mode 100644
index 00000000000..3dc569ae0c4
--- /dev/null
+++ b/scripts/sbom/sbom/config.py
@@ -0,0 +1,47 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+import argparse
+from dataclasses import dataclass
+
+
+@dataclass
+class KernelSbomConfig:
+    debug: bool
+    """Whether to enable debug logging."""
+
+
+def _parse_cli_arguments() -> dict[str, bool]:
+    """
+    Parse command-line arguments using argparse.
+
+    Returns:
+        Dictionary of parsed arguments.
+    """
+    parser = argparse.ArgumentParser(
+        description="Generate SPDX SBOM documents for kernel builds",
+    )
+    parser.add_argument(
+        "--debug",
+        action="store_true",
+        default=False,
+        help="Enable debug logs (default: False)",
+    )
+
+    args = vars(parser.parse_args())
+    return args
+
+
+def get_config() -> KernelSbomConfig:
+    """
+    Parse command-line arguments and construct the configuration object.
+
+    Returns:
+        KernelSbomConfig: Configuration object with all settings for SBOM generation.
+ """ + # Parse cli arguments + args =3D _parse_cli_arguments() + + debug =3D args["debug"] + + return KernelSbomConfig(debug=3Ddebug) diff --git a/scripts/sbom/sbom/sbom_logging.py b/scripts/sbom/sbom/sbom_log= ging.py new file mode 100644 index 00000000000..3460c4d8462 --- /dev/null +++ b/scripts/sbom/sbom/sbom_logging.py @@ -0,0 +1,88 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import logging +import inspect +from typing import Any, Literal + + +class MessageLogger: + """Logger that prints the first occurrence of each message immediately + and keeps track of repeated messages for a final summary.""" + + messages: dict[str, list[str]] + repeated_logs_limit: int + """Maximum number of repeated messages of the same type to log before = suppressing further output.""" + + def __init__(self, level: Literal["error", "warning"], repeated_logs_l= imit: int =3D 3) -> None: + self._level =3D level + self.messages =3D {} + self.repeated_logs_limit =3D repeated_logs_limit + + def log(self, template: str, /, **kwargs: Any) -> None: + """Log a message based on a template and optional variables.""" + message =3D template.format(**kwargs) + if template not in self.messages: + self.messages[template] =3D [] + if len(self.messages[template]) < self.repeated_logs_limit: + if self._level =3D=3D "error": + logging.error(message) + elif self._level =3D=3D "warning": + logging.warning(message) + self.messages[template].append(message) + + def get_summary(self) -> str: + """Return summary of collected messages.""" + if len(self.messages) =3D=3D 0: + return "" + summary: list[str] =3D [f"Summarize {self._level}s:"] + for msgs in self.messages.values(): + for i, msg in enumerate(msgs): + if i < self.repeated_logs_limit: + summary.append(msg) + continue + summary.append( + f"... 
(Found {len(msgs) - i} more {'instances' if (len= (msgs) - i) !=3D 1 else 'instance'} of this {self._level})" + ) + break + return "\n".join(summary) + + +_warning_logger: MessageLogger +_error_logger: MessageLogger + + +def warning(msg_template: str, /, **kwargs: Any) -> None: + """Log a warning message.""" + _warning_logger.log(msg_template, **kwargs) + + +def error(msg_template: str, /, **kwargs: Any) -> None: + """Log an error message including file, line, and function context.""" + frame =3D inspect.currentframe() + caller_frame =3D frame.f_back if frame else None + info =3D inspect.getframeinfo(caller_frame) if caller_frame else None + if info: + msg_template =3D f'File "{info.filename}", line {info.lineno}, in = {info.function}\n{msg_template}' + _error_logger.log(msg_template, **kwargs) + + +def summarize_warnings() -> str: + return _warning_logger.get_summary() + + +def summarize_errors() -> str: + return _error_logger.get_summary() + + +def has_errors() -> bool: + return len(_error_logger.messages) > 0 + + +def init() -> None: + global _warning_logger, _error_logger + _warning_logger =3D MessageLogger("warning") + _error_logger =3D MessageLogger("error") + + +init() --=20 2.43.0 From nobody Sat Apr 18 04:54:51 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 884A13B27DA; Fri, 10 Apr 2026 21:23:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856224; cv=none; b=u2XNAcRt2SxmKa+ra+rfzsF4SUmIVpWVCiFKhyCyd0O9j931gVwg3WmniUL58TKnEkWYdyfjm96va5h8gqNsjC7SjTXzrTz+UOpXpdEutptaLWZ0H5upM8MUb7N6tQ9rCsO74aO91JImAVbb/k2LOpUb2yoLIDQ4PYQcEu7gzW0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856224; 
c=relaxed/simple; bh=Md0/aPDECVHZyF2Ic4SQMBbECrvc7Uqf3HSSj1NEOpA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=A4fWim9F4H1V0WxCOMsUUOwbkiFZZWg9dgqA/JKeBTJUjKQqQ10iYaCHTeNPoSayPC4HTM/AM9nVFaB3D8TFivpsSufHXO+FnKeJVXPiXoZeVY/4DPFsJhhC63QQd8tff6ycrhXpefHCHZzNry5V9PNm45ptSzU4MQNgIGUR84Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=aF0uU90u; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="aF0uU90u" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id E68673FAF2; Fri, 10 Apr 2026 23:23:35 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id BCB401F89B3; Fri, 10 Apr 2026 23:23:35 +0200 (CEST) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id ML8eHHOZaXKm; Fri, 10 Apr 2026 23:23:34 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id EC6691FA89E; Fri, 10 Apr 2026 23:23:33 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz EC6691FA89E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1775856214; bh=vuLjB1tgGLd3uI1WWcc1FcSnWeJOe78ZkpYPd6f1+bk=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=aF0uU90usjmr15WASpbSRTbow5W5x0F8/cAfTmaTDu44v3qkzWK8NMDYcoCwETp5z 
DCKtjOx/gvb+3gYY0ZxMhSxKNBL+vq4bQoVE3I1OyA+5RO/36tcty/xxrUZ8+Afyqb oTHF6uzzLpxztYtSVtm8VKRwduOMiZsuY/tIX58wjkutbmR/M0djNoXbZPs3kGh8ud 4BFbhV53sNrJDyPqo++AFOa0c5i/mPsA48IF1lXrsnq+d/vbSBhia5d5w/1y7qvyVf XquVL0rBj/fHMF/ZEXUj0ehJm0NeLuCr2z/1bN4XGMMUGUcCNLMUsNmqs/Q+xS7VwI 8o1Ldxq3uuzJg== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id FrCf9Vy5aPzR; Fri, 10 Apr 2026 23:23:33 +0200 (CEST) Received: from luis-Precision-5480.. (ipservice-092-209-239-167.092.209.pools.vodafone-ip.de [92.209.239.167]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id 870791F89B3; Fri, 10 Apr 2026 23:23:33 +0200 (CEST) From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v5 04/15] scripts/sbom: add command parsers Date: Fri, 10 Apr 2026 23:22:44 +0200 Message-ID: <20260410212255.9883-5-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com> References: <20260410212255.9883-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Implement savedcmd_parser module for extracting input files from kernel build commands. 
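As a rough illustration of the approach (a simplified sketch, not the code added by this patch): the saved command string is split into single commands, each command is matched against a registry of regex patterns, and the first matching parser returns the input files. The names `parse_cc`, `parse_noop`, `REGISTRY` and `parse_inputs` below are hypothetical and exist only for this example.

```python
import re
import shlex
from typing import Callable


def parse_cc(command: str) -> list[str]:
    """Hypothetical, reduced stand-in for a gcc/clang parser."""
    parts = shlex.split(command)
    # compile mode: last non-option argument that looks like a source file
    for part in reversed(parts):
        if not part.startswith("-") and part.endswith((".c", ".S")):
            return [part]
    # link mode: every object file is an input
    return [p for p in parts if p.endswith(".o")]


def parse_noop(command: str) -> list[str]:
    """Commands such as 'rm' contribute no input files."""
    return []


# First matching pattern wins, so the order of the entries matters.
REGISTRY: list[tuple[re.Pattern[str], Callable[[str], list[str]]]] = [
    (re.compile(r"^rm\b"), parse_noop),
    (re.compile(r"^([^\s]+-)?(gcc|clang)\b"), parse_cc),
]


def parse_inputs(commands: str) -> list[str]:
    inputs: list[str] = []
    # Naive split on ';' only. This assumes no quoting, '&&', if-blocks or
    # nested (...) / {...} groups, all of which a real splitter must handle.
    for command in (c.strip() for c in commands.split(";")):
        if not command:
            continue
        parser = next((p for pattern, p in REGISTRY if pattern.match(command)), None)
        if parser is not None:
            inputs += parser(command)
    return inputs


print(parse_inputs("rm -f foo.o; gcc -O2 -c -o foo.o foo.c"))
```

Unknown commands are silently skipped in this sketch; whether to report them as errors or warnings is a policy decision left out here.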
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- .../cmd_graph/savedcmd_parser/__init__.py | 6 + .../command_parser_registry.py | 474 ++++++++++++++++++ .../savedcmd_parser/command_splitter.py | 124 +++++ .../savedcmd_parser/savedcmd_parser.py | 68 +++ .../cmd_graph/savedcmd_parser/tokenizer.py | 94 ++++ scripts/sbom/sbom/environment.py | 192 +++++++ 6 files changed, 958 insertions(+) create mode 100644 scripts/sbom/sbom/cmd_graph/savedcmd_parser/__init__.py create mode 100644 scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_par= ser_registry.py create mode 100644 scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_spl= itter.py create mode 100644 scripts/sbom/sbom/cmd_graph/savedcmd_parser/savedcmd_pa= rser.py create mode 100644 scripts/sbom/sbom/cmd_graph/savedcmd_parser/tokenizer.py create mode 100644 scripts/sbom/sbom/environment.py diff --git a/scripts/sbom/sbom/cmd_graph/savedcmd_parser/__init__.py b/scri= pts/sbom/sbom/cmd_graph/savedcmd_parser/__init__.py new file mode 100644 index 00000000000..d13876af4df --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/__init__.py @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from sbom.cmd_graph.savedcmd_parser.savedcmd_parser import parse_inputs_fr= om_commands + +__all__ =3D ["parse_inputs_from_commands"] diff --git a/scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_parser_reg= istry.py b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_parser_regis= try.py new file mode 100644 index 00000000000..baf0c34ae09 --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_parser_registry.py @@ -0,0 +1,474 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import re +import shlex +from typing import Callable, Iterator + +import sbom.sbom_logging 
as sbom_logging +from sbom.environment import Environment +from sbom.cmd_graph.savedcmd_parser.command_splitter import IfBlock, split= _commands +from sbom.cmd_graph.savedcmd_parser.tokenizer import ( + CmdParsingError, + Positional, + tokenize_single_command, + tokenize_single_command_positionals_only, +) +from sbom.path_utils import PathStr + +CommandParser =3D Callable[[str], list[PathStr]] +CommandParserRegistryEntry =3D tuple[re.Pattern[str], CommandParser] + + +def _parse_dd_command(command: str) -> list[PathStr]: + match =3D re.match(r"dd.*?if=3D(\S+)", command) + if match: + return [match.group(1)] + return [] + + +def _parse_cat_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["cat", input1, input2, ...] + return [p for p in positionals[1:]] + + +def _parse_compound_command(command: str) -> list[PathStr]: + compound_command_parsers: list[CommandParserRegistryEntry] =3D [ + (re.compile(r"dd\b"), _parse_dd_command), + (re.compile(r"cat.*?\|"), lambda c: _parse_cat_command(c.split("|"= )[0])), + (re.compile(r"cat\b[^|>]*$"), _parse_cat_command), + (re.compile(r"echo\b"), _parse_noop), + (re.compile(r"\S+=3D"), _parse_noop), + (re.compile(r"printf\b"), _parse_noop), + (re.compile(r"sed\b"), _parse_sed_command), + ( + re.compile(r"(.*/)scripts/bin2c\s*<"), + lambda c: [input] if (input :=3D c.split("<")[1].strip()) !=3D= "/dev/null" else [], + ), + (re.compile(r"^:$"), _parse_noop), + ] + + match =3D re.match(r"\s*[\(\{](.*)[\)\}]\s*>", command, re.DOTALL) + if match is None: + raise CmdParsingError("No inner commands found for compound comman= d") + input_files: list[PathStr] =3D [] + inner_commands =3D split_commands(match.group(1)) + for inner_command in inner_commands: + if isinstance(inner_command, IfBlock): + sbom_logging.error( + "Skip parsing inner command {inner_command} of compound co= mmand because IfBlock is not supported", + inner_command=3Dinner_command, + ) + 
continue + + parser =3D next((parser for pattern, parser in compound_command_pa= rsers if pattern.match(inner_command)), None) + if parser is None: + sbom_logging.error( + "Skip parsing inner command {inner_command} of compound co= mmand because no matching parser was found", + inner_command=3Dinner_command, + ) + continue + try: + input_files +=3D parser(inner_command) + except CmdParsingError as e: + sbom_logging.error( + "Skip parsing inner command {inner_command} of compound co= mmand because of command parsing error: {error_message}", + inner_command=3Dinner_command, + error_message=3De.message, + ) + return input_files + + +def _parse_objcopy_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command, flag_options=3D["-S= ", "-w"]) + positionals =3D [part.value for part in command_parts if isinstance(pa= rt, Positional)] + # expect positionals to be ['objcopy', input_file] or ['objcopy', inpu= t_file, output_file] + if not (len(positionals) =3D=3D 2 or len(positionals) =3D=3D 3): + raise CmdParsingError( + f"Invalid objcopy command format: expected 2 or 3 positional a= rguments, got {len(positionals)} ({positionals})" + ) + return [positionals[1]] + + +def _parse_link_vmlinux_command(command: str) -> list[PathStr]: + """ + For simplicity we do not parse the `scripts/link-vmlinux.sh` script. + Instead the `vmlinux.a` dependency is just hardcoded for now. + """ + return ["vmlinux.a"] + + +def _parse_noop(command: str) -> list[PathStr]: + """ + No-op parser for commands with no input files (e.g., 'rm', 'true'). + Returns an empty list. + """ + return [] + + +def _parse_ar_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ['ar', flags, output, input1, input2, ...] + flags =3D positionals[1] + if "r" not in flags: + # 'r' option indicates that new files are added to the archive. 
+ # If this option is missing we won't find any relevant input files. + return [] + return positionals[3:] + + +def _parse_ar_piped_xargs_command(command: str) -> list[PathStr]: + printf_command, _ =3D command.split("|", 1) + positionals =3D tokenize_single_command_positionals_only(printf_comman= d.strip()) + # expect positionals to be ['printf', '{prefix_path}%s ', input1, inpu= t2, ...] + prefix_path =3D positionals[1].rstrip("%s ") + return [f"{prefix_path}{filename}" for filename in positionals[2:]] + + +def _parse_gcc_or_clang_command(command: str) -> list[PathStr]: + parts =3D shlex.split(command) + # compile mode: expect last positional argument ending in a source fil= e extension to be the input file + for part in reversed(parts): + if not part.startswith("-") and any(part.endswith(suffix) for suff= ix in [".c", ".S", ".dts"]): + return [part] + + # linking mode: expect all .o files to be the inputs + return [p for p in parts if p.endswith(".o")] + + +def _parse_rustc_command(command: str) -> list[PathStr]: + parts =3D shlex.split(command) + # expect last positional argument ending in `.rs` to be the input file + for part in reversed(parts): + if not part.startswith("-") and part.endswith(".rs"): + return [part] + raise CmdParsingError("Could not find .rs input source file") + + +def _parse_rustdoc_command(command: str) -> list[PathStr]: + parts =3D shlex.split(command) + # expect last positional argument ending in `.rs` to be the input file + for part in reversed(parts): + if not part.startswith("-") and part.endswith(".rs"): + return [part] + raise CmdParsingError("Could not find .rs input source file") + + +def _parse_syscallhdr_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command.strip(), flag_option= s=3D["--emit-nr"]) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["sh", path/to/syscallhdr.sh, input, output] + return [positionals[2]] + + +def 
_parse_syscalltbl_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command.strip()) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["sh", path/to/syscalltbl.sh, input, output] + return [positionals[2]] + + +def _parse_mkcapflags_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["sh", path/to/mkcapflags.sh, output, input= 1, input2] + return [positionals[3], positionals[4]] + + +def _parse_orc_hash_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["sh", path/to/orc_hash.sh, '<', input, '>'= , output] + return [positionals[3]] + + +def _parse_xen_hypercalls_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["sh", path/to/xen-hypercalls.sh, output, i= nput1, input2, ...] + return positionals[3:] + + +def _parse_gen_initramfs_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["sh", path/to/gen_initramfs.sh, input1, in= put2, ...] 
+ return positionals[2:] + + +def _parse_vdso2c_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ['vdso2c', raw_input, stripped_input, outpu= t] + return [positionals[1], positionals[2]] + + +def _parse_ld_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command( + command=3Dcommand.strip(), + flag_options=3D[ + "-shared", + "--no-undefined", + "--eh-frame-hdr", + "-Bsymbolic", + "-r", + "--no-ld-generated-unwind-info", + "--no-dynamic-linker", + "-pie", + "--no-dynamic-linker--whole-archive", + "--whole-archive", + "--no-whole-archive", + "--start-group", + "--end-group", + ], + ) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["ld", input1, input2, ...] + return positionals[1:] + + +def _parse_sed_command(command: str) -> list[PathStr]: + command_parts =3D shlex.split(command) + # expect command parts to be ["sed", *, input] + input =3D command_parts[-1] + if input =3D=3D "/dev/null": + return [] + return [input] + + +def _parse_awk(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["awk", input1, input2, ...] + return positionals[1:] + + +def _parse_nm_piped_command(command: str) -> list[PathStr]: + nm_command, _ =3D command.split("|", 1) + command_parts =3D tokenize_single_command( + command=3Dnm_command.strip(), + flag_options=3D["p", "--defined-only"], + ) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["nm", input1, input2, ...] 
+ return [p for p in positionals[1:]] + + +def _parse_pnm_to_logo_command(command: str) -> list[PathStr]: + command_parts =3D shlex.split(command) + # expect command parts to be ["pnmtologo", options, input] + return [command_parts[-1]] + + +def _parse_relacheck(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["relacheck", input, log_reference] + return [positionals[1]] + + +def _parse_gen_hyprel_command(command: str) -> list[PathStr]: + gen_hyprel_command, _ =3D command.split(">", 1) + command_parts =3D shlex.split(gen_hyprel_command) + # expect command_parts to be ["gen-hyprel", input] + return [command_parts[1]] + + +def _parse_perl_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command.strip= ()) + # expect positionals to be ["perl", input] + return [positionals[1]] + + +def _parse_strip_command(command: str) -> list[PathStr]: + command_parts =3D tokenize_single_command(command, flag_options=3D["--= strip-debug"]) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be ["strip", input1, input2, ...] + return positionals[1:] + + +def _parse_mkpiggy_command(command: str) -> list[PathStr]: + mkpiggy_command, _ =3D command.split(">", 1) + positionals =3D tokenize_single_command_positionals_only(mkpiggy_comma= nd) + # expect positionals to be ["mkpiggy", input] + return [positionals[1]] + + +def _parse_relocs_command(command: str) -> list[PathStr]: + if ">" not in command: + # Only consider relocs commands that redirect output to a file. + # If there's no redirection, we assume it produces no output file = and therefore has no input we care about. 
+ return [] + relocs_command, _ =3D command.split(">", 1) + command_parts =3D shlex.split(relocs_command) + # expect command_parts to be ["relocs", options, input] + return [command_parts[-1]] + + +def _parse_mk_elfconfig_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["mk_elfconfig", "<", input, ">", output] + return [positionals[2]] + + +def _parse_flex_command(command: str) -> list[PathStr]: + parts =3D shlex.split(command) + # expect last positional argument ending in `.l` to be the input file + for part in reversed(parts): + if not part.startswith("-") and part.endswith(".l"): + return [part] + raise CmdParsingError("Could not find .l input source file in command") + + +def _parse_bison_command(command: str) -> list[PathStr]: + parts =3D shlex.split(command) + # expect last positional argument ending in `.y` to be the input file + for part in reversed(parts): + if not part.startswith("-") and part.endswith(".y"): + return [part] + raise CmdParsingError("Could not find .y input source file in command") + + +def _parse_tools_build_command(command: str) -> list[PathStr]: + positionals =3D tokenize_single_command_positionals_only(command) + # expect positionals to be ["tools/build", "input1", "input2", "input3= ", "output"] + return positionals[1:-1] + + +def _parse_extract_cert_command(command: str) -> list[PathStr]: + command_parts =3D shlex.split(command) + # expect command parts to be [path/to/extract-cert, input, output] + input =3D command_parts[1] + if not input: + return [] + return [input] + + +def _parse_dtc_command(command: str) -> list[PathStr]: + wno_flags =3D [command_part for command_part in shlex.split(command) i= f command_part.startswith("-Wno-")] + command_parts =3D tokenize_single_command(command, flag_options=3Dwno_= flags) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + # expect positionals to be [path/to/dtc, 
input] + return [positionals[1]] + + +def _parse_bindgen_command(command: str) -> list[PathStr]: + command_parts =3D shlex.split(command) + header_file_input_paths =3D [part for part in command_parts if part.en= dswith(".h")] + return header_file_input_paths + + +def _parse_gen_header(command: str) -> list[PathStr]: + command_parts =3D shlex.split(command) + # expect command parts to be ["python3", path/to/gen_headers.py, ..., = "--xml", input] + i =3D next(i for i, token in enumerate(command_parts) if token =3D=3D = "--xml") + return [command_parts[i + 1]] + + +class CommandParserRegistry: + """ + Registry mapping command patterns to their input-file parsers. + """ + + def __init__(self, entries: list[CommandParserRegistryEntry]) -> None: + self._entries =3D entries + + def __iter__(self) -> Iterator[CommandParserRegistryEntry]: + return iter(self._entries) + + @staticmethod + def create() -> "CommandParserRegistry": + def env_or_default_pattern(env_value: str | None, default_pattern:= str) -> str: + if env_value is None or not env_value.strip(): + return default_pattern + return rf"(?:{re.escape(env_value.strip())}|{default_pattern})" + + cc_pattern =3D env_or_default_pattern(Environment.CC(), r"([^\s]+-= )?(gcc|clang)") + ld_pattern =3D env_or_default_pattern(Environment.LD(), r"([^\s]+-= )?ld") + ar_pattern =3D env_or_default_pattern(Environment.AR(), r"([^\s]+-= )?ar") + nm_pattern =3D env_or_default_pattern(Environment.NM(), r"([^\s]+-= )?nm") + objcopy_pattern =3D env_or_default_pattern(Environment.OBJCOPY(), = r"([^\s]+-)?objcopy") + strip_pattern =3D env_or_default_pattern(Environment.STRIP(), r"([= ^\s]+-)?strip") + + entries: list[CommandParserRegistryEntry] =3D [ + # Compound commands + (re.compile(r"\(.*?\)\s*>", re.DOTALL), _parse_compound_comman= d), + (re.compile(r"\{.*?\}\s*>", re.DOTALL), _parse_compound_comman= d), + # Standard Unix utilities and system tools + (re.compile(r"^rm\b"), _parse_noop), + (re.compile(r"^mkdir\b"), _parse_noop), + 
(re.compile(r"^touch\b"), _parse_noop), + (re.compile(r"^cat\b.*?[\|>]"), lambda c: _parse_cat_command(c= .split("|")[0].split(">")[0])), + (re.compile(r"^echo[^|]*$"), _parse_noop), + (re.compile(r"^sed.*?>"), lambda c: _parse_sed_command(c.split= (">")[0])), + (re.compile(r"^sed\b"), _parse_noop), + (re.compile(r"^awk.*?<.*?>"), lambda c: [c.split("<")[1].split= (">")[0]]), + (re.compile(r"^awk.*?>"), lambda c: _parse_awk(c.split(">")[0]= )), + (re.compile(r"^(/bin/)?true\b"), _parse_noop), + (re.compile(r"^(/bin/)?false\b"), _parse_noop), + (re.compile(r"^openssl\s+req.*?-new.*?-keyout"), _parse_noop), + # Compilers and code generators + # (C/LLVM toolchain, Rust, Flex/Bison, Bindgen, Perl, etc.) + ( + re.compile(rf"^{cc_pattern}\b"), + lambda command: _parse_gcc_or_clang_command(re.sub(rf"^{cc= _pattern}\b", "gcc", command, count=3D1)), + ), + ( + re.compile(rf"^{ld_pattern}\b"), + lambda command: _parse_ld_command(re.sub(rf"^{ld_pattern}\= b", "ld", command, count=3D1)), + ), + ( + re.compile(rf"^printf\b.*\| xargs {ar_pattern}\b"), + lambda command: _parse_ar_piped_xargs_command( + re.sub(rf"xargs {ar_pattern}\b", "xargs ar", command, = count=3D1) + ), + ), + ( + re.compile(rf"^{ar_pattern}\b"), + lambda command: _parse_ar_command(re.sub(rf"^{ar_pattern}\= b", "ar", command, count=3D1)), + ), + ( + re.compile(rf"^{nm_pattern}\b.*?\|"), + lambda command: _parse_nm_piped_command(re.sub(rf"^{nm_pat= tern}\b", "nm", command, count=3D1)), + ), + ( + re.compile(rf"^{objcopy_pattern}\b"), + lambda command: _parse_objcopy_command(re.sub(rf"^{objcopy= _pattern}\b", "objcopy", command, count=3D1)), + ), + ( + re.compile(rf"^{strip_pattern}\b"), + lambda command: _parse_strip_command(re.sub(rf"^{strip_pat= tern}\b", "strip", command, count=3D1)), + ), + (re.compile(r".*?rustc\b"), _parse_rustc_command), + (re.compile(r".*?rustdoc\b"), _parse_rustdoc_command), + (re.compile(r"^flex\b"), _parse_flex_command), + (re.compile(r"^bison\b"), _parse_bison_command), + 
(re.compile(r"^bindgen\b"), _parse_bindgen_command), + (re.compile(r"^perl\b"), _parse_perl_command), + # Kernel-specific build scripts and tools + (re.compile(r"^(.*/)?link-vmlinux\.sh\b"), _parse_link_vmlinux= _command), + (re.compile(r"sh (.*/)?syscallhdr\.sh\b"), _parse_syscallhdr_c= ommand), + (re.compile(r"sh (.*/)?syscalltbl\.sh\b"), _parse_syscalltbl_c= ommand), + (re.compile(r"sh (.*/)?mkcapflags\.sh\b"), _parse_mkcapflags_c= ommand), + (re.compile(r"sh (.*/)?orc_hash\.sh\b"), _parse_orc_hash_comma= nd), + (re.compile(r"sh (.*/)?xen-hypercalls\.sh\b"), _parse_xen_hype= rcalls_command), + (re.compile(r"sh (.*/)?gen_initramfs\.sh\b"), _parse_gen_initr= amfs_command), + (re.compile(r"sh (.*/)?checkundef\.sh\b"), _parse_noop), + (re.compile(r"(.*/)?vdso2c\b"), _parse_vdso2c_command), + (re.compile(r"^(.*/)?mkpiggy.*?>"), _parse_mkpiggy_command), + (re.compile(r"^(.*/)?relocs\b"), _parse_relocs_command), + (re.compile(r"^(.*/)?mk_elfconfig.*?<.*?>"), _parse_mk_elfconf= ig_command), + (re.compile(r"^(.*/)?tools/build\b"), _parse_tools_build_comma= nd), + (re.compile(r"^(.*/)?certs/extract-cert"), _parse_extract_cert= _command), + (re.compile(r"^(.*/)?scripts/dtc/dtc\b"), _parse_dtc_command), + (re.compile(r"^(.*/)?pnmtologo\b"), _parse_pnm_to_logo_command= ), + (re.compile(r"^(.*/)?kernel/pi/relacheck"), _parse_relacheck), + (re.compile(r"^(.*/)?gen-hyprel\b"), _parse_gen_hyprel_command= ), + (re.compile(r"^drivers/gpu/drm/radeon/mkregtable"), lambda c: = [c.split(" ")[1]]), + (re.compile(r"(.*/)?genheaders\b"), _parse_noop), + (re.compile(r"^(.*/)?mkcpustr\s+>"), _parse_noop), + (re.compile(r"^(.*/)polgen\b"), _parse_noop), + (re.compile(r"make -f .*/arch/x86/Makefile\.postlink"), _parse= _noop), + (re.compile(r"^(.*/)?raid6/mktables\s+>"), _parse_noop), + (re.compile(r"^(.*/)?objtool\b"), _parse_noop), + (re.compile(r"^(.*/)?module/gen_test_kallsyms.sh"), _parse_noo= p), + (re.compile(r"^(.*/)?gen_header.py"), _parse_gen_header), + 
(re.compile(r"^(.*/)?scripts/rustdoc_test_gen"), _parse_noop), + ] + return CommandParserRegistry(entries) diff --git a/scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_splitter.p= y b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_splitter.py new file mode 100644 index 00000000000..f105986213b --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/command_splitter.py @@ -0,0 +1,124 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import re +from dataclasses import dataclass + + +# If Block pattern to match a simple, single-level if-then-fi block. Neste= d If blocks are not supported. +IF_BLOCK_PATTERN =3D re.compile( + r""" + ^if(.*?);\s* # Match 'if <condition>;' (non-greedy) + then(.*?);\s* # Match 'then <statement>;' (non-greedy) + fi\b # Match 'fi' + """, + re.VERBOSE, +) + + +@dataclass +class IfBlock: + condition: str + then_statement: str + + +def _unwrap_outer_parentheses(s: str) -> str: + s =3D s.strip() + if not (s.startswith("(") and s.endswith(")")): + return s + + count =3D 0 + for i, char in enumerate(s): + if char =3D=3D "(": + count +=3D 1 + elif char =3D=3D ")": + count -=3D 1 + # If count is 0 before the end, outer parentheses don't match + if count =3D=3D 0 and i !=3D len(s) - 1: + return s + + # outer parentheses do match, unwrap once + return _unwrap_outer_parentheses(s[1:-1]) + + +def _find_first_top_level_command_separator( + commands: str, separators: list[str] =3D [";", "&&"] +) -> tuple[int | None, int | None]: + in_single_quote =3D False + in_double_quote =3D False + in_curly_braces =3D 0 + in_braces =3D 0 + for i, char in enumerate(commands): + if char =3D=3D "'" and not in_double_quote: + # Toggle single quote state (unless inside double quotes) + in_single_quote =3D not in_single_quote + elif char =3D=3D '"' and not in_single_quote: + # Toggle double quote state (unless inside single quotes) + in_double_quote =3D not in_double_quote + + if in_single_quote or in_double_quote: + 
continue + + # Toggle braces state + if char =3D=3D "{": + in_curly_braces +=3D 1 + if char =3D=3D "}": + in_curly_braces -=3D 1 + + if char =3D=3D "(": + in_braces +=3D 1 + if char =3D=3D ")": + in_braces -=3D 1 + + if in_curly_braces > 0 or in_braces > 0: + continue + + # return found separator position and separator length + for separator in separators: + if commands[i : i + len(separator)] =3D=3D separator: + return i, len(separator) + + return None, None + + +def split_commands(commands: str) -> list[str | IfBlock]: + """ + Splits a string of command-line commands into individual parts. + + This function handles: + - Top-level command separators (e.g., `;` and `&&`) to split multiple = commands. + - Conditional if-blocks, returning them as `IfBlock` instances. + - Preserves the order of commands and trims whitespace. + + Args: + commands (str): The raw command string. + + Returns: + list[str | IfBlock]: A list of single commands or `IfBlock` object= s. + """ + single_commands: list[str | IfBlock] =3D [] + remaining_commands =3D _unwrap_outer_parentheses(commands) + while len(remaining_commands) > 0: + remaining_commands =3D remaining_commands.strip() + + # if block + matched_if =3D IF_BLOCK_PATTERN.match(remaining_commands) + if matched_if: + condition, then_statement =3D matched_if.groups() + single_commands.append(IfBlock(condition.strip(), then_stateme= nt.strip())) + full_matched =3D matched_if.group(0) + remaining_commands =3D remaining_commands.removeprefix(full_ma= tched).lstrip("; \n") + continue + + # command until next separator + separator_position, separator_length =3D _find_first_top_level_com= mand_separator(remaining_commands) + if separator_position is not None and separator_length is not None: + single_commands.append(remaining_commands[:separator_position]= .strip()) + remaining_commands =3D remaining_commands[separator_position += separator_length :].strip() + continue + + # single last command + single_commands.append(remaining_commands) + 
break + + return single_commands diff --git a/scripts/sbom/sbom/cmd_graph/savedcmd_parser/savedcmd_parser.py= b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/savedcmd_parser.py new file mode 100644 index 00000000000..7062c23288b --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/savedcmd_parser.py @@ -0,0 +1,68 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from typing import Any +import sbom.sbom_logging as sbom_logging +from sbom.cmd_graph.savedcmd_parser.command_splitter import IfBlock, split= _commands +from sbom.cmd_graph.savedcmd_parser.command_parser_registry import Command= ParserRegistry +from sbom.cmd_graph.savedcmd_parser.tokenizer import CmdParsingError +from sbom.path_utils import PathStr + +DEFAULT_COMMAND_PARSER_REGISTRY =3D CommandParserRegistry.create() + + +def parse_inputs_from_commands( + commands: str, + fail_on_unknown_build_command: bool, + registry: CommandParserRegistry | None =3D None, +) -> list[PathStr]: + """ + Extract input files referenced in a set of command-line commands. + + Args: + commands (str): Command line expression to parse. + fail_on_unknown_build_command (bool): Whether to fail if an unknow= n build command is encountered. If False, errors are logged as warnings. + registry (CommandParserRegistry | None): Registry of single comman= d parsers. + + Returns: + list[PathStr]: List of input file paths required by the commands. 
+ """ + + def log_error_or_warning(message: str, /, **kwargs: Any) -> None: + if fail_on_unknown_build_command: + sbom_logging.error(message, **kwargs) + else: + sbom_logging.warning(message, **kwargs) + + if registry is None: + registry =3D DEFAULT_COMMAND_PARSER_REGISTRY + + input_files: list[PathStr] =3D [] + for single_command in split_commands(commands): + if isinstance(single_command, IfBlock): + inputs =3D parse_inputs_from_commands(single_command.then_stat= ement, fail_on_unknown_build_command, registry) + if inputs: + log_error_or_warning( + "Skipped parsing command {then_statement} because inpu= t files in IfBlock 'then' statement are not supported", + then_statement=3Dsingle_command.then_statement, + ) + continue + + matched_parser =3D next((parser for pattern, parser in registry if= pattern.match(single_command)), None) + if matched_parser is None: + log_error_or_warning( + "Skipped parsing command {single_command} because no match= ing parser was found", + single_command=3Dsingle_command, + ) + continue + try: + inputs =3D matched_parser(single_command) + input_files.extend(inputs) + except CmdParsingError as e: + log_error_or_warning( + "Skipped parsing command {single_command} because of comma= nd parsing error: {error_message}", + single_command=3Dsingle_command, + error_message=3De.message, + ) + + return [input.strip().rstrip("/") for input in input_files] diff --git a/scripts/sbom/sbom/cmd_graph/savedcmd_parser/tokenizer.py b/scr= ipts/sbom/sbom/cmd_graph/savedcmd_parser/tokenizer.py new file mode 100644 index 00000000000..09ae2d6e36c --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/savedcmd_parser/tokenizer.py @@ -0,0 +1,94 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import re +import shlex +from dataclasses import dataclass +from typing import Union + + +class CmdParsingError(Exception): + def __init__(self, message: str): + super().__init__(message) + self.message =3D message + + 
+@dataclass +class Option: + name: str + value: str | None =3D None + + +@dataclass +class Positional: + value: str + + +_SUBCOMMAND_PATTERN =3D re.compile(r"\$\$\(([^()]*)\)") +"""Pattern to match $$(...) blocks""" + + +def tokenize_single_command(command: str, flag_options: list[str] | None = =3D None) -> list[Union[Option, Positional]]: + """ + Parse a shell command into a list of Options and Positionals. + - Positional: the command and any positional arguments. + - Options: handles flags and options with values provided as space-sep= arated, or equals-sign + (e.g., '--opt val', '--opt=3Dval', '--flag'). + + Args: + command: Command line string. + flag_options: Options that are flags without values (e.g., '--verb= ose'). + + Returns: + List of `Option` and `Positional` objects in command order. + """ + + # Wrap all $$(...) blocks in double quotes to prevent shlex from spli= tting them. + command_with_protected_subcommands =3D _SUBCOMMAND_PATTERN.sub(lambda = m: f'"$$({m.group(1)})"', command) + tokens =3D shlex.split(command_with_protected_subcommands) + + parsed: list[Option | Positional] =3D [] + i =3D 0 + while i < len(tokens): + token =3D tokens[i] + + # Positional + if not token.startswith("-"): + parsed.append(Positional(token)) + i +=3D 1 + continue + + # Option without value (--flag) + if (token.startswith("-") and i + 1 < len(tokens) and tokens[i + 1= ].startswith("-")) or ( + flag_options and token in flag_options + ): + parsed.append(Option(name=3Dtoken)) + i +=3D 1 + continue + + # Option with equals sign (--opt=3Dval) + if "=3D" in token: + name, value =3D token.split("=3D", 1) + parsed.append(Option(name=3Dname, value=3Dvalue)) + i +=3D 1 + continue + + # Option with space-separated value (--opt val) + if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"): + parsed.append(Option(name=3Dtoken, value=3Dtokens[i + 1])) + i +=3D 2 + continue + + raise CmdParsingError(f"Unrecognized token: {token} in command {co= mmand}") + + return parsed + + 
+def tokenize_single_command_positionals_only(command: str) -> list[str]: + command_parts =3D tokenize_single_command(command) + positionals =3D [p.value for p in command_parts if isinstance(p, Posit= ional)] + if len(positionals) !=3D len(command_parts): + raise CmdParsingError( + f"Invalid command format: expected positional arguments only b= ut got options in command {command}." + ) + return positionals diff --git a/scripts/sbom/sbom/environment.py b/scripts/sbom/sbom/environme= nt.py new file mode 100644 index 00000000000..4304066fe97 --- /dev/null +++ b/scripts/sbom/sbom/environment.py @@ -0,0 +1,192 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import os + +KERNEL_BUILD_VARIABLES_ALLOWLIST =3D [ + "AFLAGS_KERNEL", + "AFLAGS_MODULE", + "AR", + "ARCH", + "ARCH_CORE", + "ARCH_DRIVERS", + "ARCH_LIB", + "AWK", + "BASH", + "BINDGEN", + "BITS", + "CC", + "CC_FLAGS_FPU", + "CC_FLAGS_NO_FPU", + "CFLAGS_GCOV", + "CFLAGS_KERNEL", + "CFLAGS_MODULE", + "CHECK", + "CHECKFLAGS", + "CLIPPY_CONF_DIR", + "CONFIG_SHELL", + "CPP", + "CROSS_COMPILE", + "CURDIR", + "GNUMAKEFLAGS", + "HOSTCC", + "HOSTCXX", + "HOSTPKG_CONFIG", + "HOSTRUSTC", + "INSTALLKERNEL", + "INSTALL_DTBS_PATH", + "INSTALL_HDR_PATH", + "INSTALL_PATH", + "KBUILD_AFLAGS", + "KBUILD_AFLAGS_KERNEL", + "KBUILD_AFLAGS_MODULE", + "KBUILD_BUILTIN", + "KBUILD_CFLAGS", + "KBUILD_CFLAGS_KERNEL", + "KBUILD_CFLAGS_MODULE", + "KBUILD_CHECKSRC", + "KBUILD_CLIPPY", + "KBUILD_CPPFLAGS", + "KBUILD_EXTMOD", + "KBUILD_EXTRA_WARN", + "KBUILD_HOSTCFLAGS", + "KBUILD_HOSTCXXFLAGS", + "KBUILD_HOSTLDFLAGS", + "KBUILD_HOSTLDLIBS", + "KBUILD_HOSTRUSTFLAGS", + "KBUILD_IMAGE", + "KBUILD_LDFLAGS", + "KBUILD_LDFLAGS_MODULE", + "KBUILD_LDS", + "KBUILD_MODULES", + "KBUILD_PROCMACROLDFLAGS", + "KBUILD_RUSTFLAGS", + "KBUILD_RUSTFLAGS_KERNEL", + "KBUILD_RUSTFLAGS_MODULE", + "KBUILD_USERCFLAGS", + "KBUILD_USERLDFLAGS", + "KBUILD_VERBOSE", + "KBUILD_VMLINUX_LIBS", + "KBZIP2", + 
"KCONFIG_CONFIG", + "KERNELDOC", + "KERNELRELEASE", + "KERNELVERSION", + "KGZIP", + "KLZOP", + "LC_COLLATE", + "LC_NUMERIC", + "LD", + "LDFLAGS_MODULE", + "LEX", + "LINUXINCLUDE", + "LZ4", + "LZMA", + "MAKE", + "MAKEFILES", + "MAKEFILE_LIST", + "MAKEFLAGS", + "MAKELEVEL", + "MAKEOVERRIDES", + "MAKE_COMMAND", + "MAKE_HOST", + "MAKE_TERMERR", + "MAKE_TERMOUT", + "MAKE_VERSION", + "MFLAGS", + "MODLIB", + "NM", + "NOSTDINC_FLAGS", + "O", + "OBJCOPY", + "OBJCOPYFLAGS", + "OBJDUMP", + "PAHOLE", + "PATCHLEVEL", + "PERL", + "PYTHON3", + "Q", + "RCS_FIND_IGNORE", + "READELF", + "REALMODE_CFLAGS", + "RESOLVE_BTFIDS", + "RETHUNK_CFLAGS", + "RETHUNK_RUSTFLAGS", + "RETPOLINE_CFLAGS", + "RETPOLINE_RUSTFLAGS", + "RETPOLINE_VDSO_CFLAGS", + "RUSTC", + "RUSTC_BOOTSTRAP", + "RUSTC_OR_CLIPPY", + "RUSTC_OR_CLIPPY_QUIET", + "RUSTDOC", + "RUSTFLAGS_KERNEL", + "RUSTFLAGS_MODULE", + "RUSTFMT", + "SRCARCH", + "STRIP", + "SUBLEVEL", + "SUFFIXES", + "TAR", + "UTS_MACHINE", + "VERSION", + "VPATH", + "XZ", + "YACC", + "ZSTD", + "building_out_of_srctree", + "cross_compiling", + "objtree", + "quiet", + "rust_common_flags", + "srcroot", + "srctree", + "sub_make_done", + "subdir", +] + + +class Environment: + """ + Read-only accessor for kernel build environment variables. 
+    """
+
+    @classmethod
+    def KERNEL_BUILD_VARIABLES(cls) -> dict[str, str]:
+        return {
+            name: value.strip()
+            for name in KERNEL_BUILD_VARIABLES_ALLOWLIST
+            if (value := os.getenv(name)) is not None and value.strip()
+        }
+
+    @classmethod
+    def ARCH(cls) -> str | None:
+        return os.getenv("ARCH")
+
+    @classmethod
+    def SRCARCH(cls) -> str | None:
+        return os.getenv("SRCARCH")
+
+    @classmethod
+    def CC(cls) -> str | None:
+        return os.getenv("CC")
+
+    @classmethod
+    def LD(cls) -> str | None:
+        return os.getenv("LD")
+
+    @classmethod
+    def AR(cls) -> str | None:
+        return os.getenv("AR")
+
+    @classmethod
+    def NM(cls) -> str | None:
+        return os.getenv("NM")
+
+    @classmethod
+    def OBJCOPY(cls) -> str | None:
+        return os.getenv("OBJCOPY")
+
+    @classmethod
+    def STRIP(cls) -> str | None:
+        return os.getenv("STRIP")
-- 
2.43.0

From nobody Sat Apr 18 04:54:51 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, gregkh@linuxfoundation.org,
 kstewart@linuxfoundation.org, maximilian.huber@tngtech.com,
 Luis Augenstein
Subject: [PATCH v5 05/15] scripts/sbom: add cmd graph generation
Date: Fri, 10 Apr 2026 23:22:45 +0200
Message-ID: <20260410212255.9883-6-luis.augenstein@tngtech.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com>
References: <20260410212255.9883-1-luis.augenstein@tngtech.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Luis Augenstein

Implement command graph generation by parsing .cmd files to build a
dependency graph. Add CmdGraph, CmdGraphNode, and .cmd file parsing.
Supports generating a flat list of used source files via the
--generate-used-files cli argument.
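
As a rough illustration of the idea, the following self-contained sketch
shows how a flat used-files list can be derived from a dependency graph.
It is not the code from this patch: the DEPS table is a hypothetical
stand-in for data that would be parsed from .cmd files, and the paths and
the helper names walk() and used_source_files() are made up for the
example. Only the filtering rule (inside the source tree, outside the
object tree) mirrors what sbom.py does.

```python
import os
from collections import deque

# Hypothetical stand-in for parsed .cmd data: target -> direct dependencies.
DEPS = {
    "/build/vmlinux": ["/build/init/main.o", "/build/lib/lib.a"],
    "/build/init/main.o": ["/src/init/main.c", "/src/include/linux/init.h"],
    "/build/lib/lib.a": ["/build/lib/string.o"],
    "/build/lib/string.o": ["/src/lib/string.c", "/src/include/linux/init.h"],
}


def is_relative_to(path, base):
    """True if `path` lies inside `base` (both absolute)."""
    return os.path.commonpath([path, base]) == base


def walk(roots):
    """Breadth-first traversal that yields every reachable file exactly once."""
    visited = set()
    queue = deque(roots)
    while queue:
        path = queue.popleft()
        if path in visited:
            continue
        visited.add(path)
        queue.extend(DEPS.get(path, []))
        yield path


def used_source_files(roots, src_tree, obj_tree):
    """Keep only files inside src_tree and outside obj_tree, relative to src_tree."""
    return [
        os.path.relpath(path, src_tree)
        for path in walk(roots)
        if is_relative_to(path, src_tree) and not is_relative_to(path, obj_tree)
    ]
```

Calling used_source_files(["/build/vmlinux"], "/src", "/build") lists each
source file once, even though init.h is reachable through two objects. If
src_tree and obj_tree were the same directory, the second condition would
reject every file, which is why the in-tree case needs separate handling.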
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- Makefile | 6 +- scripts/sbom/sbom.py | 39 +++++ scripts/sbom/sbom/cmd_graph/__init__.py | 7 + scripts/sbom/sbom/cmd_graph/cmd_file.py | 149 ++++++++++++++++++ scripts/sbom/sbom/cmd_graph/cmd_graph.py | 46 ++++++ scripts/sbom/sbom/cmd_graph/cmd_graph_node.py | 111 +++++++++++++ scripts/sbom/sbom/cmd_graph/deps_parser.py | 52 ++++++ scripts/sbom/sbom/config.py | 147 ++++++++++++++++- scripts/sbom/sbom/path_utils.py | 11 ++ 9 files changed, 565 insertions(+), 3 deletions(-) create mode 100644 scripts/sbom/sbom/cmd_graph/__init__.py create mode 100644 scripts/sbom/sbom/cmd_graph/cmd_file.py create mode 100644 scripts/sbom/sbom/cmd_graph/cmd_graph.py create mode 100644 scripts/sbom/sbom/cmd_graph/cmd_graph_node.py create mode 100644 scripts/sbom/sbom/cmd_graph/deps_parser.py create mode 100644 scripts/sbom/sbom/path_utils.py diff --git a/Makefile b/Makefile index 06d1ccd9b96..394ebd46e82 100644 --- a/Makefile +++ b/Makefile @@ -2170,7 +2170,11 @@ sbom_targets +=3D sbom-build.spdx.json sbom-output.s= pdx.json quiet_cmd_sbom =3D GEN $(sbom_targets) cmd_sbom =3D printf "%s\n" "$(KBUILD_IMAGE)" >"$(tmp-target)"; \ $(if $(CONFIG_MODULES),sed 's/\.o$$/.ko/' $(objtree)/modu= les.order >> "$(tmp-target)";) \ - $(PYTHON3) $(srctree)/scripts/sbom/sbom.py; + $(PYTHON3) $(srctree)/scripts/sbom/sbom.py \ + --src-tree $(abspath $(srctree)) \ + --obj-tree $(abspath $(objtree)) \ + --roots-file "$(tmp-target)" \ + --output-directory $(abspath $(objtree)); PHONY +=3D sbom sbom: $(notdir $(KBUILD_IMAGE)) include/generated/autoconf.h $(if $(CONFIG= _MODULES),modules modules.order) $(call cmd,sbom) diff --git a/scripts/sbom/sbom.py b/scripts/sbom/sbom.py index c7f23d6eb30..25d912a282d 100644 --- a/scripts/sbom/sbom.py +++ b/scripts/sbom/sbom.py @@ -7,9 +7,13 @@ Compute software bill of materials in SPDX format describi= ng 
a kernel build. """ import logging +import os import sys +import time import sbom.sbom_logging as sbom_logging from sbom.config import get_config +from sbom.path_utils import is_relative_to +from sbom.cmd_graph import CmdGraph def main(): @@ -22,6 +26,36 @@ def main(): format="[%(levelname)s] %(message)s", ) + # Build cmd graph + logging.debug("Start building cmd graph") + start_time = time.time() + cmd_graph = CmdGraph.create(config.root_paths, config) + logging.debug(f"Built cmd graph in {time.time() - start_time} seconds") + + # Save used files document + if config.generate_used_files: + if config.src_tree == config.obj_tree: + logging.info( + f"Extracting all files from the cmd graph to {config.used_files_file_name} " + "instead of only source files because source files cannot be " + "reliably classified when the source and object trees are identical.", + ) + used_files = [os.path.relpath(node.absolute_path, config.src_tree) for node in cmd_graph] + logging.debug(f"Found {len(used_files)} files in cmd graph.") + else: + used_files = [ + os.path.relpath(node.absolute_path, config.src_tree) + for node in cmd_graph + if is_relative_to(node.absolute_path, config.src_tree) + and not is_relative_to(node.absolute_path, config.obj_tree) + ] + logging.debug(f"Found {len(used_files)} source files in cmd graph") + if not sbom_logging.has_errors() or config.write_output_on_error: + used_files_path = os.path.join(config.output_directory, config.used_files_file_name) + with open(used_files_path, "w", encoding="utf-8") as f: + f.write("\n".join(str(file_path) for file_path in used_files)) + logging.debug(f"Successfully saved {used_files_path}") + # Report collected warnings and errors in case of failure warning_summary = sbom_logging.summarize_warnings() error_summary = sbom_logging.summarize_errors() @@ -30,6 +64,11 @@ def main(): logging.warning(warning_summary) if error_summary: logging.error(error_summary) + if not
config.write_output_on_error: + logging.info( + "Use --write-output-on-error to generate output documents even when errors occur. " + "Note that in this case the generated SPDX documents may be incomplete." + ) sys.exit(1) diff --git a/scripts/sbom/sbom/cmd_graph/__init__.py b/scripts/sbom/sbom/cmd_graph/__init__.py new file mode 100644 index 00000000000..9d661a5c3d9 --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/__init__.py @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from .cmd_graph import CmdGraph +from .cmd_graph_node import CmdGraphNode, CmdGraphNodeConfig + +__all__ = ["CmdGraph", "CmdGraphNode", "CmdGraphNodeConfig"] diff --git a/scripts/sbom/sbom/cmd_graph/cmd_file.py b/scripts/sbom/sbom/cmd_graph/cmd_file.py new file mode 100644 index 00000000000..d85ef5de0c2 --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/cmd_file.py @@ -0,0 +1,149 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import os +import re +from dataclasses import dataclass, field +from sbom.cmd_graph.deps_parser import parse_cmd_file_deps +from sbom.cmd_graph.savedcmd_parser import parse_inputs_from_commands +import sbom.sbom_logging as sbom_logging +from sbom.path_utils import PathStr + +SAVEDCMD_PATTERN = re.compile(r"^(saved)?cmd_.*?:=\s*(?P<full_command>.+)$") +SOURCE_PATTERN = re.compile(r"^source.*?:=\s*(?P<source_file>.+)$") + + +@dataclass +class CmdFile: + cmd_file_path: PathStr + savedcmd: str + source: PathStr | None = None + deps: list[str] = field(default_factory=list[str]) + make_rules: list[str] = field(default_factory=list[str]) + + @classmethod + def create(cls, cmd_file_path: PathStr) -> "CmdFile | None": + """ + Parses a .cmd file. + .cmd files are assumed to have one of the following structures: + 1. Full Cmd File + (saved)?cmd_ := + source_ := + deps_ := \ + + := $(deps_) + $(deps_): + + 2.
Command Only Cmd File + (saved)?cmd_ :=3D + + 3. Single Dependency Cmd File + (saved)?cmd_ :=3D + :=3D + + Args: + cmd_file_path (Path): absolute Path to a .cmd file + + Returns: + cmd_file (CmdFile): Parsed cmd file. + """ + with open(cmd_file_path, "rt") as f: + lines =3D [line.strip() for line in f.readlines() if line.stri= p() !=3D "" and not line.startswith("#")] + + # savedcmd + match =3D SAVEDCMD_PATTERN.match(lines[0]) + if match is None: + sbom_logging.error( + "Skip parsing '{cmd_file_path}' because no 'savedcmd_' com= mand was found.", cmd_file_path=3Dcmd_file_path + ) + return None + savedcmd =3D match.group("full_command") + + # Command Only Cmd File + if len(lines) =3D=3D 1: + return CmdFile(cmd_file_path, savedcmd) + + # Single Dependency Cmd File + if len(lines) =3D=3D 2: + dep =3D lines[1].split(":")[1].strip() + return CmdFile(cmd_file_path, savedcmd, deps=3D[dep]) + + # Full Cmd File + # source + line1 =3D SOURCE_PATTERN.match(lines[1]) + if line1 is None: + sbom_logging.error( + "Skip parsing '{cmd_file_path}' because no 'source_' entry= was found.", cmd_file_path=3Dcmd_file_path + ) + return CmdFile(cmd_file_path, savedcmd) + source =3D line1.group("source_file") + + # deps + deps: list[str] =3D [] + i =3D 3 # lines[2] includes the variable assignment but no actual= dependency, so we need to start at lines[3]. + while i < len(lines): + if not lines[i].endswith("\\"): + break + deps.append(lines[i][:-1].strip()) + i +=3D 1 + + # make_rules + make_rules =3D lines[i:] + + return CmdFile(cmd_file_path, savedcmd, source, deps, make_rules) + + def get_dependencies( + self: "CmdFile", target_path: PathStr, obj_tree: PathStr, fail_on_= unknown_build_command: bool + ) -> list[PathStr]: + """ + Parses all dependencies required to build a target file from its c= md file. + + Args: + target_path: path to the target file relative to `obj_tree`. + obj_tree: absolute path to the object tree. 
+ fail_on_unknown_build_command: Whether to fail if an unknown b= uild command is encountered. + + Returns: + list[PathStr]: dependency file paths relative to `obj_tree`. + """ + input_files: list[PathStr] =3D [ + str(p) for p in parse_inputs_from_commands(self.savedcmd, fail= _on_unknown_build_command) + ] + if self.deps: + input_files +=3D [str(p) for p in parse_cmd_file_deps(self.dep= s)] + input_files =3D _expand_resolve_files(input_files, obj_tree) + + cmd_file_dependencies: list[PathStr] =3D [] + for input_file in input_files: + # input files are either absolute or relative to the object tr= ee + if os.path.isabs(input_file): + input_file =3D os.path.relpath(input_file, obj_tree) + if input_file =3D=3D target_path: + # Skip target file to prevent cycles. This is necessary be= cause some multi stage commands first create an output and then pass it as = input to the next command, e.g., objcopy. + continue + cmd_file_dependencies.append(input_file) + + return cmd_file_dependencies + + +def _expand_resolve_files(input_files: list[PathStr], obj_tree: PathStr) -= > list[PathStr]: + """ + Expands resolve files which may reference additional files via '@' not= ation. + + Args: + input_files (list[PathStr]): List of file paths relative to the ob= ject tree, where paths starting with '@' refer to files + containing further file paths, each o= n a separate line. + obj_tree: Absolute path to the root of the object tree. + + Returns: + list[PathStr]: Flattened list of all input file paths, with any ne= sted '@' file references resolved recursively. 
+ """ + expanded_input_files: list[PathStr] =3D [] + for input_file in input_files: + if not input_file.startswith("@"): + expanded_input_files.append(input_file) + continue + with open(os.path.join(obj_tree, input_file.lstrip("@")), "rt") as= f: + resolve_file_content =3D [line_stripped for line in f.readline= s() if (line_stripped :=3D line.strip())] + expanded_input_files +=3D _expand_resolve_files(resolve_file_conte= nt, obj_tree) + return expanded_input_files diff --git a/scripts/sbom/sbom/cmd_graph/cmd_graph.py b/scripts/sbom/sbom/c= md_graph/cmd_graph.py new file mode 100644 index 00000000000..cad54243ff3 --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/cmd_graph.py @@ -0,0 +1,46 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from collections import deque +from dataclasses import dataclass, field +from typing import Iterator + +from sbom.cmd_graph.cmd_graph_node import CmdGraphNode, CmdGraphNodeConfig +from sbom.path_utils import PathStr + + +@dataclass +class CmdGraph: + """Directed acyclic graph of build dependencies primarily inferred fro= m .cmd files produced during kernel builds""" + + roots: list[CmdGraphNode] =3D field(default_factory=3Dlist[CmdGraphNod= e]) + + @classmethod + def create(cls, root_paths: list[PathStr], config: CmdGraphNodeConfig)= -> "CmdGraph": + """ + Recursively builds a dependency graph starting from `root_paths`. + Dependencies are mainly discovered by parsing the `.cmd` files. + + Args: + root_paths (list[PathStr]): List of paths to root outputs rela= tive to obj_tree + config (CmdGraphNodeConfig): Configuration options + + Returns: + CmdGraph: A graph of all build dependencies for the given root= files. 
+ """ + node_cache: dict[PathStr, CmdGraphNode] =3D {} + root_nodes =3D [CmdGraphNode.create(root_path, config, node_cache)= for root_path in root_paths] + return CmdGraph(root_nodes) + + def __iter__(self) -> Iterator[CmdGraphNode]: + """Traverse the graph in breadth-first order, yielding each unique= node.""" + visited: set[PathStr] =3D set() + node_stack: deque[CmdGraphNode] =3D deque(self.roots) + while len(node_stack) > 0: + node =3D node_stack.popleft() + if node.absolute_path in visited: + continue + + visited.add(node.absolute_path) + node_stack.extend(node.children) + yield node diff --git a/scripts/sbom/sbom/cmd_graph/cmd_graph_node.py b/scripts/sbom/s= bom/cmd_graph/cmd_graph_node.py new file mode 100644 index 00000000000..7a5279a1ba0 --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/cmd_graph_node.py @@ -0,0 +1,111 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass, field +import logging +import os +from typing import Iterator, Protocol + +from sbom import sbom_logging +from sbom.cmd_graph.cmd_file import CmdFile +from sbom.path_utils import PathStr, is_relative_to + + +class CmdGraphNodeConfig(Protocol): + obj_tree: PathStr + src_tree: PathStr + fail_on_unknown_build_command: bool + + +@dataclass +class CmdGraphNode: + """A node in the cmd graph representing a single file and its dependen= cies.""" + + absolute_path: PathStr + """Absolute path to the file this node represents.""" + + cmd_file: CmdFile | None =3D None + """Parsed .cmd file describing how the file at absolute_path was built= , or None if not available.""" + + cmd_file_dependencies: list["CmdGraphNode"] =3D field(default_factory= =3Dlist["CmdGraphNode"]) + + @property + def children(self) -> Iterator["CmdGraphNode"]: + seen: set[PathStr] =3D set() + for node in self.cmd_file_dependencies: + if node.absolute_path not in seen: + seen.add(node.absolute_path) + yield node + + @classmethod + def create( + 
cls, + target_path: PathStr, + config: CmdGraphNodeConfig, + cache: dict[PathStr, "CmdGraphNode"] | None =3D None, + depth: int =3D 0, + ) -> "CmdGraphNode": + """ + Recursively builds a dependency graph starting from `target_path`. + Dependencies are mainly discovered by parsing the `..cmd` file. + + Args: + target_path: Path to the target file relative to obj_tree. + config: Config options + cache: Tracks processed nodes to prevent cycles. + depth: Internal parameter to track the current recursion depth. + + Returns: + CmdGraphNode: cmd graph node representing the target file + """ + if cache is None: + cache =3D {} + + target_path_absolute =3D ( + os.path.realpath(p) + if os.path.islink(p :=3D os.path.join(config.obj_tree, target_= path)) + else os.path.normpath(p) + ) + + if target_path_absolute in cache: + return cache[target_path_absolute] + + if depth =3D=3D 0: + logging.debug(f"Build node: {target_path}") + + cmd_file_path =3D _to_cmd_path(target_path_absolute) + cmd_file =3D CmdFile.create(cmd_file_path) if os.path.exists(cmd_f= ile_path) else None + node =3D CmdGraphNode(target_path_absolute, cmd_file) + cache[target_path_absolute] =3D node + + if not os.path.exists(target_path_absolute): + error_or_warning =3D ( + sbom_logging.error + if is_relative_to(target_path_absolute, config.obj_tree) + or is_relative_to(target_path_absolute, config.src_tree) + else sbom_logging.warning + ) + error_or_warning( + "Skip parsing '{target_path_absolute}' because file does n= ot exist", + target_path_absolute=3Dtarget_path_absolute, + ) + return node + + # Search for dependencies to add to the graph as child nodes. Chil= d paths are always relative to the output tree. 
+ def _build_child_node(child_path: PathStr) -> "CmdGraphNode": + return CmdGraphNode.create(child_path, config, cache, depth + 1) + + if cmd_file is not None: + node.cmd_file_dependencies = [ + _build_child_node(cmd_file_dependency_path) + for cmd_file_dependency_path in cmd_file.get_dependencies( + target_path, config.obj_tree, config.fail_on_unknown_build_command + ) + ] + + return node + + +def _to_cmd_path(path: PathStr) -> PathStr: + name = os.path.basename(path) + return path.removesuffix(name) + f".{name}.cmd" diff --git a/scripts/sbom/sbom/cmd_graph/deps_parser.py b/scripts/sbom/sbom/cmd_graph/deps_parser.py new file mode 100644 index 00000000000..fb3ccdd415b --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/deps_parser.py @@ -0,0 +1,52 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import re +import sbom.sbom_logging as sbom_logging +from sbom.path_utils import PathStr + +# Match dependencies on config files +# Example match: "$(wildcard include/config/CONFIG_SOMETHING)" +CONFIG_PATTERN = re.compile(r"\$\(wildcard (include/config/[^)]+)\)") + +# Match dependencies on the objtool binary +# Example match: "$(wildcard ./tools/objtool/objtool)" +OBJTOOL_PATTERN = re.compile(r"\$\(wildcard \./tools/objtool/objtool\)") + +# Match any Makefile wildcard reference +# Example match: "$(wildcard path/to/file)" +WILDCARD_PATTERN = re.compile(r"\$\(wildcard (?P<path>[^)]+)\)") + +# Match ordinary paths: +# - ^(\/)?: Optionally starts with a '/' +# - (([\w\-\., ]*)\/)*: Zero or more directory levels +# - [\w\-\., ]+$: Path component (file or directory) +# Example matches: "/foo/bar.c", "dir1/dir2/file.txt", "plainfile" +VALID_PATH_PATTERN = re.compile(r"^(\/)?(([\w\-\., ]*)\/)*[\w\-\., ]+$") + + +def parse_cmd_file_deps(deps: list[str]) -> list[PathStr]: + """ + Parse dependency strings of a .cmd file and return valid input file paths.
+ + Args: + deps: List of dependency strings as found in `.cmd` files. + + Returns: + input_files: List of input file paths + """ + input_files: list[PathStr] =3D [] + for dep in deps: + dep =3D dep.strip() + match dep: + case _ if CONFIG_PATTERN.match(dep) or OBJTOOL_PATTERN.match(d= ep): + # config paths like include/config/ should no= t be included in the graph + continue + case _ if match :=3D WILDCARD_PATTERN.match(dep): + path =3D match.group("path") + input_files.append(path) + case _ if VALID_PATH_PATTERN.match(dep): + input_files.append(dep) + case _: + sbom_logging.error("Skip parsing dependency {dep} because = of unrecognized format", dep=3Ddep) + return input_files diff --git a/scripts/sbom/sbom/config.py b/scripts/sbom/sbom/config.py index 3dc569ae0c4..39e556a4c53 100644 --- a/scripts/sbom/sbom/config.py +++ b/scripts/sbom/sbom/config.py @@ -3,15 +3,43 @@ =20 import argparse from dataclasses import dataclass +import os +from typing import Any +from sbom.path_utils import PathStr =20 =20 @dataclass class KernelSbomConfig: + src_tree: PathStr + """Absolute path to the Linux kernel source directory.""" + + obj_tree: PathStr + """Absolute path to the build output directory.""" + + root_paths: list[PathStr] + """List of paths to root outputs (relative to obj_tree) to base the SB= OM on.""" + + generate_used_files: bool + """Whether to generate a flat list of all source files used in the bui= ld. 
+ If False, no used-files document is created.""" + + used_files_file_name: str + """If `generate_used_files` is True, specifies the file name for the u= sed-files document.""" + + output_directory: PathStr + """Path to the directory where the generated output documents will be = saved.""" + debug: bool """Whether to enable debug logging.""" =20 + fail_on_unknown_build_command: bool + """Whether to fail if an unknown build command is encountered in a .cm= d file.""" + + write_output_on_error: bool + """Whether to write output documents even if errors occur.""" + =20 -def _parse_cli_arguments() -> dict[str, bool]: +def _parse_cli_arguments() -> dict[str, Any]: """ Parse command-line arguments using argparse. =20 @@ -19,8 +47,49 @@ def _parse_cli_arguments() -> dict[str, bool]: Dictionary of parsed arguments. """ parser =3D argparse.ArgumentParser( + formatter_class=3Dargparse.RawTextHelpFormatter, description=3D"Generate SPDX SBOM documents for kernel builds", ) + parser.add_argument( + "--src-tree", + default=3D"../linux", + help=3D"Path to the kernel source tree (default: ../linux)", + ) + parser.add_argument( + "--obj-tree", + default=3D"../linux/kernel_build", + help=3D"Path to the build output directory (default: ../linux/kern= el_build)", + ) + group =3D parser.add_mutually_exclusive_group(required=3DTrue) + group.add_argument( + "--roots", + nargs=3D"+", + default=3D"arch/x86/boot/bzImage", + help=3D"Space-separated list of paths relative to obj-tree for whi= ch the SBOM will be created.\n" + "Cannot be used together with --roots-file. (default: arch/x86/boo= t/bzImage)", + ) + group.add_argument( + "--roots-file", + help=3D"Path to a file containing the root paths (one per line). 
C= annot be used together with --roots.", + ) + parser.add_argument( + "--generate-used-files", + action=3D"store_true", + default=3DFalse, + help=3D( + "Whether to create the sbom.used-files.txt file, a flat list o= f all " + "source files used for the kernel build.\n" + "If src-tree and obj-tree are equal it is not possible to reli= ably " + "classify source files.\n" + "In this case sbom.used-files.txt will contain all files used = for the " + "kernel build including all build artifacts. (default: False)" + ), + ) + parser.add_argument( + "--output-directory", + default=3D".", + help=3D"Path to the directory where the generated output documents= will be stored (default: .)", + ) parser.add_argument( "--debug", action=3D"store_true", @@ -28,6 +97,28 @@ def _parse_cli_arguments() -> dict[str, bool]: help=3D"Enable debug logs (default: False)", ) =20 + # Error handling settings + parser.add_argument( + "--do-not-fail-on-unknown-build-command", + action=3D"store_true", + default=3DFalse, + help=3D( + "Whether to fail if an unknown build command is encountered in= a .cmd file.\n" + "If set to True, errors are logged as warnings instead. (defau= lt: False)" + ), + ) + parser.add_argument( + "--write-output-on-error", + action=3D"store_true", + default=3DFalse, + help=3D( + "Write output documents even if errors occur. The resulting do= cuments " + "may be incomplete.\n" + "A summary of warnings and errors can be found in the 'comment= ' property " + "of the CreationInfo element. 
(default: False)" + ), + ) + args =3D vars(parser.parse_args()) return args =20 @@ -42,6 +133,58 @@ def get_config() -> KernelSbomConfig: # Parse cli arguments args =3D _parse_cli_arguments() =20 + # Extract and validate cli arguments + src_tree =3D os.path.realpath(args["src_tree"]) + obj_tree =3D os.path.realpath(args["obj_tree"]) + root_paths =3D [] + if args["roots_file"]: + with open(args["roots_file"], "rt") as f: + root_paths =3D [root.strip() for root in f.readlines()] + else: + root_paths =3D args["roots"] + _validate_path_arguments(src_tree, obj_tree, root_paths) + + generate_used_files =3D args["generate_used_files"] + output_directory =3D os.path.realpath(args["output_directory"]) debug =3D args["debug"] =20 - return KernelSbomConfig(debug=3Ddebug) + fail_on_unknown_build_command =3D not args["do_not_fail_on_unknown_bui= ld_command"] + write_output_on_error =3D args["write_output_on_error"] + + # Hardcoded config + used_files_file_name =3D "sbom.used-files.txt" + + return KernelSbomConfig( + src_tree=3Dsrc_tree, + obj_tree=3Dobj_tree, + root_paths=3Droot_paths, + generate_used_files=3Dgenerate_used_files, + used_files_file_name=3Dused_files_file_name, + output_directory=3Doutput_directory, + debug=3Ddebug, + fail_on_unknown_build_command=3Dfail_on_unknown_build_command, + write_output_on_error=3Dwrite_output_on_error, + ) + + +def _validate_path_arguments(src_tree: PathStr, obj_tree: PathStr, root_pa= ths: list[PathStr]) -> None: + """ + Validate that the provided paths exist. + + Args: + src_tree: Absolute path to the source tree. + obj_tree: Absolute path to the object tree. + root_paths: List of root paths relative to obj_tree. + + Raises: + argparse.ArgumentTypeError: If any of the paths don't exist. 
+ """ + if not os.path.exists(src_tree): + raise argparse.ArgumentTypeError(f"--src-tree {src_tree} does not = exist") + if not os.path.exists(obj_tree): + raise argparse.ArgumentTypeError(f"--obj-tree {obj_tree} does not = exist") + for root_path in root_paths: + if not os.path.exists(os.path.join(obj_tree, root_path)): + raise argparse.ArgumentTypeError( + f"path to root artifact {os.path.join(obj_tree, root_path)= } does not exist" + ) diff --git a/scripts/sbom/sbom/path_utils.py b/scripts/sbom/sbom/path_utils= .py new file mode 100644 index 00000000000..d28d67b2539 --- /dev/null +++ b/scripts/sbom/sbom/path_utils.py @@ -0,0 +1,11 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import os + +PathStr =3D str +"""Filesystem path represented as a plain string for better performance th= an pathlib.Path.""" + + +def is_relative_to(path: PathStr, base: PathStr) -> bool: + return os.path.commonpath([path, base]) =3D=3D base --=20 2.43.0 From nobody Sat Apr 18 04:54:51 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F24D53CE484; Fri, 10 Apr 2026 21:23:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856223; cv=none; b=oBhrIKxxu9ybBHy8WQdcbnSPAFmHeV3OJMtqJS9wKpN525hv+gwmBnDyEQ+82XdmpEzCbr/wco25aCX7OKmU/ifNtk8VrptcLxrH6AsDiCfqzIMes4HwOG6+DBI/7Xz33W4m0xN9d6rsIoEGkr1cpni+sEUyf33p+jPvRXdjR7c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856223; c=relaxed/simple; bh=MyXYbny0p7YtafTHy2tJkFVT/L3pClljyZAX67XjY4k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=diw2VP4XgGksWicUt7DiEOdEOxYx2aBuH4j0clPVWx5xUf/+cHD/SEV8GVdXvqctZGjkz9FEdqJ0Y3Jk85rwdpF+GF4ioQ+2CVcgO6tFYfm9Bb4i2rJ3x/UcfJCYftjsAoxrELpmV8k3GRfX1jlcoJEokMQUWXPtNPbbfJrRQ4g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=hUcj9pSF; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="hUcj9pSF" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id 02A1A3FAF6; Fri, 10 Apr 2026 23:23:38 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id E3AFE1F89B3; Fri, 10 Apr 2026 23:23:37 +0200 (CEST) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id GDhGXyp_D1tl; Fri, 10 Apr 2026 23:23:37 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 2FFFB1FADC9; Fri, 10 Apr 2026 23:23:37 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 2FFFB1FADC9 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1775856217; bh=ixwQrFF/HCDbBOGY+7tdSlPAbfVKT3drjaCxDac2zIE=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=hUcj9pSF/6h84kvmhW/J1P9/URNt7jaXg7FIGzCsBiE84jz2/6CgnnAG5O3qCDdhH SPbid/m88gQHKV3M/2g3dOnIDypdo9/IYvy6RRpMbJSMbNGJOl4S7pe3qGr9FDtVdg KU10Q9OzpFRrAc/uoTIzMQNbtH8U4WMrIa5Hfxup/NlsV4Kh07LyBCnh6Y+P5KTgkD 
rVgxAPnZuYY8hYpQISTYQ9bYTVvbIIWwD4aATrDgUvjd6ml6N1ZvBWcr3qgsS1hBJV Y0zuVl98p5d5t0aZXXZjNUZC1UchoPi6/Vb9RpaHt5eeLkg1VREF1ajIg4jzWeQemM sCHuEPgzxLzuw== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id V_qeZmypfRFm; Fri, 10 Apr 2026 23:23:37 +0200 (CEST) Received: from luis-Precision-5480.. (ipservice-092-209-239-167.092.209.pools.vodafone-ip.de [92.209.239.167]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id C8B3A1F89B3; Fri, 10 Apr 2026 23:23:36 +0200 (CEST) From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v5 06/15] scripts/sbom: add additional dependency sources for cmd graph Date: Fri, 10 Apr 2026 23:22:46 +0200 Message-ID: <20260410212255.9883-7-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com> References: <20260410212255.9883-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Add hardcoded dependencies and .incbin directive parsing to discover dependencies not tracked by .cmd files. 
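For illustration only, the .incbin discovery described above can be sketched on an in-memory string. This is a minimal sketch, not the code added by this patch: the names INCBIN_RE, find_incbin_paths and the sample assembly text are hypothetical, and the real parser reads `.S` files from disk.

```python
import re

# Hypothetical standalone pattern; it mirrors the idea of matching
# `.incbin "<path>"` directives and capturing the quoted path.
INCBIN_RE = re.compile(r'\s*\.incbin\s+"(?P<path>[^"]+)"')


def find_incbin_paths(assembly_source: str) -> list[str]:
    """Return the quoted paths of all `.incbin "<path>"` directives found."""
    return [match.group("path") for match in INCBIN_RE.finditer(assembly_source)]


# Made-up assembly snippet used only to exercise the sketch.
sample = """
    .section .rodata
    .incbin "kernel/config_data.gz"
    .incbin "certs/signing_key.x509"
"""

print(find_incbin_paths(sample))
```

Each captured path would then become an additional child of the node for the assembly file, next to the dependencies read from its .cmd file.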
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- scripts/sbom/sbom/cmd_graph/cmd_graph_node.py | 33 +++++++- .../sbom/cmd_graph/hardcoded_dependencies.py | 83 +++++++++++++++++++ scripts/sbom/sbom/cmd_graph/incbin_parser.py | 42 ++++++++++ 3 files changed, 157 insertions(+), 1 deletion(-) create mode 100644 scripts/sbom/sbom/cmd_graph/hardcoded_dependencies.py create mode 100644 scripts/sbom/sbom/cmd_graph/incbin_parser.py diff --git a/scripts/sbom/sbom/cmd_graph/cmd_graph_node.py b/scripts/sbom/s= bom/cmd_graph/cmd_graph_node.py index 7a5279a1ba0..feacdbf7695 100644 --- a/scripts/sbom/sbom/cmd_graph/cmd_graph_node.py +++ b/scripts/sbom/sbom/cmd_graph/cmd_graph_node.py @@ -2,15 +2,24 @@ # Copyright (C) 2025 TNG Technology Consulting GmbH =20 from dataclasses import dataclass, field +from itertools import chain import logging import os from typing import Iterator, Protocol =20 from sbom import sbom_logging from sbom.cmd_graph.cmd_file import CmdFile +from sbom.cmd_graph.hardcoded_dependencies import get_hardcoded_dependenci= es +from sbom.cmd_graph.incbin_parser import parse_incbin_statements from sbom.path_utils import PathStr, is_relative_to =20 =20 +@dataclass +class IncbinDependency: + node: "CmdGraphNode" + full_statement: str + + class CmdGraphNodeConfig(Protocol): obj_tree: PathStr src_tree: PathStr @@ -28,11 +37,17 @@ class CmdGraphNode: """Parsed .cmd file describing how the file at absolute_path was built= , or None if not available.""" =20 cmd_file_dependencies: list["CmdGraphNode"] =3D field(default_factory= =3Dlist["CmdGraphNode"]) + incbin_dependencies: list[IncbinDependency] =3D field(default_factory= =3Dlist[IncbinDependency]) + hardcoded_dependencies: list["CmdGraphNode"] =3D field(default_factory= =3Dlist["CmdGraphNode"]) =20 @property def children(self) -> Iterator["CmdGraphNode"]: seen: set[PathStr] =3D set() - for node in 
self.cmd_file_dependencies: + for node in chain( + self.cmd_file_dependencies, + (dep.node for dep in self.incbin_dependencies), + self.hardcoded_dependencies, + ): if node.absolute_path not in seen: seen.add(node.absolute_path) yield node @@ -95,6 +110,13 @@ class CmdGraphNode: def _build_child_node(child_path: PathStr) -> "CmdGraphNode": return CmdGraphNode.create(child_path, config, cache, depth + = 1) =20 + node.hardcoded_dependencies =3D [ + _build_child_node(hardcoded_dependency_path) + for hardcoded_dependency_path in get_hardcoded_dependencies( + target_path_absolute, config.obj_tree, config.src_tree + ) + ] + if cmd_file is not None: node.cmd_file_dependencies =3D [ _build_child_node(cmd_file_dependency_path) @@ -103,6 +125,15 @@ class CmdGraphNode: ) ] =20 + if node.absolute_path.endswith(".S"): + node.incbin_dependencies =3D [ + IncbinDependency( + node=3D_build_child_node(incbin_statement.path), + full_statement=3Dincbin_statement.full_statement, + ) + for incbin_statement in parse_incbin_statements(node.absol= ute_path) + ] + return node =20 =20 diff --git a/scripts/sbom/sbom/cmd_graph/hardcoded_dependencies.py b/script= s/sbom/sbom/cmd_graph/hardcoded_dependencies.py new file mode 100644 index 00000000000..a5977f14ae4 --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/hardcoded_dependencies.py @@ -0,0 +1,83 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import os +from typing import Callable +import sbom.sbom_logging as sbom_logging +from sbom.path_utils import PathStr, is_relative_to +from sbom.environment import Environment + +HARDCODED_DEPENDENCIES: dict[str, list[str]] =3D { + # defined in linux/Kbuild + "include/generated/rq-offsets.h": ["kernel/sched/rq-offsets.s"], + "kernel/sched/rq-offsets.s": ["include/generated/asm-offsets.h"], + "include/generated/bounds.h": ["kernel/bounds.s"], + "include/generated/asm-offsets.h": ["arch/{arch}/kernel/asm-offsets.s"= ], +} + + +def 
get_hardcoded_dependencies(path: PathStr, obj_tree: PathStr, src_tree:= PathStr) -> list[PathStr]: + """ + Some files in the kernel build process are not tracked by the .cmd dep= endency mechanism. + Parsing these dependencies programmatically is too complex for the sco= pe of this project. + Therefore, this function provides manually defined dependencies to be = added to the build graph. + + Args: + path: absolute path to a file within the src tree or object tree. + obj_tree: absolute path to the base directory of the object tree. + src_tree: absolute path to the `linux` source directory. + + Returns: + list[PathStr]: A list of dependency file paths (relative to the ob= ject tree) required to build the file at the given path. + """ + if is_relative_to(path, obj_tree): + path =3D os.path.relpath(path, obj_tree) + elif is_relative_to(path, src_tree): + path =3D os.path.relpath(path, src_tree) + + if path not in HARDCODED_DEPENDENCIES: + return [] + + template_variables: dict[str, Callable[[], str | None]] =3D { + "arch": lambda: _get_arch(path), + } + + dependencies: list[PathStr] =3D [] + for dependency_template in HARDCODED_DEPENDENCIES[path]: + dependency =3D _evaluate_template(dependency_template, template_va= riables) + if dependency is None: + continue + if os.path.exists(os.path.join(obj_tree, dependency)): + dependencies.append(dependency) + elif os.path.exists(os.path.join(src_tree, dependency)): + dependencies.append(os.path.relpath(os.path.join(src_tree, dependency), obj_tree)) + else: + sbom_logging.error( + "Skip hardcoded dependency '{dependency}' for '{path}' bec= ause the dependency lies neither in the src tree nor the object tree.", + dependency=3Ddependency, + path=3Dpath, + ) + + return dependencies + + +def _evaluate_template(template: str, variables: dict[str, Callable[[], st= r | None]]) -> str | None: + for key, value_function in variables.items(): + template_key =3D "{" + key + "}" + if template_key in template: + value =3D value_function() + if value is None: + 
return None + template =3D template.replace(template_key, value) + return template + + +def _get_arch(path: PathStr): + srcarch =3D Environment.SRCARCH() + if srcarch is None: + sbom_logging.error( + "Skipped architecture specific hardcoded dependency for '{path= }' because the SRCARCH environment variable was not set.", + path=3Dpath, + ) + return None + return srcarch diff --git a/scripts/sbom/sbom/cmd_graph/incbin_parser.py b/scripts/sbom/sb= om/cmd_graph/incbin_parser.py new file mode 100644 index 00000000000..130f9520837 --- /dev/null +++ b/scripts/sbom/sbom/cmd_graph/incbin_parser.py @@ -0,0 +1,42 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +import re + +from sbom.path_utils import PathStr + +INCBIN_PATTERN =3D re.compile(r'\s*\.incbin\s+"(?P<path>[^"]+)"') +"""Regex pattern for matching `.incbin "<path>"` statements.""" + + +@dataclass +class IncbinStatement: + """A parsed `.incbin "<path>"` directive.""" + + path: PathStr + """path to the file referenced by the `.incbin` directive.""" + + full_statement: str + """Full `.incbin "<path>"` statement as it originally appeared in the = file.""" + + +def parse_incbin_statements(absolute_path: PathStr) -> list[IncbinStatemen= t]: + """ + Parses `.incbin` directives from an `.S` assembly file. + + Args: + absolute_path: Absolute path to the `.S` assembly file. + + Returns: + list[IncbinStatement]: Parsed `.incbin` statements. 
+ """ + with open(absolute_path, "rt") as f: + content =3D f.read() + return [ + IncbinStatement( + path=3Dmatch.group("path"), + full_statement=3Dmatch.group(0).strip(), + ) + for match in INCBIN_PATTERN.finditer(content) + ] --=20 2.43.0 From nobody Sat Apr 18 04:54:51 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D4FE13C9EF3; Fri, 10 Apr 2026 21:23:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856227; cv=none; b=eYbuEbAeQsWxqVBkhxewZdgzHGbi8zJIedNoCT7mMxvGvZj885dqi+nehU3vw+dFosgQdvFif6T9gt2KWpfC6KkvHfqXV69SwFkXVv/058iCfPAx/O1cgsEKneul+7iaxcw56IURLZr0hO3Hy0MTs/TmCR+MzQY6FW7+ZvAoJ2I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856227; c=relaxed/simple; bh=wCLbKzEvbQp2g2o8c+1S9PeLQYWrtb2VwisHeUgOPcM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Zmh/ZgyruOayUg+JLLQ8t8YZ9Sbyaw/11FxPCFMXs1gcky49FJnna2xAzRO3oMAVA0sB33z7Akp2l7NH7J8xlTqizrfrVHwmPPS1oVki2ZZphzcgNHsEKonyt8ZCLBu9/JZnQDtJlcPRtSKKiQ4U1jYH9QCjSdewJYAgAF58GDg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=SCMzWSjb; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="SCMzWSjb" Received: from zmproxy.tng.vnc.biz 
(zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id 0662D3FAF8; Fri, 10 Apr 2026 23:23:40 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id AD5261FADEA; Fri, 10 Apr 2026 23:23:39 +0200 (CEST) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id 6g4sDe6OHofu; Fri, 10 Apr 2026 23:23:38 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 931751FA89E; Fri, 10 Apr 2026 23:23:38 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 931751FA89E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1775856218; bh=xOM43qfjHIOHiubupTzMcNC7jRk35Lb4N+8/eq17WLM=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=SCMzWSjbh4j2k/fFSGIUqeT+yOaHZgU733JC+uS9eo7fQqWsjv1RsGEEyhMULzdwF poRuN90z0/QPs/bwAFGh25DmioOffukQXw6xDs4M0u/FKQ/R31qubiW+0WRzybkcN/ eLLOxpWmae3D+E1RUko/xNtzn5mjZvfchAHESAp7hCOdR4GfaD1C4E1OvbcWcjsGSJ lxfT2RwnyjnEagzy93j4/Zl/fW1tKopUGKqgtW/jan5f9IyFlt99hj42rYUKuomiZd zkoggqfdNBC8JwVbU57qUfTRlzPliV5sv9Uer1vIV5z0yzB6z6LOya9eQ1UsFgkm7Y uaaU9lZCU+Ynw== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id ylYlS9zRdnrV; Fri, 10 Apr 2026 23:23:38 +0200 (CEST) Received: from luis-Precision-5480.. 
(ipservice-092-209-239-167.092.209.pools.vodafone-ip.de [92.209.239.167]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id 3A0C61F89B3; Fri, 10 Apr 2026 23:23:38 +0200 (CEST) From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v5 07/15] scripts/sbom: add SPDX classes Date: Fri, 10 Apr 2026 23:22:47 +0200 Message-ID: <20260410212255.9883-8-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com> References: <20260410212255.9883-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Implement Python dataclasses to model the SPDX classes required within an SPDX document. The class and property names are consistent with the SPDX 3.0.1 specification. 
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- scripts/sbom/sbom/spdx/__init__.py | 7 + scripts/sbom/sbom/spdx/build.py | 17 +++ scripts/sbom/sbom/spdx/core.py | 170 ++++++++++++++++++++++ scripts/sbom/sbom/spdx/serialization.py | 56 +++++++ scripts/sbom/sbom/spdx/simplelicensing.py | 20 +++ scripts/sbom/sbom/spdx/software.py | 69 +++++++++ scripts/sbom/sbom/spdx/spdxId.py | 36 +++++ 7 files changed, 375 insertions(+) create mode 100644 scripts/sbom/sbom/spdx/__init__.py create mode 100644 scripts/sbom/sbom/spdx/build.py create mode 100644 scripts/sbom/sbom/spdx/core.py create mode 100644 scripts/sbom/sbom/spdx/serialization.py create mode 100644 scripts/sbom/sbom/spdx/simplelicensing.py create mode 100644 scripts/sbom/sbom/spdx/software.py create mode 100644 scripts/sbom/sbom/spdx/spdxId.py diff --git a/scripts/sbom/sbom/spdx/__init__.py b/scripts/sbom/sbom/spdx/__= init__.py new file mode 100644 index 00000000000..4097b59f8f1 --- /dev/null +++ b/scripts/sbom/sbom/spdx/__init__.py @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from .spdxId import SpdxId, SpdxIdGenerator +from .serialization import JsonLdSpdxDocument + +__all__ =3D ["JsonLdSpdxDocument", "SpdxId", "SpdxIdGenerator"] diff --git a/scripts/sbom/sbom/spdx/build.py b/scripts/sbom/sbom/spdx/build= .py new file mode 100644 index 00000000000..180a8f1e8bd --- /dev/null +++ b/scripts/sbom/sbom/spdx/build.py @@ -0,0 +1,17 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass, field +from sbom.spdx.core import DictionaryEntry, Element, Hash + + +@dataclass(kw_only=3DTrue) +class Build(Element): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Build/Classes/Build/"= "" + + type: str =3D field(init=3DFalse, default=3D"build_Build") + 
build_buildType: str + build_buildId: str + build_environment: list[DictionaryEntry] =3D field(default_factory=3Dl= ist[DictionaryEntry]) + build_configSourceUri: list[str] =3D field(default_factory=3Dlist[str]) + build_configSourceDigest: list[Hash] =3D field(default_factory=3Dlist[= Hash]) diff --git a/scripts/sbom/sbom/spdx/core.py b/scripts/sbom/sbom/spdx/core.py new file mode 100644 index 00000000000..74613e4bdd2 --- /dev/null +++ b/scripts/sbom/sbom/spdx/core.py @@ -0,0 +1,170 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass, field +from datetime import datetime, timezone +from typing import Any, Literal +from sbom.spdx.spdxId import SpdxId + +SPDX_SPEC_VERSION =3D "3.0.1" + +ExternalIdentifierType =3D Literal["email", "gitoid", "urlScheme"] +HashAlgorithm =3D Literal["sha256", "sha512"] +ProfileIdentifierType =3D Literal["core", "software", "build", "lite", "si= mpleLicensing"] +RelationshipType =3D Literal[ + "contains", + "generates", + "hasDeclaredLicense", + "hasInput", + "hasOutput", + "ancestorOf", + "hasDistributionArtifact", + "dependsOn", +] +RelationshipCompleteness =3D Literal["complete", "incomplete", "noAssertio= n"] + + +@dataclass +class SpdxObject: + def to_dict(self) -> dict[str, Any]: + def _to_dict(v: Any): + return v.to_dict() if hasattr(v, "to_dict") else v + + d: dict[str, Any] =3D {} + for field_name in self.__dataclass_fields__: + value =3D getattr(self, field_name) + if not value: + continue + + if isinstance(value, Element): + d[field_name] =3D value.spdxId + elif isinstance(value, list) and len(value) > 0 and isinstance= (value[0], Element): # type: ignore + value: list[Element] =3D value + d[field_name] =3D [v.spdxId for v in value] + else: + d[field_name] =3D [_to_dict(v) for v in value] if isinstan= ce(value, list) else _to_dict(value) # type: ignore + return d + + +@dataclass(kw_only=3DTrue) +class IntegrityMethod(SpdxObject): + 
"""https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Integrit= yMethod/""" + + +@dataclass(kw_only=3DTrue) +class Hash(IntegrityMethod): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Hash/""" + + type: str =3D field(init=3DFalse, default=3D"Hash") + hashValue: str + algorithm: HashAlgorithm + + +@dataclass(kw_only=3DTrue) +class Element(SpdxObject): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Element/= """ + + type: str =3D field(init=3DFalse, default=3D"Element") + spdxId: SpdxId + creationInfo: str =3D "_:creationinfo" + name: str | None =3D None + verifiedUsing: list[Hash] =3D field(default_factory=3Dlist[Hash]) + comment: str | None =3D None + + +@dataclass(kw_only=3DTrue) +class ExternalMap(SpdxObject): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/External= Map/""" + + type: str =3D field(init=3DFalse, default=3D"ExternalMap") + externalSpdxId: SpdxId + + +@dataclass(kw_only=3DTrue) +class NamespaceMap(SpdxObject): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Namespac= eMap/""" + + type: str =3D field(init=3DFalse, default=3D"NamespaceMap") + prefix: str + namespace: str + + +@dataclass(kw_only=3DTrue) +class ElementCollection(Element): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/ElementC= ollection/""" + + type: str =3D field(init=3DFalse, default=3D"ElementCollection") + element: list[Element] =3D field(default_factory=3Dlist[Element]) + rootElement: list[Element] =3D field(default_factory=3Dlist[Element]) + profileConformance: list[ProfileIdentifierType] =3D field(default_fact= ory=3Dlist[ProfileIdentifierType]) + + +@dataclass(kw_only=3DTrue) +class SpdxDocument(ElementCollection): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/SpdxDocu= ment/""" + + type: str =3D field(init=3DFalse, default=3D"SpdxDocument") + import_: list[ExternalMap] =3D field(default_factory=3Dlist[ExternalMa= p]) + namespaceMap: list[NamespaceMap] =3D 
field(default_factory=3Dlist[Name= spaceMap]) + + def to_dict(self) -> dict[str, Any]: + return {("import" if k =3D=3D "import_" else k): v for k, v in sup= er().to_dict().items()} + + +@dataclass(kw_only=3DTrue) +class Agent(Element): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Agent/""" + + type: str =3D field(init=3DFalse, default=3D"Agent") + + +@dataclass(kw_only=3DTrue) +class SoftwareAgent(Agent): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Software= Agent/""" + + type: str =3D field(init=3DFalse, default=3D"SoftwareAgent") + + +@dataclass(kw_only=3DTrue) +class CreationInfo(SpdxObject): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Creation= Info/""" + + type: str =3D field(init=3DFalse, default=3D"CreationInfo") + id: SpdxId =3D "_:creationinfo" + specVersion: str =3D SPDX_SPEC_VERSION + createdBy: list[Agent] + created: str =3D field(default_factory=3Dlambda: datetime.now(timezone= .utc).strftime("%Y-%m-%dT%H:%M:%SZ")) + comment: str | None =3D None + + def to_dict(self) -> dict[str, Any]: + return {("@id" if k =3D=3D "id" else k): v for k, v in super().to_= dict().items()} + + +@dataclass(kw_only=3DTrue) +class Relationship(Element): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Relation= ship/""" + + type: str =3D field(init=3DFalse, default=3D"Relationship") + relationshipType: RelationshipType + from_: Element # underscore because 'from' is a reserved keyword + to: list[Element] + completeness: RelationshipCompleteness | None =3D None + + def to_dict(self) -> dict[str, Any]: + return {("from" if k =3D=3D "from_" else k): v for k, v in super()= .to_dict().items()} + + +@dataclass(kw_only=3DTrue) +class Artifact(Element): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Artifact= /""" + + type: str =3D field(init=3DFalse, default=3D"Artifact") + + +@dataclass(kw_only=3DTrue) +class DictionaryEntry(SpdxObject): + 
"""https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Classes/Dictiona= ryEntry/""" + + type: str =3D field(init=3DFalse, default=3D"DictionaryEntry") + key: str + value: str diff --git a/scripts/sbom/sbom/spdx/serialization.py b/scripts/sbom/sbom/sp= dx/serialization.py new file mode 100644 index 00000000000..c830d6b3cf1 --- /dev/null +++ b/scripts/sbom/sbom/spdx/serialization.py @@ -0,0 +1,56 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import json +from typing import Any +from sbom.path_utils import PathStr +from sbom.spdx.core import SPDX_SPEC_VERSION, SpdxDocument, SpdxObject + + +class JsonLdSpdxDocument: + """Represents an SPDX document in JSON-LD format for serialization.""" + + context: list[str | dict[str, str]] + graph: list[SpdxObject] + + def __init__(self, graph: list[SpdxObject]) -> None: + """ + Initialize a JSON-LD SPDX document from a graph of SPDX objects. + The graph must contain a single SpdxDocument element. + + Args: + graph: List of SPDX objects representing the complete SPDX doc= ument. + """ + self.graph =3D graph + spdx_document =3D next(element for element in graph if isinstance(= element, SpdxDocument)) + self.context =3D [ + f"https://spdx.org/rdf/{SPDX_SPEC_VERSION}/spdx-context.jsonld= ", + {namespaceMap.prefix: namespaceMap.namespace for namespaceMap = in spdx_document.namespaceMap}, + ] + spdx_document.namespaceMap =3D [] + + def to_dict(self) -> dict[str, Any]: + """ + Convert the SPDX document to a dictionary representation suitable = for JSON serialization. + + Returns: + Dictionary with @context and @graph keys following JSON-LD for= mat. + """ + return { + "@context": self.context, + "@graph": [item.to_dict() for item in self.graph], + } + + def save(self, path: PathStr, prettify: bool) -> None: + """ + Save the SPDX document to a JSON file. + + Args: + path: File path where the document will be saved. + prettify: Whether to pretty-print the JSON with indentation. 
+ """ + with open(path, "w", encoding=3D"utf-8") as f: + if prettify: + json.dump(self.to_dict(), f, indent=3D2) + else: + json.dump(self.to_dict(), f, separators=3D(",", ":")) diff --git a/scripts/sbom/sbom/spdx/simplelicensing.py b/scripts/sbom/sbom/= spdx/simplelicensing.py new file mode 100644 index 00000000000..750ddd24ad8 --- /dev/null +++ b/scripts/sbom/sbom/spdx/simplelicensing.py @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass, field +from sbom.spdx.core import Element + + +@dataclass(kw_only=3DTrue) +class AnyLicenseInfo(Element): + """https://spdx.github.io/spdx-spec/v3.0.1/model/SimpleLicensing/Class= es/AnyLicenseInfo/""" + + type: str =3D field(init=3DFalse, default=3D"simplelicensing_AnyLicens= eInfo") + + +@dataclass(kw_only=3DTrue) +class LicenseExpression(AnyLicenseInfo): + """https://spdx.github.io/spdx-spec/v3.0.1/model/SimpleLicensing/Class= es/LicenseExpression/""" + + type: str =3D field(init=3DFalse, default=3D"simplelicensing_LicenseEx= pression") + simplelicensing_licenseExpression: str diff --git a/scripts/sbom/sbom/spdx/software.py b/scripts/sbom/sbom/spdx/so= ftware.py new file mode 100644 index 00000000000..3e3389d75cd --- /dev/null +++ b/scripts/sbom/sbom/spdx/software.py @@ -0,0 +1,69 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass, field +from typing import Literal +from sbom.spdx.core import Artifact, ElementCollection, IntegrityMethod + + +SbomType =3D Literal["source", "build"] +FileKindType =3D Literal["file", "directory"] +SoftwarePurpose =3D Literal[ + "source", + "archive", + "library", + "file", + "data", + "configuration", + "executable", + "module", + "application", + "documentation", + "other", +] +ContentIdentifierType =3D Literal["gitoid", "swhid"] + + +@dataclass(kw_only=3DTrue) +class Sbom(ElementCollection): + 
"""https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/Sbom= /""" + + type: str =3D field(init=3DFalse, default=3D"software_Sbom") + software_sbomType: list[SbomType] =3D field(default_factory=3Dlist[Sbo= mType]) + + +@dataclass(kw_only=3DTrue) +class ContentIdentifier(IntegrityMethod): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/Cont= entIdentifier/""" + + type: str =3D field(init=3DFalse, default=3D"software_ContentIdentifie= r") + software_contentIdentifierType: ContentIdentifierType + software_contentIdentifierValue: str + + +@dataclass(kw_only=3DTrue) +class SoftwareArtifact(Artifact): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/Soft= wareArtifact/""" + + type: str =3D field(init=3DFalse, default=3D"software_Artifact") + software_primaryPurpose: SoftwarePurpose | None =3D None + software_copyrightText: str | None =3D None + software_contentIdentifier: list[ContentIdentifier] =3D field(default_= factory=3Dlist[ContentIdentifier]) + + +@dataclass(kw_only=3DTrue) +class Package(SoftwareArtifact): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/Pack= age/""" + + type: str =3D field(init=3DFalse, default=3D"software_Package") + name: str # type: ignore + software_packageVersion: str | None =3D None + + +@dataclass(kw_only=3DTrue) +class File(SoftwareArtifact): + """https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Classes/File= /""" + + type: str =3D field(init=3DFalse, default=3D"software_File") + name: str # type: ignore + software_fileKind: FileKindType | None =3D None diff --git a/scripts/sbom/sbom/spdx/spdxId.py b/scripts/sbom/sbom/spdx/spdx= Id.py new file mode 100644 index 00000000000..589e85c5f70 --- /dev/null +++ b/scripts/sbom/sbom/spdx/spdxId.py @@ -0,0 +1,36 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from itertools import count +from typing import Iterator + +SpdxId =3D str + + +class SpdxIdGenerator: + 
_namespace: str + _prefix: str | None =3D None + _counter: Iterator[int] + + def __init__(self, namespace: str, prefix: str | None =3D None) -> Non= e: + """ + Initialize the SPDX ID generator with a namespace. + + Args: + namespace: The full namespace to use for generated IDs. + prefix: Optional. If provided, generated IDs will use this pre= fix instead of the full namespace. + """ + self._namespace =3D namespace + self._prefix =3D prefix + self._counter =3D count(0) + + def generate(self) -> SpdxId: + return f"{f'{self._prefix}:' if self._prefix else self._namespace}= {next(self._counter)}" + + @property + def prefix(self) -> str | None: + return self._prefix + + @property + def namespace(self) -> str: + return self._namespace --=20 2.43.0 From nobody Sat Apr 18 04:54:51 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A9843B0AFA; Fri, 10 Apr 2026 21:23:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856228; cv=none; b=pGfL5VZKG3tVHDsGAnG5dnihlAOHy8PZ7dS4VYnc/Hr60V30V7QzhpRDBfbI2ShlId5OQuTEM4+aHZwla9znNjwiQLzjHQQ9MhSthZDMgTSii8W88JLdEEbtlxyFYY2sOehY6ilgGk3CEOZpNFAPyHz1PbFnDMD41QJxU6BHxBI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856228; c=relaxed/simple; bh=sK4AClc3gRQ46QtyYglbZsIp9E8MTBR3W0UC9gvnz8s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ABZ4XpfYBLmDMR3TqPPBPy5XOyU13lvmDly8D/A7VoEFUwJjNwygeeDW2scLbYgJNloHnxkV9O8cA3TIl//6Nibcf4y44+C/wXbKExx8BnD1Y0xQd1FNHI6mh6pLC7bRZ7Q9t70v7RI0Lv7QqWGhNzVynLrvpz8hHcfUivaCCTg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass 
smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=l0Qzwzfl; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="l0Qzwzfl" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id 4B8A73FAFA; Fri, 10 Apr 2026 23:23:41 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 269801FAE85; Fri, 10 Apr 2026 23:23:41 +0200 (CEST) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id 82UUqlk0-N5f; Fri, 10 Apr 2026 23:23:40 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 292751FADEA; Fri, 10 Apr 2026 23:23:40 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 292751FADEA DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1775856220; bh=fi1QhTRcAl83J9hWCIJ/+Tzev5iYH17LLNUSKGGZXwo=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=l0QzwzflKRvy1rpbXuJED3ShP+//XTz17HXpV7Srsl9igaXGTe2kXzkAjneJC2QZe UP5AhZ1i/5meL7zz735FfgD2+qrbt+naDV38IByfA809ZUke7CP4JC33wy990o05H8 JUIFioOkOT1iAxKd0jBEdGzwVsb0YpcFOzCezSWwo+mqfXYaoLsyukUz57hEE1cvLU DYMCHvg0fz/MF4/GRJtAEmzRW/5RLKaTUHaQ8IPjBHMOtMij/D+t2t5qb+GOfzyWYR swtOowPQKJvaVOObb9pq++XOWsZnUQSbeKOuAyjLdfV5sU8QFsxAT3A0JdmRVjQtTc S6K/6GC5p6EUg== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id Fow7Eg5pMUH6; Fri, 10 Apr 2026 
23:23:40 +0200 (CEST) Received: from luis-Precision-5480.. (ipservice-092-209-239-167.092.209.pools.vodafone-ip.de [92.209.239.167]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id C32DF1FA89E; Fri, 10 Apr 2026 23:23:39 +0200 (CEST) From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v5 08/15] scripts/sbom: add JSON-LD serialization Date: Fri, 10 Apr 2026 23:22:48 +0200 Message-ID: <20260410212255.9883-9-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com> References: <20260410212255.9883-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Add infrastructure to serialize an SPDX graph as a JSON-LD document. NamespaceMaps in the SPDX document are converted to custom prefixes in the @context field of the JSON-LD output. The SBOM tool uses NamespaceMaps solely to shorten SPDX IDs, avoiding repetition of full namespace URIs by using short prefixes. 
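To make the prefix mechanism concrete, here is a minimal sketch of how a namespace map can end up in the @context of the JSON-LD output, and how a compact ID resolves back to a full URI. The helper names to_json_ld and expand_id and the example.org namespace are assumptions made for this illustration; they are not part of the patch.

```python
import json

# Context URL for SPDX 3.0.1, in the form used by the serialization code.
SPDX_CONTEXT_URL = "https://spdx.org/rdf/3.0.1/spdx-context.jsonld"


def to_json_ld(graph: list[dict], namespace_map: dict[str, str]) -> dict:
    """Wrap already serialized elements with a JSON-LD @context (hypothetical helper)."""
    return {"@context": [SPDX_CONTEXT_URL, dict(namespace_map)], "@graph": graph}


def expand_id(compact_id: str, namespace_map: dict[str, str]) -> str:
    """Resolve a 'prefix:suffix' ID back to its full namespace URI (hypothetical helper)."""
    prefix, separator, suffix = compact_id.partition(":")
    if separator and prefix in namespace_map:
        return namespace_map[prefix] + suffix
    return compact_id


namespaces = {"p": "https://example.org/spdx/demo/"}  # made-up namespace
document = to_json_ld([{"type": "software_File", "spdxId": "p:0", "name": "vmlinux"}], namespaces)
print(json.dumps(document, indent=2))
print(expand_id("p:0", namespaces))
```

With this layout every element carries only the short ID, while the full namespace URI appears once in the context.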
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- Makefile | 3 +- scripts/sbom/sbom.py | 52 +++++++++++++++++ scripts/sbom/sbom/config.py | 56 +++++++++++++++++++ scripts/sbom/sbom/spdx_graph/__init__.py | 7 +++ .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 36 ++++++++++++ .../sbom/sbom/spdx_graph/spdx_graph_model.py | 36 ++++++++++++ 6 files changed, 189 insertions(+), 1 deletion(-) create mode 100644 scripts/sbom/sbom/spdx_graph/__init__.py create mode 100644 scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py create mode 100644 scripts/sbom/sbom/spdx_graph/spdx_graph_model.py diff --git a/Makefile b/Makefile index 394ebd46e82..279e3abd34c 100644 --- a/Makefile +++ b/Makefile @@ -2174,7 +2174,8 @@ quiet_cmd_sbom =3D GEN $(sbom_targets) --src-tree $(abspath $(srctree)) \ --obj-tree $(abspath $(objtree)) \ --roots-file "$(tmp-target)" \ - --output-directory $(abspath $(objtree)); + --output-directory $(abspath $(objtree)) \ + --generate-spdx; PHONY +=3D sbom sbom: $(notdir $(KBUILD_IMAGE)) include/generated/autoconf.h $(if $(CONFIG= _MODULES),modules modules.order) $(call cmd,sbom) diff --git a/scripts/sbom/sbom.py b/scripts/sbom/sbom.py index 25d912a282d..426521ade46 100644 --- a/scripts/sbom/sbom.py +++ b/scripts/sbom/sbom.py @@ -6,13 +6,18 @@ Compute software bill of materials in SPDX format describing a kernel buil= d. 
""" =20 +import json import logging import os import sys import time +import uuid import sbom.sbom_logging as sbom_logging from sbom.config import get_config from sbom.path_utils import is_relative_to +from sbom.spdx import JsonLdSpdxDocument, SpdxIdGenerator +from sbom.spdx.core import CreationInfo, SpdxDocument +from sbom.spdx_graph import SpdxIdGeneratorCollection, build_spdx_graphs from sbom.cmd_graph import CmdGraph =20 =20 @@ -56,10 +61,57 @@ def main(): f.write("\n".join(str(file_path) for file_path in used_fil= es)) logging.debug(f"Successfully saved {used_files_path}") =20 + if config.generate_spdx is False: + return + + # Build SPDX Documents + logging.debug("Start generating SPDX graph based on cmd graph") + start_time =3D time.time() + + # The real uuid will be generated based on the content of the SPDX gra= phs + # to ensure that the same SPDX document is always assigned the same uu= id. + PLACEHOLDER_UUID =3D "00000000-0000-0000-0000-000000000000" + spdx_id_base_namespace =3D f"{config.spdxId_prefix}{PLACEHOLDER_UUID}/" + spdx_id_generators =3D SpdxIdGeneratorCollection( + base=3DSpdxIdGenerator(prefix=3D"p", namespace=3Dspdx_id_base_name= space), + source=3DSpdxIdGenerator(prefix=3D"s", namespace=3Df"{spdx_id_base= _namespace}source/"), + build=3DSpdxIdGenerator(prefix=3D"b", namespace=3Df"{spdx_id_base_= namespace}build/"), + output=3DSpdxIdGenerator(prefix=3D"o", namespace=3Df"{spdx_id_base= _namespace}output/"), + ) + + spdx_graphs =3D build_spdx_graphs( + cmd_graph, + spdx_id_generators, + config, + ) + spdx_id_uuid =3D uuid.uuid5( + uuid.NAMESPACE_URL, + "".join( + json.dumps(element.to_dict()) for spdx_graph in spdx_graphs.va= lues() for element in spdx_graph.to_list() + ), + ) + logging.debug(f"Generated SPDX graph in {time.time() - start_time} sec= onds") + # Report collected warnings and errors in case of failure warning_summary =3D sbom_logging.summarize_warnings() error_summary =3D sbom_logging.summarize_errors() =20 + if not 
sbom_logging.has_errors() or config.write_output_on_error: + for kernel_sbom_kind, spdx_graph in spdx_graphs.items(): + spdx_graph_objects =3D spdx_graph.to_list() + # Add warning and error summary to creation info comment + creation_info =3D next(element for element in spdx_graph_objec= ts if isinstance(element, CreationInfo)) + creation_info.comment =3D "\n".join([warning_summary, error_su= mmary]).strip() + # Replace Placeholder uuid with real uuid for spdxIds + spdx_document =3D next(element for element in spdx_graph_objec= ts if isinstance(element, SpdxDocument)) + for namespaceMap in spdx_document.namespaceMap: + namespaceMap.namespace =3D namespaceMap.namespace.replace(= PLACEHOLDER_UUID, str(spdx_id_uuid)) + # Serialize SPDX graph to JSON-LD + spdx_doc =3D JsonLdSpdxDocument(graph=3Dspdx_graph_objects) + save_path =3D os.path.join(config.output_directory, config.spd= x_file_names[kernel_sbom_kind]) + spdx_doc.save(save_path, config.prettify_json) + logging.debug(f"Successfully saved {save_path}") + if warning_summary: logging.warning(warning_summary) if error_summary: diff --git a/scripts/sbom/sbom/config.py b/scripts/sbom/sbom/config.py index 39e556a4c53..0985457c3ca 100644 --- a/scripts/sbom/sbom/config.py +++ b/scripts/sbom/sbom/config.py @@ -3,11 +3,18 @@ =20 import argparse from dataclasses import dataclass +from enum import Enum import os from typing import Any from sbom.path_utils import PathStr =20 =20 +class KernelSpdxDocumentKind(Enum): + SOURCE =3D "source" + BUILD =3D "build" + OUTPUT =3D "output" + + @dataclass class KernelSbomConfig: src_tree: PathStr @@ -19,6 +26,13 @@ class KernelSbomConfig: root_paths: list[PathStr] """List of paths to root outputs (relative to obj_tree) to base the SB= OM on.""" =20 + generate_spdx: bool + """Whether to generate SPDX SBOM documents. 
If False, no SPDX files ar= e created.""" + + spdx_file_names: dict[KernelSpdxDocumentKind, str] + """If `generate_spdx` is True, defines the file names for each SPDX SB= OM kind + (source, build, output) to store on disk.""" + generate_used_files: bool """Whether to generate a flat list of all source files used in the bui= ld. If False, no used-files document is created.""" @@ -38,6 +52,12 @@ class KernelSbomConfig: write_output_on_error: bool """Whether to write output documents even if errors occur.""" =20 + spdxId_prefix: str + """Prefix to use for all SPDX element IDs.""" + + prettify_json: bool + """Whether to pretty-print generated SPDX JSON documents.""" + =20 def _parse_cli_arguments() -> dict[str, Any]: """ @@ -72,6 +92,15 @@ def _parse_cli_arguments() -> dict[str, Any]: "--roots-file", help=3D"Path to a file containing the root paths (one per line). C= annot be used together with --roots.", ) + parser.add_argument( + "--generate-spdx", + action=3D"store_true", + default=3DFalse, + help=3D( + "Whether to create sbom-source.spdx.json, sbom-build.spdx.json= and " + "sbom-output.spdx.json documents (default: False)" + ), + ) parser.add_argument( "--generate-used-files", action=3D"store_true", @@ -119,6 +148,20 @@ def _parse_cli_arguments() -> dict[str, Any]: ), ) =20 + # SPDX specific options + spdx_group =3D parser.add_argument_group("SPDX options", "Options for = customizing SPDX document generation") + spdx_group.add_argument( + "--spdxId-prefix", + default=3D"urn:spdx.dev:", + help=3D"The prefix to use for all spdxId properties. 
(default: urn= :spdx.dev:)", + ) + spdx_group.add_argument( + "--prettify-json", + action=3D"store_true", + default=3DFalse, + help=3D"Whether to pretty print the generated spdx.json documents = (default: False)", + ) + args =3D vars(parser.parse_args()) return args =20 @@ -144,6 +187,7 @@ def get_config() -> KernelSbomConfig: root_paths =3D args["roots"] _validate_path_arguments(src_tree, obj_tree, root_paths) =20 + generate_spdx =3D args["generate_spdx"] generate_used_files =3D args["generate_used_files"] output_directory =3D os.path.realpath(args["output_directory"]) debug =3D args["debug"] @@ -151,19 +195,31 @@ def get_config() -> KernelSbomConfig: fail_on_unknown_build_command =3D not args["do_not_fail_on_unknown_bui= ld_command"] write_output_on_error =3D args["write_output_on_error"] =20 + spdxId_prefix =3D args["spdxId_prefix"] + prettify_json =3D args["prettify_json"] + # Hardcoded config + spdx_file_names =3D { + KernelSpdxDocumentKind.SOURCE: "sbom-source.spdx.json", + KernelSpdxDocumentKind.BUILD: "sbom-build.spdx.json", + KernelSpdxDocumentKind.OUTPUT: "sbom-output.spdx.json", + } used_files_file_name =3D "sbom.used-files.txt" =20 return KernelSbomConfig( src_tree=3Dsrc_tree, obj_tree=3Dobj_tree, root_paths=3Droot_paths, + generate_spdx=3Dgenerate_spdx, + spdx_file_names=3Dspdx_file_names, generate_used_files=3Dgenerate_used_files, used_files_file_name=3Dused_files_file_name, output_directory=3Doutput_directory, debug=3Ddebug, fail_on_unknown_build_command=3Dfail_on_unknown_build_command, write_output_on_error=3Dwrite_output_on_error, + spdxId_prefix=3DspdxId_prefix, + prettify_json=3Dprettify_json, ) =20 =20 diff --git a/scripts/sbom/sbom/spdx_graph/__init__.py b/scripts/sbom/sbom/s= pdx_graph/__init__.py new file mode 100644 index 00000000000..3557b1d51bf --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/__init__.py @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from 
.build_spdx_graphs import build_spdx_graphs +from .spdx_graph_model import SpdxIdGeneratorCollection + +__all__ =3D ["build_spdx_graphs", "SpdxIdGeneratorCollection"] diff --git a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py b/scripts/sb= om/sbom/spdx_graph/build_spdx_graphs.py new file mode 100644 index 00000000000..bb3db4e423d --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -0,0 +1,36 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + + +from typing import Protocol + +from sbom.config import KernelSpdxDocumentKind +from sbom.cmd_graph import CmdGraph +from sbom.path_utils import PathStr +from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection + + +class SpdxGraphConfig(Protocol): + obj_tree: PathStr + src_tree: PathStr + + +def build_spdx_graphs( + cmd_graph: CmdGraph, + spdx_id_generators: SpdxIdGeneratorCollection, + config: SpdxGraphConfig, +) -> dict[KernelSpdxDocumentKind, SpdxGraph]: + """ + Builds SPDX graphs (output, source, and build) based on a cmd dependen= cy graph. + If the source and object trees are identical, no dedicated source grap= h can be created. + In that case the source files are added to the build graph instead. + + Args: + cmd_graph: The dependency graph of a kernel build. + spdx_id_generators: Collection of SPDX ID generators. + config: Configuration options. 
+ + Returns: + Dictionary of SPDX graphs + """ + return {} diff --git a/scripts/sbom/sbom/spdx_graph/spdx_graph_model.py b/scripts/sbo= m/sbom/spdx_graph/spdx_graph_model.py new file mode 100644 index 00000000000..682194d4362 --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/spdx_graph_model.py @@ -0,0 +1,36 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from sbom.spdx.core import CreationInfo, SoftwareAgent, SpdxDocument, Spdx= Object +from sbom.spdx.software import Sbom +from sbom.spdx.spdxId import SpdxIdGenerator + + +@dataclass +class SpdxGraph: + """Represents the complete graph of a single SPDX document.""" + + spdx_document: SpdxDocument + agent: SoftwareAgent + creation_info: CreationInfo + sbom: Sbom + + def to_list(self) -> list[SpdxObject]: + return [ + self.spdx_document, + self.agent, + self.creation_info, + self.sbom, + *self.sbom.element, + ] + + +@dataclass +class SpdxIdGeneratorCollection: + """Holds SPDX ID generators for different document types to ensure glo= bally unique SPDX IDs.""" + + base: SpdxIdGenerator + source: SpdxIdGenerator + build: SpdxIdGenerator + output: SpdxIdGenerator --=20 2.43.0 From nobody Sat Apr 18 04:54:51 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F11DF3D411F; Fri, 10 Apr 2026 21:23:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856227; cv=none; b=ILPUaORbnlJuAWemgq/w3JWchplVwVHWwju8gEfSNvUkhuD3DXFR/jq+aiPZmj3ejl7ZRTd5hovoKHB9ANYmrcsqoWIAaLjiujXmlpVAzSjGQFYmnQEjOoXglSkdZUoLhf2w1R0DeNEpTCOTpwd8vduLqT+nI6R70+Ax4vJJpjk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1775856227; c=relaxed/simple; bh=en97VGejxoWqqOyazA4YdtgoKd6lS/hoRfCENM+KAmk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cADkegaakh6B2lpgQenILRKv3xHVVMQemj89ywIFOMOZHD2Zhy7KAIaAl0y0HSDE46apoA7awX4scxwKiYzHohecoz0HoUhRbIyCApmbvLJWPemsgSlgQ7BxvoZFwakmly/LKraC/Dk0BDZ/EY76eb5//fwy2EUXtMa8EWuTk+0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=gPRcClRE; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b="gPRcClRE" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id 4ECB33FAFD; Fri, 10 Apr 2026 23:23:42 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 186641FAE85; Fri, 10 Apr 2026 23:23:42 +0200 (CEST) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id ekJf5H_iTRyC; Fri, 10 Apr 2026 23:23:41 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 763A41FADEA; Fri, 10 Apr 2026 23:23:41 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz 763A41FADEA DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1775856221; bh=pc/Y+hYEhAhbGCky5XyQct6HCSsBP98k2EIRQ3n2X0M=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=gPRcClREY4qSJSdgOHezaTvvrW3TztY6WwqgUew5EgGPgh5zbT1MM7vZlmVEJ6zSU 
BbywgGvJnlI6vmGG8zCMwszBU1dWf8jvb/3Uoj928i+/ixirLnd9+yBx8kQYTakYMt B2FoCEEQPnvJGoEAkHtzZ9JWlh4PYYj2VqgjgVQFwRWplc3aWWrD7jUm0jX3PHpvX6 KyNpGxnvveW2Z1uF49hPzzhJZO3cIrFKrSGmeurVtkiWOXELUH0HsP8h1Zd1MRKYc1 FzpoEsImmVIgB37fWQBaewnkXHjC5PH1rNwADHDZG0O24+vPLbA0nMjogANEPuXxoY 7ayxGuJwuzpcQ== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id 8iCj8njOfsdP; Fri, 10 Apr 2026 23:23:41 +0200 (CEST) Received: from luis-Precision-5480.. (ipservice-092-209-239-167.092.209.pools.vodafone-ip.de [92.209.239.167]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id 1DCF01FAEB7; Fri, 10 Apr 2026 23:23:41 +0200 (CEST) From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v5 09/15] scripts/sbom: add shared SPDX elements Date: Fri, 10 Apr 2026 23:22:49 +0200 Message-ID: <20260410212255.9883-10-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com> References: <20260410212255.9883-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Implement shared SPDX elements used in all three documents. 
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- scripts/sbom/sbom/config.py | 8 +++++ .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 5 ++- .../sbom/spdx_graph/shared_spdx_elements.py | 32 +++++++++++++++++++ 3 files changed, 44 insertions(+), 1 deletion(-) create mode 100644 scripts/sbom/sbom/spdx_graph/shared_spdx_elements.py diff --git a/scripts/sbom/sbom/config.py b/scripts/sbom/sbom/config.py index 0985457c3ca..fa049f757cb 100644 --- a/scripts/sbom/sbom/config.py +++ b/scripts/sbom/sbom/config.py @@ -3,6 +3,7 @@ =20 import argparse from dataclasses import dataclass +from datetime import datetime from enum import Enum import os from typing import Any @@ -52,6 +53,9 @@ class KernelSbomConfig: write_output_on_error: bool """Whether to write output documents even if errors occur.""" =20 + created: datetime + """Datetime to use for the SPDX created property of the CreationInfo e= lement.""" + spdxId_prefix: str """Prefix to use for all SPDX element IDs.""" =20 @@ -195,6 +199,9 @@ def get_config() -> KernelSbomConfig: fail_on_unknown_build_command =3D not args["do_not_fail_on_unknown_bui= ld_command"] write_output_on_error =3D args["write_output_on_error"] =20 + created =3D datetime.fromtimestamp( + max([os.path.getmtime(os.path.join(obj_tree, root_path)) for root_= path in root_paths]) + ) spdxId_prefix =3D args["spdxId_prefix"] prettify_json =3D args["prettify_json"] =20 @@ -218,6 +225,7 @@ def get_config() -> KernelSbomConfig: debug=3Ddebug, fail_on_unknown_build_command=3Dfail_on_unknown_build_command, write_output_on_error=3Dwrite_output_on_error, + created=3Dcreated, spdxId_prefix=3DspdxId_prefix, prettify_json=3Dprettify_json, ) diff --git a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py b/scripts/sb= om/sbom/spdx_graph/build_spdx_graphs.py index bb3db4e423d..9c47258a31c 100644 --- a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ 
b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -1,18 +1,20 @@ # SPDX-License-Identifier: GPL-2.0-only OR MIT # Copyright (C) 2025 TNG Technology Consulting GmbH =20 - +from datetime import datetime from typing import Protocol =20 from sbom.config import KernelSpdxDocumentKind from sbom.cmd_graph import CmdGraph from sbom.path_utils import PathStr from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection +from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements =20 =20 class SpdxGraphConfig(Protocol): obj_tree: PathStr src_tree: PathStr + created: datetime =20 =20 def build_spdx_graphs( @@ -33,4 +35,5 @@ def build_spdx_graphs( Returns: Dictionary of SPDX graphs """ + shared_elements =3D SharedSpdxElements.create(spdx_id_generators.base,= config.created) return {} diff --git a/scripts/sbom/sbom/spdx_graph/shared_spdx_elements.py b/scripts= /sbom/sbom/spdx_graph/shared_spdx_elements.py new file mode 100644 index 00000000000..0c83428f4c7 --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/shared_spdx_elements.py @@ -0,0 +1,32 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from datetime import datetime +from sbom.spdx.core import CreationInfo, SoftwareAgent +from sbom.spdx.spdxId import SpdxIdGenerator + + +@dataclass(frozen=3DTrue) +class SharedSpdxElements: + agent: SoftwareAgent + creation_info: CreationInfo + + @classmethod + def create(cls, spdx_id_generator: SpdxIdGenerator, created: datetime)= -> "SharedSpdxElements": + """ + Creates shared SPDX elements used across multiple documents. + + Args: + spdx_id_generator: Generator for creating SPDX IDs. + created: SPDX 'created' property used for the creation info. + + Returns: + SharedSpdxElements with agent and creation info. 
+ """ + agent =3D SoftwareAgent( + spdxId=3Dspdx_id_generator.generate(), + name=3D"KernelSbom", + ) + creation_info =3D CreationInfo(createdBy=3D[agent], created=3Dcrea= ted.strftime("%Y-%m-%dT%H:%M:%SZ")) + return SharedSpdxElements(agent=3Dagent, creation_info=3Dcreation_= info) --=20 2.43.0 From nobody Sat Apr 18 04:54:51 2026 Received: from mailgw01.zimbra-vnc.de (mailgw01.zimbra-vnc.de [148.251.101.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7AB8D3B27DE; Fri, 10 Apr 2026 21:23:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.101.236 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856229; cv=none; b=b0HdcYIncNYxKXcG4GNCV1w4qzsnBlFWw/BobGsbUOzUebS1hdcjZE94p29mMDbSyMCDvpMwh9gYwbeKHs0YJeiQ/qi/N7ib4RzIepKGLYsCXoMtZ0vP3PnOjbZOf9udn1Q88MMVHuShJc2dz//x8NzZTWGBK1WvSe7aYTxxfO4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775856229; c=relaxed/simple; bh=NZfRy0MnacdpZz0yTzk+6KNjnpZMnJ+k9Tn73vLy5I4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MDT1f1Q91X3+nVWuv2ylQFuL6oZ6ACjmWuOo0JSpDWSNgu8j78w4IQ+dYvJ17bhE4aCf20HaxcJSOb/5AqOTDV95JX0jBQtkqs5zNS/vx/Ge0ZIsfZvE67fDGEgDIq5dH8GS1x3lOMMq6rK12Dab3Q3yl51y9+E6+Q6b2/o9COM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com; spf=pass smtp.mailfrom=tngtech.com; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com header.b=JZEQQAwH; arc=none smtp.client-ip=148.251.101.236 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tngtech.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tngtech.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tngtech.com header.i=@tngtech.com 
header.b="JZEQQAwH" Received: from zmproxy.tng.vnc.biz (zimbra-vnc.tngtech.com [35.234.71.156]) by mailgw01.zimbra-vnc.de (Postfix) with ESMTPS id A97A43FAF6; Fri, 10 Apr 2026 23:23:43 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id 908311FADEA; Fri, 10 Apr 2026 23:23:43 +0200 (CEST) Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10032) with ESMTP id pM6-5xqO5AQi; Fri, 10 Apr 2026 23:23:42 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zmproxy.tng.vnc.biz (Postfix) with ESMTP id AFD0E1FAE85; Fri, 10 Apr 2026 23:23:42 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 zmproxy.tng.vnc.biz AFD0E1FAE85 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tngtech.com; s=B14491C6-869D-11EB-BB6C-8DD33D883B31; t=1775856222; bh=Bo9qgWOSUxN9u0lrSF5eMvW/pUVtNNKukQZgbK6yhzk=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=JZEQQAwHvz9zBFRfwPq7gLz6CTYhsg5V+KdZNODzSAaE3nMO7gRo+oSVGmLs4Tslp KCRaC/qW2ztSYC5YY8mXnZ3/kovPwahHGIJsxrxBzOIsqIA9jHJJ6TqbzbwuBQTfW5 a1efwEUeYrO+IjWVeeIqFD6chxB9lfzWQS5TvPYJVjRK7P+LsWGuFt4P4SH/i/UJgh 20PN+PkYAKwsCrVmLjWBAatcIM9iQzpRiXrlugbQvk0sypkdnX4L7NhbLxcA2uQaiU q7QQjkXoSEKZaj3zAxqrqK5nd06cQlfELPPtihRSDH4ih6PdVVmv87psXqYIuRfHcE Vw9gC/z/Q3eJg== X-Virus-Scanned: amavis at zmproxy.tng.vnc.biz Received: from zmproxy.tng.vnc.biz ([127.0.0.1]) by localhost (zmproxy.tng.vnc.biz [127.0.0.1]) (amavis, port 10026) with ESMTP id SRqULlZ9pW0X; Fri, 10 Apr 2026 23:23:42 +0200 (CEST) Received: from luis-Precision-5480.. 
(ipservice-092-209-239-167.092.209.pools.vodafone-ip.de [92.209.239.167]) by zmproxy.tng.vnc.biz (Postfix) with ESMTPSA id 520FB1FADEA; Fri, 10 Apr 2026 23:23:42 +0200 (CEST) From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v5 10/15] scripts/sbom: collect file metadata Date: Fri, 10 Apr 2026 23:22:50 +0200 Message-ID: <20260410212255.9883-11-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com> References: <20260410212255.9883-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Implement the kernel_file module that collects file metadata, including license identifier for source files, SHA-256 hash, Git blob object ID, an estimation of the file type, and whether files belong to the source, build, or output SBOM. 
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 2 + scripts/sbom/sbom/spdx_graph/kernel_file.py | 310 ++++++++++++++++++ 2 files changed, 312 insertions(+) create mode 100644 scripts/sbom/sbom/spdx_graph/kernel_file.py diff --git a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py b/scripts/sb= om/sbom/spdx_graph/build_spdx_graphs.py index 9c47258a31c..0f95f99d560 100644 --- a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -7,6 +7,7 @@ from typing import Protocol from sbom.config import KernelSpdxDocumentKind from sbom.cmd_graph import CmdGraph from sbom.path_utils import PathStr +from sbom.spdx_graph.kernel_file import KernelFileCollection from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements =20 @@ -36,4 +37,5 @@ def build_spdx_graphs( Dictionary of SPDX graphs """ shared_elements =3D SharedSpdxElements.create(spdx_id_generators.base,= config.created) + kernel_files =3D KernelFileCollection.create(cmd_graph, config.obj_tre= e, config.src_tree, spdx_id_generators) return {} diff --git a/scripts/sbom/sbom/spdx_graph/kernel_file.py b/scripts/sbom/sbo= m/spdx_graph/kernel_file.py new file mode 100644 index 00000000000..c1136945dc0 --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/kernel_file.py @@ -0,0 +1,310 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from enum import Enum +import hashlib +import os +import re +from sbom.cmd_graph import CmdGraph +from sbom.path_utils import PathStr, is_relative_to +from sbom.spdx import SpdxId, SpdxIdGenerator +from sbom.spdx.core import Hash +from sbom.spdx.software import ContentIdentifier, File, SoftwarePurpose 
+import sbom.sbom_logging as sbom_logging +from sbom.spdx_graph.spdx_graph_model import SpdxIdGeneratorCollection + + +class KernelFileLocation(Enum): + """Represents the location of a file relative to the source/object tre= es.""" + + SOURCE_TREE =3D "source_tree" + """File is located in the source tree.""" + OBJ_TREE =3D "obj_tree" + """File is located in the object tree.""" + EXTERNAL =3D "external" + """File is located outside both source and object trees.""" + BOTH =3D "both" + """File is located in a folder that is both source and object tree.""" + + +@dataclass +class KernelFile: + """kernel-specific metadata used to generate an SPDX File element.""" + + absolute_path: PathStr + """Absolute path of the file.""" + file_location: KernelFileLocation + """Location of the file relative to the source/object trees.""" + name: str + """Name of the file element. Should be relative to the source tree if + file_location equals SOURCE_TREE and relative to the object tree if + file_location equals OBJ_TREE. 
If file_location equals EXTERNAL, the + absolute path is used.""" + license_identifier: str | None + """SPDX license ID if file_location equals SOURCE_TREE or BOTH; otherw= ise None.""" + spdx_id_generator: SpdxIdGenerator + """Generator for the SPDX ID of the file element.""" + + _spdx_file_element: File | None =3D None + + @classmethod + def create( + cls, + absolute_path: PathStr, + obj_tree: PathStr, + src_tree: PathStr, + spdx_id_generators: SpdxIdGeneratorCollection, + is_output: bool, + ) -> "KernelFile": + is_in_obj_tree =3D is_relative_to(absolute_path, obj_tree) + is_in_src_tree =3D is_relative_to(absolute_path, src_tree) + + # file element name should be relative to output or src tree if po= ssible + if not is_in_src_tree and not is_in_obj_tree: + file_element_name =3D str(absolute_path) + file_location =3D KernelFileLocation.EXTERNAL + spdx_id_generator =3D spdx_id_generators.build + elif is_in_src_tree and src_tree =3D=3D obj_tree: + file_element_name =3D os.path.relpath(absolute_path, obj_tree) + file_location =3D KernelFileLocation.BOTH + spdx_id_generator =3D spdx_id_generators.output if is_output e= lse spdx_id_generators.build + elif is_in_obj_tree: + file_element_name =3D os.path.relpath(absolute_path, obj_tree) + file_location =3D KernelFileLocation.OBJ_TREE + spdx_id_generator =3D spdx_id_generators.output if is_output e= lse spdx_id_generators.build + else: + file_element_name =3D os.path.relpath(absolute_path, src_tree) + file_location =3D KernelFileLocation.SOURCE_TREE + spdx_id_generator =3D spdx_id_generators.source + + # parse spdx license identifier + license_identifier =3D ( + _parse_spdx_license_identifier(absolute_path) + if file_location =3D=3D KernelFileLocation.SOURCE_TREE or file= _location =3D=3D KernelFileLocation.BOTH + else None + ) + + return KernelFile( + absolute_path, + file_location, + file_element_name, + license_identifier, + spdx_id_generator, + ) + + @property + def spdx_file_element(self) -> File: + if 
self._spdx_file_element is None: + self._spdx_file_element =3D _build_file_element( + self.absolute_path, + self.name, + self.spdx_id_generator.generate(), + self.file_location, + ) + return self._spdx_file_element + + +@dataclass +class KernelFileCollection: + """Collection of kernel files.""" + + source: dict[PathStr, KernelFile] + build: dict[PathStr, KernelFile] + output: dict[PathStr, KernelFile] + + @classmethod + def create( + cls, + cmd_graph: CmdGraph, + obj_tree: PathStr, + src_tree: PathStr, + spdx_id_generators: SpdxIdGeneratorCollection, + ) -> "KernelFileCollection": + source: dict[PathStr, KernelFile] =3D {} + build: dict[PathStr, KernelFile] =3D {} + output: dict[PathStr, KernelFile] =3D {} + root_node_paths =3D {node.absolute_path for node in cmd_graph.root= s} + for node in cmd_graph: + is_root =3D node.absolute_path in root_node_paths + kernel_file =3D KernelFile.create( + node.absolute_path, + obj_tree, + src_tree, + spdx_id_generators, + is_root, + ) + if is_root: + output[kernel_file.absolute_path] =3D kernel_file + elif kernel_file.file_location =3D=3D KernelFileLocation.SOURC= E_TREE: + source[kernel_file.absolute_path] =3D kernel_file + else: + build[kernel_file.absolute_path] =3D kernel_file + + return KernelFileCollection(source, build, output) + + def to_dict(self) -> dict[PathStr, KernelFile]: + return {**self.source, **self.build, **self.output} + + +def _build_file_element(absolute_path: PathStr, name: str, spdx_id: SpdxId= , file_location: KernelFileLocation) -> File: + verifiedUsing: list[Hash] =3D [] + content_identifier: list[ContentIdentifier] =3D [] + if os.path.exists(absolute_path): + verifiedUsing =3D [Hash(algorithm=3D"sha256", hashValue=3D_sha256(= absolute_path))] + content_identifier =3D [ + ContentIdentifier( + software_contentIdentifierType=3D"gitoid", + software_contentIdentifierValue=3D_git_blob_oid(absolute_p= ath), + ) + ] + elif file_location =3D=3D KernelFileLocation.EXTERNAL: + sbom_logging.warning( + "Cannot 
compute hash for {absolute_path} because file does not= exist.", + absolute_path=3Dabsolute_path, + ) + else: + sbom_logging.error( + "Cannot compute hash for {absolute_path} because file does not= exist.", + absolute_path=3Dabsolute_path, + ) + + # primary purpose + primary_purpose =3D _get_primary_purpose(absolute_path) + + return File( + spdxId=3Dspdx_id, + name=3Dname, + verifiedUsing=3DverifiedUsing, + software_primaryPurpose=3Dprimary_purpose, + software_contentIdentifier=3Dcontent_identifier, + ) + + +def _sha256(path: PathStr) -> str: + """Compute the SHA-256 hash of a file.""" + with open(path, "rb") as f: + data =3D f.read() + return hashlib.sha256(data).hexdigest() + + +def _git_blob_oid(file_path: str) -> str: + """ + Compute the Git blob object ID (SHA-1) for a file, like `git hash-obje= ct`. + + Args: + file_path: Path to the file. + + Returns: + SHA-1 hash (hex) of the Git blob object. + """ + with open(file_path, "rb") as f: + content =3D f.read() + header =3D f"blob {len(content)}\0".encode() + store =3D header + content + sha1_hash =3D hashlib.sha1(store).hexdigest() + return sha1_hash + + +# REUSE-IgnoreStart +SPDX_LICENSE_IDENTIFIER_PATTERN =3D re.compile(r"SPDX-License-Identifier:\= s*(?P<id>.*?)(?:\s*(\*/|$))") +# REUSE-IgnoreEnd + + +def _parse_spdx_license_identifier(absolute_path: str, max_lines: int =3D = 5) -> str | None: + """ + Extracts the SPDX-License-Identifier from the first few lines of a sou= rce file. + + Args: + absolute_path: Path to the source file. + max_lines: Number of lines to scan from the top (default: 5). + + Returns: + The license identifier string (e.g., 'GPL-2.0-only') if found, oth= erwise None.
+ """ + try: + with open(absolute_path, "r") as f: + for _ in range(max_lines): + match =3D SPDX_LICENSE_IDENTIFIER_PATTERN.search(f.readlin= e()) + if match: + return match.group("id") + except (UnicodeDecodeError, OSError): + return None + return None + + +def _get_primary_purpose(absolute_path: PathStr) -> SoftwarePurpose | None: + def ends_with(suffixes: list[str]) -> bool: + return any(absolute_path.endswith(suffix) for suffix in suffixes) + + def includes_path_segments(path_segments: list[str]) -> bool: + return any(segment in absolute_path for segment in path_segments) + + # Source code + if ends_with([".c", ".h", ".S", ".s", ".rs", ".pl", "gen_smb2_mapping"= ]): + return "source" + + # Libraries + if ends_with([".a", ".so", ".rlib"]): + return "library" + + # Archives + if ends_with([".xz", ".cpio", ".gz", ".tar", ".zip"]): + return "archive" + + # Applications + if ends_with(["bzImage", "Image"]): + return "application" + + # Executables / machine code + if ends_with([".bin", ".elf", "vmlinux", "vmlinux.unstripped", "bpfilt= er_umh"]): + return "executable" + + # Kernel modules + if ends_with([".ko"]): + return "module" + + # Data files + if ends_with( + [ + ".tbl", + ".relocs", + ".rmeta", + ".in", + ".dbg", + ".x509", + ".pbm", + ".ppm", + ".dtb", + ".uc", + ".inc", + ".dts", + ".dtsi", + ".dtbo", + ".xml", + ".ro", + "initramfs_inc_data", + "default_cpio_list", + "x509_certificate_list", + "utf8data.c_shipped", + "blacklist_hash_list", + "x509_revocation_list", + "cpucaps", + "sysreg", + ] + ) or includes_path_segments(["drivers/gpu/drm/radeon/reg_srcs/"]): + return "data" + + # Configuration files + if ends_with([".pem", ".key", ".conf", ".config", ".cfg", ".bconf"]): + return "configuration" + + # Documentation + if ends_with([".md"]): + return "documentation" + + # Other / miscellaneous + if ends_with([".o", ".tmp"]): + return "other" + + sbom_logging.warning("Could not infer primary purpose for {absolute_pa= th}", absolute_path=3Dabsolute_path) 
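Note for readers checking the gitoid and license values by hand: the blob ID is the SHA-1 of a "blob <size>\0" header followed by the file content, and the license pattern captures everything after the tag up to an optional closing "*/". The sketch below is not part of the patch. `git_blob_oid_from_bytes` and `parse_license` are hypothetical helper names that work on in-memory data instead of files, which makes them easy to try in isolation.

```python
import hashlib
import re
from typing import Optional

# Same pattern as in the patch, with the named group "id" spelled out.
PATTERN = re.compile(r"SPDX-License-Identifier:\s*(?P<id>.*?)(?:\s*(\*/|$))")


def git_blob_oid_from_bytes(content: bytes) -> str:
    """Hypothetical helper: Git blob object ID (SHA-1) of in-memory content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def parse_license(line: str) -> Optional[str]:
    """Hypothetical helper: extract the license expression from a single line."""
    match = PATTERN.search(line)
    return match.group("id") if match else None


if __name__ == "__main__":
    # The empty blob has a well-known object ID.
    print(git_blob_oid_from_bytes(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(parse_license("/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */"))
```

The values can be cross-checked against `git hash-object --stdin` on the same bytes.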
-- 
2.43.0

From nobody Sat Apr 18 04:54:51 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, gregkh@linuxfoundation.org,
 kstewart@linuxfoundation.org, maximilian.huber@tngtech.com,
 Luis Augenstein
Subject: [PATCH v5 11/15] scripts/sbom: add SPDX output graph
Date: Fri, 10 Apr 2026 23:22:51 +0200
Message-ID: <20260410212255.9883-12-luis.augenstein@tngtech.com>
In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com>
References: <20260410212255.9883-1-luis.augenstein@tngtech.com>

From: Luis Augenstein

Implement the SPDX output graph which contains the distributable build
outputs and high level metadata about the build.
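
Note on the package copyright text: in the config.py hunk of this patch, an explicit --package-copyright-text value takes precedence, and otherwise the content of a COPYING file in the source tree is used if one exists. A minimal sketch of that precedence, assuming a standalone function; `resolve_copyright_text` is a hypothetical name and not part of the patch:

```python
import os
from typing import Optional


def resolve_copyright_text(cli_value: Optional[str], src_tree: str) -> Optional[str]:
    """Hypothetical helper mirroring the fallback order described above."""
    # 1. An explicit command line value always wins.
    if cli_value is not None:
        return cli_value
    # 2. Otherwise fall back to <src_tree>/COPYING if it is a regular file.
    copying_path = os.path.join(src_tree, "COPYING")
    if os.path.isfile(copying_path):
        with open(copying_path, "r") as f:
            return f.read()
    # 3. No source of copyright text is available.
    return None
```

Returning None in the last case leaves the property unset instead of guessing a value.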

Assisted-by: Cursor:claude-sonnet-4-5
Assisted-by: OpenCode:GLM-4-7
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 Makefile                                      |   5 +-
 scripts/sbom/sbom/config.py                   |  64 ++++++
 .../sbom/sbom/spdx_graph/build_spdx_graphs.py |  18 +-
 .../sbom/sbom/spdx_graph/spdx_output_graph.py | 187 ++++++++++++++++++
 4 files changed, 272 insertions(+), 2 deletions(-)
 create mode 100644 scripts/sbom/sbom/spdx_graph/spdx_output_graph.py

diff --git a/Makefile b/Makefile
index 279e3abd34c..286effedafb 100644
--- a/Makefile
+++ b/Makefile
@@ -2175,7 +2175,10 @@ quiet_cmd_sbom = GEN $(sbom_targets)
 		--obj-tree $(abspath $(objtree)) \
 		--roots-file "$(tmp-target)" \
 		--output-directory $(abspath $(objtree)) \
-		--generate-spdx;
+		--generate-spdx \
+		--package-license "GPL-2.0 WITH Linux-syscall-note" \
+		--package-version "$(KERNELVERSION)" \
+		--write-output-on-error;
 PHONY += sbom
 sbom: $(notdir $(KBUILD_IMAGE)) include/generated/autoconf.h $(if $(CONFIG_MODULES),modules modules.order)
 	$(call cmd,sbom)
diff --git a/scripts/sbom/sbom/config.py b/scripts/sbom/sbom/config.py
index fa049f757cb..f47daaa7b8d 100644
--- a/scripts/sbom/sbom/config.py
+++ b/scripts/sbom/sbom/config.py
@@ -59,6 +59,21 @@ class KernelSbomConfig:
     spdxId_prefix: str
     """Prefix to use for all SPDX element IDs."""
 
+    build_type: str
+    """SPDX buildType property to use for all Build elements."""
+
+    build_id: str | None
+    """SPDX buildId property to use for all Build elements."""
+
+    package_license: str
+    """License expression applied to all SPDX Packages."""
+
+    package_version: str | None
+    """Version string applied to all SPDX Packages."""
+
+    package_copyright_text: str | None
+    """Copyright text applied to all SPDX Packages."""
+
     prettify_json: bool
     """Whether to pretty-print generated SPDX JSON documents."""
 
@@ -159,6 +174,40 @@ def _parse_cli_arguments() -> dict[str, Any]:
         default="urn:spdx.dev:",
         help="The prefix to use for all spdxId properties. (default: urn:spdx.dev:)",
     )
+    spdx_group.add_argument(
+        "--build-type",
+        default="urn:spdx.dev:Kbuild",
+        help="The SPDX buildType property to use for all Build elements. (default: urn:spdx.dev:Kbuild)",
+    )
+    spdx_group.add_argument(
+        "--build-id",
+        default=None,
+        help="The SPDX buildId property to use for all Build elements.\n"
+        "If not provided the spdxId of the high level Build element is used as the buildId. (default: None)",
+    )
+    spdx_group.add_argument(
+        "--package-license",
+        default="NOASSERTION",
+        help=(
+            "The SPDX licenseExpression property to use for the LicenseExpression "
+            "linked to all SPDX Package elements. (default: NOASSERTION)"
+        ),
+    )
+    spdx_group.add_argument(
+        "--package-version",
+        default=None,
+        help="The SPDX packageVersion property to use for all SPDX Package elements. (default: None)",
+    )
+    spdx_group.add_argument(
+        "--package-copyright-text",
+        default=None,
+        help=(
+            "The SPDX copyrightText property to use for all SPDX Package elements.\n"
+            "If not specified, and if a COPYING file exists in the source tree,\n"
+            "the package-copyright-text is set to the content of this file. "
+            "(default: None)"
+        ),
+    )
     spdx_group.add_argument(
         "--prettify-json",
         action="store_true",
@@ -203,6 +252,16 @@ def get_config() -> KernelSbomConfig:
         max([os.path.getmtime(os.path.join(obj_tree, root_path)) for root_path in root_paths])
     )
     spdxId_prefix = args["spdxId_prefix"]
+    build_type = args["build_type"]
+    build_id = args["build_id"]
+    package_license = args["package_license"]
+    package_version = args["package_version"] if args["package_version"] is not None else None
+    package_copyright_text: str | None = None
+    if args["package_copyright_text"] is not None:
+        package_copyright_text = args["package_copyright_text"]
+    elif os.path.isfile(copying_path := os.path.join(src_tree, "COPYING")):
+        with open(copying_path, "r") as f:
+            package_copyright_text = f.read()
     prettify_json = args["prettify_json"]
 
     # Hardcoded config
@@ -227,6 +286,11 @@ def get_config() -> KernelSbomConfig:
         write_output_on_error=write_output_on_error,
         created=created,
         spdxId_prefix=spdxId_prefix,
+        build_type=build_type,
+        build_id=build_id,
+        package_license=package_license,
+        package_version=package_version,
+        package_copyright_text=package_copyright_text,
         prettify_json=prettify_json,
     )
 
diff --git a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py
index 0f95f99d560..2af0fbe6cdb 100644
--- a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py
+++ b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py
@@ -10,12 +10,18 @@ from sbom.path_utils import PathStr
 from sbom.spdx_graph.kernel_file import KernelFileCollection
 from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCollection
 from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements
+from sbom.spdx_graph.spdx_output_graph import SpdxOutputGraph
 
 
 class SpdxGraphConfig(Protocol):
     obj_tree: PathStr
     src_tree: PathStr
     created: datetime
+    build_type: str
+    build_id: str | None
+    package_license: str
+    package_version: str | None
+    package_copyright_text: str | None
 
 
 def build_spdx_graphs(
@@ -38,4 +44,14 @@ def build_spdx_graphs(
     """
     shared_elements = SharedSpdxElements.create(spdx_id_generators.base, config.created)
     kernel_files = KernelFileCollection.create(cmd_graph, config.obj_tree, config.src_tree, spdx_id_generators)
-    return {}
+    output_graph = SpdxOutputGraph.create(
+        root_files=list(kernel_files.output.values()),
+        shared_elements=shared_elements,
+        spdx_id_generators=spdx_id_generators,
+        config=config,
+    )
+    spdx_graphs: dict[KernelSpdxDocumentKind, SpdxGraph] = {
+        KernelSpdxDocumentKind.OUTPUT: output_graph,
+    }
+
+    return spdx_graphs
diff --git a/scripts/sbom/sbom/spdx_graph/spdx_output_graph.py b/scripts/sbom/sbom/spdx_graph/spdx_output_graph.py
new file mode 100644
index 00000000000..ff9b2c31fb0
--- /dev/null
+++ b/scripts/sbom/sbom/spdx_graph/spdx_output_graph.py
@@ -0,0 +1,187 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from dataclasses import dataclass
+import os
+from typing import Protocol
+from sbom.environment import Environment
+from sbom.path_utils import PathStr
+from sbom.spdx.build import Build
+from sbom.spdx.core import DictionaryEntry, NamespaceMap, Relationship, SpdxDocument
+from sbom.spdx.simplelicensing import LicenseExpression
+from sbom.spdx.software import File, Package, Sbom
+from sbom.spdx.spdxId import SpdxIdGenerator
+from sbom.spdx_graph.kernel_file import KernelFile
+from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements
+from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCollection
+
+
+class SpdxOutputGraphConfig(Protocol):
+    obj_tree: PathStr
+    src_tree: PathStr
+    build_type: str
+    build_id: str | None
+    package_license: str
+    package_version: str | None
+    package_copyright_text: str | None
+
+
+@dataclass
+class SpdxOutputGraph(SpdxGraph):
+    """SPDX graph representing distributable output files"""
+
+    high_level_build_element: Build
+
+    @classmethod
+    def create(
+        cls,
+        root_files: list[KernelFile],
+        shared_elements: SharedSpdxElements,
+        spdx_id_generators: SpdxIdGeneratorCollection,
+        config: SpdxOutputGraphConfig,
+    ) -> "SpdxOutputGraph":
+        """
+        Args:
+            root_files: List of distributable output files which act as roots
+                of the dependency graph.
+            shared_elements: Shared SPDX elements used across multiple documents.
+            spdx_id_generators: Collection of SPDX ID generators.
+            config: Configuration options.
+
+        Returns:
+            SpdxOutputGraph: The SPDX output graph.
+        """
+        # SpdxDocument
+        spdx_document = SpdxDocument(
+            spdxId=spdx_id_generators.output.generate(),
+            profileConformance=["core", "software", "build", "simpleLicensing"],
+            namespaceMap=[
+                NamespaceMap(prefix=generator.prefix, namespace=generator.namespace)
+                for generator in [spdx_id_generators.output, spdx_id_generators.base]
+                if generator.prefix is not None
+            ],
+        )
+
+        # Sbom
+        sbom = Sbom(
+            spdxId=spdx_id_generators.output.generate(),
+            software_sbomType=["build"],
+        )
+
+        # High-level Build elements
+        config_source_element = KernelFile.create(
+            absolute_path=os.path.join(config.obj_tree, ".config"),
+            obj_tree=config.obj_tree,
+            src_tree=config.src_tree,
+            spdx_id_generators=spdx_id_generators,
+            is_output=True,
+        ).spdx_file_element
+        high_level_build_element, high_level_build_element_hasOutput_relationship = _high_level_build_elements(
+            config.build_type,
+            config.build_id,
+            config_source_element,
+            spdx_id_generators.output,
+        )
+
+        # Root file elements
+        root_file_elements: list[File] = [file.spdx_file_element for file in root_files]
+
+        # Package elements
+        package_elements = [
+            Package(
+                spdxId=spdx_id_generators.output.generate(),
+                name=_get_package_name(file.name),
+                software_packageVersion=config.package_version,
+                software_copyrightText=config.package_copyright_text,
+                comment=f"Architecture={arch}" if (arch := Environment.ARCH() or Environment.SRCARCH()) else None,
+                software_primaryPurpose=file.software_primaryPurpose,
+            )
+            for file in root_file_elements
+        ]
+        package_hasDistributionArtifact_file_relationships = [
+            Relationship(
+                spdxId=spdx_id_generators.output.generate(),
+                relationshipType="hasDistributionArtifact",
+                from_=package,
+                to=[file],
+            )
+            for package, file in zip(package_elements, root_file_elements)
+        ]
+        package_license_expression = LicenseExpression(
+            spdxId=spdx_id_generators.output.generate(),
+            simplelicensing_licenseExpression=config.package_license,
+        )
+        package_hasDeclaredLicense_relationships = [
+            Relationship(
+                spdxId=spdx_id_generators.output.generate(),
+                relationshipType="hasDeclaredLicense",
+                from_=package,
+                to=[package_license_expression],
+            )
+            for package in package_elements
+        ]
+
+        # Update relationships
+        spdx_document.rootElement = [sbom]
+
+        sbom.rootElement = [*package_elements]
+        sbom.element = [
+            config_source_element,
+            high_level_build_element,
+            high_level_build_element_hasOutput_relationship,
+            *root_file_elements,
+            *package_elements,
+            *package_hasDistributionArtifact_file_relationships,
+            package_license_expression,
+            *package_hasDeclaredLicense_relationships,
+        ]
+
+        high_level_build_element_hasOutput_relationship.to = [*root_file_elements]
+
+        output_graph = SpdxOutputGraph(
+            spdx_document,
+            shared_elements.agent,
+            shared_elements.creation_info,
+            sbom,
+            high_level_build_element,
+        )
+        return output_graph
+
+
+def _get_package_name(filename: str) -> str:
+    """
+    Generates a SPDX package name from a filename.
+    Kernel images (bzImage, Image) get a descriptive name, others use the basename of the file.
+    """
+    KERNEL_FILENAMES = ["bzImage", "Image"]
+    basename = os.path.basename(filename)
+    return f"Linux Kernel ({basename})" if basename in KERNEL_FILENAMES else basename
+
+
+def _high_level_build_elements(
+    build_type: str,
+    build_id: str | None,
+    config_source_element: File,
+    spdx_id_generator: SpdxIdGenerator,
+) -> tuple[Build, Relationship]:
+    build_spdxId = spdx_id_generator.generate()
+    high_level_build_element = Build(
+        spdxId=build_spdxId,
+        build_buildType=build_type,
+        build_buildId=build_id if build_id is not None else build_spdxId,
+        build_environment=[
+            DictionaryEntry(key=key, value=value)
+            for key, value in Environment.KERNEL_BUILD_VARIABLES().items()
+            if value
+        ],
+        build_configSourceUri=[config_source_element.spdxId],
+        build_configSourceDigest=config_source_element.verifiedUsing,
+    )
+
+    high_level_build_element_hasOutput_relationship = Relationship(
+        spdxId=spdx_id_generator.generate(),
+        relationshipType="hasOutput",
+        from_=high_level_build_element,
+        to=[],
+    )
+    return high_level_build_element, high_level_build_element_hasOutput_relationship
-- 
2.43.0

From nobody Sat Apr 18 04:54:51 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, gregkh@linuxfoundation.org,
 kstewart@linuxfoundation.org, maximilian.huber@tngtech.com,
 Luis Augenstein
Subject: [PATCH v5 12/15] scripts/sbom: add SPDX source graph
Date: Fri, 10 Apr 2026 23:22:52 +0200
Message-ID: <20260410212255.9883-13-luis.augenstein@tngtech.com>
In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com>
References: <20260410212255.9883-1-luis.augenstein@tngtech.com>

From: Luis Augenstein

Implement the SPDX source graph which contains all source files involved
during the build, along with the licensing information for each file.
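
Note on the licensing information: the source graph creates one license expression per distinct identifier and links every file that declares it to that shared element, so files without an identifier get no relationship. A minimal sketch of this deduplication, assuming simplified stand-in types; `LicenseExpr`, `SourceFile` and `link_licenses` are hypothetical names, not the classes used in the patch:

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class LicenseExpr:
    """Hypothetical stand-in for an SPDX LicenseExpression element."""
    spdx_id: str
    expression: str


@dataclass
class SourceFile:
    """Hypothetical stand-in for a kernel source file."""
    name: str
    license_identifier: Optional[str]


def link_licenses(files: list[SourceFile]) -> tuple[list[LicenseExpr], list[tuple[str, str]]]:
    """Return the distinct license expressions and (file name, license ID) links."""
    ids = count(1)
    expressions: dict[str, LicenseExpr] = {}
    # First pass: create exactly one expression per distinct identifier.
    for f in files:
        lic = f.license_identifier
        if lic is not None and lic not in expressions:
            expressions[lic] = LicenseExpr(f"license-{next(ids)}", lic)
    # Second pass: link every file that declares a license to its shared expression.
    links = [
        (f.name, expressions[f.license_identifier].spdx_id)
        for f in files
        if f.license_identifier is not None
    ]
    return list(expressions.values()), links
```

Sharing one element per identifier keeps the document small when thousands of files carry the same license tag.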

Assisted-by: Cursor:claude-sonnet-4-5
Assisted-by: OpenCode:GLM-4-7
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 .../sbom/sbom/spdx_graph/build_spdx_graphs.py |   8 ++
 .../sbom/sbom/spdx_graph/spdx_source_graph.py | 126 ++++++++++++++++++
 2 files changed, 134 insertions(+)
 create mode 100644 scripts/sbom/sbom/spdx_graph/spdx_source_graph.py

diff --git a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py
index 2af0fbe6cdb..a61257a905f 100644
--- a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py
+++ b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py
@@ -10,6 +10,7 @@ from sbom.path_utils import PathStr
 from sbom.spdx_graph.kernel_file import KernelFileCollection
 from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCollection
 from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements
+from sbom.spdx_graph.spdx_source_graph import SpdxSourceGraph
 from sbom.spdx_graph.spdx_output_graph import SpdxOutputGraph
 
 
@@ -54,4 +55,11 @@ def build_spdx_graphs(
         KernelSpdxDocumentKind.OUTPUT: output_graph,
     }
 
+    if len(kernel_files.source) > 0:
+        spdx_graphs[KernelSpdxDocumentKind.SOURCE] = SpdxSourceGraph.create(
+            source_files=list(kernel_files.source.values()),
+            shared_elements=shared_elements,
+            spdx_id_generators=spdx_id_generators,
+        )
+
     return spdx_graphs
diff --git a/scripts/sbom/sbom/spdx_graph/spdx_source_graph.py b/scripts/sbom/sbom/spdx_graph/spdx_source_graph.py
new file mode 100644
index 00000000000..16176c4ea5e
--- /dev/null
+++ b/scripts/sbom/sbom/spdx_graph/spdx_source_graph.py
@@ -0,0 +1,126 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from dataclasses import dataclass
+from sbom.spdx import SpdxIdGenerator
+from sbom.spdx.core import Element, NamespaceMap, Relationship, SpdxDocument
+from sbom.spdx.simplelicensing import LicenseExpression
+from sbom.spdx.software import File, Sbom
+from sbom.spdx_graph.kernel_file import KernelFile
+from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements
+from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCollection
+
+
+@dataclass
+class SpdxSourceGraph(SpdxGraph):
+    """SPDX graph representing source files"""
+
+    @classmethod
+    def create(
+        cls,
+        source_files: list[KernelFile],
+        shared_elements: SharedSpdxElements,
+        spdx_id_generators: SpdxIdGeneratorCollection,
+    ) -> "SpdxSourceGraph":
+        """
+        Args:
+            source_files: List of files within the kernel source tree.
+            shared_elements: Shared SPDX elements used across multiple documents.
+            spdx_id_generators: Collection of SPDX ID generators.
+
+        Returns:
+            SpdxSourceGraph: The SPDX source graph.
+        """
+        # SpdxDocument
+        source_spdx_document = SpdxDocument(
+            spdxId=spdx_id_generators.source.generate(),
+            profileConformance=["core", "software", "simpleLicensing"],
+            namespaceMap=[
+                NamespaceMap(prefix=generator.prefix, namespace=generator.namespace)
+                for generator in [spdx_id_generators.source, spdx_id_generators.base]
+                if generator.prefix is not None
+            ],
+        )
+
+        # Sbom
+        source_sbom = Sbom(
+            spdxId=spdx_id_generators.source.generate(),
+            software_sbomType=["source"],
+        )
+
+        # Src Tree Elements
+        src_tree_element = File(
+            spdxId=spdx_id_generators.source.generate(),
+            name="$(src_tree)",
+            software_fileKind="directory",
+        )
+        src_tree_contains_relationship = Relationship(
+            spdxId=spdx_id_generators.source.generate(),
+            relationshipType="contains",
+            from_=src_tree_element,
+            to=[],
+        )
+
+        # Source file elements
+        source_file_elements: list[Element] = [file.spdx_file_element for file in source_files]
+
+        # Source file license elements
+        source_file_license_identifiers, source_file_license_relationships = source_file_license_elements(
+            source_files, spdx_id_generators.source
+        )
+
+        # Update relationships
+        source_spdx_document.rootElement = [source_sbom]
+        source_sbom.rootElement = [src_tree_element]
+        source_sbom.element = [
+            src_tree_element,
+            src_tree_contains_relationship,
+            *source_file_elements,
+            *source_file_license_identifiers,
+            *source_file_license_relationships,
+        ]
+        src_tree_contains_relationship.to = source_file_elements
+
+        source_graph = SpdxSourceGraph(
+            source_spdx_document,
+            shared_elements.agent,
+            shared_elements.creation_info,
+            source_sbom,
+        )
+        return source_graph
+
+
+def source_file_license_elements(
+    source_files: list[KernelFile], spdx_id_generator: SpdxIdGenerator
+) -> tuple[list[LicenseExpression], list[Relationship]]:
+    """
+    Creates SPDX license expressions and links them to the given source files
+    via hasDeclaredLicense relationships.
+
+    Args:
+        source_files: List of files within the kernel source tree.
+        spdx_id_generator: Generator for unique SPDX IDs.
+
+    Returns:
+        Tuple of (license expressions, hasDeclaredLicense relationships).
+    """
+    license_expressions: dict[str, LicenseExpression] = {}
+    for file in source_files:
+        if file.license_identifier is None or file.license_identifier in license_expressions:
+            continue
+        license_expressions[file.license_identifier] = LicenseExpression(
+            spdxId=spdx_id_generator.generate(),
+            simplelicensing_licenseExpression=file.license_identifier,
+        )
+
+    source_file_license_relationships = [
+        Relationship(
+            spdxId=spdx_id_generator.generate(),
+            relationshipType="hasDeclaredLicense",
+            from_=file.spdx_file_element,
+            to=[license_expressions[file.license_identifier]],
+        )
+        for file in source_files
+        if file.license_identifier is not None
+    ]
+    return ([*license_expressions.values()], source_file_license_relationships)
-- 
2.43.0

From nobody Sat Apr 18 04:54:51 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, gregkh@linuxfoundation.org,
 kstewart@linuxfoundation.org, maximilian.huber@tngtech.com,
 Luis Augenstein
Subject: [PATCH v5 13/15] scripts/sbom: add SPDX build graph
Date: Fri, 10 Apr 2026 23:22:53 +0200
Message-ID: <20260410212255.9883-14-luis.augenstein@tngtech.com>
In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com>
References: <20260410212255.9883-1-luis.augenstein@tngtech.com>

From: Luis Augenstein

Implement the SPDX build graph to describe the relationships between
source files in the source SBOM and output files in the output SBOM.
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- .../sbom/sbom/spdx_graph/build_spdx_graphs.py | 17 + .../sbom/sbom/spdx_graph/spdx_build_graph.py | 317 ++++++++++++++++++ 2 files changed, 334 insertions(+) create mode 100644 scripts/sbom/sbom/spdx_graph/spdx_build_graph.py diff --git a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py b/scripts/sb= om/sbom/spdx_graph/build_spdx_graphs.py index a61257a905f..eecc5215644 100644 --- a/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py +++ b/scripts/sbom/sbom/spdx_graph/build_spdx_graphs.py @@ -4,6 +4,7 @@ from datetime import datetime from typing import Protocol =20 +import logging from sbom.config import KernelSpdxDocumentKind from sbom.cmd_graph import CmdGraph from sbom.path_utils import PathStr @@ -11,6 +12,7 @@ from sbom.spdx_graph.kernel_file import KernelFileCollect= ion from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements from sbom.spdx_graph.spdx_source_graph import SpdxSourceGraph +from sbom.spdx_graph.spdx_build_graph import SpdxBuildGraph from sbom.spdx_graph.spdx_output_graph import SpdxOutputGraph =20 =20 @@ -61,5 +63,20 @@ def build_spdx_graphs( shared_elements=3Dshared_elements, spdx_id_generators=3Dspdx_id_generators, ) + else: + logging.info( + "Skipped creating a dedicated source SBOM because source files= cannot be " + "reliably classified when the source and object trees are iden= tical. " + "Added source files to the build SBOM instead." 
+ ) + + build_graph =3D SpdxBuildGraph.create( + cmd_graph, + kernel_files, + shared_elements, + output_graph.high_level_build_element, + spdx_id_generators, + ) + spdx_graphs[KernelSpdxDocumentKind.BUILD] =3D build_graph =20 return spdx_graphs diff --git a/scripts/sbom/sbom/spdx_graph/spdx_build_graph.py b/scripts/sbo= m/sbom/spdx_graph/spdx_build_graph.py new file mode 100644 index 00000000000..2956800fa9e --- /dev/null +++ b/scripts/sbom/sbom/spdx_graph/spdx_build_graph.py @@ -0,0 +1,317 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +from dataclasses import dataclass +from typing import Mapping +from sbom.cmd_graph import CmdGraph +from sbom.path_utils import PathStr +from sbom.spdx import SpdxIdGenerator +from sbom.spdx.build import Build +from sbom.spdx.core import ExternalMap, NamespaceMap, Relationship, SpdxDo= cument +from sbom.spdx.software import File, Sbom +from sbom.spdx_graph.kernel_file import KernelFileCollection +from sbom.spdx_graph.shared_spdx_elements import SharedSpdxElements +from sbom.spdx_graph.spdx_graph_model import SpdxGraph, SpdxIdGeneratorCol= lection +from sbom.spdx_graph.spdx_source_graph import source_file_license_elements + + +@dataclass +class SpdxBuildGraph(SpdxGraph): + """SPDX graph representing build dependencies connecting source files = and + distributable output files""" + + @classmethod + def create( + cls, + cmd_graph: CmdGraph, + kernel_files: KernelFileCollection, + shared_elements: SharedSpdxElements, + high_level_build_element: Build, + spdx_id_generators: SpdxIdGeneratorCollection, + ) -> "SpdxBuildGraph": + if len(kernel_files.source) > 0: + return _create_spdx_build_graph( + cmd_graph, + kernel_files, + shared_elements, + high_level_build_element, + spdx_id_generators, + ) + else: + return _create_spdx_build_graph_with_mixed_sources( + cmd_graph, + kernel_files, + shared_elements, + high_level_build_element, + spdx_id_generators, + ) + + +def 
_create_spdx_build_graph( + cmd_graph: CmdGraph, + kernel_files: KernelFileCollection, + shared_elements: SharedSpdxElements, + high_level_build_element: Build, + spdx_id_generators: SpdxIdGeneratorCollection, +) -> SpdxBuildGraph: + """ + Creates an SPDX build graph where source and output files are referenc= ed + from external documents. + + Args: + cmd_graph: The dependency graph of a kernel build. + kernel_files: Collection of categorized kernel files involved in t= he build. + shared_elements: SPDX elements shared across multiple documents. + high_level_build_element: The high-level Build element referenced = by the build graph. + spdx_id_generators: Collection of generators for SPDX element IDs. + + Returns: + SpdxBuildGraph: The SPDX build graph connecting source files and d= istributable output files. + """ + # SpdxDocument + build_spdx_document =3D SpdxDocument( + spdxId=3Dspdx_id_generators.build.generate(), + profileConformance=3D["core", "software", "build"], + namespaceMap=3D[ + NamespaceMap(prefix=3Dgenerator.prefix, namespace=3Dgenerator.= namespace) + for generator in [ + spdx_id_generators.build, + spdx_id_generators.source, + spdx_id_generators.output, + spdx_id_generators.base, + ] + if generator.prefix is not None + ], + ) + + # Sbom + build_sbom =3D Sbom( + spdxId=3Dspdx_id_generators.build.generate(), + software_sbomType=3D["build"], + ) + + # Src and object tree elements + obj_tree_element =3D File( + spdxId=3Dspdx_id_generators.build.generate(), + name=3D"$(obj_tree)", + software_fileKind=3D"directory", + ) + obj_tree_contains_relationship =3D Relationship( + spdxId=3Dspdx_id_generators.build.generate(), + relationshipType=3D"contains", + from_=3Dobj_tree_element, + to=3D[], + ) + + # File elements + build_file_elements =3D [file.spdx_file_element for file in kernel_fil= es.build.values()] + file_relationships =3D _file_relationships( + cmd_graph=3Dcmd_graph, + file_elements=3D{key: file.spdx_file_element for key, file in kern= 
el_files.to_dict().items()}, + high_level_build_element=3Dhigh_level_build_element, + spdx_id_generator=3Dspdx_id_generators.build, + ) + + # Update relationships + build_spdx_document.rootElement =3D [build_sbom] + + build_spdx_document.import_ =3D [ + *( + ExternalMap(externalSpdxId=3Dfile_element.spdx_file_element.sp= dxId) + for file_element in kernel_files.source.values() + ), + ExternalMap(externalSpdxId=3Dhigh_level_build_element.spdxId), + *(ExternalMap(externalSpdxId=3Dfile.spdx_file_element.spdxId) for = file in kernel_files.output.values()), + ] + + build_sbom.rootElement =3D [obj_tree_element] + build_sbom.element =3D [ + obj_tree_element, + obj_tree_contains_relationship, + *build_file_elements, + *file_relationships, + ] + + obj_tree_contains_relationship.to =3D [ + *build_file_elements, + *(file.spdx_file_element for file in kernel_files.output.values()), + ] + + # create Spdx graphs + build_graph =3D SpdxBuildGraph( + build_spdx_document, + shared_elements.agent, + shared_elements.creation_info, + build_sbom, + ) + return build_graph + + +def _create_spdx_build_graph_with_mixed_sources( + cmd_graph: CmdGraph, + kernel_files: KernelFileCollection, + shared_elements: SharedSpdxElements, + high_level_build_element: Build, + spdx_id_generators: SpdxIdGeneratorCollection, +) -> SpdxBuildGraph: + """ + Creates an SPDX build graph where only output files are referenced from + an external document. Source files are included directly in the build = graph. + + Args: + cmd_graph: The dependency graph of a kernel build. + kernel_files: Collection of categorized kernel files involved in t= he build. + shared_elements: SPDX elements shared across multiple documents. + high_level_build_element: The high-level Build element referenced = by the build graph. + spdx_id_generators: Collection of generators for SPDX element IDs. + + Returns: + SpdxBuildGraph: The SPDX build graph connecting source files and d= istributable output files. 
+ """ + # SpdxDocument + build_spdx_document =3D SpdxDocument( + spdxId=3Dspdx_id_generators.build.generate(), + profileConformance=3D["core", "software", "build"], + namespaceMap=3D[ + NamespaceMap(prefix=3Dgenerator.prefix, namespace=3Dgenerator.= namespace) + for generator in [ + spdx_id_generators.build, + spdx_id_generators.output, + spdx_id_generators.base, + ] + if generator.prefix is not None + ], + ) + + # Sbom + build_sbom =3D Sbom( + spdxId=3Dspdx_id_generators.build.generate(), + software_sbomType=3D["build"], + ) + + # File elements + build_file_elements =3D [file.spdx_file_element for file in kernel_fil= es.build.values()] + file_relationships =3D _file_relationships( + cmd_graph=3Dcmd_graph, + file_elements=3D{key: file.spdx_file_element for key, file in kern= el_files.to_dict().items()}, + high_level_build_element=3Dhigh_level_build_element, + spdx_id_generator=3Dspdx_id_generators.build, + ) + + # Source file license elements + source_file_license_identifiers, source_file_license_relationships =3D= source_file_license_elements( + list(kernel_files.build.values()), spdx_id_generators.build + ) + + # Update relationships + build_spdx_document.rootElement =3D [build_sbom] + root_file_elements =3D [file.spdx_file_element for file in kernel_file= s.output.values()] + build_spdx_document.import_ =3D [ + ExternalMap(externalSpdxId=3Dhigh_level_build_element.spdxId), + *(ExternalMap(externalSpdxId=3Dfile.spdxId) for file in root_file_= elements), + ] + + build_sbom.rootElement =3D [*root_file_elements] + build_sbom.element =3D [ + *build_file_elements, + *source_file_license_identifiers, + *source_file_license_relationships, + *file_relationships, + ] + + build_graph =3D SpdxBuildGraph( + build_spdx_document, + shared_elements.agent, + shared_elements.creation_info, + build_sbom, + ) + return build_graph + + +def _file_relationships( + cmd_graph: CmdGraph, + file_elements: Mapping[PathStr, File], + high_level_build_element: Build, + spdx_id_generator: 
SpdxIdGenerator, +) -> list[Build | Relationship]: + """ + Construct SPDX Build and Relationship elements representing dependency + relationships in the cmd graph. + + Args: + cmd_graph: The dependency graph of a kernel build. + file_elements: Mapping of filesystem paths (PathStr) to their + corresponding SPDX File elements. + high_level_build_element: The SPDX Build element representing the = overall build process/root. + spdx_id_generator: Generator for unique SPDX IDs. + + Returns: + list[Build | Relationship]: List of SPDX Build and Relationship el= ements + """ + high_level_build_ancestorOf_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"ancestorOf", + from_=3Dhigh_level_build_element, + completeness=3D"complete", + to=3D[], + ) + + # Create a relationship between each node (output file) + # and its children (input files) + build_and_relationship_elements: list[Build | Relationship] =3D [high_= level_build_ancestorOf_relationship] + for node in cmd_graph: + if next(node.children, None) is None: + continue + + # .cmd file dependencies + if node.cmd_file is not None: + build_element =3D Build( + spdxId=3Dspdx_id_generator.generate(), + build_buildType=3Dhigh_level_build_element.build_buildType, + build_buildId=3Dhigh_level_build_element.build_buildId, + comment=3Dnode.cmd_file.savedcmd, + ) + hasInput_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"hasInput", + from_=3Dbuild_element, + to=3D[file_elements[child_node.absolute_path] for child_no= de in node.children], + ) + hasOutput_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"hasOutput", + from_=3Dbuild_element, + to=3D[file_elements[node.absolute_path]], + ) + build_and_relationship_elements +=3D [ + build_element, + hasInput_relationship, + hasOutput_relationship, + ] + high_level_build_ancestorOf_relationship.to.append(build_eleme= nt) + + # incbin dependencies + if 
len(node.incbin_dependencies) > 0: + incbin_dependsOn_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"dependsOn", + comment=3D"\n".join([incbin_dependency.full_statement for = incbin_dependency in node.incbin_dependencies]), + from_=3Dfile_elements[node.absolute_path], + to=3D[ + file_elements[incbin_dependency.node.absolute_path] + for incbin_dependency in node.incbin_dependencies + ], + ) + build_and_relationship_elements.append(incbin_dependsOn_relati= onship) + + # hardcoded dependencies + if len(node.hardcoded_dependencies) > 0: + hardcoded_dependency_relationship =3D Relationship( + spdxId=3Dspdx_id_generator.generate(), + relationshipType=3D"dependsOn", + from_=3Dfile_elements[node.absolute_path], + to=3D[file_elements[n.absolute_path] for n in node.hardcod= ed_dependencies], + ) + build_and_relationship_elements.append(hardcoded_dependency_re= lationship) + + return build_and_relationship_elements --=20 2.43.0 From nobody Sat Apr 18 04:54:51 2026 From: Luis To: nathan@kernel.org, nsc@kernel.org Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein Subject: [PATCH v5 14/15] scripts/sbom: add unit tests for command parsers Date: Fri, 10 Apr 2026 23:22:54 +0200 Message-ID: <20260410212255.9883-15-luis.augenstein@tngtech.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com> References: <20260410212255.9883-1-luis.augenstein@tngtech.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Luis Augenstein Add unit tests to verify that command parsers correctly extract input files from build commands. 
Assisted-by: Cursor:claude-sonnet-4-5 Assisted-by: OpenCode:GLM-4-7 Co-developed-by: Maximilian Huber Signed-off-by: Maximilian Huber Signed-off-by: Luis Augenstein --- scripts/sbom/tests/__init__.py | 0 scripts/sbom/tests/cmd_graph/__init__.py | 0 .../tests/cmd_graph/test_savedcmd_parser.py | 412 ++++++++++++++++++ 3 files changed, 412 insertions(+) create mode 100644 scripts/sbom/tests/__init__.py create mode 100644 scripts/sbom/tests/cmd_graph/__init__.py create mode 100644 scripts/sbom/tests/cmd_graph/test_savedcmd_parser.py diff --git a/scripts/sbom/tests/__init__.py b/scripts/sbom/tests/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/scripts/sbom/tests/cmd_graph/__init__.py b/scripts/sbom/tests/= cmd_graph/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/scripts/sbom/tests/cmd_graph/test_savedcmd_parser.py b/scripts= /sbom/tests/cmd_graph/test_savedcmd_parser.py new file mode 100644 index 00000000000..469d932a341 --- /dev/null +++ b/scripts/sbom/tests/cmd_graph/test_savedcmd_parser.py @@ -0,0 +1,412 @@ +# SPDX-License-Identifier: GPL-2.0-only OR MIT +# Copyright (C) 2025 TNG Technology Consulting GmbH + +import os +import unittest + +from sbom.cmd_graph.savedcmd_parser import parse_inputs_from_commands +from sbom.cmd_graph.savedcmd_parser.command_parser_registry import Command= ParserRegistry +import sbom.sbom_logging as sbom_logging + + +class TestSavedCmdParser(unittest.TestCase): + def _assert_parsing(self, cmd: str, expected: str, registry: CommandPa= rserRegistry | None =3D None) -> None: + sbom_logging.init() + parsed =3D parse_inputs_from_commands(cmd, fail_on_unknown_build_c= ommand=3DFalse, registry=3Dregistry) + target =3D [] if expected =3D=3D "" else expected.split(" ") + self.assertEqual(parsed, target) + errors =3D sbom_logging._error_logger.messages # type: ignore + self.assertEqual(errors, {}) + + # Compound command tests + def test_dd_cat(self): + cmd =3D "(dd 
if=3Darch/x86/boot/setup.bin bs=3D4k conv=3Dsync stat= us=3Dnone; cat arch/x86/boot/vmlinux.bin) >arch/x86/boot/bzImage" + expected =3D "arch/x86/boot/setup.bin arch/x86/boot/vmlinux.bin" + self._assert_parsing(cmd, expected) + + def test_manual_file_creation(self): + cmd =3D """{ symbase=3D__dtbo_overlay_bad_unresolved; echo '$(poun= d)include '; echo '.section .rodata,"a"'; echo '= .balign STRUCT_ALIGNMENT'; echo ".global $${symbase}_begin"; echo "$${symba= se}_begin:"; echo '.incbin "drivers/of/unittest-data/overlay_bad_unresolved= .dtbo" '; echo ".global $${symbase}_end"; echo "$${symbase}_end:"; echo '.b= align STRUCT_ALIGNMENT'; } > drivers/of/unittest-data/overlay_bad_unresolve= d.dtbo.S""" + expected =3D "" + self._assert_parsing(cmd, expected) + + def test_cat_xz_wrap(self): + cmd =3D "{ cat arch/x86/boot/compressed/vmlinux.bin | sh ../script= s/xz_wrap.sh; printf \\130\\064\\024\\000; } > arch/x86/boot/compressed/vml= inux.bin.xz" + expected =3D "arch/x86/boot/compressed/vmlinux.bin" + self._assert_parsing(cmd, expected) + + def test_printf_sed(self): + cmd =3D r"""{ printf 'static char tomoyo_builtin_profile[] __init= data =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/\(.*\)/\t"\1\\n"/' = -- /dev/null; printf '\t"";\n'; printf 'static char tomoyo_builtin_excepti= on_policy[] __initdata =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/\= (.*\)/\t"\1\\n"/' -- ../security/tomoyo/policy/exception_policy.conf.defaul= t; printf '\t"";\n'; printf 'static char tomoyo_builtin_domain_policy[] __= initdata =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/\(.*\)/\t"\1\\n= "/' -- /dev/null; printf '\t"";\n'; printf 'static char tomoyo_builtin_man= ager[] __initdata =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/\(.*\)= /\t"\1\\n"/' -- /dev/null; printf '\t"";\n'; printf 'static char tomoyo_bu= iltin_stat[] __initdata =3D\n'; sed -e 's/\\/\\\\/g' -e 's/\"/\\"/g' -e 's/= \(.*\)/\t"\1\\n"/' -- /dev/null; printf '\t"";\n'; } > security/tomoyo/buil= 
tin-policy.h""" + expected =3D "../security/tomoyo/policy/exception_policy.conf.defa= ult" + self._assert_parsing(cmd, expected) + + def test_bin2c_echo(self): + cmd =3D """(echo "static char tomoyo_builtin_profile[] __initdata = =3D"; ./scripts/bin2c security/tomoyo/builtin-policy= .h""" + expected =3D "../security/tomoyo/policy/exception_policy.conf.defa= ult" + self._assert_parsing(cmd, expected) + + def test_cat_colon(self): + cmd =3D "{ cat init/modules.order; cat usr/modules.order; ca= t arch/x86/modules.order; cat arch/x86/boot/startup/modules.order; cat = kernel/modules.order; cat certs/modules.order; cat mm/modules.order; = cat fs/modules.order; cat ipc/modules.order; cat security/modules.order= ; cat crypto/modules.order; cat block/modules.order; cat io_uring/mod= ules.order; cat lib/modules.order; cat arch/x86/lib/modules.order; ca= t drivers/modules.order; cat sound/modules.order; cat samples/modules.o= rder; cat net/modules.order; cat virt/modules.order; cat arch/x86/pci= /modules.order; cat arch/x86/power/modules.order; cat arch/x86/video/mo= dules.order; :; } > modules.order" + expected =3D "init/modules.order usr/modules.order arch/x86/module= s.order arch/x86/boot/startup/modules.order kernel/modules.order certs/modu= les.order mm/modules.order fs/modules.order ipc/modules.order security/modu= les.order crypto/modules.order block/modules.order io_uring/modules.order l= ib/modules.order arch/x86/lib/modules.order drivers/modules.order sound/mod= ules.order samples/modules.order net/modules.order virt/modules.order arch/= x86/pci/modules.order arch/x86/power/modules.order arch/x86/video/modules.o= rder" + self._assert_parsing(cmd, expected) + + def test_cat_zstd(self): + cmd =3D "{ cat arch/x86/boot/compressed/vmlinux.bin arch/x86/boot/= compressed/vmlinux.relocs | zstd -22 --ultra; printf \\340\\362\\066\\003; = } > arch/x86/boot/compressed/vmlinux.bin.zst" + expected =3D "arch/x86/boot/compressed/vmlinux.bin arch/x86/boot/c= 
ompressed/vmlinux.relocs" + self._assert_parsing(cmd, expected) + + # cat command tests + def test_cat_redirect(self): + cmd =3D "cat ../fs/unicode/utf8data.c_shipped > fs/unicode/utf8dat= a.c" + expected =3D "../fs/unicode/utf8data.c_shipped" + self._assert_parsing(cmd, expected) + + def test_cat_piped(self): + cmd =3D "cat arch/x86/boot/compressed/vmlinux.bin arch/x86/boot/co= mpressed/vmlinux.relocs | gzip -n -f -9 > arch/x86/boot/compressed/vmlinux.= bin.gz" + expected =3D "arch/x86/boot/compressed/vmlinux.bin arch/x86/boot/c= ompressed/vmlinux.relocs" + self._assert_parsing(cmd, expected) + + # sed command tests + def test_sed(self): + cmd =3D "sed -n 's/.*define *BLIST_\\([A-Z0-9_]*\\) *.*/BLIST_FLAG= _NAME(\\1),/p' ../include/scsi/scsi_devinfo.h > drivers/scsi/scsi_devinfo_t= bl.c" + expected =3D "../include/scsi/scsi_devinfo.h" + self._assert_parsing(cmd, expected) + + # awk command tests + def test_awk(self): + cmd =3D "awk -f ../arch/arm64/tools/gen-cpucaps.awk ../arch/arm64/= tools/cpucaps > arch/arm64/include/generated/asm/cpucap-defs.h" + expected =3D "../arch/arm64/tools/cpucaps" + self._assert_parsing(cmd, expected) + + def test_awk_with_input_redirection(self): + cmd =3D "awk -v N=3D1 -f ../lib/raid6/unroll.awk < ../lib/raid6/in= t.uc > lib/raid6/int1.c" + expected =3D "../lib/raid6/int.uc" + self._assert_parsing(cmd, expected) + + # openssl command tests + def test_openssl(self): + cmd =3D "openssl req -new -nodes -utf8 -sha256 -days 36500 -batch = -x509 -config certs/x509.genkey -outform PEM -out certs/signing_key.pem -ke= yout certs/signing_key.pem 2>&1" + expected =3D "" + self._assert_parsing(cmd, expected) + + # gcc/clang command tests + def test_gcc(self): + cmd =3D ( + "gcc -Wp,-MMD,arch/x86/pci/.i386.o.d -nostdinc -I../arch/x86/i= nclude -I./arch/x86/include/generated -I../include -I./include -I../arch/x8= 6/include/uapi -I./arch/x86/include/generated/uapi -I../include/uapi -I./in= clude/generated/uapi -include 
../include/linux/compiler-version.h -include = ../include/linux/kconfig.h -include ../include/linux/compiler_types.h -D__K= ERNEL__ -fmacro-prefix-map=3D../=3D -Werror -std=3Dgnu11 -fshort-wchar -fun= signed-char -fno-common -fno-PIE -fno-strict-aliasing -mno-sse -mno-mmx -mn= o-sse2 -mno-3dnow -mno-avx -fcf-protection=3Dbranch -fno-jump-tables -m64 -= falign-jumps=3D1 -falign-loops=3D1 -mno-80387 -mno-fp-ret-in-387 -mpreferre= d-stack-boundary=3D3 -mskip-rax-setup -march=3Dx86-64 -mtune=3Dgeneric -mno= -red-zone -mcmodel=3Dkernel -mstack-protector-guard-reg=3Dgs -mstack-protec= tor-guard-symbol=3D__ref_stack_chk_guard -Wno-sign-compare -fno-asynchronou= s-unwind-tables -mindirect-branch=3Dthunk-extern -mindirect-branch-register= -mindirect-branch-cs-prefix -mfunction-return=3Dthunk-extern -fno-jump-tab= les -fpatchable-function-entry=3D16,16 -fno-delete-null-pointer-checks -O2 = -fno-allow-store-data-races -fstack-protector-strong -fomit-frame-pointer -= fno-stack-clash-protection -falign-functions=3D16 -fno-strict-overflow -fno= -stack-check -fconserve-stack -fno-builtin-wcslen -Wall -Wextra -Wundef -We= rror=3Dimplicit-function-declaration -Werror=3Dimplicit-int -Werror=3Dretur= n-type -Werror=3Dstrict-prototypes -Wno-format-security -Wno-trigraphs -Wno= -frame-address -Wno-address-of-packed-member -Wmissing-declarations -Wmissi= ng-prototypes -Wframe-larger-than=3D2048 -Wno-main -Wvla-larger-than=3D1 -W= no-pointer-sign -Wcast-function-type -Wno-array-bounds -Wno-stringop-overfl= ow -Wno-alloc-size-larger-than -Wimplicit-fallthrough=3D5 -Werror=3Ddate-ti= me -Werror=3Dincompatible-pointer-types -Werror=3Ddesignated-init -Wenum-co= nversion -Wunused -Wno-unused-but-set-variable -Wno-unused-const-variable -= Wno-packed-not-aligned -Wno-format-overflow -Wno-format-truncation -Wno-str= ingop-truncation -Wno-override-init -Wno-missing-field-initializers -Wno-ty= pe-limits -Wno-shift-negative-value -Wno-maybe-uninitialized -Wno-sign-comp= are -Wno-unused-parameter 
-I../arch/x86/pci -Iarch/x86/pci -DKBUILD_MODF= ILE=3D" + "arch/x86/pci/i386" + " -DKBUILD_BASENAME=3D" + "i386" + " -DKBUILD_MODNAME=3D" + "i386" + " -D__KBUILD_MODNAME=3Dkmod_i386 -c -o arch/x86/pci/i386.o ../= arch/x86/pci/i386.c " + ) + expected =3D "../arch/x86/pci/i386.c" + self._assert_parsing(cmd, expected) + + def test_gcc_linking(self): + cmd =3D "gcc -o arch/x86/tools/relocs arch/x86/tools/relocs_32.o= arch/x86/tools/relocs_64.o arch/x86/tools/relocs_common.o" + expected =3D "arch/x86/tools/relocs_32.o arch/x86/tools/relocs_64.= o arch/x86/tools/relocs_common.o" + self._assert_parsing(cmd, expected) + + def test_gcc_without_compile_flag(self): + cmd =3D "gcc -Wp,-MMD,arch/x86/boot/compressed/.mkpiggy.d -Wall -W= missing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=3Dgnu1= 1 -I ../scripts/include -I../tools/include -I arch/x86/boot/compressed = -o arch/x86/boot/compressed/mkpiggy ../arch/x86/boot/compressed/mkpiggy.c" + expected =3D "../arch/x86/boot/compressed/mkpiggy.c" + self._assert_parsing(cmd, expected) + + def test_gcc_with_env_override(self): + os.environ["CC"] =3D "ccache gcc" + registry =3D CommandParserRegistry.create() + cmd =3D "gcc -o arch/x86/tools/relocs arch/x86/tools/relocs_32.o= arch/x86/tools/relocs_64.o arch/x86/tools/relocs_common.o" + expected =3D "arch/x86/tools/relocs_32.o arch/x86/tools/relocs_64.= o arch/x86/tools/relocs_common.o" + self._assert_parsing(cmd, expected, registry) + self._assert_parsing(f"ccache {cmd}", expected, registry) + + def test_gcc_dts_preprocessing(self): + cmd =3D "gcc -E -Wp,-MMD,drivers/of/.empty_root.dtb.d.pre.tmp -nos= tdinc -I ../scripts/dtc/include-prefixes -undef -D__DTS__ -x assembler-with= -cpp -o drivers/of/.empty_root.dtb.dts.tmp ../drivers/of/empty_root.dts" + expected =3D "../drivers/of/empty_root.dts" + self._assert_parsing(cmd, expected) + + def test_clang(self): + cmd =3D """clang -Wp,-MMD,arch/x86/entry/.entry_64_compat.o.d -nos= tdinc -I../arch/x86/include 
-I./arch/x86/include/generated -I../include -I./include -I../arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I../include/uapi -I./include/generated/uapi -include ../include/linux/compiler-version.h -include ../include/linux/kconfig.h -D__KERNEL__ --target=x86_64-linux-gnu -fintegrated-as -Werror=unknown-warning-option -Werror=ignored-optimization-argument -Werror=option-ignored -Werror=unused-command-line-argument -fmacro-prefix-map=../= -Werror -D__ASSEMBLY__ -fno-PIE -m64 -I../arch/x86/entry -Iarch/x86/entry -DKBUILD_MODFILE='"arch/x86/entry/entry_64_compat"' -DKBUILD_MODNAME='"entry_64_compat"' -D__KBUILD_MODNAME=kmod_entry_64_compat -c -o arch/x86/entry/entry_64_compat.o ../arch/x86/entry/entry_64_compat.S"""
+        expected = "../arch/x86/entry/entry_64_compat.S"
+        self._assert_parsing(cmd, expected)
+
+    # ld command tests
+    def test_ld(self):
+        cmd = 'ld -o arch/x86/entry/vdso/vdso64.so.dbg -shared --hash-style=both --build-id=sha1 --no-undefined --eh-frame-hdr -Bsymbolic -z noexecstack -m elf_x86_64 -soname linux-vdso.so.1 -z max-page-size=4096 -T arch/x86/entry/vdso/vdso.lds arch/x86/entry/vdso/vdso-note.o arch/x86/entry/vdso/vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/vgetrandom.o arch/x86/entry/vdso/vgetrandom-chacha.o; if readelf -rW arch/x86/entry/vdso/vdso64.so.dbg | grep -v _NONE | grep -q " R_\w*_"; then (echo >&2 "arch/x86/entry/vdso/vdso64.so.dbg: dynamic relocations are not supported"; rm -f arch/x86/entry/vdso/vdso64.so.dbg; /bin/false); fi'  # type: ignore
+        expected = "arch/x86/entry/vdso/vdso-note.o arch/x86/entry/vdso/vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/vgetrandom.o arch/x86/entry/vdso/vgetrandom-chacha.o"
+        self._assert_parsing(cmd, expected)
+
+    def test_ld_with_env_override(self):
+        os.environ["LD"] = "some-tool ld"
+        registry = CommandParserRegistry.create()
+        cmd = 'ld -o arch/x86/entry/vdso/vdso64.so.dbg -shared --hash-style=both --build-id=sha1 --no-undefined --eh-frame-hdr -Bsymbolic -z noexecstack -m elf_x86_64 -soname linux-vdso.so.1 -z max-page-size=4096 -T arch/x86/entry/vdso/vdso.lds arch/x86/entry/vdso/vdso-note.o arch/x86/entry/vdso/vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/vgetrandom.o arch/x86/entry/vdso/vgetrandom-chacha.o; if readelf -rW arch/x86/entry/vdso/vdso64.so.dbg | grep -v _NONE | grep -q " R_\w*_"; then (echo >&2 "arch/x86/entry/vdso/vdso64.so.dbg: dynamic relocations are not supported"; rm -f arch/x86/entry/vdso/vdso64.so.dbg; /bin/false); fi'  # type: ignore
+        expected = "arch/x86/entry/vdso/vdso-note.o arch/x86/entry/vdso/vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/vgetrandom.o arch/x86/entry/vdso/vgetrandom-chacha.o"
+        self._assert_parsing(cmd, expected, registry)
+        self._assert_parsing(f"some-tool {cmd}", expected, registry)
+
+    def test_ld_whole_archive(self):
+        cmd = "ld -m elf_x86_64 -z noexecstack -r -o vmlinux.o --whole-archive vmlinux.a --no-whole-archive --start-group --end-group"
+        expected = "vmlinux.a"
+        self._assert_parsing(cmd, expected)
+
+    def test_ld_with_at_symbol(self):
+        cmd = "ld.lld -m elf_x86_64 -z noexecstack -r -o fs/efivarfs/efivarfs.o @fs/efivarfs/efivarfs.mod ; ./tools/objtool/objtool --hacks=jump_label --hacks=noinstr --hacks=skylake --ibt --orc --retpoline --rethunk --static-call --uaccess --prefix=16 --link --module fs/efivarfs/efivarfs.o"
+        expected = "@fs/efivarfs/efivarfs.mod"
+        self._assert_parsing(cmd, expected)
+
+    def test_ld_if_objdump(self):
+        cmd = """ld -o arch/x86/entry/vdso/vdso64.so.dbg -shared --hash-style=both --build-id=sha1 --eh-frame-hdr -Bsymbolic -z noexecstack -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined -z max-page-size=4096 -T arch/x86/entry/vdso/vdso.lds arch/x86/entry/vdso/vdso-note.o arch/x86/entry/vdso/vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/vsgx.o && sh ./arch/x86/entry/vdso/checkundef.sh 'nm' 'arch/x86/entry/vdso/vdso64.so.dbg'; if objdump -R arch/x86/entry/vdso/vdso64.so.dbg | grep -E -h "R_X86_64_JUMP_SLOT|R_X86_64_GLOB_DAT|R_X86_64_RELATIVE| R_386_GLOB_DAT|R_386_JMP_SLOT|R_386_RELATIVE"; then (echo >&2 "arch/x86/entry/vdso/vdso64.so.dbg: dynamic relocations are not supported"; rm -f arch/x86/entry/vdso/vdso64.so.dbg; /bin/false); fi"""
+        expected = "arch/x86/entry/vdso/vdso-note.o arch/x86/entry/vdso/vclock_gettime.o arch/x86/entry/vdso/vgetcpu.o arch/x86/entry/vdso/vsgx.o"
+        self._assert_parsing(cmd, expected)
+
+    # printf | xargs ar command tests
+    def test_ar_printf(self):
+        cmd = 'rm -f built-in.a; printf "./%s " init/built-in.a usr/built-in.a arch/x86/built-in.a arch/x86/boot/startup/built-in.a kernel/built-in.a certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a security/built-in.a crypto/built-in.a block/built-in.a io_uring/built-in.a lib/built-in.a arch/x86/lib/built-in.a drivers/built-in.a sound/built-in.a net/built-in.a virt/built-in.a arch/x86/pci/built-in.a arch/x86/power/built-in.a arch/x86/video/built-in.a | xargs ar cDPrST built-in.a'
+        expected = "./init/built-in.a ./usr/built-in.a ./arch/x86/built-in.a ./arch/x86/boot/startup/built-in.a ./kernel/built-in.a ./certs/built-in.a ./mm/built-in.a ./fs/built-in.a ./ipc/built-in.a ./security/built-in.a ./crypto/built-in.a ./block/built-in.a ./io_uring/built-in.a ./lib/built-in.a ./arch/x86/lib/built-in.a ./drivers/built-in.a ./sound/built-in.a ./net/built-in.a ./virt/built-in.a ./arch/x86/pci/built-in.a ./arch/x86/power/built-in.a ./arch/x86/video/built-in.a"
+        self._assert_parsing(cmd, expected)
+
+    def test_ar_printf_nested(self):
+        cmd = 'rm -f arch/x86/pci/built-in.a; printf "arch/x86/pci/%s " i386.o init.o mmconfig_64.o direct.o mmconfig-shared.o fixup.o acpi.o legacy.o irq.o common.o early.o bus_numa.o amd_bus.o | xargs ar cDPrST arch/x86/pci/built-in.a'
+        expected = "arch/x86/pci/i386.o arch/x86/pci/init.o arch/x86/pci/mmconfig_64.o arch/x86/pci/direct.o arch/x86/pci/mmconfig-shared.o arch/x86/pci/fixup.o arch/x86/pci/acpi.o arch/x86/pci/legacy.o arch/x86/pci/irq.o arch/x86/pci/common.o arch/x86/pci/early.o arch/x86/pci/bus_numa.o arch/x86/pci/amd_bus.o"
+        self._assert_parsing(cmd, expected)
+
+    # ar command tests
+    def test_ar_reordering(self):
+        cmd = "rm -f vmlinux.a; ar cDPrST vmlinux.a built-in.a lib/lib.a arch/x86/lib/lib.a; ar mPiT $$(ar t vmlinux.a | sed -n 1p) vmlinux.a $$(ar t vmlinux.a | grep -F -f ../scripts/head-object-list.txt)"
+        expected = "built-in.a lib/lib.a arch/x86/lib/lib.a"
+        self._assert_parsing(cmd, expected)
+
+    def test_ar_default(self):
+        cmd = "rm -f lib/lib.a; ar cDPrsT lib/lib.a lib/argv_split.o lib/bug.o lib/buildid.o lib/clz_tab.o lib/cmdline.o lib/cpumask.o lib/ctype.o lib/dec_and_lock.o lib/decompress.o lib/decompress_bunzip2.o lib/decompress_inflate.o lib/decompress_unlz4.o lib/decompress_unlzma.o lib/decompress_unlzo.o lib/decompress_unxz.o lib/decompress_unzstd.o lib/dump_stack.o lib/earlycpio.o lib/extable.o lib/flex_proportions.o lib/idr.o lib/iomem_copy.o lib/irq_regs.o lib/is_single_threaded.o lib/klist.o lib/kobject.o lib/kobject_uevent.o lib/logic_pio.o lib/maple_tree.o lib/memcat_p.o lib/nmi_backtrace.o lib/objpool.o lib/plist.o lib/radix-tree.o lib/ratelimit.o lib/rbtree.o lib/seq_buf.o lib/siphash.o lib/string.o lib/sys_info.o lib/timerqueue.o lib/union_find.o lib/vsprintf.o lib/win_minmax.o lib/xarray.o"
+        expected = "lib/argv_split.o lib/bug.o lib/buildid.o lib/clz_tab.o lib/cmdline.o lib/cpumask.o lib/ctype.o lib/dec_and_lock.o lib/decompress.o lib/decompress_bunzip2.o lib/decompress_inflate.o lib/decompress_unlz4.o lib/decompress_unlzma.o lib/decompress_unlzo.o lib/decompress_unxz.o lib/decompress_unzstd.o lib/dump_stack.o lib/earlycpio.o lib/extable.o lib/flex_proportions.o lib/idr.o lib/iomem_copy.o lib/irq_regs.o lib/is_single_threaded.o lib/klist.o lib/kobject.o lib/kobject_uevent.o lib/logic_pio.o lib/maple_tree.o lib/memcat_p.o lib/nmi_backtrace.o lib/objpool.o lib/plist.o lib/radix-tree.o lib/ratelimit.o lib/rbtree.o lib/seq_buf.o lib/siphash.o lib/string.o lib/sys_info.o lib/timerqueue.o lib/union_find.o lib/vsprintf.o lib/win_minmax.o lib/xarray.o"
+        self._assert_parsing(cmd, expected)
+
+    def test_ar_llvm(self):
+        cmd = "llvm-ar mPiT $$(llvm-ar t vmlinux.a | sed -n 1p) vmlinux.a $$(llvm-ar t vmlinux.a | grep -F -f ../scripts/head-object-list.txt)"
+        expected = ""
+        self._assert_parsing(cmd, expected)
+
+    # nm command tests
+    def test_nm(self):
+        cmd = """llvm-nm -p --defined-only rust/core.o | awk '$$2~/(T|R|D|B)/ && $$3!~/__(pfx|cfi|odr_asan)/ { printf "EXPORT_SYMBOL_RUST_GPL(%s);\n",$$3 }' > rust/exports_core_generated.h"""
+        expected = "rust/core.o"
+        self._assert_parsing(cmd, expected)
+
+    def test_nm_vmlinux(self):
+        cmd = r"nm vmlinux | sed -n -e 's/^\([0-9a-fA-F]*\) [ABbCDGRSTtVW] \(_text\|__start_rodata\|__bss_start\|_end\)$/#define VO_\2 _AC(0x\1,UL)/p' > arch/x86/boot/voffset.h"
+        expected = "vmlinux"
+        self._assert_parsing(cmd, expected)
+
+    # objcopy command tests
+    def test_objcopy(self):
+        cmd = "objcopy --remove-section='.rel*' --remove-section=!'.rel*.dyn' vmlinux.unstripped vmlinux"
+        expected = "vmlinux.unstripped"
+        self._assert_parsing(cmd, expected)
+
+    def test_objcopy_llvm(self):
+        cmd = "llvm-objcopy --remove-section='.rel*' --remove-section=!'.rel*.dyn' vmlinux.unstripped vmlinux"
+        expected = "vmlinux.unstripped"
+        self._assert_parsing(cmd, expected)
+
+    # strip command tests
+    def test_strip(self):
+        cmd = "strip --strip-debug -o drivers/firmware/efi/libstub/mem.stub.o drivers/firmware/efi/libstub/mem.o"
+        expected = "drivers/firmware/efi/libstub/mem.o"
+        self._assert_parsing(cmd, expected)
+
+    # rustc command tests
+    def test_rustc(self):
+        cmd = """OBJTREE=/workspace/linux/kernel_build rustc -Zbinary_dep_depinfo=y -Astable_features -Dnon_ascii_idents -Dunsafe_op_in_unsafe_fn -Wmissing_docs -Wrust_2018_idioms -Wclippy::all -Wclippy::as_ptr_cast_mut -Wclippy::as_underscore -Wclippy::cast_lossless -Wclippy::ignored_unit_patterns -Wclippy::mut_mut -Wclippy::needless_bitwise_bool -Aclippy::needless_lifetimes -Wclippy::no_mangle_with_rust_abi -Wclippy::ptr_as_ptr -Wclippy::ptr_cast_constness -Wclippy::ref_as_ptr -Wclippy::undocumented_unsafe_blocks -Wclippy::unnecessary_safety_comment -Wclippy::unnecessary_safety_doc -Wrustdoc::missing_crate_level_docs -Wrustdoc::unescaped_backticks -Cpanic=abort -Cembed-bitcode=n -Clto=n -Cforce-unwind-tables=n -Ccodegen-units=1 -Csymbol-mangling-version=v0 -Crelocation-model=static -Zfunction-sections=n -Wclippy::float_arithmetic --target=./scripts/target.json -Ctarget-feature=-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2 -Zcf-protection=branch -Zno-jump-tables -Ctarget-cpu=x86-64 -Ztune-cpu=generic -Cno-redzone=y -Ccode-model=kernel -Zfunction-return=thunk-extern -Zpatchable-function-entry=16,16 -Copt-level=2 -Cdebug-assertions=n -Coverflow-checks=y -Dwarnings @./include/generated/rustc_cfg --edition=2021 --cfg no_fp_fmt_parse --emit=dep-info=rust/.core.o.d --emit=obj=rust/core.o --emit=metadata=rust/libcore.rmeta --crate-type rlib -L./rust --crate-name core /usr/lib/rust-1.84/lib/rustlib/src/rust/library/core/src/lib.rs --sysroot=/dev/null ;llvm-objcopy --redefine-sym __addsf3=__rust__addsf3 --redefine-sym __eqsf2=__rust__eqsf2 --redefine-sym __extendsfdf2=__rust__extendsfdf2 --redefine-sym __gesf2=__rust__gesf2 --redefine-sym __lesf2=__rust__lesf2 --redefine-sym __ltsf2=__rust__ltsf2 --redefine-sym __mulsf3=__rust__mulsf3 --redefine-sym __nesf2=__rust__nesf2 --redefine-sym __truncdfsf2=__rust__truncdfsf2 --redefine-sym __unordsf2=__rust__unordsf2 --redefine-sym __adddf3=__rust__adddf3 --redefine-sym __eqdf2=__rust__eqdf2 --redefine-sym __ledf2=__rust__ledf2 --redefine-sym __ltdf2=__rust__ltdf2 --redefine-sym __muldf3=__rust__muldf3 --redefine-sym __unorddf2=__rust__unorddf2 --redefine-sym __muloti4=__rust__muloti4 --redefine-sym __multi3=__rust__multi3 --redefine-sym __udivmodti4=__rust__udivmodti4 --redefine-sym __udivti3=__rust__udivti3 --redefine-sym __umodti3=__rust__umodti3 rust/core.o"""
+        expected = "/usr/lib/rust-1.84/lib/rustlib/src/rust/library/core/src/lib.rs rust/core.o"
+        self._assert_parsing(cmd, expected)
+
+    # rustdoc command tests
+    def test_rustdoc(self):
+        cmd = """OBJTREE=/workspace/linux/kernel_build rustdoc --test --edition=2021 -Zbinary_dep_depinfo=y -Astable_features -Dnon_ascii_idents -Dunsafe_op_in_unsafe_fn -Wmissing_docs -Wrust_2018_idioms -Wunreachable_pub -Wclippy::all -Wclippy::as_ptr_cast_mut -Wclippy::as_underscore -Wclippy::cast_lossless -Wclippy::ignored_unit_patterns -Wclippy::mut_mut -Wclippy::needless_bitwise_bool -Aclippy::needless_lifetimes -Wclippy::no_mangle_with_rust_abi -Wclippy::ptr_as_ptr -Wclippy::ptr_cast_constness -Wclippy::ref_as_ptr -Wclippy::undocumented_unsafe_blocks -Wclippy::unnecessary_safety_comment -Wclippy::unnecessary_safety_doc -Wrustdoc::missing_crate_level_docs -Wrustdoc::unescaped_backticks -Cpanic=abort -Cembed-bitcode=n -Clto=n -Cforce-unwind-tables=n -Ccodegen-units=1 -Csymbol-mangling-version=v0 -Crelocation-model=static -Zfunction-sections=n -Wclippy::float_arithmetic --target=aarch64-unknown-none -Ctarget-feature="-neon" -Cforce-unwind-tables=n -Zbranch-protection=pac-ret -Copt-level=2 -Cdebug-assertions=y -Coverflow-checks=y -Dwarnings -Cforce-frame-pointers=y -Zsanitizer=kernel-address -Zsanitizer-recover=kernel-address -Cllvm-args=-asan-mapping-offset=0xdfff800000000000 -Cpasses=sancov-module -Cllvm-args=-sanitizer-coverage-level=3 -Cllvm-args=-sanitizer-coverage-trace-pc -Cllvm-args=-sanitizer-coverage-trace-compares @./include/generated/rustc_cfg -L./rust --extern ffi --extern pin_init --extern kernel --extern build_error --extern macros --extern bindings --extern uapi --no-run --crate-name kernel -Zunstable-options --sysroot=/dev/null --test-builder ./scripts/rustdoc_test_builder ../rust/kernel/lib.rs >/dev/null"""
+        expected = "../rust/kernel/lib.rs"
+        self._assert_parsing(cmd, expected)
+
+    def test_rustdoc_test_gen(self):
+        cmd = "./scripts/rustdoc_test_gen"
+        expected = ""
+        self._assert_parsing(cmd, expected)
+
+    # flex command tests
+    def test_flex(self):
+        cmd = "flex -oscripts/kconfig/lexer.lex.c -L ../scripts/kconfig/lexer.l"
+        expected = "../scripts/kconfig/lexer.l"
+        self._assert_parsing(cmd, expected)
+
+    # bison command tests
+    def test_bison(self):
+        cmd = "bison -o scripts/kconfig/parser.tab.c --defines=scripts/kconfig/parser.tab.h -t -l ../scripts/kconfig/parser.y"
+        expected = "../scripts/kconfig/parser.y"
+        self._assert_parsing(cmd, expected)
+
+    # bindgen command tests
+    def test_bindgen(self):
+        cmd = (
+            "bindgen ../rust/bindings/bindings_helper.h "
+            "--blocklist-type __kernel_s?size_t --blocklist-type __kernel_ptrdiff_t "
+            "--opaque-type xregs_state --opaque-type desc_struct --no-doc-comments "
+            "--rust-target 1.68 --use-core --with-derive-default -o rust/bindings/bindings_generated.rs "
+            "-- -Wp,-MMD,rust/bindings/.bindings_generated.rs.d -nostdinc -I../arch/x86/include "
+            "-include ../include/linux/compiler-version.h -D__KERNEL__ -fintegrated-as -fno-builtin -DMODULE; "
+            "sed -Ei 's/pub const RUST_CONST_HELPER_([a-zA-Z0-9_]*)/pub const \\1/g' rust/bindings/bindings_generated.rs"
+        )
+        expected = "../rust/bindings/bindings_helper.h ../include/linux/compiler-version.h"
+        self._assert_parsing(cmd, expected)
+
+    # perl command tests
+    def test_perl(self):
+        cmd = "perl ../lib/crypto/x86/poly1305-x86_64-cryptogams.pl > lib/crypto/x86/poly1305-x86_64-cryptogams.S"
+        expected = "../lib/crypto/x86/poly1305-x86_64-cryptogams.pl"
+        self._assert_parsing(cmd, expected)
+
+    # link-vmlinux.sh command tests
+    def test_link_vmlinux(self):
+        cmd = '../scripts/link-vmlinux.sh "ld" "-m elf_x86_64 -z noexecstack" "-z max-page-size=0x200000 --build-id=sha1 --orphan-handling=error --emit-relocs --discard-none" "vmlinux.unstripped"; true'
+        expected = "vmlinux.a"
+        self._assert_parsing(cmd, expected)
+
+    def test_link_vmlinux_postlink(self):
+        cmd = '../scripts/link-vmlinux.sh "ld" "-m elf_x86_64 -z noexecstack --no-warn-rwx-segments" "--emit-relocs --discard-none -z max-page-size=0x200000 --build-id=sha1 -X --orphan-handling=error"; make -f ../arch/x86/Makefile.postlink vmlinux'
+        expected = "vmlinux.a"
+        self._assert_parsing(cmd, expected)
+
+    # syscallhdr.sh command tests
+    def test_syscallhdr(self):
+        cmd = "sh ../scripts/syscallhdr.sh --abis common,64 --emit-nr ../arch/x86/entry/syscalls/syscall_64.tbl arch/x86/include/generated/uapi/asm/unistd_64.h"
+        expected = "../arch/x86/entry/syscalls/syscall_64.tbl"
+        self._assert_parsing(cmd, expected)
+
+    # syscalltbl.sh command tests
+    def test_syscalltbl(self):
+        cmd = "sh ../scripts/syscalltbl.sh --abis common,64 ../arch/x86/entry/syscalls/syscall_64.tbl arch/x86/include/generated/asm/syscalls_64.h"
+        expected = "../arch/x86/entry/syscalls/syscall_64.tbl"
+        self._assert_parsing(cmd, expected)
+
+    # mkcapflags.sh command tests
+    def test_mkcapflags(self):
+        cmd = "sh ../arch/x86/kernel/cpu/mkcapflags.sh arch/x86/kernel/cpu/capflags.c ../arch/x86/kernel/cpu/../../include/asm/cpufeatures.h ../arch/x86/kernel/cpu/../../include/asm/vmxfeatures.h ../arch/x86/kernel/cpu/mkcapflags.sh FORCE"
+        expected = "../arch/x86/kernel/cpu/../../include/asm/cpufeatures.h ../arch/x86/kernel/cpu/../../include/asm/vmxfeatures.h"
+        self._assert_parsing(cmd, expected)
+
+    # orc_hash.sh command tests
+    def test_orc_hash(self):
+        cmd = "mkdir -p arch/x86/include/generated/asm/; sh ../scripts/orc_hash.sh < ../arch/x86/include/asm/orc_types.h > arch/x86/include/generated/asm/orc_hash.h"
+        expected = "../arch/x86/include/asm/orc_types.h"
+        self._assert_parsing(cmd, expected)
+
+    # xen-hypercalls.sh command tests
+    def test_xen_hypercalls(self):
+        cmd = "sh '../scripts/xen-hypercalls.sh' arch/x86/include/generated/asm/xen-hypercalls.h ../include/xen/interface/xen-mca.h ../include/xen/interface/xen.h ../include/xen/interface/xenpmu.h"
+        expected = "../include/xen/interface/xen-mca.h ../include/xen/interface/xen.h ../include/xen/interface/xenpmu.h"
+        self._assert_parsing(cmd, expected)
+
+    # gen_initramfs.sh command tests
+    def test_gen_initramfs(self):
+        cmd = "sh ../usr/gen_initramfs.sh -o usr/initramfs_data.cpio -l usr/.initramfs_data.cpio.d ../usr/default_cpio_list"
+        expected = "../usr/default_cpio_list"
+        self._assert_parsing(cmd, expected)
+
+    # vdso2c command tests
+    def test_vdso2c(self):
+        cmd = "arch/x86/entry/vdso/vdso2c arch/x86/entry/vdso/vdso64.so.dbg arch/x86/entry/vdso/vdso64.so arch/x86/entry/vdso/vdso-image-64.c"
+        expected = "arch/x86/entry/vdso/vdso64.so.dbg arch/x86/entry/vdso/vdso64.so"
+        self._assert_parsing(cmd, expected)
+
+    # mkpiggy command tests
+    def test_mkpiggy(self):
+        cmd = "arch/x86/boot/compressed/mkpiggy arch/x86/boot/compressed/vmlinux.bin.gz > arch/x86/boot/compressed/piggy.S"
+        expected = "arch/x86/boot/compressed/vmlinux.bin.gz"
+        self._assert_parsing(cmd, expected)
+
+    # relocs command tests
+    def test_relocs(self):
+        cmd = "arch/x86/tools/relocs vmlinux.unstripped > arch/x86/boot/compressed/vmlinux.relocs;arch/x86/tools/relocs --abs-relocs vmlinux.unstripped"
+        expected = "vmlinux.unstripped"
+        self._assert_parsing(cmd, expected)
+
+    def test_relocs_with_realmode(self):
+        cmd = (
+            "arch/x86/tools/relocs --realmode arch/x86/realmode/rm/realmode.elf > arch/x86/realmode/rm/realmode.relocs"
+        )
+        expected = "arch/x86/realmode/rm/realmode.elf"
+        self._assert_parsing(cmd, expected)
+
+    # mk_elfconfig command tests
+    def test_mk_elfconfig(self):
+        cmd = "scripts/mod/mk_elfconfig < scripts/mod/empty.o > scripts/mod/elfconfig.h"
+        expected = "scripts/mod/empty.o"
+        self._assert_parsing(cmd, expected)
+
+    # tools/build command tests
+    def test_build(self):
+        cmd = "arch/x86/boot/tools/build arch/x86/boot/setup.bin arch/x86/boot/vmlinux.bin arch/x86/boot/zoffset.h arch/x86/boot/bzImage"
+        expected = "arch/x86/boot/setup.bin arch/x86/boot/vmlinux.bin arch/x86/boot/zoffset.h"
+        self._assert_parsing(cmd, expected)
+
+    # extract-cert command tests
+    def test_extract_cert(self):
+        cmd = 'certs/extract-cert "" certs/signing_key.x509'
+        expected = ""
+        self._assert_parsing(cmd, expected)
+
+    # dtc command tests
+    def test_dtc_cat(self):
+        cmd = "./scripts/dtc/dtc -o drivers/of/empty_root.dtb -b 0 -i../drivers/of/ -i../scripts/dtc/include-prefixes -Wno-unique_unit_address -Wno-unit_address_vs_reg -Wno-avoid_unnecessary_addr_size -Wno-alias_paths -Wno-graph_child_address -Wno-simple_bus_reg -d drivers/of/.empty_root.dtb.d.dtc.tmp drivers/of/.empty_root.dtb.dts.tmp ; cat drivers/of/.empty_root.dtb.d.pre.tmp drivers/of/.empty_root.dtb.d.dtc.tmp > drivers/of/.empty_root.dtb.d"
+        expected = "drivers/of/.empty_root.dtb.dts.tmp drivers/of/.empty_root.dtb.d.pre.tmp drivers/of/.empty_root.dtb.d.dtc.tmp"
+        self._assert_parsing(cmd, expected)
+
+    # pnmtologo command tests
+    def test_pnmtologo(self):
+        cmd = "drivers/video/logo/pnmtologo -t clut224 -n logo_linux_clut224 -o drivers/video/logo/logo_linux_clut224.c ../drivers/video/logo/logo_linux_clut224.ppm"
+        expected = "../drivers/video/logo/logo_linux_clut224.ppm"
+        self._assert_parsing(cmd, expected)
+
+    # relacheck command tests
+    def test_relacheck(self):
+        cmd = "arch/arm64/kernel/pi/relacheck arch/arm64/kernel/pi/idreg-override.pi.o arch/arm64/kernel/pi/idreg-override.o"
+        expected = "arch/arm64/kernel/pi/idreg-override.pi.o"
+        self._assert_parsing(cmd, expected)
+
+    # gen-hyprel command tests
+    def test_gen_hyprel(self):
+        cmd = "arch/arm64/kvm/hyp/nvhe/gen-hyprel arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o > arch/arm64/kvm/hyp/nvhe/hyp-reloc.S"
+        expected = "arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o"
+        self._assert_parsing(cmd, expected)
+
+    # mkregtable command tests
+    def test_mkregtable(self):
+        cmd = "drivers/gpu/drm/radeon/mkregtable ../drivers/gpu/drm/radeon/reg_srcs/r100 > drivers/gpu/drm/radeon/r100_reg_safe.h"
+        expected = "../drivers/gpu/drm/radeon/reg_srcs/r100"
+        self._assert_parsing(cmd, expected)
+
+    # genheaders command tests
+    def test_genheaders(self):
+        cmd = "security/selinux/genheaders security/selinux/flask.h security/selinux/av_permissions.h"
+        expected = ""
+        self._assert_parsing(cmd, expected)
+
+    # mkcpustr command tests
+    def test_mkcpustr(self):
+        cmd = "arch/x86/boot/mkcpustr > arch/x86/boot/cpustr.h"
+        expected = ""
+        self._assert_parsing(cmd, expected)
+
+    # polgen command tests
+    def test_polgen(self):
+        cmd = "scripts/ipe/polgen/polgen security/ipe/boot_policy.c"
+        expected = ""
+        self._assert_parsing(cmd, expected)
+
+    # gen_header.py command tests
+    def test_gen_header(self):
+        cmd = "mkdir -p drivers/gpu/drm/msm/generated && python3 ../drivers/gpu/drm/msm/registers/gen_header.py --no-validate --rnn ../drivers/gpu/drm/msm/registers --xml ../drivers/gpu/drm/msm/registers/adreno/a2xx.xml c-defines > drivers/gpu/drm/msm/generated/a2xx.xml.h"
+        expected = "../drivers/gpu/drm/msm/registers/adreno/a2xx.xml"
+        self._assert_parsing(cmd, expected)
+
+
+if __name__ == "__main__":
+    unittest.main()
-- 
2.43.0

From nobody Sat Apr 18 04:54:51 2026
From: Luis
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, kstewart@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein
Subject: [PATCH v5 15/15] scripts/sbom: add unit tests for SPDX-License-Identifier parsing
Date: Fri, 10 Apr 2026 23:22:55 +0200
Message-ID: <20260410212255.9883-16-luis.augenstein@tngtech.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260410212255.9883-1-luis.augenstein@tngtech.com>
References: <20260410212255.9883-1-luis.augenstein@tngtech.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Luis Augenstein

Verify that SPDX-License-Identifier headers at the top of source files
are parsed correctly.

Assisted-by: Cursor:claude-sonnet-4-5
Assisted-by: OpenCode:GLM-4-7
Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 scripts/sbom/tests/spdx_graph/__init__.py     |  0
 .../sbom/tests/spdx_graph/test_kernel_file.py | 32 +++++++++++++++++++
 2 files changed, 32 insertions(+)
 create mode 100644 scripts/sbom/tests/spdx_graph/__init__.py
 create mode 100644 scripts/sbom/tests/spdx_graph/test_kernel_file.py

diff --git a/scripts/sbom/tests/spdx_graph/__init__.py b/scripts/sbom/tests/spdx_graph/__init__.py
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/scripts/sbom/tests/spdx_graph/test_kernel_file.py b/scripts/sbom/tests/spdx_graph/test_kernel_file.py
new file mode 100644
index 00000000000..bc44e7a97d2
--- /dev/null
+++ b/scripts/sbom/tests/spdx_graph/test_kernel_file.py
@@ -0,0 +1,32 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+import unittest
+from pathlib import Path
+import tempfile
+from sbom.spdx_graph.kernel_file import _parse_spdx_license_identifier  # type: ignore
+
+
+class TestKernelFile(unittest.TestCase):
+    def setUp(self):
+        self.tmpdir = tempfile.TemporaryDirectory()
+        self.src_tree = Path(self.tmpdir.name)
+
+    def tearDown(self):
+        self.tmpdir.cleanup()
+
+    def test_parse_spdx_license_identifier(self):
+        # REUSE-IgnoreStart
+        test_cases: list[tuple[str, str | None]] = [
+            ("/* SPDX-License-Identifier: MIT*/", "MIT"),
+            ("// SPDX-License-Identifier: GPL-2.0-only", "GPL-2.0-only"),
+            ("/* SPDX-License-Identifier: GPL-2.0-or-later OR MIT */", "GPL-2.0-or-later OR MIT"),
+            ("/* SPDX-License-Identifier: Apache-2.0 */\n extra text", "Apache-2.0"),
+            ("int main() { return 0; }", None),
+        ]
+        # REUSE-IgnoreEnd
+
+        for i, (file_content, expected_identifier) in enumerate(test_cases):
+            file_path = self.src_tree / f"file_{i}.c"
+            file_path.write_text(file_content)
+            self.assertEqual(_parse_spdx_license_identifier(str(file_path)), expected_identifier)
-- 
2.43.0
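
The cases in test_parse_spdx_license_identifier pin down the expected behaviour fairly tightly: the tag may follow either "//" or "/*", a closing "*/" is not part of the result (even without a space before it), compound expressions such as "GPL-2.0-or-later OR MIT" are returned whole, and a file without the tag yields None. A minimal sketch of a parser that would satisfy these cases is shown below. It is not the helper imported from sbom.spdx_graph.kernel_file, whose code is not part of this patch; the name parse_spdx_license_identifier_sketch, the regular expression, and the max_lines limit are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed pattern (not taken from the patch): capture the text after the tag,
# up to an optional closing "*/" and the end of the line.
_SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$")


def parse_spdx_license_identifier_sketch(path: str, max_lines: int = 5) -> Optional[str]:
    """Return the license expression found in the first lines of a file, or None.

    Hypothetical stand-in for illustration; only the first max_lines lines
    are inspected, which is an assumption rather than documented behaviour.
    """
    with open(path, encoding="utf-8", errors="replace") as source:
        # zip() stops after max_lines lines without reading the rest of the file.
        for _, line in zip(range(max_lines), source):
            match = _SPDX_TAG.search(line)
            if match:
                return match.group(1)
    return None
```

Matching line by line keeps trailing content such as "extra text" on a later line out of the result, and the lazy group followed by the optional "*/" handles both "MIT*/" and "MIT */".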