From: Luis Augenstein
To: nathan@kernel.org, nsc@kernel.org
Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, maximilian.huber@tngtech.com, Luis Augenstein
Subject: [PATCH v3 04/14] tools/sbom: add cmd graph generation
Date: Mon, 26 Jan 2026 20:32:54 +0100
Message-Id: <20260126193304.320916-5-luis.augenstein@tngtech.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260126193304.320916-1-luis.augenstein@tngtech.com>
References:
 <20260126193304.320916-1-luis.augenstein@tngtech.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Implement command graph generation by parsing .cmd files to build a
dependency graph. Add CmdGraph, CmdGraphNode, and .cmd file parsing.
Supports generating a flat list of used source files via the
--generate-used-files cli argument.

Co-developed-by: Maximilian Huber
Signed-off-by: Maximilian Huber
Signed-off-by: Luis Augenstein
---
 tools/sbom/Makefile                         |   6 +-
 tools/sbom/sbom.py                          |  39 +++++
 tools/sbom/sbom/cmd_graph/__init__.py       |   7 +
 tools/sbom/sbom/cmd_graph/cmd_file.py       | 149 ++++++++++++++++++++
 tools/sbom/sbom/cmd_graph/cmd_graph.py      |  46 ++++++
 tools/sbom/sbom/cmd_graph/cmd_graph_node.py | 120 ++++++++++++++++
 tools/sbom/sbom/cmd_graph/deps_parser.py    |  52 +++++++
 tools/sbom/sbom/config.py                   | 147 ++++++++++++++++++-
 8 files changed, 563 insertions(+), 3 deletions(-)
 create mode 100644 tools/sbom/sbom/cmd_graph/__init__.py
 create mode 100644 tools/sbom/sbom/cmd_graph/cmd_file.py
 create mode 100644 tools/sbom/sbom/cmd_graph/cmd_graph.py
 create mode 100644 tools/sbom/sbom/cmd_graph/cmd_graph_node.py
 create mode 100644 tools/sbom/sbom/cmd_graph/deps_parser.py

diff --git a/tools/sbom/Makefile b/tools/sbom/Makefile
index 90ae42dd28ee..cc4a632533ba 100644
--- a/tools/sbom/Makefile
+++ b/tools/sbom/Makefile
@@ -29,7 +29,11 @@ $(SBOM_TARGETS) &: $(SBOM_DEPS)
 		sed 's/\.o$$/.ko/' $(objtree)/modules.order >> $(SBOM_ROOTS_FILE); \
 	fi
 
-	@python3 sbom.py
+	@python3 sbom.py \
+		--src-tree $(srctree) \
+		--obj-tree $(objtree) \
+		--roots-file $(SBOM_ROOTS_FILE) \
+		--output-directory $(objtree)
 
 	@rm $(SBOM_ROOTS_FILE)
 
diff --git a/tools/sbom/sbom.py b/tools/sbom/sbom.py
index c7f23d6eb300..25d912a282de 100644
--- a/tools/sbom/sbom.py
+++ b/tools/sbom/sbom.py
@@ -7,9 +7,13 @@ Compute software bill of materials in SPDX format describing a kernel build.
 """
 
 import logging
+import os
 import sys
+import time
 import sbom.sbom_logging as sbom_logging
 from sbom.config import get_config
+from sbom.path_utils import is_relative_to
+from sbom.cmd_graph import CmdGraph
 
 
 def main():
@@ -22,6 +26,36 @@ def main():
         format="[%(levelname)s] %(message)s",
     )
 
+    # Build cmd graph
+    logging.debug("Start building cmd graph")
+    start_time = time.time()
+    cmd_graph = CmdGraph.create(config.root_paths, config)
+    logging.debug(f"Built cmd graph in {time.time() - start_time} seconds")
+
+    # Save used files document
+    if config.generate_used_files:
+        if config.src_tree == config.obj_tree:
+            logging.info(
+                f"Extracting all files from the cmd graph to {(config.used_files_file_name,)} "
+                "instead of only source files because source files cannot be "
+                "reliably classified when the source and object trees are identical.",
+            )
+            used_files = [os.path.relpath(node.absolute_path, config.src_tree) for node in cmd_graph]
+            logging.debug(f"Found {len(used_files)} files in cmd graph.")
+        else:
+            used_files = [
+                os.path.relpath(node.absolute_path, config.src_tree)
+                for node in cmd_graph
+                if is_relative_to(node.absolute_path, config.src_tree)
+                and not is_relative_to(node.absolute_path, config.obj_tree)
+            ]
+            logging.debug(f"Found {len(used_files)} source files in cmd graph")
+        if not sbom_logging.has_errors() or config.write_output_on_error:
+            used_files_path = os.path.join(config.output_directory, config.used_files_file_name)
+            with open(used_files_path, "w", encoding="utf-8") as f:
+                f.write("\n".join(str(file_path) for file_path in used_files))
+            logging.debug(f"Successfully saved {used_files_path}")
+
     # Report collected warnings and errors in case of failure
     warning_summary = sbom_logging.summarize_warnings()
     error_summary = sbom_logging.summarize_errors()
@@ -30,6 +64,11 @@ def main():
         logging.warning(warning_summary)
     if error_summary:
         logging.error(error_summary)
+        if not config.write_output_on_error:
+            logging.info(
+                "Use --write-output-on-error to generate output documents even when errors occur. "
+                "Note that in this case the generated SPDX documents may be incomplete."
+            )
         sys.exit(1)
 
 
diff --git a/tools/sbom/sbom/cmd_graph/__init__.py b/tools/sbom/sbom/cmd_graph/__init__.py
new file mode 100644
index 000000000000..9d661a5c3d93
--- /dev/null
+++ b/tools/sbom/sbom/cmd_graph/__init__.py
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from .cmd_graph import CmdGraph
+from .cmd_graph_node import CmdGraphNode, CmdGraphNodeConfig
+
+__all__ = ["CmdGraph", "CmdGraphNode", "CmdGraphNodeConfig"]
diff --git a/tools/sbom/sbom/cmd_graph/cmd_file.py b/tools/sbom/sbom/cmd_graph/cmd_file.py
new file mode 100644
index 000000000000..d85ef5de0c26
--- /dev/null
+++ b/tools/sbom/sbom/cmd_graph/cmd_file.py
@@ -0,0 +1,149 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+import os
+import re
+from dataclasses import dataclass, field
+from sbom.cmd_graph.deps_parser import parse_cmd_file_deps
+from sbom.cmd_graph.savedcmd_parser import parse_inputs_from_commands
+import sbom.sbom_logging as sbom_logging
+from sbom.path_utils import PathStr
+
+SAVEDCMD_PATTERN = re.compile(r"^(saved)?cmd_.*?:=\s*(?P<full_command>.+)$")
+SOURCE_PATTERN = re.compile(r"^source.*?:=\s*(?P<source_file>.+)$")
+
+
+@dataclass
+class CmdFile:
+    cmd_file_path: PathStr
+    savedcmd: str
+    source: PathStr | None = None
+    deps: list[str] = field(default_factory=list[str])
+    make_rules: list[str] = field(default_factory=list[str])
+
+    @classmethod
+    def create(cls, cmd_file_path: PathStr) -> "CmdFile | None":
+        """
+        Parses a .cmd file.
+        .cmd files are assumed to have one of the following structures:
+        1. Full Cmd File
+            (saved)?cmd_ :=
+            source_ :=
+            deps_ := \
+
+             := $(deps_)
+            $(deps_):
+
+        2. Command Only Cmd File
+            (saved)?cmd_ :=
+
+        3. Single Dependency Cmd File
+            (saved)?cmd_ :=
+             :=
+
+        Args:
+            cmd_file_path (Path): absolute Path to a .cmd file
+
+        Returns:
+            cmd_file (CmdFile): Parsed cmd file.
+        """
+        with open(cmd_file_path, "rt") as f:
+            lines = [line.strip() for line in f.readlines() if line.strip() != "" and not line.startswith("#")]
+
+        # savedcmd
+        match = SAVEDCMD_PATTERN.match(lines[0])
+        if match is None:
+            sbom_logging.error(
+                "Skip parsing '{cmd_file_path}' because no 'savedcmd_' command was found.", cmd_file_path=cmd_file_path
+            )
+            return None
+        savedcmd = match.group("full_command")
+
+        # Command Only Cmd File
+        if len(lines) == 1:
+            return CmdFile(cmd_file_path, savedcmd)
+
+        # Single Dependency Cmd File
+        if len(lines) == 2:
+            dep = lines[1].split(":")[1].strip()
+            return CmdFile(cmd_file_path, savedcmd, deps=[dep])
+
+        # Full Cmd File
+        # source
+        line1 = SOURCE_PATTERN.match(lines[1])
+        if line1 is None:
+            sbom_logging.error(
+                "Skip parsing '{cmd_file_path}' because no 'source_' entry was found.", cmd_file_path=cmd_file_path
+            )
+            return CmdFile(cmd_file_path, savedcmd)
+        source = line1.group("source_file")
+
+        # deps
+        deps: list[str] = []
+        i = 3  # lines[2] includes the variable assignment but no actual dependency, so we need to start at lines[3].
+        while i < len(lines):
+            if not lines[i].endswith("\\"):
+                break
+            deps.append(lines[i][:-1].strip())
+            i += 1
+
+        # make_rules
+        make_rules = lines[i:]
+
+        return CmdFile(cmd_file_path, savedcmd, source, deps, make_rules)
+
+    def get_dependencies(
+        self: "CmdFile", target_path: PathStr, obj_tree: PathStr, fail_on_unknown_build_command: bool
+    ) -> list[PathStr]:
+        """
+        Parses all dependencies required to build a target file from its cmd file.
+
+        Args:
+            target_path: path to the target file relative to `obj_tree`.
+            obj_tree: absolute path to the object tree.
+            fail_on_unknown_build_command: Whether to fail if an unknown build command is encountered.
+
+        Returns:
+            list[PathStr]: dependency file paths relative to `obj_tree`.
+        """
+        input_files: list[PathStr] = [
+            str(p) for p in parse_inputs_from_commands(self.savedcmd, fail_on_unknown_build_command)
+        ]
+        if self.deps:
+            input_files += [str(p) for p in parse_cmd_file_deps(self.deps)]
+        input_files = _expand_resolve_files(input_files, obj_tree)
+
+        cmd_file_dependencies: list[PathStr] = []
+        for input_file in input_files:
+            # input files are either absolute or relative to the object tree
+            if os.path.isabs(input_file):
+                input_file = os.path.relpath(input_file, obj_tree)
+            if input_file == target_path:
+                # Skip target file to prevent cycles. This is necessary because some multi stage commands first create an output and then pass it as input to the next command, e.g., objcopy.
+                continue
+            cmd_file_dependencies.append(input_file)
+
+        return cmd_file_dependencies
+
+
+def _expand_resolve_files(input_files: list[PathStr], obj_tree: PathStr) -> list[PathStr]:
+    """
+    Expands resolve files which may reference additional files via '@' notation.
+
+    Args:
+        input_files (list[PathStr]): List of file paths relative to the object tree, where paths starting with '@' refer to files
+                                     containing further file paths, each on a separate line.
+        obj_tree: Absolute path to the root of the object tree.
+
+    Returns:
+        list[PathStr]: Flattened list of all input file paths, with any nested '@' file references resolved recursively.
+    """
+    expanded_input_files: list[PathStr] = []
+    for input_file in input_files:
+        if not input_file.startswith("@"):
+            expanded_input_files.append(input_file)
+            continue
+        with open(os.path.join(obj_tree, input_file.lstrip("@")), "rt") as f:
+            resolve_file_content = [line_stripped for line in f.readlines() if (line_stripped := line.strip())]
+        expanded_input_files += _expand_resolve_files(resolve_file_content, obj_tree)
+    return expanded_input_files
diff --git a/tools/sbom/sbom/cmd_graph/cmd_graph.py b/tools/sbom/sbom/cmd_graph/cmd_graph.py
new file mode 100644
index 000000000000..cad54243ff3f
--- /dev/null
+++ b/tools/sbom/sbom/cmd_graph/cmd_graph.py
@@ -0,0 +1,46 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from collections import deque
+from dataclasses import dataclass, field
+from typing import Iterator
+
+from sbom.cmd_graph.cmd_graph_node import CmdGraphNode, CmdGraphNodeConfig
+from sbom.path_utils import PathStr
+
+
+@dataclass
+class CmdGraph:
+    """Directed acyclic graph of build dependencies primarily inferred from .cmd files produced during kernel builds"""
+
+    roots: list[CmdGraphNode] = field(default_factory=list[CmdGraphNode])
+
+    @classmethod
+    def create(cls, root_paths: list[PathStr], config: CmdGraphNodeConfig) -> "CmdGraph":
+        """
+        Recursively builds a dependency graph starting from `root_paths`.
+        Dependencies are mainly discovered by parsing the `.cmd` files.
+
+        Args:
+            root_paths (list[PathStr]): List of paths to root outputs relative to obj_tree
+            config (CmdGraphNodeConfig): Configuration options
+
+        Returns:
+            CmdGraph: A graph of all build dependencies for the given root files.
+        """
+        node_cache: dict[PathStr, CmdGraphNode] = {}
+        root_nodes = [CmdGraphNode.create(root_path, config, node_cache) for root_path in root_paths]
+        return CmdGraph(root_nodes)
+
+    def __iter__(self) -> Iterator[CmdGraphNode]:
+        """Traverse the graph in breadth-first order, yielding each unique node."""
+        visited: set[PathStr] = set()
+        node_stack: deque[CmdGraphNode] = deque(self.roots)
+        while len(node_stack) > 0:
+            node = node_stack.popleft()
+            if node.absolute_path in visited:
+                continue
+
+            visited.add(node.absolute_path)
+            node_stack.extend(node.children)
+            yield node
diff --git a/tools/sbom/sbom/cmd_graph/cmd_graph_node.py b/tools/sbom/sbom/cmd_graph/cmd_graph_node.py
new file mode 100644
index 000000000000..fdaed0f0ccba
--- /dev/null
+++ b/tools/sbom/sbom/cmd_graph/cmd_graph_node.py
@@ -0,0 +1,120 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+from dataclasses import dataclass, field
+from itertools import chain
+import logging
+import os
+from typing import Iterator, Protocol
+
+from sbom import sbom_logging
+from sbom.cmd_graph.cmd_file import CmdFile
+from sbom.path_utils import PathStr, is_relative_to
+
+
+@dataclass
+class IncbinDependency:
+    node: "CmdGraphNode"
+    full_statement: str
+
+
+class CmdGraphNodeConfig(Protocol):
+    obj_tree: PathStr
+    src_tree: PathStr
+    fail_on_unknown_build_command: bool
+
+
+@dataclass
+class CmdGraphNode:
+    """A node in the cmd graph representing a single file and its dependencies."""
+
+    absolute_path: PathStr
+    """Absolute path to the file this node represents."""
+
+    cmd_file: CmdFile | None = None
+    """Parsed .cmd file describing how the file at absolute_path was built, or None if not available."""
+
+    cmd_file_dependencies: list["CmdGraphNode"] = field(default_factory=list["CmdGraphNode"])
+    incbin_dependencies: list[IncbinDependency] = field(default_factory=list[IncbinDependency])
+    hardcoded_dependencies: list["CmdGraphNode"] = field(default_factory=list["CmdGraphNode"])
+
+    @property
+    def children(self) -> Iterator["CmdGraphNode"]:
+        seen: set[PathStr] = set()
+        for node in chain(
+            self.cmd_file_dependencies,
+            (dep.node for dep in self.incbin_dependencies),
+            self.hardcoded_dependencies,
+        ):
+            if node.absolute_path not in seen:
+                seen.add(node.absolute_path)
+                yield node
+
+    @classmethod
+    def create(
+        cls,
+        target_path: PathStr,
+        config: CmdGraphNodeConfig,
+        cache: dict[PathStr, "CmdGraphNode"] | None = None,
+        depth: int = 0,
+    ) -> "CmdGraphNode":
+        """
+        Recursively builds a dependency graph starting from `target_path`.
+        Dependencies are mainly discovered by parsing the `..cmd` file.
+
+        Args:
+            target_path: Path to the target file relative to obj_tree.
+            config: Config options
+            cache: Tracks processed nodes to prevent cycles.
+            depth: Internal parameter to track the current recursion depth.
+
+        Returns:
+            CmdGraphNode: cmd graph node representing the target file
+        """
+        if cache is None:
+            cache = {}
+
+        target_path_absolute = (
+            os.path.realpath(p)
+            if os.path.islink(p := os.path.join(config.obj_tree, target_path))
+            else os.path.normpath(p)
+        )
+
+        if target_path_absolute in cache:
+            return cache[target_path_absolute]
+
+        if depth == 0:
+            logging.debug(f"Build node: {target_path}")
+
+        cmd_file_path = _to_cmd_path(target_path_absolute)
+        cmd_file = CmdFile.create(cmd_file_path) if os.path.exists(cmd_file_path) else None
+        node = CmdGraphNode(target_path_absolute, cmd_file)
+        cache[target_path_absolute] = node
+
+        if not os.path.exists(target_path_absolute):
+            error_or_warning = (
+                sbom_logging.error
+                if is_relative_to(target_path_absolute, config.obj_tree)
+                or is_relative_to(target_path_absolute, config.src_tree)
+                else sbom_logging.warning
+            )
+            error_or_warning(
+                "Skip parsing '{target_path_absolute}' because file does not exist",
+                target_path_absolute=target_path_absolute,
+            )
+            return node
+
+        if cmd_file is not None:
+            node.cmd_file_dependencies = [
+                CmdGraphNode.create(cmd_file_dependency_path, config, cache, depth + 1)
+                for cmd_file_dependency_path in cmd_file.get_dependencies(
+                    target_path, config.obj_tree, config.fail_on_unknown_build_command
+                )
+            ]
+
+        return node
+
+
+def _to_cmd_path(path: PathStr) -> PathStr:
+    name = os.path.basename(path)
+    return path.removesuffix(name) + f".{name}.cmd"
diff --git a/tools/sbom/sbom/cmd_graph/deps_parser.py b/tools/sbom/sbom/cmd_graph/deps_parser.py
new file mode 100644
index 000000000000..fb3ccdd415b5
--- /dev/null
+++ b/tools/sbom/sbom/cmd_graph/deps_parser.py
@@ -0,0 +1,52 @@
+# SPDX-License-Identifier: GPL-2.0-only OR MIT
+# Copyright (C) 2025 TNG Technology Consulting GmbH
+
+import re
+import sbom.sbom_logging as sbom_logging
+from sbom.path_utils import PathStr
+
+# Match dependencies on config files
+# Example match: "$(wildcard include/config/CONFIG_SOMETHING)"
+CONFIG_PATTERN = re.compile(r"\$\(wildcard (include/config/[^)]+)\)")
+
+# Match dependencies on the objtool binary
+# Example match: "$(wildcard ./tools/objtool/objtool)"
+OBJTOOL_PATTERN = re.compile(r"\$\(wildcard \./tools/objtool/objtool\)")
+
+# Match any Makefile wildcard reference
+# Example match: "$(wildcard path/to/file)"
+WILDCARD_PATTERN = re.compile(r"\$\(wildcard (?P<path>[^)]+)\)")
+
+# Match ordinary paths:
+# - ^(\/)?: Optionally starts with a '/'
+# - (([\w\-\., ]*)\/)*: Zero or more directory levels
+# - [\w\-\., ]+$: Path component (file or directory)
+# Example matches: "/foo/bar.c", "dir1/dir2/file.txt", "plainfile"
+VALID_PATH_PATTERN = re.compile(r"^(\/)?(([\w\-\., ]*)\/)*[\w\-\., ]+$")
+
+
+def parse_cmd_file_deps(deps: list[str]) -> list[PathStr]:
+    """
+    Parse dependency strings of a .cmd file and return valid input file paths.
+
+    Args:
+        deps: List of dependency strings as found in `.cmd` files.
+
+    Returns:
+        input_files: List of input file paths
+    """
+    input_files: list[PathStr] = []
+    for dep in deps:
+        dep = dep.strip()
+        match dep:
+            case _ if CONFIG_PATTERN.match(dep) or OBJTOOL_PATTERN.match(dep):
+                # config paths like include/config/ should not be included in the graph
+                continue
+            case _ if match := WILDCARD_PATTERN.match(dep):
+                path = match.group("path")
+                input_files.append(path)
+            case _ if VALID_PATH_PATTERN.match(dep):
+                input_files.append(dep)
+            case _:
+                sbom_logging.error("Skip parsing dependency {dep} because of unrecognized format", dep=dep)
+    return input_files
diff --git a/tools/sbom/sbom/config.py b/tools/sbom/sbom/config.py
index 3dc569ae0c43..39e556a4c53b 100644
--- a/tools/sbom/sbom/config.py
+++ b/tools/sbom/sbom/config.py
@@ -3,15 +3,43 @@
 
 import argparse
 from dataclasses import dataclass
+import os
+from typing import Any
+from sbom.path_utils import PathStr
 
 
 @dataclass
 class KernelSbomConfig:
+    src_tree: PathStr
+    """Absolute path to the Linux kernel source directory."""
+
+    obj_tree: PathStr
+    """Absolute path to the build output directory."""
+
+    root_paths: list[PathStr]
+    """List of paths to root outputs (relative to obj_tree) to base the SBOM on."""
+
+    generate_used_files: bool
+    """Whether to generate a flat list of all source files used in the build.
+    If False, no used-files document is created."""
+
+    used_files_file_name: str
+    """If `generate_used_files` is True, specifies the file name for the used-files document."""
+
+    output_directory: PathStr
+    """Path to the directory where the generated output documents will be saved."""
+
     debug: bool
     """Whether to enable debug logging."""
 
+    fail_on_unknown_build_command: bool
+    """Whether to fail if an unknown build command is encountered in a .cmd file."""
+
+    write_output_on_error: bool
+    """Whether to write output documents even if errors occur."""
+
 
-def _parse_cli_arguments() -> dict[str, bool]:
+def _parse_cli_arguments() -> dict[str, Any]:
     """
     Parse command-line arguments using argparse.
 
@@ -19,8 +47,49 @@ def _parse_cli_arguments() -> dict[str, bool]:
         Dictionary of parsed arguments.
     """
     parser = argparse.ArgumentParser(
+        formatter_class=argparse.RawTextHelpFormatter,
         description="Generate SPDX SBOM documents for kernel builds",
     )
+    parser.add_argument(
+        "--src-tree",
+        default="../linux",
+        help="Path to the kernel source tree (default: ../linux)",
+    )
+    parser.add_argument(
+        "--obj-tree",
+        default="../linux/kernel_build",
+        help="Path to the build output directory (default: ../linux/kernel_build)",
+    )
+    group = parser.add_mutually_exclusive_group(required=True)
+    group.add_argument(
+        "--roots",
+        nargs="+",
+        default="arch/x86/boot/bzImage",
+        help="Space-separated list of paths relative to obj-tree for which the SBOM will be created.\n"
+        "Cannot be used together with --roots-file. (default: arch/x86/boot/bzImage)",
+    )
+    group.add_argument(
+        "--roots-file",
+        help="Path to a file containing the root paths (one per line). Cannot be used together with --roots.",
+    )
+    parser.add_argument(
+        "--generate-used-files",
+        action="store_true",
+        default=False,
+        help=(
+            "Whether to create the sbom.used-files.txt file, a flat list of all "
+            "source files used for the kernel build.\n"
+            "If src-tree and obj-tree are equal it is not possible to reliably "
+            "classify source files.\n"
+            "In this case sbom.used-files.txt will contain all files used for the "
+            "kernel build including all build artifacts. (default: False)"
+        ),
+    )
+    parser.add_argument(
+        "--output-directory",
+        default=".",
+        help="Path to the directory where the generated output documents will be stored (default: .)",
+    )
     parser.add_argument(
         "--debug",
         action="store_true",
@@ -28,6 +97,28 @@ def _parse_cli_arguments() -> dict[str, bool]:
         help="Enable debug logs (default: False)",
     )
 
+    # Error handling settings
+    parser.add_argument(
+        "--do-not-fail-on-unknown-build-command",
+        action="store_true",
+        default=False,
+        help=(
+            "Whether to fail if an unknown build command is encountered in a .cmd file.\n"
+            "If set to True, errors are logged as warnings instead. (default: False)"
+        ),
+    )
+    parser.add_argument(
+        "--write-output-on-error",
+        action="store_true",
+        default=False,
+        help=(
+            "Write output documents even if errors occur. The resulting documents "
+            "may be incomplete.\n"
+            "A summary of warnings and errors can be found in the 'comment' property "
+            "of the CreationInfo element. (default: False)"
+        ),
+    )
+
     args = vars(parser.parse_args())
     return args
 
@@ -42,6 +133,58 @@ def get_config() -> KernelSbomConfig:
     # Parse cli arguments
     args = _parse_cli_arguments()
 
+    # Extract and validate cli arguments
+    src_tree = os.path.realpath(args["src_tree"])
+    obj_tree = os.path.realpath(args["obj_tree"])
+    root_paths = []
+    if args["roots_file"]:
+        with open(args["roots_file"], "rt") as f:
+            root_paths = [root.strip() for root in f.readlines()]
+    else:
+        root_paths = args["roots"]
+    _validate_path_arguments(src_tree, obj_tree, root_paths)
+
+    generate_used_files = args["generate_used_files"]
+    output_directory = os.path.realpath(args["output_directory"])
     debug = args["debug"]
 
-    return KernelSbomConfig(debug=debug)
+    fail_on_unknown_build_command = not args["do_not_fail_on_unknown_build_command"]
+    write_output_on_error = args["write_output_on_error"]
+
+    # Hardcoded config
+    used_files_file_name = "sbom.used-files.txt"
+
+    return KernelSbomConfig(
+        src_tree=src_tree,
+        obj_tree=obj_tree,
+        root_paths=root_paths,
+        generate_used_files=generate_used_files,
+        used_files_file_name=used_files_file_name,
+        output_directory=output_directory,
+        debug=debug,
+        fail_on_unknown_build_command=fail_on_unknown_build_command,
+        write_output_on_error=write_output_on_error,
+    )
+
+
+def _validate_path_arguments(src_tree: PathStr, obj_tree: PathStr, root_paths: list[PathStr]) -> None:
+    """
+    Validate that the provided paths exist.
+
+    Args:
+        src_tree: Absolute path to the source tree.
+        obj_tree: Absolute path to the object tree.
+        root_paths: List of root paths relative to obj_tree.
+
+    Raises:
+        argparse.ArgumentTypeError: If any of the paths don't exist.
+    """
+    if not os.path.exists(src_tree):
+        raise argparse.ArgumentTypeError(f"--src-tree {src_tree} does not exist")
+    if not os.path.exists(obj_tree):
+        raise argparse.ArgumentTypeError(f"--obj-tree {obj_tree} does not exist")
+    for root_path in root_paths:
+        if not os.path.exists(os.path.join(obj_tree, root_path)):
+            raise argparse.ArgumentTypeError(
+                f"path to root artifact {os.path.join(obj_tree, root_path)} does not exist"
+            )
-- 
2.34.1
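
Two small mechanisms in this patch can be tried outside the kernel tree: the mapping from a build output to its `.cmd` file (as done by `_to_cmd_path`) and the breadth-first traversal that yields each node once (as done by `CmdGraph.__iter__`). The following is a minimal standalone sketch, not the patch's code. The names `to_cmd_path` and `iter_unique` and the toy `children` dictionary are hypothetical and exist only for illustration; the real implementation works on `CmdGraphNode` objects and reads dependencies from parsed `.cmd` files.

```python
import os
from collections import deque
from typing import Iterator


def to_cmd_path(path: str) -> str:
    """Map 'dir/foo.o' to 'dir/.foo.o.cmd' (same result as the patch's _to_cmd_path)."""
    directory, name = os.path.split(path)
    return os.path.join(directory, f".{name}.cmd")


def iter_unique(roots: list[str], children: dict[str, list[str]]) -> Iterator[str]:
    """Breadth-first traversal over a dependency mapping, yielding each path once."""
    visited: set[str] = set()
    queue: deque[str] = deque(roots)
    while queue:
        path = queue.popleft()
        if path in visited:
            # Shared dependencies (e.g. a common header) are reported only once.
            continue
        visited.add(path)
        queue.extend(children.get(path, []))
        yield path


# Hypothetical toy graph: two objects that share one header.
toy_children = {
    "vmlinux": ["a.o", "b.o"],
    "a.o": ["a.c", "common.h"],
    "b.o": ["b.c", "common.h"],
}
```

With the toy graph above, `list(iter_unique(["vmlinux"], toy_children))` visits `vmlinux` first, then both objects, then their sources, and lists `common.h` a single time even though two objects depend on it. The used-files list produced by `--generate-used-files` relies on the same property.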