From: Alberto Faria
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini, qemu-block@nongnu.org, "Denis V. Lunev",
 Emanuele Giuseppe Esposito, Stefan Hajnoczi, Ronnie Sahlberg,
 Hanna Reitz, Stefano Garzarella, Kevin Wolf, Peter Xu, Alberto Garcia,
 John Snow, Eric Blake, Fam Zheng, Markus Armbruster,
 Vladimir Sementsov-Ogievskiy, Peter Lieven, Alberto Faria
Subject: [RFC 1/8] Add an extensible static analyzer
Date: Sat, 2 Jul 2022 12:33:24 +0100
Message-Id: <20220702113331.2003820-2-afaria@redhat.com>
In-Reply-To: <20220702113331.2003820-1-afaria@redhat.com>
References: <20220702113331.2003820-1-afaria@redhat.com>

Add a static-analyzer.py script that uses libclang's Python bindings to
provide a common framework on which arbitrary static analysis checks can
be developed and run against QEMU's code base.

As an example, a simple check is included that verifies that the return
value of static, non-void functions is used by at least one caller.

Signed-off-by: Alberto Faria
---
 static-analyzer.py | 509 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 509 insertions(+)
 create mode 100755 static-analyzer.py

diff --git a/static-analyzer.py b/static-analyzer.py
new file mode 100755
index 0000000000..010cc92212
--- /dev/null
+++ b/static-analyzer.py
@@ -0,0 +1,509 @@
+#!/usr/bin/env python3
+# ---------------------------------------------------------------------------- #
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+import json
+import os
+import os.path
+import subprocess
+import sys
+import re
+from argparse import ArgumentParser, Namespace, RawDescriptionHelpFormatter
+from multiprocessing import Pool
+from pathlib import Path
+from typing import (
+    Any,
+    Callable,
+    Dict,
+    Iterable,
+    List,
+    NoReturn,
+    Optional,
+    Mapping,
+    Sequence,
+    Tuple,
+)
+
+# ---------------------------------------------------------------------------- #
+
+from clang.cindex import (  # type: ignore
+    Cursor,
+    CursorKind,
+    Diagnostic,
+    StorageClass,
+    TranslationUnit,
+    TranslationUnitLoadError,
+    TypeKind,
+)
+
+Cursor.__hash__ = lambda self: self.hash  # so `Cursor`s can be dict keys
+
+# ---------------------------------------------------------------------------- #
+# Usage
+
+
+def parse_args() -> Namespace:
+
+    available_checks = "\n".join(f" {name}" for (name, _) in CHECKS)
+
+    parser = ArgumentParser(
+        formatter_class=RawDescriptionHelpFormatter,
+        epilog=f"""
+available checks:
+{available_checks}
+
+exit codes:
+ 0 No problems found.
+ 1 Analyzer failure.
+ 2 Bad usage.
+ 3 Problems found in a translation unit.
+""",
+    )
+
+    parser.add_argument("build_dir", type=Path)
+
+    parser.add_argument(
+        "translation_units",
+        type=Path,
+        nargs="*",
+        help=(
+            "Analyze only translation units whose root source file matches or"
+            " is under one of the given paths."
+        ),
+    )
+
+    parser.add_argument(
+        "-c",
+        "--check",
+        metavar="CHECK",
+        dest="check_names",
+        choices=[name for (name, _) in CHECKS],
+        action="append",
+        help=(
+            "Enable the given check. Can be given multiple times. If not given,"
+            " all checks are enabled."
+        ),
+    )
+
+    parser.add_argument(
+        "-j",
+        "--jobs",
+        dest="threads",
+        type=int,
+        help=(
+            "Number of threads to employ. Defaults to the number of logical"
+            " processors."
+        ),
+    )
+
+    parser.add_argument(
+        "--profile",
+        action="store_true",
+        help="Profile execution. Forces single-threaded execution.",
+    )
+
+    return parser.parse_args()
+
+
+# ---------------------------------------------------------------------------- #
+# Main
+
+
+def main() -> NoReturn:
+
+    args = parse_args()
+
+    compile_commands = load_compilation_database(args)
+    contexts = get_translation_unit_contexts(args, compile_commands)
+
+    analyze_translation_units(args, contexts)
+
+
+def load_compilation_database(args: Namespace) -> Sequence[Mapping[str, str]]:
+
+    # clang.cindex.CompilationDatabase.getCompileCommands() apparently produces
+    # entries for files not listed in compile_commands.json in a best-effort
+    # manner, which we don't want, so we parse the JSON ourselves instead.
+
+    path = args.build_dir / "compile_commands.json"
+
+    try:
+        with path.open("r") as f:
+            return json.load(f)
+    except FileNotFoundError:
+        fatal(f"{path} does not exist")
+
+
+def get_translation_unit_contexts(
+    args: Namespace, compile_commands: Iterable[Mapping[str, str]]
+) -> Sequence[TranslationUnitContext]:
+
+    system_include_paths = get_clang_system_include_paths()
+    check_names = args.check_names or [name for (name, _) in CHECKS]
+
+    contexts = (
+        TranslationUnitContext(
+            absolute_path=str(Path(cmd["directory"], cmd["file"]).resolve()),
+            compilation_working_dir=cmd["directory"],
+            compilation_command=cmd["command"],
+            system_include_paths=system_include_paths,
+            check_names=check_names,
+        )
+        for cmd in compile_commands
+    )
+
+    if args.translation_units:
+
+        allowed_prefixes = [
+            # ensure path exists and is slash-terminated (even if it is a file)
+            os.path.join(path.resolve(strict=True), "")
+            for path in args.translation_units
+        ]
+
+        contexts = (
+            ctx
+            for ctx in contexts
+            if any(
+                (ctx.absolute_path + "/").startswith(prefix)
+                for prefix in allowed_prefixes
+            )
+        )
+
+    context_list = list(contexts)
+
+    if not context_list:
+        fatal("No translation units to analyze")
+
+    return context_list
+
+
+def get_clang_system_include_paths() -> Sequence[str]:
+
+    # libclang does not automatically include clang's standard system include
+    # paths, so we ask clang what they are and include them ourselves.
+
+    # TODO: Is there a less hacky way to do this?
+
+    result = subprocess.run(
+        ["clang", "-E", "-", "-v"],
+        stdin=subprocess.DEVNULL,
+        stdout=subprocess.DEVNULL,
+        stderr=subprocess.PIPE,
+        universal_newlines=True,  # decode stdout/stderr using default encoding
+        check=True,
+    )
+
+    # Module `re` does not support repeated group captures.
+    pattern = (
+        r"#include <...> search starts here:\n"
+        r"((?: \S*\n)+)"
+        r"End of search list."
+    )
+
+    match = re.search(pattern, result.stderr, re.MULTILINE)
+    assert match is not None
+
+    return [line[1:] for line in match.group(1).splitlines()]
+
+
+def fatal(message: str) -> NoReturn:
+    print(f"\033[0;31mERROR: {message}\033[0m")
+    sys.exit(1)
+
+
+# ---------------------------------------------------------------------------- #
+# Analysis
+
+
+@dataclass
+class TranslationUnitContext:
+    absolute_path: str
+    compilation_working_dir: str
+    compilation_command: str
+    system_include_paths: Sequence[str]
+    check_names: Sequence[str]
+
+
+def analyze_translation_units(
+    args: Namespace, contexts: Sequence[TranslationUnitContext]
+) -> NoReturn:
+
+    results: List[bool]
+
+    if not args.profile:
+
+        with Pool(processes=args.threads) as pool:
+            results = pool.map(analyze_translation_unit, contexts)
+
+    else:
+
+        import cProfile
+        import pstats
+
+        profile = cProfile.Profile()
+
+        try:
+            results = profile.runcall(
+                lambda: list(map(analyze_translation_unit, contexts))
+            )
+        finally:
+            stats = pstats.Stats(profile, stream=sys.stderr)
+            stats.strip_dirs()
+            stats.sort_stats("tottime")
+            stats.print_stats()
+
+    print(
+        f"\033[0;34mAnalyzed {len(contexts)}"
+        f" translation unit{'' if len(contexts) == 1 else 's'}.\033[0m"
+    )
+
+    sys.exit(0 if all(results) else 3)
+
+
+def analyze_translation_unit(context: TranslationUnitContext) -> bool:
+
+    # relative to script's original working directory
+    relative_path = os.path.relpath(context.absolute_path)
+
+    # load translation unit
+
+    command = context.compilation_command.split()
+
+    adjusted_command = [
+        # keep the original compilation command name
+        command[0],
+        # ignore unknown GCC warning options
+        "-Wno-unknown-warning-option",
+        # add clang system include paths
+        *(
+            arg
+            for path in context.system_include_paths
+            for arg in ("-isystem", path)
+        ),
+        # keep all other arguments but the last, which is the file name
+        *command[1:-1],
+        # replace relative path to get absolute location information
+        context.absolute_path,
+    ]
+
+    original_cwd = os.getcwd()
+    os.chdir(context.compilation_working_dir)  # for options like -I to work
+
+    try:
+        tu = TranslationUnit.from_source(filename=None, args=adjusted_command)
+    except TranslationUnitLoadError as e:
+        raise RuntimeError(f"Failed to load {relative_path}") from e
+
+    os.chdir(original_cwd)  # to have proper relative paths in messages
+
+    # check for fatal diagnostics
+
+    found_problems = False
+
+    for diag in tu.diagnostics:
+        # consider only Fatal diagnostics, like missing includes
+        if diag.severity >= Diagnostic.Fatal:
+            found_problems = True
+            location = format_location(diag, default=relative_path)
+            print(
+                f"\033[0;33m{location}: {diag.spelling}; this may lead to false"
+                f" positives and negatives\033[0m"
+            )
+
+    # analyze translation unit
+
+    def log(node: Cursor, message: str) -> None:
+        nonlocal found_problems
+        found_problems = True
+        print(f"{format_location(node)}: {message}")
+
+    try:
+        for (name, checker) in CHECKS:
+            if name in context.check_names:
+                checker(tu, context.absolute_path, log)
+    except Exception as e:
+        raise RuntimeError(f"Error analyzing {relative_path}") from e
+
+    return not found_problems
+
+
+# obj must have a location field/property that is a `SourceLocation`.
+def format_location(obj: Any, *, default: str = "(none)") -> str:
+
+    location = obj.location
+
+    if location.file is None:
+        return default
+    else:
+        abs_path = Path(location.file.name).resolve()
+        rel_path = os.path.relpath(abs_path)
+        return f"{rel_path}:{location.line}:{location.column}"
+
+
+# ---------------------------------------------------------------------------- #
+# Checks
+
+Checker = Callable[[TranslationUnit, str, Callable[[Cursor, str], None]], None]
+
+CHECKS: List[Tuple[str, Checker]] = []
+
+
+def check(name: str) -> Callable[[Checker], Checker]:
+    def decorator(checker: Checker) -> Checker:
+        CHECKS.append((name, checker))
+        return checker
+
+    return decorator
+
+
+@check("return-value-never-used")
+def check_return_value_never_used(
+    translation_unit: TranslationUnit,
+    translation_unit_path: str,
+    log: Callable[[Cursor, str], None],
+) -> None:
+    """
+    Report static functions with a non-void return value that no caller ever
+    uses.
+
+    This check is best effort, but should never report false positives (positive
+    being error).
+    """
+
+    def function_occurrence_might_use_return_value(
+        ancestors: Sequence[Cursor], node: Cursor
+    ) -> bool:
+
+        if ancestors[-1].kind.is_statement():
+
+            return False
+
+        elif (
+            ancestors[-1].kind == CursorKind.CALL_EXPR
+            and ancestors[-1].referenced == node.referenced
+        ):
+
+            if not ancestors[-2].kind.is_statement():
+                return True
+
+            if ancestors[-2].kind in [
+                CursorKind.IF_STMT,
+                CursorKind.SWITCH_STMT,
+                CursorKind.WHILE_STMT,
+                CursorKind.DO_STMT,
+                CursorKind.RETURN_STMT,
+            ]:
+                return True
+
+            if ancestors[-2].kind == CursorKind.FOR_STMT:
+                [_init, cond, _adv] = ancestors[-2].get_children()
+                if ancestors[-1] == cond:
+                    return True
+
+            return False
+
+        else:
+
+            # might be doing something with a pointer to the function
+            return True
+
+    # Maps canonical function `Cursor`s to whether we found a place that maybe
+    # uses their return value. Only includes static functions that don't return
+    # void and belong to the translation unit's root file (i.e, were not brought
+    # in by an #include).
+    funcs: Dict[Cursor, bool] = {}
+
+    for [*ancestors, node] in all_paths(translation_unit.cursor):
+
+        if (
+            node.kind == CursorKind.FUNCTION_DECL
+            and node.storage_class == StorageClass.STATIC
+            and node.location.file.name == translation_unit_path
+            and node.type.get_result().get_canonical().kind != TypeKind.VOID
+        ):
+            funcs.setdefault(node.canonical, False)
+
+        if (
+            node.kind == CursorKind.DECL_REF_EXPR
+            and node.referenced.kind == CursorKind.FUNCTION_DECL
+            and node.referenced.canonical in funcs
+            and function_occurrence_might_use_return_value(ancestors, node)
+        ):
+            funcs[node.referenced.canonical] = True
+
+    # ---
+
+    for (cursor, return_value_maybe_used) in funcs.items():
+        if not return_value_maybe_used:
+            log(cursor, f"{cursor.spelling}() return value is never used")
+
+
+# ---------------------------------------------------------------------------- #
+# Traversal
+
+# Hides nodes of kind UNEXPOSED_EXPR.
+def all_paths(root: Cursor) -> Iterable[Sequence[Cursor]]:
+
+    path = []
+
+    def aux(node: Cursor) -> Iterable[Sequence[Cursor]]:
+        nonlocal path
+
+        if node.kind != CursorKind.UNEXPOSED_EXPR:
+            path.append(node)
+            yield path
+
+        for child in node.get_children():
+            yield from aux(child)
+
+        if node.kind != CursorKind.UNEXPOSED_EXPR:
+            path.pop()
+
+    return aux(root)
+
+
+# ---------------------------------------------------------------------------- #
+# Utilities handy for development
+
+
+def print_node(node: Cursor) -> None:
+
+    print(f"{format_location(node)}: kind = {node.kind.name}", end="")
+
+    if node.spelling:
+        print(f", name = {node.spelling}", end="")
+
+    if node.type is not None:
+        print(f", type = {node.type.get_canonical().spelling}", end="")
+
+    if node.referenced is not None:
+        print(f", referenced = {node.referenced.spelling}", end="")
+
+    print()
+
+
+def print_tree(
+    node: Cursor, *, max_depth: Optional[int] = None, indentation_level: int = 0
+) -> None:
+
+    if max_depth is None or max_depth >= 0:
+
+        print(" " * indentation_level, end="")
+        print_node(node)
+
+        for child in node.get_children():
+            print_tree(
+                child,
+                max_depth=None if max_depth is None else max_depth - 1,
+                indentation_level=indentation_level + 1,
+            )
+
+
+# ---------------------------------------------------------------------------- #
+
+if __name__ == "__main__":
+    main()
+
+# ---------------------------------------------------------------------------- #
-- 
2.36.1
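The translation-unit filter in get_translation_unit_contexts() appends a slash to both the user-supplied paths and each translation unit's path before comparing prefixes. The sketch below reproduces that logic under a few assumptions. It assumes POSIX paths. `is_selected` is a hypothetical helper name. posixpath.join stands in for os.path.join so the result does not depend on the host OS. The patch's Path.resolve(strict=True) call is omitted so the example needs no real files.

```python
import posixpath
from typing import Iterable


def is_selected(tu_path: str, selected: Iterable[str]) -> bool:
    """Hypothetical helper mirroring the filter in get_translation_unit_contexts().

    Both sides are slash-terminated before comparing, so a directory argument
    only matches files beneath it, and a file argument only matches itself.
    """
    # posixpath.join(p, "") appends "/" unless p already ends with one
    prefixes = [posixpath.join(p, "") for p in selected]
    return any((tu_path + "/").startswith(prefix) for prefix in prefixes)


# A directory selects the files under it...
print(is_selected("/src/block/qcow2.c", ["/src/block"]))
# ...but not a sibling file that merely shares the same string prefix.
print(is_selected("/src/blockdev.c", ["/src/block"]))
# A file argument selects exactly that file.
print(is_selected("/src/blockdev.c", ["/src/blockdev.c"]))
```

A bare "/src/blockdev.c".startswith("/src/block") would return True. The trailing slash on both sides prevents that false match.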
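get_clang_system_include_paths() runs `clang -E - -v` and extracts the `#include <...>` search list from its stderr. The sketch below applies the patch's regular expression to a hand-written sample of that output. The sample paths are illustrative and not taken from any particular system. `parse_include_paths` is a hypothetical name, and it raises ValueError where the patch uses an assert.

```python
import re
from typing import List

# Same pattern as in get_clang_system_include_paths()
PATTERN = (
    r"#include <...> search starts here:\n"
    r"((?: \S*\n)+)"
    r"End of search list."
)


def parse_include_paths(stderr: str) -> List[str]:
    """Hypothetical helper: extract the <...> search list from `clang -v` stderr."""
    match = re.search(PATTERN, stderr, re.MULTILINE)
    if match is None:
        raise ValueError("include search list not found")
    # Each captured line starts with a single space; drop it.
    return [line[1:] for line in match.group(1).splitlines()]


# Illustrative sample, not real output from a specific clang installation
SAMPLE_STDERR = (
    '#include "..." search starts here:\n'
    "#include <...> search starts here:\n"
    " /usr/local/include\n"
    " /usr/lib/clang/14/include\n"
    " /usr/include\n"
    "End of search list.\n"
)

print(parse_include_paths(SAMPLE_STDERR))
```

Each entry must match ` \S*`. A search-list line containing a space after its leading space (for example, a path with an embedded space) therefore ends the repetition early. The `End of search list.` literal then fails to match, and in the patch `assert match is not None` fires.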
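all_paths() walks the AST depth-first. For every node except UNEXPOSED_EXPR nodes, it yields the list of cursors from the root down to that node. The sketch below reproduces the same generator over a hypothetical `Node` class standing in for clang.cindex.Cursor, with plain string kinds, so it runs without libclang.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for clang.cindex.Cursor."""

    kind: str
    name: str
    children: List["Node"] = field(default_factory=list)


def all_paths(root: Node) -> Iterator[List[Node]]:
    # Same shape as the patch's all_paths(): one shared list, appended to on
    # the way down and popped on the way up. UNEXPOSED_EXPR nodes are not
    # yielded, but their children are still visited.
    path: List[Node] = []

    def aux(node: Node) -> Iterator[List[Node]]:
        hidden = node.kind == "UNEXPOSED_EXPR"
        if not hidden:
            path.append(node)
            yield path
        for child in node.children:
            yield from aux(child)
        if not hidden:
            path.pop()

    return aux(root)


tree = Node(
    "TRANSLATION_UNIT",
    "tu",
    [
        Node(
            "FUNCTION_DECL",
            "f",
            [Node("UNEXPOSED_EXPR", "cast", [Node("DECL_REF_EXPR", "g")])],
        ),
    ],
)

# Unpacking copies the current path, as check_return_value_never_used() does.
for [*ancestors, node] in all_paths(tree):
    print([a.name for a in ancestors], "->", node.name)
```

Every yield hands out the same list object. A caller that keeps the yielded values without copying them, as `list(all_paths(tree))` does, ends up with several references to one list, and that list is empty once the walk finishes. Unpacking as above, or calling `list(path)`, takes a snapshot.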