From: Alberto Faria
To: qemu-devel@nongnu.org
Subject: [RFC v2 01/10] Add an extensible static analyzer
Date: Fri, 29 Jul 2022 14:00:30 +0100
Message-Id: <20220729130040.1428779-2-afaria@redhat.com>
In-Reply-To: <20220729130040.1428779-1-afaria@redhat.com>

Add a static-analyzer.py script that uses libclang's Python bindings to
provide a common framework on which arbitrary static analysis checks can
be developed and run against QEMU's code base.

As an example, a simple "return-value-never-used" check is included that
verifies that the return value of static, non-void functions is used by
at least one caller.

Signed-off-by: Alberto Faria
---
 static-analyzer.py                         | 486 +++++++++++++++++++++
 static_analyzer/__init__.py                | 242 ++++++++++
 static_analyzer/return_value_never_used.py | 117 +++++
 3 files changed, 845 insertions(+)
 create mode 100755 static-analyzer.py
 create mode 100644 static_analyzer/__init__.py
 create mode 100644 static_analyzer/return_value_never_used.py

diff --git a/static-analyzer.py b/static-analyzer.py
new file mode 100755
index 0000000000..3ade422dbf
--- /dev/null
+++ b/static-analyzer.py
@@ -0,0 +1,486 @@
+#!/usr/bin/env python3
+# ---------------------------------------------------------------------------- #
+
+from configparser import ConfigParser
+from contextlib import contextmanager
+from dataclasses import dataclass
+import json
+import os
+import os.path
+import shlex
+import subprocess
+import sys
+import re
+from argparse import ArgumentParser, Namespace, RawDescriptionHelpFormatter
+from multiprocessing import Pool
+from pathlib import Path
+import textwrap
+import time
+from typing import (
+    Iterable,
+    Iterator,
+    List,
+    Mapping,
+    NoReturn,
+    Sequence,
+    Union,
+)
+
+import clang.cindex  # type: ignore
+
+from static_analyzer import CHECKS, CheckContext
+
+# ---------------------------------------------------------------------------- #
+# Usage
+
+
+def parse_args() -> Namespace:
+
+    available_checks = "\n".join(
+        f"  {name}  {' '.join((CHECKS[name].__doc__ or '').split())}"
+        for name in sorted(CHECKS)
+    )
+
+    parser = ArgumentParser(
+        allow_abbrev=False,
+        formatter_class=RawDescriptionHelpFormatter,
+        description=textwrap.dedent(
+            """
+            Checks are best-effort, but should never report false positives.
+
+            This only considers translation units enabled under the given QEMU
+            build configuration. Note that a single .c file may give rise to
+            several translation units.
+
+            You should build QEMU before running this, since some translation
+            units depend on files that are generated during the build. If you
+            don't, you'll get errors, but should never get false negatives.
+            """
+        ),
+        epilog=textwrap.dedent(
+            f"""
+            available checks:
+            {available_checks}
+
+            exit codes:
+            0  No problems found.
+            1  Internal failure.
+            2  Bad usage.
+            3  Problems found in one or more translation units.
+            """
+        ),
+    )
+
+    parser.add_argument(
+        "build_dir",
+        type=Path,
+        help="Path to the build directory.",
+    )
+
+    parser.add_argument(
+        "translation_units",
+        type=Path,
+        nargs="*",
+        help=(
+            "Analyze only translation units whose root source file matches or"
+            " is under one of the given paths."
+        ),
+    )
+
+    # add regular options
+
+    parser.add_argument(
+        "-c",
+        "--check",
+        metavar="CHECK",
+        dest="check_names",
+        choices=sorted(CHECKS),
+        action="append",
+        help=(
+            "Enable the given check. Can be given multiple times. If not given,"
+            " all checks are enabled."
+        ),
+    )
+
+    parser.add_argument(
+        "-j",
+        "--jobs",
+        dest="threads",
+        type=int,
+        default=os.cpu_count() or 1,
+        help=(
+            f"Number of threads to employ. Defaults to {os.cpu_count() or 1} on"
+            f" this machine."
+        ),
+    )
+
+    # add development options
+
+    dev_options = parser.add_argument_group("development options")
+
+    dev_options.add_argument(
+        "--profile",
+        metavar="SORT_KEY",
+        help=(
+            "Profile execution. Forces single-threaded execution. The argument"
+            " specifies how to sort the results; see"
+            " https://docs.python.org/3/library/profile.html#pstats.Stats.sort_stats"
+        ),
+    )
+
+    dev_options.add_argument(
+        "--skip-checks",
+        action="store_true",
+        help="Do everything except actually running the checks.",
+    )
+
+    # parse arguments
+
+    args = parser.parse_args()
+    args.check_names = sorted(set(args.check_names or CHECKS))
+
+    return args
+
+
+# ---------------------------------------------------------------------------- #
+# Main
+
+
+def main() -> int:
+
+    args = parse_args()
+
+    if args.profile:
+
+        import cProfile
+        import pstats
+
+        args.threads = 1  # --profile forces single-threaded execution
+        profile = cProfile.Profile()
+        try:
+            return profile.runcall(lambda: analyze(args))
+        finally:
+            stats = pstats.Stats(profile, stream=sys.stderr)
+            stats.strip_dirs()
+            stats.sort_stats(args.profile)
+            stats.print_stats()
+
+    else:
+
+        return analyze(args)
+
+
+def analyze(args: Namespace) -> int:
+
+    tr_units = get_translation_units(args)
+
+    # analyze translation units
+
+    start_time = time.monotonic()
+    results: List[bool] = []
+
+    if len(tr_units) == 1:
+        progress_suffix = " of 1 translation unit...\033[0m\r"
+    else:
+        progress_suffix = f" of {len(tr_units)} translation units...\033[0m\r"
+
+    def print_progress() -> None:
+        print(f"\033[0;34mAnalyzed {len(results)}" + progress_suffix, end="")
+
+    print_progress()
+
+    def collect_results(results_iter: Iterable[bool]) -> None:
+        if sys.stdout.isatty():
+            for r in results_iter:
+                results.append(r)
+                print_progress()
+        else:
+            for r in results_iter:
+                results.append(r)
+
+    if tr_units:
+
+        if args.threads == 1:
+
+            collect_results(map(analyze_translation_unit, tr_units))
+
+        else:
+
+            # Mimic Python's default pool.map() chunk size, but limit it to
+            # 5 to avoid very big chunks when analyzing thousands of
+            # translation units.
+            chunk_size = min(5, -(-len(tr_units) // (args.threads * 4)))
+
+            with Pool(processes=args.threads) as pool:
+                collect_results(
+                    pool.imap_unordered(
+                        analyze_translation_unit, tr_units, chunk_size
+                    )
+                )
+
+    end_time = time.monotonic()
+
+    # print summary
+
+    if len(tr_units) == 1:
+        message = "Analyzed 1 translation unit"
+    else:
+        message = f"Analyzed {len(tr_units)} translation units"
+
+    message += f" in {end_time - start_time:.1f} seconds."
+
+    print(f"\033[0;34m{message}\033[0m")
+
+    # exit
+
+    return 0 if all(results) else 3
+
+
+# ---------------------------------------------------------------------------- #
+# Translation units
+
+
+@dataclass
+class TranslationUnit:
+    absolute_path: str
+    build_working_dir: str
+    build_command: str
+    system_include_paths: Sequence[str]
+    check_names: Sequence[str]
+
+
+def get_translation_units(args: Namespace) -> Sequence["TranslationUnit"]:
+    """Return a list of translation units to be analyzed."""
+
+    system_include_paths = get_clang_system_include_paths()
+    compile_commands = load_compilation_database(args.build_dir)
+
+    # get all translation units
+
+    tr_units: Iterable[TranslationUnit] = (
+        TranslationUnit(
+            absolute_path=str(Path(cmd["directory"], cmd["file"]).resolve()),
+            build_working_dir=cmd["directory"],
+            build_command=cmd["command"],
+            system_include_paths=system_include_paths,
+            check_names=args.check_names,
+        )
+        for cmd in compile_commands
+    )
+
+    # ignore translation units from git submodules
+
+    repo_root = (args.build_dir / "Makefile").resolve(strict=True).parent
+    module_file = repo_root / ".gitmodules"
+    assert module_file.exists()
+
+    modules = ConfigParser()
+    modules.read(module_file)
+
+    disallowed_prefixes = [
+        # ensure path is slash-terminated
+        os.path.join(repo_root, section["path"], "")
+        for section in modules.values()
+        if "path" in section
+    ]
+
+    tr_units = (
+        ctx
+        for ctx in tr_units
+        if all(
+            not ctx.absolute_path.startswith(prefix)
+            for prefix in disallowed_prefixes
+        )
+    )
+
+    # filter translation units by command line arguments
+
+    if args.translation_units:
+
+        allowed_prefixes = [
+            # ensure path exists and is slash-terminated (even if it is a file)
+            os.path.join(path.resolve(strict=True), "")
+            for path in args.translation_units
+        ]
+
+        tr_units = (
+            ctx
+            for ctx in tr_units
+            if any(
+                (ctx.absolute_path + "/").startswith(prefix)
+                for prefix in allowed_prefixes
+            )
+        )
+
+    # ensure that at least one translation unit is selected
+
+    tr_unit_list = list(tr_units)
+
+    if not tr_unit_list:
+        fatal("No translation units to analyze")
+
+    # disable all checks if --skip-checks was given
+
+    if args.skip_checks:
+        for context in tr_unit_list:
+            context.check_names = []
+
+    return tr_unit_list
+
+
+def get_clang_system_include_paths() -> Sequence[str]:
+
+    # libclang does not automatically include clang's standard system include
+    # paths, so we ask clang what they are and include them ourselves.
+
+    result = subprocess.run(
+        ["clang", "-E", "-", "-v"],
+        stdin=subprocess.DEVNULL,
+        stdout=subprocess.DEVNULL,
+        stderr=subprocess.PIPE,
+        universal_newlines=True,  # decode output using default encoding
+        check=True,
+    )
+
+    # Module `re` does not support repeated group captures.
+    pattern = (
+        r"#include <...> search starts here:\n"
+        r"((?: \S*\n)+)"
+        r"End of search list."
+    )
+
+    match = re.search(pattern, result.stderr, re.MULTILINE)
+    assert match is not None
+
+    return [line[1:] for line in match.group(1).splitlines()]
+
+
+def load_compilation_database(build_dir: Path) -> Sequence[Mapping[str, str]]:
+
+    # clang.cindex.CompilationDatabase.getCompileCommands() apparently produces
+    # entries for files not listed in compile_commands.json in a best-effort
+    # manner, which we don't want, so we parse the JSON ourselves instead.
+
+    path = build_dir / "compile_commands.json"
+
+    try:
+        with path.open("r") as f:
+            return json.load(f)
+    except FileNotFoundError:
+        fatal(f"{path} does not exist")
+
+
+# ---------------------------------------------------------------------------- #
+# Analysis
+
+
+def analyze_translation_unit(tr_unit: TranslationUnit) -> bool:
+
+    check_context = get_check_context(tr_unit)
+
+    try:
+        for name in tr_unit.check_names:
+            CHECKS[name](check_context)
+    except Exception as e:
+        raise RuntimeError(f"Error analyzing {check_context._rel_path}") from e
+
+    return not check_context._problems_found
+
+
+def get_check_context(tr_unit: TranslationUnit) -> CheckContext:
+
+    # relative to script's original working directory
+    rel_path = os.path.relpath(tr_unit.absolute_path)
+
+    # load translation unit
+
+    command = shlex.split(tr_unit.build_command)
+
+    adjusted_command = [
+        # keep the original compilation command name
+        command[0],
+        # ignore unknown GCC warning options
+        "-Wno-unknown-warning-option",
+        # keep all other arguments but the last, which is the file name
+        *command[1:-1],
+        # add clang system include paths
+        *(
+            arg
+            for path in tr_unit.system_include_paths
+            for arg in ("-isystem", path)
+        ),
+        # replace relative path to get absolute location information
+        tr_unit.absolute_path,
+    ]
+
+    # clang can warn about things that GCC doesn't
+    if "-Werror" in adjusted_command:
+        adjusted_command.remove("-Werror")
+
+    # We change directory for options like -I to work, but have to change back
+    # to have correct relative paths in messages.
+    with cwd(tr_unit.build_working_dir):
+
+        try:
+            tu = clang.cindex.TranslationUnit.from_source(
+                filename=None, args=adjusted_command
+            )
+        except clang.cindex.TranslationUnitLoadError as e:
+            raise RuntimeError(f"Failed to load {rel_path}") from e
+
+    if sys.stdout.isatty():
+        # add padding to fully overwrite progress message
+        printer = lambda s: print(s.ljust(50))
+    else:
+        printer = print
+
+    check_context = CheckContext(
+        translation_unit=tu,
+        translation_unit_path=tr_unit.absolute_path,
+        _rel_path=rel_path,
+        _build_working_dir=Path(tr_unit.build_working_dir),
+        _problems_found=False,
+        _printer=printer,
+    )
+
+    # check for error/fatal diagnostics
+
+    for diag in tu.diagnostics:
+        if diag.severity >= clang.cindex.Diagnostic.Error:
+            check_context._problems_found = True
+            location = check_context.format_location(diag)
+            check_context._printer(
+                f"\033[0;33m{location}: {diag.spelling}; this may lead to false"
+                f" positives and negatives\033[0m"
+            )
+
+    return check_context
+
+
+# ---------------------------------------------------------------------------- #
+# Utilities
+
+
+@contextmanager
+def cwd(path: Union[str, Path]) -> Iterator[None]:
+
+    original_cwd = os.getcwd()
+    os.chdir(path)
+
+    try:
+        yield
+    finally:
+        os.chdir(original_cwd)
+
+
+def fatal(message: str) -> NoReturn:
+    print(f"\033[0;31mERROR: {message}\033[0m")
+    sys.exit(1)
+
+
+# ---------------------------------------------------------------------------- #
+
+if __name__ == "__main__":
+    sys.exit(main())
+
+# ---------------------------------------------------------------------------- #
diff --git a/static_analyzer/__init__.py b/static_analyzer/__init__.py
new file mode 100644
index 0000000000..e6ca48d714
--- /dev/null
+++ b/static_analyzer/__init__.py
@@ -0,0 +1,242 @@
+# ---------------------------------------------------------------------------- #
+
+from ctypes import CFUNCTYPE, c_int, py_object
+from dataclasses import dataclass
+from enum import Enum
+import os
+import os.path
+from pathlib import Path
+from importlib import import_module
+from typing import (
+    Any,
+    Callable,
+    Dict,
+    List,
+    Optional,
+    Union,
+)
+
+from clang.cindex import (  # type: ignore
+    Cursor,
+    CursorKind,
+    TranslationUnit,
+    SourceLocation,
+    conf,
+)
+
+# ---------------------------------------------------------------------------- #
+# Monkeypatch clang.cindex
+
+Cursor.__hash__ = lambda self: self.hash  # so `Cursor`s can be dict keys
+
+# ---------------------------------------------------------------------------- #
+# Traversal
+
+
+class VisitorResult(Enum):
+
+    BREAK = 0
+    """Terminates the cursor traversal."""
+
+    CONTINUE = 1
+    """Continues the cursor traversal with the next sibling of the cursor just
+    visited, without visiting its children."""
+
+    RECURSE = 2
+    """Recursively traverse the children of this cursor."""
+
+
+def visit(root: Cursor, visitor: Callable[[Cursor], VisitorResult]) -> bool:
+    """
+    A simple wrapper around `clang_visitChildren()`.
+
+    The `visitor` callback is called for each visited node, with that node as
+    its argument. `root` is NOT visited.
+
+    Unlike a standard `Cursor`, the callback argument will have a `parent` field
+    that points to its parent in the AST. The `parent` will also have its own
+    `parent` field, and so on, unless it is `root`, in which case its `parent`
+    field is `None`. We add this because libclang's `lexical_parent` field is
+    almost always `None` for some reason.
+
+    Returns `False` if the visitation was aborted by the callback returning
+    `VisitorResult.BREAK`. Returns `True` otherwise.
+    """
+
+    tu = root._tu
+    root.parent = None
+
+    # Stores the path from `root` to the node being visited. We need this to set
+    # `node.parent`.
+    path: List[Cursor] = [root]
+
+    exception: List[BaseException] = []
+
+    @CFUNCTYPE(c_int, Cursor, Cursor, py_object)
+    def actual_visitor(node: Cursor, parent: Cursor, client_data: None) -> int:
+
+        try:
+
+            # The `node` and `parent` `Cursor` objects are NOT reused in between
+            # invocations of this visitor callback, so we can't assume that
+            # `parent.parent` is set.
+
+            while path[-1] != parent:
+                path.pop()
+
+            node.parent = path[-1]
+            path.append(node)
+
+            # several clang.cindex methods need Cursor._tu to be set
+            node._tu = tu
+
+            return visitor(node).value
+
+        except BaseException as e:
+
+            # Exceptions can't cross into C. Stash it, abort the visitation, and
+            # reraise it.
+
+            exception.append(e)
+            return VisitorResult.BREAK.value
+
+    result = conf.lib.clang_visitChildren(root, actual_visitor, None)
+
+    if exception:
+        raise exception[0]
+
+    return result == 0
+
+
+# ---------------------------------------------------------------------------- #
+# Node predicates
+
+
+def might_have_attribute(node: Cursor, attr: Union[CursorKind, str]) -> bool:
+    """
+    Check whether any of `node`'s children are an attribute of the given kind,
+    or an attribute of kind `UNEXPOSED_ATTR` with the given `str` spelling.
+
+    This check is best-effort and may erroneously return `True`.
+    """
+
+    if isinstance(attr, CursorKind):
+
+        assert attr.is_attribute()
+
+        def matcher(n: Cursor) -> bool:
+            return n.kind == attr
+
+    else:
+
+        def matcher(n: Cursor) -> bool:
+            if n.kind != CursorKind.UNEXPOSED_ATTR:
+                return False
+            tokens = list(n.get_tokens())
+            # `tokens` can have 0 or more than 1 element when the attribute
+            # comes from a macro expansion. AFAICT, in that case we can't do
+            # better here than tell callers that this might be the attribute
+            # that they're looking for.
+            return len(tokens) != 1 or tokens[0].spelling == attr
+
+    return any(map(matcher, node.get_children()))
+
+
+# ---------------------------------------------------------------------------- #
+# Checks
+
+
+@dataclass
+class CheckContext:
+
+    translation_unit: TranslationUnit
+    translation_unit_path: str  # exactly as reported by libclang
+
+    _rel_path: str
+    _build_working_dir: Path
+    _problems_found: bool
+
+    _printer: Callable[[str], None]
+
+    def format_location(self, obj: Any) -> str:
+        """obj must have a location field/property that is a
+        `SourceLocation`."""
+        return self._format_location(obj.location)
+
+    def _format_location(self, loc: SourceLocation) -> str:
+
+        if loc.file is None:
+            return self._rel_path
+        else:
+            abs_path = (self._build_working_dir / loc.file.name).resolve()
+            rel_path = os.path.relpath(abs_path)
+            return f"{rel_path}:{loc.line}:{loc.column}"
+
+    def report(self, node: Cursor, message: str) -> None:
+        self._problems_found = True
+        self._printer(f"{self.format_location(node)}: {message}")
+
+    def print_node(self, node: Cursor) -> None:
+        """This can be handy when developing checks."""
+
+        print(f"{self.format_location(node)}: kind = {node.kind.name}", end="")
+
+        if node.spelling:
+            print(f", spelling = {node.spelling!r}", end="")
+
+        if node.type is not None:
+            print(f", type = {node.type.get_canonical().spelling!r}", end="")
+
+        if node.referenced is not None:
+            print(f", referenced = {node.referenced.spelling!r}", end="")
+
+        start = self._format_location(node.extent.start)
+        end = self._format_location(node.extent.end)
+        print(f", extent = {start}--{end}")
+
+    def print_tree(
+        self,
+        node: Cursor,
+        *,
+        max_depth: Optional[int] = None,
+        indentation_level: int = 0,
+    ) -> None:
+        """This can be handy when developing checks."""
+
+        if max_depth is None or max_depth >= 0:
+
+            print("  " * indentation_level, end="")
+            self.print_node(node)
+
+            for child in node.get_children():
+                self.print_tree(
+                    child,
+                    max_depth=None if max_depth is None else max_depth - 1,
+                    indentation_level=indentation_level + 1,
+                )
+
+
+Checker = Callable[[CheckContext], None]
+
+CHECKS: Dict[str, Checker] = {}
+
+
+def check(name: str) -> Callable[[Checker], Checker]:
+    def decorator(checker: Checker) -> Checker:
+        assert name not in CHECKS
+        CHECKS[name] = checker
+        return checker
+
+    return decorator
+
+
+# ---------------------------------------------------------------------------- #
+# Import all checks
+
+for path in Path(__file__).parent.glob("**/*.py"):
+    if path.name != "__init__.py":
+        rel_path = path.relative_to(Path(__file__).parent)
+        module = "." + ".".join([*rel_path.parts[:-1], rel_path.stem])
+        import_module(module, __package__)
+
+# ---------------------------------------------------------------------------- #
diff --git a/static_analyzer/return_value_never_used.py b/static_analyzer/return_value_never_used.py
new file mode 100644
index 0000000000..05c06cd4c2
--- /dev/null
+++ b/static_analyzer/return_value_never_used.py
@@ -0,0 +1,117 @@
+# ---------------------------------------------------------------------------- #
+
+from typing import Dict
+
+from clang.cindex import (  # type: ignore
+    Cursor,
+    CursorKind,
+    StorageClass,
+    TypeKind,
+)
+
+from static_analyzer import (
+    CheckContext,
+    VisitorResult,
+    check,
+    might_have_attribute,
+    visit,
+)
+
+# ---------------------------------------------------------------------------- #
+
+
+@check("return-value-never-used")
+def check_return_value_never_used(context: CheckContext) -> None:
+    """Report static functions with a non-void return value that no caller ever
+    uses."""
+
+    # Maps canonical function `Cursor`s to whether we found a place that maybe
+    # uses their return value. Only includes static functions that don't return
+    # void, don't have __attribute__((unused)), and belong to the translation
+    # unit's root file (i.e., were not brought in by an #include).
+    funcs: Dict[Cursor, bool] = {}
+
+    def visitor(node: Cursor) -> VisitorResult:
+
+        if (
+            node.kind == CursorKind.FUNCTION_DECL
+            and node.storage_class == StorageClass.STATIC
+            and node.location.file.name == context.translation_unit_path
+            and node.type.get_result().get_canonical().kind != TypeKind.VOID
+            and not might_have_attribute(node, "unused")
+        ):
+            funcs.setdefault(node.canonical, False)
+
+        if (
+            node.kind == CursorKind.DECL_REF_EXPR
+            and node.referenced.kind == CursorKind.FUNCTION_DECL
+            and node.referenced.canonical in funcs
+            and function_occurrence_might_use_return_value(node)
+        ):
+            funcs[node.referenced.canonical] = True
+
+        return VisitorResult.RECURSE
+
+    visit(context.translation_unit.cursor, visitor)
+
+    for (cursor, return_value_maybe_used) in funcs.items():
+        if not return_value_maybe_used:
+            context.report(
+                cursor, f"{cursor.spelling}() return value is never used"
+            )
+
+
+def function_occurrence_might_use_return_value(node: Cursor) -> bool:
+
+    parent = get_parent_if_unexposed_expr(node.parent)
+
+    if parent.kind.is_statement():
+
+        return False
+
+    elif (
+        parent.kind == CursorKind.CALL_EXPR
+        and parent.referenced == node.referenced
+    ):
+
+        grandparent = get_parent_if_unexposed_expr(parent.parent)
+
+        if not grandparent.kind.is_statement():
+            return True
+
+        if grandparent.kind in [
+            CursorKind.IF_STMT,
+            CursorKind.SWITCH_STMT,
+            CursorKind.WHILE_STMT,
+            CursorKind.DO_STMT,
+            CursorKind.RETURN_STMT,
+        ]:
+            return True
+
+        if grandparent.kind == CursorKind.FOR_STMT:
+
+            [*for_parts, for_body] = grandparent.get_children()
+            if len(for_parts) == 0:
+                return False
+            elif len(for_parts) in [1, 2]:
+                return True  # call may be in condition part of for loop
+            elif len(for_parts) == 3:
+                # Comparison doesn't work properly with `Cursor`s originating
+                # from nested visitations, so we compare the extent instead.
+                return parent.extent == for_parts[1].extent
+            else:
+                assert False
+
+        return False
+
+    else:
+
+        # might be doing something with a pointer to the function
+        return True
+
+
+def get_parent_if_unexposed_expr(node: Cursor) -> Cursor:
+    return node.parent if node.kind == CursorKind.UNEXPOSED_EXPR else node
+
+
+# ---------------------------------------------------------------------------- #
-- 
2.37.1
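
The return-value-never-used check boils down to two passes: collect candidate functions, then mark a candidate as "maybe used" whenever one of its references sits somewhere other than a bare expression statement. As a rough illustration only, the sketch below applies the same idea to Python source with the standard `ast` module instead of libclang; it is an analog, not part of the patch, and `unused_return_values()` is a hypothetical name. Like the C check, it leans toward assuming a value is used (any non-call reference, or any call whose result feeds into something).

```python
import ast
from typing import Dict, List


def unused_return_values(source: str) -> List[str]:
    """Names of top-level functions that return a value but whose result
    is discarded at every call site in `source` (Python analog, sketch)."""
    tree = ast.parse(source)

    # Candidates: top-level functions containing a `return <expr>`.
    # Maps name -> "might the return value be used somewhere?"
    funcs: Dict[str, bool] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and any(
            isinstance(n, ast.Return) and n.value is not None
            for n in ast.walk(node)
        ):
            funcs[node.name] = False

    # Record parent links, much as visit() sets `node.parent`.
    parents: Dict[ast.AST, ast.AST] = {}
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            parents[child] = parent

    for node in ast.walk(tree):
        if not (isinstance(node, ast.Name) and node.id in funcs):
            continue
        call = parents.get(node)
        if not (isinstance(call, ast.Call) and call.func is node):
            # Not a direct call (e.g. passed as a callback): might be used.
            funcs[node.id] = True
        elif not isinstance(parents.get(call), ast.Expr):
            # The result feeds into something other than a bare
            # expression statement, so it might be used.
            funcs[node.id] = True

    return sorted(name for name, used in funcs.items() if not used)


SOURCE = '''
def compute():
    return 42

def helper():
    return 1

def callback():
    return 0

compute()
x = helper()
register(callback)
'''

print(unused_return_values(SOURCE))  # ['compute']
```

Only `compute()` is reported: `helper()` feeds an assignment and `callback` is passed around as a value, so both count as "maybe used", mirroring the conservative fallback in function_occurrence_might_use_return_value().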
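
get_translation_units() compares paths as plain strings, so it slash-terminates each prefix first; otherwise a path argument naming the `block` directory would also select a sibling file such as `blockjob.c`. A minimal sketch of that filter follows, assuming POSIX-style absolute paths that are already resolved (the patch resolves them with `Path.resolve()`, which this sketch skips). `is_under_any()` is a hypothetical name, and `posixpath` is used so the behavior does not depend on the host OS.

```python
import posixpath
from typing import Sequence


def is_under_any(path: str, roots: Sequence[str]) -> bool:
    """True if absolute `path` is one of `roots` or lies below one of them."""
    # posixpath.join(root, "") appends "/" unless it is already there.
    prefixes = [posixpath.join(root, "") for root in roots]
    # Appending "/" to `path` too lets a root name a single file exactly.
    return any((path + "/").startswith(prefix) for prefix in prefixes)


print(is_under_any("/qemu/block/io.c", ["/qemu/block"]))       # True
print(is_under_any("/qemu/blockjob.c", ["/qemu/block"]))       # False
print(is_under_any("/qemu/blockjob.c", ["/qemu/blockjob.c"]))  # True
```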
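
Because libclang does not add clang's built-in include directories by itself, get_clang_system_include_paths() scrapes them from the stderr of `clang -E - -v`. The parsing step can be exercised on its own, as in this sketch; `parse_include_paths()` is a hypothetical name and the sample text is made up (real output differs between installations). One caveat, as far as I can tell: the pattern only accepts lines without spaces, so an annotated entry such as the `(framework directory)` lines that Apple's clang prints would make the search fail.

```python
import re
from typing import List

# Same pattern as get_clang_system_include_paths(): clang lists one
# directory per line, each indented by a single space.
PATTERN = (
    r"#include <...> search starts here:\n"
    r"((?: \S*\n)+)"
    r"End of search list."
)


def parse_include_paths(stderr: str) -> List[str]:
    match = re.search(PATTERN, stderr, re.MULTILINE)
    if match is None:
        raise ValueError("include search list not found")
    # Strip the single leading space from each captured line.
    return [line[1:] for line in match.group(1).splitlines()]


# Made-up sample of `clang -E - -v` stderr, trimmed to the relevant part.
SAMPLE = (
    '#include "..." search starts here:\n'
    "#include <...> search starts here:\n"
    " /usr/lib/clang/14/include\n"
    " /usr/local/include\n"
    " /usr/include\n"
    "End of search list.\n"
)

print(parse_include_paths(SAMPLE))
# ['/usr/lib/clang/14/include', '/usr/local/include', '/usr/include']
```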
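
The chunk size passed to `pool.imap_unordered()` in analyze() is written as `-(-a // b)`, which is ceiling division for a positive divisor: Python's `//` rounds toward negative infinity, so negating before and after rounds up instead. A small sketch of the same expression, with `imap_chunk_size()` as a hypothetical name:

```python
def imap_chunk_size(num_units: int, num_workers: int, cap: int = 5) -> int:
    """ceil(num_units / (4 * num_workers)), capped at `cap`.

    Mirrors the expression used before pool.imap_unordered() in analyze().
    """
    # -(-a // b) == ceil(a / b) for positive b, using only integer math.
    return min(cap, -(-num_units // (num_workers * 4)))


# 10 units on 4 workers: ceil(10 / 16) = 1 unit per chunk.
print(imap_chunk_size(10, 4))    # 1
# 3000 units on 8 workers: ceil(3000 / 32) = 94, capped to 5.
print(imap_chunk_size(3000, 8))  # 5
```

The cap keeps work evenly spread near the end of a large run, at the cost of a little more inter-process traffic than the uncapped default would incur.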