From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E45C03EFD1A; Tue, 17 Mar 2026 18:09:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; cv=none; b=jI6azq/WjbXTzprxcHPjaUjH4lrW+SRh5vV/ETMDh6GzLbdYqrRfjpfl/M/vZlmYuPRU0lASNpMfRTWdzj31Quu/xDHHQfHSbhgYuhRXLomCnnxa/E+E5hEOFJuGRSa5qG2HDeo6nEHqH+WlKBJ0DeaZRjeAF+BhuEm22dIl0pA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; c=relaxed/simple; bh=6GMr10A7PQzOhRN1qRUSN9zmJa91p74hL/Zmw065EaY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=nW9ouy46kClciXRDn25b1e/VWQyGLema3lmx4uuvQfBQ7Pu229/HyHnEbBw93SRVq5ZYSdPDREx7U4rdr6SVSSmmmAGH0M7y9b0Ck+t0+zgKFMQb6UT/jhCYHK59PsVuA/HF8FxTmwGZNXj/na0jksEijKbbVxJOLUulfLALWik= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=LDnfuX/a; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="LDnfuX/a" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5D1E8C4CEF7; Tue, 17 Mar 2026 18:09:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770987; bh=6GMr10A7PQzOhRN1qRUSN9zmJa91p74hL/Zmw065EaY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LDnfuX/amHxig2BSDbFa/2k1vR6IbSOHBVttcxyHGV0Smoa/h8pTMQiPjggluFe8W 34KCmI+cihfgJUfuU7yMOt19nkJYveFGjoZzj1lCPHHNcK9u/4a2YK6BgvSt15++K0 9aB7sdc7uDz3lEp9hIuodfAUDlfA1D2gwa2uIlTAaTAwWCEfOdWteegxUidAkeNS+C 
	 7L636jFds0RvfZk02GeZk8NqbbNyF4DzfGixBcR2sjFQzkc9xvAYrHxAAHYtlWL+cs
	 mZgHIb0xHJn+LUeszQ/5ZzikLhLKnWVsbvNZhYZrztQj8TgqCv87KwDpr6IlThQCBS
	 UKG1Z7Ej/lnLQ==
Received: from mchehab by mail.kernel.org with local (Exim 4.99.1)
	(envelope-from )
	id 1w2YrR-0000000H5FC-25dk;
	Tue, 17 Mar 2026 19:09:45 +0100
From: Mauro Carvalho Chehab
To: Jonathan Corbet , Linux Doc Mailing List
Cc: Mauro Carvalho Chehab ,
	linux-hardening@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Mauro Carvalho Chehab ,
	Shuah Khan
Subject: [PATCH v3 01/22] docs: python: add helpers to run unit tests
Date: Tue, 17 Mar 2026 19:09:21 +0100
Message-ID:
X-Mailer: git-send-email 2.52.0
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Sender: Mauro Carvalho Chehab

While Python's standard library has support for unit tests, its default
output is hard to read. Add a helper module to improve it.

I wrote this module last year while testing some scripts I used
internally. The initial skeleton was generated with the help of LLM
tools, but it was highly modified to ensure that it works as I expect.

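The idea of replacing the default report can be sketched with a small
``unittest.TestResult`` subclass that records one status per test. This is
only an illustration under stated assumptions: ``StatusRecorder``,
``demo_case`` and ``run_and_record`` are hypothetical names, not part of
the helper module added by this patch.

```python
import unittest


class StatusRecorder(unittest.TestResult):
    """Collect one (method name, status) pair per executed test."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.statuses = []

    def _record(self, test, status):
        # test.id() looks like "module.Class.method"; keep the method part.
        self.statuses.append((test.id().split(".")[-1], status))

    def addSuccess(self, test):
        super().addSuccess(test)
        self._record(test, "OK")

    def addFailure(self, test, err):
        super().addFailure(test, err)
        self._record(test, "FAIL")

    def addError(self, test, err):
        super().addError(test, err)
        self._record(test, "ERROR")


def demo_case():
    """Build a hypothetical TestCase used only for this illustration."""

    class Demo(unittest.TestCase):
        def test_fail(self):
            self.assertEqual(1 + 1, 3)

        def test_pass(self):
            self.assertEqual(1 + 1, 2)

    return Demo


def run_and_record(case):
    """Run every test of a TestCase class and return the recorded statuses."""
    suite = unittest.TestLoader().loadTestsFromTestCase(case)
    result = StatusRecorder()
    suite.run(result)
    return result.statuses


# Print an aligned "name: status" table, one line per test.
for name, status in run_and_record(demo_case()):
    print(f"{name + ':':<12}{status}")
```

The recorder keeps calling the parent methods, so counters such as
``failures`` and ``wasSuccessful()`` stay usable by the caller.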
Signed-off-by: Mauro Carvalho Chehab
Message-ID: <37999041f616ddef41e84cf2686c0264d1a51dc9.1773074166.git.mchehab+huawei@kernel.org>
---
 Documentation/tools/python.rst      |   2 +
 Documentation/tools/unittest.rst    |  24 ++
 tools/lib/python/unittest_helper.py | 353 ++++++++++++++++++++++++++++
 3 files changed, 379 insertions(+)
 create mode 100644 Documentation/tools/unittest.rst
 create mode 100755 tools/lib/python/unittest_helper.py

diff --git a/Documentation/tools/python.rst b/Documentation/tools/python.rst
index 1444c1816735..3b7299161f20 100644
--- a/Documentation/tools/python.rst
+++ b/Documentation/tools/python.rst
@@ -11,3 +11,5 @@ Python libraries
    feat
    kdoc
    kabi
+
+   unittest
diff --git a/Documentation/tools/unittest.rst b/Documentation/tools/unittest.rst
new file mode 100644
index 000000000000..14a2b2a65236
--- /dev/null
+++ b/Documentation/tools/unittest.rst
@@ -0,0 +1,24 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Python unittest
+===============
+
+Checking consistency of python modules can be complex. Sometimes, it is
+useful to define a set of unit tests to help check them.
+
+While the actual test implementation is use-case dependent, Python already
+provides a standard way to add unit tests by using ``import unittest``.
+
+Using this module requires setting up a test suite. Also, the default format
+is a little bit awkward. To improve it and provide a more uniform way to
+report errors, some unittest classes and functions are defined.
+
+
+Unittest helper module
+======================
+
+.. 
automodule:: lib.python.unittest_helper + :members: + :show-inheritance: + :undoc-members: diff --git a/tools/lib/python/unittest_helper.py b/tools/lib/python/unittes= t_helper.py new file mode 100755 index 000000000000..55d444cd73d4 --- /dev/null +++ b/tools/lib/python/unittest_helper.py @@ -0,0 +1,353 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# Copyright(c) 2025-2026: Mauro Carvalho Chehab . +# +# pylint: disable=3DC0103,R0912,R0914,E1101 + +""" +Provides helper functions and classes execute python unit tests. + +Those help functions provide a nice colored output summary of each +executed test and, when a test fails, it shows the different in diff +format when running in verbose mode, like:: + + $ tools/unittests/nested_match.py -v + ... + Traceback (most recent call last): + File "/new_devel/docs/tools/unittests/nested_match.py", line 69, in te= st_count_limit + self.assertEqual(replaced, "bar(a); bar(b); foo(c)") + ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + AssertionError: 'bar(a) foo(b); foo(c)' !=3D 'bar(a); bar(b); foo(c)' + - bar(a) foo(b); foo(c) + ? ^^^^ + + bar(a); bar(b); foo(c) + ? ^^^^^ + ... + +It also allows filtering what tests will be executed via ``-k`` parameter. + +Typical usage is to do:: + + from unittest_helper import run_unittest + ... + + if __name__ =3D=3D "__main__": + run_unittest(__file__) + +If passing arguments is needed, on a more complex scenario, it can be +used like on this example:: + + from unittest_helper import TestUnits, run_unittest + ... + env =3D {'sudo': ""} + ... 
+ if __name__ =3D=3D "__main__": + runner =3D TestUnits() + base_parser =3D runner.parse_args() + base_parser.add_argument('--sudo', action=3D'store_true', + help=3D'Enable tests requiring sudo privil= eges') + + args =3D base_parser.parse_args() + + # Update module-level flag + if args.sudo: + env['sudo'] =3D "1" + + # Run tests with customized arguments + runner.run(__file__, parser=3Dbase_parser, args=3Dargs, env=3Denv) +""" + +import argparse +import atexit +import os +import re +import unittest +import sys + +from unittest.mock import patch + + +class Summary(unittest.TestResult): + """ + Overrides ``unittest.TestResult`` class to provide a nice colored + summary. When in verbose mode, displays actual/expected difference in + unified diff format. + """ + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + #: Dictionary to store organized test results. + self.test_results =3D {} + + #: max length of the test names. + self.max_name_length =3D 0 + + def startTest(self, test): + super().startTest(test) + test_id =3D test.id() + parts =3D test_id.split(".") + + # Extract module, class, and method names + if len(parts) >=3D 3: + module_name =3D parts[-3] + else: + module_name =3D "" + if len(parts) >=3D 2: + class_name =3D parts[-2] + else: + class_name =3D "" + + method_name =3D parts[-1] + + # Build the hierarchical structure + if module_name not in self.test_results: + self.test_results[module_name] =3D {} + + if class_name not in self.test_results[module_name]: + self.test_results[module_name][class_name] =3D [] + + # Track maximum test name length for alignment + display_name =3D f"{method_name}:" + + self.max_name_length =3D max(len(display_name), self.max_name_leng= th) + + def _record_test(self, test, status): + test_id =3D test.id() + parts =3D test_id.split(".") + if len(parts) >=3D 3: + module_name =3D parts[-3] + else: + module_name =3D "" + if len(parts) >=3D 2: + class_name =3D parts[-2] + else: + class_name =3D "" + 
method_name =3D parts[-1] + self.test_results[module_name][class_name].append((method_name, st= atus)) + + def addSuccess(self, test): + super().addSuccess(test) + self._record_test(test, "OK") + + def addFailure(self, test, err): + super().addFailure(test, err) + self._record_test(test, "FAIL") + + def addError(self, test, err): + super().addError(test, err) + self._record_test(test, "ERROR") + + def addSkip(self, test, reason): + super().addSkip(test, reason) + self._record_test(test, f"SKIP ({reason})") + + def printResults(self): + """ + Print results using colors if tty. + """ + # Check for ANSI color support + use_color =3D sys.stdout.isatty() + COLORS =3D { + "OK": "\033[32m", # Green + "FAIL": "\033[31m", # Red + "SKIP": "\033[1;33m", # Yellow + "PARTIAL": "\033[33m", # Orange + "EXPECTED_FAIL": "\033[36m", # Cyan + "reset": "\033[0m", # Reset to default terminal col= or + } + if not use_color: + for c in COLORS: + COLORS[c] =3D "" + + # Calculate maximum test name length + if not self.test_results: + return + try: + lengths =3D [] + for module in self.test_results.values(): + for tests in module.values(): + for test_name, _ in tests: + lengths.append(len(test_name) + 1) # +1 for colon + max_length =3D max(lengths) + 2 # Additional padding + except ValueError: + sys.exit("Test list is empty") + + # Print results + for module_name, classes in self.test_results.items(): + print(f"{module_name}:") + for class_name, tests in classes.items(): + print(f" {class_name}:") + for test_name, status in tests: + # Get base status without reason for SKIP + if status.startswith("SKIP"): + status_code =3D status.split()[0] + else: + status_code =3D status + color =3D COLORS.get(status_code, "") + print( + f" {test_name + ':':<{max_length}}{color}{s= tatus}{COLORS['reset']}" + ) + print() + + # Print summary + print(f"\nRan {self.testsRun} tests", end=3D"") + if hasattr(self, "timeTaken"): + print(f" in {self.timeTaken:.3f}s", end=3D"") + print() + + if not 
self.wasSuccessful(): + print(f"\n{COLORS['FAIL']}FAILED (", end=3D"") + failures =3D getattr(self, "failures", []) + errors =3D getattr(self, "errors", []) + if failures: + print(f"failures=3D{len(failures)}", end=3D"") + if errors: + if failures: + print(", ", end=3D"") + print(f"errors=3D{len(errors)}", end=3D"") + print(f"){COLORS['reset']}") + + +def flatten_suite(suite): + """Flatten test suite hierarchy.""" + tests =3D [] + for item in suite: + if isinstance(item, unittest.TestSuite): + tests.extend(flatten_suite(item)) + else: + tests.append(item) + return tests + + +class TestUnits: + """ + Helper class to set verbosity level. + + This class discover test files, import its unittest classes and + executes the test on it. + """ + def parse_args(self): + """Returns a parser for command line arguments.""" + parser =3D argparse.ArgumentParser(description=3D"Test runner with= regex filtering") + parser.add_argument("-v", "--verbose", action=3D"count", default= =3D1) + parser.add_argument("-f", "--failfast", action=3D"store_true") + parser.add_argument("-k", "--keyword", + help=3D"Regex pattern to filter test methods") + return parser + + def run(self, caller_file=3DNone, pattern=3DNone, + suite=3DNone, parser=3DNone, args=3DNone, env=3DNone): + """ + Execute all tests from the unity test file. + + It contains several optional parameters: + + ``caller_file``: + - name of the file that contains test. + + typical usage is to place __file__ at the caller test, e.g.= :: + + if __name__ =3D=3D "__main__": + TestUnits().run(__file__) + + ``pattern``: + - optional pattern to match multiple file names. Defaults + to basename of ``caller_file``. + + ``suite``: + - an unittest suite initialized by the caller using + ``unittest.TestLoader().discover()``. + + ``parser``: + - an argparse parser. If not defined, this helper will create + one. + + ``args``: + - an ``argparse.Namespace`` data filled by the caller. 
+ + ``env``: + - environment variables that will be passed to the test suite + + At least ``caller_file`` or ``suite`` must be used, otherwise a + ``TypeError`` will be raised. + """ + if not args: + if not parser: + parser =3D self.parse_args() + args =3D parser.parse_args() + + if not caller_file and not suite: + raise TypeError("Either caller_file or suite is needed at Test= Units") + + verbose =3D args.verbose + + if not env: + env =3D os.environ.copy() + + env["VERBOSE"] =3D f"{verbose}" + + patcher =3D patch.dict(os.environ, env) + patcher.start() + # ensure it gets stopped after + atexit.register(patcher.stop) + + + if verbose >=3D 2: + unittest.TextTestRunner(verbosity=3Dverbose).run =3D lambda su= ite: suite + + # Load ONLY tests from the calling file + if not suite: + if not pattern: + pattern =3D caller_file + + loader =3D unittest.TestLoader() + suite =3D loader.discover(start_dir=3Dos.path.dirname(caller_f= ile), + pattern=3Dos.path.basename(caller_file= )) + + # Flatten the suite for environment injection + tests_to_inject =3D flatten_suite(suite) + + # Filter tests by method name if -k specified + if args.keyword: + try: + pattern =3D re.compile(args.keyword) + filtered_suite =3D unittest.TestSuite() + for test in tests_to_inject: # Use the pre-flattened list + method_name =3D test.id().split(".")[-1] + if pattern.search(method_name): + filtered_suite.addTest(test) + suite =3D filtered_suite + except re.error as e: + sys.stderr.write(f"Invalid regex pattern: {e}\n") + sys.exit(1) + else: + # Maintain original suite structure if no keyword filtering + suite =3D unittest.TestSuite(tests_to_inject) + + if verbose >=3D 2: + resultclass =3D None + else: + resultclass =3D Summary + + runner =3D unittest.TextTestRunner(verbosity=3Dargs.verbose, + resultclass=3Dresultclass, + failfast=3Dargs.failfast) + result =3D runner.run(suite) + if resultclass: + result.printResults() + + sys.exit(not result.wasSuccessful()) + + +def run_unittest(fname): + """ + Basic 
usage of TestUnits class.
+
+    Use it when there's no need to pass any extra argument to the
+    tests. The recommended way is to place this at the end of each
+    unittest module::
+
+        if __name__ == "__main__":
+            run_unittest(__file__)
+    """
+    TestUnits().run(fname)
-- 
2.52.0

From nobody Wed Apr 8 10:32:43 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id F15C63F0778;
	Tue, 17 Mar 2026 18:09:47 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1773770988; cv=none; b=uHvaBAYBdZed9+8CWdKVAUL94fFfioPiEvTuwaicmiRqrtN9SIi+oSBOc9cFtHSeLTshTO0UcZDS050j1a3+HLgnqfyFT21xsIxp8tDTPt32CKWnKwfHiqhmKxb4r8txTF21nGA5HWrOHq3NZB9uPmBMTWMdJpFhyQXh2CcW3tk=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1773770988; c=relaxed/simple;
	bh=1uxfkQgXX2b/e92MF0mNuJcjR0VUeRhzcj4mkmRc5x0=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type; b=D8A+pIBKjsEQAIG0A+iGjAnhNtzWC62S5dS6ntsu7wxJ32O0xq2lHCmDLg6kJeHT6R/y8d52uerur5mGF5mBCzzwj8o7f03tleDlOQnb9f4nv1j0DbhF9pONe9a84tA0YAS61iRT28o3YQ3eBjnbZ986MzMd5Esu3UTNZHNZUGs=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=nh2+XJk5; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="nh2+XJk5"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id A3675C2BCB0;
	Tue, 17 Mar 2026 18:09:47 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1773770987;
	bh=1uxfkQgXX2b/e92MF0mNuJcjR0VUeRhzcj4mkmRc5x0=;
h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=nh2+XJk5rZtLdAPu9GLsX1AKYBAfq7aJ0h+1GczJKc8UkPxlwYep+j/OEwt1JYHLk yX98iu9v9hriiTOAyh1Ol+VwuJGoBcE7UqFU3GdZsNIOEMmv39xCZpQ12vZVcA+EzP shtkzFd031LvnK1UzYu5wViP1Nq7JYEL+lHp4J03N0QOToXX6rBsYq9GaW5uxu7Y66 6azcdqdg1tSahwP9q0ipfg9JfSi4ZMi/4polcFss9Entg/jnx8Asr7EyWaYtSAL8g0 Jx69/3mlXkJJWIvdY3/PFt+c/b5TS4ehE/mCJ9QkGtEpdg2AYP67wBdSmX2GcXH1Q/ WhES5y7ECvQcA== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrR-0000000H5Gi-2xXp; Tue, 17 Mar 2026 19:09:45 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 02/22] unittests: add a testbench to check public/private kdoc comments Date: Tue, 17 Mar 2026 19:09:22 +0100 Message-ID: X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Add unit tests to check if the public/private and comments strip is working properly. 
Running it shows that, in several cases, public/private handling does
not do what is expected:

  test_private:
    TestPublicPrivate:
        test balanced_inner_private:                              OK
        test balanced_non_greddy_private:                         OK
        test balanced_private:                                    OK
        test no private:                                          OK
        test unbalanced_inner_private:                            FAIL
        test unbalanced_private:                                  FAIL
        test unbalanced_struct_group_tagged_with_private:         FAIL
        test unbalanced_two_struct_group_tagged_first_with_private: FAIL
        test unbalanced_without_end_of_line:                      FAIL

  Ran 9 tests

  FAILED (failures=5)

Signed-off-by: Mauro Carvalho Chehab
Message-ID: <144f4952e0cb74fe9c9adc117e9a21ec8aa1cc10.1773074166.git.mchehab+huawei@kernel.org>
---
 tools/unittests/test_private.py | 331 ++++++++++++++++++++++++++++++++
 1 file changed, 331 insertions(+)
 create mode 100755 tools/unittests/test_private.py

diff --git a/tools/unittests/test_private.py b/tools/unittests/test_private.py
new file mode 100755
index 000000000000..eae245ae8a12
--- /dev/null
+++ b/tools/unittests/test_private.py
@@ -0,0 +1,331 @@
+#!/usr/bin/env python3
+
+"""
+Unit tests for struct/union member extractor class.
+"""
+
+
+import os
+import re
+import unittest
+import sys
+
+from unittest.mock import MagicMock
+
+SRC_DIR = os.path.dirname(os.path.realpath(__file__))
+sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python"))
+
+from kdoc.kdoc_parser import trim_private_members
+from unittest_helper import run_unittest
+
+#
+# List of tests.
+#
+# The code will dynamically generate one test for each key in this dictionary.
+#
+
+#: Tests to check if CTokenizer properly handles public/private comments.
+TESTS_PRIVATE = {
+    #
+    # Simplest case: no private. 
Ensure that trimming won't affect struct + # + "no private": { + "source": """ + struct foo { + int a; + int b; + int c; + }; + """, + "trimmed": """ + struct foo { + int a; + int b; + int c; + }; + """, + }, + + # + # Play "by the books" by always having a public in place + # + + "balanced_private": { + "source": """ + struct foo { + int a; + /* private: */ + int b; + /* public: */ + int c; + }; + """, + "trimmed": """ + struct foo { + int a; + int c; + }; + """, + }, + + "balanced_non_greddy_private": { + "source": """ + struct foo { + int a; + /* private: */ + int b; + /* public: */ + int c; + /* private: */ + int d; + /* public: */ + int e; + + }; + """, + "trimmed": """ + struct foo { + int a; + int c; + int e; + }; + """, + }, + + "balanced_inner_private": { + "source": """ + struct foo { + struct { + int a; + /* private: ignore below */ + int b; + /* public: but this should not be ignored */ + }; + int b; + }; + """, + "trimmed": """ + struct foo { + struct { + int a; + }; + int b; + }; + """, + }, + + # + # Test what happens if there's no public after private place + # + + "unbalanced_private": { + "source": """ + struct foo { + int a; + /* private: */ + int b; + int c; + }; + """, + "trimmed": """ + struct foo { + int a; + }; + """, + }, + + "unbalanced_inner_private": { + "source": """ + struct foo { + struct { + int a; + /* private: ignore below */ + int b; + /* but this should not be ignored */ + }; + int b; + }; + """, + "trimmed": """ + struct foo { + struct { + int a; + }; + int b; + }; + """, + }, + + "unbalanced_struct_group_tagged_with_private": { + "source": """ + struct page_pool_params { + struct_group_tagged(page_pool_params_fast, fast, + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; + }; + struct_group_tagged(page_pool_params_slow, slow, + struct net_device *netdev; + unsigned int queue_idx; + unsigned 
int flags; + /* private: used by test code only */ + void (*init_callback)(netmem_ref netmem, void *arg= ); + void *init_arg; + }; + }; + """, + "trimmed": """ + struct page_pool_params { + struct_group_tagged(page_pool_params_fast, fast, + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; + }; + struct_group_tagged(page_pool_params_slow, slow, + struct net_device *netdev; + unsigned int queue_idx; + unsigned int flags; + }; + }; + """, + }, + + "unbalanced_two_struct_group_tagged_first_with_private": { + "source": """ + struct page_pool_params { + struct_group_tagged(page_pool_params_slow, slow, + struct net_device *netdev; + unsigned int queue_idx; + unsigned int flags; + /* private: used by test code only */ + void (*init_callback)(netmem_ref netmem, void *arg= ); + void *init_arg; + }; + struct_group_tagged(page_pool_params_fast, fast, + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; + }; + }; + """, + "trimmed": """ + struct page_pool_params { + struct_group_tagged(page_pool_params_slow, slow, + struct net_device *netdev; + unsigned int queue_idx; + unsigned int flags; + }; + struct_group_tagged(page_pool_params_fast, fast, + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; + }; + }; + """, + }, + "unbalanced_without_end_of_line": { + "source": """ \ + struct page_pool_params { \ + struct_group_tagged(page_pool_params_slow, slow, \ + struct net_device *netdev; \ + unsigned int queue_idx; \ + unsigned int flags; + /* private: used by test code only */ + void (*init_callback)(netmem_ref netmem, void *arg= ); \ + void *init_arg; \ + }; \ + 
struct_group_tagged(page_pool_params_fast, fast, \ + unsigned int order; \ + unsigned int pool_size; \ + int nid; \ + struct device *dev; \ + struct napi_struct *napi; \ + enum dma_data_direction dma_dir; \ + unsigned int max_len; \ + unsigned int offset; \ + }; \ + }; + """, + "trimmed": """ + struct page_pool_params { + struct_group_tagged(page_pool_params_slow, slow, + struct net_device *netdev; + unsigned int queue_idx; + unsigned int flags; + }; + struct_group_tagged(page_pool_params_fast, fast, + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; + }; + }; + """, + }, +} + + +class TestPublicPrivate(unittest.TestCase): + """ + Main test class. Populated dynamically at runtime. + """ + + def setUp(self): + self.maxDiff =3D None + + def add_test(cls, name, source, trimmed): + """ + Dynamically add a test to the class + """ + def test(cls): + result =3D trim_private_members(source) + + result =3D re.sub(r"\s++", " ", result).strip() + expected =3D re.sub(r"\s++", " ", trimmed).strip() + + msg =3D f"failed when parsing this source:\n" + source + + cls.assertEqual(result, expected, msg=3Dmsg) + + test.__name__ =3D f'test {name}' + + setattr(TestPublicPrivate, test.__name__, test) + + +# +# Populate TestPublicPrivate class +# +test_class =3D TestPublicPrivate() +for name, test in TESTS_PRIVATE.items(): + test_class.add_test(name, test["source"], test["trimmed"]) + + +# +# main +# +if __name__ =3D=3D "__main__": + run_unittest(__file__) --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E45323EB810; Tue, 17 Mar 2026 18:09:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; 
arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; cv=none; b=kMniCi1YWb2DoC0zjE9+VQvRPomRK5cGEGsvNVfRzbu71dejzr6wwuxDTvQhXAzFndOmTiaAbQvZ/GHAN+kpv6Aw9CigMkJSotpb8cxm7a3M5QdLLZl8OUDHC7daJ3VDFpnsAQAuNlQxWlYQHTdoCNCq7OqFJw+B97yFmBR8ne4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; c=relaxed/simple; bh=xUBIjCPiqikLumDJV0/ro119OoQ7PDq8BR8ypLrh5DM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=hDc5Ah1aFIK7K+GUX8dvy5J1n6uaWB16AgSwZRGmakzUr5316p7biYJDGsr396NWglkma52JIM1miJqJJUHeyo5ZlhfYdgRqRNBmaUhbzXtVATs80ibe7jqJzJZ2+7V1iPWom49i2JK1Khfo2SlNJbBP82N9FAnz4TKYIAANUag= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=iO9bch1Q; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="iO9bch1Q" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 98CF2C2BC9E; Tue, 17 Mar 2026 18:09:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770987; bh=xUBIjCPiqikLumDJV0/ro119OoQ7PDq8BR8ypLrh5DM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=iO9bch1QYjwx5A64hjbjXnwvv1lSbjisITynowNNomQaF7acj9NwYa0naYkwxLCjw 7zAbSYj8SnnAWXAKj2scVHfqBovy/RQNKuBOjZhkxdoM3mt2P76qMEKHcRO6upc1d1 C3l1rt3BSrQ4b0D3SyDffRdIFSL13FDUwbmrvkdO46SyWqeASwMYlCafDuHYHBH+mQ Q6gNkTDqRoctJaEN0GfzuLbsuiCBWkCvArMOK3zq8i79ttKiQde6iqq3xc0A81p9RK G0Nfh26c522cezt0F1o1Sh0iOsFQcHDOmS2sjVo3VNUQftHF/5thXLDwOHAEqNOeie 5c2cQBAfnoIRA== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrR-0000000H5Hy-3oxl; Tue, 17 Mar 2026 19:09:45 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, 
	linux-kernel@vger.kernel.org,
	Aleksandr Loktionov ,
	Randy Dunlap
Subject: [PATCH v3 03/22] docs: kdoc: don't add broken comments inside prototypes
Date: Tue, 17 Mar 2026 19:09:23 +0100
Message-ID: <12ac4a97e2bd5a19d6537122c10098690c38d2c7.1773770483.git.mchehab+huawei@kernel.org>
X-Mailer: git-send-email 2.52.0
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Sender: Mauro Carvalho Chehab

Parsing a file like drivers/scsi/isci/host.h, which contains broken
kernel-doc markup, creates a prototype that contains unmatched comment
terminators.

That causes, for instance, struct sci_power_control to be shown with
this prototype:

    struct sci_power_control { * it is not. */ bool timer_started; */
    struct sci_timer timer; * requesters field. */ u8 phys_waiting; */
    u8 phys_granted_power; * mapped into requesters via struct
    sci_phy.phy_index */ struct isci_phy *requesters[SCI_MAX_PHYS]; };

as comments won't start with "/*" anymore.

Fix the logic to detect such cases, and keep adding the comments
inside it.

Signed-off-by: Mauro Carvalho Chehab
Message-ID: <18e577dbbd538dcc22945ff139fe3638344e14f0.1773074166.git.mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov
---
 tools/lib/python/kdoc/kdoc_parser.py | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index edf70ba139a5..086579d00b5c 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -1355,6 +1355,12 @@ class KernelDoc:
         elif doc_content.search(line):
             self.emit_msg(ln, f"Incorrect use of kernel-doc format: {line}")
             self.state = state.PROTO
+
+            #
+            # Don't let it add partial comments at the code, as it breaks the
+            # logic meant to remove comments from prototypes.
+ # + self.process_proto_type(ln, "/**\n" + line) # else ... ?? =20 def process_inline_text(self, ln, line): --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4127C3F7A80; Tue, 17 Mar 2026 18:09:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; cv=none; b=n0YSoF5B9K2ELP4p2QEPO+R1wGt3EMS4iPW0Z8BV6v1IoWXkXn+CISYFqUweRb/6dcJoVG1q7Xq7gN8rKIu6//Jts58xolGP7T52mbCJNPhdzdxvzNTuvLX1rKtf6M1UxOUgztPbvLAjoJdiQ1xA5eiT4Oak4Pshc/bHAU0S0M0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; c=relaxed/simple; bh=n3BVrBoQMdxdpoipeTxJikhOj3p5syl4ggfaedUESPo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=CoLjQD0Veio0oiyBORQrUilVzUFlp+N41zjIjNcJdbuaQndxqCz5amx5CNDrL4yJK78anS2LZhXZ/vKbFEQRwZq9/1+NZ9w2lKCIK0tNdgCKERJf8h+w0wRNVEWOUTzX7F9gRWFBGBIZil/e780cReqzZ/q8i11BebA2uuVR/gc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ENpKJELi; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ENpKJELi" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0BF6AC2BCB1; Tue, 17 Mar 2026 18:09:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770988; bh=n3BVrBoQMdxdpoipeTxJikhOj3p5syl4ggfaedUESPo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ENpKJELilG38U9Y6eJMkTxNuBgIDfaRk2n/RPMGBhDzD6xGf0hRvllAxtMXsmjR9s N0cg/rqxMnxbaiu6xpMwk5u7eQj4KODo6GWTjIZDQTbe0KfzLUEK7j44rg7D0Os8pD 
/y2GyIl7DIXj/jo24Q3TuOdeozr3yJwi3lLRbIoetRYSJUALYLevHEoNmarumV+JHt sC0kq7bp9Xt++8lyosR1aTiwcUVTkavHy7ejD8KzVVdM5KzpK0CvoRF4pxLgbUUEwa GhCe9BobfbTNjVqCmxzotreiHbK6hy66SYX7bS6zH4EMPwI/5+E8ZAd6sEsgc9kmgx t+sbJ2KsHsvtA== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrS-0000000H5JB-0QU3; Tue, 17 Mar 2026 19:09:46 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH v3 04/22] docs: kdoc: properly handle empty enum arguments Date: Tue, 17 Mar 2026 19:09:24 +0100 Message-ID: <640784283d52c5fc52ea597344ecd567e2fb6e22.1773770483.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Depending on how the enum proto is written, a comma at the end may incorrectly make kernel-doc parse an arg like " ". Strip spaces before checking if arg is empty. 
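The effect can be illustrated with a small stand-alone sketch. It uses the
standard ``re`` module in place of kernel-doc's ``KernRe`` wrapper, and
the function name ``enum_members`` is hypothetical; it only mirrors the
loop touched by this patch.

```python
import re


def enum_members(members):
    """Return enum member names, skipping whitespace-only entries."""
    # Drop parenthesised expressions such as the "(2)" in "BIT(2)".
    members = re.sub(r'\([^;)]*\)', '', members)

    names = []
    for arg in members.split(','):
        # Keep only the leading identifier of each entry.
        arg = re.sub(r'^\s*(\w+).*', r'\1', arg)

        # A trailing comma leaves an entry made only of spaces, which is
        # truthy as a string; strip it before testing for emptiness.
        if not arg.strip():
            continue

        names.append(arg)
    return names


print(enum_members("FOO = 1, BAR = BIT(2), "))
```

With a plain ``if not arg`` test, the last entry (a single space) would be
kept and reported as an undescribed enum value.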
Signed-off-by: Mauro Carvalho Chehab Message-ID: <4182bfb7e5f5b4bbaf05cee1bede691e56247eaf.1773074166.git.mcheha= b+huawei@kernel.org> --- tools/lib/python/kdoc/kdoc_parser.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/k= doc_parser.py index 086579d00b5c..4b3c555e6c8e 100644 --- a/tools/lib/python/kdoc/kdoc_parser.py +++ b/tools/lib/python/kdoc/kdoc_parser.py @@ -810,9 +810,10 @@ class KernelDoc: member_set =3D set() members =3D KernRe(r'\([^;)]*\)').sub('', members) for arg in members.split(','): - if not arg: - continue arg =3D KernRe(r'^\s*(\w+).*').sub(r'\1', arg) + if not arg.strip(): + continue + self.entry.parameterlist.append(arg) if arg not in self.entry.parameterdescs: self.entry.parameterdescs[arg] =3D self.undescribed --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 507EE3EDADB; Tue, 17 Mar 2026 18:09:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; cv=none; b=tO/+7/1hWEoq0UaLeCRpAyDrNq9AkY4BWl6M5efi/7YFaGFLk9dVGZN6bsZVamrP3mbBmsIXNZFQy10R1F95rcemlaBXQv2ft7k07c77O5FdAd6NLy8bJyJWtOb6wDMFDoND2u8qSConn1cUwtCxqqnyXt7oMDooQIMBFNcfWV8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; c=relaxed/simple; bh=O4LZ3gEEiGD2HRSpZdkl7bqURixuOAqHiC1+OrI1vh0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=oP7wOls22SsFhBzZXlj7ehd3v/Axh0uHE0l7j2qRSxAAxUZ4l1jOIrVPC6fDDd1hLFIYYq1DL0BSWKJluaU4IfpOxQddi9IsqeziJdBkc/gfOx7W0v4Xqd2dAntKzH1ywqVGD41oeTAJV8gq6NIft2sYRz5Y0hbpCdTcshhnZoc= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=KfOjxvpI; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="KfOjxvpI" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0C150C2BCB5; Tue, 17 Mar 2026 18:09:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770988; bh=O4LZ3gEEiGD2HRSpZdkl7bqURixuOAqHiC1+OrI1vh0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=KfOjxvpI+nGFCa2tTsHgcDE07LXcZZ7PGI/Le7FAWofG3qTR8rLyF58B35zQbgg+d kG67fL6M6isBffQrVGGykuzAm5RSIzllf/j1q9S7wGuGKfAfAcEIgePT8r5RJD+q+O LA/CfdDm0M7upOoGuO5OvWyXky4A+MbsO/llX6/Ktm4HEejwa4GUSrvZmQyfP5kzio 8U13WDcUMBj1oldFuVSEFhoASh71QDDgTwvwad0Ahz5zAut1mXQxJ4gHZQb/bmRbvH CiAd9AGlD2LDA7jjw0Y4ertd09p8a16IQD+8ol3xvOYU/ENOaoelHcu8OhAQNFBDCr +sQhVP8yFb2JA== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrS-0000000H5KT-1FEU; Tue, 17 Mar 2026 19:09:46 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 05/22] docs: add a C tokenizer to be used by kernel-doc Date: Tue, 17 Mar 2026 19:09:25 +0100 Message-ID: <39787bb8022e10c65df40c746077f7f66d07ffed.1773770483.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Handling C code purely using regular expressions doesn't work well. Add a C tokenizer to help doing it the right way. 
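As a rough illustration of why a tokenizer helps, the sketch below follows the approach of the Python re documentation example: one named group per token kind, joined into a single alternation and scanned with finditer(). The token table and the `tokenize` name are simplified assumptions made for this example, not the contents of c_lex.py.

```python
import re

# Simplified token table. Order matters: the first alternative wins.
TOKEN_SPEC = [
    ("COMMENT",  r"//[^\n]*|/\*[\s\S]*?\*/"),
    ("STRING",   r'"(?:\\.|[^"\\])*"'),
    ("NUMBER",   r"\d+"),
    ("NAME",     r"[A-Za-z_]\w*"),
    ("PUNC",     r"[;,{}()\[\]]"),
    ("OP",       r"[=*+\-/]"),
    ("SPACE",    r"\s+"),
    ("MISMATCH", r"."),
]

SCANNER = re.compile("|".join(f"(?P<{name}>{pattern})"
                              for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Yield (kind, value) pairs, skipping whitespace."""
    for match in SCANNER.finditer(source):
        # lastgroup is the name of the alternative that matched.
        if match.lastgroup != "SPACE":
            yield match.lastgroup, match.group()


# A ';' inside a string or a comment stays inside its own token, which
# is hard to guarantee with standalone regular expressions.
for kind, value in tokenize('char *s = "a;b"; /* x; */'):
    print(kind, value)
```

Only one of the three ';' characters in the input is reported as a PUNC token. The other two remain part of the STRING and COMMENT tokens.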
The tokenizer is based on the tokenizer example from the Python re documentation: https://docs.python.org/3/library/re.html#writing-a-tokenizer Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/c_lex.py | 292 +++++++++++++++++++++++++++++++++ 1 file changed, 292 insertions(+) create mode 100644 tools/lib/python/kdoc/c_lex.py diff --git a/tools/lib/python/kdoc/c_lex.py b/tools/lib/python/kdoc/c_lex.py new file mode 100644 index 000000000000..9d726f821f3f --- /dev/null +++ b/tools/lib/python/kdoc/c_lex.py @@ -0,0 +1,292 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# Copyright(c) 2025: Mauro Carvalho Chehab . + +""" +Regular expression ancillary classes. + +Those help caching regular expressions and do matching for kernel-doc. + +Please notice that the code here may raise exceptions to indicate bad +usage inside kdoc to indicate problems at the replace pattern. + +Other errors are logged via log instance. +""" + +import logging +import re + +from .kdoc_re import KernRe + +log =3D logging.getLogger(__name__) + + +class CToken(): + """ + Data class to define a C token. + """ + + # Tokens that can be used by the parser. Works like a C enum. + + COMMENT =3D 0 #: A standard C or C99 comment, including delimiter. + STRING =3D 1 #: A string, including quotation marks. + CHAR =3D 2 #: A character, including apostrophes. + NUMBER =3D 3 #: A number. + PUNC =3D 4 #: A punctuation mark: ``,`` / ``.``. + BEGIN =3D 5 #: A begin character: ``{`` / ``[`` / ``(``. + END =3D 6 #: An end character: ``}`` / ``]`` / ``)``. + CPP =3D 7 #: A preprocessor macro. + HASH =3D 8 #: The hash character - useful to handle other macro= s. + OP =3D 9 #: A C operator (add, subtract, ...). + STRUCT =3D 10 #: A ``struct`` keyword. + UNION =3D 11 #: A ``union`` keyword. + ENUM =3D 12 #: An ``enum`` keyword. + TYPEDEF =3D 13 #: A ``typedef`` keyword. + NAME =3D 14 #: A name. Can be an ID or a type.
+ SPACE =3D 15 #: Any space characters, including new lines + ENDSTMT =3D 16 #: End of a statement (``;``). + + BACKREF =3D 17 #: Not a valid C sequence, but used at sub regex pat= terns. + + MISMATCH =3D 255 #: An error indicator: should never happen in practi= ce. + + # Dict to convert from an enum integer into a string. + _name_by_val =3D {v: k for k, v in dict(vars()).items() if isinstance(= v, int)} + + # Dict to convert from string to an enum-like integer value. + _name_to_val =3D {k: v for v, k in _name_by_val.items()} + + @staticmethod + def to_name(val): + """Convert from an integer value from CToken enum into a string""" + + return CToken._name_by_val.get(val, f"UNKNOWN({val})") + + @staticmethod + def from_name(name): + """Convert a string into a CToken enum value""" + if name in CToken._name_to_val: + return CToken._name_to_val[name] + + return CToken.MISMATCH + + + def __init__(self, kind, value=3DNone, pos=3D0, + brace_level=3D0, paren_level=3D0, bracket_level=3D0): + self.kind =3D kind + self.value =3D value + self.pos =3D pos + self.level =3D (bracket_level, paren_level, brace_level) + + def __repr__(self): + name =3D self.to_name(self.kind) + if isinstance(self.value, str): + value =3D '"' + self.value + '"' + else: + value =3D self.value + + return f"CToken(CToken.{name}, {value}, {self.pos}, {self.level})" + +#: Regexes to parse C code, transforming it into tokens.
+RE_SCANNER_LIST =3D [ + # + # Note that \s\S is different than .*, as it also catches \n + # + (CToken.COMMENT, r"//[^\n]*|/\*[\s\S]*?\*/"), + + (CToken.STRING, r'"(?:\\.|[^"\\])*"'), + (CToken.CHAR, r"'(?:\\.|[^'\\])'"), + + (CToken.NUMBER, r"0[xX][\da-fA-F]+[uUlL]*|0[0-7]+[uUlL]*|" + r"\d+(?:\.\d*)?(?:[eE][+-]?\d+)?[fFlL]*"), + + (CToken.ENDSTMT, r"(?:\s+;|;)"), + + (CToken.PUNC, r"[,\.]"), + + (CToken.BEGIN, r"[\[\(\{]"), + + (CToken.END, r"[\]\)\}]"), + + (CToken.CPP, r"#\s*(?:define|include|ifdef|ifndef|if|else|elif|end= if|undef|pragma)\b"), + + (CToken.HASH, r"#"), + + (CToken.OP, r"\+\+|\-\-|\->|=3D=3D|\!=3D|<=3D|>=3D|&&|\|\||<<|>>|= \+=3D|\-=3D|\*=3D|/=3D|%=3D" + r"|&=3D|\|=3D|\^=3D|[=3D\+\-\*/%<>&\|\^~!\?\:]"), + + (CToken.STRUCT, r"\bstruct\b"), + (CToken.UNION, r"\bunion\b"), + (CToken.ENUM, r"\benum\b"), + (CToken.TYPEDEF, r"\btypedef\b"), + + (CToken.NAME, r"[A-Za-z_]\w*"), + + (CToken.SPACE, r"\s+"), + + (CToken.BACKREF, r"\\\d+"), + + (CToken.MISMATCH,r"."), +] + +def fill_re_scanner(token_list): + """Ancillary routine to convert RE_SCANNER_LIST into a finditer regex"= "" + re_tokens =3D [] + + for kind, pattern in token_list: + name =3D CToken.to_name(kind) + re_tokens.append(f"(?P<{name}>{pattern})") + + return KernRe("|".join(re_tokens), re.MULTILINE | re.DOTALL) + +#: Handle C continuation lines. +RE_CONT =3D KernRe(r"\\\n") + +RE_COMMENT_START =3D KernRe(r'/\*\s*') + +#: tokenizer regex. Will be filled at the first CTokenizer usage. +RE_SCANNER =3D fill_re_scanner(RE_SCANNER_LIST) + + +class CTokenizer(): + """ + Scan C statements and definitions and produce tokens. + + When converted to string, it drops comments and handle public/private + values, respecting depth. + """ + + # This class is inspired and follows the basic concepts of: + # https://docs.python.org/3/library/re.html#writing-a-tokenizer + + def __init__(self, source=3DNone, log=3DNone): + """ + Create a regular expression to handle RE_SCANNER_LIST. 
+ + While I generally don't like using regex group naming via: + (?P<name>...) + + in this particular case, it makes sense, as we can pick the name + when matching a code via RE_SCANNER. + """ + + self.tokens =3D [] + + if not source: + return + + if isinstance(source, list): + self.tokens =3D source + return + + # + # While we could just use _tokenize directly via iterator, + # as we'll need to use the tokenizer several times inside kernel-d= oc + # to handle macro transforms, cache the results on a list, as + # re-using it is cheaper than having to parse every time. + # + for tok in self._tokenize(source): + self.tokens.append(tok) + + def _tokenize(self, source): + """ + Iterator that parses ``source``, splitting it into tokens, as defi= ned + at ``RE_SCANNER_LIST``. + + The iterator yields CToken class objects. + """ + + # Handle continuation lines. Note that kdoc_parser already has a + # logic to do that. Still, let's keep it for completeness, as we m= ight + # end re-using this tokenizer outside kernel-doc some day - or we = may + # eventually remove from there as a future cleanup.
+ source =3D RE_CONT.sub("", source) + + brace_level =3D 0 + paren_level =3D 0 + bracket_level =3D 0 + + for match in RE_SCANNER.finditer(source): + kind =3D CToken.from_name(match.lastgroup) + pos =3D match.start() + value =3D match.group() + + if kind =3D=3D CToken.MISMATCH: + log.error(f"Unexpected token '{value}' on pos {pos}:\n\t'{= source}'") + elif kind =3D=3D CToken.BEGIN: + if value =3D=3D '(': + paren_level +=3D 1 + elif value =3D=3D '[': + bracket_level +=3D 1 + else: # value =3D=3D '{' + brace_level +=3D 1 + + elif kind =3D=3D CToken.END: + if value =3D=3D ')' and paren_level > 0: + paren_level -=3D 1 + elif value =3D=3D ']' and bracket_level > 0: + bracket_level -=3D 1 + elif brace_level > 0: # value =3D=3D '}' + brace_level -=3D 1 + + yield CToken(kind, value, pos, + brace_level, paren_level, bracket_level) + + def __str__(self): + out=3D"" + show_stack =3D [True] + + for i, tok in enumerate(self.tokens): + if tok.kind =3D=3D CToken.BEGIN: + show_stack.append(show_stack[-1]) + + elif tok.kind =3D=3D CToken.END: + prev =3D show_stack[-1] + if len(show_stack) > 1: + show_stack.pop() + + if not prev and show_stack[-1]: + # + # Try to preserve indent + # + out +=3D "\t" * (len(show_stack) - 1) + + out +=3D str(tok.value) + continue + + elif tok.kind =3D=3D CToken.COMMENT: + comment =3D RE_COMMENT_START.sub("", tok.value) + + if comment.startswith("private:"): + show_stack[-1] =3D False + show =3D False + elif comment.startswith("public:"): + show_stack[-1] =3D True + + continue + + if not show_stack[-1]: + continue + + if i < len(self.tokens) - 1: + next_tok =3D self.tokens[i + 1] + + # Do some cleanups before ";" + + if (tok.kind =3D=3D CToken.SPACE and + next_tok.kind =3D=3D CToken.PUNC and + next_tok.value =3D=3D ";"): + + continue + + if (tok.kind =3D=3D CToken.PUNC and + next_tok.kind =3D=3D CToken.PUNC and + tok.value =3D=3D ";" and + next_tok.kind =3D=3D CToken.PUNC and + next_tok.value =3D=3D ";"): + + continue + + out +=3D str(tok.value) + + 
return out --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 547793F7A88; Tue, 17 Mar 2026 18:09:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; cv=none; b=SZb8xWXN548w69ttxES0yT9MiPAcFAFq9Zk6B83RMgNXvYu7rtIgnqV/mhjtr87ufpBAKpaGoYvKZ8O9NOJEngxcA/S3O7kk6YHNOWLgTHX7s8mWUOqrV0B+dHgk0/bUbirqxx4zrLwB6sAVh4poDAMgxouYCS2orSHgOWykO9A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; c=relaxed/simple; bh=0vOonWQOVj1qAHLjTSR5HkO7zRxJDQfYV8AFJ54huG4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=VPgZQZDcB//eVOG9tVzsPXMBN3UXqZ2IKufNhE3nD4y83LhrwUj8iaiE1aY2GB7lK2i3z1h7sUUy5wgcyCQ8cqwV7LGXWnwq7zhJzQ61gTah5GjYg4gYpJ0SeFayYgtLbC2c8D3ykOOdQpHWzUXA+OnknUGEx/IVCoScKZFhvlQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DgdPbz6l; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DgdPbz6l" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 33C6DC2BCAF; Tue, 17 Mar 2026 18:09:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770988; bh=0vOonWQOVj1qAHLjTSR5HkO7zRxJDQfYV8AFJ54huG4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=DgdPbz6lLyies0+bMQgWXkS8c6OPUVdxJwDTzkRNt+2RCtV4Z4F8hiV2WSyOvD43T FwH6QFqyFvJOkrhxLALKGii1+TVJUP29lj2YiaHB09UOH5nRV+mLrW+bcrQkt9Tg7u SwUyZEK8wb9JKJzV2EpLxY7eNY6TxhnGYsb6jeSdzr9HTWXceq7Pr4vajWYD5/TitW 
k6dDXzSk1FhNMyp8Rrt4A9wRl0ueAPNdo2SVuX/ggXOx8UBAJtaArEt/90R3wvmql0 N7qddQx5ZZMFoYFOiowHOcfgrCPaF2zYEK5h4lEuqNzqE1B2nU4kV4PoYpUPpzK10t tdt8JGeLC8xLw== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrS-0000000H5Lh-24iS; Tue, 17 Mar 2026 19:09:46 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH v3 06/22] docs: kdoc: use tokenizer to handle comments on structs Date: Tue, 17 Mar 2026 19:09:26 +0100 Message-ID: <054763260f7b5459ad0738ed906d7c358d640692.1773770483.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Better handle comments inside structs. After those changes, all unittests now pass: test_private: TestPublicPrivate: test balanced_inner_private: OK test balanced_non_greddy_private: OK test balanced_private: OK test no private: OK test unbalanced_inner_private: OK test unbalanced_private: OK test unbalanced_struct_group_tagged_with_private: OK test unbalanced_two_struct_group_tagged_first_with_private: OK test unbalanced_without_end_of_line: OK Ran 9 tests This also solves a bug when handling STRUCT_GROUP() with a private comment on it: @@ -397134,7 +397134,7 @@ basic V4L2 device-level support. 
unsigned int max_len; unsigned int offset; struct page_pool_params_slow slow; - STRUCT_GROUP( struct net_device *netdev; + struct net_device *netdev; unsigned int queue_idx; unsigned int flags; }; Signed-off-by: Mauro Carvalho Chehab Message-ID: Reviewed-by: Aleksandr Loktionov --- tools/lib/python/kdoc/kdoc_parser.py | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/k= doc_parser.py index 4b3c555e6c8e..62d8030cf532 100644 --- a/tools/lib/python/kdoc/kdoc_parser.py +++ b/tools/lib/python/kdoc/kdoc_parser.py @@ -13,6 +13,7 @@ import sys import re from pprint import pformat =20 +from kdoc.c_lex import CTokenizer from kdoc.kdoc_re import NestedMatch, KernRe from kdoc.kdoc_item import KdocItem =20 @@ -84,15 +85,9 @@ def trim_private_members(text): """ Remove ``struct``/``enum`` members that have been marked "private". """ - # First look for a "public:" block that ends a private region, then - # handle the "private until the end" case. - # - text =3D KernRe(r'/\*\s*private:.*?/\*\s*public:.*?\*/', flags=3Dre.S)= .sub('', text) - text =3D KernRe(r'/\*\s*private:.*', flags=3Dre.S).sub('', text) - # - # We needed the comments to do the above, but now we can take them out. 
- # - return KernRe(r'\s*/\*.*?\*/\s*', flags=3Dre.S).sub('', text).strip() + + tokens =3D CTokenizer(text) + return str(tokens) =20 class state: """ --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9252A3F7AB6; Tue, 17 Mar 2026 18:09:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; cv=none; b=A0wcERhg2REZcvHmn+mlzZMOYcQoOFDl1pmv6fiLJFIVPbPj3jx2S3ILcY5e60rA1SyxoC5FvtYOdMQgnrCokTWVHiGO5eM2ws++ueEJp0mGs6u+cQp3ye00WmO8pQhTF2ywMQtufu4eGZ6yF6aUnmnvPfeEzaufjcBv8sBLId4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; c=relaxed/simple; bh=LlO8G0kmdnWIAU5jvxuyWy2ICYFHWQfkFHgsDP9agHw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=QDe+a8FIhf579mW/gjFl7V97H+YPAftc1CNeYWJ8yamWCC9HW3+WPBo+zWmztSNWdqJZE2Xhhv6r1viLbN7eX+Y0ytuODL+ecWHLi81sVzyaUy3XY8lMOXh8UWz9nLGGueEHN3Li9odMO+KaJcAhrj706XoO7Fs6t7PJpvsuaw8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=fCKlRDzm; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="fCKlRDzm" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 70276C2BCB0; Tue, 17 Mar 2026 18:09:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770988; bh=LlO8G0kmdnWIAU5jvxuyWy2ICYFHWQfkFHgsDP9agHw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fCKlRDzmCOQeI8wuBXD9T81ZF0k6Zk0ub0YSz0r3aG0Kp+JU0Q6nl71hmqD5i7McL 
iPKIHfMbrxiz9K3OYso5+Bc1sCNbH4bCuLFIhZccAgeO3NuvTSchsTEC5U0OqH/zUw Jl0WQ7HUO3b4tuQPcrP4uv61biE1X/eYbSfwHt/X6GOathuYYZRfOvtk9/M1wsB33r ntUBlSaAZ45ppHTGWvMZoTXUlLk8Ha3eFcvBU/bshIBTFNR9fMKHIY51qnZT/xRlMw RaT7FKdC10WK6pxQJb0rPunTrUspfbAxdXOGgny2kAbRpPziFImcOpmLnPuhOGrEyM OgcPobTvea6JQ== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrS-0000000H5Mu-2wZd; Tue, 17 Mar 2026 19:09:46 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 07/22] unittests: test_private: modify it to use CTokenizer directly Date: Tue, 17 Mar 2026 19:09:27 +0100 Message-ID: <66e6320a4d5ad9730c1c0ceea79b5021e90c66c6.1773770483.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Change the logic to use the tokenizer directly. This allows adding more unit tests to check the validity of the tokenizer itself.
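The table-driven layout this change moves to can be sketched as below. It is a simplified stand-in under stated assumptions: the function under test is str.upper() rather than CTokenizer, and `make_upper_test` and `TESTS_UPPER` are hypothetical names used only for illustration.

```python
import unittest


def make_upper_test(name, data):
    """Create one test method from a table entry."""
    def test(self):
        self.assertEqual(data["source"].upper(), data["expected"], msg=name)

    return test


# Stand-in table: "__run__" names the factory used for every other key.
TESTS_UPPER = {
    "__run__": make_upper_test,
    "simple": {"source": "abc", "expected": "ABC"},
    "mixed": {"source": "aBc1", "expected": "ABC1"},
}


def build_test_class(group_name, table):
    """Build a unittest.TestCase subclass with one method per table key."""
    run = table["__run__"]

    class_dict = {
        f"test_{test_name}": run(test_name, data)
        for test_name, data in table.items()
        if test_name != "__run__"
    }

    return type(group_name, (unittest.TestCase,), class_dict)


TestUpper = build_test_class("TestUpper", TESTS_UPPER)

# Run the generated class explicitly instead of relying on unittest.main().
suite = unittest.TestLoader().loadTestsFromTestCase(TestUpper)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Since each group carries its own factory under "__run__", a new group of tests only needs a new table and a new factory function.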
Signed-off-by: Mauro Carvalho Chehab Message-ID: <2672257233ff73a9464c09b50924be51e25d4f59.1773074166.git.mcheha= b+huawei@kernel.org> --- .../{test_private.py =3D> test_tokenizer.py} | 75 +++++++++++++------ 1 file changed, 51 insertions(+), 24 deletions(-) rename tools/unittests/{test_private.py =3D> test_tokenizer.py} (85%) diff --git a/tools/unittests/test_private.py b/tools/unittests/test_tokeniz= er.py similarity index 85% rename from tools/unittests/test_private.py rename to tools/unittests/test_tokenizer.py index eae245ae8a12..3b1d0b5bd311 100755 --- a/tools/unittests/test_private.py +++ b/tools/unittests/test_tokenizer.py @@ -15,20 +15,43 @@ from unittest.mock import MagicMock SRC_DIR =3D os.path.dirname(os.path.realpath(__file__)) sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python")) =20 -from kdoc.kdoc_parser import trim_private_members +from kdoc.c_lex import CTokenizer from unittest_helper import run_unittest =20 + # # List of tests. # # The code will dynamically generate one test for each key on this diction= ary. # =20 +def make_private_test(name, data): + """ + Create a test named ``name`` using parameters given by ``data`` dict. + """ + + def test(self): + """In-lined lambda-like function to run the test""" + tokens =3D CTokenizer(data["source"]) + result =3D str(tokens) + + # + # Avoid whitespace false positives + # + result =3D re.sub(r"\s++", " ", result).strip() + expected =3D re.sub(r"\s++", " ", data["trimmed"]).strip() + + msg =3D f"failed when parsing this source:\n{data['source']}" + self.assertEqual(result, expected, msg=3Dmsg) + + return test + #: Tests to check if CTokenizer is handling properly public/private commen= ts. TESTS_PRIVATE =3D { # # Simplest case: no private. 
Ensure that trimming won't affect struct # + "__run__": make_private_test, "no private": { "source": """ struct foo { @@ -288,41 +311,45 @@ TESTS_PRIVATE =3D { }, } =20 +#: Dict containing all test groups for CTokenizer +TESTS =3D { + "TestPublicPrivate": TESTS_PRIVATE, +} =20 -class TestPublicPrivate(unittest.TestCase): - """ - Main test class. Populated dynamically at runtime. - """ +def setUp(self): + self.maxDiff =3D None =20 - def setUp(self): - self.maxDiff =3D None +def build_test_class(group_name, table): + """ + Dynamically creates a class instance using type() as a generator + for a new class derived from unittest.TestCase. =20 - def add_test(cls, name, source, trimmed): - """ - Dynamically add a test to the class - """ - def test(cls): - result =3D trim_private_members(source) + We're opting to do it inside a function to avoid the risk of + changing the globals() dictionary. + """ =20 - result =3D re.sub(r"\s++", " ", result).strip() - expected =3D re.sub(r"\s++", " ", trimmed).strip() + class_dict =3D { + "setUp": setUp + } =20 - msg =3D f"failed when parsing this source:\n" + source + run =3D table["__run__"] =20 - cls.assertEqual(result, expected, msg=3Dmsg) + for test_name, data in table.items(): + if test_name =3D=3D "__run__": + continue =20 - test.__name__ =3D f'test {name}' + class_dict[f"test_{test_name}"] =3D run(test_name, data) =20 - setattr(TestPublicPrivate, test.__name__, test) + cls =3D type(group_name, (unittest.TestCase,), class_dict) =20 + return cls.__name__, cls =20 # -# Populate TestPublicPrivate class +# Create classes and add them to the global dictionary # -test_class =3D TestPublicPrivate() -for name, test in TESTS_PRIVATE.items(): - test_class.add_test(name, test["source"], test["trimmed"]) - +for group, table in TESTS.items(): + t =3D build_test_class(group, table) + globals()[t[0]] =3D t[1] =20 # # main --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org
(aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C449337F8CA; Tue, 17 Mar 2026 18:09:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; cv=none; b=Rvz73HpOyqh7tcHx5xKq6UQLc+QKrCn0oG5Lw9TOifzCL82wnl8mSZzIBKnpi/PgumYbYFuUXLujpdl3wpa7AYshTwNsCMe1WZiwoUA6/Y7w2Ry9nb7y8DKcAQtR2hjGzfuoFEOD2jLNqd4/WrpgX/KAeOWnSchZSunv86VeTCs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770988; c=relaxed/simple; bh=iKh3pigxP/quPUOBxGpLOk4xpH0UcU+z50MhBcXBNx0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=rn6QC3GoQtAgS2z3Nwoj59c++iKvp3xoAU2sSE9eysJz7+wjLt2O1SxxtM8wXNCBcWnwxbOcvyjRwHXdfIi5xY26yDWKWO3CNjl0cNGG+C2ef4lqiNDCsHIdjTmAF54+6JIfrJKcPTsxyEtIMHHcaky8+AlQdWSm3bQvvVTXdHA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=E6gQXVqp; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="E6gQXVqp" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A31AAC4CEF7; Tue, 17 Mar 2026 18:09:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770988; bh=iKh3pigxP/quPUOBxGpLOk4xpH0UcU+z50MhBcXBNx0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=E6gQXVqp6HwGTe/gNzxushF46ipUy7LjRSAMrVFyHocWbwulU2JwaBEXmM3zmFbON HalyB8qNHuNjntW0ZIiS79CvXx1lpAwsqoFB7aAOyozgTfVqGSm4q05FzhKU1Ky2Fh pXcd15LYQ6xHzwzMGWhKqGQzT6ZDKuj2kd1vQhwG70FZPbeaQG1cLhGbunKeUL0trp azDgrXcJuIBxLcaPsExS+n4P4krI2ZQs7qrVQTze3DckHUK/aXnrZjddhU/Gp17YnZ 
iV0rNWT8XOxKqINzCGwn8QNn5Uou3aT4dx/LdbDDs7KQkVzNpXowD8PwnsUZmgMzLQ LN9QNWYd0NeLw== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrS-0000000H5O8-3ks4; Tue, 17 Mar 2026 19:09:46 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 08/22] unittests: test_tokenizer: check if the tokenizer works Date: Tue, 17 Mar 2026 19:09:28 +0100 Message-ID: X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Add extra tests to check if the tokenizer is working properly. Signed-off-by: Mauro Carvalho Chehab --- tools/unittests/test_tokenizer.py | 108 +++++++++++++++++++++++++++++- 1 file changed, 106 insertions(+), 2 deletions(-) diff --git a/tools/unittests/test_tokenizer.py b/tools/unittests/test_token= izer.py index 3b1d0b5bd311..5634b4a7283e 100755 --- a/tools/unittests/test_tokenizer.py +++ b/tools/unittests/test_tokenizer.py @@ -15,15 +15,118 @@ from unittest.mock import MagicMock SRC_DIR =3D os.path.dirname(os.path.realpath(__file__)) sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python")) =20 -from kdoc.c_lex import CTokenizer +from kdoc.c_lex import CToken, CTokenizer from unittest_helper import run_unittest =20 - # # List of tests. # # The code will dynamically generate one test for each key on this diction= ary. # +def tokens_to_list(tokens): + tuples =3D [] + + for tok in tokens: + if tok.kind =3D=3D CToken.SPACE: + continue + + tuples +=3D [(tok.kind, tok.value, tok.level)] + + return tuples + + +def make_tokenizer_test(name, data): + """ + Create a test named ``name`` using parameters given by ``data`` dict. 
+ """ + + def test(self): + """In-lined lambda-like function to run the test""" + + # + # Check if logger is working + # + if "log_level" in data: + with self.assertLogs('kdoc.c_lex', level=3D'ERROR') as cm: + tokenizer =3D CTokenizer(data["source"]) + + return + + # + # Check if tokenizer is producing expected results + # + tokens =3D CTokenizer(data["source"]).tokens + + result =3D tokens_to_list(tokens) + expected =3D tokens_to_list(data["expected"]) + + self.assertEqual(result, expected, msg=3Df"{name}") + + return test + +#: Tokenizer tests. +TESTS_TOKENIZER =3D { + "__run__": make_tokenizer_test, + + "basic_tokens": { + "source": """ + int a; // comment + float b =3D 1.23; + """, + "expected": [ + CToken(CToken.NAME, "int"), + CToken(CToken.NAME, "a"), + CToken(CToken.ENDSTMT, ";"), + CToken(CToken.COMMENT, "// comment"), + CToken(CToken.NAME, "float"), + CToken(CToken.NAME, "b"), + CToken(CToken.OP, "=3D"), + CToken(CToken.NUMBER, "1.23"), + CToken(CToken.ENDSTMT, ";"), + ], + }, + + "depth_counters": { + "source": """ + struct X { + int arr[10]; + func(a[0], (b + c)); + } + """, + "expected": [ + CToken(CToken.STRUCT, "struct"), + CToken(CToken.NAME, "X"), + CToken(CToken.BEGIN, "{", brace_level=3D1), + + CToken(CToken.NAME, "int", brace_level=3D1), + CToken(CToken.NAME, "arr", brace_level=3D1), + CToken(CToken.BEGIN, "[", brace_level=3D1, bracket_level=3D1), + CToken(CToken.NUMBER, "10", brace_level=3D1, bracket_level=3D1= ), + CToken(CToken.END, "]", brace_level=3D1), + CToken(CToken.ENDSTMT, ";", brace_level=3D1), + CToken(CToken.NAME, "func", brace_level=3D1), + CToken(CToken.BEGIN, "(", brace_level=3D1, paren_level=3D1), + CToken(CToken.NAME, "a", brace_level=3D1, paren_level=3D1), + CToken(CToken.BEGIN, "[", brace_level=3D1, paren_level=3D1, br= acket_level=3D1), + CToken(CToken.NUMBER, "0", brace_level=3D1, paren_level=3D1, b= racket_level=3D1), + CToken(CToken.END, "]", brace_level=3D1, paren_level=3D1), + CToken(CToken.PUNC, ",", brace_level=3D1, 
paren_level=3D1), + CToken(CToken.BEGIN, "(", brace_level=3D1, paren_level=3D2), + CToken(CToken.NAME, "b", brace_level=3D1, paren_level=3D2), + CToken(CToken.OP, "+", brace_level=3D1, paren_level=3D2), + CToken(CToken.NAME, "c", brace_level=3D1, paren_level=3D2), + CToken(CToken.END, ")", brace_level=3D1, paren_level=3D1), + CToken(CToken.END, ")", brace_level=3D1), + CToken(CToken.ENDSTMT, ";", brace_level=3D1), + CToken(CToken.END, "}"), + ], + }, + + "mismatch_error": { + "source": "int a$ =3D 5;", # $ is illegal + "log_level": "ERROR", + }, +} =20 def make_private_test(name, data): """ @@ -314,6 +417,7 @@ TESTS_PRIVATE =3D { #: Dict containing all test groups fror CTokenizer TESTS =3D { "TestPublicPrivate": TESTS_PRIVATE, + "TestTokenizer": TESTS_TOKENIZER, } =20 def setUp(self): --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 128D13F8806; Tue, 17 Mar 2026 18:09:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770989; cv=none; b=fTvULSKAPNurHZcPqRGHx+TuboaLrdz6KRHhDwbb456+p4ph4SLFRvef8A5CLxfZ3Ehoox9PLfbUAN16v8837/pYnlUl0pcH0Ly2o0RLkJDbUPb7fejr05rM3D0MmOskDCrCnW5h9CD4AfWFLYRhU/DGeD/gADJ+nUXB1O5qUTo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770989; c=relaxed/simple; bh=PaOLRDk0iXEcEwnWCzqzsJpfgjdAgDVAImrhDci59t0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=OW2MGxUxdvWTkSN8jdJDo4LPl5Z7qNSTV6EBQMUOI0eeH2rBL8yD4KA6J8taPz2WxTI43mf9EN0WQ8z+yXhEweLYfq08+C0jOOyCdMohZ3mQhb+bQ+ewaWR4+rE952EqbSW6DM22K5nrv7oXs6NJ5LjEO5jB+QYGqixsa0GgQA0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass 
(2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=o9nP3g5O; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="o9nP3g5O" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D3F02C2BCB1; Tue, 17 Mar 2026 18:09:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770988; bh=PaOLRDk0iXEcEwnWCzqzsJpfgjdAgDVAImrhDci59t0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=o9nP3g5OeSEBpEAsa1hvwtaqPORS6dNbK9/Ndlw3rE0tptUCD1gEfVoduy1JEbmta uEJneWemXulDiXVXrw6f7V+Z8++B4k66YF1AvyWc0mLLc0oSUe8gcwgcykLOHL00AY 2yNvLi4pjB3bn5xH24TMuzxAVS+U3037VSQ8uwD2KjdNSh3CFE1bJRkY4W8CB1cksV sq7ja/N4bW1clvK/TGzzKdcGN/yomDYAfpLYQQjQPXw+V0QOvc9kWrxC68VVy56swg ltTo38Z4huHtYV272vG8KrAmJ7LQ8xKyTxUA4hSWLRPxRq7xlDWonb+S+tNpftsCun ZfMDgLVx4PSKA== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrT-0000000H5PL-0Lt3; Tue, 17 Mar 2026 19:09:47 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 09/22] unittests: add a runner to execute all unittests Date: Tue, 17 Mar 2026 19:09:29 +0100 Message-ID: <2d9dd14f03d3d6394346fdaceeb3167d54d1dd0c.1773770483.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab We'll soon have multiple unit tests, so add a runner that discovers and executes all of them. The runner only discovers files whose names start with "test", so that unittest discovery won't try to load libraries or other files that might not contain unittest classes.
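The effect of the "test*.py" pattern can be sketched with the standard unittest loader alone. The example below is an illustration under stated assumptions: it builds a throwaway directory, and the file names `test_sketch_demo.py` and `helper_sketch_lib.py` are hypothetical. It does not use the TestUnits helper from this series.

```python
import os
import tempfile
import unittest

# Body written to both files: a single test class with one passing test.
TEST_BODY = (
    "import unittest\n"
    "\n"
    "class TestDemo(unittest.TestCase):\n"
    "    def test_ok(self):\n"
    "        self.assertTrue(True)\n"
)

with tempfile.TemporaryDirectory() as tmpdir:
    # Picked up by discovery: the file name starts with "test".
    with open(os.path.join(tmpdir, "test_sketch_demo.py"), "w",
              encoding="utf-8") as fp:
        fp.write(TEST_BODY)

    # Ignored by discovery, although it contains the same test class.
    with open(os.path.join(tmpdir, "helper_sketch_lib.py"), "w",
              encoding="utf-8") as fp:
        fp.write(TEST_BODY)

    loader = unittest.TestLoader()
    suite = loader.discover(start_dir=tmpdir, pattern="test*.py")

    result = unittest.TextTestRunner(verbosity=0).run(suite)

print(f"tests run: {result.testsRun}")  # prints "tests run: 1"
```

Only the class in the file matching the pattern is loaded, so helper modules placed next to the tests are never imported as test modules.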
Signed-off-by: Mauro Carvalho Chehab --- tools/unittests/run.py | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100755 tools/unittests/run.py diff --git a/tools/unittests/run.py b/tools/unittests/run.py new file mode 100755 index 000000000000..8c19036d43a1 --- /dev/null +++ b/tools/unittests/run.py @@ -0,0 +1,17 @@ +#!/bin/env python3 +import os +import unittest +import sys + +TOOLS_DIR=3Dos.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +sys.path.insert(0, TOOLS_DIR) + +from lib.python.unittest_helper import TestUnits + +if __name__ =3D=3D "__main__": + loader =3D unittest.TestLoader() + + suite =3D loader.discover(start_dir=3Dos.path.join(TOOLS_DIR, "unittes= ts"), + pattern=3D"test*.py") + + TestUnits().run("", suite=3Dsuite) --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7C32B3F8DE1; Tue, 17 Mar 2026 18:09:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770989; cv=none; b=e0bS3ihUnFGcRQIHCdpMUarVsH0K380b0M/Bxpm+P71R/df+G+FHKWhRIM14B5BOx9Lp9c2nIyMvgQB1jS95/n4m20o8nAgtU5T8zsKt17H3cFnoTbk5NYPBY44pr8fGS7NTY8zdGCjocLeR5cq+QH3lcE5JufysQ/CzOSutxt8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770989; c=relaxed/simple; bh=DE2bgHoSxjCADPPtYN1w1qD6MKNDV4mXxiXZgZKnIBQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=qedA3G6MuCV9EOmGyc/6P8ZT3PjFfNuH3IEweJcwq9XiwSwItzRmf7/U6fyhcDb9BwPNeGgFptsl/ctOzMofsCQm+bdri2tcMsLeuDobSghaXNX+V5HzCvz5A+68My/2irCq5+dwp4eTxWHka+7DICZCiOM+YH018/gq7DCuDTw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=kernel.org header.i=@kernel.org header.b=TUB/P2pp; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="TUB/P2pp" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0FB4FC4CEF7; Tue, 17 Mar 2026 18:09:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770989; bh=DE2bgHoSxjCADPPtYN1w1qD6MKNDV4mXxiXZgZKnIBQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=TUB/P2pp2s+U2KB2hgmk1MqkaD1NbNkXF4ajY2mgNQnJXVsj25cBYiZO4X7J/Cnx5 Bp7N22OVFJC5sZ+8ZTWo0Hzg2JcLcAqBeZqjhJe/9YY/8aa0k4C3aRBwbYl5yrchS8 QE1Y536ui8utmmxCFqH5wOtdwJOw4ADmgSoNzCvEmxp41P1u6VuIoCBjUeHQ9ffuhF uDz04aig1plRvZYzOVBRetaa8vD0vLGOx78pBw7F+JZosxsHeb6XvBPTqRvj0bJGNz 5+073evKGRjCwv5e3X1oaWZiuSD6ItSG0RP461mRkbeX8Vh0Vqw7uyYawpcH8Fnudd No/T9au2TF2XQ== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrT-0000000H5QY-1E68; Tue, 17 Mar 2026 19:09:47 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 10/22] docs: kdoc: create a CMatch to match nested C blocks Date: Tue, 17 Mar 2026 19:09:30 +0100 Message-ID: X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab The NestedMatch code is complex, and would become even more complex if we added support for arguments there. Now that we have a tokenizer, we can use a better solution that is easier to understand. Yet, to improve performance, it is better to make it operate on previously tokenized code, which changes its API. So, reimplement NestedMatch using the CTokenizer class.
Once it is done, we can drop NestedMatch. Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/c_lex.py | 121 ++++++++++++++++++++++++++++++--- 1 file changed, 111 insertions(+), 10 deletions(-) diff --git a/tools/lib/python/kdoc/c_lex.py b/tools/lib/python/kdoc/c_lex.py index 9d726f821f3f..5da472734ff7 100644 --- a/tools/lib/python/kdoc/c_lex.py +++ b/tools/lib/python/kdoc/c_lex.py @@ -273,20 +273,121 @@ class CTokenizer(): =20 # Do some cleanups before ";" =20 - if (tok.kind =3D=3D CToken.SPACE and - next_tok.kind =3D=3D CToken.PUNC and - next_tok.value =3D=3D ";"): - + if tok.kind =3D=3D CToken.SPACE and next_tok.kind =3D=3D C= Token.ENDSTMT: continue =20 - if (tok.kind =3D=3D CToken.PUNC and - next_tok.kind =3D=3D CToken.PUNC and - tok.value =3D=3D ";" and - next_tok.kind =3D=3D CToken.PUNC and - next_tok.value =3D=3D ";"): - + if tok.kind =3D=3D CToken.ENDSTMT and next_tok.kind =3D=3D= tok.kind: continue =20 out +=3D str(tok.value) =20 return out + + +class CMatch: + """ + Finding nested delimiters is hard with regular expressions. It is + even harder on Python with its normal re module, as there are several + advanced regular expressions that are missing. + + This is the case of this pattern:: + + '\\bSTRUCT_GROUP(\\(((?:(?>[^)(]+)|(?1))*)\\))[^;]*;' + + which is used to properly match open/close parentheses of the + string search STRUCT_GROUP(), + + Add a class that counts pairs of delimiters, using it to match and + replace nested expressions. + + The original approach was suggested by: + + https://stackoverflow.com/questions/5454322/python-how-to-match-ne= sted-parentheses-with-regex + + Although I re-implemented it to make it more generic and match 3 types + of delimiters. The logic checks if delimiters are paired. If not, it + will ignore the search string. 
+ """ + + # TODO: add a sub method + + def __init__(self, regex): + self.regex =3D KernRe(regex) + + def _search(self, tokenizer): + """ + Finds paired blocks for a regex that ends with a delimiter. + + The suggestion of using finditer to match pairs came from: + https://stackoverflow.com/questions/5454322/python-how-to-match-ne= sted-parentheses-with-regex + but I ended using a different implementation to align all three ty= pes + of delimiters and seek for an initial regular expression. + + The algorithm seeks for open/close paired delimiters and places th= em + into a stack, yielding a start/stop position of each match when the + stack is zeroed. + + The algorithm should work fine for properly paired lines, but will + silently ignore end delimiters that precede a start delimiter. + This should be OK for kernel-doc parser, as unaligned delimiters + would cause compilation errors. So, we don't need to raise excepti= ons + to cover such issues. + """ + + start =3D None + offset =3D -1 + started =3D False + + import sys + + stack =3D [] + + for i, tok in enumerate(tokenizer.tokens): + if start is None: + if tok.kind =3D=3D CToken.NAME and self.regex.match(tok.va= lue): + start =3D i + stack.append((start, tok.level)) + started =3D False + + continue + + if not started and tok.kind =3D=3D CToken.BEGIN: + started =3D True + continue + + if tok.kind =3D=3D CToken.END and tok.level =3D=3D stack[-1][1= ]: + start, level =3D stack.pop() + offset =3D i + + yield CTokenizer(tokenizer.tokens[start:offset + 1]) + start =3D None + + # + # If an END zeroing levels is not there, return remaining stuff + # This is meant to solve cases where the caller logic might be + # picking an incomplete block. 
+ # + if start and offset < 0: + print("WARNING: can't find an end", file=3Dsys.stderr) + yield CTokenizer(tokenizer.tokens[start:]) + + def search(self, source): + """ + This is similar to re.search: + + It matches a regex that it is followed by a delimiter, + returning occurrences only if all delimiters are paired. + """ + + if isinstance(source, CTokenizer): + tokenizer =3D source + is_token =3D True + else: + tokenizer =3D CTokenizer(source) + is_token =3D False + + for new_tokenizer in self._search(tokenizer): + if is_token: + yield new_tokenizer + else: + yield str(new_tokenizer) --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5BAFB3F881D; Tue, 17 Mar 2026 18:09:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770989; cv=none; b=H2D3EsgSZ5SIvlq2j+4Qxm3j2ddB/hJpuWW6DSIoiVFSWzOe2noQMP0wKn3HKcaEOk4pKETJAmKGz6iMt4uKOeqzW4R0ud++O78xQgNhU5GfqnAdXxbPMh/hY0aDAARPZhIYZgR3qTODqvihKd3xf7kTo+Psox49pmBIYnjnsBE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770989; c=relaxed/simple; bh=EY9AfgDGU0mksVrXJ3N/EicUXh9pcg8Dz+yFEsPHERA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=mzBgtDWGER8rq4x/RehMa6hh/Ac8NjFZdSVx1HKJL9szDLVykrLFqUF1GUW/0T0D+IdEhKpDHTpZcsXrYCYJAfCkJAaA6ch7jNUCrAmNxxLCBD497dMB4gKCrad7sWZZTDsZeGHeHq1whjmHzn9RyL2spvTNdFyplvb439L1ZT8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ms1tAzZ1; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=kernel.org header.i=@kernel.org header.b="ms1tAzZ1" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3F826C2BCB0; Tue, 17 Mar 2026 18:09:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770989; bh=EY9AfgDGU0mksVrXJ3N/EicUXh9pcg8Dz+yFEsPHERA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ms1tAzZ1bAY6gJIZVvlmV7RvDjlwgE+f/ObLzzxBBWTJDpxNvi/k4nAND1WnRagET rk5TpQzGUnGvFwi86t8xyZS3ggyUdg9q9sEEsHLxI4Y5Yz9ItJPA5CLpjfhTt3pzsE r47WRxkhf6YE3jD3Nw5/CbuC/4yYHNXT/9fpQO8PumB6HxV6PVjlIwlpHx0rFSFl7t /Ygx4rOIqUJZWGXPPgZYXo662yMl9G7thXHF2OBk4wv2yCDhWBCazrwIg7jwLOmAkb aTOaUlw5X/nTn9OB6afX7CgVqm24QqbwE4nJACmKdgvErv3enyxom3EJrJwctx+ImT mrp6Xz8dDR2og== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrT-0000000H5Rm-27gN; Tue, 17 Mar 2026 19:09:47 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 11/22] tools: unittests: add tests for CMatch Date: Tue, 17 Mar 2026 19:09:31 +0100 Message-ID: <119712b5bc53b4c6dda6a81b4a783dcbfd1d970d.1773770483.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab The CMatch logic is complex enough to justify tests that ensure it is doing its job. Add unittests to check the functionality provided by CMatch by replicating expected patterns. The CMatch class deals with complex macros. Add a unittest to check whether it is doing the right thing and to detect possible regressions as we improve its code. The initial version was generated using the gpt-oss:latest LLM on my local GPU, as LLMs aren't bad at transforming patterns into unittests.
Yet, the current version contains only the skeleton of what the LLM produced, as I ended up heavily changing its content to be more representative and to cover real-case scenarios. The kdoc_xforms test suite contains 3 test groups. Two of them test the basic functionality of CMatch to replace patterns. The last one (TestRealUsecases) contains real code snippets from the Kernel, with some cleanups to better fit in 80 columns, and uses the same transforms as kernel-doc, thus allowing testing of the logic used inside kdoc_parser to transform function, struct and variable patterns. Its output is like this: $ tools/unittests/kdoc_xforms.py Ran 25 tests in 0.003s OK test_cmatch: TestSearch: test_search_acquires_multiple: OK test_search_acquires_nested_paren: OK test_search_acquires_simple: OK test_search_must_hold: OK test_search_must_hold_shared: OK test_search_no_false_positive: OK test_search_no_function: OK test_search_no_macro_remains: OK Ran 8 tests Signed-off-by: Mauro Carvalho Chehab --- tools/unittests/test_cmatch.py | 95 ++++++++++++++++++++++++++++++++++ 1 file changed, 95 insertions(+) create mode 100755 tools/unittests/test_cmatch.py diff --git a/tools/unittests/test_cmatch.py b/tools/unittests/test_cmatch.py new file mode 100755 index 000000000000..53b25aa4dc4a --- /dev/null +++ b/tools/unittests/test_cmatch.py @@ -0,0 +1,95 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# Copyright(c) 2026: Mauro Carvalho Chehab . +# +# pylint: disable=3DC0413,R0904 + + +""" +Unit tests for kernel-doc CMatch.
+""" + +import os +import re +import sys +import unittest + + +# Import Python modules + +SRC_DIR =3D os.path.dirname(os.path.realpath(__file__)) +sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python")) + +from kdoc.c_lex import CMatch +from kdoc.xforms_lists import CTransforms +from unittest_helper import run_unittest + +# +# Override unittest.TestCase to better compare diffs ignoring whitespaces +# +class TestCaseDiff(unittest.TestCase): + """ + Disable maximum limit on diffs and add a method to better + handle diffs with whitespace differences. + """ + + @classmethod + def setUpClass(cls): + """Ensure that there won't be limit for diffs""" + cls.maxDiff =3D None + + +# +# Tests doing with different macros +# + +class TestSearch(TestCaseDiff): + """ + Test search mechanism + """ + + def test_search_acquires_simple(self): + line =3D "__acquires(ctx) foo();" + result =3D ", ".join(CMatch("__acquires").search(line)) + self.assertEqual(result, "__acquires(ctx)") + + def test_search_acquires_multiple(self): + line =3D "__acquires(ctx) __acquires(other) bar();" + result =3D ", ".join(CMatch("__acquires").search(line)) + self.assertEqual(result, "__acquires(ctx), __acquires(other)") + + def test_search_acquires_nested_paren(self): + line =3D "__acquires((ctx1, ctx2)) baz();" + result =3D ", ".join(CMatch("__acquires").search(line)) + self.assertEqual(result, "__acquires((ctx1, ctx2))") + + def test_search_must_hold(self): + line =3D "__must_hold(&lock) do_something();" + result =3D ", ".join(CMatch("__must_hold").search(line)) + self.assertEqual(result, "__must_hold(&lock)") + + def test_search_must_hold_shared(self): + line =3D "__must_hold_shared(RCU) other();" + result =3D ", ".join(CMatch("__must_hold_shared").search(line)) + self.assertEqual(result, "__must_hold_shared(RCU)") + + def test_search_no_false_positive(self): + line =3D "call__acquires(foo); // should stay intact" + result =3D ", ".join(CMatch(r"\b__acquires").search(line)) + self.assertEqual(result, 
"") + + def test_search_no_macro_remains(self): + line =3D "do_something_else();" + result =3D ", ".join(CMatch("__acquires").search(line)) + self.assertEqual(result, "") + + def test_search_no_function(self): + line =3D "something" + result =3D ", ".join(CMatch(line).search(line)) + self.assertEqual(result, "") + +# +# Run all tests +# +if __name__ =3D=3D "__main__": + run_unittest(__file__) --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D82DA3F8DEB; Tue, 17 Mar 2026 18:09:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770989; cv=none; b=fxrre951u5rqmo0gpO3mWtBi6/kQf02QVwCHR+m9Al3WYLiVeP7HwK3/g8OOtiz+Fti8mKfPV3uL2Z2QkzraS74pRYDge41DO6BqK73upWK9rX2dhZ3ND6guQXKBbgUiQUyE56+2sgN+QbOIykq58f3ya9pzwvU8zkfAxT+iSIU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770989; c=relaxed/simple; bh=LUcFzGUiEZD4bIprs6ndA9LAE1KebeM/BqrpXPUV0iY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=QG4G1ksoVSRUD/bwvDtjMiJAVmGr5QdG/azEgBErvwHh61lETdsnAu1GePoGo21/E4YqPtIiEeZt1NdP6JYJ6FRjPRzNLxMgpMfXk7Rqz50ZXxnhI+QtWOBg+yJxrtSy/w0YUsBBQVhrbw8VdJgF8YKiK8HTpKO3GZpBQK7iOis= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=oL3beYMF; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="oL3beYMF" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 94438C2BCB1; Tue, 17 Mar 2026 18:09:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770989; bh=LUcFzGUiEZD4bIprs6ndA9LAE1KebeM/BqrpXPUV0iY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=oL3beYMFmzSXQZspVzBMqlm4aB81VpvnHsOx0FQEn5NLSlIbPiDX4X/rEFDnAr6zh uSy68UX27KW2xxaZ8AOG4zI6ah2EU9FZOiDMmCtOZWGUwI7k9Mp9l8SHqLUbKrF13t lfU4fnxfPiBapM8+sN52VaAeE6kSsHp9WnG8lZ7VdxYfSQ7UupnuYT9TkjgazcGo4b Ntg+/W7azhNpbYacjrDG5WhVAAF6mW2R5xmF79SCJ7/ROwoKsNgMprtLODX50iZpSO wo7Rpo8N1TaL6yjjxGRqOACg7FQJb1HZnXAsStpgx/8W7bzrmevGfVu732wdkgjLQw 6cCjwRmza6HFg== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrT-0000000H5Sz-309C; Tue, 17 Mar 2026 19:09:47 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 12/22] docs: c_lex: properly implement a sub() method for CMatch Date: Tue, 17 Mar 2026 19:09:32 +0100 Message-ID: X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Implement a sub() method that does what is expected, parsing backref arguments like \0, \1, \2, ... Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/c_lex.py | 272 +++++++++++++++++++++++++++++++-- 1 file changed, 259 insertions(+), 13 deletions(-) diff --git a/tools/lib/python/kdoc/c_lex.py b/tools/lib/python/kdoc/c_lex.py index 5da472734ff7..20e50ff0ecd5 100644 --- a/tools/lib/python/kdoc/c_lex.py +++ b/tools/lib/python/kdoc/c_lex.py @@ -16,6 +16,8 @@ Other errors are logged via log instance.
import logging import re =20 +from copy import copy + from .kdoc_re import KernRe =20 log =3D logging.getLogger(__name__) @@ -284,6 +286,172 @@ class CTokenizer(): return out =20 =20 +class CTokenArgs: + """ + Ancillary class to help using backrefs from sub matches. + + If the highest backref contain a "+" at the last element, + the logic will be greedy, picking all other delims. + + This is needed to parse struct_group macros with end with ``MEMBERS...= ``. + """ + def __init__(self, sub_str): + self.sub_groups =3D set() + self.max_group =3D -1 + self.greedy =3D None + + for m in KernRe(r'\\(\d+)([+]?)').finditer(sub_str): + group =3D int(m.group(1)) + if m.group(2) =3D=3D "+": + if self.greedy and self.greedy !=3D group: + raise ValueError("There are multiple greedy patterns!") + self.greedy =3D group + + self.sub_groups.add(group) + self.max_group =3D max(self.max_group, group) + + if self.greedy: + if self.greedy !=3D self.max_group: + raise ValueError("Greedy pattern is not the last one!") + + sub_str =3D KernRe(r'(\\\d+)[+]').sub(r"\1", sub_str) + + self.sub_str =3D sub_str + self.sub_tokeninzer =3D CTokenizer(sub_str) + + def groups(self, new_tokenizer): + """ + Create replacement arguments for backrefs like: + + ``\0``, ``\1``, ``\2``, ...``\n`` + + It also accepts a ``+`` character to the highest backref. When use= d, + it means in practice to ignore delimins after it, being greedy. + + The logic is smart enough to only go up to the maximum required + argument, even if there are more. + + If there is a backref for an argument above the limit, it will + raise an exception. Please notice that, on C, square brackets + don't have any separator on it. Trying to use ``\1``..``\n`` for + brackets also raise an exception. 
+ """ + + level =3D (0, 0, 0) + + if self.max_group < 0: + return level, [] + + tokens =3D new_tokenizer.tokens + + # + # Fill \0 with the full token contents + # + groups_list =3D [ [] ] + + if 0 in self.sub_groups: + inner_level =3D 0 + + for i in range(0, len(tokens)): + tok =3D tokens[i] + + if tok.kind =3D=3D CToken.BEGIN: + inner_level +=3D 1 + + # + # Discard first begin + # + if not groups_list[0]: + continue + elif tok.kind =3D=3D CToken.END: + inner_level -=3D 1 + if inner_level < 0: + break + + if inner_level: + groups_list[0].append(tok) + + if not self.max_group: + return level, groups_list + + delim =3D None + + # + # Ignore everything before BEGIN. The value of begin gives the + # delimiter to be used for the matches + # + for i in range(0, len(tokens)): + tok =3D tokens[i] + if tok.kind =3D=3D CToken.BEGIN: + if tok.value =3D=3D "{": + delim =3D ";" + elif tok.value =3D=3D "(": + delim =3D "," + else: + log.error(fr"Can't handle \1..\n on {self.sub_str}") + + level =3D tok.level + break + + pos =3D 1 + groups_list.append([]) + + inner_level =3D 0 + for i in range(i + 1, len(tokens)): + tok =3D tokens[i] + + if tok.kind =3D=3D CToken.BEGIN: + inner_level +=3D 1 + if tok.kind =3D=3D CToken.END: + inner_level -=3D 1 + if inner_level < 0: + break + + if tok.kind in [CToken.PUNC, CToken.ENDSTMT] and delim =3D=3D = tok.value: + pos +=3D 1 + if self.greedy and pos > self.max_group: + pos -=3D 1 + else: + groups_list.append([]) + + if pos > self.max_group: + break + + continue + + groups_list[pos].append(tok) + + if pos < self.max_group: + log.error(fr"{self.sub_str} groups are up to {pos} instead of = {self.max_group}") + + return level, groups_list + + def tokens(self, new_tokenizer): + level, groups =3D self.groups(new_tokenizer) + + new =3D CTokenizer() + + for tok in self.sub_tokeninzer.tokens: + if tok.kind =3D=3D CToken.BACKREF: + group =3D int(tok.value[1:]) + + for group_tok in groups[group]: + new_tok =3D copy(group_tok) + + new_level =3D [0, 0, 
0] + + for i in range(0, len(level)): + new_level[i] =3D new_tok.level[i] + level[i] + + new_tok.level =3D tuple(new_level) + + new.tokens +=3D [ new_tok ] + else: + new.tokens +=3D [ tok ] + + return new.tokens + + class CMatch: """ Finding nested delimiters is hard with regular expressions. It is @@ -309,10 +477,10 @@ class CMatch: will ignore the search string. """ =20 - # TODO: add a sub method =20 - def __init__(self, regex): - self.regex =3D KernRe(regex) + def __init__(self, regex, delim=3D"("): + self.regex =3D KernRe("^" + regex + r"\b") + self.start_delim =3D delim =20 def _search(self, tokenizer): """ @@ -335,7 +503,6 @@ class CMatch: """ =20 start =3D None - offset =3D -1 started =3D False =20 import sys @@ -351,15 +518,24 @@ class CMatch: =20 continue =20 - if not started and tok.kind =3D=3D CToken.BEGIN: - started =3D True - continue + if not started: + if tok.kind =3D=3D CToken.SPACE: + continue + + if tok.kind =3D=3D CToken.BEGIN and tok.value =3D=3D self.= start_delim: + started =3D True + continue + + # Name only token without BEGIN/END + if i > start: + i -=3D 1 + yield start, i + start =3D None =20 if tok.kind =3D=3D CToken.END and tok.level =3D=3D stack[-1][1= ]: start, level =3D stack.pop() - offset =3D i =20 - yield CTokenizer(tokenizer.tokens[start:offset + 1]) + yield start, i start =3D None =20 # @@ -367,9 +543,12 @@ class CMatch: # This is meant to solve cases where the caller logic might be # picking an incomplete block. 
# - if start and offset < 0: - print("WARNING: can't find an end", file=3Dsys.stderr) - yield CTokenizer(tokenizer.tokens[start:]) + if start and stack: + if started: + s =3D str(tokenizer) + log.warning(f"can't find a final end at {s}") + + yield start, len(tokenizer.tokens) =20 def search(self, source): """ @@ -386,8 +565,75 @@ class CMatch: tokenizer =3D CTokenizer(source) is_token =3D False =20 - for new_tokenizer in self._search(tokenizer): + for start, end in self._search(tokenizer): + new_tokenizer =3D CTokenizer(tokenizer.tokens[start:end + 1]) + if is_token: yield new_tokenizer else: yield str(new_tokenizer) + + def sub(self, sub_str, source, count=3D0): + """ + This is similar to re.sub: + + It matches a regex that it is followed by a delimiter, + replacing occurrences only if all delimiters are paired. + + if the sub argument contains:: + + r'\0' + + it will work just like re: it places there the matched paired data + with the delimiter stripped. + + If count is different than zero, it will replace at most count + items. + """ + if isinstance(source, CTokenizer): + is_token =3D True + tokenizer =3D source + else: + is_token =3D False + tokenizer =3D CTokenizer(source) + + # Detect if sub_str contains sub arguments + + args_match =3D CTokenArgs(sub_str) + + new_tokenizer =3D CTokenizer() + pos =3D 0 + n =3D 0 + + # + # NOTE: the code below doesn't consider overlays at sub. + # We may need to add some extra unit tests to check if those + # would cause problems. 
When replacing by "", this should not + # be a problem, but other transformations could be problematic + # + for start, end in self._search(tokenizer): + new_tokenizer.tokens +=3D tokenizer.tokens[pos:start] + + new =3D CTokenizer(tokenizer.tokens[start:end + 1]) + + new_tokenizer.tokens +=3D args_match.tokens(new) + + pos =3D end + 1 + + n +=3D 1 + if count and n >=3D count: + break + + new_tokenizer.tokens +=3D tokenizer.tokens[pos:] + + if not is_token: + return str(new_tokenizer) + + return new_tokenizer + + def __repr__(self): + """ + Returns a displayable version of the class init. + """ + + return f'CMatch("{self.regex.regex.pattern}")' --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 372D33F8E12; Tue, 17 Mar 2026 18:09:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770990; cv=none; b=NdfhQdnx+bOR614q9PWyUlztqNEGlgNR9WInbetEPQq3g93RWIRbzinnzFssEGJpvAjxTcjyasb1OuBOn16lbrvI0riKv+6rSocFTc/Z3f7XVlURUhotryq/mnTTl9PgYAg1X5HS7WVc820Co/xWkb/V60kTkNRmGek124d8RnQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770990; c=relaxed/simple; bh=ZDRG5DBGkJYDCOF+lIGnqxY+DJZczzp7U4kkKLWMkJk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=hI3j7rCfv5FL+O8yrojIAV31DLoaCZheh9MnI0pJD8/RB7LGdivQWNDoX++8TcgKiM7ueNG2rLeVRIoCypzYjK6MKeaRqmM43BCJE6cr2iaxzvwiWFvQOnT0me9Fo6awVzqJWal0qLgAUHgaGCyt876skZUvOS8otiAGvRBk87g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=vNS1LYyx; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="vNS1LYyx" Received: by smtp.kernel.org (Postfix) with ESMTPSA id AA7E8C4CEF7; Tue, 17 Mar 2026 18:09:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770990; bh=ZDRG5DBGkJYDCOF+lIGnqxY+DJZczzp7U4kkKLWMkJk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=vNS1LYyxOBioBEWYR7h62fJBrOtzb/roeeRlhGuGHctB34JMvIefm959FtIlKMaaF 7G8l4G2ZE2oT1u8sMNXsMYcMgBu022vP7d/RY+pnrCvQZZa2CkOCu9lkhrn4u7UYav ZyH/5lYxVUIl+517oQhTW5WaetI86B9ZzDlfPFVZsBnRnR+mW0ypJT48pmzemI5XBs B8qdaxMF8YOBy5g4kI4q39ePqMYTTVPLHdHK7JRxkZmTnCeTmESCFCo19SjAc3szYv 6Zud9kJJYdwOvC3SLYn1pfjsXyPBohzL9Ioe0ry0p5hq1zDy8RbtL4Ib8gMJ0oM4c9 l8qQZgTCW3C2Q== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrT-0000000H5UD-3oyp; Tue, 17 Mar 2026 19:09:47 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Kees Cook , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org, "Gustavo A. R. Silva" Subject: [PATCH v3 13/22] unittests: test_cmatch: add tests for sub() Date: Tue, 17 Mar 2026 19:09:33 +0100 Message-ID: X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Now that we have code for sub(), test it. 
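For readers who want a feel for the behaviour these tests exercise, here is a toy sketch. It is not the CMatch implementation: the hypothetical sub_macro() below works on plain characters instead of C tokens and only handles parentheses, assuming that \0 expands to everything between the outer parentheses and \1..\n to the top-level comma-separated arguments:

```python
import re

def sub_macro(name, repl, source):
    r"""Replace ``name(...)`` in source, honoring nested parentheses.

    Toy stand-in for the idea behind CMatch.sub(): ``\0`` is the text
    between the outer parentheses, ``\1``..``\n`` are the top-level
    arguments. A backref above the argument count raises IndexError.
    """
    pattern = re.compile(r"\b" + re.escape(name) + r"\s*\(")
    out = []
    pos = 0
    while True:
        m = pattern.search(source, pos)
        if not m:
            break
        # Walk forward, counting parentheses until the first one closes.
        depth = 1
        args = [""]
        i = m.end()
        while i < len(source):
            ch = source[i]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    break
            if ch == "," and depth == 1:
                args.append("")      # top-level comma: next argument
            else:
                args[-1] += ch
            i += 1
        if depth:
            break                    # unpaired delimiters: keep the rest as-is
        groups = [source[m.end():i]] + [arg.strip() for arg in args]
        out.append(source[pos:m.start()])
        out.append(re.sub(r"\\(\d+)",
                          lambda ref: groups[int(ref.group(1))], repl))
        pos = i + 1
    out.append(source[pos:])
    return "".join(out)
```

For instance, sub_macro("__attribute__", r"ALL(\0) FIRST(\1)", "__attribute__(foo, bar)\nbaz();") mirrors the input used by test_sub_mixed_placeholders. The real class additionally handles braces, the greedy "+" backref and token levels, none of which this sketch attempts.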
Signed-off-by: Mauro Carvalho Chehab --- tools/unittests/test_cmatch.py | 730 ++++++++++++++++++++++++++++++++- 1 file changed, 728 insertions(+), 2 deletions(-) diff --git a/tools/unittests/test_cmatch.py b/tools/unittests/test_cmatch.py index 53b25aa4dc4a..7b996f83784d 100755 --- a/tools/unittests/test_cmatch.py +++ b/tools/unittests/test_cmatch.py @@ -21,7 +21,7 @@ SRC_DIR =3D os.path.dirname(os.path.realpath(__file__)) sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python")) =20 from kdoc.c_lex import CMatch -from kdoc.xforms_lists import CTransforms +from kdoc.kdoc_re import KernRe from unittest_helper import run_unittest =20 # @@ -75,7 +75,7 @@ class TestSearch(TestCaseDiff): =20 def test_search_no_false_positive(self): line =3D "call__acquires(foo); // should stay intact" - result =3D ", ".join(CMatch(r"\b__acquires").search(line)) + result =3D ", ".join(CMatch(r"__acquires").search(line)) self.assertEqual(result, "") =20 def test_search_no_macro_remains(self): @@ -88,6 +88,732 @@ class TestSearch(TestCaseDiff): result =3D ", ".join(CMatch(line).search(line)) self.assertEqual(result, "") =20 +# +# Override unittest.TestCase to better compare diffs ignoring whitespaces +# +class TestCaseDiff(unittest.TestCase): + """ + Disable maximum limit on diffs and add a method to better + handle diffs with whitespace differences. + """ + + @classmethod + def setUpClass(cls): + """Ensure that there won't be limit for diffs""" + cls.maxDiff =3D None + + def assertLogicallyEqual(self, a, b): + """ + Compare two results ignoring multiple whitespace differences. + + This is useful to check more complex matches picked from examples. + On a plus side, we also don't need to use dedent. + Please notice that line breaks still need to match. We might + remove it at the regex, but this way, checking the diff is easier. 
+ """ + a =3D re.sub(r"[\t ]+", " ", a.strip()) + b =3D re.sub(r"[\t ]+", " ", b.strip()) + + a =3D re.sub(r"\s+\n", "\n", a) + b =3D re.sub(r"\s+\n", "\n", b) + + a =3D re.sub(" ;", ";", a) + b =3D re.sub(" ;", ";", b) + + self.assertEqual(a, b) + +# +# Tests doing with different macros +# + +class TestSubMultipleMacros(TestCaseDiff): + """ + Tests doing with different macros. + + Here, we won't use assertLogicallyEqual. Instead, we'll check if each + of the expected patterns are present at the answer. + """ + + def test_acquires_simple(self): + """Simple replacement test with __acquires""" + line =3D "__acquires(ctx) foo();" + result =3D CMatch(r"__acquires").sub("REPLACED", line) + + self.assertEqual("REPLACED foo();", result) + + def test_acquires_multiple(self): + """Multiple __acquires""" + line =3D "__acquires(ctx) __acquires(other) bar();" + result =3D CMatch(r"__acquires").sub("REPLACED", line) + + self.assertEqual("REPLACED REPLACED bar();", result) + + def test_acquires_nested_paren(self): + """__acquires with nested pattern""" + line =3D "__acquires((ctx1, ctx2)) baz();" + result =3D CMatch(r"__acquires").sub("REPLACED", line) + + self.assertEqual("REPLACED baz();", result) + + def test_must_hold(self): + """__must_hold with a pointer""" + line =3D "__must_hold(&lock) do_something();" + result =3D CMatch(r"__must_hold").sub("REPLACED", line) + + self.assertNotIn("__must_hold(", result) + self.assertIn("do_something();", result) + + def test_must_hold_shared(self): + """__must_hold with an upercase defined value""" + line =3D "__must_hold_shared(RCU) other();" + result =3D CMatch(r"__must_hold_shared").sub("REPLACED", line) + + self.assertNotIn("__must_hold_shared(", result) + self.assertIn("other();", result) + + def test_no_false_positive(self): + """ + Ensure that unrelated text containing similar patterns is preserved + """ + line =3D "call__acquires(foo); // should stay intact" + result =3D CMatch(r"\b__acquires").sub("REPLACED", line) + + 
self.assertLogicallyEqual(result, "call__acquires(foo);") + + def test_mixed_macros(self): + """Add a mix of macros""" + line =3D "__acquires(ctx) __releases(ctx) __must_hold(&lock) foo()= ;" + + result =3D CMatch(r"__acquires").sub("REPLACED", line) + result =3D CMatch(r"__releases").sub("REPLACED", result) + result =3D CMatch(r"__must_hold").sub("REPLACED", result) + + self.assertNotIn("__acquires(", result) + self.assertNotIn("__releases(", result) + self.assertNotIn("__must_hold(", result) + + self.assertIn("foo();", result) + + def test_no_macro_remains(self): + """Ensure that text without the macro is left untouched""" + line =3D "do_something_else();" + result =3D CMatch(r"__acquires").sub("REPLACED", line) + + self.assertEqual(result, line) + + def test_no_function(self): + """Ensure that a name without a function-like call is left untouched""" + line =3D "something" + result =3D CMatch(line).sub("REPLACED", line) + + self.assertEqual(result, line) + +# +# Check if the diff is logically equivalent. To simplify, the tests here +# use a single macro name for all replacements. +# + +class TestSubSimple(TestCaseDiff): + """ + Test argument replacements. + + Here, the function name can be anything. So, we picked __attribute__(), + to mimic a macro found in the kernel, but none of the replacements here + has any relationship with the kernel usage. 
+ """ + + MACRO =3D "__attribute__" + + @classmethod + def setUpClass(cls): + """Define a CMatch to be used for all tests""" + cls.matcher =3D CMatch(cls.MACRO) + + def test_sub_with_capture(self): + """Test all arguments replacement with a single arg""" + line =3D f"{self.MACRO}(&ctx)\nfoo();" + + result =3D self.matcher.sub(r"ACQUIRED(\0)", line) + + self.assertLogicallyEqual("ACQUIRED(&ctx)\nfoo();", result) + + def test_sub_zero_placeholder(self): + """Test all arguments replacement with multiple args""" + line =3D f"{self.MACRO}(arg1, arg2)\nbar();" + + result =3D self.matcher.sub(r"REPLACED(\0)", line) + + self.assertLogicallyEqual("REPLACED(arg1, arg2)\nbar();", result) + + def test_sub_single_placeholder(self): + r"""Single replacement rule for \1""" + line =3D f"{self.MACRO}(ctx, boo)\nfoo();" + result =3D self.matcher.sub(r"ACQUIRED(\1)", line) + + self.assertLogicallyEqual("ACQUIRED(ctx)\nfoo();", result) + + def test_sub_multiple_placeholders(self): + r"""Replacement rule for both \1 and \2""" + line =3D f"{self.MACRO}(arg1, arg2)\nbar();" + result =3D self.matcher.sub(r"REPLACE(\1, \2)", line) + + self.assertLogicallyEqual("REPLACE(arg1, arg2)\nbar();", result) + + def test_sub_mixed_placeholders(self): + r"""Replacement rule for \0, \1 and additional text""" + line =3D f"{self.MACRO}(foo, bar)\nbaz();" + result =3D self.matcher.sub(r"ALL(\0) FIRST(\1)", line) + + self.assertLogicallyEqual("ALL(foo, bar) FIRST(foo)\nbaz();", resu= lt) + + def test_sub_no_placeholder(self): + """Replacement without placeholders""" + line =3D f"{self.MACRO}(arg)\nfoo();" + result =3D self.matcher.sub(r"NO_BACKREFS()", line) + + self.assertLogicallyEqual("NO_BACKREFS()\nfoo();", result) + + def test_sub_count_parameter(self): + """Verify that the algorithm stops after the requested count""" + line =3D f"{self.MACRO}(a1) x();\n{self.MACRO}(a2) y();" + result =3D self.matcher.sub(r"ONLY_FIRST(\1) ", line, count=3D1) + + self.assertLogicallyEqual(f"ONLY_FIRST(a1) 
x();\n{self.MACRO}(a2) = y();", + result) + + def test_strip_multiple_acquires(self): + """Check if spaces between removed delimiters will be dropped""" + line =3D f"int {self.MACRO}(1) {self.MACRO}(2 ) {self.MACRO}(3)= foo;" + result =3D self.matcher.sub("", line) + + self.assertLogicallyEqual(result, "int foo;") + + def test_raise_early_greedy(self): + line =3D f"{self.MACRO}(a, b, c, d);" + sub =3D r"\1, \2+, \3" + + with self.assertRaises(ValueError): + self.matcher.sub(sub, line) + + def test_raise_multiple_greedy(self): + line =3D f"{self.MACRO}(a, b, c, d);" + sub =3D r"\1, \2+, \3+" + + with self.assertRaises(ValueError): + self.matcher.sub(sub, line) + +# +# Test replacements with slashrefs +# + + +class TestSubWithLocalXforms(TestCaseDiff): + """ + Test different use-case patterns found in the kernel. + + Here, replacements using both CMatch and KernRe can be tested, + as it uses a local copy of the replacement rules used by kernel-doc. + """ + + struct_xforms =3D [ + (CMatch("__attribute__"), ' '), + (CMatch('__aligned'), ' '), + (CMatch('__counted_by'), ' '), + (CMatch('__counted_by_(le|be)'), ' '), + (CMatch('__guarded_by'), ' '), + (CMatch('__pt_guarded_by'), ' '), + + (CMatch('__cacheline_group_(begin|end)'), ''), + + (CMatch('struct_group'), r'\2'), + (CMatch('struct_group_attr'), r'\3'), + (CMatch('struct_group_tagged'), r'struct \1 { \3+ } \2;'), + (CMatch('__struct_group'), r'\4'), + + (CMatch('__ETHTOOL_DECLARE_LINK_MODE_MASK'), r'DECLARE_BITMAP(\1, = __ETHTOOL_LINK_MODE_MASK_NBITS)'), + (CMatch('DECLARE_PHY_INTERFACE_MASK',), r'DECLARE_BITMAP(\1, PHY_I= NTERFACE_MODE_MAX)'), + (CMatch('DECLARE_BITMAP'), r'unsigned long \1[BITS_TO_LONGS(\2)]'), + + (CMatch('DECLARE_HASHTABLE'), r'unsigned long \1[1 << ((\2) - 1)]'= ), + (CMatch('DECLARE_KFIFO'), r'\2 *\1'), + (CMatch('DECLARE_KFIFO_PTR'), r'\2 *\1'), + (CMatch('(?:__)?DECLARE_FLEX_ARRAY'), r'\1 \2[]'), + (CMatch('DEFINE_DMA_UNMAP_ADDR'), r'dma_addr_t \1'), + 
(CMatch('DEFINE_DMA_UNMAP_LEN'), r'__u32 \1'), + (CMatch('VIRTIO_DECLARE_FEATURES'), r'union { u64 \1; u64 \1_array= [VIRTIO_FEATURES_U64S]; }'), + ] + + function_xforms =3D [ + (CMatch('__printf'), ""), + (CMatch('__(?:re)?alloc_size'), ""), + (CMatch("__diagnose_as"), ""), + (CMatch("DECL_BUCKET_PARAMS"), r"\1, \2"), + + (CMatch("__cond_acquires"), ""), + (CMatch("__cond_releases"), ""), + (CMatch("__acquires"), ""), + (CMatch("__releases"), ""), + (CMatch("__must_hold"), ""), + (CMatch("__must_not_hold"), ""), + (CMatch("__must_hold_shared"), ""), + (CMatch("__cond_acquires_shared"), ""), + (CMatch("__acquires_shared"), ""), + (CMatch("__releases_shared"), ""), + (CMatch("__attribute__"), ""), + ] + + var_xforms =3D [ + (CMatch('__guarded_by'), ""), + (CMatch('__pt_guarded_by'), ""), + (CMatch("LIST_HEAD"), r"struct list_head \1"), + ] + + #: Transforms main dictionary used at apply_transforms(). + xforms =3D { + "struct": struct_xforms, + "func": function_xforms, + "var": var_xforms, + } + + @classmethod + def apply_transforms(cls, xform_type, text): + """ + Mimic the behavior of kdoc_parser.apply_transforms() method. + + For each (search, subst) pair selected by xform_type, apply the substitution. + + There are two parameters: + + - ``xform_type`` + Can be ``func``, ``struct`` or ``var``; + - ``text`` + The text where the substitution patterns will be applied. + """ + for search, subst in cls.xforms.get(xform_type): + text =3D search.sub(subst, text) + + return text.strip() + + cls.matcher =3D CMatch(r"struct_group[\w\_]*") + + def test_struct_group(self): + """ + Test struct_group using a pattern from + drivers/net/ethernet/asix/ax88796c_main.h. 
+ """ + line =3D """ + struct tx_pkt_info { + struct_group(tx_overhead, + struct tx_sop_header sop; + struct tx_segment_header seg; + ); + struct tx_eop_header eop; + u16 pkt_len; + u16 seq_num; + }; + """ + expected =3D """ + struct tx_pkt_info { + struct tx_sop_header sop; + struct tx_segment_header seg; + struct tx_eop_header eop; + u16 pkt_len; + u16 seq_num; + }; + """ + + result =3D self.apply_transforms("struct", line) + self.assertLogicallyEqual(result, expected) + + def test_struct_group_attr(self): + """ + Test two struct_group_attr using patterns from fs/smb/client/cifsp= du.h. + """ + line =3D """ + typedef struct smb_com_open_rsp { + struct smb_hdr hdr; /* wct =3D 34 BB */ + __u8 AndXCommand; + __u8 AndXReserved; + __le16 AndXOffset; + __u8 OplockLevel; + __u16 Fid; + __le32 CreateAction; + struct_group_attr(common_attributes,, + __le64 CreationTime; + __le64 LastAccessTime; + __le64 LastWriteTime; + __le64 ChangeTime; + __le32 FileAttributes; + ); + __le64 AllocationSize; + __le64 EndOfFile; + __le16 FileType; + __le16 DeviceState; + __u8 DirectoryFlag; + __u16 ByteCount; /* bct =3D 0 */ + } OPEN_RSP; + typedef struct { + struct_group_attr(common_attributes,, + __le64 CreationTime; + __le64 LastAccessTime; + __le64 LastWriteTime; + __le64 ChangeTime; + __le32 Attributes; + ); + __u32 Pad1; + __le64 AllocationSize; + __le64 EndOfFile; + __le32 NumberOfLinks; + __u8 DeletePending; + __u8 Directory; + __u16 Pad2; + __le32 EASize; + __le32 FileNameLength; + union { + char __pad; + DECLARE_FLEX_ARRAY(char, FileName); + }; + } FILE_ALL_INFO; /* level 0x107 QPathInfo */ + """ + expected =3D """ + typedef struct smb_com_open_rsp { + struct smb_hdr hdr; + __u8 AndXCommand; + __u8 AndXReserved; + __le16 AndXOffset; + __u8 OplockLevel; + __u16 Fid; + __le32 CreateAction; + __le64 CreationTime; + __le64 LastAccessTime; + __le64 LastWriteTime; + __le64 ChangeTime; + __le32 FileAttributes; + __le64 AllocationSize; + __le64 EndOfFile; + __le16 FileType; + __le16 
DeviceState; + __u8 DirectoryFlag; + __u16 ByteCount; + } OPEN_RSP; + typedef struct { + __le64 CreationTime; + __le64 LastAccessTime; + __le64 LastWriteTime; + __le64 ChangeTime; + __le32 Attributes; + __u32 Pad1; + __le64 AllocationSize; + __le64 EndOfFile; + __le32 NumberOfLinks; + __u8 DeletePending; + __u8 Directory; + __u16 Pad2; + __le32 EASize; + __le32 FileNameLength; + union { + char __pad; + char FileName[]; + }; + } FILE_ALL_INFO; + """ + + result =3D self.apply_transforms("struct", line) + self.assertLogicallyEqual(result, expected) + + def test_raw_struct_group(self): + """ + Test a __struct_group pattern from include/uapi/cxl/features.h. + """ + line =3D """ + struct cxl_mbox_get_sup_feats_out { + __struct_group(cxl_mbox_get_sup_feats_out_hdr, hdr, /* emp= ty */, + __le16 num_entries; + __le16 supported_feats; + __u8 reserved[4]; + ); + struct cxl_feat_entry ents[] __counted_by_le(num_entries); + } __attribute__ ((__packed__)); + """ + expected =3D """ + struct cxl_mbox_get_sup_feats_out { + __le16 num_entries; + __le16 supported_feats; + __u8 reserved[4]; + struct cxl_feat_entry ents[]; + }; + """ + + result =3D self.apply_transforms("struct", line) + self.assertLogicallyEqual(result, expected) + + def test_raw_struct_group_tagged(self): + r""" + Test cxl_regs with struct_group_tagged patterns from drivers/cxl/c= xl.h. + + NOTE: + + This one has actually a violation from what kernel-doc would + expect: Kernel-doc regex expects only 3 members, but this is + actually defined as:: + + #define struct_group_tagged(TAG, NAME, MEMBERS...) + + The replace expression there is:: + + struct \1 { \3 } \2; + + but it should be really something like:: + + struct \1 { \3 \4 \5 \6 \7 \8 ... } \2; + + a later fix would be needed to address it. 
+ + """ + line =3D """ + struct cxl_regs { + struct_group_tagged(cxl_component_regs, component, + void __iomem *hdm_decoder; + void __iomem *ras; + ); + + + /* This is actually a violation: too much commas */ + struct_group_tagged(cxl_device_regs, device_regs, + void __iomem *status, *mbox, *memdev; + ); + + struct_group_tagged(cxl_pmu_regs, pmu_regs, + void __iomem *pmu; + ); + + struct_group_tagged(cxl_rch_regs, rch_regs, + void __iomem *dport_aer; + ); + + struct_group_tagged(cxl_rcd_regs, rcd_regs, + void __iomem *rcd_pcie_cap; + ); + }; + """ + expected =3D """ + struct cxl_regs { + struct cxl_component_regs { + void __iomem *hdm_decoder; + void __iomem *ras; + } component; + + struct cxl_device_regs { + void __iomem *status, *mbox, *memdev; + } device_regs; + + struct cxl_pmu_regs { + void __iomem *pmu; + } pmu_regs; + + struct cxl_rch_regs { + void __iomem *dport_aer; + } rch_regs; + + struct cxl_rcd_regs { + void __iomem *rcd_pcie_cap; + } rcd_regs; + }; + """ + + result =3D self.apply_transforms("struct", line) + self.assertLogicallyEqual(result, expected) + + def test_struct_group_tagged_with_private(self): + """ + Replace struct_group_tagged with private, using the same regex + for the replacement as what happens in xforms_lists.py. + + As the private removal happens outside NestedGroup class, we manua= lly + dropped the remaining part of the struct, to simulate what happens + at kdoc_parser. 
+ + Taken from include/net/page_pool/types.h + """ + line =3D """ + struct page_pool_params { + struct_group_tagged(page_pool_params_slow, slow, + struct net_device *netdev; + unsigned int queue_idx; + unsigned int flags; + /* private: only under "slow" struct */ + unsigned int ignored; + ); + /* Struct below shall not be ignored */ + struct_group_tagged(page_pool_params_fast, fast, + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; + ); + }; + """ + expected =3D """ + struct page_pool_params { + struct page_pool_params_slow { + struct net_device *netdev; + unsigned int queue_idx; + unsigned int flags; + } slow; + struct page_pool_params_fast { + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; + } fast; + }; + """ + + result =3D self.apply_transforms("struct", line) + self.assertLogicallyEqual(result, expected) + + def test_struct_kcov(self): + """ + """ + line =3D """ + struct kcov { + refcount_t refcount; + spinlock_t lock; + enum kcov_mode mode __guarded_by(&lock); + unsigned int size __guarded_by(&lock); + void *area __guarded_by(&lock); + struct task_struct *t __guarded_by(&lock); + bool remote; + unsigned int remote_size; + int sequence; + }; + """ + expected =3D """ + """ + + result =3D self.apply_transforms("struct", line) + self.assertLogicallyEqual(result, expected) + + + def test_struct_kcov(self): + """ + Test a struct from kernel/kcov.c. 
+ """ + line =3D """ + struct kcov { + refcount_t refcount; + spinlock_t lock; + enum kcov_mode mode __guarded_by(&lock); + unsigned int size __guarded_by(&lock); + void *area __guarded_by(&lock); + struct task_struct *t __guarded_by(&lock); + bool remote; + unsigned int remote_size; + int sequence; + }; + """ + expected =3D """ + struct kcov { + refcount_t refcount; + spinlock_t lock; + enum kcov_mode mode; + unsigned int size; + void *area; + struct task_struct *t; + bool remote; + unsigned int remote_size; + int sequence; + }; + """ + + result =3D self.apply_transforms("struct", line) + self.assertLogicallyEqual(result, expected) + + def test_vars_stackdepot(self): + """ + Test guarded_by on vars from lib/stackdepot.c. + """ + line =3D """ + size_t pool_offset __guarded_by(&pool_lock) =3D DEPOT_POOL_SIZ= E; + __guarded_by(&pool_lock) LIST_HEAD(free_stacks); + void **stack_pools __pt_guarded_by(&pool_lock); + """ + expected =3D """ + size_t pool_offset =3D DEPOT_POOL_SIZE; + struct list_head free_stacks; + void **stack_pools; + """ + + result =3D self.apply_transforms("var", line) + self.assertLogicallyEqual(result, expected) + + def test_functions_with_acquires_and_releases(self): + """ + Test __acquires, __releases and their __cond_ variants on function prototypes. 
+ """ + line =3D """ + bool prepare_report_consumer(unsigned long *flags, + const struct access_info *ai, + struct other_info *other_info) \ + __cond_acquires(true, &report_lock= ); + + int tcp_sigpool_start(unsigned int id, struct tcp_sigpool *c) \ + __cond_acquires(0, RCU_BH); + + bool undo_report_consumer(unsigned long *flags, + const struct access_info *ai, + struct other_info *other_info) \ + __cond_releases(true, &report_lock); + + void debugfs_enter_cancellation(struct file *file, + struct debugfs_cancellation *c= ) \ + __acquires(cancellation); + + void debugfs_leave_cancellation(struct file *file, + struct debugfs_cancellation *c= ) \ + __releases(cancellation); + + acpi_cpu_flags acpi_os_acquire_lock(acpi_spinlock lockp) \ + __acquires(lockp); + + void acpi_os_release_lock(acpi_spinlock lockp, + acpi_cpu_flags not_used) \ + __releases(lockp) + """ + expected =3D """ + bool prepare_report_consumer(unsigned long *flags, + const struct access_info *ai, + struct other_info *other_info); + + int tcp_sigpool_start(unsigned int id, struct tcp_sigpool *c); + + bool undo_report_consumer(unsigned long *flags, + const struct access_info *ai, + struct other_info *other_info); + + void debugfs_enter_cancellation(struct file *file, + struct debugfs_cancellation *c= ); + + void debugfs_leave_cancellation(struct file *file, + struct debugfs_cancellation *c= ); + + acpi_cpu_flags acpi_os_acquire_lock(acpi_spinlock lockp); + + void acpi_os_release_lock(acpi_spinlock lockp, + acpi_cpu_flags not_used) + """ + + result =3D self.apply_transforms("func", line) + self.assertLogicallyEqual(result, expected) + # # Run all tests # --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE94D3F87F6; Tue, 17 Mar 2026 18:09:49 +0000 (UTC) 
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770990; cv=none; b=tXwG8YBBPwiDMAS+IRs6nyMl3u1DnZNoCgQjO03WmgpqYPDeiNbKauDCx91d+0RBVNlLXG29MvUuQ8rovpC5EcNL3V6BaquBVWAop936WOEks8wlSlGaUI2Ftv24a1pppny9r+AyLCaSLJm7lvAvqPhsDuheEU9xToTO6+iyGS4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770990; c=relaxed/simple; bh=lK8UHC4LZ0vQiDiDgYHCZuZXU1iZUv+O+RKaBZwhkQQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=YBMU1mIaW5j6ubY/fZg/0RaSDpVkMX+HNpDtbN3OcPogaim5w/3P1b7sEUv5sqlYfE1g72xvb51Mp2CW1Wcf+0XlqJh1tOTosgeVmgOrBNLz71CIFYNLtHBUriFFDBMszjWasN/8uoVM4eN+5gLz0aZ4FSfvy0bDpg4tjDiesQM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=UfM3ACRC; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="UfM3ACRC" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D1E80C2BCB2; Tue, 17 Mar 2026 18:09:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770989; bh=lK8UHC4LZ0vQiDiDgYHCZuZXU1iZUv+O+RKaBZwhkQQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=UfM3ACRCVmxhLzJW7DiWBlvdJS/j/sxMQqT9sltfIINN8O9LZNDVYPadW76DyLNPV GwQHUJ8nHOXc6IqisiRDMFpAI76YZybDoEJRK0BQ8rb5pZ+IC7QlpaFzVB00GzMmOe RwUxtZOw0qm+InnDKo3ewRHBeDhra5EGoJg+xpVGS95RLM9DTEPXwiCf+HItdaKYaB j00aMTqE8ErVTnqWRw4SVF4YgoUyXLU8XNM7NBAUSzzLK5DmmW6HvZgMptbz9R4RSJ S0xTzc1AjME0/V4YA5MVuDNngaf00utFefkL/xhsRc2mD8g5fd+2u7zGMopNW9rpWR 7hZ/N/WVoHyFQ== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrU-0000000H5VS-0RCf; Tue, 17 Mar 2026 19:09:48 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , 
linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH v3 14/22] docs: kdoc: replace NestedMatch with CMatch Date: Tue, 17 Mar 2026 19:09:34 +0100 Message-ID: <900bff66f8093402999f9fe055fbfa3fa33a8d8b.1773770483.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Our previous approach to handling nested structs was to use NestedMatch. It works well, but adding support for parsing delimiters is very complex. Instead, use CMatch, which uses a C tokenizer, making the code simpler and more reliable. Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/kdoc_parser.py | 2 +- tools/lib/python/kdoc/xforms_lists.py | 31 ++++++++++++++------------- 2 files changed, 17 insertions(+), 16 deletions(-) diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/k= doc_parser.py index 62d8030cf532..efd58c88ff31 100644 --- a/tools/lib/python/kdoc/kdoc_parser.py +++ b/tools/lib/python/kdoc/kdoc_parser.py @@ -14,7 +14,7 @@ import re from pprint import pformat =20 from kdoc.c_lex import CTokenizer -from kdoc.kdoc_re import NestedMatch, KernRe +from kdoc.kdoc_re import KernRe from kdoc.kdoc_item import KdocItem =20 # diff --git a/tools/lib/python/kdoc/xforms_lists.py b/tools/lib/python/kdoc/= xforms_lists.py index c07cbe1e6349..7fa7f52cec7b 100644 --- a/tools/lib/python/kdoc/xforms_lists.py +++ b/tools/lib/python/kdoc/xforms_lists.py @@ -4,7 +4,8 @@ =20 import re =20 -from kdoc.kdoc_re import KernRe, NestedMatch +from kdoc.kdoc_re import KernRe +from kdoc.c_lex import CMatch =20 struct_args_pattern =3D r'([^,)]+)' =20 @@ -60,7 +61,7 @@ class CTransforms: # # As it doesn't properly match the end parenthesis on some cases. 
# - # So, a better solution was crafted: there's now a NestedMatch + # So, a better solution was crafted: there's now a CMatch # class that ensures that delimiters after a search are properly # matched. So, the implementation to drop STRUCT_GROUP() will be # handled in separate. @@ -72,9 +73,9 @@ class CTransforms: # # Replace macros # - # TODO: use NestedMatch for FOO($1, $2, ...) matches + # TODO: use CMatch for FOO($1, $2, ...) matches # - # it is better to also move those to the NestedMatch logic, + # it is better to also move those to the CMatch logic, # to ensure that parentheses will be properly matched. # (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)', re.S), @@ -95,17 +96,17 @@ class CTransforms: (KernRe(r'DEFINE_DMA_UNMAP_LEN\s*\(' + struct_args_pattern + r'\)'= , re.S), r'__u32 \1'), (KernRe(r'VIRTIO_DECLARE_FEATURES\(([\w_]+)\)'), r'union { u64 \1;= u64 \1_array[VIRTIO_FEATURES_U64S]; }'), =20 - (NestedMatch(r"__cond_acquires\s*\("), ""), - (NestedMatch(r"__cond_releases\s*\("), ""), - (NestedMatch(r"__acquires\s*\("), ""), - (NestedMatch(r"__releases\s*\("), ""), - (NestedMatch(r"__must_hold\s*\("), ""), - (NestedMatch(r"__must_not_hold\s*\("), ""), - (NestedMatch(r"__must_hold_shared\s*\("), ""), - (NestedMatch(r"__cond_acquires_shared\s*\("), ""), - (NestedMatch(r"__acquires_shared\s*\("), ""), - (NestedMatch(r"__releases_shared\s*\("), ""), - (NestedMatch(r'\bSTRUCT_GROUP\('), r'\0'), + (CMatch(r"__cond_acquires"), ""), + (CMatch(r"__cond_releases"), ""), + (CMatch(r"__acquires"), ""), + (CMatch(r"__releases"), ""), + (CMatch(r"__must_hold"), ""), + (CMatch(r"__must_not_hold"), ""), + (CMatch(r"__must_hold_shared"), ""), + (CMatch(r"__cond_acquires_shared"), ""), + (CMatch(r"__acquires_shared"), ""), + (CMatch(r"__releases_shared"), ""), + (CMatch(r"STRUCT_GROUP"), r'\0'), ] =20 #: Transforms for function prototypes. 
--=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 29F173F8E11; Tue, 17 Mar 2026 18:09:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770990; cv=none; b=dmSZ4YRCV0MwxnI5uq7W0XVkb3a8MhFDdUXwibBdAPytm158ZBJ544Q1Rgh5rM3Y8WJm0KP4ObOqGWMOziTe3HDLnxBH0ScGqInvMb+LWSeRCDky/lYOCp25RaLcfSYPCBJCt9LUptHGBxJQ2flw1BmAbwQZdkDoyt1EzBqsO2Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770990; c=relaxed/simple; bh=PHXaltNZ4ol9wjdXwRcvdpg7f6YIjhlE0exg45N//uI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=BayqnFcXGn61cA7zSAh3c4DfdbQnWphKBysR0hd4wjLeqZU4H2HZL1VuLk3gtx2+d6qZn/cOrfGTjtNc+lDXHFD/DjpHB87jDBeANsaA20+vtrJJwKjom4V+EXrenEs4JKjnQqS44xMPjxLIQlbNJliicMFVX8nGb+3pxOXlp8A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=gZrXSmO8; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="gZrXSmO8" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0C34CC2BC86; Tue, 17 Mar 2026 18:09:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770990; bh=PHXaltNZ4ol9wjdXwRcvdpg7f6YIjhlE0exg45N//uI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=gZrXSmO8eUuqNsR8v0KCMp8jF81KEFRbJbqAYCDJ2UpCdHIATyteYnkX450rkHfgY V/R7LzUFsxScj+s2vqyXnulzQwxxKKx+sER7lhA+zDhYCX2l7YDlnOp04dtv88a+yD ySZQygOHGu/y6AjAlx3CpV/tVr0b8HnxeOtGICymjfk3I0imb1FkhcxR6r9OXN076s 
p7W+u2Ego4wP03zptj11rl/z6PvVZYvq0wEXoILGmFqJPHPB0UCd5MQXZt2YZiEP4H X1eNtVqzrJmMm+wP0dRsUQ8At2CK36Wf2Q75nMF/5m8PMEMQQZ+N/MYnK+WBbK7rww 8GCpMF9EVnSbw== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrU-0000000H5Wi-1EkP; Tue, 17 Mar 2026 19:09:48 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH v3 15/22] docs: kdoc_re: get rid of NestedMatch class Date: Tue, 17 Mar 2026 19:09:35 +0100 Message-ID: X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Now that everything was converted to CMatch, we can get rid of the previous NestedMatch implementation. Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/kdoc_re.py | 201 ------------------------------- 1 file changed, 201 deletions(-) diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_= re.py index 085b89a4547c..6f3ae28859ea 100644 --- a/tools/lib/python/kdoc/kdoc_re.py +++ b/tools/lib/python/kdoc/kdoc_re.py @@ -140,204 +140,3 @@ class KernRe: """ =20 return self.last_match.groups() - -#: Nested delimited pairs (brackets and parenthesis) -DELIMITER_PAIRS =3D { - '{': '}', - '(': ')', - '[': ']', -} - -#: compiled delimiters -RE_DELIM =3D KernRe(r'[\{\}\[\]\(\)]') - - -class NestedMatch: - """ - Finding nested delimiters is hard with regular expressions. It is - even harder on Python with its normal re module, as there are several - advanced regular expressions that are missing. 
- - This is the case of this pattern:: - - '\\bSTRUCT_GROUP(\\(((?:(?>[^)(]+)|(?1))*)\\))[^;]*;' - - which is used to properly match open/close parentheses of the - string search STRUCT_GROUP(), - - Add a class that counts pairs of delimiters, using it to match and - replace nested expressions. - - The original approach was suggested by: - - https://stackoverflow.com/questions/5454322/python-how-to-match-ne= sted-parentheses-with-regex - - Although I re-implemented it to make it more generic and match 3 types - of delimiters. The logic checks if delimiters are paired. If not, it - will ignore the search string. - """ - - # TODO: make NestedMatch handle multiple match groups - # - # Right now, regular expressions to match it are defined only up to - # the start delimiter, e.g.: - # - # \bSTRUCT_GROUP\( - # - # is similar to: STRUCT_GROUP\((.*)\) - # except that the content inside the match group is delimiter-aligned. - # - # The content inside parentheses is converted into a single replace - # group (e.g. r`\0'). - # - # It would be nice to change such definition to support multiple - # match groups, allowing a regex equivalent to: - # - # FOO\((.*), (.*), (.*)\) - # - # it is probably easier to define it not as a regular expression, but - # with some lexical definition like: - # - # FOO(arg1, arg2, arg3) - - def __init__(self, regex): - self.regex =3D KernRe(regex) - - def _search(self, line): - """ - Finds paired blocks for a regex that ends with a delimiter. - - The suggestion of using finditer to match pairs came from: - https://stackoverflow.com/questions/5454322/python-how-to-match-ne= sted-parentheses-with-regex - but I ended using a different implementation to align all three ty= pes - of delimiters and seek for an initial regular expression. - - The algorithm seeks for open/close paired delimiters and places th= em - into a stack, yielding a start/stop position of each match when the - stack is zeroed. 
- - The algorithm should work fine for properly paired lines, but will - silently ignore end delimiters that precede a start delimiter. - This should be OK for kernel-doc parser, as unaligned delimiters - would cause compilation errors. So, we don't need to raise excepti= ons - to cover such issues. - """ - - stack =3D [] - - for match_re in self.regex.finditer(line): - start =3D match_re.start() - offset =3D match_re.end() - string_char =3D None - escape =3D False - - d =3D line[offset - 1] - if d not in DELIMITER_PAIRS: - continue - - end =3D DELIMITER_PAIRS[d] - stack.append(end) - - for match in RE_DELIM.finditer(line[offset:]): - pos =3D match.start() + offset - - d =3D line[pos] - - if escape: - escape =3D False - continue - - if string_char: - if d =3D=3D '\\': - escape =3D True - elif d =3D=3D string_char: - string_char =3D None - - continue - - if d in ('"', "'"): - string_char =3D d - continue - - if d in DELIMITER_PAIRS: - end =3D DELIMITER_PAIRS[d] - - stack.append(end) - continue - - # Does the end delimiter match what is expected? - if stack and d =3D=3D stack[-1]: - stack.pop() - - if not stack: - yield start, offset, pos + 1 - break - - def search(self, line): - """ - This is similar to re.search: - - It matches a regex that it is followed by a delimiter, - returning occurrences only if all delimiters are paired. - """ - - for t in self._search(line): - - yield line[t[0]:t[2]] - - def sub(self, sub, line, count=3D0): - """ - This is similar to re.sub: - - It matches a regex that it is followed by a delimiter, - replacing occurrences only if all delimiters are paired. - - if the sub argument contains:: - - r'\0' - - it will work just like re: it places there the matched paired data - with the delimiter stripped. - - If count is different than zero, it will replace at most count - items. 
- """ - out =3D "" - - cur_pos =3D 0 - n =3D 0 - - for start, end, pos in self._search(line): - out +=3D line[cur_pos:start] - - # Value, ignoring start/end delimiters - value =3D line[end:pos - 1] - - # replaces \0 at the sub string, if \0 is used there - new_sub =3D sub - new_sub =3D new_sub.replace(r'\0', value) - - out +=3D new_sub - - # Drop end ';' if any - if pos < len(line) and line[pos] =3D=3D ';': - pos +=3D 1 - - cur_pos =3D pos - n +=3D 1 - - if count and count >=3D n: - break - - # Append the remaining string - l =3D len(line) - out +=3D line[cur_pos:l] - - return out - - def __repr__(self): - """ - Returns a displayable version of the class init. - """ - - return f'NestedMatch("{self.regex.regex.pattern}")' --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5D1AF3F99C3; Tue, 17 Mar 2026 18:09:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770990; cv=none; b=p2kTFOH0ch2sddSYpteC9hQArJVzMoQb6Pa1A9+vVu9zyBb2U1lk4+i2rlQZa/yl+q0EOXwrPcu7KZjdHVJy+MDsMIv6oD6qOqSfSl7qlmuvi/Y1xIO1xcO/fdrRk4y4s3aAs1jlekMSRC0shNludE6NHNlmQ4zDyCTHEMXwkfY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770990; c=relaxed/simple; bh=fTBYVejQ5or901ad6djW/bFIacxU0cRuaVuoUbZVtDI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=HBa500VhLkAcNWqC3UHwAhXc9a9rKf7Vy1dLnlaYOvv5Lp4U+AH8ugRtVAp2y5bsAFmNNVLZibz29Xf4/W16WQ+o5YVkdcmhZNbSXg9QURD3poQ/ar7sTNcqsGW20uLAxMKTBBUpEH/MYIWB5PHhZgPf40veIigjjd2EfulDQNQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org 
header.b=dQT96pK0; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="dQT96pK0" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 40735C2BCB0; Tue, 17 Mar 2026 18:09:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770990; bh=fTBYVejQ5or901ad6djW/bFIacxU0cRuaVuoUbZVtDI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=dQT96pK0CBh9M+ERLc/QFVhs1sdIl/G0lmjFlYURq5Xv+cvC/ZSD7VMMdqtL+ffby lgG9GCBZwMsQn0EO28eW9NQ9Godk0ko5reazzlxLE74y6DC4D2KBEqq5jgaSXcwki9 db1pWDxp3gQ6VpU8ku3DfOp1ioKzkR18wYpYvlAWShlUmLKo/hbp9ubMLLXue0E1Ye QyVndsMCxBdlahqjKqv9dY0Lg6PQHWHO/YoPlqFJAMwyaaAuXtqqItP73jSKRksKDM 1wZo0XlZ6xHrM95pEfK/fPwo4v2+uCCaGT/z2IwXwheP+u8uwq4A8vCAt0Ak5WnwHM Y7SbVW+LOcNwQ== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrU-0000000H5Xy-27c5; Tue, 17 Mar 2026 19:09:48 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH v3 16/22] docs: xforms_lists: handle struct_group directly Date: Tue, 17 Mar 2026 19:09:36 +0100 Message-ID: X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab The previous logic was handling struct_group on two steps. Remove the previous approach, as CMatch can do it the right way on a single step. 
Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/xforms_lists.py | 53 +++------------------------ 1 file changed, 6 insertions(+), 47 deletions(-) diff --git a/tools/lib/python/kdoc/xforms_lists.py b/tools/lib/python/kdoc/= xforms_lists.py index 7fa7f52cec7b..98632c50a146 100644 --- a/tools/lib/python/kdoc/xforms_lists.py +++ b/tools/lib/python/kdoc/xforms_lists.py @@ -32,52 +32,6 @@ class CTransforms: (KernRe(r'\s*____cacheline_aligned_in_smp', re.S), ' '), (KernRe(r'\s*____cacheline_aligned', re.S), ' '), (KernRe(r'\s*__cacheline_group_(begin|end)\([^\)]+\);'), ''), - # - # Unwrap struct_group macros based on this definition: - # __struct_group(TAG, NAME, ATTRS, MEMBERS...) - # which has variants like: struct_group(NAME, MEMBERS...) - # Only MEMBERS arguments require documentation. - # - # Parsing them happens on two steps: - # - # 1. drop struct group arguments that aren't at MEMBERS, - # storing them as STRUCT_GROUP(MEMBERS) - # - # 2. remove STRUCT_GROUP() ancillary macro. - # - # The original logic used to remove STRUCT_GROUP() using an - # advanced regex: - # - # \bSTRUCT_GROUP(\(((?:(?>[^)(]+)|(?1))*)\))[^;]*; - # - # with two patterns that are incompatible with - # Python re module, as it has: - # - # - a recursive pattern: (?1) - # - an atomic grouping: (?>...) - # - # I tried a simpler version: but it didn't work either: - # \bSTRUCT_GROUP\(([^\)]+)\)[^;]*; - # - # As it doesn't properly match the end parenthesis on some cases. - # - # So, a better solution was crafted: there's now a CMatch - # class that ensures that delimiters after a search are properly - # matched. So, the implementation to drop STRUCT_GROUP() will be - # handled in separate. 
- # - (KernRe(r'\bstruct_group\s*\(([^,]*,)', re.S), r'STRUCT_GROUP('), - (KernRe(r'\bstruct_group_attr\s*\(([^,]*,){2}', re.S), r'STRUCT_GR= OUP('), - (KernRe(r'\bstruct_group_tagged\s*\(([^,]*),([^,]*),', re.S), r'st= ruct \1 \2; STRUCT_GROUP('), - (KernRe(r'\b__struct_group\s*\(([^,]*,){3}', re.S), r'STRUCT_GROUP= ('), - # - # Replace macros - # - # TODO: use CMatch for FOO($1, $2, ...) matches - # - # it is better to also move those to the CMatch logic, - # to ensure that parentheses will be properly matched. - # (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)', re.S), r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'), (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^\)]+)\)', re.S), @@ -106,7 +60,12 @@ class CTransforms: (CMatch(r"__cond_acquires_shared"), ""), (CMatch(r"__acquires_shared"), ""), (CMatch(r"__releases_shared"), ""), - (CMatch(r"STRUCT_GROUP"), r'\0'), + + (CMatch('struct_group'), r'\2'), + (CMatch('struct_group_attr'), r'\3'), + (CMatch('struct_group_tagged'), r'struct \1 \2; \3'), + (CMatch('__struct_group'), r'\4'), + ] =20 #: Transforms for function prototypes. 
--=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D5C93F99D4; Tue, 17 Mar 2026 18:09:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770990; cv=none; b=EkloYx8CYAjYx6rE51IrEXcIWjT1J3z6LDIYr0la5XdQaUrHlyHthCpcM7ZOJb3RtYbCplHhaPpIzIJX1UqwmH6/2HdK6PjXzef0KH29BBEYpnzG5lDvrPUo9lyXiIjSjbmLvaHuljerqcpLneCTTWoSC0V1eM2i+JcSNkAsOlE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770990; c=relaxed/simple; bh=TynJuQcxbqp8IlXZpmYv2V/i59/dnDGfqmHay7W37mI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=QG2TLsnHFm0dEN3cm2JPfQTi1odFCbi3yDEtZnw373Ef4YrzNZIPqh2MOftyJ/uGWKBfDymLQpFz6EEkI77IBsiI3O3uhv1whP1FNBY7pQc2fwMRfWfb1dl/6FCwkknXIiS92NL7WVitpbmXbkipnEf8O2yysgYbNFEfbrZQ+xo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=hcB1kI89; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="hcB1kI89" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 718C2C2BC86; Tue, 17 Mar 2026 18:09:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770990; bh=TynJuQcxbqp8IlXZpmYv2V/i59/dnDGfqmHay7W37mI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hcB1kI89Lk4hpwF+Yljz9qEld/+V6+ErS/F0xFpZs7RXOQOQ0dlQ7V7voj1DxzT+O eFrVKzqV1/QhotZO0899gc1UdbKNhOsBad0pEKFIu9kpmAMNpFEK7NvBw7OhQ4rLlG gPVTpSvzEc9dEsc6mSKX8zgRt8GLBvP1X+ZVzKwOMGHUMsD690wSlXMxfllWp/5+M0 
Lr6YtOC0w/08ZXVAB/lsv/xIE7a+8MnAIfLPp9v/oZ0Nb9cXCo+Qz/XhQsUmbopXOn yHnfQMGi7IVFxwmnrkrNv13ea5wS6hCEt6n6gYAaY0h5Aw7AluKGV9vFErECcV82xi aim/doIaQ4aoQ== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrU-0000000H5ZD-2y2l; Tue, 17 Mar 2026 19:09:48 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH v3 17/22] docs: xforms_lists: better evaluate struct_group macros Date: Tue, 17 Mar 2026 19:09:37 +0100 Message-ID: <24bf2c036b08814d9b4aabc27542fd3b2ff54424.1773770483.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab The previous approach was to unwind nested structs/unions. Now that we have logic that can handle them well, use it to ensure that struct_group macros will properly reflect the actual struct. Note that the replacement logic still simplifies the code a little bit, as the basic building block for struct group is: union { \ struct { MEMBERS } ATTRS; \ struct __struct_group_tag(TAG) { MEMBERS } ATTRS NAME; \ } ATTRS There: - ATTRS is meant to add extra macro attributes like __packed which we already discard, as they aren't relevant to document struct members; - TAG is used only when built with __cplusplus. So, instead, convert them into just: struct { MEMBERS }; Please notice that here, we're using the greedy version of the backrefs, as MEMBERS is actually MEMBERS... on all such macros.
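
As a rough illustration of the rewrite described above, and not the CMatch implementation itself, the sketch below unwraps a struct_group-style macro with a hand-written balanced-parenthesis scan. The helper names find_call(), split_args() and unwrap_struct_group() are hypothetical. It assumes that the "greedy" backref means "argument N plus everything after it", and that a trailing ';' after the macro call should be dropped.

```python
import re


def find_call(text, name):
    """Locate the first `name(...)` call whose parentheses are balanced.

    Returns (start, args_start, end) or None when no balanced call exists.
    """
    m = re.search(r'\b' + re.escape(name) + r'\s*\(', text)
    if not m:
        return None
    depth = 0
    for pos in range(m.end() - 1, len(text)):
        if text[pos] == '(':
            depth += 1
        elif text[pos] == ')':
            depth -= 1
            if depth == 0:
                return m.start(), m.end(), pos + 1
    return None  # unbalanced: caller leaves the text untouched


def split_args(args):
    """Split macro arguments at top-level commas only."""
    out, depth, cur = [], 0, ""
    for ch in args:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            out.append(cur)
            cur = ""
        else:
            cur += ch
    out.append(cur)
    return out


def unwrap_struct_group(text, name, first_member):
    """Replace `name(..., MEMBERS...)` with `struct { MEMBERS };`.

    first_member is the 1-based index of the first MEMBERS argument;
    that argument and all following ones are kept (the "greedy" part).
    """
    span = find_call(text, name)
    if span is None:
        return text
    start, args_start, end = span
    args = split_args(text[args_start:end - 1])
    members = ",".join(args[first_member - 1:]).strip()
    # Assumption: swallow the ';' that follows the macro call, if any
    if end < len(text) and text[end] == ';':
        end += 1
    return text[:start] + "struct { " + members + " };" + text[end:]


print(unwrap_struct_group("struct_group(g, int a, b; int c;);",
                          "struct_group", 2))
```

The comma inside "int a, b;" is why a single-argument backref would not be enough here: MEMBERS can legitimately span several comma-separated arguments.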
Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Aleksandr Loktionov --- tools/lib/python/kdoc/xforms_lists.py | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/tools/lib/python/kdoc/xforms_lists.py b/tools/lib/python/kdoc/= xforms_lists.py index 98632c50a146..2056572852fd 100644 --- a/tools/lib/python/kdoc/xforms_lists.py +++ b/tools/lib/python/kdoc/xforms_lists.py @@ -61,10 +61,16 @@ class CTransforms: (CMatch(r"__acquires_shared"), ""), (CMatch(r"__releases_shared"), ""), =20 - (CMatch('struct_group'), r'\2'), - (CMatch('struct_group_attr'), r'\3'), - (CMatch('struct_group_tagged'), r'struct \1 \2; \3'), - (CMatch('__struct_group'), r'\4'), + # + # Macro __struct_group() creates an union with an anonymous + # and a non-anonymous struct, depending on the parameters. We only + # need one of those at kernel-doc, as we won't be documenting the = same + # members twice. + # + (CMatch('struct_group'), r'struct { \2+ };'), + (CMatch('struct_group_attr'), r'struct { \3+ };'), + (CMatch('struct_group_tagged'), r'struct { \3+ };'), + (CMatch('__struct_group'), r'struct { \4+ };'), =20 ] =20 --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C61003F99E0; Tue, 17 Mar 2026 18:09:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770990; cv=none; b=ayK2swwSV3vvn2Cs5VOagCem8Cf1BfGe/lMC8Fro1hjSrL6vWZoeNnfOh2lt6ToxpVuQwOvOMgjj7T3d621TmGG+Gztdw8xwOIU5H1jQ4G7f7m4TpL4t4HHdDP2VpIMNkIFA6PvUeHBLNXHvYb2J5uqQ2rEf12SAL7ydf1jH1lE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770990; c=relaxed/simple; 
bh=zee2diHRB7JHsdRfIDzL90thJMlji9fQpbRuAJApLrU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Pizpo+V7GJGHzTEF76G8IGb12x1fvc557CWsOkBc/uEJKL8gP8ulwmXsqkznwT3o9IS3hVY7SlNcr68KJNct99oQX2MCNJ1qvqD9pLqNdc9WcbuQmFIEEvZhSvjyoGggeBGKscJ87iVri6OC7Q6Ur/oe/UHCzjQw6N6nr1d3cGs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=bcFPK4h3; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="bcFPK4h3" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9C21CC2BCB0; Tue, 17 Mar 2026 18:09:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770990; bh=zee2diHRB7JHsdRfIDzL90thJMlji9fQpbRuAJApLrU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=bcFPK4h37ag9gO0cS2nmQH3anfSw1jha6N11ZtSYBSxliD6/ICia2L41jjYoR3Kgo JkhaFYNe8SVqiX+FeWY+wawNcfvP7T9Xn5vepFMzXh9J1OvqDpZcJqjBkWG5yDfGeV I4BkoVxXAZeFll5fCtQ2mDL2+V3zYX6tBvHhh9x7X1K0blyxgcuMSvqEh5qaW8gkvA MFQeYs0QAgh02FIyj7d5XHM3HW1+xCjMbsbiUNVv4AxtXRU6ShAsNPAX0e+Dyvlbvo F8VD4w6xL8CgASKxMX8CjTMk3DFQnpWjbUjEwAnge5/RXVJwpT0OpcRatWzVL6OAHT 7toUiSXYJrDnw== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrU-0000000H5ak-3mRd; Tue, 17 Mar 2026 19:09:48 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH v3 18/22] docs: c_lex: setup a logger to report tokenizer issues Date: Tue, 17 Mar 2026 19:09:38 +0100 Message-ID: <903ad83ae176196a50444e66177a4f5bcdef5199.1773770483.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Report the file that has issues detected via CMatch and CTokenizer. This is done by setting up a logger that will be overridden by kdoc_parser, when used on it. Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Aleksandr Loktionov --- tools/lib/python/kdoc/c_lex.py | 16 ++++++++++++++++ tools/lib/python/kdoc/kdoc_parser.py | 4 +++- 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/tools/lib/python/kdoc/c_lex.py b/tools/lib/python/kdoc/c_lex.py index 20e50ff0ecd5..b6d58bd470a9 100644 --- a/tools/lib/python/kdoc/c_lex.py +++ b/tools/lib/python/kdoc/c_lex.py @@ -22,6 +22,22 @@ from .kdoc_re import KernRe =20 log =3D logging.getLogger(__name__) =20 +def tokenizer_set_log(logger, prefix =3D ""): + """ + Replace the module=E2=80=91level logger with a LoggerAdapter that + prepends *prefix* to every message. + """ + global log + + class PrefixAdapter(logging.LoggerAdapter): + """ + Ancillary class to set prefix on all message logs.
+ """ + def process(self, msg, kwargs): + return f"{prefix}{msg}", kwargs + + # Wrap the provided logger in our adapter + log =3D PrefixAdapter(logger, {"prefix": prefix}) =20 class CToken(): """ diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/k= doc_parser.py index efd58c88ff31..f90c6dd0343d 100644 --- a/tools/lib/python/kdoc/kdoc_parser.py +++ b/tools/lib/python/kdoc/kdoc_parser.py @@ -13,7 +13,7 @@ import sys import re from pprint import pformat =20 -from kdoc.c_lex import CTokenizer +from kdoc.c_lex import CTokenizer, tokenizer_set_log from kdoc.kdoc_re import KernRe from kdoc.kdoc_item import KdocItem =20 @@ -253,6 +253,8 @@ class KernelDoc: self.config =3D config self.xforms =3D xforms =20 + tokenizer_set_log(self.config.log, f"{self.fname}: CMatch: ") + # Initial state for the state machines self.state =3D state.NORMAL =20 --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E4A3E3F99E9; Tue, 17 Mar 2026 18:09:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770991; cv=none; b=tYHIHpyMbLMjbO30870FyvsNbJe5MPfwhDTsxG54APG9UgEQKReVK1CiCA09u82HifdV6JkR3uTJNRzOSHDSLhJCXcQFhqos3CLVhntVnD/F3UxypQhb6hOjQPWApEtp7OHHx49KqL76iIbRzn7jyor28NWjOrzc2IaqCiKnW9Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770991; c=relaxed/simple; bh=TmYHPBWmiJoumgXDNjybq4wZqraDZtMhzikKfDnelmU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Lay7Wfikvd6hEqDm1RjHA3hBdEbVRhyqb9562uYjJGewJMBmJ296qTdD6jnZeC3g4M/SZVBD7Afh7Dvc6Ea9hfJ+bWb1TzEuPUSZuZjN0VKL7cBByllvrUl+6HGzYcvWqsu8CC5U78lWgLiBtlkdBttE1apoWBtHPfam58TfZv8= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=nj7vYBYC; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="nj7vYBYC" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C59C6C2BC9E; Tue, 17 Mar 2026 18:09:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770990; bh=TmYHPBWmiJoumgXDNjybq4wZqraDZtMhzikKfDnelmU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=nj7vYBYCfl+pT13zfzAvDNa1R2d0FYqdakyGUq1T6a1GlOI+F1q6bkl6vrMqlF881 v9FlGL2WXumbCIzn0E0cWp7BLX12vVhERF1ZTYQP6QIpaC2NkzsSwC7cWRMM/O1qTJ 8j8jr03UBOCzkTzLn9EVYT1uVt7UtdallOtBhnjdv9TuHK46E5PJHzLzqFbHjEOh29 N/HVXmr+/KUhJr1NtvurMNuI4KlVawg7QbCu96lVfEMZJAhrZZhFbp0WWhYZ4ODQUv R5kV2Ri5qa5xxBMgcP5HiZu94P5IvhK+k91zDrEh3Tdz/cWBTc7iu+TBe2MrxoTWH/ pqUTtH7KhiQ7w== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrV-0000000H5bz-0O2z; Tue, 17 Mar 2026 19:09:49 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org, Randy Dunlap , Shuah Khan , Vincent Mailhol Subject: [PATCH v3 19/22] docs: kernel-doc.rst: document private: scope propagation Date: Tue, 17 Mar 2026 19:09:39 +0100 Message-ID: X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab This was an undefined behavior, but at least one place used private: inside a nested struct meant to not be propagated outside it. Kernel-doc now defines how this is propagated. So, document that. 
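
The propagation rule can be pictured with a small toy model. This is only a sketch under assumptions, not the kdoc_parser code: visible_members() is a hypothetical helper, it keeps one private flag per open brace level, inner scopes inherit the flag of the enclosing scope, and leaving a scope restores the outer state.

```python
import re

# Comments, braces, or a run of text optionally terminated by ';'
TOKEN = re.compile(r'/\*.*?\*/|[{}]|[^{};/]+;?', re.S)


def visible_members(src):
    """Return the member declarations that are not hidden by private:."""
    private = [False]           # one flag per open brace level
    visible = []
    for tok in TOKEN.findall(src):
        tok = tok.strip()
        if not tok:
            continue
        if tok.startswith('/*'):
            body = tok[2:-2].strip()
            if body.startswith('private:'):
                private[-1] = True
            elif body.startswith('public:'):
                private[-1] = False
        elif tok == '{':
            private.append(private[-1])     # inner scope inherits
        elif tok == '}':
            if len(private) > 1:
                private.pop()               # outer state is restored
        elif tok.endswith(';') and not private[-1]:
            visible.append(tok[:-1].strip())
    return visible


SRC = """
union {
    struct {
        int memb1;
        /* private: hides memb2 from documentation */
        int memb2;
    };
    struct {
        void *memb3;
        int memb4;
    };
};
"""

print(visible_members(SRC))
```

In this model the private: tag inside the first nested struct hides memb2 only; memb3 and memb4 stay visible because the flag is dropped when that struct is closed.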
Signed-off-by: Mauro Carvalho Chehab --- Documentation/doc-guide/kernel-doc.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-gui= de/kernel-doc.rst index 8d2c09fb36e4..1c148fe8e1f9 100644 --- a/Documentation/doc-guide/kernel-doc.rst +++ b/Documentation/doc-guide/kernel-doc.rst @@ -213,6 +213,10 @@ The ``private:`` and ``public:`` tags must begin immed= iately following a ``/*`` comment marker. They may optionally include comments between the ``:`` and the ending ``*/`` marker. =20 +When ``private:`` is used on nested structs, it propagates only to inner +structs/unions. + + Example:: =20 /** @@ -256,8 +260,10 @@ It is possible to document nested structs and unions, = like:: union { struct { int memb1; + /* private: hides memb2 from documentation */ int memb2; }; + /* Everything here is public again, as private scope finished */ struct { void *memb3; int memb4; --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2033E3F7A80; Tue, 17 Mar 2026 18:09:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770991; cv=none; b=FYT3cBVtyHHiBA++kvn8mazZMnUPxH8MrIFA16by+v61nQ8GTko4crtn7GGGo0SUolzfRZUfa+xfT0Yp9NxY1QiG2XBo8LDO17xBUxuyyH6HfpAhXroVcaIBNvIMTgyv78PV2WAbYE+pXWiUUAjMHS9jGNHCWR9Hh89rv9a0rHc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770991; c=relaxed/simple; bh=k+ehfP1sl6SdRybZAYJu9AFpJMlMcv42DU7JH8k+cI0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
b=dWmc6LQPFqoy/Sx9NtzSRNpQ3BLU/1u58cMPqkzGxm7WhNpg2vlJwAAF1jDkVgymui2gazn9nQxFOlGYyRLBy9D45dU1pr5H239kpHkQxMaxZwxDVSh1s3BuepEv2eOC8bAxPX/+vUsijJpuUqpAVBwmYzbi6+gmbh2m8EX28c0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=YjHMiAkv; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YjHMiAkv" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 02E63C2BC86; Tue, 17 Mar 2026 18:09:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770991; bh=k+ehfP1sl6SdRybZAYJu9AFpJMlMcv42DU7JH8k+cI0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YjHMiAkvj+/rn7fQ0lMGt0R3CjBL810iXcsRr3N7Qk3hzPMqEpFHboi4aVaeEImBS sk5lacZPxnqc6X1lzxXKY9H8awn0ffwa7qqW0edwpBAVGmqOKIg2YYjAEj17e+lW/K idQ+tuX85uDRM4Udvc8UEFeoqoFZNwex+kLNFA63+WDCQQTbK430ZGT717At+BOWpC qJjcxSWSUodzuFtitxb7P++u4EWWnfP7Fx5acS1fZtNnj4OaSAzZR/8iM6t9/3AlXL wtp3ps7QGGjTmIpPnW01tHP/kqQMUZ6vu2IpnmSv/AeoL3DiV6u9CJqHgOKfPd2llC aNwvY6aX+w5UQ== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrV-0000000H5dD-1DqM; Tue, 17 Mar 2026 19:09:49 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH v3 20/22] docs: kdoc: ensure that comments are dropped before calling split_struct_proto() Date: Tue, 17 Mar 2026 19:09:40 +0100 Message-ID: X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Changeset 2b957decdb6c ("docs: kdoc: don't add broken 
comments inside proto= types") revealed a hidden bug at split_struct_proto(): some comments there may break its capability of properly identifying a struct. Fixing it is as simple as stripping comments before calling it. Fixes: 2b957decdb6c ("docs: kdoc: don't add broken comments inside prototyp= es") Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/kdoc_parser.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/k= doc_parser.py index f90c6dd0343d..8b2c9d0f0c58 100644 --- a/tools/lib/python/kdoc/kdoc_parser.py +++ b/tools/lib/python/kdoc/kdoc_parser.py @@ -723,6 +723,7 @@ class KernelDoc: # # Do the basic parse to get the pieces of the declaration. # + proto =3D trim_private_members(proto) struct_parts =3D self.split_struct_proto(proto) if not struct_parts: self.emit_msg(ln, f"{proto} error: Cannot parse struct or unio= n!") @@ -763,6 +764,7 @@ class KernelDoc: # Strip preprocessor directives. Note that this depends on the # trailing semicolon we added in process_proto_type(). # + proto =3D trim_private_members(proto) proto =3D KernRe(r'#\s*((define|ifdef|if)\s+|endif)[^;]*;', flags= =3Dre.S).sub('', proto) # # Parse out the name and members of the enum. Typedef form first. @@ -770,7 +772,7 @@ class KernelDoc: r =3D KernRe(r'typedef\s+enum\s*\{(.*)\}\s*(\w*)\s*;') if r.search(proto): declaration_name =3D r.group(2) - members =3D trim_private_members(r.group(1)) + members =3D r.group(1) # # Failing that, look for a straight enum # @@ -778,7 +780,7 @@ class KernelDoc: r =3D KernRe(r'enum\s+(\w*)\s*\{(.*)\}') if r.match(proto): declaration_name =3D r.group(1) - members =3D trim_private_members(r.group(2)) + members =3D r.group(2) # # OK, this isn't going to work. 
# --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 568F93F9F2A; Tue, 17 Mar 2026 18:09:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770991; cv=none; b=Ln7OyD8Bm9TwxM0s0tOzv44zeFhOZHD3jDCxXZT64yFoKCt3iH23KgcCICdqh+wgBvRulfGMp0FBpRAHVf6WXqT0pb0Igw5S6Xdmrw1jzqR2vjaY4ZJoGfa48exSFWp81Z7jxgChVV5juo+XdElYBW73TZTEbsz5XzJayNVYMnI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770991; c=relaxed/simple; bh=to0W2pmEulg+FP9A5LEaz371Z48eT+uPf4OqaF42Mc4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=R2aRJOHQBk06h3kmJHErQjJHL5vLqCRREXpJbOdFLBnuBwGYQWmR5BeAtTaZ5qdabokG2UZZ5HLIiG882fonljpKdyt+XQ6Yz1Lg9bqMTRmPSm5qznlwpNZq5f0LA1VwNWwjhoYDPve5zKc/03rlzurj3wN1fnIEYIvkacKZlsI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=sh3c5Rx6; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="sh3c5Rx6" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 392ADC2BC9E; Tue, 17 Mar 2026 18:09:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770991; bh=to0W2pmEulg+FP9A5LEaz371Z48eT+uPf4OqaF42Mc4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=sh3c5Rx6BcxG1vFD5Mz1ndZrpFiawuxYkqhEbvZIqbuesBVo1bS5FkfFXlqDWCivo S/bDhj7g/AdiVu5dqQyzR/thUeWqNYUuxlwAnnnGKA7uIDG4iWzOgj2fzySIGf0q+6 ge4AeIHZ1PosEOts4KRWEvMaX0fPLzdjlt/izt8pD2x6TtYVThm7eFa2+HJoOBDVA+ 
xYmeXVTQN3kuaKbD38QUUSxfSqvIKsiLRNyiMQLJMGFj23Di5HJKOEqaQDpNU/85kY 9+HFKdmvWl1NqfKW8kcasXXXhKEFqG+4yUhp4Ouf1845AeJMoyjW3hEBqWLaOVQ6fd 2PIONIZzfaKEw== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrV-0000000H5eS-24Aj; Tue, 17 Mar 2026 19:09:49 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH v3 21/22] docs: kdoc_parser: avoid tokenizing structs everytime Date: Tue, 17 Mar 2026 19:09:41 +0100 Message-ID: <1cc2a4286ebf7d4b2d03fcaf42a1ba9fa09004b9.1773770483.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Most of the rules inside CTransforms are of the type CMatch. Don't re-parse the source code every time. Doing this doesn't change the output, but makes kdoc almost as fast as before the tokenizer patches: # Before tokenizer patches $ time ./scripts/kernel-doc . -man >original 2>&1 real 0m42.933s user 0m36.523s sys 0m1.145s # After tokenizer patches $ time ./scripts/kernel-doc . -man >before 2>&1 real 1m29.853s user 1m23.974s sys 0m1.237s # After this patch $ time ./scripts/kernel-doc . -man >after 2>&1 real 0m48.579s user 0m45.938s sys 0m0.988s $ diff -s before after Files before and after are identical Manually checked the differences between original and after with: $ diff -U0 -prBw original after|grep -v Warning|grep -v "@@"|less They're due to: - whitespace fixes; - struct_group macros are now better handled; - several badly-generated man pages from broken inline kernel-doc markups are now fixed.
Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/kdoc_parser.py | 1 - tools/lib/python/kdoc/xforms_lists.py | 30 +++++++++++++++++++++------ 2 files changed, 24 insertions(+), 7 deletions(-) diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/k= doc_parser.py index 8b2c9d0f0c58..f6c4ee3b18c9 100644 --- a/tools/lib/python/kdoc/kdoc_parser.py +++ b/tools/lib/python/kdoc/kdoc_parser.py @@ -737,7 +737,6 @@ class KernelDoc: # # Go through the list of members applying all of our transformatio= ns. # - members =3D trim_private_members(members) members =3D self.xforms.apply("struct", members) =20 # diff --git a/tools/lib/python/kdoc/xforms_lists.py b/tools/lib/python/kdoc/= xforms_lists.py index 2056572852fd..5a62d4a450cb 100644 --- a/tools/lib/python/kdoc/xforms_lists.py +++ b/tools/lib/python/kdoc/xforms_lists.py @@ -5,7 +5,7 @@ import re =20 from kdoc.kdoc_re import KernRe -from kdoc.c_lex import CMatch +from kdoc.c_lex import CMatch, CTokenizer =20 struct_args_pattern =3D r'([^,)]+)' =20 @@ -16,6 +16,12 @@ class CTransforms: into something we can parse and generate kdoc for. """ =20 + # + # NOTE: + # Due to performance reasons, place CMatch rules before KernRe, + # as this avoids running the C parser every time. + # + #: Transforms for structs and unions. struct_xforms =3D [ # Strip attributes @@ -124,13 +130,25 @@ class CTransforms: "var": var_xforms, } =20 - def apply(self, xforms_type, text): + def apply(self, xforms_type, source): """ - Apply a set of transforms to a block of text. + Apply a set of transforms to a block of source. + + As tokenizer is used here, this function also remove comments + at the end. """ if xforms_type not in self.xforms: - return text + return source + + if isinstance(source, str): + source =3D CTokenizer(source) =20 for search, subst in self.xforms[xforms_type]: - text =3D search.sub(subst, text) - return text + # + # KernRe only accept strings. 
+ # + if isinstance(search, KernRe): + source =3D str(source) + + source =3D search.sub(subst, source) + return str(source) --=20 2.52.0 From nobody Wed Apr 8 10:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D29E63F54D7; Tue, 17 Mar 2026 18:09:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770991; cv=none; b=Ni06KgV2ZL+1hrnyA4Dvzvu4ZsZ6HNxqPIJYkAHfhGlafLX7oVD5cBm52p9jkx3zB/2qVdEHBJb90zB7iDmay4kskgkJ5dOMOdGfzevDoHkbLwFkYmiMJnuzvCPJIuHKMQzmA4EGzL11OTzh4vyvhHgqxnN/f8SCd0G3ZGcAtUw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773770991; c=relaxed/simple; bh=8/hsaWMUHVtBvDS57a4t+PPCZuuen94qixln1/WAwiU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=pg/nOk6IvqaIl0T8qOtH33FwGCEa+VB8fsXTPvdgQJ4vXWMGIhMByZrM60YbiMWHc4qOQa86JLSXIT1MfHCmKr/idlWI+ZL5HsOVQgZvWSsNsDTLPrdgJube4xZMIlr3YpSyinAtEfuoI1pqS7bgISxFMYYf9COK3LIgjINXMV0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=YiqZhqS+; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YiqZhqS+" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 71F5EC4CEF7; Tue, 17 Mar 2026 18:09:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773770991; bh=8/hsaWMUHVtBvDS57a4t+PPCZuuen94qixln1/WAwiU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YiqZhqS++k8jFBJw4eQ7+W8f1oHaNdNC48IvQRbfCUr6m4FdJaQEj+rMrSludd/3T 
/ZGgZuOZ5qX9ph/J7kbsV5GV8SSBrLqdlifkqzsq2pRmG+q9QenH2GBHpS5xFIJa/N fmdTs4kWquBlxZB+6YQxDxQlMVApxoPdlyL2uNLT/3dQFInEvdQTtfNAnM82COgyFy vYEtWARoBgrXHfA1lHoZniRZbr7U4m8/hVRi4FI2e1lHVhz6jCll7fwT5U/NVN9GKJ 49wBpA1ZXJMC5rYmc4jefCFMQKiA72Yt2iO0Sgx5XhxaDCEjol7Qo7ajBFHYtNAmCG 1QJPQHzQ1EdNA== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1w2YrV-0000000H5fg-2tzF; Tue, 17 Mar 2026 19:09:49 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Kees Cook , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org, "Gustavo A. R. Silva" , Aleksandr Loktionov , Randy Dunlap Subject: [PATCH v3 22/22] docs: xforms_lists: use CMatch for all identifiers Date: Tue, 17 Mar 2026 19:09:42 +0100 Message-ID: <86d4a07ff0e054207747fabf38d6bb261b52b5fa.1773770483.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab CMatch is lexically correct and replaces only identifiers, which is exactly where macro transformations happen. Use it to make the output safer and ensure that all arguments will be parsed the right way, even on complex cases. 
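
To show why matching on identifiers plus paired delimiters is safer than the regexes being replaced, here is a minimal sketch. strip_macro() is a hypothetical stand-in, not CMatch: it only counts parentheses and, unlike a real tokenizer, knows nothing about strings or comments. The regex is the __guarded_by one removed by this patch.

```python
import re


def strip_macro(text, name):
    """Remove every `name(...)` call, pairing parentheses by depth."""
    pattern = re.compile(r'\b' + re.escape(name) + r'\s*\(')
    out, pos = "", 0
    while True:
        m = pattern.search(text, pos)
        if not m:
            break
        depth, end = 0, None
        for i in range(m.end() - 1, len(text)):
            if text[i] == '(':
                depth += 1
            elif text[i] == ')':
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:         # unbalanced: keep the rest untouched
            break
        out += text[pos:m.start()]
        pos = end
    return out + text[pos:]


SRC = "int x __guarded_by(lock_of(a));"

# Old approach: stops at the first ')' and leaves a stray one behind
old = re.sub(r'\s*__guarded_by\s*\([^\)]*\)', ' ', SRC, flags=re.S)

# Delimiter-aware approach: the whole argument list is consumed
new = strip_macro(SRC, "__guarded_by")

print(repr(old))
print(repr(new))
```

With a nested call in the argument list, the regex result keeps an unmatched ')' while the depth-based scan removes the whole macro invocation.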
Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/xforms_lists.py | 159 +++++++++++++------------- 1 file changed, 79 insertions(+), 80 deletions(-) diff --git a/tools/lib/python/kdoc/xforms_lists.py b/tools/lib/python/kdoc/= xforms_lists.py index 5a62d4a450cb..f6ea9efb11ae 100644 --- a/tools/lib/python/kdoc/xforms_lists.py +++ b/tools/lib/python/kdoc/xforms_lists.py @@ -7,7 +7,8 @@ import re from kdoc.kdoc_re import KernRe from kdoc.c_lex import CMatch, CTokenizer =20 -struct_args_pattern =3D r'([^,)]+)' +struct_args_pattern =3D r"([^,)]+)" + =20 class CTransforms: """ @@ -24,48 +25,40 @@ class CTransforms: =20 #: Transforms for structs and unions. struct_xforms =3D [ - # Strip attributes - (KernRe(r"__attribute__\s*\(\([a-z0-9,_\*\s\(\)]*\)\)", flags=3Dre= .I | re.S, cache=3DFalse), ' '), - (KernRe(r'\s*__aligned\s*\([^;]*\)', re.S), ' '), - (KernRe(r'\s*__counted_by\s*\([^;]*\)', re.S), ' '), - (KernRe(r'\s*__counted_by_(le|be)\s*\([^;]*\)', re.S), ' '), - (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ' '), - (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ' '), - (KernRe(r'\s*__packed\s*', re.S), ' '), - (KernRe(r'\s*CRYPTO_MINALIGN_ATTR', re.S), ' '), - (KernRe(r'\s*__private', re.S), ' '), - (KernRe(r'\s*__rcu', re.S), ' '), - (KernRe(r'\s*____cacheline_aligned_in_smp', re.S), ' '), - (KernRe(r'\s*____cacheline_aligned', re.S), ' '), - (KernRe(r'\s*__cacheline_group_(begin|end)\([^\)]+\);'), ''), - (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)', re.S), - r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'), - (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^\)]+)\)', re.S), - r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'), - (KernRe(r'DECLARE_BITMAP\s*\(' + struct_args_pattern + r',\s*' + s= truct_args_pattern + r'\)', - re.S), r'unsigned long \1[BITS_TO_LONGS(\2)]'), - (KernRe(r'DECLARE_HASHTABLE\s*\(' + struct_args_pattern + r',\s*' = + struct_args_pattern + r'\)', - re.S), r'unsigned long \1[1 << ((\2) - 1)]'), - 
(KernRe(r'DECLARE_KFIFO\s*\(' + struct_args_pattern + r',\s*' + st= ruct_args_pattern + - r',\s*' + struct_args_pattern + r'\)', re.S), r'\2 *\1'), - (KernRe(r'DECLARE_KFIFO_PTR\s*\(' + struct_args_pattern + r',\s*' + - struct_args_pattern + r'\)', re.S), r'\2 *\1'), - (KernRe(r'(?:__)?DECLARE_FLEX_ARRAY\s*\(' + struct_args_pattern + = r',\s*' + - struct_args_pattern + r'\)', re.S), r'\1 \2[]'), - (KernRe(r'DEFINE_DMA_UNMAP_ADDR\s*\(' + struct_args_pattern + r'\)= ', re.S), r'dma_addr_t \1'), - (KernRe(r'DEFINE_DMA_UNMAP_LEN\s*\(' + struct_args_pattern + r'\)'= , re.S), r'__u32 \1'), - (KernRe(r'VIRTIO_DECLARE_FEATURES\(([\w_]+)\)'), r'union { u64 \1;= u64 \1_array[VIRTIO_FEATURES_U64S]; }'), - - (CMatch(r"__cond_acquires"), ""), - (CMatch(r"__cond_releases"), ""), - (CMatch(r"__acquires"), ""), - (CMatch(r"__releases"), ""), - (CMatch(r"__must_hold"), ""), - (CMatch(r"__must_not_hold"), ""), - (CMatch(r"__must_hold_shared"), ""), - (CMatch(r"__cond_acquires_shared"), ""), - (CMatch(r"__acquires_shared"), ""), - (CMatch(r"__releases_shared"), ""), + (CMatch("__attribute__"), ""), + (CMatch("__aligned"), ""), + (CMatch("__counted_by"), ""), + (CMatch("__counted_by_(le|be)"), ""), + (CMatch("__guarded_by"), ""), + (CMatch("__pt_guarded_by"), ""), + (CMatch("__packed"), ""), + (CMatch("CRYPTO_MINALIGN_ATTR"), ""), + (CMatch("__private"), ""), + (CMatch("__rcu"), ""), + (CMatch("____cacheline_aligned_in_smp"), ""), + (CMatch("____cacheline_aligned"), ""), + (CMatch("__cacheline_group_(?:begin|end)"), ""), + (CMatch("__ETHTOOL_DECLARE_LINK_MODE_MASK"), r"DECLARE_BITMAP(\1, = __ETHTOOL_LINK_MODE_MASK_NBITS)"), + (CMatch("DECLARE_PHY_INTERFACE_MASK",),r"DECLARE_BITMAP(\1, PHY_IN= TERFACE_MODE_MAX)"), + (CMatch("DECLARE_BITMAP"), r"unsigned long \1[BITS_TO_LONGS(\2)]"), + (CMatch("DECLARE_HASHTABLE"), r"unsigned long \1[1 << ((\2) - 1)]"= ), + (CMatch("DECLARE_KFIFO"), r"\2 *\1"), + (CMatch("DECLARE_KFIFO_PTR"), r"\2 *\1"), + (CMatch("(?:__)?DECLARE_FLEX_ARRAY"), r"\1 
\2[]"), + (CMatch("DEFINE_DMA_UNMAP_ADDR"), r"dma_addr_t \1"), + (CMatch("DEFINE_DMA_UNMAP_LEN"), r"__u32 \1"), + (CMatch("VIRTIO_DECLARE_FEATURES"), r"union { u64 \1; u64 \1_array= [VIRTIO_FEATURES_U64S]; }"), + (CMatch("__cond_acquires"), ""), + (CMatch("__cond_releases"), ""), + (CMatch("__acquires"), ""), + (CMatch("__releases"), ""), + (CMatch("__must_hold"), ""), + (CMatch("__must_not_hold"), ""), + (CMatch("__must_hold_shared"), ""), + (CMatch("__cond_acquires_shared"), ""), + (CMatch("__acquires_shared"), ""), + (CMatch("__releases_shared"), ""), + (CMatch("__attribute__"), ""), =20 # # Macro __struct_group() creates an union with an anonymous @@ -73,51 +66,57 @@ class CTransforms: # need one of those at kernel-doc, as we won't be documenting the = same # members twice. # - (CMatch('struct_group'), r'struct { \2+ };'), - (CMatch('struct_group_attr'), r'struct { \3+ };'), - (CMatch('struct_group_tagged'), r'struct { \3+ };'), - (CMatch('__struct_group'), r'struct { \4+ };'), - + (CMatch("struct_group"), r"struct { \2+ };"), + (CMatch("struct_group_attr"), r"struct { \3+ };"), + (CMatch("struct_group_tagged"), r"struct { \3+ };"), + (CMatch("__struct_group"), r"struct { \4+ };"), ] =20 #: Transforms for function prototypes. 
function_xforms =3D [ - (KernRe(r"^static +"), ""), - (KernRe(r"^extern +"), ""), - (KernRe(r"^asmlinkage +"), ""), - (KernRe(r"^inline +"), ""), - (KernRe(r"^__inline__ +"), ""), - (KernRe(r"^__inline +"), ""), - (KernRe(r"^__always_inline +"), ""), - (KernRe(r"^noinline +"), ""), - (KernRe(r"^__FORTIFY_INLINE +"), ""), - (KernRe(r"__init +"), ""), - (KernRe(r"__init_or_module +"), ""), - (KernRe(r"__exit +"), ""), - (KernRe(r"__deprecated +"), ""), - (KernRe(r"__flatten +"), ""), - (KernRe(r"__meminit +"), ""), - (KernRe(r"__must_check +"), ""), - (KernRe(r"__weak +"), ""), - (KernRe(r"__sched +"), ""), - (KernRe(r"_noprof"), ""), - (KernRe(r"__always_unused *"), ""), - (KernRe(r"__printf\s*\(\s*\d*\s*,\s*\d*\s*\) +"), ""), - (KernRe(r"__(?:re)?alloc_size\s*\(\s*\d+\s*(?:,\s*\d+\s*)?\) +"), = ""), - (KernRe(r"__diagnose_as\s*\(\s*\S+\s*(?:,\s*\d+\s*)*\) +"), ""), - (KernRe(r"DECL_BUCKET_PARAMS\s*\(\s*(\S+)\s*,\s*(\S+)\s*\)"), r"\1= , \2"), - (KernRe(r"__no_context_analysis\s*"), ""), - (KernRe(r"__attribute_const__ +"), ""), - (KernRe(r"__attribute__\s*\(\((?:[\w\s]+(?:\([^)]*\))?\s*,?)+\)\)\= s+"), ""), + (CMatch("static"), ""), + (CMatch("extern"), ""), + (CMatch("asmlinkage"), ""), + (CMatch("inline"), ""), + (CMatch("__inline__"), ""), + (CMatch("__inline"), ""), + (CMatch("__always_inline"), ""), + (CMatch("noinline"), ""), + (CMatch("__FORTIFY_INLINE"), ""), + (CMatch("__init"), ""), + (CMatch("__init_or_module"), ""), + (CMatch("__exit"), ""), + (CMatch("__deprecated"), ""), + (CMatch("__flatten"), ""), + (CMatch("__meminit"), ""), + (CMatch("__must_check"), ""), + (CMatch("__weak"), ""), + (CMatch("__sched"), ""), + (CMatch("__always_unused"), ""), + (CMatch("__printf"), ""), + (CMatch("__(?:re)?alloc_size"), ""), + (CMatch("__diagnose_as"), ""), + (CMatch("DECL_BUCKET_PARAMS"), r"\1, \2"), + (CMatch("__no_context_analysis"), ""), + (CMatch("__attribute_const__"), ""), + (CMatch("__attribute__"), ""), + + # + # HACK: this is similar to process_export() 
hack. It is meant to + # drop _noprof from function name. See for instance: + # ahash_request_alloc kernel-doc declaration at include/crypto/has= h.h. + # + (KernRe("_noprof"), ""), ] =20 #: Transforms for variable prototypes. var_xforms =3D [ - (KernRe(r"__read_mostly"), ""), - (KernRe(r"__ro_after_init"), ""), - (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ""), - (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ""), - (KernRe(r"LIST_HEAD\(([\w_]+)\)"), r"struct list_head \1"), + (CMatch("__read_mostly"), ""), + (CMatch("__ro_after_init"), ""), + (CMatch("__guarded_by"), ""), + (CMatch("__pt_guarded_by"), ""), + (CMatch("LIST_HEAD"), r"struct list_head \1"), + (KernRe(r"(?://.*)$"), ""), (KernRe(r"(?:/\*.*\*/)"), ""), (KernRe(r";$"), ""), --=20 2.52.0
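
Reviewer note: the commit message of patch 22/22 argues that matching whole identifiers is safer than plain regular-expression substitution. The sketch below is one way to picture that difference. It is not the kernel's CMatch or CTokenizer code; `strip_macro()` is a hypothetical helper written only for this illustration, and it assumes that a macro's arguments, when present, form a balanced parenthesis group right after the identifier.

```python
import re

# Hypothetical illustration only: this is NOT kdoc.c_lex.CMatch.
WORD = re.compile(r"\w+")


def strip_macro(source: str, name: str) -> str:
    """Drop `name` where it is a whole word, plus a balanced (...) group after it."""
    out = []
    pos = 0
    while pos < len(source):
        m = WORD.match(source, pos)
        if not m:
            # Punctuation or whitespace: copy it through unchanged.
            out.append(source[pos])
            pos += 1
            continue
        if m.group(0) != name:
            # A different word (even one that merely starts with `name`).
            out.append(m.group(0))
            pos = m.end()
            continue
        # Whole-word match: skip it, then skip a balanced argument list if any.
        pos = m.end()
        look = pos
        while look < len(source) and source[look].isspace():
            look += 1
        if look < len(source) and source[look] == "(":
            depth = 0
            while look < len(source):
                if source[look] == "(":
                    depth += 1
                elif source[look] == ")":
                    depth -= 1
                    if depth == 0:
                        look += 1
                        break
                look += 1
            pos = look
    return "".join(out)


decl = "int x __aligned(sizeof(long));"

# A regex that stops at the first ")" leaves a stray parenthesis behind.
print(re.sub(r"__aligned\s*\([^)]*\)", "", decl))   # int x );
# Word-level matching with balanced parentheses removes the whole attribute.
print(strip_macro(decl, "__aligned"))               # int x ;

# A bare substring regex also damages longer identifiers; word matching does not.
print(re.sub(r"__init", "", "static int __init_or_module_count;"))
print(strip_macro("static int __init_or_module_count;", "__init"))
```

The old KernRe entries avoided some of these problems with hand-tuned patterns (trailing spaces, `[^;]*`, and so on); the point of the sketch is only that word-level matching makes such per-entry tuning unnecessary.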