From nobody Thu Apr 9 10:29:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D170822A1D4; Mon, 9 Mar 2026 16:48:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074885; cv=none; b=rnUaWalFyQqaU0ORzLoV+TwNofBjy4M+BdE6he/HOaaa3UEIMbeXHCJWGUvdKG/zSNIyeDVZJGsi3Du7+UX5o1Cj2gar/BJfMurH079dAPmTvJcI7YZ3tilrdpsM3chxDRkDtJYxFtKM/sGoTpn+KmkyWGzKiLY9qPDYAMQuPTA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074885; c=relaxed/simple; bh=lTdoS2vsiheWATIVgcF8OQSQ0Xw+s5JEdZeeekTiLBQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=RGMOZnIwFdkjoiZE/+p9Eh/4A0eZy7k6d2Qwgw9Jerj/CvCfP72GLVwblfzvWsTHIs5RFdNc/WEM8aOlIJrZjTgi0aqci95DDU6fhZuq6xXNh0zCoTM/2A7CvQOzyuzppVARA0YyAgG7aDUGOxOaBx3mG6HRHaL8++WUIzUJDpA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ShuknLG/; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ShuknLG/" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8831CC2BC9E; Mon, 9 Mar 2026 16:48:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773074885; bh=lTdoS2vsiheWATIVgcF8OQSQ0Xw+s5JEdZeeekTiLBQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ShuknLG/9JbKKrc499mUfg6nXJLU62Qy5PZeBkWHji2cHr9K+hfpJlkv7jBSQsQlK pKxrWju1wmIcUY0IaTmgRN/BJ49GEW3gYf6JnuCcDkWQqOnr2peS1ptp8l/l4etN7W 0Gm+u85+sU1KixsZPKHxGxFUstQ1Eyq5JoGKBcuA7hPVsVYescg3AJwmzfkpayEOZl 
v+Hkio3KCENa327OE2P0L+11La3fDG674kkbGiiHUH1L4iEKLnOELv5c+hJC7PUg0+ TtoxIDTHB8LTnU3q8BGv3DmGP/VPzk7sChdV5voV3putHqEWDjn4PJGHh8wtlFdpiX RmpRRFG+PWNOg== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1vzdlz-0000000BhYE-2l0s; Mon, 09 Mar 2026 17:48:03 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-kernel@vger.kernel.org, Mauro Carvalho Chehab , Shuah Khan Subject: [PATCH 1/8] docs: python: add helpers to run unit tests Date: Mon, 9 Mar 2026 17:47:52 +0100 Message-ID: <37999041f616ddef41e84cf2686c0264d1a51dc9.1773074166.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab While Python's standard library has support for unit tests, its default output is hard to read. Add a helper module to improve it. I wrote this module last year while testing some scripts I used internally. The initial skeleton was generated with the help of LLM tools, but it was heavily modified to ensure that it works as I expect.
Signed-off-by: Mauro Carvalho Chehab --- Documentation/tools/python.rst | 2 + Documentation/tools/unittest.rst | 24 ++ tools/lib/python/unittest_helper.py | 353 ++++++++++++++++++++++++++++ 3 files changed, 379 insertions(+) create mode 100644 Documentation/tools/unittest.rst create mode 100755 tools/lib/python/unittest_helper.py diff --git a/Documentation/tools/python.rst b/Documentation/tools/python.rst index 1444c1816735..3b7299161f20 100644 --- a/Documentation/tools/python.rst +++ b/Documentation/tools/python.rst @@ -11,3 +11,5 @@ Python libraries feat kdoc kabi + + unittest diff --git a/Documentation/tools/unittest.rst b/Documentation/tools/unittest.rst new file mode 100644 index 000000000000..14a2b2a65236 --- /dev/null +++ b/Documentation/tools/unittest.rst @@ -0,0 +1,24 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=============== +Python unittest +=============== + +Checking the consistency of Python modules can be complex. Sometimes, it is +useful to define a set of unit tests to help check them. + +While the actual test implementation is use-case dependent, Python already +provides a standard way to add unit tests by using ``import unittest``. + +Using this module requires setting up a test suite. Also, the default format +is a little bit awkward. To improve it and provide a more uniform way to +report errors, some unittest classes and functions are defined. + + +Unittest helper module +====================== + +.. automodule:: lib.python.unittest_helper + :members: + :show-inheritance: + :undoc-members: diff --git a/tools/lib/python/unittest_helper.py b/tools/lib/python/unittest_helper.py new file mode 100755 index 000000000000..55d444cd73d4 --- /dev/null +++ b/tools/lib/python/unittest_helper.py @@ -0,0 +1,353 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# Copyright(c) 2025-2026: Mauro Carvalho Chehab .
+# +# pylint: disable=C0103,R0912,R0914,E1101 + +""" +Provides helper functions and classes to execute Python unit tests. + +These helper functions provide a nice colored output summary of each +executed test and, when a test fails, they show the difference in diff +format when running in verbose mode, like:: + + $ tools/unittests/nested_match.py -v + ... + Traceback (most recent call last): + File "/new_devel/docs/tools/unittests/nested_match.py", line 69, in test_count_limit + self.assertEqual(replaced, "bar(a); bar(b); foo(c)") + ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + AssertionError: 'bar(a) foo(b); foo(c)' != 'bar(a); bar(b); foo(c)' + - bar(a) foo(b); foo(c) + ? ^^^^ + + bar(a); bar(b); foo(c) + ? ^^^^^ + ... + +It also allows filtering which tests will be executed via the ``-k`` parameter. + +Typical usage is to do:: + + from unittest_helper import run_unittest + ... + + if __name__ == "__main__": + run_unittest(__file__) + +If passing arguments is needed, in a more complex scenario, it can be +used as in this example:: + + from unittest_helper import TestUnits, run_unittest + ... + env = {'sudo': ""} + ... + if __name__ == "__main__": + runner = TestUnits() + base_parser = runner.parse_args() + base_parser.add_argument('--sudo', action='store_true', + help='Enable tests requiring sudo privileges') + + args = base_parser.parse_args() + + # Update module-level flag + if args.sudo: + env['sudo'] = "1" + + # Run tests with customized arguments + runner.run(__file__, parser=base_parser, args=args, env=env) +""" + +import argparse +import atexit +import os +import re +import unittest +import sys + +from unittest.mock import patch + + +class Summary(unittest.TestResult): + """ + Overrides ``unittest.TestResult`` class to provide a nice colored + summary. When in verbose mode, displays actual/expected difference in + unified diff format.
+ """ + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + #: Dictionary to store organized test results. + self.test_results =3D {} + + #: max length of the test names. + self.max_name_length =3D 0 + + def startTest(self, test): + super().startTest(test) + test_id =3D test.id() + parts =3D test_id.split(".") + + # Extract module, class, and method names + if len(parts) >=3D 3: + module_name =3D parts[-3] + else: + module_name =3D "" + if len(parts) >=3D 2: + class_name =3D parts[-2] + else: + class_name =3D "" + + method_name =3D parts[-1] + + # Build the hierarchical structure + if module_name not in self.test_results: + self.test_results[module_name] =3D {} + + if class_name not in self.test_results[module_name]: + self.test_results[module_name][class_name] =3D [] + + # Track maximum test name length for alignment + display_name =3D f"{method_name}:" + + self.max_name_length =3D max(len(display_name), self.max_name_leng= th) + + def _record_test(self, test, status): + test_id =3D test.id() + parts =3D test_id.split(".") + if len(parts) >=3D 3: + module_name =3D parts[-3] + else: + module_name =3D "" + if len(parts) >=3D 2: + class_name =3D parts[-2] + else: + class_name =3D "" + method_name =3D parts[-1] + self.test_results[module_name][class_name].append((method_name, st= atus)) + + def addSuccess(self, test): + super().addSuccess(test) + self._record_test(test, "OK") + + def addFailure(self, test, err): + super().addFailure(test, err) + self._record_test(test, "FAIL") + + def addError(self, test, err): + super().addError(test, err) + self._record_test(test, "ERROR") + + def addSkip(self, test, reason): + super().addSkip(test, reason) + self._record_test(test, f"SKIP ({reason})") + + def printResults(self): + """ + Print results using colors if tty. 
+ """ + # Check for ANSI color support + use_color =3D sys.stdout.isatty() + COLORS =3D { + "OK": "\033[32m", # Green + "FAIL": "\033[31m", # Red + "SKIP": "\033[1;33m", # Yellow + "PARTIAL": "\033[33m", # Orange + "EXPECTED_FAIL": "\033[36m", # Cyan + "reset": "\033[0m", # Reset to default terminal col= or + } + if not use_color: + for c in COLORS: + COLORS[c] =3D "" + + # Calculate maximum test name length + if not self.test_results: + return + try: + lengths =3D [] + for module in self.test_results.values(): + for tests in module.values(): + for test_name, _ in tests: + lengths.append(len(test_name) + 1) # +1 for colon + max_length =3D max(lengths) + 2 # Additional padding + except ValueError: + sys.exit("Test list is empty") + + # Print results + for module_name, classes in self.test_results.items(): + print(f"{module_name}:") + for class_name, tests in classes.items(): + print(f" {class_name}:") + for test_name, status in tests: + # Get base status without reason for SKIP + if status.startswith("SKIP"): + status_code =3D status.split()[0] + else: + status_code =3D status + color =3D COLORS.get(status_code, "") + print( + f" {test_name + ':':<{max_length}}{color}{s= tatus}{COLORS['reset']}" + ) + print() + + # Print summary + print(f"\nRan {self.testsRun} tests", end=3D"") + if hasattr(self, "timeTaken"): + print(f" in {self.timeTaken:.3f}s", end=3D"") + print() + + if not self.wasSuccessful(): + print(f"\n{COLORS['FAIL']}FAILED (", end=3D"") + failures =3D getattr(self, "failures", []) + errors =3D getattr(self, "errors", []) + if failures: + print(f"failures=3D{len(failures)}", end=3D"") + if errors: + if failures: + print(", ", end=3D"") + print(f"errors=3D{len(errors)}", end=3D"") + print(f"){COLORS['reset']}") + + +def flatten_suite(suite): + """Flatten test suite hierarchy.""" + tests =3D [] + for item in suite: + if isinstance(item, unittest.TestSuite): + tests.extend(flatten_suite(item)) + else: + tests.append(item) + return tests + + +class TestUnits: 
+ """ + Helper class to set verbosity level. + + This class discovers test files, imports their unittest classes and + executes the tests in them. + """ + def parse_args(self): + """Returns a parser for command line arguments.""" + parser = argparse.ArgumentParser(description="Test runner with regex filtering") + parser.add_argument("-v", "--verbose", action="count", default=1) + parser.add_argument("-f", "--failfast", action="store_true") + parser.add_argument("-k", "--keyword", + help="Regex pattern to filter test methods") + return parser + + def run(self, caller_file=None, pattern=None, + suite=None, parser=None, args=None, env=None): + """ + Execute all tests from the unit test file. + + It contains several optional parameters: + + ``caller_file``: + - name of the file that contains the tests. + + typical usage is to place __file__ at the caller test, e.g.:: + + if __name__ == "__main__": + TestUnits().run(__file__) + + ``pattern``: + - optional pattern to match multiple file names. Defaults + to basename of ``caller_file``. + + ``suite``: + - a unittest suite initialized by the caller using + ``unittest.TestLoader().discover()``. + + ``parser``: + - an argparse parser. If not defined, this helper will create + one. + + ``args``: + - an ``argparse.Namespace`` object filled by the caller. + + ``env``: + - environment variables that will be passed to the test suite + + At least ``caller_file`` or ``suite`` must be used, otherwise a + ``TypeError`` will be raised.
+ """ + if not args: + if not parser: + parser =3D self.parse_args() + args =3D parser.parse_args() + + if not caller_file and not suite: + raise TypeError("Either caller_file or suite is needed at Test= Units") + + verbose =3D args.verbose + + if not env: + env =3D os.environ.copy() + + env["VERBOSE"] =3D f"{verbose}" + + patcher =3D patch.dict(os.environ, env) + patcher.start() + # ensure it gets stopped after + atexit.register(patcher.stop) + + + if verbose >=3D 2: + unittest.TextTestRunner(verbosity=3Dverbose).run =3D lambda su= ite: suite + + # Load ONLY tests from the calling file + if not suite: + if not pattern: + pattern =3D caller_file + + loader =3D unittest.TestLoader() + suite =3D loader.discover(start_dir=3Dos.path.dirname(caller_f= ile), + pattern=3Dos.path.basename(caller_file= )) + + # Flatten the suite for environment injection + tests_to_inject =3D flatten_suite(suite) + + # Filter tests by method name if -k specified + if args.keyword: + try: + pattern =3D re.compile(args.keyword) + filtered_suite =3D unittest.TestSuite() + for test in tests_to_inject: # Use the pre-flattened list + method_name =3D test.id().split(".")[-1] + if pattern.search(method_name): + filtered_suite.addTest(test) + suite =3D filtered_suite + except re.error as e: + sys.stderr.write(f"Invalid regex pattern: {e}\n") + sys.exit(1) + else: + # Maintain original suite structure if no keyword filtering + suite =3D unittest.TestSuite(tests_to_inject) + + if verbose >=3D 2: + resultclass =3D None + else: + resultclass =3D Summary + + runner =3D unittest.TextTestRunner(verbosity=3Dargs.verbose, + resultclass=3Dresultclass, + failfast=3Dargs.failfast) + result =3D runner.run(suite) + if resultclass: + result.printResults() + + sys.exit(not result.wasSuccessful()) + + +def run_unittest(fname): + """ + Basic usage of TestUnits class. + + Use it when there's no need to pass any extra argument to the tests + with. 
The recommended way is to place this at the end of each + unittest module:: + + if __name__ =3D=3D "__main__": + run_unittest(__file__) + """ + TestUnits().run(fname) --=20 2.52.0 From nobody Thu Apr 9 10:29:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3DA423BE14A; Mon, 9 Mar 2026 16:48:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074886; cv=none; b=YkRFwjUAeJu35TA5lFVjKyjJKXQVkllUvmFJtIB/LMPa5Q1gPBAnWJG/oM3FIKJM3FS0vOpfgmGqC4sWWJ5LnOB3GUgiJ/lyBlU3T2+yNNwi8QzIZ1KxigTrVy/aEVXWpbZHyREsK50tnykD7VojxGDIsRkYT+xiMY7Yc+hFiCM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074886; c=relaxed/simple; bh=QUqLYmPuQ3O4ZK29hZAwAg0doyghqKMTrXuQP8pJrMQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=e4wL8zUMdU8nuNWlT+/iXkakgf7zZ/onDRLqi9xtW0Vr+3E5dJ8WvPXvK3iaVVzv1Mv1r2wJKopXQnDPi/HNEugO7IrUF6gAWjEDW6P3C/ZSnxsegv8TI+F/jvXhgwAXyvG6MDMzflKZOa98DcfhbPAGedUAqNhxhyiLtGz/h34= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=OAJ3jtqZ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="OAJ3jtqZ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C14D5C2BCB0; Mon, 9 Mar 2026 16:48:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773074886; bh=QUqLYmPuQ3O4ZK29hZAwAg0doyghqKMTrXuQP8pJrMQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=OAJ3jtqZsO3bO8eiCpMFkYulaN4oCuILxwC6HxXPAmoVbien7B+BIMovr0arFnnyx 
47GYt+L4ilEjQlywGBKLbAQK6IRNCPE06vD37lBkHEKy6iIbYdHt2wOSvX70sdOebm jbqdOHr28XdiuRhvlmpC+bQ8RzJgSX8VaL6SZFf4bhXfoV8vCmPb20aC7+VMUETHQl 9M+aZpI8WwFV6YU5uRnJIN50enflnzIDt3AlGwZtuTO+DFfuSsrl9FC4EASFMLCE+l nV0YrqA5sje/cGzjFyMQDD40S8LKiBa4jI+WWVGtK5vU2cIvdkXoO5TlAWTnqPITIT uO8PviTZvrCFA== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1vzdlz-0000000BhZR-3cs0; Mon, 09 Mar 2026 17:48:03 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-kernel@vger.kernel.org Subject: [PATCH 2/8] unittests: add a testbench to check public/private kdoc comments Date: Mon, 9 Mar 2026 17:47:53 +0100 Message-ID: <144f4952e0cb74fe9c9adc117e9a21ec8aa1cc10.1773074166.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Add unit tests to check if the public/private and comments strip is working properly. 
Running it shows that, in several cases, public/private handling does not do what is expected: test_private: TestPublicPrivate: test balanced_inner_private: OK test balanced_non_greddy_private: OK test balanced_private: OK test no private: OK test unbalanced_inner_private: FAIL test unbalanced_private: FAIL test unbalanced_struct_group_tagged_with_private: FAIL test unbalanced_two_struct_group_tagged_first_with_private: FAIL test unbalanced_without_end_of_line: FAIL Ran 9 tests FAILED (failures=5) Signed-off-by: Mauro Carvalho Chehab --- tools/unittests/test_private.py | 331 ++++++++++++++++++++++++++++++++ 1 file changed, 331 insertions(+) create mode 100755 tools/unittests/test_private.py diff --git a/tools/unittests/test_private.py b/tools/unittests/test_private.py new file mode 100755 index 000000000000..eae245ae8a12 --- /dev/null +++ b/tools/unittests/test_private.py @@ -0,0 +1,331 @@ +#!/usr/bin/env python3 + +""" +Unit tests for struct/union member extractor class. +""" + + +import os +import re +import unittest +import sys + +from unittest.mock import MagicMock + +SRC_DIR = os.path.dirname(os.path.realpath(__file__)) +sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python")) + +from kdoc.kdoc_parser import trim_private_members +from unittest_helper import run_unittest + +# +# List of tests. +# +# The code will dynamically generate one test for each key in this dictionary. +# + +#: Tests to check if CTokenizer is properly handling public/private comments. +TESTS_PRIVATE = { + # + # Simplest case: no private.
Ensure that trimming won't affect struct + # + "no private": { + "source": """ + struct foo { + int a; + int b; + int c; + }; + """, + "trimmed": """ + struct foo { + int a; + int b; + int c; + }; + """, + }, + + # + # Play "by the books" by always having a public in place + # + + "balanced_private": { + "source": """ + struct foo { + int a; + /* private: */ + int b; + /* public: */ + int c; + }; + """, + "trimmed": """ + struct foo { + int a; + int c; + }; + """, + }, + + "balanced_non_greddy_private": { + "source": """ + struct foo { + int a; + /* private: */ + int b; + /* public: */ + int c; + /* private: */ + int d; + /* public: */ + int e; + + }; + """, + "trimmed": """ + struct foo { + int a; + int c; + int e; + }; + """, + }, + + "balanced_inner_private": { + "source": """ + struct foo { + struct { + int a; + /* private: ignore below */ + int b; + /* public: but this should not be ignored */ + }; + int b; + }; + """, + "trimmed": """ + struct foo { + struct { + int a; + }; + int b; + }; + """, + }, + + # + # Test what happens if there's no public after private place + # + + "unbalanced_private": { + "source": """ + struct foo { + int a; + /* private: */ + int b; + int c; + }; + """, + "trimmed": """ + struct foo { + int a; + }; + """, + }, + + "unbalanced_inner_private": { + "source": """ + struct foo { + struct { + int a; + /* private: ignore below */ + int b; + /* but this should not be ignored */ + }; + int b; + }; + """, + "trimmed": """ + struct foo { + struct { + int a; + }; + int b; + }; + """, + }, + + "unbalanced_struct_group_tagged_with_private": { + "source": """ + struct page_pool_params { + struct_group_tagged(page_pool_params_fast, fast, + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; + }; + struct_group_tagged(page_pool_params_slow, slow, + struct net_device *netdev; + unsigned int queue_idx; + unsigned 
int flags; + /* private: used by test code only */ + void (*init_callback)(netmem_ref netmem, void *arg= ); + void *init_arg; + }; + }; + """, + "trimmed": """ + struct page_pool_params { + struct_group_tagged(page_pool_params_fast, fast, + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; + }; + struct_group_tagged(page_pool_params_slow, slow, + struct net_device *netdev; + unsigned int queue_idx; + unsigned int flags; + }; + }; + """, + }, + + "unbalanced_two_struct_group_tagged_first_with_private": { + "source": """ + struct page_pool_params { + struct_group_tagged(page_pool_params_slow, slow, + struct net_device *netdev; + unsigned int queue_idx; + unsigned int flags; + /* private: used by test code only */ + void (*init_callback)(netmem_ref netmem, void *arg= ); + void *init_arg; + }; + struct_group_tagged(page_pool_params_fast, fast, + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; + }; + }; + """, + "trimmed": """ + struct page_pool_params { + struct_group_tagged(page_pool_params_slow, slow, + struct net_device *netdev; + unsigned int queue_idx; + unsigned int flags; + }; + struct_group_tagged(page_pool_params_fast, fast, + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; + }; + }; + """, + }, + "unbalanced_without_end_of_line": { + "source": """ \ + struct page_pool_params { \ + struct_group_tagged(page_pool_params_slow, slow, \ + struct net_device *netdev; \ + unsigned int queue_idx; \ + unsigned int flags; + /* private: used by test code only */ + void (*init_callback)(netmem_ref netmem, void *arg= ); \ + void *init_arg; \ + }; \ + 
struct_group_tagged(page_pool_params_fast, fast, \ + unsigned int order; \ + unsigned int pool_size; \ + int nid; \ + struct device *dev; \ + struct napi_struct *napi; \ + enum dma_data_direction dma_dir; \ + unsigned int max_len; \ + unsigned int offset; \ + }; \ + }; + """, + "trimmed": """ + struct page_pool_params { + struct_group_tagged(page_pool_params_slow, slow, + struct net_device *netdev; + unsigned int queue_idx; + unsigned int flags; + }; + struct_group_tagged(page_pool_params_fast, fast, + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; + }; + }; + """, + }, +} + + +class TestPublicPrivate(unittest.TestCase): + """ + Main test class. Populated dynamically at runtime. + """ + + def setUp(self): + self.maxDiff =3D None + + def add_test(cls, name, source, trimmed): + """ + Dynamically add a test to the class + """ + def test(cls): + result =3D trim_private_members(source) + + result =3D re.sub(r"\s++", " ", result).strip() + expected =3D re.sub(r"\s++", " ", trimmed).strip() + + msg =3D f"failed when parsing this source:\n" + source + + cls.assertEqual(result, expected, msg=3Dmsg) + + test.__name__ =3D f'test {name}' + + setattr(TestPublicPrivate, test.__name__, test) + + +# +# Populate TestPublicPrivate class +# +test_class =3D TestPublicPrivate() +for name, test in TESTS_PRIVATE.items(): + test_class.add_test(name, test["source"], test["trimmed"]) + + +# +# main +# +if __name__ =3D=3D "__main__": + run_unittest(__file__) --=20 2.52.0 From nobody Thu Apr 9 10:29:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EB0C42494ED; Mon, 9 Mar 2026 16:48:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; 
arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074886; cv=none; b=JYBtoSbgqwDxBs2nYVYBxmOwPlu05x+OIkpgQSErUcblaFkVdahhG0RK4D1u+AoyuWb8oiQX0UYIZqE/LJr+yx3b6q/zDFBKGJaIEyii28KynKEr90U+W9qPKsTArnGVLymh76+JmveWwgK71TbOHBFXan8zZ8c3Wk1UzWhO9uE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074886; c=relaxed/simple; bh=Og3btdqqsss2BN07XAVoT1mCvNdW9py94Li7vBmuxJ4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=K4JoY2Px/qPV7ADdmLD8PaursFTI0zfgkYTn+CPyVRCtX2pakRlbF2HNrKq2OELYk6QquDa1Pmskj8ujMoCKIE1Jwf078toeGlpwlxqbyTrrAbzvbRb9hc1TxrZKOpoQ4AoRbq8piFI+duQ+190OUEy8H6W2YhSKuTjS0EfRkg0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=YUfyEWvr; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YUfyEWvr" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C3AFAC2BCB2; Mon, 9 Mar 2026 16:48:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773074885; bh=Og3btdqqsss2BN07XAVoT1mCvNdW9py94Li7vBmuxJ4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YUfyEWvrzNP4u8VpiIWyyDzjz8AZuCmEeDOIvLQOUdAhZS+I59gOPLT5M5H+D9f3A RA+4RrdQ6euOM/yiRDCvFFHsN209Guhm5jeiQiHsJMHFwgF1T1qricAxKxF+v4DAoQ KsHu46bPGadSr7OQqBQd1D2e142+tzXOaRs3P9fpLD0BGwH9/Syq6iUyXvXlWEZurl YMKlI1FDSLm2z55aL0RD8NkqerYBN4OQZlity/5QV43H9GhJQArmTdZuFfQIvSRh+V mJQfDni61ORCXiJa1G8qmQcLN2FOZxCjbBBnWBsRO92uczcXTeQ2LAPgjJhtk3A3c8 xsnpNw7yj/pMw== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1vzdm0-0000000Bhag-0IlN; Mon, 09 Mar 2026 17:48:04 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-kernel@vger.kernel.org, Aleksandr Loktionov , 
Randy Dunlap Subject: [PATCH 3/8] docs: kdoc: don't add broken comments inside prototypes Date: Mon, 9 Mar 2026 17:47:54 +0100 Message-ID: <18e577dbbd538dcc22945ff139fe3638344e14f0.1773074166.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Parsing a file like drivers/scsi/isci/host.h, which contains broken kernel-doc markup, makes kernel-doc create a prototype that contains unmatched comment terminators. That causes, for instance, struct sci_power_control to be shown with this prototype: struct sci_power_control { * it is not. */ bool timer_started; */ struct sci_timer timer; * requesters field. */ u8 phys_waiting; */ u8 phys_granted_power; * mapped into requesters via struct sci_phy.phy_index */ struct isci_phy *requesters[SCI_MAX_PHYS]; }; as the comments no longer start with "/*". Fix the logic to detect such cases, and keep adding the comments inside it. Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/kdoc_parser.py | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py index edf70ba139a5..086579d00b5c 100644 --- a/tools/lib/python/kdoc/kdoc_parser.py +++ b/tools/lib/python/kdoc/kdoc_parser.py @@ -1355,6 +1355,12 @@ class KernelDoc: elif doc_content.search(line): self.emit_msg(ln, f"Incorrect use of kernel-doc format: {line}") self.state = state.PROTO + + # + # Don't let it add partial comments to the code, as that breaks the + # logic meant to remove comments from prototypes. + # + self.process_proto_type(ln, "/**\n" + line) # else ... ??
=20 def process_inline_text(self, ln, line): --=20 2.52.0 From nobody Thu Apr 9 10:29:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3DAD93BED5F; Mon, 9 Mar 2026 16:48:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074886; cv=none; b=a4XclYHn6ppbtEZ+iRSrJyPMcbG8In4oNWl0T+2rmJCm4qz9OWkn5iMuntYGRwzGlLbWAKI4ITGjy32CkVm5o7PeK9h3PgcQMSVoKGTV/uHLTo2IHKD10t3mpJQv6wKUshiSqiPDSUmpBWhvVZAXiqCeM25cpIhnnUmOoV+c8ik= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074886; c=relaxed/simple; bh=iR9cZEMBrhxbAljLc1W53eZP2bmd/b55ybR+IN+0RyI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=g4QRRa95AahatHbYZFRQUz6kPH87SuL8OCacXwDAMNsaxlbTj4I+hL+YjHx5klWki5SS5d232R4kwmmM7c6EGDayvGl/g9RSQHWawUx/vmAgepzbKhTE9DbibFDBURvN/vXXMEaNdjWRVZ2LZc3Zk8hq7v/VsQCIRlfsi5jAn7M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=TvFjYCzy; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="TvFjYCzy" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0B320C2BCB1; Mon, 9 Mar 2026 16:48:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773074886; bh=iR9cZEMBrhxbAljLc1W53eZP2bmd/b55ybR+IN+0RyI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=TvFjYCzye2aOkH4hWram9Eq8FP4qUiAKApgNOe1HTEtvR72dz42lSH599bcDMmUJO Ac149IMP0jZ+mqDysTamGGDm6S8tZc6QHxWbhiz8rNvyCHP4JATQfVXfpX9f5wL0Z2 8oA5QkIW4m+5z59uFKtqSX063uifDl6vcci4/cTH/fYiOs6AFZW9mI+PjXp1RFxoge 
1Fe/6yqSSA2rHHpHQZTCIAUiwjppjiSz9lbk+ksdV06rNo4Zi+y3eRDUA8+SFr7Id8 zgAsxw8edUifZOBP7QbbB426RLY8nZwEt6O2Zax5COtyIWRZLno/CMUcFpVggElaGS I5Y+ZQc5M9mLA== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1vzdm0-0000000Bhbt-1BFg; Mon, 09 Mar 2026 17:48:04 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH 4/8] docs: kdoc: properly handle empty enum arguments Date: Mon, 9 Mar 2026 17:47:55 +0100 Message-ID: <4182bfb7e5f5b4bbaf05cee1bede691e56247eaf.1773074166.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Depending on how the enum proto is written, a comma at the end may incorrectly make kernel-doc parse an arg like " ". Strip spaces before checking if arg is empty. 
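To illustrate the problem, here is a simplified sketch of the split logic. It uses plain ``re`` instead of the parser's ``KernRe`` wrapper, and ``enum_member_names`` is a hypothetical helper written for this example, not a function from kdoc_parser.py:

```python
import re


def enum_member_names(members):
    """Return the member names found in the body of an enum."""
    # Drop parenthesized expressions before splitting on commas.
    members = re.sub(r'\([^;)]*\)', '', members)

    names = []
    for arg in members.split(','):
        # Keep only the leading identifier, when there is one.
        arg = re.sub(r'^\s*(\w+).*', r'\1', arg)

        # A trailing comma leaves a whitespace-only chunk behind.
        # Testing "not arg" alone would let it through, so strip first.
        if not arg.strip():
            continue

        names.append(arg)

    return names


print(enum_member_names("FOO = 1, BAR = BIT(2), BAZ, \n"))
```

With only the ``if not arg`` check, the chunk after the last comma would be kept and reported as an undescribed enum value.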
Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/kdoc_parser.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/k= doc_parser.py index 086579d00b5c..4b3c555e6c8e 100644 --- a/tools/lib/python/kdoc/kdoc_parser.py +++ b/tools/lib/python/kdoc/kdoc_parser.py @@ -810,9 +810,10 @@ class KernelDoc: member_set =3D set() members =3D KernRe(r'\([^;)]*\)').sub('', members) for arg in members.split(','): - if not arg: - continue arg =3D KernRe(r'^\s*(\w+).*').sub(r'\1', arg) + if not arg.strip(): + continue + self.entry.parameterlist.append(arg) if arg not in self.entry.parameterdescs: self.entry.parameterdescs[arg] =3D self.undescribed --=20 2.52.0 From nobody Thu Apr 9 10:29:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 970633DA5BB; Mon, 9 Mar 2026 16:48:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074886; cv=none; b=lkbSNHHOR1BHb+x8pygfHWxvYe+cFFRXEIxCaYH0WlUH6HIMXSbV4DBNgc5P0/rAs9YHfU8YKyTmpSa17uq+eIuaf0ADLzz4uD/X9TWMPsPNJT2h0hUhpPsoaB39s3+kxvkbRwqj9fR8UTURPfDMhEHxiQbDlZT/xmlaqeXUhro= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074886; c=relaxed/simple; bh=Zxtsgu1TemvpgcnmCFSJTgzfv8FPAfvXqLO7ecUkY9U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=oKlNMr3WGP6NurSqJMUCq2SsbFHGtOZdx7MHWnoBenTDRAvzUvSpvRydEAwMat2H2JHMjeY4/MmZgAG2g80gq1BljKJBF/l/H8I/0gDkrTkFrwhcZSUCYlmaGhX1EXJROUy29e4PT801c46ZBOs/inY0GRt3crB9vqmZold6rYg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org 
header.b=tHaLgfeA; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="tHaLgfeA" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5021FC4CEF7; Mon, 9 Mar 2026 16:48:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773074886; bh=Zxtsgu1TemvpgcnmCFSJTgzfv8FPAfvXqLO7ecUkY9U=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=tHaLgfeA922HnEX8sKFd/khIphSQhIXouJcQ3/xldgHhaIW4DzxQVSRqkktvc3Yqs y2HUmnIZ6mizYdKIBV0mf6nchp/CjCq2PqSsUhzpwSCEKmqbquLNO0J4n2/fWhEDQI AR27htYMdR9VgcYEqNlokIMYJJXzjKEKZZzKvgANNwcx5H1y+DRq1FC0A3uwnie2Gh QtLDCTDObnnyzFHcKQzPl9wu0aihTyc49MZ2hufOf7PhvXrdyOIY8ZC0TJ3uHRFjxo dDImNZfoRXL0eCAr+G9DKi1L3vR04sQIgcRb1NUnbx/fL3j668LkD3xeogkDKoz/bv LLawyLuiUc/4w== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1vzdm0-0000000Bhd6-22YC; Mon, 09 Mar 2026 17:48:04 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH 5/8] docs: kdoc_re: add a C tokenizer Date: Mon, 9 Mar 2026 17:47:56 +0100 Message-ID: X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Handling C code purely using regular expressions doesn't work well. Add a C tokenizer to help doing it the right way. 
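As a rough illustration of why a tokenizer helps, the sketch below uses a deliberately tiny token table. TOKEN_SPEC and tokenize() are illustrative names, not the ones added by this patch, and the patterns cover only a small subset of C:

```python
import re

# Order matters: earlier entries win when several could match.
TOKEN_SPEC = [
    ("COMMENT",  r"//[^\n]*|/\*[\s\S]*?\*/"),
    ("NUMBER",   r"[0-9]+"),
    ("NAME",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PUNC",     r"[;,{}\[\]()=]"),
    ("SPACE",    r"\s+"),
    ("MISMATCH", r"."),
]

# One alternation with a named group per token kind.
SCANNER = re.compile("|".join(f"(?P<{name}>{pattern})"
                              for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Yield (kind, text) pairs, skipping whitespace."""
    for match in SCANNER.finditer(source):
        kind = match.lastgroup
        if kind == "SPACE":
            continue
        if kind == "MISMATCH":
            raise RuntimeError(f"Unexpected character {match.group()!r}")
        yield kind, match.group()

for token in tokenize("int a; /* not; a statement */"):
    print(token)
```

The ';' inside the comment comes back as part of a single COMMENT token, so later stages cannot mistake it for a statement terminator, which is hard to guarantee with ad-hoc regular expressions.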
The tokenizer is based on the tokenizer example from the Python re documentation: https://docs.python.org/3/library/re.html#writing-a-tokenizer Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/kdoc_re.py | 234 +++++++++++++++++++++++++++++++ 1 file changed, 234 insertions(+) diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_= re.py index 085b89a4547c..7bed4e9a8810 100644 --- a/tools/lib/python/kdoc/kdoc_re.py +++ b/tools/lib/python/kdoc/kdoc_re.py @@ -141,6 +141,240 @@ class KernRe: =20 return self.last_match.groups() =20 +class TokType(): + + @staticmethod + def __str__(val): + """Return the name of an enum value""" + return TokType._name_by_val.get(val, f"UNKNOWN({val})") + +class CToken(): + """ + Data class to define a C token. + """ + + # Tokens that can be used by the parser. Works like a C enum. + + COMMENT =3D 0 #: A standard C or C99 comment, including delimiter. + STRING =3D 1 #: A string, including quotation marks. + CHAR =3D 2 #: A character, including apostrophes. + NUMBER =3D 3 #: A number. + PUNC =3D 4 #: A punctuation mark: ``;`` / ``,`` / ``.``. + BEGIN =3D 5 #: A begin character: ``{`` / ``[`` / ``(``. + END =3D 6 #: An end character: ``}`` / ``]`` / ``)``. + CPP =3D 7 #: A preprocessor macro. + HASH =3D 8 #: The hash character - useful to handle other macro= s. + OP =3D 9 #: A C operator (add, subtract, ...). + STRUCT =3D 10 #: A ``struct`` keyword. + UNION =3D 11 #: A ``union`` keyword. + ENUM =3D 12 #: An ``enum`` keyword. + TYPEDEF =3D 13 #: A ``typedef`` keyword. + NAME =3D 14 #: A name. Can be an ID or a type. + SPACE =3D 15 #: Any space characters, including new lines + + MISMATCH =3D 255 #: an error indicator: should never happen in practi= ce. + + # Dict to convert from an enum integer into a string. + _name_by_val =3D {v: k for k, v in dict(vars()).items() if isinstance(= v, int)} + + # Dict to convert from string to an enum-like integer value.
+ _name_to_val =3D {k: v for v, k in _name_by_val.items()} + + @staticmethod + def to_name(val): + """Convert an integer value from the CToken enum into a string""" + + return CToken._name_by_val.get(val, f"UNKNOWN({val})") + + @staticmethod + def from_name(name): + """Convert a string into a CToken enum value""" + if name in CToken._name_to_val: + return CToken._name_to_val[name] + + return CToken.MISMATCH + + def __init__(self, kind, value, pos, + brace_level, paren_level, bracket_level): + self.kind =3D kind + self.value =3D value + self.pos =3D pos + self.brace_level =3D brace_level + self.paren_level =3D paren_level + self.bracket_level =3D bracket_level + + def __repr__(self): + name =3D self.to_name(self.kind) + if isinstance(self.value, str): + value =3D '"' + self.value + '"' + else: + value =3D self.value + + return f"CToken({name}, {value}, {self.pos}, " \ + f"{self.brace_level}, {self.paren_level}, {self.bracket_lev= el})" + +#: Tokens to parse C code. +TOKEN_LIST =3D [ + (CToken.COMMENT, r"//[^\n]*|/\*[\s\S]*?\*/"), + + (CToken.STRING, r'"(?:\\.|[^"\\])*"'), + (CToken.CHAR, r"'(?:\\.|[^'\\])'"), + + (CToken.NUMBER, r"0[xX][0-9a-fA-F]+[uUlL]*|0[0-7]+[uUlL]*|" + r"[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?[fFlL]*"), + + (CToken.PUNC, r"[;,\.]"), + + (CToken.BEGIN, r"[\[\(\{]"), + + (CToken.END, r"[\]\)\}]"), + + (CToken.CPP, r"#\s*(define|include|ifdef|ifndef|if|else|elif|endif= |undef|pragma)\b"), + + (CToken.HASH, r"#"), + + (CToken.OP, r"\+\+|\-\-|\->|=3D=3D|\!=3D|<=3D|>=3D|&&|\|\||<<|>>|= \+=3D|\-=3D|\*=3D|/=3D|%=3D" + r"|&=3D|\|=3D|\^=3D|=3D|\+|\-|\*|/|%|<|>|&|\||\^|~|!|= \?|\:"), + + (CToken.STRUCT, r"\bstruct\b"), + (CToken.UNION, r"\bunion\b"), + (CToken.ENUM, r"\benum\b"), + (CToken.TYPEDEF, r"\btypedef\b"), + + (CToken.NAME, r"[A-Za-z_][A-Za-z0-9_]*"), + + (CToken.SPACE, r"[\s]+"), + + (CToken.MISMATCH,r"."), +] + +#: Handle C continuation lines. +RE_CONT =3D KernRe(r"\\\n") + +RE_COMMENT_START =3D KernRe(r'/\*\s*') + +#: tokenizer regex.
Will be filled at the first CTokenizer usage. +re_scanner =3D None + +class CTokenizer(): + """ + Scan C statements and definitions and produce tokens. + + When converted to string, it drops comments and handles public/private + values, respecting depth. + """ + + # This class is inspired by and follows the basic concepts of: + # https://docs.python.org/3/library/re.html#writing-a-tokenizer + + def _tokenize(self, source): + """ + Iterator that parses ``source``, splitting it into tokens, as de= fined + in ``TOKEN_LIST``. + + The iterator yields CToken objects. + """ + + # Handle continuation lines. Note that kdoc_parser already has + # logic to do that. Still, let's keep it for completeness, as we m= ight + # end up reusing this tokenizer outside kernel-doc some day - or we = may + # eventually remove it from there as a future cleanup. + source =3D RE_CONT.sub("", source) + + brace_level =3D 0 + paren_level =3D 0 + bracket_level =3D 0 + + for match in re_scanner.finditer(source): + kind =3D CToken.from_name(match.lastgroup) + pos =3D match.start() + value =3D match.group() + + if kind =3D=3D CToken.MISMATCH: + raise RuntimeError(f"Unexpected token '{value}' on {pos}:\= n\t{source}") + elif kind =3D=3D CToken.BEGIN: + if value =3D=3D '(': + paren_level +=3D 1 + elif value =3D=3D '[': + bracket_level +=3D 1 + else: # value =3D=3D '{' + brace_level +=3D 1 + + elif kind =3D=3D CToken.END: + if value =3D=3D ')' and paren_level > 0: + paren_level -=3D 1 + elif value =3D=3D ']' and bracket_level > 0: + bracket_level -=3D 1 + elif brace_level > 0: # value =3D=3D '}' + brace_level -=3D 1 + + yield CToken(kind, value, pos, + brace_level, paren_level, bracket_level) + + def __init__(self, source): + """ + Create a regular expression to handle TOKEN_LIST. + + While I generally don't like using regex group naming via: + (?P<name>...) + + in this particular case, it makes sense, as we can pick the name + when matching a code via re_scanner().
+ """ + global re_scanner + + if not re_scanner: + re_tokens =3D [] + + for kind, pattern in TOKEN_LIST: + name =3D CToken.to_name(kind) + re_tokens.append(f"(?P<{name}>{pattern})") + + re_scanner =3D KernRe("|".join(re_tokens), re.MULTILINE | re.D= OTALL) + + self.tokens =3D [] + for tok in self._tokenize(source): + self.tokens.append(tok) + + def __str__(self): + out=3D"" + show_stack =3D [True] + + for tok in self.tokens: + if tok.kind =3D=3D CToken.BEGIN: + show_stack.append(show_stack[-1]) + + elif tok.kind =3D=3D CToken.END: + prev =3D show_stack[-1] + if len(show_stack) > 1: + show_stack.pop() + + if not prev and show_stack[-1]: + # + # Try to preserve indent + # + out +=3D "\t" * (len(show_stack) - 1) + + out +=3D str(tok.value) + continue + + elif tok.kind =3D=3D CToken.COMMENT: + comment =3D RE_COMMENT_START.sub("", tok.value) + + if comment.startswith("private:"): + show_stack[-1] =3D False + show =3D False + elif comment.startswith("public:"): + show_stack[-1] =3D True + + continue + + if show_stack[-1]: + out +=3D str(tok.value) + + return out + + #: Nested delimited pairs (brackets and parenthesis) DELIMITER_PAIRS =3D { '{': '}', --=20 2.52.0 From nobody Thu Apr 9 10:29:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9809B3DA7C7; Mon, 9 Mar 2026 16:48:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074886; cv=none; b=kvO2R2mO3MD77MHM/eh5F4ivw7J3jPA7fcWsv0LvikN39kFeZLaNqi8txLvcIvYP5QhMwGzVBwV8QJU136xzDWN12sneCqMEsz9pILvj8v1A/Yv1zFZCOsbNMWU/d8WqxAX5Wgjm+Z0XNdwiERt8xIsaEf2PyskBOzDUZjIFjZ8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074886; c=relaxed/simple; 
bh=xi9HTFCdsv+dCAkW0Hiur3D5yVumKUkJ/b9uKkf9fw8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=F5oZhzjGk9prDiNmfb2ylmgMzBl2E/t6LlMf8yhqBJ+HX6ucLdAN2BmAqRzipcSidvbvfeCdR/cyR3+7OUnRnDhIseNZqv72qcn0kbN8um9+4DJegrXK9jgvFLgujBhdY6zreQMxRslgWuifhxbHM+q3h+/wLMm2/ocYr7dTGg4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Y8R6jAP0; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Y8R6jAP0" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 74F38C2BCB0; Mon, 9 Mar 2026 16:48:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773074886; bh=xi9HTFCdsv+dCAkW0Hiur3D5yVumKUkJ/b9uKkf9fw8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Y8R6jAP0iDqMNKIQpA+QJcNIY1PhRXYegtYcoXKlnd2t6bR1/6O8apZdpLybxfnuJ nul2hkuArze/3mM/TxZ4mLcBil/xb9bl3z2Jp/61B3HvjFiRg33w6n3eGlKMGxUaXz nBwpmQFLoYQ3IvgwkgrXhSdJUXsUx9sSReZK9mvJwSdMHMZNw3G7GdY3/5YIhhRo4w jfDs7jMEllzSwq4cxmFr1WAPBLFfUJxgf4xv99WlGgknyKFgdrOcWPlYMsBteCUONd E5zJAmM0mm2qZv11l74QkgD5KD0vrs1I4FQi/S5Zb2yEjFHiHtxWlZ3GxSNkWN0Vbn FYYImXuwnbfzg== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1vzdm0-0000000BheJ-2rxL; Mon, 09 Mar 2026 17:48:04 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH 6/8] docs: kdoc: use tokenizer to handle comments on structs Date: Mon, 9 Mar 2026 17:47:57 +0100 Message-ID: X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: 
Mauro Carvalho Chehab Better handle comments inside structs. After those changes, all unittests now pass: test_private: TestPublicPrivate: test balanced_inner_private: OK test balanced_non_greddy_private: OK test balanced_private: OK test no private: OK test unbalanced_inner_private: OK test unbalanced_private: OK test unbalanced_struct_group_tagged_with_private: OK test unbalanced_two_struct_group_tagged_first_with_private: OK test unbalanced_without_end_of_line: OK Ran 9 tests Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/kdoc_parser.py | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/k= doc_parser.py index 4b3c555e6c8e..6b181ead3175 100644 --- a/tools/lib/python/kdoc/kdoc_parser.py +++ b/tools/lib/python/kdoc/kdoc_parser.py @@ -13,7 +13,7 @@ import sys import re from pprint import pformat =20 -from kdoc.kdoc_re import NestedMatch, KernRe +from kdoc.kdoc_re import NestedMatch, KernRe, CTokenizer from kdoc.kdoc_item import KdocItem =20 # @@ -84,15 +84,9 @@ def trim_private_members(text): """ Remove ``struct``/``enum`` members that have been marked "private". """ - # First look for a "public:" block that ends a private region, then - # handle the "private until the end" case. - # - text =3D KernRe(r'/\*\s*private:.*?/\*\s*public:.*?\*/', flags=3Dre.S)= .sub('', text) - text =3D KernRe(r'/\*\s*private:.*', flags=3Dre.S).sub('', text) - # - # We needed the comments to do the above, but now we can take them out. 
- # - return KernRe(r'\s*/\*.*?\*/\s*', flags=3Dre.S).sub('', text).strip() + + tokens =3D CTokenizer(text) + return str(tokens) =20 class state: """ --=20 2.52.0 From nobody Thu Apr 9 10:29:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C18F33DFC7F; Mon, 9 Mar 2026 16:48:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074886; cv=none; b=XY2NkyR8wLiUoQxmTqpAbxPpBtCT/XUTu73UOkFKVjBJab6MfvA4X4bgAFEJo7VNeM5AFt2QcG+LFR6SvD4RpW7n12mGtEuTY7M5fUzAmRTWUZGztcGDt7FMR5ijEd8rqmByO1CzQfzLmgCpmqYw1Jic7VdevWZZkvh2mQJeXpk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074886; c=relaxed/simple; bh=w/naBW8NNuJGu7a0YYIA6roistwl+UTsYj4gxMVPhK8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=nypOhz4hr7ui2yYEmviiFWQ+Q0nrCSpI2Wx+pw5oF1M9bfwyi7qM0Zv7NAbSrijlpfYzG9cW3N1zcmzkmxDbSeqKiU1nwxJ5PrEuLdByb0n/G4XU+VLFuubfekmwCmR3ya0aCBW5jxRG+TfkyrQ29XVtMiOsggKHL2tPwgsDK6M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CTRVEpCI; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CTRVEpCI" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 99E60C2BCAF; Mon, 9 Mar 2026 16:48:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773074886; bh=w/naBW8NNuJGu7a0YYIA6roistwl+UTsYj4gxMVPhK8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=CTRVEpCIFIhNse9dlicZMdqjU2Ucj1VHUfBa+dMGBjHk5xoxazvpc0ml9P/ArgpUp 
9UGejAVvQluCtoUrWz6msSIkzvd828sCrpRProl3YRBZeZMdQHQhAphq7PT9JjCVQN JOLIN5FYyAazCF9f9fpG49wLUDxHzsgVzS2sP8my+dWwRW3LLgyA/C9Ycum78RT1H/ h2wBzMrgfxjRCrWSfzo5vpTM2tC+7JvW//+MJrhJatMGEfMSOzbptcisEssC5P/9dm pWLSUU1KUqjaSv/PnljTJhIIgu8LCrRRU9W4I1UiqMBQPLSGayl91cHlb4/ktGjnAC 3zOMzTpchTNUw== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1vzdm0-0000000BhfW-3ghd; Mon, 09 Mar 2026 17:48:04 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-kernel@vger.kernel.org Subject: [PATCH 7/8] unittests: test_private: modify it to use CTokenizer directly Date: Mon, 9 Mar 2026 17:47:58 +0100 Message-ID: <2672257233ff73a9464c09b50924be51e25d4f59.1773074166.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Change the logic to use the tokenizer directly. This allows adding more unit tests to check the validity of the tokenizer itself.
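The test module builds its unittest classes from tables of test data. A minimal sketch of that pattern is shown here, under simplified assumptions: TABLE, make_test() and run_group() are hypothetical names, and the check itself (upper-casing a string) is only a placeholder for the real tokenizer checks:

```python
import io
import unittest

# Hypothetical test table: one entry per generated test method.
TABLE = {
    "lower": {"source": "abc", "expected": "ABC"},
    "mixed": {"source": "aBc", "expected": "ABC"},
}

def make_test(data):
    """Return a test method bound to one table entry."""
    def test(self):
        self.assertEqual(data["source"].upper(), data["expected"])
    return test

def build_test_class(group_name, table):
    """Create a unittest.TestCase subclass with one method per entry."""
    class_dict = {f"test_{name}": make_test(data)
                  for name, data in table.items()}
    return type(group_name, (unittest.TestCase,), class_dict)

def run_group(cls):
    """Run all tests of a generated class, discarding the text output."""
    suite = unittest.TestLoader().loadTestsFromTestCase(cls)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)

TestUpper = build_test_class("TestUpper", TABLE)
```

Binding each entry through make_test() gives every generated method its own data, which a plain loop defining the function inline would not do.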
Signed-off-by: Mauro Carvalho Chehab --- .../{test_private.py =3D> test_tokenizer.py} | 76 +++++++++++++------ 1 file changed, 52 insertions(+), 24 deletions(-) rename tools/unittests/{test_private.py =3D> test_tokenizer.py} (85%) diff --git a/tools/unittests/test_private.py b/tools/unittests/test_tokeniz= er.py similarity index 85% rename from tools/unittests/test_private.py rename to tools/unittests/test_tokenizer.py index eae245ae8a12..da0f2c4c9e21 100755 --- a/tools/unittests/test_private.py +++ b/tools/unittests/test_tokenizer.py @@ -15,20 +15,44 @@ from unittest.mock import MagicMock SRC_DIR =3D os.path.dirname(os.path.realpath(__file__)) sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python")) =20 -from kdoc.kdoc_parser import trim_private_members +from kdoc.kdoc_re import CTokenizer from unittest_helper import run_unittest =20 + + # # List of tests. # # The code will dynamically generate one test for each key on this diction= ary. # =20 +def make_private_test(name, data): + """ + Create a test named ``name`` using parameters given by ``data`` dict. + """ + + def test(self): + """In-lined lambda-like function to run the test""" + tokens =3D CTokenizer(data["source"]) + result =3D str(tokens) + + # + # Avoid whitespace false positives + # + result =3D re.sub(r"\s++", " ", result).strip() + expected =3D re.sub(r"\s++", " ", data["trimmed"]).strip() + + msg =3D f"failed when parsing this source:\n{data['source']}" + self.assertEqual(result, expected, msg=3Dmsg) + + return test + #: Tests to check if CTokenizer is handling properly public/private commen= ts. TESTS_PRIVATE =3D { # # Simplest case: no private. Ensure that trimming won't affect struct # + "__run__": make_private_test, "no private": { "source": """ struct foo { @@ -288,41 +312,45 @@ TESTS_PRIVATE =3D { }, } =20 +#: Dict containing all test groups fror CTokenizer +TESTS =3D { + "TestPublicPrivate": TESTS_PRIVATE, +} =20 -class TestPublicPrivate(unittest.TestCase): - """ - Main test class. 
Populated dynamically at runtime. - """ +def setUp(self): + self.maxDiff =3D None =20 - def setUp(self): - self.maxDiff =3D None +def build_test_class(group_name, table): + """ + Dynamically create a class, using type() as a generator + for a new class derived from unittest.TestCase. =20 - def add_test(cls, name, source, trimmed): - """ - Dynamically add a test to the class - """ - def test(cls): - result =3D trim_private_members(source) + We're opting to do it inside a function to avoid the risk of + changing the globals() dictionary. + """ =20 - result =3D re.sub(r"\s++", " ", result).strip() - expected =3D re.sub(r"\s++", " ", trimmed).strip() + class_dict =3D { + "setUp": setUp + } =20 - msg =3D f"failed when parsing this source:\n" + source + run =3D table["__run__"] =20 - cls.assertEqual(result, expected, msg=3Dmsg) + for test_name, data in table.items(): + if test_name =3D=3D "__run__": + continue =20 - test.__name__ =3D f'test {name}' + class_dict[f"test_{test_name}"] =3D run(test_name, data) =20 - setattr(TestPublicPrivate, test.__name__, test) + cls =3D type(group_name, (unittest.TestCase,), class_dict) =20 + return cls.__name__, cls =20 # -# Populate TestPublicPrivate class +# Create classes and add them to the global dictionary # -test_class =3D TestPublicPrivate() -for name, test in TESTS_PRIVATE.items(): - test_class.add_test(name, test["source"], test["trimmed"]) - +for group, table in TESTS.items(): + t =3D build_test_class(group, table) + globals()[t[0]] =3D t[1] =20 # # main --=20 2.52.0 From nobody Thu Apr 9 10:29:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E5E0F3E7162; Mon, 9 Mar 2026 16:48:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256;
d=subspace.kernel.org; s=arc-20240116; t=1773074887; cv=none; b=pSoS+5GO993ALZjABCMcs+W6PYVQpZGrSdfPxRRboOJXXOQN1MQq7RpDnXoXxuFoD/OTcseaFbrbuEQPgb8xy4IhvDoVUohUX60394wAFInKU2mBR2m2y3Crn8+Cc2Y+4rGKCUX2rw/tQvDOQKvzgryewlFBxOaU7hvkOMAYTXg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773074887; c=relaxed/simple; bh=tRQ5UQGeY6yZl7zSl8udBzbIceMvjBuqhB3obUBrnHc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=R9xe/Ib4vYWMf9V9WKKmCUFtMKtR7UrBtwP+sSUDCb23cCon7VhIII0H8YZwNn8acynpnnpARNrf+2aK7PofDqwB6Nri8XF52RjSzmdIPYwJX/JLuLQaGu0xroYxGkWeMEIIyV7yKhflDT4OaX0XEi7SIkHxlSsLV/hyqWae70M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Vdml6zpI; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Vdml6zpI" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C4AE8C2BC9E; Mon, 9 Mar 2026 16:48:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773074886; bh=tRQ5UQGeY6yZl7zSl8udBzbIceMvjBuqhB3obUBrnHc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Vdml6zpIS+fXR4Y0izsY0Fr4ffRrjA1ohyPRxsR8UxljTvLA2pHeAiy7YoFgXzR1u JYK0DLCZ9UDSuC99aM6HOxgBLNoZYr4XIvss5R9/6PtFKy70F+quBo5o0orwx77ikA S6SpFpXmVgpHndDdjuJqTRO05QytPjNBOFJJUiZVrt0HyoMM109ACKmnpX1RSQn3t2 r5Ui9/iDkWVOyRGd0vBj9IkdjSXibIifijy0tytKqMK4B46cSMjWp6VhVwHRb/v5qe tdlYrmQF60XKcQaN1K6R7zMZv+sAEzXzCBFz4Q+hprhYTpX/9CI8D4n7ekv2cbME1E h0ZPsqtbbTRCQ== Received: from mchehab by mail.kernel.org with local (Exim 4.99.1) (envelope-from ) id 1vzdm1-0000000Bhgm-0I3m; Mon, 09 Mar 2026 17:48:05 +0100 From: Mauro Carvalho Chehab To: Jonathan Corbet , Linux Doc Mailing List Cc: Mauro Carvalho Chehab , linux-kernel@vger.kernel.org, Aleksandr Loktionov , Randy Dunlap Subject: [PATCH 8/8] unittests: test_tokenizer: check 
if the tokenizer works Date: Mon, 9 Mar 2026 17:47:59 +0100 Message-ID: <50a4be47b52450aed9f9228e06fef39df52a3dbf.1773074166.git.mchehab+huawei@kernel.org> X-Mailer: git-send-email 2.52.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: Mauro Carvalho Chehab Add extra tests to check if the tokenizer is working properly. Signed-off-by: Mauro Carvalho Chehab --- tools/lib/python/kdoc/kdoc_re.py | 4 +- tools/unittests/test_tokenizer.py | 109 +++++++++++++++++++++++++++++- 2 files changed, 108 insertions(+), 5 deletions(-) diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_= re.py index 7bed4e9a8810..b4e1a2dbdcc2 100644 --- a/tools/lib/python/kdoc/kdoc_re.py +++ b/tools/lib/python/kdoc/kdoc_re.py @@ -194,8 +194,8 @@ class CToken(): =20 return CToken.MISMATCH =20 - def __init__(self, kind, value, pos, - brace_level, paren_level, bracket_level): + def __init__(self, kind, value=3DNone, pos=3D0, + brace_level=3D0, paren_level=3D0, bracket_level=3D0): self.kind =3D kind self.value =3D value self.pos =3D pos diff --git a/tools/unittests/test_tokenizer.py b/tools/unittests/test_token= izer.py index da0f2c4c9e21..0955facad736 100755 --- a/tools/unittests/test_tokenizer.py +++ b/tools/unittests/test_tokenizer.py @@ -15,16 +15,118 @@ from unittest.mock import MagicMock SRC_DIR =3D os.path.dirname(os.path.realpath(__file__)) sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python")) =20 -from kdoc.kdoc_re import CTokenizer +from kdoc.kdoc_re import CToken, CTokenizer from unittest_helper import run_unittest =20 - - # # List of tests. # # The code will dynamically generate one test for each key on this diction= ary. 
# +def tokens_to_list(tokens): + tuples =3D [] + + for tok in tokens: + if tok.kind =3D=3D CToken.SPACE: + continue + + tuples +=3D [(tok.kind, tok.value, + tok.brace_level, tok.paren_level, tok.bracket_level)] + + return tuples + + +def make_tokenizer_test(name, data): + """ + Create a test named ``name`` using parameters given by ``data`` dict. + """ + + def test(self): + """In-lined lambda-like function to run the test""" + + # + # Check if exceptions are properly handled + # + if "raises" in data: + with self.assertRaises(data["raises"]): + CTokenizer(data["source"]) + return + + # + # Check if tokenizer is producing expected results + # + tokens =3D CTokenizer(data["source"]).tokens + + result =3D tokens_to_list(tokens) + expected =3D tokens_to_list(data["expected"]) + + self.assertEqual(result, expected, msg=3Df"{name}") + + return test + +#: Tokenizer tests. +TESTS_TOKENIZER =3D { + "__run__": make_tokenizer_test, + + "basic_tokens": { + "source": """ + int a; // comment + float b =3D 1.23; + """, + "expected": [ + CToken(CToken.NAME, "int"), + CToken(CToken.NAME, "a"), + CToken(CToken.PUNC, ";"), + CToken(CToken.COMMENT, "// comment"), + CToken(CToken.NAME, "float"), + CToken(CToken.NAME, "b"), + CToken(CToken.OP, "=3D"), + CToken(CToken.NUMBER, "1.23"), + CToken(CToken.PUNC, ";"), + ], + }, + + "depth_counters": { + "source": """ + struct X { + int arr[10]; + func(a[0], (b + c)); + } + """, + "expected": [ + CToken(CToken.STRUCT, "struct"), + CToken(CToken.NAME, "X"), + CToken(CToken.BEGIN, "{", brace_level=3D1), + + CToken(CToken.NAME, "int", brace_level=3D1), + CToken(CToken.NAME, "arr", brace_level=3D1), + CToken(CToken.BEGIN, "[", brace_level=3D1, bracket_level=3D1), + CToken(CToken.NUMBER, "10", brace_level=3D1, bracket_level=3D1= ), + CToken(CToken.END, "]", brace_level=3D1), + CToken(CToken.PUNC, ";", brace_level=3D1), + CToken(CToken.NAME, "func", brace_level=3D1), + CToken(CToken.BEGIN, "(", brace_level=3D1, paren_level=3D1), + CToken(CToken.NAME, 
"a", brace_level=3D1, paren_level=3D1), + CToken(CToken.BEGIN, "[", brace_level=3D1, paren_level=3D1, br= acket_level=3D1), + CToken(CToken.NUMBER, "0", brace_level=3D1, paren_level=3D1, b= racket_level=3D1), + CToken(CToken.END, "]", brace_level=3D1, paren_level=3D1), + CToken(CToken.PUNC, ",", brace_level=3D1, paren_level=3D1), + CToken(CToken.BEGIN, "(", brace_level=3D1, paren_level=3D2), + CToken(CToken.NAME, "b", brace_level=3D1, paren_level=3D2), + CToken(CToken.OP, "+", brace_level=3D1, paren_level=3D2), + CToken(CToken.NAME, "c", brace_level=3D1, paren_level=3D2), + CToken(CToken.END, ")", brace_level=3D1, paren_level=3D1), + CToken(CToken.END, ")", brace_level=3D1), + CToken(CToken.PUNC, ";", brace_level=3D1), + CToken(CToken.END, "}"), + ], + }, + + "mismatch_error": { + "source": "int a$ =3D 5;", # $ is illegal + "raises": RuntimeError, + }, +} =20 def make_private_test(name, data): """ @@ -315,6 +417,7 @@ TESTS_PRIVATE =3D { #: Dict containing all test groups fror CTokenizer TESTS =3D { "TestPublicPrivate": TESTS_PRIVATE, + "TestTokenizer": TESTS_TOKENIZER, } =20 def setUp(self): --=20 2.52.0