From nobody Thu Apr  9 10:29:06 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id D170822A1D4;
	Mon,  9 Mar 2026 16:48:05 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1773074885; cv=none;
 b=rnUaWalFyQqaU0ORzLoV+TwNofBjy4M+BdE6he/HOaaa3UEIMbeXHCJWGUvdKG/zSNIyeDVZJGsi3Du7+UX5o1Cj2gar/BJfMurH079dAPmTvJcI7YZ3tilrdpsM3chxDRkDtJYxFtKM/sGoTpn+KmkyWGzKiLY9qPDYAMQuPTA=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1773074885; c=relaxed/simple;
	bh=lTdoS2vsiheWATIVgcF8OQSQ0Xw+s5JEdZeeekTiLBQ=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
 b=RGMOZnIwFdkjoiZE/+p9Eh/4A0eZy7k6d2Qwgw9Jerj/CvCfP72GLVwblfzvWsTHIs5RFdNc/WEM8aOlIJrZjTgi0aqci95DDU6fhZuq6xXNh0zCoTM/2A7CvQOzyuzppVARA0YyAgG7aDUGOxOaBx3mG6HRHaL8++WUIzUJDpA=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=ShuknLG/; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="ShuknLG/"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8831CC2BC9E;
	Mon,  9 Mar 2026 16:48:05 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1773074885;
	bh=lTdoS2vsiheWATIVgcF8OQSQ0Xw+s5JEdZeeekTiLBQ=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=ShuknLG/9JbKKrc499mUfg6nXJLU62Qy5PZeBkWHji2cHr9K+hfpJlkv7jBSQsQlK
	 pKxrWju1wmIcUY0IaTmgRN/BJ49GEW3gYf6JnuCcDkWQqOnr2peS1ptp8l/l4etN7W
	 0Gm+u85+sU1KixsZPKHxGxFUstQ1Eyq5JoGKBcuA7hPVsVYescg3AJwmzfkpayEOZl
	 v+Hkio3KCENa327OE2P0L+11La3fDG674kkbGiiHUH1L4iEKLnOELv5c+hJC7PUg0+
	 TtoxIDTHB8LTnU3q8BGv3DmGP/VPzk7sChdV5voV3putHqEWDjn4PJGHh8wtlFdpiX
	 RmpRRFG+PWNOg==
Received: from mchehab by mail.kernel.org with local (Exim 4.99.1)
	(envelope-from <mchehab+huawei@kernel.org>)
	id 1vzdlz-0000000BhYE-2l0s;
	Mon, 09 Mar 2026 17:48:03 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	linux-kernel@vger.kernel.org,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	Shuah Khan <skhan@linuxfoundation.org>
Subject: [PATCH 1/8] docs: python: add helpers to run unit tests
Date: Mon,  9 Mar 2026 17:47:52 +0100
Message-ID: 
 <37999041f616ddef41e84cf2686c0264d1a51dc9.1773074166.git.mchehab+huawei@kernel.org>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <cover.1773074166.git.mchehab+huawei@kernel.org>
References: <cover.1773074166.git.mchehab+huawei@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Sender: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

While Python's standard library supports unit tests, its default
output is hard to read. Add a helper module to improve it.

I wrote this module last year while testing some scripts I used
internally. The initial skeleton was generated with the help of
LLM tools, but it was heavily modified to ensure that it works
as I expect.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
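Note for reviewers (not part of the commit log): the core idea behind
the helper is a ``unittest.TestResult`` subclass that records a status
per test, handed to ``unittest.TextTestRunner`` via ``resultclass``.
A minimal sketch of that idea follows. ``MiniSummary``, ``run_demo``
and the two demo tests are hypothetical names made up for illustration,
and are much simpler than the ``Summary`` class added by this patch.

```python
import io
import unittest


class MiniSummary(unittest.TestResult):
    """Hypothetical, simplified stand-in for the patch's Summary class."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Maps test method name -> status string
        self.statuses = {}

    def _record(self, test, status):
        # test.id() is dotted; the last component is the method name
        self.statuses[test.id().split(".")[-1]] = status

    def addSuccess(self, test):
        super().addSuccess(test)
        self._record(test, "OK")

    def addFailure(self, test, err):
        super().addFailure(test, err)
        self._record(test, "FAIL")


def run_demo():
    """Run two throwaway tests and return the populated result object."""

    class Demo(unittest.TestCase):
        def test_pass(self):
            self.assertEqual(1 + 1, 2)

        def test_fail(self):
            self.assertEqual(1 + 1, 3)

    suite = unittest.TestLoader().loadTestsFromTestCase(Demo)
    # Send the runner's own report to a throwaway buffer
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0,
                                     resultclass=MiniSummary)
    return runner.run(suite)
```

``run_demo().statuses`` then maps each test method name to its status.
The module in this patch additionally groups results by module and
class, and colors them when the output is a tty.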
 Documentation/tools/python.rst      |   2 +
 Documentation/tools/unittest.rst    |  24 ++
 tools/lib/python/unittest_helper.py | 353 ++++++++++++++++++++++++++++
 3 files changed, 379 insertions(+)
 create mode 100644 Documentation/tools/unittest.rst
 create mode 100755 tools/lib/python/unittest_helper.py

diff --git a/Documentation/tools/python.rst b/Documentation/tools/python.rst
index 1444c1816735..3b7299161f20 100644
--- a/Documentation/tools/python.rst
+++ b/Documentation/tools/python.rst
@@ -11,3 +11,5 @@ Python libraries
    feat
    kdoc
    kabi
+
+   unittest
diff --git a/Documentation/tools/unittest.rst b/Documentation/tools/unittes=
t.rst
new file mode 100644
index 000000000000..14a2b2a65236
--- /dev/null
+++ b/Documentation/tools/unittest.rst
@@ -0,0 +1,24 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+Python unittest
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+Checking the consistency of Python modules can be complex. Sometimes, it
+is useful to define a set of unit tests to help check them.
+
+While the actual test implementation is use-case dependent, Python already
+provides a standard way to add unit tests by using ``import unittest``.
+
+Using that module requires setting up a test suite. Also, its default
+output is a little awkward. To improve it and provide a more uniform way
+to report errors, some unittest helper classes and functions are defined.
+
+
+Unittest helper module
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+.. automodule:: lib.python.unittest_helper
+   :members:
+   :show-inheritance:
+   :undoc-members:
diff --git a/tools/lib/python/unittest_helper.py b/tools/lib/python/unittes=
t_helper.py
new file mode 100755
index 000000000000..55d444cd73d4
--- /dev/null
+++ b/tools/lib/python/unittest_helper.py
@@ -0,0 +1,353 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2025-2026: Mauro Carvalho Chehab <mchehab@kernel.org>.
+#
+# pylint: disable=3DC0103,R0912,R0914,E1101
+
+"""
+Provides helper functions and classes to execute Python unit tests.
+
+These helpers provide a colored summary of each executed test and,
+when a test fails in verbose mode, show the difference in diff
+format, like::
+
+    $ tools/unittests/nested_match.py -v
+    ...
+    Traceback (most recent call last):
+    File "/new_devel/docs/tools/unittests/nested_match.py", line 69, in te=
st_count_limit
+        self.assertEqual(replaced, "bar(a); bar(b); foo(c)")
+        ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    AssertionError: 'bar(a) foo(b); foo(c)' !=3D 'bar(a); bar(b); foo(c)'
+    - bar(a) foo(b); foo(c)
+    ?       ^^^^
+    + bar(a); bar(b); foo(c)
+    ?       ^^^^^
+    ...
+
+It also allows filtering which tests run via the ``-k`` parameter.
+
+Typical usage is to do::
+
+    from unittest_helper import run_unittest
+    ...
+
+    if __name__ =3D=3D "__main__":
+        run_unittest(__file__)
+
+If extra arguments are needed, in a more complex scenario, it can be
+used as in this example::
+
+    from unittest_helper import TestUnits, run_unittest
+    ...
+    env =3D {'sudo': ""}
+    ...
+    if __name__ =3D=3D "__main__":
+        runner =3D TestUnits()
+        base_parser =3D runner.parse_args()
+        base_parser.add_argument('--sudo', action=3D'store_true',
+                                help=3D'Enable tests requiring sudo privil=
eges')
+
+        args =3D base_parser.parse_args()
+
+        # Update module-level flag
+        if args.sudo:
+            env['sudo'] =3D "1"
+
+        # Run tests with customized arguments
+        runner.run(__file__, parser=3Dbase_parser, args=3Dargs, env=3Denv)
+"""
+
+import argparse
+import atexit
+import os
+import re
+import unittest
+import sys
+
+from unittest.mock import patch
+
+
+class Summary(unittest.TestResult):
+    """
+    Overrides ``unittest.TestResult`` class to provide a nice colored
+    summary. When in verbose mode, displays actual/expected difference in
+    unified diff format.
+    """
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+        #: Dictionary to store organized test results.
+        self.test_results =3D {}
+
+        #: max length of the test names.
+        self.max_name_length =3D 0
+
+    def startTest(self, test):
+        super().startTest(test)
+        test_id =3D test.id()
+        parts =3D test_id.split(".")
+
+        # Extract module, class, and method names
+        if len(parts) >=3D 3:
+            module_name =3D parts[-3]
+        else:
+            module_name =3D ""
+        if len(parts) >=3D 2:
+            class_name =3D parts[-2]
+        else:
+            class_name =3D ""
+
+        method_name =3D parts[-1]
+
+        # Build the hierarchical structure
+        if module_name not in self.test_results:
+            self.test_results[module_name] =3D {}
+
+        if class_name not in self.test_results[module_name]:
+            self.test_results[module_name][class_name] =3D []
+
+        # Track maximum test name length for alignment
+        display_name =3D f"{method_name}:"
+
+        self.max_name_length =3D max(len(display_name), self.max_name_leng=
th)
+
+    def _record_test(self, test, status):
+        test_id =3D test.id()
+        parts =3D test_id.split(".")
+        if len(parts) >=3D 3:
+            module_name =3D parts[-3]
+        else:
+            module_name =3D ""
+        if len(parts) >=3D 2:
+            class_name =3D parts[-2]
+        else:
+            class_name =3D ""
+        method_name =3D parts[-1]
+        self.test_results[module_name][class_name].append((method_name, st=
atus))
+
+    def addSuccess(self, test):
+        super().addSuccess(test)
+        self._record_test(test, "OK")
+
+    def addFailure(self, test, err):
+        super().addFailure(test, err)
+        self._record_test(test, "FAIL")
+
+    def addError(self, test, err):
+        super().addError(test, err)
+        self._record_test(test, "ERROR")
+
+    def addSkip(self, test, reason):
+        super().addSkip(test, reason)
+        self._record_test(test, f"SKIP ({reason})")
+
+    def printResults(self):
+        """
+        Print results using colors if tty.
+        """
+        # Check for ANSI color support
+        use_color =3D sys.stdout.isatty()
+        COLORS =3D {
+            "OK":            "\033[32m",   # Green
+            "FAIL":          "\033[31m",   # Red
+            "SKIP":          "\033[1;33m", # Yellow
+            "PARTIAL":       "\033[33m",   # Orange
+            "EXPECTED_FAIL": "\033[36m",   # Cyan
+            "reset":         "\033[0m",    # Reset to default terminal col=
or
+        }
+        if not use_color:
+            for c in COLORS:
+                COLORS[c] =3D ""
+
+        # Calculate maximum test name length
+        if not self.test_results:
+            return
+        try:
+            lengths =3D []
+            for module in self.test_results.values():
+                for tests in module.values():
+                    for test_name, _ in tests:
+                        lengths.append(len(test_name) + 1)  # +1 for colon
+            max_length =3D max(lengths) + 2  # Additional padding
+        except ValueError:
+            sys.exit("Test list is empty")
+
+        # Print results
+        for module_name, classes in self.test_results.items():
+            print(f"{module_name}:")
+            for class_name, tests in classes.items():
+                print(f"    {class_name}:")
+                for test_name, status in tests:
+                    # Get base status without reason for SKIP
+                    if status.startswith("SKIP"):
+                        status_code =3D status.split()[0]
+                    else:
+                        status_code =3D status
+                    color =3D COLORS.get(status_code, "")
+                    print(
+                        f"        {test_name + ':':<{max_length}}{color}{s=
tatus}{COLORS['reset']}"
+                    )
+            print()
+
+        # Print summary
+        print(f"\nRan {self.testsRun} tests", end=3D"")
+        if hasattr(self, "timeTaken"):
+            print(f" in {self.timeTaken:.3f}s", end=3D"")
+        print()
+
+        if not self.wasSuccessful():
+            print(f"\n{COLORS['FAIL']}FAILED (", end=3D"")
+            failures =3D getattr(self, "failures", [])
+            errors =3D getattr(self, "errors", [])
+            if failures:
+                print(f"failures=3D{len(failures)}", end=3D"")
+            if errors:
+                if failures:
+                    print(", ", end=3D"")
+                print(f"errors=3D{len(errors)}", end=3D"")
+            print(f"){COLORS['reset']}")
+
+
+def flatten_suite(suite):
+    """Flatten test suite hierarchy."""
+    tests =3D []
+    for item in suite:
+        if isinstance(item, unittest.TestSuite):
+            tests.extend(flatten_suite(item))
+        else:
+            tests.append(item)
+    return tests
+
+
+class TestUnits:
+    """
+    Helper class to parse arguments and run unit tests.
+
+    This class discovers test files, imports their unittest classes and
+    executes the tests in them.
+    """
+    def parse_args(self):
+        """Returns a parser for command line arguments."""
+        parser =3D argparse.ArgumentParser(description=3D"Test runner with=
 regex filtering")
+        parser.add_argument("-v", "--verbose", action=3D"count", default=
=3D1)
+        parser.add_argument("-f", "--failfast", action=3D"store_true")
+        parser.add_argument("-k", "--keyword",
+                            help=3D"Regex pattern to filter test methods")
+        return parser
+
+    def run(self, caller_file=3DNone, pattern=3DNone,
+            suite=3DNone, parser=3DNone, args=3DNone, env=3DNone):
+        """
+        Execute all tests from the unit test file.
+
+        It contains several optional parameters:
+
+        ``caller_file``:
+            -  name of the file that contains the tests.
+
+               Typical usage is to pass __file__ from the caller, e.g.::
+
+                    if __name__ =3D=3D "__main__":
+                        TestUnits().run(__file__)
+
+        ``pattern``:
+            - optional pattern to match multiple file names. Defaults
+              to basename of ``caller_file``.
+
+        ``suite``:
+            - a unittest suite initialized by the caller using
+              ``unittest.TestLoader().discover()``.
+
+        ``parser``:
+            - an argparse parser. If not defined, this helper will create
+              one.
+
+        ``args``:
+            - an ``argparse.Namespace`` object filled by the caller.
+
+        ``env``:
+            - environment variables that will be passed to the test suite
+
+        At least ``caller_file`` or ``suite`` must be used, otherwise a
+        ``TypeError`` will be raised.
+        """
+        if not args:
+            if not parser:
+                parser =3D self.parse_args()
+            args =3D parser.parse_args()
+
+        if not caller_file and not suite:
+            raise TypeError("Either caller_file or suite is needed at Test=
Units")
+
+        verbose =3D args.verbose
+
+        if not env:
+            env =3D os.environ.copy()
+
+        env["VERBOSE"] =3D f"{verbose}"
+
+        patcher =3D patch.dict(os.environ, env)
+        patcher.start()
+        # ensure it gets stopped after
+        atexit.register(patcher.stop)
+
+
+        if verbose >=3D 2:
+            unittest.TextTestRunner(verbosity=3Dverbose).run =3D lambda su=
ite: suite
+
+        # Load ONLY tests from the calling file
+        if not suite:
+            if not pattern:
+                pattern =3D os.path.basename(caller_file)
+
+            loader =3D unittest.TestLoader()
+            suite =3D loader.discover(start_dir=3Dos.path.dirname(caller_f=
ile),
+                                    pattern=3Dpattern)
+
+        # Flatten the suite for environment injection
+        tests_to_inject =3D flatten_suite(suite)
+
+        # Filter tests by method name if -k specified
+        if args.keyword:
+            try:
+                pattern =3D re.compile(args.keyword)
+                filtered_suite =3D unittest.TestSuite()
+                for test in tests_to_inject:  # Use the pre-flattened list
+                    method_name =3D test.id().split(".")[-1]
+                    if pattern.search(method_name):
+                        filtered_suite.addTest(test)
+                suite =3D filtered_suite
+            except re.error as e:
+                sys.stderr.write(f"Invalid regex pattern: {e}\n")
+                sys.exit(1)
+        else:
+            # Maintain original suite structure if no keyword filtering
+            suite =3D unittest.TestSuite(tests_to_inject)
+
+        if verbose >=3D 2:
+            resultclass =3D None
+        else:
+            resultclass =3D Summary
+
+        runner =3D unittest.TextTestRunner(verbosity=3Dargs.verbose,
+                                            resultclass=3Dresultclass,
+                                            failfast=3Dargs.failfast)
+        result =3D runner.run(suite)
+        if resultclass:
+            result.printResults()
+
+        sys.exit(not result.wasSuccessful())
+
+
+def run_unittest(fname):
+    """
+    Basic usage of TestUnits class.
+
+    Use it when there's no need to pass any extra argument to the tests.
+    The recommended way is to place this at the end of each
+    unittest module::
+
+        if __name__ =3D=3D "__main__":
+            run_unittest(__file__)
+    """
+    TestUnits().run(fname)
--=20
2.52.0
From nobody Thu Apr  9 10:29:06 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3DA423BE14A;
	Mon,  9 Mar 2026 16:48:06 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1773074886; cv=none;
 b=YkRFwjUAeJu35TA5lFVjKyjJKXQVkllUvmFJtIB/LMPa5Q1gPBAnWJG/oM3FIKJM3FS0vOpfgmGqC4sWWJ5LnOB3GUgiJ/lyBlU3T2+yNNwi8QzIZ1KxigTrVy/aEVXWpbZHyREsK50tnykD7VojxGDIsRkYT+xiMY7Yc+hFiCM=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1773074886; c=relaxed/simple;
	bh=QUqLYmPuQ3O4ZK29hZAwAg0doyghqKMTrXuQP8pJrMQ=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
 b=e4wL8zUMdU8nuNWlT+/iXkakgf7zZ/onDRLqi9xtW0Vr+3E5dJ8WvPXvK3iaVVzv1Mv1r2wJKopXQnDPi/HNEugO7IrUF6gAWjEDW6P3C/ZSnxsegv8TI+F/jvXhgwAXyvG6MDMzflKZOa98DcfhbPAGedUAqNhxhyiLtGz/h34=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=OAJ3jtqZ; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="OAJ3jtqZ"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id C14D5C2BCB0;
	Mon,  9 Mar 2026 16:48:05 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1773074886;
	bh=QUqLYmPuQ3O4ZK29hZAwAg0doyghqKMTrXuQP8pJrMQ=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=OAJ3jtqZsO3bO8eiCpMFkYulaN4oCuILxwC6HxXPAmoVbien7B+BIMovr0arFnnyx
	 47GYt+L4ilEjQlywGBKLbAQK6IRNCPE06vD37lBkHEKy6iIbYdHt2wOSvX70sdOebm
	 jbqdOHr28XdiuRhvlmpC+bQ8RzJgSX8VaL6SZFf4bhXfoV8vCmPb20aC7+VMUETHQl
	 9M+aZpI8WwFV6YU5uRnJIN50enflnzIDt3AlGwZtuTO+DFfuSsrl9FC4EASFMLCE+l
	 nV0YrqA5sje/cGzjFyMQDD40S8LKiBa4jI+WWVGtK5vU2cIvdkXoO5TlAWTnqPITIT
	 uO8PviTZvrCFA==
Received: from mchehab by mail.kernel.org with local (Exim 4.99.1)
	(envelope-from <mchehab+huawei@kernel.org>)
	id 1vzdlz-0000000BhZR-3cs0;
	Mon, 09 Mar 2026 17:48:03 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: [PATCH 2/8] unittests: add a testbench to check public/private kdoc
 comments
Date: Mon,  9 Mar 2026 17:47:53 +0100
Message-ID: 
 <144f4952e0cb74fe9c9adc117e9a21ec8aa1cc10.1773074166.git.mchehab+huawei@kernel.org>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <cover.1773074166.git.mchehab+huawei@kernel.org>
References: <cover.1773074166.git.mchehab+huawei@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Sender: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Add unit tests to check whether stripping of public/private comments
works properly.

Running them shows that, in several cases, public/private handling is
not doing what is expected:

  test_private:
    TestPublicPrivate:
        test balanced_inner_private:                                 OK
        test balanced_non_greddy_private:                            OK
        test balanced_private:                                       OK
        test no private:                                             OK
        test unbalanced_inner_private:                               FAIL
        test unbalanced_private:                                     FAIL
        test unbalanced_struct_group_tagged_with_private:            FAIL
        test unbalanced_two_struct_group_tagged_first_with_private:  FAIL
        test unbalanced_without_end_of_line:                         FAIL

  Ran 9 tests

  FAILED (failures=3D5)

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
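Note for reviewers (not part of the commit log): the table-driven
approach used by this test can be sketched as below. ``toy_trim``,
``CASES`` and ``run_cases`` are hypothetical stand-ins for illustration
only. The real test calls ``trim_private_members`` from
``kdoc.kdoc_parser``, whose behavior is not reproduced here; the toy
function only handles the balanced private/public case.

```python
import re
import unittest

# name -> (source, expected result after trimming)
CASES = {
    "no_private": ("struct foo { int a; };", "struct foo { int a; };"),
    "balanced": (
        "struct foo { int a; /* private: */ int b; /* public: */ int c; };",
        "struct foo { int a; int c; };",
    ),
}


def toy_trim(source):
    """Toy stand-in: drop text from a private: comment to a public: one."""
    return re.sub(r"/\*\s*private:.*?/\*\s*public:.*?\*/", "", source,
                  flags=re.S)


def normalize(text):
    """Collapse whitespace runs so layout differences are ignored."""
    return re.sub(r"\s+", " ", text).strip()


class ToyTrimTests(unittest.TestCase):
    """Populated below, one test method per CASES entry."""


def make_test(source, expected):
    # A factory function gives each test its own source/expected pair
    def test(self):
        self.assertEqual(normalize(toy_trim(source)), normalize(expected))
    return test


for case_name, (src, want) in CASES.items():
    setattr(ToyTrimTests, f"test_{case_name}", make_test(src, want))


def run_cases():
    """Run the generated tests and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ToyTrimTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Unlike the patch, the sketch builds method names with underscores so
that they are valid Python identifiers. Both sides are normalized
before comparing, so only the tokens matter and not the indentation.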
 tools/unittests/test_private.py | 331 ++++++++++++++++++++++++++++++++
 1 file changed, 331 insertions(+)
 create mode 100755 tools/unittests/test_private.py

diff --git a/tools/unittests/test_private.py b/tools/unittests/test_private=
.py
new file mode 100755
index 000000000000..eae245ae8a12
--- /dev/null
+++ b/tools/unittests/test_private.py
@@ -0,0 +1,331 @@
+#!/usr/bin/env python3
+
+"""
+Unit tests for the kdoc parser's private struct/union member trimming.
+"""
+
+
+import os
+import re
+import unittest
+import sys
+
+from unittest.mock import MagicMock
+
+SRC_DIR =3D os.path.dirname(os.path.realpath(__file__))
+sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python"))
+
+from kdoc.kdoc_parser import trim_private_members
+from unittest_helper import run_unittest
+
+#
+# List of tests.
+#
+# The code dynamically generates one test for each key in this dictionary.
+#
+
+#: Tests to check if CTokenizer properly handles public/private comments.
+TESTS_PRIVATE =3D {
+    #
+    # Simplest case: no private. Ensure that trimming won't affect struct
+    #
+    "no private": {
+        "source": """
+            struct foo {
+                int a;
+                int b;
+                int c;
+            };
+        """,
+        "trimmed": """
+            struct foo {
+                int a;
+                int b;
+                int c;
+            };
+        """,
+    },
+
+    #
+    # Play "by the book" by always having a public in place
+    #
+
+    "balanced_private": {
+        "source": """
+            struct foo {
+                int a;
+                /* private: */
+                int b;
+                /* public: */
+                int c;
+            };
+        """,
+        "trimmed": """
+            struct foo {
+                int a;
+                int c;
+            };
+        """,
+    },
+
+    "balanced_non_greddy_private": {
+        "source": """
+            struct foo {
+                int a;
+                /* private: */
+                int b;
+                /* public: */
+                int c;
+                /* private: */
+                int d;
+                /* public: */
+                int e;
+
+            };
+        """,
+        "trimmed": """
+            struct foo {
+                int a;
+                int c;
+                int e;
+            };
+        """,
+    },
+
+    "balanced_inner_private": {
+        "source": """
+            struct foo {
+                struct {
+                    int a;
+                    /* private: ignore below */
+                    int b;
+                /* public: but this should not be ignored */
+                };
+                int b;
+            };
+        """,
+        "trimmed": """
+            struct foo {
+                struct {
+                    int a;
+                };
+                int b;
+            };
+        """,
+    },
+
+    #
+    # Test what happens if there's no public after a private marker
+    #
+
+    "unbalanced_private": {
+        "source": """
+            struct foo {
+                int a;
+                /* private: */
+                int b;
+                int c;
+            };
+        """,
+        "trimmed": """
+            struct foo {
+                int a;
+            };
+        """,
+    },
+
+    "unbalanced_inner_private": {
+        "source": """
+            struct foo {
+                struct {
+                    int a;
+                    /* private: ignore below */
+                    int b;
+                /* but this should not be ignored */
+                };
+                int b;
+            };
+        """,
+        "trimmed": """
+            struct foo {
+                struct {
+                    int a;
+                };
+                int b;
+            };
+        """,
+    },
+
+    "unbalanced_struct_group_tagged_with_private": {
+        "source": """
+            struct page_pool_params {
+                struct_group_tagged(page_pool_params_fast, fast,
+                        unsigned int    order;
+                        unsigned int    pool_size;
+                        int             nid;
+                        struct device   *dev;
+                        struct napi_struct *napi;
+                        enum dma_data_direction dma_dir;
+                        unsigned int    max_len;
+                        unsigned int    offset;
+                };
+                struct_group_tagged(page_pool_params_slow, slow,
+                        struct net_device *netdev;
+                        unsigned int queue_idx;
+                        unsigned int    flags;
+                        /* private: used by test code only */
+                        void (*init_callback)(netmem_ref netmem, void *arg=
);
+                        void *init_arg;
+                };
+            };
+        """,
+        "trimmed": """
+            struct page_pool_params {
+                struct_group_tagged(page_pool_params_fast, fast,
+                        unsigned int    order;
+                        unsigned int    pool_size;
+                        int             nid;
+                        struct device   *dev;
+                        struct napi_struct *napi;
+                        enum dma_data_direction dma_dir;
+                        unsigned int    max_len;
+                        unsigned int    offset;
+                };
+                struct_group_tagged(page_pool_params_slow, slow,
+                        struct net_device *netdev;
+                        unsigned int queue_idx;
+                        unsigned int    flags;
+                };
+            };
+        """,
+    },
+
+    "unbalanced_two_struct_group_tagged_first_with_private": {
+        "source": """
+            struct page_pool_params {
+                struct_group_tagged(page_pool_params_slow, slow,
+                        struct net_device *netdev;
+                        unsigned int queue_idx;
+                        unsigned int    flags;
+                        /* private: used by test code only */
+                        void (*init_callback)(netmem_ref netmem, void *arg=
);
+                        void *init_arg;
+                };
+                struct_group_tagged(page_pool_params_fast, fast,
+                        unsigned int    order;
+                        unsigned int    pool_size;
+                        int             nid;
+                        struct device   *dev;
+                        struct napi_struct *napi;
+                        enum dma_data_direction dma_dir;
+                        unsigned int    max_len;
+                        unsigned int    offset;
+                };
+            };
+        """,
+        "trimmed": """
+            struct page_pool_params {
+                struct_group_tagged(page_pool_params_slow, slow,
+                        struct net_device *netdev;
+                        unsigned int queue_idx;
+                        unsigned int    flags;
+                };
+                struct_group_tagged(page_pool_params_fast, fast,
+                        unsigned int    order;
+                        unsigned int    pool_size;
+                        int             nid;
+                        struct device   *dev;
+                        struct napi_struct *napi;
+                        enum dma_data_direction dma_dir;
+                        unsigned int    max_len;
+                        unsigned int    offset;
+                };
+            };
+        """,
+    },
+    "unbalanced_without_end_of_line": {
+        "source": """ \
+            struct page_pool_params { \
+                struct_group_tagged(page_pool_params_slow, slow, \
+                        struct net_device *netdev; \
+                        unsigned int queue_idx; \
+                        unsigned int    flags;
+                        /* private: used by test code only */
+                        void (*init_callback)(netmem_ref netmem, void *arg=
); \
+                        void *init_arg; \
+                }; \
+                struct_group_tagged(page_pool_params_fast, fast, \
+                        unsigned int    order; \
+                        unsigned int    pool_size; \
+                        int             nid; \
+                        struct device   *dev; \
+                        struct napi_struct *napi; \
+                        enum dma_data_direction dma_dir; \
+                        unsigned int    max_len; \
+                        unsigned int    offset; \
+                }; \
+            };
+        """,
+        "trimmed": """
+            struct page_pool_params {
+                struct_group_tagged(page_pool_params_slow, slow,
+                        struct net_device *netdev;
+                        unsigned int queue_idx;
+                        unsigned int    flags;
+                };
+                struct_group_tagged(page_pool_params_fast, fast,
+                        unsigned int    order;
+                        unsigned int    pool_size;
+                        int             nid;
+                        struct device   *dev;
+                        struct napi_struct *napi;
+                        enum dma_data_direction dma_dir;
+                        unsigned int    max_len;
+                        unsigned int    offset;
+                };
+            };
+        """,
+    },
+}
+
+
+class TestPublicPrivate(unittest.TestCase):
+    """
+    Main test class. Populated dynamically at runtime.
+    """
+
+    def setUp(self):
+        self.maxDiff =3D None
+
+    def add_test(cls, name, source, trimmed):
+        """
+        Dynamically add a test to the class
+        """
+        def test(cls):
+            result =3D trim_private_members(source)
+
+            result =3D re.sub(r"\s+", " ", result).strip()
+            expected =3D re.sub(r"\s+", " ", trimmed).strip()
+
+            msg =3D "failed when parsing this source:\n" + source
+
+            cls.assertEqual(result, expected, msg=3Dmsg)
+
+        test.__name__ =3D f'test {name}'
+
+        setattr(TestPublicPrivate, test.__name__, test)
+
+
+#
+# Populate TestPublicPrivate class
+#
+test_class =3D TestPublicPrivate()
+for name, test in TESTS_PRIVATE.items():
+    test_class.add_test(name, test["source"], test["trimmed"])
+
+
+#
+# main
+#
+if __name__ =3D=3D "__main__":
+    run_unittest(__file__)
--=20
2.52.0
From nobody Thu Apr  9 10:29:06 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id EB0C42494ED;
	Mon,  9 Mar 2026 16:48:05 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1773074886; cv=none;
 b=JYBtoSbgqwDxBs2nYVYBxmOwPlu05x+OIkpgQSErUcblaFkVdahhG0RK4D1u+AoyuWb8oiQX0UYIZqE/LJr+yx3b6q/zDFBKGJaIEyii28KynKEr90U+W9qPKsTArnGVLymh76+JmveWwgK71TbOHBFXan8zZ8c3Wk1UzWhO9uE=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1773074886; c=relaxed/simple;
	bh=Og3btdqqsss2BN07XAVoT1mCvNdW9py94Li7vBmuxJ4=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
 b=K4JoY2Px/qPV7ADdmLD8PaursFTI0zfgkYTn+CPyVRCtX2pakRlbF2HNrKq2OELYk6QquDa1Pmskj8ujMoCKIE1Jwf078toeGlpwlxqbyTrrAbzvbRb9hc1TxrZKOpoQ4AoRbq8piFI+duQ+190OUEy8H6W2YhSKuTjS0EfRkg0=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=YUfyEWvr; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="YUfyEWvr"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id C3AFAC2BCB2;
	Mon,  9 Mar 2026 16:48:05 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1773074885;
	bh=Og3btdqqsss2BN07XAVoT1mCvNdW9py94Li7vBmuxJ4=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=YUfyEWvrzNP4u8VpiIWyyDzjz8AZuCmEeDOIvLQOUdAhZS+I59gOPLT5M5H+D9f3A
	 RA+4RrdQ6euOM/yiRDCvFFHsN209Guhm5jeiQiHsJMHFwgF1T1qricAxKxF+v4DAoQ
	 KsHu46bPGadSr7OQqBQd1D2e142+tzXOaRs3P9fpLD0BGwH9/Syq6iUyXvXlWEZurl
	 YMKlI1FDSLm2z55aL0RD8NkqerYBN4OQZlity/5QV43H9GhJQArmTdZuFfQIvSRh+V
	 mJQfDni61ORCXiJa1G8qmQcLN2FOZxCjbBBnWBsRO92uczcXTeQ2LAPgjJhtk3A3c8
	 xsnpNw7yj/pMw==
Received: from mchehab by mail.kernel.org with local (Exim 4.99.1)
	(envelope-from <mchehab+huawei@kernel.org>)
	id 1vzdm0-0000000Bhag-0IlN;
	Mon, 09 Mar 2026 17:48:04 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	linux-kernel@vger.kernel.org,
	Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
	Randy Dunlap <rdunlap@infradead.org>
Subject: [PATCH 3/8] docs: kdoc: don't add broken comments inside prototypes
Date: Mon,  9 Mar 2026 17:47:54 +0100
Message-ID: 
 <18e577dbbd538dcc22945ff139fe3638344e14f0.1773074166.git.mchehab+huawei@kernel.org>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <cover.1773074166.git.mchehab+huawei@kernel.org>
References: <cover.1773074166.git.mchehab+huawei@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Sender: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Parsing a file like drivers/scsi/isci/host.h, which contains broken
kernel-doc markup, makes kernel-doc create a prototype that contains
unmatched comment terminators.

As a result, struct sci_power_control, for instance, is shown with
this prototype:

    struct sci_power_control {
        * it is not. */ bool timer_started;
        */ struct sci_timer timer;
        * requesters field. */ u8 phys_waiting;
        */ u8 phys_granted_power;
        * mapped into requesters via struct sci_phy.phy_index */ struct isc=
i_phy *requesters[SCI_MAX_PHYS];
    };

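because the leftover comment text no longer starts with "/*".

Fix the logic to detect such cases and keep adding the comments to
the prototype with their "/**" opener restored, so that the code which
removes comments from prototypes can still match them.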
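
A minimal sketch of why the missing opener matters, assuming a
comment-stripping regex similar in spirit to the one applied to
prototypes (STRIP_COMMENTS is a hypothetical stand-in, not the parser's
own code):

```python
import re

# Hypothetical stand-in for the prototype comment-stripping step.
STRIP_COMMENTS = re.compile(r"/\*.*?\*/", re.S)

# A line that lost its comment opener, as in the broken host.h markup.
partial = " * it is not. */ bool timer_started;"

# Without an opener the regex has nothing to match: the tail survives.
broken = STRIP_COMMENTS.sub("", partial).strip()

# Prepending an opener turns the fragment into a complete comment.
fixed = STRIP_COMMENTS.sub("", "/**\n" + partial).strip()

print(repr(broken))
print(repr(fixed))
```

With the opener restored, only the member declaration is left.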

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/lib/python/kdoc/kdoc_parser.py | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/k=
doc_parser.py
index edf70ba139a5..086579d00b5c 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -1355,6 +1355,12 @@ class KernelDoc:
         elif doc_content.search(line):
             self.emit_msg(ln, f"Incorrect use of kernel-doc format: {line}=
")
             self.state =3D state.PROTO
+
+            #
+            # Don't let it add partial comments to the code, as that breaks
+            # the logic meant to remove comments from prototypes.
+            #
+            self.process_proto_type(ln, "/**\n" + line)
         # else ... ??
=20
     def process_inline_text(self, ln, line):
--=20
2.52.0
From nobody Thu Apr  9 10:29:06 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3DAD93BED5F;
	Mon,  9 Mar 2026 16:48:06 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1773074886; cv=none;
 b=a4XclYHn6ppbtEZ+iRSrJyPMcbG8In4oNWl0T+2rmJCm4qz9OWkn5iMuntYGRwzGlLbWAKI4ITGjy32CkVm5o7PeK9h3PgcQMSVoKGTV/uHLTo2IHKD10t3mpJQv6wKUshiSqiPDSUmpBWhvVZAXiqCeM25cpIhnnUmOoV+c8ik=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1773074886; c=relaxed/simple;
	bh=iR9cZEMBrhxbAljLc1W53eZP2bmd/b55ybR+IN+0RyI=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
 b=g4QRRa95AahatHbYZFRQUz6kPH87SuL8OCacXwDAMNsaxlbTj4I+hL+YjHx5klWki5SS5d232R4kwmmM7c6EGDayvGl/g9RSQHWawUx/vmAgepzbKhTE9DbibFDBURvN/vXXMEaNdjWRVZ2LZc3Zk8hq7v/VsQCIRlfsi5jAn7M=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=TvFjYCzy; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="TvFjYCzy"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0B320C2BCB1;
	Mon,  9 Mar 2026 16:48:06 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1773074886;
	bh=iR9cZEMBrhxbAljLc1W53eZP2bmd/b55ybR+IN+0RyI=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=TvFjYCzye2aOkH4hWram9Eq8FP4qUiAKApgNOe1HTEtvR72dz42lSH599bcDMmUJO
	 Ac149IMP0jZ+mqDysTamGGDm6S8tZc6QHxWbhiz8rNvyCHP4JATQfVXfpX9f5wL0Z2
	 8oA5QkIW4m+5z59uFKtqSX063uifDl6vcci4/cTH/fYiOs6AFZW9mI+PjXp1RFxoge
	 1Fe/6yqSSA2rHHpHQZTCIAUiwjppjiSz9lbk+ksdV06rNo4Zi+y3eRDUA8+SFr7Id8
	 zgAsxw8edUifZOBP7QbbB426RLY8nZwEt6O2Zax5COtyIWRZLno/CMUcFpVggElaGS
	 I5Y+ZQc5M9mLA==
Received: from mchehab by mail.kernel.org with local (Exim 4.99.1)
	(envelope-from <mchehab+huawei@kernel.org>)
	id 1vzdm0-0000000Bhbt-1BFg;
	Mon, 09 Mar 2026 17:48:04 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	linux-kernel@vger.kernel.org,
	Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
	Randy Dunlap <rdunlap@infradead.org>
Subject: [PATCH 4/8] docs: kdoc: properly handle empty enum arguments
Date: Mon,  9 Mar 2026 17:47:55 +0100
Message-ID: 
 <4182bfb7e5f5b4bbaf05cee1bede691e56247eaf.1773074166.git.mchehab+huawei@kernel.org>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <cover.1773074166.git.mchehab+huawei@kernel.org>
References: <cover.1773074166.git.mchehab+huawei@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Sender: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Depending on how the enum prototype is written, a trailing comma may
make kernel-doc incorrectly parse a whitespace-only argument such
as " ".

Strip whitespace before checking whether the argument is empty.
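
A small sketch of the effect, using plain re in place of KernRe
(enum_args is a hypothetical helper written for illustration, not a
function of the parser):

```python
import re

def enum_args(members):
    """Return enum member names, skipping whitespace-only entries."""
    names = []
    for arg in members.split(","):
        # Keep only the leading identifier, as the parser's regex does.
        arg = re.sub(r"^\s*(\w+).*", r"\1", arg)
        # A trailing comma leaves " " here; a plain "not arg" check
        # would let it through, so strip before testing.
        if not arg.strip():
            continue
        names.append(arg)
    return names

print(enum_args("FOO = 1, BAR, "))
```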

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/lib/python/kdoc/kdoc_parser.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/k=
doc_parser.py
index 086579d00b5c..4b3c555e6c8e 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -810,9 +810,10 @@ class KernelDoc:
         member_set =3D set()
         members =3D KernRe(r'\([^;)]*\)').sub('', members)
         for arg in members.split(','):
-            if not arg:
-                continue
             arg =3D KernRe(r'^\s*(\w+).*').sub(r'\1', arg)
+            if not arg.strip():
+                continue
+
             self.entry.parameterlist.append(arg)
             if arg not in self.entry.parameterdescs:
                 self.entry.parameterdescs[arg] =3D self.undescribed
--=20
2.52.0
From nobody Thu Apr  9 10:29:06 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 970633DA5BB;
	Mon,  9 Mar 2026 16:48:06 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1773074886; cv=none;
 b=lkbSNHHOR1BHb+x8pygfHWxvYe+cFFRXEIxCaYH0WlUH6HIMXSbV4DBNgc5P0/rAs9YHfU8YKyTmpSa17uq+eIuaf0ADLzz4uD/X9TWMPsPNJT2h0hUhpPsoaB39s3+kxvkbRwqj9fR8UTURPfDMhEHxiQbDlZT/xmlaqeXUhro=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1773074886; c=relaxed/simple;
	bh=Zxtsgu1TemvpgcnmCFSJTgzfv8FPAfvXqLO7ecUkY9U=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
 b=oKlNMr3WGP6NurSqJMUCq2SsbFHGtOZdx7MHWnoBenTDRAvzUvSpvRydEAwMat2H2JHMjeY4/MmZgAG2g80gq1BljKJBF/l/H8I/0gDkrTkFrwhcZSUCYlmaGhX1EXJROUy29e4PT801c46ZBOs/inY0GRt3crB9vqmZold6rYg=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=tHaLgfeA; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="tHaLgfeA"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5021FC4CEF7;
	Mon,  9 Mar 2026 16:48:06 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1773074886;
	bh=Zxtsgu1TemvpgcnmCFSJTgzfv8FPAfvXqLO7ecUkY9U=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=tHaLgfeA922HnEX8sKFd/khIphSQhIXouJcQ3/xldgHhaIW4DzxQVSRqkktvc3Yqs
	 y2HUmnIZ6mizYdKIBV0mf6nchp/CjCq2PqSsUhzpwSCEKmqbquLNO0J4n2/fWhEDQI
	 AR27htYMdR9VgcYEqNlokIMYJJXzjKEKZZzKvgANNwcx5H1y+DRq1FC0A3uwnie2Gh
	 QtLDCTDObnnyzFHcKQzPl9wu0aihTyc49MZ2hufOf7PhvXrdyOIY8ZC0TJ3uHRFjxo
	 dDImNZfoRXL0eCAr+G9DKi1L3vR04sQIgcRb1NUnbx/fL3j668LkD3xeogkDKoz/bv
	 LLawyLuiUc/4w==
Received: from mchehab by mail.kernel.org with local (Exim 4.99.1)
	(envelope-from <mchehab+huawei@kernel.org>)
	id 1vzdm0-0000000Bhd6-22YC;
	Mon, 09 Mar 2026 17:48:04 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	linux-kernel@vger.kernel.org,
	Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
	Randy Dunlap <rdunlap@infradead.org>
Subject: [PATCH 5/8] docs: kdoc_re: add a C tokenizer
Date: Mon,  9 Mar 2026 17:47:56 +0100
Message-ID: 
 <c63ad36c81fe043e9e33ca55630414893f127413.1773074166.git.mchehab+huawei@kernel.org>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <cover.1773074166.git.mchehab+huawei@kernel.org>
References: <cover.1773074166.git.mchehab+huawei@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Sender: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Handling C code purely with regular expressions doesn't work well.

Add a C tokenizer to help do it the right way.

The tokenizer is based on the tokenizer example from the Python re
documentation:
	https://docs.python.org/3/library/re.html#writing-a-tokenizer
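
For readers unfamiliar with that example, its core idea can be sketched
as follows. The token table is a reduced, hypothetical one (TOKEN_SPEC,
SCANNER and tokenize are illustrative names); the table added by this
patch is larger and uses integer kinds:

```python
import re

# Reduced, hypothetical token table. Order matters: first match wins.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*|/\*[\s\S]*?\*/"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PUNC", r"[;,]"),
    ("BEGIN", r"[\[({]"),
    ("END", r"[\])}]"),
    ("SPACE", r"\s+"),
    ("MISMATCH", r"."),
]

# One alternation of named groups; lastgroup tells which one matched.
SCANNER = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC)
)

def tokenize(source):
    """Yield (kind, text) pairs, skipping whitespace."""
    for match in SCANNER.finditer(source):
        kind = match.lastgroup
        if kind == "MISMATCH":
            raise ValueError(f"unexpected {match.group()!r}")
        if kind != "SPACE":
            yield kind, match.group()

tokens = list(tokenize("struct foo { int a; /* private: */ };"))
print(tokens)
```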

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/lib/python/kdoc/kdoc_re.py | 234 +++++++++++++++++++++++++++++++
 1 file changed, 234 insertions(+)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_=
re.py
index 085b89a4547c..7bed4e9a8810 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -141,6 +141,240 @@ class KernRe:
=20
         return self.last_match.groups()
=20
+class TokType():
+
+    @staticmethod
+    def __str__(val):
+        """Return the name of an enum value"""
+        return TokType._name_by_val.get(val, f"UNKNOWN({val})")
+
+class CToken():
+    """
+    Data class to define a C token.
+    """
+
+    # Tokens that can be used by the parser. Works like a C enum.
+
+    COMMENT =3D 0     #: A standard C or C99 comment, including delimiter.
+    STRING =3D 1      #: A string, including quotation marks.
+    CHAR =3D 2        #: A character, including apostrophes.
+    NUMBER =3D 3      #: A number.
+    PUNC =3D 4        #: A punctuation mark: ``;`` / ``,`` / ``.``.
+    BEGIN =3D 5       #: A begin character: ``{`` / ``[`` / ``(``.
+    END =3D 6         #: An end character: ``}`` / ``]`` / ``)``.
+    CPP =3D 7         #: A preprocessor macro.
+    HASH =3D 8        #: The hash character - useful to handle other macro=
s.
+    OP =3D 9          #: A C operator (add, subtract, ...).
+    STRUCT =3D 10     #: A ``struct`` keyword.
+    UNION =3D 11      #: A ``union`` keyword.
+    ENUM =3D 12       #: An ``enum`` keyword.
+    TYPEDEF =3D 13    #: A ``typedef`` keyword.
+    NAME =3D 14       #: A name. Can be an ID or a type.
+    SPACE =3D 15      #: Any space characters, including new lines
+
+    MISMATCH =3D 255  #: an error indicator: should never happen in practi=
ce.
+
+    # Dict to convert from an enum integer into a string.
+    _name_by_val =3D {v: k for k, v in dict(vars()).items() if isinstance(=
v, int)}
+
+    # Dict to convert from string to an enum-like integer value.
+    _name_to_val =3D {k: v for v, k in _name_by_val.items()}
+
+    @staticmethod
+    def to_name(val):
+        """Convert from an integer value from CToken enum into a string"""
+
+        return CToken._name_by_val.get(val, f"UNKNOWN({val})")
+
+    @staticmethod
+    def from_name(name):
+        """Convert a string into a CToken enum value"""
+        if name in CToken._name_to_val:
+            return CToken._name_to_val[name]
+
+        return CToken.MISMATCH
+
+    def __init__(self, kind, value, pos,
+                 brace_level, paren_level, bracket_level):
+        self.kind =3D kind
+        self.value =3D value
+        self.pos =3D pos
+        self.brace_level =3D brace_level
+        self.paren_level =3D paren_level
+        self.bracket_level =3D bracket_level
+
+    def __repr__(self):
+        name =3D self.to_name(self.kind)
+        if isinstance(self.value, str):
+            value =3D '"' + self.value + '"'
+        else:
+            value =3D self.value
+
+        return f"CToken({name}, {value}, {self.pos}, " \
+               f"{self.brace_level}, {self.paren_level}, {self.bracket_lev=
el})"
+
+#: Tokens to parse C code.
+TOKEN_LIST =3D [
+    (CToken.COMMENT, r"//[^\n]*|/\*[\s\S]*?\*/"),
+
+    (CToken.STRING,  r'"(?:\\.|[^"\\])*"'),
+    (CToken.CHAR,    r"'(?:\\.|[^'\\])'"),
+
+    (CToken.NUMBER,  r"0[xX][0-9a-fA-F]+[uUlL]*|0[0-7]+[uUlL]*|"
+                     r"[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?[fFlL]*"),
+
+    (CToken.PUNC,    r"[;,\.]"),
+
+    (CToken.BEGIN,   r"[\[\(\{]"),
+
+    (CToken.END,     r"[\]\)\}]"),
+
+    (CToken.CPP,     r"#\s*(define|include|ifdef|ifndef|if|else|elif|endif=
|undef|pragma)\b"),
+
+    (CToken.HASH,    r"#"),
+
+    (CToken.OP,      r"\+\+|\-\-|\->|=3D=3D|\!=3D|<=3D|>=3D|&&|\|\||<<|>>|=
\+=3D|\-=3D|\*=3D|/=3D|%=3D"
+                     r"|&=3D|\|=3D|\^=3D|=3D|\+|\-|\*|/|%|<|>|&|\||\^|~|!|=
\?|\:"),
+
+    (CToken.STRUCT,  r"\bstruct\b"),
+    (CToken.UNION,   r"\bunion\b"),
+    (CToken.ENUM,    r"\benum\b"),
+    (CToken.TYPEDEF, r"\btypedef\b"),
+
+    (CToken.NAME,      r"[A-Za-z_][A-Za-z0-9_]*"),
+
+    (CToken.SPACE,   r"[\s]+"),
+
+    (CToken.MISMATCH,r"."),
+]
+
+#: Handle C continuation lines.
+RE_CONT =3D KernRe(r"\\\n")
+
+RE_COMMENT_START =3D KernRe(r'/\*\s*')
+
+#: tokenizer regex. Will be filled at the first CTokenizer usage.
+re_scanner =3D None
+
+class CTokenizer():
+    """
+    Scan C statements and definitions and produce tokens.
+
+    When converted to string, it drops comments and handles public/private
+    values, respecting depth.
+    """
+
+    # This class is inspired and follows the basic concepts of:
+    #   https://docs.python.org/3/library/re.html#writing-a-tokenizer
+
+    def _tokenize(self, source):
+        """
+        Iterator that parses ``source``, splitting it into tokens, as de=
fined
+        at ``TOKEN_LIST``.
+
+        The iterator yields CToken class objects.
+        """
+
+        # Handle continuation lines. Note that kdoc_parser already has a
+        # logic to do that. Still, let's keep it for completeness, as we m=
ight
+        # end up re-using this tokenizer outside kernel-doc some day - or
+        # we may eventually remove it from there as a future cleanup.
+        source =3D RE_CONT.sub("", source)
+
+        brace_level =3D 0
+        paren_level =3D 0
+        bracket_level =3D 0
+
+        for match in re_scanner.finditer(source):
+            kind =3D CToken.from_name(match.lastgroup)
+            pos =3D match.start()
+            value =3D match.group()
+
+            if kind =3D=3D CToken.MISMATCH:
+                raise RuntimeError(f"Unexpected token '{value}' on {pos}:\=
n\t{source}")
+            elif kind =3D=3D CToken.BEGIN:
+                if value =3D=3D '(':
+                    paren_level +=3D 1
+                elif value =3D=3D '[':
+                    bracket_level +=3D 1
+                else:  # value =3D=3D '{'
+                    brace_level +=3D 1
+
+            elif kind =3D=3D CToken.END:
+                if value =3D=3D ')' and paren_level > 0:
+                    paren_level -=3D 1
+                elif value =3D=3D ']' and bracket_level > 0:
+                    bracket_level -=3D 1
+                elif brace_level > 0:    # value =3D=3D '}'
+                    brace_level -=3D 1
+
+            yield CToken(kind, value, pos,
+                         brace_level, paren_level, bracket_level)
+
+    def __init__(self, source):
+        """
+        Create a regular expression to handle TOKEN_LIST.
+
+        While I generally don't like using regex group naming via:
+            (?P<name>...)
+
+        in this particular case, it makes sense, as we can pick the name
+        when matching a code via re_scanner().
+        """
+        global re_scanner
+
+        if not re_scanner:
+            re_tokens =3D []
+
+            for kind, pattern in TOKEN_LIST:
+                name =3D CToken.to_name(kind)
+                re_tokens.append(f"(?P<{name}>{pattern})")
+
+            re_scanner =3D KernRe("|".join(re_tokens), re.MULTILINE | re.D=
OTALL)
+
+        self.tokens =3D []
+        for tok in self._tokenize(source):
+            self.tokens.append(tok)
+
+    def __str__(self):
+        out=3D""
+        show_stack =3D [True]
+
+        for tok in self.tokens:
+            if tok.kind =3D=3D CToken.BEGIN:
+                show_stack.append(show_stack[-1])
+
+            elif tok.kind =3D=3D CToken.END:
+                prev =3D show_stack[-1]
+                if len(show_stack) > 1:
+                    show_stack.pop()
+
+                if not prev and show_stack[-1]:
+                    #
+                    # Try to preserve indent
+                    #
+                    out +=3D "\t" * (len(show_stack) - 1)
+
+                    out +=3D str(tok.value)
+                    continue
+
+            elif tok.kind =3D=3D CToken.COMMENT:
+                comment =3D RE_COMMENT_START.sub("", tok.value)
+
+                if comment.startswith("private:"):
+                    show_stack[-1] =3D False
+                    show =3D False
+                elif comment.startswith("public:"):
+                    show_stack[-1] =3D True
+
+                continue
+
+            if show_stack[-1]:
+                out +=3D str(tok.value)
+
+        return out
+
+
 #: Nested delimited pairs (brackets and parenthesis)
 DELIMITER_PAIRS =3D {
     '{': '}',
--=20
2.52.0
From nobody Thu Apr  9 10:29:06 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9809B3DA7C7;
	Mon,  9 Mar 2026 16:48:06 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1773074886; cv=none;
 b=kvO2R2mO3MD77MHM/eh5F4ivw7J3jPA7fcWsv0LvikN39kFeZLaNqi8txLvcIvYP5QhMwGzVBwV8QJU136xzDWN12sneCqMEsz9pILvj8v1A/Yv1zFZCOsbNMWU/d8WqxAX5Wgjm+Z0XNdwiERt8xIsaEf2PyskBOzDUZjIFjZ8=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1773074886; c=relaxed/simple;
	bh=xi9HTFCdsv+dCAkW0Hiur3D5yVumKUkJ/b9uKkf9fw8=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
 b=F5oZhzjGk9prDiNmfb2ylmgMzBl2E/t6LlMf8yhqBJ+HX6ucLdAN2BmAqRzipcSidvbvfeCdR/cyR3+7OUnRnDhIseNZqv72qcn0kbN8um9+4DJegrXK9jgvFLgujBhdY6zreQMxRslgWuifhxbHM+q3h+/wLMm2/ocYr7dTGg4=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=Y8R6jAP0; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="Y8R6jAP0"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 74F38C2BCB0;
	Mon,  9 Mar 2026 16:48:06 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1773074886;
	bh=xi9HTFCdsv+dCAkW0Hiur3D5yVumKUkJ/b9uKkf9fw8=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=Y8R6jAP0iDqMNKIQpA+QJcNIY1PhRXYegtYcoXKlnd2t6bR1/6O8apZdpLybxfnuJ
	 nul2hkuArze/3mM/TxZ4mLcBil/xb9bl3z2Jp/61B3HvjFiRg33w6n3eGlKMGxUaXz
	 nBwpmQFLoYQ3IvgwkgrXhSdJUXsUx9sSReZK9mvJwSdMHMZNw3G7GdY3/5YIhhRo4w
	 jfDs7jMEllzSwq4cxmFr1WAPBLFfUJxgf4xv99WlGgknyKFgdrOcWPlYMsBteCUONd
	 E5zJAmM0mm2qZv11l74QkgD5KD0vrs1I4FQi/S5Zb2yEjFHiHtxWlZ3GxSNkWN0Vbn
	 FYYImXuwnbfzg==
Received: from mchehab by mail.kernel.org with local (Exim 4.99.1)
	(envelope-from <mchehab+huawei@kernel.org>)
	id 1vzdm0-0000000BheJ-2rxL;
	Mon, 09 Mar 2026 17:48:04 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	linux-kernel@vger.kernel.org,
	Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
	Randy Dunlap <rdunlap@infradead.org>
Subject: [PATCH 6/8] docs: kdoc: use tokenizer to handle comments on structs
Date: Mon,  9 Mar 2026 17:47:57 +0100
Message-ID: 
 <f83ee9e8c38407eaab6ad10d4ccf155fb36683cc.1773074166.git.mchehab+huawei@kernel.org>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <cover.1773074166.git.mchehab+huawei@kernel.org>
References: <cover.1773074166.git.mchehab+huawei@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Sender: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Handle comments inside structs better by using the C tokenizer. After
this change, all unit tests pass:

  test_private:
    TestPublicPrivate:
        test balanced_inner_private:                                 OK
        test balanced_non_greddy_private:                            OK
        test balanced_private:                                       OK
        test no private:                                             OK
        test unbalanced_inner_private:                               OK
        test unbalanced_private:                                     OK
        test unbalanced_struct_group_tagged_with_private:            OK
        test unbalanced_two_struct_group_tagged_first_with_private:  OK
        test unbalanced_without_end_of_line:                         OK

  Ran 9 tests
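
For comparison, a sketch of how the regex-only logic removed by this
patch behaves on an inner private block. It uses plain re and assumes
that KernRe.sub() acts like re.sub(); the RE_* names, old_trim() and
the sample struct are illustrative:

```python
import re

# The three patterns removed by this patch, compiled with plain re.
RE_PRIV_PUB = re.compile(r"/\*\s*private:.*?/\*\s*public:.*?\*/", re.S)
RE_PRIV_END = re.compile(r"/\*\s*private:.*", re.S)
RE_COMMENT = re.compile(r"\s*/\*.*?\*/\s*", re.S)

def old_trim(text):
    """Regex-only trimming, as done before the tokenizer was used."""
    text = RE_PRIV_PUB.sub("", text)
    # With no "public:" marker, everything up to the end is dropped,
    # including the closing braces.
    text = RE_PRIV_END.sub("", text)
    return RE_COMMENT.sub("", text).strip()

source = (
    "struct foo { int a; "
    "struct { /* private: */ int b; } inner; "
    "int c; };"
)
result = old_trim(source)
print(result)
```

The output has two opening braces and no closing one, which is the
kind of unbalanced result the "unbalanced" tests above guard against.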

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/lib/python/kdoc/kdoc_parser.py | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/k=
doc_parser.py
index 4b3c555e6c8e..6b181ead3175 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -13,7 +13,7 @@ import sys
 import re
 from pprint import pformat
=20
-from kdoc.kdoc_re import NestedMatch, KernRe
+from kdoc.kdoc_re import NestedMatch, KernRe, CTokenizer
 from kdoc.kdoc_item import KdocItem
=20
 #
@@ -84,15 +84,9 @@ def trim_private_members(text):
     """
     Remove ``struct``/``enum`` members that have been marked "private".
     """
-    # First look for a "public:" block that ends a private region, then
-    # handle the "private until the end" case.
-    #
-    text =3D KernRe(r'/\*\s*private:.*?/\*\s*public:.*?\*/', flags=3Dre.S)=
.sub('', text)
-    text =3D KernRe(r'/\*\s*private:.*', flags=3Dre.S).sub('', text)
-    #
-    # We needed the comments to do the above, but now we can take them out.
-    #
-    return KernRe(r'\s*/\*.*?\*/\s*', flags=3Dre.S).sub('', text).strip()
+
+    tokens =3D CTokenizer(text)
+    return str(tokens)
=20
 class state:
     """
--=20
2.52.0
From nobody Thu Apr  9 10:29:06 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id C18F33DFC7F;
	Mon,  9 Mar 2026 16:48:06 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1773074886; cv=none;
 b=XY2NkyR8wLiUoQxmTqpAbxPpBtCT/XUTu73UOkFKVjBJab6MfvA4X4bgAFEJo7VNeM5AFt2QcG+LFR6SvD4RpW7n12mGtEuTY7M5fUzAmRTWUZGztcGDt7FMR5ijEd8rqmByO1CzQfzLmgCpmqYw1Jic7VdevWZZkvh2mQJeXpk=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1773074886; c=relaxed/simple;
	bh=w/naBW8NNuJGu7a0YYIA6roistwl+UTsYj4gxMVPhK8=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
 b=nypOhz4hr7ui2yYEmviiFWQ+Q0nrCSpI2Wx+pw5oF1M9bfwyi7qM0Zv7NAbSrijlpfYzG9cW3N1zcmzkmxDbSeqKiU1nwxJ5PrEuLdByb0n/G4XU+VLFuubfekmwCmR3ya0aCBW5jxRG+TfkyrQ29XVtMiOsggKHL2tPwgsDK6M=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=CTRVEpCI; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="CTRVEpCI"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 99E60C2BCAF;
	Mon,  9 Mar 2026 16:48:06 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1773074886;
	bh=w/naBW8NNuJGu7a0YYIA6roistwl+UTsYj4gxMVPhK8=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=CTRVEpCIFIhNse9dlicZMdqjU2Ucj1VHUfBa+dMGBjHk5xoxazvpc0ml9P/ArgpUp
	 9UGejAVvQluCtoUrWz6msSIkzvd828sCrpRProl3YRBZeZMdQHQhAphq7PT9JjCVQN
	 JOLIN5FYyAazCF9f9fpG49wLUDxHzsgVzS2sP8my+dWwRW3LLgyA/C9Ycum78RT1H/
	 h2wBzMrgfxjRCrWSfzo5vpTM2tC+7JvW//+MJrhJatMGEfMSOzbptcisEssC5P/9dm
	 pWLSUU1KUqjaSv/PnljTJhIIgu8LCrRRU9W4I1UiqMBQPLSGayl91cHlb4/ktGjnAC
	 3zOMzTpchTNUw==
Received: from mchehab by mail.kernel.org with local (Exim 4.99.1)
	(envelope-from <mchehab+huawei@kernel.org>)
	id 1vzdm0-0000000BhfW-3ghd;
	Mon, 09 Mar 2026 17:48:04 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: [PATCH 7/8] unittests: test_private: modify it to use CTokenizer
 directly
Date: Mon,  9 Mar 2026 17:47:58 +0100
Message-ID: 
 <2672257233ff73a9464c09b50924be51e25d4f59.1773074166.git.mchehab+huawei@kernel.org>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <cover.1773074166.git.mchehab+huawei@kernel.org>
References: <cover.1773074166.git.mchehab+huawei@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Sender: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Change the logic to use the tokenizer directly. This allows
adding more unit tests to check the validity of the tokenizer
itself.
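
The test classes are now generated with type(). A minimal sketch of
that technique, under stated assumptions: make_test(), CASES and
make_case_class() are hypothetical names, and str.strip() stands in
for the code under test:

```python
import unittest

def make_test(source, expected):
    """Return a test method bound to one table entry."""
    def test(self):
        self.assertEqual(source.strip(), expected)
    return test

# Hypothetical table; the real one maps names to source/trimmed pairs.
CASES = {"spaces": ("  a  ", "a"), "tabs": ("\tb\t", "b")}

def make_case_class(group_name, table):
    """Create a unittest.TestCase subclass with one method per entry."""
    methods = {}
    for name, (source, expected) in table.items():
        methods[f"test_{name}"] = make_test(source, expected)
    # type(name, bases, namespace) builds the class at runtime.
    return type(group_name, (unittest.TestCase,), methods)

TestStrip = make_case_class("TestStrip", CASES)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestStrip)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())
```

Building the class inside a function keeps the generated methods out
of the module namespace until the class itself is registered.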

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 .../{test_private.py =3D> test_tokenizer.py}    | 76 +++++++++++++------
 1 file changed, 52 insertions(+), 24 deletions(-)
 rename tools/unittests/{test_private.py =3D> test_tokenizer.py} (85%)

diff --git a/tools/unittests/test_private.py b/tools/unittests/test_tokeniz=
er.py
similarity index 85%
rename from tools/unittests/test_private.py
rename to tools/unittests/test_tokenizer.py
index eae245ae8a12..da0f2c4c9e21 100755
--- a/tools/unittests/test_private.py
+++ b/tools/unittests/test_tokenizer.py
@@ -15,20 +15,44 @@ from unittest.mock import MagicMock
 SRC_DIR =3D os.path.dirname(os.path.realpath(__file__))
 sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python"))
=20
-from kdoc.kdoc_parser import trim_private_members
+from kdoc.kdoc_re import CTokenizer
 from unittest_helper import run_unittest
=20
+
+
 #
 # List of tests.
 #
 # The code will dynamically generate one test for each key on this diction=
ary.
 #
=20
+def make_private_test(name, data):
+    """
+    Create a test named ``name`` using parameters given by ``data`` dict.
+    """
+
+    def test(self):
+        """In-lined lambda-like function to run the test"""
+        tokens =3D CTokenizer(data["source"])
+        result =3D str(tokens)
+
+        #
+        # Avoid whitespace false positives
+        #
+        result =3D re.sub(r"\s++", " ", result).strip()
+        expected =3D re.sub(r"\s++", " ", data["trimmed"]).strip()
+
+        msg =3D f"failed when parsing this source:\n{data['source']}"
+        self.assertEqual(result, expected, msg=3Dmsg)
+
+    return test
+
 #: Tests to check if CTokenizer is handling properly public/private commen=
ts.
 TESTS_PRIVATE =3D {
     #
     # Simplest case: no private. Ensure that trimming won't affect struct
     #
+    "__run__": make_private_test,
     "no private": {
         "source": """
             struct foo {
@@ -288,41 +312,45 @@ TESTS_PRIVATE =3D {
     },
 }
=20
+#: Dict containing all test groups fror CTokenizer
+TESTS =3D {
+    "TestPublicPrivate": TESTS_PRIVATE,
+}
=20
-class TestPublicPrivate(unittest.TestCase):
-    """
-    Main test class. Populated dynamically at runtime.
-    """
+def setUp(self):
+    self.maxDiff =3D None
=20
-    def setUp(self):
-        self.maxDiff =3D None
+def build_test_class(group_name, table):
+    """
+    Dynamically create a new class derived from unittest.TestCase,
+    using type() as the class generator.
=20
-    def add_test(cls, name, source, trimmed):
-        """
-        Dynamically add a test to the class
-        """
-        def test(cls):
-            result =3D trim_private_members(source)
+    We're opting to do it inside a function to avoid the risk of
+    changing the globals() dictionary.
+    """
=20
-            result =3D re.sub(r"\s++", " ", result).strip()
-            expected =3D re.sub(r"\s++", " ", trimmed).strip()
+    class_dict =3D {
+        "setUp": setUp
+    }
=20
-            msg =3D f"failed when parsing this source:\n" + source
+    run =3D table["__run__"]
=20
-            cls.assertEqual(result, expected, msg=3Dmsg)
+    for test_name, data in table.items():
+        if test_name =3D=3D "__run__":
+            continue
=20
-        test.__name__ =3D f'test {name}'
+        class_dict[f"test_{test_name}"] =3D run(test_name, data)
=20
-        setattr(TestPublicPrivate, test.__name__, test)
+    cls =3D type(group_name, (unittest.TestCase,), class_dict)
=20
+    return cls.__name__, cls
=20
 #
-# Populate TestPublicPrivate class
+# Create classes and add them to the global dictionary
 #
-test_class =3D TestPublicPrivate()
-for name, test in TESTS_PRIVATE.items():
-    test_class.add_test(name, test["source"], test["trimmed"])
-
+for group, table in TESTS.items():
+    t =3D build_test_class(group, table)
+    globals()[t[0]] =3D t[1]
=20
 #
 # main
--=20
2.52.0
From nobody Thu Apr  9 10:29:06 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id E5E0F3E7162;
	Mon,  9 Mar 2026 16:48:06 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1773074887; cv=none;
 b=pSoS+5GO993ALZjABCMcs+W6PYVQpZGrSdfPxRRboOJXXOQN1MQq7RpDnXoXxuFoD/OTcseaFbrbuEQPgb8xy4IhvDoVUohUX60394wAFInKU2mBR2m2y3Crn8+Cc2Y+4rGKCUX2rw/tQvDOQKvzgryewlFBxOaU7hvkOMAYTXg=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1773074887; c=relaxed/simple;
	bh=tRQ5UQGeY6yZl7zSl8udBzbIceMvjBuqhB3obUBrnHc=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
 b=R9xe/Ib4vYWMf9V9WKKmCUFtMKtR7UrBtwP+sSUDCb23cCon7VhIII0H8YZwNn8acynpnnpARNrf+2aK7PofDqwB6Nri8XF52RjSzmdIPYwJX/JLuLQaGu0xroYxGkWeMEIIyV7yKhflDT4OaX0XEi7SIkHxlSsLV/hyqWae70M=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=Vdml6zpI; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="Vdml6zpI"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id C4AE8C2BC9E;
	Mon,  9 Mar 2026 16:48:06 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1773074886;
	bh=tRQ5UQGeY6yZl7zSl8udBzbIceMvjBuqhB3obUBrnHc=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=Vdml6zpIS+fXR4Y0izsY0Fr4ffRrjA1ohyPRxsR8UxljTvLA2pHeAiy7YoFgXzR1u
	 JYK0DLCZ9UDSuC99aM6HOxgBLNoZYr4XIvss5R9/6PtFKy70F+quBo5o0orwx77ikA
	 S6SpFpXmVgpHndDdjuJqTRO05QytPjNBOFJJUiZVrt0HyoMM109ACKmnpX1RSQn3t2
	 r5Ui9/iDkWVOyRGd0vBj9IkdjSXibIifijy0tytKqMK4B46cSMjWp6VhVwHRb/v5qe
	 tdlYrmQF60XKcQaN1K6R7zMZv+sAEzXzCBFz4Q+hprhYTpX/9CI8D4n7ekv2cbME1E
	 h0ZPsqtbbTRCQ==
Received: from mchehab by mail.kernel.org with local (Exim 4.99.1)
	(envelope-from <mchehab+huawei@kernel.org>)
	id 1vzdm1-0000000Bhgm-0I3m;
	Mon, 09 Mar 2026 17:48:05 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	linux-kernel@vger.kernel.org,
	Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
	Randy Dunlap <rdunlap@infradead.org>
Subject: [PATCH 8/8] unittests: test_tokenizer: check if the tokenizer works
Date: Mon,  9 Mar 2026 17:47:59 +0100
Message-ID: 
 <50a4be47b52450aed9f9228e06fef39df52a3dbf.1773074166.git.mchehab+huawei@kernel.org>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <cover.1773074166.git.mchehab+huawei@kernel.org>
References: <cover.1773074166.git.mchehab+huawei@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Sender: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

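Add extra tests to check if the tokenizer is working properly: basic
tokens, brace/paren/bracket depth counters and the error raised on an
illegal character.

While here, give default values to the CToken constructor arguments
other than kind, so that tests can build expected tokens tersely.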

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/lib/python/kdoc/kdoc_re.py  |   4 +-
 tools/unittests/test_tokenizer.py | 109 +++++++++++++++++++++++++++++-
 2 files changed, 108 insertions(+), 5 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_=
re.py
index 7bed4e9a8810..b4e1a2dbdcc2 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -194,8 +194,8 @@ class CToken():
=20
         return CToken.MISMATCH
=20
-    def __init__(self, kind, value, pos,
-                 brace_level, paren_level, bracket_level):
+    def __init__(self, kind, value=3DNone, pos=3D0,
+                 brace_level=3D0, paren_level=3D0, bracket_level=3D0):
         self.kind =3D kind
         self.value =3D value
         self.pos =3D pos
diff --git a/tools/unittests/test_tokenizer.py b/tools/unittests/test_token=
izer.py
index da0f2c4c9e21..0955facad736 100755
--- a/tools/unittests/test_tokenizer.py
+++ b/tools/unittests/test_tokenizer.py
@@ -15,16 +15,118 @@ from unittest.mock import MagicMock
 SRC_DIR =3D os.path.dirname(os.path.realpath(__file__))
 sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python"))
=20
-from kdoc.kdoc_re import CTokenizer
+from kdoc.kdoc_re import CToken, CTokenizer
 from unittest_helper import run_unittest
=20
-
-
 #
 # List of tests.
 #
 # The code will dynamically generate one test for each key on this diction=
ary.
 #
+def tokens_to_list(tokens):
+    tuples =3D []
+
+    for tok in tokens:
+        if tok.kind =3D=3D CToken.SPACE:
+            continue
+
+        tuples +=3D [(tok.kind, tok.value,
+                    tok.brace_level, tok.paren_level, tok.bracket_level)]
+
+    return tuples
+
+
+def make_tokenizer_test(name, data):
+    """
+    Create a test named ``name`` using parameters given by ``data`` dict.
+    """
+
+    def test(self):
+        """In-lined lambda-like function to run the test"""
+
+        #
+        # Check if exceptions are properly handled
+        #
+        if "raises" in data:
+            with self.assertRaises(data["raises"]):
+                CTokenizer(data["source"])
+            return
+
+        #
+        # Check if tokenizer is producing expected results
+        #
+        tokens =3D CTokenizer(data["source"]).tokens
+
+        result =3D tokens_to_list(tokens)
+        expected =3D tokens_to_list(data["expected"])
+
+        self.assertEqual(result, expected, msg=3Df"{name}")
+
+    return test
+
+#: Tokenizer tests.
+TESTS_TOKENIZER =3D {
+    "__run__": make_tokenizer_test,
+
+    "basic_tokens": {
+        "source": """
+            int a; // comment
+            float b =3D 1.23;
+        """,
+        "expected": [
+            CToken(CToken.NAME, "int"),
+            CToken(CToken.NAME, "a"),
+            CToken(CToken.PUNC, ";"),
+            CToken(CToken.COMMENT, "// comment"),
+            CToken(CToken.NAME, "float"),
+            CToken(CToken.NAME, "b"),
+            CToken(CToken.OP, "=3D"),
+            CToken(CToken.NUMBER, "1.23"),
+            CToken(CToken.PUNC, ";"),
+        ],
+    },
+
+    "depth_counters": {
+        "source": """
+            struct X {
+                int arr[10];
+                func(a[0], (b + c));
+            }
+        """,
+        "expected": [
+            CToken(CToken.STRUCT, "struct"),
+            CToken(CToken.NAME, "X"),
+            CToken(CToken.BEGIN, "{", brace_level=3D1),
+
+            CToken(CToken.NAME, "int", brace_level=3D1),
+            CToken(CToken.NAME, "arr", brace_level=3D1),
+            CToken(CToken.BEGIN, "[", brace_level=3D1, bracket_level=3D1),
+            CToken(CToken.NUMBER, "10", brace_level=3D1, bracket_level=3D1=
),
+            CToken(CToken.END, "]", brace_level=3D1),
+            CToken(CToken.PUNC, ";", brace_level=3D1),
+            CToken(CToken.NAME, "func", brace_level=3D1),
+            CToken(CToken.BEGIN, "(", brace_level=3D1, paren_level=3D1),
+            CToken(CToken.NAME, "a", brace_level=3D1, paren_level=3D1),
+            CToken(CToken.BEGIN, "[", brace_level=3D1, paren_level=3D1, br=
acket_level=3D1),
+            CToken(CToken.NUMBER, "0", brace_level=3D1, paren_level=3D1, b=
racket_level=3D1),
+            CToken(CToken.END, "]", brace_level=3D1, paren_level=3D1),
+            CToken(CToken.PUNC, ",", brace_level=3D1, paren_level=3D1),
+            CToken(CToken.BEGIN, "(", brace_level=3D1, paren_level=3D2),
+            CToken(CToken.NAME, "b", brace_level=3D1, paren_level=3D2),
+            CToken(CToken.OP, "+", brace_level=3D1, paren_level=3D2),
+            CToken(CToken.NAME, "c", brace_level=3D1, paren_level=3D2),
+            CToken(CToken.END, ")", brace_level=3D1, paren_level=3D1),
+            CToken(CToken.END, ")", brace_level=3D1),
+            CToken(CToken.PUNC, ";", brace_level=3D1),
+            CToken(CToken.END, "}"),
+        ],
+    },
+
+    "mismatch_error": {
+        "source": "int a$ =3D 5;",          # $ is illegal
+        "raises": RuntimeError,
+    },
+}
=20
 def make_private_test(name, data):
     """
@@ -315,6 +417,7 @@ TESTS_PRIVATE =3D {
 #: Dict containing all test groups fror CTokenizer
 TESTS =3D {
     "TestPublicPrivate": TESTS_PRIVATE,
+    "TestTokenizer": TESTS_TOKENIZER,
 }
=20
 def setUp(self):
--=20
2.52.0