docs: Update our kernel-doc script to the kernel's new Python one

[PATCH for-10.2 5/8] scripts/kernel-doc: tweak for QEMU coding standards

Posted by Peter Maydell 5 months, 4 weeks ago

This commit makes the equivalent changes to the Python script that we
had for the old Perl script in commit 4cf41794411f ("docs: tweak
kernel-doc for QEMU coding standards").  To repeat the rationale from
that commit:

    Surprisingly, QEMU does have a pretty consistent doc comment style and
    it is not very different from the Linux kernel's.  Of the documentation
    "sigils", only "#" separates the QEMU doc comment style from Linux's,
    and it has 200+ instances vs. 6 for the kernel's '&struct foo' (all in
    accel/tcg/translate-all.c), so it's clear that the two standards are
    different in this respect.  In addition, our structs are typedefed and
    recognized by CamelCase names.

Note that in 4cf41794411f we used '(?!)' as our type_fallback regex;
strictly speaking, this is not quite a replacement for the upstream
'\&([_\w]+)', because the latter includes a group that can later be
matched with \1, and the former does not.  The old perl script did
not care about this, but the python version does, so we must include
the extra set of brackets to ensure we have a group.
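
The difference between the two spellings can be checked with Python's
standard re module. This is only a minimal sketch: it uses plain re
rather than the script's KernRe wrapper (assumed to behave like re for
this purpose), and the sample text and helper name are illustrative.

```python
import re

# Replacement template that refers to group 1, in the style of the
# script's ":c:type:" substitutions.
TEMPLATE = r":c:type:`\1`"

no_group = re.compile(r"(?!)")      # never matches, defines no group
with_group = re.compile(r"((?!))")  # never matches, defines group 1


def try_sub(pattern, text):
    """Return the substituted text, or None if the template is rejected."""
    try:
        return pattern.sub(TEMPLATE, text)
    except re.error:
        # Python checks "\1" against the pattern's group count
        # before it looks for any match.
        return None


sample = "see #MemoryRegion"
result_no_group = try_sub(no_group, sample)      # template rejected
result_with_group = try_sub(with_group, sample)  # text left untouched
```

Neither pattern ever matches; only the version with the extra brackets
can be paired with a template that mentions \1.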

This commit does not include all the same changes that 4cf41794411f
did.  Of the missing pieces, some had already gone in an earlier
kernel-doc update; the parts we still had but do not include here are:

    @@ -2057,7 +2060,7 @@
         }
         elsif (/$doc_decl/o) {
            $identifier = $1;
    -       if (/\s*([\w\s]+?)(\(\))?\s*-/) {
    +       if (/\s*([\w\s]+?)(\s*-|:)/) {
                $identifier = $1;
            }

    @@ -2067,7 +2070,7 @@
            $contents = "";
            $section = $section_default;
            $new_start_line = $. + 1;
    -       if (/-(.*)/) {
    +       if (/[-:](.*)/) {
                # strip leading/trailing/multiple spaces
                $descr= $1;
                $descr =~ s/^\s*//;

The second of these is already in the upstream version: the line r =
KernRe("[-:](.*)") in process_name() matches the regex we have.  The
first change has been refactored into the doc_begin_data and
doc_begin_func changes.  Since the output HTML for QEMU's
documentation has no relevant changes with the new kerneldoc, we
assume that this too has been handled upstream.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 scripts/lib/kdoc/kdoc_output.py | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
index ea8914537ba..39fa872dfca 100644
--- a/scripts/lib/kdoc/kdoc_output.py
+++ b/scripts/lib/kdoc/kdoc_output.py
@@ -38,12 +38,12 @@
 type_fp_param2 = KernRe(r"\@(\w+->\S+)\(\)", cache=False)
 
 type_env = KernRe(r"(\$\w+)", cache=False)
-type_enum = KernRe(r"\&(enum\s*([_\w]+))", cache=False)
-type_struct = KernRe(r"\&(struct\s*([_\w]+))", cache=False)
-type_typedef = KernRe(r"\&(typedef\s*([_\w]+))", cache=False)
-type_union = KernRe(r"\&(union\s*([_\w]+))", cache=False)
-type_member = KernRe(r"\&([_\w]+)(\.|->)([_\w]+)", cache=False)
-type_fallback = KernRe(r"\&([_\w]+)", cache=False)
+type_enum = KernRe(r"#(enum\s*([_\w]+))", cache=False)
+type_struct = KernRe(r"#(struct\s*([_\w]+))", cache=False)
+type_typedef = KernRe(r"#(([A-Z][_\w]*))", cache=False)
+type_union = KernRe(r"#(union\s*([_\w]+))", cache=False)
+type_member = KernRe(r"#([_\w]+)(\.|->)([_\w]+)", cache=False)
+type_fallback = KernRe(r"((?!))", cache=False) # this never matches
 type_member_func = type_member + KernRe(r"\(\)", cache=False)
 
 
-- 
2.43.0
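
The effect of the patched patterns on doc comment text can be sketched
as follows. The two regexes are copied from the diff above; plain re
stands in for KernRe, and the highlight() helper and sample sentence are
hypothetical, not part of the script.

```python
import re

# Patterns copied from the patch, compiled with plain `re` as a
# stand-in for the script's KernRe wrapper.
type_struct = re.compile(r"#(struct\s*([_\w]+))")
type_typedef = re.compile(r"#(([A-Z][_\w]*))")

# Replacement template in the style used for reST output.
C_TYPE = r":c:type:`\1 <\2>`"


def highlight(text):
    """Apply the struct and typedef substitutions in turn."""
    for pattern in (type_struct, type_typedef):
        text = pattern.sub(C_TYPE, text)
    return text


converted = highlight("Pass a #MemoryRegion or a #struct foo, not #lowercase.")
```

A CamelCase name after "#" is treated as a typedef, "#struct foo" as a
struct reference, and a lowercase "#name" is left alone because
type_fallback never matches.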

Re: [PATCH for-10.2 5/8] scripts/kernel-doc: tweak for QEMU coding standards

Posted by Mauro Carvalho Chehab 5 months, 4 weeks ago

On Thu, 14 Aug 2025 18:13:20 +0100
Peter Maydell <peter.maydell@linaro.org> wrote:

> This commit makes the equivalent changes to the Python script that we
> had for the old Perl script in commit 4cf41794411f ("docs: tweak
> kernel-doc for QEMU coding standards").  To repeat the rationale from
> that commit:
> 
>     Surprisingly, QEMU does have a pretty consistent doc comment style and
>     it is not very different from the Linux kernel's.  Of the documentation
>     "sigils", only "#" separates the QEMU doc comment style from Linux's,
>     and it has 200+ instances vs. 6 for the kernel's '&struct foo' (all in
>     accel/tcg/translate-all.c), so it's clear that the two standards are
>     different in this respect.  In addition, our structs are typedefed and
>     recognized by CamelCase names.
> 
> Note that in 4cf41794411f we used '(?!)' as our type_fallback regex;
> strictly speaking, this is not quite a replacement for the upstream
> '\&([_\w]+)', because the latter includes a group that can later be
> matched with \1, and the former does not.  The old perl script did
> not care about this, but the python version does, so we must include
> the extra set of brackets to ensure we have a group.
> 
> This commit does not include all the same changes that 4cf41794411f
> did.  Of the missing pieces, some had already gone in an earlier
> kernel-doc update; the parts we still had but do not include here are:
> 
>     @@ -2057,7 +2060,7 @@
>          }
>          elsif (/$doc_decl/o) {
>             $identifier = $1;
>     -       if (/\s*([\w\s]+?)(\(\))?\s*-/) {
>     +       if (/\s*([\w\s]+?)(\s*-|:)/) {
>                 $identifier = $1;
>             }
> 
>     @@ -2067,7 +2070,7 @@
>             $contents = "";
>             $section = $section_default;
>             $new_start_line = $. + 1;
>     -       if (/-(.*)/) {
>     +       if (/[-:](.*)/) {
>                 # strip leading/trailing/multiple spaces
>                 $descr= $1;
>                 $descr =~ s/^\s*//;
> 
> The second of these is already in the upstream version: the line r =
> KernRe("[-:](.*)") in process_name() matches the regex we have. 

Yes. If I recall correctly, we added this one to fix problems with a
couple of files that used ":" as the separator throughout. They violate
what is documented as valid kernel-doc markup, but it didn't hurt to add
support for that variant.
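
As a rough illustration of what that pattern accepts, here is a sketch
using plain re instead of KernRe. The short_description() helper and the
sample lines are made up for the example; only the regex itself comes
from the text quoted above.

```python
import re

# Same pattern as the one quoted from process_name().
descr_re = re.compile(r"[-:](.*)")


def short_description(line):
    """Return the text after the first '-' or ':' separator, or None."""
    match = descr_re.search(line)
    if match is None:
        return None
    # Trim the whitespace surrounding the description
    return match.group(1).strip()


dash_style = short_description("frobnicate_device - Frobnicate a device")
colon_style = short_description("FrobState: State of a frobnicator")
```

Both the documented "-" separator and the ":" variant yield a
description, so files using either style parse the same way.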

> The
> first change has been refactored into the doc_begin_data and
> doc_begin_func changes.  Since the output HTML for QEMU's
> documentation has no relevant changes with the new kerneldoc, we
> assume that this too has been handled upstream.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

LGTM, but see my notes below.

Anyway:

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> ---
>  scripts/lib/kdoc/kdoc_output.py | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
> index ea8914537ba..39fa872dfca 100644
> --- a/scripts/lib/kdoc/kdoc_output.py
> +++ b/scripts/lib/kdoc/kdoc_output.py
> @@ -38,12 +38,12 @@
>  type_fp_param2 = KernRe(r"\@(\w+->\S+)\(\)", cache=False)
>  
>  type_env = KernRe(r"(\$\w+)", cache=False)
> -type_enum = KernRe(r"\&(enum\s*([_\w]+))", cache=False)
> -type_struct = KernRe(r"\&(struct\s*([_\w]+))", cache=False)
> -type_typedef = KernRe(r"\&(typedef\s*([_\w]+))", cache=False)
> -type_union = KernRe(r"\&(union\s*([_\w]+))", cache=False)
> -type_member = KernRe(r"\&([_\w]+)(\.|->)([_\w]+)", cache=False)
> -type_fallback = KernRe(r"\&([_\w]+)", cache=False)

> +type_enum = KernRe(r"#(enum\s*([_\w]+))", cache=False)
> +type_struct = KernRe(r"#(struct\s*([_\w]+))", cache=False)
> +type_typedef = KernRe(r"#(([A-Z][_\w]*))", cache=False)
> +type_union = KernRe(r"#(union\s*([_\w]+))", cache=False)
> +type_member = KernRe(r"#([_\w]+)(\.|->)([_\w]+)", cache=False)
> +type_fallback = KernRe(r"((?!))", cache=False) # this never matches
>  type_member_func = type_member + KernRe(r"\(\)", cache=False)

This seems like something that a class override would address better.

Basically, you can do something like:


	type_enum = KernRe(r"#(enum\s*([_\w]+))", cache=False)
	type_struct = KernRe(r"#(struct\s*([_\w]+))", cache=False)
	type_typedef = KernRe(r"#(([A-Z][_\w]*))", cache=False)
	type_union = KernRe(r"#(union\s*([_\w]+))", cache=False)
	type_member = KernRe(r"#([_\w]+)(\.|->)([_\w]+)", cache=False)
	type_fallback = KernRe(r"((?!))", cache=False) # this never matches
	...

	(either keep the other types or add an __init__ that would append
	 or replace only the above elements)

	class QemuRestFormat(RestFormatOutput):
	    highlights = [
	        (type_constant, r"``\1``"),
	        (type_constant2, r"``\1``"),

	        # Note: need to escape () to avoid func matching later
	        (type_member_func, r":c:type:`\1\2\3\\(\\) <\1>`"),
	        (type_member, r":c:type:`\1\2\3 <\1>`"),
	        (type_fp_param, r"**\1\\(\\)**"),
	        (type_fp_param2, r"**\1\\(\\)**"),
	        (type_func, r"\1()"),
	        (type_enum, r":c:type:`\1 <\2>`"),
	        (type_struct, r":c:type:`\1 <\2>`"),
	        (type_typedef, r":c:type:`\1 <\2>`"),
	        (type_union, r":c:type:`\1 <\2>`"),
	
	        # in rst this can refer to any type
	        (type_fallback, r":c:type:`\1`"),
	        (type_param_ref, r"**\1\2**")
	    ]

where the type_* patterns above are the QEMU-specific regexes.

Then, when creating the KernelFiles() instance in the kerneldoc.py Sphinx
extension:

	def setup_kfiles(app):
	    global kfiles

	    out_style = QemuRestFormat()
	    kfiles = KernelFiles(out_style=out_style, logger=logger)

while keeping the rest of the kernel's version of kerneldoc.py unchanged.
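
To make the idea concrete without importing the real kdoc modules, here
is a self-contained sketch of the override mechanism. BaseRestFormat and
QemuFormat are hypothetical stand-ins, not the real RestFormatOutput,
and whether the real class looks up highlights through self in this way
is an assumption.

```python
import re


class BaseRestFormat:
    """Hypothetical stand-in for the upstream output class."""

    # (pattern, replacement) pairs using the kernel's '&' sigil
    highlights = [
        (re.compile(r"&(struct\s*([_\w]+))"), r":c:type:`\1 <\2>`"),
        (re.compile(r"&([_\w]+)"), r":c:type:`\1`"),
    ]

    def highlight(self, text):
        # Look the table up through self so subclasses can replace it
        for pattern, repl in self.highlights:
            text = pattern.sub(repl, text)
        return text


class QemuFormat(BaseRestFormat):
    """Subclass that swaps in the '#' sigil and CamelCase typedefs."""

    highlights = [
        (re.compile(r"#(struct\s*([_\w]+))"), r":c:type:`\1 <\2>`"),
        (re.compile(r"#(([A-Z][_\w]*))"), r":c:type:`\1 <\2>`"),
    ]


kernel_out = BaseRestFormat().highlight("uses &struct foo")
qemu_out = QemuFormat().highlight("uses #struct foo and #Bar")
```

With this shape the upstream file stays unmodified and the
project-specific patterns live in one small subclass.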

Thanks,
Mauro