docs: kdoc: tidy up create_parameter_list() somewhat

[PATCH 3/7] docs: kdoc: clean up the create_parameter_list() "first arg" logic

Posted by Jonathan Corbet 6 months ago

The logic for finding the name of the first in a series of variable names
is somewhat convoluted and, in the use of .extend(), actively buggy.
Document what is happening and simplify the logic.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_parser.py | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index 53051ce831ba..47f7ea01ed10 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -553,18 +553,18 @@ class KernelDoc:
                 arg = KernRe(r'\s*\[').sub('[', arg)
                 args = KernRe(r'\s*,\s*').split(arg)
                 args[0] = re.sub(r'(\*+)\s*', r' \1', args[0])
-
-                first_arg = []
-                r = KernRe(r'^(.*\s+)(.*?\[.*\].*)$')
-                if args[0] and r.match(args[0]):
-                    args.pop(0)
-                    first_arg.extend(r.group(1))
-                    first_arg.append(r.group(2))
+                #
+                # args[0] has a string of "type a".  If "a" includes an [array]
+                # declaration, we want to not be fooled by any white space inside
+                # the brackets, so detect and handle that case specially.
+                #
+                r = KernRe(r'^([^[\]]*\s+)' r'((?:.*?\[.*\].*)|(?:.*?))$')
+                if r.match(args[0]):
+                    args[0] = r.group(2)
+                    dtype = r.group(1)
                 else:
-                    first_arg = KernRe(r'\s+').split(args.pop(0))
-
-                args.insert(0, first_arg.pop())
-                dtype = ' '.join(first_arg)
+                    # No space in args[0]; this seems wrong but preserves previous behavior
+                    dtype = ''
 
                 bitfield_re = KernRe(r'(.*?):(\w+)')
                 for param in args:
-- 
2.50.1

Re: [PATCH 3/7] docs: kdoc: clean up the create_parameter_list() "first arg" logic

Posted by Mauro Carvalho Chehab 6 months ago

On Tue, 12 Aug 2025 13:57:44 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> The logic for finding the name of the first in a series of variable names
> is somewhat convoluted and, in the use of .extend(), actively buggy.
> Document what is happening and simplify the logic.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  scripts/lib/kdoc/kdoc_parser.py | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index 53051ce831ba..47f7ea01ed10 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -553,18 +553,18 @@ class KernelDoc:
>                  arg = KernRe(r'\s*\[').sub('[', arg)
>                  args = KernRe(r'\s*,\s*').split(arg)
>                  args[0] = re.sub(r'(\*+)\s*', r' \1', args[0])
> -
> -                first_arg = []
> -                r = KernRe(r'^(.*\s+)(.*?\[.*\].*)$')
> -                if args[0] and r.match(args[0]):
> -                    args.pop(0)
> -                    first_arg.extend(r.group(1))
> -                    first_arg.append(r.group(2))

I double-checked the Perl code. The Python version seems to be an exact
translation of what was there:

            $arg =~ s/\s*\[/\[/g;

            my @args = split('\s*,\s*', $arg);
            if ($args[0] =~ m/\*/) {
                $args[0] =~ s/(\*+)\s*/ $1/;
            }

            my @first_arg;
            if ($args[0] =~ /^(.*\s+)(.*?\[.*\].*)$/) {
                shift @args;
                push(@first_arg, split('\s+', $1));
                push(@first_arg, $2);
            } else {
                @first_arg = split('\s+', shift @args);
            }

Yeah, I agree that this logic is confusing. 
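
One place where the Python does appear to diverge from the Perl is the
.extend() call that the changelog flags: the Perl splits $1 on whitespace
before pushing, while list.extend() given a str adds it one character at a
time. A minimal sketch of the difference follows. It uses plain re rather
than KernRe, and the function names and the sample input are made up for
illustration, not taken from kdoc_parser.py:

```python
import re

# Same pattern as the removed code, compiled with plain re for illustration.
OLD_RE = re.compile(r'^(.*\s+)(.*?\[.*\].*)$')


def old_python_split(arg0):
    """Mimic the removed Python lines (array case only; assumes a match)."""
    m = OLD_RE.match(arg0)
    first_arg = []
    first_arg.extend(m.group(1))          # str is iterable: one element per character
    first_arg.append(m.group(2))
    name = first_arg.pop()
    return ' '.join(first_arg), name


def perl_like_split(arg0):
    """Mimic the Perl: split the type part on whitespace before pushing."""
    m = OLD_RE.match(arg0)
    first_arg = []
    first_arg.extend(m.group(1).split())  # one element per word
    first_arg.append(m.group(2))
    name = first_arg.pop()
    return ' '.join(first_arg), name


if __name__ == '__main__':
    sample = 'unsigned int foo[2]'        # hypothetical input
    print(old_python_split(sample))
    print(perl_like_split(sample))
```

With the hypothetical input above, the Perl-like variant yields the type as
whole words, while the old Python variant yields a type string with a space
between every character.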

> +                #
> +                # args[0] has a string of "type a".  If "a" includes an [array]
> +                # declaration, we want to not be fooled by any white space inside
> +                # the brackets, so detect and handle that case specially.
> +                #
> +                r = KernRe(r'^([^[\]]*\s+)' r'((?:.*?\[.*\].*)|(?:.*?))$')

Same comment as patch 6/7... concatenations in the middle of the line IMO
make it harder to read. Better to place them on separate lines:

	r = KernRe(r'^([^[\]]*\s+)'
		   r'((?:.*?\[.*\].*)|(?:.*?))$')

> +                if r.match(args[0]):
> +                    args[0] = r.group(2)
> +                    dtype = r.group(1)
>                  else:
> -                    first_arg = KernRe(r'\s+').split(args.pop(0))
> -
> -                args.insert(0, first_arg.pop())
> -                dtype = ' '.join(first_arg)
> +                    # No space in args[0]; this seems wrong but preserves previous behavior
> +                    dtype = ''
>  
>                  bitfield_re = KernRe(r'(.*?):(\w+)')
>                  for param in args:

I didn't test your new code. At first glance, it doesn't seem identical
to the previous one, but if you tested it and the results are the same,
the new version seems nicer once you split the concatenation across two
lines. So, feel free to add:

Acked-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>


Btw, IMHO, it would make sense to have unit tests to check things
like that to ensure that new patches won't cause regressions for
some particular use cases.
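
A rough sketch of what such a test could look like, using the standard
unittest module. Everything here is an assumption for illustration:
split_type_and_name() is a hypothetical stand-alone helper that mirrors the
first-arg logic from the patch (it is not a function in kdoc_parser.py),
plain re stands in for KernRe, and the test inputs are made up:

```python
import re
import unittest

# Regex as posted in the patch, compiled with plain re instead of KernRe.
FIRST_ARG_RE = re.compile(r'^([^[\]]*\s+)'
                          r'((?:.*?\[.*\].*)|(?:.*?))$')


def split_type_and_name(arg0):
    """Hypothetical helper: return (dtype, name) the way the patched code does."""
    m = FIRST_ARG_RE.match(arg0)
    if m:
        return m.group(1), m.group(2)
    # No space in arg0: the patch leaves the name alone and uses an empty type.
    return '', arg0


class FirstArgSplitTest(unittest.TestCase):
    # (input, expected (dtype, name)); note that dtype keeps its trailing space.
    CASES = [
        ('int a', ('int ', 'a')),
        ('struct foo *bar', ('struct foo ', '*bar')),
        ('char name[MAX LEN]', ('char ', 'name[MAX LEN]')),
        ('foo', ('', 'foo')),
    ]

    def test_split(self):
        for arg0, expected in self.CASES:
            with self.subTest(arg0=arg0):
                self.assertEqual(split_type_and_name(arg0), expected)
```

Saved to a file, this can be run with "python3 -m unittest <file>". A real
test would presumably call into KernelDoc itself rather than a copy of the
logic.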

Thanks,
Mauro

Re: [PATCH 3/7] docs: kdoc: clean up the create_parameter_list() "first arg" logic

Posted by Jonathan Corbet 6 months ago

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

>> +                #
>> +                # args[0] has a string of "type a".  If "a" includes an [array]
>> +                # declaration, we want to not be fooled by any white space inside
>> +                # the brackets, so detect and handle that case specially.
>> +                #
>> +                r = KernRe(r'^([^[\]]*\s+)' r'((?:.*?\[.*\].*)|(?:.*?))$')
>
> Same comment as patch 6/7... concatenations in the middle of the line IMO
> make it harder to read. Better to place them on separate lines:
>
> 	r = KernRe(r'^([^[\]]*\s+)'
> 		   r'((?:.*?\[.*\].*)|(?:.*?))$')

So I went to do this, and realized that the second chunk of the regex is
really just a complex way of saying "(.*)$" - so I'll make it just that,
at which point splitting up the string seems a bit excessive.
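
A quick way to sanity-check that equivalence is to run both forms over a
few inputs and compare the captured groups. This is only a sketch: it uses
plain re rather than KernRe, the helper name is hypothetical, and the sample
strings are made up, so it is no substitute for running kernel-doc over the
tree:

```python
import re

# The regex as posted, and the simplified form with "(.*)$" as the second group.
POSTED = re.compile(r'^([^[\]]*\s+)' r'((?:.*?\[.*\].*)|(?:.*?))$')
SIMPLIFIED = re.compile(r'^([^[\]]*\s+)(.*)$')

# Hypothetical inputs: plain, pointer, array with inner whitespace, no space.
SAMPLES = [
    'int a',
    'struct foo *bar',
    'char name[MAX LEN]',
    'unsigned long flags[2]',
    'foo',
]


def groups(regex, arg0):
    """Return the two captured groups, or None when the regex does not match."""
    m = regex.match(arg0)
    return (m.group(1), m.group(2)) if m else None


def find_mismatches(samples):
    """List every sample on which the two regexes disagree."""
    return [s for s in samples if groups(POSTED, s) != groups(SIMPLIFIED, s)]


if __name__ == '__main__':
    print(find_mismatches(SAMPLES))
```

The two forms should agree because the second alternative, (?:.*?) followed
by $, already matches any remainder of the line, so the bracket-specific
alternative never changes where the first group ends.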

Thanks,

jon

[PATCH 1/7] docs: kdoc: remove dead code
[PATCH 2/7] docs: kdoc: tidy up space removal in create_parameter_list()
[PATCH 3/7] docs: kdoc: clean up the create_parameter_list() "first arg" logic
[PATCH 4/7] docs: kdoc: add a couple more comments in create_parameter_list()
[PATCH 5/7] docs: kdoc: tighten up the array-of-pointers case
[PATCH 6/7] docs: kdoc: tighten up the pointer-to-function case
[PATCH 7/7] docs: kdoc: remove redundant comment stripping