[PATCH RFC 0/2] kernel-doc: better handle data prototypes

Mauro Carvalho Chehab posted 2 patches 2 weeks ago
Posted by Mauro Carvalho Chehab 2 weeks ago
Hi Jon,

Don't merge this series. It is just a heads-up about what I'm
working on right now.

This is basically a proof of concept, not yet integrated with
kernel-doc. It helps to show that investing in a tokenizer
was a good idea.

I'm still testing the code.

Right now, the kernel-doc logic to handle data types is very
complex, and the code is split into dump_<type> functions, which
in turn call several ancillary routines. The most complex ones
are related to handling structs, which involves converting inner
structs/unions into members of the main struct.
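
To illustrate what "converting inner structs/unions into members of
the main struct" means, here is a minimal sketch. It is not the
kernel-doc code: flatten_members() is a made-up helper, and it assumes
input without comments, enum bodies, bitfields or function pointers.

```python
import re

def flatten_members(source):
    """Return leaf member names of nested structs/unions, in order."""
    tokens = re.findall(r"\w+|[^\w\s]", source)
    members = []
    stmt = []             # tokens of the statement being collected
    after_brace = False   # True when the statement follows a '}'
    bracket_depth = 0     # > 0 while inside an array size expression
    for tok in tokens:
        if tok == "[":
            bracket_depth += 1
        elif tok == "]":
            bracket_depth -= 1
        elif bracket_depth:
            continue      # ignore everything inside [...]
        elif tok == "{":
            stmt, after_brace = [], False   # drop the 'struct foo' header
        elif tok == "}":
            stmt, after_brace = [], True
        elif tok == ";":
            names = [t for t in stmt if re.fullmatch(r"[A-Za-z_]\w*", t)]
            # A name right after '}' names the inner aggregate itself,
            # so it is skipped; only leaf members are collected.
            if names and not after_brace:
                members.append(names[-1])
            stmt, after_brace = [], False
        else:
            stmt.append(tok)
    return members

example = """
struct outer {
    int a;
    struct {
        int b;
        union { int c; long d; };
    } inner;
    char *e[4];
};
"""
print(flatten_members(example))
```

The nesting is discarded on purpose here: every leaf ends up in one
flat list, which is the shape the member documentation needs.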

By using this new code, all elements from most data types can
be parsed by a single code path.

Please note that the code was designed to pick a single
declaration, as this is how kdoc_parser will use it.
If you try to parse multiple ones, the output won't be right,
as it will pick the first declaration name and create a single
item with all data declarations in it.

As it is not based on regexes, it can properly handle some
problematic cases, like having:

    {};

and:

    ;;;;;

in the middle of a struct/union.
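
As a rough illustration of why a token stream copes with these cases
(a sketch only, not the CDataParser code; tokenize() and
drop_empty_statements() are hypothetical helpers, and comments in the
input are not handled):

```python
import re

# Hypothetical token pattern: a word, or any single punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(source):
    """Split C-like source into a flat list of tokens, dropping whitespace."""
    return TOKEN_RE.findall(source)

def drop_empty_statements(tokens):
    """Remove stray ';' tokens and stray empty '{}' blocks."""
    out = []
    for tok in tokens:
        at_start = not out or out[-1] in ("{", ";")
        if tok == ";" and at_start:
            # A ';' with no statement before it carries no information.
            continue
        if (tok == "}" and len(out) >= 2 and out[-1] == "{"
                and out[-2] in ("{", ";")):
            # An empty block at statement start: drop its '{' and this '}'.
            out.pop()
            continue
        out.append(tok)
    return out

noisy = "struct s { int a; ;;;;; {}; int b; };"
print(" ".join(drop_empty_statements(tokenize(noisy))))
```

Once the input is a list of tokens, an empty statement is just a ';'
that follows '{' or another ';', which is awkward to express reliably
with regexes over raw text.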

For enums, if one has values inside the declaration, like:

    enum { FOO, BAR } type;

it picks the right data type. Kernel-doc currently maps this as:

    enum type
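
The distinction at play is between the enum tag (before the '{') and
the declared name (after the '}'). A minimal sketch of that split,
assuming a single enum declaration whose initializers contain no
commas; split_enum() is a made-up name, not part of the series:

```python
import re

def split_enum(decl):
    """Return (tag, values, declared_name) for one enum declaration."""
    tokens = re.findall(r"\w+|[^\w\s]", decl)
    if not tokens or tokens[0] != "enum":
        raise ValueError("not an enum declaration")
    if "{" not in tokens or "}" not in tokens:
        raise ValueError("enum without a value list")
    open_idx = tokens.index("{")
    close_idx = tokens.index("}")
    # The tag, when present, sits between 'enum' and '{'.
    tag = tokens[1] if open_idx == 2 else None
    # The first token after '{' or ',' is a value name; any
    # initializer tokens ('= 1', '= 1 << 2') are skipped.
    values = []
    expect_name = True
    for tok in tokens[open_idx + 1:close_idx]:
        if tok == ",":
            expect_name = True
        elif expect_name:
            values.append(tok)
            expect_name = False
    # Whatever follows '}' (minus the ';') is the declarator.
    declarator = [t for t in tokens[close_idx + 1:] if t != ";"]
    name = declarator[-1] if declarator else None
    return tag, values, name

print(split_enum("enum { FOO, BAR } type;"))
```

In the example from this mail, "type" comes out as the declared name
of an anonymous enum rather than as the enum tag.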

My plan is to integrate it into kernel-doc and see how it goes.
It will likely expose some corner cases, but, once we get it right,
this will likely reduce the size and complexity of kdoc_parser.

If you want to test, you can use:

    ./parse_c.py

to run the example hardcoded in it, or pass a file name to read from:

    $ ./parse_c.py x.h
    CDataItem(decl_type=None, decl_name=None, parameterlist=['u16_data'], parametertypes={'u16_data': 'u16 u16_data[sizeof(u64) / sizeof(u16)]'})
    None None

    parameterlist:
      - u16_data

    parametertypes:
      - u16_data: u16 u16_data[sizeof(u64) / sizeof(u16)]

   (in this example, x.h has just:
    u16 u16_data[sizeof(u64) / sizeof(u16)];
   )

The logic stores decl_type and decl_name when the data is a
struct/union/enum. If the data is just a declaration, it fills
only one element in parameterlist and in parametertypes.
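
For reference, the shape of that result can be mimicked with a small
dataclass. This is a sketch built only from the field names visible in
the output above. The real class in data_parser.py may differ, and
parse_simple_declaration() is a hypothetical helper that handles only
plain and array declarations (no bitfields or function pointers):

```python
import re
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CDataItemSketch:
    """Stand-in with the four fields shown in the parse_c.py output."""
    decl_type: Optional[str] = None
    decl_name: Optional[str] = None
    parameterlist: list = field(default_factory=list)
    parametertypes: dict = field(default_factory=dict)

def parse_simple_declaration(text):
    """Fill one parameterlist/parametertypes entry from one declaration."""
    decl = text.strip().rstrip(";").strip()
    # The declared name is the last identifier before any array size.
    head = decl.split("[", 1)[0]
    names = re.findall(r"[A-Za-z_]\w*", head)
    if not names:
        raise ValueError("no identifier found in declaration")
    name = names[-1]
    item = CDataItemSketch()   # decl_type and decl_name stay None
    item.parameterlist.append(name)
    item.parametertypes[name] = decl
    return item

item = parse_simple_declaration("u16 u16_data[sizeof(u64) / sizeof(u16)];")
print(item)
```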

Mauro Carvalho Chehab (2):
  docs: kdoc: add a class to parse data items
  HACK: add a parse_c.py file to test CDataParser

 parse_c.py                           |  87 +++++++++++
 tools/lib/python/kdoc/data_parser.py | 211 +++++++++++++++++++++++++++
 2 files changed, 298 insertions(+)
 create mode 100755 parse_c.py
 create mode 100644 tools/lib/python/kdoc/data_parser.py

-- 
2.53.0
Re: [PATCH RFC 0/2] kernel-doc: better handle data prototypes
Posted by Mauro Carvalho Chehab 1 week, 3 days ago
On Fri, 20 Mar 2026 10:46:39 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Hi Jon,
> 
> Don't merge this series. It is just a heads-up about what I'm
> working on right now.
> 
> This is basically a proof of concept, not yet integrated with
> kernel-doc. It helps to show that investing in a tokenizer
> was a good idea.
> 
> I'm still testing the code.

Heh, getting it working is hard, but I ended up with something that
should work with a somewhat complex scenario.

The new version is at my scratch repository at:

	https://github.com/mchehab/linux PR_CDataParser-v2

I expect this parser to be able to handle:

	- typedef (for data types)
	- struct
	- union
	- enum
	- var

So, once properly integrated(*), it should greatly simplify the
code inside kdoc_parser.

(*) right now, it is minimally integrated, handling just
    struct/unions.
	
My current plan is to test it more with real-case scenarios,
aiming to submit it after 7.1-rc1, as it seems too late now
to submit a change like this.

IMO the newer code should be more reliable than the current
approach and should produce a better output once done.


-- 
Thanks,
Mauro

For this input:

<snip>
/**
 * struct property_entry - property entry
 *
 * @name: name description
 * @length: length description
 * @is_inline: is_inline description
 * @bar: bar description
 * @my_enum: my_enum description
 * @test: test description
 * @anonymous: anon description
 * @type: type description
 * @literal: literal description
 * @pointer: pointer description
 * @value: value description
 * @boou8_data: boou8_data description
 * @u16_data: u16_data description
 * @u32_data: u32_data description
 * @u64_data: u64_data description
 * @str: str description
 * @prop_name: prop name description
 */

struct property_entry {
	const char *name;
	size_t length;
	bool is_inline;   /* TEST */
	struct foo {
		char *bar[12];
		struct {
			enum enum_type my_enum; /* TEST	2 */
			struct {
				uint_t test; /* TEST 3 */
				static const int anonymous;
			};
		} foobar ;
		;;
		{};
	};
	enum dev_prop_type type;

	enum {
		EXPRESSION_LITERAL,
		EXPRESSION_BINARY,
		EXPRESSION_UNARY,
		EXPRESSION_FUNCTION,
		EXPRESSION_ARRAY
	} literal;

	union {
		const void *pointer;
		union {
			u8 boou8_data[sizeof(u64) / sizeof(u8)];
			u16 u16_data[sizeof(u64) / sizeof(u16)];
			u32 u32_data[sizeof(u64) / sizeof(u32)];
			u64 u64_data[sizeof(u64) / sizeof(u64)];
			const char *str[sizeof(u64) / sizeof(char *)];
		};
	};
	char *prop_name;
};
</snip>

Kernel-doc produces a proper result:

<snip>
Ignoring foobar


.. c:struct:: property_entry

  property entry

.. container:: kernelindent

  **Definition**::

    struct property_entry {
        const char *name;
        size_t length;
        bool is_inline;
        struct foo {
            char *bar[12];
            struct {
                enum enum_type my_enum;
                struct {
                    uint_t test;
                    static const int anonymous;
                };
            } foobar;
            {
            };
        };
        enum dev_prop_type type;
        enum {
            EXPRESSION_LITERAL,
            EXPRESSION_BINARY,
            EXPRESSION_UNARY,
            EXPRESSION_FUNCTION,
        EXPRESSION_ARRAY } literal;
        union {
            const void *pointer;
            union {
                u8 boou8_data[sizeof(u64) / sizeof(u8)];
                u16 u16_data[sizeof(u64) / sizeof(u16)];
                u32 u32_data[sizeof(u64) / sizeof(u32)];
                u64 u64_data[sizeof(u64) / sizeof(u64)];
                const char *str[sizeof(u64) / sizeof(char *)];
            };
        };
        char *prop_name;
        }
    };

  **Members**

  ``{unnamed_struct}``
    anonymous

  ``name``
    name description

  ``length``
    length description

  ``is_inline``
    is_inline description

  ``bar``
    bar description

  ``my_enum``
    my_enum description

  ``{unnamed_struct}``
    anonymous

  ``test``
    test description

  ``anonymous``
    anon description

  ``type``
    type description

  ``literal``
    literal description

  ``{unnamed_union}``
    anonymous

  ``pointer``
    pointer description

  ``boou8_data``
    boou8_data description

  ``u16_data``
    u16_data description

  ``u32_data``
    u32_data description

  ``u64_data``
    u64_data description

  ``str``
    str description

  ``prop_name``
    prop name description
</snip>