Hi Jon,
Don't merge this series. It is just a heads-up about what I'm
working on right now.
This is basically a proof of concept, not yet integrated with
kernel-doc. It helps to show that investing in a tokenizer
was a good idea.
I'm still testing the code.
Right now, the kernel-doc logic to handle data types is very
complex, and the code is split into dump_<type> functions, which
in turn call several ancillary routines. The most complex ones
are related to handling structs, which involves converting inner
struct/unions into members of the main struct.
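To illustrate what "converting inner struct/unions into members of the main struct" means, here is a rough sketch. It is not how kernel-doc or the new parser does it: the nested `(name, children)` tree and the `flatten_members()` helper are hypothetical, made up for this example only.

```python
def flatten_members(members):
    """Yield leaf member names from a nested member tree, in order.

    Each entry is a (name, children) pair. 'children' is None for a
    plain member, or a list of entries for an inner struct/union.
    This representation is an assumption made for this sketch.
    """
    for name, children in members:
        if children is None:
            yield name
        else:
            # Inner struct/union: its members are promoted to the outer level
            yield from flatten_members(children)


# Hypothetical tree for: struct { int a; union { int b; int c; }; int d; }
tree = [
    ("a", None),
    ("{unnamed_union}", [("b", None), ("c", None)]),
    ("d", None),
]

print(list(flatten_members(tree)))
```

The point is only that the documented member list ends up flat, even when the C declaration is nested.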
By using this new code, all elements from most data types can
be parsed by a single piece of code.
Please note that the code was designed to pick a single
declaration, as this is how kdoc_parser will use it.
If you try to parse multiple ones, the output won't be right,
as it will pick the first declaration name and create a single
item with all data declarations on it.
As it is not based on regexes, it can properly handle some
problematic cases, like having:
{};
and:
;;;;;
in the middle of a struct/union.
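A rough sketch of why a token-based approach copes with those cases is shown below. It is not the code from data_parser.py: the `TOKEN_RE` pattern and the `tokenize()` and `split_members()` helpers are hypothetical and far simpler than a real C tokenizer (no comments, strings or multi-character operators).

```python
import re

# Hypothetical, simplified tokenizer: identifiers, numbers, or any single
# non-space character. This is NOT the tokenizer used by kernel-doc.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Split C-like source into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def split_members(body_tokens):
    """Group the tokens of a struct body into member declarations.

    A member ends at a ';' seen at brace depth 0. Empty statements
    (';;;;;') and empty blocks ('{};') produce no member at all.
    """
    members = []
    current = []
    depth = 0

    for tok in body_tokens:
        if tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1

        if tok == ";" and depth == 0:
            # Keep the declaration only if it has something besides braces
            if any(t not in ("{", "}") for t in current):
                members.append(" ".join(current))
            current = []
        else:
            current.append(tok)

    return members


body = "int a; {}; ;;;;; char *b[12]; struct { int c; } d;"
for member in split_members(tokenize(body)):
    print(member)
```

Because the walk tracks brace depth instead of matching text patterns, the stray `{};` and `;;;;;` simply yield no member, and the inner struct stays attached to its declarator `d`.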
For enums, if one has values inside the declaration, like:
enum { FOO, BAR } type;
it picks the right data type. Kernel-doc currently maps this as:
enum type
My plan is to integrate it into kernel-doc and see how it goes.
It will likely raise some corner cases, but, once we get it right,
this will likely reduce the size and complexity of kdoc_parser.
If you want to test it, you can run:
./parse_c.py
to use the example hardcoded in it, or have it read from a file with:
$ ./parse_c.py x.h
CDataItem(decl_type=None, decl_name=None, parameterlist=['u16_data'], parametertypes={'u16_data': 'u16 u16_data[sizeof(u64) / sizeof(u16)]'})
None None
parameterlist:
- u16_data
parametertypes:
- u16_data: u16 u16_data[sizeof(u64) / sizeof(u16)]
(in this example, x.h has just:
u16 u16_data[sizeof(u64) / sizeof(u16)];
)
The logic stores decl_type and decl_name when the data is a
struct/union/enum. If the data is just a declaration, it fills
only one element in parameterlist and in parametertypes.
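For readers who want to picture the record, the sketch below rebuilds it from the repr shown in the output above. The four field names come from that output. Everything else is an assumption: the real class in data_parser.py may not be a dataclass, and `dump()` is a hypothetical helper that only mimics the layout printed by parse_c.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CDataItem:
    """Sketch of the parsed record; field names taken from the repr above."""
    decl_type: Optional[str] = None     # e.g. struct/union/enum, else None
    decl_name: Optional[str] = None     # name of the struct/union/enum
    parameterlist: List[str] = field(default_factory=list)
    parametertypes: Dict[str, str] = field(default_factory=dict)


def dump(item):
    """Return text laid out like the parse_c.py output (hypothetical helper)."""
    lines = [f"{item.decl_type} {item.decl_name}", "parameterlist:"]
    lines += [f"- {name}" for name in item.parameterlist]
    lines.append("parametertypes:")
    lines += [f"- {name}: {ctype}"
              for name, ctype in item.parametertypes.items()]
    return "\n".join(lines)


# A bare declaration: no decl_type/decl_name, a single member entry
item = CDataItem(
    parameterlist=["u16_data"],
    parametertypes={"u16_data": "u16 u16_data[sizeof(u64) / sizeof(u16)]"},
)
print(repr(item))
print(dump(item))
```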
Mauro Carvalho Chehab (2):
docs: kdoc: add a class to parse data items
HACK: add a parse_c.py file to test CDataParser
parse_c.py | 87 +++++++++++
tools/lib/python/kdoc/data_parser.py | 211 +++++++++++++++++++++++++++
2 files changed, 298 insertions(+)
create mode 100755 parse_c.py
create mode 100644 tools/lib/python/kdoc/data_parser.py
--
2.53.0
On Fri, 20 Mar 2026 10:46:39 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Hi Jon,
>
> Don't merge this series. It is just a heads on about what I'm
> working right now.
>
> This is basically a proof of concept, not yet integrated with
> kernel-doc. It helps to show that investing on a tokenizer
> was a good idea.
>
> I'm still testing the code.
Heh, getting it working is hard, but I ended up with something that
should work with a somewhat complex scenario.
The new version is at my scratch repository at:
https://github.com/mchehab/linux PR_CDataParser-v2
I expect this parser to be able to handle:
- typedef (for data types)
- struct
- union
- enum
- var
So, once properly integrated(*), it should greatly simplify the
code inside kdoc_parser.
(*) right now, it is minimally integrated, handling just
struct/unions.
My current plan is to test it more with real-case scenarios,
aiming to submit it after 7.1-rc1, as it seems too late to
submit a change like this now.
IMO the newer code should be more reliable than the current
approach and should produce better output once done.
--
Thanks,
Mauro
For this input:
<snip>
/**
 * struct property_entry - property entry
 *
 * @name: name description
 * @length: length description
 * @is_inline: is_inline description
 * @bar: bar description
 * @my_enum: my_enum description
 * @test: test description
 * @anonymous: anon description
 * @type: type description
 * @literal: literal description
 * @pointer: pointer description
 * @value: value description
 * @boou8_data: boou8_data description
 * @u16_data: u16_data description
 * @u32_data: u32_data description
 * @u64_data: u64_data description
 * @str: str description
 * @prop_name: prop name description
 */
struct property_entry {
	const char *name;
	size_t length;
	bool is_inline; /* TEST */
	struct foo {
		char *bar[12];
		struct {
			enum enum_type my_enum; /* TEST 2 */
			struct {
				uint_t test; /* TEST 3 */
				static const int anonymous;
			};
		} foobar ;
		;;
		{};
	};
	enum dev_prop_type type;
	enum {
		EXPRESSION_LITERAL,
		EXPRESSION_BINARY,
		EXPRESSION_UNARY,
		EXPRESSION_FUNCTION,
		EXPRESSION_ARRAY
	} literal;
	union {
		const void *pointer;
		union {
			u8 boou8_data[sizeof(u64) / sizeof(u8)];
			u16 u16_data[sizeof(u64) / sizeof(u16)];
			u32 u32_data[sizeof(u64) / sizeof(u32)];
			u64 u64_data[sizeof(u64) / sizeof(u64)];
			const char *str[sizeof(u64) / sizeof(char *)];
		};
	};
	char *prop_name;
};
</snip>
Kernel-doc produces a proper result:
<snip>
Ignoring foobar
.. c:struct:: property_entry
property entry
.. container:: kernelindent
**Definition**::
struct property_entry {
const char *name;
size_t length;
bool is_inline;
struct foo {
char *bar[12];
struct {
enum enum_type my_enum;
struct {
uint_t test;
static const int anonymous;
};
} foobar;
{
};
};
enum dev_prop_type type;
enum {
EXPRESSION_LITERAL,
EXPRESSION_BINARY,
EXPRESSION_UNARY,
EXPRESSION_FUNCTION,
EXPRESSION_ARRAY } literal;
union {
const void *pointer;
union {
u8 boou8_data[sizeof(u64) / sizeof(u8)];
u16 u16_data[sizeof(u64) / sizeof(u16)];
u32 u32_data[sizeof(u64) / sizeof(u32)];
u64 u64_data[sizeof(u64) / sizeof(u64)];
const char *str[sizeof(u64) / sizeof(char *)];
};
};
char *prop_name;
}
};
**Members**
``{unnamed_struct}``
anonymous
``name``
name description
``length``
length description
``is_inline``
is_inline description
``bar``
bar description
``my_enum``
my_enum description
``{unnamed_struct}``
anonymous
``test``
test description
``anonymous``
anon description
``type``
type description
``literal``
literal description
``{unnamed_union}``
anonymous
``pointer``
pointer description
``boou8_data``
boou8_data description
``u16_data``
u16_data description
``u32_data``
u32_data description
``u64_data``
u64_data description
``str``
str description
``prop_name``
prop name description
</snip>