From: Mauro Carvalho Chehab
To: Jonathan Corbet, Linux Doc Mailing List
Cc: Mauro Carvalho Chehab, "Mauro Carvalho Chehab", linux-kernel@vger.kernel.org
Subject: [PATCH v2 02/24] docs: parse-headers.py: convert parse-headers.pl
Date: Fri, 22 Aug 2025 16:19:14 +0200

When the Kernel started to use Sphinx, we had to come up with a solution
to parse media headers. At that time, we didn't have much experience
with Sphinx extensions. So, we came up with our own script-based
solution that basically implemented a set of rules we used to have in
the Makefile.

Convert it to Python, keeping it bug-compatible with the original script.

While here, try to better document it.

Signed-off-by: Mauro Carvalho Chehab
---
 Documentation/sphinx/parse-headers.py | 429 ++++++++++++++++++++++++++
 1 file changed, 429 insertions(+)
 create mode 100755 Documentation/sphinx/parse-headers.py

diff --git a/Documentation/sphinx/parse-headers.py b/Documentation/sphinx/parse-headers.py
new file mode 100755
index 000000000000..b39284d21090
--- /dev/null
+++ b/Documentation/sphinx/parse-headers.py
@@ -0,0 +1,429 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2016 by Mauro Carvalho Chehab .
+# pylint: disable=C0103,R0902,R0912,R0914,R0915
+
+"""
+Convert a C header or source file (C_FILE) into a reStructuredText
+file containing a parsed-literal block, with cross-references to the
+documentation files that describe the API.
+It accepts an optional EXCEPTIONS_FILE, which describes which elements
+will be either ignored or pointed to a non-default reference.
+
+The output is written to OUT_FILE.
+
+It is capable of identifying defines, structs, typedefs, enums and
+enum symbols, creating cross-references for all of them.
+It is also capable of distinguishing the #defines used for specifying
+Linux ioctls.
+
+The EXCEPTIONS_FILE contains a set of rules like:
+
+    ignore ioctl VIDIOC_ENUM_FMT
+    replace ioctl VIDIOC_DQBUF vidioc_qbuf
+    replace define V4L2_EVENT_MD_FL_HAVE_FRAME_SEQ :c:type:`v4l2_event_motion_det`
+"""
+
+import argparse
+import os
+import re
+import sys
+
+
+class ParseHeader:
+    """
+    Creates an enriched version of a Kernel header file with cross-links
+    to each C data structure type.
+
+    It is meant to allow more comprehensive documentation, where the
+    uAPI headers get cross-reference links to the documented symbols.
+
+    It is capable of identifying defines, structs, typedefs, enums and
+    enum symbols, creating cross-references for all of them.
+    It is also capable of distinguishing the #defines used for specifying
+    Linux ioctls.
+
+    By default, it creates rules for all symbols and defines, but it also
+    allows parsing an exceptions file. Such a file contains a set of rules
+    using the syntax below:
+
+    1. Ignore rules:
+
+        ignore <type> <symbol>
+
+    Removes the symbol from reference generation.
+
+    2. Replace rules:
+
+        replace <type> <old_symbol> <new_reference>
+
+    Replaces the reference for old_symbol with new_reference, which can be:
+        - A simple symbol name;
+        - A full Sphinx reference.
+
+    In both cases, <type> can be:
+        - ioctl: for defines whose value starts with _IO, e.g. ioctl definitions;
+        - define: for other defines;
+        - symbol: for symbols defined within enums;
+        - typedef: for typedefs;
+        - enum: for the name of a non-anonymous enum;
+        - struct: for structs.
+
+    Examples:
+
+        ignore define __LINUX_MEDIA_H
+        ignore ioctl VIDIOC_ENUM_FMT
+        replace ioctl VIDIOC_DQBUF vidioc_qbuf
+        replace define V4L2_EVENT_MD_FL_HAVE_FRAME_SEQ :c:type:`v4l2_event_motion_det`
+    """
+
+    # Parser regexes with multiple ways to capture enums and structs
+    RE_ENUMS = [
+        re.compile(r"^\s*enum\s+([\w_]+)\s*\{"),
+        re.compile(r"^\s*enum\s+([\w_]+)\s*$"),
+        re.compile(r"^\s*typedef\s*enum\s+([\w_]+)\s*\{"),
+        re.compile(r"^\s*typedef\s*enum\s+([\w_]+)\s*$"),
+    ]
+    RE_STRUCTS = [
+        re.compile(r"^\s*struct\s+([_\w][\w\d_]+)\s*\{"),
+        re.compile(r"^\s*struct\s+([_\w][\w\d_]+)$"),
+        re.compile(r"^\s*typedef\s*struct\s+([_\w][\w\d_]+)\s*\{"),
+        re.compile(r"^\s*typedef\s*struct\s+([_\w][\w\d_]+)$"),
+    ]
+
+    # FIXME: the original code was written long before the Sphinx C
+    # domain gained multiple namespaces. To avoid too much churn in the
+    # existing hyperlinks, the code kept using "c:type" instead of the
+    # right types. To change that, we need to change the types not only
+    # here, but also in the uAPI media documentation.
+    DEF_SYMBOL_TYPES = {
+        "ioctl": {
+            "prefix": "\\ ",
+            "suffix": "\\ ",
+            "ref_type": ":ref",
+        },
+        "define": {
+            "prefix": "\\ ",
+            "suffix": "\\ ",
+            "ref_type": ":ref",
+        },
+        # We're calling each definition inside an enum as "symbol"
+        "symbol": {
+            "prefix": "\\ ",
+            "suffix": "\\ ",
+            "ref_type": ":ref",
+        },
+        "typedef": {
+            "prefix": "\\ ",
+            "suffix": "\\ ",
+            "ref_type": ":c:type",
+        },
+        # This is the name of the enum itself
+        "enum": {
+            "prefix": "",
+            "suffix": "\\ ",
+            "ref_type": ":c:type",
+        },
+        "struct": {
+            "prefix": "",
+            "suffix": "\\ ",
+            "ref_type": ":c:type",
+        },
+    }
+
+    def __init__(self, debug: int = 0):
+        """Initialize internal vars"""
+        self.debug = debug
+        self.data = ""
+
+        self.symbols = {}
+
+        for symbol_type in self.DEF_SYMBOL_TYPES:
+            self.symbols[symbol_type] = {}
+
+    def store_type(self, symbol_type: str, symbol: str,
+                   ref_name: str = None, replace_underscores: bool = True):
+        """
+        Stores a new symbol in self.symbols under symbol_type.
+
+        By default, underscores are replaced by "-"
+        """
+        defs = self.DEF_SYMBOL_TYPES[symbol_type]
+
+        prefix = defs.get("prefix", "")
+        suffix = defs.get("suffix", "")
+        ref_type = defs.get("ref_type")
+
+        # Determine ref_link based on symbol type
+        if ref_type:
+            if symbol_type == "enum":
+                ref_link = f"{ref_type}:`{symbol}`"
+            else:
+                if not ref_name:
+                    ref_name = symbol.lower()
+
+                if replace_underscores:
+                    ref_name = ref_name.replace("_", "-")
+
+                ref_link = f"{ref_type}:`{symbol} <{ref_name}>`"
+        else:
+            ref_link = symbol
+
+        self.symbols[symbol_type][symbol] = f"{prefix}{ref_link}{suffix}"
+
+    def store_line(self, line):
+        """Appends a line to self.data, properly indented"""
+        line = "    " + line.expandtabs()
+        self.data += line.rstrip(" ")
+
+    def parse_file(self, file_in: str):
+        """Reads a C source file and gets its identifiers"""
+        self.data = ""
+        is_enum = False
+        is_comment = False
+        multiline = ""
+
+        with open(file_in, "r",
+                  encoding="utf-8", errors="backslashreplace") as f:
+            for line_no, line in enumerate(f):
+                self.store_line(line)
+                line = line.strip("\n")
+
+                # Handle continuation lines: drop the trailing backslash
+                if line.endswith("\\"):
+                    multiline += line[:-1]
+                    continue
+
+                if multiline:
+                    line = multiline + line
+                    multiline = ""
+
+                # Handle comments.
+                # They can span multiple lines.
+                if not is_comment:
+                    if re.search(r"/\*.*", line):
+                        is_comment = True
+                    else:
+                        # Strip C99-style comments
+                        line = re.sub(r"(//.*)", "", line)
+
+                if is_comment:
+                    if re.search(r".*\*/", line):
+                        is_comment = False
+                    else:
+                        multiline = line
+                        continue
+
+                # At this point, line may be a multi-line statement (lines
+                # ending with \ or with multi-line comments), so whole comments
+                # can be safely removed without re.DOTALL in the logic below.
+
+                line = re.sub(r"(/\*.*\*/)", "", line)
+                if not line.strip():
+                    continue
+
+                # It can be useful for debug purposes to print the file after
+                # having comments stripped and multi-lines grouped.
+                if self.debug > 1:
+                    print(f"line {line_no + 1}: {line}")
+
+                # Now the fun begins: parse each type and store it.
+
+                # We opted for a two-pass parsing logic here because:
+                # 1. it makes it easier to debug symbols that were not parsed;
+                # 2. we want symbol replacement in the entire content, not
+                #    just when the symbol is detected.
+
+                if is_enum:
+                    match = re.match(r"^\s*([_\w][\w\d_]+)\s*[\,=]?", line)
+                    if match:
+                        self.store_type("symbol", match.group(1))
+                    if "}" in line:
+                        is_enum = False
+                    continue
+
+                match = re.match(r"^\s*#\s*define\s+([\w_]+)\s+_IO", line)
+                if match:
+                    self.store_type("ioctl", match.group(1),
+                                    replace_underscores=False)
+                    continue
+
+                match = re.match(r"^\s*#\s*define\s+([\w_]+)(\s+|$)", line)
+                if match:
+                    self.store_type("define", match.group(1))
+                    continue
+
+                match = re.match(r"^\s*typedef\s+([_\w][\w\d_]+)\s+(.*)\s+([_\w][\w\d_]+);",
+                                 line)
+                if match:
+                    name = match.group(2).strip()
+                    symbol = match.group(3)
+                    self.store_type("typedef", symbol, ref_name=name,
+                                    replace_underscores=False)
+                    continue
+
+                for re_enum in self.RE_ENUMS:
+                    match = re_enum.match(line)
+                    if match:
+                        self.store_type("enum", match.group(1))
+                        is_enum = True
+                        break
+
+                for re_struct in self.RE_STRUCTS:
+                    match = re_struct.match(line)
+                    if match:
+                        self.store_type("struct", match.group(1),
+                                        replace_underscores=False)
+                        break
+
+    def process_exceptions(self, fname: str):
+        """
+        Process exceptions file with rules to ignore or replace references.
+        """
+        if not fname:
+            return
+
+        name = os.path.basename(fname)
+
+        with open(fname, "r", encoding="utf-8", errors="backslashreplace") as f:
+            for ln, line in enumerate(f):
+                ln += 1
+                line = line.strip()
+                if not line or line.startswith("#"):
+                    continue
+
+                # Handle ignore rules
+                match = re.match(r"^ignore\s+(\w+)\s+(\S+)", line)
+                if match:
+                    c_type = match.group(1)
+                    symbol = match.group(2)
+
+                    if c_type not in self.DEF_SYMBOL_TYPES:
+                        sys.exit(f"{name}:{ln}: {c_type} is invalid")
+
+                    d = self.symbols[c_type]
+                    if symbol in d:
+                        del d[symbol]
+
+                    continue
+
+                # Handle replace rules
+                match = re.match(r"^replace\s+(\S+)\s+(\S+)\s+(\S+)", line)
+                if not match:
+                    sys.exit(f"{name}:{ln}: invalid line: {line}")
+
+                c_type, old, new = match.groups()
+
+                if c_type not in self.DEF_SYMBOL_TYPES:
+                    sys.exit(f"{name}:{ln}: {c_type} is invalid")
+
+                reftype = None
+
+                # Parse reference type when the type is specified
+
+                match = re.match(r"^\:c\:(data|func|macro|type)\:\`(.+)\`", new)
+                if match:
+                    reftype = f":c:{match.group(1)}"
+                    new = match.group(2)
+                else:
+                    match = re.search(r"(\:ref)\:\`(.+)\`", new)
+                    if match:
+                        reftype = match.group(1)
+                        new = match.group(2)
+
+                # If the replacement rule doesn't have a type, get default
+                if not reftype:
+                    reftype = self.DEF_SYMBOL_TYPES[c_type].get("ref_type")
+                    if not reftype:
+                        reftype = self.DEF_SYMBOL_TYPES[c_type].get("real_type")
+
+                new_ref = f"{reftype}:`{old} <{new}>`"
+
+                # Change self.symbols to use the replacement rule
+                if old in self.symbols[c_type]:
+                    self.symbols[c_type][old] = new_ref
+                else:
+                    print(f"{name}:{ln}: Warning: can't find {old} {c_type}")
+
+    def debug_print(self):
+        """
+        Print debug information containing the replacement rules per symbol.
+        To make it easier to check, group them by type.
+        """
+        if not self.debug:
+            return
+
+        for c_type, refs in self.symbols.items():
+            if not refs:  # Skip empty dictionaries
+                continue
+
+            print(f"{c_type}:")
+
+            for symbol, ref in sorted(refs.items()):
+                print(f"  {symbol} -> {ref}")
+
+            print()
+
+    def write_output(self, file_in: str, file_out: str):
+        """Write the formatted output to a file."""
+
+        # Avoid extra blank lines
+        text = re.sub(r"\s+$", "", self.data) + "\n"
+        text = re.sub(r"\n\s+\n", "\n\n", text)
+
+        # Escape Sphinx special characters
+        text = re.sub(r"([\_\`\*\<\>\&\\\\:\/\|\%\$\#\{\}\~\^])", r"\\\1", text)
+
+        # Source uAPI files may have special notes. Use bold font for them
+        text = re.sub(r"DEPRECATED", "**DEPRECATED**", text)
+
+        # Delimiters to catch the entire symbol after escaping
+        start_delim = r"([ \n\t\(=\*\@])"
+        end_delim = r"(\s|,|\\=|\\:|\;|\)|\}|\{)"
+
+        # Process all reference types
+        for ref_dict in self.symbols.values():
+            for symbol, replacement in ref_dict.items():
+                symbol = re.escape(re.sub(r"([\_\`\*\<\>\&\\\\:\/])", r"\\\1", symbol))
+                text = re.sub(fr'{start_delim}{symbol}{end_delim}',
+                              fr'\1{replacement}\2', text)
+
+        # Remove "\ " where not needed: before spaces and at the end of lines
+        text = re.sub(r"\\ ([\n ])", r"\1", text)
+
+        title = os.path.basename(file_in)
+
+        with open(file_out, "w", encoding="utf-8", errors="backslashreplace") as f:
+            f.write(".. -*- coding: utf-8; mode: rst -*-\n\n")
+            f.write(f"{title}\n")
+            f.write("=" * len(title))
+            f.write("\n\n.. parsed-literal::\n\n")
+            f.write(text)
+
+
+def main():
+    """Main function"""
+    parser = argparse.ArgumentParser(description=__doc__,
+                                     formatter_class=argparse.RawDescriptionHelpFormatter)
+
+    parser.add_argument("-d", "--debug", action="count", default=0,
+                        help="Increase debug level. Can be used multiple times")
+    parser.add_argument("file_in", help="Input C file")
+    parser.add_argument("file_out", help="Output RST file")
+    parser.add_argument("file_exceptions", nargs="?",
+                        help="Exceptions file (optional)")
+
+    args = parser.parse_args()
+
+    parser = ParseHeader(debug=args.debug)
+    parser.parse_file(args.file_in)
+
+    if args.file_exceptions:
+        parser.process_exceptions(args.file_exceptions)
+
+    parser.debug_print()
+    parser.write_output(args.file_in, args.file_out)
+
+
+if __name__ == "__main__":
+    main()
-- 
2.50.1
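The reference strings that ParseHeader.store_type() builds can be illustrated with a minimal standalone sketch. It is not part of the patch: make_ref() is a hypothetical helper that mirrors store_type() for the case where a ref_type is set (which holds for every entry of DEF_SYMBOL_TYPES), and the symbols fed to it are taken from the docstring examples.

```python
# Per-type formatting, copied from ParseHeader.DEF_SYMBOL_TYPES in the patch
DEF_SYMBOL_TYPES = {
    "ioctl": {"prefix": "\\ ", "suffix": "\\ ", "ref_type": ":ref"},
    "define": {"prefix": "\\ ", "suffix": "\\ ", "ref_type": ":ref"},
    "symbol": {"prefix": "\\ ", "suffix": "\\ ", "ref_type": ":ref"},
    "typedef": {"prefix": "\\ ", "suffix": "\\ ", "ref_type": ":c:type"},
    "enum": {"prefix": "", "suffix": "\\ ", "ref_type": ":c:type"},
    "struct": {"prefix": "", "suffix": "\\ ", "ref_type": ":c:type"},
}


def make_ref(symbol_type, symbol, ref_name=None, replace_underscores=True):
    """Hypothetical helper: build a reference the way store_type() does."""
    defs = DEF_SYMBOL_TYPES[symbol_type]
    ref_type = defs["ref_type"]
    if symbol_type == "enum":
        # Enums use the short role form, with no explicit target
        ref_link = f"{ref_type}:`{symbol}`"
    else:
        # Default target: the lowercased symbol name...
        if not ref_name:
            ref_name = symbol.lower()
        # ...with "_" turned into "-" unless the caller disables it
        if replace_underscores:
            ref_name = ref_name.replace("_", "-")
        ref_link = f"{ref_type}:`{symbol} <{ref_name}>`"
    # In reStructuredText, "\ " is an escaped space that renders as nothing;
    # it lets the role sit right next to the surrounding characters.
    return f"{defs['prefix']}{ref_link}{defs['suffix']}"


# parse_file() stores defines with the default replace_underscores=True,
# but ioctls and structs with replace_underscores=False.
print(make_ref("define", "V4L2_EVENT_MD_FL_HAVE_FRAME_SEQ"))
print(make_ref("ioctl", "VIDIOC_DQBUF", replace_underscores=False))
print(make_ref("struct", "v4l2_event_motion_det", replace_underscores=False))
```

With these inputs, the define points to the label v4l2-event-md-fl-have-frame-seq, while the ioctl and the struct keep their underscores (vidioc_dqbuf and v4l2_event_motion_det).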
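The ignore and replace rules handled by process_exceptions() can be sketched the same way. apply_rule() and the symbol table below are hypothetical; the regular expressions follow the patch, but errors raise ValueError instead of calling sys.exit(), and the warning for symbols that are not found is omitted.

```python
import re

# Default role per symbol type, as in ParseHeader.DEF_SYMBOL_TYPES
DEFAULT_REF_TYPE = {
    "ioctl": ":ref", "define": ":ref", "symbol": ":ref",
    "typedef": ":c:type", "enum": ":c:type", "struct": ":c:type",
}


def apply_rule(symbols, line):
    """Hypothetical helper: apply one rule to a {type: {symbol: ref}} table."""
    line = line.strip()
    if not line or line.startswith("#"):
        return  # blank lines and comments are skipped

    match = re.match(r"^ignore\s+(\w+)\s+(\S+)", line)
    if match:
        c_type, symbol = match.groups()
        if c_type not in DEFAULT_REF_TYPE:
            raise ValueError(f"{c_type} is invalid")
        symbols[c_type].pop(symbol, None)  # drop the cross-reference
        return

    match = re.match(r"^replace\s+(\S+)\s+(\S+)\s+(\S+)", line)
    if not match:
        raise ValueError(f"invalid line: {line}")
    c_type, old, new = match.groups()
    if c_type not in DEFAULT_REF_TYPE:
        raise ValueError(f"{c_type} is invalid")

    # An explicit role in the rule (":c:type:`x`" or ":ref:`x`") wins
    reftype = None
    match = re.match(r"^:c:(data|func|macro|type):`(.+)`", new)
    if match:
        reftype, new = f":c:{match.group(1)}", match.group(2)
    else:
        match = re.search(r"(:ref):`(.+)`", new)
        if match:
            reftype, new = match.groups()
    if not reftype:
        reftype = DEFAULT_REF_TYPE[c_type]

    # Only symbols that were found in the header are rewritten
    if old in symbols[c_type]:
        symbols[c_type][old] = f"{reftype}:`{old} <{new}>`"


# Hypothetical table, shaped like ParseHeader.symbols after parse_file()
symbols = {t: {} for t in DEFAULT_REF_TYPE}
symbols["define"]["__LINUX_MEDIA_H"] = "\\ :ref:`__LINUX_MEDIA_H <--linux-media-h>`\\ "
symbols["define"]["V4L2_EVENT_MD_FL_HAVE_FRAME_SEQ"] = (
    "\\ :ref:`V4L2_EVENT_MD_FL_HAVE_FRAME_SEQ <v4l2-event-md-fl-have-frame-seq>`\\ ")
symbols["ioctl"]["VIDIOC_ENUM_FMT"] = "\\ :ref:`VIDIOC_ENUM_FMT <vidioc_enum_fmt>`\\ "
symbols["ioctl"]["VIDIOC_DQBUF"] = "\\ :ref:`VIDIOC_DQBUF <vidioc_dqbuf>`\\ "

rules = """
ignore define __LINUX_MEDIA_H
ignore ioctl VIDIOC_ENUM_FMT
replace ioctl VIDIOC_DQBUF vidioc_qbuf
replace define V4L2_EVENT_MD_FL_HAVE_FRAME_SEQ :c:type:`v4l2_event_motion_det`
"""
for rule in rules.splitlines():
    apply_rule(symbols, rule)
```

After the four rules from the docstring run, VIDIOC_ENUM_FMT and __LINUX_MEDIA_H are gone, VIDIOC_DQBUF points to the vidioc_qbuf label, and V4L2_EVENT_MD_FL_HAVE_FRAME_SEQ becomes a :c:type: reference to v4l2_event_motion_det. As in the patch, a replaced entry no longer carries the "\ " prefix and suffix from DEF_SYMBOL_TYPES.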
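The escaping and substitution done by write_output() can also be reduced to two small helpers. escape_rst() and link_symbol() are hypothetical names; the patterns are copied from the patch. One difference: write_output() removes the redundant "\ " sequences once, after all symbols have been substituted, whereas this sketch does it on every call.

```python
import re

# Patterns copied from ParseHeader.write_output() in the patch
SPECIAL = r"([\_\`\*\<\>\&\\\\:\/\|\%\$\#\{\}\~\^])"
START_DELIM = r"([ \n\t\(=\*\@])"
END_DELIM = r"(\s|,|\\=|\\:|\;|\)|\}|\{)"


def escape_rst(text):
    """Backslash-escape characters that RST would otherwise interpret."""
    return re.sub(SPECIAL, r"\\\1", text)


def link_symbol(text, symbol, replacement):
    """Swap one symbol in already-escaped text for its reference."""
    # Escape the symbol the same way the text was escaped, then make the
    # result safe to embed in a regular expression
    pattern = re.escape(re.sub(r"([\_\`\*\<\>\&\\\\:\/])", r"\\\1", symbol))
    text = re.sub(fr"{START_DELIM}{pattern}{END_DELIM}",
                  fr"\1{replacement}\2", text)
    # Drop "\ " when it is followed by a space or a newline
    return re.sub(r"\\ ([\n ])", r"\1", text)


line = escape_rst("struct v4l2_event_motion_det {\n")
ref = ":c:type:`v4l2_event_motion_det <v4l2_event_motion_det>`\\ "
print(link_symbol(line, "v4l2_event_motion_det", ref), end="")
```

The printed line is struct :c:type:`v4l2_event_motion_det <v4l2_event_motion_det>` \{ where the brace stays escaped inside the parsed-literal block. Because the delimiters require whitespace or punctuation on both sides, a symbol that is only a prefix of a longer identifier is not replaced.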