[PATCH v5 06/13] scripts: generate_rust_analyzer.py: add type hints

Tamir Duberstein posted 13 patches 10 months, 2 weeks ago
There is a newer version of this series
Posted by Tamir Duberstein 10 months, 2 weeks ago
Python type hints allow static analysis tools like mypy to detect type
errors during development, improving the developer experience.

Python type hints have been present in the kernel since 2019 at the
latest; see commit 6ebf5866f2e8 ("kunit: tool: add Python wrappers for
running KUnit tests").

Add a subclass of `argparse.Namespace` to get type checking on the CLI
arguments. Move parsing of `cfg` out of `generate_crates` to reduce the
number of variables in scope with `cfg` in their name. Use a defaultdict
to avoid `.get("key", [])`.

Run `mypy --strict scripts/generate_rust_analyzer.py --python-version
3.8` to verify. Note that `mypy` no longer supports Python < 3.8.

Tested-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
---
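For readers unfamiliar with the `argparse.Namespace` subclass mentioned in the commit message, a minimal standalone sketch of the idea follows. The argument names mirror a subset of the script's, but `parse` is a hypothetical helper written for this note only, and `exttree` is annotated `Optional` here on the assumption that an omitted `nargs="?"` positional is left as `None`.

```python
import argparse
import pathlib
from typing import List, Optional


class Args(argparse.Namespace):
    # Class-level annotations only; argparse fills in the values via setattr.
    verbose: bool
    cfgs: List[str]
    srctree: pathlib.Path
    exttree: Optional[pathlib.Path]


def parse(argv: List[str]) -> Args:
    """Hypothetical helper: parse `argv` into a typed namespace."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--verbose", "-v", action="store_true")
    parser.add_argument("--cfgs", action="append", default=[])
    parser.add_argument("srctree", type=pathlib.Path)
    parser.add_argument("exttree", type=pathlib.Path, nargs="?")
    # Passing an instance lets a type checker see `Args` instead of `Namespace`.
    return parser.parse_args(argv, namespace=Args())
```

With this in place, a typo such as `args.srctre` can be reported by `mypy` instead of surfacing as an `AttributeError` at run time.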
 scripts/generate_rust_analyzer.py | 166 +++++++++++++++++++++++++-------------
 1 file changed, 109 insertions(+), 57 deletions(-)

diff --git a/scripts/generate_rust_analyzer.py b/scripts/generate_rust_analyzer.py
index 80eb21c0d082..b37d8345486a 100755
--- a/scripts/generate_rust_analyzer.py
+++ b/scripts/generate_rust_analyzer.py
@@ -10,16 +10,48 @@ import os
 import pathlib
 import subprocess
 import sys
+from collections import defaultdict
+from typing import DefaultDict, Dict, Iterable, List, Literal, Optional, TypedDict
 
-def args_crates_cfgs(cfgs):
-    crates_cfgs = {}
-    for cfg in cfgs:
-        crate, vals = cfg.split("=", 1)
-        crates_cfgs[crate] = vals.replace("--cfg", "").split()
 
-    return crates_cfgs
+class Dependency(TypedDict):
+    crate: int
+    name: str
 
-def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs):
+
+class Source(TypedDict):
+    include_dirs: List[str]
+    exclude_dirs: List[str]
+
+
+class Crate(TypedDict):
+    display_name: str
+    root_module: str
+    is_workspace_member: bool
+    deps: List[Dependency]
+    cfg: List[str]
+    edition: Literal["2021"]
+    env: Dict[str, str]
+
+
+# `NotRequired` fields on `Crate` would be better but `NotRequired` was added in 3.11.
+class ProcMacroCrate(Crate):
+    is_proc_macro: Literal[True]
+    proc_macro_dylib_path: Optional[str]  # `pathlib.Path` is not JSON serializable.
+
+
+# `NotRequired` fields on `Crate` would be better but `NotRequired` was added in 3.11.
+class CrateWithGenerated(Crate):
+    source: Optional[Source]
+
+
+def generate_crates(
+    srctree: pathlib.Path,
+    objtree: pathlib.Path,
+    sysroot_src: pathlib.Path,
+    external_src: pathlib.Path,
+    crates_cfgs: DefaultDict[str, List[str]],
+) -> List[Crate]:
     # Generate the configuration list.
     cfg = []
     with open(objtree / "include" / "generated" / "rustc_cfg") as fd:
@@ -31,17 +63,16 @@ def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs):
     # Now fill the crates list -- dependencies need to come first.
     #
     # Avoid O(n^2) iterations by keeping a map of indexes.
-    crates = []
-    crates_indexes = {}
-    crates_cfgs = args_crates_cfgs(cfgs)
+    crates: List[Crate] = []
+    crates_indexes: Dict[str, int] = {}
 
     def build_crate(
-        display_name,
-        root_module,
-        deps,
-        cfg=[],
-        is_workspace_member=True,
-    ):
+        display_name: str,
+        root_module: pathlib.Path,
+        deps: List[str],
+        cfg: List[str] = [],
+        is_workspace_member: bool = True,
+    ) -> Crate:
         return {
             "display_name": display_name,
             "root_module": str(root_module),
@@ -51,36 +82,30 @@ def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs):
             "edition": "2021",
             "env": {
                 "RUST_MODFILE": "This is only for rust-analyzer"
-            }
+            },
         }
 
-    def register_crate(crate):
+    def register_crate(crate: Crate) -> None:
         crates_indexes[crate["display_name"]] = len(crates)
         crates.append(crate)
 
     def append_crate(
-        display_name,
-        root_module,
-        deps,
-        cfg=[],
-        is_workspace_member=True,
-    ):
+        display_name: str,
+        root_module: pathlib.Path,
+        deps: List[str],
+        cfg: List[str] = [],
+        is_workspace_member: bool = True,
+    ) -> None:
         register_crate(
-            build_crate(
-                display_name,
-                root_module,
-                deps,
-                cfg,
-                is_workspace_member,
-            )
+            build_crate(display_name, root_module, deps, cfg, is_workspace_member)
         )
 
     def append_proc_macro_crate(
-        display_name,
-        root_module,
-        deps,
-        cfg=[],
-    ):
+        display_name: str,
+        root_module: pathlib.Path,
+        deps: List[str],
+        cfg: List[str] = [],
+    ) -> None:
         crate = build_crate(display_name, root_module, deps, cfg)
         proc_macro_dylib_name = (
             subprocess.check_output(
@@ -99,7 +124,7 @@ def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs):
             .decode("utf-8")
             .strip()
         )
-        proc_macro_crate = {
+        proc_macro_crate: ProcMacroCrate = {
             **crate,
             "is_proc_macro": True,
             "proc_macro_dylib_path": f"{objtree}/rust/{proc_macro_dylib_name}",
@@ -107,10 +132,10 @@ def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs):
         register_crate(proc_macro_crate)
 
     def append_sysroot_crate(
-        display_name,
-        deps,
-        cfg=[],
-    ):
+        display_name: str,
+        deps: List[str],
+        cfg: List[str] = [],
+    ) -> None:
         append_crate(
             display_name,
             sysroot_src / display_name / "src" / "lib.rs",
@@ -122,7 +147,7 @@ def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs):
     # NB: sysroot crates reexport items from one another so setting up our transitive dependencies
     # here is important for ensuring that rust-analyzer can resolve symbols. The sources of truth
     # for this dependency graph are `(sysroot_src / crate / "Cargo.toml" for crate in crates)`.
-    append_sysroot_crate("core", [], cfg=crates_cfgs.get("core", []))
+    append_sysroot_crate("core", [], cfg=crates_cfgs["core"])
     append_sysroot_crate("alloc", ["core"])
     append_sysroot_crate("std", ["alloc", "core"])
     append_sysroot_crate("proc_macro", ["core", "std"])
@@ -160,9 +185,9 @@ def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs):
     )
 
     def append_crate_with_generated(
-        display_name,
-        deps,
-    ):
+        display_name: str,
+        deps: List[str],
+    ) -> None:
         crate = build_crate(
             display_name,
             srctree / "rust" / display_name / "lib.rs",
@@ -170,20 +195,23 @@ def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs):
             cfg=cfg,
         )
         crate["env"]["OBJTREE"] = str(objtree.resolve(True))
-        crate["source"] = {
-            "include_dirs": [
-                str(srctree / "rust" / display_name),
-                str(objtree / "rust")
-            ],
-            "exclude_dirs": [],
+        crate_with_generated: CrateWithGenerated = {
+            **crate,
+            "source": {
+                "include_dirs": [
+                    str(srctree / "rust" / display_name),
+                    str(objtree / "rust")
+                ],
+                "exclude_dirs": [],
+            }
         }
-        register_crate(crate)
+        register_crate(crate_with_generated)
 
     append_crate_with_generated("bindings", ["core"])
     append_crate_with_generated("uapi", ["core"])
     append_crate_with_generated("kernel", ["core", "macros", "build_error", "bindings", "pin_init", "uapi"])
 
-    def is_root_crate(build_file, target):
+    def is_root_crate(build_file: pathlib.Path, target: str) -> bool:
         try:
             return f"{target}.o" in open(build_file).read()
         except FileNotFoundError:
@@ -192,7 +220,9 @@ def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs):
     # Then, the rest outside of `rust/`.
     #
     # We explicitly mention the top-level folders we want to cover.
-    extra_dirs = map(lambda dir: srctree / dir, ("samples", "drivers"))
+    extra_dirs: Iterable[pathlib.Path] = map(
+        lambda dir: srctree / dir, ("samples", "drivers")
+    )
     if external_src is not None:
         extra_dirs = [external_src]
     for folder in extra_dirs:
@@ -216,7 +246,7 @@ def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs):
     return crates
 
 
-def main():
+def main() -> None:
     parser = argparse.ArgumentParser()
     parser.add_argument("--verbose", "-v", action="store_true")
     parser.add_argument("--cfgs", action="append", default=[])
@@ -225,7 +255,17 @@ def main():
     parser.add_argument("sysroot", type=pathlib.Path)
     parser.add_argument("sysroot_src", type=pathlib.Path)
     parser.add_argument("exttree", type=pathlib.Path, nargs="?")
-    args = parser.parse_args()
+
+    class Args(argparse.Namespace):
+        verbose: bool
+        cfgs: List[str]
+        srctree: pathlib.Path
+        objtree: pathlib.Path
+        sysroot: pathlib.Path
+        sysroot_src: pathlib.Path
+        exttree: pathlib.Path
+
+    args = parser.parse_args(namespace=Args())
 
     logging.basicConfig(
         format="[%(asctime)s] [%(levelname)s] %(message)s",
@@ -236,7 +276,19 @@ def main():
     assert args.sysroot in args.sysroot_src.parents
 
     rust_project = {
-        "crates": generate_crates(args.srctree, args.objtree, args.sysroot_src, args.exttree, args.cfgs),
+        "crates": generate_crates(
+            args.srctree,
+            args.objtree,
+            args.sysroot_src,
+            args.exttree,
+            defaultdict(
+                list,
+                {
+                    crate: vals.lstrip("--cfg").split()
+                    for crate, vals in map(lambda cfg: cfg.split("=", 1), args.cfgs)
+                },
+            ),
+        ),
         "sysroot": str(args.sysroot),
     }
 

-- 
2.49.0
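
One detail worth checking when comparing the removed helper with its replacement: `str.lstrip` takes a set of characters to strip from the left, not a prefix, whereas the removed `args_crates_cfgs()` used `str.replace`. The sketch below contrasts the two on a made-up input. The function names and the sample string are hypothetical and not part of the patch.

```python
from collections import defaultdict
from typing import DefaultDict, List


def parse_with_replace(cfgs: List[str]) -> DefaultDict[str, List[str]]:
    """Mirrors the removed helper: drop every `--cfg` token."""
    parsed: DefaultDict[str, List[str]] = defaultdict(list)
    for cfg in cfgs:
        crate, vals = cfg.split("=", 1)
        parsed[crate] = vals.replace("--cfg", "").split()
    return parsed


def parse_with_lstrip(cfgs: List[str]) -> DefaultDict[str, List[str]]:
    """Mirrors the new expression: strip leading '-', 'c', 'f', 'g' characters."""
    parsed: DefaultDict[str, List[str]] = defaultdict(list)
    for cfg in cfgs:
        crate, vals = cfg.split("=", 1)
        parsed[crate] = vals.lstrip("--cfg").split()
    return parsed


sample = ["core=--cfg no_fp_fmt_parse --cfg feature_x"]
print(parse_with_replace(sample)["core"])   # ['no_fp_fmt_parse', 'feature_x']
print(parse_with_lstrip(sample)["core"])    # ['no_fp_fmt_parse', '--cfg', 'feature_x']
print(parse_with_replace(sample)["alloc"])  # [] (supplied by the defaultdict)
```

For a value with a single leading `--cfg` the two forms agree, so whether the difference matters depends on what the build system actually passes via `--cfgs`.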
Re: [PATCH v5 06/13] scripts: generate_rust_analyzer.py: add type hints
Posted by Trevor Gross 9 months, 3 weeks ago
On Tue, Mar 25, 2025 at 3:07 PM Tamir Duberstein <tamird@gmail.com> wrote:
>
> Python type hints allow static analysis tools like mypy to detect type
> errors during development, improving the developer experience.
>
> Python type hints have been present in the kernel since 2019 at the
> latest; see commit 6ebf5866f2e8 ("kunit: tool: add Python wrappers for
> running KUnit tests").
>
> Add a subclass of `argparse.Namespace` to get type checking on the CLI
> arguments. Move parsing of `cfg` out of `generate_crates` to reduce the
> number of variables in scope with `cfg` in their name. Use a defaultdict
> to avoid `.get("key", [])`.
>
> Run `mypy --strict scripts/generate_rust_analyzer.py --python-version
> 3.8` to verify. Note that `mypy` no longer supports Python < 3.8.
>
> Tested-by: Daniel Almeida <daniel.almeida@collabora.com>
> Signed-off-by: Tamir Duberstein <tamird@gmail.com>
> ---
>  scripts/generate_rust_analyzer.py | 166 +++++++++++++++++++++++++-------------
>  1 file changed, 109 insertions(+), 57 deletions(-)

> +                {
> +                    crate: vals.lstrip("--cfg").split()
> +                    for crate, vals in map(lambda cfg: cfg.split("=", 1), args.cfgs)
> +                },

Tiny nit, only if you wind up touching this again: generators or
comprehensions are a bit more canonical than `map` and `filter`, e.g.

     for crate, _, vals in (cfg.partition("=") for cfg in args.cfgs)

The rest looks good to me, with or without that.

Reviewed-by: Trevor Gross <tmgross@umich.edu>
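
Spelled out in full, the generator-based form suggested above might look like the sketch below. `parse_cfgs` is a hypothetical name used only for illustration, and the `replace` call follows the removed `args_crates_cfgs()` helper.

```python
from collections import defaultdict
from typing import DefaultDict, List


def parse_cfgs(cfgs: List[str]) -> DefaultDict[str, List[str]]:
    """Hypothetical helper: build the crate -> cfg list map with a comprehension."""
    return defaultdict(
        list,
        {
            crate: vals.replace("--cfg", "").split()
            # partition() always yields a 3-tuple, so the unpacking cannot
            # fail even when an entry has no "=".
            for crate, _, vals in (cfg.partition("=") for cfg in cfgs)
        },
    )
```

Unlike `cfg.split("=", 1)`, which returns a one-element list for an entry without `=` and therefore raises `ValueError` on unpacking, `partition` maps such an entry to an empty cfg list.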
Re: [PATCH v5 06/13] scripts: generate_rust_analyzer.py: add type hints
Posted by Daniel Almeida 10 months, 1 week ago
Hi Tamir,

[snip]

>     rust_project = {
> -        "crates": generate_crates(args.srctree, args.objtree, args.sysroot_src, args.exttree, args.cfgs),
> +        "crates": generate_crates(
> +            args.srctree,
> +            args.objtree,
> +            args.sysroot_src,
> +            args.exttree,
> +            defaultdict(
> +                list,
> +                {
> +                    crate: vals.lstrip("--cfg").split()
> +                    for crate, vals in map(lambda cfg: cfg.split("=", 1), args.cfgs)
> +                },
> +            ),
> +        ),
>         "sysroot": str(args.sysroot),
>     }
> 
> 
> -- 
> 2.49.0
> 

I found `args_crates_cfgs()` a lot easier to understand, but I guess this is a
matter of taste. I also find that this `defaultdict()` call slightly pollutes
the surrounding code, but again, that might be just me.

Regardless, running `mypy` still passes, and there is no change to the output.

Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>

— Daniel
Re: [PATCH v5 06/13] scripts: generate_rust_analyzer.py: add type hints
Posted by Tamir Duberstein 10 months, 1 week ago
On Mon, Mar 31, 2025 at 1:09 PM Daniel Almeida
<daniel.almeida@collabora.com> wrote:
>
> Hi Tamir,
>
> [snip]
>
> >     rust_project = {
> > -        "crates": generate_crates(args.srctree, args.objtree, args.sysroot_src, args.exttree, args.cfgs),
> > +        "crates": generate_crates(
> > +            args.srctree,
> > +            args.objtree,
> > +            args.sysroot_src,
> > +            args.exttree,
> > +            defaultdict(
> > +                list,
> > +                {
> > +                    crate: vals.lstrip("--cfg").split()
> > +                    for crate, vals in map(lambda cfg: cfg.split("=", 1), args.cfgs)
> > +                },
> > +            ),
> > +        ),
> >         "sysroot": str(args.sysroot),
> >     }
> >
> >
> > --
> > 2.49.0
> >
>
> I found `args_crates_cfgs()` a lot easier to understand, but I guess this is a
> matter of taste. I also find that this `defaultdict()` call slightly pollutes
> the surrounding code, but again, that might be just me.

Would extracting a local variable suffice?

> Regardless, running `mypy` still passes, and there is no change to the output.
>
> Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>

Thanks!
Re: [PATCH v5 06/13] scripts: generate_rust_analyzer.py: add type hints
Posted by Daniel Almeida 10 months, 1 week ago
Hi Tamir,

>> 
>> I found `args_crates_cfgs()` a lot easier to understand, but I guess this is a
>> matter of taste. I also find that this `defaultdict()` call slightly pollutes
>> the surrounding code, but again, that might be just me.
> 
> Would extracting a local variable suffice?
> 

Yeah, that would make it less crowded.

— Daniel
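
The local-variable extraction agreed on above could take roughly the following shape. This is only a sketch: `build_rust_project` and its parameters are hypothetical stand-ins for the surrounding `main()` code, and `generate_crates` is replaced by a stub so that the snippet runs on its own.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List


def generate_crates(crates_cfgs: DefaultDict[str, List[str]]) -> List[Dict[str, Any]]:
    """Stub standing in for the real function; it only echoes the cfgs for `core`."""
    return [{"display_name": "core", "cfg": crates_cfgs["core"]}]


def build_rust_project(cfgs: List[str], sysroot: str) -> Dict[str, Any]:
    # Parse the cfgs into a named local first so the call below stays short.
    crates_cfgs: DefaultDict[str, List[str]] = defaultdict(list)
    for cfg in cfgs:
        crate, vals = cfg.split("=", 1)
        crates_cfgs[crate] = vals.replace("--cfg", "").split()

    return {
        "crates": generate_crates(crates_cfgs),
        "sysroot": sysroot,
    }
```

Keeping the parsing in a named local leaves the `rust_project` literal as short as it was before the patch while still handing `generate_crates` an already-parsed mapping.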