[libvirt PATCH v2 1/3] scripts: Check spelling

Tim Wiederhake posted 3 patches 4 years ago
There is a newer version of this series
[libvirt PATCH v2 1/3] scripts: Check spelling
Posted by Tim Wiederhake 4 years ago
This is a wrapper for codespell [1], a spell checker for source code.
Codespell does not compare words to a dictionary, but rather works by
checking words against a list of common typos, making it produce fewer
false positives than other solutions.

The script in this patch works around the lack of per-directory ignore
lists and some oddities regarding capitalization in ignore lists.

[1] (https://github.com/codespell-project/codespell/)

Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
---
 scripts/check-spelling.py | 119 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)
 create mode 100755 scripts/check-spelling.py

diff --git a/scripts/check-spelling.py b/scripts/check-spelling.py
new file mode 100755
index 0000000000..0480a506e8
--- /dev/null
+++ b/scripts/check-spelling.py
@@ -0,0 +1,119 @@
+#!/usr/bin/env python3
+
+import argparse
+import re
+import subprocess
+import os
+
+
+IGNORE_LIST = [
+    # ignore all translation files
+    ("/po/", []),
+
+    # ignore all git files
+    ("/.git/", []),
+
+    # ignore this script
+    ("/scripts/check-spelling.py", []),
+
+    # 3rd-party: keycodemapdb
+    ("/src/keycodemapdb/", []),
+
+    # 3rd-party: VirtualBox SDK
+    ("/src/vbox/vbox_CAPI", []),
+
+    # 3rd-party: qemu
+    ("/tests/qemucapabilitiesdata/caps_", []),
+
+    # other
+    ("/", ["msdos", "MSDOS", "wan", "WAN", "hda", "HDA", "inout"]),
+    ("/NEWS.rst", "crashers"),
+    ("/docs/gitdm/companies/others", "Archiv"),
+    ("/docs/glib-adoption.rst", "preferrable"),
+    ("/docs/js/main.js", "whats"),
+    ("/examples/polkit/libvirt-acl.rules", ["userA", "userB", "userC"]),
+    ("/src/libvirt-domain.c", "PTD"),
+    ("/src/libxl/libxl_logger.c", "purposedly"),
+    ("/src/nwfilter/nwfilter_dhcpsnoop.c", "ether"),
+    ("/src/nwfilter/nwfilter_ebiptables_driver.c", "parm"),
+    ("/src/nwfilter/nwfilter_learnipaddr.c", "ether"),
+    ("/src/qemu/qemu_agent.c", "crypted"),
+    ("/src/qemu/qemu_agent.h", "crypted"),
+    ("/src/qemu/qemu_process.c", "wee"),
+    ("/src/security/apparmor/libvirt-lxc", "devic"),
+    ("/src/security/apparmor/libvirt-qemu", "readby"),
+    ("/src/storage_file/storage_file_probe.c", "conectix"),
+    ("/src/util/virnetdevmacvlan.c", "calld"),
+    ("/src/util/virtpm.c", "parm"),
+    ("/tests/qemuagenttest.c", "IST"),
+    ("/tests/storagepoolxml2xml", "cant"),
+    ("/tests/sysinfodata/", "sie"),
+    ("/tests/testutils.c", "nIn"),
+    ("/tests/vircgroupdata/ovirt-node-6.6.mounts", "hald"),
+    ("/tests/virhostcpudata/", "sie"),
+    ("/tools/virt-host-validate-common.c", "sie"),
+]
+
+
+def ignore(filename, linenumber, word, suggestion):
+    if len(word) <= 2:
+        return True
+
+    for f, w in IGNORE_LIST:
+        if not filename.startswith(f):
+            continue
+        if word in w or not w:
+            return True
+    return False
+
+
+def main():
+    line_pattern = re.compile("^(.*):(.*): (.*) ==> (.*)$")
+    output_template = "(\"{0}\", \"{2}\"),\t# line {1}, \"{3}\"?"
+
+    parser = argparse.ArgumentParser(description="Check spelling")
+    parser.add_argument(
+        "dir",
+        help="Path to source directory. "
+        "Defaults to parent directory of this script",
+        type=os.path.realpath,
+        nargs='?')
+    args = parser.parse_args()
+
+    if not args.dir:
+        args.dir = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
+
+    process = subprocess.run(
+        ["codespell", args.dir],
+        stdout=subprocess.PIPE,
+        stderr=subprocess.PIPE,
+        universal_newlines=True)
+
+    if process.returncode not in (0, 65):
+        exit("error: unexpected returncode %s" % process.returncode)
+
+    if process.stderr:
+        exit("error: unexpected output to stderr: \"%s\"" % process.stderr)
+
+    findings = 0
+    for line in process.stdout.split("\n"):
+        line = line.strip().replace(args.dir, "")
+        if not line:
+            continue
+
+        match = line_pattern.match(line)
+        if not match:
+            exit("error: unexpected line: \"%s\"" % line)
+
+        if ignore(*match.groups()):
+            continue
+
+        print(output_template.format(*match.groups()))
+        findings += 1
+
+    if findings:
+        exit("error: %s spelling errors" % findings)
+
+
+if __name__ == "__main__":
+    main()
-- 
2.31.1

Re: [libvirt PATCH v2 1/3] scripts: Check spelling
Posted by Andrea Bolognani 4 years ago
On Mon, Jan 10, 2022 at 04:41:25PM +0100, Tim Wiederhake wrote:
> +    ("/docs/glib-adoption.rst", "preferrable"),

This is an actual typo, isn't it?

> +    ("/docs/js/main.js", "whats"),
> +    ("/src/libxl/libxl_logger.c", "purposedly"),
> +    ("/src/qemu/qemu_process.c", "wee"),
> +    ("/tests/storagepoolxml2xml", "cant"),

These are a few cases where I feel that rewording the existing
comment or piece of code, even if it wouldn't strictly speaking count
as fixing a typo, would still be preferable to having to list it as
an exception.

> +    ("/src/util/virnetdevmacvlan.c", "calld"),

Same for this one, but I appreciate that others might consider
renaming the variable as unnecessary churn and not worth the effort.

> +    ("/src/security/apparmor/libvirt-lxc", "devic"),

Looking at the context where this appears:

  deny /sys/d[^e]*{,/**} wklx,
  deny /sys/de[^v]*{,/**} wklx,
  deny /sys/dev[^i]*{,/**} wklx,
  deny /sys/devi[^c]*{,/**} wklx,
  deny /sys/devic[^e]*{,/**} wklx,
  deny /sys/device[^s]*{,/**} wklx,
  deny /sys/devices/[^v]*{,/**} wklx,
  deny /sys/devices/v[^i]*{,/**} wklx,
  deny /sys/devices/vi[^r]*{,/**} wklx,
  deny /sys/devices/vir[^t]*{,/**} wklx,
  deny /sys/devices/virt[^u]*{,/**} wklx,
  deny /sys/devices/virtu[^a]*{,/**} wklx,
  deny /sys/devices/virtua[^l]*{,/**} wklx,
  deny /sys/devices/virtual/[^n]*{,/**} wklx,
  deny /sys/devices/virtual/n[^e]*{,/**} wklx,
  deny /sys/devices/virtual/ne[^t]*{,/**} wklx,
  deny /sys/devices/virtual/net?*{,/**} wklx,
  deny /sys/devices/virtual?*{,/**} wklx,
  deny /sys/devices?*{,/**} wklx,

I mean, I don't speak AppArmor but this can't be right, can it? :D

Jim, do you think we actually need such a slippery slope of deny
rules, or can we simplify things a bit?

> +    ("/src/security/apparmor/libvirt-qemu", "readby"),

This should probably be made to apply to all libvirt-* files in that
directory, as it's apparently part of the format specification.

> +    ("/tests/vircgroupdata/ovirt-node-6.6.mounts", "hald"),

In this case I think it's perfectly fine to just drop the offending
line and move on.

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [libvirt PATCH v2 1/3] scripts: Check spelling
Posted by Jim Fehlig 4 years ago
On 1/10/22 11:21, Andrea Bolognani wrote:
> On Mon, Jan 10, 2022 at 04:41:25PM +0100, Tim Wiederhake wrote:
>> +    ("/docs/glib-adoption.rst", "preferrable"),
> 
> This is an actual typo, isn't it?
> 
>> +    ("/docs/js/main.js", "whats"),
>> +    ("/src/libxl/libxl_logger.c", "purposedly"),
>> +    ("/src/qemu/qemu_process.c", "wee"),
>> +    ("/tests/storagepoolxml2xml", "cant"),
> 
> These are a few cases where I feel that rewording the existing
> comment or piece of code, even if it wouldn't strictly speaking count
> as fixing a typo, would still be preferable to having to list it as
> an exception.
> 
>> +    ("/src/util/virnetdevmacvlan.c", "calld"),
> 
> Same for this one, but I appreciate that others might consider
> renaming the variable as unnecessary churn and not worth the effort.
> 
>> +    ("/src/security/apparmor/libvirt-lxc", "devic"),
> 
> Looking at the context where this appears:
> 
>    deny /sys/d[^e]*{,/**} wklx,
>    deny /sys/de[^v]*{,/**} wklx,
>    deny /sys/dev[^i]*{,/**} wklx,
>    deny /sys/devi[^c]*{,/**} wklx,
>    deny /sys/devic[^e]*{,/**} wklx,
>    deny /sys/device[^s]*{,/**} wklx,
>    deny /sys/devices/[^v]*{,/**} wklx,
>    deny /sys/devices/v[^i]*{,/**} wklx,
>    deny /sys/devices/vi[^r]*{,/**} wklx,
>    deny /sys/devices/vir[^t]*{,/**} wklx,
>    deny /sys/devices/virt[^u]*{,/**} wklx,
>    deny /sys/devices/virtu[^a]*{,/**} wklx,
>    deny /sys/devices/virtua[^l]*{,/**} wklx,
>    deny /sys/devices/virtual/[^n]*{,/**} wklx,
>    deny /sys/devices/virtual/n[^e]*{,/**} wklx,
>    deny /sys/devices/virtual/ne[^t]*{,/**} wklx,
>    deny /sys/devices/virtual/net?*{,/**} wklx,
>    deny /sys/devices/virtual?*{,/**} wklx,
>    deny /sys/devices?*{,/**} wklx,
> 
> I mean, I don't speak AppArmor but this can't be right, can it? :D

It's valid apparmor. At least the apparmor parser doesn't complain :-). ISTM the 
last rule should cover the others.

> Jim, do you think we actually need such a slippery slope of deny
> rules, or can we simplify things a bit?

I don't know why all of these deny rules are defined in this manner. /sys/class, 
/proc/sys/kernel, and others are defined similarly. They were added by Cedric in 
commit 9265f8ab67d. Cedric, do you recall the purpose of defining the rules in 
this way?

>> +    ("/src/security/apparmor/libvirt-qemu", "readby"),
> 
> This should probably be made to apply to all libvirt-* files in that
> directory, as it's apparently part of the format specification.

Agree, it's a valid permissions value

https://gitlab.com/apparmor/apparmor/-/wikis/TechnicalDoc_Proc_and_ptrace

Regards,
Jim

Re: [libvirt PATCH v2 1/3] scripts: Check spelling
Posted by Andrea Bolognani 4 years ago
On Mon, Jan 10, 2022 at 03:58:55PM -0700, Jim Fehlig wrote:
> On 1/10/22 11:21, Andrea Bolognani wrote:
> > On Mon, Jan 10, 2022 at 04:41:25PM +0100, Tim Wiederhake wrote:
> > > +    ("/src/security/apparmor/libvirt-lxc", "devic"),
> >
> > Looking at the context where this appears:
> >
> >    deny /sys/d[^e]*{,/**} wklx,
> >    deny /sys/de[^v]*{,/**} wklx,
> >    deny /sys/dev[^i]*{,/**} wklx,
> >    deny /sys/devi[^c]*{,/**} wklx,
> >    deny /sys/devic[^e]*{,/**} wklx,
> >    deny /sys/device[^s]*{,/**} wklx,
> >    deny /sys/devices/[^v]*{,/**} wklx,
> >    deny /sys/devices/v[^i]*{,/**} wklx,
> >    deny /sys/devices/vi[^r]*{,/**} wklx,
> >    deny /sys/devices/vir[^t]*{,/**} wklx,
> >    deny /sys/devices/virt[^u]*{,/**} wklx,
> >    deny /sys/devices/virtu[^a]*{,/**} wklx,
> >    deny /sys/devices/virtua[^l]*{,/**} wklx,
> >    deny /sys/devices/virtual/[^n]*{,/**} wklx,
> >    deny /sys/devices/virtual/n[^e]*{,/**} wklx,
> >    deny /sys/devices/virtual/ne[^t]*{,/**} wklx,
> >    deny /sys/devices/virtual/net?*{,/**} wklx,
> >    deny /sys/devices/virtual?*{,/**} wklx,
> >    deny /sys/devices?*{,/**} wklx,
> >
> > I mean, I don't speak AppArmor but this can't be right, can it? :D
>
> It's valid apparmor. At least the apparmor parser doesn't complain :-). ISTM
> the last rule should cover the others.

I was not really suggesting that it was not a valid configuration,
it's just that looking at it immediately triggered a "that can't be
the best way to do it" reaction in me ;)

> > Jim, do you think we actually need such a slippery slope of deny
> > rules, or can we simplify things a bit?
>
> I don't know why all of these deny rules are defined in this manner.
> /sys/class, /proc/sys/kernel, and others are defined similarly. They were
> added by Cedric in commit 9265f8ab67d. Cedric, do you recall the purpose of
> defining the rules in this way?

The script that generated those rules is

  https://github.com/lxc/lxc/blob/master/config/apparmor/lxc-generate-aa-rules.py

and that's apparently its intended behavior. So there has to be a
reason why it's done this way, right? I just have no idea what it
could possibly be.

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [libvirt PATCH v2 1/3] scripts: Check spelling
Posted by Peter Krempa 4 years ago
On Mon, Jan 10, 2022 at 16:41:25 +0100, Tim Wiederhake wrote:
> This is a wrapper for codespell [1], a spell checker for source code.
> Codespell does not compare words to a dictionary, but rather works by
> checking words against a list of common typos, making it produce fewer
> false positives than other solutions.
> 
> The script in this patch works around the lack of per-directory ignore
> lists and some oddities regarding capitalization in ignore lists.
> 
> [1] (https://github.com/codespell-project/codespell/)
> 
> Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
> ---
>  scripts/check-spelling.py | 119 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 119 insertions(+)
>  create mode 100755 scripts/check-spelling.py



> 
> diff --git a/scripts/check-spelling.py b/scripts/check-spelling.py
> new file mode 100755
> index 0000000000..0480a506e8
> --- /dev/null
> +++ b/scripts/check-spelling.py
> @@ -0,0 +1,119 @@
> +#!/usr/bin/env python3
> +
> +import argparse
> +import re
> +import subprocess
> +import os
> +
> +
> +IGNORE_LIST = [
> +    # ignore all translation files
> +    ("/po/", []),
> +
> +    # ignore all git files
> +    ("/.git/", []),

IMO this should be working on an equivalent of 'git ls-files' rather
than excluding files which _might_ interfere as if you run it in a
not-so-clean checkout it also looks for files which are not part of the
repo:

$ ./scripts/check-spelling.py
("/tags", "aLocation"),	# line 10516, "allocation"?
("/tags", "aLocation"),	# line 10517, "allocation"?
("/tags", "aLocation"),	# line 10518, "allocation"?
("/tags", "aLocation"),	# line 10521, "allocation"?
("/tags", "aLocation"),	# line 10522, "allocation"?
("/tags", "aLocation"),	# line 10523, "allocation"?
("/tags", "aLocation"),	# line 10526, "allocation"?
("/tags", "aLocation"),	# line 10527, "allocation"?
("/tags", "aLocation"),	# line 10528, "allocation"?
("/tags", "independant"),	# line 48256, "independent"?
("/tags", "parm"),	# line 60938, "param, pram, parma"?
("/tags", "parm"),	# line 60938, "param, pram, parma"?
("/tags", "calld"),	# line 80099, "called"?
("/tests/qemucapabilitiesnumbering.c", "occurence"),	# line 60, "occurrence"?
("/tests/virstoragetest.c.orig", "folowing"),	# line 103, "following"?
("/tests/qemucapabilitiestest.c", "programatic"),	# line 216, "programmatic"?
error: 16 spelling errors


'tags' is a file generated by 'ctags' and "/tests/virstoragetest.c.orig"
is an artifact of an editor.

Also since the script is lacking integration with meson and is also
missing from documentation it's not applied to local builds, only on
stuff that gets CI'd. Catching these early will hopefully prevent a few
pointless CI runs just to fix typos.