From: Mauro Carvalho Chehab
To: "Jonathan Corbet", Linux Doc Mailing List
Cc: Mauro Carvalho Chehab, "Mauro Carvalho Chehab", linux-kernel@vger.kernel.org, linux-media@vger.kernel.org
Subject: [PATCH 13/23] docs: kernel_include.py: use get_close_matches() to propose alternatives
Date: Wed, 1 Oct 2025 16:49:36 +0200
Message-ID: <7365feb74cbdd6b982c87baf5863360ab98cf727.1759329363.git.mchehab+huawei@kernel.org>

Improve the suggestion algorithm by falling back to get_close_matches()
when no reference with the same name is found.

As we're now building a dict, looking up a name that is identical but
lives in a different domain is O(1), which makes it a lot faster.
get_close_matches() is also fast, as there is just one loop instead
of three.

This can be useful to detect typos in references, and could be the
basis of a future extension that handles unmatched refs for the
entire build, allowing someone to find typos and fix them.

As difflib and get_close_matches() have been available since the early
Python 3.x days, we don't need to handle any extra dependencies to use
them.

We're keeping the default values for the search, i.e. n=3, cutoff=0.6.

With that, we now have things like:

  $ make SPHINXDIRS="userspace-api/media" htmldocs
  ...
  include/uapi/linux/videodev2.h:199: WARNING: Invalid xref: c:type:`v4l2_memory`.
    Possible alternatives:
        c:type:`v4l2_meta_format` (from v4l/dev-meta)
        c:type:`v4l2_rect` (from v4l/dev-overlay)
        c:type:`v4l2_area` (from v4l/ext-ctrls-image-source) [ref.missing]
  ...
  include/uapi/linux/videodev2.h:1985: WARNING: Invalid xref: c:type:`V4L.v4l2_queryctrl`.
    Possible alternatives:
        std:label:`v4l2-queryctrl` (from v4l/vidioc-queryctrl)
        std:label:`v4l2-query-ext-ctrl` (from v4l/vidioc-queryctrl)

In the first example, it was not a typo, but a symbol that doesn't
seem to be properly documented. The second example points to
v4l2-queryctrl, which is a close match for the symbol.

Signed-off-by: Mauro Carvalho Chehab
---
 Documentation/sphinx/kernel_include.py | 62 +++++++++++++-------------
 1 file changed, 30 insertions(+), 32 deletions(-)

diff --git a/Documentation/sphinx/kernel_include.py b/Documentation/sphinx/kernel_include.py
index 895646da7495..75e139287d50 100755
--- a/Documentation/sphinx/kernel_include.py
+++ b/Documentation/sphinx/kernel_include.py
@@ -87,6 +87,8 @@ import os.path
 import re
 import sys
 
+from difflib import get_close_matches
+
 from docutils import io, nodes, statemachine
 from docutils.statemachine import ViewList
 from docutils.parsers.rst import Directive, directives
@@ -401,8 +403,8 @@ class KernelInclude(Directive):
 # ==============================================================================
 
 reported = set()
-
 DOMAIN_INFO = {}
+all_refs = {}
 
 def fill_domain_info(env):
     """
@@ -419,47 +421,43 @@ def fill_domain_info(env):
             # Ignore domains that we can't retrieve object types, if any
             pass
 
+    for domain in DOMAIN_INFO.keys():
+        domain_obj = env.get_domain(domain)
+        for name, dispname, objtype, docname, anchor, priority in domain_obj.get_objects():
+            ref_name = name.lower()
+
+            if domain == "c":
+                if '.' in ref_name:
+                    ref_name = ref_name.split(".")[-1]
+
+            if not ref_name in all_refs:
+                all_refs[ref_name] = []
+
+            all_refs[ref_name].append(f"\t{domain}:{objtype}:`{name}` (from {docname})")
+
 def get_suggestions(app, env, node,
                     original_target, original_domain, original_reftype):
     """Check if target exists in the other domain or with different reftypes."""
     original_target = original_target.lower()
 
     # Remove namespace if present
-    if '.' in original_target:
-        original_target = original_target.split(".")[-1]
-
-    targets = set([
-        original_target,
-        original_target.replace("-", "_"),
-        original_target.replace("_", "-"),
-    ])
-
-    # Propose some suggestions, if possible
-    # The code below checks not only variants of the target, but also it
-    # works when .. c:namespace:: targets setting a different C namespace
-    # is in place
+    if original_domain == "c":
+        if '.' in original_target:
+            original_target = original_target.split(".")[-1]
 
     suggestions = []
-    for target in sorted(targets):
-        for domain in DOMAIN_INFO.keys():
-            domain_obj = env.get_domain(domain)
-            for name, dispname, objtype, docname, anchor, priority in domain_obj.get_objects():
-                lower_name = name.lower()
 
-                if domain == "c":
-                    # Check if name belongs to a different C namespace
-                    match = RE_SPLIT_DOMAIN.match(name)
-                    if match:
-                        if target != match.group(2).lower():
-                            continue
-                    else:
-                        if target != lower_name:
-                            continue
-                else:
-                    if target != lower_name:
-                        continue
+    # If name exists, propose exact name match on different domains
+    if original_target in all_refs:
+        return all_refs[original_target]
 
-                suggestions.append(f"\t{domain}:{objtype}:`{name}` (from {docname})")
+    # If not found, get a close match, using difflib.
+    # Such method is based on Ratcliff-Obershelp Algorithm, which seeks
+    # for a close match within a certain distance. We're using the defaults
+    # here, e.g. cutoff=0.6, proposing 3 alternatives
+    matches = get_close_matches(original_target, all_refs.keys())
+    for match in matches:
+        suggestions += all_refs[match]
 
     return suggestions
 
-- 
2.51.0
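
For readers who want to try the lookup strategy outside of a Sphinx
build, the two-step search described in the changelog (exact dict
lookup first, difflib fallback second) can be sketched standalone.
This is only an illustration under stated assumptions: the suggest()
helper and the contents of the table below are hypothetical, whereas
the patch itself fills all_refs from the Sphinx domain objects.

```python
from difflib import get_close_matches

# Hypothetical reference table: lower-cased name -> formatted suggestions.
# The entries are made up for illustration; the patch builds the real
# table from env.get_domain(...).get_objects().
all_refs = {
    "v4l2-queryctrl": ["\tstd:label:`v4l2-queryctrl` (from v4l/vidioc-queryctrl)"],
    "v4l2_rect": ["\tc:type:`v4l2_rect` (from v4l/dev-overlay)"],
}


def suggest(target, domain):
    """Return suggestion strings for an unresolved reference name."""
    target = target.lower()

    # Strip a C namespace prefix such as "V4L." (C domain only).
    if domain == "c" and "." in target:
        target = target.split(".")[-1]

    # Same name known in some domain: a single O(1) dict lookup.
    if target in all_refs:
        return all_refs[target]

    # Otherwise fall back to fuzzy matching with the difflib defaults
    # (n=3, cutoff=0.6); matches come back ordered best first.
    suggestions = []
    for match in get_close_matches(target, all_refs.keys()):
        suggestions += all_refs[match]
    return suggestions
```

With this table, suggest("V4L.v4l2_queryctrl", "c") has no exact hit,
so it goes through the fuzzy path and ranks the `v4l2-queryctrl` label
first, while a name with nothing similar in the table yields an empty
list.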