From: Peter Krempa
To: libvir-list@redhat.com
Subject: [PATCH 67/67] docs: Add HTML reference checker
Date: Tue, 31 May 2022 17:06:42 +0200
Message-Id: <46ad7f3ff1b34975a71302af5f162e6744f10573.1654008136.git.pkrempa@redhat.com>
List-Id: Development discussions about the libvirt library & tools

In many cases we move around or rename internal anchors, which may break
links leading to the content. docutils handles the case of links inside a
document, but we lack the same form of checking between documents.

Introduce a script which cross-checks all the anchors and links in the
HTML output files and prints any problems, and use it as a test case for
the 'docs' directory.

Signed-off-by: Peter Krempa
---
 docs/meson.build                 |  11 +++
 scripts/check-html-references.py | 153 +++++++++++++++++++++++++++++++
 scripts/meson.build              |   1 +
 3 files changed, 165 insertions(+)
 create mode 100755 scripts/check-html-references.py

diff --git a/docs/meson.build b/docs/meson.build
index d71f6006dd..cb70ef6084 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -350,3 +350,14 @@ run_target(
   ],
   depends: install_web_deps,
 )
+
+test(
+  'check-html-references',
+  python3_prog,
+  args: [
+    check_html_references_prog.path(),
+    '--prefix',
+    meson.build_root() / 'docs'
+  ],
+  env: runutf8,
+)
diff --git a/scripts/check-html-references.py b/scripts/check-html-references.py
new file mode 100755
index 0000000000..95a61a6bb4
--- /dev/null
+++ b/scripts/check-html-references.py
@@ -0,0 +1,153 @@
+#!/usr/bin/env python3
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library. If not, see
+# .
+#
+# Check that external references between documentation HTML files are not broken.
+
+import sys
+import os
+import argparse
+import re
+import xml.etree.ElementTree as ET
+
+ns = {'html': 'http://www.w3.org/1999/xhtml'}
+externallinks = []
+
+
+def get_file_list(prefix):
+    filelist = []
+
+    for root, dir, files in os.walk(prefix):
+        prefixbase = os.path.dirname(prefix)
+
+        if root.startswith(prefixbase):
+            relroot = root[len(prefixbase):]
+        else:
+            relroot = root
+
+        for file in files:
+            if not re.search('\\.html$', file):
+                continue
+
+            # the 404 page doesn't play well
+            if '404.html' in file:
+                continue
+
+            fullfilename = os.path.join(root, file)
+            relfilename = os.path.join(relroot, file)
+            filelist.append((fullfilename, relfilename))
+
+    return filelist
+
+
+# loads an XHTML and extracts all anchors, local and remote links for the one file
+def process_file(filetuple):
+    filename, relfilename = filetuple
+    tree = ET.parse(filename)
+    root = tree.getroot()
+
+    anchors = [relfilename]
+    targets = []
+
+    for elem in root.findall('.//html:a', ns):
+        target = elem.get('href')
+        an = elem.get('id')
+
+        if an:
+            anchors.append(relfilename + '#' + an)
+
+        if target:
+            if re.search('://', target):
+                externallinks.append(target)
+            elif target[0] != '#' and 'mailto:' not in target:
+                dirname = os.path.dirname(relfilename)
+                targetname = os.path.normpath(os.path.join(dirname, target))
+
+                targets.append((targetname, filename, target))
+
+    # older docutils generate "