From: Peter Krempa
To: libvir-list@redhat.com
Subject: [PATCH 9/9] scripts: check-html-references: Add checking for image file usage
Date: Tue, 14 Feb 2023 22:51:17 +0100
List-Id: Development discussions about the libvirt library & tools

Check both that a file is referenced from our pages and also that pages
reference existing images.

The mode for dumping external references now also dumps images.

'--ignore-images' can be used repeatedly to suppress errors for specific
images.

Signed-off-by: Peter Krempa
Reviewed-by: Daniel P. Berrangé
---
 scripts/check-html-references.py | 101 ++++++++++++++++++++++++++-----
 1 file changed, 87 insertions(+), 14 deletions(-)

diff --git a/scripts/check-html-references.py b/scripts/check-html-references.py
index 4f08feab59..788622a2d0 100755
--- a/scripts/check-html-references.py
+++ b/scripts/check-html-references.py
@@ -24,25 +24,32 @@ import xml.etree.ElementTree as ET
 
 ns = {'html': 'http://www.w3.org/1999/xhtml'}
 externallinks = []
+externalimages = []
 
 
 def get_file_list(prefix):
     filelist = []
+    imagelist = []
+    imageformats = ['.jpg', '.svg', '.png']
 
     for root, dir, files in os.walk(prefix):
         for file in files:
-            if not re.search('\\.html$', file):
-                continue
+            ext = os.path.splitext(file)[1]
 
-            # the 404 page doesn't play well
-            if '404.html' in file:
-                continue
+            if ext == '.html':
+                # the 404 page doesn't play well
+                if '404.html' in file:
+                    continue
+
+                filelist.append(os.path.join(root, file))
 
-            filelist.append(os.path.join(root, file))
+            elif ext in imageformats:
+                imagelist.append(os.path.join(root, file))
 
     filelist.sort()
+    imagelist.sort()
 
-    return filelist
+    return filelist, imagelist
 
 
 # loads an XHTML and extracts all anchors, local and remote links for the one file
@@ -50,12 +57,14 @@ def process_file(filename):
     tree = ET.parse(filename)
     root = tree.getroot()
     docname = root.get('data-sourcedoc')
+    dirname = os.path.dirname(filename)
 
     if not docname:
         docname = filename
 
     anchors = [filename]
     targets = []
+    images = []
 
     for elem in root.findall('.//html:a', ns):
         target = elem.get('href')
@@ -68,7 +77,6 @@ def process_file(filename):
             if re.search('://', target):
                 externallinks.append(target)
             elif target[0] != '#' and 'mailto:' not in target:
-                dirname = os.path.dirname(filename)
                 targetfull = os.path.normpath(os.path.join(dirname, target))
 
                 targets.append((filename, docname, targetfull, target))
@@ -87,20 +95,33 @@ def process_file(filename):
         if an:
             anchors.append(filename + '#' + an)
 
-    return (anchors, targets)
+    # find local images
+    for elem in root.findall('.//html:img', ns):
+        src = elem.get('src')
+
+        if src:
+            if re.search('://', src):
+                externalimages.append(src)
+            else:
+                imagefull = os.path.normpath(os.path.join(dirname, src))
+                images.append((imagefull, docname))
+
+    return (anchors, targets, images)
 
 
 def process_all(filelist):
     anchors = []
     targets = []
+    images = []
 
     for file in filelist:
-        anchor, target = process_file(file)
+        anchor, target, image = process_file(file)
 
         targets = targets + target
         anchors = anchors + anchor
+        images = images + image
 
-    return (targets, anchors)
+    return (targets, anchors, images)
 
 
 def check_targets(targets, anchors):
@@ -163,6 +184,46 @@ def check_usage(targets, files, entrypoint):
     return fail
 
 
+# checks that images present in the directory are being used and also that
+# pages link to existing images. For favicons, which are not referenced from
+# the '.html' files there's a builtin set of exceptions.
+def check_images(usedimages, imagefiles, ignoreimages):
+    favicons = [
+        'android-chrome-192x192.png',
+        'android-chrome-256x256.png',
+        'apple-touch-icon.png',
+        'favicon-16x16.png',
+        'favicon-32x32.png',
+        'mstile-150x150.png',
+    ]
+    fail = False
+
+    if ignoreimages:
+        favicons = favicons + ignoreimages
+
+    for usedimage, docname in usedimages:
+        if usedimage not in imagefiles:
+            print(f'ERROR: \'{docname}\' references image \'{usedimage}\' not among images')
+            fail = True
+
+    for imagefile in imagefiles:
+        used = False
+
+        if imagefile in (usedimage[0] for usedimage in usedimages):
+            used = True
+        else:
+            for favicon in favicons:
+                if favicon in imagefile:
+                    used = True
+                    break
+
+        if not used:
+            print(f'ERROR: Image \'{imagefile}\' is not used by any page')
+            fail = True
+
+    return fail
+
+
 parser = argparse.ArgumentParser(description='HTML reference checker')
 parser.add_argument('--webroot', required=True,
                     help='path to the web root')
@@ -170,14 +231,16 @@ parser.add_argument('--entrypoint', default="index.html",
                     help='file name of web entry point relative to --webroot')
 parser.add_argument('--external', action="store_true",
                     help='print external references instead')
+parser.add_argument('--ignore-images', action='append',
+                    help='paths to images that should be considered as used')
 
 args = parser.parse_args()
 
-files = get_file_list(os.path.abspath(args.webroot))
+files, imagefiles = get_file_list(os.path.abspath(args.webroot))
 
 entrypoint = os.path.join(os.path.abspath(args.webroot), args.entrypoint)
 
-targets, anchors = process_all(files)
+targets, anchors, usedimages = process_all(files)
 
 fail = False
 
@@ -186,7 +249,14 @@ if args.external:
     externallinks.sort()
     for ext in externallinks:
         if ext != prev:
-            print(ext)
+            print(f'link: {ext}')
+
+        prev = ext
+
+    externalimages.sort()
+    for ext in externalimages:
+        if ext != prev:
+            print(f'image: {ext}')
 
         prev = ext
 else:
@@ -196,6 +266,9 @@ else:
     if check_usage(targets, files, entrypoint):
         fail = True
 
+    if check_images(usedimages, imagefiles, args.ignore_images):
+        fail = True
+
     if fail:
         sys.exit(1)
 
-- 
2.39.1
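
For readers who want to try the two-way check outside the libvirt tree, here
is a minimal standalone sketch of the same idea. It is not part of the patch:
the function name find_image_problems, its return shape, and the sample paths
are hypothetical. It collects results instead of printing errors, and it uses
sets for the membership tests rather than the per-file generator scan in
check_images(). The substring matching for ignored names mirrors how the patch
treats the favicon list.

```python
# Hypothetical, standalone illustration of the two-way image check.
# Not part of the patch; names and sample paths are made up.
def find_image_problems(used_images, image_files, ignored=()):
    """Return (missing, unused) for a two-way image usage check.

    used_images: iterable of (image_path, referencing_doc) tuples
    image_files: iterable of image paths found in the web root
    ignored:     substrings; a file whose path contains one counts as used
    """
    on_disk = set(image_files)
    referenced = {path for path, _ in used_images}

    # pages that point at an image which is not among the files
    missing = sorted((doc, path) for path, doc in used_images
                     if path not in on_disk)

    # files that no page references and that are not ignored
    unused = sorted(path for path in on_disk
                    if path not in referenced
                    and not any(pattern in path for pattern in ignored))

    return missing, unused


used = [('/web/images/logo.svg', 'index.rst'),
        ('/web/images/gone.png', 'docs.rst')]
files = ['/web/images/logo.svg',
         '/web/images/orphan.png',
         '/web/favicon-16x16.png']

missing, unused = find_image_problems(used, files,
                                      ignored=['favicon-16x16.png'])
print(missing)
print(unused)
```

With the sample data, 'docs.rst' references an image that is not among the
files, and 'orphan.png' is reported as unused while the favicon is suppressed
by the ignore list. Dropping the ignored argument would report the favicon as
unused as well.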