From: Daniel P. Berrangé
To: qemu-devel@nongnu.org
Cc: Eric Blake, Philippe Mathieu-Daudé, Paolo Bonzini, Alex Bennée, Thomas Huth, Peter Maydell, Daniel P. Berrangé
Subject: [PATCH v3 5/9] tests/style: check for commonly doubled up words
Date: Thu, 7 Jul 2022 17:37:16 +0100
Message-Id: <20220707163720.1421716-6-berrange@redhat.com>
In-Reply-To:
 <20220707163720.1421716-1-berrange@redhat.com>
References: <20220707163720.1421716-1-berrange@redhat.com>

This style check looks for cases where the words

  the then in an on if is it but for or at and do to can

are repeated in a sentence. It uses a multi-line match to catch the
especially common mistake in docs where the last word on a line is
repeated as the first word of the next line.

There will inevitably be some false positives with this check; for
example, some docs data tables have the same word in adjacent columns.

There are a few different ways to express this text as a regex, which
have wildly different execution times. This implementation was carefully
chosen to attempt to minimize matching time.

Signed-off-by: Daniel P. Berrangé
---
 tests/style.yml | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/tests/style.yml b/tests/style.yml
index 704227d8e9..d06c55bb29 100644
--- a/tests/style.yml
+++ b/tests/style.yml
@@ -91,3 +91,33 @@ int_assign_bool:
   files: \.c$
   prohibit: \.*= *(true|false)\b
   message: use bool type for boolean values
+
+double_words:
+  multiline: true
+  prohibit:
+    terms:
+      - the\s+the
+      - then\s+then
+      - in\s+in
+      - an\s+an
+      - on\s+on
+      - if\s+if
+      - is\s+is
+      - it\s+it
+      - but\s+but
+      - for\s+for
+      - or\s+or
+      - at\s+at
+      - and\s+and
+      - do\s+do
+      - to\s+to
+      - can\s+can
+    prefix: \b(?