From: WangYuli
To: apw@canonical.com, joe@perches.com, dwaipayanray1@gmail.com, lukas.bulwahn@gmail.com
Cc: linux-kernel@vger.kernel.org, zhanjun@uniontech.com, niecheng1@uniontech.com, guanwentao@uniontech.com, WangYuli
Subject: [PATCH] checkpatch: Add full-width character detection
Date: Tue, 8 Jul 2025 17:34:58 +0800
Message-ID: <2804E0A754F9E415+20250708093458.1230294-1-wangyuli@uniontech.com>
X-Mailer: git-send-email 2.50.0
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add comprehensive detection and automatic fixing capability for
full-width (Unicode) characters that are commonly mistaken for ASCII
punctuation marks. This helps catch input method editor artifacts that
can cause compilation errors or formatting issues.

The implementation detects 25 types of full-width characters:
- Basic punctuation: ；，。（）！？：　
- Programming brackets: ［］｛｝＜＞
- Assignment and comparison: ＝
- Arithmetic operators: ＋－＊／＼
- Other programming symbols: ％＃＆｜

Detection covers three areas:
1. Code lines (lines starting with '+') - FULLWIDTH_CHARS
2. Commit messages - FULLWIDTH_CHARS_COMMIT
3. Subject lines - FULLWIDTH_CHARS_SUBJECT

Example usage:
  ./scripts/checkpatch.pl my_patch.patch
  ./scripts/checkpatch.pl --fix my_patch.patch
  ./scripts/checkpatch.pl --fix-inplace my_source.c

Signed-off-by: WangYuli
---
 scripts/checkpatch.pl | 84 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 664f7b7a622c..bd691dc848a2 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -75,6 +75,41 @@ my $git_command ='export LANGUAGE=en_US.UTF-8; git';
 my $tabsize = 8;
 my ${CONFIG_} = "CONFIG_";
 
+# Full-width character mappings (UTF-8 byte sequences to ASCII)
+my %fullwidth_chars = (
+	# Basic punctuation
+	"\xef\xbc\x9b" => [";", "semicolon", "；"],
+	"\xef\xbc\x8c" => [",", "comma", "，"],
+	"\xe3\x80\x82" => [".", "period", "。"],
+	"\xef\xbc\x88" => ["(", "opening parenthesis", "（"],
+	"\xef\xbc\x89" => [")", "closing parenthesis", "）"],
+	"\xef\xbc\x81" => ["!", "exclamation mark", "！"],
+	"\xef\xbc\x9f" => ["?", "question mark", "？"],
+	"\xef\xbc\x9a" => [":", "colon", "："],
+	"\xe3\x80\x80" => [" ", "space", "　"],
+	# Programming brackets
+	"\xef\xbc\xbb" => ["[", "left square bracket", "［"],
+	"\xef\xbc\xbd" => ["]", "right square bracket", "］"],
+	"\xef\xbd\x9b" => ["{", "left curly bracket", "｛"],
+	"\xef\xbd\x9d" => ["}", "right curly bracket", "｝"],
+	"\xef\xbc\x9c" => ["<", "less-than sign", "＜"],
+	"\xef\xbc\x9e" => [">", "greater-than sign", "＞"],
+	# Assignment and comparison
+	"\xef\xbc\x9d" => ["=", "equals sign", "＝"],
+	# Arithmetic operators
+	"\xef\xbc\x8b" => ["+", "plus sign", "＋"],
+	"\xef\xbc\x8d" => ["-", "minus sign", "－"],
+	"\xef\xbc\x8a" => ["*", "asterisk", "＊"],
+	"\xef\xbc\x8f" => ["/", "solidus", "／"],
+	"\xef\xbc\xbc" => ["\\", "reverse solidus", "＼"],
+	# Other programming symbols
+	"\xef\xbc\x85" => ["%", "percent sign", "％"],
+	"\xef\xbc\x83" => ["#", "number sign", "＃"],
+	"\xef\xbc\x86" => ["&", "ampersand", "＆"],
+	"\xef\xbd\x9c" => ["|", "vertical line", "｜"],
+);
+my $fullwidth_pattern = join('|', map { quotemeta($_) } keys %fullwidth_chars);
+
 my %maybe_linker_symbol; # for externs in c exceptions, when seen in *vmlinux.lds.h
 
 sub help {
@@ -1018,6 +1053,40 @@ sub read_words {
 	return 0;
 }
 
+# Check for full-width characters and optionally fix them
+sub check_fullwidth_chars {
+	my ($line, $context, $warning_type, $apply_fix, $fixlinenr, $fixed_ref, $herecurr) = @_;
+	my @found_chars = ();
+	my $fixed_line = $line;
+	my $has_fixes = 0;
+
+	return 0 unless $line =~ /$fullwidth_pattern/o;
+
+	if ($apply_fix) {
+		$fixed_line =~ s/($fullwidth_pattern)/$fullwidth_chars{$1}[0]/ge;
+		$has_fixes = ($fixed_line ne $line);
+	}
+
+	while ($line =~ /($fullwidth_pattern)/go) {
+		my $fullwidth_byte_seq = $1;
+		if (exists $fullwidth_chars{$fullwidth_byte_seq}) {
+			my ($ascii_char, $name, $fullwidth_char) = @{$fullwidth_chars{$fullwidth_byte_seq}};
+			push @found_chars, "Full-width $name ($fullwidth_char) found$context, use ASCII $name ($ascii_char) instead";
+		}
+	}
+
+	if (@found_chars) {
+		foreach my $msg (@found_chars) {
+			WARN($warning_type, $msg . "\n" . $herecurr);
+		}
+		if ($apply_fix && $has_fixes && defined $fixed_ref) {
+			$fixed_ref->[$fixlinenr] = $fixed_line;
+		}
+	}
+
+	return scalar @found_chars;
+}
+
 my $const_structs;
 if (show_type("CONST_STRUCT")) {
 	read_words(\$const_structs, $conststructsfile)
@@ -2960,6 +3029,11 @@ sub process {
 			$commit_log_has_diff = 1;
 		}
 
+# Check for full-width characters in commit message
+		if ($in_commit_log && show_type("FULLWIDTH_CHARS_COMMIT")) {
+			check_fullwidth_chars($rawline, " in commit message", "FULLWIDTH_CHARS_COMMIT", 0, 0, undef, $herecurr);
+		}
+
 # Check for incorrect file permissions
 		if ($line =~ /^new (file )?mode.*[7531]\d{0,2}$/) {
 			my $permhere = $here . "FILE: $realfile\n";
@@ -3265,6 +3339,11 @@ sub process {
 			     "A patch subject line should describe the change not the tool that found it\n" . $herecurr);
 		}
 
+# Check for full-width characters in Subject line
+		if ($in_header_lines && $line =~ /^Subject:/i && show_type("FULLWIDTH_CHARS_SUBJECT")) {
+			check_fullwidth_chars($rawline, " in subject line", "FULLWIDTH_CHARS_SUBJECT", 0, 0, undef, $herecurr);
+		}
+
 # Check for Gerrit Change-Ids not in any patch context
 		if ($realfile eq '' && !$has_patch_separator && $line =~ /^\s*change-id:/i) {
 			if (ERROR("GERRIT_CHANGE_ID",
@@ -3960,6 +4039,11 @@ sub process {
 			}
 		}
 
+# check for full-width characters (full-width punctuation marks, etc.)
+		if ($rawline =~ /^\+/ && show_type("FULLWIDTH_CHARS")) {
+			check_fullwidth_chars($rawline, "", "FULLWIDTH_CHARS", $fix, $fixlinenr, \@fixed, $herecurr);
+		}
+
 # check multi-line statement indentation matches previous line
 		if ($perl_version_ok &&
 		    $prevline =~ /^\+([ \t]*)((?:$c90_Keywords(?:\s+if)\s*)|(?:$Declare\s*)?(?:$Ident|\(\s*\*\s*$Ident\s*\))\s*|(?:\*\s*)*$Lval\s*=\s*$Ident\s*)\(.*(\&\&|\|\||,)\s*$/) {
-- 
2.50.0