From: Morduang Zang
To: apw@canonical.com, joe@perches.com, dwaipayanray1@gmail.com, lukas.bulwahn@gmail.com
Cc: linux-kernel@vger.kernel.org, wangyuli@uniontech.com, zhanjun@uniontech.com, niecheng1@uniontech.com, Morduang Zang
Subject: [PATCH RESEND] checkpatch: Add full-width character detection
Date: Fri, 15 Aug 2025 16:05:40 +0800
Message-Id: <20250815080540.136786-1-zhangdandan@uniontech.com>
X-Mailer: git-send-email 2.20.1

From: WangYuli

Add comprehensive detection and automatic fixing capability for
full-width (Unicode) characters that are commonly mistaken for ASCII
punctuation marks. This helps catch input method editor artifacts that
can cause compilation errors or formatting issues.

The implementation detects 25 types of full-width characters:
- Basic punctuation: ；，。（）！？：　
- Programming brackets: ［］｛｝＜＞
- Assignment and comparison: ＝
- Arithmetic operators: ＋－＊／＼
- Other programming symbols: ％＃＆｜

Detection covers three areas:
1. Code lines (lines starting with '+') - FULLWIDTH_CHARS
2. Commit messages - FULLWIDTH_CHARS_COMMIT
3. Subject lines - FULLWIDTH_CHARS_SUBJECT

Example usage:
  ./scripts/checkpatch.pl my_patch.patch
  ./scripts/checkpatch.pl --fix my_patch.patch
  ./scripts/checkpatch.pl --fix-inplace my_source.c

Signed-off-by: WangYuli
Signed-off-by: Morduang Zang
---
 scripts/checkpatch.pl | 84 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index e722dd6fa8ef..f4cb547a470b 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -75,6 +75,41 @@ my $git_command ='export LANGUAGE=en_US.UTF-8; git';
 my $tabsize = 8;
 my ${CONFIG_} = "CONFIG_";
 
+# Full-width character mappings (UTF-8 byte sequences to ASCII)
+my %fullwidth_chars = (
+	# Basic punctuation
+	"\xef\xbc\x9b" => [";", "semicolon", "；"],
+	"\xef\xbc\x8c" => [",", "comma", "，"],
+	"\xe3\x80\x82" => [".", "period", "。"],
+	"\xef\xbc\x88" => ["(", "opening parenthesis", "（"],
+	"\xef\xbc\x89" => [")", "closing parenthesis", "）"],
+	"\xef\xbc\x81" => ["!", "exclamation mark", "！"],
+	"\xef\xbc\x9f" => ["?", "question mark", "？"],
+	"\xef\xbc\x9a" => [":", "colon", "："],
+	"\xe3\x80\x80" => [" ", "space", "　"],
+	# Programming brackets
+	"\xef\xbc\xbb" => ["[", "left square bracket", "［"],
+	"\xef\xbc\xbd" => ["]", "right square bracket", "］"],
+	"\xef\xbd\x9b" => ["{", "left curly bracket", "｛"],
+	"\xef\xbd\x9d" => ["}", "right curly bracket", "｝"],
+	"\xef\xbc\x9c" => ["<", "less-than sign", "＜"],
+	"\xef\xbc\x9e" => [">", "greater-than sign", "＞"],
+	# Assignment and comparison
+	"\xef\xbc\x9d" => ["=", "equals sign", "＝"],
+	# Arithmetic operators
+	"\xef\xbc\x8b" => ["+", "plus sign", "＋"],
+	"\xef\xbc\x8d" => ["-", "minus sign", "－"],
+	"\xef\xbc\x8a" => ["*", "asterisk", "＊"],
+	"\xef\xbc\x8f" => ["/", "solidus", "／"],
+	"\xef\xbc\xbc" => ["\\", "reverse solidus", "＼"],
+	# Other programming symbols
+	"\xef\xbc\x85" => ["%", "percent sign", "％"],
+	"\xef\xbc\x83" => ["#", "number sign", "＃"],
+	"\xef\xbc\x86" => ["&", "ampersand", "＆"],
+	"\xef\xbd\x9c" => ["|", "vertical line", "｜"],
+);
+my $fullwidth_pattern = join('|', map { quotemeta($_) } keys %fullwidth_chars);
+
 my %maybe_linker_symbol; # for externs in c exceptions, when seen in *vmlinux.lds.h
 
 sub help {
@@ -1019,6 +1054,40 @@ sub read_words {
 	return 0;
 }
 
+# Check for full-width characters and optionally fix them
+sub check_fullwidth_chars {
+	my ($line, $context, $warning_type, $apply_fix, $fixlinenr, $fixed_ref, $herecurr) = @_;
+	my @found_chars = ();
+	my $fixed_line = $line;
+	my $has_fixes = 0;
+
+	return 0 unless $line =~ /$fullwidth_pattern/o;
+
+	if ($apply_fix) {
+		$fixed_line =~ s/($fullwidth_pattern)/$fullwidth_chars{$1}[0]/ge;
+		$has_fixes = ($fixed_line ne $line);
+	}
+
+	while ($line =~ /($fullwidth_pattern)/go) {
+		my $fullwidth_byte_seq = $1;
+		if (exists $fullwidth_chars{$fullwidth_byte_seq}) {
+			my ($ascii_char, $name, $fullwidth_char) = @{$fullwidth_chars{$fullwidth_byte_seq}};
+			push @found_chars, "Full-width $name ($fullwidth_char) found$context, use ASCII $name ($ascii_char) instead";
+		}
+	}
+
+	if (@found_chars) {
+		foreach my $msg (@found_chars) {
+			WARN($warning_type, $msg . "\n" . $herecurr);
+		}
+		if ($apply_fix && $has_fixes && defined $fixed_ref) {
+			$fixed_ref->[$fixlinenr] = $fixed_line;
+		}
+	}
+
+	return scalar @found_chars;
+}
+
 my $const_structs;
 if (show_type("CONST_STRUCT")) {
 	read_words(\$const_structs, $conststructsfile)
@@ -2961,6 +3030,11 @@ sub process {
 			$commit_log_has_diff = 1;
 		}
 
+# Check for full-width characters in commit message
+		if ($in_commit_log && show_type("FULLWIDTH_CHARS_COMMIT")) {
+			check_fullwidth_chars($rawline, " in commit message", "FULLWIDTH_CHARS_COMMIT", 0, 0, undef, $herecurr);
+		}
+
 # Check for incorrect file permissions
 		if ($line =~ /^new (file )?mode.*[7531]\d{0,2}$/) {
 			my $permhere = $here . "FILE: $realfile\n";
@@ -3266,6 +3340,11 @@ sub process {
 			     "A patch subject line should describe the change not the tool that found it\n" . $herecurr);
 		}
 
+# Check for full-width characters in Subject line
+		if ($in_header_lines && $line =~ /^Subject:/i && show_type("FULLWIDTH_CHARS_SUBJECT")) {
+			check_fullwidth_chars($rawline, " in subject line", "FULLWIDTH_CHARS_SUBJECT", 0, 0, undef, $herecurr);
+		}
+
 # Check for Gerrit Change-Ids not in any patch context
 		if ($realfile eq '' && !$has_patch_separator && $line =~ /^\s*change-id:/i) {
 			if (ERROR("GERRIT_CHANGE_ID",
@@ -3974,6 +4053,11 @@ sub process {
 			}
 		}
 
+# check for full-width characters (full-width punctuation marks, etc.)
+		if ($rawline =~ /^\+/ && show_type("FULLWIDTH_CHARS")) {
+			check_fullwidth_chars($rawline, "", "FULLWIDTH_CHARS", $fix, $fixlinenr, \@fixed, $herecurr);
+		}
+
 # check multi-line statement indentation matches previous line
 		if ($perl_version_ok &&
 		    $prevline =~ /^\+([ \t]*)((?:$c90_Keywords(?:\s+if)\s*)|(?:$Declare\s*)?(?:$Ident|\(\s*\*\s*$Ident\s*\))\s*|(?:\*\s*)*$Lval\s*=\s*$Ident\s*)\(.*(\&\&|\|\||,)\s*$/) {
-- 
2.20.1
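
For anyone who wants to try the mapping outside of checkpatch, the
detect-and-replace step can be sketched as a standalone script. This is
only an illustration, not part of the patch: the table is trimmed to
three entries, the helper name fix_fullwidth() and the sample line are
hypothetical, and the input is assumed to be held as raw UTF-8 bytes
(which is what the byte-sequence hash keys in the patch imply).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trimmed-down copy of the patch's table (assumption: three entries are
# enough for illustration): UTF-8 byte sequence => [ASCII char, name].
my %fullwidth_chars = (
	"\xef\xbc\x9b" => [";", "semicolon"],
	"\xef\xbc\x88" => ["(", "opening parenthesis"],
	"\xef\xbc\x89" => [")", "closing parenthesis"],
);
my $fullwidth_pattern = join('|', map { quotemeta($_) } keys %fullwidth_chars);

# Hypothetical helper: returns the fixed line followed by the names of
# the full-width characters found, in left-to-right order.
sub fix_fullwidth {
	my ($line) = @_;
	my @names;

	# Collect one name per occurrence.
	while ($line =~ /($fullwidth_pattern)/g) {
		push @names, $fullwidth_chars{$1}[1];
	}

	# Replace every occurrence with its ASCII counterpart.
	(my $fixed = $line) =~ s/($fullwidth_pattern)/$fullwidth_chars{$1}[0]/ge;

	return ($fixed, @names);
}

# Sample added line containing full-width parentheses and semicolon.
my $sample = "+\tfoo\xef\xbc\x88bar\xef\xbc\x89\xef\xbc\x9b";
my ($fixed, @names) = fix_fullwidth($sample);

print "$fixed\n";
print "found: $_\n" for @names;
```

Because every key is a distinct three-byte sequence, the order in which
the hash keys end up in the alternation does not affect what matches.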