From: "Wang, Jian J"
To: devel@edk2.groups.io
Cc: Bob Feng, Liming Gao, Yuwei Chen
Subject: [edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation
Date: Fri, 16 Oct 2020 15:41:24 +0800
Message-Id: <20201016074124.831-1-jian.j.wang@intel.com>

The build tool fails when reading files, for example when calling trim to
clean preprocessed source files, if the tool is running on an OS with a
non-Western code page and the source file has
non-ASCII characters. Even utf-8 fails on some bytes that are valid
cp1252 characters (such as 0x92, 0x96 and 0xa0).

Currently, the safest way to read a file in Python code is to use
'latin-1' (iso-8859-1), because it maps every byte in 00-FF to a
character and therefore never raises an encoding/decoding error. It
behaves almost the same as reading the file in binary mode.

cp1252 is similar to latin-1, but it cannot encode '\x80' to '\x9f' and
cannot decode the following bytes:

  '\x81', '\x8d', '\x8f', '\x90', '\x9d'

So if a file contains utf-8/16 encoded characters, decoding it as cp1252
will sometimes fail.

Refer to the following links for details:

  https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
  https://en.wikipedia.org/wiki/Windows-1252
  https://kb.iu.edu/d/aepu
  https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html

One can use the following Python code to verify this.

  for i in range(0x100):
      try:
          chr(i).encode('latin-1')
      except UnicodeError:
          print("  %s cannot encode %02x" % ('latin-1', i))

  for i in range(0x100):
      try:
          b = bytes([i])
          b.decode('latin-1')
      except UnicodeError:
          print("  %s cannot decode %02x" % ('latin-1', i))

This patch adds code to enforce 'latin-1' as the encoding argument of
open() in the function OpenLongFilePath() when the file is opened in
text mode. This solves the file decoding issue completely.
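The same check can be extended to compare codecs side by side. The following is only a sketch: undecodable_bytes() and can_decode() are hypothetical helpers written for this illustration and are not part of BaseTools. It lists the single bytes each codec rejects and shows that a cp1252-style byte such as 0x92 breaks utf-8 decoding while latin-1 round-trips it unchanged.

```python
# Illustration only: these helpers are assumptions, not BaseTools code.

def undecodable_bytes(encoding):
    """Return the byte values in 0x00-0xFF that `encoding` cannot decode alone."""
    failed = []
    for value in range(0x100):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            failed.append(value)
    return failed


def can_decode(data, encoding):
    """Return True if `data` decodes with `encoding` without raising."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


latin1_failures = undecodable_bytes('latin-1')
cp1252_failures = undecodable_bytes('cp1252')

# 0x92 is a right single quotation mark in cp1252, but a stray
# continuation byte in utf-8.
sample = b'don\x92t'
utf8_ok = can_decode(sample, 'utf-8')
latin1_ok = can_decode(sample, 'latin-1')

# Decoding and re-encoding with latin-1 returns the original bytes.
round_trip = sample.decode('latin-1').encode('latin-1')

print("latin-1 cannot decode:", [hex(v) for v in latin1_failures])
print("cp1252 cannot decode:", [hex(v) for v in cp1252_failures])
print("utf-8 decodes sample:", utf8_ok, "/ latin-1 decodes sample:", latin1_ok)
```

Because latin-1 assigns a character to every byte value, the decoded text is a byte-for-byte view of the file rather than a faithful rendering of its real encoding.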
The possible related BZs:

  https://bugzilla.tianocore.org/show_bug.cgi?id=1434
  https://bugzilla.tianocore.org/show_bug.cgi?id=1637
  https://bugzilla.tianocore.org/show_bug.cgi?id=2578
  https://bugzilla.tianocore.org/show_bug.cgi?id=2709
  https://bugzilla.tianocore.org/show_bug.cgi?id=2829

Cc: Bob Feng
Cc: Liming Gao
Cc: Yuwei Chen
Signed-off-by: Jian J Wang
---
 BaseTools/Source/Python/Common/LongFilePathSupport.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/BaseTools/Source/Python/Common/LongFilePathSupport.py b/BaseTools/Source/Python/Common/LongFilePathSupport.py
index 38c4396544..c8dce077f2 100644
--- a/BaseTools/Source/Python/Common/LongFilePathSupport.py
+++ b/BaseTools/Source/Python/Common/LongFilePathSupport.py
@@ -30,7 +30,8 @@ def LongFilePath(FileName):
 # wrap open to support opening a long file path
 #
 def OpenLongFilePath(FileName, Mode='r', Buffer= -1):
-    return open(LongFilePath(FileName), Mode, Buffer)
+    Encoding = None if 'b' in Mode else 'latin-1'
+    return open(LongFilePath(FileName), Mode, Buffer, Encoding)
 
 def CodecOpenLongFilePath(Filename, Mode='rb', Encoding=None, Errors='strict', Buffering=1):
     return codecs.open(LongFilePath(Filename), Mode, Encoding, Errors, Buffering)
-- 
2.24.0.windows.2
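The mode-dependent encoding choice made in OpenLongFilePath() can be tried in isolation. This is a minimal sketch under stated assumptions: open_text_as_latin1() is a hypothetical stand-in that omits the long-path handling of LongFilePath(), and the sample bytes are made up for the illustration. It writes bytes that mix utf-8 and cp1252 content, reads them back in text mode, and writes the text out again.

```python
import os
import tempfile


def open_text_as_latin1(path, mode='r', buffering=-1):
    """Hypothetical stand-in for OpenLongFilePath(), without long-path support."""
    # Binary modes must not receive an encoding; text modes get latin-1.
    encoding = None if 'b' in mode else 'latin-1'
    return open(path, mode, buffering, encoding)


# Made-up content: a utf-8 encoded e-acute (c3 a9) next to cp1252 quotes (92).
# No newline is included, so newline translation does not affect the result.
raw = b'caf\xc3\xa9 \x92quoted\x92'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')

    with open_text_as_latin1(path, 'wb') as f:
        f.write(raw)

    # Text-mode read: no byte can fail, each byte becomes one character.
    with open_text_as_latin1(path, 'r') as f:
        text = f.read()

    # Text-mode write with the same codec restores the original bytes.
    with open_text_as_latin1(path, 'w') as f:
        f.write(text)

    with open_text_as_latin1(path, 'rb') as f:
        written_back = f.read()

print(len(raw), len(text), written_back == raw)
```

Note that the two-byte utf-8 sequence is read as two separate latin-1 characters. That is harmless for code that only passes text through, but code that inspects non-ASCII characters would see them in this split form.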