Tested-by: Yunhua Feng <fengyunhua@byosoft.com.cn>
-----Original Message-----
From: bounce+27952+66316+5049190+8953120@groups.io <bounce+27952+66316+5049190+8953120@groups.io> On Behalf Of Wang, Jian J
Sent: October 16, 2020 15:41
To: devel@edk2.groups.io
Cc: Bob Feng <bob.c.feng@intel.com>; Liming Gao <gaoliming@byosoft.com.cn>; Yuwei Chen <yuwei.chen@intel.com>
Subject: [edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation
The build tool reports a failure on file read, for example when calling
trim to clean preprocessed source files, if the tool is running on an OS
with a non-western code page and the source file contains non-ASCII
characters. Even utf-8 has problems when it encounters certain bytes
that are valid characters in cp1252 (such as 0x92, 0x96, 0xa0, etc.).
Currently, the safest way to read a file in Python code is to use
'latin-1' (iso-8859-1), because it maps every byte from 00 to FF to a
character. Decoding therefore never fails, and text read this way can
always be encoded back to the same bytes. It behaves almost the same
as reading the file in binary mode.
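As a minimal sketch of the round-trip property described above (Python 3 assumed; the variable names are only illustrative), every byte value can be decoded as latin-1 and encoded back without loss:

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(0x100))

# Decoding as latin-1 never raises: byte N becomes code point U+00NN.
text = all_bytes.decode('latin-1')
code_points = [ord(c) for c in text]

# Encoding the decoded text gives back the original bytes unchanged.
round_trip = text.encode('latin-1')

print(code_points == list(range(0x100)))
print(round_trip == all_bytes)
```

This is why reading with 'latin-1' is close to binary mode: the bytes survive unchanged, although characters that were really stored as utf-8 or cp1252 will not display as the author intended.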
cp1252 is similar to latin-1, but in Python it cannot encode the
characters '\x80' to '\x9f' and cannot decode the following bytes:
'\x81', '\x8d', '\x8f', '\x90', '\x9d'
So if a file contains utf-8/16 encoded characters, reading it as
cp1252 will sometimes fail.
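The decoding gaps mentioned above can be listed directly. This is a small sketch for Python 3; undecodable_bytes is a hypothetical helper name, not something from BaseTools:

```python
def undecodable_bytes(encoding):
    """Return the single-byte values that `encoding` refuses to decode."""
    failed = []
    for i in range(0x100):
        try:
            bytes([i]).decode(encoding)
        except UnicodeDecodeError:
            failed.append(i)
    return failed

# cp1252 leaves five byte values undefined in CPython's codec.
cp1252_gaps = undecodable_bytes('cp1252')
# latin-1 defines all 256 byte values.
latin1_gaps = undecodable_bytes('latin-1')
# A lone byte >= 0x80 is never valid utf-8 by itself.
utf8_gaps = undecodable_bytes('utf-8')

print([hex(b) for b in cp1252_gaps])
print(len(latin1_gaps), len(utf8_gaps))
```

Only single bytes are checked here. For utf-8, bytes such as 0x92 can still appear inside a valid multi-byte sequence, so this shows why a stray cp1252 byte breaks a utf-8 read, not that utf-8 rejects those bytes everywhere.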
Refer to the following links for details:
https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
https://en.wikipedia.org/wiki/Windows-1252
https://kb.iu.edu/d/aepu
https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html
One can use the following Python code to verify this.
for i in range(0x100):
    try:
        chr(i).encode('latin-1')
    except UnicodeError:
        print(" %s cannot encode %02x" % ('latin-1', i))
for i in range(0x100):
    try:
        b = bytes([i])
        b.decode('latin-1')
    except UnicodeError:
        print(" %s cannot decode %02x" % ('latin-1', i))
This patch adds code to force 'latin-1' as the encoding argument
of open() in function OpenLongFilePath() whenever the file is opened
in text mode. This solves the file decoding issue completely.
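To illustrate the effect of that rule on a concrete file, here is a sketch for Python 3. open_with_latin1 is a hypothetical stand-in that applies the same mode check as the patch but leaves out LongFilePath(), and the sample source line is invented for the demonstration:

```python
import os
import tempfile

def open_with_latin1(file_name, mode='r', buffering=-1):
    # Same rule as the patch: no encoding in binary mode, latin-1 otherwise.
    encoding = None if 'b' in mode else 'latin-1'
    return open(file_name, mode, buffering, encoding)

# A source line containing 0x92, the cp1252 right single quotation mark.
raw = b"// don\x92t care\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Text-mode read through the wrapper never raises.
    with open_with_latin1(path) as f:
        text = f.read()

    # The same file read as utf-8 fails on the 0x92 byte.
    try:
        with open(path, 'r', encoding='utf-8') as f:
            f.read()
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False
finally:
    os.remove(path)

print(repr(text), utf8_ok)
```

The decoded text holds U+0092 rather than a typographic apostrophe, which is harmless as long as the tool only needs to parse the ASCII parts or write the text back with the same encoding.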
Possibly related BZs:
https://bugzilla.tianocore.org/show_bug.cgi?id=1434
https://bugzilla.tianocore.org/show_bug.cgi?id=1637
https://bugzilla.tianocore.org/show_bug.cgi?id=2578
https://bugzilla.tianocore.org/show_bug.cgi?id=2709
https://bugzilla.tianocore.org/show_bug.cgi?id=2829
Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <gaoliming@byosoft.com.cn>
Cc: Yuwei Chen <yuwei.chen@intel.com>
Signed-off-by: Jian J Wang <jian.j.wang@intel.com>
---
BaseTools/Source/Python/Common/LongFilePathSupport.py | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/BaseTools/Source/Python/Common/LongFilePathSupport.py b/BaseTools/Source/Python/Common/LongFilePathSupport.py
index 38c4396544..c8dce077f2 100644
--- a/BaseTools/Source/Python/Common/LongFilePathSupport.py
+++ b/BaseTools/Source/Python/Common/LongFilePathSupport.py
@@ -30,7 +30,8 @@ def LongFilePath(FileName):
 # wrap open to support opening a long file path
 #
 def OpenLongFilePath(FileName, Mode='r', Buffer= -1):
-    return open(LongFilePath(FileName), Mode, Buffer)
+    Encoding = None if 'b' in Mode else 'latin-1'
+    return open(LongFilePath(FileName), Mode, Buffer, Encoding)
 
 def CodecOpenLongFilePath(Filename, Mode='rb', Encoding=None, Errors='strict', Buffering=1):
     return codecs.open(LongFilePath(Filename), Mode, Encoding, Errors, Buffering)
--
2.24.0.windows.2
This patch is incompatible with python2.

https://docs.python.org/2.7/library/functions.html#open

open(name[, mode[, buffering]])

In Python 2, open() has no encoding argument.

Thanks,
Bob

-----Original Message-----
From: fengyunhua <fengyunhua@byosoft.com.cn>
Sent: Monday, October 19, 2020 4:55 PM
To: devel@edk2.groups.io; Wang, Jian J <jian.j.wang@intel.com>
Cc: Feng, Bob C <bob.c.feng@intel.com>; 'Liming Gao' <gaoliming@byosoft.com.cn>; Chen, Christine <yuwei.chen@intel.com>
Subject: Re: [edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation

Tested-by: Yunhua Feng <fengyunhua@byosoft.com.cn>
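On the Python 2 incompatibility raised in this reply, one possible direction (an assumption on my part, not something proposed in the thread) is io.open(), which exists in Python 2.6 and later, accepts an encoding argument, and is the same function as the built-in open() in Python 3. The helper name below is hypothetical and LongFilePath() is left out to keep the sketch self-contained:

```python
import io
import os
import tempfile

def open_long_file_path_compat(file_name, mode='r', buffering=-1):
    # Hypothetical helper, not part of BaseTools.
    # io.open() takes the same positional arguments on Python 2.6+ and 3.
    encoding = None if 'b' in mode else 'latin-1'
    return io.open(file_name, mode, buffering, encoding)

# Quick check with a byte that is not valid utf-8 on its own.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open_long_file_path_compat(path, 'wb') as f:
        f.write(b'caf\xe9\n')
    with open_long_file_path_compat(path) as f:
        content = f.read()
finally:
    os.remove(path)

print(repr(content))
```

One caveat: under Python 2, io.open() in text mode returns unicode objects and expects unicode on write, so callers that pass plain str to write() would need adjusting. That is a behavior change the original one-line patch does not have to deal with on Python 3.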