When decodetree.py was added in commit 568ae7efae7, QEMU was
using Python 2, which happily reads UTF-8 files in text mode.
Python 3 requires either a UTF-8 locale or an explicit encoding
passed to open(). Now that Python 3 is required, pass an explicit
UTF-8 encoding when opening decodetree source files.
To avoid further problems with the user locale, also pass an
explicit UTF-8 encoding when writing the generated C files.
Make it explicit that both input and output are plain text by
using the 't' mode.
This fixes:
$ /usr/bin/python3 scripts/decodetree.py test.decode
Traceback (most recent call last):
File "scripts/decodetree.py", line 1397, in <module>
main()
File "scripts/decodetree.py", line 1308, in main
parse_file(f, toppat)
File "scripts/decodetree.py", line 994, in parse_file
for line in f:
File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
ordinal not in range(128)
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
v2: utf-8 output too (Peter)
explicit default text mode.
---
scripts/decodetree.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 47aa9caf6d1..d3857066cfc 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -1304,7 +1304,7 @@ def main():
for filename in args:
input_file = filename
- f = open(filename, 'r')
+ f = open(filename, 'rt', encoding='utf-8')
parse_file(f, toppat)
f.close()
@@ -1324,7 +1324,7 @@ def main():
prop_size(stree)
if output_file:
- output_fd = open(output_file, 'w')
+ output_fd = open(output_file, 'wt', encoding='utf-8')
else:
output_fd = sys.stdout
--
2.26.2
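The failure in the traceback above can be reproduced without touching the user locale. The following is a minimal sketch, not part of the patch: the helper names `read_lines` and `demo` and the sample file content are hypothetical, and `encoding='ascii'` is used here only to simulate what Python 3 picks under an ASCII (C/POSIX) locale.

```python
import os
import tempfile


def read_lines(path, encoding):
    """Read all lines of a text file using an explicit encoding."""
    with open(path, 'rt', encoding=encoding) as f:
        return list(f)


def demo():
    # Hypothetical stand-in for a .decode file with a non-ASCII comment.
    text = "# Author: Daud\u00e9\n"
    fd, path = tempfile.mkstemp(suffix='.decode')
    try:
        with os.fdopen(fd, 'wb') as f:
            # In UTF-8 the 'é' is stored as the two bytes 0xc3 0xa9.
            f.write(text.encode('utf-8'))
        try:
            # Simulates open(filename, 'r') under an ASCII locale.
            read_lines(path, 'ascii')
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True
        # The patched call: the encoding no longer depends on the locale.
        utf8_lines = read_lines(path, 'utf-8')
    finally:
        os.remove(path)
    return ascii_failed, utf8_lines


if __name__ == '__main__':
    print(demo())
```

The byte 0xc3 reported in the traceback is consistent with the first byte of a two-byte UTF-8 sequence such as the one for 'é', which the ASCII codec rejects.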
On Fri, Jan 08, 2021 at 07:09:52PM +0100, Philippe Mathieu-Daudé wrote:
> When decodetree.py was added in commit 568ae7efae7, QEMU was
> using Python 2 which happily reads UTF-8 files in text mode.
> Python 3 requires either UTF-8 locale or an explicit encoding
> passed to open(). Now that Python 3 is required, explicit
> UTF-8 encoding for decodetree source files.
>
> To avoid further problems with the user locale, also explicit
> UTF-8 encoding for the generated C files.
>
> Explicit both input/output are plain text by using the 't' mode.

I believe the 't' is unnecessary. But it's harmless and makes it
more explicit.

>
> This fixes:
>
> $ /usr/bin/python3 scripts/decodetree.py test.decode
> Traceback (most recent call last):
> File "scripts/decodetree.py", line 1397, in <module>
> main()
> File "scripts/decodetree.py", line 1308, in main
> parse_file(f, toppat)
> File "scripts/decodetree.py", line 994, in parse_file
> for line in f:
> File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
> return codecs.ascii_decode(input, self.errors)[0]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
> ordinal not in range(128)
>
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

However:

> ---
> v2: utf-8 output too (Peter)
> explicit default text mode.
> ---
> scripts/decodetree.py | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> index 47aa9caf6d1..d3857066cfc 100644
> --- a/scripts/decodetree.py
> +++ b/scripts/decodetree.py
> @@ -1304,7 +1304,7 @@ def main():
>
> for filename in args:
> input_file = filename
> - f = open(filename, 'r')
> + f = open(filename, 'rt', encoding='utf-8')
> parse_file(f, toppat)
> f.close()
>
> @@ -1324,7 +1324,7 @@ def main():
> prop_size(stree)
>
> if output_file:
> - output_fd = open(output_file, 'w')
> + output_fd = open(output_file, 'wt', encoding='utf-8')
> else:
> output_fd = sys.stdout

This will still use the user locale encoding for sys.stdout. Can
be solved with:

  output_fd = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

(Based on a suggestion from Yonggang Luo)

--
Eduardo
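The sys.stdout suggestion above could be folded into the output selection roughly as follows. This is only a sketch under stated assumptions: `open_output` and its `stdout_buffer` parameter are hypothetical names introduced here so the behaviour can be checked against an in-memory buffer, and they do not appear in decodetree.py.

```python
import io
import sys


def open_output(output_file=None, stdout_buffer=None):
    """Return a text stream that encodes as UTF-8 regardless of locale.

    Both parameters are hypothetical names for this sketch;
    stdout_buffer defaults to the real sys.stdout.buffer.
    """
    if output_file:
        return open(output_file, 'wt', encoding='utf-8')
    if stdout_buffer is None:
        stdout_buffer = sys.stdout.buffer
    # Wrap the binary stream so the locale encoding is never consulted.
    return io.TextIOWrapper(stdout_buffer, encoding='utf-8')


def demo():
    raw = io.BytesIO()            # stands in for sys.stdout.buffer
    out = open_output(stdout_buffer=raw)
    out.write("/* Daud\u00e9 */")
    out.flush()                   # push the encoded bytes into raw
    data = raw.getvalue()
    out.detach()                  # release raw without closing it
    return data


if __name__ == '__main__':
    print(demo())
```

One caveat with this approach: closing the wrapper (or letting it be garbage-collected) also closes the underlying buffer, so when the real sys.stdout.buffer is wrapped it is probably safer to flush the wrapper than to close it.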
On Fri, Jan 8, 2021 at 10:58 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
>
> On Fri, Jan 08, 2021 at 07:09:52PM +0100, Philippe Mathieu-Daudé wrote:
> > When decodetree.py was added in commit 568ae7efae7, QEMU was
> > using Python 2 which happily reads UTF-8 files in text mode.
> > Python 3 requires either UTF-8 locale or an explicit encoding
> > passed to open(). Now that Python 3 is required, explicit
> > UTF-8 encoding for decodetree source files.
> >
> > To avoid further problems with the user locale, also explicit
> > UTF-8 encoding for the generated C files.
> >
> > Explicit both input/output are plain text by using the 't' mode.
>
> I believe the 't' is unnecessary. But it's harmless and makes it
> more explicit.
>
> >
> > This fixes:
> >
> > $ /usr/bin/python3 scripts/decodetree.py test.decode
> > Traceback (most recent call last):
> > File "scripts/decodetree.py", line 1397, in <module>
> > main()
> > File "scripts/decodetree.py", line 1308, in main
> > parse_file(f, toppat)
> > File "scripts/decodetree.py", line 994, in parse_file
> > for line in f:
> > File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
> > return codecs.ascii_decode(input, self.errors)[0]
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
> > ordinal not in range(128)
> >
> > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>
> However:
>
> > ---
> > v2: utf-8 output too (Peter)
> > explicit default text mode.
> > ---
> > scripts/decodetree.py | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> > index 47aa9caf6d1..d3857066cfc 100644
> > --- a/scripts/decodetree.py
> > +++ b/scripts/decodetree.py
> > @@ -1304,7 +1304,7 @@ def main():
> >
> > for filename in args:
> > input_file = filename
> > - f = open(filename, 'r')
> > + f = open(filename, 'rt', encoding='utf-8')
> > parse_file(f, toppat)
> > f.close()
> >
> > @@ -1324,7 +1324,7 @@ def main():
> > prop_size(stree)
> >
> > if output_file:
> > - output_fd = open(output_file, 'w')
> > + output_fd = open(output_file, 'wt', encoding='utf-8')

I misunderstood the cause; this is a better way.

> > else:
> > output_fd = sys.stdout
>
> This will still use the user locale encoding for sys.stdout. Can
> be solved with:
>
> output_fd = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

For output to a console/terminal, I suggest using:

  sys.stdout = io.TextIOWrapper(sys.stdout.buffer,
                                encoding=sys.stdout.encoding,
                                errors="ignore")

That way, even when the console/terminal encoding cannot represent a
character in the decodetree output, the script will not fail. That
failure cannot be fixed by other means.

errors="ignore" is important: in my experience, there are even
characters that cannot be represented in UTF-8.

>
> (Based on a suggestion from Yonggang Luo)
>
> --
> Eduardo
>

--
Yours sincerely,
Yonggang Luo (罗勇刚)
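The effect of errors="ignore" can be seen with an in-memory buffer in place of a real terminal. A minimal sketch, assuming an ASCII-only terminal encoding for illustration; `encode_via_wrapper` and `demo` are hypothetical helper names, not code from the thread.

```python
import io


def encode_via_wrapper(text, encoding, errors):
    """Write text through a TextIOWrapper and return the bytes produced."""
    raw = io.BytesIO()            # stands in for sys.stdout.buffer
    out = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    out.write(text)
    out.flush()
    return raw.getvalue()


def demo():
    text = "Daud\u00e9"
    try:
        # Default error handling: an unrepresentable character raises.
        encode_via_wrapper(text, 'ascii', 'strict')
        strict_failed = False
    except UnicodeEncodeError:
        strict_failed = True
    # errors="ignore": the unrepresentable character is silently dropped.
    ignored = encode_via_wrapper(text, 'ascii', 'ignore')
    return strict_failed, ignored


if __name__ == '__main__':
    print(demo())
```

The trade-off is that the script keeps running but the output silently loses the characters the terminal encoding cannot represent, which matters less for a console than for a generated C file.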