[v2] decodetree: Open files with encoding='utf-8'

[PATCH v2] decodetree: Open files with encoding='utf-8'

Posted by Philippe Mathieu-Daudé 5 years, 1 month ago

When decodetree.py was added in commit 568ae7efae7, QEMU was
using Python 2, which happily reads UTF-8 files in text mode.
Python 3 requires either a UTF-8 locale or an explicit encoding
passed to open(). Now that Python 3 is required, explicitly
specify UTF-8 encoding for decodetree source files.

To avoid further problems with the user locale, also explicitly
specify UTF-8 encoding for the generated C files.

Make it explicit that both input and output are plain text by
using the 't' mode.

This fixes:

  $ /usr/bin/python3 scripts/decodetree.py test.decode
  Traceback (most recent call last):
    File "scripts/decodetree.py", line 1397, in <module>
      main()
    File "scripts/decodetree.py", line 1308, in main
      parse_file(f, toppat)
    File "scripts/decodetree.py", line 994, in parse_file
      for line in f:
    File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
      return codecs.ascii_decode(input, self.errors)[0]
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
  ordinal not in range(128)
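
For readers who want to see the failure in isolation, here is a
minimal sketch. It is not taken from decodetree.py: the file name and
its contents are hypothetical, and encoding='ascii' is passed by hand
as an assumption, to stand in for what open() selects under a C/POSIX
locale on the Python 3.6 shown in the traceback.

```python
import os
import tempfile

# Hypothetical stand-in for a .decode file with a non-ASCII character.
text = "# Author: Philippe Mathieu-Daudé\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.decode")
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))

    # Assumption: forcing 'ascii' imitates the locale-derived default.
    try:
        with open(path, "r", encoding="ascii") as f:
            f.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # An explicit encoding makes the read independent of the locale.
    with open(path, "rt", encoding="utf-8") as f:
        roundtrip = f.read()

print(ascii_failed, roundtrip == text)
```

The byte 0xc3 in the traceback is consistent with this: it is the
first byte of the UTF-8 encoding of 'é'.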

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
v2: utf-8 output too (Peter)
    explicit default text mode.
---
 scripts/decodetree.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 47aa9caf6d1..d3857066cfc 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -1304,7 +1304,7 @@ def main():
 
     for filename in args:
         input_file = filename
-        f = open(filename, 'r')
+        f = open(filename, 'rt', encoding='utf-8')
         parse_file(f, toppat)
         f.close()
 
@@ -1324,7 +1324,7 @@ def main():
         prop_size(stree)
 
     if output_file:
-        output_fd = open(output_file, 'w')
+        output_fd = open(output_file, 'wt', encoding='utf-8')
     else:
         output_fd = sys.stdout
 
-- 
2.26.2

Re: [PATCH v2] decodetree: Open files with encoding='utf-8'

Posted by Eduardo Habkost 5 years, 1 month ago

On Fri, Jan 08, 2021 at 07:09:52PM +0100, Philippe Mathieu-Daudé wrote:
> When decodetree.py was added in commit 568ae7efae7, QEMU was
> using Python 2, which happily reads UTF-8 files in text mode.
> Python 3 requires either a UTF-8 locale or an explicit encoding
> passed to open(). Now that Python 3 is required, explicitly
> specify UTF-8 encoding for decodetree source files.
> 
> To avoid further problems with the user locale, also explicitly
> specify UTF-8 encoding for the generated C files.
> 
> Make it explicit that both input and output are plain text by
> using the 't' mode.

I believe the 't' is unnecessary.  But it's harmless and makes it
more explicit.
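
A quick way to check this, as a sketch with a hypothetical temporary
file: text mode is the default for open(), so 'r' and 'rt' should
yield the same kind of object and the same decoded contents.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")  # hypothetical file
    with open(path, "wt", encoding="utf-8") as f:
        f.write("é\n")

    with open(path, "r", encoding="utf-8") as f_r, \
         open(path, "rt", encoding="utf-8") as f_rt:
        # Both handles are text-mode wrappers, not binary readers.
        both_text = (isinstance(f_r, io.TextIOWrapper)
                     and isinstance(f_rt, io.TextIOWrapper))
        same_content = f_r.read() == f_rt.read()

print(both_text, same_content)
```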

> 
> This fixes:
> 
>   $ /usr/bin/python3 scripts/decodetree.py test.decode
>   Traceback (most recent call last):
>     File "scripts/decodetree.py", line 1397, in <module>
>       main()
>     File "scripts/decodetree.py", line 1308, in main
>       parse_file(f, toppat)
>     File "scripts/decodetree.py", line 994, in parse_file
>       for line in f:
>     File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>       return codecs.ascii_decode(input, self.errors)[0]
>   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
>   ordinal not in range(128)
> 
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

However:

> ---
> v2: utf-8 output too (Peter)
>     explicit default text mode.
> ---
>  scripts/decodetree.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> index 47aa9caf6d1..d3857066cfc 100644
> --- a/scripts/decodetree.py
> +++ b/scripts/decodetree.py
> @@ -1304,7 +1304,7 @@ def main():
>  
>      for filename in args:
>          input_file = filename
> -        f = open(filename, 'r')
> +        f = open(filename, 'rt', encoding='utf-8')
>          parse_file(f, toppat)
>          f.close()
>  
> @@ -1324,7 +1324,7 @@ def main():
>          prop_size(stree)
>  
>      if output_file:
> -        output_fd = open(output_file, 'w')
> +        output_fd = open(output_file, 'wt', encoding='utf-8')
>      else:
>          output_fd = sys.stdout

This will still use the user locale encoding for sys.stdout.
This can be solved with:

    output_fd = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

(Based on a suggestion from Yonggang Luo)
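
As a sketch of what that wrapper does: utf8_writer() is a hypothetical
helper, not part of decodetree.py, and an io.BytesIO stands in for
sys.stdout.buffer so the encoded bytes can be inspected.

```python
import io

def utf8_writer(binary_stream):
    """Wrap a binary stream so text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")

# In the script this would be utf8_writer(sys.stdout.buffer), which
# also needs 'import sys'. BytesIO is used here only for inspection.
raw = io.BytesIO()
out = utf8_writer(raw)
out.write("/* Daudé */")
out.flush()
encoded = raw.getvalue()

print(encoded)
```

Note that closing the wrapper also closes the underlying buffer, so
with the real sys.stdout.buffer the wrapper should stay alive for as
long as stdout is needed.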

-- 
Eduardo

Re: [PATCH v2] decodetree: Open files with encoding='utf-8'

Posted by 罗勇刚 (Yonggang Luo) 5 years, 1 month ago

On Fri, Jan 8, 2021 at 10:58 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
>
> On Fri, Jan 08, 2021 at 07:09:52PM +0100, Philippe Mathieu-Daudé wrote:
> > When decodetree.py was added in commit 568ae7efae7, QEMU was
> > using Python 2, which happily reads UTF-8 files in text mode.
> > Python 3 requires either a UTF-8 locale or an explicit encoding
> > passed to open(). Now that Python 3 is required, explicitly
> > specify UTF-8 encoding for decodetree source files.
> >
> > To avoid further problems with the user locale, also explicitly
> > specify UTF-8 encoding for the generated C files.
> >
> > Make it explicit that both input and output are plain text by
> > using the 't' mode.
>
> I believe the 't' is unnecessary.  But it's harmless and makes it
> more explicit.
>
> >
> > This fixes:
> >
> >   $ /usr/bin/python3 scripts/decodetree.py test.decode
> >   Traceback (most recent call last):
> >     File "scripts/decodetree.py", line 1397, in <module>
> >       main()
> >     File "scripts/decodetree.py", line 1308, in main
> >       parse_file(f, toppat)
> >     File "scripts/decodetree.py", line 994, in parse_file
> >       for line in f:
> >     File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
> >       return codecs.ascii_decode(input, self.errors)[0]
> >   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
> >   ordinal not in range(128)
> >
> > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>
> However:
>
> > ---
> > v2: utf-8 output too (Peter)
> >     explicit default text mode.
> > ---
> >  scripts/decodetree.py | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> > index 47aa9caf6d1..d3857066cfc 100644
> > --- a/scripts/decodetree.py
> > +++ b/scripts/decodetree.py
> > @@ -1304,7 +1304,7 @@ def main():
> >
> >      for filename in args:
> >          input_file = filename
> > -        f = open(filename, 'r')
> > +        f = open(filename, 'rt', encoding='utf-8')
> >          parse_file(f, toppat)
> >          f.close()
> >
> > @@ -1324,7 +1324,7 @@ def main():
> >          prop_size(stree)
> >
> >      if output_file:
> > -        output_fd = open(output_file, 'w')
> > +        output_fd = open(output_file, 'wt', encoding='utf-8')

I misunderstood the cause; this is a better way.

> >      else:
> >          output_fd = sys.stdout
>
> This will still use the user locale encoding for sys.stdout.  Can
> be solved with:
>
>     output_fd = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

For output to a console/terminal, I suggest using

   sys.stdout = io.TextIOWrapper(sys.stdout.buffer,
                                 encoding=sys.stdout.encoding,
                                 errors="ignore")

That way, if the console/terminal encoding cannot represent a
character from the decodetree file, the script still won't fail.
That failure cannot be fixed by other means.
The errors="ignore" part is important: in my experience, there are
even characters that cannot be represented in UTF-8.
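
To illustrate the difference errors="ignore" makes, here is a sketch
under stated assumptions: an io.BytesIO stands in for
sys.stdout.buffer, and 'ascii' stands in for a terminal encoding that
cannot represent the character.

```python
import io

# Lossy writer: characters the encoding cannot represent are dropped.
lossy_raw = io.BytesIO()
lossy_out = io.TextIOWrapper(lossy_raw, encoding="ascii",
                             errors="ignore")
lossy_out.write("Daudé")
lossy_out.flush()
lossy_bytes = lossy_raw.getvalue()

# Strict writer (the default error handler): the same write raises.
strict_raw = io.BytesIO()
strict_out = io.TextIOWrapper(strict_raw, encoding="ascii")
try:
    strict_out.write("Daudé")
    strict_out.flush()
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(lossy_bytes, strict_failed)
```

The trade-off is that the unrepresentable character is silently
omitted from the output rather than reported.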


>
> (Based on a suggestion from Yonggang Luo)
>
> --
> Eduardo
>


--
Yours sincerely,
罗勇刚 (Yonggang Luo)