[v2] Update check-python-tox test for pylint 2.10

[PATCH v2 0/1] Update check-python-tox test for pylint 2.10

Posted by John Snow 4 years, 4 months ago

V2: It's not safe to use sys.stderr.encoding to determine a "console
encoding", because that uses the "current" stderr and not a
hypothetically generic one -- and doing this causes the acceptance tests
to fail.

Use UTF-8 instead.
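
As a rough illustration of why `sys.stderr.encoding` describes only the stream that is currently installed, and not the console in general, the sketch below pushes the same text through two text wrappers with different encodings. The helper name `encode_through` is hypothetical and is not part of the patch; the `io.BytesIO` object merely stands in for the pipe.

```python
import io


def encode_through(text: str, encoding: str) -> bytes:
    """Write text through a text wrapper and return the bytes that reach the pipe."""
    raw = io.BytesIO()  # stand-in for the stdout/stderr pipe (an assumption)
    wrapper = io.TextIOWrapper(raw, encoding=encoding, newline="")
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # stop the wrapper from closing the buffer later
    return data
```

The same character yields different bytes depending on the wrapper, so the encoding belongs to the wrapper rather than to the pipe underneath it.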

Question: What encoding do terminal programs use? Is there an inherent
encoding to fprintf et al, or does it just push whatever bytes you put
into it straight into the stdout/stderr pipe?

John Snow (1):
  python: Update for pylint 2.10

 python/qemu/machine/machine.py | 3 ++-
 python/setup.cfg               | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

-- 
2.31.1

Re: [PATCH v2 0/1] Update check-python-tox test for pylint 2.10

Posted by Daniel P. Berrangé 4 years, 4 months ago

On Wed, Sep 15, 2021 at 01:30:10AM -0400, John Snow wrote:
> V2: It's not safe to use sys.stderr.encoding to determine a "console
> encoding", because that uses the "current" stderr and not a
> hypothetically generic one -- and doing this causes the acceptance tests
> to fail.
> 
> Use UTF-8 instead.
> 
> Question: What encoding do terminal programs use? Is there an inherent
> encoding to fprintf et al, or does it just push whatever bytes you put
> into it straight into the stdout/stderr pipe?

Programs are expected to output data in the encoding that is set in
the various env variables LC_ALL/LC_CTYPE/LANG.

In traditional end user scenarios this almost always means UTF-8 charset.

There's plenty of cases which end up with the C locale though, which
would mean 7-bit ASCII on Linux, though apps are supposed to be 8-bit
clean and allow data with the high bit set to pass through without
interpretation. The latter is what python3 gets very wrong, complaining
if you output 8-bit high data in the C locale.

There is increasing support for a C.UTF-8 locale to bring it closer to
other locales which are all UTF-8.

On macOS the C locale has been UTF-8 by default for as long as I can recall.

Windows is a whole other world of fun and IIRC isn't UTF-8 by default,
but I don't recall details.
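
A minimal sketch of the precedence among those variables, assuming the usual POSIX rule that LC_ALL overrides LC_CTYPE, which overrides LANG, and that an empty value counts as unset. The function names `effective_ctype` and `codeset_of` are hypothetical, and real locale resolution in libc does more than this (aliases, default codesets per locale).

```python
def effective_ctype(env):
    """Return the locale name that governs character encoding for this environment."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # an empty string is treated the same as unset
            return value
    return "C"  # nothing set: fall back to the C locale


def codeset_of(locale_name):
    """Extract the codeset from a name like 'en_US.UTF-8@euro', or None if absent."""
    base = locale_name.split("@", 1)[0]  # drop any @modifier
    if "." in base:
        return base.split(".", 1)[1]
    return None
```

For example, `effective_ctype(dict(os.environ))` would give the name that applies to the current process, and `codeset_of("C.UTF-8")` gives `"UTF-8"`, while plain `"C"` carries no explicit codeset.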

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH v2 0/1] Update check-python-tox test for pylint 2.10

Posted by John Snow 4 years, 4 months ago

On Wed, Sep 15, 2021 at 5:10 AM Daniel P. Berrangé <berrange@redhat.com>
wrote:

> On Wed, Sep 15, 2021 at 01:30:10AM -0400, John Snow wrote:
> > V2: It's not safe to use sys.stderr.encoding to determine a "console
> > encoding", because that uses the "current" stderr and not a
> > hypothetically generic one -- and doing this causes the acceptance tests
> > to fail.
> >
> > Use UTF-8 instead.
> >
> > Question: What encoding do terminal programs use? Is there an inherent
> > encoding to fprintf et al, or does it just push whatever bytes you put
> > into it straight into the stdout/stderr pipe?
>
> Programs are expected to output data in the encoding that is set in
> the various env variables LC_ALL/LC_CTYPE/LANG.
>
> In traditional end user scenarios this almost always means UTF-8 charset.
>
> There's plenty of cases which end up with the C locale though, which
> would mean 7-bit ASCII on Linux, though apps are supposed to be 8-bit
> clean and allow data with the high bit set to pass through without
> interpretation. The latter is what python3 gets very wrong, complaining
> if you output 8-bit high data in the C locale.
>
> There is increasing support for a C.UTF-8 locale to bring it closer to
> other locales which are all UTF-8.
>
> On macOS the C locale has been UTF-8 by default for as long as I can recall.
>
> Windows is a whole other world of fun and IIRC isn't UTF-8 by default,
> but I don't recall details.
>
Hm, I believe I can use `lang, encoding = locale.getlocale()` in this case;
it should follow LC_CTYPE, so it ought to accurately match the console
output from QEMU.
I'll respin, actually. We don't test the Python packages on Windows, but I
see no reason to introduce a nasty timebomb.
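
Roughly what I have in mind, as a sketch only: the helper names `console_encoding` and `decode_console` are made up for illustration and are not what the respin will necessarily contain. `locale.getlocale()` can return `(None, None)`, so this assumes a UTF-8 fallback is acceptable.

```python
import locale


def console_encoding(default="utf-8"):
    """Best-effort guess at the console encoding from the LC_CTYPE locale."""
    try:
        _lang, encoding = locale.getlocale()
    except ValueError:
        # Some platforms report locale names that Python cannot parse.
        return default
    return encoding or default  # encoding is None when no codeset is known


def decode_console(data, default="utf-8"):
    """Decode captured console bytes, replacing anything undecodable."""
    try:
        return data.decode(console_encoding(default), errors="replace")
    except LookupError:
        # The locale named a codec Python does not know about.
        return data.decode(default, errors="replace")
```

Using `errors="replace"` means stray bytes in the console log show up as replacement characters instead of raising while the log is being read.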

Thanks!
--js