[PyQt] UnicodeDecodeError with output from Windows OS command

Thu Nov 30 11:18:27 GMT 2017

On Thu, Nov 30, 2017 at 11:02:57AM +0000, J Barchan wrote:
> While this has worked fine under Linux, when a user runs my Qt app under
> Windows and issues a perfectly normal robocopy command under a standard UK
> Windows with really nothing special/unusual going on with filenames, he
> gets:
> 
> > Unhandled Exception:
> >
> > 'utf-8' codec can't decode byte 0x9c in position 32: invalid start byte
> >
> > <class 'UnicodeDecodeError'>
> > File "C:\HJinn\widgets\messageboxes.py", line 289, in
> > processReadyReadStandardOutput
> > output = output.data().decode('utf-8')

Are you sure robocopy's output is UTF-8 on Windows?
I'd assume it to be Windows-1252, where 0x9c would be an œ character (or, if
robocopy writes in the console's OEM code page, e.g. cp850 on UK systems, a £).
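One way to check such a guess is to decode the offending byte under a few candidate code pages and see which result is plausible. This is a minimal sketch using only the standard library. The helper name candidate_decodings is hypothetical, and including cp850 is an assumption: it is the usual OEM (console) code page on UK Windows.

```python
def candidate_decodings(raw, candidates=("cp1252", "cp850", "utf-8")):
    """Hypothetical helper: decode raw under each candidate encoding.

    Returns a dict mapping encoding name to the decoded text, or to an
    "error: ..." string if strict decoding fails.
    """
    results = {}
    for encoding in candidates:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError as exc:
            # UTF-8 rejects 0x9c: it is a continuation byte, not a start byte.
            results[encoding] = "error: " + exc.reason
    return results


# The byte from the traceback above.
for encoding, text in candidate_decodings(b"\x9c").items():
    print(encoding, "->", text)
```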

> I have over the years written, say, Windows C programs using standard
> Windows SDK calls for this kind of "redirector".  I simply grab the output
> of a sub-process and throw it at whatever the native Windows SetTextEdit()
> function is, and all has always been fine.
> 
> Note that I have *never* had to guess/decode/convert bytes to some text
> encoding, and this has worked across all platforms forever.  So I really
> don't expect to have to do so now, unless there's something going on in
> Qt/PyQt which is fundamentally any different.

At the OS level, things like stdin/stdout carry bytes, not text. There are
essentially three approaches to take here:

- Treat everything as binary. That's (AFAIK) what e.g. grep does: its input is
  binary, and so is the pattern passed on the command line.

- Implicitly convert stuff between text and binary. This is the approach the
  Windows API (probably) and Python 2 take; things appear to "work", but you
  get hard-to-detect issues (like garbled output).

- Fail loudly instead of silently, which is what Python 3 generally does.
  IMHO, that's a very good thing ;-)
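The three approaches can be contrasted on the same bytes. This is a minimal sketch. The sample line is hypothetical: a filename containing œ, encoded as Windows-1252. Latin-1 stands in for an arbitrary wrong guess in the implicit-conversion case, because it maps every byte to some character and therefore never fails.

```python
# Hypothetical robocopy-style output line, encoded as Windows-1252 (œ -> 0x9c).
raw = "Copying c\u0153ur.txt\r\n".encode("cp1252")

# 1. Treat everything as binary: search with a bytes pattern, never decode.
found = b"ur.txt" in raw

# 2. Implicit conversion with a wrong guess: Latin-1 accepts any byte, so this
#    "works", but it yields the invisible control character U+009C, not œ.
garbled = raw.decode("latin-1")

# 3. Fail loudly: strict UTF-8 decoding raises instead of guessing.
try:
    raw.decode("utf-8")
    failed_loudly = False
except UnicodeDecodeError:
    failed_loudly = True

print(found, repr(garbled), failed_loudly)
```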

> So... can you please tell me how under Python/PyQt I can just display the
> output from an OS command (assuming "text-type" output, I don't expect
> arbitrary binary bytes) without the slightest chance of any kind of "I
> can't convert" Exception, please?

You might want to take a look at these first to understand the background:
https://nedbatchelder.com/text/unipain.html
http://kunststube.net/encoding/

Solutions I can think of:

- Tell robocopy to output UTF-8
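- Use the *correct* encoding. Your best bet is probably using
  locale.getpreferredencoding(False) (what subprocess uses for text mode) and
  hoping that robocopy outputs in that encoding. Note that
  sys.getfilesystemencoding() returns 'utf-8' on Windows since Python 3.6, so
  it won't help here.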
- Pass an error handler such as 'replace' to .decode() and get question marks
  instead. Users with non-ASCII locales are probably not going to like you.
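The last two suggestions can be combined. A slot like processReadyReadStandardOutput could try strict decoding with likely encodings first and fall back to replacement characters only as a last resort. This is a minimal sketch, assuming the input is the bytes object returned by QByteArray.data() as in the traceback above. The helper name decode_output and the order of candidate encodings are illustrative choices, not a PyQt API. The sketch uses locale.getpreferredencoding(False), which on Windows returns the ANSI code page (e.g. cp1252). That may still differ from what a console program actually writes.

```python
import locale


def decode_output(raw, encodings=None):
    """Hypothetical helper: decode process output bytes without raising.

    Tries each candidate encoding strictly, then falls back to UTF-8 with
    errors="replace", which turns undecodable bytes into U+FFFD.
    """
    if encodings is None:
        # UTF-8 first (non-UTF-8 text rarely decodes as valid UTF-8 by
        # accident), then the locale's preferred encoding.
        encodings = ["utf-8", locale.getpreferredencoding(False)]
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # bad bytes for this codec, or unknown codec name
    return raw.decode("utf-8", errors="replace")


# In the slot from the traceback this would be used roughly as:
#     output = decode_output(output.data())
print(decode_output("c\u0153ur".encode("cp1252"), ["utf-8", "cp1252"]))
```

One caveat: readyReadStandardOutput can deliver output in arbitrary chunks, so a UTF-8 character split across two chunks would fail the strict attempt. If that matters, an incremental decoder from codecs.getincrementaldecoder("utf-8")() keeps the partial sequence between calls.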

Florian

-- 
https://www.qutebrowser.org  | me at the-compiler.org (Mail/XMPP)
   GPG: 916E B0C8 FD55 A072  | https://the-compiler.org/pubkey.asc
         I love long mails!  | https://email.is-not-s.ms/