[PyQt] UnicodeDecodeError with output from Windows OS command
Florian Bruhin
me at the-compiler.org
Thu Nov 30 11:18:27 GMT 2017
On Thu, Nov 30, 2017 at 11:02:57AM +0000, J Barchan wrote:
> While this has worked fine under Linux, when a user runs my Qt app under
> Windows and issues a perfectly normal robocopy command under a standard UK
> Windows with really nothing special/unusual going on with filenames, he
> gets:
>
> > Unhandled Exception:
> >
> > 'utf-8' codec can't decode byte 0x9c in position 32: invalid start byte
> >
> > <class 'UnicodeDecodeError'>
> > File "C:\HJinn\widgets\messageboxes.py", line 289, in
> > processReadyReadStandardOutput
> > output = output.data().decode('utf-8')
Are you sure robocopy's output is utf-8 on Windows?
I'd assume it to be Windows-1252, and there, that'd be an œ character.
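To see what that byte means under different encodings, here is a small sketch. The helper name `decode_with_each` is made up for illustration, and cp850 is included only on the assumption that a UK console might be using that OEM code page; nothing in the traceback confirms which code page robocopy actually wrote.

```python
def decode_with_each(raw, codecs):
    """Map each codec name to the decoded text, or None if decoding fails."""
    results = {}
    for codec in codecs:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            # Strict decoding refuses bytes that are invalid in this codec.
            results[codec] = None
    return results


# The single byte from the traceback, tried against three candidate codecs.
results = decode_with_each(b"\x9c", ["utf-8", "cp1252", "cp850"])
```

Here `results["utf-8"]` is `None` (the same failure as in the traceback), `results["cp1252"]` is 'œ' and `results["cp850"]` is '£', so the same byte reads differently depending on which encoding you pick.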
> I have over the years written, say, Windows C programs using standard
> Windows SDK calls for this kind of "redirector". I simply grab the output
> of a sub-process and throw it at whatever the native Windows SetTextEdit()
> function is, and all has always been fine.
>
> Note that I have *never* had to guess/decode/convert bytes to some text
> encoding, and this has worked across all platforms forever. So I really
> don't expect to have to do so now, unless there's something going on in
> Qt/PyQt which is fundamentally any different.
Things like stdin/stdout are binary, not text. There are essentially three
approaches to take here:
- Treat everything as binary. That's (afaik) what e.g. 'grep' does, because it
has a binary input, but also a binary commandline argument to search for.
- Implicitly convert stuff between text and binary. This is the approach the
Windows API (probably) and Python 2 take, but while things appear to
"work", you'll get hard-to-detect issues (like garbled output).
- Fail loudly instead of silently, which is what Python 3 generally does.
IMHO, that's a very good thing ;-)
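A rough sketch of the three approaches on the same bytes. The sample data is made up, and the "implicit" case is only imitated here by decoding with a wrong codec, since Python 3 doesn't convert implicitly:

```python
# UTF-8 bytes for a pound sign followed by "5" (illustrative data only).
raw = "\u00a35".encode("utf-8")  # b'\xc2\xa35'

# 1. Treat everything as binary: search for bytes inside bytes, never decode.
found = b"5" in raw

# 2. Convert with a wrong guess: no error is raised, but the text is garbled.
garbled = raw.decode("cp1252")

# 3. Fail loudly: a strict decode with an unsuitable codec raises at once.
try:
    raw.decode("ascii")
    failed_loudly = False
except UnicodeDecodeError:
    failed_loudly = True
```

`garbled` ends up as 'Â£5' without any warning, which is exactly the kind of issue that goes unnoticed until a user complains, while the strict decode sets `failed_loudly` to `True`.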
> So... can you please tell me how under Python/PyQt I can just display the
> output from an OS command (assuming "text-type" output, I don't expect
> arbitrary binary bytes) without the slightest chance of any kind of "I
> can't convert" Exception, please?
You might want to take a look at this first, to understand the background:
https://nedbatchelder.com/text/unipain.html
http://kunststube.net/encoding/
Solutions I can think of:
- Tell robocopy to output UTF-8
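- Use the *correct* encoding. Your best bet is probably using
locale.getpreferredencoding(False) and hoping that robocopy outputs in that
encoding. (sys.getfilesystemencoding() is not a good guess here: on Windows
it returns 'utf-8' from Python 3.6 on.)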
- Pass an error handler such as 'replace' to .decode() and get replacement
characters (U+FFFD, usually rendered as question marks) instead. Users with
non-ASCII locales are probably not going to like you.
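The last two options can be combined, roughly like this. It's a minimal sketch, not a definitive fix: `decode_process_output` is a hypothetical helper name, and the default of locale.getpreferredencoding(False) is only a guess at what the child process wrote (on Python 3.6+ sys.getfilesystemencoding() reports 'utf-8' on Windows, so it wouldn't help here). Whether robocopy really uses that code page is not something this thread confirms, so the encoding can be overridden.

```python
import locale
from typing import Optional


def decode_process_output(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode child-process output without ever raising UnicodeDecodeError.

    `encoding` defaults to the locale's preferred encoding, which is an
    assumption about what the child process emitted, not a guarantee.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    return raw.decode(encoding, errors="replace")
```

In the slot from the traceback this would be something like `output = decode_process_output(output.data())`, or with an explicit guess such as `decode_process_output(output.data(), "cp850")` if the console code page turns out to be the one in use.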
Florian
--
https://www.qutebrowser.org | me at the-compiler.org (Mail/XMPP)
GPG: 916E B0C8 FD55 A072 | https://the-compiler.org/pubkey.asc
I love long mails! | https://email.is-not-s.ms/