[PyQt] Implicit latin1 encoding for QByteArray

Thu Oct 2 11:49:17 BST 2014

On 02/10/2014 5:52 am, Florian Bruhin wrote:
> Hi,
> 
> The PyQt documentation says:
> 
>     If Qt expects a QByteArray then PyQt5 will also accept a str that
>     contains only Latin-1 characters, or a bytes.
> 
> I don't know the background or rationale for this, but (at least with
> Python 3; things are probably different on Python 2) I'd have expected a
> TypeError for anything other than a bytes object.

In hindsight, I think that's what should happen. I think I was trying
to ease the transition from Python 2 to Python 3.
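To make the two policies concrete, here is a minimal pure-Python sketch. It does not use PyQt. `lenient_bytearray_data` and `strict_bytearray_data` are hypothetical helpers that only approximate how a str argument is treated where a QByteArray is expected. The latin-1 codec is taken from the error message quoted further down, and the strict helper raises the TypeError Florian expected.

```python
def lenient_bytearray_data(value):
    """Hypothetical approximation of the documented behaviour: accept bytes,
    or a str that contains only Latin-1 characters."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        # Implicit conversion: raises UnicodeEncodeError for any character
        # outside Latin-1, such as U+2713 CHECK MARK.
        return value.encode("latin-1")
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def strict_bytearray_data(value):
    """Hypothetical strict alternative: only bytes are accepted, mirroring
    Python 3's separation of bytes and str."""
    if not isinstance(value, bytes):
        raise TypeError("expected bytes, got %s" % type(value).__name__)
    return value


print(lenient_bytearray_data("caf\u00e9"))  # b'caf\xe9'
try:
    lenient_bytearray_data("ok \u2713")
except UnicodeEncodeError as exc:
    print(exc)  # the same kind of error as in the traceback below
try:
    strict_bytearray_data("ok \u2713")
except TypeError as exc:
    print(exc)  # expected bytes, got str
```

With the lenient policy the failure depends on the data (only non-Latin-1 text blows up), while the strict policy fails on the first str regardless of its content.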

> I got bitten by this in a rather obscure place, in the error page for
> a QWebPage (code just for illustration, not tested):
> 
> # ----
> 
> class WebPage(QWebPage):
> 
>     def extension(self, ext, opt, out):
>         if ext != QWebPage.ErrorPageExtension:
>             return False
>         errpage = sip.cast(out, QWebPage.ErrorPageExtensionReturn)
>         info = sip.cast(opt, QWebPage.ErrorPageExtensionOption)
>         url = info.url.toDisplayString()
>         errpage.content = some_code_that_renders_errpage(url)
>         return True
> 
> # ----

You shouldn't need to use sip.cast().

> I got a confusing exception when an error page was shown for a URL
> with UTF-8 characters in it:
> 
>     UnicodeEncodeError: 'latin-1' codec can't encode character
>     '\u2713' in position 213: ordinal not in range(256)
> 
> (with the traceback pointing at the errpage.content line).
> 
> If QByteArray had done what Python 3 does (bytes and str are
> incompatible), I'd have noticed my mistake right away; this way it
> exploded unexpectedly ;)
> 
> Of course this might be a breaking change, but I thought I'd propose
> it anyway ;)
> 
> Maybe another option would be to implicitly encode to UTF-8 for
> ErrorPageExtensionReturn::content, if that's possible? After all,
> ErrorPageExtensionReturn::encoding defaults to 'utf-8', so Qt is
> expecting UTF-8 data there. (Now that I think of it, that would also mean
> it would silently break with data that is valid Latin-1 but invalid UTF-8.)
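Whatever PyQt does implicitly, encoding explicitly keeps the bytes in step with the declared encoding. The sketch below is plain Python. `render_error_page` is a hypothetical stand-in for `some_code_that_renders_errpage`, and the bytes it produces are what would be assigned to `errpage.content`. The last lines show one reading of the caveat above: text that a caller expected to be stored as Latin-1, if encoded as UTF-8 instead and later read as Latin-1, turns into mojibake without any exception.

```python
import html


def render_error_page(url):
    """Hypothetical stand-in for some_code_that_renders_errpage(); returns str."""
    return "<html><body><p>Could not load %s</p></body></html>" % html.escape(url)


# Encode explicitly, using the same codec that is declared for the page
# (ErrorPageExtensionReturn::encoding defaults to 'utf-8').
declared_encoding = "utf-8"
content = render_error_page("http://example.com/\u2713").encode(declared_encoding)
print(content.decode(declared_encoding))  # round-trips without loss

# The silent failure: no exception is raised, the text is just wrong.
text = "caf\u00e9"
garbled = text.encode("utf-8").decode("latin-1")
print(garbled)  # cafÃ©
```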

It is a breaking change. I think I need to make better use of
deprecation warnings generally, so I may deprecate this behaviour in
v5.4 and remove it in v5.5. I may handle the unhandled exceptions issue
in the same way.
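A deprecation period like the one described could look roughly like this in pure Python. A Latin-1 str is still accepted for now, but a DeprecationWarning points callers at the implicit conversion before it becomes a TypeError. `coerce_bytearray_arg` is a hypothetical helper for illustration only, not PyQt or sip code.

```python
import warnings


def coerce_bytearray_arg(value):
    """Hypothetical transitional conversion: bytes pass through; str is still
    converted as Latin-1 but triggers a DeprecationWarning."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        warnings.warn(
            "implicit Latin-1 conversion of str to QByteArray is deprecated; "
            "pass bytes instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return value.encode("latin-1")
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning is often hidden by default
    result = coerce_bytearray_arg("abc")

print(result)                       # b'abc'
print(caught[0].category.__name__)  # DeprecationWarning
```

Because DeprecationWarning is filtered out by default in many contexts, users would typically only see it with `-W default` or under a test runner that enables it.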

Phil