[PyQt] Inconsistent pylupdate5 behaviour on UTF8 data

Sun Feb 16 13:01:29 GMT 2020

On 12/02/2020 15:27, Giuseppe Corbelli wrote:
> Hi all
> I found a puzzling inconsistency in pylupdate5 behaviour between the
> Linux and Windows versions.
> Scenario: I am extracting translatable strings from Python modules.
> The files are saved as UTF8, I run pylupdate and get different
> representations in the XML output.
> 
> I am using pylupdate5 v5.14.1, installed from the Debian package on
> Linux and from a fresh pip install in a venv on Windows 10.
> 
> As you can see in the attached test data:
> 
> - on Windows the 'ç' character (U+00E7, UTF-8 bytes c3 a7, LATIN SMALL
> LETTER C WITH CEDILLA) is converted to <source>this needs UTF8 encoding:
> &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>
> 
> - on Linux the same 'ç' is correctly converted to <source>this needs UTF8
> encoding: &#xe7;&#xb0;&#xa7;</source>
> 
> So it seems that on Windows each byte of the UTF-8 string is replaced
> with its Unicode code point as an XML numeric character reference,
> while on Linux the same is done (correctly) for the character itself
> (which is formed by two bytes in UTF-8).
> 
> Am I doing something wrong?

I can't reproduce this - I get identical results on Windows, Linux and 
macOS.

If you want to try and debug your own installation then look at 
evilBytes() in qpy\pylupdate\metatranslator.cpp
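
For comparison, the two outputs quoted above can be reproduced in plain
Python by decoding the same UTF-8 bytes in two different ways. This is
only a sketch under assumptions: the sample text "ç°§" is inferred from
the quoted character references (the attached test data is not shown
here), numeric_refs() is a hypothetical helper, and none of this is
taken from what pylupdate5 or evilBytes() actually does.

```python
def numeric_refs(text):
    """Replace each non-ASCII character with an XML hex character reference.

    Hypothetical helper for illustration only; not pylupdate5 code.
    """
    return "".join(
        ch if ord(ch) < 0x80 else "&#x{:x};".format(ord(ch)) for ch in text
    )


# Assumed sample text: U+00E7, U+00B0, U+00A7 (inferred from the report).
source = "\u00e7\u00b0\u00a7"
utf8_bytes = source.encode("utf-8")  # b'\xc3\xa7\xc2\xb0\xc2\xa7'

# Bytes decoded as UTF-8: one reference per character.
correct = numeric_refs(utf8_bytes.decode("utf-8"))

# Same bytes decoded as Latin-1: one reference per byte.
misdecoded = numeric_refs(utf8_bytes.decode("latin-1"))

print(correct)     # &#xe7;&#xb0;&#xa7;
print(misdecoded)  # &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;
```

The first line matches the Linux result and the second matches the
Windows result, which would be consistent with the source bytes being
read as an 8-bit encoding somewhere on the Windows side. That is a guess
at the symptom, not a diagnosis.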

Phil