[PyQt] Inconsistent pylupdate5 behaviour on UTF8 data

Phil Thompson phil at riverbankcomputing.com
Sun Feb 16 13:01:29 GMT 2020


On 12/02/2020 15:27, Giuseppe Corbelli wrote:
> Hi all
> I found a puzzling pylupdate5 behaviour inconsistency between Linux
> and Windows versions.
> Scenario: I am extracting translatable strings from Python modules.
> The files are saved as UTF-8; I run pylupdate5 and get different
> representations in the XML output.
> 
> pylupdate5 v5.14.1 as Debian package on Linux and fresh pip install in
> a venv on Windows 10.
> 
> As you can find in the attached test data:
> 
> - on Windows the 'ç' character (U+00E7, UTF-8 bytes c3 a7, LATIN SMALL
> LETTER C WITH CEDILLA) is converted to <source>this needs UTF8 encoding:
> &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>
> 
> - on Linux the same 'ç' correctly converts to <source>this needs UTF8
> encoding: &#xe7;&#xb0;&#xa7;</source>
> 
> So it seems that on Windows each byte of the UTF-8 string is treated as
> a separate character and written as an XML numeric character reference,
> while on Linux the same applies (correctly) to the character itself
> (encoded as two bytes in UTF-8).
> 
> Am I doing something wrong?

I can't reproduce this - I get identical results on Windows, Linux and 
macOS.

If you want to try and debug your own installation then look at 
evilBytes() in qpy\pylupdate\metatranslator.cpp

Phil
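
For anyone comparing their own output, the two <source> values quoted above
can be reproduced in plain Python. This is only a sketch of the byte-versus-
character distinction described in the report, not pylupdate5's actual code:
to_char_refs() is a hypothetical helper, and the test string "ç°§" is inferred
from the Linux output (&#xe7;&#xb0;&#xa7;).

```python
def to_char_refs(text):
    """Write every non-ASCII character as an XML numeric character reference.

    Hypothetical helper for illustration; not part of PyQt or pylupdate5.
    """
    return "".join(
        ch if ord(ch) < 128 else "&#x{:x};".format(ord(ch)) for ch in text
    )


# Test string inferred from the quoted Linux output: U+00E7, U+00B0, U+00A7.
source = "\u00e7\u00b0\u00a7"
utf8_bytes = source.encode("utf-8")  # b'\xc3\xa7\xc2\xb0\xc2\xa7'

# Bytes decoded as UTF-8: one reference per character (the Linux result).
correct = to_char_refs(utf8_bytes.decode("utf-8"))

# Each byte taken as its own character (Latin-1): one reference per byte
# (the Windows result).
mangled = to_char_refs(utf8_bytes.decode("latin-1"))

print(correct)  # &#xe7;&#xb0;&#xa7;
print(mangled)  # &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;
```

If the .ts file shows the second form, the source bytes were most likely
handled one byte at a time somewhere along the way rather than decoded as
UTF-8 first.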

