[PyQt] Inconsistent pylupdate5 behaviour on UTF8 data

Giuseppe Corbelli corbelligiuseppe at mesdan.it
Tue Feb 18 10:37:43 GMT 2020

On 2/16/20 2:01 PM, Phil Thompson wrote:
> On 12/02/2020 15:27, Giuseppe Corbelli wrote:
>> Hi all
>> I found a puzzling pylupdate5 behaviour inconsistency between Linux
>> and Windows versions.
>> Scenario: I am extracting translatable strings from python modules.
>> The files are saved as UTF8, I run pylupdate and get different
>> representations in the XML output.
>> pylupdate5 v5.14.1 as Debian package on Linux and fresh pip install in
>> a venv on Windows 10.
>> As you can find in the attached test data:
>> - on Windows the 'ç' character (U+00E7, UTF-8 bytes c3 a7, LATIN SMALL
>> LETTER C WITH CEDILLA) is converted to <source>this needs UTF8 encoding:
>> &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>
>> - on Linux the same 'ç' correctly converts to <source>this needs UTF8
>> encoding: &#xe7;&#xb0;&#xa7;</source>
>> So it seems that on Windows each byte of the UTF-8 string is replaced
>> with its own XML numeric character reference, as if it were a code point,
>> while on Linux the reference is (correctly) for the character itself
>> (which takes two bytes in UTF-8).
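Both outputs quoted above can be reproduced with a short Python sketch. It only illustrates the two escaping strategies and is not pylupdate's code. The sample string `ç°§` is inferred from the Linux output, and the function names are hypothetical.

```python
# Illustration only: two ways of turning text into XML numeric character
# references. This is NOT taken from pylupdate.

def refs_per_code_point(text: str) -> str:
    """Escape each Unicode code point (matches the Linux output)."""
    return "".join(f"&#x{ord(ch):x};" for ch in text)


def refs_per_utf8_byte(text: str) -> str:
    """Escape each UTF-8 byte as if it were a code point (matches the Windows output)."""
    return "".join(f"&#x{b:x};" for b in text.encode("utf-8"))


sample = "ç°§"
print(refs_per_code_point(sample))  # &#xe7;&#xb0;&#xa7;
print(refs_per_utf8_byte(sample))   # &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;

# Decoding the UTF-8 bytes as Latin-1 first gives the same result as the
# per-byte escaping: the mojibake string "Ã§Â°Â§".
print(refs_per_code_point(sample.encode("utf-8").decode("latin-1")))
```

In other words, the Windows output is exactly what you get when the UTF-8 bytes are read back as Latin-1 before being escaped.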
>> Am I doing something wrong?
> I can't reproduce this - I get identical results on Windows, Linux and 
> macOS.
> If you want to try and debug your own installation then look at 
> evilBytes() in qpy\pylupdate\metatranslator.cpp

It turns out that something goes wrong when the existing XML is re-parsed
(or maybe it is something else that escapes me). The same dataset as in my
previous email applies.

This is what happens if you run pylupdate (5.14.1) twice in a row on
a Windows 10 box:

(venv_latest) C:\devel\Dynamometer\Supervisor\norms>pylupdate5 -verbose 
Updating 'locale/it_IT.ts'...
     Found 2 source texts (2 new and 0 already existing)

(venv_latest) C:\devel\Dynamometer\Supervisor\norms>pylupdate5 -verbose 
Updating 'locale/it_IT.ts'...
     Found 2 source texts (1 new and 1 already existing)
     Kept 0 obsolete translations
     Removed 1 obsolete untranslated entry

The second time, the UTF-8 entry gets screwed up.
Everything is fine on Linux with the same pylupdate version.
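The counts from the second run fit a simple pattern: if a string read back from the .ts file no longer compares equal to the one extracted from the source, a merge keyed on the source text sees one new entry plus one obsolete one. The sketch below models only that, under two loudly stated assumptions: entries are matched by exact source text, and a hypothetical `lossy_reparse()` decodes UTF-8 bytes as Latin-1. Neither is taken from pylupdate's implementation; the sketch reproduces the counts, not the actual cause.

```python
# Hypothetical model of a .ts merge, NOT pylupdate's implementation:
# entries are matched by their exact source text.

def merge(existing: list[str], extracted: list[str]) -> dict[str, int]:
    """Count new, already existing and obsolete entries."""
    known = set(existing)
    found = set(extracted)
    return {
        "new": len(found - known),
        "existing": len(found & known),
        "obsolete": len(known - found),
    }


def lossy_reparse(text: str) -> str:
    """Assumed bug for illustration: UTF-8 bytes read back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


sources = ["plain ASCII text", "this needs UTF8 encoding: ç°§"]

# First run: the .ts file is empty, so both strings are new.
print(merge([], sources))  # {'new': 2, 'existing': 0, 'obsolete': 0}

# Second run: the saved entries are re-parsed with the lossy step,
# so only the ASCII string still matches its source.
reread = [lossy_reparse(s) for s in sources]
print(merge(reread, sources))  # {'new': 1, 'existing': 1, 'obsolete': 1}
```

Any transformation that alters non-ASCII text on the way back in would produce the same counts, which is why the ASCII-only entry survives untouched.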

Digging some more...

Giuseppe Corbelli
