[PyQt] Inconsistent pylupdate5 behaviour on UTF8 data

Giuseppe Corbelli corbelligiuseppe at mesdan.it
Wed Feb 12 15:27:09 GMT 2020


Hi all,
I found a puzzling inconsistency in pylupdate5 behaviour between the
Linux and Windows versions.
Scenario: I am extracting translatable strings from Python modules. The
files are saved as UTF-8; I run pylupdate5 and get different
representations in the XML output on the two platforms.

Both systems use pylupdate5 v5.14.1: the Debian package on Linux and a
fresh pip install in a venv on Windows 10.

As you can see in the attached test data:

- on Windows the 'ç' character (U+00E7, LATIN SMALL LETTER C WITH
  CEDILLA, UTF-8 bytes c3 a7) is converted to
  <source>this needs UTF8 encoding: &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>

- on Linux the same 'ç' is correctly converted to
  <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>

So it seems that on Windows each byte of the UTF-8 string is replaced
with its own XML numeric character reference, as if every byte were a
separate Unicode code point, while on Linux the reference is
(correctly) generated for the character itself (which is encoded as two
bytes in UTF-8).
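
The two outputs can be reproduced with a minimal Python sketch. It
assumes, without any confirmation from the pylupdate5 sources, that the
Windows build reads the UTF-8 bytes as a single-byte encoding (Latin-1
is used here as a stand-in). The test string 'ç°§' is inferred from the
three references in the Linux output, and xml_numeric_refs is a
hypothetical helper, not a pylupdate5 function.

```python
def xml_numeric_refs(text):
    """Render every non-ASCII character as an XML hex character reference."""
    return "".join(
        ch if ord(ch) < 128 else "&#x{:x};".format(ord(ch)) for ch in text
    )


# Test string inferred from the Linux output: U+00E7, U+00B0, U+00A7.
original = "\u00e7\u00b0\u00a7"
utf8_bytes = original.encode("utf-8")  # b'\xc3\xa7\xc2\xb0\xc2\xa7'

# Decoding the bytes as UTF-8 recovers the three original characters.
linux_like = xml_numeric_refs(utf8_bytes.decode("utf-8"))

# Assumption: decoding the same bytes as Latin-1 turns each byte into
# its own character, which doubles the number of references.
windows_like = xml_numeric_refs(utf8_bytes.decode("latin-1"))

print(linux_like)
print(windows_like)
```

If the assumption holds, the first line printed matches the Linux .ts
file and the second matches the Windows one.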

Am I doing something wrong?

Thanks
-- 
Giuseppe Corbelli
-------------- next part --------------
A non-text attachment was scrubbed...
Name: it_IT.ts.linux
Type: text/xml
Size: 504 bytes
Desc: not available
URL: <https://www.riverbankcomputing.com/pipermail/pyqt/attachments/20200212/ccc67519/attachment.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: it_IT.ts.win32
Type: text/xml
Size: 522 bytes
Desc: not available
URL: <https://www.riverbankcomputing.com/pipermail/pyqt/attachments/20200212/ccc67519/attachment-0001.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: module.py
Type: text/x-python
Size: 107 bytes
Desc: not available
URL: <https://www.riverbankcomputing.com/pipermail/pyqt/attachments/20200212/ccc67519/attachment.py>
-------------- next part --------------
CODECFORSRC = UTF-8

TRANSLATIONS = it_IT.ts

SOURCES = module.py


