[PyQt] Inconsistent pylupdate5 behaviour on UTF8 data

Giuseppe Corbelli corbelligiuseppe at mesdan.it
Wed Feb 19 09:06:50 GMT 2020


On 2/18/20 5:58 PM, Phil Thompson wrote:
> What if you use trUtf8() instead of tr()?

I explored all the combinations I could think of on Windows 10, with PyQt 
5.14.1 from pip and Linguist 5.13.2, and I could NOT find any working 
combination. The test results are included below. Rather lengthy and 
boring, I fear.

If a gist is preferable:
https://gist.github.com/cowo78/26057f575ddfa3ee20a0b636acd894ff


Section A - using trUtf8() in code
===============================================================================
Using trUtf8() I ALWAYS get a 'Non-ASCII character detected in trUtf8 
string' warning.

Case 1 - NOT working
-------------------------------------------------------------------------------
trUtf8()
# CODECFORSRC = UTF-8
# CODECFORTR = UTF-8

Message created:
<message encoding="UTF-8">
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: ç°§</source>
     <translation type="unfinished"></translation>
</message>

Repeated pylupdate5 runs are OK, the same message is consistently generated.

Processed by linguist 5.13.2:
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: ç°§</source>
     <translation>UTF8</translation>
</message>

Reprocessed by pylupdate5
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>
     <translation type="obsolete">UTF8</translation>
</message>
<message encoding="UTF-8">
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: ç°§</source>
     <translation type="unfinished"></translation>
</message>
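
For comparison, the "obsolete" entry and the new "unfinished" entry above
hold the same source text at the XML level: &#xe7;&#xb0;&#xa7; are numeric
character references for ç, ° and §. A minimal sketch using Python's
standard xml.etree.ElementTree (not pylupdate5's own parser, whose
internals are not shown here) illustrates this. It only shows that the two
<source> elements parse to the same string; what remains different between
the entries is the encoding="UTF-8" attribute.

```python
import xml.etree.ElementTree as ET

# <source> as rewritten after the Linguist round trip (character references)
escaped = ET.fromstring(
    "<source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>"
)

# <source> as generated by pylupdate5 (literal characters)
literal = ET.fromstring("<source>this needs UTF8 encoding: ç°§</source>")

# Both forms decode to the same text once the XML is parsed.
same_text = escaped.text == literal.text
print(same_text)
```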


Case 2 - NOT working
-------------------------------------------------------------------------------
trUtf8()
CODECFORSRC = UTF-8
# CODECFORTR = UTF-8

Message created the FIRST time and subsequent ODD runs
<message encoding="UTF-8">
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: 簧</source>
     <translation type="unfinished"></translation>
</message>

Message created the SECOND time and subsequent EVEN runs
<message encoding="UTF-8">
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: ç°§</source>
     <translation type="unfinished"></translation>
</message>
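
The two alternating strings are related by a byte-level reinterpretation.
A minimal sketch, assuming the literal in translations_for_testsuite.py is
the three characters ç°§ (the test file itself is not reproduced in this
message): their code points, taken as raw bytes, form a valid UTF-8
sequence for a single CJK character. This only illustrates how the two
strings relate; it makes no claim about what pylupdate5 does internally.

```python
source = "ç°§"  # assumed content of the string literal

# Code points U+00E7, U+00B0, U+00A7 written out as single bytes (Latin-1)
raw = source.encode("latin-1")

# Those three bytes are also the UTF-8 encoding of one character, U+7C27
reinterpreted = raw.decode("utf-8")

print(raw.hex(), len(reinterpreted), hex(ord(reinterpreted)))
```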


Case 3 - NOT working
-------------------------------------------------------------------------------
trUtf8()
# CODECFORSRC = UTF-8
CODECFORTR = UTF-8

Message created:
<message encoding="UTF-8">
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: ç°§</source>
     <translation type="unfinished"></translation>
</message>

Repeated pylupdate5 runs are OK, the same message is consistently generated.

Processed by linguist 5.13.2:
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: ç°§</source>
     <translation>utf8</translation>
</message>

Reprocessed by pylupdate5
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>
     <translation type="obsolete">utf8</translation>
</message>
<message encoding="UTF-8">
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: ç°§</source>
     <translation type="unfinished"></translation>
</message>


Case 4 - NOT working
-------------------------------------------------------------------------------
trUtf8()
CODECFORSRC = UTF-8
CODECFORTR = UTF-8

Message created:
<message encoding="UTF-8">
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: ç°§</source>
     <translation type="unfinished"></translation>
</message>

Repeated pylupdate5 runs are OK, the same message is consistently generated.

Processed by linguist 5.13.2:
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: ç°§</source>
     <translation>utf8</translation>
</message>

Reprocessed by pylupdate5:
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>
     <translation type="obsolete">utf8</translation>
</message>
<message encoding="UTF-8">
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: ç°§</source>
     <translation type="unfinished"></translation>
</message>


Section B - using tr() in code
===============================================================================
Case 1 - NOT working
-------------------------------------------------------------------------------
tr()
# CODECFORSRC = UTF-8
# CODECFORTR = UTF-8

Message created:
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>
     <translation type="unfinished"></translation>
</message>

Repeated runs OK.

Linguist shows WRONG characters as the source is incorrectly formatted.
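
The six character references in the message above are consistent with the
UTF-8 bytes of ç°§ being read one byte per character. A minimal sketch,
assuming the literal in the test file is ç°§ and making no claim about
pylupdate5's internals:

```python
source = "ç°§"  # assumed content of the string literal

# UTF-8 encodes each of these three characters as two bytes.
utf8_bytes = source.encode("utf-8")

# Reading those bytes as Latin-1 yields six characters instead of three.
misread = utf8_bytes.decode("latin-1")

# Render them as hexadecimal character references, as in the .ts file.
refs = "".join(f"&#x{ord(ch):x};" for ch in misread)
print(refs)
```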





Case 2 - NOT working
-------------------------------------------------------------------------------
tr()
CODECFORSRC = UTF-8
# CODECFORTR = UTF-8

Message created the FIRST time and subsequent ODD runs
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>
     <translation type="unfinished"></translation>
</message>

Message created the SECOND time and subsequent EVEN runs
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>
     <translation type="unfinished"></translation>
</message>





Case 3 - NOT working
-------------------------------------------------------------------------------
tr()
# CODECFORSRC = UTF-8
CODECFORTR = UTF-8

Message created:
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>
     <translation type="unfinished"></translation>
</message>

Linguist shows WRONG characters as the source is incorrectly formatted.





Case 4 - NOT working
-------------------------------------------------------------------------------
tr()
CODECFORSRC = UTF-8
CODECFORTR = UTF-8

Message created:
<message encoding="UTF-8">
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: ç°§</source>
     <translation type="unfinished"></translation>
</message>

Repeated pylupdate5 runs are OK, the same message is consistently generated.

Processed by linguist 5.13.2:
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: ç°§</source>
     <translation>utf8</translation>
</message>

Reprocessed by pylupdate5:
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>
     <translation>utf8</translation>
</message>

Reprocessed by pylupdate5 on subsequent runs:
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>
     <translation type="obsolete">utf8</translation>
</message>
<message>
     <location filename="../translations_for_testsuite.py" line="6"/>
     <source>this needs UTF8 encoding: &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>
     <translation type="unfinished"></translation>
</message>



Those who survived until here must be brave.

-- 
Giuseppe Corbelli

