[PyQt] Sorting / Normalizing Unicode text (Myanmar)

Timothy W. Grove tim_grove at sil.org
Mon Oct 10 13:07:35 BST 2016


I had the following feedback from someone testing my application with 
Myanmar text. In some cases a user can type the same word into a 
QTextEdit using different character orders and the text still displays 
correctly, but only one of those orders sorts correctly when a number 
of words are listed in a QTreeWidget. A user shouldn't be forced to 
input the characters in any particular order, so some sort of 
normalization routine seems to be needed. Is this something that Qt 
(PyQt) should do automatically, or is it up to me to create something 
custom? Thanks for any advice.
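
One way to check whether standard Unicode normalization would already 
merge the alternate orders is to run the three pairs from the feedback 
below through Python's unicodedata module. This is only a sketch: the 
helper name normalization_merges and the PAIRS list are made up for 
illustration. Canonical reordering only moves marks with a non-zero 
combining class, and the characters in these pairs all have class 0, so 
NFC should leave both orders as distinct strings.

```python
import unicodedata

# The three alternate orderings quoted in the tester's feedback,
# as (first order, second order).
PAIRS = [
    ("\u102D\u102F", "\u102F\u102D"),
    ("\u103C\u1015", "\u1015\u103C"),
    ("\u1031\u1000", "\u1000\u1031"),
]


def normalization_merges(a, b, form="NFC"):
    """Return True if the given normalization form maps both strings
    to the same text (hypothetical helper, not part of Qt or PyQt)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


for first, second in PAIRS:
    # Canonical combining class of each character in the pair.
    classes = [unicodedata.combining(ch) for ch in first]
    print(classes, normalization_merges(first, second))
```

If the second column prints False for every pair, a normalization form 
applied by the toolkit would not fix the ordering, and a custom step 
would still be needed.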

Best regards,
Tim


User feedback:

However, we did find a problem with the OSX 10.11.1 machine in a 
different area where the OSX 10.9.5 machine performed better.
The OSX 10.11.1 machine allowed us to enter sequences of Myanmar 
characters in two alternate orders.
For example:
U+102D U+102F  OR U+102F U+102D
U+103C U+1015  OR U+1015 U+103C
U+1031 U+1000  OR U+1000 U+1031
(in each case, the second order is the "legal" order in which the codes 
must be stored in order to sort properly)
Both keying orders resulted in text that displayed properly, but the 
"illegal orders" did not sort in their proper alphabetical order. When 
we exported the project back to the OSX 10.9.5 machine the illegal 
orders did not display properly. The OSX 10.9.5 machine does not allow 
one to type the "illegal order": it shows the characters either out of 
order or with a placeholder circle separating them to indicate that 
there is a problem.
This issue on the OSX 10.11.1 machine wouldn't be a problem if the 
typist always used the correct typing order. But since the Burmese 
script is not strictly linear, this is difficult for Deaf users to 
learn, as there are no visual clues (only auditory ones) about which 
character to type first, and previous legacy font keyboards have used 
various orders, so the typists don't have an ingrained habit. In some 
cases the Unicode 
system specifies that either keying order can be accepted but the 
"illegal" one needs to be automatically normalized and stored as the 
"legal" order to facilitate consistency and proper sorting and searching.
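
A custom normalization step along these lines could be sketched as a 
small table-driven rewrite applied before text is stored or compared. 
Everything here is an assumption for illustration: the names REORDER, 
normalize_myanmar and sort_words are hypothetical, and the table holds 
only the two consonant pairs from the examples above. The vowel-sign 
pair (U+102D, U+102F) is left out because its storage order should 
first be checked against the Myanmar chapter of the Unicode Standard. 
A real routine would need to cover every consonant, not just U+1015 
and U+1000.

```python
import re

# Typed order -> stored order, taken from the tester's examples.
# Illustrative only: a complete table would cover all consonants.
REORDER = {
    "\u103C\u1015": "\u1015\u103C",  # medial RA typed before PA
    "\u1031\u1000": "\u1000\u1031",  # vowel sign E typed before KA
}

_PATTERN = re.compile("|".join(re.escape(key) for key in REORDER))


def normalize_myanmar(text):
    """Rewrite each known out-of-order pair into its stored order."""
    return _PATTERN.sub(lambda match: REORDER[match.group(0)], text)


def sort_words(words):
    """Sort words by their normalized form. Plain code point comparison
    stands in for whatever comparison the list widget really uses."""
    return sorted(words, key=normalize_myanmar)
```

Applying normalize_myanmar once, when the text leaves the QTextEdit, 
would keep the stored data consistent for sorting and searching. Using 
it only as a sort key, as in sort_words, fixes the list order but 
leaves the stored text unchanged.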


