[PyQt] QString API v2 concern...

Thu May 9 19:59:34 BST 2013

On Monday, May 06, 2013 07:49:25 AM Phil Thompson wrote:
> The first PyQt5 snapshots are now available. You will need the current SIP
> snapshot. PyQt5 can be installed alongside PyQt4.
> 
> I welcome any suggestions for additional changes - as PyQt5 is not
> intended to be compatible with PyQt4 it is an opportunity to fix and
> improve things.
> 
> Current changes from PyQt4:
> 
> - Versions of Python earlier than v2.6 are not supported.
> 
> - PyQt4 supported a number of different API versions (QString, QVariant
> etc.). PyQt5 only implements v2 of those APIs for all versions of Python.
> 

I haven't looked into this in depth, but I am a bit worried about the possible
performance impact of QString always being converted to a Python str/unicode
(not to mention the added porting work when moving code between C++ and Python).

The vast majority of the PyQt code that we use loads data from libraries that 
deal with Qt types, and either directly loads that data into widgets, or does 
some processing then loads the data into widgets.  I suspect that this kind of 
usage is very common.  

As an example, a user of QtSql with the qsqlpsql driver who loads data and
displays it in a list view is going to see the following data
transformations/copies:

PyQt4 with v1 QString api:

libpq data comes from socket
-> QString (probable utf8->utf16)
-> PyQt wrapper of QString (actual data not copied or converted)
-> QString (pointer dereference to get Qt type)

PyQt5, PyQt4 with v2 QString api:

libpq data comes from socket
-> QString (probable utf8->utf16)
-> unicode (deep copy of data)
-> QString (deep copy of data)

So instead of one conversion we now have one conversion and two deep copies.
Another very probable side effect is that in many cases the original QString
and/or the unicode object will be kept alive, leaving two or possibly even
three copies of the data in memory.  Even if everything but the last stage is
freed, there will still be two or three copies in memory during processing,
depending on how the code is written.  With large data sets that can reduce
performance quite a bit, because the extra copies push useful data out of the
CPU cache.
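
The v2 round trip described above can be sketched in plain Python. This is only a stand-in model and does not touch PyQt: a bytes object plays the role of QString's UTF-16 buffer, str plays the role of the Python unicode object, and the function names (qstring_from_wire, v2_round_trip, utf16_bytes_live) are made up for illustration. The size estimate assumes every character fits in one 16-bit unit.

```python
# Stand-in model only: bytes objects play the role of QString's UTF-16
# buffer and str plays the role of the Python unicode object.
# No PyQt code is involved; all names here are hypothetical.

def qstring_from_wire(wire_utf8):
    """Driver step: UTF-8 from the socket becomes a UTF-16 buffer."""
    return wire_utf8.decode("utf-8").encode("utf-16-le")

def v2_round_trip(qstring_buf):
    """QString -> unicode -> QString, as in the v2 flow above."""
    py_text = qstring_buf.decode("utf-16-le")   # deep copy no. 1
    widget_buf = py_text.encode("utf-16-le")    # deep copy no. 2
    return py_text, widget_buf

def utf16_bytes_live(n_chars, copies):
    """Rough payload size when `copies` UTF-16 copies are alive at once."""
    return n_chars * 2 * copies                 # assumes 2 bytes per character

wire = ("row data " * 1000).encode("utf-8")
source_buf = qstring_from_wire(wire)
py_text, widget_buf = v2_round_trip(source_buf)

print("one copy:     %d bytes" % len(source_buf))
print("three copies: %d bytes" % utf16_bytes_live(len(py_text), 3))
```

The point of the model is only that the second buffer has the same contents as the first but is a separate allocation, so while all three objects are referenced the payload is held three times.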

So far this is completely theoretical, and I'm sure that for a large portion of
applications it will have no noticeable effect.  However, I don't like the idea
that things may get permanently less efficient for apps that do process and
display larger data sets.

The one thing that stands out to me as a possible saving grace is that (at
least in my understanding) both Qt and Python use UTF-16 as their internal
string format.  That holds for narrow Python 2 builds; wide builds use UCS-4,
and Python 3.3+ picks a 1, 2 or 4 byte width per string.  Where the formats do
match, we get fast copies instead of slower conversions, and it may be possible
with some future Qt/Python changes to allow QString -> unicode -> QString
without any data copies.
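
A rough way to check the internal-format assumption on a given interpreter is to measure how much storage each extra character costs. The sketch below relies on sys.getsizeof, so it is CPython specific, and the helper name bytes_per_char is hypothetical. On CPython 3.3 or later the three samples should report 1, 2 and 4 bytes, which means only the middle case matches QString's 16-bit units.

```python
import sys

def bytes_per_char(sample_char, n=1000):
    """Estimate the per-character storage for a one-character sample.

    Compares two strings of different lengths so that the fixed object
    header cancels out. CPython-specific: relies on sys.getsizeof.
    """
    short = sample_char * n
    longer = sample_char * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // n

samples = [("ASCII", "a"), ("BMP", "\u20ac"), ("astral", "\U0001F600")]
for label, ch in samples:
    print(label, bytes_per_char(ch))
```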

At some point I will try to do some benchmarks and look into the actual code 
to see if there is an elegant solution to this potential problem.
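
As a starting point for such a benchmark, the standard timeit module can time the decode/encode pair in isolation. This is a minimal sketch that again uses a bytes buffer as a stand-in for QString, so it measures only Python's own UTF-16 codec and not the actual PyQt conversion code. The function name time_round_trip and the default sizes are assumptions chosen for illustration.

```python
import timeit

def time_round_trip(n_chars=100000, repeats=20):
    """Average seconds for one UTF-16 buffer -> str -> UTF-16 buffer trip."""
    buf = ("x" * n_chars).encode("utf-16-le")   # stand-in for a QString buffer

    def round_trip():
        return buf.decode("utf-16-le").encode("utf-16-le")

    total = timeit.timeit(round_trip, number=repeats)
    return total / repeats

print("seconds per round trip: %.6f" % time_round_trip())
```

A real measurement would need to replace round_trip with code that passes strings through an actual PyQt call, and compare the v1 and v2 APIs on the same data.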

Matt