[PyQt] QString API v2 concern...

Sat May 11 17:12:59 BST 2013

On Thu, 9 May 2013 11:59:34 -0700, Matt Newell <newellm at blur.com> wrote:
> On Monday, May 06, 2013 07:49:25 AM Phil Thompson wrote:
>> The first PyQt5 snapshots are now available. You will need the current
>> SIP snapshot. PyQt5 can be installed alongside PyQt4.
>> 
>> I welcome any suggestions for additional changes - as PyQt5 is not
>> intended to be compatible with PyQt4 it is an opportunity to fix and
>> improve things.
>> 
>> Current changes from PyQt4:
>> 
>> - Versions of Python earlier than v2.6 are not supported.
>> 
>> - PyQt4 supported a number of different API versions (QString, QVariant
>> etc.). PyQt5 only implements v2 of those APIs for all versions of
>> Python.
>> 
> 
> I haven't looked into this deeper but I am a bit worried about the
> possible performance impacts of QString always being converted to a
> python str/unicode.  (Not to mention the added porting work when going
> c++ <-> python).
> 
> The vast majority of the PyQt code that we use loads data from libraries
> that deal with Qt types, and either directly loads that data into
> widgets, or does some processing then loads the data into widgets.  I
> suspect that this kind of usage is very common.
> 
> As an example a user of QtSql with the qsqlpsql driver that loads data
> and displays it in a list view is going to see the following data
> transformations/copies:
> 
> PyQt4 with v1 QString api:
> 
> libpq data comes from socket
> -> QString (probable utf8->utf16)
> -> PyQt wrapper of QString (actual data not copied or converted)
> -> QString (pointer dereference to get Qt type)
> 
> PyQt5, PyQt4 with v2 QString api:
> 
> libpq data comes from socket
> -> QString (probable utf8->utf16)
> -> unicode (deep copy of data)
> -> QString (deep copy of data)
> 
> So instead of one conversion we now have one conversion and two deep
> copies.  Another very probable side-effect is that in many cases either
> the original QString and/or the unicode object will be held in memory,
> resulting in two or possibly even three copies of the data.  Even if all
> but the last stage is freed, there will still be 2 or 3 copies in memory
> during processing depending on how the code is written, which can reduce
> performance quite a bit depending on data size because of cpu cache
> flushing.
> 
> So far this is completely theoretical, and I'm sure that in a large
> portion of applications it will have no noticeable effect. However, I
> don't like the idea that things may get permanently less efficient for
> apps that do process and display larger data sets.
> 
> The one thing that stands out to me as possibly being a saving grace is
> the fact that (at least in my understanding) both Qt and python use
> utf16 as their internal string format, which means fast copies instead
> of slower conversions, and that it may be possible with some future
> Qt/python changes to actually allow QString -> unicode -> QString
> without any data copies.
> 
> At some point I will try to do some benchmarks and look into the actual
> code to see if there is an elegant solution to this potential problem.

The v2 API was first considered for PyQt3. It was rejected because of
performance concerns - those concerns were never validated. Python3 has
always defaulted to the v2 API - a period of 4 years - and I've never seen
any complaints about it.

So, yes, you need to show there is a real problem.

Phil
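
For anyone who wants to put rough numbers on the concern discussed in
this thread, here is a minimal sketch of a timing harness. It does not
use PyQt at all: UTF-16 byte buffers stand in for QString data, handing
the buffer on untouched stands in for the v1 wrapper path, and a decode
followed by an encode stands in for the two deep copies of the v2 path.
Those stand-ins, and all of the function names, are assumptions made
for illustration, so the results say nothing definitive about real
PyQt behaviour.

```python
import timeit


def make_rows(n_rows, row_len=64):
    """Build UTF-16 buffers standing in for QString data (an assumption)."""
    text = "x" * row_len
    return [text.encode("utf-16-le") for _ in range(n_rows)]


def pass_through(rows):
    """Stand-in for the v1 path: each buffer is handed on without copying."""
    return [buf for buf in rows]


def round_trip(rows):
    """Stand-in for the v2 path: buffer -> str (copy) -> buffer (copy)."""
    return [buf.decode("utf-16-le").encode("utf-16-le") for buf in rows]


def compare(n_rows=10000, repeats=5):
    """Return the best-of-N wall time in seconds for each path."""
    rows = make_rows(n_rows)
    t_ref = min(timeit.repeat(lambda: pass_through(rows),
                              number=1, repeat=repeats))
    t_copy = min(timeit.repeat(lambda: round_trip(rows),
                               number=1, repeat=repeats))
    return t_ref, t_copy


if __name__ == "__main__":
    t_ref, t_copy = compare()
    print("pass-through: %.6f s, round trip: %.6f s" % (t_ref, t_copy))
```

A real measurement would swap the stand-ins for actual QString-returning
calls (a QtSql query feeding a view, say) and compare the v1 and v2 APIs
under PyQt4, where both are available.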