[PyQt] QString API v2 concern...
Matt Newell
newellm at blur.com
Thu May 9 19:59:34 BST 2013
On Monday, May 06, 2013 07:49:25 AM Phil Thompson wrote:
> The first PyQt5 snapshots are now available. You will need the current SIP
> snapshot. PyQt5 can be installed alongside PyQt4.
>
> I welcome any suggestions for additional changes - as PyQt5 is not
> intended to be compatible with PyQt4 it is an opportunity to fix and
> improve things.
>
> Current changes from PyQt4:
>
> - Versions of Python earlier than v2.6 are not supported.
>
> - PyQt4 supported a number of different API versions (QString, QVariant
> etc.). PyQt5 only implements v2 of those APIs for all versions of Python.
>
I haven't looked into this in depth, but I am a bit worried about the possible
performance impact of QString always being converted to a Python str/unicode
(not to mention the added porting work when moving code between C++ and Python).
The vast majority of the PyQt code that we use loads data from libraries that
deal with Qt types, and either loads that data directly into widgets, or does
some processing and then loads the data into widgets. I suspect that this kind
of usage is very common.
As an example, a user of QtSql with the qsqlpsql driver who loads data and
displays it in a list view is going to see the following data
transformations/copies:
PyQt4 with v1 QString API:
    libpq data comes from socket
    -> QString (probable UTF-8 -> UTF-16)
    -> PyQt wrapper of QString (actual data not copied or converted)
    -> QString (pointer dereference to get Qt type)
PyQt5, or PyQt4 with v2 QString API:
    libpq data comes from socket
    -> QString (probable UTF-8 -> UTF-16)
    -> unicode (deep copy of data)
    -> QString (deep copy of data)
So instead of one conversion we now have one conversion and two deep copies.
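
The v2 chain can be mimicked without PyQt. What follows is only a rough
sketch: it assumes QString's storage can be modelled as a UTF-16 byte buffer,
and the helper names (qstring_from_wire, to_python, to_qstring) are made up
for illustration. They are not PyQt or Qt functions.

```python
# Stand-in for the v2 round trip; no PyQt required.
# Assumption: a QString is modelled as a bytes object of UTF-16 code units.

def qstring_from_wire(raw_utf8):
    """libpq bytes -> 'QString' (UTF-8 decoded, stored as UTF-16)."""
    return raw_utf8.decode("utf-8").encode("utf-16-le")

def to_python(qstring_buf):
    """'QString' -> Python unicode object (new object, data copied)."""
    return qstring_buf.decode("utf-16-le")

def to_qstring(text):
    """Python unicode object -> 'QString' (another new buffer)."""
    return text.encode("utf-16-le")

raw = u"caf\u00e9 \u2603".encode("utf-8")   # hypothetical data off the socket
original = qstring_from_wire(raw)           # the one real conversion
py_text = to_python(original)               # deep copy 1
widget_copy = to_qstring(py_text)           # deep copy 2

# Same contents, but a separate buffer from the one we started with.
round_trip_ok = widget_copy == original
distinct_buffers = widget_copy is not original
```

With the v1 API the last two steps would instead hand the same QString along,
so nothing comparable to py_text or widget_copy would be allocated.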
Another very probable side effect is that in many cases the original QString
and/or the unicode object will be kept in memory, resulting in two or possibly
even three copies of the data. Even if everything but the last stage is freed,
there will still be two or three copies in memory during processing, depending
on how the code is written. Depending on data size, that can reduce performance
quite a bit because of the added pressure on the CPU cache.
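
To put rough numbers on the "two or three copies" case, here is a small
accounting sketch. It again assumes a QString can be modelled as a UTF-16
byte buffer, and the payload is a made-up stand-in for a large text column.

```python
import sys

# Assumption: bytes objects of UTF-16 code units stand in for QStrings.
source_qstring = (u"row data " * 100000).encode("utf-16-le")  # held by the SQL layer
text = source_qstring.decode("utf-16-le")                     # Python unicode object
widget_qstring = text.encode("utf-16-le")                     # handed to the widget

sizes = [
    ("source QString", len(source_qstring)),
    ("unicode object", sys.getsizeof(text)),
    ("widget QString", len(widget_qstring)),
]
total_bytes = sum(size for _, size in sizes)

for name, size in sizes:
    print("%-15s %10d bytes" % (name, size))
print("%-15s %10d bytes" % ("total", total_bytes))
```

The size reported for the unicode object depends on the Python version and
build, so only the two buffer sizes are fixed here.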
So far this is completely theoretical, and I'm sure that in a large portion of
applications it will have no noticeable effect. However, I don't like the idea
that things may get permanently less efficient for apps that do process and
display larger data sets.
The one thing that stands out to me as a possible saving grace is the fact
that (at least in my understanding) both Qt and Python use UTF-16 as their
internal string format, which means fast copies instead of slower conversions.
It may also be possible, with some future Qt/Python changes, to allow
QString -> unicode -> QString without any data copies at all.
At some point I will try to do some benchmarks and look into the actual code
to see if there is an elegant solution to this potential problem.
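
As a starting point, a timing harness could look something like the sketch
below. It does not touch PyQt at all: pass_through and round_trip are
hypothetical stand-ins for the v1 and v2 behaviour, so the numbers only show
the cost of Python's own UTF-16 codec, not of the real bindings.

```python
import timeit

# Hypothetical payload; real results depend on actual data and on PyQt itself.
payload = (u"row data " * 10000).encode("utf-16-le")

def pass_through(buf):
    # Models the v1 API: the wrapper hands the same buffer along.
    return buf

def round_trip(buf):
    # Models the v2 API: one copy out to unicode, one copy back.
    return buf.decode("utf-16-le").encode("utf-16-le")

runs = 200
t_pass = timeit.timeit(lambda: pass_through(payload), number=runs)
t_trip = timeit.timeit(lambda: round_trip(payload), number=runs)

print("pass-through: %.6f s for %d runs" % (t_pass, runs))
print("round trip:   %.6f s for %d runs" % (t_trip, runs))
```

A real benchmark would replace both functions with calls through PyQt4 under
each QString API version and use data sizes typical of the application.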
Matt