[PyQt] Not able to traverse the DOM in a QWebView after reload

Kovid Goyal kovid at kovidgoyal.net
Sat Oct 4 04:35:53 BST 2014


You need to spin the event loop between actions. The best way to do that
is to connect to the loadFinished signal and only do your DOM
modifications there.
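
A minimal sketch of that approach, assuming PyQt4 import paths and a
Python 3 / API v2 setup where attribute() returns a plain str. The names
PATTERN, find_matching, on_load_finished and the "data-seen" attribute are
made up for illustration; your real pattern (self.__exrpat) and doStuff()
are not shown in your mail, so substitute them:

```python
import re

# Hypothetical pattern standing in for self.__exrpat from the original post.
PATTERN = re.compile(r".*\.exr$")


def find_matching(parent, pattern, found=None):
    """Depth-first walk over QWebElement-like nodes, collecting every
    element whose data-original attribute matches the pattern."""
    if found is None:
        found = []
    element = parent.firstChild()
    while not element.isNull():
        if pattern.match(element.attribute("data-original")):
            found.append(element)
        find_matching(element, pattern, found)
        element = element.nextSibling()
    return found


def main():
    # Qt imports are kept local so the traversal helper above can be
    # exercised without a Qt installation. PyQt4 paths are an assumption.
    import sys
    from PyQt4.QtGui import QApplication
    from PyQt4.QtWebKit import QWebView

    app = QApplication(sys.argv)
    view = QWebView()

    def on_load_finished(ok):
        # Runs from the event loop once the page has finished loading,
        # so the DOM is only touched here, never right after setHtml().
        if not ok:
            return
        document = view.page().mainFrame().documentElement()
        for element in find_matching(document, PATTERN):
            # Placeholder for doStuff(): mark the element as visited.
            element.setAttribute("data-seen", "1")

    view.loadFinished.connect(on_load_finished)
    view.setHtml('<html><body><img data-original="a.exr"></body></html>')
    view.show()
    sys.exit(app.exec_())
```

Calling main() starts the demo. Each later setHtml() call on the same view
triggers on_load_finished again, so reusing one view for page after page
should be fine as long as the traversal stays in the slot.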

Kovid.

On Fri, Oct 03, 2014 at 06:02:14PM -0700, Brian Knudson wrote:
> Hello all,
> 
> I'm using a QtWebKit.QWebView widget to display some content in an app.
> 
> The app generates html, then displays it in the QWebView widget using setHtml().
> 
> After the data is displayed, under certain circumstances, I need to walk through the elements in the page, find certain tags and adjust them. 
> 
> This all works quite well the first time one of the QWebViews is displayed (actually, it seems to work best on the *second* page viewed, not the first).  However, if the user requests to look at a different page (which then just calls setHtml again, but with different data), while the page is displayed correctly, the traversal fails.  In fact, if I try to fetch the html data with self.page().mainFrame().toHtml(), the html returned is truncated.  Again, the data is correctly displayed, but I can't seem to get or traverse the html.
> 
> I can't really put together a working example, but here's some of the important bits:
> 
> - My class inherits from QtWebKit.QWebView
> - When requested by the user, html is generated and stored in a string which is pushed to the page with self.setHtml(data).  That string contains everything from the opening html tag to the closing html tag - head, script, body, etc. tags included.
> - After the page is displayed, the QWebElements of the page are traversed, some of which are modified.
> - I've been checking that the data was set correctly by writing it out to tmp files: 1.html (which comes from fp.write(data) where "data" is the same string that was used in setHtml(data)) and 2.html (which comes from self.page().mainFrame().toHtml()).
> 	1.html looks like I expect it to - a fully formed html page that can also be viewed by other web browsers.
> 	2.html is a truncated version of 1.html.  Always truncated in the same spot.  If it matters, it's truncated after the second script tag is closed.  There's also a closing html tag appended to it.
> - traversal happens like so:
>         document = self.page().mainFrame().documentElement()
>         self.examineChildElements(document)
> 
> def examineChildElements(self, parentElement):
>         element = parentElement.firstChild()
>         while not element.isNull():
>             if re.match(self.__exrpat, element.attribute("data-original")):
>                 doStuff()
>             self.examineChildElements(element)
>             element = element.nextSibling()
> 
> To be clear, the question is this: Why, after calling setHtml, modifying some tags, then setHtml again, does self.page().mainFrame().toHtml() not give me correct html (the stuff from the second setHtml()) back?  Should I not be re-using the same page and calling setHtml over and over again?  
> At the end of the day, I want to be able to generate some html, display it, possibly modify it, then generate different html, display it, possibly modify it, etc, etc, etc.
> 
> Thanks,
> -Brian
> 



-- 
_____________________________________

Dr. Kovid Goyal 
http://www.kovidgoyal.net
http://calibre-ebook.com
_____________________________________