[PyKDE] How do you get HTML source from konqueror/KHTMLPart?

Marcos Dione mdione at grulic.org.ar
Wed Dec 20 23:24:58 GMT 2006

On Wed, Dec 20, 2006 at 10:59:06AM -0800, yichun wei wrote:
> I am trying to grab some html pages via KHTMLPart.openURL and scrape
> the content I get. However I am not able to read out the HTML document
> sources I have in KHTMLPart.

    just call:

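# part is the KHTMLPart, after the page has finished loading
domDocu= part.document ()
# toString () gives a DOM.DOMString; string () converts it to a QString
html= domDocu.toString ().string ()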

    that's a QString.

> kdelibs has KHTML::documentSource in khtml that can return the source of the
> pages since 2005, however I only found .document() in pyKDE. 

    yes; either it disappeared from the sources or sip didn't pick it up
or something.

> toHTML() seemed to return nothing (None or ""), while toString() gave
> me an exception and my script crashed:

    yes, under certain circumstances that happens. I think it's because
the KHTMLPart has no parentWidget or no parent or both. if you set up the
whole apparatus for showing the part, everything works just fine.

> I find
> some discussion which point me to use KIO.get, but it returns a
> TransferJob and I have no idea how to get a QString from a
> TransferJob...

    the KIO jobs[1] emit a data() signal each time a chunk arrives. just
use a KIO::get job and connect that signal to a slot that accumulates the
data. there's another signal, result(), emitted when the job finishes.
you could also use NetAccess[2].
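
    a minimal sketch of the accumulating slot, in plain python so it runs
without PyKDE. SourceAccumulator and its method names are made up for
illustration. the wiring shown in the comments is my assumption of how it
would look in PyKDE 3.5 and is untested; the chunks are assumed to be
byte strings already, so a QByteArray would need converting first.

```python
class SourceAccumulator(object):
    """Collects the chunks delivered by a job's data() signal."""

    def __init__(self):
        self.chunks = []
        self.finished = False

    def on_data(self, job, chunk):
        # The body arrives piecewise; skip empty chunks.
        if len(chunk):
            self.chunks.append(chunk)

    def on_result(self, job):
        # Called once, when the job reports that it is done.
        self.finished = True

    def source(self):
        # Join everything received so far into one byte string.
        return b"".join(self.chunks)


# Assumed PyKDE 3.5 wiring (untested):
#   acc = SourceAccumulator()
#   job = KIO.get(KURL(url), False, False)
#   QObject.connect(job, SIGNAL("data(KIO::Job*, const QByteArray&)"),
#                   acc.on_data)
#   QObject.connect(job, SIGNAL("result(KIO::Job*)"), acc.on_result)
```

    once on_result has fired, acc.source () holds the raw bytes of the
page; decoding them into a string is up to you.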

[1] http://developer.kde.org/documentation/library/3.5-api/kdelibs-apidocs/kio/kio/html/index.html

[2] http://developer.kde.org/documentation/library/3.5-api/kdelibs-apidocs/kio/kio/html/classKIO_1_1NetAccess.html
(Not so) Random fortune:
[11:50] <xanthus> m4rgin4l: yes, but it's a civilized country, even if it's chaos
            -- xanthus, talking about Argentina.
