[PyKDE] How do you get HTML source from konqueror/KHTMLPart?
Jim Bublitz
jbublitz at nwinternet.com
Thu Dec 21 00:04:26 GMT 2006
On Wednesday 20 December 2006 15:24, Marcos Dione wrote:
> On Wed, Dec 20, 2006 at 10:59:06AM -0800, yichun wei wrote:
> > I am trying to grab some html pages via KHTMLPart.openURL and scrape
> > the content I get. However I am not able to read out the HTML document
> > sources I have in KHTMLPart.
>
> just call:
>
> domDocu= part.document ()
> html= domDocu.toString ().string ()
>
> that's a QString.
>
> > kdelibs has KHTML::documentSource in khtml that can return the source of
> > the pages since 2005, however I only found .document() in pyKDE.
>
> yes; either it disappeared from the sources or sip didn't pick it up
> or something.
It appears there was a dependency change from KDE 3.2.3 to KDE 3.3.0, where
the latter required kutils and the former didn't, and that only affected
khtml_part.sip. Because of the way I hacked around that (so that both
versions would be supported), the end result is that khtml_part.sip is stuck
at KDE 3.3.0, which it shouldn't be.
There shouldn't be any problem providing the documentSource method.
I'll get to that in the next release (which is already hopelessly late).
Jim
> > toHTML() seemed to return nothing (None or ""), while toString() gave
> > me an exception and my script crashed:
>
> yes, under certain circumstances that happens. I think it's because
> the KHTMLPart has no parentWidget or no parent or both. if you set up the
> whole apparatus for showing the part, everything works just fine.
>
> > I found
> > some discussions that point me to KIO.get, but it returns a
> > TransferJob and I have no idea how to get a QString from a
> > TransferJob...
>
> the kios[1] emit a data() signal whenever data arrives. just use a
> KIO::get job and connect it to a slot that accumulates the data. there's
> another signal when it finishes (result). you could also use NetAccess[2].
>
> --
> [1]
> http://developer.kde.org/documentation/library/3.5-api/kdelibs-apidocs/kio/kio/html/index.html
>
> [2]
> http://developer.kde.org/documentation/library/3.5-api/kdelibs-apidocs/kio/kio/html/classKIO_1_1NetAccess.html
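
For what it's worth, the "slot that accumulates the data" idea from the KIO suggestion above could look roughly like the sketch below. It is only an illustration: SourceCollector and its method names are made up here, the chunks are plain bytes standing in for the QByteArray a real job would deliver, and the PyKDE wiring shown in the comments is an untested assumption about how the data() and result() signals would be connected.

```python
class SourceCollector(object):
    """Hypothetical helper: gathers the chunks a transfer job delivers."""

    def __init__(self):
        self.chunks = []
        self.finished = False

    def on_data(self, job, chunk):
        # Slot for a data()-style signal. Empty chunks are skipped
        # so that only real payload is stored.
        if len(chunk) > 0:
            self.chunks.append(bytes(chunk))

    def on_result(self, job):
        # Slot for a result()-style signal: the transfer is over.
        self.finished = True

    def source(self, encoding="utf-8"):
        # Join everything received so far and decode it to text.
        # The encoding is a guess; a real page may declare another one.
        return b"".join(self.chunks).decode(encoding, "replace")


# Assumed (untested) PyKDE 3 wiring, left as comments so the sketch
# runs without KDE installed:
#   job = KIO.get(KURL(url), False, False)
#   QObject.connect(job, SIGNAL("data(KIO::Job*, const QByteArray&)"),
#                   collector.on_data)
#   QObject.connect(job, SIGNAL("result(KIO::Job*)"), collector.on_result)

# Simulate a job delivering a page in three pieces plus an empty chunk.
collector = SourceCollector()
for piece in (b"<html><body>", b"hello", b"</body></html>", b""):
    collector.on_data(None, piece)
collector.on_result(None)

html = collector.source()
print(html)
```

With a real job you would read source() only after the result slot has fired, since the data arrives asynchronously while the event loop runs.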