[PyKDE] QScintilla Lexers

Mon Jan 10 00:54:52 GMT 2005

Hello. I am in the middle of writing a text editor in Python using 
PyQt/QScintilla. I would like my editor to include syntax highlighting for 
HTML, XML, XSLT, etc., but I have found that QScintilla does not support these 
very well (or at all). 

There is a lexer for HTML, but its syntax highlighting seems very inaccurate. 
For example, it correctly colourises valid markup like <a href="index.html">, 
but then does the same for HTML nonsense like <href a="index.html">. There is 
also no CSS lexer. 
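One way a lexer could tell those two tags apart is to check tag and attribute names against sets of known names and give unknown ones a different style. The following is a minimal sketch in plain Python, not the QScintilla API: the `classify_tag` function, the style names and the small HTML name sets are hypothetical, and it only inspects the first opening tag in the string.

```python
import re

# Hypothetical keyword sets; a real lexer would load full lists of names.
HTML_TAGS = {"a", "p", "div", "span", "img"}
HTML_ATTRS = {"href", "src", "class", "id", "alt"}

# An opening tag: the tag name, then everything up to the closing '>'.
TAG_RE = re.compile(r"<([A-Za-z][\w.-]*)([^>]*)>")
# name="value" or name='value' attribute pairs inside a tag.
ATTR_RE = re.compile(r"""([A-Za-z_:][\w.:-]*)\s*=\s*("[^"]*"|'[^']*')""")


def classify_tag(text):
    """Return (tag_style, [(attr_name, attr_style), ...]) for the first tag in text."""
    m = TAG_RE.search(text)
    if m is None:
        return None
    name, rest = m.group(1), m.group(2)
    tag_style = "tag" if name.lower() in HTML_TAGS else "unknown_tag"
    attrs = [
        (attr, "attribute" if attr.lower() in HTML_ATTRS else "unknown_attribute")
        for attr, _value in ATTR_RE.findall(rest)
    ]
    return tag_style, attrs
```

With these sets, `classify_tag('<a href="index.html">')` gives `("tag", [("href", "attribute")])`, while `classify_tag('<href a="index.html">')` gives `("unknown_tag", [("a", "unknown_attribute")])`, so the two could be drawn in different colours.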

I thought I might be able to fix some of this myself by sub-classing 
QextScintillaLexerHTML. However, I soon found that several of the functions 
that have to be re-implemented return const char * rather than a QString. 
This means none of the lexers can be usefully sub-classed in Python, because 
the QextScintilla class expects const char * return values when it sets up 
and calls the lexers internally.

My next step was to try creating my own lexer classes in Python using the 
low-level API. I had some success with this, but at the expense of features 
like auto-indentation, which QextScintilla only provides when it is using 
one of its own lexers. If only I could sub-class QextScintillaLexer in 
Python!

Next, I thought about learning a bit of C++ and writing my own lexers. But 
then I had a look at the source code for the HTML lexer and thought again: 
all that code to handle the scripting languages made my head spin... My last 
hope was to look at something simpler, like the source code for the 
XML lexer, except that there is no real XML lexer because it is just an 
alias for the HTML lexer!

The real point of all this is to introduce a couple of proposals for improving 
the lexers in QScintilla. The first one is obvious: make QextScintillaLexer 
Python-friendly so it can be fully sub-classed. This would open up all the 
"hidden" languages (like CSS) to Python developers who want to create their 
own lexer classes (or sub-classes). The second proposal is: include a purely 
markup-orientated XML lexer (i.e. one not based on the HTML lexer). This 
would make it easy to add lexer classes for XHTML, XSLT, XSD, etc. by 
sub-classing the XML lexer.
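To make the second proposal concrete, a markup-only XML lexer could keep the generic markup rules (comments, processing instructions, tags, text) in a base class and let XHTML, XSLT or XSD sub-classes supply nothing but their vocabulary. This is a minimal sketch in plain Python, not QScintilla code: `XmlLexer`, `XsltLexer`, the style names and the three-element XSLT set are all hypothetical. The `(start, length, style)` runs it returns are the sort of data one would then pass to the editor's low-level styling calls.

```python
import re

# Generic XML markup rules: comments, processing instructions, tags, text.
# A stray "<" that starts none of these is styled as "error".
TOKEN_RE = re.compile(
    r"(?P<comment><!--.*?-->)"
    r"|(?P<pi><\?.*?\?>)"
    r"|(?P<tag></?(?P<name>[A-Za-z_][\w.:-]*)[^>]*>)"
    r"|(?P<text>[^<]+)"
    r"|(?P<error><)",
    re.S,
)


class XmlLexer:
    """Hypothetical markup-only base lexer with no fixed vocabulary."""

    # None means any well-formed element name is accepted.
    known_elements = None

    def element_style(self, name):
        if self.known_elements is None or name in self.known_elements:
            return "element"
        return "unknown_element"

    def style_runs(self, text):
        """Return (start, length, style) runs covering every character of text."""
        runs = []
        for m in TOKEN_RE.finditer(text):
            if m.group("tag") is not None:
                style = self.element_style(m.group("name"))
            else:
                style = m.lastgroup  # "comment", "pi", "text" or "error"
            runs.append((m.start(), m.end() - m.start(), style))
        return runs


class XsltLexer(XmlLexer):
    """Hypothetical subclass: it only supplies a vocabulary."""

    known_elements = {"xsl:stylesheet", "xsl:template", "xsl:value-of"}
```

For example, `XsltLexer().style_runs("<xsl:template><foo/></xsl:template>")` styles both `xsl:template` tags as `"element"` and `<foo/>` as `"unknown_element"`, while the plain `XmlLexer` would accept all three. The design choice is that sub-classes never touch the tokenizing rules, which is what would make an XHTML or XSD lexer cheap to add.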

Does anybody have any thoughts on this, please?

-- 
yawber