<div></div><blockquote style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;" class="gmail_quote"><pre>There might be two different issues.<br><br>1. The encoding was set to utf-8-default, which means no suitable encoding <br>
was detected and eric simply chose the default setting. My question is, was <br>enhanced encoding detection activated on the Editor->Filehandling page of the <br>config dialog? What is the correct encoding of the file? Would you please send <br>
it.<br><br>2. The styling may be wrong. Please check if selecting "Alternative: <br>Django/Jinja" as the lexer language (via the editor context menu) gives correct <br>highlighting.<br><br>Regards,<br>Detlev<br>
</pre></blockquote><div>1. The correct file encoding is utf-8 (the system-wide locale); utf-8 is also set on the Filehandling page. Moreover, I've tried turning off encoding detection, with no result. Here is my file: <a href="http://www.mediafire.com/?fdytlqdb51z">http://www.mediafire.com/?fdytlqdb51z</a><br>
<br>2. Since I don't have the Jinja plugin installed, I tried other lexers, such as HTML/PHP and Python; all of them work fine with the same file(s) (there was no pretty Django highlighting, of course, but the string mangling was gone too).<br>
<br>3. Finally, I'd like to draw your attention to my previous messages: the problem seems to go away if I use str() instead of unicode(). According to <a href="http://boodebr.org/main/python/all-about-python-and-unicode">http://boodebr.org/main/python/all-about-python-and-unicode</a>, Python's len() counts bytes rather than characters when it is applied to an encoded string. I now see that str() only appears to work because it encodes the unicode string to ascii, so it likely won't help with, for example, a Japanese locale, whose characters have no ascii representation.<br>
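<br>To make the byte/character mismatch concrete, here is a minimal standalone Python 2 snippet (my own illustration, not eric's code):<br><pre>
# -*- coding: utf-8 -*-
# One Cyrillic letter is one *character*, but two *bytes* in utf-8.
letter = u'\u044f'                    # Russian letter 'ya'
print len(letter)                     # -> 1 (character count)
print len(letter.encode('utf-8'))     # -> 2 (byte count)
# If one component addresses the text in bytes while another one
# counts characters, every two-byte letter shifts the offsets by
# one -- exactly the drift shown in the demo below.
</pre><br>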
Look at the short demo I prepared: <a href="http://img571.imageshack.us/img571/4782/snapshot2g.png">http://img571.imageshack.us/img571/4782/snapshot2g.png</a> (I show only the first "block" tag; the text below it is unimportant):<br>
<ol><li>Everything works great until I use Russian.</li><li>For example, English works fine.</li><li>I typed one Russian symbol. At that moment the lexer no longer highlighted the closing tag completely; note that exactly one symbol was highlighted wrong.<br>
</li><li>I typed a second symbol, and you can see that the lexer now leaves two symbols unhighlighted.</li><li>Each added Russian symbol causes the lexer to "forget" to highlight one more symbol in the closing tag.</li></ol>So, here is my explanation:<br>
One Russian letter in my case takes two bytes. len(unicode(one_russian_letter)) returns 1, as expected. But the lexer obviously assumes that one letter takes one byte, so we get a one-byte shift and symbol corruption per letter (did you notice that the number of corrupted symbols grows by exactly one for each letter typed?); in other words, the lexer misinterprets the length of non-English strings. And if I replace unicode() with str(), the strings are encoded to one-byte ascii and the lexer works fine.<br>
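<br>If this explanation is right, the fix would be to translate character positions into byte positions before styling. A rough sketch of the conversion I have in mind (a hypothetical helper, assuming a utf-8 document, not actual eric code):<br><pre>
# -*- coding: utf-8 -*-
def char_pos_to_byte_pos(text, char_pos):
    # Map a character index in a unicode string to the byte
    # index of the same position in its utf-8 encoding.
    return len(text[:char_pos].encode('utf-8'))

tag = u'{% block \u0437\u0430\u0433\u043e\u043b\u043e\u0432\u043e\u043a %}'
print tag[19:]                       # -> %}  (starts at character 19)
print char_pos_to_byte_pos(tag, 19)  # -> 28: nine two-byte letters add nine bytes
</pre>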
So, here come two conclusions:<br><ol><li>Eric's lexer subsystem treats strings as plain ascii strings, not unicode.</li><li>Other lexers (Python, HTML/PHP) work fine because they convert strings to ascii.<br></li>
</ol>Please correct me if I'm wrong.<br><br>P.S.: Exactly the same thing happens if, instead of replacing unicode() with str(), I add .encode() to it in the styleText method, so the line then reads "for token, txt in self.__lexer.get_tokens(unicode(self.editor.text())<b>.encode()</b>[:end + 1]):" (w/o quotes).<br>
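<br>For reference, this is roughly how I imagine a byte-aware styling loop would look. It is only a sketch of the idea, assuming QsciLexerCustom's startStyling()/setStyling() count bytes, with TOKEN_TO_STYLE standing in for whatever token-to-style mapping eric's Pygments lexer really uses (I haven't checked its sources):<br><pre>
# -*- coding: utf-8 -*-
# Sketch only: TOKEN_TO_STYLE is a hypothetical token-to-style map.
def styleText(self, start, end):
    text = unicode(self.editor.text())
    self.startStyling(start)
    for token, txt in self.__lexer.get_tokens(text):
        style = TOKEN_TO_STYLE.get(token, 0)
        # Style by the token's length in utf-8 *bytes*, not in
        # characters, so two-byte Russian letters no longer shift
        # the styling of everything that follows.
        self.setStyling(len(txt.encode('utf-8')), style)
</pre><br>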
Also, please excuse my imperfect English and my habit of writing such long, boring texts.<br></div>