[PyQt] Search method for Arabic text

Zachary Scheuren angryjaga at gmail.com
Mon Aug 27 06:44:17 BST 2018


This isn't really a PyQt question. You can do all of it in plain Python,
but a library such as pyarabic helps. With it you can strip the
vocalization (harakat) before comparing strings. You also need to handle
the different forms of Alef: str1 starts with Alef Wasla (ٱ), while str2
has only a bare Alef (ا). pyarabic helps there too: araby.ALEFAT is a
collection of the Alef variants. You have to map these by hand because
Alef Wasla has no Unicode decomposition, so the wasla is not encoded as a
separate combining mark. There have been Unicode proposals for that, but
nothing has come of them so far. Anyway, I did a quick test with your
strings...

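import re
from pyarabic import araby

str2 = "المفلحون"
str3 = (" المفلحون ٱلْمُفْلِحُونَ ٱلنَّاسُ المفلحون ٱلْمُفْلِحُونَ"
        " المفلحون ٱلنَّاسُ المفلحون ٱلنَّاسُ ")

str3_nomarks = araby.strip_tashkeel(str3)  # remove harakat and shadda
for c in araby.ALEFAT:  # replace every Alef variant with a bare Alef
    str3_nomarks = str3_nomarks.replace(c, araby.ALEF)

print(re.findall(re.escape(str2), str3_nomarks))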

Something like that will get you the matches, but if you also need their
positions in the original string you'll have to do more work, because
dropping the diacritics shifts the indices.
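
A minimal sketch of that extra work using only the standard library (no
pyarabic), assuming it is enough to fold harakat, shadda, hamza/madda and
Alef Wasla. normalize_with_map and find_spans are hypothetical helper
names. It relies on Unicode NFD: Alef with Hamza or Madda decomposes into
a bare Alef plus a combining mark (category Mn) that can be dropped, while
Alef Wasla (U+0671) has no decomposition and is mapped by hand. Each kept
character remembers the index it came from, so matches can be translated
back to spans in the original string.

```python
import re
import unicodedata

ALEF = "\u0627"        # ARABIC LETTER ALEF
ALEF_WASLA = "\u0671"  # has no decomposition, so NFD leaves it intact


def normalize_with_map(text):
    """Fold `text` for searching and remember where each kept char came from.

    Returns (folded, offsets) where offsets[i] is the index in `text` of the
    character that produced folded[i].
    """
    folded = []
    offsets = []
    for i, ch in enumerate(text):
        # NFD splits precomposed letters, e.g. U+0623 -> U+0627 U+0654
        for part in unicodedata.normalize("NFD", ch):
            if unicodedata.category(part) == "Mn":
                continue  # drop harakat, shadda and the split-off hamza/madda
            if part == ALEF_WASLA:
                part = ALEF  # Alef Wasla must be folded by hand
            folded.append(part)
            offsets.append(i)
    return "".join(folded), offsets


def find_spans(needle, haystack):
    """Return (start, end) spans in the ORIGINAL haystack where `needle`
    matches after both strings are folded."""
    folded_needle, _ = normalize_with_map(needle)
    if not folded_needle:
        return []
    folded_hay, offsets = normalize_with_map(haystack)
    spans = []
    for m in re.finditer(re.escape(folded_needle), folded_hay):
        start = offsets[m.start()]
        end = offsets[m.end() - 1] + 1
        # include any marks that sit on the last matched letter
        while end < len(haystack) and unicodedata.category(haystack[end]) == "Mn":
            end += 1
        spans.append((start, end))
    return spans


if __name__ == "__main__":
    str2 = "المفلحون"
    str3 = (" المفلحون ٱلْمُفْلِحُونَ ٱلنَّاسُ المفلحون ٱلْمُفْلِحُونَ"
            " المفلحون ٱلنَّاسُ المفلحون ٱلنَّاسُ ")
    for start, end in find_spans(str2, str3):
        print(start, end, str3[start:end])
```

Dropping every Mn character is broader than the Alef-only replacement
above: it also removes the hamza from its Waw and Yeh carriers (ؤ becomes
و, ئ becomes ي), which may or may not be what you want for search.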



On Wed, Aug 22, 2018 at 12:43 AM, Maziar Parsijani <maziar.parsijani at gmail.com> wrote:

> Hi
> I have some Arabic strings in my database, and I want to search them like
> this:
>
>   str1 = "ٱلْمُفْلِحُونَ"
>   str2 = "المفلحون"
> As you can see, str1 is the same word as str2, but in the Arabic text str1
> has extra characters (the diacritics).
> Is there any way to search for str2 and find both forms in a string like:
>  str3 = " المفلحون ٱلْمُفْلِحُونَ ٱلنَّاسُ المفلحون ٱلْمُفْلِحُونَ المفلحون ٱلنَّاسُ المفلحون ٱلنَّاسُ "