From: Nathan Harmston on
Hi,

So I'm trying to use a very large regular expression. Basically, I have
a list of items I want to find in text; it's kind of a conjunction of
two regular expressions and a big list... not pretty. However,
every time I try to run my code I get this exception:

OverflowError: regular expression code size limit exceeded

I understand that there is a Python-imposed limit on the size of the
regular expression. And although it's not nice, I have a machine with
12 GB of RAM just waiting to be used. Is there any way I can alter
Python to allow big regular expressions?

Could anyone suggest other methods for this kind of string matching in
Python? I'm trying to see if my SWIG-wrapped alphabet trie is faster than
what's possible in Python!

Many thanks,

Nathan
From: Stefan Behnel on
Nathan Harmston, 15.03.2010 13:21:
> So I'm trying to use a very large regular expression. Basically, I have
> a list of items I want to find in text; it's kind of a conjunction of
> two regular expressions and a big list... not pretty. However,
> every time I try to run my code I get this exception:
>
> OverflowError: regular expression code size limit exceeded
>
> I understand that there is a Python-imposed limit on the size of the
> regular expression. And although it's not nice, I have a machine with
> 12 GB of RAM just waiting to be used. Is there any way I can alter
> Python to allow big regular expressions?
>
> Could anyone suggest other methods for this kind of string matching in
> Python?

If what you are trying to match is in fact a set of strings instead of a
set of regular expressions, you might find this useful:

http://pypi.python.org/pypi/acora

Stefan
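
For what it's worth, here is a rough sketch of how acora might be used
for a plain list of strings. The helper name find_keywords is made up for
this example, and it assumes acora is installed; if it is not, the helper
falls back to a naive str.find scan so the snippet still runs.

```python
def find_keywords(keywords, text):
    """Return (keyword, position) pairs for every occurrence in text."""
    try:
        # Third-party package; assumed to be installed (pip install acora).
        from acora import AcoraBuilder
    except ImportError:
        # Naive fallback: scan for each keyword separately with str.find.
        hits = []
        for kw in keywords:
            start = text.find(kw)
            while start != -1:
                hits.append((kw, start))
                start = text.find(kw, start + 1)
        return sorted(hits, key=lambda h: (h[1], h[0]))
    # Build the matcher once, then reuse it for every text to search.
    matcher = AcoraBuilder(*keywords).build()
    return sorted(matcher.findall(text), key=lambda h: (h[1], h[0]))


hits = find_keywords(["ab", "bc", "de"], "abcde")
```

The point of the library route is that the keyword list is compiled once
into a single matcher, so the list size is not bound by the re module's
code size limit.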

From: Alain Ketterlin on
Nathan Harmston <iwanttobeabadger(a)googlemail.com> writes:

[...]
> Could anyone suggest other methods for this kind of string matching in
> Python? I'm trying to see if my SWIG-wrapped alphabet trie is faster than
> what's possible in Python!

Since you mention using a trie, I guess it's just a big alternation of
fixed strings. You may want to try an Aho-Corasick variant. It
looks like there are several implementations (Google finds at least
two). I would be surprised if any pure-Python solution were faster than
tries implemented in C. Don't forget to tell us your findings.

-- Alain.
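
To make the Aho-Corasick suggestion concrete, here is a minimal
pure-Python sketch of the textbook algorithm: a trie of the fixed strings
plus failure links computed breadth-first. The names build_automaton and
find_all are invented for this example, and it is meant for comparing
against a C trie, not as a tuned implementation.

```python
from collections import deque


def build_automaton(keywords):
    """Build trie transitions, failure links and output lists."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix
    out = [[]]    # out[node] -> keywords that end at this node
    for word in keywords:
        if not word:
            continue  # skip empty strings, they would match everywhere
        node = 0
        for ch in word:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(word)
    # Breadth-first pass: depth-1 nodes keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit keywords that end at the suffix node.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(automaton, text):
    """Return (start_index, keyword) for every match, overlaps included."""
    goto, fail, out = automaton
    node = 0
    matches = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            matches.append((i - len(word) + 1, word))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
matches = find_all(automaton, "ushers")
# matches == [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once regardless of how many strings are in the list,
which is what makes this attractive compared to one huge alternation.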
From: MRAB on
Nathan Harmston wrote:
> Hi,
>
> So I'm trying to use a very large regular expression. Basically, I have
> a list of items I want to find in text; it's kind of a conjunction of
> two regular expressions and a big list... not pretty. However,
> every time I try to run my code I get this exception:
>
> OverflowError: regular expression code size limit exceeded
>
> I understand that there is a Python-imposed limit on the size of the
> regular expression. And although it's not nice, I have a machine with
> 12 GB of RAM just waiting to be used. Is there any way I can alter
> Python to allow big regular expressions?
>
> Could anyone suggest other methods for this kind of string matching in
> Python? I'm trying to see if my SWIG-wrapped alphabet trie is faster than
> what's possible in Python!
>
>
There's the regex module at http://pypi.python.org/pypi/regex. It'll
even release the GIL while matching on strings! :-)
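
A hedged sketch of what trying this might look like: build the big
alternation from the list with escape() and compile it with regex if it
is installed, otherwise with the standard re module. The helper name
compile_alternation and the sample items are made up for illustration,
and whether a given list compiles without hitting a size limit is
something to check on your own data.

```python
import re as stdlib_re

try:
    # Third-party module; assumed to be installed (pip install regex).
    import regex as re_engine
except ImportError:
    # Fall back to the standard library so the sketch still runs.
    re_engine = stdlib_re


def compile_alternation(items):
    """Compile one pattern that matches any of the literal items."""
    # Longest first, so "foobar" is tried before "foo" at the same position.
    ordered = sorted(set(items), key=len, reverse=True)
    # escape() keeps characters such as "." from acting as metacharacters.
    return re_engine.compile("|".join(re_engine.escape(s) for s in ordered))


pattern = compile_alternation(["foo", "foobar", "a.b"])
found = pattern.findall("foobar a.b axb foo")
# found == ['foobar', 'a.b', 'foo']
```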