From: William Ahern on 20 Dec 2009 03:07
David Combs <dkcombs(a)panix.com> wrote:
> Trust me, friend -- regular expressions are what make unix/linux
> so useful. And no, they're not all that trivially simple,
> but man are they POWERFUL. One line can transform stuff
> that would take a complicated MULTI-line program NOT using them.
Regular expressions are very useful, but also very limited. For one thing,
they can only represent regular languages, which means among other things
that you can't express nested structures. The greediness and lookbehind
operators do help, but if you look in the appendix of Mastering Regular
Expressions you'll find a regex which can parse any valid e-mail address;
it's two pages long! (At least, that's how I remember, but I haven't opened
that book in over 5 years.)
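
To make the nesting limitation concrete, here is a minimal Python sketch (Python and the names DEPTH_TWO and balanced are illustrative assumptions, not anything from the tools mentioned in this thread). A fixed pattern can be written for a bounded nesting depth, but each extra level means a longer pattern; a plain counter handles any depth.

```python
import re

# Hypothetical pattern: accepts parentheses nested at most two levels deep.
# Supporting a third level would require writing a longer pattern by hand.
DEPTH_TWO = re.compile(r"^(?:[^()]|\((?:[^()]|\([^()]*\))*\))*$")


def balanced(text: str) -> bool:
    """Check parenthesis balance with a counter, which works at any depth."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its matching "("
                return False
    return depth == 0
```

With these definitions, DEPTH_TWO accepts "(a(b)c)" but rejects the perfectly balanced "(a(b(c)))", while balanced() accepts both.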
Perl 6 will come with something called Parsing Expression Grammars, which
are much more powerful. (Though Perl 6 didn't invent them.) I think this
will probably be the future, but it will obviously take many years for the
rest of the world to catch up. Lua currently has one of the better
implementations, in terms of language integration.
For C I use Ragel for regular expressions. In Ragel you can handle nested
structures--and many other issues--easily because it lets you jump to
different [state] machines explicitly, and allows the use of a state stack.
Using Ragel I've discovered ambiguities in several RFC ABNF specifications
which are silently papered over by most common regular expression engines.
The big problem with regexes is that people just slop them together, and
never notice the bugs. They have been, and will continue to be, one of the
major sources of bugs and security issues.
Sometimes they just get used too much. For instance, the following Perl
basename implementation reads much better to me than any regex would:
print STDOUT (split "/", shift)[-1], "\n"
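
For comparison, a rough Python sketch of the same idea next to a regex version (the function names are made up for illustration). One difference worth knowing: Perl's split drops trailing empty fields, while Python's str.split keeps them, so a path ending in "/" gives an empty string here.

```python
import re


def basename_split(path: str) -> str:
    """Take the last "/"-separated field, as the Perl one-liner does."""
    return path.split("/")[-1]


def basename_regex(path: str) -> str:
    """Same result via a regex: the longest run of non-slashes at the end."""
    return re.search(r"[^/]*$", path).group(0)
```

Both return the same string for the same input; which one reads better is the judgment call made above.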
From: Janis Papanagnou on 20 Dec 2009 04:12
Chris F.A. Johnson wrote:
> On 2009-12-20, David Combs wrote:
> I very rarely use anything more than very simple regular
> expressions. Complex REs are more trouble than they're worth,
> especially when they need to be modified.
In some languages (awk, for example, where you can compose them in
strings[*]) you can define them in a way that looks much like a
fairly readable BNF notation. Being able to compose them that
way, and to reuse the parts in many places in the regexp definitions,
removes a lot of their complexity and crypticness and makes them
a pleasure to use.
[*] The usual caveats apply.
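
A small sketch of that composition style, written in Python rather than awk as an assumption for illustration; the grammar pieces (DIGIT, INTEGER, and so on) are hypothetical names, not from any real specification. Raw strings avoid one layer of the escaping caveats.

```python
import re

# Hypothetical grammar pieces, named like BNF nonterminals and built from
# one another by string interpolation.
DIGIT = r"[0-9]"
INTEGER = rf"{DIGIT}+"
FRACTION = rf"\.{DIGIT}+"
EXPONENT = rf"[eE][+-]?{INTEGER}"
NUMBER = rf"[+-]?{INTEGER}(?:{FRACTION})?(?:{EXPONENT})?"

# Compile once; the parts above stay available for reuse in other patterns.
NUMBER_RE = re.compile(NUMBER)
```

Here NUMBER_RE.fullmatch("-12.5e+3") succeeds, while NUMBER_RE.fullmatch("12.") does not, because FRACTION requires at least one digit after the dot.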
From: Janis Papanagnou on 20 Dec 2009 04:16
William Ahern wrote:
> David Combs <dkcombs(a)panix.com> wrote:
> Regular expressions are very useful, but also very limited. For one thing,
> they can only represent regular languages, which means among other things
> that you can't express nested structures.
The same goes for back-references, to name another prominent example: they
also fall outside the class of regular languages, yet some programming
languages and libraries add them anyway.
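
As an illustration, a minimal sketch using Python's re module (the choice of Python and the names DOUBLED_WORD and find_doubled_words are assumptions for the example). The \1 must repeat whatever the first group happened to capture, which a finite automaton cannot track for words of arbitrary length.

```python
import re

# \1 refers back to the text captured by the first group, so this matches
# the same word twice in a row, separated by a single space.
DOUBLED_WORD = re.compile(r"\b(\w+) \1\b")


def find_doubled_words(text: str) -> list[str]:
    """Return each word that appears twice in succession."""
    return [m.group(1) for m in DOUBLED_WORD.finditer(text)]
```

For example, find_doubled_words("it is is a test of the the regex") returns ["is", "the"].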
> For C I use Ragel for regular expressions. In Ragel you can handle nested
> structures--and many other issues--easily because it lets you jump to
> different [state] machines explicitly, and allows the use of a state stack.
Thanks for that useful hint.