From: Roy Smith on
Stephen Hansen <me+list/python(a)ixokai.io> wrote:

> The quote does not deny the power of regular expressions; it challenges
> the widely held assumption and belief that comes from *somewhere* that they
> are the best way to approach any problem that is text related.

Well, that assumption comes from historical unix usage where traditional
tools like awk, sed, ed, and grep, made heavy use of regex, and
therefore people learned to become proficient at them and use them all
the time. Somewhat later, the next generation of tools such as vi and
perl continued that tradition. Given the tools that were available at
the time, regex was indeed likely to be the best tool available for most
text-related problems.

Keep in mind that in the early days, people were working on hard-copy
terminals <http://en.wikipedia.org/wiki/ASR-33>, so economy of
expression was a significant selling point for regexes.

Not trying to further this somewhat silly debate, just adding a bit of
historical viewpoint to answer the implicit question you ask as to where
the assumption came from.
From: Stephen Hansen on
On 7/1/10 3:03 AM, Jean-Michel Pichavant wrote:
> Re is part of the python standard library, for some purpose I guess.

No, *really*?

So all those people who have been advocating that it's useless and shouldn't
be there are already too late?

Damn.

Well, there goes *that* whole crusade we were all out on. Since we can't
destroy re, maybe we can go club baby seals.

--

... Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/

From: Stephen Hansen on
On 7/1/10 5:11 AM, Roy Smith wrote:
> Stephen Hansen<me+list/python(a)ixokai.io> wrote:
>
>> The quote does not deny the power of regular expressions; it challenges
>> the widely held assumption and belief that comes from *somewhere* that they
>> are the best way to approach any problem that is text related.
>
> Well, that assumption comes from historical unix usage where traditional
> tools like awk, sed, ed, and grep, made heavy use of regex, and
> therefore people learned to become proficient at them and use them all
> the time.

Oh, I'm fully aware of the history of re's -- but it's not those old hats
and even their students and the unix geeks I'm talking about.

It's the newbies and people wandering into the language with absolutely
no idea about the history of unix, shell scripting and such, who so
often arrive with the idea firmly planted in their head, that I wonder
at. Sure, there's going to be a certain amount of cross-pollination from
unix-geeks to students-of-students-of-students-of unix geeks to spread
the idea, but it seems too pervasive for that. I just picture a
re-vangelist camping out in high schools and colleges selling the party
line or something :)

P.S. And no, unix geeks is not a pejorative term.
From: Lawrence D'Oliveiro on
In message <pan.2010.06.29.09.35.18.594000(a)nowhere.com>, Nobody wrote:

> On Tue, 29 Jun 2010 12:30:36 +1200, Lawrence D'Oliveiro wrote:
>
>>> Seriously, almost every other kind of library uses a binary API. What
>>> makes databases so special that they need a string-command based API?
>>
>> HTML is also effectively a string-based API.
>
> HTML is a data format. The sane way to construct or manipulate HTML is via
> the DOM, not string operations.

What is this “DOM” of which you speak? I looked here
<http://docs.python.org/library/>, but can find nothing that sounds like
that, that is relevant to HTML.

>> And what about regular expressions?
>
> What about them? As the saying goes:
>
> Some people, when confronted with a problem, think
> "I know, I'll use regular expressions."
> Now they have two problems.
>
> They have some uses, e.g. defining tokens[1]. Using them to match more
> complex constructs is error-prone ...

What if they're NOT more complex, but they can simply contain user-entered
data?
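
For the user-entered-data case, one common approach is to escape the input
before embedding it in a pattern. A minimal sketch using the standard
library's re.escape(); the function name count_mentions and the sample
strings are made up for illustration:

```python
import re

def count_mentions(needle, text):
    """Count case-insensitive, literal occurrences of needle in text."""
    # needle is user-entered; re.escape() backslash-escapes any regex
    # metacharacters so they are matched as ordinary characters.
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    return len(pattern.findall(text))

# The "." in the needle matches only a literal dot, so "axb" is not counted.
print(count_mentions("a.b", "a.b axb A.B"))
```

Without the re.escape() call, the same needle would also match "axb",
because an unescaped "." matches any character.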

>> And all the functionality available through the subprocess
>> module and its predecessors?
>
> The main reason why everyone recommends subprocess over its predecessors
> is that it allows you to bypass the shell, which is one of the most
> common sources of the type of error being discussed in this thread.

How would you deal with this, then: I wrote a script called ExtractMac, to
convert various old Macintosh-format documents accumulated over the years
(stored in AppleDouble form by uploading to a Netatalk server) to more
cross-platform formats. This has a table of conversion commands to use. For
example, the entries for PICT and TEXT Macintosh file types look like this:

    "PICT" :
        {
            "type" : "image",
            "ext" : ".png",
            "act" : "convert %(src)s %(dst)s",
        },
    "TEXT" :
        {
            "type" : "text",
            "ext" : ".txt",
            "act" : "LineEndings unix <%(src)s >%(dst)s",
        },

The conversion code that uses this table looks like

    Cmd = \
        (
            Act.get("act", "cp -p %(src)s %(dst)s")
        %
            {
                "src" : ShellEscape(Src),
                "dst" : ShellEscape(DstFileName),
            }
        )
    sys.stderr.write("Doing: %s\n" % Cmd)
    Status = os.system(Cmd)

How much simpler would your alternative be? I don't think it would be
simpler at all.
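
For comparison, here is a rough sketch of what a shell-free version of the
table-driven approach might look like. It is not the ExtractMac code: the
CONVERSIONS layout, the "stdio" flag, and the build_argv/run_conversion
helpers are all assumptions made for illustration. Each action is an
argument list instead of a command string, so no ShellEscape step is needed,
and the redirections in the TEXT entry become file objects passed to
subprocess.call().

```python
import subprocess
import sys

# Hypothetical table layout: "act" is an argv template. "{src}" and "{dst}"
# are whole-argument placeholders. "stdio" marks commands that read the
# source on stdin and write the result to stdout.
CONVERSIONS = {
    "PICT": {"type": "image", "ext": ".png",
             "act": ["convert", "{src}", "{dst}"]},
    "TEXT": {"type": "text", "ext": ".txt",
             "act": ["LineEndings", "unix"], "stdio": True},
}

# Fallback when an entry has no "act", mirroring the "cp -p" default.
DEFAULT_ACT = ["cp", "-p", "{src}", "{dst}"]

def build_argv(template, src, dst):
    """Fill in the placeholders; file names are never parsed by a shell."""
    substitutions = {"{src}": src, "{dst}": dst}
    return [substitutions.get(arg, arg) for arg in template]

def run_conversion(entry, src, dst):
    """Run one conversion and return the command's exit status."""
    argv = build_argv(entry.get("act", DEFAULT_ACT), src, dst)
    sys.stderr.write("Doing: %r\n" % (argv,))
    if entry.get("stdio"):
        # Equivalent of "<src >dst" without involving a shell.
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            return subprocess.call(argv, stdin=fin, stdout=fout)
    return subprocess.call(argv)
```

Whether this counts as simpler is a judgment call: the table gets slightly
more verbose, but a file name containing spaces, quotes, or semicolons is
passed through as a single argument with no escaping step to get wrong.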
From: Rami Chowdhury on
On Saturday 03 July 2010 19:33:44 Lawrence D'Oliveiro wrote:
> In message <pan.2010.06.29.09.35.18.594000(a)nowhere.com>, Nobody wrote:
> > On Tue, 29 Jun 2010 12:30:36 +1200, Lawrence D'Oliveiro wrote:
> >>> Seriously, almost every other kind of library uses a binary API. What
> >>> makes databases so special that they need a string-command based API?
> >>
> >> HTML is also effectively a string-based API.
> >
> > HTML is a data format. The sane way to construct or manipulate HTML is
> > via the DOM, not string operations.
>
> What is this “DOM” of which you speak? I looked here
> <http://docs.python.org/library/>, but can find nothing that sounds like
> that, that is relevant to HTML.
>

The Document Object Model - I don't think the standard library has an HTML DOM
module but there's certainly one for XML (and XHTML):
http://docs.python.org/library/xml.dom.html
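
A minimal sketch of building a small XHTML-style document through that API
instead of string operations, using xml.dom.minidom from the standard
library. The make_page function and its arguments are made up for
illustration:

```python
from xml.dom.minidom import getDOMImplementation

def make_page(title, body_text):
    """Build a tiny html/head/body tree and return it serialized as XML."""
    doc = getDOMImplementation().createDocument(None, "html", None)
    html = doc.documentElement

    head = doc.createElement("head")
    title_el = doc.createElement("title")
    # Text goes in as a text node; markup characters are escaped on output.
    title_el.appendChild(doc.createTextNode(title))
    head.appendChild(title_el)

    body = doc.createElement("body")
    para = doc.createElement("p")
    para.appendChild(doc.createTextNode(body_text))
    body.appendChild(para)

    html.appendChild(head)
    html.appendChild(body)
    return html.toxml()

print(make_page("Demo", "1 < 2 & 3"))
```

Because the user-supplied text is added as a text node, characters such as
"<" and "&" are escaped when the tree is serialized, so they cannot be
mistaken for markup.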

----
Rami Chowdhury
"Any sufficiently advanced incompetence is indistinguishable from malice."
-- Grey's Law
+1-408-597-7068 / +44-7875-841-046 / +88-01819-245544