From: Sven Mascheck on
Janis Papanagnou wrote:

> [...] In Kornshell, just to name one example, you
> have the @(...|...), *(...|...), +(...|...), and even the
> powerful !(...|...) regular expression meta-constructs, in addition
> to the more primitive * ? [...] [^...] regexps that can be used in
> file globbing (i.e. regexps coupled to file object search), as well.
>
> Globbing is the use of a regular expression to select the subset
> of matching files; i.e. regular expressions coupled to a concrete
> set of objects on the file system. In shell you can disable file
> globbing and stay with the regular expressions alone, for example
> in case statements or in some implementations' if [[...]] constructs.


It just doesn't make sense to use the term regular expression for
globbing, because it isn't called that in the documentation
(even if there is no fixed term, but rather some variations in practice
like globbing, wildcards, pattern matching).

Only a few applications use globbing, e.g., the shell, find, and Debian's dpkg.

Regular expression implementations show even more variation in
practice than globbing does, but they are sufficiently different from
globbing on Unix that it doesn't make sense to mix the terms.

PS: why are they characteristically different:
The motivation for globbing was *intuitive* handling of file names
- sometimes overlooked but important: globbing uses implicit anchors.
I believe globbing was simply "recycled" in the other places, that is,
the case condition and pattern matching parameter expansion.
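A minimal POSIX sh sketch of the implicit-anchor point (the function names are made up for illustration): the glob 'foo' in a case branch has to match the whole word, whereas grep's regular expression 'foo' matches anywhere in the line. The last lines show the same pattern notation reused in ${var%pattern} parameter expansion.

```shell
#!/bin/sh
# Hypothetical helper: glob match, as done by case.  Shell patterns are
# implicitly anchored, so 'foo' must match the entire word.
glob_matches_foo() {
  case "$1" in
    foo) echo yes ;;
    *)   echo no ;;
  esac
}

# Hypothetical helper: regex match, as done by grep.  Without ^ and $,
# the regular expression 'foo' may match anywhere in the line.
grep_matches_foo() {
  if printf '%s\n' "$1" | grep -q 'foo'; then echo yes; else echo no; fi
}

echo "foo:   glob=$(glob_matches_foo foo) grep=$(grep_matches_foo foo)"
echo "xfooy: glob=$(glob_matches_foo xfooy) grep=$(grep_matches_foo xfooy)"

# The same (anchored) pattern notation in parameter expansion:
# '%' removes the shortest suffix that matches the pattern '.*'.
file=report.txt
echo "${file%.*}"   # prints: report
```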
From: Seebs on
On 2010-01-24, Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
> Seebs wrote:
>> On 2010-01-23, Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
>>> Of course it does. File globbing uses regular expressions.
>> No, it doesn't. File globbing uses shell patterns.

> Yes, and "patterns are often described using regular expressions" [Wikipedia]

But "shell patterns" (aka globs) are a special case, because the set of things
they can represent is absolutely a TINY proper subset of what POSIX regular
expressions can do.

>> They're confusingly similar in a few ways, but quite different.

> In which way different? (Beyond differences in usability that I mentioned
> upthread.) It would be helpful to elaborate on that beyond a simple statement.
> Can you provide a standard grep(1) example that we cannot implement with
> pattern matching capabilities (globbing without files) of a typical shell?

Yes.

Okay, quick summary: The key is that shell patterns have no grouping and no
repetition operators. You can map RE '.' onto glob '?', and RE '.*' onto
glob '*'. There is nothing you can write in shell glob that corresponds
to 'a*', or even to '.?'. You can handle the anchoring/no-anchoring thing --
you can wrap a regex in '^...$' or a pattern in '*...*'. But you can't make up
for the lack of grouping and repetition.
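The '.'/'?' and '.*'/'*' mapping can be sketched as a small filter. This is a hypothetical helper, not a standard tool, and it assumes the glob contains only literal characters, '*' and '?' (no bracket expressions, no backslashes): it escapes the ERE metacharacters, turns '*' into '.*' and '?' into '.', and adds the '^...$' anchors that the glob carries implicitly. The check function compares the glob (via case) with its translation (via grep -E).

```shell
#!/bin/sh
# Hypothetical helper: translate a simple glob (literals, '*' and '?' only)
# into an anchored POSIX extended regular expression.
glob_to_ere() {
  printf '%s\n' "$1" | sed \
    -e 's/[.^$+(){}|\\]/\\&/g' \
    -e 's/\*/.*/g' \
    -e 's/?/./g' \
    -e 's/^/^/' \
    -e 's/$/$/'
}

# Hypothetical helper: match $2 against glob $1 with case, and against
# the translated ERE with grep -E, and report both results.
check() {
  pat=$1 str=$2
  case "$str" in
    $pat) g=yes ;;   # unquoted $pat: its contents act as a pattern
    *)    g=no ;;
  esac
  if printf '%s\n' "$str" | grep -Eq -- "$(glob_to_ere "$pat")"; then r=yes; else r=no; fi
  echo "$pat vs $str: glob=$g ere=$r"
}

glob_to_ere 'a*b?.txt'      # prints: ^a.*b.\.txt$
check 'a*b?.txt' axxbc.txt  # glob=yes ere=yes
check 'a*b?.txt' abc.txtx   # glob=no ere=no
check 'a*b?.txt' abcXtxt    # glob=no ere=no (needs the escaped '\.')
```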

re: foo(bar)?
glob: ... you can't do that.

Note that case statements are closer, because you could do:
foo | foobar )
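A minimal POSIX sh sketch of that case-statement workaround next to the ERE it replaces (function names are hypothetical): the '|' separates alternative patterns, each implicitly anchored, while the grep version needs explicit '^' and '$'.

```shell
#!/bin/sh
# Hypothetical helper: the case-statement form of the ERE ^foo(bar)?$
is_foo_or_foobar() {
  case "$1" in
    foo | foobar) echo yes ;;
    *)            echo no ;;
  esac
}

# The same test with grep -E, anchored explicitly.
is_foo_or_foobar_re() {
  if printf '%s\n' "$1" | grep -Eq '^foo(bar)?$'; then echo yes; else echo no; fi
}

for s in foo foobar foob xfoo; do
  echo "$s: case=$(is_foo_or_foobar "$s") ere=$(is_foo_or_foobar_re "$s")"
done
```

This only works because '?' bounds the repetition to zero or one occurrence. For an unbounded repetition such as '^fo(o)+$' no finite list of POSIX alternatives will do, since any '*' in a pattern also admits characters other than 'o'.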

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam(a)seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
From: Janis Papanagnou on
Sven Mascheck wrote:
> [...]
>
> Regular expression implementations show even more variations in
> practice than globbing, but they are sufficiently different from
> globbing on unix, so that it doesn't make sense to mix terms.

The terminology is already foobar'ed. People now seem to associate the
concepts of regular expressions (as defined by formal language theory)
with the regexp library on Unix (BRE and ERE). And they still call every
extension of those functions a regular expression, whether it is one or
not. Usability extensions (like \d and many others) are okay, but some
tools and libraries even introduce backreferences, and the expressions
are still called regular, even if they are not. OTOH, a language that
conforms to a Chomsky type-3 grammar no longer seems to be recognized
as such. From a practical point of view it's probably worth adapting to
that fuzzy terminology, but if you view it from a formal language theory
point of view, it's no longer clear which way is preferable.
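The point about backreferences can be made concrete with a plain POSIX BRE (the function name is hypothetical): '^\(a*\)b\1$' matches exactly the strings made of some number of a's, a b, and the same number of a's again. That language is not regular in the formal sense (no finite automaton can count the a's), yet grep still calls the notation a regular expression.

```shell
#!/bin/sh
# Hypothetical helper: does $1 have the form a^n b a^n (n >= 0)?
# \( \) captures the leading a's and \1 demands the same text again.
matches_anban() {
  printf '%s\n' "$1" | grep -q '^\(a*\)b\1$'
}

for s in b aba aabaa aaba abaa; do
  if matches_anban "$s"; then echo "$s: yes"; else echo "$s: no"; fi
done
```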

>
> PS: why are they characteristically different:
> The motivation for globbing was *intuitive* handling of file names

Certainly, because before Version 7 UNIX (AFAIK) the shell had no built-in
globbing; instead, an external program (/etc/glob) did that expansion.

> - sometimes overlooked but important: globbing uses implicit anchors.

This detail is important (and well known to me) but doesn't change anything;
you can convert unanchored regexps to anchored glob patterns and vice
versa.
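A minimal sketch of the regexp-to-glob direction, under a strong assumption: the hypothetical helper below accepts only unanchored EREs built from literal characters, '.' and '.*'. It maps '.*' to '*' and '.' to '?', then wraps the result in '*...*' to cancel the glob's implicit anchors.

```shell
#!/bin/sh
# Hypothetical helper: unanchored ERE (literals, '.', '.*' only) -> glob.
ere_to_glob() {
  printf '%s\n' "$1" | sed \
    -e 's/\.\*/*/g' \
    -e 's/\./?/g' \
    -e 's/^/*/' \
    -e 's/$/*/'
}

re='fo.*o.c'
glob=$(ere_to_glob "$re")
echo "$glob"   # prints: *fo*o?c*

# Compare: the glob via case (unquoted $glob acts as a pattern),
# the original regex via grep -E.
for s in xfoXYo.cz fooc foo.c; do
  case "$s" in
    $glob) g=yes ;;
    *)     g=no ;;
  esac
  if printf '%s\n' "$s" | grep -Eq -- "$re"; then r=yes; else r=no; fi
  echo "$s: glob=$g ere=$r"
done
```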

> I believe globbing was simply "recycled" in the other places, that is,
> the case condition and pattern matching parameter expansion.

Frankly, I've never used an old UNIX Sixth Edition shell and don't know
how the case statement worked at that time, or whether it existed at
all. Sven, wasn't it you who had access to old Bourne shells? You may
want to check whether the case statement was there, and if so, which
pattern and regular expression metacharacters it supported.

Janis
From: Ben Finney on
Seebs <usenet-nospam(a)seebs.net> writes:

> On 2010-01-24, Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
> > Seebs wrote:
> >> On 2010-01-23, Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
> >>> Of course it does. File globbing uses regular expressions.
> >> No, it doesn't. File globbing uses shell patterns.
> > Yes, and "patterns are often described using regular expressions"
> > [Wikipedia]
>
> But "shell patterns" (aka globs) are a special case, because the set
> of things they can represent is absolutely a TINY proper subset of
> what POSIX regular expressions can do.

As the Posix specification says:

Historically, pattern matching notation is related to, but slightly
different from, the regular expression notation described in XBD
Regular Expressions.

<URL:http://www.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13>

That “slightly different” is rather an understatement, when one looks at
the severely minimal (compared to Posix regular expressions) set of
pattern-matching operations that can be done with pathname globs.
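One concrete instance of the notational difference in that quote: a non-matching bracket expression is written '[!...]' in POSIX shell patterns but '[^...]' in regular expressions (POSIX leaves a leading '^' in a shell pattern unspecified, although bash accepts it). A minimal POSIX sh sketch with made-up function names:

```shell
#!/bin/sh
# Hypothetical helper: does $1 start with a non-digit?  Shell pattern:
# '!' negates the bracket expression; the trailing '*' is needed because
# the pattern must cover the whole word.
starts_nondigit_glob() {
  case "$1" in
    [!0-9]*) echo yes ;;
    *)       echo no ;;
  esac
}

# The same question as a regular expression: '^' negates the bracket
# expression, and only the start of the line has to be anchored.
starts_nondigit_re() {
  if printf '%s\n' "$1" | grep -q '^[^0-9]'; then echo yes; else echo no; fi
}

for s in abc 1abc; do
  echo "$s: glob=$(starts_nondigit_glob "$s") re=$(starts_nondigit_re "$s")"
done
```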

> Okay, quick summary: The key is that shell patterns have no grouping
> and no repetition operators. You can map RE '.' onto glob '?', and RE
> '.*' onto glob '*'. There is nothing you can write in shell glob that
> corresponds to 'a*', or even to '.?'. You can handle the
> anchoring/no-anchoring thing -- you can wrap a regex in '^$' or a
> pattern in '**'. But you can't make up for the lack of grouping and
> repetition.

Some shells, of course, go outside the Posix standard and do provide
such facilities for globs.

> re: foo(bar)?
> glob: ... you can't do that.

For example, in Bash, the above could be written as the pathname glob
'foo{,bar}'. There's no such capability in Posix AFAIK, though.
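Strictly speaking, 'foo{,bar}' is brace expansion rather than a glob: bash generates the words 'foo' and 'foobar' whether or not such files exist, and it does not apply brace expansion to case patterns at all. A small sketch of both points, assuming bash (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Brace expansion generates both words; it does not look at existing files.
printf '%s\n' foo{,bar}        # prints foo, then foobar

# Case patterns do not undergo brace expansion, so this pattern only
# matches the literal string 'foo{,bar}'.
brace_in_case() {
  case "$1" in
    foo{,bar}) echo yes ;;
    *)         echo no ;;
  esac
}

brace_in_case foobar           # prints: no
brace_in_case 'foo{,bar}'      # prints: yes
```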

--
\ “Natural catastrophes are rare, but they come often enough. We |
`\ need not force the hand of nature.” —Carl Sagan, _Cosmos_, 1980 |
_o__) |
Ben Finney
From: Janis Papanagnou on
Seebs wrote:
> [...]
> Okay, quick summary: The key is that shell patterns have no grouping and no
> repetition operators. You can map RE '.' onto glob '?', and RE '.*' onto
> glob '*'. There is nothing you can write in shell glob that corresponds
> to 'a*', or even to '.?'. You can handle the anchoring/no-anchoring thing --
> you can wrap a regex in '^$' or a pattern in '**'. But you can't make up
> for the lack of grouping and repetition.
>
> re: foo(bar)?
> glob: ... you can't do that.
>
> Note that case statements are closer, because you could do:
> foo | foobar )

You can do all that with the globbing mechanisms mentioned upthread in
Kornshell, in bash (with extended globbing), and I think in zsh as well;
for 'a*', '.?' and 'foo(bar)?' use these constructs, respectively: *(...) ?([ ]) ?(...)

Your point seems to be that it's not possible in the Bourne shell and older
versions of bash, and that it's supposedly not defined in POSIX. Granted.
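For comparison, here is one way to spell Seebs's three examples with bash's ksh-style extended patterns; the patterns are an illustration, not necessarily the exact ones meant above. It assumes bash with 'shopt -s extglob', which must take effect before the code using the patterns is parsed; the function names are made up.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable ksh-style patterns such as *(...) and ?(...)

# Each case pattern is anchored, like the ERE in the trailing comment.
is_a_star()  { case "$1" in *(a))      echo yes ;; *) echo no ;; esac; }  # ^a*$
is_dot_opt() { case "$1" in ?(?))      echo yes ;; *) echo no ;; esac; }  # ^.?$
is_foo_bar() { case "$1" in foo?(bar)) echo yes ;; *) echo no ;; esac; }  # ^foo(bar)?$

# Columns: a*  .?  foo(bar)?
for s in '' a aaa ab foo foobar foob; do
  echo "'$s': $(is_a_star "$s") $(is_dot_opt "$s") $(is_foo_bar "$s")"
done
```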

Janis
