From: Stephane CHAZELAS on
2010-02-21, 16:16(+00), pk:
> Stephane CHAZELAS wrote:
>
>> 2010-02-21, 10:35(+00), pk:
>> [...]
>>> awk -F '{|}' ...
>>
>> That's incorrect POSIX syntax (leads to unspecified results),
>> you want:
>>
>> awk -F '\{|\}'
>>
>> with a POSIX awk (like with gawk when POSIXLY_CORRECT is on).
>
> That's incorrect as well, and takes you back to the '{|}' case; if you go
> that route, you need
>
> awk -F '\\{|\\}'
>
> due to the way awk scans strings.

s/awk/gawk/

> I used just '{|}' because most awk nowadays either do NOT support {} as

s/most/GNU/ (except in POSIX mode).

> regex characters (though it's mandated by POSIX), and those that do are
> smart enough to see that there's nothing to "quantify" there and take the {
> and } literally.

Except GNU awk:

$ POSIXLY_CORRECT=1 gawk -F '{|}' '{print $1}'
gawk: fatal: Invalid preceding regular expression: /{|}/

awk -F '\{|\}'

seems to be OK with every POSIX awk except GNU awk. I'm
not sure if it's a gawk bug or not as the POSIX spec is unclear
to me on that point, but I agree that

awk -F '\\{|\\}'

is better as it works on all POSIX awks including GNU awk.
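
A minimal sketch of the portable form in use. The sample line and the
variable names are made up for illustration. The bracket-expression
variant at the end is not something suggested in this thread; it is
included only because braces are ordinary characters inside [...] in an
ERE, which sidesteps the escaping question entirely.

```shell
# Hypothetical sample input: one line with a brace-delimited middle part.
line='foo{bar}baz'

# The doubled backslashes survive awk's escape processing of the -F
# argument, so the regex engine receives \{|\} and treats both braces
# as literal characters.
middle=$(printf '%s\n' "$line" | awk -F '\\{|\\}' '{ print $2 }')
printf '%s\n' "$middle"

# Alternative (an assumption of this sketch, not from the thread):
# a bracket expression needs no backslashes at all.
middle_alt=$(printf '%s\n' "$line" | awk -F '[{}]' '{ print $2 }')
printf '%s\n' "$middle_alt"
```

Both commands should give the text between the braces as the second field.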

--
Stéphane
From: pk on
Stephane CHAZELAS wrote:

>> awk -F '\\{|\\}'
>>
>> due to the way awk scans strings.
>
> s/awk/gawk/

I must admit that I had always thought that the "double pass" on strings as
described in the GNU awk manual was the default for awk in general, not just
gawk. But it seems indeed that other awks do accept the version with single
backslashes, so I stand corrected. Thanks.

> seems to be OK with every POSIX awk except GNU awk. I'm
> not sure if it's a gawk bug or not as the POSIX spec is unclear
> to me on that point,

POSIX seems indeed to mandate that:

"...If the right-hand operand is any expression other than the lexical token
ERE, the string value of the expression shall be interpreted as an extended
regular expression, including the escape conventions described above. Note
that these same escape conventions shall also be applied in determining the
value of a string literal (the lexical token STRING), and thus shall be
applied a second time when a string literal is used in this context."
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

However, it's true that the text above only covers the case where a string
(not a literal ERE) is the right-hand operand of the "~" and "!~" operators.
I guess each awk implementation had its own take on whether the above should
also apply in other contexts (like FS) or not.
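
For the "~" case that the quoted text does cover, the two passes can be
seen in a short sketch like the one below. The test string and the message
texts are invented for illustration.

```shell
# First pass: the awk lexer turns the string literal "\\{" into \{ .
# Second pass: that string is interpreted as an ERE, where \{ is a
# literal brace. The ERE token /\{/ only goes through the regex pass.
result=$(awk 'BEGIN {
    s = "a{b"
    if (s ~ "\\{") print "string literal matched"
    if (s ~ /\{/)  print "ERE token matched"
}')
printf '%s\n' "$result"
```

With a single backslash in the string literal ("\{"), the first pass
already sees a backslash followed by "{", which is the combination the
implementations discussed above disagree about.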

> but I agree that
>
> awk -F '\\{|\\}'
>
> is better as it works on all POSIX awks including GNU awk.

Agreed. That should work in any case.
From: Geoff Clare on
Stephane CHAZELAS wrote:

> not sure if it's a gawk bug or not as the POSIX spec is unclear
> to me on that point, but I agree that
>
> awk -F '\\{|\\}'
>
> is better as it works on all POSIX awks including GNU awk.

Looks like a defect in POSIX to me. I have reported it.

http://austingroupbugs.net/view.php?id=224

--
Geoff Clare <netnews(a)gclare.org.uk>