From: Janis Papanagnou on
pk wrote:
> On Wednesday 7 May 2008 17:37, Ed Morton wrote:
>
>>>>these locales, `[a-dx-z]' is typically not equivalent to `[abcdxyz]';
>>>>instead it might be equivalent to `[aBbCcDdxXyYz]', for example.
>>>
>>>Is there a way to explicitly print out that information (or, better, the
>>>entire collating sequence in use)? I've been looking for a method to do
>>>that for a long time, but I have found no complete answer.
>>
>>I expect you could use the ord() and chr() functions described here:
>>
>>http://www.gnu.org/software/gawk/manual/gawk.html#Ordinal-Functions
>>
>>[snip]
>
>[snip]
>
> It seems that the functions you point out use the mere numeric character
> values and don't take the locale into account.

Yes, indeed. The quoted passage in the GNU manual says of ord() that it
returns "the numeric value for that character in the machine's character set".

While this matches the interpretation I am comfortable with (both in this
case and for ranges of characters in regexps), the quoted link about
"Character-Lists" explicitly mentions a locale dependency.

The regex(7) man page doesn't shed any light on the topic either; in one
sentence it talks about a _character set_ ("`[0-9]' in ASCII matches any
decimal digit"), and shortly after that about _collating sequences_
("Ranges are very collating-sequence-dependent").

I'm still puzzled about that.
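
For what it's worth, one way to at least look at the collating order is
to sort a set of characters with the C library's strxfrm()/strcoll().
Below is a minimal sketch in Python (not awk), using only the standard
locale module. The function names collating_sequence() and in_range() are
my own, hypothetical helpers, and in_range() is only an approximation: a
regex engine may decide range membership differently from strcoll(), so
this shows the locale's sort order, not necessarily what `[a-d]' matches
in a given awk.

```python
import locale


def collating_sequence(chars):
    """Return the given characters sorted by the current LC_COLLATE order."""
    # locale.strxfrm() transforms a string so that ordinary comparison
    # of the results follows the locale's collation rules.
    return sorted(chars, key=locale.strxfrm)


def in_range(ch, lo, hi):
    """Approximate test: does ch collate between lo and hi (inclusive)?"""
    # Assumption: range membership is modelled with strcoll(); a real
    # regex implementation is not guaranteed to behave the same way.
    return locale.strcoll(lo, ch) <= 0 and locale.strcoll(ch, hi) <= 0


# Possible usage (output depends entirely on the locale in effect):
#   import string
#   locale.setlocale(locale.LC_COLLATE, "")   # adopt the environment's locale
#   print("".join(collating_sequence(string.ascii_letters)))
#   print([c for c in string.ascii_letters if in_range(c, "a", "d")])
```

In the "C" locale both functions simply follow the numeric character
values, which is the behaviour ord() describes; in other locales the
result may interleave upper and lower case.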

Janis

[snip]