From: Janis Papanagnou on 7 May 2008 14:46

pk wrote:
> On Wednesday 7 May 2008 17:37, Ed Morton wrote:
>
>>>>these locales, `[a-dx-z]' is typically not equivalent to `[abcdxyz]';
>>>>instead it might be equivalent to `[aBbCcDdxXyYz]', for example.
>>>
>>>Is there a way to explicitly print out that information (or, better, the
>>>entire collating sequence in use)? I've been looking for a method to do
>>>that for long time, but I have found no complete answer.
>>
>>I expect you could use the ord() and chr() functions described here:
>>
>>http://www.gnu.org/software/gawk/manual/gawk.html#Ordinal-Functions
>>
>>[snip]
>
>[snip]
>
> It seems that the function you point out use the mere numeric character
> values and don't take locale into account.

Yes, indeed. The quoted passage in the GNU manual says of ord() that it
returns "the numeric value for that character in the machine's character
set".

That matches the interpretation I am comfortable with, both here and for
character ranges in regexps. But the quoted link about "Character-Lists"
explicitly mentions a locale dependency.

man regex(7) doesn't clarify the matter either; in one sentence it talks
about a _character set_ ("`[0-9]' in ASCII matches any decimal digit"),
and shortly afterwards about _collating sequences_ ("Ranges are very
collating-sequence-dependent").

I'm still puzzled about that.

Janis

[snip]
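
For what it's worth, one rough way to look at a locale's ordering, as opposed to the raw character codes that ord() reports, is to sort the characters of interest by their collation key. The sketch below does this in Python with locale.strxfrm; the helper name collation_order is made up for illustration, and the ordering it prints is the one used for string comparison, which may or may not be exactly what a given regex engine uses for bracket ranges.

```python
import locale
import string


def collation_order(chars, locale_name="C"):
    """Return chars sorted by the LC_COLLATE rules of locale_name.

    collation_order is a hypothetical helper, not part of any library.
    locale.setlocale raises locale.Error if locale_name is not installed.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm maps each character to a key that compares in
        # collation order under the active LC_COLLATE setting.
        return "".join(sorted(chars, key=locale.strxfrm))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore


if __name__ == "__main__":
    # In the C locale the order is plain code-point order.
    print(collation_order(string.ascii_letters, "C"))
    # With another locale name (whichever ones `locale -a` lists on your
    # system) the upper- and lower-case letters may come out interleaved.
```

Comparing the output for "C" against a locale listed by `locale -a` shows whether upper- and lower-case letters are interleaved there, which is the situation the manual's `[aBbCcDdxXyYz]' example describes.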