From: syd_p on
On 29 Sep, 12:53, syd_p <sydneypue...(a)yahoo.com> wrote:
> On 29 Sep, 12:43, syd_p <sydneypue...(a)yahoo.com> wrote:
>
> > On 29 Sep, 10:37, Marcel Bruinsma <m...(a)nomail.afraid.org> wrote:
>
> > > On Tuesday, 29 September 2009 10:31, syd_p wrote:
>
> > > > But with LANG=C which I thought was only 7 bits the following
> > > > printfs work just fine.
>
> > > > $ printf "(octal 353) is the character \0353\n"
> > > > (octal 353) is the character ë
> > > > printf "(octal 361) is the character \0361\n"
> > > > (octal 361) is the character ñ
>
> > > Good, the typeface (font) has the characters you need.
>
> > > > These are two of the characters in the MSSQL db which the
> > > > application (not open source) handles as "?".
>
> > > Check if the application is really the cause of the problem.
> > > For file 'foo', generated by application, run :
> > > LANG=en_US.ISO-8859-15 tr -d '\000-\177' <foo | od -b
>
> > > --
> > > printf -v email $(echo \ 155 141 162 143 145 154 142 162 165 151 \
> > > 156 163 155 141 100 171 141 150 157 157 056 143 157 155|tr \  \\\\)
> > > #  Live every life as if it were your last!  #
>
> > $ cat  foo
> > A?O
> > $  od -b  foo
> > 0000000 101 077 117 012
> > 0000004
> > octal 101 =  A
> > octal 077 =  ?
> > octal 117 =  O
> > middle char should be capital ñ
>
> > $ LANG=en_US.ISO-8859-15 tr -d '\000-\177' <foo | od -b
> > 0000000
> > not quite sure what this does ;-)
>
> Ah yes I am - it deletes all "normal" chars and passes the remainder
> to od...
> not quite sure what all zeros as the out means tho...
>
> And on a Centos 5.3 box I just invoked
> $  printf "(octal 353) is the character \0353\n"
> (octal 353) is the character 3
> on the centos 3.8 box I got the expected output of ë
Ahh - I see, I need to specify the hex value thusly:
$ printf "(hex EB) is the character \xEB\n"
(hex EB) is the character ë
this works on 3.8 and 5.3
From: Marcel Bruinsma on
On Tuesday, 29 September 2009 13:53, syd_p wrote:

>> $  od -b  foo
>> 0000000 101 077 117 012
>> 0000004
>> octal 101 =  A
>> octal 077 =  ?
>> octal 117 =  O
>> middle char should be capital ñ

Ok, the problem is caused by your application.
Try running it with latin9 ctype :
LANG=en_US.ISO-8859-15 application ...

>> $ LANG=en_US.ISO-8859-15 tr -d '\000-\177' <foo | od -b
>> 0000000
>> not quite sure what this does ;-)
> Ah yes I am - it deletes all "normal" chars and passes
> the remainder to od...

Yes, I thought 'foo' might be a big file, with mostly us-ascii.
In a file with 10000 ascii characters and only 10 non-ascii
the output of od (without the tr filter) might be a bit of a
challenge. ;-)
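
A minimal sketch of that filter on a file that does contain a non-ASCII byte. The sample file is made up for illustration (it is not your 'foo'), and I use LC_ALL=C instead of the latin9 locale so that tr works on plain bytes even where that locale is not installed:

```shell
# Hypothetical sample: 'A', byte octal 321 (capital N with tilde in latin1/latin9), 'O'.
sample=$(mktemp)
printf 'A\321O\n' > "$sample"

# Delete every byte in the 7-bit range 000-177; only the high byte survives.
high_bytes=$(LC_ALL=C tr -d '\000-\177' < "$sample" | od -An -b)
echo "$high_bytes"

rm -f "$sample"
```

If the application had written the character correctly, the output for 'foo' would have looked like this one (a single 321) instead of being empty.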

> not quite sure what all zeros as the out means tho...

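Od starts each line with an address, the offset of the first
byte in that line. The first line starts at offset 0, unless you
invoke od with the -j option. The last line is always the offset
just past the end of the input. Here tr deleted every byte, so od
received no input and printed only that final offset, 0000000.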
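
A small sketch of how the address column behaves, using the same four bytes as your 'foo' (the variable names are only for illustration):

```shell
# Empty input: od prints nothing but the final offset.
empty_dump=$(printf '' | od -b)

# Four bytes: one data line starting at offset 0, then the end offset.
full_dump=$(printf 'A?O\n' | od -b)

# -j 2 skips the first two bytes, so the first address is 2.
skipped_dump=$(printf 'A?O\n' | od -j 2 -b)

printf '%s\n' "$empty_dump" "---" "$full_dump" "---" "$skipped_dump"
```

The first dump is the lone 0000000 you saw after the tr filter.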

> And on a Centos 5.3 box I just invoked
> $ printf "(octal 353) is the character \0353\n"
> (octal 353) is the character 3

Yes, that is what posix printf is required to do. From,
http://www.opengroup.org/onlinepubs/9699919799/utilities/printf.html
« [...] "\ddd", where ddd is a one, two, or three-digit octal
» number, shall be written as a byte with the numeric value
» specified by the octal number. »

In the printf above, "\035" is an escape sequence, and the
following "3" is a normal digit. To write octal 353, the
printf format string should be '(octal 353) ... \353\n'.
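
To see the difference in bytes rather than on the terminal, pipe the output through od. This is a sketch assuming the printf built-in of bash or dash (zsh may behave differently), and the variable names are mine:

```shell
# Three octal digits after the backslash: one byte, octal 353.
three_digits=$(printf '\353' | od -An -b)

# Four digits: "\035" is consumed as the escape, the trailing "3" is literal (octal 063).
four_digits=$(printf '\0353' | od -An -b)

echo "format \\353  ->$three_digits"
echo "format \\0353 ->$four_digits"
```

With those shells the first line should show a single 353 and the second 035 063.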

> on the centos 3.8 box I got the expected output of ë

Are you using zsh on the centos 3.8 box?
The zsh built-in printf expects octal escape sequences to
start with '\0' followed by zero, one, two or three octal
digits.

posix: \353 => zsh: \0353
posix: \75 => zsh: \075
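
For what it is worth, posix printf does know the '\0ddd' form, but only inside an argument printed with the %b conversion, not in the format string itself. A short sketch, assuming a bash or dash printf (variable names are only illustrative):

```shell
# %b interprets "\0" followed by up to three octal digits in the ARGUMENT.
via_b=$(printf '%b' '\0353' | od -An -b)

# The same value written posix-style in the FORMAT string.
via_format=$(printf '\353' | od -An -b)

echo "%b with \\0353 ->$via_b"
echo "format \\353   ->$via_format"
```

Both lines should report the single byte 353, so '%b' is one way to keep the '\0353' spelling in a script.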

From: Marcel Bruinsma on
On Tuesday, 29 September 2009 15:54, syd_p wrote:

> $ printf "(hex EB) is the character \xEB\n"
> (hex EB) is the character ë
> this works on 3.8 and 5.3

This is a nice one to try. It shows the terminal mapping:
printf '\xa4 \x80\n'

To understand that:
zgrep -E '\x(80|a4)' /usr/share/i18n/charmaps/ISO-8859-15.gz
zgrep -E '\x(80|a4)' /usr/share/i18n/charmaps/ISO-8859-1.gz
zgrep -E '\x(80|a4)' /usr/share/i18n/charmaps/CP1252.gz
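
If the charmap files are not installed, iconv can show the same thing. This is only a sketch: it assumes an iconv that knows all three charset names, and to_utf8_hex is a helper I made up for the occasion:

```shell
# Convert one byte (given as a printf octal escape) from charset $1 to UTF-8
# and dump the result in hex.
to_utf8_hex() {
  printf "$2" | iconv -f "$1" -t UTF-8 | od -An -tx1
}

for charset in ISO-8859-15 ISO-8859-1 CP1252; do
  # Octal 244 is hex a4, octal 200 is hex 80.
  echo "$charset: a4 ->$(to_utf8_hex "$charset" '\244')  80 ->$(to_utf8_hex "$charset" '\200')"
done
```

The UTF-8 bytes e2 82 ac are the euro sign: it sits at a4 in ISO-8859-15 and at 80 in CP1252, while ISO-8859-1 has the currency sign (c2 a4) at a4.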
