From: deadlyhead on
I've been messing around a bit with files of various encodings, and
just recently I've become aware of the Form parameter to Open and
Create and the -gnatW switch for handling character encoding.

This is a pretty big deal to me. For a long time I've been a bit...
frustrated? ... by the fact that the Ada standard specifically gives
us Wide_ and Wide_Wide_Characters and their associated strings, but
actually _using_ them seemed pretty much worthless. I mean, if you
can't actually _talk_ with them to a modern system (UTF-8 or UTF-16
encoding seems to be pretty much the way it goes), what's the point in
using them?

So I'm pretty happy using either the WCEM=8 Form string or the -gnatW8
switch to get UTF-8 input and output. What I'm wondering now is
whether I can get other UTF encodings to work.
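For reference, here is a minimal sketch of the per-file approach. The Form string goes to Create (or Open), so only that one file is UTF-8 and the compiler-wide default is left alone. The procedure name Write_UTF8 and the file name output.utf8 are made up for illustration.

```ada
with Ada.Wide_Wide_Text_IO; use Ada.Wide_Wide_Text_IO;

procedure Write_UTF8 is
   F : File_Type;
begin
   --  Form => "WCEM=8" selects UTF-8 for this file only,
   --  independently of any -gnatW default.
   Create (F, Out_File, "output.utf8", Form => "WCEM=8");

   --  U+00FC (u with diaeresis) should come out as the bytes C3 BC.
   Put_Line (F, "Gr" & Wide_Wide_Character'Val (16#FC#) & "n");
   Close (F);
end Write_UTF8;
```

Reading works the same way: pass the same Form string to Open with In_File.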

I actually have the peculiar case of dealing with UTF-32 encoded
files, which need to be translated to UTF-8 for editing, and back to
UTF-32 for machine use again. It seems it would be pretty
straightforward to read the file in with plain
Wide_Wide_Text_IO.Open/Get_Line calls, then write it out with
Wide_Wide_Text_IO.Put to a file opened with Form => "WCEM=8". So far,
though, I'm having trouble, since GNAT's default encoding is bracket
notation, not raw binary character output. Also, if I want output
printed to the terminal in UTF-8, I have to set the -gnatW8 switch,
which means that _now_ the default encoding for all unspecified files
is UTF-8. Any ideas on how to get around this?
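Here is a sketch of the UTF-32 to UTF-8 half. It assumes the input is UTF-32LE (little-endian) and that there is no UTF-32 WCEM setting to lean on (the GNAT Form values I know of are h, u, s, e, 8 and b). Under those assumptions, the input bytes are read raw with Stream_IO, and only the output goes through Wide_Wide_Text_IO with Form => "WCEM=8". The procedure name UTF32_To_UTF8 is hypothetical.

```ada
with Ada.Streams;            use Ada.Streams;
with Ada.Streams.Stream_IO;
with Ada.Wide_Wide_Text_IO;
with Interfaces;             use Interfaces;

procedure UTF32_To_UTF8 (In_Name, Out_Name : String) is
   package SIO renames Ada.Streams.Stream_IO;
   package WIO renames Ada.Wide_Wide_Text_IO;

   Input  : SIO.File_Type;
   Output : WIO.File_Type;
   Unit   : Stream_Element_Array (1 .. 4);  --  one UTF-32 code unit
   Last   : Stream_Element_Offset;
   Code   : Unsigned_32;
   First  : Boolean := True;
begin
   SIO.Open (Input, SIO.In_File, In_Name);
   --  Only this output file is UTF-8; no -gnatW8 needed.
   WIO.Create (Output, WIO.Out_File, Out_Name, Form => "WCEM=8");

   loop
      SIO.Read (Input, Unit, Last);
      exit when Last < Unit'Last;  --  end of file (or a truncated last unit)

      --  Assumption: little-endian byte order (UTF-32LE).
      Code := Unsigned_32 (Unit (1))
        or Shift_Left (Unsigned_32 (Unit (2)), 8)
        or Shift_Left (Unsigned_32 (Unit (3)), 16)
        or Shift_Left (Unsigned_32 (Unit (4)), 24);

      if Code > 16#10FFFF# then
         raise Constraint_Error with "not a Unicode code point";
      elsif First and then Code = 16#FEFF# then
         null;                    --  skip a leading byte-order mark
      elsif Code = 16#0A# then
         WIO.New_Line (Output);   --  LF becomes an Ada line terminator
      else
         WIO.Put (Output, Wide_Wide_Character'Val (Code));
      end if;
      First := False;
   end loop;

   SIO.Close (Input);
   WIO.Close (Output);
end UTF32_To_UTF8;
```

It would be called from a main program as UTF32_To_UTF8 ("data.utf32", "data.utf8"), with made-up file names. The return trip would open the UTF-8 file with the same Form string, read it with Get_Line, and write each Wide_Wide_Character'Pos as four little-endian bytes through Stream_IO. For terminal output without -gnatW8, an Ada 2012 compiler also has Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode, whose UTF-8 String result could then go through plain Ada.Text_IO. That part assumes the default bracket encoding passes those bytes through unchanged.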

And, just for giggles, is it _possible_ to use the upper-half encoding
"WCEM=u" to encode UTF-16? Or is that something completely different?
From the little the GNAT Reference Manual says about it, it seems it
might be.
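Whatever WCEM=u turns out to mean, an Ada 2012 compiler offers a route to UTF-16 that does not involve WCEM at all. Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode with Output_Scheme => UTF_16LE returns the encoded bytes as a String, and Stream_IO can write that String verbatim. The sketch below assumes Ada 2012, and the names Write_UTF16 and output.utf16 are hypothetical.

```ada
with Ada.Streams.Stream_IO;
with Ada.Strings.UTF_Encoding;            use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Write_UTF16 is
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  "A" followed by U+1F600, which lies outside the BMP and so
   --  needs a surrogate pair in UTF-16.
   Text : constant Wide_Wide_String :=
     "A" & Wide_Wide_Character'Val (16#1F600#);

   --  Each Character of the result is one raw byte of the UTF-16LE
   --  encoding (no BOM unless Output_BOM => True is passed).
   Bytes : constant UTF_String :=
     WWS.Encode (Text, Output_Scheme => UTF_16LE);

   F : Ada.Streams.Stream_IO.File_Type;
begin
   --  3 code units (1 for "A", 2 for the surrogate pair) * 2 bytes.
   pragma Assert (Bytes'Length = 6);

   Ada.Streams.Stream_IO.Create
     (F, Ada.Streams.Stream_IO.Out_File, "output.utf16");
   --  String'Write emits only the elements, not the array bounds.
   String'Write (Ada.Streams.Stream_IO.Stream (F), Bytes);
   Ada.Streams.Stream_IO.Close (F);
end Write_UTF16;
```

Passing Output_BOM => True would prepend the FF FE byte-order mark, and UTF_16BE gives the big-endian form.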

I'm okay with giving up on this method and using the XML/Ada Unicode
libraries for the text translation. It'd be nice if I didn't have to,
though.