From: Tony Johansson on
Hi!

Here are some encoding standards:
1. ASCII
2. Unicode
3. UTF-7
4. UTF-8
5. UTF-32

Files encoded with Unicode, UTF-8, or UTF-32 have code markers at the
beginning, but files encoded with ASCII or UTF-7 do not contain any
code markers at all.
So why are there no code markers for these two?

//Tony


From: Harlan Messinger on
Tony Johansson wrote:
> Hi!
>
> Here are some encoding standards:
> 1. ASCII
> 2. Unicode
> 3. UTF-7
> 4. UTF-8
> 5. UTF-32
>
> Files encoded with Unicode, UTF-8, or UTF-32 have code markers at the
> beginning, but files encoded with ASCII or UTF-7 do not contain any
> code markers at all.
> So why are there no code markers for these two?
>
The purpose of the marker is to indicate whether the data is stored in
"big-endian" or "little-endian" order--that is, whether multibyte
encodings are arranged high-order byte first or low-order byte first.
Therefore, the need for this marker only arose when multibyte encodings
were introduced.
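
To make the byte-order point concrete, here is a minimal Python sketch (Python is my choice for illustration; the helper name utf16_byte_order is made up). It encodes the same character in both byte orders and shows how the marker, U+FEFF, comes out differently in each, which is what lets a reader tell them apart.

```python
import codecs

text = "A"  # U+0041

# The same code point serialized in both byte orders (no marker written).
big_endian = text.encode("utf-16-be")     # high-order byte first
little_endian = text.encode("utf-16-le")  # low-order byte first

# U+FEFF is the byte order mark; its byte sequence reveals the order.
bom_be = "\ufeff".encode("utf-16-be")
bom_le = "\ufeff".encode("utf-16-le")


def utf16_byte_order(data: bytes) -> str:
    """Report the byte order announced by a leading UTF-16 marker, if any."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little-endian"
    return "unknown"
```

Note that the UTF-8 form of the marker carries no byte-order information, since UTF-8 is a sequence of single bytes; there it only acts as a signature.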


From: Peter Duniho on
Tony Johansson wrote:
> Hi!
>
> Here are some encoding standards:
> 1. ASCII
> 2. Unicode
> 3. UTF-7
> 4. UTF-8
> 5. UTF-32
>
> Files encoded with Unicode, UTF-8, or UTF-32 have code markers at the
> beginning, but files encoded with ASCII or UTF-7 do not contain any
> code markers at all.
> So why are there no code markers for these two?

You are not guaranteed markers for the standard Unicode formats either.

ASCII was "designed" long before anyone was really thinking hard about
portable character encodings, so there was no chance it would support a
marker.

And UTF-7 is used in such specialized situations that there's no need
for a marker: anything that can use it will be doing so in a context
where there's some other way to specify the format.

In general, it's very difficult to identify encoding from the text file
itself. There are some exceptions (XML allows inclusion of the
encoding, for example, as part of the header), but most of the time
encoded text needs some external indicator as to what encoding is used.
Either some convention or some explicit statement to that effect.
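
As a rough illustration of that, here is a Python sketch (Python is just my pick; sniff_bom and decode_text are hypothetical helper names, not library functions). It uses a marker when one is present and otherwise falls back to an encoding supplied from outside. The last lines show why the fallback is needed: the same byte decodes to different characters under two single-byte encodings.

```python
import codecs

# Longest markers first: the UTF-32 LE marker begins with the UTF-16 LE one.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, marker_length) if data starts with a known marker, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


def decode_text(data: bytes, declared: str = "utf-8") -> str:
    """Decode using the marker if present, otherwise the externally declared encoding."""
    hit = sniff_bom(data)
    if hit is None:
        return data.decode(declared)
    encoding, skip = hit
    return data[skip:].decode(encoding)


# Without a marker, the bytes alone do not settle the question.
ambiguous = b"\xe9"
as_latin1 = ambiguous.decode("latin-1")  # LATIN SMALL LETTER E WITH ACUTE
as_cp1251 = ambiguous.decode("cp1251")   # CYRILLIC SMALL LETTER SHORT I
```

Even the marker check is only a heuristic: a UTF-16 little-endian file whose first character is U+0000 would begin with the same four bytes as a UTF-32 little-endian marker.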

Pete


From: Jeff Johnson on
"Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message
news:uafrPurxKHA.3560(a)TK2MSFTNGP02.phx.gbl...

> In general, it's very difficult to identify encoding from the text file
> itself.

Yup: http://blogs.msdn.com/michkap/archive/2006/07/11/662342.aspx


From: Arne Vajhøj on
On 18-03-2010 11:39, Tony Johansson wrote:
> Here are some encoding standards:
> 1. ASCII
> 2. Unicode
> 3. UTF-7
> 4. UTF-8
> 5. UTF-32
>
> Files encoded with Unicode, UTF-8, or UTF-32 have code markers at the
> beginning, but files encoded with ASCII or UTF-7 do not contain any
> code markers at all.
> So why are there no code markers for these two?

I would not consider Unicode an encoding.

And the BOM is optional, not required, for UTF-8.
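
A small Python sketch of what "optional" looks like in practice (Python is used only for illustration). Its "utf-8" codec writes no marker, while "utf-8-sig" writes one and accepts input either way when decoding.

```python
import codecs

text = "hi"

without_bom = text.encode("utf-8")   # no marker written
with_bom = text.encode("utf-8-sig")  # prefixed with the bytes EF BB BF

# "utf-8-sig" accepts input with or without the marker when decoding.
decoded_a = without_bom.decode("utf-8-sig")
decoded_b = with_bom.decode("utf-8-sig")

# Plain "utf-8" does not strip the marker; it stays in the text as U+FEFF.
decoded_raw = with_bom.decode("utf-8")
```

Both byte strings are valid UTF-8, so a reader cannot rely on the marker being there.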

As for why: a BOM only makes sense for certain
encodings, but in the end it is a matter of
choice by whoever designed the encoding.

If you define the TonyEncoding to map between Unicode
and bytes, then you can put whatever headers you want
in front.
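
Purely as a toy, here is what that could look like in Python. Everything in it is invented for illustration: the marker bytes, the function names, and the choice of UTF-16 big-endian for the payload are all assumptions.

```python
# Hypothetical header for the made-up "TonyEncoding".
TONY_MARKER = b"TONY\x01"


def tony_encode(text: str) -> bytes:
    """Encode text as the marker followed by UTF-16 big-endian code units."""
    return TONY_MARKER + text.encode("utf-16-be")


def tony_decode(data: bytes) -> str:
    """Decode data produced by tony_encode, insisting on the marker."""
    if not data.startswith(TONY_MARKER):
        raise ValueError("missing TonyEncoding marker")
    return data[len(TONY_MARKER):].decode("utf-16-be")
```

Here the designer chose to make the header mandatory; making it optional would be an equally valid design decision.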

Arne