From: Simon on
Thanks for all the replies.

In the end I looked at the way Notepad++ reads files. As Mihai N.
mentioned, they open the file in 'rb' mode and then call
MultiByteToWideChar( .... )

Because the file is read as bytes, they have various functions to check
the file format (UTF-8, UTF-16, ASCII and so forth).
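
For anyone following along, here is a minimal sketch of the "read as bytes" step in portable C++. It is not Notepad++'s code; the function name read_file_bytes is made up for illustration. Binary mode matters because text mode on Windows would translate line endings and could mangle multi-byte data before any format check runs.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file as raw bytes.
// std::ios::binary disables newline translation, so the buffer holds
// exactly what is on disk (including any byte order mark).
// Returns false if the file cannot be opened.
bool read_file_bytes(const std::string& path, std::vector<unsigned char>& out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}
```

The resulting buffer can then be handed to whatever format checks you write before any conversion to wide characters.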

Simon

From: Simon on
>
> if you are on a Japanese system
> probably UTF-8 or Shift-JIS (cp932)
> else
> probably UTF-8

Thanks for the replies,

How do I know if I am on a Japanese system?
And even if I know (using the locale and so forth), how can I test whether
the file is UTF-8 or Shift-JIS (cp932)?

If it is UTF-8 I can (now) read it properly (using MultiByteToWideChar).

But how can I read Shift-JIS (cp932) and convert it to wide char
accordingly?

>
> So load the file as bytes, then use MultiByteToWideChar.

Many thanks

Simon
From: Giovanni Dicanio on
"Simon" <bad(a)example.com> wrote in message
news:OnxYESLzKHA.5040(a)TK2MSFTNGP02.phx.gbl...

> But how can I read Shift-JIS (cp932) and convert it to wide char
> accordingly?

You can use MultiByteToWideChar with proper code page identifier:

http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx
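
As a rough sketch of how that could fit together with the UTF-8 versus Shift-JIS question: one common heuristic is to validate the bytes strictly as UTF-8 and fall back to the legacy code page only if validation fails. The names is_valid_utf8, guess_code_page and to_wide below are made up for illustration, and the fallback of 932 is an assumption that suits Japanese text files only. The code page numbers are the documented identifiers (65001 for UTF-8, 932 for Shift-JIS). The heuristic is not a proof: a short Shift-JIS file can occasionally happen to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// True if the buffer is well-formed UTF-8. Rejects overlong forms,
// surrogate code points and values above U+10FFFF.
bool is_valid_utf8(const unsigned char* s, std::size_t n)
{
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        std::size_t len;
        unsigned long cp;
        if (c < 0x80) { ++i; continue; }                  // plain ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;               // stray continuation or bad lead byte
        if (i + len > n) return false;   // sequence truncated at end of buffer
        for (std::size_t k = 1; k < len; ++k) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min_cp[len]) return false;               // overlong encoding
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}

// Hypothetical policy: UTF-8 (65001) if the bytes validate, otherwise the
// caller's fallback code page (932 = Shift-JIS is an assumed default).
unsigned guess_code_page(const unsigned char* s, std::size_t n,
                         unsigned fallback = 932)
{
    return is_valid_utf8(s, n) ? 65001u : fallback;
}

#ifdef _WIN32
// Windows only: convert bytes in the given code page to UTF-16.
// Returns an empty string if the input is empty or the conversion fails.
std::wstring to_wide(const std::string& bytes, unsigned code_page)
{
    if (bytes.empty()) return std::wstring();
    const int src_len = static_cast<int>(bytes.size());
    // First call asks for the required length, second call converts.
    int len = MultiByteToWideChar(code_page, MB_ERR_INVALID_CHARS,
                                  bytes.data(), src_len, nullptr, 0);
    if (len <= 0) return std::wstring();
    std::wstring out(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(code_page, MB_ERR_INVALID_CHARS,
                        bytes.data(), src_len, &out[0], len);
    return out;
}
#endif
```

With this, the question of whether the machine is a Japanese system matters less: the fallback code page becomes a parameter you choose (or take from GetACP()) instead of something you have to detect up front.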

Giovanni


From: Tom Serface on
If your file is UTF-8 or UTF-16 ("Unicode") and you are converting to Unicode
for the in-memory string, it shouldn't matter what kind of system you are on,
since the code page would no longer be an issue.

To test the file type you should check the byte order mark (BOM), which is the
first two or three bytes of the file when one is present:

#define UTF8_BOM "\xef\xbb\xbf" // UTF-8 file "byte order mark" which goes at start of file
#define UTF8_BOM_SIZE 3
#define UTF16_LE_BOM "\xff\xfe" // Unicode (UTF-16 LE) "byte order mark" which goes at start of file
#define UTF16_BOM_SIZE 2
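
A minimal sketch of how a check against those byte sequences might look, assuming the file has already been loaded into a byte buffer. The names Bom, detect_bom and bom_size are made up for illustration, and the big-endian UTF-16 mark (FE FF) is added here for completeness. Keep in mind that many UTF-8 files have no BOM at all, so a result of "None" does not tell you the encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical result type for the check below.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Report which byte order mark, if any, the buffer starts with.
// Limitation: a UTF-32 LE file starts with FF FE 00 00 and would be
// reported as Utf16LE here, since UTF-32 is not handled.
Bom detect_bom(const unsigned char* data, std::size_t size)
{
    if (size >= 3 && std::memcmp(data, "\xEF\xBB\xBF", 3) == 0)
        return Bom::Utf8;
    if (size >= 2 && std::memcmp(data, "\xFF\xFE", 2) == 0)
        return Bom::Utf16LE;
    if (size >= 2 && std::memcmp(data, "\xFE\xFF", 2) == 0)
        return Bom::Utf16BE;
    return Bom::None;
}

// Number of bytes to skip so that decoding starts at the first character.
std::size_t bom_size(Bom bom)
{
    switch (bom) {
    case Bom::Utf8:    return 3;
    case Bom::Utf16LE: return 2;
    case Bom::Utf16BE: return 2;
    default:           return 0;
    }
}
```

Skipping bom_size(...) bytes before converting keeps the BOM from showing up as a stray U+FEFF character at the start of the decoded text.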

Tom