From: AK on
Hello,

Does the standard C string library support the UTF-8 charset
for basic operations like string length, string compare, string copy,
and string search?

Can I use the C conversion functions like wcstombs and
mbstowcs to convert strings between wide and UTF-8 format?

I need this information for an application that I am
developing for Windows CE using Visual Studio 2005.

Thanks & Regards,
Ajith

From: Carl Daniel [VC++ MVP] on
AK wrote:
> Hello,
>
> Does the standard C string library support the UTF-8 charset
> for basic operations like string length, string compare, string copy,
> and string search?

No, the Standard C library knows nothing at all about UTF-8.

The VC++ CRT has extensions that are "MBCS-aware": see _mbslen, etc.,
which are prototyped in <mbstring.h>.

>
> Can I use the C conversion functions like wcstombs and
> mbstowcs to convert strings between wide and UTF-8 format?

These are not standard C functions either, but VC++ extensions. Yes, you
can use them to convert between UTF-8 and UCS-2 by using an appropriate code
page (CP_UTF8 or 65001) on the utf-8 side.

>
> I need this information for an application that I am
> developing for Windows CE using Visual Studio 2005.

You'll have to check the documentation to see which of the above works on
Windows CE.

-cd


From: Igor Tandetnik on
Carl Daniel [VC++ MVP]
<cpdaniel_remove_this_and_nospam(a)mvps.org.nospam> wrote:
>> Does the standard C string library support the UTF-8 charset
>> for basic operations like string length, string compare, string copy,
>> and string search?
>
> No, the Standard C library knows nothing at all about UTF-8.
>
> The VC++ CRT has extensions that are "MBCS-aware": see _mbslen, etc.,
> which are prototyped in <mbstring.h>.

.... none of which, unfortunately, support UTF-8 either. They all assume
no more than two bytes per character.

>> Can I use the C conversion functions like wcstombs and
>> mbstowcs to convert strings between wide and UTF-8 format?
>
> These are not standard C functions either, but VC++ extensions. Yes,
> you can use them to convert between UTF-8 and UCS-2 by using an
> appropriate code page (CP_UTF8 or 65001) on the utf-8 side.

.... except that neither wcstombs nor mbstowcs takes code page as a
parameter. You may be thinking about MultiByteToWideChar and
WideCharToMultiByte APIs.
--
With best wishes,
Igor Tandetnik



From: AK on
Hello Carl,

So are you saying that the standard C string functions like strlen,
strstr, strcmp, strcpy, etc. wouldn't work with UTF-8?

Also, how do I tell the MBCS functions which charset encoding they
have to operate on? I see MBCS functions with an _l suffix that
take locale information. Could you tell me how to use the _locale_t
structure to specify that the encoding is UTF-8 (CP_UTF8)? I couldn't
find enough description in MSDN of how _locale_t can be used.

I would appreciate it if you could give a code sample.

Thanks & Regards,
AK

From: AK on
Hello Igor,

Yes, the Win32 equivalents MultiByteToWideChar and
WideCharToMultiByte do convert properly between UTF-8 and wide char.
But I try to use the standard C/C++ libraries as much as possible, to
keep the code portable.

My application receives a lot of text payload in UTF-8
over the network. I need to do a lot of manipulation and substitution
before I send it back over the network. Currently I do this by
converting it to wide char first and then using the wcs*** functions or
the STL-based std::wstring to perform the necessary operations.

I would like to do all the text manipulation directly on the UTF-8
data, without converting it to wide char. Is this possible?

Also, does the STL std::string support UTF-8?

Regards,
AK