From: Dave on
A few weeks ago I looked for an implementation of std::string that can
handle UTF8 strings. I was thinking that the STL iterator abstraction
would be nice for iterating over a variable length encoded string. So
far I haven't found anything. Does anybody know of a UTF8 std::string
implementation?

I'm really curious how the char_traits template was implemented to
handle variable length character encodings.

Thanks,
Dave


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: johnchx2@yahoo.com on
Dave wrote:

> I'm really curious how the char_traits template was implemented to
> handle variable length character encodings.

I don't think it has been. std::basic_string is, AFAIK, intended to
work only with fixed-width encodings (i.e. an "internal" representation
with one element per character). Translation to and from variable-length
encodings is handled by the codecvt facets of the locales imbued on I/O
streams.

There may however be std::string-like classes out there that handle
variable length encodings.
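To make the one-element-per-unit point concrete: a std::string holding "café" in UTF-8 reports size() == 5, because 'é' occupies two bytes. The sketch below counts code points instead. `count_code_points` is a hypothetical helper, not a standard function, and it assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the standard library): counts Unicode code
// points in a UTF-8 encoded std::string.
// Assumes the input is valid UTF-8; no validation is performed.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes have the bit pattern 10xxxxxx; every other byte
        // starts a new code point.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

std::char_traits<char> plays no part here: the string simply stores bytes, and the helper interprets them.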


From: Niek Sanders on
Dave wrote:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction
> would be nice for iterating over a variable length encoded string. So
> far I haven't found anything. Does anybody know of a UTF8 std::string
> implementation?
>

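The QString class in Trolltech's Qt library supports UTF-8:
QString::fromUtf8() and QString::toUtf8() convert to and from UTF-8
(internally QString stores UTF-16). The documentation is here: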
http://doc.trolltech.com/4.0/qstring.html

- Niek Sanders
http://www.cis.rit.edu/~njs8030/


From: Ulrich Eckhardt on
Dave wrote:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction
> would be nice for iterating over a variable length encoded string. So
> far I haven't found anything. Does anybody know of a UTF8 std::string
> implementation?
>
> I'm really curious how the char_traits template was implemented to
> handle variable length character encodings.

It isn't.
std::basic_string assumes that you have one character per element of the
string. That said, there are basically two ways to use std::basic_string
when you need UTF-8:
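1. Use std::wstring
This means that you internally use wchar_t as the character type. On
some platforms (those with a 32-bit wchar_t) a single wchar_t can hold
any Unicode code point; where wchar_t is 16 bits wide, as on Windows,
characters outside the Basic Multilingual Plane still need two elements.
You then convert to UTF-8 where you need it (note: iostreams already
contain a conversion facility, the codecvt locale facet, which is well
suited to reading and writing UTF-8 files once the stream is imbued with
a locale whose codecvt converts to UTF-8).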

2. Use std::string
This means that you store the UTF-8 string as-is in a char-based string.
The main caveat is that somestring[4] will not give you the fifth
character of the string; it just gives you the fifth byte. Typically you
don't need single-character access very often, though, so that should
not be a problem. If you do need it, you could implement an iterator that
walks over a UTF-8 sequence, or simply convert the string to wchar_t if
that suffices.
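An iterator of the kind suggested in option 2 might look roughly like the following. This is a minimal sketch, not a library API: `utf8_iterator` is a hypothetical name; it wraps std::string::const_iterator, decodes one code point (char32_t, C++11) per dereference, and assumes well-formed UTF-8, so truncated or overlong sequences are not detected. It is tagged as an input iterator because operator* returns by value rather than by reference.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical iterator (not a standard or library API) that walks a UTF-8
// encoded std::string and yields one Unicode code point per step.
// Assumes well-formed UTF-8: malformed or truncated sequences are not detected.
class utf8_iterator {
public:
    // Tagged as an input iterator because operator* returns by value.
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = char32_t;

    explicit utf8_iterator(std::string::const_iterator pos) : pos_(pos) {}

    // Decodes the code point whose lead byte is at the current position.
    char32_t operator*() const {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        const int len = sequence_length(lead);
        // Keep only the payload bits of the lead byte (7, 5, 4 or 3 bits).
        char32_t cp = (len == 1) ? lead
                    : (len == 2) ? (lead & 0x1Fu)
                    : (len == 3) ? (lead & 0x0Fu)
                                 : (lead & 0x07u);
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (int i = 1; i < len; ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(pos_[i]) & 0x3Fu);
        }
        return cp;
    }

    // Skips the lead byte together with all of its continuation bytes.
    utf8_iterator& operator++() {
        pos_ += sequence_length(static_cast<unsigned char>(*pos_));
        return *this;
    }

    utf8_iterator operator++(int) {
        utf8_iterator previous = *this;
        ++*this;
        return previous;
    }

    bool operator==(const utf8_iterator& other) const { return pos_ == other.pos_; }
    bool operator!=(const utf8_iterator& other) const { return pos_ != other.pos_; }

private:
    // Length in bytes of a UTF-8 sequence, determined from its lead byte.
    static int sequence_length(unsigned char lead) {
        if (lead >= 0xF0) return 4;  // 11110xxx
        if (lead >= 0xE0) return 3;  // 1110xxxx
        if (lead >= 0xC0) return 2;  // 110xxxxx
        return 1;                    // 0xxxxxxx (ASCII)
    }

    std::string::const_iterator pos_;
};
```

Looping from utf8_iterator(s.begin()) to utf8_iterator(s.end()) then visits each code point of s in order, while s itself stays an ordinary std::string.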

I personally use std::wstring (and wcout, wfstream etc.) for everything
in my programs that is presented to the user and needs the full Unicode
range.
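The "convert to UTF-8 where you need it" step from option 1 can also be written by hand. The sketch below assumes the text is held as UTF-32 in a std::u32string (C++11), which sidesteps the platform-dependent width of wchar_t; on platforms with a 32-bit wchar_t the same logic applies to std::wstring. `append_utf8` and `to_utf8` are hypothetical names, and the input is assumed to contain only valid Unicode scalar values (no surrogates, nothing above U+10FFFF).

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one Unicode code point.
// Assumes cp is a valid scalar value (<= 0x10FFFF and not a surrogate).
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: encodes a UTF-32 string (one code point per element)
// as UTF-8 bytes stored in a std::string.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        append_utf8(out, cp);
    }
    return out;
}
```

Keeping the wide form internally and encoding only at the boundary matches the split described above: indexing stays one element per code point inside the program, and UTF-8 appears only where the text leaves it.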

Uli


From: Vaclav Haisman on
Dave wrote, On 8.6.2006 0:20:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction
> would be nice for iterating over a variable length encoded string. So
> far I haven't found anything. Does anybody know of a UTF8 std::string
> implementation?
>
> I'm really curious how the char_traits template was implemented to
> handle variable length character encodings.
>
> Thanks,
> Dave
>
Try IBM's ICU (International Components for Unicode) library.

--
VH
