From: Maxim Yegorushkin on

Dave wrote:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction
> would be nice for iterating over a variable length encoded string. So
> far I haven't found anything. Does anybody know of a UTF8 std::string
> implementation?

std::string was not designed to handle variable length characters. The
idea was that code works with fixed length characters most of the time
only converting to/from variable length characters on output/input.


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Tom Widmer on
Dave wrote:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction
> would be nice for iterating over a variable length encoded string. So
> far I haven't found anything. Does anybody know of a UTF8 std::string
> implementation?
>
> I'm really curious how the char_traits template was implemented to
> handle variable length character encodings.

std::basic_string and std::char_traits only operate on fixed width
encodings. The general std approach is to only use variable length
encodings in storage, converting them to and from fixed length when
performing IO (using a codecvt facet).
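
As a rough illustration of that convert-on-input idea, here is a minimal
sketch that decodes UTF-8 bytes into fixed-width UTF-32 code points. It is
a hand-written loop, not a codecvt facet; the name decode_utf8 is
hypothetical, and the sketch assumes it is acceptable to skip the stricter
checks (overlong forms, surrogates) that a production decoder would need.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a standard library function): decode a UTF-8
// byte string into fixed-width UTF-32 code points.
// Simplified: overlong encodings and surrogate values are not rejected.
std::u32string decode_utf8(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        // The sequence must not run past the end of the input.
        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the low six payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Once the data is in a std::u32string, each element is one code point, so
indexing and the usual basic_string operations work per code point.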

OTOH, lots of other string libraries do handle UTF8 strings, just not
std::basic_string.

Tom


From: Bronek Kozicki on
Dave wrote:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction

I suggest that for your normal data processing needs you stick with
fixed-width Unicode encodings, like UTF16 or UTF32 - most std::wstring
implementations directly support one or another. Use UTF8 only for
input/output using IO specific for your platform and/or its support functions.
The reason is simple - efficiency.


B.


From: shunsuke on
Dave wrote:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction
> would be nice for iterating over a variable length encoded string. So
> far I haven't found anything. Does anybody know of a UTF8 std::string
> implementation?

Boost has (secretly?) such iterators.
Check <boost/regex/pending/unicode_iterator.hpp>
They are not string classes but iterator adaptors.
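
To show what an iterator adaptor of this kind does, here is a minimal
sketch. It is NOT the Boost interface: the class name utf8_iterator and
its members are hypothetical, and it assumes the underlying std::string
holds well-formed UTF-8 (no validation is performed).

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical adaptor (not Boost's API): walks a std::string holding
// well-formed UTF-8 and yields one code point per step.
class utf8_iterator {
public:
    // Tagged as an input iterator because operator* returns by value.
    using iterator_category = std::input_iterator_tag;
    using value_type        = char32_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char32_t*;
    using reference         = char32_t;

    explicit utf8_iterator(std::string::const_iterator pos) : pos_(pos) {}

    // Decode the code point that starts at the current byte.
    char32_t operator*() const {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        const std::size_t len = length(lead);
        // For multi-byte sequences, mask off the length-marker bits.
        char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(pos_[k]) & 0x3F);
        return cp;
    }

    // Advance by the byte length of the current code point.
    utf8_iterator& operator++() {
        pos_ += length(static_cast<unsigned char>(*pos_));
        return *this;
    }
    utf8_iterator operator++(int) { utf8_iterator old(*this); ++*this; return old; }

    bool operator==(const utf8_iterator& o) const { return pos_ == o.pos_; }
    bool operator!=(const utf8_iterator& o) const { return pos_ != o.pos_; }

private:
    // Sequence length implied by the lead byte (assumes a valid lead byte).
    static std::size_t length(unsigned char lead) {
        if (lead < 0x80)           return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        return 4;
    }

    std::string::const_iterator pos_;
};
```

A pair of these built from s.begin() and s.end() can be handed to standard
algorithms or to the std::u32string range constructor, which is the appeal
of the adaptor approach over a dedicated string class.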

--
Shunsuke Sogame



From: kanze on
Bronek Kozicki wrote:
> Dave wrote:
> > A few weeks ago I looked for an implementation of std::string that
> > can handle UTF8 strings. I was thinking that the STL iterator
> > abstraction

> I suggest that for your normal data processing needs you stick with
> fixed-width Unicode encodings, like UTF16 or UTF32 - most std::wstring
> implementations directly support one or another. Use UTF8 only for
> input/output using IO specific for your platform and/or its support
> functions. The reason is simple - efficiency.

I'm not sure I agree. I think a lot depends on the application. For a
large set of applications, I'm pretty sure that UTF-8 strings would be
more efficient. With the correct supporting tools (e.g. a regex class
which understands them), they probably wouldn't be any harder to use.
The one case where they really lose is with random access based
strictly on the character index, e.g. accessing the 132nd character in
a string (without accessing any of the intermediate characters). But if
my applications are typical, that's something that you never do --
outside of an editor, when would you do something like that?
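
The cost of that index-based access can be sketched as follows. The
function name offset_of_code_point is hypothetical, "character" is taken
to mean a Unicode code point, and the input is assumed to be well-formed
UTF-8. Finding the n-th code point needs a linear scan, because the byte
offset cannot be computed from the index alone.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the byte offset at which the n-th code
// point (0-based) starts, or std::string::npos if the string is shorter.
// Assumes well-formed UTF-8. Runs in O(size), unlike indexing into a
// fixed-width string, which is O(1).
std::size_t offset_of_code_point(const std::string& s, std::size_t n)
{
    std::size_t seen = 0;  // code points encountered so far
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Continuation bytes look like 10xxxxxx; any other byte starts
        // a new code point.
        const bool starts =
            (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80;
        if (starts) {
            if (seen == n)
                return i;
            ++seen;
        }
    }
    return std::string::npos;
}
```

Sequential processing never pays this cost, since each step only has to
skip the continuation bytes of the current code point.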

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


