From: byang on
Hi,
I am wondering how to read and write UTF-8 files in C++. Say I have a
file that I know is encoded in UTF-8, and I want to change some
characters in it. How can I do that? Could anybody here help by
explaining the encoding issues involved?

Thanks in advance!

Regards!
Bo

From: thomas.mertes on
On 16 Apr., 06:09, byang <techr...(a)eyou.com> wrote:
> Hi,
> I am wondering how to read and write UTF-8 files in C++. Say I have a
> file that I know is encoded in UTF-8, and I want to change some
> characters in it. How can I do that? Could anybody here help by
> explaining the encoding issues involved?

While UTF-8 has a lot of advantages, it also has a disadvantage:
A character occupies between one and four bytes, so the relationship
between byte position and character position is not a simple one
like char_position * 4 = byte_position (as it is in UTF-32).
In most cases it is not possible to seek to a file position
(with fseek) and write the new character there. That works only
when the new character encodes to the same number of bytes as the
old one.
In the general case there is no alternative to reading the whole
file and writing it back with the change.
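
The read + change + write approach can be sketched in C++ roughly as
follows. This is only a minimal sketch, not a definitive implementation.
The names replace_all and replace_in_file are made up for this example,
and it assumes that the file is valid UTF-8, fits in memory, and that
the text to find and its replacement are themselves given as UTF-8
byte strings.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>

// Hypothetical helper: replace every occurrence of the UTF-8 byte
// sequence `from` with `to`. Both must be complete, valid UTF-8
// characters or strings. UTF-8 is self-synchronizing, so a byte-wise
// search cannot match in the middle of another character.
std::string replace_all(std::string text, const std::string& from,
                        const std::string& to) {
    if (from.empty()) return text;
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return text;
}

// Hypothetical helper: read the whole file, apply the replacement in
// memory, and write the result back. Returns false on I/O failure.
bool replace_in_file(const std::string& path, const std::string& from,
                     const std::string& to) {
    // Binary mode keeps the bytes untouched (no newline translation).
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::string text((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    in.close();

    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out << replace_all(std::move(text), from, to);
    out.flush();
    return out.good();
}
```

A call such as replace_in_file("notes.txt", "\xC3\xA4", "ae") (the file
name is only an example) would replace every U+00E4 with the two
letters "ae". The file length changes because the old and new byte
sequences differ in size, which is why an in-place fseek and overwrite
would not work here.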

BTW: Seed7 has a function to open a UTF-8 file. After a file has
been opened with 'open_utf8' it can be used like a normal file.
When reading from the UTF-8 encoded file, the characters are
converted to the UTF-32 encoding. Internally only UTF-32
characters and strings are used. A write operation to a file
opened with 'open_utf8' converts the UTF-32 characters back
to UTF-8. In Seed7 the seek-and-change method is not usable
either, since seek uses byte positions and not character
positions. But at least the read + change + write solution is
simple.
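
The same idea (decode UTF-8 to UTF-32 on input, work on whole
characters, encode back to UTF-8 on output) can be sketched in C++.
The functions decode_utf8 and encode_utf8 below are made up for this
example; they are neither Seed7's implementation nor part of the C++
standard library. The sketch rejects truncated input and bad
lead/continuation bytes, but it does not reject overlong forms or
surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
// Throws std::runtime_error on truncated or malformed sequences.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte tells how many bytes the character occupies.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > bytes.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        // Each continuation byte (10xxxxxx) contributes six bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: encode UTF-32 code points as UTF-8 bytes.
std::string encode_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            throw std::runtime_error("code point out of range");
        }
    }
    return out;
}
```

With these, changing the third character of a text becomes a plain
index operation on the decoded string, for example
s = decode_utf8(bytes); s[2] = U'u'; bytes = encode_utf8(s);
and the character position no longer depends on how many bytes the
preceding characters occupy.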

Greetings Thomas Mertes

Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.