From: Francis Glassborow on
joe wrote:
> Jens Schmidt wrote:
>> Stanley Friesen wrote:
>>
>>> I disagree, or at least this is only true in English. Even for
>>> German, the fact that the Web standard specifies NFD for
>>> transmission means any useful application would need to support
>>> combining characters (think umlauts).
>> For an example where more than one code point for one character can be
>> used in English: résumé.
>
> No, that is not correct. English does not have diacritics. Should one
> choose to use them, for romantic reasons or whatever, is fine (have fun
> typing that on your computer), but that is not English. It is someone's
> idea of a composite language that combines French and English. The
> English word is: 'resume', and what distinguishes it from the other
> meaning of the literal text is the surrounding context, which surely will
> be unambiguous. Please don't litter my nice clean language with
> diddly-do-dads from other languages thank you.
>

Yes, it is true that American has no diacriticals, but English this side
of the Atlantic does. Please do not try to clean up the muddy language
that is my mother tongue :)

Anyway this has got very far from C++ where we certainly do need a way
to handle text in more than just American English. IIRC it was the
narrow view of text in C that caused problems for the first C Standard
with the result that trigraphs were invented.

I really wish that I could simply handle textual IO without having to
know all the grisly details. If a mechanism exists to provide the input,
then I would like to be able to capture it, process it and output it
with the minimum of fuss. And issues of the source character set should
be restricted to local coding standards. Whether an accented letter, an
Arabic letter or a Chinese character is allowed as a variable name
should not IMO be an issue for the language Standard but an issue for a
code shop to decide. Of course if you are writing a library you
probably want it readable (but why assume that the Roman alphabet is the
one true way and everyone must restrict themselves to that set of letters?).




--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: joe on
Francis Glassborow wrote:
> joe wrote:
>> Jens Schmidt wrote:
>>> Stanley Friesen wrote:
>>>
>>>> I disagree, or at least this is only true in English. Even for
>>>> German, the fact that the Web standard specifies NFD for
>>>> transmission means any useful application would need to support
>>>> combining characters (think umlauts).
>>> For an example where more than one code point for one character can
>>> be used in English: résumé.
>>
>> No, that is not correct. English does not have diacritics. Should one
>> choose to use them, for romantic reasons or whatever, is fine (have
>> fun typing that on your computer), but that is not English. It is
>> someone's idea of a composite language that combines French and
>> English. The English word is: 'resume', and what distinguishes it
>> from the other meaning of the literal text is the surrounding
>> context, which surely will be unambiguous. Please don't litter my
>> nice clean language with diddly-do-dads from other languages thank
>> you.
>
> Yes, it is true that American has no diacriticals, but English this
> side of the Atlantic does. Please do not try to clean up the muddy
> language that is my mother tongue :)
>
> Anyway this has got very far from C++ where we certainly do need a way
> to handle text in more than just American English. IIRC it was the
> narrow view of text in C that caused problems for the first C Standard
> with the result that trigraphs were invented.
>
> I really wish that I could simply handle textual IO without having to
> know all the grisly details. If a mechanism exists to provide the
> input, then I would like to be able to capture it, process it and
> output it with the minimum of fuss. And issues of the source character
> set should be restricted to local coding standards. Whether an accented
> letter, an Arabic letter or a Chinese character is allowed as a
> variable name should not IMO be an issue for the language Standard but
> an issue for a code shop to decide. Of course if you are writing a
> library you probably want it readable (but why assume that the Roman
> alphabet is the one true way and everyone must restrict themselves to
> that set of letters?).

Oops. I forgot to add in my last post (just to reiterate) that I use
Unicode also, but that I am satisfied with the fixed-width variety of it
(UCS-2 and I don't think I'll ever need UCS-4). UCS-2 rocks! (My bad...
I was thinking of U2!) ;)
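
A minimal sketch of the point made earlier in the thread, that one character can take more than one code point, and of what a fixed 16-bit unit does and does not guarantee. It spells résumé in precomposed (NFC) and decomposed (NFD) form and adds one character outside the 16-bit range. The names resume_nfc, resume_nfd, g_clef, count_code_points and print_report are made up for illustration, and the helper assumes well-formed UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>

// NFC spelling: each e-acute is the single precomposed code point U+00E9.
const std::u16string resume_nfc = u"r\u00E9sum\u00E9";

// NFD spelling: each e-acute is 'e' followed by U+0301 COMBINING ACUTE ACCENT.
const std::u16string resume_nfd = u"re\u0301sume\u0301";

// U+1D11E MUSICAL SYMBOL G CLEF lies outside the 16-bit range, so UTF-16
// stores it as a surrogate pair; UCS-2 cannot represent it at all.
const std::u16string g_clef = u"\U0001D11E";

// Hypothetical helper: counts code points in well-formed UTF-16 by
// skipping the trailing (low) half of each surrogate pair.
std::size_t count_code_points(const std::u16string& text) {
    std::size_t count = 0;
    for (char16_t unit : text) {
        const bool is_low_surrogate = unit >= 0xDC00 && unit <= 0xDFFF;
        if (!is_low_surrogate) ++count;
    }
    return count;
}

// Prints the sizes so the three cases can be compared side by side.
void print_report() {
    // A plain code-unit comparison treats the two spellings as different
    // strings, even though a reader sees the same six-letter word.
    assert(resume_nfc != resume_nfd);

    std::cout << "NFC code units: " << resume_nfc.size() << '\n'
              << "NFD code units: " << resume_nfd.size() << '\n'
              << "G clef code units: " << g_clef.size()
              << ", code points: " << count_code_points(g_clef) << '\n';
}
```

Calling print_report() from main shows the two spellings of the same word with different lengths. Both fit entirely in 16-bit units, so the width of the unit alone does not give one unit per visible character; comparing or counting characters as a reader sees them would need normalization on top, which this sketch does not attempt.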



From: Mathias Gaunard on
On Jul 17, 11:25 pm, "joe" <jc1...(a)att.net> wrote:

> No, that is not correct. English does not have diacritics.

It does, even if their use is uncommon and mostly restricted to
loanwords (which still count as English).
See <http://en.wikipedia.org/wiki/English_words_with_diacritics>.



From: joe on
Francis Glassborow wrote:
> joe wrote:
>> Jens Schmidt wrote:
>>> Stanley Friesen wrote:
>>>
>>>> I disagree, or at least this is only true in English. Even for
>>>> German, the fact that the Web standard specifies NFD for
>>>> transmission means any useful application would need to support
>>>> combining characters (think umlauts).
>>> For an example where more than one code point for one character can
>>> be used in English: résumé.
>>
>> No, that is not correct. English does not have diacritics. Should one
>> choose to use them, for romantic reasons or whatever, is fine (have
>> fun typing that on your computer), but that is not English. It is
>> someone's idea of a composite language that combines French and
>> English. The English word is: 'resume', and what distinguishes it
>> from the other meaning of the literal text is the surrounding
>> context, which surely will be unambiguous. Please don't litter my
>> nice clean language with diddly-do-dads from other languages thank
>> you.
>
> Yes, it is true that American has no diacriticals, but English this
> side of the Atlantic does. Please do not try to clean up the muddy
> language that is my mother tongue :)
>

No need to here. :) Yes, you are stuck with an old-fashioned, jettisoned
way. Not to worry though, Unicode is for you (whether you want it or
not!).

> Anyway this has got very far from C++ where we certainly do need a way
> to handle text in more than just American English.

Not far at all from C++ given that it has lame support for Unicode, but
surely it hurts you more than me because we are smarter about language
design this side of the Atlantic.

> IIRC it was the
> narrow view of text in C that caused problems for the first C Standard
> with the result that trigraphs were invented.

Again though, wouldn't it just be easier to move to a place that has a
fine language? ;)

>
> I really wish that I could simply handle textual IO without having to
> know all the grisly details.

It's a breath of fresh air knowing that there aren't any of those! (Not
in your case. Woe is you I guess).

> If a mechanism exists to provide the
> input, then I would like to be able to capture it, process it and
> output it with the minimum of fuss. And issues of the source character
> set should be restricted to local coding standards.

That is BEGGING for a large, institutionalized, internationalized,
over-generalized "standard" thing (Unicode). All kidding aside, if you
need it, you need it. It's that simple. Oh, but if you don't, you don't!
:)

> Whether an accented
> letter, an Arabic letter or a Chinese character is allowed as a
> variable name should not IMO be an issue for the language Standard but
> an issue for a code shop to decide.

I believe that the defined phases of translation allow that (? you know
better than me about that). As long as it maps onto the defined source
character set, it's a go.

> Of course if you are writing a library you
> probably want it readable (but why assume that the Roman alphabet is
> the one true way and everyone must restrict themselves to that set of
> letters?).

I didn't say that at all. I just said that my target audience is that
which speaks primarily English. Yes, the C++ standard needs to address
the larger scope, obviously that goes without saying.



From: Miles Bader on
Francis Glassborow <francis.glassborow(a)btinternet.com> writes:
> Whether an accented letter, an Arabic letter or a Chinese character is
> allowed as a variable name should not IMO be an issue for the language
> Standard but an issue for a code shop to decide.

I gather that there is some support in c99/c++0x for this.
Gcc kinda-sorta supports it, but only using \u notation, not actual UTF-8
source code, e.g.:

int \u5909\u6570 = 5;

[which is just a transliteration of:

int 変数 = 5;
]

[compile with "g++ -fextended-identifiers -std=c++0x -c c.cc"]

Obviously the above is only useful if you go to the trouble of somehow
pre-processing your source files before compiling, but from googling
around, I get the impression that the plan for gcc, at least, is to
support real UTF-8 identifiers in the future. I dunno if that's
actually part of c++0x though...
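
A slightly fuller sketch of the \u example above, assuming a compiler that accepts universal character names in identifiers (with the g++ mentioned here that meant passing -fextended-identifiers). It uses the same two code points once as an identifier and once inside a UTF-8 string literal. The names utf8_length, read_variable and check are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// The identifier from the example above, spelled with universal character
// names (U+5909 U+6570). Assumes the compiler accepts UCNs in identifiers.
int \u5909\u6570 = 5;

// The same two UCNs inside a UTF-8 string literal, for comparison.
// Each of these code points encodes to three bytes in UTF-8; sizeof also
// counts the terminating null, hence the -1.
const std::size_t utf8_length = sizeof(u8"\u5909\u6570") - 1;

// Hypothetical accessor, only here to show that the name is used like any
// other identifier once the compiler has accepted it.
int read_variable() {
    return \u5909\u6570;
}

// Hypothetical self-check tying the two uses together.
void check() {
    assert(read_variable() == 5);
    assert(utf8_length == 6);
}
```

The \u spelling is the same in both places; only the identifier use depends on the compiler support discussed above.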

-miles

--
Non-combatant, n. A dead Quaker.
