From: Peter Olcott on
On 5/18/2010 8:12 PM, Joseph M. Newcomer wrote:
> See below...
> On Tue, 18 May 2010 18:05:54 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>
>> On 5/18/2010 5:40 PM, Joseph M. Newcomer wrote:
>>>> I could have called the range of code points above 7F to be named
>>>> something other than a letter but there was no need to since it makes
>>>> no relevant difference.
>>> ****
>>> So if I'm writing in some other language, and write two parameters, in my native language
>>> using the letters for A and B, and my native comma, you are saying that A,B is actually a
>>> single identifier? Really? Am I going to believe you are supporting me coding in my
>>> native language? And this is just looking at the most trivial examples; I don't know
>>> enough of Chinese, Japanese or Korean to tell how a "name" which is a sequence of letters
>>> can be formed. I could believe that a single Chinese character would constitute a valid
>>> variable name, so two such glyphs would be the equivalent of writing, in C, the expression
>>> A B
>>> which is syntactically invalid.
>>>
>>
>> Let me more precisely state my original goal. My language will
>> essentially have the syntax of C++, and allow its users to write
>> Identifiers in their native language. If they don't use the ASCII comma
>> between parameters then they are specifying incorrect syntax.
> ***
> Oh, so it is "write identifiers in their native language" but if they write the digits
> 1234 in their native script, you think it is an identifier, not a number. And a comma is
> a comma only if it is a comma that I recognize, not one you tell me I have to use.
> ****

A church that I used to go to had an expression, "What would Jesus do?"
as their measure of correct behavior. In my case I have an analogous
measure, "What would C++ do?"

Would C++ permit digits other than ASCII [0-9] ???
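
(As a concrete aside, and only a sketch: the C++ grammar defines numeric literals over the
ASCII digits 0-9, while the identifier grammar also admits universal-character-names from
the ranges listed in the standard's annex; whether a given compiler accepts those, or raw
non-ASCII source, is implementation-dependent. The variable names below are made up purely
for illustration.)

    int main()
    {
        int count123 = 42;        // ASCII digits in a literal: always valid
        int z\u00E4hler = 7;      // "zaehler" spelled with a universal-character-name;
                                  // allowed by the grammar, but compiler support varies
        // int n = １２３;        // fullwidth digits (U+FF11..U+FF13) never form a
                                  // numeric literal in conforming C++
        return count123 + z\u00E4hler;
    }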

>>
>> From what I understand what I am proposing is much better than C/C++
>> provides. From what I understand C/C++ only accepts ASCII. This
>> restriction mandates substantially poorer code quality (in terms of self
>> documenting code) for much of the rest of the world.
> ****
> C/C++ actually states that the character set for identifiers is implementation-specific

So C++ can handle UTF-8?

From: Mihai N. on

> So C++ can take UTF-8 Identifiers?

No, it can take Unicode identifiers.
The exact transformation format is not relevant.
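
(A minimal, purely illustrative example of that distinction, with made-up variable names:
the identifier is defined in terms of Unicode code points, which can always be written as
universal-character-names no matter how the source file itself is encoded; direct use of
non-ASCII source characters is a separate, implementation-defined matter.)

    int main()
    {
        int \u043F\u043E\u043B\u0435 = 1;     // declares the identifier "поле" via UCNs
        int copy = \u043F\u043E\u043B\u0435;  // refers to that same identifier
        // int copy2 = поле;                  // the same identifier written directly,
                                              // if the implementation accepts that
                                              // source encoding
        return copy;
    }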


--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

From: Mihai N. on
> (2) Makes my success too dependent upon Microsoft.

That's a good reason, indeed.
Your stuff is so great that Microsoft might actually drag you down :-)


--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

From: Joseph M. Newcomer on
See below...
On Tue, 18 May 2010 20:48:26 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:

>On 5/18/2010 8:12 PM, Joseph M. Newcomer wrote:
>> See below...
>> On Tue, 18 May 2010 18:05:54 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>>
>>> On 5/18/2010 5:40 PM, Joseph M. Newcomer wrote:
>>>>> I could have called the range of code points above 7F to be named
>>>>> something other than a letter but there was no need to since it makes
>>>>> no relevant difference.
>>>> ****
>>>> So if I'm writing in some other language, and write two parameters, in my native language
>>>> using the letters for A and B, and my native comma, you are saying that A,B is actually a
>>>> single identifier? Really? Am I going to believe you are supporting me coding in my
>>>> native language? And this is just looking at the most trivial examples; I don't know
>>>> enough of Chinese, Japanese or Korean to tell how a "name" which is a sequence of letters
>>>> can be formed. I could believe that a single Chinese character would constitute a valid
>>>> variable name, so two such glyphs would be the equivalent of writing, in C, the expression
>>>> A B
>>>> which is syntactically invalid.
>>>>
>>>
>>> Let me more precisely state my original goal. My language will
>>> essentially have the syntax of C++, and allow its users to write
>>> Identifiers in their native language. If they don't use the ASCII comma
>>> between parameters then they are specifying incorrect syntax.
>> ***
>> Oh, so it is "write identifiers in their native language" but if they write the digits
>> 1234 in their native script, you think it is an identifier, not a number. And a comma is
>> a comma only if it is a comma that I recognize, not one you tell me I have to use.
>> ****
>
>A church that I used to go to had an expression, "What would Jesus do?"
>as their measure of correct behavior. In my case I have an analogous
>measure, "What would C++ do?"
>
>Would C++ permit digits other than ASCII [0-9] ???
***
How about

"Would a person who makes a claim that his language allows programmers to program in their
native language create a compiler in which their native digits are considered letters be
lying in his teeth about his claim?"
****
>
>>>
>>> From what I understand what I am proposing is much better than C/C++
>>> provides. From what I understand C/C++ only accepts ASCII. This
>>> restriction mandates substantially poorer code quality (in terms of self
>>> documenting code) for much of the rest of the world.
>> ****
>> C/C++ actually states that the character set for identifiers is implementation-specific
>
>So C++ can handle UTF-8?
****
No, the language document does NOT specify UTF-8; it says it is implementation-specific.
You seem to take a couple of implementations that have deliberately limited themselves to
the ASCII-7 subset as definitive of the language. Then you say you are going to create
a compiler which allows localized identifiers, and then you say that by "localized identifier"
you mean "whatever combination of special characters you feel like". I guess I don't see
how you can claim to have extended the acceptable set of input tokens and then say "any
glyph which is not an ASCII-7 character is necessarily a letter". Either you have made
a horrible design blunder, or you are falsely representing that you can support localized
languages. UTF-8 is a trivial implementation detail by comparison. You seem to have
confused the concept of representation of symbols (UTF-8) with the semantics of symbols
(localized digits and punctuation marks).
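
To make that representation/semantics distinction concrete, here is a minimal sketch
(illustrative only; the function names are made up, and only a handful of Unicode ranges
are shown where a real lexer would consult the Unicode character database, e.g. via ICU).
Classification is done on decoded code points, not on raw bytes, which is what lets a
lexer tell a localized digit or a native comma apart from a letter.

    enum CharClass { CC_Letter, CC_Digit, CC_Punctuation, CC_Other };

    // Semantics layer: classify an already-decoded Unicode code point.
    CharClass Classify(unsigned int cp)
    {
        if ((cp >= '0' && cp <= '9') ||
            (cp >= 0x0660 && cp <= 0x0669))      // Arabic-Indic digits
            return CC_Digit;
        if (cp == ',' || cp == 0x060C)           // ASCII comma, Arabic comma
            return CC_Punctuation;
        if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z') ||
            (cp >= 0x0400 && cp <= 0x04FF))      // Cyrillic letters
            return CC_Letter;
        return CC_Other;
    }

    // Representation-level rule under discussion: every byte above 0x7F is
    // "a letter", so Arabic-Indic digits and native commas are swallowed
    // into identifiers before semantics is ever considered.
    bool IsLetterByteLevel(unsigned char byte)
    {
        return byte > 0x7F;
    }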

This sounds like another "My design decisions must be correct because they are the
decisions I made and therefore they cannot be in error" discussion. If anyone tries to
point out exactly how your design decisions are wrong, you insist that they are, by
definition, correct, because you have made them that way and you are always correct. This
is not productive. You don't even understand the difference between abstractions (lexical
tokens and parser nonterminals) and their implementation (UTF-8 input text vs. Unicode
input text vs. ASCII-7 text).
joe

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Peter Olcott on
On 5/19/2010 12:55 AM, Mihai N. wrote:
>
>> So C++ can take UTF-8 Identifiers?
>
> No, it can take Unicode identifiers.
> The exact transformation format is not relevant.
>
>
I thought that it choked on anything besides ASCII. So are you implying
that it can take Unicode within any encoding?