The D Programming Language [C++]

From: Lourens Veen on 25 Nov 2006 21:21

Walter Bright wrote:
>
> But I do have a couple challenges for you <g>.
>
> 1) Extend std::string to support UTF-8.

UTF-8 is a compression scheme for 32-bit Unicode. std::string stores
uncompressed (possibly only 8-bit-wide, depending on the platform)
characters. Consequently, there's a fundamental problem there,
because a UTF-8 encoded string has 32-bit-wide characters.

I'm not an expert on C++ strings by any stretch of the imagination, so
the following may be based on misunderstanding (and if so please
kindly enlighten me):

What could be done is to create a char32_t type and a
std::basic_string<char32_t, char32_t_traits> implementation that uses
full 32-bit Unicode encoding on the outside, and UTF-8 on the
inside, with on-the-fly compression and decompression where needed
(e.g. when inserting or extracting characters via operator[]). Add a
typedef calling it string32, for example.
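
The "compression and decompression" step described above might look roughly like the following. This is only a sketch, not a string32 implementation: append_utf8 and next_code_point are hypothetical helper names, char32_t is taken to be the built-in type that C++11 later added, and the input is assumed to be valid (no surrogates, nothing above 0x10FFFF, well-formed byte sequences).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: append one Unicode code point to a UTF-8 byte string.
// Assumes cp is a valid scalar value (at most 0x10FFFF, not a surrogate).
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {                                   // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                           // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                         // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                           // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decode the code point that starts at byte offset pos
// and advance pos past it. Assumes well-formed UTF-8.
inline char32_t next_code_point(const std::string& s, std::size_t& pos) {
    unsigned char lead = static_cast<unsigned char>(s[pos++]);
    int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    char32_t cp = extra == 0 ? lead
                : extra == 1 ? (lead & 0x1F)
                : extra == 2 ? (lead & 0x0F)
                             : (lead & 0x07);
    while (extra-- > 0) {
        // Each continuation byte contributes its low six bits.
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos++]) & 0x3F);
    }
    return cp;
}
```

Note that a code point occupies one to four bytes, so an operator[] built on top of this would have to scan from the start of the string (or keep an index) to find the n-th character.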

If wchar_t is 32 bits, the above could be the implementation of
std::wstring.

I'd say that that is quite possible in C++, without compiler support.
This may not be compatible with other libraries using std::string,
but that's a problem with those libraries: they weren't written to
support 32-bit characters. Updating them should not be too much of a
hassle though: you just need to change char to char32_t, and
std::string to std::string32 (and possibly fix literals, although a
converting constructor and assignment operator could easily be
provided).

Lourens

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Walter Bright on 26 Nov 2006 08:55

Peter Dimov wrote:
> Walter Bright wrote:
>> 5) better code generation (like for complex, we can (and do) return them
>> in the FPU registers rather than like a struct)
>
> If you optimize the general case of struct { X x; Y y; }, where X and Y
> fit in a register, you get much better ROI.

The C++98 Standard defines many operations on complex as taking an
argument of type complex&. That means taking a pointer to it, which puts
the kibosh on enregistering complex.

From: Walter Bright on 26 Nov 2006 08:49

James Kanze wrote:
> My point is that developing the
> language in ways which would allow std::complex to be just as
> good as a native type might be more productive than just
> implementing it as a native type, and letting it go at that.

Except that I haven't seen any solution that enables that. I'm sorry I'm
not smarter than the summed efforts of all the other language designers
out there, but as Clint Eastwood says, a man's got to know his
limitations <g>.

> In over thirty years of programming, I've never needed a complex
> type;

Numerical analysis is quite likely something that a mainstream
programmer would never encounter. It doesn't come up when you're writing
guis, databases, compilers, device drivers, web apps, games, etc.

It does come up when you're trying to do physics problems, numerical
integration, matrix math, stress analysis, orbital calculations,
engineering design work, scientific research, meteorological studies, etc.

> So why not move in a way that helps everyone, and not just a
> small group?

Because I've never seen a proposal for a solution that gets us there. Do
you have one?

>> 1) Digital Mars C and D implement complex natively, and complex function
>> return values are in the floating point register pair ST1,ST0. I don't
>> know of any C++ compiler that does that.
> And why not? Wouldn't it be better to develop a compiler which
> put any class type consisting of only two doubles in registers,
> rather than special case complex?

It's a big problem to do it with types that have things like copy
constructors, or any type that uses reference semantics - because you
cannot take the address of a register.

> (I know that g++ does put
> some simple structures in registers, at least in certain cases.
> I don't know if complex falls into those cases, however.)

g++ does not do it with complex. I doubt it does it for any type with
copy constructors.

>> It's certainly more efficient.
> It's a cheap hack, yes, which allows compiler writers to get
> efficiency simply for a benchmark case, while not providing it
> in general.

That would be true if complex numbers were only used in benchmarks.

> Why should complex be more performant than Point2D,
> or ColorPixel, or any other small class of that sort? (That's
> the Java situation, which is why Java beats C++ when dealing
> with double[], but becomes significantly slower as soon as you
> change it to Point2D[].)

Good point. So let's be fair and dumb down the language to the lowest
common denominator. Are you going to propose that 'int' also be removed
from C++ and replaced with std::Int ?

>> Native support also means the compiler can easily enregister the complex
>> numbers within the function.
> I'm not sure I understand. What do you mean by enregistering the
> complex numbers within the function?

Enregistering a variable means storing it in registers rather than in
memory. Enregistering class types is a huge problem because of the
common use of reference semantics (such as in the copy constructor). I
don't know any compiler that enregisters class types.

> At any rate, I know that it is easier for a compiler to optimize
> a built in type. Which doesn't mean that it can't do as well
> with a user defined type, just that it requires a lot more
> sophistication on the part of the compiler. But that's an
> argument which affects every type---in my current work, fixed
> decimal would be more useful than complex, and in my preceding
> job, an IP type. Where do you stop?

Since current compilers don't do this, it's clearly not an easy tweak.
How long are you (as a numerical analyst) willing to wait for this? 1
year? 5 years? 10? Might as well stick with FORTRAN.

>> 2) std::complex has no way to produce a complex literal. So, you have to
>> write:
>> complex<double>(6,7)
>> instead of:
>> 6+7i // D programming
>
> Syntax is an issue, but isn't the solution developing ways to
> provide comfortable syntax for user defined types?

That would be a solution, if anyone has discovered a way to create user
defined tokens in a practical manner. There are no proposals to do this
for C++. What languages do allow user defined tokens? Perhaps it isn't
such an easy nut to crack.
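
For comparison, here is the C++98 spelling quoted above next to the literal form that a later standard made available. This is a sketch under a stated assumption: it relies on the operator""i suffix from std::literals::complex_literals, which arrived with C++14 and so postdates this thread. The function names are made up for illustration.

```cpp
#include <complex>

// The spelling quoted above: construct the value explicitly.
inline std::complex<double> spelled_out() {
    return std::complex<double>(6, 7);
}

// C++14 and later (an assumption about the toolchain): the i suffix from
// <complex> produces a std::complex<double> with a zero real part.
inline std::complex<double> with_literal() {
    using namespace std::complex_literals;
    return 6.0 + 7i;   // 6.0 rather than 6, so the double overload of + is found
}
```

Even with the suffix, the real part has to be written as a double, because the mixed scalar/complex operator+ is a template that deduces one type from both operands.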

>> 3) Error messages involving native types tend to be much more lucid than
>> error messages on misuse of library code.
>
> Especially if the type is a template:-).
>
> Again, it's a problem that compiler writers should solve, if
> only because library types aren't going to go away. (It is, I
> think, a more difficult problem than just getting the two
> doubles of a struct into registers.)

C++ compiler writers haven't solved this problem in the last 20 years of
trying. How long are you willing to wait for it? C++ isn't a new
language. Since solutions to these problems haven't appeared in C++
compilers yet, perhaps it just isn't practical to solve them in the
compiler.

It's easy to sit back as a language spec writer and wave your arms
around demanding that language implementors resolve everything. The
reality is that if you make the language too hard to implement, it
doesn't matter if it is theoretically possible to implement it or not -
the users don't have it available. "Export" is a prime example of that.

>> 4) There is much better potential for core native type exploitation of
>> mathematical identities and constant folding than there is for library
>> types where meaning must be deduced.
> I'm not sure about this. The "mathematical identities" of a
> constant are based on the identities of the underlying reals,
> along with the operations performed on them (definition of
> addition, multiplication, etc.). I would expect that the
> compiler could find them in both cases.

That isn't true with complex numbers. Because of the oddities of
floating point not exactly matching mathematics, decisions must be made
on the semantic identities that are not expressible in the UDT. I went
over some of these issues in the discussion on complex types.

>> 5) Lack of a separate imaginary type [...]
> I won't argue about the qualities of a particular
> implementation choice. My knowledge of numeric processing isn't
> sufficient to really be able to judge such things. But I don't
> see where this is a problem related to the issue of whether the
> type is built-in or not: you could easily define an imaginary
> type in the library, and you could just as easily define a
> built-in complex without imaginary.

In this case, that is true. My bringing up the imaginary type serves a
couple of points:

1) lack of it shows a lack of understanding by C++ of the needs of
numerical analysis programmers, which suggests that numerical analysts
aren't using C++.

2) You can't just go and add your own imaginary class, and then expect
it to work properly with C++ libraries that use the standard
std::complex. Having complex be a library type doesn't mean it is
flexible or user extendible.

>> So, why isn't there much of any outcry about these problems with
>> std::complex?
> Maybe because it's not a real problem.

Maybe.

> Or maybe just because
> not enough people understand the issues. (I seem to recall
> reading somewhere that IBM's base 16 floating point cause real
> problems as well, but there wasn't much outcry about it,
> either.)

Few programmers understand computer floating point math very well, and
how it does not line up with mathematics. I've seen enough real examples
of people believing their floating point result "because it's a
computer, and computers cannot be wrong" to know that programmers
routinely fail to recognize that they are getting wrong answers.
Therefore, it's up to us as language designers and vendors to at least
get them the most correct implementations possible.

> As for extended doubles, I suspect that the main part of the
> reason is a lack of hardware support on the platforms being
> used. I know that Sparc doesn't have it, for example.

As a numerical analyst who bought a PC to get 80 bit floats, why should
I suffer because of the Sparc's limitations? I didn't buy a Sparc, I
bought a PC. And dagnamit, I want to use the capabilities OF THE MACHINE
I BOUGHT. Dumbing down the language to support the worst floating point
implementation out there, to the point where I cannot even use better
floating point hardware that's been around for 25 years, is just
embarrassing.

> In fact,
> the only machine I know today where it is present is on PC's,
> and compilers there DO support it.

VC++ does not support it.

>> I used to do numerical analysis (calculating stress, inertia, dynamic
>> response, etc.), and having 80 bits available is a big deal.
> It can also be a trap. I'm sure that numerical analysts know how
> to deal with it, but I've seen people caught out more than a few
> times by the fact that intermediate calculations are done in
> long double, and the exact value of the results depending on
> when and where the compiler spilled to memory.

More bits is better, not a "trap". As I recently told a user who was
baffled by this exact problem, it showed that his calculation needed
more bits than doubles provided. He needed to rethink his algorithm, or
at the very least upgrade to 80 bit types. If all he had were doubles,
he might have never noticed that his calculation had gone awry. In no
case is he numerically *worse* off if temporaries have more bits.

>> If it was, why are
>> there proposals to, for example, add new basic types to core C++ to
>> support UTF strings?
> I don't know. I've supported UTF-8 strings in my code for
> years, without new types:-).

You aren't using std::string, then, or at least you're not using it as a
*string* type. Try inserting a non-ASCII Unicode character into one.
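
To make the point concrete, a sketch of what std::string reports once a non-ASCII character is stored in it as UTF-8. count_code_points and naive_utf8 are hypothetical helpers, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a well-formed UTF-8 string by
// counting every byte that is not a continuation byte (10xxxxxx).
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// The word "naïve", with the ï written out as its two UTF-8 bytes 0xC3 0xAF.
inline std::string naive_utf8() {
    return "na\xC3\xAFve";
}
```

For this string, size() is 6 while count_code_points returns 5, and operator[] at index 2 yields the lead byte 0xC3 rather than a character. std::string stores the bytes faithfully; what it does not do is treat them as characters.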

From: Walter Bright on 26 Nov 2006 08:53

Keith A. Lewis wrote:
> Forth never had strong corporate backing. You say you enjoy reading
> history. Chuck Moore invented Forth. Elizabeth Rather commercialized
> it. They are both still around today doing their respective thing.
> If you decide to look into this, I'd love to hear your take.

Perhaps they should write a Wikipedia article on it.

> I'm probably going to get modded out for mentioning D, Oak, (Java
> implicitly), and Forth. I think C++ is the eighth wonder of the
> world. The amount of intellectual manpower that went into its
> creation is similar to the amount of physical manpower that must
> have been involved in the Great Pyramid of Giza.

I agree. That's why D is heavily based on C++ and not lesser stars in
the firmament.

> The (arti)fact is
> that it is very difficult to become a good C++ programmer.

The implication there is that powerful code must be hard to use. I
disagree strongly with that notion. After all, airplanes have gotten
progressively more capable, but are easier than ever to fly.

> You and Gosling might be the last two people on the planet that
> can write a major language singlehandedly. C++ is going to be around
> for a long time, and D seems to be a real improvement, but Microsoft
> learned their lesson from Java. C# is their reply, and they have
> the money to make it happen.

While D doesn't have billions of corporate dollars behind it, in a way
that's to the better. D doesn't have a corporate agenda behind it. It is
not designed to push a particular piece of hardware, or operating
system. It doesn't have to be justified to the shareholders. It doesn't
need to please a thesis adviser.

It's driven forward solely by the enthusiasm of its community, and
serves only them.

From: Walter Bright on 26 Nov 2006 08:54

Nemanja Trifunovic wrote:
> Walter Bright wrote:
>> Nemanja Trifunovic wrote:
>>> Walter Bright wrote:
>>>> 1) Extend std::string to support UTF-8.
>>> Ehm: http://utfcpp.sourceforge.net/
>> The class supports UTF-8, but not by extending std::string. It uses
>> std::vector, for example.
>>
>
> The set of template functions (not a class) is meant to be used with
> sequences of bytes, including char[], std::string or vector<char>. The
> point of my post is that a utf-8 encoded string may be stored into a
> std::string object just fine, and then some of the common string
> operations would require a little help from a library like the above
> mentioned one.

That doesn't help it work with other libraries that use plain
std::string and depend on it to work as specified. They'll fall over as
soon as they see a UTF-8 encoding.
