From: Andrei Alexandrescu (See Website For Email) on
David Abrahams wrote:
> "Andrei Alexandrescu (See Website For Email)"
> <SeeWebsiteForEmail(a)erdani.org> writes:
>
>
>>There might be a terminology confusion here, which I'd like to clear
>>from the beginning:
>>
>>1. A program "has undefined behavior" = effectively anything could
>>happen as the result of executing that program. The metaphor with the
>>demons flying out of one's nose comes to mind. Anything.
>>
>>2. A program "produces an undefined value" = the program could produce
>>an unexpected value, while all other values, and that program's
>>integrity, are not violated.
>>
>>The two are fundamentally different because in the second case you can
>>still count on objects being objects etc.; the memory safety of the
>>program has not been violated. Therefore the program is much easier to
>>debug.
>
>
> Seriously?
>
> IME you're at least likely to crash noisily close to the undefined
> behavior. If you make everything defined the program necessarily
> soldiers on until one of your own internal checks is able to notice
> that something went wrong. Or am I missing something?

I think it's one thing to have a wrong numeric value and one very
different thing to have a program in which all hell breaks loose due to
random overwriting of memory.

> I don't have any real experience with Java, but Python generally
> exhibits Java-like behavior, and I don't find it easier to debug than
> C++.

Well the only thing I can add is that in my limited experience,
debugging Java programs is much easier because there's never the case
that a dangling pointer mysteriously overwrites some object it wasn't
supposed to. I remember __to this day__ a night in 1998 that a colleague
and I spent figuring out a completely weird exception
being thrown (in a C++ program) under very complex circumstances - just
because of a misfit memcpy() in a completely different and unrelated
part of the program. Now that I think of that, I remember a few others.
Probably I'd remember even more under hypnosis :o). To tell the truth, I
also remember a JVM bug causing me a few gray hairs... :o)


Andrei

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Andrei Alexandrescu (See Website For Email) on
Gabriel Dos Reis wrote:
> "Andrei Alexandrescu (See Website For Email)"
> <SeeWebsiteForEmail(a)erdani.org> writes:
>
> [...]
>
> | There might be a terminology confusion here, which I'd like to clear
> | from the beginning:
> |
> | 1. A program "has undefined behavior" = effectively anything could
> | happen as the result of executing that program. The metaphor with the
> | demons flying out of one's nose comes to mind. Anything.
>
> Why is that not the value of the computation?
>
> | 2. A program "produces an undefined value" = the program could produce
> | an unexpected value, while all other values, and that program's
> | integrity, are not violated.
> |
> | The two are fundamentally different because in the second case you can
> | still count on objects being objects etc.;
>
> I don't see anything fundamental in that difference.

It's very simple. In one case you have a program that preserves its own
guarantees (e.g. there's no random overwriting of memory), but which has
one numerical value that's invalid; that can't corrupt memory because
there's no pointer forging. In the other case you can't count on pretty
much anything.
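
To make the distinction concrete, here is a minimal Java sketch (not part of the original post). The class name `InvalidValueDemo`, the helper `tryWrite`, and the specific index are assumptions chosen for illustration. It shows one invalid numeric value being rejected at the point of use, while an unrelated array stays intact because the bad index cannot be turned into a write elsewhere in memory.

```java
import java.util.Arrays;

final class InvalidValueDemo {
    // Tries to store `value` at `index`. Returns true if the write happened,
    // false if the runtime rejected the index.
    static boolean tryWrite(int[] target, int index, int value) {
        try {
            target[index] = value;
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // The invalid index was detected where it was used;
            // no other object was modified.
            return false;
        }
    }

    public static void main(String[] args) {
        int[] target = new int[4];
        int[] bystander = {1, 2, 3, 4};
        int badIndex = 7; // stands in for the "one numerical value that's invalid"
        System.out.println("write accepted: " + tryWrite(target, badIndex, 99));
        System.out.println("bystander: " + Arrays.toString(bystander));
    }
}
```

The equivalent unchecked write through a raw pointer in C++ would have undefined behavior, which is the case described above as not being able to count on pretty much anything.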

Explaining this any better is beyond my abilities.


Andrei


From: Jean-Marc Bourguet on
"Andrei Alexandrescu (See Website For Email)"
<SeeWebsiteForEmail(a)erdani.org> writes:

> Well the only thing I can add is that in my limited experience,
> debugging Java programs is much easier because there's never the case
> that a dangling pointer mysteriously overwrites some object it wasn't
> supposed to.

Instead you are writing to an object which was supposed to have gone out
of existence long ago. In my experience, that gives you the same kind
of elusive bugs. Except that Purify can't help you, and that random
behaviour, including crashes, is replaced by deterministic, often
plausible but wrong results.
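
A small Java sketch of the situation described here (an illustration, not code from the thread). The `Account` class, the `registry` map, and the key "alice" are all hypothetical. A component keeps a reference to an object that the rest of the program considers gone; the write through that stale reference succeeds silently, and the result is deterministic but wrong.

```java
import java.util.HashMap;
import java.util.Map;

final class StaleReferenceDemo {
    static final class Account {
        long balance;
    }

    // Hypothetical registry of the accounts the program considers "live".
    static final Map<String, Account> registry = new HashMap<>();

    static long totalBalance() {
        long sum = 0;
        for (Account a : registry.values()) {
            sum += a.balance;
        }
        return sum;
    }

    // Deposits through a reference to an account that was logically
    // destroyed, then returns the total the rest of the program sees.
    static long depositThroughStaleReference() {
        registry.clear();
        Account original = new Account();
        registry.put("alice", original);

        Account stale = original;              // kept by some other component
        registry.remove("alice");              // the account is "closed"
        registry.put("alice", new Account());  // and later replaced

        stale.balance += 100; // no crash, no exception: the object is still alive
        return totalBalance();
    }

    public static void main(String[] args) {
        System.out.println("total seen by the program: "
                + depositThroughStaleReference());
    }
}
```

The garbage collector keeps the old object alive as long as the stale reference exists, so nothing is ever flagged and the deposit simply disappears from the program's view.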

Yours,

--
Jean-Marc


From: James Kanze on
Andrei Alexandrescu (See Website For Email) wrote:
> James Kanze wrote:
> > I don't know quite what different definitions we could be using.
> > Undefined behavior occurs when the language specification places
> > no definition on the behavior. I don't know how you can easily
> > search for it, because it is the absence of a definition. Java
> > (and most other languages) don't use the term, or even specify
> > explicitly what they don't specify. So the response is rather
> > the opposite: unless you can find some statement in the language
> > specification which defines this behavior, it is undefined
> > behavior.

> I was hoping I'd be spared searching online docs, but now it looks
> like I have to, so be it.

> There might be a terminology confusion here, which I'd like to clear
> from the beginning:

> 1. A program "has undefined behavior" = effectively anything could
> happen as the result of executing that program. The metaphor with the
> demons flying out of one's nose comes to mind. Anything.

The example is meant to be taken humorously. Surely you don't
think that the C++ standard would be improved, and that we would
have eliminated all "undefined behavior", in any useful,
realistic sense, if we added a clause to the standard saying
that "in no case is a program allowed to cause demons to fly out
of the programmer's nose"?

In practice, "undefined behavior" is always somewhat restricted;
in non-privileged code under Unix or Windows, for example, you
may get a core dump, but you won't corrupt the system or even
reformat the hard drive. The C++ standard prefers to not give
even these guarantees, because C++ is conceived for use in areas
where they don't apply---if you have undefined behavior in a
device driver, you might end up reformatting the hard disk.
Java can make concrete, specific limits, because it doesn't try
to be usable in such contexts. From the point of view of
someone developing application (non-privileged) software, C++
has some limits as well. That doesn't mean that it doesn't have
undefined behavior in such cases, at least not for any useful
meaning of the expression.

> 2. A program "produces an undefined value" = the program could produce
> an unexpected value, while all other values, and that program's
> integrity, are not violated.

In practice, in real programs, it's much more complicated.
"Values" interact, and the results of modifying values in the
wrong order, and seeing those modifications, can result in
behavior that the programmer cannot foresee. Not limited to
unexpected values, but including unexpected exceptions, etc. If
you violate the rules in Java, you cannot count on much in
practice, any more than if you violate them in C++. (You can
count on NOT getting a core dump, of course. Which I would
consider a defect, more than an advantage.)

> The two are fundamentally different because in the second case you can
> still count on objects being objects etc.; the memory safety of the
> program has not been violated. Therefore the program is much easier to
> debug.

Memory safety is only one part of "undefined behavior". Not
crashing when you have a serious error makes the program much
harder to debug---if there's a weakness here in C++, it's that
the crash is not guaranteed, not that it isn't forbidden. (But
practically speaking, guaranteeing the crash in such cases is not
implementable at reasonable cost.)

> C++ allows programs with (1). We might also consider that it allows
> programs with (2) under the name of "unspecified behavior" or
> "implementation-dependent behavior". (There would be a subtle difference
> there, but let's move on.)

There's a radical difference. As a practical programmer, there's
really not any significant difference between "unspecified
behavior" and "undefined behavior", unless there are serious
restrictions on "unspecified". Whereas I use implementation-defined
behavior in just about every program I write.

> My current understanding is that Java programs never exhibit (1),

If you mean that Java guarantees that a program will never make
demons fly out of your nose, you're probably right. If you mean
that the program will behave in a reliable and predictable
manner regardless of what I've coded, you're definitely wrong.
The question is, I think, just how unreliable and unpredictable
it has to be before we speak of "undefined behavior". I would
say that there are certain cases involving threading where the
behavior is so unreliable and unpredictable that I would
consider it "undefined". Whether you agree with the actual word
is really not the issue---the point is that for a practical
programmer, you're faced with the same issues.

(Don't get me wrong---I think there is far too much of this
problem in C++, and Java does handle it significantly better.
The only cases I can think of in Java where it is a problem do
involve threading, which is an extremely complex issue; in C++,
you can get similar problems with even the simplest
single-threaded code, e.g. by returning a pointer or a reference to a
local variable. Just because I refuse to accord Java the
absolute doesn't mean that I don't recognize that it represents
orders of magnitude improvement in most cases.)

> and
> might exhibit (2) only on values that can't be read atomically (which
> remarkably are never pointers).
> To find out whether my understanding is
> correct, I looked up the language spec, which says after a discussion of
> the memory model (see
> http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.3):

> "Therefore, a data race cannot cause incorrect behavior such as
> returning the wrong length for an array."

Which is true, but it is a useless guarantee. I can get the
wrong length from a java.util.Vector.

The possibly useful guarantee is that if I use the wrong length,
I still have defined behavior. It would be even more useful if
the guarantee was sensible; if the code was guaranteed to crash,
instead of just throwing an exception which can be caught and
ignored. (At least in my field of endeavour. I can quite
understand that there are cases where the exception, if it is
caught at a high enough level, might be appropriate. The trick
would be to define a type of exception which can only be caught
at a high enough level, so that lower level code can't mask its
errors and return wrong results.)
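
The concern about exceptions being caught and ignored can be sketched as follows (an illustration under assumptions, not code from the thread; `sumFirst` is a hypothetical low-level helper). The error is fully defined behavior in Java, yet the caller receives a plausible number and never learns that the request could not be satisfied.

```java
import java.util.Arrays;
import java.util.List;

final class SwallowedExceptionDemo {
    // Hypothetical low-level helper: sums the first `count` elements,
    // but masks its own error when `count` is larger than the list.
    static int sumFirst(List<Integer> values, int count) {
        int sum = 0;
        try {
            for (int i = 0; i < count; i++) {
                sum += values.get(i);
            }
        } catch (IndexOutOfBoundsException e) {
            // Swallowed: the caller is never told that `count` was wrong.
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(10, 20, 30);
        // Asks for five elements from a list of three.
        System.out.println("sum of first 5: " + sumFirst(values, 5));
    }
}
```

Here the wrong length leads to a partial sum rather than a crash, which is the kind of masked error the paragraph above argues a crash (or an exception catchable only at a high level) would expose.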

> Later on that page, there is a section "17.7 Non-atomic Treatment of
> double and long" that discusses the exact issue we are talking about here.

> "Some implementations may find it convenient to divide a single write
> action on a 64-bit long or double value into two write actions on
> adjacent 32 bit values. For efficiency's sake, this behavior is
> implementation specific; Java virtual machines are free to perform
> writes to long and double values atomically or in two parts.

> For the purposes of the Java programming language memory model, a single
> write to a non-volatile long or double value is treated as two separate
> writes: one to each 32-bit half. This can result in a situation where a
> thread sees the first 32 bits of a 64 bit value from one write, and the
> second 32 bits from another write. Writes and reads of volatile long and
> double values are always atomic. Writes to and reads of references are
> always atomic, regardless of whether they are implemented as 32 or 64
> bit values.

> VM implementors are encouraged to avoid splitting their 64-bit values
> where possible. Programmers are encouraged to declare shared 64-bit
> values as volatile or synchronize their programs correctly to avoid
> possible complications."
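
The effect of such a split write can be simulated deterministically, without any threads. The sketch below is an illustration only: the class `TornLongDemo` and the method `torn` are hypothetical, and treating the two halves as "high" and "low" 32 bits is an assumption, since the quoted text does not say which half is written first. It builds the value a reader could observe when each half comes from a different write.

```java
final class TornLongDemo {
    // Returns the value a reader could observe if the high 32 bits came
    // from one write and the low 32 bits from another.
    static long torn(long highFrom, long lowFrom) {
        return (highFrom & 0xFFFFFFFF00000000L) | (lowFrom & 0x00000000FFFFFFFFL);
    }

    public static void main(String[] args) {
        long first = 0L;   // value stored by one write
        long second = -1L; // value stored by another write (all bits set)
        long observed = torn(first, second);
        System.out.println("observed: 0x" + Long.toHexString(observed));
        // The observed value equals neither of the two values that were written.
        System.out.println("matches a written value: "
                + (observed == first || observed == second));
    }
}
```

The result is an unexpected number, but still just a `long`; by the same section, a reference can never be torn this way.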

> This section can be understood only if we know what a Java program does
> once it's read an invalid (say, NaN) value. Will it crash?

Can the VM avoid crashing, if the OS decides that that is what
it wants to do?

More to the point, does the fact that a Java program cannot
crash (IF that is the case) mean that Java has no undefined
behavior, or is it more or less a specious guarantee, with about
as much meaning as if C++ added a guarantee that no C++ program
could make demons fly out of your nose? Do my programs suddenly
lose all undefined behavior if I set SIGILL, SIGBUS, SIGSEGV
and SIGFPE to ignore at the start?

--
James Kanze (GABI Software) email:james.kanze(a)gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34



From: James Kanze on
Andrei Alexandrescu (See Website For Email) wrote:
> Gabriel Dos Reis wrote:
> > "Andrei Alexandrescu (See Website For Email)"
> > <SeeWebsiteForEmail(a)erdani.org> writes:

> > [...]

> > | There might be a terminology confusion here, which I'd like to clear
> > | from the beginning:

> > | 1. A program "has undefined behavior" = effectively anything could
> > | happen as the result of executing that program. The metaphor with the
> > | demons flying out of one's nose comes to mind. Anything.

> > Why is that not the value of the computation?

> > | 2. A program "produces an undefined value" = the program could produce
> > | an unexpected value, while all other values, and that program's
> > | integrity, are not violated.

> > | The two are fundamentally different because in the second case you can
> > | still count on objects being objects etc.;

> > I don't see anything fundamental in that difference.

> It's very simple. In one case you have a program that preserves its own
> guarantees (e.g. there's no random overwriting of memory), but which has
> one numerical value that's invalid; that can't corrupt memory because
> there's no pointer forging. In the other case you can't count on pretty
> much anything.

I think we understand this difference. I, at least, also
recognize it as a positive point. The problem is understanding
just to what point it's relevant.

To come back to a point you made: Java guarantees that you
cannot get the wrong length for an array. Fine, but unless it
can guarantee the same thing for Vector, or other similar types,
has it really bought me anything? Individual values don't exist
in a vacuum; they exist in relationships to other values. The
effect is just as undefined as in C++, in practice. Java
certainly makes more guarantees than C++, and it also provides
defined means of detecting a number of errors, but it isn't
100%. As far as I know, 100% isn't possible.
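
The point that values exist in relationships to other values can be illustrated with a hand-rolled growable buffer (a sketch under assumptions, not the actual implementation of java.util.Vector). The fields `data` and `size` and the interleaving are hypothetical; the unlucky schedule is replayed by hand in a single thread so the outcome is deterministic. Each value the reader sees is individually valid, yet the pair is inconsistent.

```java
import java.util.Arrays;

final class BrokenInvariantDemo {
    // Invariant the code relies on: size <= data.length.
    static int[] data = {1, 2};
    static int size = 2;

    // Replays one unlucky interleaving by hand: the reader picks up the
    // array reference before the writer appends, and the size after.
    static String readLastWithStaleArray() {
        data = new int[] {1, 2};
        size = 2;

        int[] seenData = data;               // reader: reads the array reference

        int[] grown = Arrays.copyOf(data, 4); // writer: grows the storage,
        grown[2] = 3;                         // appends an element,
        data = grown;                         // publishes the new array
        size = 3;                             // and the new size

        int seenSize = size;                 // reader: reads the size

        try {
            return "last=" + seenData[seenSize - 1];
        } catch (ArrayIndexOutOfBoundsException e) {
            // Neither value was torn or forged, but together they are inconsistent.
            return "exception";
        }
    }

    public static void main(String[] args) {
        System.out.println(readLastWithStaleArray());
    }
}
```

Memory safety holds throughout, and the failure is a defined exception rather than corruption, but the program still ends up somewhere its author did not plan for.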

--
James Kanze (GABI Software) email:james.kanze(a)gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

