From: James Kanze on
Niklas Matthies wrote:
> On 2006-12-13 23:48, Andrei Alexandrescu (See Website For Email) wrote:
> > Niklas Matthies wrote:
> >> Well, it depends what one considers "basic". It's possible in Java to
> >> have the statement

> >> System.out.println("Hello, world!");

> >> output "Surprise!" (or any other arbitrary string), by appropriate
> >> preceding code.

[...]

> > I didn't know that! How is it possible?

> Because string objects initialized from string literals are just
> regular instances of the java.lang.String class, which is implemented
> in plain Java (with the exception of its intern() method).

> > Got code?

> Here you go:

You missed the best part:

> class Test
> {
> public static void main(String[] args) throws Exception
> {
> java.util.HashSet set = new java.util.HashSet();
> set.add("Hello, World!");
>
> doEvil();
>
> set.add("Hello, World!");
> System.out.println(set); // prints "[Surprise!, Surprise!]"
System.out.println( "Hello, World!" );
// Also prints "Surprise!"
> }

[...]

It's nice to know that string literals aren't constants. (Sort
of reminds me of Fortran IV, where constants passed to a
function could be modified by the function, so a different
constant would be passed the next time.) If you look at Niklas'
code, you'll also see how you can get things like:
String s = "Hello, World!" ;
s.lastIndexOf( 'H' )
throwing an ArrayIndexOutOfBoundsException.

Of course, this was also the case in the original C. Maybe Java
got its ideas about how a string literal should behave from
there. Thank goodness we've made some progress in this respect
in C++ (and in C90---even the C standards committee thought that
modifying constants was taking empowerment of the programmer a
bit too far).

--
James Kanze (GABI Software) email:james.kanze(a)gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Al on
James Kanze wrote:
<snip>
>
> It's nice to know that string literals aren't constants. (Sort
> of reminds me of Fortran IV, where constants passed to a
> function could be modified by the function, so a different
> constant would be passed the next time.) If you look at Niklas'
> code, you'll also see how you can get things like:
> String s = "Hello, World!" ;
> s.lastIndexOf( 'H' )
> throwing an ArrayIndexOutOfBoundsException.
>
> Of course, this was also the case in the original C. Maybe Java
> got its ideas about how a string literal should behave from
> there. Thank goodness we've made some progress in this respect
> in C++ (and in C90---even the C standards committee thought that
> modifying constants was taking empowerment of the programmer a
> bit too far).

Well, there are two issues, which are distinct:

A) (String) Literals being unique (single instance).
B) (String) Literals being constant (immutable).

If I understand correctly, A is done to minimize redundant memory
consumption. I agree that /if/ A is true (in any given language), then B
/should/ be true.

However, if A is false, then B is not necessary. In my opinion, A is
Premature Optimization™ that puts unfortunate constraints on the
language. How many identical string literals does a program have, on
average? I would say very few, if the code is well-written. If the
program is dynamically localizable (as is often the case), probably /none/.

Furthermore, if I understand correctly:

In C++, A is true* and B is true**.
In Java, A is true*** and B is true****.

* Or at least, probably, since the compiler will likely optimize it.
** Except char pointers decay to non-const.
*** At least those created at compile-time.
**** Except that reflection can be used to bypass it.

So I would conclude that ideally, a modern language should make string
literals:

A) Per-instance (or CoW).
B) Mutable.

If this is not possible, then at least:

A) Unique.
B) Const.

The worst possible case is:

A) Unique.
B) Mutable.


Depending on how you interpret the caveats, I would argue that both Java
/and/ C++ are in the third category, which is not good.

Cheers,
-Al.


From: peter koch larsen on

Andrei Alexandrescu (See Website For Email) skrev:
> Niklas Matthies wrote:
> > Well, it depends what one considers "basic". It's possible in Java to
> > have the statement
> >
> > System.out.println("Hello, world!");
> >
> > output "Surprise!" (or any other arbitrary string), by appropriate
> > preceding code.
[snip]
> I didn't know that! How is it possible? Got code? Heck, it's not
> possible in many C and C++ implementations - they put constant strings
> in read-only pages.

cheers! Andrei,

I accidentally stumbled on an article called something like
"hi there".equals("cheers !") == true

and a quick skim shows that this is exactly the article you
requested. It is referenced on Kevlin Henney's website (Curbralan?), and
I believe it was an article from Artima. Anyway, a quick Google search
should get you there, no sweat.

hi there
Peter



From: James Kanze on
Al wrote:
> James Kanze wrote:
> <snip>

> > It's nice to know that string literals aren't constants. (Sort
> > of reminds me of Fortran IV, where constants passed to a
> > function could be modified by the function, so a different
> > constant would be passed the next time.) If you look at Niklas'
> > code, you'll also see how you can get things like:
> > String s = "Hello, World!" ;
> > s.lastIndexOf( 'H' )
> > throwing an ArrayIndexOutOfBoundsException.

> > Of course, this was also the case in the original C. Maybe Java
> > got its ideas about how a string literal should behave from
> > there. Thank goodness we've made some progress in this respect
> > in C++ (and in C90---even the C standards committee thought that
> > modifying constants was taking empowerment of the programmer a
> > bit too far).

> Well, there are two issues, which are distinct:

> A) (String) Literals being unique (single instance).
> B) (String) Literals being constant (immutable).

Formally, yes. Practically, strings are values, so identity
isn't important, which means that if the strings are constant,
whether identical strings are a single instance or not is
irrelevant. (There are exceptions to this, of course. When
optimizing, it is sometimes useful to require a single instance
for all identical strings, in order to just compare pointers,
rather than comparing all of the characters.)

> If I understand correctly, A is done to minimize redundant memory
> consumption.

Not only. Depending on how and where it is done, it can be used
to reduce total memory consumption, reduce dynamic allocation
(which can be expensive in terms of run-time) or to simplify
comparisons---if you know that two strings with the same value
must be at the same address, you can just compare pointers.
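
A minimal Java sketch of the pointer-comparison point. The class and method names are made up for illustration. It relies only on two documented facts: string literals are interned, and String.intern() returns the canonical instance for a given value.

```java
// Illustration only: InternDemo and its methods are hypothetical names.
final class InternDemo {
    /** Compares by content; may have to walk the characters. */
    static boolean sameValue(String a, String b) {
        return a.equals(b);
    }

    /** Compares by identity; a valid equality test only if both strings are interned. */
    static boolean sameInstance(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "WHERE";                                         // interned literal
        String built = new StringBuilder("WHE").append("RE").toString(); // distinct object, same value

        System.out.println(sameValue(literal, built));             // true
        System.out.println(sameInstance(literal, built));          // false
        System.out.println(sameInstance(literal, built.intern())); // true
    }
}
```

The identity test is only a shortcut when every string involved is known to be interned; otherwise equals() remains the correct comparison.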

> I agree that /if/ A is true (in any given language), then B
> /should/ be true.

Per definition, B should be true. A literal is a compile time
constant. The only exceptions I'm aware of were early versions
of Fortran and C---and now Java. Both Fortran and C corrected
this defect very early in their existence. Java seems to have
added it; it wasn't present in the earliest implementations
(which didn't have reflection).

> However, if A is false, then B is not necessary.

I disagree. If I see a numeric constant 42 in the source code,
I should be able to count on its value being 42. And if I see a
string literal "abc", I should be able to count on its value
being "abc". Constants should not be variables, and vice versa.

> In my opinion, A is
> Premature Optimization™ that puts unfortunate constraints on the
> language.

It has nothing to do with optimization. It's a question of
readability. How would you like it if the expression "i += 1"
added 2 to i? And how is that any different from the expression
`System.out.println( "Hello" )' printing "Good bye"?

> How many identical string literals does a program have, on
> average? I would say very few, if the code is well-written. If
> the program is dynamically localizable (as is often the case),
> probably /none/.

I don't know. "WHERE" tends to occur a lot in SQL requests
(with what precedes and follows variable). And I would strongly
recommend NOT replacing "WHERE" with "OÙ" or "WO", just because
you are in a French or German locale. An HTTP client will
doubtlessly want to use "GET" (but that use is more likely to be
localized in one place in the program). And the logging macros
are full of __FILE__, which expands to the same string literal
throughout the file.

Not that that's relevant to anything. (Except maybe the
expansion of __FILE__, which could increase the size of the
executable noticeably if the identical instances aren't merged.)

> Furthermore, if I understand correctly:

> In C++, A is true* and B is true**.

> * Or at least, probably, since the compiler will likely optimize it.
> ** Except char pointers decay to non-const.

A is unspecified. B is formally true, in that any attempt to
modify a string literal is undefined behavior. Because early C
guaranteed that string literals could be modified, and that each
instance was a separate object, many C++ compilers still support
this (often only with certain compiler options).

Note that the fact that the pointer can be implicitly converted
to non-const, at least in some very frequent cases, does not
authorize modification. It's an intentional hack to support
previously existing practice.

> In Java, A is true*** and B is true****.
> *** At least those created at compile-time.
> **** Except that reflection can be used to bypass it.

If it isn't created at compile-time, it isn't a string literal,
either in Java or C++. And if there's anything in the language
which allows you to modify a literal, that's a serious defect.

In the case of Java, the problem concerning literals may be the
most shocking, externally, but the fact that you can modify a
String after having passed it to another subsystem is far more
serious, since it undermines many of Java's security measures.
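
To make the concern concrete, here is a hedged Java sketch of the check-then-use hazard. It does not touch String internals at all: a StringBuilder stands in for "a string that can be changed after the check", and all names (CheckThenUseDemo, approve, approveCopy) are hypothetical. It shows only why a library that validates an argument depends on that argument not changing afterwards.

```java
// Illustration only: all names here are hypothetical.
final class CheckThenUseDemo {
    /** Validates the path but keeps a reference to the caller's (mutable) object. */
    static CharSequence approve(CharSequence path) {
        if (path.toString().contains("..")) {
            throw new SecurityException("path escapes the sandbox");
        }
        return path; // same object the caller still holds
    }

    /** Validates an immutable snapshot and returns that snapshot. */
    static String approveCopy(CharSequence path) {
        String snapshot = path.toString(); // the check and the later use see the same value
        if (snapshot.contains("..")) {
            throw new SecurityException("path escapes the sandbox");
        }
        return snapshot;
    }

    public static void main(String[] args) {
        StringBuilder name = new StringBuilder("public/readme.txt");
        CharSequence approved = approve(name);
        String snapshot = approveCopy(name);

        // The caller changes the value after the check has passed.
        name.replace(0, name.length(), "../secret/passwd");

        System.out.println(approved); // ../secret/passwd
        System.out.println(snapshot); // public/readme.txt
    }
}
```

Code that accepts a String normally skips the snapshot step because String is designed to be immutable; that assumption is what a mutable String would break.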

> So I would conclude that ideally, a modern language should make string
> literals:

> A) Per-instance (or CoW).
> B) Mutable.

A literal should never be mutable. Modifying a literal is on
the same level as other self-modifying code.

> If this is not possible, then at least:

> A) Unique.
> B) Const.

> The worst possible case is:

> A) Unique.
> B) Mutable.

> Depending on how you interpret the caveats, I would argue that
> both Java /and/ C++ are in the third category, which is not
> good.

The modification of literals is a fun exercise, to demonstrate
the problem. (G++ puts string literals in write-protected
memory, so they can't be modified. Period. Sun CC will do so
too, with the right options.) But it's only one aspect of the
problem; the real problem is modifying something that the author
of the code thinks cannot be modified. In C++, this is most
often a result of unintentional aliasing---just because you have
a std::string const& doesn't mean that the string value will not
change. In C++, however, this is so frequently a problem that
it is pretty well understood; most C++ programmers know that if
you need to be sure that something doesn't change, you make a
deep copy of it---you use pass by value. Java has similar
problems, in that you don't always know when objects are shared,
and when they aren't. This is normally only a problem with
objects which have value semantics---if identity is relevant to
the object's semantics, then obviously, you know which objects
are shared, and which aren't, by design. The normal solution to
this is to make value objects immutable. (For a good example of
what happens when you don't, consider the return value of
javax.swing.JComponent.getPreferredSize(), which returns a mutable value
object. What happens if you modify it? Depending on the code
you've previously executed, and the layout manager installed,
you may or may not modify the preferred size of the component;
it's anybody's guess.) And of course, the problem here is that
we have a means of modifying an object which has been carefully
designed to be immutable, and which must be immutable, for
security reasons. In practice, you can probably force
uniqueness by something like:

StringBuffer tmp = new StringBuffer( " " ) ;
tmp.append( s ) ;
s = tmp.substring( 1 ) ;

but 1) I don't think it's formally guaranteed, and 2) I've never
seen the necessity of this sort of hack documented.
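
The getPreferredSize() hazard described above can be sketched in a few lines of Java. This is not the Swing implementation: Size is a hypothetical stand-in for a mutable value class such as java.awt.Dimension, and LeakyBox and SafeBox are invented for the example. It only contrasts returning the internal object with returning a defensive copy.

```java
// Illustration only: Size, LeakyBox and SafeBox are hypothetical classes.
final class AliasingDemo {
    public static void main(String[] args) {
        LeakyBox leaky = new LeakyBox();
        leaky.getPreferredSize().width = 1;                  // writes through to the box
        System.out.println(leaky.getPreferredSize().width);  // 1

        SafeBox safe = new SafeBox();
        safe.getPreferredSize().width = 1;                   // only changes the caller's copy
        System.out.println(safe.getPreferredSize().width);   // 100
    }
}

/** A mutable value object with public fields. */
final class Size {
    int width;
    int height;

    Size(int width, int height) {
        this.width = width;
        this.height = height;
    }
}

final class LeakyBox {
    private final Size preferred = new Size(100, 50);

    /** Hands out the internal object, so callers share (and can change) our state. */
    Size getPreferredSize() {
        return preferred;
    }
}

final class SafeBox {
    private final Size preferred = new Size(100, 50);

    /** Hands out a fresh copy on every call. */
    Size getPreferredSize() {
        return new Size(preferred.width, preferred.height);
    }
}
```

Making Size immutable (final fields, no setters) would remove the need for the copy, which is the "normal solution" the paragraph refers to.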

And I repeat, the possibility of modifying a string *after*
having passed it to a library function is a serious security
hole. I'm very surprised that Java lets this one through.

--
James Kanze (GABI Software) email:james.kanze(a)gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34



From: Niklas Matthies on
On 2006-12-15 13:23, James Kanze wrote:
:
> In the case of Java, the problem concerning literals may be the
> most shocking, externally, but the fact that you can modify a
> String after having passed it to another subsystem is far more
> serious, since it undermines many of Java's security measures.

No, it doesn't, because a security-conscious application will
run under a SecurityManager that will prevent such accesses
(the setAccessible() call will fail).

The motivation for enabling such accesses is of course for use by a
debugger without causing the debugging-enabled Java implementation to
become non-conformant.

-- Niklas Matthies
