From: Roger Pack on
> One has to laugh or cry. As best I could, I factored out my opinion of
> all this into a separate file:
> http://github.com/candlerb/string19/raw/47b0cba0a2047eca0612b4e24a540f011cf2cac3/soapbox.rb


You should post it to core...
-r
--
Posted via http://www.ruby-forum.com/.

From: Charles Oliver Nutter on
On Fri, Feb 26, 2010 at 3:04 PM, Brian Candler <b.candler(a)pobox.com> wrote:
> Bob Hutchison wrote:
>> Brian your http://github.com/candlerb/string19/blob/master/string19.rb
>> is something else! I'm laughing with a slightly hysterical edge.
>
> One has to laugh or cry. As best I could, I factored out my opinion of
> all this into a separate file:
> http://github.com/candlerb/string19/raw/47b0cba0a2047eca0612b4e24a540f011cf2cac3/soapbox.rb

This is exactly the situation I worried about when Matz proposed the
"all encodings" view of Ruby 1.9. Even though many applications won't
run into this, any that try to deal with >1 encoding at a time will
have a clusterfuck of a time making sure everything fits together. And
this is to say nothing of the implementation effort required, which
still isn't all there in JRuby (and won't be until 1.6 or later).

I didn't read this whole thread, since there's a lot of "it's a
bug/it's not a bug" exploration, but if there's something we need to
fix in JRuby, please do report it (and try to help fix it, too :)).

- Charlie

From: Roger Pack on
> One has to laugh or cry. As best I could, I factored out my opinion of
> all this into a separate file:
> http://github.com/candlerb/string19/raw/47b0cba0a2047eca0612b4e24a540f011cf2cac3/soapbox.rb

re: string1 + string2 + string3 actually working without fear...

One thing that might help would be to set the default encoding; then all
three strings would (might?) have the same encoding (?)

-rp


From: Brian Candler on
Roger Pack wrote:
> re: string1 + string2 + string3 actually working without fear...
>
> One thing that might help would be to set the default encoding; then all
> three strings would (might?) have the same encoding (?)

That depends where the strings came from. If they were returned by a
library function (either Ruby core or 3rd party) you won't know what
encoding they have unless it is documented what the encoding is or how
it is chosen, and it almost never is.
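
To make the concern concrete, here is a minimal sketch (not taken from
the thread) of how concatenation can succeed or fail depending on the
bytes in each string as well as on its encoding tag. The variable names
are illustrative only; the calls used are String#encode, String#+ and
Encoding.compatible? as found in Ruby 1.9 and later.

```ruby
utf8   = "caf\u00e9"                  # UTF-8, contains one non-ASCII character
latin1 = utf8.encode("ISO-8859-1")    # same text, transcoded to ISO-8859-1
ascii  = "cafe".encode("ISO-8859-1")  # tagged ISO-8859-1, but only 7-bit ASCII

# A string holding only 7-bit ASCII is compatible with any
# ASCII-compatible encoding, so this concatenation is accepted.
mixed_ok = utf8 + ascii

# Both operands hold non-ASCII bytes under different tags, so this raises.
concat_error = begin
  utf8 + latin1
  nil
rescue Encoding::CompatibilityError => e
  e
end

puts mixed_ok.encoding
puts concat_error.class
puts Encoding.compatible?(utf8, latin1).inspect
```

So whether string1 + string2 works can depend on the data a library
happens to return on a given day, not just on which encoding it
nominally uses.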

Equally, if you are writing a library for use by other people, then you
really should not touch global state such as Encoding.default_external.
So you are left with Ruby guessing encodings and forcing them if it
guesses wrongly, e.g.

$ ruby19 -e 'puts %x{cat /bin/sh}.encoding'
UTF-8
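
A portable way to see the same effect without depending on /bin/sh: the
sketch below fakes the guess by relabelling arbitrary bytes as UTF-8.
That relabelling is an assumption standing in for what %x{} does with
Encoding.default_external, and the byte values are made up for
illustration.

```ruby
# Arbitrary binary data; the byte 0xFF can never appear in valid UTF-8.
raw = [0x7F, 0x45, 0x4C, 0x46, 0xFF, 0xFE].pack("C*")   # tagged ASCII-8BIT

# Stand-in for the guess: relabel the bytes without checking them.
guessed = raw.dup.force_encoding(Encoding::UTF_8)

puts guessed.encoding          # the tag claims UTF-8
puts guessed.valid_encoding?   # whether the bytes actually are valid UTF-8

# Undo the guess by forcing the tag back to binary; the bytes are untouched.
fixed = guessed.dup.force_encoding(Encoding::BINARY)
puts fixed.bytes == raw.bytes
```

force_encoding only changes the tag, never the bytes, which is why it
is the tool for correcting a wrong guess rather than String#encode.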

Of course, if you're saying that your application handles all strings in
the same encoding, then this whole business of tagging every
*individual* string object with its own encoding is a waste of time and
effort, and is just something which you have to fight against.

But we're flogging a dead horse here. I hate this stuff; other people
seem to love it.

From: Caleb Clausen on
On 3/3/10, Brian Candler <b.candler(a)pobox.com> wrote:
> But we're flogging a dead horse here. I hate this stuff; other people
> seem to love it.

Having wrestled with these issues a little bit myself, I think your
criticisms are cogent. Unlike you, tho, I'd rather not drop the whole
string encoding feature in 1.9. (Any solution to the rather ugly
problem of string encodings is going to have some problems. Ruby's got
a different (and more complicated) approach to it than other
languages... but if the remaining wrinkles can be smoothed out, it
will be a better solution overall.)

I wish someone would take the inconsistencies you've found and
criticisms you've made to heart and find some kind of way to address
them.

One thing that might help is a variant of the Rope class Intransition
was wishing for just recently. There's no reason that the individual
String segments of a Rope couldn't each have different encodings....
this would help with the catenation of Strings with different
encodings, for instance. It gets complicated, tho. How do you do a
Regexp match against a multi-encoded Rope? (It's hard and/or tricky,
but I think it can be done.) I've suggested this on ruby-core before, but
no-one wants this in the interpreter itself... probably appropriately.
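
A minimal sketch of the idea, outside the interpreter: the class name
EncodedRope and its methods are hypothetical, invented here for
illustration, and this is not the Rope that was proposed on the list.
It only shows that segments can keep their own encodings until someone
asks for a flat string; Regexp matching is deliberately left out.

```ruby
# Hypothetical rope whose segments each keep their own encoding.
class EncodedRope
  def initialize(*segments)
    @segments = segments.map(&:dup)
  end

  # Append a segment without transcoding or compatibility checks.
  def <<(str)
    @segments << str.dup
    self
  end

  # The distinct encodings currently held in the rope.
  def encodings
    @segments.map(&:encoding).uniq
  end

  # Length in characters, computed per segment in its own encoding.
  def length
    @segments.sum(&:length)
  end

  # Flatten to a single String, transcoding each segment to the target.
  def to_s(target = Encoding::UTF_8)
    @segments.map { |s| s.encode(target) }.join
  end
end

rope = EncodedRope.new("caf\u00e9 ")
rope << "na\u00efve".encode("ISO-8859-1")

puts rope.encodings.inspect
puts rope.length
puts rope.to_s.encoding
```

Transcoding is deferred to to_s, so appending never raises
Encoding::CompatibilityError; the cost is that to_s can still raise if
a segment cannot be represented in the target encoding.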