From: Brian Candler on
Perry Smith wrote:
> But the "fixed encoding" is a key part of the puzzle I was missing.
> Also, David, I had not bumped into the ENC_UTF8 constant yet. There are
> quite a few constants (like the 16 pointed out by David also) is a flag
> to make the encoding "fixed".

16 is just Regexp::FIXEDENCODING

irb(main):001:0> Regexp::FIXEDENCODING
=> 16

In the 1.9.2 I have here (r24186, 2009-07-18) there is no
Regexp::ENC_UTF8, so it must be relatively new.

irb(main):002:0> Regexp::ENC_UTF8
NameError: uninitialized constant Regexp::ENC_UTF8
from (irb):2
from /usr/local/bin/irb192:12:in `<main>'
irb(main):003:0> Regexp.constants
=> [:IGNORECASE, :EXTENDED, :MULTILINE, :FIXEDENCODING]
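To make the effect of the flag concrete, here is a minimal sketch, assuming a Ruby where Regexp::FIXEDENCODING is defined (1.9.2 or later). The helper names build_patterns and try_match are made up for illustration and are not part of Ruby. It builds the same ASCII-only pattern with and without the flag, then tries both against an EUC-JP string.

```ruby
# Sketch only: build_patterns and try_match are hypothetical helper names.

def build_patterns
  source = "a".encode(Encoding::UTF_8)                # ASCII-only pattern text, tagged UTF-8
  plain  = Regexp.new(source)                         # no flags
  fixed  = Regexp.new(source, Regexp::FIXEDENCODING)  # flag value 16
  [plain, fixed]
end

# Returns the match position, or :incompatible when Ruby refuses to match
# the regexp against a string in a different encoding.
def try_match(regexp, string)
  regexp =~ string
rescue Encoding::CompatibilityError
  :incompatible
end

plain, fixed = build_patterns

# One two-byte EUC-JP character, a space, then "a".
euc = "\xa1\xa2 a".dup.force_encoding(Encoding::EUC_JP)

puts plain.fixed_encoding?
puts fixed.fixed_encoding?
p try_match(plain, euc)
p try_match(fixed, euc)
```

The plain regexp adapts to the encoding of the string it is matched against, while the fixed one keeps the encoding of its source and should refuse a non-ASCII string in any other encoding.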

As for the third arg to Regexp.new, I have no idea. Documentation is not
Ruby's strong point at the best of times, but it's nonexistent for the
encoding stuff.
--
Posted via http://www.ruby-forum.com/.

From: David Springer on

My bad.

I was running 1.9.1, which had no FIXEDENCODING.

Regexp.constants
=> [:IGNORECASE, :EXTENDED, :MULTILINE, :ONCE, :ENC_NONE, :ENC_EUC, :ENC_SJIS, :ENC_UTF8]

So things have changed since 1.9.1

If you are running 1.9.2 then use FIXEDENCODING and you should be fine.

I THINK that what you are saying with FIXEDENCODING is NOT to revert back to
something like ASCII.

BTW in 1.9.1

>> Regexp::ENC_EUC
=> 16
>> Regexp::ENC_SJIS
=> 16
>> Regexp::ENC_UTF8
=> 16
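A quick way to check which names share the value 16 on whatever Ruby is installed is to scan Regexp.constants. This is only a sketch; constants_equal_to is a hypothetical helper name, and the list it returns will differ between Ruby versions, as this thread shows.

```ruby
# Sketch only: constants_equal_to is a hypothetical helper name.
# Returns the names of the integer-valued Regexp constants equal to +value+.
def constants_equal_to(value)
  Regexp.constants.select do |name|
    const = Regexp.const_get(name)
    const.is_a?(Integer) && const == value
  end
end

p constants_equal_to(16)
```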


--
David N. Springer
Eau Claire, WI

From: Bob Hutchison on

On 24-Feb-10, at 6:22 PM, David Springer wrote:

> My bad.
>
> I was running 1.9.1, which had no FIXEDENCODING.
>
> Regexp.constants
> => [:IGNORECASE, :EXTENDED, :MULTILINE, :ONCE, :ENC_NONE, :ENC_EUC, :ENC_SJIS, :ENC_UTF8]
>
> So things have changed since 1.9.1
>
> If you are running 1.9.2 then use FIXEDENCODING and you should be fine.
>
> I THINK that you are saying with FIXEDENCODING is NOT to revert back to
> something like ASCII.

This has been really helpful, but I'm still having difficulties. I'm
running 1.9.1p376 and:

Regexp.constants
=> [:IGNORECASE, :EXTENDED, :MULTILINE]

But if I use 16 rather than FIXEDENCODING it works as in the examples
in this thread.
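For code that has to run on builds both with and without the named constant, one possible workaround is to look the constant up and fall back to the literal. This is a sketch, not an official recipe: fixed_encoding_flag is a hypothetical helper, and the fallback value 16 is an assumption taken from the reports in this thread.

```ruby
# Sketch only: fixed_encoding_flag is a hypothetical helper name.
def fixed_encoding_flag
  if Regexp.const_defined?(:FIXEDENCODING)
    Regexp::FIXEDENCODING
  else
    16 # assumption: the raw flag value reported in this thread
  end
end

re = Regexp.new("a", fixed_encoding_flag)
puts re.fixed_encoding?
```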

Does anyone know what's going on here? I used to have a pretty good
handle on encodings. This Ruby encoding stuff is something I've been
struggling with for 6 months and I think all that I've managed to do
is completely corrupt my understanding of encoding. It's starting to
look like magic. I know that a bunch of things changed between
1.9.1p243 and 1.9.1p376, but, since I think that what I 'know' about
encoding might be completely delusional at this point, I suppose I
don't really know.

Brian, your http://github.com/candlerb/string19/blob/master/string19.rb
is something else! I'm laughing with a slightly hysterical edge.

Cheers,
Bob


----
Bob Hutchison
Recursive Design Inc.
http://www.recursive.ca/
weblog: http://xampl.com/so





From: Perry Smith on
Bob Hutchison wrote:
> On 24-Feb-10, at 6:22 PM, David Springer wrote:
>
> Regexp.constants
> => [:IGNORECASE, :EXTENDED, :MULTILINE]
>

It might help you to know that your constants are the same as mine. I
don't know how David got his.

Unfortunately, I still have not gotten back to my investigation of this.
Looking at the code in re.c helped me a bit.

Aside from that, I think we are all struggling with this. I'm hoping
that there are a few "bugs" in the code... i.e. Matz has a clear idea of
how things should work but there are just a few mistakes that really
hamper our understanding.

HTH,
Perry

From: Brian Candler on
Bob Hutchison wrote:
> Brian your http://github.com/candlerb/string19/blob/master/string19.rb
> is something else! I'm laughing with a slightly hysterical edge.

One has to laugh or cry. As best I could, I factored out my opinion of
all this into a separate file:
http://github.com/candlerb/string19/raw/47b0cba0a2047eca0612b4e24a540f011cf2cac3/soapbox.rb