From: Dennis Nedry on
I have a routine for converting ANSI text with "extended" IBM characters to
HTML. It is as follows...

EXTENDED_ANSI_TABLE = {
  227.chr => "<br>",
  32.chr => "&nbsp;",
  128.chr => "&Ccedil;", # 128 C, cedilla (199)
  129.chr => "&uuml;",   # 129 u, umlaut (252)
  130.chr => "&eacute;", # 130 e, acute accent (233)
  131.chr => "&acirc;",  # 131 a, circumflex accent (226)
  132.chr => "&auml;",   # 132 a, umlaut (228)
  133.chr => "&agrave;", # 133 a, grave accent (224)
  134.chr => "&aring;",  # 134 a, ring (229)
  135.chr => "&ccedil;", # 135 c, cedilla (231)
  136.chr => "&ecirc;",  # 136 e, circumflex accent (234)
  137.chr => "&euml;",   # 137 e, umlaut (235)
  138.chr => "&egrave;", # 138 e, grave accent (232)
  139.chr => "&iuml;",   # 139 i, umlaut (239)
  140.chr => "&icirc;",  # 140 i, circumflex accent (238)
  141.chr => "&igrave;", # 141 i, grave accent (236)
  # big huge list continues for pages...
}


def parse_ansi_ext(str)
  EXTENDED_ANSI_TABLE.each_pair {|color, result|
    str = str.gsub(color, result)
  }
  return str
end

This worked in Ruby 1.8 with no problem.

If the input contains a character above 127.chr, it now fails with the error:

"Encoding::CompatibilityError at /
incompatible encoding regexp match (ASCII-8BIT regexp with ISO-8859-1 string)"

I've tried various acts of desperation to fix it, to no avail. I
don't understand exactly what is wrong...

Thanks,

Dennis

From: Dennis Nedry on
On Wed, Jun 16, 2010 at 6:30 PM, Michael Fellinger
<m.fellinger(a)gmail.com> wrote:
>
> str has the encoding ISO-8859-1, probably inherited from your system locale.
> Convert it to ASCII-8BIT before processing it.
>
> http://blog.grayproductions.net/articles/ruby_19s_string

Thanks, that worked. I guess we should always specify the file encoding
from now on.
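
For anyone who finds this thread later, here is a minimal sketch of what
"convert it to ASCII-8BIT before processing" could look like. It is not
necessarily the exact change that was made: SAMPLE_TABLE is a hypothetical
two-entry stand-in for EXTENDED_ANSI_TABLE, and parse_ansi_ext_binary is an
illustrative name. The idea is that 128.chr and up are ASCII-8BIT strings,
so relabelling the input as ASCII-8BIT lets gsub compare like with like.

```ruby
# Hypothetical two-entry stand-in for EXTENDED_ANSI_TABLE from the first post.
SAMPLE_TABLE = {
  128.chr => "&Ccedil;", # 128.chr is an ASCII-8BIT (binary) string
  130.chr => "&eacute;",
}

# Same loop as parse_ansi_ext, but run against a binary copy of the input.
def parse_ansi_ext_binary(str)
  # dup so the caller's string keeps its encoding; force_encoding only
  # relabels the bytes, it does not transcode them.
  out = str.dup.force_encoding(Encoding::ASCII_8BIT)
  SAMPLE_TABLE.each_pair do |char, entity|
    out = out.gsub(char, entity)
  end
  out
end

# Bytes for "caf" followed by byte 130, labelled ISO-8859-1 as in the error.
input = [99, 97, 102, 130].pack("C*").force_encoding(Encoding::ISO_8859_1)
puts parse_ansi_ext_binary(input) # prints caf&eacute;
```

If the text comes from a file, reading it with File.binread (or opening the
file in "rb" mode) should give an ASCII-8BIT string directly, so the
force_encoding step would not be needed.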

Take Care,

mark


--
"I've got ham but I'm not a hamster."

-Bill Bailey