gsub not working to replace a 'Chinese' character. [Ruby]

From: Ryan Smith on 28 Jan 2010 12:05

gsub not works for me when replace 'DBCS'(double byte character set)
character, using last version ruby 1.8.6

when "strºº×Öend".gsub(/ºº×Ö/,"hanzi"),
output still is: strºº×Öend , but not strhanziend which I want to
get.

Searched web two whole night with no clue found.

Anyone can help are much appreciated, need got it work very urgent.
thank you!
--
Posted via http://www.ruby-forum.com/.

From: Roger Pack on 28 Jan 2010 12:58

Ryan Smith wrote:
> gsub does not work for me when replacing a 'DBCS' (double-byte character set)
> character. I am using the latest Ruby 1.8.6.

Maybe try 1.9?
-r

From: Benoit Daloze on 28 Jan 2010 13:05

With ruby 1.9.2dev (2010-01-14 trunk 26319) [x86_64-darwin10.2.0]
"strhanziend"

It works fine. You need to set the encoding with "# encoding: utf-8" at
the top of the file.
In fact, 1.9.2 will complain if you don't.

Ruby 1.8.6 is kind of outdated, but I think it at least works with "-Ku".

From: Ryan Smith on 28 Jan 2010 13:13

I am parsing a web page that is encoded in gb2312, using Watir to get the
text of the page title, and I want to replace the Chinese characters
in the title with English words.

When I puts the title that Watir returns, the Chinese characters are displayed
as garbage (in the Windows cmd console with code page cp936; they display
normally when I change the code page to UTF-8). But I think cmd's code
page is only a display setting and is not related to what I need (replacing
the Chinese characters). I do not know whether the string I get from Watir is
also in gb2312 or in some other encoding. What I do know is that converting
the string to UTF-8 fails, with a message complaining that a character is
invalid.

I have no idea what to do next.
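
One way to find out what such a string actually contains is to look at its raw bytes before attempting any conversion. A rough sketch for Ruby 1.9 or later; the sample bytes are an assumption (the GB2312 bytes for 汉字 between "str" and "end"), not output captured from Watir:

```ruby
# encoding: utf-8
# Hypothetical sample: "str" + GB2312 bytes for 汉字 + "end".
title = "str\xBA\xBA\xD7\xD6end"

# Hex dump of the raw bytes, independent of any encoding label.
hex = title.unpack("C*").map { |b| "%02X" % b }.join(" ")
puts hex

# Check whether the bytes are well-formed under each candidate label.
# force_encoding only relabels a copy; it does not change any bytes.
valid_as_utf8   = title.dup.force_encoding("UTF-8").valid_encoding?
valid_as_gb2312 = title.dup.force_encoding("GB2312").valid_encoding?

puts "valid as UTF-8:  #{valid_as_utf8}"
puts "valid as GB2312: #{valid_as_gb2312}"
```

A passing validity check does not prove the guess is right, since many byte sequences are well-formed in more than one encoding, but a failing one rules the guess out.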

Richard Conroy wrote:
> On Thu, Jan 28, 2010 at 5:05 PM, Ryan Smith <sunraise2005(a)gmail.com> wrote:
>
>> thank you!
>>
>
> Mixing encoding schemes is hell in almost any context, and Ruby is no
> exception. Until you have complete control in your program over all
> encoding inputs, you are going to fail.
>
> If your input is coming from the shell environment or standard input, the
> text can be in the system encoding, regardless of what encoding you
> specify in Ruby.
>
> It is preferable to use Unicode (UTF-8) in any operation where you are
> processing multilingual text. Failing that, there is the Iconv library,
> which you can use to convert between encoding schemes.
>
> Note that 'double-byte encoding scheme' is an utterly useless term for
> practical encoding purposes. It's a gross simplification of what is going
> on, especially with Han character sets. To do any practical work with
> non-Unicode, multi-byte character sets, you have to know the encoding
> scheme.
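
The "convert first, then substitute" approach described in the quoted advice might look roughly like this on Ruby 1.9 or later. It assumes the incoming bytes really are GB2312 (an assumption, since the thread never confirms what Watir returns), and the sample bytes are hypothetical.

```ruby
# encoding: utf-8
# Hypothetical input: "str" + GB2312 bytes for 汉字 + "end".
raw = "str\xBA\xBA\xD7\xD6end"

# Transcode to UTF-8, telling Ruby the source bytes are GB2312.
# On Ruby 1.8 the Iconv library would play this role, e.g.
#   require 'iconv'; Iconv.conv("UTF-8", "GB2312", raw)
utf8 = raw.encode("UTF-8", "GB2312")

# Now the string and the UTF-8 regexp literal agree, so gsub can match.
result = utf8.gsub(/汉字/, "hanzi")
puts result
```

If the source encoding is guessed wrongly, encode raises an error or produces the wrong characters, which matches the "invalid character" complaint reported above.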

From: Marnen Laibow-Koser on 28 Jan 2010 13:16

Benoit Daloze wrote:
> With ruby 1.9.2dev (2010-01-14 trunk 26319) [x86_64-darwin10.2.0]
> "strhanziend"
>
> It works fine. You need to set the encoding with "# encoding: utf-8" at
> the top of the file.
> In fact, 1.9.2 will complain if you don't.
>
> Ruby 1.8.6 is kind of outdated, but I think it at least works with
> "-Ku".

Ruby 1.8 isn't outdated. It just doesn't handle multibyte text that
well.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen(a)marnen.org
