From: Ben Bullock on
On Sun, 15 Jun 2008 15:26:11 -0700, gtr wrote:

> I have a friend that sends me notes from his Windows box and text
> utilities in Japanese. I think this/his enclave still uses Shift-JIS
> for some of their text encoding. I am of the suspicion that generally
> the Mac lives in unicode-ville.
>
> So he sends some text I can't read.

> Does anyone know of similar utilities that will do this kind of thing
> alone, rather than being buried inside a much larger and more expensive
> program?

If you just want to read the text, open the text file in a web
browser. Browsers can usually guess the encoding, but if the text is
still garbled you can use the "View/Character Encoding" menu to set it
to Shift-JIS.

If you want to convert a lot of files, get ActiveState Perl (it's free)
and run the following script on each file, like this:

perl script garbled.txt

>>> start of script, delete all lines until # >>>

#! perl
use warnings;
use strict;
open my $garbled, "<:encoding(cp932)", $ARGV[0] or die $!;
binmode STDOUT, ":encoding(utf8)"; # pick another encoding if you prefer
while (<$garbled>) { print }
__END__
<<< end of script, delete this line and everything after

This prints the output to the console. If you need to save it to a
file, use the second script below and run it as

perl script garbled.txt utf8out.txt

#! perl
use warnings;
use strict;
open my $garbled, "<:encoding(cp932)", $ARGV[0] or die $!;
open my $garbledout, ">:encoding(utf8)", $ARGV[1] or die $!;
while (<$garbled>) { print $garbledout $_ }
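
For more than a handful of files, the same idea can be wrapped in a
loop. This is only a minimal sketch: it assumes every input file is
CP932, and the ".utf8.txt" suffix and the convert_file name are made up
for the example, not part of the scripts above.

```perl
#! perl
use warnings;
use strict;

# Convert one CP932 file to UTF-8 and return the output file name.
# The ".utf8.txt" suffix is an arbitrary choice for this sketch.
sub convert_file {
    my ($in) = @_;
    my $out = "$in.utf8.txt";
    open my $garbled, "<:encoding(cp932)", $in or die "$in: $!";
    open my $fixed, ">:encoding(utf8)", $out or die "$out: $!";
    while (<$garbled>) { print $fixed $_ }
    close $fixed or die "$out: $!";
    return $out;
}

# Convert every file named on the command line.
print convert_file($_), "\n" for @ARGV;
```

Run it as "perl script one.txt two.txt"; each original is left
untouched and a converted copy is written next to it.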

Note that the variety of "Shift JIS" produced by Microsoft software
such as Microsoft Word is actually CP932 (code page 932), not plain
"Shift JIS". If you decode it as authentic "Shift JIS" you'll get
errors on some of the more unusual characters, so it's safer to say
"cp932".
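
The difference can be seen with a single character. The sketch below
assumes Perl's standard Encode module and uses the bytes 0x87 0x40,
which CP932 maps to the circled digit one (U+2460). That character is a
vendor extension, so the stricter "shiftjis" table should not decode it
to the same code point; the exact substitute it prints instead may vary
between Encode versions.

```perl
#! perl
use warnings;
use strict;
use Encode qw(decode);

# Bytes 0x87 0x40 are the circled digit one (U+2460) in CP932, a
# vendor extension that is not part of JIS X 0208.
my $bytes = "\x87\x40";

for my $name ("cp932", "shiftjis") {
    # With the default CHECK value, bytes that cannot be mapped are
    # replaced with a substitution character instead of dying.
    my $text = decode($name, $bytes);
    printf "%-8s => %s\n", $name,
        join " ", map { sprintf("U+%04X", ord $_) } split //, $text;
}
```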

--
sci.lang.japan FAQ: http://www.sljfaq.org/
From: Ben Bullock on
On Wed, 18 Jun 2008 17:59:11 +0900, Paul D wrote:

> Windows' Notepad (their version of Textedit) couldn't even display
> Unicode text last time I checked.

The last time you checked being around 1998 or so?

> (chuckles sadly, shakes head)

(& doesn't check facts before posting)

--
sci.lang.japan FAQ: http://www.sljfaq.org/
From: gtr on
On 2008-06-17 21:33:59 -0700, Jolly Roger <jollyroger(a)pobox.com> said:

>>> And _maybe_ TextWrangler would be better at guessing the encoding.
>>
>> I can't figure out how to make it work. I paste in Shift-JIS garble,
>> and can't figure out what to do to change the intended encoding. I
>> took a hard look at the manual too and can't figure it from there.
>> Looks like a cool program, though.
>
> Before pasting it into the new document, change the encoding of the new
> document to Shift-JIS.

Thanks, I just tried that. Didn't work. I culled text from a usenet
posting, took it to TextWrangler, set the document for a few encodings,
pasted the text. It was identical every time. I've done the same kind
of thing in JEdit, and almost every time it changed it to a different
garble; when the correct format was identified it corrected the garble.
With TextWrangler, I change the format, paste the text and regardless
of the format it changes not one whit.

So I still think I'm doing it correctly.
--
Thank you and have a nice day.

From: gtr on
On 2008-06-18 09:05:16 -0700, Jolly Roger <jollyroger(a)pobox.com> said:

>> So I still think I'm doing it correctly.
>
> I think I would try saving the post from your news reader to a text
> file. It's important that the text file maintain the encoding. Then
> open it in TextWrangler.

Have you actually done this; changed encodings from within TW?
--
Thank you and have a nice day.

From: gtr on
On 2008-06-18 09:05:16 -0700, Jolly Roger <jollyroger(a)pobox.com> said:

[ For bystanders on sci.lang.japan, this discussion involves Mac
programs and OS alone. ]

> I think I would try saving the post from your news reader to a text
> file. It's important that the text file maintain the encoding. Then
> open it in TextWrangler.

I'm being thorough:

I saved the post as is (from sci.lang.japan subject:
新たに発見された、木簡上の和歌について ), from my newsreader (Unison) to the desktop.

I run TextWrangler and, using the menu item Open, open the document
using "auto-detect". I then open it again, toggling the encoding to
ISO2022 (my first guess), and--wow! it opens beautifully. If I toggle
the item at the bottom of the frame to UTF8 and save it that way, it
will now be treated as a utf8 document.

But most of the time I'm not opening documents out of usenet, but
clipping text from emails, id3 tags, usenet posts, the web and so on.
So I clip some text from this same message while perusing it garbled on
usenet. I can paste this into TextEdit, guess that it's iso2022 and
save it that way, then open this document in TextWrangler--if I
overrule the inoperable "auto-detect" and tell it explicitly that it's
iso2022. But that seems like a lot of trouble. I can do all that with
TextEdit, with the requisite save/reload processes.

I was looking for a program that would allow me, inexpensively, to
simply toggle an inaccurate encoding to one that is readable, all while
in the program. Paste, shift encoding, review, discard. That's what I
really would like. JEdit can do that, but at $28 with lots more bells
and whistles than I need. TextWrangler, as well as TextEdit, can't do
this without a few special saves, guessing about the encoding, and
explicit Open commands. That's my take.
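
The "shift encoding, review" step can be approximated from a terminal
with Perl's Encode module. This is a minimal sketch, not a replacement
for an editor: it assumes the garbled bytes have been saved unmodified
to a file, and both the candidate list and the try_encodings name are
arbitrary choices made for the example.

```perl
#! perl
use warnings;
use strict;
use Encode qw(decode);

# Candidate encodings to try; an arbitrary list for this sketch.
my @candidates = ("iso-2022-jp", "cp932", "euc-jp", "utf8");

# Decode the same raw bytes under each candidate so the results
# can be compared by eye. Returns a list of [name, text] pairs.
sub try_encodings {
    my ($bytes) = @_;
    return map { [ $_, decode($_, $bytes) ] } @candidates;
}

if (@ARGV) {
    # Read the whole file as raw bytes, with no decoding layer.
    open my $in, "<:raw", $ARGV[0] or die "$ARGV[0]: $!";
    my $bytes = do { local $/; <$in> };
    binmode STDOUT, ":encoding(utf8)";
    for my $try (try_encodings($bytes)) {
        print "== $try->[0] ==\n$try->[1]\n";
    }
}
```

Run it as "perl script saved.txt" and pick whichever section reads as
Japanese. Text that was already mis-decoded before being copied to the
clipboard may not be recoverable this way.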

If you use TextWrangler, I have a question: How do you enlarge and
reduce the display font? They've hijacked the semi-official cmd-= and
shift-cmd-- for changing display font and are using them for
search/replace options.
--
Thank you and have a nice day.