From: Ben Bullock on 17 Jun 2008 18:41

On Sun, 15 Jun 2008 15:26:11 -0700, gtr wrote:

> I have a friend that sends me notes from his Windows box and text
> utilities in Japanese. I think this/his enclave still uses Shift-JIS
> for some of their text encoding. I am of the suspicion that generally
> the Mac lives in unicode-ville.
>
> So he sends some text I can't read.

> Does anyone know of similar utilities that will do this kind of thing
> alone, rather than being buried inside a much larger and more expensive
> program?

If you just want to read the text, look at the text file inside a web browser. Browsers can usually guess the encoding, but if it is still garbled you can use the "View/Character Encoding" option to set it to Shift-JIS.

If you want to convert a lot of files, get ActiveState Perl (it's free) and run the following script on it, as in

perl script garbled.txt

>>> start of script, delete all lines until # >>>
#! perl
use warnings;
use strict;
open my $garbled, "<:encoding(cp932)", $ARGV[0] or die $!;
binmode STDOUT, ":encoding(utf8)"; # pick another encoding if you prefer
while (<$garbled>) { print }
__END__
<<< end of script, delete this line and everything after

This will print the output onto the console. If you need to save it to a file,

perl script garbled.txt utf8out.txt

#! perl
use warnings;
use strict;
open my $garbled, "<:encoding(cp932)", $ARGV[0] or die $!;
open my $garbledout, ">:encoding(utf8)", $ARGV[1] or die $!;
while (<$garbled>) { print $garbledout $_ }

Note that the name of the variety of "Shift JIS" produced by Microsoft software like Microsoft Word is actually CP932 (code page 932), not "shift JIS" - if you use authentic "shift JIS" you'll get some errors with odd-bod characters, so it's safer to say "cp932".

-- 
sci.lang.japan FAQ: http://www.sljfaq.org/

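Since the scripts above each handle one file per run, here is a minimal sketch of how the same approach might be looped over several files. It is an illustration, not part of the original post: the sub name convert_file, the ".utf8.txt" output suffix, and the script name convertall.pl are all made up for the example, and it assumes every input really is CP932.

```perl
#! perl
use warnings;
use strict;

# Convert one CP932 file to UTF-8, writing the result to $out_name.
# (convert_file is a hypothetical helper name, not from the original post.)
sub convert_file {
    my ($in_name, $out_name) = @_;
    open my $in,  "<:encoding(cp932)", $in_name  or die "$in_name: $!";
    open my $out, ">:encoding(utf8)",  $out_name or die "$out_name: $!";
    while (<$in>) { print $out $_ }
    close $out or die "$out_name: $!";
    close $in;
    return $out_name;
}

# Convert every file named on the command line. The originals are left
# untouched; each result goes to "<name>.utf8.txt" (an assumed naming scheme).
for my $file (@ARGV) {
    my $converted = convert_file($file, "$file.utf8.txt");
    print "$file -> $converted\n";
}
```

On a Mac, where the shell expands wildcards, this could be run as "perl convertall.pl *.txt". The Windows command prompt does not expand "*.txt" itself, so there the file names would have to be listed one by one.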
From: Ben Bullock on 18 Jun 2008 07:52

On Wed, 18 Jun 2008 17:59:11 +0900, Paul D wrote:

> Windows' Notepad (their version of Textedit) couldn't even display
> Unicode text last time I checked.

The last time you checked being around 1998 or so?

> (chuckles sadly, shakes head)

(& doesn't check facts before posting)

-- 
sci.lang.japan FAQ: http://www.sljfaq.org/

From: gtr on 18 Jun 2008 11:15

On 2008-06-17 21:33:59 -0700, Jolly Roger <jollyroger(a)pobox.com> said:

>>> And _maybe_ TextWrangler would be better at guessing the encoding.
>>
>> I can't figure out how to make it work. I paste it Shift-JIS garble,
>> and can't figure out what to do to change the intended encoding. I
>> took a hard look at the manual too and can't figure it from there.
>> Looks like a cool program, though.
>
> Before pasting it into the new document, change the encoding of the new
> document to Shift-JIS.

Thanks, I just tried that. Didn't work. I culled text from a usenet posting, took it to TextWrangler, set the document for a few encodings, and pasted the text. It was identical every time.

I've done the same kind of thing in JEdit, and almost every time it changed it to a different garble; when the correct format was identified it corrected the garble. With TextWrangler, I change the format, paste the text, and regardless of the format it changes not one whit.

So I'm still thinking I'm doing it correctly.

-- 
Thank you and have a nice day.

From: gtr on 18 Jun 2008 13:13

On 2008-06-18 09:05:16 -0700, Jolly Roger <jollyroger(a)pobox.com> said:

>> So I'm still thinking I'm doing it correctly.
>
> I think I would try saving the post from your news reader to a text
> file. It's important that the text file maintain the encoding. Then
> open it in TextWrangler.

Have you actually done this: changed encodings from within TW?

-- 
Thank you and have a nice day.

From: gtr on 18 Jun 2008 15:33

On 2008-06-18 09:05:16 -0700, Jolly Roger <jollyroger(a)pobox.com> said:

[ For bystanders on sci.lang.japan, this discussion involves Mac programs and OS alone. ]

> I think I would try saving the post from your news reader to a text
> file. It's important that the text file maintain the encoding. Then
> open it in TextWrangler.

I'm being thorough: I saved the post as is (from sci.lang.japan, subject: 新たに発見された、木簡上の和歌について), from my newsreader (Unison) to the desktop. I run TextWrangler and, using the menu item Open, I open the document using "auto-detect". I then Open it toggling the encoding to ISO2022 (my first guess), and--wow! it opens beautifully. If I toggle the item at the bottom of the frame to UTF8 and save it that way, it will now be treated as a utf8 document.

But most of the time I'm not opening documents out of usenet, but clipping text from emails, id3 tags, usenet posts, the web and so on. So I clip some text from this same message while perusing it garbled on usenet. I can paste this into TextEdit, guess that it's iso2022 and save it that way, then open this document in TextWrangler--if I overrule the inoperable "auto-detect" and tell it explicitly that it's iso2022.

But that seems like a lot of trouble. I can do all that with TextEdit, with the requisite save/reload processes. I was looking for a program that would allow me, inexpensively, to simply toggle inaccurate encoding to one that is readable, all while in the program. Paste, shift encoding, review, discard. That's what I really would like.

JEdit can do that, but at $28, with lots more bells and whistles than I need. TextWrangler, as well as TextEdit, can't do this without a few special saves, guessing about the encoding, and explicit Open commands. That's my take.

If you use TextWrangler, I have a question: How do you enlarge and reduce the display font? They've hijacked the semi-official cmd-= and shift-cmd-- for changing display font and are using it for search/replace options.

-- 
Thank you and have a nice day.
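
The "paste, shift encoding, review, discard" loop described above can be roughly approximated from the command line. The following is only a sketch under stated assumptions: it uses Perl's standard Encode module, the sub name decodable_as and the list of candidate encodings are invented for the example, and the clipped text is assumed to have been saved byte-for-byte to a file first. It prints the text under each candidate encoding that decodes without error, so the results can be compared by eye.

```perl
#! perl
use warnings;
use strict;
use Encode qw(decode);

# Return the candidate encodings under which $bytes decodes without error.
# A clean decode is necessary for a correct guess, but it is not proof of one.
# (decodable_as is a hypothetical helper name.)
sub decodable_as {
    my ($bytes, @candidates) = @_;
    my @ok;
    for my $name (@candidates) {
        my $copy = $bytes;    # decode() may modify its input when checking
        my $text = eval { decode($name, $copy, Encode::FB_CROAK) };
        push @ok, $name if defined $text;
    }
    return @ok;
}

# With a file name on the command line, show the text under every encoding
# that decoded cleanly. The candidate list here is an assumption.
if (@ARGV) {
    open my $in, "<:raw", $ARGV[0] or die "$ARGV[0]: $!";
    my $bytes = do { local $/; <$in> };
    close $in;
    binmode STDOUT, ":encoding(utf8)";
    for my $name (decodable_as($bytes, qw(iso-2022-jp cp932 euc-jp UTF-8))) {
        my $copy = $bytes;
        print "=== $name ===\n", decode($name, $copy), "\n";
    }
}
```

More than one candidate can pass. ISO-2022-JP text, for example, is 7-bit, so it also decodes without error as CP932 but comes out as garble, which is why the final choice still has to be made by looking at the output.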