From: dutone on 23 Apr 2008 17:29

I have some text files that were saved in Windows as ASCII which, unfortunately, causes the text file to contain non-control chars in the range that iso-8859-1 defines control chars.

iconv and recode do not convert or drop these 1252 codes (145,146, and 147) to the appropriate iso-8859-1 equivalents and instead give me garbage.

Is there a utility that I can use to convert the chars appropriately?

From: dutone on 23 Apr 2008 17:43

On Apr 23, 2:29 pm, dutone <dut...(a)hotmail.com> wrote:
> I have some text files that were saved in Windows as ASCII which,
> unfortunately, causes the text file to contain non-control chars in
> the range that iso-8859-1 defines control chars.
>
> iconv and recode do not convert or drop these 1252 codes (145,146, and
> 147) to the appropriate iso-8859-1 equivalents and instead give me
> garbage.
>
> Is there a utility that I can use to convert the chars appropriately?

Note that I can do this via Perl or Sed via

    perl -pe"s/\x92/'/g"

But was wondering if there was an existing util and/or why iconv and recode don't convert when possible.

From: Lew Pitcher on 23 Apr 2008 17:47

In comp.unix.shell, dutone wrote:
> I have some text files that were saved in Windows as ASCII which,
> unfortunately, causes the text file to contain non-control chars in
> the range that iso-8859-1 defines control chars.

That would be impossible to do with /ASCII/. I'm sure that you mean that you saved the text files in the CP1252 characterset (/not/ the ASCII characterset), and are having problems converting from CP1252 to ISO-8859-1

> iconv and recode do not convert or drop these 1252 codes (145,146, and
> 147)

Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If the character value exceeds 127, then you /don't/ have ASCII

> to the appropriate iso-8859-1 equivalents and instead give me
> garbage.
>
> Is there a utility that I can use to convert the chars appropriately?

In CP1252, character 145 is LEFT SINGLE QUOTATION MARK, character 146 is RIGHT SINGLE QUOTATION MARK, and character 147 is LEFT DOUBLE QUOTATION MARK (courtesy of the ISO Internationalization working group's characterset map at http://anubis.dkuug.dk/i18n/charmaps/CP1252 )

In ISO-8859-1 (http://anubis.dkuug.dk/i18n/charmaps/ISO_8859-1) there doesn't seem to be a corresponding character (codepoint) for any of those three characters. By rights, they all should map to the 0x1a (SUB) character.

I know of no utility save iconv that would convert these for you. Perhaps you can convert in two stages: CP1252 to Unicode, and Unicode to ISO-8859-1.

Luck be with you
--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------

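The two-stage idea suggested above (CP1252 to Unicode, then Unicode to ISO-8859-1) can be tried with a minimal Python sketch. It assumes Python's built-in `cp1252` and `iso-8859-1` codecs; the variable names are illustrative and do not come from the thread. The strict second stage fails because ISO-8859-1 has no curly quotes, and the lossy fallback shown here substitutes `?`, which is what Python's codec does and not a statement about how iconv or recode behave.

```python
import unicodedata

# The three CP1252 codes mentioned in the thread.
raw = bytes([145, 146, 147])

# Stage 1: CP1252 -> Unicode. Each byte becomes a typographic quote.
text = raw.decode("cp1252")
names = [unicodedata.name(ch) for ch in text]
for byte, ch, name in zip(raw, text, names):
    print(byte, f"U+{ord(ch):04X}", name)

# Stage 2: Unicode -> ISO-8859-1. A strict encode is expected to fail,
# because the target character set has no code point for these quotes.
try:
    text.encode("iso-8859-1")
    stage2_ok = True
except UnicodeEncodeError:
    stage2_ok = False

# Lossy fallback: the codec substitutes its own replacement character.
fallback = text.encode("iso-8859-1", errors="replace")
print(stage2_ok, fallback)
```

So the two-stage route by itself only relocates the problem: something still has to decide what the quotes become in the target character set.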
From: dutone on 23 Apr 2008 19:25

On Apr 23, 2:47 pm, Lew Pitcher <lpitc...(a)teksavvy.com> wrote:
> In comp.unix.shell, dutone wrote:
> > I have some text files that were saved in Windows as ASCII which,
> > unfortunately, causes the text file to contain non-control chars in
> > the range that iso-8859-1 defines control chars.
>
> That would be impossible to do with /ASCII/. I'm sure that you mean that you
> saved the text files in the CP1252 characterset (/not/ the ASCII
> characterset), and are having problems converting from CP1252 to ISO-8859-1

I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save it as iso-8859-1, rather 1252.

> > iconv and recode do not convert or drop these 1252 codes (145,146, and
> > 147)
>
> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
> the character value exceeds 127, then you /don't/ have ASCII

I would expect a Windows-1252 to iso-8859-1 conversion to replace 145 and 146 with 39, and 147 with 34.

Guess I'm sticking with Perl for the conversion.

Thanks.

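The substitution described above (145 and 146 become 39, 147 becomes 34) can be written out as an explicit table. This is a sketch in Python, assuming the built-in `cp1252` and `iso-8859-1` codecs; `QUOTE_MAP` and `cp1252_to_latin1` are hypothetical names, and the table covers only the three codes named in the thread. The final encode is left strict on purpose, so any other CP1252-only character (the euro sign at 128, for example) raises an error instead of being silently altered.

```python
# Hypothetical mapping from the typographic quotes to their ASCII stand-ins.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # CP1252 145 -> 39 (apostrophe)
    "\u2019": "'",   # CP1252 146 -> 39 (apostrophe)
    "\u201c": '"',   # CP1252 147 -> 34 (double quote)
})


def cp1252_to_latin1(data: bytes) -> bytes:
    """Decode CP1252 bytes, flatten the mapped quotes, re-encode as ISO-8859-1."""
    text = data.decode("cp1252")
    # Strict encode: unmapped CP1252-only characters raise UnicodeEncodeError.
    return text.translate(QUOTE_MAP).encode("iso-8859-1")


if __name__ == "__main__":
    sample = bytes([147]) + b"it" + bytes([146]) + b"s"
    print(cp1252_to_latin1(sample))
```

Characters that both encodings share, such as the accented letters from 160 upward, pass through unchanged.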
From: Gary Johnson on 23 Apr 2008 19:55

dutone <dutone(a)hotmail.com> wrote:
> On Apr 23, 2:47 pm, Lew Pitcher <lpitc...(a)teksavvy.com> wrote:
>> In comp.unix.shell, dutone wrote:
>> > I have some text files that were saved in Windows as ASCII which,
>> > unfortunately, causes the text file to contain non-control chars in
>> > the range that iso-8859-1 defines control chars.
>>
>> That would be impossible to do with /ASCII/. I'm sure that you mean that you
>> saved the text files in the CP1252 characterset (/not/ the ASCII
>> characterset), and are having problems converting from CP1252 to ISO-8859-1
>
> I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save
> it as iso-8859-1, rather 1252.
>
>> > iconv and recode do not convert or drop these 1252 codes (145,146, and
>> > 147)
>>
>> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
>> the character value exceeds 127, then you /don't/ have ASCII
>
> I would expect a Windows-1252 to iso-8859-1 conversion to replace
> 145 and 146 with 39, and 147 with 34.
>
> Guess I'm sticking with Perl for the conversion.

You can use iconv for this, but you have to add the //TRANSLIT suffix, like this:

    iconv -c -f windows-1252 -t iso-8859-1//TRANSLIT

That tells iconv to choose a symbol from the output character set that is close to the desired symbol.

--
Gary Johnson