From: Mik on 16 Apr 2008 13:00
I have files containing data and text in the Russian Windows encoding (CP1251). My current locale is UTF-8 (Linux). My Fortran program parses strings from these files and performs computations. I use a utility named 'recode' to convert the text to UTF-8. The Windows version of the program works without errors, but the Linux version can't parse the files, because in UTF-8 each Russian character occupies two bytes. What solutions are there?

Thanks
From: Mik on 16 Apr 2008 13:12
Mik wrote:
> I have files with data and text in Russian Windows encoding (CP1251). My
> current locale is UTF-8 (Linux). My Fortran program parses strings in
> files and produces computations. I use a utility named 'recode' to
> convert text to UTF-8. Windows version of program works without errors,
> but Linux version can't parse these files, because Russian Unicode
> characters place two bytes per symbol. Which solution is there?
>
> Thanks

The strings look approximately like this:

| абвгд | 1 | 23.45 | 67.89 | опрст |
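To make the byte-count issue concrete, here is a minimal sketch in Python (used only for illustration; the original program is Fortran). The sample row is modeled on the one in the post and may not match the real files, and the helper names `separator_offsets` and `split_fields` are hypothetical. It shows that the `|` separators sit at different byte offsets in CP1251 and UTF-8, which would break parsing by fixed byte positions, while splitting on the separator yields the same fields in both encodings.

```python
# Sample row modeled on the one in the post (an assumption about the real data).
ROW = "| абвгд | 1 | 23.45 | 67.89 | опрст |"

def separator_offsets(data: bytes) -> list:
    """Return the byte offset of every '|' separator in an encoded row."""
    return [i for i in range(len(data)) if data[i:i + 1] == b"|"]

def split_fields(data: bytes, encoding: str) -> list:
    """Split an encoded row on '|' and decode each field.

    Splitting on the raw byte 0x7C is safe in both encodings: in UTF-8,
    every byte of a multi-byte sequence is 0x80 or higher, and CP1251
    keeps ASCII in the lower half of the table.
    """
    parts = data.split(b"|")[1:-1]  # drop the empty pieces outside the outer bars
    return [p.decode(encoding).strip() for p in parts]

cp1251_row = ROW.encode("cp1251")  # one byte per Cyrillic letter
utf8_row = ROW.encode("utf-8")     # two bytes per Cyrillic letter

if __name__ == "__main__":
    print(len(cp1251_row), separator_offsets(cp1251_row))
    print(len(utf8_row), separator_offsets(utf8_row))
    print(split_fields(utf8_row, "utf-8"))
```

If the Fortran code reads fields by column position, the shifted offsets are a likely cause of the failure; reading by delimiter, or leaving the files in CP1251 and treating the text as opaque bytes, would avoid it.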
From: Terence on 17 Apr 2008 19:28
The whole problem is the 2-byte usage for Russian.

I provide software which runs in many left-to-right languages by providing external modules of message strings, in several languages, for each internal message in the program. Here I use ONLY one-byte symbols and select the appropriate Microsoft table for the language required. For Russian this would be the Cyrillic table. For Polish it's the Slavic table, and so on. For Greek I use a complete Greek table, not the 10 or so top-table physics notation set.

So one solution that occurs to me is:

Write a program to read the data file, detect the leading byte of the two-byte UTF-8 code (D0h = Cyrillic, for the Cyrillic coding throughout the data), and convert the second byte to a new byte corresponding to a 256-byte DOS Microsoft Cyrillic symbol table. Then use a single-byte Cyrillic table when reading Russian data, if this is possible in Linux, or else the nearest distinct Latin equivalent to make the text understandable (R.N P F...). It's obviously possible here in the Forum, as the Russian comes out readably.

Another solution is to look up the Russian-coded string internally, convert it to a word in your language of choice using single-byte symbols, and store it back in what was ample space for a now one-byte coded system.
From: Terence on 17 Apr 2008 19:34
I wrote a reply with two solutions. I don't see it.

I was about to comment that the first byte of UTF-8 for Cyrillic is D0h AND D1h, not just D0h as I stated.

The previous message SAYS it got posted the simple way. This time there's a different screen!
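The lead-byte approach described in this thread could be sketched roughly as follows, again in Python purely for illustration. Several things here are assumptions rather than anything stated in the posts: the target is CP1251 (the encoding of the original files) instead of a DOS Cyrillic table, only the basic Russian letters А..я plus Ё and ё are mapped, and the function name `utf8_cyrillic_to_cp1251` is hypothetical.

```python
def utf8_cyrillic_to_cp1251(data: bytes) -> bytes:
    """Collapse two-byte UTF-8 Cyrillic sequences into single CP1251 bytes.

    Only lead bytes D0h and D1h are handled. Every other byte, including
    plain ASCII, is copied through unchanged.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        lead = data[i]
        has_tail = i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF
        if lead in (0xD0, 0xD1) and has_tail:
            # Rebuild the code point from the 5 + 6 payload bits.
            code_point = ((lead & 0x1F) << 6) | (data[i + 1] & 0x3F)
            if 0x0410 <= code_point <= 0x044F:   # А..я map to C0h..FFh
                out.append(code_point - 0x0410 + 0xC0)
            elif code_point == 0x0401:           # Ё
                out.append(0xA8)
            elif code_point == 0x0451:           # ё
                out.append(0xB8)
            else:                                # outside this sketch's table
                out.append(ord("?"))
            i += 2
        else:
            out.append(lead)
            i += 1
    return bytes(out)

if __name__ == "__main__":
    sample = "| абвгд | 1 | 23.45 |".encode("utf-8")
    print(utf8_cyrillic_to_cp1251(sample))
```

For these characters the result should match what Python's built-in codecs produce via `data.decode("utf-8").encode("cp1251")`, so a hand-written table like this is mainly useful when the conversion has to live inside the parsing program itself.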
From: Gerry Ford on 17 Apr 2008 21:06
"Terence" <tbwright(a)cantv.net> wrote in message news:1fd9e9b5-0c7a-45eb-906f-e7c9d2db6bb2(a)f63g2000hsf.googlegroups.com...
> The whole problem is that 2-byte usage for Russian.

That's one of the problems. Another is that the wall that used to divide Berlin shifted east and has kept westerners, at least this westerner, from communicating with Gospodin Putin's Russia on the internet, in particular in newsgroups.

I could shed plenty of light on this question if the OP can help me, for example, use the Cyrillic keys on my keyboard. I could then replicate his data set and put his question in the crossfire.

--
"A belief in a supernatural source of evil is not necessary; men alone are quite capable of every wickedness."
~~ Joseph Conrad (1857-1924), novelist