How to get unicode range of some languages. [General Programming]


From: Unicode on 23 Jun 2010 04:59

I'm trying to display text in different languages with OpenGL. To
support the whole Unicode character set, I would have to create 65536
bitmaps. That would consume a huge amount of memory and is not practical
at all. My idea now is to create only the bitmaps required for the
current system language.

For example, if the current system language is Russian, I'll create
bitmaps for 0x400 to 0x4ff.

I don't know the maximum Unicode range needed to support the following
languages:
German, Italian, French, Spanish, Danish, Swedish, Norwegian, Finnish

In order to support a single language at a time, I have to know the
Unicode character range of each of these languages.

Thanks in advance,

From: Jussi Piitulainen on 23 Jun 2010 05:23

Unicode writes:

> I'm trying to display text in different languages with OpenGL. To
> support the whole Unicode character set, I would have to create 65536
> bitmaps. That would consume a huge amount of memory and is not practical
> at all. My idea now is to create only the bitmaps required for the
> current system language.
>
> For example, if the current system language is Russian, I'll create
> bitmaps for 0x400 to 0x4ff.
>
> I don't know the maximum Unicode range needed to support the following
> languages:
> German, Italian, French, Spanish, Danish, Swedish, Norwegian, Finnish
>
> In order to support a single language at a time, I have to know the
> Unicode character range of each of these languages.

Surely you are looking for a minimum range.

The eight-bit code Latin-1 (iso-8859-1) is almost sufficient for these
languages. Add the few extra characters from Latin-9 (iso-8859-15) to
be even closer. One of these extra characters is the euro symbol.
Others are just a few letters and punctuation.

I believe Unicode assigns the same codes as Latin-1 to the characters
that are in Latin-1.

Some of the letters in these sets are rare, but the users of Latin-1
or Latin-9 would still assume that they can use them when needed.
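
As a rough illustration of the Latin-1 plus Latin-9 suggestion, the set of code points to pre-render could be built as below. This is only a sketch: the function names are made up for this example, and restricting Latin-1 to its printable positions (0x20-0x7E and 0xA0-0xFF) is an assumption, not something stated in the thread. The eight extra code points are where Unicode places the characters that ISO 8859-15 has and ISO 8859-1 lacks.

```python
# Unicode code points of the eight characters that Latin-9 (ISO 8859-15)
# contains and Latin-1 (ISO 8859-1) does not.
LATIN9_EXTRAS = {
    0x20AC,          # euro sign
    0x0160, 0x0161,  # S with caron, upper and lower case
    0x017D, 0x017E,  # Z with caron, upper and lower case
    0x0152, 0x0153,  # OE ligature, upper and lower case
    0x0178,          # Y with diaeresis
}


def western_european_code_points():
    """Sorted code points to pre-render (hypothetical helper).

    Assumption: only the printable Latin-1 positions are wanted, which
    Unicode assigns to the same numbers, plus the Latin-9 additions.
    """
    printable_latin1 = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))
    return sorted(printable_latin1 | LATIN9_EXTRAS)


def is_covered(text):
    """True if every character of text is in the pre-rendered set."""
    points = set(western_european_code_points())
    return all(ord(ch) in points for ch in text)
```

Note that the result is a set of code points, not one contiguous range: most of it lies below 0x100, but the euro sign sits at 0x20AC.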

From: Pascal J. Bourguignon on 24 Jun 2010 06:22

Jussi Piitulainen <jpiitula(a)ling.helsinki.fi> writes:

> Unicode writes:
>
>> I'm trying to display text in different languages with OpenGL. To
>> support the whole Unicode character set, I would have to create 65536
>> bitmaps. That would consume a huge amount of memory and is not practical
>> at all. My idea now is to create only the bitmaps required for the
>> current system language.
>>
>> For example, if the current system language is Russian, I'll create
>> bitmaps for 0x400 to 0x4ff.
>>
>> I don't know the maximum Unicode range needed to support the following
>> languages:
>> German, Italian, French, Spanish, Danish, Swedish, Norwegian, Finnish
>>
>> In order to support a single language at a time, I have to know the
>> Unicode character range of each of these languages.
>
> Surely you are looking for a minimum range.
>
> The eight-bit code Latin-1 (iso-8859-1) is almost sufficient for these
> languages. Add the few extra characters from Latin-9 (iso-8859-15) to
> be even closer. One of these extra characters is the euro symbol.
> Others are just a few letters and punctuation.
>
> I believe Unicode assigns the same codes as Latin-1 to the characters
> that are in Latin-1.
>
> Some of the letters in these sets are rare, but the users of Latin-1
> or Latin-9 would still assume that they can use them when needed.

But even if the current language is English, I should be able to write
in English that "Résumé" is a French word and that "Здравствуйте" means
"Hello", and that 聽龍 is my Chinese name.

Instead of pre-computing maps per language, which would be an
artificial limitation, use a cache to map the characters of the text
at hand.
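
A minimal sketch of such a cache, assuming some rendering function is available: `GlyphCache` and `fake_render` are hypothetical names invented for this example, and the stand-in renderer only returns a label where a real program would rasterise the glyph.

```python
class GlyphCache:
    """Renders each character at most once, the first time it is needed."""

    def __init__(self, render):
        self._render = render   # hypothetical callable: character -> bitmap
        self._bitmaps = {}      # character -> bitmap already rendered

    def get(self, ch):
        # Render on first use only; later requests hit the dictionary.
        if ch not in self._bitmaps:
            self._bitmaps[ch] = self._render(ch)
        return self._bitmaps[ch]

    def bitmaps_for(self, text):
        return [self.get(ch) for ch in text]

    def __len__(self):
        return len(self._bitmaps)


def fake_render(ch):
    # Stand-in for a real rasteriser; returns a label instead of pixels.
    return f"bitmap<U+{ord(ch):04X}>"


cache = GlyphCache(fake_render)
cache.bitmaps_for("R\u00e9sum\u00e9")   # renders five distinct characters
```

Memory then grows with the number of distinct characters actually displayed, whatever language they come from.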

--
__Pascal Bourguignon__ http://www.informatimago.com/

From: Jongware on 24 Jun 2010 11:11

On 24-Jun-10 12:22 PM, Pascal J. Bourguignon wrote:
> But even if the current language is English, I should be able to write
> in English that "Résumé" is a French word and that "Здравствуйте" means
> "Hello", and that 聽龍 is my Chinese name.

"Obey the Emperor". Not really a literal translation, is it? :-D

I think the OP wants to avoid having to create 65,536 separate bitmaps
(not the actual number of Unicode glyphs, but it may come close). A
reasonable alternative could be to create bitmaps per _Unicode block_.

Unicode is language-oblivious, and the blocks are not constructed with
/language groups/ in mind.
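
To make the per-block idea concrete, here is a small sketch that looks up the block of a character. The table is a hand-picked excerpt of the Unicode block list, chosen as an assumption for this example; a real program would load the complete list from the Unicode Character Database. `block_of` and `blocks_needed` are hypothetical helper names.

```python
from bisect import bisect_right

# Excerpt of the Unicode block list: (first, last, name), sorted by start.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(ch):
    """Name of the block containing ch, or None if it is not in the excerpt."""
    cp = ord(ch)
    i = bisect_right(_STARTS, cp) - 1   # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None


def blocks_needed(text):
    """Set of block names a piece of text draws from."""
    return {block_of(ch) for ch in text} - {None}
```

Even a single French word such as "Résumé" draws from two blocks (Basic Latin and Latin-1 Supplement), which shows why blocks do not line up with languages.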

[Jw]

From: BGB / cr88192 on 24 Jun 2010 16:39

"Unicode" <santhosh4g(a)gmail.com> wrote in message
news:e3ed3966-cf50-4bfd-9714-3c3731de146b(a)y11g2000yqm.googlegroups.com...
> I'm trying to display text in different languages with OpenGL. To
> support the whole Unicode character set, I would have to create 65536
> bitmaps. That would consume a huge amount of memory and is not practical
> at all. My idea now is to create only the bitmaps required for the
> current system language.
>
> For example, if the current system language is Russian, I'll create
> bitmaps for 0x400 to 0x4ff.
>
> I don't know the maximum Unicode range needed to support the following
> languages:
> German, Italian, French, Spanish, Danish, Swedish, Norwegian, Finnish
>
> In order to support a single language at a time, I have to know the
> Unicode character range of each of these languages.
>

well, here is how I handled all this:
create a texture representing a block of characters (using character blocks,
rather than individual characters, puts less strain on GL).

for example, use texture blocks of 16x16 characters, so that a single
texture holds 256 characters (if each char is 16x16 pixels, this means a
256x256 texture).

that way, the entire BMP (the Basic Multilingual Plane, 65536 code points)
can be handled with 256 textures.

now, what happens when one tries to draw a character?...
it looks up the correct character block texture, and if none is present, it
creates it;
it draws a quad using the ST coords for the correct spot in the character
block.
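
The lookup described above can be sketched as follows. It assumes that block N holds code points N*256 to N*256+255 and that cells are laid out row by row from the texture origin; the thread does not specify the layout, and `locate` is a hypothetical name.

```python
CELLS_PER_ROW = 16                              # 16 x 16 = 256 chars per texture
CELL_PIXELS = 16                                # each cell is 16 x 16 pixels
TEXTURE_PIXELS = CELLS_PER_ROW * CELL_PIXELS    # 256 x 256 texture


def locate(code_point):
    """Return (block, (s0, t0, s1, t1)) for a code point in the BMP."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("this scheme only covers the BMP")
    block = code_point >> 8         # which of the 256 textures
    cell = code_point & 0xFF        # position inside that texture
    col, row = cell % CELLS_PER_ROW, cell // CELLS_PER_ROW
    step = 1.0 / CELLS_PER_ROW      # width of one cell in ST space
    s0, t0 = col * step, row * step
    return block, (s0, t0, s0 + step, t0 + step)
```

For example, `locate(0x0416)` selects block 4 and the cell in column 6, row 1. Whether `t` runs up or down depends on how the texture was filled, so the row direction may need flipping in a real renderer.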

now, what happens when creating a block:
create new texture buffer;
draw in all the characters for this block;
send it into GL.
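
A sketch of the create-on-demand step, under stated assumptions: `rasterise` and `upload` are hypothetical callables supplied by the caller (in a real program the upload would be an OpenGL texture call such as `glTexImage2D`), glyphs are 16 rows of 16 one-byte pixels, and `BlockAtlas` is a name invented for this example.

```python
class BlockAtlas:
    """Builds a 256x256 single-channel texture per block, on first use."""

    def __init__(self, rasterise, upload):
        self._rasterise = rasterise   # hypothetical: code point -> 16 rows of 16 bytes
        self._upload = upload         # hypothetical: pixel bytes -> texture handle
        self._textures = {}           # block number -> texture handle

    def texture_for(self, code_point):
        block = code_point >> 8
        if block not in self._textures:
            # First character seen from this block: build and upload it.
            self._textures[block] = self._upload(self._build(block))
        return self._textures[block]

    def _build(self, block):
        pixels = bytearray(256 * 256)             # new texture buffer
        for cell in range(256):                   # draw in all 256 characters
            glyph = self._rasterise((block << 8) | cell)
            x0, y0 = (cell % 16) * 16, (cell // 16) * 16
            for y, row in enumerate(glyph):
                offset = (y0 + y) * 256 + x0
                pixels[offset:offset + 16] = row  # copy one 16-pixel row
        return bytes(pixels)
```

Only blocks that are actually drawn from ever get built, so text in a single script costs one or two textures rather than 256.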

so, no real problem, and no real need to try to figure out in advance which
language ranges to expect.

or such...
