From: robertwessel2 on 11 Apr 2007 18:36

On Apr 11, 8:44 am, "Mike Gonta" <e...(a)mikegonta.com> wrote:
> In English hexadecimal notation uses the first 6 characters of the
> Latin character set ( A-F ) to represent the values 10 to 15.
> If your native language uses a different character set, for example
> Cyrillic or Greek, do you use the first 6 characters of that set or:
>
> Is hex an ascii thing?

Almost all programming is done with a character set that includes at least the uppercase Latin letters, even when done by speakers of human languages where that's not entirely natural. A number of languages (or their implementations) have been fairly liberal in what they accept as variable names and whatnot, so I've seen Cobol with Katakana variable and procedure names. Java actually lets you use almost any Unicode character, which allows some interesting possibilities.

The A-F convention has become fairly universal, but there have certainly been others. U-Z has been used for the extra digits, as has 0-5 with a bar over or under the digit. Also, ASCII is not really the right term; hex in EBCDIC uses the same convention, although with entirely different code points.
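[Editor's note: the "same convention, different code points" remark can be made concrete with a small C sketch (hexval is an illustrative name, not a standard function). Because '0'-'9' and 'A'-'F' happen to be contiguous runs in both ASCII and EBCDIC, the usual digit-value arithmetic is portable to either encoding, even though the actual code points differ.]

```c
/* Value of one hex digit, or -1 if the character is not one.
   '0'..'9', 'A'..'F', and 'a'..'f' are each contiguous runs in
   both ASCII and EBCDIC, so this arithmetic works on either
   encoding despite the code points being entirely different. */
int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}
```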
From: Evenbit on 11 Apr 2007 22:14

On Apr 11, 5:28 pm, "Jim Carlock" <anonym...(a)127.0.0.1> wrote:
> "[Jongware]" wrote...
> > "Jim Carlock" posted...
> : News reader: Outlook Express
> : Greek Capital Letter Gamma...
> : G
> :
> : Accessories\System Tools\Character Map.
> : Set the Font to Arial.
> : Scroll down to U+0393 Greek Capital Letter Gamma, select it
> : by clicking on it, then Click on the Select button.
> : Click on the Copy button.
> :
> : Not sure if this works or not. Will see.
>
> Well it worked before I pressed the Send button. Seems to require
> a Greek newsgroup with HTML encoding to be able to handle the
> extra characters.
>
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9...
> 10 ? ? ? ? ? ¤ ? ? ? ?
> 20 ¶ § ? ? ? ? ? ? ? ?
> 30 ? ? !

I have to say that I am a tad disappointed. Knowing your interest level for "all things assembly" I assumed you would write some 'code' to help you conduct these tests. ;-) You give me no choice but to plop some HLA fodder here for Rene (and gang) to feast on:

program chartest;
#include( "stdlib.hhf" )  // change the conv.bToStr line below if you use StdLib 2.0
static
    s    :string;
    buff :byte[16];
begin chartest;
    str.init( buff, 16 );
    mov( eax, s );
    stdout.put( " Dec Hex Char" nl );
    stdout.put( "----+----+----" nl );
    mov( 32, cl );
double_ampersand:
    stdout.puts( " " );
    stdout.putu8( cl );
    stdout.puts( " " );
    conv.bToStr( cl, 2, ' ', s );  // in StdLib 2.0 use byteToHex()
    stdout.puts( s );
    stdout.puts( " " );
    stdout.putc( cl );
    stdout.newln();
    inc( cl );
    jnz double_ampersand;
end chartest;

Nathan.
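[Editor's note: for readers without an HLA toolchain, a rough C sketch of the same character-table test (an editor's translation, not Nathan's code; chart_row is a made-up helper name). One function formats a "Dec Hex Char" row; looping it from 32 through 255 reproduces the table.]

```c
#include <stdio.h>

/* Format one "Dec Hex Char" row into buf, mirroring the HLA
   program's columns: decimal value, two-digit hex, raw character.
   c is expected to be in the range 0..255. */
void chart_row(char *buf, size_t n, unsigned c)
{
    snprintf(buf, n, " %3u  %02X   %c", c, c, (char)c);
}

/* The full table would then be:
   for (unsigned c = 32; c < 256; c++) {
       char b[32];
       chart_row(b, sizeof b, c);
       puts(b);
   }
*/
```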
From: Wolfgang Kern on 11 Apr 2007 23:24

Mike Gonta wrote:

>> I'm not Greek, but I think also Greek and Russian programmers
>> use ASCII a..f (or A..F) in hexadecimal notations.
>
> I'm not Greek, but I use the occasional Greek letter in mathematical
> notation.

Me too. But my CPU won't understand trigonometric formulas anyway. So it's up to me (or anyone who writes math programs) to convert mathematical expressions into binary code.

> Are we using ASCII as a convention or due to the lack of Unicode
> support in programming?

Why make things easy when they could be done more complicated? :)
__
wolfgang
From: cr88192 on 12 Apr 2007 03:25

"Mike Gonta" <email(a)mikegonta.com> wrote in message news:1176325076.591812.238630(a)y80g2000hsf.googlegroups.com...
> "[Jongware]" <IdontWantS...(a)hotmail.com> wrote:
>
>> How about that? The string "0x1A" in Unicode is still recognizable by an
>> extended atoi() function.
>
> That's because 7-bit ASCII is a proper subset of Unicode.
>
>> Replace the character 'A' with any other first
>> character in another alphabet and it is not.
>
> My point exactly.
>
>> Maybe we should stick to the de facto definition of a hex number as
>> "consisting of 0 to 9 and A to F".
>
> Yes indeed. The numerals and the first 6 letters of the alphabet.
> But why must the characters be English?
> Worse still is the lack of agreement on the indicator that the number
> is in fact hexadecimal.

the letters are not "english". this alphabet has been a de-facto standard in western europe for a very long time. your german or french speaker will have no trouble figuring out this one.

and for everyone else, why do they need to vary the convention for their own language? more so, why would that even be a good thing? since the characters are from a different alphabet, they are naturally distinguished (much as numerals are from letters), and thus the situation should in fact be better in other languages. this is much like the math-head convention of using greek letters for various operators/notation and latin letters for variables.

> Mike Gonta
>
> look and see - many look but few see
From: [Jongware] on 12 Apr 2007 05:29
"Mike Gonta" <email(a)mikegonta.com> wrote in message news:1176324393.367669.66380(a)d57g2000hsg.googlegroups.com...
> I'm just wondering if this is a generalized thing due to the
> historical lack of a universal character representation (Unicode) and/
> or the difficulty in upgrading our programming environment to utilize
> Unicode.

You may be over-complicating this. The compiler -- that is, the program that reads your source file in text format -- expects to recognize items such as keywords, numbers, and comments. It recognizes a "number" because it is formed out of characters between '0' and '9' -- that is, the character codes for those. It does not recognize the Unicode glyph U+56DB "si" as having the numeric value 4, even though its _meaning_ is "four", just as it does not recognize the four characters "four" as describing that same number.

Your question may well be extended to keywords. If your compiler accepts "new" as a keyword, shouldn't it accept "novum" as well, since it means exactly the same? No -- the compiler doesn't 'know' the meaning of "new" -- it sees character codes for 'n', 'e', and 'w', in this order, surrounded by 'whitespace' or other 'delimiters' (both quoted, because both are defined elsewhere).

Does this mean "Unicode is not compatible with programming"? Nah. Suppose -- I'm not aware of any -- there were a Unicode-compliant compiler, which could read and parse Unicode sources. You would still have to use '0' to '9' for numbers, and the characters 'n', 'e', and 'w' for the keyword "new", but your comments and your variables might contain any character in Greek, Thai, or Telugu you want.

But here comes the caveat: the practical problem is that the compiler's sets of "characters", "whitespace" and "delimiters" have to be revised. There are a number of different 'official' white space characters defined in Unicode -- for example, the Chinese set has its own fixed-width space -- are all of these equal to (ASCII) character 32?
If so, how about the circled numerals in the dingbat section? Are these numbers? (There is no circled '0' -- but there is a circled '10', which leads to a whole new kind of horror in parsing. There are also glyphs for Roman numerals, which just _may_ be too much to interpret...) And if you think _that_ far ahead, you might as well accept the single ideograph U+5D2D "zhan" as equal to the keyword "new" -- and now you're buggered again, as there is at least one other ideograph, U+5D84 "zhan", with exactly the same meaning.

[Jw]
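[Editor's note: the "extended atoi()" behavior discussed in this post is exactly what the standard C function strtol with base 16 does, and it illustrates the point about recognition by code point. A small sketch (hex_prefix_value is a made-up wrapper name): the scan accepts an optional 0x prefix and then stops at the first character outside 0-9/a-f/A-F, regardless of what that character "means" in any human language.]

```c
#include <stdlib.h>

/* Parse a hex literal such as "0x1A" using strtol, base 16.
   strtol recognizes digits purely by code point: it has no notion
   of "letter", so 'G' -- or a Greek gamma -- simply ends the number.
   Returns the parsed value; *consumed gets the number of characters
   that were part of the literal. */
long hex_prefix_value(const char *s, size_t *consumed)
{
    char *end;
    long v = strtol(s, &end, 16);
    *consumed = (size_t)(end - s);
    return v;
}
```

Usage: on "0x1A" this consumes all four characters and yields 26; on "0x1G" the scan stops before the 'G', consuming three characters and yielding 1.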