From: robertwessel2 on 11 Apr 2007 18:36

On Apr 11, 8:44 am, "Mike Gonta" <e...(a)mikegonta.com> wrote:
> In English hexadecimal notation uses the first 6 characters of the
> Latin character set ( A-F ) to represent the values 10 to 15.
> If your native language uses a different character set, for example
> Cyrillic or Greek, do you use the first 6 characters of that set or:
>
> Is hex an ascii thing?

Almost all programming is done with a character set that includes at least the uppercase Latin letters, even when done by speakers of human languages where that's not entirely natural. A number of languages (or their implementations) have been fairly liberal in what they accept as variable names and whatnot, so I've seen Cobol with Katakana variable and procedure names. Java actually lets you use almost any Unicode character, which allows some interesting possibilities.

The A-F convention has become fairly universal, but there have certainly been others. U-Z has been used for the extra digits, as has 0-5 with a bar over or under the digit. Also, ASCII is not really the right term; hex in EBCDIC uses the same convention, although with entirely different code points.
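[Editor's note: the "same convention, different code points" remark can be made concrete with a small C sketch (hexval is an illustrative name, not a standard function). Because '0'-'9' and 'A'-'F' happen to be contiguous runs in both ASCII and EBCDIC, the usual digit-value arithmetic is portable to either encoding, even though the actual code points differ.]

```c
/* Value of one hex digit, or -1 if the character is not one.
   '0'..'9', 'A'..'F', and 'a'..'f' are each contiguous runs in
   both ASCII and EBCDIC, so this arithmetic works on either
   encoding despite the code points being entirely different. */
int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}
```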
From: Evenbit on 11 Apr 2007 22:14

On Apr 11, 5:28 pm, "Jim Carlock" <anonym...(a)127.0.0.1> wrote:
> "[Jongware]" wrote...
> > "Jim Carlock" posted...
> : News reader: Outlook Express
> : Greek Capital Letter Gamma...
> : G
> :
> : Accessories\System Tools\Character Map.
> : Set the Font to Arial.
> : Scroll down to U+0393 Greek Capital Letter Gamma, select it
> : by clicking on it, then Click on the Select button.
> : Click on the Copy button.
> :
> : Not sure if this works or not. Will see.
>
> Well it worked before I pressed the Send button. Seems to require
> a Greek newsgroup with HTML encoding to be able to handle the
> extra characters.
>
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9...
> 10 ? ? ? ? ? ¤ ? ? ? ?
> 20 ¶ § ? ? ? ? ? ? ? ?
> 30 ? ? !

I have to say that I am a tad disappointed. Knowing your interest level for "all things assembly" I assumed you would write some 'code' to help you conduct these tests. ;-) You give me no choice but to plop some HLA fodder here for Rene (and gang) to feast on:

program chartest;
#include( "stdlib.hhf" )  // change the conv.bToStr line below if you use StdLib 2.0
static
    s    :string;
    buff :byte[16];
begin chartest;
    str.init( buff, 16 );
    mov( eax, s );
    stdout.put( " Dec Hex Char" nl );
    stdout.put( "----+----+----" nl );
    mov( 32, cl );
double_ampersand:
    stdout.puts( " " );
    stdout.putu8( cl );
    stdout.puts( " " );
    conv.bToStr( cl, 2, ' ', s );  // in StdLib 2.0 use byteToHex()
    stdout.puts( s );
    stdout.puts( " " );
    stdout.putc( cl );
    stdout.newln();
    inc( cl );
    jnz double_ampersand;
end chartest;

Nathan.
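[Editor's note: for readers without an HLA toolchain, a rough C sketch of the same character-table test (an editor's translation, not Nathan's code; chart_row is a made-up helper name). One function formats a "Dec Hex Char" row; looping it from 32 through 255 reproduces the table.]

```c
#include <stdio.h>

/* Format one "Dec Hex Char" row into buf, mirroring the HLA
   program's columns: decimal value, two-digit hex, raw character.
   c is expected to be in the range 0..255. */
void chart_row(char *buf, size_t n, unsigned c)
{
    snprintf(buf, n, " %3u  %02X   %c", c, c, (char)c);
}

/* The full table would then be:
   for (unsigned c = 32; c < 256; c++) {
       char b[32];
       chart_row(b, sizeof b, c);
       puts(b);
   }
*/
```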
From: Wolfgang Kern on 11 Apr 2007 23:24

Mike Gonta wrote:

>> I'm not Greek, but I think also Greek and Russian programmers
>> use ASCII a..f (or A..F) in hexadecimal notations.
>
> I'm not Greek, but I use the occasional Greek letter in mathematical
> notation.

Me too. But my CPU won't understand trigonometric formulas anyway. So it's up to me (or anyone who writes math programs) to convert mathematical expressions into binary code.

> Are we using ASCII as a convention or due to the lack of Unicode
> support in programming?

Why make things easy when they could be done more complicated? :)
__
wolfgang
From: cr88192 on 12 Apr 2007 03:25

"Mike Gonta" <email(a)mikegonta.com> wrote in message news:1176325076.591812.238630(a)y80g2000hsf.googlegroups.com...
> "[Jongware]" <IdontWantS...(a)hotmail.com> wrote:
>
>> How about that? The string "0x1A" in Unicode is still recognizable by an
>> extended atoi() function.
>
> That's because 7-bit ASCII is a proper subset of Unicode.
>
>> Replace the character 'A' with any other first
>> character in another alphabet and it is not.
>
> My point exactly.
>
>> Maybe we should stick to the de facto definition of a hex number as
>> "consisting of 0 to 9 and A to F".
>
> Yes indeed. The numerals and the first 6 letters of the alphabet.
> But why must the characters be English?
> Worse still is the lack of agreement on the indicator that the number
> is in fact hexadecimal.

the letters are not "english". this alphabet has been a de-facto standard in western europe for a very long time. your german or french speaker will have no trouble figuring out this one.

and for everyone else, why do they need to vary the convention for their own language? more so, why would that even be a good thing? since the characters are from a different alphabet, they are naturally distinguished (much as numerals are from letters), and thus the situation should in fact be better in other languages. this is much like the math-head convention of using greek letters for various operators/notation and latin letters for variables.

> Mike Gonta
>
> look and see - many look but few see
From: [Jongware] on 12 Apr 2007 05:29
"Mike Gonta" <email(a)mikegonta.com> wrote in message news:1176324393.367669.66380(a)d57g2000hsg.googlegroups.com...
> I'm just wondering if this is a generalized thing due to the
> historical lack of a universal character representation (Unicode) and/
> or the difficulty in upgrading our programming environment to utilize
> Unicode.

You may be over-complicating this. The compiler -- that is, the program that reads your source file in text format -- expects to recognize items such as keywords, numbers, and comments. It recognizes a "number" because it is formed out of characters between '0' and '9' -- that is, the character codes for those. It does not recognize the Unicode glyph U+56DB "si" as having the numeric value 4, even though its _meaning_ is "four", just as it does not recognize the four characters "four" as describing that same number.

Your question may well be extended to keywords. If your compiler accepts "new" as a keyword, shouldn't it accept "novum" as well, since it means exactly the same? No -- the compiler doesn't 'know' the meaning of "new" -- it sees character codes for 'n', 'e', and 'w', in this order, surrounded by 'whitespace' or other 'delimiters' (both quoted, because both are defined elsewhere).

Does this mean "Unicode is not compatible with programming"? Nah. Suppose -- I'm not aware of any -- there were a Unicode-compliant compiler, which could read and parse Unicode sources. You would still have to use '0' to '9' for numbers, and the characters 'n', 'e', and 'w' for the keyword "new", but your comments and your variables might contain any character in Greek, Thai, or Telugu you want.

But here comes the caveat: the practical problem is that the compiler's sets of "characters", "whitespace" and "delimiters" have to be revised. There are a number of different 'official' white space characters defined in Unicode -- for example, the Chinese set has its own fixed-width space -- are all of these equal to (ASCII) character 32?
If so, how about the circled numerals in the dingbat section? Are these numbers? (There is no circled '0' -- but there is a circled '10', which leads to a whole new kind of horror in parsing. There are also glyphs for Roman numerals, which just _may_ be too much to interpret...) And if you think _that_ far ahead, you might as well accept the single ideograph U+5D2D "zhan" as equal to the keyword "new" -- and now you're buggered again, as there is at least one other ideograph, U+5D84 "zhan", with exactly the same meaning.

[Jw]
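[Editor's note: the "extended atoi()" behavior discussed in this post is exactly what the standard C function strtol with base 16 does, and it illustrates the point about recognition by code point. A small sketch (hex_prefix_value is a made-up wrapper name): the scan accepts an optional 0x prefix and then stops at the first character outside 0-9/a-f/A-F, regardless of what that character "means" in any human language.]

```c
#include <stdlib.h>

/* Parse a hex literal such as "0x1A" using strtol, base 16.
   strtol recognizes digits purely by code point: it has no notion
   of "letter", so 'G' -- or a Greek gamma -- simply ends the number.
   Returns the parsed value; *consumed gets the number of characters
   that were part of the literal. */
long hex_prefix_value(const char *s, size_t *consumed)
{
    char *end;
    long v = strtol(s, &end, 16);
    *consumed = (size_t)(end - s);
    return v;
}
```

Usage: on "0x1A" this consumes all four characters and yields 26; on "0x1G" the scan stops before the 'G', consuming three characters and yielding 1.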