From: BGB / cr88192 on 5 Jul 2010 03:02

"Mike Austin" <mike(a)mike-nospam-austin.com> wrote in message news:GKmdnb3R96WKlqzRnZ2dnUVZ_s-dnZ2d(a)giganews.com...
> BGB / cr88192 wrote:
>> "Jacko" <jackokring(a)gmail.com> wrote in message
>> news:974bdba6-f158-4ab5-85eb-72ef53e745c4(a)k39g2000yqd.googlegroups.com...
>>> Just looked.
>>>
>>> My VM spec's at http://acebforth.googlecode.com
>>>
>>> Just 23 opcodes.
>>>
>
> A Tree-Based Alternative to Java Byte-Codes
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.26.2124&rep=rep1&type=pdf
>
> 0 opcodes :)
>
> But on a more serious note, a tree can be plenty fast when implemented
> efficiently. It doesn't physically have to be a tree.
>

yep, this is a reasonable option...

actually, my existing RPNIL is relatively high-level (a bit higher-level than JBC, and on similar grounds to MSIL). however, typically, it is still stack-based. ordering issues have also been a problem in the past...

ASTs could work...

actually, this reminds me of having designed binary XML variants in the past, most offering notably better density than WBXML (typically by combining the XML serialization with an LZP / LZ + Markov-predictor strategy). little stops something similar from being used here, except that I hadn't really considered transporting code in the form of ASTs, since this still leaves a lot of the compiler work on the client, and there is also the matter of abstracting the AST from the source language (an AST is likely to be largely specialized to a small number of fairly similar languages). at a lower level, a little more language neutrality can be gained, but granted, there are other costs.
a stack-based IL is at least fairly fast and easy to interpret (AST interpreters are usually slower), and the potential ordering issues are annoying but not show-stoppers (in a compiler, the IL stack need not have much in common with the real stack, as the IL stack may in effect only deal with a virtual conception of a stack existing within the compiler, rather than the actual underlying CPU stack...).

for example, 'dup' and 'exch' are cheap, typically because they don't actually "do" anything. this is partly how my existing RPNIL compiler works...

however, the piece of code I am imagining is slightly atypical, as the use of a table-based opcode system may hint at: an interpreter for this is not likely to use the big-switch strategy so often used with interpreters...

or such...
From: Robbert Haarman on 5 Jul 2010 03:10

On Sun, Jul 04, 2010 at 09:16:04AM -0700, BGB / cr88192 wrote:
>
> "Jacko" <jackokring(a)gmail.com> wrote in message
> news:974bdba6-f158-4ab5-85eb-72ef53e745c4(a)k39g2000yqd.googlegroups.com...
> > Just looked.
> >
> > My VM spec's at http://acebforth.googlecode.com
> >
> > Just 23 opcodes.
> >
>
> some of my main bytecodes tend to include a bit more...
>
> but, OTOH, handling statically typed C-family languages tends to result in a
> lot of opcodes.

Why is that? I have always designed my bytecodes to be either some tree form or a RISC instruction set. In both cases, they have operated on machine words, and the operations have been things like ALU operations, branching, call, return, and goto.

Since these bytecodes offer essentially the same operations that instruction sets for actual CPUs offer, they can be used as compilation targets much the same as actual, physical machine architectures, and the same techniques for instruction selection, optimization, etc. apply. As with RISC instruction sets, there is no notion of types, and languages can be compiled to these bytecodes regardless of the source language's type system.

Apparently, your bytecode designs differ from mine in that the type system has an impact on the number of operations offered by the bytecode. Could you explain how that works, and what advantage it offers over an "everything is a machine word" bytecode?

> but, yeah, the nifty point would be having a bytecode with a non-fixed
> opcode assignment, mostly so that it can be used with different interpreters
> or JIT machinery, without me having to force them all into using exactly the
> same numbering, ...

What would be the advantage of that? If two interpreters offer the same operation, why not assign it the same code? The advantage of doing so would be compatibility of bytecode across interpreters, as long as the interpreter running the program supports all the used operations.
Changing the numbers, it seems to me, ensures that even the chance of compatibility is thrown out of the window. I see that as a disadvantage. I fail to see what you gain by it.

Regards,

Bob
From: Jacko on 5 Jul 2010 07:49

Added 2 extra opcodes and altered one slightly. As the opcode set is possibly going to be 'hardwired' at some stage, it makes sense to set limits on what an opcode can do.

* Only one stack item can be created or destroyed.
* Only one return stack item can be created or destroyed.
* Only the top two stack items may be modified.
* Only the top return stack item may be modified.
* An implicit subroutine return stack prefetch is always performed, with a return to threading.
* Other data or return stack items may be fetched.
* The zero-valued machine word prefix on all opcodes exits from threading and runs one opcode.

This keeps the virtual machine complexity down.

Packed binary format is UTF-16 -> UTF-8 (option) -> BWTS -> LZ77/DEFLATE with start vector 0. Wider word sizes are by UTF-32 -> UTF-8 mapping higher codes, and folding the last bit. 111111xx (10xxxxxx) ^ 5 ...

There are seven reserved opcodes, 25 to 31, for alternate language support. All opcodes above 31 are undefined.

Cheers Jacko
From: BGB / cr88192 on 5 Jul 2010 13:01

"Robbert Haarman" <comp.lang.misc(a)inglorion.net> wrote in message news:20100705071043.GA2635(a)yoda.inglorion.net...
> On Sun, Jul 04, 2010 at 09:16:04AM -0700, BGB / cr88192 wrote:
>>
>> "Jacko" <jackokring(a)gmail.com> wrote in message
>> news:974bdba6-f158-4ab5-85eb-72ef53e745c4(a)k39g2000yqd.googlegroups.com...
>> > Just looked.
>> >
>> > My VM spec's at http://acebforth.googlecode.com
>> >
>> > Just 23 opcodes.
>> >
>>
>> some of my main bytecodes tend to include a bit more...
>>
>> but, OTOH, handling statically typed C-family languages tends to result
>> in a lot of opcodes.
>
> Why is that? I have always designed my bytecodes to be either some tree
> form, or a RISC instruction set. In both cases, they have operated on
> machine words, and the operations have been things like ALU operations,
> branching, call, return, and goto.
>
> Since these byte codes offer essentially the same operations that
> instruction sets for actual CPUs offer, they can be used as compilation
> targets much the same as actual, physical machine architectures, and the
> same techniques for instruction selection, optimization, etc. apply. As
> with RISC instruction sets, there is no notion of types, and languages
> can be compiled to these bytecodes regardless of the source language's
> type system.
>
> Apparently, your bytecode designs differ from mine so that the type
> system has an impact on the number of operations offered by the bytecode.
> Could you explain how that works, and what advantage it offers over an
> "everything is a machine word" bytecode?

mine is a bit higher-level, and so what the bytecode sees is not exactly the same as the world the machine sees. so, one doesn't have highly general-purpose instructions like mov or lea; rather, a fair number of specialized instructions.
similarly, the ISA doesn't operate on "machine words"; rather, it operates on opaque typed references or values, each of which has an undefined size at the IL level (although there is a 'sizeof' opcode, ...). in the native ISA, the actual values are managed by the lower-end compiler, which figures out their size and representation. in an interpreter, it would probably use references instead (I have yet to entirely decide between using dynamically-typed references or fat references; each has its tradeoffs).

examples:

loadindex/setindex: load or set an array index (accepts array, pointer, ...)
loadslot/setslot: load or set a struct/object slot (accepts struct, struct pointer, object reference, ...)
call/call_s/callmethod/callmethod_s: perform calls to functions or methods, where the generic form takes the reference on the stack, and the '_s' form has a hardcoded name.

as well, load/store/loada/... for loading/storing variables to/from the stack.

then there are opcodes like: inc2_s/dec2_s: add or subtract 2 from a named variable.

then there are opcodes for various manner of conditional jumps (a generic "jump if true/false" followed by a number which perform a compare and a jump).

as well as the arithmetic opcodes, ...

these all lead to a fair number of opcodes in total...

>> but, yeah, the nifty point would be having a bytecode with a non-fixed
>> opcode assignment, mostly so that it can be used with different
>> interpreters or JIT machinery, without me having to force them all into
>> using exactly the same numbering, ...
>
> What would be the advantage of that? If two interpreters offer the same
> operation, why not assign it the same code? The advantage of doing so
> would be compatibility of bytecode across interpreters, as long as the
> interpreter running the program supports all the used operations.
> Changing the numbers, it seems to me, ensures that even the chance of
> compatibility is thrown out of the window. I see that as a disadvantage.
> I fail to see what you gain by it.

not really: different interpreters can retain compatibility by instead using symbolic binding, rather than relying on specific opcode numbers (really, it is not too much different from doing DLL linkage).

in the naive case, this would likely mean either a renumbering or function-pointer table to be used in the interpreter, where each number is looked up and assigned a spot in the re-index table (in this strategy, likely such a table would be needed for every module or code block, which could add a little cost).

a slightly less naive case would be to remap it to an internal form (an interpreter-specific form, likely more optimized for the specific interpreter machinery, or JIT'ing the code to native machine code), or, what I had considered, pre-decode the opcodes into a linked-list structure with each "opcode" holding a function pointer to the code to execute it.

now, the main problem with fixed opcode numbers is that they lead to much-increased "managerial" effort, where if one adds an opcode in one place, they also have to add it everywhere else (which can be a problem if one's compiler machinery is getting bulky). one can share tables between different components (say, the compiler and interpreter share the same internal "opcodes" table), but this is uncertain WRT modularity. (note that "tables" are fairly central in a lot of my stuff; as well, a lot of it tends to be partly machine-written...).

if a 3rd party were to also plug into this VM, they would either have to reuse this table or make their own clone table, meaning that fixed-number opcodes either: lead to increased dependencies between code, and hinder the creation of 3rd-party implementations; or force code to stick rigidly to some centrally-assigned list of opcodes, limiting the possible flexibility and creativity of 3rd parties (they can't as easily add new features, or adapt the ISA to their own uses, since this would risk clashing with the original ISA).
for example, one can use something like MSIL or JBC the same way MS or Sun does, but given they make relatively few provisions for "generally" extending the formats, there is no real way to flexibly add features without breaking compatibility for any modules using these features, and there is no real good way for multiple implementations to "share" features or ideas (since, for what limited extensibility is offered, there is no real standardized way to identify the type/nature of these extensions, nor for one party's extensions to avoid clashing with others' extensions...).

for example, for JBC this would likely mean having some defined coding and authority for FE and FF extension opcodes, which would bog everything down, or building a new mechanism and trying to gain support.

even in newer Sun stuff, they themselves have ended up resorting to piles of hackery WRT extending the JVM's core facilities, since a simple ill-conceived extension basically means their tower can collapse (like in Jenga or something), which is sub-optimal (not even x86 machine code has it this bad; one can hack new opcodes into it all they want, so long as they pay some caution to what Intel or Sun is doing...).

so, the idea is that if most things are defined within the IL (such as the number and layout of tables, the names and encodings of opcodes, ...), then little stops a 3rd party from adding or customizing features in a way which doesn't break existing implementations (which can ignore the new features, or throw an exception or similar if control reaches an unhandled opcode, while still being able to otherwise work with the opcodes).
also, it would allow me, in my case, to use the same IL for somewhat divergent VM parts (some stack-based, some potentially using other forms, such as TAC/SSA or similar, or potentially even for things like "decompiled" x86 machine code, or for transcribing JBC or MSIL, allowing additional shared machinery to be used rather than having to maintain as much machinery for these various edge cases...).

it also reduces the need for a lot of tools to have "complete" knowledge of the format (for example, one can disassemble the bytecode without having some central opcode table, since the data needed to disassemble is already in the IL).

or such...
From: Jacko on 5 Jul 2010 13:27

> then there are opcodes like: inc2_s/dec2_s: add or subtract 2 from a named
> variable.
> then there are opcodes for various manner of conditional jumps (a generic
> "jump if true/false" followed by a number which perform a compare and a
> jump).

Things like 1+ 2* etc., I have avoided making opcodes for. The codes above 31 are unused/undefined/always free for system-defined optimization. In that sense they are secondary optimization opcodes. Only primary computation essentials and structural foundational opcodes are to be considered for hard assignment. There are only 7 more slots for this purpose.

> >> but, yeah, the nifty point would be having a bytecode with a non-fixed
> >> opcode assignment, mostly so that it can be used with different
> >> interpreters or JIT machinery, without me having to force them all into
> >> using exactly the same numbering, ...
> >
> > What would be the advantage of that? If two interpreters offer the same
> > operation, why not assign it the same code? The advantage of doing so
> > would be compatibility of bytecode across interpreters, as long as the
> > interpreter running the program supports all the used operations.
> > Changing the numbers, it seems to me, ensures that even the chance of
> > compatibility is thrown out of the window. I see that as a disadvantage.
> > I fail to see what you gain by it.
>
> not really:
> different interpreters can retain compatibility by instead using symbolic
> binding, rather than relying on specific opcode numbers (really, it is not
> too much different than doing DLL linkage).

In some senses a number could be a variable name.
> for example, one can use something like MSIL or JBC the same way MS or Sun
> does, but given they make relatively few provisions for "generally"
> extending the formats, there is no real way to flexibly add features
> without breaking compatibility for any modules using these features, and
> there is no real good way for multiple implementations to "share" features
> or ideas (since, for what limited extensibility is offered, there is no
> real standardized way to identify the type/nature of these extensions, nor
> for one parties' extensions to avoid clashing with others' extensions...).

Yes, the scope for avoiding name-space clashes when the names are numeric is somewhat limited.

> for example, for JBC this would likely mean having some defined coding and
> authority for FE and FF extension opcodes, which would bog everything
> down, or building a new mechanism and trying to gain support.
>
> even in newer Sun stuff, they themselves have ended up resorting to piles
> of hackery WRT extending JVM's core facilities, since a simple
> ill-conceived extension basically means their tower can collapse (like in
> Jenga or something), which is sub-optimal (not even x86 machine code has
> it this bad, one can hack new opcodes into this all they want so long as
> they pay some caution to what Intel or Sun is doing...).

> it also reduces the need for a lot of tools to have "complete" knowledge
> of the format (for example, one can disassemble the bytecode while not
> having some central opcode table, since the data needed to disassemble is
> already in the IL).

Yes.

Cheers Jacko