a bytecode idea spec... [General Programming]

Prev: need ur help for my Masters project(TASM project)
Next: ANN: Seed7 Release 2010-07-04

From: Jacko on 5 Jul 2010 13:31

Any idea how many opcodes C limited to int, long, uint, ulong, [], *,
(){} would have?

From: BGB / cr88192 on 5 Jul 2010 16:31

"Jacko" <jackokring(a)gmail.com> wrote in message
news:6a56ae07-f3b6-4ca7-9eab-2b5c9e8bc31b(a)d8g2000yqf.googlegroups.com...
> Any idea how many opcodes C limited to int, long, uint, ulong, [], *,
> (){} would have?

well, the opcodes in my case are not generally tied to particular types,
but rather to the variety of operations which pop up in a language like C.

but, yeah, what I support is a variant of C99 (it is not strictly conforming
on several points, but is close enough that code generally works without
issue).

it does support a few "extensions" though, mostly 128-bit integers, built-in
geometric vectors and quaternions, ... but these don't really affect the
number of IL opcodes all that much (most opcodes are shared between a wide
range of operand types, so not much issue here).

but, yeah, if one simplified "C", say down to a Java-like level (no real
explicit memory or function pointers, ...) then yeah, they could probably
shave off a few opcodes (as well as somewhat simplifying a compiler).

but, yeah, pointers and function pointers are probably one of the major
sources of compiler complexity in C (and typedef combined with the
declaration syntax a major source of parser complexity).

but, anyways, if one tries to simplify C, what they have will no longer be
C, much like how GLSL is not C, even if it sort of resembles C...

From: BGB / cr88192 on 5 Jul 2010 17:22

"Jacko" <jackokring(a)gmail.com> wrote in message
news:e0b9eff9-e5d0-4473-8986-ef6750050c38(a)j8g2000yqd.googlegroups.com...
>> then there are opcodes like: inc2_s/dec2_s: add or subtract 2 from a
>> named variable.
>> then there are opcodes for various kinds of conditional jumps (a generic
>> "jump if true/false" followed by a number of opcodes which perform a
>> compare and a jump).
>
> Things like 1+ 2* etc, I have avoided making opcodes for. The codes
> above 31 are unused/undefined/always free for system defined
> optimization. In that sense they are secondary optimization opcodes.
> Only primary computation essentials and structural foundational
> opcodes are to be considered for hard assignment. There are only 7
> more slots for this purpose.
>

"inc2_s" was so that some common special cases, such as "x+=2;" could be
encoded as a single opcode (yeah, "x++;" and "x--;" can also be encoded as
single opcodes).
all this mattered mostly back when I was dealing with an
interpreter...
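
a rough sketch of the idea, not the actual compiler or interpreter: only the name "inc2_s" comes from the description above; the tuple encoding, the other opcode names, and the helpers peephole/run are made up for illustration. it collapses the generic 4-opcode sequence for "x+=2;" into the single special-case opcode and checks that both forms behave the same.

```python
# Hypothetical mini-VM. Only the opcode name "inc2_s" comes from the post;
# the instruction encoding and helper names are assumptions.

def peephole(code):
    """Collapse 'load v; push 2; add; store v' into a single 'inc2_s v'."""
    out = []
    i = 0
    while i < len(code):
        window = code[i:i + 4]
        if (len(window) == 4
                and window[0][0] == "load"
                and window[1] == ("push", 2)
                and window[2] == ("add",)
                and window[3] == ("store", window[0][1])):
            out.append(("inc2_s", window[0][1]))
            i += 4
        else:
            out.append(code[i])
            i += 1
    return out


def run(code, env):
    """Execute a list of instruction tuples against a dict of named variables."""
    stack = []
    for ins in code:
        op = ins[0]
        if op == "load":
            stack.append(env[ins[1]])
        elif op == "push":
            stack.append(ins[1])
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "store":
            env[ins[1]] = stack.pop()
        elif op == "inc2_s":
            env[ins[1]] += 2          # one dispatch instead of four
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return env


generic = [("load", "x"), ("push", 2), ("add",), ("store", "x")]
compact = peephole(generic)
```

in an interpreter the gain is mostly fewer dispatches per statement; a compiler that does its own instruction selection gets little from it.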

my compiler internally essentially uses a highly modified version of my
interpreter's bytecode (where the interpreter had been using dynamic-types +
type-inference, and had basic JIT support at one point...).

when reworked into use in a compiler, many of these opcodes stayed...
(many other compound and type-specialized opcodes, however, were dropped).

later on, this interpreter was partly rewritten, and many of these
special-case opcodes (as well as the JIT) were dropped mostly as it had
moved to a different GC/typesystem, and many no longer applied.

>> >> but, yeah, the nifty point would be having a bytecode with a non-fixed
>> >> opcode assignment, mostly so that it can be used with different
>> >> interpreters or JIT machinery, without me having to force them all
>> >> into using exactly the same numbering, ...
>>
>> > What would be the advantage of that? If two interpreters offer the same
>> > operation, why not assign it the same code? The advantage of doing so
>> > would be compatibility of bytecode across interpreters, as long as the
>> > interpreter running the program supports all the used operations.
>> > Changing the numbers, it seems to me, ensures that even the chance of
>> > compatibility is thrown out of the window. I see that as a
>> > disadvantage. I fail to see what you gain by it.
>>
>> not really:
>> different interpreters can retain compatibility by instead using symbolic
>> binding, rather than relying on specific opcode numbers (really, it is
>> not too much different than doing DLL linkage).
>
> In some senses a number could be a variable name.
>

in practicality, this is not usually the case.
the reason is that numbers are typically sequential/positional, and so pose
many additional problems.

for example:
IIRC DLLs originally used numerical binding (based on ordinal numbers), but
this later fell into disuse as symbolic binding became the norm (the whole
ordinal system still exists in PE/COFF, but is typically used only as a
hint, or ignored, rather than as the primary means of binding, with pretty
much all DLLs I have seen binding by name rather than by ordinal).

there is little to say bytecode can't do something similar...
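
a minimal sketch of what name-based binding could look like, loosely in the spirit of binding DLL imports by name. the module layout ("opnames" plus "code"), the handler set, and bind/run are all assumptions for illustration, not any real format. each module carries its own table of opcode names, and the numbers in its code are just indices into that table, resolved at load time.

```python
# Hypothetical handlers an interpreter exposes, keyed by opcode name.
def op_push(stack, arg):
    stack.append(arg)


def op_add(stack, arg):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def op_mul(stack, arg):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


HANDLERS = {"push": op_push, "add": op_add, "mul": op_mul}


def bind(opnames, handlers):
    """Resolve a module's local opcode numbers to handlers, by name."""
    bound = []
    for name in opnames:
        if name not in handlers:
            raise LookupError(f"unresolved opcode {name!r}")
        bound.append(handlers[name])
    return bound


def run(module, handlers):
    """Bind once at load time, then dispatch on the module-local numbers."""
    table = bind(module["opnames"], handlers)
    stack = []
    for opnum, arg in module["code"]:
        table[opnum](stack, arg)
    return stack


# Two modules computing (2 + 3) * 4 with different local numberings.
mod_a = {"opnames": ["push", "add", "mul"],
         "code": [(0, 2), (0, 3), (1, None), (0, 4), (2, None)]}
mod_b = {"opnames": ["mul", "push", "add"],
         "code": [(1, 2), (1, 3), (2, None), (1, 4), (0, None)]}
```

both modules run on the same interpreter even though they disagree on what number "push" is; the only thing they have to agree on is the name.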

then vendors can add extension opcodes simply by defining opcodes with their
vendor name as a prefix or similar...

it also allows marshalling one ISA within another (such as easily
marshalling x86 instructions), which can't be generally done with most
existing bytecode formats (example "x86.mov.rr32" or "x86.lea.rrm32", or
"x86.addsubpd.xxm").

granted, this doesn't necessarily mean all tools will understand all of
these opcodes, but in this case, they won't necessarily need to. symbolic
opcodes also avoid needing to assign all of these possibilities into big
massive tables, or for that matter, needing to use some massively large
opcode space with lots of long opcode numbers.
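
a small sketch of how a tool might pick such names apart. the three x86 names are the ones given above; reading them as "namespace.mnemonic.form", treating unprefixed names as "core", and the helper names are my assumptions here, not a defined convention.

```python
def split_opcode(name):
    """Split 'x86.mov.rr32' into ('x86', 'mov', 'rr32').

    Assumed convention: the first dotted part is the vendor/ISA namespace,
    the second is the mnemonic, and anything left is the operand form.
    Names without a dot are treated as belonging to a 'core' namespace.
    """
    parts = name.split(".")
    if len(parts) == 1:
        return ("core", parts[0], "")
    return (parts[0], parts[1], ".".join(parts[2:]))


def group_by_namespace(names):
    """Group opcode names so a tool can see which namespaces a module uses."""
    groups = {}
    for name in names:
        namespace, mnemonic, form = split_opcode(name)
        groups.setdefault(namespace, []).append((mnemonic, form))
    return groups


used = ["add", "x86.mov.rr32", "x86.lea.rrm32", "x86.addsubpd.xxm"]
groups = group_by_namespace(used)
```

a tool which only knows the "core" namespace can then pass the "x86" entries through untouched, with no table slot or opcode number reserved for any of them.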

>> for example, one can use something like MSIL or JBC the same way MS or
>> Sun does, but given they make relatively few provisions for "generally"
>> extending the formats, there is no real way to flexibly add features
>> without breaking compatibility for any modules using these features, and
>> there is no real good way for multiple implementations to "share"
>> features or ideas (since, for what limited extensibility is offered,
>> there is no real standardized way to identify the type/nature of these
>> extensions, nor for one party's extensions to avoid clashing with
>> others' extensions...).
>
> Yes, the name space is somewhat limited when the names are numeric, so
> clashes are likely.
>

yep.

as well as the inability to determine one vendor's extensions from
another...

something as trivial as a vendor-prefix for a name largely minimizes these
issues, and a reflective format allows either ignoring uninteresting data,
or potentially rejecting the file, depending on the tool and the data.

for example, a tool for "untrusted" code could reject anything which doesn't
get past its internal validator, and another tool (such as an interpreter),
could throw an exception when such an unrecognized vendor code is found, but
execute any other instructions without issue.
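
a sketch of the two policies side by side. the opcode set, the made-up vendor opcode "acme.frob", and the function names are purely hypothetical. the strict tool scans the whole module up front and reports anything it doesn't recognize; the interpreter only complains when it actually reaches such an opcode.

```python
class UnknownOpcode(Exception):
    """Raised when the interpreter reaches an opcode it has no handler for."""


KNOWN = {"push", "add"}  # assumed core set, for illustration only


def validate(code, known=KNOWN):
    """Strict policy: list every unrecognized opcode so the file can be rejected."""
    return sorted({name for name, _ in code if name not in known})


def run(code, stack):
    """Lenient policy: execute until an unrecognized opcode is actually reached."""
    for name, arg in code:
        if name == "push":
            stack.append(arg)
        elif name == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        else:
            raise UnknownOpcode(name)
    return stack


# "acme.frob" is a made-up vendor extension that neither tool implements.
program = [("push", 1), ("push", 2), ("add", None), ("acme.frob", None)]
```

here validate(program) flags "acme.frob" before anything runs, whereas run() executes the first three instructions normally and only then raises.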

more so, since the names are non-clashing, it allows different interpreters
to potentially implement opcodes defined by other vendors (this being, for
example, a fairly common practice in OpenGL, ...).

>> for example, for JBC this would likely mean having some defined coding
>> and authority for FE and FF extension opcodes, which would bog everything
>> down, or building a new mechanism and trying to gain support.
>>
>> even in newer Sun stuff, they themselves have ended up resorting to piles
>> of hackery WRT extending JVM's core facilities, since a simple
>> ill-conceived extension basically means their tower can collapse (like in
>> Jenga or something), which is sub-optimal (not even x86 machine code has
>> it this bad, one can hack new opcodes into this all they want so long as
>> they pay some
>> caution to what Intel or AMD is doing...).
>
>> it also reduces the need for a lot of tools to have "complete" knowledge
>> of the format (for example, one can disassemble the bytecode while not
>> having some central opcode table, since the data needed to disassemble is
>> already in the IL).
>
> Yes.
>
> Cheers Jacko
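
the disassembly point can be sketched like so, assuming (hypothetically) that each module carries a table of (name, operand count) pairs and a flat list of integers as code. none of this layout comes from a real format, and "acme.frob" is a made-up vendor opcode.

```python
def disassemble(module):
    """Render code as text using only the module's own opcode table."""
    table = module["optable"]   # list of (name, operand_count), carried in the file
    code = module["code"]       # flat list of ints: opcode index, then operands
    lines = []
    pc = 0
    while pc < len(code):
        name, nargs = table[code[pc]]
        args = code[pc + 1:pc + 1 + nargs]
        lines.append(" ".join([name] + [str(a) for a in args]))
        pc += 1 + nargs
    return lines


module = {
    "optable": [("push", 1), ("add", 0), ("acme.frob", 2)],
    "code": [0, 7, 0, 8, 1, 2, 3, 4],
}
listing = disassemble(module)
```

the disassembler has never heard of "acme.frob", but it can still print it and step over its operands, since the name and operand count travel with the module.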
