Hi All,

I've read several posts concerning structures and their
implementation in assembly language. Given some
misconceptions about structures in assembly language,
I pieced together the following article about structures
in assembly.
Randy Hyde

Structures in Assembly Language Programs

Structures, or records, are an abstract data type
that allows a programmer to collect different
objects together into a single, composite,
object. Structures can help make programs easier
to read, write, modify, and maintain. Used
appropriately, they can also help your programs
run faster. Despite the advantages that structures
offer, their appearance in assembly language is a
relatively recent phenomenon (in the past two
decades, or so), and many assemblers still do not
support this facility. Furthermore, many
"old-timer" assembly language programmers attempt
to argue that the appearance of records violates
the whole principle of "assembly language
programming." This article will certain refute
such arguments and describe the benefits of using
structures in an assembly language program.

Despite the fact that records have been available
in various assembly languages for years (e.g.,
Microsoft's MASM assembler introduced structures
in 80x86 assembly language in the 1980s), the
"lack of support for structures" is a common
argument against assembly language by HLL
programmers who don't know much about assembly.
In some respects, their ignorance is justified --
many assemblers don't support structures or
records. A second goal of this article is to
educate assembly language programmers to counter
claims like "assembly language doesn't support
structures." Hopefully, that same education will
convince those assembly language programmers
who've never bothered to use structures, to
consider their use.

This article will use the term "record" to denote
a structure/record to avoid confusion with the
more general term "data structure". Note,
however, that the terms "record" and "structure"
are synonymous in this article.

What is a Record (Structure)?

(For those who don't have any idea of how records
are implemented in memory, you may want to consider
reading the chapter on this subject in "The Art of
Assembly Language Programming".)

The whole purpose of a record is to let you
encapsulate different, but logically related,
data into a single package. Here is a typical
record declaration, in HLA using the RECORD /
ENDRECORD declaration:




type
    Student: record
        Name: string;
        Major: int16;
        SSN: char[12];
        Midterm1: int16;
        Midterm2: int16;
        Final: int16;
        Homework: int16;
        Projects: int16;
    endrecord;


The field names within the record must be unique.
That is, the same name may not appear two or more
times in the same record. However, in reasonable
assemblers (like HLA) that support true
structures, all the field names are local to that
record. With such assemblers, you may reuse those
field names elsewhere in the program.

The RECORD/ENDRECORD type declaration may appear
in a variable declaration section (e.g., an HLA
STATIC or VAR section) or in a TYPE declaration
section. In the previous example the Student
declaration appears in an HLA TYPE section, so
this does not actually allocate any storage for a
Student variable. Instead, you have to
explicitly declare a variable of type Student.
The following example demonstrates how to do this:

static
    John: Student;

This allocates 28 bytes of storage: four bytes for
the Name field (HLA strings are four-byte
pointers to character data found elsewhere in
memory), 12 bytes for the SSN field, and two bytes
for each of the other six fields.

If the label John corresponds to the base address
of this record, then the Name field is at offset
John+0, the Major field is at offset John+4, the
SSN field is at offset John+6, etc.

To access an element of a structure you need to
know the offset from the beginning of the
structure to the desired field. For example, the
Major field in the variable John is at offset 4
from the base address of John. Therefore, you
could store the value in AX into this field using
the instruction

mov( ax, (type word John[4]) );

Unfortunately, memorizing all the offsets to
fields in a record defeats the whole purpose of
using them in the first place. After all, if
you've got to deal with these numeric offsets why
not just use an array of bytes instead of a
record?
Using Symbolic Equates to Implement Record Fields

Some enterprising types have noted that they can
improve the readability of their "structure"
accesses by using symbolic equates rather than
literal numeric constants. That is, they can play
games such as this:

Name := 0;
Major := Name+sizeOfString;
SSN := Major+sizeOfInt16;
Midterm1 := SSN+12*sizeOfChar;
Midterm2 := Midterm1+sizeOfInt16;
Final := Midterm2+sizeOfInt16;
Homework := Final+sizeOfInt16;
Projects := Homework+sizeOfInt16;

sizeOfStudent := Projects+sizeOfInt16;

Certainly it is the case that a statement like

mov( ax, (type word John[Major]) );

is far more readable than

mov( ax, (type word John[4]));

Now if your assembler doesn't support structs, this
is about as good as it gets for you. Granted, it
*is* more readable and maintainable than the
earlier version, but that certainly doesn't mean
that the result is readable and maintainable. After
all, "mov( ax, (type word John[4]));" is far more
readable than

byte $66, $a3, $04, $00, $00, $00;

but this doesn't imply that

mov( ax, (type word John[4]));

is a particularly readable, maintainable, or good
way to access this field of the student structure.
Better, unquestionably, but not good. We'll return
to this subject of hacked attempts at record
simulation later in this paper.

Assemblers like HLA that support true records
commonly let you refer to field names in a record
using the same mechanism C/C++ and Pascal use:
the dot operator. To store AX into the Major
field, you could use "mov( ax, John.Major );"
instead of the previous instruction. This is much
more readable and certainly easier to use than
other schemes people have invented.

Record Constants

HLA lets you define record constants. In fact,
HLA is probably unique among x86 assemblers
insofar as it supports both symbolic record
constants and literal record constants. Record
constants are useful as initializers for static
record variables. They are also quite useful as
compile-time data structures when using the HLA
compile-time language (that is, the macro
processor language). This section discusses how
to create record constants.

A record literal constant takes the following form:

RecordTypeName:[ List_of_comma_separated_constants ]

The RecordTypeName is the name of a record data
type you've defined in an HLA TYPE section prior
to this point. To create a record constant you
must have previously defined the record type in a
TYPE section of your program.

The constant list appearing between the brackets
are the data items for each of the fields in the
specified record. The first item in the list
corresponds to the first field of the record, the
second item in the list corresponds to the second
field, etc. The data types of each of the
constants appearing in this list must match their
respective field types. The following example
demonstrates how to use a literal record
constant to initialize a record variable:






type
    point: record
        x: int32;
        y: int32;
        z: int32;
    endrecord;

static
    Vector: point := point:[ 1, -2, 3 ];

This declaration initializes Vector.x with 1,
Vector.y with -2, and Vector.z with 3.

You can also create symbolic record constants by
declaring record objects in the CONST or VAL
sections of an HLA program. You access fields of
these symbolic record constants just as you would
access the field of a record variable, using the
dot operator. Since the object is a constant,
you can specify the field of a record constant
anywhere a constant of that field's type is legal.
You can also employ symbolic record constants as
record variable initializers. The following
example demonstrates this:







const
    PointInSpace: point := point:[ 1, 2, 3 ];


static
    Vector: point := PointInSpace;
    XCoord: int32 := PointInSpace.x;

Arrays of Records

It is a perfectly reasonable operation to create
an array of records. To do so, you simply create
a record type and then use the standard array
declaration syntax when declaring an array of that
record type. The following example demonstrates
how you could do this:




type
    recElement: record
        << fields for this record >>
    endrecord;




static
    recArray: recElement[4];

Naturally, you can create multidimensional arrays
of records as well. You would use the standard
row or column major order functions to compute
the address of an element within such records.
The only thing that really changes (from the
discussion of arrays) is that the size of each
element is the size of the record object.


static
    rec2D: recElement[ 4, 6 ];

Arrays and Records as Record Fields

Records may contain other records or arrays as
fields. Consider the following definition:




type
    Pixel: record
        Pt: point;
        color: dword;
    endrecord;


The definition above defines a single point with a
32-bit color component. When initializing an
object of type Pixel, the first initializer
corresponds to the Pt field, not the x-coordinate
field. The following definition is incorrect:


ThisPt: Pixel := Pixel:[ 5, 10 ]; // Syntactically incorrect!

The value of the first field ('5') is not an
object of type point. Therefore, the assembler
generates an error when encountering this
statement. HLA will allow you to initialize the
fields of Pixel using declarations like the
following:

ThisPt: Pixel := Pixel:[ point:[ 1, 2, 3 ], 10 ];
ThatPt: Pixel := Pixel:[ point:[ 0, 0, 0 ], 5 ];

Accessing Pixel fields is very easy. Like a high
level language you use a single period to
reference the Pt field and a second period to
access the x, y, and z fields of point:

stdout.put( "ThisPt.Pt.x = ", ThisPt.Pt.x, nl );

stdout.put( "ThisPt.Pt.y = ", ThisPt.Pt.y, nl );

stdout.put( "ThisPt.Pt.z = ", ThisPt.Pt.z, nl );


mov( eax, ThisPt.Color );

You can also declare arrays as record fields. The
following record creates a data type capable of
representing an object with eight points (e.g., a
cube):

type
    Cube: record
        Pts: point[8];
        Color: dword;
    endrecord;

There are two common ways to nest record
definitions. As noted earlier in this section,
you can create a record type in a TYPE section
and then use that type name as the data type of
some field within a record (e.g., the Pt:point
field in the Pixel data type above). It is also
possible to declare a record directly within
another record without creating a separate data
type for that record; the following example
demonstrates this:



type
    outerRec: record
        innerRec: record
            iField: int32;
            sField: string;
        endrecord;
    endrecord;


Generally, it's a better idea to create a separate
type rather than embed records directly in other
records, but nesting them is perfectly legal and
a reasonable thing to do on occasion.

Controlling Field Offsets Within a Record

By default, whenever you create a record, most
assemblers automatically assign the offset zero
to the first field of that record. This
corresponds to records in a high level language
and is the intuitive default condition. In some
instances, however, you may want to assign a
different starting offset to the first field of
the record. The HLA assembler provides a
mechanism that lets you set the starting offset
of the first field in the record.

The syntax to set the first offset is

recordname: record := startingOffset;

    << Record Field Declarations >>

endrecord;

Using the syntax above, the first field will have
the starting offset specified by the
startingOffset int32 constant expression. Since
this is an int32 value, the starting offset value
can be positive, zero, or negative.

One circumstance where this feature is invaluable
is when you have a record whose base address is
actually somewhere within the data structure.
The classic example is an HLA string. An HLA
string uses a record declaration similar to the
following:

type
    strRec: record
        MaxStrLen: dword;
        length: dword;
        charData: char[xxxx];
    endrecord;


However, HLA string pointers do not contain the
address of the MaxStrLen field; they point at
the charData field. The str.strRec record type
found in the HLA Standard Library Strings module
uses a record declaration similar to the
following:

type
    strRec: record := -8;
        MaxStrLen: dword;
        length: dword;
        charData: char;
    endrecord;

The starting offset for the MaxStrLen field is -8.
Therefore, the offset for the length field is -4
(four bytes later) and the offset for the
charData field is zero. Therefore, if EBX points
at some string data, then "(type str.strRec
[ebx]).length" is equivalent to "[ebx-4]" since
the length field has an offset of -4.

Aligning Fields Within a Record

To achieve maximum performance in your programs,
or to ensure that your records properly map to
records or structures in some high level
language, you will often need to be able to
control the alignment of fields within a record.
For example, you might want to ensure that a
dword field's offset is an even multiple of four.
You use the ALIGN directive in a record
declaration to do this. The following example
shows how to align some fields on important
boundaries:

type
    alignedRecord: record
        c: char;
        align( 4 );
        d: dword;
        b: boolean;
        align( 2 );
        w: word;
    endrecord;

Whenever HLA encounters the ALIGN directive within
a record declaration, it automatically adjusts
the following field's off- set so that it is an
even multiple of the value the ALIGN directive
specifies. It accomplishes this by increasing the
offset of that field, if necessary. In the
example above, the fields would have the
following offsets: c:0, d:4, b:8, w:10.

If you want to ensure that the record's size is a
multiple of some value, then simply stick an ALIGN
directive as the last item in the record
declaration. HLA will emit an appropriate number
of bytes of padding at the end of the record to
fill it in to the appropriate size. The
following example demonstrates how to ensure that
the record's size is a multiple of four bytes:


type
    paddedRecord: record
        << some field declarations >>
        align( 4 );
    endrecord;



Be aware of the fact that the ALIGN directive in a
RECORD only aligns fields in memory if the record
object itself is aligned on an appropriate
boundary. Therefore, you must ensure appropriate
alignment of any record variable whose fields
you're assuming are aligned.

If you want to ensure that all fields are
appropriately aligned on some boundary within a
record, but you don't want to have to manually
insert ALIGN directives throughout the record, HLA
provides a second alignment option to solve your
problem. Consider the following syntax:

alignedRecord3: record[4]

    << Set of fields >>

endrecord;


The "[4]" immediately following the RECORD
reserved word tells HLA to start all fields in
the record at offsets that are multiples of four,
regardless of the object's size (and the size of
the objects preceding the field). HLA allows any
integer expression that produces a value in the
range 1..4096 inside these brackets. If you
specify the value one (which is the default), then
all fields are packed (aligned on a byte
boundary). For values greater than one, HLA will
align each field of the record on the specified
boundary. For arrays, HLA will align the field
on a boundary that is a multiple of the array
element's size. The maximum boundary HLA will
round any field to is a multiple of 4096 bytes.

Note that if you set the record alignment using
this syntactical form, any ALIGN directive you
supply in the record may not produce the desired
results. When HLA sees an ALIGN directive in a
record that is using field alignment, HLA will
first align the current offset to the value
specified by ALIGN and then align the next
field's offset to the global record align value.

Nested record declarations may specify a different
alignment value than the enclosing record, e.g.,

alignedRecord4: record[4]
    a: byte;
    b: word;
    c: record[8]
        d: byte;
        e: byte;
    endrecord;
    f: byte;
    g: dword;
endrecord;
In this example, HLA aligns fields a, b, f, and g
on dword boundaries, and it aligns d and e (within
c) on eight-byte boundaries. Note that the
alignment of the fields in the nested record is
true only within that nested record. That is, if
c turns out to be aligned on some boundary other
than an eight-byte boundary, then d and e will
not actually be on eight-byte boundaries; they
will, however be on eight-byte boundaries
relative to the start of c.

In addition to letting you specify a fixed
alignment value, HLA also lets you specify a
minimum and maximum alignment value for a record.
The syntax for this is the following:

recordname: record[ maximum : minimum ]

    << fields >>

endrecord;

Whenever you specify a maximum and minimum value
as above, HLA will align all fields on a boundary
that is at least the minimum alignment value.
However, if the object's size is greater than the
minimum value but less than or equal to the
maximum value, then HLA will align that
particular field on a boundary that is a multiple
of the object's size. If the object's size is
greater than the maximum size, then HLA will align
the object on a boundary that is a multiple of
the maximum size. As an example, consider the
following record:


r: record[ 4:1 ]
    a: byte;     // offset 0
    b: word;     // offset 2
    c: byte;     // offset 4
    d: dword[2]; // offset 8
    e: byte;     // offset 16
    f: byte;     // offset 17
    g: qword;    // offset 20
endrecord;

Note that HLA aligns g on a dword boundary (not a
qword boundary, which would put it at offset 24)
since the maximum alignment size is four. Note that
since the minimum size is one, HLA allows the f
field to be aligned on an odd boundary (since it's
a byte).

If an array, record, or union field appears within
a record, then HLA uses the size of an array
element or the largest field of the record or
union to determine the alignment size. That is,
HLA will align the field within the outermost
record on a boundary that is compatible with the
size of the largest element of the nested array,
union, or record.

HLA's sophisticated record alignment facilities let
you specify record field alignments that match
those used by most major high-level language
compilers. This lets you easily access data types
used in those HLLs without resorting to inserting
lots of ALIGN directives inside the record.

Using Records/Structures in Assembly

In the "good old days" assembly language
programmers typically ignored records. Records
and structures were treated as unwanted
stepchildren from high-level languages, that
weren't necessary in "real" assembly language
programs. Manually counting offsets and
hand-coding literal constant offsets from a base
address was the way "real" programmers wrote code
in early PC applications. Unfortunately for
those "real programmers", the advent of
sophisticated operating systems like Windows and
Linux put an end to that nonsense. Today, it is
very difficult to avoid using records in modern
applications because too many API functions
require their use. If you look at typical Windows
and Linux include files for C or assembly
language, you'll find hundreds of different
structure declarations, many of which have
dozens of different members. Attempting to keep
track of all the field offsets in all of these
structures is out of the question. Worse,
between various releases of an operating system
(e.g., Linux), some structures have been known to
change, thus exacerbating the problem. Today, it's
unreasonable to expect an assembly language
programmer to manually track such offsets - most
programmers have the reasonable expectation that
the assembler will provide this facility for
them.
Implementing Structures in an Assembler: Part I

Unfortunately, properly implementing structures in
an assembler takes considerable effort. A large
number of the "hobby" (i.e., non-commercial)
assemblers were not designed from the start to
support sophisticated features such as
records/structures. The symbol table management
routines in most assemblers use a "flat" layout,
with all of the symbols appearing at the same
level in the symbol table database. To properly
support structures or records, you need a
hierarchical structure in your symbol table
database. The bad news is that it's quite
difficult to retrofit a hierarchical structure
over the top of a flat database (i.e., the typical
"hobby assembler" symbol table). Therefore,
unless the assembler was originally designed to
handle structures properly, the result is usually a
major hacked-up kludge.

Four assemblers that I'm aware of (MASM, TASM,
OPTASM, and HLA) handle structures well. Most other
assemblers are still trying to simulate
structures using a flat symbol table database,
with varying results.

Probably the first attempt people make at records,
when their assembler doesn't support them
properly, is to create a list of constant
symbols that specify the offsets into the record.
Returning to our first example (in HLA):


type
    Student: record
        Name: string;
        Major: int16;
        SSN: char[12];
        Midterm1: int16;
        Midterm2: int16;
        Final: int16;
        Homework: int16;
        Projects: int16;
    endrecord;

One first attempt might be the following:

Name := 0;
Major := 4;
SSN := 6;
Midterm1 := 18;
Midterm2 := 20;
Final := 22;
Homework := 24;
Projects := 26;
size_student := 28;

With such a set of declarations, you could reserve
space for a student "record" by reserving
"size_student" bytes of storage (which almost all
assemblers handle okay) and then you can access
fields of the record by adding the constant offset
to your base address, e.g.,


John : byte[ size_student ];
mov( John[Midterm1], ax );

There are several problems with this approach.
First of all, the field names are global and must
be globally unique. That is, you cannot have two
record types that have the same fieldname (as is
possible when the assembler supports true
records). The second problem, which is
fundamentally more problematic, is the fact that
you can attach these constant offsets to any
object, not just a "student record" type object.
For example, suppose "ClassAverage" is an array
of words, there is nothing stopping you from
writing the following when using constant equate
values to simulate record offsets:

mov( ClassAverage[ Midterm1 ], ax );

Finally, and probably the most damning criticism
of this approach, is that it is very difficult to
maintain code that accesses structures in this
manner. Inserting fields into the middle of a
record, changing data types, and coming up with
globally unique names can create all sorts of
problems. In particular, a change to the record in
the middle of the record generally requires
changing all the following "equates" (constant
definitions) to allow for the insertion, deletion,
or other modification.

As noted earlier in this article, you *can* reduce
the maintenance issues somewhat by defining your
constants in terms of one another. E.g.,

Name := 0;
Major := Name+sizeOfString;
SSN := Major+sizeOfInt16;
Midterm1 := SSN+12*sizeOfChar;
Midterm2 := Midterm1+sizeOfInt16;
Final := Midterm2+sizeOfInt16;
Homework := Final+sizeOfInt16;
Projects := Homework+sizeOfInt16;

sizeOfStudent := Projects+sizeOfInt16;

Now when you insert, delete, or change a field
definition, the offsets "percolate" through the
remainder of the definition. However, you *do* have
to adjust the definition of the object following
the new insertion, deletion, or other modification.
IOW, modifications to this "structure" scheme have
dependencies. Adding (or otherwise modifying a
field) requires that you change *other* fields in
addition to the one you're modifying. This is not a
good thing if you want easy to maintain code.
Maintainable code allows you to make only the
changes desired and the rest of the code adjusts
appropriately, without having to change unrelated
lines of code as well.

One other problem with this approach is that it is
difficult to read. Consider the statement:

Final := Midterm2+sizeOfInt16;

If you look closely, you discover that nothing in
this statement tells you anything about what Final
is, other than that its value (that is, its offset
within the structure) is equal to Midterm2's value
plus the value "sizeOfInt16" (which is presumably
two, but no guarantees on that). In particular,
this statement doesn't tell you *anything* about
Final's type or size. To glean that information you
have to look at a different, unrelated statement:

Homework := Final+sizeOfInt16;

It is the declaration of "Homework" where we learn
that Final's size is two bytes. Sure, that's going
to be on the next line (usually, though that's not
required), but the bottom line is that the
statement declaring Final's value should contain
this information; you shouldn't have to look
elsewhere for it.

Many high-level language programmers
who've tried to learn assembly language have given
up after discovering that they had to maintain
records in this fashion in an assembly language
program (too bad they didn't start off with a
reasonable assembler that properly supports
structures).
Types and Sizes (a quick digression)

One big problem you find with some assemblers is
that they don't store any type information in the
symbol table along with the symbols. The argument
many assembler authors provide is that "type
checking doesn't belong in assembly language." This
is complete nonsense. All reasonable assemblers do
*some* type checking. For example, every assembler
I've seen will report an error if you attempt to do
something like the following:

mov( 123456, al );

The literal constant "123456" is too big for an
eight-bit register. That is, the *type* of this
constant is *not* byte (it requires at least an
unsigned 17-bit integer).

Similarly, instructions like the following are also
(generally) rejected by assemblers:

mov( 1.2345, eax );

Even if the floating-point constant "1.2345" does
fit in 32 bits, it generally isn't appropriate to
load a floating-point constant into an integer
register. Few assemblers will allow this. And most
of the ones that do (e.g., HLA), require that you
explicitly *state* that you really mean to move the
32-bit bit pattern corresponding to the
floating-point value 1.2345 into a 32-bit integer
register. For interested parties, here's how you
do this in HLA:

mov( @dword( real32(1.2345)), eax );

So the argument that "type checking doesn't belong
in assembly language" is a non-starter right from
the beginning. Of *course* assembly language has to
check operand types in order to verify correct
machine code generation.

The most *fundamental* type of all, the one that
most assemblers deal with, is the *size* of an
operand. A typical late-model Intel CPU, for
example, works with 8-, 16-, 32-, 64-, 80-, and
128-bit data types. Generally, the two operands of
any instruction must all be the same size (with
obvious exceptions for the "extension" instructions
and the like). The following examples all
demonstrate typical type mismatches that most
assemblers will report:

mov( al, ax );
mov( 12345, al );
mov( st0, eax );
mov( mm0, st0 );
mov( xmm0, al );

I cannot imagine any assembly language programmer
questioning an assembler that reports an error for
one of the above statements. So why, then, do
programmers make ridiculous statements like: "type
checking doesn't belong in assembly language."

The problems begin when you start considering
memory accesses. As noted earlier in this section,
most assemblers maintain little information in the
symbol table beyond the symbol's name and a numeric
or textual value associated with that name. In
particular, a typical "low-level" assembler might
take a declaration like the following:

someVar db ?

and keep only the string "someVar" and the
offset to the storage associated with this name in
the symbol table. In particular, these types of
assemblers generally do *not* store information
that tells the assembler that "someVar" was
declared as a byte object. Therefore, when an
assembler sees statements like the following:

mov( someVar, al );
mov( someVar, ax );
mov( someVar, eax );
fld( someVar );
mov( someVar, mm0 );
mov( someVar, xmm0 );

the assembler has no information available to tell
it whether these statements are correct.
So the assembler just blindly assumes that all
these statements are correct; after all, the
assembler can encode the offset to "someVar" into
the machine instruction, so it must be a legal
instruction, right?

This is where some "less than sophisticated"
assembly programmers start arguing about how great
it is that a particular assembler doesn't do type
checking. Why, they can access these objects any
way they please "without the assembler getting in their
way." But does it really make sense to load the
way." But does it really make sense to load the
contents of a byte variable into an 80-bit floating
point register (or worse yet, store an 80-bit
floating-point value into a byte variable)? Most of
the time the answer is "absolutely not." The vast
majority of the time this is a programming error
and it would be nice if the assembler reported
the error.
Before you start hollering and hooting about how
"this is assembly language, I should be allowed to
do whatever I want" I would point out that the
presence of type checking in an assembler in no way
prevents you from storing an 80-bit floating-point
variable starting at the address of a byte memory
location, if this is what you really want to do.
The only issue is that you must *explicitly* tell
the assembler that you're bending the rules, but
you know what you're doing and the assembler should
let you do it. In HLA, for example, you could
write:
fstp( (type real80 someVar) );

and this would tell HLA that "Yes, I know that
someVar isn't a real80 variable, but I want you to
treat it as though it was such a variable."

The old-time die-hard assembly programmer might
argue "but this is a bunch of red tape that gets in
the way of my code." This is absolute nonsense. If
you're writing *good* code, you generally declare
your variables to be the appropriate type and
access those variables using instructions
appropriate for that type. Type casting or coercion
should be the rare exception. If you're constantly
casting an object as some other type, perhaps you
need to consider the design of your application
rather than blaming the assembler for constantly
warning you about how you're breaking the rules.
Though type casting (implicit or explicit) probably
occurs more often in assembly language than in
other languages, it still should be a rare event.
That being the case, the argument about how type
casting "gets in my way" goes right out the window.

In fact, few assemblers do any sort of
sophisticated type checking above and beyond making
sure the size of the operands agree. For example,
HLA is probably the strongest typed "traditional"
x86 assembler out there, yet it allows you to do
something like this:


static
    fvar: real32;

    mov( fvar, eax );

The sizes of the two operands agree, so HLA accepts
this instruction, even though (technically) it
should reject the statement because fvar is a
floating-point variable and EAX is an integer
register. (This was actually a hard design
decision to make in HLA, I might point out.
Ultimately, I chose to let this slide because it is
often the case that you want to work on a
floating-point value as a string of bits, and the
EAX register [or any general-purpose register] is
what you would use to operate on a bit string; that
is, EAX doesn't simply hold integers, it also holds
bit strings and floating-point values are
legitimate bit strings; I could go into a
long-winded discussion about why HLA will accept
"mov( fvar, eax);" but won't accept "fld(ivar);",
but that discussion will have to wait for another
time.)
Again, size is the most fundamental piece of type
information we can associate with a symbolic name
in an assembly language program.

You may recall in the earlier discussion of record
simulation via equates an example like:

Final := Midterm2+sizeOfInt16;

As noted earlier, this statement doesn't tell us
what the size of Final is, we have to look at the
next statement to figure this out:

Homework := Final+sizeOfInt16;

And after all of this, we still can't ask the
assembler "what is the size of Final?" That's
because there is no type information associated
with the "Final" field when implementing structures
in this manner (actually, "Final" is just an
assembly time equate, so symbol table information
associated with "Final" really has no bearing on
the information associated with the actual Final
field in the resulting record). As a result, it is
up to the programmer to manually maintain size
information for use when manipulating "Final" in
their program.

One common manipulation of a record field is to
obtain its size for use in some calculation. As you
can see in this simple example, Final's size is
equal to the constant "sizeOfInt16" (which,
presumably, is two bytes). There is nothing magic
about the symbol "sizeOfInt16", it's just another
constant declaration appearing elsewhere in the
program. Something the programmer had to create and
has to manually maintain. In any case, whenever
the programmer wants to write some code that
utilizes the size of "Final", they'd just use the
symbol "sizeOfInt16" in their code, e.g.,

mov( sizeOfInt16, eax );

But here we see one of the *huge* pitfalls to this
scheme -- the code is less maintainable when you do
this. Suppose, for example, you need to change
Final so that it is a 32-bit integer, or an 8-bit
integer, or some other type that doesn't fit in two
bytes. Now what? Well, all of a sudden, all the
code you've written like "mov( sizeOfInt16, eax );"
is now broken. And a simple "search and replace"
isn't going to do the job for you, because it's
likely that *most* of the "sizeOfInt16" references
*don't* have anything to do with "Final". You've
got to manually search through your code and
determine which occurrences of "sizeOfInt16" apply
to "Final" and which do not. What you've got is a
maintenance nightmare.

Somewhat more experienced assembly language
programmers will note that they can create
individual size constants for each record member,

NameField := 0;
sizeOfNameField := sizeOfStringPtr;

Major := NameField+sizeOfNameField;
sizeOfMajor := sizeOfInt16;

SSN := Major+sizeOfMajor;
sizeOfSSN := 12*sizeOfChar;

Midterm1 := SSN+sizeOfSSN;
sizeOfMidterm1 := sizeOfInt16;

Midterm2 := Midterm1+sizeOfMidterm1;
sizeOfMidterm2 := sizeOfInt16;

Final := Midterm2+sizeOfMidterm2;
sizeOfFinal := sizeOfInt16;

Homework := Final+sizeOfFinal;
sizeOfHomework := sizeOfInt16;

Projects := Homework+sizeOfHomework;
sizeOfProjects := sizeOfInt16;

sizeOfStudent := Projects+sizeOfProjects;

Now, if the programmer uses the constant
"sizeOfFinal" everywhere they need the size of the
final exam's data type, they've only got to make a
single change to their program if they decide to
change Final's type. So the result is quite a bit
more maintainable than the earlier version, but
it's still far from ideal. Indeed, we've just
doubled the number of statements needed to declare
our structure (making it less readable and
maintainable) and we still haven't solved the
problem of having to change multiple lines when we
modify the record's layout (indeed, inserting and
deleting fields in the record now require *more*
work than before). The "Holy Grail" we're searching
for here, of course, is to be able to change only
*one* line in the declaration when we want to make
a modification to one field of the record.

The big problem with this current example is that
the assembler (requiring you to code records this
way) doesn't store away any form of size
information in the symbol table, so it cannot
provide that information back to the programmer.
This forces the programmer to *manually* maintain
this size information themselves.

Fortunately, some of the better assemblers, e.g.,
FASM, MASM, TASM, and HLA, *do* maintain a little
more information in the symbol table other than the
symbol's name and some numeric value (like the
variable's offset or the constant's value). In
particular, these assemblers store away the size of
a declared object (if appropriate) and provide a
"compile-time function" that lets you determine the
size of that object. For example, consider the
following pseudo-record "by equates" declaration in
HLA (*):

// Field offsets for a "point3D" record:

x :dword := 0;
y :dword := x + @size(x);
z :dword := y + @size(y);

(*) This is a specially-designed example that just
happens to work for this one special case. This
idea does not easily generalize to other
pseudo-record types.

The "@size" compile-time function in HLA (similar
functions are available in assemblers like FASM,
MASM, TASM, and OptASM) returns the size, in bytes,
of the operand you pass it. As I've declared the x,
y, and z constants in this example to be dwords,
the @size function will return four when applied to
these three names. Therefore, throughout my program
I can use constructs like "@size(z)" and the
assembler will automatically substitute the size of
z, in bytes, in place of the compile-time function.
Therefore, if I decide to use word or byte values,
rather than double-word values, the program
automatically adjusts, e.g.,

// Field offsets for a "point3D" record:

x :word := 0;
y :word := x + @size(x);
z :word := y + @size(y);

Recompiling the program sets the offsets to 0, 2,
and 4, and also automatically updates all the
occurrences of @size(z) to be two, rather than
four. So by providing a function that computes the
size of an object at compile-time, it's much easier
to create maintainable, readable, code. Of course,
we still haven't reached the "Holy Grail" with this
example, as you have to modify two lines when you
insert or delete a field in the middle of the
record, but things *are* getting better.

Implementing Structures in an Assembler, Part II:
Using Macros

Manually maintaining all the constant offsets is a
maintenance nightmare. So somewhere along the
way, some assembly language programmers figured
out that they could write macros to handle the
declaration of constant offsets for them. For
example, here's how you could do this in an HLA
program:

program t;

// struct-
// Declares a "structure".
// Syntax:
// struct
// (
// structName,
// field1:type1,
// field2:type2,
// .
// .
// .
// fieldn:typen
// );
// Creates a "type declaration" that will
// reserve sufficient storage for the record
// and also creates a set of fieldname constants
// initialized with the offsets of each of these
// fields.
// Usage: see example immediately following.

#macro struct( _structName_, _dcls_[] ):
_dcl_, _id_, _type_, _colon_, _offset_;

// _offset is the current field offset we're
// going to use. Initialize it with zero for
// the start of the struct.

?_offset_ := 0;

// _dcl_ is going to be a string with the
// current declaration we're processing (from
// the _dcls_ array of strings, corresponding
// to a variable parameter list passed to this
// macro):

#for( _dcl_ in _dcls_ )

// Declarations take the form
// fieldName : typename
// The following statement locates the
// position of the ":" in the _dcl_ string.

?_colon_ := @index( _dcl_ , 0, ":" );
#if( _colon_ = -1 )

// If we didn't find a ":", then we've
// got a syntax error. Report it.

#error
(
    "Expected <id>:<type> in struct " +
    "definition, encountered: " + _dcl_
)

#else

// Okay, now extract the field name
// (which is all the text before the
// colon) and the type name (which is
// all the text after the colon):

?_id_ := @substr( _dcl_, 0, _colon_ );
?_type_ :=
    @substr
    (
        _dcl_,
        _colon_ + 1,
        @length( _dcl_ ) - _colon_ - 1
    );

// Emit the fieldName as a constant
// that is initialized to the current
// offset we're computing.

?@text( _id_ ) := _offset_;

// Adjust the current offset beyond the
// length of the current field:

?_offset_ :=
    _offset_ + @size( @text( _type_ ));

#endif

#endfor

// Create a string that we can use to allocate
// storage for this "struct" type. That is,
// the string expands to the syntax that
// creates an HLA array of bytes, with a
// sufficient number of bytes for this
// struct:

?_structName_:text :=
    "byte[" + @string( _offset_ ) + "]";

#endmacro

// Declare a struct "threeItems" with the
// three fields: i, j, k:

struct( threeItems, i:byte, j:word, k:dword )

// Create a "threeItems" variable and allocate
// storage for it in the static section:

aStruct: threeItems;

begin t;

// To access the fields, we must index into
// the aStruct data structure. Note that we
// also have to cast each of the fields as
// HLA will complain if the types mismatch
// (this wouldn't be a problem if the assembler
// didn't support type checking at all, so
// don't get too excited by the extra type
// casting taking place; of course, it goes
// without saying that this *isn't* the way
// you'd do this in HLA, anyway.)

mov( (type byte aStruct[i]), al );
mov( (type word aStruct[j]), ax );
mov( (type dword aStruct[k]), eax );

end t;

The "struct" macro expects a set of valid HLA
variable declarations supplied as macro
arguments. It generates a set of constants using
the supplied variable names whose offsets are
adjusted according to the size of the objects
previously appearing in the list. In this
example, HLA creates the following equates:

i = 0
j = 1
k = 3

This declaration also creates a "data type" named
"threeItems" which is equivalent to "byte[7]"
(since there are seven bytes in this record) that
you may use to create variables of type
"threeItems", as is done in this example.

Currently, the "macro" approach is the one used by
assemblers such as NASM. From the NASM
manual:
4.8.5 STRUC and ENDSTRUC: Declaring Structure Data Types

The core of NASM contains no intrinsic means of
defining data structures; instead, the preprocessor
is sufficiently powerful that data structures can
be implemented as a set of macros. The macros STRUC
and ENDSTRUC are used to define a structure data
type.

STRUC takes one parameter, which is the name of the
data type. This name is defined as a symbol with
the value zero, and also has the suffix _size
appended to it and is then defined as an EQU giving
the size of the structure. Once STRUC has been
issued, you are defining the structure, and should
define fields using the RESB family of
pseudo-instructions, and then invoke ENDSTRUC to
finish the definition.

For example, to define a structure called mytype
containing a longword, a word, a byte and a string
of bytes, you might code

struc mytype

mt_long: resd 1
mt_word: resw 1
mt_byte: resb 1
mt_str: resb 32

endstruc

The above code defines six symbols: mt_long as 0
(the offset from the beginning of a mytype
structure to the longword field), mt_word as 4,
mt_byte as 6, mt_str as 7, mytype_size as 39, and
mytype itself as zero.

[rlh: note the use of a naming convention to
overcome the global namespace problem; there is a
solution to this using NASM local symbols, read on.]

The reason why the structure type name is defined
at zero is a side effect of allowing structures to
work with the local label mechanism: if your
structure members tend to have the same names in
more than one structure, you can define the above
structure like this:

struc mytype

.long: resd 1
.word: resw 1
.byte: resb 1
.str: resb 32

endstruc

This defines the offsets to the structure fields as
mytype.long, mytype.word, mytype.byte, and
mytype.str.
NASM, since it has no intrinsic structure support,
does not support any form of period notation to
refer to the elements of a structure once you have
one (except the above local-label notation), so
code such as mov ax,[mystruc.mt_word] is not valid.
mt_word is a constant just like any other constant,
so the correct syntax is mov ax,[mystruc+mt_word]
or mov ax,[mystruc+mytype.word].


As you can see, NASM's macro implementation has all the
typical problems I've mentioned with the macro
implementation, particularly with respect to
namespace pollution. This is not to pick on NASM
unfairly, *all* macro implementations of records
are going to have similar problems. Fancier macros
may solve some of them, but you're not going to get
a complete solution.

Creating structures with macros solves one of the
three major problems: it makes it easier to
maintain the constant equates list, as you do not
have to manually adjust all the constants when
inserting and removing fields in a record. That
is, we've achieved the "Holy Grail" of struct field
maintenance -- if we want to change, insert, or
delete only one field, we only have to manipulate
that one field.

Unfortunately, there are other maintenance problems
associated with the "records as offsets" approach,
and this macro does not address those problems.

The Global Namespace Problem

One huge problem with the macro implementation of
the previous section is that it does not help with
the problem of "global namespace pollution". In
true record implementations, the field names are
*local* to that record. In the "records fields are
just constant offsets" scheme described thus far,
each and every field name is a global constant that
must be unique in the name space. True record
declarations (similar to those in HLLs) allow you
to create record types like the following:

type
    record1: record
        field1: byte;
        field2: word;
        field3: dword;
    endrecord;

    record2: record
        field3: byte;    // same name as in record1
        field4: dword;
    endrecord;

There is never any ambiguity between the field
names in record1 or record2. The assembler/compiler
is smart enough to maintain a separate list of
identifers for each of these two records, so when
the assembler sees something like "rec2Var.field3"
it knows to use the second definition of field3 in
record2 (assuming "rec2Var" is of type record2).

With the macro given in the previous section, you'd
get a "duplicate symbol" error when you attempted to
declare record2, i.e.,

struct( record1, field1:byte, field2:word, field3:dword )
struct( record2, field3:byte, field4:dword )  // error: field3 redefined

One solution that enterprising assembly programmers
have come up with to solve (well, reduce the impact
of) the global namespace problem is to adopt a
naming convention that reduces the possibility of
name conflicts. One such naming convention is to
rename all the fields using names like
"recname_field1", "recname_field2", etc. For
example, we could rewrite the above structs as:

struct( record1, record1_field1:byte,
        record1_field2:word, record1_field3:dword )
struct( record2, record2_field3:byte,
        record2_field4:dword )

Unfortunately, this has the tendency to create some
very unwieldy names, particularly as you begin
creating advanced data structures involving nested
records and the like. Though "record1_field1" may
not seem so bad by itself, keep in mind that this
is just the offset, not the whole name. That is,
you've still got to apply the variable name to the
whole thing, too. Now, if you can imagine a record
that has a nested record type, and that nested
record type also has a nested record type, you wind
up writing code like this:

mov( structVar[record1_i_record2_j_record3_k], al );

rather than the standard record syntax (which is a
bit easier to read):

mov( structVar.i.j.k, al );

An even worse problem with the "records as constant
offsets" approach is that it doesn't prevent you
from applying arbitrary offsets to *any* record.
Consider the following declarations which are legal
(no duplicate symbol errors):

struct( record1, field1:byte, field2:word, field3:dword )
struct( record2, field4:qword, field5:dword )

static
    r1: record1;

Now suppose you do this:

mov( eax, r1[field5] );

There is no error. Yet r1 doesn't have a "field5"
field associated with it, and the offset associated
with field5 takes you well beyond the allocated
storage for r1. IOW, the results are generally
undefined when this happens. Yet the assembler
gladly accepts this without complaint and emits
code that (most likely) is not going to produce
desirable results.

Fancier macros could be written, macros that
generate identifiers like "objectname_fieldName"
that help solve the globally unique problem.

However, this introduces some problems of its own.
Suppose I write a "struct" macro similar to the one
I gave earlier that only generates fieldnames and
constructs them by appending the structure name to
each of the field names. Now consider the following
two structure declarations:

struct( record1, field1:byte, field2:word, field3:dword )
struct( record1, field4:byte, field5:word, field6:dword )

Having two structs like this with the same name
could be perfectly allowable by the assembler,
because the actual names these macros produce are

record1_field1, record1_field2, record1_field3,
record1_field4, record1_field5, record1_field6

and all of these names are globally unique. Note
that, effectively, what I've done is create a
*union* of two record types. That's because the
offsets for the fields field1, field2, and field3
overlap those of field4, field5, and field6 in
these records. If I create two variables, r1 and
r2, that hold the record1 (#1) and record1 (#2)
fields, respectively, I can get into a lot of
trouble if I start writing code like:

mov( al, r2[ record1_field1] ); //Whoops!


mov( eax, r1[record1_field4] ); //Whoops!

There is no checking to see that r2 actually
has a "record1_field1" associated with it, nor can
the assembler check to see that r1 has a
"record1_field4" associated with it.

IOW, you can mix and match fields from one type
declaration in variables that aren't of that type.
What a recipe for a disaster!

The bottom line is that naming conventions don't
really solve the problem. A naming convention
doesn't prevent naming conflicts between a local
record fieldname and some global symbol (what's to
prevent you from defining your own "record1_field1"
symbol, for example?). Worse, the use of naming
conventions does *not* prevent people from
"inserting" fields into your structure. For
example, if you've got a macro that defines the
following symbols that correspond to fields of a
"record1" structure:

record1_field1, record1_field2, record1_field3

there's nothing stopping someone from creating a
new symbol, "record1_field4" that has nothing to do
with your record, but sure *looks* like it's a
field of "record1". Namespace pollution works both
ways. Not only can a record pollute the global
namespace, but the global namespace can pollute the
record namespace as well. All of this leads to a
bunch of maintenance and readability problems.

These hacks really begin to fail when you attempt
to declare nested records, arrays within records,
and arrays of records (possibly containing nested
records and arrays of records). Trying to keep
track of all the possibilities is an open invitation
for the introduction of defects into your code. (As
the author of one low-end assembler puts it, "why
would anyone do this?" [that is, create such
advanced structures]. With an assembler that
doesn't support true records, I'd have to ask the
same question!)


It is amusing to watch the evolution of various
assemblers for the x86 processor family, with
respect to record support. MASM is typical of the
progression that assemblers go through:

1) When the assembler is first introduced, there is
no support for records at all; users who want to
utilize records have to create symbolic constants
to provide the offset into their records.

2) As users begin to complain about how nice it
would be to have records, the assembler's author(s)
discover that it's a *lot* of work to do structures
properly, so they offer a stop-gap macro
implementation of structures to allow people to get
some work done without having to make any serious
changes to the assembler itself. These macros are
not unlike the examples I gave earlier (btw, I
don't actually recall Microsoft going through this
phase, but it is a phase that assemblers like NASM
and FASM have or are going through).

3) The problem with the macro implementation is
that all you really get are offsets. No type
information is really associated with those
offsets, making the use of such record
implementations very weak. The next step is a
scheme I call the "pseudo-struct" implementation
(we're going to discuss this mechanism shortly).

4) As the shortcomings of the pseudo-struct
approach become ever more apparent, the assembler's
author(s) bite the bullet and get around to adding
true record support to their product.

In the case of MASM, the first versions (IIRC) did
not offer any struct support at all. Somewhere
around MASM v4 or v5, Microsoft added struct
support in a form I call "pseudo-structs". This was
4-7 years after the original introduction of the
assembler. In version 6.0, nearly 10 years after
MASM's introduction, Microsoft finally added *true*
struct support.

So exactly what is a "pseudo-struct"? Well, a
pseudo-struct is a structure declaration, not
unlike the "struct" macro I gave earlier in this
article, that associates a set of offsets with a
set of names. But beyond that, it also associates
type information (e.g., object size). The
declaration would be a "true" record declaration
except for the fact that names are not local to the
particular record (that is, the global namespace
pollution problem still exists). Various
assemblers address this problem in different ways,
but it usually involves some sort of naming
convention.

The struc facilities in MASM v5.x were a good
example of "pseudo-structs". You could declare
actual record type declarations, create variables
of those types, and access the fields of those
variable using record-like "dot notation", but the
field names were still global and had to be
globally unique. Ultimately, of course, Microsoft
bit the bullet and expanded their symbol table data
structure to support true local symbols in a
structure declaration (in MASM v6, at which point
they supported true records).

Type Declarations vs. Variable Instantiation

Instantiation is the process of binding a block of
storage, and possibly a value, to some sort of
object (I use this term in the generic sense, not
in the OOP sense, here). Record instantiation is
yet another area where true record/struct
facilities in an assembler win big over other
approaches.

Allocation of storage is, perhaps, the *one* thing
that most attempts at records get right. After all,
to allocate storage all you really need to know is
the size of the object, and then you can allocate
a block of bytes of that size. Even the "records at
constant offsets" implementation can do this.

But ultimately, if all you have is a name that
begins a block of allocated storage, you can get
into a lot of trouble. E.g.,

mov( record1[someField], al );

How would the assembler ever notify you if
"someField" is not actually a field of record1? It
can't. The symbol "someField" is just a numeric
constant and the assembler happily obliges your
request to access the byte at offset "someField"
from the base address of record1. This problem also
afflicts the common "macro implementations of
structs" like the one that NASM uses. Note that
this is not an example of the "evil type checking"
that so many naive assembly programmers complain
about. This has nothing to do with type checking.
No types are involved here. What this has to do with
is verifying that "someField" actually *is* a field
of "record1". This is one of the more important
benefits of record instantiation -- not only do you
get the storage you ask for, but you get some
built-in sanity checking for the record object as
well.

Along with storage allocation, providing an
(optional) initial value for a record variable
declaration is also a part of instantiation. This
is one area where type checking turns out to
provide an *extremely* valuable service. First,
let's consider how you'd initialize a record object
using NASM's istruct, at, and iend macros. From the
NASM manual:

4.8.6 ISTRUC, AT and IEND: Declaring Instances of Structures

Having defined a structure type, the next thing you
typically want to do is to declare instances of
that structure in your data segment. NASM provides
an easy way to do this in the ISTRUC mechanism. To
declare a structure of type mytype in a program,
you code something like this:

istruc mytype

at mt_long, dd 123456
at mt_word, dw 1024
at mt_byte, db 'x'
at mt_str, db 'hello, world', 13, 10, 0

iend

The function of the AT macro is to make use of the
TIMES prefix to advance the assembly position to
the correct point for the specified structure
field, and then to declare the specified data.
Therefore the structure fields must be declared in
the same order as they were specified in the
structure definition.

If the data to go in a structure field requires
more than one source line to specify, the remaining
source lines can easily come after the AT line. For
example:

at mt_str, db 123,134,145,156,167,178,189
db 190,100,0

Depending on personal taste, you can also omit the
code part of the AT line completely, and start the
structure field on the next line:

at mt_str
db 'hello, world'
db 13,10,0


This all looks well and good. But it hides an
important pitfall -- symbols like mt_long and
mt_word are only numeric offsets. No type
information is associated with these names. So it
is syntactically legal to do something like the
following:

istruc mytype

at mt_long, db 123
at mt_word, db 124
at mt_byte, dd 12345
at mt_str, dw 1024

iend

NASM will happily inject these byte, dword, and
word values at the offsets you specify, even though
the actual data types do *not* correspond to the
values in your struct declaration.

Also consider the following:

istruc mytype

at mt_word, dw 1024, 0
at mt_long, dd 123456,7890
at mt_byte, db 123
at mt_str, db "Hello World", 13, 10, 0

iend

What does this generate? What does the assembler
report? Why isn't it notifying you that you've
messed up and the data definitions don't match the
original declaration?

The argument that "well, this is assembly language
and the programmer *should* know what they're
doing" is a ridiculous response. Nobody is perfect
and everybody makes mistakes. It's nice when the
assembler/compiler can help notify you when you've
made some mistakes, rather than emitting bad code
or data without any indication of the problem.

A macro implementation of records need not suffer
from the problem we're seeing with NASM here.
Consider the definition of FASM structs (from the
FASM documentation):

2.3.4 Structures

struc directive is a special variant of macro
directive that is used to define data structures.
Macroinstruction defined using the struc directive
must be preceded by a label (like the data
definition directive) when it's used. This label
will be also attached at the beginning of every
name starting with a dot in the contents of the
macroinstruction.

From: hutch-- on
I am much of the view that an assembler that does not have native
structure and union support is a toothless terror for writing hacky
little demos with. While most assembler programmers know how to
dereference an address and add offsets to it for each structure member,
it falls well short of being able to predefine a structure with or
without embedded unions and nested structures.

A perfect example is the structures used for a PE header and the
section data. While you can do it the slow, hard, and unreliable way, you
write far cleaner and more reliable code when you have structures and
unions available.

The only reason why these toothless terrors are still with us is
structure and union support is hard to write.


hutch at movsd dot com

From: Annie on

On 2005-11-19 randyhyde(a) said:

> Hi All,
> I've read several posts concerning structures and their
> implementation in assembly language. Given some
> misconceptions about structures in assembly language,
> I pieced together the following article about structures
> in assembly.
> _____
> [ snip ] ((( `\
_ _`\ )
Ummm...could you put this in (^ ) )
PDF format, Randy? Hehe! ~-( )
,-' \_/ `\
( , |
\ / | |
=()=: / ,' aa

From: James Buchanan on
Annie wrote:
> _ _`\ )
> Ummm...could you put this in (^ ) )
> PDF format, Randy? Hehe! ~-( )
> _'((,,,)))

Another sophisticated and interesting contribution to the debate from Annie.
From: Betov on
"hutch--" <hutch(a)> écrivait news:1132461357.066607.288060

> I am much of the view that an assembler than does not have native
> structure and union support is a toothless terror

Said by our preferred Power Basic Programmer,
who redistributes illegally a weird Microsoft
C-Side Toy, in the absurd hope of damaging
the Open Source Movement.


