This documentation is not ready for prime time yet. Not even close. It's not so much documentation as random blathering of mine intended to be notes to myself that may eventually be turned into real documentation.
I take no responsibility for any negative effect it may have on your professional, personal, or spiritual life. Read it at your own risk. Caveat emptor. Delete before reading. Abandon all hope, ye who enter here.
However, enhancements will be gratefully accepted.
BFD_ASSEMBLER BFD, MANY_SECTIONS, BFD_HEADERS
... `local' symbols ... flags ...
The definition for struct symbol, also known as symbolS, is located in
`struc-symbol.h'. Symbol structures can contain the following fields:
sy_value
An expressionS that describes the value of the symbol. It might refer to
another symbol; if so, its true value may not be known until foo is called.
More generally, however, ... undefined? ... or an offset from the start of a
frag pointed to by the sy_frag field.
sy_resolved
sy_resolving
sy_used_in_reloc
sy_next
sy_previous
Pointers to other symbolS structures, which together describe a singly or
doubly linked list. (If SYMBOLS_NEED_BACKPOINTERS is not defined, the
sy_previous field will be omitted.) These fields should be accessed with
symbol_next and symbol_previous.
sy_frag
The fragS that this symbol is attached to.
sy_used
bsym
If BFD_ASSEMBLER is defined, this points to the asymbol that will be used in
writing the object file.
sy_name_offset
(Only if BFD_ASSEMBLER is not defined.) This is the position of the symbol's
name in the symbol table of the object file. On some formats, this will start
at position 4, with position 0 reserved for unnamed symbols. This field is not
used until write_object_file is called.
sy_symbol
(Only if BFD_ASSEMBLER is not defined.) This is the format-specific symbol
structure, as it would be written into the object file.
sy_number
(Only if BFD_ASSEMBLER is not defined.) This is a 24-bit symbol number, for use
in constructing relocation table entries.
sy_obj
A field of type OBJ_SYMFIELD_TYPE. If no macro by that name is defined in
`obj-format.h', this field is not defined.
sy_tc
A field of type TC_SYMFIELD_TYPE. If no macro by that name is defined in
`targ-cpu.h', this field is not defined.
TARGET_SYMBOL_FIELDS
OBJ_SYMFIELD_TYPE
and TC_SYMFIELD_TYPE
.
Access with S_SET_SEGMENT, S_SET_VALUE, S_GET_VALUE, S_GET_SEGMENT, etc., etc.
Expressions are stored as a combination of operator, symbols, blah.
The frag is the basic unit for storing section contents.
fr_address
relax_segment fills in this field.
fr_next
fr_fix
fr_var
fr_fix
characters. May be zero.
fr_symbol
fr_offset
fr_opcode
line
fr_type
Determines the interpretation of fr_offset, fr_symbol, and the variable-length
tail of the frag, as well as the treatment it gets in various phases of
processing. It does not affect the initial fr_fix characters; they are always
supposed to be output verbatim (fixups aside). See below for specific values
this field can have.
fr_subtype
If md_relax_frag isn't defined, this is assumed to be an index into
md_relax_table for the generic relaxation code to process. (See section
Relaxation.) If md_relax_frag is defined, this field is available for any use
by the CPU-specific code.
align_mask
align_offset
... .align directives are given; instead, the number of bytes needed may be
computable when the .align directive is processed. Hmm. Is this the right
place for these, or should they be in the frchainS structure?
fr_pcrel_adjust
fr_bsr
Because struct frag is defined before the CPU-specific header files are
included, these fields must unconditionally be defined.
fr_literal
These are the possible relaxation states, provided in the enumeration type
relax_stateT, and the interpretations they represent for the other fields:
rs_align
fr_offset is the logarithm (base 2) of the alignment in bytes. (For example,
if alignment on an 8-byte boundary were desired, fr_offset would have a value
of 3.) The variable characters indicate the fill pattern to be used. (More
than one?)
rs_broken_word
rs_fill
... fr_offset times. If fr_offset is 0, this frag has a length of fr_fix.
rs_machine_dependent
... fr_symbol and fr_offset, and fr_subtype indicates the particular
machine-specific addressing mode desired. See section Relaxation.
rs_org
... fr_symbol and fr_offset; one character from the variable-length tail is
used as the fill character.
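As a rough illustration of the rs_align convention (fr_offset holds the base-2 logarithm of the alignment), the following sketch computes how many padding bytes an alignment request implies. The helper name toy_align_padding is hypothetical and is not part of gas; it only shows the arithmetic.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not part of gas: how many padding bytes are needed to
   advance `address` to the next multiple of 2**log2_align.  The second
   argument plays the role that fr_offset plays in an rs_align frag.  */
static unsigned long
toy_align_padding (unsigned long address, unsigned int log2_align)
{
  unsigned long alignment, mask;

  /* The shift below is only defined for exponents smaller than the width.  */
  assert (log2_align < sizeof (unsigned long) * CHAR_BIT);

  alignment = 1UL << log2_align;   /* e.g. 3 -> 8-byte boundary */
  mask = alignment - 1;
  return (alignment - (address & mask)) & mask;
}
```

With an fr_offset of 3, an address of 13 would need 3 bytes of fill to reach 16, and an address that is already a multiple of 8 would need none.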
A chain of frags is built up for each subsection. The data structure
describing a chain is called a frchainS, and contains the following fields:
frch_root
frch_last
frch_next
... frchainS structures.
frch_seg
frch_subseg
fix_root, fix_tail
(Only if BFD_ASSEMBLER is defined.) Point to first and last fixS structures
associated with this subsection.
frch_obstack
A frchainS corresponds to a subsection; each section has a list of frchainS
records associated with it. In most cases, only one subsection of each section
is used, so the list will only be one element long, but any processing of frag
chains should be prepared to deal with multiple chains per section.
After the input files have been completely processed, and no more frags are to be generated, the frag chains are joined into one per section for further processing. After this point, it is safe to operate on one chain per section.
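The advice about handling multiple chains per section can be pictured with a minimal sketch. The structures below are cut-down stand-ins, not the real fragS and frchainS; they borrow only the field names fr_next, fr_fix, frch_root, and frch_next, and the meanings given in the comments are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for fragS and frchainS; only the fields used here.  */
struct toy_frag
{
  struct toy_frag *fr_next;   /* assumed: next frag in the same chain */
  long fr_fix;                /* assumed: size of the fixed part */
};

struct toy_frchain
{
  struct toy_frag *frch_root;      /* assumed: first frag of the chain */
  struct toy_frchain *frch_next;   /* assumed: next chain of the section */
};

/* Sum the fixed parts of every frag in every chain of one section.  The
   outer loop is the point: it does not assume a single chain.  */
static long
toy_fixed_bytes (const struct toy_frchain *chains)
{
  const struct toy_frchain *chain;
  const struct toy_frag *frag;
  long total = 0;

  for (chain = chains; chain != NULL; chain = chain->frch_next)
    for (frag = chain->frch_root; frag != NULL; frag = frag->fr_next)
      {
        assert (frag->fr_fix >= 0);
        total += frag->fr_fix;
      }
  return total;
}
```

Code written this way keeps working both before the chains are joined and afterwards, when only one chain per section remains.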
The "broken word" idea derives from the fact that some compilers, including
gcc, will sometimes emit switch tables specifying 16-bit .word displacements to
branch targets, and branch instructions that load entries from that table to
compute the target address. If this is done on a 32-bit machine, there is a
chance (at least with really large functions) that the displacement will not
fit in 16 bits. Thus the "broken word" idea is well named, since there is an
implied promise that the 16-bit field will in fact hold the specified
displacement.
If the "broken word" processing is enabled, and a situation like this is
encountered, the assembler will insert a jump instruction into the instruction
stream, close enough to be reached with the 16-bit displacement. This jump
instruction will transfer to the real desired target address. Thus, as long as
the .word value really is used as a displacement to compute an address to jump
to, the net effect will be correct (minus a very small efficiency cost). If
.word directives with label differences for values are used for other purposes,
however, things may not work properly. I think there is a command-line option
to turn on warnings when a broken word is discovered.
This code is turned off by the WORKING_DOT_WORD macro. It isn't needed if
.word emits a value large enough to contain an address (or, more correctly, any
possible difference between two addresses).
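The condition that makes a word "broken" is just a range check on a label difference. A minimal sketch follows; both helper names are hypothetical, and treating the 16-bit field as signed two's complement is an assumption, since the text does not say how the displacement is interpreted.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: does `value` fit in a signed two's-complement field
   that is `bits` wide?  */
static int
toy_fits_signed (long value, unsigned int bits)
{
  long max, min;

  assert (bits >= 1 && bits < sizeof (long) * CHAR_BIT);

  max = (1L << (bits - 1)) - 1;    /* 32767 for 16 bits */
  min = -max - 1;                  /* -32768 for 16 bits */
  return value >= min && value <= max;
}

/* Hypothetical helper: would a 16-bit .word holding target - base be
   "broken", i.e. unable to hold the promised displacement?  */
static int
toy_word_is_broken (long target, long base)
{
  return !toy_fits_signed (target - base, 16);
}
```

When the check reports a broken word, the real assembler's remedy is the nearby jump described above; the sketch stops at detecting the condition.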
Blah blah blah, initialization, argument parsing, file reading, whitespace munging, opcode parsing and lookup, operand parsing. Now it's time to write the output file.
In BFD_ASSEMBLER mode, processing of relocations and symbols and creation of
the output file is initiated by calling write_object_file.
(BFD_ASSEMBLER only.)
Is it okay to use this section's section-symbol in a relocation entry? If not,
a new internal-linkage symbol is generated and emitted if such a relocation
entry is needed. (Default: Always use a new symbol.)
(BFD_ASSEMBLER only.)
If this macro is defined, it is invoked just before setting the symbol table of
the output BFD. Any finalizing changes needed in the symbol table should be
done here. For example, in the COFF support, if there is no .file symbol
defined already, one is generated at this point. If no such adjustments are
needed, this macro need not be defined.
(BFD_ASSEMBLER only.)
Should section symbols be included in the symbol list if they're used in
relocations? Some formats can generate section-relative relocations, and thus
don't need symbols emitted for them. (Default: 1.)
... .file directive is seen, or a #line directive with a file name. Currently
it is defined only for COFF and ELF. (Default: No action.)
Currently some CPU support does not examine this value, and therefore does not bother setting it. Eventually, all CPU backend files should set it.
If md_relax_frag isn't defined, and TC_GENERIC_RELAX_TABLE is, the assembler
will perform some relaxation on rs_machine_dependent frags based on the frag
subtype and the displacement to some specified target address. The basic idea
is that many machines have different addressing modes for instructions that can
specify different ranges of values, with successive modes able to access wider
ranges, including the entirety of the previous range. Smaller ranges are
assumed to be more desirable (perhaps the instruction requires one word instead
of two or three); if this is not the case, don't describe the smaller-range,
inferior mode.
The fr_subtype field of a frag is an index into a CPU-specific relaxation
table. That table entry indicates the range of values that can be stored, the
number of bytes that will have to be added to the frag to accommodate the
addressing mode, and the index of the next entry to examine if the value to be
stored is outside the range accessible by the current addressing mode. The
fr_symbol field of the frag indicates what symbol is to be accessed; the
fr_offset field is added in.
If the fr_pcrel_adjust field is set, which currently should only happen for the
NS32k family, the TC_PCREL_ADJUST macro is called on the frag to compute an
adjustment to be made to the displacement.
The value fitted by the relaxation code is always assumed to be a displacement
from the current frag. (More specifically, from fr_fix bytes into the frag.)
This seems kinda silly. What about fitting small absolute values? I suppose
md_assemble is supposed to take care of that, but if the operand is a
difference between symbols, it might not be able to, if the difference was not
computable yet.
The end of the relaxation sequence is indicated by a "next" value of 0. This is kinda silly too, since it means that the first entry in the table can't be used. I think -1 would make a more logical sentinel value.
The table md_relax_table from `targ-cpu.c' describes the relaxation modes
available. Currently this must always be provided, even on machines for which
this type of relaxation isn't possible or practical. Probably fewer than half
the machines gas supports use it; it ought to be made conditional on some
CPU-specific macro. Currently, that table must also be declared "const"; on
some machines, though, it might make sense to keep it writable, so it can be
modified depending on which CPU of a family is specified. For example, in the
m68k family, the 68020 has some addressing modes that are not available on the
68000.
For some configurations, the linker can do relaxing within a section of an object file. If call instructions of various sizes exist, the linker can determine which should be used in each instance, when a symbol's value is resolved. In order for the linker to avoid wasting space and having to insert no-op instructions, it must be able to expand or shrink the section contents while still preserving intra-section references and meeting alignment requirements.
For the i960 using b.out format, no expansion is done; instead, each `.align' directive causes extra space to be allocated, enough that when the linker is relaxing a section and removing unneeded space, it can discard some or all of this extra padding and cause the following data to be correctly aligned.
For the H8/300, I think the linker expands calls that can't reach, and doesn't worry about alignment issues; the cpu probably never needs any significant alignment beyond the instruction size. But I'm not sure; check with Steve.
The relaxation table type contains these fields:
long rlx_forward
long rlx_backward
rlx_length
rlx_more
The relaxation is done in relax_segment in `write.c'. The difference in the
length fields between the original mode and the one finally chosen by the
relaxing code is taken as the amount by which the current frag will grow. For
example, if the initial relaxing mode has a length of 2 bytes, and because of
the size of the displacement, it gets upgraded to a mode with a size of 6
bytes, it is assumed that the frag will grow by 4 bytes. (The initial two
bytes should have been part of the fixed portion of the frag, since it is
already known that they will be output.) This growth must be effected by
md_convert_frag; it should increase the fr_fix field by the appropriate size,
and fill in the appropriate bytes of the frag. (Enough space for the maximum
growth should have been allocated in the call to frag_var as the second
argument.)
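The table walk and the growth computation can be sketched as follows. This is a simplified model, not the code in `write.c': the struct reuses the four field names listed above, but the types of rlx_length and rlx_more, the reading of rlx_forward and rlx_backward as the farthest reachable forward and backward displacements, and the table contents are all assumptions made for illustration.

```c
#include <assert.h>

/* Toy version of a relaxation table entry.  */
struct toy_relax_type
{
  long rlx_forward;          /* assumed: largest forward displacement */
  long rlx_backward;         /* assumed: largest backward displacement */
  unsigned char rlx_length;  /* bytes this mode occupies (type assumed) */
  unsigned int rlx_more;     /* next mode to try; 0 ends the sequence */
};

/* Made-up three-mode table.  Entry 0 is a dummy because 0 is the sentinel.  */
static const struct toy_relax_type toy_table[] = {
  { 0, 0, 0, 0 },
  { 127, -128, 2, 2 },        /* mode 1: 8-bit displacement, 2 bytes */
  { 32767, -32768, 4, 3 },    /* mode 2: 16-bit displacement, 4 bytes */
  { 0, 0, 6, 0 },             /* mode 3: last resort, 6 bytes */
};

/* Follow rlx_more links until a mode can hold `disp`.  A mode whose rlx_more
   is 0 is accepted unconditionally, since nothing wider remains.  */
static unsigned int
toy_pick_mode (const struct toy_relax_type *table, unsigned int mode,
               long disp)
{
  assert (mode != 0);
  for (;;)
    {
      const struct toy_relax_type *entry = &table[mode];

      if (entry->rlx_more == 0)
        return mode;
      if (disp >= entry->rlx_backward && disp <= entry->rlx_forward)
        return mode;
      mode = entry->rlx_more;
    }
}

/* Growth of the frag: chosen length minus starting length.  */
static int
toy_growth (const struct toy_relax_type *table, unsigned int start, long disp)
{
  unsigned int chosen = toy_pick_mode (table, start, disp);

  return (int) table[chosen].rlx_length - (int) table[start].rlx_length;
}
```

Starting in the 2-byte mode with a displacement too large for 16 bits, the sketch ends in the 6-byte mode and reports a growth of 4 bytes, matching the example in the text.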
If relocation records are needed, they should be emitted by
md_estimate_size_before_relax.
These are the machine-specific definitions associated with the relaxation mechanism:
... fr_subtype of the frag if needed. When this function is called, if the
symbol has not yet been defined, it will not become defined later; however, its
value may still change if the section it is in gets relaxed.
Usually, if the symbol is in the same section as the frag (given by the sec
argument), the narrowest likely relaxation mode is stored in fr_subtype, and
that's that.
If the symbol is undefined, or in a different section (and therefore movable to
an arbitrarily large distance), the largest available relaxation mode is
specified, fix_new is called to produce the relocation record, fr_fix is
increased to include the relocated field (remember, this storage was allocated
when frag_var was called), and frag_wane is called to convert the frag to an
rs_fill frag with no variant part.
Sometimes changing addressing modes may also require rewriting the
instruction. It can be accessed via fr_opcode or fr_fix.
Sometimes fr_var is increased instead, and frag_wane is not called. I'm not
sure, but I think this is to keep fr_fix referring to an earlier byte, and
fr_type set to rs_machine_dependent so that md_convert_frag will get called.
... const struct relax_type *. Typically, it will simply expand to
md_relax_table, declared in `targ-cpu.h' as an array of (const or non-const)
struct relax_type elements.
... fr_subtype field for use by the CPU-specific code.
... .word directives will never need the "broken word" processing performed.
It is also defined by `obj-coff.h' if BFD_ASSEMBLER is not defined, but I'm not
sure why.
Like obj_frob_file, this macro handles miscellaneous last-minute cleanup.
Currently only used by the PowerPC/POWER support, for setting up a .debug
section. This macro should not cause the symbol table to be modified.
If WORKING_DOT_WORD is not defined, this code is enabled. It makes use of at
least two target-specific variables:
... WORKING_DOT_WORD is not defined, they do not need to be defined.
The a.out format is described by `obj-aout.*'.
The b.out format, described by `obj-bout.*', is similar to a.out format, except
for a few additional fields in the file header describing section alignment and
address.
Originally, `obj-coff' was a purely non-BFD version, and `obj-coffbfd' was
created to use BFD for low-level byte-swapping. When the BFD_ASSEMBLER
conversion started, the first COFF target to be converted was using `obj-coff',
and the two files had diverged somewhat, and I didn't feel like first
converting the support of that target over to use the low-level BFD interface.
So `obj-coff' got converted, and to simplify certain things, `obj-coffbfd' got
"merged" in with a brute-force approach. Specifically, preprocessor
conditionals testing for BFD_ASSEMBLER effectively split the `obj-coff' files
into the two separate versions. It isn't pretty. They will be merged more
thoroughly, and eventually only the higher-level interface will be used.
All ECOFF configurations use BFD for writing object files.
ELF is a fairly reasonable format, without many of the deficiencies the other object file formats have. (It's got some of its own, but not as bad as the others.) All ELF configurations use BFD for writing object files.
This is the format used on VMS. Yes, someone has actually written BFD support for it. The code hasn't been integrated yet though.
The XCOFF configuration is based on the COFF configuration (using the higher-level BFD interface). In fact, it uses the same files in the assembler.
This is the old Vax VMS support. It doesn't use BFD.
Foo: a29k, alpha, h8300, h8500, hppa, i386, i860, i960, m68k, m88k, mips, ns32k, ppc, sh, sparc, tahoe, vax, z8k.
The operand syntax handling is atrocious. There is no clear specification of the operand syntax. I'm looking into using a Bison grammar to replace much of it.
Operands on the 68k series processors can have two displacement values
specified, plus a base register and a (possibly scaled) index register of which
only some bits might be used. Thus a single 68k operand requires up to two
expressions, two register numbers, and size and scale factors. The
struct m68k_op type also includes a field indicating the mode of the operand,
and an error field indicating a problem encountered while parsing the operand.
An instruction on the 68k may have up to 6 operands, although most of them have to be simple register operands. Up to 11 (16-bit) words may be required to express the instruction.
A struct m68k_exp expression contains an expressionS, pointers to the first and
last characters of the input that produced the expression, an indication of the
section to which the expression belongs, and a size field. I'm not sure what
the size field describes.
Many instructions use the low six bits of the first instruction word to describe the location of the operand, or how to compute the location. The six bits are typically split into three for a "mode" and three for a "register" value. The interpretation of these values is as follows:
Mode  Register  Operand addressing mode
0     Dn        data register
1     An        address register
2     An        indirect
3     An        indirect, post-increment
4     An        indirect, pre-decrement
5     An        indirect with displacement
6     An        indirect with optional displacement and index; may involve
                multiple indirections and two displacements
7     0         16-bit address follows
7     1         32-bit address follows
7     2         PC indirect with displacement
7     3         PC indirect with optional displacements and index
7     4         immediate 16- or 32-bit
7     5,6,7     Reserved
On the 68000 and 68010, support for modes 6 and 7.3 is incomplete; the displacement must fit in 8 bits, and no scaling or index suppression is permitted.
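The mode/register split described above is a pair of three-bit fields. The sketch below pulls them out of an instruction word; the names toy_ea, toy_split_ea, and toy_ea_is_pc_relative are hypothetical, and placing the mode in bits 5-3 with the register in bits 2-0 follows the usual 68k source-operand layout rather than anything stated in this text.

```c
#include <assert.h>

/* Toy representation of the six-bit operand field.  */
struct toy_ea
{
  unsigned int mode;   /* three-bit "mode" value */
  unsigned int reg;    /* three-bit "register" value */
};

/* Split the low six bits of the first instruction word.  Assumes the mode
   occupies bits 5-3 and the register bits 2-0.  */
static struct toy_ea
toy_split_ea (unsigned int first_word)
{
  struct toy_ea ea;

  assert (first_word <= 0xffffu);   /* instruction words are 16 bits */
  ea.mode = (first_word >> 3) & 7u;
  ea.reg = first_word & 7u;
  return ea;
}

/* Per the table, mode 7 with register 2 or 3 is a PC indirect mode.  */
static int
toy_ea_is_pc_relative (struct toy_ea ea)
{
  return ea.mode == 7 && (ea.reg == 2 || ea.reg == 3);
}
```

For the word 0x2039, the low six bits are 111001, so the sketch yields mode 7 and register 1, which the table lists as "32-bit address follows".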
The relaxation modes used on the 68k are:
ABRANCH
... BCC68000 is applicable.
FBRANCH
PCREL
... ABSL mode, and the CPU is not a 68000 or 68010. (Why? Those processors
support mode 7.2.)
BCC68000
DBCC
Like BCC68000, but for dbCC (decrement and branch on condition) instructions.
PCLEA
... AOFF with a PC-relative addressing mode and a displacement that won't fit
in 16 bits, or which is variable and is not specified to have a size other than
long.
PCINDEX
These are the `te-*.h' files.
Returns non-zero if any warnings or errors, respectively, have been printed during this invocation.
Displays a BFD or system error, then clears the error status.
These functions display messages about something amiss with the input file, or
internal problems in the assembler itself. The current file name and line
number are printed, followed by the supplied message, formatted using vfprintf,
and a final newline.
An error indicated by as_bad will result in a non-zero exit status when the
assembler has finished. Calling as_fatal will result in immediate termination
of the assembler process.
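The shape of such a reporting function (location prefix, vfprintf-formatted message, newline, and a count that later decides the exit status) can be sketched like this. Every name below is hypothetical; this is not the implementation of as_bad, and the fixed file name and line number stand in for whatever the assembler tracks while reading input.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Toy reporting state; all of this is invented for the sketch.  */
static int toy_error_count;
static const char *toy_file = "input.s";
static unsigned int toy_line = 1;

/* Print "file:line: message" to stderr and remember that an error
   occurred.  The message is formatted with vfprintf.  */
static void
toy_bad (const char *format, ...)
{
  va_list args;

  assert (format != NULL);
  toy_error_count++;
  fprintf (stderr, "%s:%u: ", toy_file, toy_line);
  va_start (args, format);
  vfprintf (stderr, format, args);
  va_end (args);
  fputc ('\n', stderr);
}

/* Non-zero if any error has been reported so far.  */
static int
toy_had_errors (void)
{
  return toy_error_count;
}

/* Exit status to use once assembly has finished.  */
static int
toy_exit_status (void)
{
  return toy_error_count != 0;
}
```

A fatal variant would print the same way and then terminate at once instead of returning to the caller.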
These variants permit specification of the file name and line number, and are used when problems are detected when reprocessing information saved away when processing some earlier part of the file. For example, fixups are processed after all input has been read, but messages about fixups should refer to the original filename and line number that they are applicable to.
These functions are helpful for converting a valueT value into printable
format, in case it's wider than modes that *printf can handle. If the type is
narrow enough, a decimal number will be produced; otherwise, it will be in
hexadecimal (FIXME: currently without `0x' prefix). The value itself is not
examined to make this determination.
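The decision described here depends on the width of the type, not on the value. A minimal sketch, assuming a stand-in type toy_valueT and a hypothetical function toy_sprint_value (neither is the real gas name), and using unsigned long as the widest type that is printed in decimal:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for valueT; the real type depends on the configuration.  */
typedef unsigned long long toy_valueT;

/* Format `val` into `buf`.  The choice between decimal and hexadecimal is
   made from the size of the type alone; the value is never inspected.  */
static void
toy_sprint_value (char *buf, size_t size, toy_valueT val)
{
  assert (buf != NULL && size > 0);

  if (sizeof (toy_valueT) <= sizeof (unsigned long))
    snprintf (buf, size, "%lu", (unsigned long) val);
  else
    snprintf (buf, size, "%llx", val);   /* hex, no "0x" prefix */
}
```

On a host where unsigned long is as wide as toy_valueT, the value 255 comes out as "255"; where it is narrower, the same call produces "ff".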
Creates the hash table control structure.
Deletes entry from the hash table, returns the value it had.
Updates the value for an entry already in the table, returning the old value. If no entry was found, just returns NULL.
Inserting a value already in the table is an error. Returns an error message or NULL.
Inserts if the value isn't already present, updates it if it is.
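The difference between the three update operations above is easiest to see side by side. The sketch below uses hypothetical names (toy_insert, toy_replace, toy_jam) and a small fixed-size array with linear search in place of real hashing, so it illustrates only the return-value conventions, not the gas hash table itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 16

/* Toy table: linear search stands in for hashing.  */
struct toy_table
{
  const char *keys[TOY_MAX];
  void *values[TOY_MAX];
  size_t used;
};

/* Return the index of `key`, or -1 if it is absent.  */
static int
toy_find (const struct toy_table *table, const char *key)
{
  size_t i;

  assert (table != NULL && key != NULL);
  for (i = 0; i < table->used; i++)
    if (strcmp (table->keys[i], key) == 0)
      return (int) i;
  return -1;
}

/* Insert: a key already present is an error.  Returns an error message,
   or NULL on success.  */
static const char *
toy_insert (struct toy_table *table, const char *key, void *value)
{
  if (toy_find (table, key) >= 0)
    return "exists";
  if (table->used == TOY_MAX)
    return "table full";
  table->keys[table->used] = key;
  table->values[table->used] = value;
  table->used++;
  return NULL;
}

/* Replace: update an existing entry and return the old value; if the key
   is absent, do nothing and return NULL.  */
static void *
toy_replace (struct toy_table *table, const char *key, void *value)
{
  int i = toy_find (table, key);
  void *old;

  if (i < 0)
    return NULL;
  old = table->values[i];
  table->values[i] = value;
  return old;
}

/* Jam: insert if absent, update if present.  */
static const char *
toy_jam (struct toy_table *table, const char *key, void *value)
{
  int i = toy_find (table, key);

  if (i >= 0)
    {
      table->values[i] = value;
      return NULL;
    }
  return toy_insert (table, key, value);
}
```

Inserting the same key twice fails the second time, replacing a missing key is a no-op that returns NULL, and jamming succeeds in both situations.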
The test suite is kind of lame for most processors. Often it only checks to
see if a couple of files can be assembled without the assembler reporting any
errors. For more complete testing, write a test which either examines the
assembler listing, or runs objdump and examines its output. For the latter, the
TCL procedure run_dump_test may come in handy. It takes the base name of a
file, and looks for `file.d'. This file should contain as its initial lines a
set of variable settings in `#' comments, in the form:
#varname: value
The varname may be objdump, nm, or as, in which case it specifies the options
to be passed to the specified programs. Exactly one of objdump or nm must be
specified, as that also specifies which program to run after the assembler has
finished. If varname is source, it specifies the name of the source file;
otherwise, `file.s' is used. If varname is name, it specifies the name of the
test to be used in the pass or fail messages.
The non-commented parts of the file are interpreted as regular expressions, one
per line. Blank lines in the objdump or nm output are skipped, as are blank
lines in the .d file; the other lines are tested to see if the regular
expression matches the program output. If it does not, the test fails.
Note that this means the tests must be modified if the objdump output style is
changed.