Basic source IR - [State 1]¶
The IR at this stage is, for a large part, built.
See
Figure 2
for a global view of its structure.
It can be used right away to
regenerate the source code unmodified. However, it does not yet contain
the infrastructure necessary for data-flow analysis and therefore cannot
be differentiated yet. The IR is accessible essentially from one object
(e.g. “callGraph”) of type CallGraph
. The main structure available
in this CallGraph
is a tree of nested objects of type Unit
, each
Unit
standing for either:
a source file (which behaves a lot like a package)
a package (Fortran style
MODULE
’s)a class (C++ style)
a procedure (subroutine or function, including the C
main
or the FortranPROGRAM
)a constructor or destructor (C++ style)
The nesting of this tree of Unit
’s exactly reflects the textual
nesting in the source files. In contrast, it is not in charge of
reflecting dependencies, class inheritance, class extension, module USE,
procedure call...
The topmost nodes of this tree of nested Unit
’s are the ones standing
for files. They are accessible through callGraph.topUnits()
.
Every Unit
(e.g. “unit”) holds the list of all Unit
’s immediately
under it in the tree of nested Unit
’s This list is accessible through
unit.lowerLevelUnits
.
Conversely, every Unit
holds a link to the Unit
immediately
enclosing it in the tree of nested Unit
’s The enclosing Unit
is null
for the topmost Unit
’s in the tree. This enclosing Unit
is
accessible through unit.upperLevelUnit()
.
The CallGraph
also holds the complete list of all Unit
’s. They are
accessible through callGraph.units()
.
The Call-Graph part, strictly speaking, is represented as arrows
(CallArrow
) between Unit
’s. A CallArrow
holds origin and
destination Unit
’s, plus its nature (call, contains, uses...). Each
Unit
holds the list (TapList<CallArrow> callees
) of the
CallArrow
’s that it calls, and symmetrically the list
(TapList<CallArrow> callers
) of the CallArrow
’s that call it.
In parallel with the building of the Unit
’s, a hierarchy of
SymbolTable
’s is built. It has the shape of a tree, rooted at the root
SymbolTable
, which is accessible from the CallGraph
. Each
SymbolTable
has a link to its enclosing SymbolTable
, which is null
for the root Symboltable
. Unit
’s, and also Block
’s in Unit
’s,
have a handle to their symbolTable(s). At this stage, the
SymbolTable
’s are not completely “finished” (they will be finished
at the next step) but they already contain all the symbols that have
been declared by the code just read.
A Unit
itself contains its Flow-Graph. Flow-Graphs can be trivial
(e.g. for Modules and Classes), and arbitrarily complex for procedures
and files. A Flow-Graph consists of a set of basic blocks (Block
)
linked by flow arrows (FGArrow
). Block
is the parent class for
derived classes BasicBlock
(plain Block
), HeaderBlock
(loop
header), EntryBlock
(Unit entry), ExitBlock
(Unit exit), LoopBlock
(composite super-block holding a complete loop). There is a special
entry block (EntryBlock
) and exit block (ExitBlock
). All the other
Block
’s are kept in a list (TapList<Block>
) which is ordered in a
very natural order known as DFST (Depth First Spanning Tree). Block
’s
can also be accessed through a hierarchy of nested loop levels:
topBlocks are the topmost level. The elements of topBlocks of type
LoopBlock
represent loops. LoopBlock
’s contain link to their inside
Block
’s, and so on so forth. A Block
holds the flow arrows that
leave from it (TapList<FGArrow> flow
) and that arrive to it
(TapList<FGArrow> backflow
) A FGArrow
hold an origin, destination,
control nature, and control case. Consequently, a Block
can contain
only one Instruction
that makes a control-flow decision, e.g. a loop
header, a conditional, or a jump. When it is present, this control
Instruction
must be the last one in the Block
contents.
Additionnaly, a Block
is related to only one scope. Therefore if some
part of the code has local declarations, then some Block
’s may have to
be split in several Block
’s even if they form a straight-line control
portion. Consequently, a file Unit
may also need to hold several
Block
’s (even if, obviously, it is straight-line) because of the C++
namespace
construct that opens nested scopes.
The links to the entry block, exit block, allBlocks
, topBlocks
etc
are kept by their containing Unit
.
When appropriate, Objects of the IR hold back-links to their enclosing
IR object e.g. Block
to enclosing LoopBlock
, Block
to enclosing
Unit
, SymbolTable
to enclosing SymbolTable
, Unit
to enclosing
Unit
, Unit
to enclosing CallGraph
, etc.
MemoryMap, AlignmentBoundary
are used to analyze Fortran’s COMMON
and EQUIVALENCE
declarations. They build a representation of the
sequential memory corresponding to each COMMON
, in order to sort out
the implied equivalences (aka aliasing) between variables.