Basic source IR - **[State 1]**
--------------------------------------------------------
At this stage, the IR is largely built.
See
[Figure 2](../../../../../../../docs/README-figure6.md)
for a global view of its structure.
It can be used right away to
regenerate the source code unmodified. However, it does not yet contain
the infrastructure necessary for data-flow analysis and therefore cannot
be differentiated yet. The IR is accessible essentially from one object
(e.g. "callGraph") of type `CallGraph`. The main structure available
in this `CallGraph` is a tree of nested objects of type `Unit`, each
`Unit` standing for either:
- a source file (which behaves a lot like a package)
- a package (Fortran style `MODULE`'s)
- a class (C++ style)
- a procedure (subroutine or function, including the C `main` or the
Fortran `PROGRAM`)
- a constructor or destructor (C++ style)

The nesting of this tree of `Unit`'s exactly reflects the textual
nesting in the source files. In contrast, it does not reflect
dependencies, class inheritance, class extension, module `USE`,
procedure calls, etc.
The topmost nodes of this tree of nested `Unit`'s are the ones standing
for files. They are accessible through `callGraph.topUnits()`.
Every `Unit` (e.g. "unit") holds the list of all `Unit`'s immediately
under it in the tree of nested `Unit`'s. This list is accessible through
`unit.lowerLevelUnits`.
Conversely, every `Unit` holds a link to the `Unit` immediately
enclosing it in the tree of nested `Unit`'s. This enclosing `Unit`,
which is null for the topmost `Unit`'s in the tree, is
accessible through `unit.upperLevelUnit()`.
The `CallGraph` also holds the complete list of all `Unit`'s. They are
accessible through `callGraph.units()`.
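The navigation described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the `Unit` and `CallGraph` classes below are simplified stand-ins that only mirror the accessor names mentioned in the text, `java.util.List` replaces the real list type, and `addUnit`, `depth`, `render` and `example` are hypothetical helpers introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; not the real IR classes.
class UnitTreeSketch {

    static class Unit {
        final String name;
        final Unit upper;                                  // enclosing Unit, null for a file Unit
        final List<Unit> lowerLevelUnits = new ArrayList<>();

        Unit(String name, Unit upper) {
            this.name = name;
            this.upper = upper;
            if (upper != null) {
                upper.lowerLevelUnits.add(this);           // keep both directions consistent
            }
        }

        Unit upperLevelUnit() { return upper; }
    }

    static class CallGraph {
        private final List<Unit> top = new ArrayList<>();  // file Units only
        private final List<Unit> all = new ArrayList<>();  // every Unit, flat

        // Hypothetical helper: creates a Unit and registers it in the lists.
        Unit addUnit(String name, Unit upper) {
            Unit unit = new Unit(name, upper);
            all.add(unit);
            if (upper == null) top.add(unit);
            return unit;
        }

        List<Unit> topUnits() { return top; }
        List<Unit> units() { return all; }
    }

    // Nesting depth of a Unit: 0 for a file Unit.
    static int depth(Unit unit) {
        int d = 0;
        for (Unit u = unit.upperLevelUnit(); u != null; u = u.upperLevelUnit()) d++;
        return d;
    }

    // Renders the nesting tree, one Unit per line, indented by depth.
    static String render(CallGraph callGraph) {
        StringBuilder sb = new StringBuilder();
        for (Unit top : callGraph.topUnits()) renderUnit(top, sb);
        return sb.toString();
    }

    private static void renderUnit(Unit unit, StringBuilder sb) {
        for (int i = 0; i < depth(unit); i++) sb.append("  ");
        sb.append(unit.name).append('\n');
        for (Unit lower : unit.lowerLevelUnits) renderUnit(lower, sb);
    }

    // A file containing a module (with one subroutine) and a main program.
    static CallGraph example() {
        CallGraph callGraph = new CallGraph();
        Unit file = callGraph.addUnit("solver.f90", null);
        Unit module = callGraph.addUnit("MODULE solver", file);
        callGraph.addUnit("SUBROUTINE step", module);
        callGraph.addUnit("PROGRAM main", file);
        return callGraph;
    }

    public static void main(String[] args) {
        System.out.print(render(example()));
    }
}
```

The flat list returned by `units()` and the tree reached from `topUnits()` contain the same `Unit`'s; the tree only adds the textual nesting.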
The Call-Graph part, strictly speaking, is represented as arrows
(`CallArrow`) between `Unit`'s. A `CallArrow` holds its origin and
destination `Unit`'s, plus its nature (call, contains, uses...). Each
`Unit` holds the list (`TapList callees`) of the
`CallArrow`'s of which it is the origin, and symmetrically the list
(`TapList callers`) of the `CallArrow`'s of which it is the destination.
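As a rough sketch of this arrow structure, the example below uses simplified stand-in classes (not the real ones) in which creating a `CallArrow` registers it in the `callees` list of its origin and in the `callers` list of its destination. The `Kind` enum, the direction chosen for "contains" and "uses" arrows, and the helpers `calleeNames` and `callerNames` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; not the real IR classes.
class CallArrowSketch {

    // Assumed encoding of the "nature" of an arrow.
    enum Kind { CALL, CONTAINS, USES }

    static class Unit {
        final String name;
        final List<CallArrow> callees = new ArrayList<>(); // arrows leaving this Unit
        final List<CallArrow> callers = new ArrayList<>(); // arrows arriving at this Unit

        Unit(String name) { this.name = name; }
    }

    static class CallArrow {
        final Unit origin;
        final Unit destination;
        final Kind kind;

        CallArrow(Unit origin, Unit destination, Kind kind) {
            this.origin = origin;
            this.destination = destination;
            this.kind = kind;
            origin.callees.add(this);        // registered on both ends
            destination.callers.add(this);
        }
    }

    // Names of the Units reached from `unit` by arrows of the given kind.
    static List<String> calleeNames(Unit unit, Kind kind) {
        List<String> names = new ArrayList<>();
        for (CallArrow arrow : unit.callees) {
            if (arrow.kind == kind) names.add(arrow.destination.name);
        }
        return names;
    }

    // Names of the Units that reach `unit` by arrows of the given kind.
    static List<String> callerNames(Unit unit, Kind kind) {
        List<String> names = new ArrayList<>();
        for (CallArrow arrow : unit.callers) {
            if (arrow.kind == kind) names.add(arrow.origin.name);
        }
        return names;
    }

    // Returns {main, step, solver}: main calls step and uses solver,
    // and solver contains step (arrow directions are assumptions).
    static Unit[] example() {
        Unit main = new Unit("main");
        Unit step = new Unit("step");
        Unit solver = new Unit("solver");
        new CallArrow(main, step, Kind.CALL);
        new CallArrow(main, solver, Kind.USES);
        new CallArrow(solver, step, Kind.CONTAINS);
        return new Unit[] { main, step, solver };
    }

    public static void main(String[] args) {
        Unit[] units = example();
        System.out.println(calleeNames(units[0], Kind.CALL));
        System.out.println(callerNames(units[1], Kind.CONTAINS));
    }
}
```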
In parallel with the building of the `Unit`'s, a hierarchy of
`SymbolTable`'s is built. It has the shape of a tree, rooted at the root
`SymbolTable`, which is accessible from the `CallGraph`. Each
`SymbolTable` has a link to its enclosing `SymbolTable`, which is null
for the root `SymbolTable`. `Unit`'s, and also `Block`'s in `Unit`'s,
have a handle to their `SymbolTable`(s). At this stage, the
`SymbolTable`'s are not completely "finished" (they will be finished
at the next step) but they already contain all the symbols that have
been declared by the code just read.
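One typical use of such a tree of tables is to resolve a name by searching outward through the enclosing tables. The sketch below illustrates that idea only: the `SymbolTable` class is a simplified stand-in, symbols are reduced to a name and a type string, and `declare`, `definingTable` and `typeOf` are hypothetical helpers, not the real interface. That lookup proceeds this way here is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for illustration only; not the real SymbolTable.
class SymbolTableSketch {

    static class SymbolTable {
        final String label;
        final SymbolTable enclosing;                        // null for the root SymbolTable
        private final Map<String, String> decls = new HashMap<>();

        SymbolTable(String label, SymbolTable enclosing) {
            this.label = label;
            this.enclosing = enclosing;
        }

        // Hypothetical helper: records a symbol as a (name, type) pair.
        void declare(String name, String type) { decls.put(name, type); }

        // Innermost table, starting from this one, that declares `name`;
        // null if no enclosing table declares it.
        SymbolTable definingTable(String name) {
            for (SymbolTable st = this; st != null; st = st.enclosing) {
                if (st.decls.containsKey(name)) return st;
            }
            return null;
        }

        String typeOf(String name) {
            SymbolTable st = definingTable(name);
            return st == null ? null : st.decls.get(name);
        }
    }

    // Builds root <- file <- procedure and returns the innermost table.
    static SymbolTable example() {
        SymbolTable root = new SymbolTable("root", null);
        SymbolTable file = new SymbolTable("file", root);
        SymbolTable proc = new SymbolTable("procedure", file);
        root.declare("sin", "intrinsic");
        file.declare("n", "integer");
        proc.declare("x", "real");
        proc.declare("n", "real");                          // shadows the file-level n
        return proc;
    }

    public static void main(String[] args) {
        SymbolTable proc = example();
        System.out.println(proc.typeOf("n"));
        System.out.println(proc.definingTable("sin").label);
    }
}
```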
A `Unit` itself contains its Flow-Graph. Flow-Graphs can be trivial
(e.g. for Modules and Classes), and arbitrarily complex for procedures
and files. A Flow-Graph consists of a set of basic blocks (`Block`)
linked by flow arrows (`FGArrow`). `Block` is the parent class for
derived classes `BasicBlock` (plain `Block`), `HeaderBlock` (loop
header), `EntryBlock` (Unit entry), `ExitBlock` (Unit exit), `LoopBlock`
(composite super-block holding a complete loop). There is a special
entry block (`EntryBlock`) and exit block (`ExitBlock`). All the other
`Block`'s are kept in a list (`TapList`) which is ordered in a
very natural order known as DFST (Depth First Spanning Tree). `Block`'s
can also be accessed through a hierarchy of nested loop levels:
`topBlocks` is the topmost level. The elements of `topBlocks` of type
`LoopBlock` represent loops. `LoopBlock`'s contain links to the
`Block`'s inside them, and so on. A `Block` holds the flow arrows that
leave it (`TapList flow`) and that arrive at it
(`TapList backflow`). An `FGArrow` holds an origin, a destination,
a control nature, and a control case. Consequently, a `Block` can contain
only one `Instruction` that makes a control-flow decision, e.g. a loop
header, a conditional, or a jump. When it is present, this control
`Instruction` must be the last one in the `Block` contents.
Additionally, a `Block` belongs to only one scope. Therefore, if some
part of the code has local declarations, some `Block`'s may have to
be split into several `Block`'s even if they form a straight-line control
portion. For the same reason, a file `Unit` may also need to hold several
`Block`'s (even though it is obviously straight-line) because of the C++
`namespace` construct, which opens nested scopes.
The links to the entry block, the exit block, `allBlocks`, `topBlocks`, etc.
are kept by their containing `Unit`.
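To make the Flow-Graph structure more concrete, here is a minimal sketch with stand-in `Block` and `FGArrow` classes (not the real ones) that keep the `flow` and `backflow` lists consistent. It then orders the blocks by reverse postorder of a depth-first traversal from the entry block, which is one common ordering derived from a depth-first spanning tree. Whether this matches the exact DFST order used for the list of `Block`'s is an assumption, and `reversePostorder` and `example` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

// Simplified stand-ins for illustration only; not the real IR classes.
class FlowGraphSketch {

    static class Block {
        final String name;
        final List<FGArrow> flow = new ArrayList<>();      // arrows leaving this Block
        final List<FGArrow> backflow = new ArrayList<>();  // arrows arriving at this Block

        Block(String name) { this.name = name; }
    }

    static class FGArrow {
        final Block origin;
        final Block destination;

        FGArrow(Block origin, Block destination) {
            this.origin = origin;
            this.destination = destination;
            origin.flow.add(this);
            destination.backflow.add(this);
        }
    }

    // Reverse postorder of a depth-first traversal starting at `entry`.
    // The relative order of sibling branches depends on the order of `flow`.
    static List<Block> reversePostorder(Block entry) {
        Set<Block> visited = new HashSet<>();
        LinkedList<Block> order = new LinkedList<>();
        visit(entry, visited, order);
        return order;
    }

    private static void visit(Block block, Set<Block> visited, LinkedList<Block> order) {
        if (!visited.add(block)) return;                   // already seen (e.g. via a cycle)
        for (FGArrow arrow : block.flow) visit(arrow.destination, visited, order);
        order.addFirst(block);                             // finished last => placed first
    }

    // A single loop: entry -> header -> body -> header, header -> after -> exit.
    static List<String> example() {
        Block entry = new Block("entry");
        Block header = new Block("header");
        Block body = new Block("body");
        Block after = new Block("after");
        Block exit = new Block("exit");
        new FGArrow(entry, header);
        new FGArrow(header, after);                        // loop exit
        new FGArrow(header, body);                         // loop entry
        new FGArrow(body, header);                         // cycling arrow
        new FGArrow(after, exit);

        List<String> names = new ArrayList<>();
        for (Block block : reversePostorder(entry)) names.add(block.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```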
When appropriate, objects of the IR hold back-links to their enclosing
IR object, e.g. `Block` to enclosing `LoopBlock`, `Block` to enclosing
`Unit`, `SymbolTable` to enclosing `SymbolTable`, `Unit` to enclosing
`Unit`, `Unit` to enclosing `CallGraph`, etc.
`MemoryMap` and `AlignmentBoundary` are used to analyze Fortran's `COMMON`
and `EQUIVALENCE` declarations. They build a representation of the
sequential memory corresponding to each `COMMON`, in order to sort out
the implied equivalences (i.e. aliasing) between variables.
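The underlying idea can be illustrated independently of the actual `MemoryMap` and `AlignmentBoundary` classes, whose interfaces are not described here. In the sketch below, two procedures declare the same `COMMON` with different variable lists; laying each list out sequentially and intersecting the byte ranges yields the aliased pairs. The class and method names, the byte sizes (8 for a double-precision real, 4 for an integer), and the absence of padding and of `EQUIVALENCE` handling are all simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of COMMON-induced aliasing; not the real MemoryMap.
class CommonLayoutSketch {

    // One variable placed in the sequential memory of a COMMON.
    static class Slot {
        final String name;
        final int offset;   // in bytes from the start of the COMMON
        final int size;     // in bytes

        Slot(String name, int offset, int size) {
            this.name = name;
            this.offset = offset;
            this.size = size;
        }

        // True when the two byte ranges [offset, offset + size) intersect.
        boolean overlaps(Slot other) {
            return offset < other.offset + other.size
                && other.offset < offset + size;
        }
    }

    // Lays variables out one after the other, without padding.
    static List<Slot> layout(String[] names, int[] sizes) {
        List<Slot> slots = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < names.length; i++) {
            slots.add(new Slot(names[i], offset, sizes[i]));
            offset += sizes[i];
        }
        return slots;
    }

    // Pairs "a~b" of variables from the two views that share memory.
    static List<String> aliases(List<Slot> viewA, List<Slot> viewB) {
        List<String> pairs = new ArrayList<>();
        for (Slot a : viewA) {
            for (Slot b : viewB) {
                if (a.overlaps(b)) pairs.add(a.name + "~" + b.name);
            }
        }
        return pairs;
    }

    // View A: COMMON /c/ x, y (two 8-byte reals).
    // View B: COMMON /c/ i, j, z (two 4-byte integers, one 8-byte real).
    static List<String> example() {
        List<Slot> viewA = layout(new String[] { "x", "y" }, new int[] { 8, 8 });
        List<Slot> viewB = layout(new String[] { "i", "j", "z" }, new int[] { 4, 4, 8 });
        return aliases(viewA, viewB);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

In this example `x` shares memory with both `i` and `j`, while `y` coincides exactly with `z`.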