Basic source IR - [State 1]

The IR at this stage is, for a large part, built. See Figure 2 for a global view of its structure. It can be used right away to regenerate the source code unmodified. However, it does not yet contain the infrastructure necessary for data-flow analysis and therefore cannot be differentiated yet. The IR is accessible essentially from one object (e.g. “callGraph”) of type CallGraph. The main structure available in this CallGraph is a tree of nested objects of type Unit, each Unit standing for either:

  • a source file (which behaves a lot like a package)

  • a package (Fortran style MODULE’s)

  • a class (C++ style)

  • a procedure (subroutine or function, including the C main or the Fortran PROGRAM)

  • a constructor or destructor (C++ style)

The nesting of this tree of Unit’s exactly reflects the textual nesting in the source files. In contrast, it is not in charge of reflecting dependencies, class inheritance, class extension, module USE, procedure call...

The topmost nodes of this tree of nested Unit’s are the ones standing for files. They are accessible through callGraph.topUnits().

Every Unit (e.g. “unit”) holds the list of all Unit’s immediately under it in the tree of nested Unit’s This list is accessible through unit.lowerLevelUnits.

Conversely, every Unit holds a link to the Unit immediately enclosing it in the tree of nested Unit’s The enclosing Unit is null for the topmost Unit’s in the tree. This enclosing Unit is accessible through unit.upperLevelUnit().

The CallGraph also holds the complete list of all Unit’s. They are accessible through callGraph.units().

The Call-Graph part, strictly speaking, is represented as arrows (CallArrow) between Unit’s. A CallArrow holds origin and destination Unit’s, plus its nature (call, contains, uses...). Each Unit holds the list (TapList<CallArrow> callees) of the CallArrow’s that it calls, and symmetrically the list (TapList<CallArrow> callers) of the CallArrow’s that call it.

In parallel with the building of the Unit’s, a hierarchy of SymbolTable’s is built. It has the shape of a tree, rooted at the root SymbolTable, which is accessible from the CallGraph. Each SymbolTable has a link to its enclosing SymbolTable, which is null for the root Symboltable. Unit’s, and also Block’s in Unit’s, have a handle to their symbolTable(s). At this stage, the SymbolTable’s are not completely “finished” (they will be finished at the next step) but they already contain all the symbols that have been declared by the code just read.

A Unit itself contains its Flow-Graph. Flow-Graphs can be trivial (e.g. for Modules and Classes), and arbitrarily complex for procedures and files. A Flow-Graph consists of a set of basic blocks (Block) linked by flow arrows (FGArrow). Block is the parent class for derived classes BasicBlock (plain Block), HeaderBlock (loop header), EntryBlock (Unit entry), ExitBlock (Unit exit), LoopBlock (composite super-block holding a complete loop). There is a special entry block (EntryBlock) and exit block (ExitBlock). All the other Block’s are kept in a list (TapList<Block>) which is ordered in a very natural order known as DFST (Depth First Spanning Tree). Block’s can also be accessed through a hierarchy of nested loop levels: topBlocks are the topmost level. The elements of topBlocks of type LoopBlock represent loops. LoopBlock’s contain link to their inside Block’s, and so on so forth. A Block holds the flow arrows that leave from it (TapList<FGArrow> flow) and that arrive to it (TapList<FGArrow> backflow) A FGArrow hold an origin, destination, control nature, and control case. Consequently, a Block can contain only one Instruction that makes a control-flow decision, e.g. a loop header, a conditional, or a jump. When it is present, this control Instruction must be the last one in the Block contents. Additionnaly, a Block is related to only one scope. Therefore if some part of the code has local declarations, then some Block’s may have to be split in several Block’s even if they form a straight-line control portion. Consequently, a file Unit may also need to hold several Block’s (even if, obviously, it is straight-line) because of the C++ namespace construct that opens nested scopes.

The links to the entry block, exit block, allBlocks, topBlocks etc are kept by their containing Unit.

When appropriate, Objects of the IR hold back-links to their enclosing IR object e.g. Block to enclosing LoopBlock, Block to enclosing Unit, SymbolTable to enclosing SymbolTable, Unit to enclosing Unit, Unit to enclosing CallGraph, etc.

MemoryMap, AlignmentBoundary are used to analyze Fortran’s COMMON and EQUIVALENCE declarations. They build a representation of the sequential memory corresponding to each COMMON, in order to sort out the implied equivalences (aka aliasing) between variables.