Basic source IR - **[State 1]**
--------------------------------------------------------

The IR at this stage is, for the most part, built.
See [Figure 2](../../../../../../../docs/README-figure6.md) for a global view of its structure.
It can be used right away to regenerate the source code unmodified.
However, it does not yet contain the infrastructure necessary for data-flow analysis
and therefore cannot be differentiated yet.

The IR is accessible essentially from one object (e.g. "callGraph") of type `CallGraph`.
The main structure available in this `CallGraph` is a tree of nested objects of type `Unit`,
each `Unit` standing for one of:

- a source file (which behaves a lot like a package)
- a package (Fortran style `MODULE`'s)
- a class (C++ style)
- a procedure (subroutine or function, including the C `main` or the Fortran `PROGRAM`)
- a constructor or destructor (C++ style)

The nesting of this tree of `Unit`'s exactly reflects the textual nesting in the source files.
In contrast, it is not in charge of reflecting dependencies, class inheritance, class extension, module USE, procedure calls...

The topmost nodes of this tree of nested `Unit`'s are the ones standing for files.
They are accessible through `callGraph.topUnits()`.

Every `Unit` (e.g. "unit") holds the list of all `Unit`'s immediately under it in the tree of nested `Unit`'s.
This list is accessible through `unit.lowerLevelUnits`.
Conversely, every `Unit` holds a link to the `Unit` immediately enclosing it in the tree of nested `Unit`'s.
This enclosing `Unit` is accessible through `unit.upperLevelUnit()`; it is null for the topmost `Unit`'s in the tree.

The `CallGraph` also holds the complete list of all `Unit`'s.
They are accessible through `callGraph.units()`.

The Call-Graph part, strictly speaking, is represented as arrows (`CallArrow`) between `Unit`'s.
A `CallArrow` holds origin and destination `Unit`'s, plus its nature (call, contains, uses...).
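
As an illustration, the nesting of `Unit`'s and the arrows between them can be mimicked with a few lines of Java. This is only a toy sketch: `ToyUnit`, `ToyArrow`, and the sample names (`solver.f90`, `MESH`, `REFINE`, `MAIN`) are hypothetical and are not the actual `Unit` and `CallArrow` classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the nested Unit tree and of call arrows; NOT the real IR classes. */
public class UnitTreeSketch {

    /** Hypothetical stand-in for Unit: a name, an enclosing unit, nested units. */
    static final class ToyUnit {
        final String name;
        final ToyUnit upper;                              // null for a topmost (file) unit
        final List<ToyUnit> lower = new ArrayList<>();    // units immediately nested inside

        ToyUnit(String name, ToyUnit upper) {
            this.name = name;
            this.upper = upper;
            if (upper != null) {
                upper.lower.add(this);                    // keep both directions consistent
            }
        }
    }

    /** Hypothetical stand-in for CallArrow: origin, destination, nature. */
    static final class ToyArrow {
        final ToyUnit origin;
        final ToyUnit destination;
        final String nature;                              // e.g. "call", "contains", "uses"

        ToyArrow(ToyUnit origin, ToyUnit destination, String nature) {
            this.origin = origin;
            this.destination = destination;
            this.nature = nature;
        }
    }

    /** Collects the name of a unit and of everything nested under it, depth first. */
    static void collect(ToyUnit unit, List<String> out) {
        out.add(unit.name);
        for (ToyUnit child : unit.lower) {
            collect(child, out);
        }
    }

    /** Nesting depth of a unit: 0 for a file unit. */
    static int depth(ToyUnit unit) {
        int d = 0;
        for (ToyUnit u = unit.upper; u != null; u = u.upper) {
            d++;
        }
        return d;
    }

    /** Builds a small tree, adds one call arrow, and returns a textual summary. */
    static List<String> demo() {
        ToyUnit file = new ToyUnit("solver.f90", null);   // topmost unit: a file
        ToyUnit module = new ToyUnit("MESH", file);       // a module nested in the file
        ToyUnit refine = new ToyUnit("REFINE", module);   // a procedure nested in the module
        ToyUnit program = new ToyUnit("MAIN", file);      // the program, nested in the file
        ToyArrow call = new ToyArrow(program, refine, "call");

        List<String> lines = new ArrayList<>();
        collect(file, lines);
        lines.add(call.origin.name + " -" + call.nature + "-> " + call.destination.name);
        return lines;
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

In this sketch the `MAIN` to `REFINE` arrow crosses the nesting tree, which is the kind of information the tree of nested units does not carry by itself.
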
Each `Unit` holds the list (`TapList callees`) of the `CallArrow`'s for which it is the origin,
and symmetrically the list (`TapList callers`) of the `CallArrow`'s for which it is the destination.

In parallel with the building of the `Unit`'s, a hierarchy of `SymbolTable`'s is built.
It has the shape of a tree, rooted at the root `SymbolTable`, which is accessible from the `CallGraph`.
Each `SymbolTable` has a link to its enclosing `SymbolTable`, which is null for the root `SymbolTable`.
`Unit`'s, and also `Block`'s in `Unit`'s, have a handle to their `SymbolTable`(s).
At this stage, the `SymbolTable`'s are not completely "finished" (they will be finished at the next step)
but they already contain all the symbols that have been declared by the code just read.

A `Unit` itself contains its Flow-Graph.
Flow-Graphs can be trivial (e.g. for Modules and Classes), and arbitrarily complex for procedures and files.
A Flow-Graph consists of a set of basic blocks (`Block`) linked by flow arrows (`FGArrow`).
`Block` is the parent class of the following derived classes:

- `BasicBlock` (plain `Block`)
- `HeaderBlock` (loop header)
- `EntryBlock` (Unit entry)
- `ExitBlock` (Unit exit)
- `LoopBlock` (composite super-block holding a complete loop)

There is a special entry block (`EntryBlock`) and a special exit block (`ExitBlock`).
All the other `Block`'s are kept in a list (`TapList`) which is ordered in a natural order known as DFST (Depth First Spanning Tree).

`Block`'s can also be accessed through a hierarchy of nested loop levels: `topBlocks` is the topmost level.
The elements of `topBlocks` of type `LoopBlock` represent loops.
`LoopBlock`'s contain links to their inside `Block`'s, and so on.

A `Block` holds the flow arrows that leave from it (`TapList flow`) and that arrive at it (`TapList backflow`).
An `FGArrow` holds an origin, a destination, a control nature, and a control case.
Consequently, a `Block` can contain only one `Instruction` that makes a control-flow decision, e.g. a loop header, a conditional, or a jump.
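
To make the flow and backflow lists more concrete, here is a minimal sketch of a Flow-Graph for a single loop. `ToyBlock`, `ToyArrow`, and the control-case strings are hypothetical stand-ins, not the real `Block` and `FGArrow` classes. The ordering computed here is a reverse postorder of a depth-first traversal, which is one common way to number blocks along a depth-first spanning tree; that the actual DFST order is built this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy flow graph with flow/backflow lists; NOT the real Block/FGArrow classes. */
public class FlowGraphSketch {

    /** Hypothetical stand-in for Block. */
    static final class ToyBlock {
        final String name;
        final List<ToyArrow> flow = new ArrayList<>();      // arrows leaving this block
        final List<ToyArrow> backflow = new ArrayList<>();  // arrows arriving at this block

        ToyBlock(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for FGArrow: origin, destination, control case. */
    static final class ToyArrow {
        final ToyBlock origin;
        final ToyBlock destination;
        final String controlCase;

        ToyArrow(ToyBlock origin, ToyBlock destination, String controlCase) {
            this.origin = origin;
            this.destination = destination;
            this.controlCase = controlCase;
            origin.flow.add(this);            // register on both ends
            destination.backflow.add(this);
        }
    }

    /** Depth-first traversal that appends each block after all its successors. */
    static void postOrder(ToyBlock block, Set<ToyBlock> seen, List<ToyBlock> out) {
        if (!seen.add(block)) {
            return;                           // already visited (e.g. loop cycle arrow)
        }
        // Successors are visited last-to-first so that the reversed result
        // lists the first successor (here the loop body) right after the block.
        for (int i = block.flow.size() - 1; i >= 0; i--) {
            postOrder(block.flow.get(i).destination, seen, out);
        }
        out.add(block);
    }

    /** Returns block names in reverse postorder, starting from the entry block. */
    static List<String> depthFirstOrder(ToyBlock entry) {
        List<ToyBlock> post = new ArrayList<>();
        postOrder(entry, new HashSet<>(), post);
        Collections.reverse(post);
        List<String> names = new ArrayList<>();
        for (ToyBlock b : post) {
            names.add(b.name);
        }
        return names;
    }

    /** Builds entry -> header -> (body -> header | after) -> exit and orders it. */
    static List<String> demo() {
        ToyBlock entry = new ToyBlock("entry");
        ToyBlock header = new ToyBlock("header");
        ToyBlock body = new ToyBlock("body");
        ToyBlock after = new ToyBlock("after");
        ToyBlock exit = new ToyBlock("exit");
        new ToyArrow(entry, header, "next");
        new ToyArrow(header, body, "enter-loop");
        new ToyArrow(header, after, "exit-loop");
        new ToyArrow(body, header, "cycle");
        new ToyArrow(after, exit, "next");
        return depthFirstOrder(entry);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only the `header` block has two outgoing arrows in this sketch, which matches the idea that a block carries at most one control-flow decision.
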
When a `Block` contains such a control `Instruction`, it must be the last one in the `Block` contents.
Additionally, a `Block` is related to only one scope.
Therefore, if some part of the code has local declarations, some `Block`'s may have to be split into several `Block`'s
even if they form a straight-line control portion.
Consequently, a file `Unit` may also need to hold several `Block`'s (even if, obviously, it is straight-line)
because of the C++ `namespace` construct that opens nested scopes.

The links to the entry block, exit block, `allBlocks`, `topBlocks`, etc. are kept by their containing `Unit`.

When appropriate, objects of the IR hold back-links to their enclosing IR object,
e.g. `Block` to enclosing `LoopBlock`, `Block` to enclosing `Unit`, `SymbolTable` to enclosing `SymbolTable`,
`Unit` to enclosing `Unit`, `Unit` to enclosing `CallGraph`, etc.

`MemoryMap` and `AlignmentBoundary` are used to analyze Fortran's `COMMON` and `EQUIVALENCE` declarations.
They build a representation of the sequential memory corresponding to each `COMMON`,
in order to sort out the implied equivalences (aka aliasing) between variables.
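
The idea of laying out a `COMMON` as sequential memory can be sketched as follows. This is a simplified illustration, not the actual `MemoryMap` or `AlignmentBoundary` code: the `Slot` class, the variable names, and the byte sizes (8 for `REAL*8`, 4 for `REAL*4` and `INTEGER*4`) are assumptions, and `EQUIVALENCE` and alignment are not modeled. Two units declare the same `COMMON` with different variable lists, and variables whose byte ranges overlap are reported as aliased.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy sequential-memory view of one COMMON block; NOT the real MemoryMap. */
public class CommonMapSketch {

    /** Hypothetical record of one variable placed in the COMMON memory. */
    static final class Slot {
        final String name;
        final int offset;   // first byte occupied, counted from the start of the COMMON
        final int size;     // number of bytes occupied

        Slot(String name, int offset, int size) {
            this.name = name;
            this.offset = offset;
            this.size = size;
        }
    }

    /** Places variables one after the other, in declaration order. */
    static List<Slot> layout(String[] names, int[] sizes) {
        List<Slot> slots = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < names.length; i++) {
            slots.add(new Slot(names[i], offset, sizes[i]));
            offset += sizes[i];
        }
        return slots;
    }

    /** Lists the pairs (one variable from each view) whose byte ranges intersect. */
    static List<String> overlaps(List<Slot> viewA, List<Slot> viewB) {
        List<String> pairs = new ArrayList<>();
        for (Slot a : viewA) {
            for (Slot b : viewB) {
                boolean intersect = a.offset < b.offset + b.size
                        && b.offset < a.offset + a.size;
                if (intersect) {
                    pairs.add(a.name + "~" + b.name);
                }
            }
        }
        return pairs;
    }

    /** Two hypothetical declarations of the same COMMON in two different units. */
    static List<String> demo() {
        // Unit A: COMMON /C/ X, I, J  with X REAL*8, I and J INTEGER*4 (assumed sizes)
        List<Slot> viewA = layout(new String[] {"X", "I", "J"}, new int[] {8, 4, 4});
        // Unit B: COMMON /C/ U, V, W  with U and V REAL*4, W REAL*8 (assumed sizes)
        List<Slot> viewB = layout(new String[] {"U", "V", "W"}, new int[] {4, 4, 8});
        return overlaps(viewA, viewB);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With these assumed sizes, `X` shares memory with both `U` and `V`, while `I` and `J` both share memory with `W`.
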