Creation of the differentiated IR
=================================

* Files: [Javadoc](../../../../../../../build/fr/inria/tapenade/differentiation/package-index.html)
  `BlockDifferentiator CallGraphDifferentiator DiffAssignmentNode DiffConstants DifferentiationEnv DynMemoryDifferentiator ExpressionDifferentiator FlowGraphDifferentiator NewBlockGraph NewBlockGraphArrow NewBlockGraphNode ProcedureCallDifferentiator SplitForSave UnitCreationContext UnitDiffInfo VarRefDifferentiator`

```
CallGraphDifferentiator.run() -->
CallGraphDifferentiator.differentiateDownFrom() -->
FlowGraphDifferentiator.differentiateProcedure() -->
FlowGraphDifferentiator.differentiateProcedureInMode() -->
FlowGraphDifferentiator.precomputeDifferentiatedBlocks() -->
BlockDifferentiator.differentiateBlock() -->
BlockDifferentiator.differentiateInstructionDeclaration() -->
BlockDifferentiator.buildDiffInstructionsOfAssignAllocate() -->
BlockDifferentiator.buildDiffInstructionsOfDeallocate() -->
ProcedureCallDifferentiator.buildDiffInstructionsOfMPICall() -->
ProcedureCallDifferentiator.buildDiffInstructions() -->
BlockDifferentiator.buildDiffInstructionsOfPlainAssignment() -->
BlockDifferentiator.buildDiffInstructionsOfPlainDeclaration() -->
BlockDifferentiator.buildDiffInstructionsOfIncrement() -->
BlockDifferentiator.buildDiffInstructionsOfReturn() -->
BlockDifferentiator.buildDiffInstructionsOfIORead() -->
BlockDifferentiator.buildDiffInstructionsOfOther() -->
FlowGraphDifferentiator.differentiateFlowGraphLevel() -->
FlowGraphDifferentiator.differentiateEntryBlockLevel() -->
FlowGraphDifferentiator.differentiatePlainBlockLevel() -->
FlowGraphDifferentiator.differentiateCkpPiece() -->
FlowGraphDifferentiator.differentiateNoDiffPiece() -->
FlowGraphDifferentiator.differentiateBinomialLoop() -->
FlowGraphDifferentiator.differentiateIILoop() -->
FlowGraphDifferentiator.differentiateNaturalLoop() -->
FlowGraphDifferentiator.differentiateFixedPointLoop() -->
FlowGraphDifferentiator.differentiateStructuredLoop()
```

At the level of the complete Call-Graph
---------------------------------------

* Files `CallGraphDifferentiator`

The top level of differentiation, `CallGraphDifferentiator.run()`, operates at the Call-Graph level. It first creates a new `DifferentiationEnv`, which gathers a `CallGraphDifferentiator`, a `FlowGraphDifferentiator`, a `BlockDifferentiator`, and a `VarRefDifferentiator`, i.e. the tools to differentiate the nested levels of constructs in a program. The `DifferentiationEnv` also keeps correspondences between original and differentiated objects.

After creation of the `DifferentiationEnv`, `differentiateDownFrom()` is called and does the main part of the work. `differentiateDownFrom()` consists of successive "`STEP`"'s, each one running on the complete Call-Graph:

- one preliminary step runs for each `Unit`, and inside for each `ActivityPattern` existing for this `Unit`: the activity is computed for the elements of `COMMON`'s in the `Unit`. The activity may then be augmented to produce differentiated code that is perhaps marginally less efficient but less error-prone for the end-user, by forcing more initialization of derivatives.
- the differentiated Call-Graph `diffCallGraph` is created and initialized.
- then another preliminary step runs for each `Unit`, and inside for each `ActivityPattern` existing for this `Unit`: the activity of the `Unit`'s formal arguments is computed and stored into the `UnitDiffInfo` of this `Unit`.
- then another preliminary step (`takeCopyUnitDecisions`) decides, for each (source, aka "primal") `Unit`, whether the differentiated code must contain a copy of this `Unit`. When there is a risk of ambiguity, the name of the copied `Unit` is often suffixed with `_nodiff`.
- then another preliminary step runs for each `Unit`, and inside for each `ActivityPattern` existing for this `Unit`: it principally prepares info about pass-by-value arguments.
- then the `diffCallGraph` is populated with all its `Unit`'s, correctly contained in one another when needed, but apart from that empty: no flow graph, no `SymbolTable`'s. *Packages* (translation units (files), `MODULE`'s) are created first. Then procedures are created: the *copied* procedure first if needed, then as many differentiated procedures as there are `ActivityPattern`'s. Diff procedures are also created for `INTRINSIC`'s that need a derivative.
- then the `CallArrow`'s are placed between the newly created `Unit`'s. Note: the `CallArrow`'s between differentiated `Unit`'s will be placed later.
- then the differentiated `SymbolTable`'s are built. Root-most `SymbolTable`'s are built first, starting with the root `SymbolTable` of `diffCallGraph`. After this step the differentiated `SymbolTable`'s are essentially copies. The differentiated symbols will be added later.
- then the type (`FunctionTypeSpec`) of differentiated `Unit`'s is built, as well as the differentiated interface declarations.
- then the differentiated `Unit`'s are each filled with their differentiated Flow-Graphs. This can be seen as the main part, and it calls the deeper level of differentiation: `FlowGraphDifferentiator`.
- then some administrative work is done about the parts of the differentiated `CallGraph` that are actually copies of parts of the source `CallGraph`: these copies may have received a different name (e.g. `XXX_NODIFF` instead of `XXX`). The copies may also refer to copied elements that now have a different name. This step updates these copied names everywhere, taking care of interface names, module names, allocation commands, etc. This step is a little messy as bits have been added progressively.
- finally, all new names (for new, differentiated objects) are fixed (e.g.
going from tentative `#V_D#` to final `V_D`), taking care of conflicts with existing names. For instance `#V_D#` can be turned into final `V0_D` if there is already a `V_D` that shares its scope.

[Notes on copies of source code inside the generated code](README-copysource.md)

At the level of one particular `Unit`'s Flow-Graph
--------------------------------------------------

* Files `FlowGraphDifferentiator`

Each time the `CallGraphDifferentiator` must differentiate the Flow-Graph contents of a `Unit`, this is subcontracted to the `FlowGraphDifferentiator`.

Hint: to gain insight into how the `FlowGraphDifferentiator` works and how differentiated `Block`'s are created and connected together, one may run Tapenade with the command-line option `-traceDifferentiation`, which prints a long log of this complex operation.

The main entry point is `FlowGraphDifferentiator.differentiateProcedure`, which fills each differentiated version of the current `Unit` with its own (differentiated) Flow-Graph. The source Flow-Graph is used to build a tree of `FlowGraphLevel`'s, which are nested control structures, mostly nested loops (remember that Tapenade's Flow-Graphs are flat graphs). Each `FlowGraphLevel` receives a kind based on its structure and on the differentiation directives present in the source. `FlowGraphDifferentiator` has a separate differentiation method for each kind. Kinds can be:

- `PLAIN_BLOCK`: essentially for an individual `Block`.
- `ENTRY_BLOCK`: for the `Unit`'s `EntryBlock`.
- `DO_LOOP`: for a clean `DO` loop or a clean C `for` loop.
- `WHILE_LOOP`: for a while loop.
- `TIMES_LOOP`: for a loop running a fixed number of times.
- `NATURAL_LOOP`: for a loop not clean enough to fit any of the above kinds (extra entries or exits).
- `II_LOOP`: for a clean loop that the user guarantees has Independent Iterations and thus deserves a special adjoint differentiation.
- `BINOMIAL_LOOP`: for a clean loop that the user guarantees is time-stepping and thus deserves a special adjoint differentiation.
- `FIXEDPOINT_LOOP`: for a clean while loop that the user guarantees implements a fixed-point iteration and thus deserves a special adjoint differentiation.
- `CHECKPOINT`: for a (connected) subset of a set of "sibling" structures, grouped because they will be treated in a special way (*checkpointing* in adjoint differentiation).
- `NODIFF`: for a (connected) subset of a set of "sibling" structures that the user requests are *not* differentiated.
- `PLAIN_GROUP`: for a (connected) subset of a set of "sibling" structures that are grouped for any other reason.
- `TOP_LEVEL`: for the top node of the tree of nested structures, therefore representing the complete Flow-Graph of the current `Unit`.

Each node of the tree of control structures is differentiated in turn, recursively, bottom up. The recursive entry point of this algorithm is `FlowGraphDifferentiator.differentiateFlowGraphLevel`.

Missing: details on fwd/bwd entry/exit diffs of one given FlowGraphLevel

At the level of one particular Basic Block
------------------------------------------

* Files `BlockDifferentiator`

Each time the `FlowGraphDifferentiator` must fill the derivative `Block`'s of a given source `Block`, this is subcontracted to the `BlockDifferentiator`.

Hint: to gain insight into how the `BlockDifferentiator` works and where in the Java source the individual differentiated instructions are created, one may run Tapenade with the command-line option `-traceDifferentiation`. Upon creation, each new instruction is printed with a short explanation text that one may then search for in `BlockDifferentiator.java`.
The main entry point is `BlockDifferentiator.differentiateBlock`, which fills a given empty forward-diff `Block` and a given empty backward-diff `Block` with the differentiation of the instructions of the given source `Block`, in the correct order, and obeying the given `differentiationMode` (one of `TANGENT_MODE, ADJOINT_MODE, ADJOINT_SPLIT_MODE`).

Internally, the `Instruction`'s of the source `Block` are swept from first to last, and for each one the forward and backward differentiated `Instruction`'s are built. The differentiated `Instruction`'s are accumulated into two data-dependency graphs (one for forward, one for backward). When all instructions of the source `Block` have been swept, each data-dependency graph is first searched for desirable node fusions (preserving data dependencies), then it is sorted (topological sort), and the resulting list of differentiated `Instruction`'s is put into the corresponding (forward- or backward-) diff `Block`.

The primitive that adds a new instruction into the forward data-dependency graph is `addFuturePlainNodeFwd`. The primitives that add a new instruction into the backward data-dependency graph are `addFuturePlainNodeBwd, addFutureZeroDiffNodeBwd, addFutureSetDiffNodeBwd, addFutureIncrDiffNodeBwd`, plus a special `addFutureBwdSplitDecl`.

At the level of one particular procedure call
---------------------------------------------

* Files `ProcedureCallDifferentiator` [Javadoc](../../../../../../../build/fr/inria/tapenade/differentiation/ProcedureCallDifferentiator.html)

At the level of one particular assignment
-----------------------------------------

* Files `ExpressionDifferentiator` [Javadoc](../../../../../../../build/fr/inria/tapenade/differentiation/ExpressionDifferentiator.html)

In the most frequent situation, differentiating an `expression` happens in the context of an assignment `lhs=expression`.
Method `tangentDifferentiateAssignedExpression` returns the tangent-differentiated statement of `lhs=expression`, whereas method `adjointDifferentiateAssignedExpression` returns the list of its adjoint-differentiated statements. There are special situations where there is no `lhs`, for instance when `expression` is the return expression of a function. For these situations, which indeed occur only in tangent mode, method `tangentDifferentiatePlainExpression` returns the tangent-differentiated expression. These three methods share most of their mechanism:

1. A "balanced" copy of `expression` is built. It is balanced in the sense that chains of associative operations are rearranged in the shape of a balanced binary tree. In the sequel, `expression` refers to this balanced copy. A few sweeps through `expression` evaluate the execution costs and the number of occurrences of each of its sub-expressions and of their derivatives in the future derivative code. These are used to take splitting decisions, i.e. decisions to precompute a sub-expression or its derivative into a temporary variable. See method `findDiffOptimalCuts`. The final decision is stored at each node of the expression, in the annotation `CutCosts`.

2. A "linearization tree" is built from `expression` (together with `lhs` if present), by a call to the `linearize` method. In the linearization tree, each node is an object of (a sub-class of) `LinTree`. This linearization tree reflects the structure of the linearized expression. For example, the topmost node of the linearization tree is of type `LinAssign`. Available sub-classes of `LinTree` are:
   * `LinLeaf` for a reference to a variable, i.e. a leaf of the linearization tree.
   * `LinStd` for a standard arithmetic operation.
   * `LinAssign` for the assignment.
   * `LinConvert` for a conversion operation.
   * `LinIf` for an if-expression.
   * `LinCatenate` for a plain juxtaposition of values, e.g.
an array constructor.
   * `LinReduce` for an array SUM reduction, possibly with DIM and MASK.
   * `LinSpread` for an expansion of a scalar (or an array with one less dimension) into an array.

   These sub-classes seem to cover all needs to date. However, we provide a template to build new sub-classes of `LinTree` if needed: [README-LinOther.md](README-LinOther.md).

   Linearization observes splitting decisions: when the `CutCosts` annotation on one sub-tree of `expression` requires the derivative to be split here (this happens only for the adjoint), this branch of the current linearization tree is terminated with a `LinLeaf` standing for the temporary (derivative) variable, and a new linearization tree is built, starting from a `LinAssign` of the temporary variable with the sub-expression. The linearization tree built is returned by side effect into the second (`parent`) argument. Assignments of split primal sub-expressions into temporary variables are collected by side effect into the `toPrecomputes` argument. Linearization trees of split derivative sub-expressions are collected by side effect into the `toLinTrees` argument.

3. The linearization tree (each of them if splitting occurred) is swept to produce the derivative statements. These sweeps are different in tangent and reverse mode.
   * In tangent mode, method `buildTreeForward` is applied to the root of each linearization tree, root linearization tree last. The linearization tree is swept top-down, and the differentiated expression is built progressively on the way back up.
   * In adjoint mode, method `buildTreesBackward` is applied to the root of each linearization tree, root linearization tree first. The linearization tree is swept top-down, bringing a copy of the differentiated lhs to each leaf; then the differentiated instruction corresponding to this leaf is built by a special sweep up from the leaf (method `buildTreeBackward`), and is accumulated into the result list of instructions.

4.
Upon return from the method, argument `toPrecomputes` is filled with the precomputation instructions introduced by the splitting mechanism.

At the level of one particular dynamic memory command
-----------------------------------------------------

Missing

At the level of one particular declaration
------------------------------------------

Missing

At the level of one particular reference expression
---------------------------------------------------

* Files `VarRefDifferentiator` [Javadoc](../../../../../../../build/fr/inria/tapenade/differentiation/VarRefDifferentiator.html)

Notes about reverse differentiation of Pass-By-Value variables
--------------------------------------------------------------

When variables are passed by value, a variable `v` used by the called function `F` is in fact a new variable, holding a copy of `v` and vanishing at the exit of `F`.

=> Every pass-by-value formal parameter is always passive (because not useful) and adjoint-dead (because vanishing) at the end of `F`. EXCEPTION: because of the optimization described below, which identifies `vb` and `vb0` when `v` is ONLY READ, we must consider that in this case usefulness of `v` propagates backwards across call exit (cf C:lh14).

Therefore, for every formal parameter `v` of function `F`:

- if `v` is not active upon entry of `F`, then since it is also passive on exit of `F`, there will be no `vb` parameter (there can be a `vb` inside `F_B`, but it will be local);
- else if `v` is ONLY READ during `F`, then `vb` is ONLY incremented during `F_B`. In this easier case, we can identify this `vb` in `F_B` with the outside `vb`:
  - each call to `F_B` must pass a reference `&vb`,
  - the header of `F_B` expects a `*vb`,
  - all occurrences of `vb` inside `F_B` become `*vb`;
- else, in the general case, we cannot identify the inside `vb` with the outside `vb`:
  - each call to `F_B` must pass a reference `&vb`,
  - the header of `F_B` expects a variable we call `*vb0`,
  - all occurrences of `vb` inside `F_B` remain accesses to a local `vb` (not `*vb`) which holds the derivative of the local copy of `v` inside `F_B`. At the end of `F_B`, `vb` is transferred into `*vb0` by `*vb0 += vb;`.