Tapenade data-flow analyses
===========================

* Files: [Javadoc](../../../../../../../build/fr/inria/tapenade/analysis/package-index.html)
  `ADActivityAnalyzer ADTBRAnalyzer ActivityPattern DataFlowAnalyzer DepsAnalyzer`
  `DiffLivenessAnalyzer InOutAnalyzer LoopArrayAccessAnalyzer PointerAnalyzer ReqExplicit`
  `OmpClauseAnalyzer`

~~~
new XXAnalyzer()
XXAnalyzer.run()
--> runBottomUpAnalysis() or runTopDownAnalysis()
    --> setCurUnitEtc()
    --> initializeCGForUnit()
    --> initializeCGForRootUnit()
    --> analyze()
        --> analyzeForward() or analyzeBackward()
            --> initializeUnit()
            --> setCurBlockEtc()
            --> initializeFGForBlock()
            --> initializeInitBlock()
            --> accumulateValuesFromUpstream() or ...FromDownstream()
                --> setEmptyCumulAndCycleValues()
                --> getValueFlowingThrough() or ...FlowingBack()
                --> cumulValueWithAdditional()
                --> cumulCycleValueWithAdditional()
                --> initializeCumulValue()
            --> compareUpstreamValues()
            --> propagateValuesForwardThroughBlock() or ...BackwardThroughBlock()
                --> propagateValuesForwardThroughExpression() or ...BackwardThroughExpression()
            --> compareDownstreamValues()
            --> terminateFGForBlock()
            --> terminateTermBlock()
            --> terminateUnit()
    --> terminateCGForUnit()
~~~

Data-Flow analyses are context-sensitive (they run on the full `CallGraph`) and flow-sensitive (for each procedure, they run on its Flow-Graph). They behave well on circular Call-Graphs (recursion) and circular Flow-Graphs (loops) thanks to intensive use of fixed-point iterations. For efficiency, the order in which the graph nodes are visited is based on:

- DFST (Depth-First Spanning Tree) order for top-down and for forward propagations, and the reverse of DFST order for bottom-up and for backward propagations;
- a Waiting List mechanism to avoid sweeps on nodes that propagate nothing new.

Each Data-Flow analysis is supposed to run on the whole IR: there is no equivalent of separate compilation. The advantage is that data-flow information on a procedure can be used to compute more precisely the data-flow information on the procedures that call it.

Each data-flow analysis is defined by its own class, which inherits from `DataFlowAnalyzer`. In the following, we show in red the (virtual) methods that a specific data-flow analysis (suppose it is named `XXAnalyzer`) must define, and in green the predefined methods provided by `DataFlowAnalyzer` that may be used in the implementation of the analysis-specific methods. In the case of data-flow analyses, these predefined methods very often call other virtual methods that must be defined by `XXAnalyzer`. This back-and-forth behavior between `XXAnalyzer` and `DataFlowAnalyzer` is a natural consequence of the classical theory of data-flow analyses known as *abstract interpretation*.

For each data-flow analysis (e.g. the `XX` analysis), `XXAnalyzer` must at least define/overload:

- its constructor, `XXAnalyzer(TapEnv, CallGraph, String, TapList)`: creates the `XXAnalyzer` object.
- its launcher, `run(TapList rootUnits)`: runs the analysis on the Call-Graph below the given rootUnits.

One must implement one's own `run()`. It can be standalone, but in most cases one will make it call the predefined `runTopDownAnalysis(TapList)` and/or `runBottomUpAnalysis(TapList)`. In that case, these predefined methods eventually call the virtual methods described below.
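As an illustration, here is a minimal sketch of what such an `XXAnalyzer` subclass could look like. Only the method names and parameter lists above come from this documentation; the import paths, the `super(...)` constructor signature, and the `void` return types are assumptions to be adapted to the actual source tree:

```java
// Hypothetical skeleton of a new data-flow analysis "XX".
// Only the method names and parameter lists follow the text above;
// everything else (imports, super constructor, return types) is assumed.
package fr.inria.tapenade.analysis;

import fr.inria.tapenade.representation.CallGraph; // assumed location
import fr.inria.tapenade.utils.TapEnv;             // assumed location
import fr.inria.tapenade.utils.TapList;            // assumed location

public class XXAnalyzer extends DataFlowAnalyzer {

    // Constructor: creates the XXAnalyzer object.
    // Assumes DataFlowAnalyzer exposes a matching constructor.
    public XXAnalyzer(TapEnv env, CallGraph callGraph, String analysisName, TapList options) {
        super(env, callGraph, analysisName, options);
    }

    // Launcher: runs the analysis on the Call-Graph below the given rootUnits.
    // Return type assumed void. Most analyses delegate to the predefined drivers,
    // which call back the XXAnalyzer-specific virtual methods.
    public void run(TapList rootUnits) {
        runBottomUpAnalysis(rootUnits);
        runTopDownAnalysis(rootUnits);
    }

    // ... plus the analysis-specific virtual methods listed below ...
}
```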
For each of the virtual methods called back by `runTopDownAnalysis()` and `runBottomUpAnalysis()`, class `XXAnalyzer` must provide its own implementation, or call a predefined implementation when provided:

- `setCurUnitEtc(Unit)`: hook to declare that the given `Unit` becomes the new current `curUnit`.
- `initializeCGForUnit()`: hook to initialize things for the current `curUnit` before running the analysis on the current `curCallGraph`.
- `initializeCGForRootUnit(Object)`: hook to initialize things for the `curUnit`, which is a rootUnit, before running the analysis on the Call-Graph.
- `analyze()`: runs the analysis on the `curUnit`.
- `concludeCGForUnit()`: hook to terminate things for the `curUnit` after running the analysis on the Call-Graph.

One must implement one's own `analyze()`. It can be standalone, but in most cases one will make it call the predefined `analyzeForward(TapList entryArrows, TapList insideBlocks, TapList exitArrows)` and/or `analyzeBackward(TapList entryArrows, TapList insideBlocks, TapList exitArrows)`. In that case, these predefined methods eventually call the following virtual methods (a generic sketch of the overall sweep is given after the list). For each of those, class `XXAnalyzer` must provide its own implementation, or call a predefined implementation when provided:

- `initializeUnit()`: hook to initialize things for `curUnit` before starting the analysis on `curUnit`.
- `setCurBlockEtc(Block)`: hook to declare that a `Block` becomes the new `curBlock`.
- `initializeFGForBlock()`: hook to initialize things for the `curBlock` before starting the analysis on `curUnit`.
- `initializeInitBlock()`: hook to initialize things for the `curBlock`, which is an initial block of the `curUnit` for the given analysis direction, before starting the analysis on `curUnit`.
- `accumulateValuesFromUpstream()` (resp. `accumulateValuesFromDownstream()` when backwards): accumulate the data-flow information coming flow-wise (resp. back-flow-wise) through the `FGArrow`s flowing to (resp. from) `curBlock`.
- `compareUpstreamValues()`: compare the data-flow information just upstream of `curBlock` with the data-flow information at the same location upon the previous sweep. Return true when the information has changed and therefore a new fixed-point iteration is required.
- `propagateValuesForwardThroughBlock()` (resp. `propagateValuesBackwardThroughBlock()` when backwards): propagate the data-flow information forward (resp. backward) across `curBlock`. Must return true when the control flow can effectively go through `curBlock` (no `stop` nor `exit`).
- `compareDownstreamValues()`: compare the data-flow information just downstream of `curBlock` with the data-flow information at the same location upon the previous sweep. Return true when the information has changed and therefore a new fixed-point iteration is required.
- `terminateFGForBlock()`: hook to terminate things for the `curBlock` after running the analysis on `curUnit`.
- `terminateTermBlock()`: hook to terminate things for the `curBlock`, which is a final block of the `curUnit` for the given analysis direction, after running the analysis on `curUnit`.
- `terminateUnit()`: hook to terminate things for `curUnit` after running the analysis on `curUnit`.
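To make the shape of this fixed-point sweep concrete, here is a small, self-contained illustration in plain Java (not Tapenade code: the node type, the set-of-strings values, and the union-plus-gen transfer function are simplified stand-ins) of a forward sweep driven by a waiting list:

```java
import java.util.*;

/** Generic forward data-flow sweep with a waiting list (illustration only;
 *  the real drivers work on Tapenade Blocks and FGArrows). */
final class ForwardSweepSketch {

    /** Simplified stand-in for a flow-graph node: an id and its successors. */
    static final class Node {
        final int id;
        final List<Node> successors = new ArrayList<>();
        Node(int id) { this.id = id; }
    }

    /** Runs to a fixed point: value[n] = union of predecessors' values + gen[n]. */
    static Map<Node, Set<String>> solve(List<Node> nodesInDfstOrder,
                                        Map<Node, Set<String>> gen) {
        Map<Node, Set<String>> out = new HashMap<>();
        for (Node n : nodesInDfstOrder) out.put(n, new HashSet<>());

        // Waiting list, initially all nodes in DFST order.
        Deque<Node> waiting = new ArrayDeque<>(nodesInDfstOrder);
        Set<Node> inList = new HashSet<>(nodesInDfstOrder);

        while (!waiting.isEmpty()) {
            Node n = waiting.removeFirst();
            inList.remove(n);
            // "accumulate values from upstream": merge what flows in from predecessors.
            Set<String> in = new HashSet<>();
            for (Node m : nodesInDfstOrder)
                if (m.successors.contains(n)) in.addAll(out.get(m));
            // "propagate values forward through block": apply the node's own effect.
            in.addAll(gen.getOrDefault(n, Set.of()));
            // "compare downstream values": re-enqueue successors only if something changed.
            if (!in.equals(out.get(n))) {
                out.put(n, in);
                for (Node s : n.successors)
                    if (inList.add(s)) waiting.addLast(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Node a = new Node(0), b = new Node(1), c = new Node(2);
        a.successors.add(b); b.successors.add(c); c.successors.add(b); // cycle between b and c
        Map<Node, Set<String>> gen = Map.of(a, Set.of("x"), c, Set.of("y"));
        solve(List.of(a, b, c), gen).forEach((n, v) -> System.out.println(n.id + " -> " + v));
    }
}
```

The predefined `analyzeForward()`/`analyzeBackward()` drivers play a similar role, delegating the accumulate/compare/propagate steps to the hooks listed above.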
One can implement one's own `accumulateValuesFromUpstream()` (resp. `accumulateValuesFromDownstream()` when backwards). One can alternatively do nothing and use the predefined `accumulateValuesFromUpstream()` (resp. `...FromDownstream()`) provided in `DataFlowAnalyzer`. In that case the predefined implementation eventually calls the following virtual methods. For each of those, class `XXAnalyzer` must provide its own implementation, or call a predefined implementation when provided:

- `setEmptyCumulAndCycleValues()`: initialize a container in which to fuse the values arriving through several flow arrows.
- `getValueFlowingThrough()` (resp. `getValueFlowingBack()` when backwards): get the value that arrives through the current `curArrow`.
- `cumulValueWithAdditional(SymbolTable)`: accumulate the arriving value into the non-cycling part of the accumulation container.
- `cumulCycleValueWithAdditional(SymbolTable)`: accumulate the arriving value into the cycling part of the accumulation container.
- `initializeCumulValue()` (needed only when backwards): create an empty info to propagate backwards. Necessary when `curBlock` contains `stop` or `exit`.

One can implement one's own `propagateValuesForwardThroughBlock()` (resp. `propagateValuesBackwardThroughBlock()` when backwards). One can alternatively do nothing and use the predefined `propagateValuesForwardThroughBlock()` (resp. `...BackwardThroughBlock()`) provided in `DataFlowAnalyzer`. In that case the predefined implementation eventually calls the following virtual method, for which class `XXAnalyzer` must provide its own implementation, or call a predefined implementation when provided:

- `propagateValuesForwardThroughExpression(Tree)` (resp. `propagateValuesBackwardThroughExpression(Tree)` when backwards): propagate the data-flow information forward (resp. backward) across the given instruction.

AD-independent preliminary data-flow analyses - **[Step 2]**
------------------------------------------------------------

* Files: `LoopArrayAccessAnalyzer PointerAnalyzer InOutAnalyzer`

```
CallGraph.prepareAnalyses()
--> LoopArrayAccessAnalyzer.runAnalysis()
--> PointerAnalyzer.runAnalysis()
--> InOutAnalyzer.runAnalysis()
```

AD-independent Data-Flow analyses are run in this order:

- Preparatory analysis on loops
- Pointer destinations
- In-Out (aka read-write)

Each of these 3 analyses is run completely on the full code (full `CallGraph`) before the next analysis starts.

Notes on LoopArrayAccessAnalyzer
--------------------------------

```
Utility analysis that detects loop-localized array zones, and array zones
completely swept by a loop.

Vocabulary: regarding a loop "H", a given zone "i" can be:
-- "localized" (L) when accesses to "i" in the loop "H" can access the same
   cell only during the same iteration. In other words, there is no
   loop-carried dependency.
-- "uniqueAccess" (U) when each iteration inside "H" only accesses one
   unique memory cell of "i".
-- "topUniqueAccess" (T) when "H" is the topmost "U" loop for "i".
-- "loopComplete" (C) when after all iterations of loop "H", each and every
   cell of "i" has been visited.

We claim there are only the following possible combinations of UTC:
  not-U ; U-and-not-T ; U-and-T-and-not-C ; U-and-T-and-C

Here are several examples. Consider 2 nested loops:
  B1 : Do i...
  B2 :   Do j...
         EndDo
       EndDo

When the only accesses     | then the zone of T   | and the zone of T
to array T inside are:     | is (regarding B1):   | is (regarding B2):
-----------------------------------------------------------------------
T(i,j)                     | LUTC                 | LU
T(i,j,cste)                | LUT                  | LU
T(i,j,3) and T(i,j,4)      | L                    | L
T(j)                       | .                    | LUTC
T(i)                       | LUTC                 | .U
T(i,2j) and T(i,2j+1)      | L                    | L
T(2i,j) and T(2i+1,j)      | L                    | L
T(12,5)                    | .UT                  | .U
T(i/2)                     | .                    | .UT

This information is used to stop propagation of data-flow info along
nonexistent loop-carried dependencies, and to have a minimal treatment of
array-kill info by loops.
```
Source IR with basic data-flow info - **[State 3]**
---------------------------------------------------

AD-specific data-flow analyses
------------------------------

* Files: `ADActivityAnalyzer ReqExplicit DepsAnalyzer DiffLivenessAnalyzer ADTBRAnalyzer DiffRoot DiffPattern ActivityPattern`

```
ADActivityAnalyzer.runAnalysis()
ReqExplicit.runAnalysis()
DepsAnalyzer.runAnalysis()
DiffLivenessAnalyzer.runAnalysis()
ADTBRAnalyzer.runAnalysis()
```

AD-specific Data-Flow analyses are run in this order: Activity, Pointer Activity, Dependencies, Diff-Liveness, TBR. Each of these 5 analyses is run completely on the full code (full `CallGraph`) before the next analysis starts.

Notes on the 4-phase Storage Activity Analysis (ReqExplicit)
------------------------------------------------------------

```
[1] BOTTOMUP_1 == "conditional maybe_required" :
For each Unit curU, computes
 ** RE: reqXEffects(curU): the effect of any call curU() on ReqX analysis,
    a BoolMatrix which, multiplied on the right with the ReqX downstream of
    the call curU(), returns the ReqX upstream of the call curU().
 ** REA: reqXEffectsAdded(curU): the "added" effect of any call curU() on
    ReqX analysis, a BoolVector of the zones to be added unconditionally to
    the ReqX upstream of the call curU(). REA could be an extra column in RE.

Computation: (backwards from final point to entry point of each analyzed unit curU)

-- Initialization values written down at the final point:
     RE_d  := Identity
     REA_d := Empty
   Rationale: this is the "effect" of an empty-body routine, i.e. a neutral
   element for the propagation through a call.

-- up through an active access (read or write) to *P :
     RE_u  := RE_d (i.e. unmodified!)
     REA_u := REA_d U {P}
   Rationale: if *P is active, this makes P' required up, unconditionally.

-- up through a pointer assignment P=>P2 or P=>malloc or P=>&T :
     RE_u  := RE_d {set {P->zero} if P is total} U {P2->RE_d(P)}
     REA_u := REA_d \ {P if P is total} U {P2 if P in REA_d}
   Rationale: the reasons why P may be required down get accumulated into
   the reasons why P2 may be required up. If P is totally overwritten, the
   upstream P doesn't yet require a P'.

-- up through a free(P) or nullify(P) (special case of the above):
     RE_u  := RE_d {set {P->zero} if P is total}
     REA_u := REA_d \ {P if P is total}
   Rationale: if P is total, P' is freed, so P' is not required up any more.
   Notice this case requires incorrect code, as P, like P', is required by
   downstream code, but we found no allocation of P upstream.

-- up through a call, e.g. call Foo(...,P,...) :
     RE_u  := {RE(Foo) +Id for non-total actual-args} * RE_d
     REA_u := ({RE(Foo) +Id for non-total actual-args} * REA_d)
              U REA(Foo) U {actual-args of active formal args}
   Rationale: for any P, P' is required upstream of the call Foo() either
   because Foo may require it depending on pointers required downstream of
   the call Foo(), or because Foo may require it unconditionally.

-- Final result found at the initial point:
     RE(curU)  := RE_u
     REA(curU) := REA_u
   Rationale: this is what is needed for the TOPDOWN_2 phase on any call curU().
```
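To make the meaning of RE and REA concrete, here is a small self-contained illustration in plain Java. Boolean arrays stand in for Tapenade's BoolMatrix/BoolVector, the "+Id for non-total actual-args" refinement is deliberately left out, and the zone numbers and scenario in `main` are invented for the example:

```java
import java.util.Arrays;

/** Illustration only: applying a callee's summarized effect (RE, REA) at a
 *  call site. Plain boolean arrays stand in for Tapenade's BoolMatrix and
 *  BoolVector; the "+Id for non-total actual-args" refinement is omitted. */
final class ReqXEffectSketch {

    /** reqUpstream = (RE * reqDownstream) U REA, over the boolean semiring. */
    static boolean[] applyCallEffect(boolean[][] re, boolean[] rea, boolean[] reqDown) {
        int n = reqDown.length;
        boolean[] reqUp = new boolean[n];
        for (int i = 0; i < n; i++) {
            boolean acc = rea[i];                  // unconditionally required
            for (int j = 0; j < n && !acc; j++)
                acc = re[i][j] && reqDown[j];      // required depending on downstream
            reqUp[i] = acc;
        }
        return reqUp;
    }

    public static void main(String[] args) {
        // Invented scenario with two zones, 0 and 1: inside the callee,
        // zone 0 becomes required upstream whenever zone 1 is required
        // downstream (RE), and zone 1 is required unconditionally (REA).
        boolean[][] re  = { { false, true }, { false, false } };
        boolean[]   rea = { false, true };
        boolean[]   reqDown = { false, true };     // only zone 1 required downstream
        System.out.println(Arrays.toString(applyCallEffect(re, rea, reqDown)));
        // prints [true, true]
    }
}
```

Phase [2] below uses exactly this composition at each call site, with the extra handling of non-total actual arguments.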
```
[2] TOPDOWN_2 == "maybe_required" :
For each Unit curU, computes
 ** ReqX: Required active storage
    Between each instruction, the (pointer) zones for which the diff pointer
    may need to be allocated when reaching that point.

Computation: (backwards from final point to entry point of each analyzed unit curU)

-- Initialization value written down at the final point:
     R_d := {accumulated R_d's analysis context on the exit of curU}, or Empty
   Rationale: no P' required at the final exit of the differentiated part.

-- up through an active access (read or write) to *P :
     R_u := R_d U {P}
   Rationale: active use of P requires having a P' available.

-- up through a pointer assignment P=>P2 or P=>malloc or P=>&T :
     R_u := R_d \ {P if P is total} U {P2 if P in R_d}
   Rationale: a required down P' must be made here => not required up.
   Furthermore, if P' is required down, then P2' is required up.

-- up through a free(P) or nullify(P) (special case of the above):
     R_u := R_d \ {P if P is total}
   Rationale: a priori, the upstream P doesn't yet require a P'.

-- up through a call, e.g. call Foo(...,P,...) :
   ==> Accumulate R_d into the TOPDOWN_2 analysis context on the exit of Foo
     R_u := ({RE(Foo) +Id for non-total actual-args} * R_d) U REA(Foo)
   Rationale: by definition of RE(Foo) and of REA(Foo).
   Non-total actual-args must retain their status from R_d.

-- Final result found at the initial point: R_u
   When curU is a top Unit, R_u(curU) can be used to require the calling
   context of curU to provide hand-built derivative variables.
```

```
[3] BOTTOMUP_3 == "conditional maybe_present" :
For each Unit curU, computes
 ** AEV: unitsAvlXEffectsVisible(curU): the "visibility" effect of any call
    curU() on AvlX analysis, i.e. the pointer arguments that possibly at
    least partly keep their allocation status through the call curU().
    This may be seen as the list of P possibly not deallocated by curU.
 ** AEA: unitsAvlXEffectsAdded(curU): the "added" effect of any call curU()
    on AvlX analysis, a BoolVector of the zones to be added to the AvlX
    downstream of the call curU(). This may be seen as the list of pointers P,
    outputs of call curU(), whose derivative P' may become (partly) available
    (hopefully because allocated) upon return from curU.

Computation: (forwards from entry point to final point of each analyzed unit curU)

-- Initialization values written down at the initial point:
     AEV_u := Full
     AEA_u := Empty
   Rationale: neutral element, i.e. the effect of a no-op.

-- down through an active access (read or write) to *P :
     AEV_d := AEV_u (i.e. unmodified!)
     AEA_d := AEA_u (i.e. unmodified!)
   Rationale: the "maybe-present" analysis considers that the generating fact
   for a pointer variable P' to be present is the place where it is
   effectively allocated, and not the place where this allocation has been
   required. Therefore we assume that the future A_u at this point will
   certainly contain P.

-- down through a pointer assignment P=>P2 or P=>malloc or P=>&T :
     AEV_d := AEV_u \ {P if P is total}
     AEA_d := AEA_u \ {P if P is total} U {P if P in R_d}
   Rationale: the up P is overwritten, so its "presence" status vanishes.
   If the down P is ReqX down, then P' must be made here and therefore
   becomes present down.

-- down through a free(P) or nullify(P) (special case of the above):
     AEV_d := AEV_u \ {P if P is total}
     AEA_d := AEA_u \ {P if P is total}
   Rationale: the up P is overwritten, so its "presence" status vanishes.

-- down through a call, e.g. call Foo(...,P,...) :
     AEV_d := AEV_u \ {(!AEV(Foo)) except for non-total actual-args}
     AEA_d := AEA_u \ {(!AEV(Foo)) except for non-total actual-args} U AEA(Foo)
   Rationale: by definition of AEV(Foo) and of AEA(Foo).

-- Final result found at the final point:
     AEV(curU) := AEV_d
     AEA(curU) := AEA_d
   Rationale: this is what is needed for the TOPDOWN_4 phase on any call curU().
```
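As a companion to the sketch given after phase [1], here is a minimal set-based illustration in plain Java of the forward update of the maybe-present information across a call, given the callee's summarized AEV and AEA. BitSets stand in for Tapenade's BoolVector, the "except for non-total actual-args" refinement is omitted, and the zone numbers in `main` are invented:

```java
import java.util.BitSet;

/** Illustration only: forward update of the "maybe present" set across a
 *  call site, given the callee's AEV (zones possibly keeping their
 *  allocation status through the call) and AEA (zones whose derivative may
 *  become available upon return). BitSets stand in for BoolVectors and the
 *  "except for non-total actual-args" refinement is omitted. */
final class AvlXEffectSketch {

    /** A_d = (A_u intersected with AEV(Foo)) U AEA(Foo). */
    static BitSet throughCall(BitSet aUp, BitSet aev, BitSet aea) {
        BitSet aDown = (BitSet) aUp.clone();
        aDown.and(aev);   // presence survives the call only where AEV allows it
        aDown.or(aea);    // plus what the call itself may make available
        return aDown;
    }

    public static void main(String[] args) {
        BitSet aUp = new BitSet();  aUp.set(0);  aUp.set(1);  // present before the call
        BitSet aev = new BitSet();  aev.set(0);               // only zone 0 may survive
        BitSet aea = new BitSet();  aea.set(2);               // zone 2 may become available
        System.out.println(throughCall(aUp, aev, aea));       // prints {0, 2}
    }
}
```

Phase [4] below applies the same update at each call site, again with the non-total actual-argument refinement.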
```
[4] TOPDOWN_4 == "maybe_present" :
For each Unit curU, computes
 ** AvlX: MaybePresent active storage
    Between each instruction, the (pointer) zones for which the diff pointer
    may be present and allocated, and therefore should be deallocated (with
    an if_present test) along with the source variable.

Computation: (forwards from entry point to final point of each analyzed unit curU)

-- Initialization value written down at the initial point:
     A_u := {accumulated A_u's analysis context on the entry of curU} U R_u
   Rationale: we assume everything required up will have been made available
   by the calling context.

-- down through an active access (read or write) to *P :
     A_d := A_u (i.e. unmodified!)
   Rationale: could be "A_u U {P}", but we believe that this analysis has
   already put P into A_u.

-- down through a pointer assignment P=>P2 or P=>malloc or P=>&T :
     A_d := A_u \ {P if P is total} U {P if P in R_d}
   Rationale: the up P is overwritten, so its "presence" status vanishes.
   Actually, even if we wanted to free P', this statement loses its address!
   Then if P is required down, P' will be made here => it is now present!
   Also for P2, we believe that this analysis has already placed the
   presence status in A_u if P2' is present.

-- down through a free(P) or nullify(P) (special case of the above):
     A_d := A_u \ {P if P is total}
   Rationale: the up P is overwritten, so its "presence" status vanishes.

-- down through a call, e.g. call Foo(...,P,...) :
   ==> Accumulate A_u into the TOPDOWN_4 analysis context on the entry of Foo
     A_d := A_u \ {(!AEV(Foo)) except for non-total actual-args} U AEA(Foo)
   Rationale: by definition of AEV(Foo) and of AEA(Foo).
   Non-total actual-args must retain their status from A_u.

-- Final result found at the final point: A_d
   When curU is a top Unit, A_d(curU) can be used to tell the calling context
   of curU that it may free the available differentiated derivatives if they
   exist.
```

Ready for differentiation - **[State 4]**
------------------------------------------------------------------------