Tapenade data-flow analyses

ADActivityAnalyzer ADTBRAnalyzer ActivityPattern DataFlowAnalyzer DepsAnalyzer
DiffLivenessAnalyzer InOutAnalyzer LoopArrayAccessAnalyzer PointerAnalyzer ReqExplicit
OmpClauseAnalyzer

new XXAnalyzer()  
XXAnalyzer.run()  
--> runBottomUpAnalysis() or runTopDownAnalysis()  
   --> setCurUnitEtc()  
   --> initializeCGForUnit()  
   --> initializeCGForRootUnit()  
   --> analyze()  
      --> analyzeForward() or analyzeBackward()  
        --> initializeUnit()  
        --> setCurBlockEtc()  
        --> initializeFGForBlock()  
        --> initializeInitBlock()  
        --> accumulateValuesFromUpstream() or ...FromDownstream()  
          --> setEmptyCumulAndCycleValues()  
          --> getValueFlowingThrough() or ...FlowingBack()  
          --> cumulValueWithAdditional()  
          --> cumulCycleValueWithAdditional()  
          --> initializeCumulValue()  
      --> compareUpstreamValues()  
      --> propagateValuesForwardThroughBlock() or ...BackwardThroughBlock()  
        --> propagateValuesForwardThroughExpression() or ...BackwardThroughExpression() 
      --> compareDownstreamValues()  
      --> terminateFGForBlock()  
      --> terminateTermBlock()  
      --> terminateUnit()  
   --> terminateCGForUnit()

Data-Flow analyses are context-sensitive (they run on the full Call-Graph) and flow-sensitive (within each procedure, they run on its Flow-Graph). They behave well on circular Call-Graphs (recursion) and circular Flow-Graphs (loops) thanks to intensive use of fixed-point iterations. For efficiency, the order in which the graph nodes are visited is based on:

  • DFST (Depth First Spanning Tree) order for top-down and for forward propagations, and the reverse of DFST for bottom-up and for backwards propagations.

  • a Waiting List mechanism to avoid sweeps on nodes that propagate nothing new.
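The Waiting List mechanism can be sketched in a deliberately tiny setting (the graph, the toy transfer function, and all names here are illustrative, not the Tapenade API; the DFST ordering itself is not shown): a node is re-queued only when the value flowing into it has actually changed, so sweeps stop as soon as the fixed point is reached.

```java
import java.util.*;

// Hypothetical sketch of the waiting-list scheme described above.
public class WaitingListSketch {
    // successor lists of a tiny flow graph: 0 -> 1 -> 2 -> 1 (a loop), 2 -> 3
    static int[][] succ = {{1}, {2}, {1, 3}, {}};

    public static List<Integer> sweepOrder() {
        List<Integer> sweeps = new ArrayList<>();
        int n = succ.length;
        int[] value = new int[n];            // toy data-flow value: a bounded counter
        Deque<Integer> waiting = new ArrayDeque<>();
        boolean[] inList = new boolean[n];
        waiting.add(0); inList[0] = true;
        while (!waiting.isEmpty()) {
            int b = waiting.poll(); inList[b] = false;
            sweeps.add(b);
            for (int s : succ[b]) {
                int newVal = Math.min(value[b] + 1, 3);  // bounded => termination
                if (newVal > value[s]) {                 // changed: propagate further
                    value[s] = newVal;
                    if (!inList[s]) { waiting.add(s); inList[s] = true; }
                }
            }
        }
        return sweeps;
    }

    public static void main(String[] args) {
        System.out.println(sweepOrder());  // prints [0, 1, 2, 1, 3, 2]
    }
}
```

Only six sweeps are needed: once nothing new flows into a node, it is never re-queued, even though the graph contains a cycle.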

Each Data-Flow analysis is supposed to run on the whole IR; there is no equivalent of separate compilation. The advantage is that data-flow information about a procedure can be used to refine the data-flow information about the procedures that call it.

Each data-flow analysis is defined by its own class, which inherits from DataFlowAnalyzer. In the following, we distinguish the (virtual) methods that a specific data-flow analysis (say it is named XXAnalyzer) must define, from the predefined methods provided by DataFlowAnalyzer that may be used in the implementation of the analysis-specific methods. These predefined methods very often call other virtual methods that must in turn be defined by XXAnalyzer. This back-and-forth between XXAnalyzer and DataFlowAnalyzer is a natural consequence of the classical theory of data-flow analysis known as abstract interpretation.

For each data-flow analysis (e.g. the XX analysis), XXAnalyzer must at least define/overload:

  • its constructor: XXAnalyzer(TapEnv, CallGraph, String, TapList<String>): creates the XXAnalyzer object.

  • its launcher: run(TapList<Unit> rootUnits): runs the analysis on the Call-Graph below the given rootUnits.

Each analysis must implement its own run(). It can be standalone, but in most cases it will call the predefined runTopDownAnalysis(TapList<Unit>) and/or runBottomUpAnalysis(TapList<Unit>). These predefined methods eventually call the following virtual methods, for each of which class XXAnalyzer must provide its own implementation or call a predefined implementation when one is provided:

  • setCurUnitEtc(Unit): hook to declare that the given Unit becomes the new current curUnit.

  • initializeCGForUnit(): hook to initialize things for the current curUnit before running the analysis on the current curCallGraph.

  • initializeCGForRootUnit(Object): hook to initialize things for the curUnit, which is a rootUnit, before running the analysis on the Call-Graph.

  • analyze(): runs the analysis on the curUnit.

  • terminateCGForUnit(): hook to terminate things for the curUnit after running the analysis on the Call-Graph.

Each analysis must implement its own analyze(). It can be standalone, but in most cases it will call the predefined analyzeForward(TapList<FGArrow> entryArrows, TapList<Block> insideBlocks, TapList<FGArrow> exitArrows) and/or analyzeBackward(TapList<FGArrow> entryArrows, TapList<Block> insideBlocks, TapList<FGArrow> exitArrows). These predefined methods eventually call the following virtual methods, for each of which class XXAnalyzer must provide its own implementation or call a predefined implementation when one is provided:

  • initializeUnit(): hook to initialize things for curUnit before starting the analysis on curUnit.

  • setCurBlockEtc(Block): hook to declare that a Block becomes the new curBlock.

  • initializeFGForBlock(): hook to initialize things for the curBlock before starting the analysis on curUnit.

  • initializeInitBlock(): hook to initialize things for the curBlock, which is an initial block of the curUnit for the given analysis direction, before starting the analysis on curUnit.

  • accumulateValuesFromUpstream() (resp. ...FromDownstream() when backwards): accumulates the data-flow information coming flow-wise (resp. back-flow-wise) through the FGArrows flowing to (resp. from) curBlock.

  • compareUpstreamValues(): compares the data-flow information just upstream of curBlock with the information found at the same location on the previous pass. Returns true when the information has changed and therefore a new fixed-point iteration is required.

  • propagateValuesForwardThroughBlock() (resp. ...BackwardThroughBlock() when backwards): propagates the data-flow information forward (resp. backward) across curBlock. Must return true when the control flow can effectively go through curBlock (no stop nor exit).

  • compareDownstreamValues(): compares the data-flow information just downstream of curBlock with the information found at the same location on the previous pass. Returns true when the information has changed and therefore a new fixed-point iteration is required.

  • terminateFGForBlock(): hook to terminate things for the curBlock after running the analysis on curUnit.

  • terminateTermBlock(): hook to terminate things for the curBlock, which is a final block of the curUnit for the given analysis direction, after running the analysis on curUnit.

  • terminateUnit(): hook to terminate things for curUnit after running the analysis on curUnit.
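To make this division of labor concrete, here is a self-contained miniature (all names hypothetical; this is not Tapenade's DataFlowAnalyzer) in which the forward driver is predefined and a toy concrete analysis supplies only the virtual methods it needs:

```java
import java.util.*;

// Hypothetical, heavily simplified mirror of the hook structure above.
abstract class MiniForwardAnalyzer {
    final String[][] blocks;     // blocks[b] = variables assigned in block b
    final int[][] preds;         // preds[b] = predecessor block indices
    List<Set<String>> out;       // data-flow value downstream of each block

    MiniForwardAnalyzer(String[][] blocks, int[][] preds) {
        this.blocks = blocks; this.preds = preds;
    }

    // "virtual methods" a concrete analysis must define:
    abstract Set<String> initializeInitBlock();
    abstract Set<String> propagateThroughBlock(Set<String> upstream, String[] block);

    // predefined driver, cf. analyzeForward():
    final void analyze() {
        int n = blocks.length;
        out = new ArrayList<>();
        for (int b = 0; b < n; b++) out.add(new HashSet<>());
        boolean changed = true;
        while (changed) {                      // fixed point, handles loops
            changed = false;
            for (int b = 0; b < n; b++) {
                // accumulateValuesFromUpstream(): join over incoming arrows
                Set<String> cumul = preds[b].length == 0
                        ? new HashSet<>(initializeInitBlock()) : new HashSet<>();
                for (int p : preds[b]) cumul.addAll(out.get(p));
                // propagateValuesForwardThroughBlock()
                Set<String> down = propagateThroughBlock(cumul, blocks[b]);
                // compareDownstreamValues(): re-iterate only when something moved
                if (!down.equals(out.get(b))) { out.set(b, down); changed = true; }
            }
        }
    }
}

// Toy concrete analysis: variables possibly assigned so far ("may-defined").
public class MayDefinedDemo extends MiniForwardAnalyzer {
    MayDefinedDemo(String[][] blocks, int[][] preds) { super(blocks, preds); }
    Set<String> initializeInitBlock() { return Set.of(); }
    Set<String> propagateThroughBlock(Set<String> up, String[] block) {
        Set<String> down = new HashSet<>(up);
        down.addAll(Arrays.asList(block));
        return down;
    }
    public static void main(String[] args) {
        // B0 -> B1 -> B2, plus a back edge B2 -> B1 (a loop)
        MayDefinedDemo a = new MayDefinedDemo(
            new String[][]{{"x"}, {"y"}, {"z"}},
            new int[][]{{}, {0, 2}, {1}});
        a.analyze();
        System.out.println(new TreeSet<>(a.out.get(1)));  // prints [x, y, z]
    }
}
```

The driver owns the fixed-point loop, the joins, and the comparisons; the concrete analysis owns only the lattice value and the transfer function — the back-and-forth described above.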

An analysis may implement its own accumulateValuesFromUpstream() (resp. ...FromDownstream() when backwards), or instead rely on the predefined implementation provided in DataFlowAnalyzer. In that case the predefined implementation eventually calls the following virtual methods, for each of which class XXAnalyzer must provide its own implementation or call a predefined implementation when one is provided:

  • setEmptyCumulAndCycleValues(): initialize a recipient for fusing the values arriving through several flow arrows.

  • getValueFlowingThrough() (resp. FlowingBack when backwards): get the value that arrives through the current curArrow.

  • cumulValueWithAdditional(SymbolTable): accumulates the arriving value into the non-cycling part of the accumulation recipient.

  • cumulCycleValueWithAdditional(SymbolTable): accumulates the arriving value into the cycling part of the accumulation recipient.

  • initializeCumulValue() (needed only when backwards): creates an empty value to propagate backwards. Necessary when curBlock contains a stop or exit.
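As a sketch (illustrative names and set-valued information only, not Tapenade's types), the accumulation step fuses arriving values into two recipients, one for values flowing through ordinary arrows and one for values flowing through cycle (back-edge) arrows:

```java
import java.util.*;

// Hypothetical illustration of setEmptyCumulAndCycleValues() /
// cumulValueWithAdditional() / cumulCycleValueWithAdditional().
public class AccumulateUpstreamSketch {
    static Set<String> cumul, cycleCumul;

    // initialize the two recipients before sweeping the incoming arrows
    static void setEmptyCumulAndCycleValues() {
        cumul = new HashSet<>(); cycleCumul = new HashSet<>();
    }
    // fuse a value arriving through an ordinary (forward) arrow
    static void cumulValueWithAdditional(Set<String> v)      { cumul.addAll(v); }
    // fuse a value arriving through a cycle (back-edge) arrow
    static void cumulCycleValueWithAdditional(Set<String> v) { cycleCumul.addAll(v); }

    public static void main(String[] args) {
        setEmptyCumulAndCycleValues();
        cumulValueWithAdditional(Set.of("a"));        // from a forward arrow
        cumulCycleValueWithAdditional(Set.of("b"));   // from a loop back-edge
        System.out.println(new TreeSet<>(cumul) + " " + new TreeSet<>(cycleCumul));
        // prints [a] [b]
    }
}
```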

An analysis may implement its own propagateValuesForwardThroughBlock() (resp. ...BackwardThroughBlock() when backwards), or instead rely on the predefined implementation provided in DataFlowAnalyzer. In that case the predefined implementation eventually calls the following virtual method, for which class XXAnalyzer must provide its own implementation or call a predefined implementation when one is provided:

  • propagateValuesForwardThroughExpression(Tree) (resp. ...BackwardThroughExpression() when backwards): propagates the data-flow information forward (resp. backward) across the given instruction.
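For instance, a backward transfer through a single assignment, liveness-style, could look like this (hypothetical mini-IR; Tapenade's actual Tree type is not used):

```java
import java.util.*;

// Sketch of a backward transfer through one assignment "lhs = f(rhsVars)":
// kill the totally overwritten variable, then add every variable read.
public class BackwardThroughExpression {
    static Set<String> propagateBackwardThroughAssignment(
            Set<String> liveBelow, String lhs, List<String> rhsVars) {
        Set<String> liveAbove = new HashSet<>(liveBelow);
        liveAbove.remove(lhs);          // lhs is totally overwritten: killed
        liveAbove.addAll(rhsVars);      // every variable read becomes live
        return liveAbove;
    }
    public static void main(String[] args) {
        // below "x = y + z", {x, t} is live; above it, {t, y, z} is live
        System.out.println(new TreeSet<>(propagateBackwardThroughAssignment(
            Set.of("x", "t"), "x", List.of("y", "z"))));  // prints [t, y, z]
    }
}
```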

AD-independent preliminary data-flow analyses - [Step 2]

  • Files: LoopArrayAccessAnalyzer PointerAnalyzer InOutAnalyzer

       CallGraph.prepareAnalyses()
       --> LoopArrayAccessAnalyzer.runAnalysis()
       --> PointerAnalyzer.runAnalysis()
       --> InOutAnalyzer.runAnalysis()           

AD-independent Data-Flow analyses are run in this order:

  • Preparatory analysis on loops

  • Pointer destinations

  • In-Out (aka read-write).

Each of these 3 analyses is run completely on the full code (full CallGraph) before the next analysis starts.

Notes on LoopArrayAccessAnalyzer

   Utility analysis that detects loop-localized array zones,
   and array zones completely swept by a loop.
   Vocabulary: regarding a loop "H", a given zone "i" can be:
   -- "localized" (L) when accesses to "i" inside loop "H"
   can access the same cell only during the same iteration.
   In other words, there is no loop-carried dependence.
   -- "uniqueAccess" (U) when each iteration inside "H"
   accesses only one unique memory cell of "i".
   -- "topUniqueAccess" (T) when "H" is the topmost "U" loop for "i".
   -- "loopComplete" (C) when after all iterations of loop "H",
   each and every cell of "i" has been visited.
   We claim the only possible combinations of U, T, and C are:
   not-U ; U-and-not-T ; U-and-T-and-not-C ; U-and-T-and-C
   Here are several examples. Consider 2 nested loops
   B1 : Do i...
   B2 :    Do j...
   EndDo
   EndDo
   When the only accesses   | then the zone of T  | and the zone of T
   to array T inside are:   | is (regarding B1):  | is (regarding B2):
   --------------------------------------------------------------------
   T(i,j)                   | LUTC                | LU
   T(i,j,cste)              | LUT                 | LU
   T(i,j,3) and T(i,j,4)    | L                   | L
   T(j)                     | .                   | LUTC
   T(i)                     | LUTC                | .U
   T(i,2j) and T(i,2j+1)    | L                   | L
   T(2i,j) and T(2i+1,j)    | L                   | L
   T(12,5)                  | .UT                 | .U
   T(i/2)                   | .                   | .UT

   This information is used to stop propagation of data-flow info
   along nonexistent loop-carried dependences, and to provide a
   minimal treatment of array-kill info by loops.

Source IR with basic data-flow info - [State 3]

AD-specific data-flow analyses - [Step 3]

  • Files:

ADActivityAnalyzer ReqExplicit DepsAnalyzer DiffLivenessAnalyzer ADTBRAnalyzer DiffRoot DiffPattern ActivityPattern

       ADActivityAnalyzer.runAnalysis()
       ReqExplicit.runAnalysis()
       DepsAnalyzer.runAnalysis()
       DiffLivenessAnalyzer.runAnalysis()
       ADTBRAnalyzer.runAnalysis()        

AD-specific Data-Flow analyses are run in this order: Activity; Pointer Activity; Dependencies; Diff-Liveness; TBR. Each of these 5 analyses is run completely on the full code (full CallGraph) before the next analysis starts.

Notes on the 4-phase Storage Activity Analysis (ReqExplicit)

[1] BOTTOMUP_1=="conditional maybe_required" : For each Unit curU, computes
      ** RE:  reqXEffects(curU): The effect of any call curU() on ReqX analysis,
            a BoolMatrix which, multiplied on the right with the ReqX
            downstream the call curU(), returns the ReqX upstream the call curU().
      ** REA: reqXEffectsAdded(curU): The "added" effect of any call curU() on
            ReqX analysis, a BoolVector of the zones to be added unconditionally
            to the ReqX upstream the call curU(). REA could be an extra column in RE.
  Computation: (Backwards from final point to entry point of each analyzed unit curU)
  -- Initialization values written down at the final point:
      RE_d  := Identity
      REA_d := Empty
     Rationale: This is the "effect" of an empty-body routine,
      i.e. a neutral element for the propagation through a call.

  -- up through an active access (read or write) to *P :
      RE_u  := RE_d          (i.e. unmodified!)
      REA_u := REA_d U {P}
     Rationale: If *P is active, this makes P' required up, unconditionally

  -- up through a pointer assignment P=>P2 or P=>malloc or P=>&T :
      RE_u  := RE_d (set {P->zero} if P is total) U {P2->RE_d(P)}
      REA_u := REA_d \ {P if P is total} U {P2 if P in REA_d}
     Rationale: The reasons why P may be required down, get accumulated into
      the reasons why P2 may be required up. If P is totally overwritten,
      the upstream P doesn't yet require a P'.

  -- up through a free(P) or nullify(P) (special case of the above):
      RE_u  := RE_d (set {P->zero} if P is total)
      REA_u := REA_d \ {P if P is total}
     Rationale: If P is total, P' is freed, so P' is not required up any more.
      Notice this case can only occur in incorrect code, since P, like P',
      is required by downstream code, yet no allocation of P was found upstream.

  -- up through a call e.g. call Foo(...,P,...)
      RE_u  :=  {RE(Foo) +Id for non-total actual-args} * RE_d
      REA_u := ({RE(Foo) +Id for non-total actual-args} * REA_d) U REA(Foo)
                  U {actual-args of active formal args}
     Rationale: For any P, P' is required upstream the call Foo()
      either because Foo may require it depending on pointers required
      downstream call Foo(), or because Foo may require it unconditionally.

  -- Final result found at the initial point:
      RE(curU)  := RE_u
      REA(curU) := REA_u
     Rationale: this is what is needed for the TOPDOWN_2 phase on any call curU()
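As a toy illustration of how RE is meant to be used (hypothetical code, not Tapenade's BoolMatrix API): if Foo's body were just the pointer assignment P => P2, then RE(Foo) would have a single true entry RE[P2][P], and multiplying it on the right by the ReqX vector found downstream of a call to Foo yields the ReqX vector upstream of that call.

```java
import java.util.Arrays;

// Hypothetical sketch of "RE, multiplied on the right with the ReqX
// downstream the call, returns the ReqX upstream the call".
public class ReqXComposeDemo {
    // reqUp := RE * reqDown (boolean matrix times boolean vector)
    static boolean[] apply(boolean[][] re, boolean[] reqDown) {
        boolean[] reqUp = new boolean[reqDown.length];
        for (int i = 0; i < re.length; i++)
            for (int j = 0; j < re[i].length; j++)
                reqUp[i] |= re[i][j] && reqDown[j];
        return reqUp;
    }
    public static void main(String[] args) {
        // two zones: index 0 is P, index 1 is P2.
        // Foo's body is "P => P2": P' required downstream Foo's body
        // becomes P2' required upstream, hence RE[P2][P] = true.
        boolean[][] reFoo = {{false, false}, {true, false}};
        boolean[] reqDown = {true, false};        // downstream the call: P' required
        boolean[] reqUp = apply(reFoo, reqDown);  // upstream the call: P2' required
        System.out.println(Arrays.toString(reqUp));  // prints [false, true]
    }
}
```

Composing the effect of a call (the rule "RE_u := {RE(Foo) +Id for non-total actual-args} * RE_d") is the matrix-matrix analogue of this matrix-vector product.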


[2] TOPDOWN_2=="maybe_required" : For each Unit curU, computes
      ** ReqX: Required active storage
            Between each instruction, the (pointer) zones for which the
            diff pointer may need to be allocated when reaching that point.
  Computation: (Backwards from final point to entry point of each analyzed unit curU)
  -- Initialization value written down at the final point:
      R_d := {the R_d's accumulated in the analysis context on the exit of curU}, or Empty.
     Rationale: no P' required at the final exit of the differentiated part.

  -- up through an active access (read or write) to *P :
      R_u := R_d U {P}
     Rationale: active use of P requires to have a P' available.

  -- up through a pointer assignment P=>P2 or P=>malloc or P=>&T :
      R_u := R_d \ {P if P is total} U {P2 if P in R_d}
     Rationale: a required down P' must be made here => not required up.
      Furthermore, if P' required down, then P2' is required up.

  -- up through a free(P) or nullify(P) (special case of the above):
      R_u := R_d \ {P if P is total}
     Rationale: a priori, the upstream P doesn't yet require a P'.

  -- up through a call e.g. call Foo(...,P,...)
      ==> Accumulate R_d into the TOPDOWN_2 analysis context on the exit of Foo
      R_u := ({RE(Foo) +Id for non-total actual-args} * R_d) U REA(Foo)
     Rationale: by definition of RE(Foo) and of REA(Foo).
      Non-total actual-args must retain their status from R_d.

  -- Final result found at the initial point:
      R_u
     When curU is a top Unit, R_u(curU) can be used to require the calling
     context of curU to provide hand-built derivative variables.
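The TOPDOWN_2 transfer through a total pointer assignment P => P2 ("R_u := R_d \ {P if P is total} U {P2 if P in R_d}") can be sketched as a small set manipulation (illustrative names; zones stand in as strings):

```java
import java.util.*;

// Hypothetical sketch of the TOPDOWN_2 rule for a *total* pointer
// assignment "P => P2".
public class ReqXPointerAssign {
    static Set<String> upThroughTotalPointerAssign(
            Set<String> rDown, String p, String p2) {
        Set<String> rUp = new HashSet<>(rDown);
        if (rDown.contains(p)) rUp.add(p2);  // P' required down => P2' required up
        rUp.remove(p);                       // P totally overwritten: P' made here,
                                             // so not required upstream
        return rUp;
    }
    public static void main(String[] args) {
        // downstream, {P, Q} require derivatives; upstream of "P => P2",
        // the requirement on P is transferred to P2, Q is untouched
        System.out.println(new TreeSet<>(
            upThroughTotalPointerAssign(Set.of("P", "Q"), "P", "P2")));
        // prints [P2, Q]
    }
}
```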


[3] BOTTOMUP_3=="conditional maybe_present" : For each Unit curU, computes
      ** AEV: unitsAvlXEffectsVisible(curU): The "Visibility" effect of any call curU() on
            AvlX analysis, i.e. the pointer arguments that possibly at least
            partly keep their allocation status through the call curU().
            This may be seen as the list of P possibly not deallocated by curU.
      ** AEA: unitsAvlXEffectsAdded(curU): The "added" effect of any call curU() on AvlX analysis,
            a BoolVector of the zones to be added to the AvlX downstream the call curU().
            This may be seen as the list of pointers P outputs of call curU() whose
            derivative P' may become (partly) available (hopefully
            because allocated) upon return from curU.
  Computation: (Forwards from entry point to final point of each analyzed unit curU)
  -- Initialization values written down at the initial point:
      AEV_u := Full
      AEA_u := Empty
     Rationale: neutral element, i.e. the effect of a no-op.

  -- down through an active access (read or write) to *P :
      AEV_d := AEV_u       (i.e. unmodified!)
      AEA_d := AEA_u       (i.e. unmodified!)
     Rationale: The "maybe-present" analysis considers that the
      generating fact for a pointer variable P' to be present is
      the place where it is effectively allocated and not the place
      where this allocation has been required. Therefore we assume
      that future A_u at this point will certainly contain P.

  -- down through a pointer assignment P=>P2 or P=>malloc or P=>&T :
      AEV_d := AEV_u \ {P if P is total}
      AEA_d := AEA_u \ {P if P is total} U {P if P in R_d}
     Rationale: the up P is overwritten, so its "presence" status vanishes.
      If the down P is ReqX down, then P' must be made here and therefore
      becomes present down.

  -- down through a free(P) or nullify(P) (special case of the above):
      AEV_d := AEV_u \ {P if P is total}
      AEA_d := AEA_u \ {P if P is total}
     Rationale: the up P is overwritten, so its "presence" status vanishes.

  -- down through a call e.g. call Foo(...,P,...)
      AEV_d := AEV_u \ {(!AEV(Foo)) except for non-total actual-args}
      AEA_d := AEA_u \ {(!AEV(Foo)) except for non-total actual-args} U AEA(Foo)
     Rationale: by definition of AEV(Foo) and of AEA(Foo).

  -- Final result found at the final point:
      AEV(curU) := AEV_d
      AEA(curU) := AEA_d
     Rationale: this is what is needed for the TOPDOWN_4 phase on any call curU()


[4] TOPDOWN_4=="maybe_present" : For each Unit curU, computes
      ** AvlX: MaybePresent active storage
            Between each instruction, the (pointer) zones for which the
            diff pointer may be present and allocated and therefore should be
            deallocated (with if_present test) along with the source variable.
  Computation: (Forwards from entry point to final point of each analyzed unit curU)
  -- Initialization value written down at the initial point:
      A_u := {the A_u's accumulated in the analysis context on the entry of curU} U R_u
     Rationale: we assume everything required up
      will have been made available by the calling context.

  -- down through an active access (read or write) to *P :
      A_d := A_u       (i.e. unmodified!)
     Rationale: could be "A_u U {P}", but we believe that this analysis
      has already put P into A_u.

  -- down through a pointer assignment P=>P2 or P=>malloc or P=>&amp;T :
      A_d := A_u \ {P if P is total} U {P if P in R_d}
     Rationale: the up P is overwritten, so its "presence" status vanishes.
      Actually, even if we wanted to free P', this statement loses its address!
      Then if P is required down, P' will be made here => it is now present!
      Also for P2, we believe that this analysis has already placed the
      presence status in A_u if P2' is present.

  -- down through a free(P) or nullify(P) (special case of the above):
      A_d := A_u \ {P if P is total}
     Rationale: the up P is overwritten, so its "presence" status vanishes.

  -- down through a call e.g. call Foo(...,P,...)
      ==> Accumulate A_u into the TOPDOWN_4 analysis context on the entry of Foo
      A_d := A_u \ {(!AEV(Foo)) except for non-total actual-args} U AEA(Foo)
     Rationale: by definition of AEV(Foo) and of AEA(Foo).
      Non-total actual-args must retain their status from A_u.

  -- Final result found at the final point:
      A_d
     When curU is a top Unit, A_d(curU) can be used to tell the calling
     context of curU that it may free the available differentiated
     derivatives if they exist.

Ready for differentiation - [State 4]