Source IR with data-flow infrastructure - **[State 2]** ================================================================================ Tapenade represent the memory that each variable can access by abstract memory *zones*. This memory layout is abstract in the sense that it need not match the one that a compiler may build. This abstract layout is only concerned with ensuring an easy detection of memory overlapping. This abstract layout defines so-called memory *zones*, each one with a unique *zone rank*. Different zones are guaranteed to not overlap in memory. On the other hand, zones are not required to be consecutive in the actual memory. Reciprocally, we can compute the set of zones possibly referenced by any reference expression. In cases of aliasing, two different variables can share some zones in their sets possibly referenced. If this expression, in the context defined by its position in the IR, may access to one zone, then this zone is guaranteed to be in the set. But, due to the uncertainties of static data-flow analysis, the set of zones for this expression may very well contain zones that are never accessed in actual execution. For efficency, the set of all zones must remain small. Therefore we never distinguish array elements. In other words, a simple array will have only one zone. On the other hand, we want to distinguish record fields, and therefore a variable of "record" type will have as many zones as record fields. To illustrate, consider variable `Z` declared in Fortran90 by ``` type point real, dimension(3) :: x real :: y end type point type(point), dimension(len), target :: Z ``` `Z` is an array of records. It only has two zones assigned, one (say 17) for all the `Z(1:len)%x(1:3)` and the other (say 18) for all the `Z(1:len)%y`. In passing, this also illustrates why a zone need not be consecutive in actual memory. To allow for easy access to the zones of any expression, the zones assigned to individual variables are arranged in the form of a tree. These trees are built using `TapList`'s, and the leaves are `TapIntList`'s rather than individual `int`'s to capture the cases where the *actual* zone cannot be determined statically. Taking again the example above, the tree of zones for `Z` is (as Tapenade would print it) `((<17>) (<18>))` or equivalently (with a graphical convention that we use a lot): .______.______() | | .__() .__() | | <17> <18> Convention is: dots "`.`\" stand for a `TapList` object, horizontal lines go from a `TapList` on the left to its `tail` on the right, vertical lines from a `TapList` on top to its `head` below. An empty list \"`()`\" stands for the Java `null`, and `TapIntList`'s are shown between angle brackets \"`< >`\". The tree of zones of a single simple variable or array has the shape: .__() | <9> Similarly, the tree of zones of a reference to, say, `Z(5)%y` is easily found using the zones tree of `Z` yielding: .__() | <18> This tree arrangement also captures the case of pointers, in the following way: The zones of a pointer variable are represented with a `TapList` whose head holds (like any simple variable) the zone of the pointer itself (therefore as a `TapIntList`), and whose tail (if given) holds the zone of the pointer destination or possible destinations. For example, suppose first that we have defined `Z2` as type(point), dimension(len2), target :: Z2 and Tapenade has assigned zones tree `((<22>) (<23>))` to `Z2`. Suppose furthermore that a pointer `p` declared as type(point), pointer :: p is made to point to `Z(3)` in one code branch and to `Z2(1)` in another branch. Then at some code location *after* the branches have merged, Tapenade will dynamically compute the following tree of zones for `p` as .______._______.______() | | | <40> .__() .__() | | <17 22> <18 23> or equivalently (as Tapenade would print it) `(<40> (<17 22>) (<18 23>))` Note that in C, arrays are in fact mostly pointers, so the same example gives birth to more pointers. Pointer `p` declared and defined as: typedef struct {float x[3]; float y;} point; point Z[10]; point Z2[20]; point *p = (someTest?&(Z[3]):&(Z2[1])) point **pp = (someTest?&Z:&Z2); will receive a slightly different zones tree: .______.__________________.__() | | | <39> .________.__() .__() | | | <17 22> <19 24> <18 23> that reflects the additional flexibility of this code (one can e.g. reassign `(*p)[5].x` itself). The analyses of Tapenade further down the line must/will handle this difference and treat these codes in a similar way. For further comparison, the tree of zones for `Z` itself is: .______.__________________.__() | | | <16> .________.__() .__() | | | <17> <19> <18> whereas the tree of zones for the new variable `pp` is: .______._______.__________________.__() | | | | <40> <16 21> .________.__() .__() | | | <17 22> <19 24> <18 23> Missing: details on ZoneInfo and PublicInfo Missing: description of Unit.externalShape, Unit.translator, Unit.transferMatrix, CallArrow.translator Missing: details on translation of Data-Flow info through calls