Source IR with data-flow infrastructure - [State 2]

Tapenade represent the memory that each variable can access by abstract memory zones. This memory layout is abstract in the sense that it need not match the one that a compiler may build. This abstract layout is only concerned with ensuring an easy detection of memory overlapping. This abstract layout defines so-called memory zones, each one with a unique zone rank. Different zones are guaranteed to not overlap in memory. On the other hand, zones are not required to be consecutive in the actual memory. Reciprocally, we can compute the set of zones possibly referenced by any reference expression. In cases of aliasing, two different variables can share some zones in their sets possibly referenced. If this expression, in the context defined by its position in the IR, may access to one zone, then this zone is guaranteed to be in the set. But, due to the uncertainties of static data-flow analysis, the set of zones for this expression may very well contain zones that are never accessed in actual execution.

For efficency, the set of all zones must remain small. Therefore we never distinguish array elements. In other words, a simple array will have only one zone. On the other hand, we want to distinguish record fields, and therefore a variable of “record” type will have as many zones as record fields.

To illustrate, consider variable Z declared in Fortran90 by

     type point
       real, dimension(3) :: x
       real :: y
     end type point
     type(point), dimension(len), target :: Z

Z is an array of records. It only has two zones assigned, one (say 17) for all the Z(1:len)%x(1:3) and the other (say 18) for all the Z(1:len)%y. In passing, this also illustrates why a zone need not be consecutive in actual memory.

To allow for easy access to the zones of any expression, the zones assigned to individual variables are arranged in the form of a tree. These trees are built using TapList’s, and the leaves are TapIntList’s rather than individual int’s to capture the cases where the actual zone cannot be determined statically. Taking again the example above, the tree of zones for Z is (as Tapenade would print it) ((<17>) (<18>)) or equivalently (with a graphical convention that we use a lot):

  .______.______()
  |      |
  .__()  .__()
  |      |
<17>   <18>

Convention is: dots “.” stand for a TapList object, horizontal lines go from a TapList on the left to its tail on the right, vertical lines from a TapList on top to its head below. An empty list “()” stands for the Java null, and TapIntList’s are shown between angle brackets “< >”.

The tree of zones of a single simple variable or array has the shape:

  .__()
  |
 <9>

Similarly, the tree of zones of a reference to, say, Z(5)%y is easily found using the zones tree of Z yielding:

  .__()
  |
<18>

This tree arrangement also captures the case of pointers, in the following way: The zones of a pointer variable are represented with a TapList whose head holds (like any simple variable) the zone of the pointer itself (therefore as a TapIntList), and whose tail (if given) holds the zone of the pointer destination or possible destinations. For example, suppose first that we have defined Z2 as

 type(point), dimension(len2), target :: Z2

and Tapenade has assigned zones tree ((<22>) (<23>)) to Z2. Suppose furthermore that a pointer p declared as

 type(point), pointer :: p

is made to point to Z(3) in one code branch and to Z2(1) in another branch. Then at some code location after the branches have merged, Tapenade will dynamically compute the following tree of zones for p as

  .______._______.______()
  |      |       |
<40>     .__()   .__()
         |       |
      <17 22> <18 23>

or equivalently (as Tapenade would print it) (<40> (<17 22>) (<18 23>))

Note that in C, arrays are in fact mostly pointers, so the same example gives birth to more pointers. Pointer p declared and defined as:

 typedef struct {float x[3]; float y;} point;
 point Z[10];
 point Z2[20];
 point *p = (someTest?&(Z[3]):&(Z2[1]))
 point **pp = (someTest?&Z:&Z2);

will receive a slightly different zones tree:

  .______.__________________.__()
  |      |                  |
<39>     .________.__()     .__()
         |        |         |
      <17 22>  <19 24>   <18 23>

that reflects the additional flexibility of this code (one can e.g. reassign (*p)[5].x itself). The analyses of Tapenade further down the line must/will handle this difference and treat these codes in a similar way. For further comparison, the tree of zones for Z itself is:

  .______.__________________.__()
  |      |                  |
<16>     .________.__()     .__()
         |        |         |
       <17>     <19>      <18>

whereas the tree of zones for the new variable pp is:

  .______._______.__________________.__()
  |      |       |                  |
<40>  <16 21>    .________.__()     .__()
                 |        |         |
              <17 22>  <19 24>   <18 23>

Missing: details on ZoneInfo and PublicInfo

Missing: description of Unit.externalShape, Unit.translator, Unit.transferMatrix, CallArrow.translator

Missing: details on translation of Data-Flow info through calls