Intel® Advisor Help

Dependencies Tool Limitations

The Dependencies tool examines your program only as it runs, which limits the problems it can find. Intel Advisor is designed for serial programs, and its tools assume that only a single thread executes each parallel site (see Using Partially Parallel Programs with Intel Advisor Tools).

The following sections explain some Dependencies tool limitations.

Case 1

A shared variable is declared within the same function where the parallel site begins. Consider the following code:

void foo(void){
    int global = 0;  // stack variable, but global relative to the parallel site
    ANNOTATE_SITE_BEGIN(a);
    for (int i=0; i<n; i++){
        ANNOTATE_ITERATION_TASK(task_a);
        int local = 0;
        global++;  // race condition not reported by the Dependencies tool
        local++;   // not a race condition: local to the task
        fum(&global, &local);
    }
    ANNOTATE_SITE_END();
}

The Dependencies tool should report the increment of the variable global as a data race, because global is declared outside the tasks that modify it and is therefore shared by all of them. However, the tool cannot distinguish global from local: both are in the same stack frame of the function foo(), and the annotations are just simple macros. If the tool reported the increment of global as a data race, it would also report the increment of local as a data race, which would be a false positive, so it reports neither.

With actual parallelism, however, the loop body is a separate scope, and global is in a different stack frame from local, so Intel® Inspector can report global as a race without falsely reporting local as one. Because the Dependencies tool has only a serial program to analyze, it tries to predict the tasks, but it cannot fully model the location of the stack variables that belong to each task's stack.
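To make the difference concrete, here is a minimal sketch of what a parallel counterpart of foo() could look like. It is an illustration only: the function name foo_parallel, the use of one std::thread per iteration, and the choice of std::atomic to remove the race are all assumptions, not the conversion that Intel Advisor produces or recommends. The point is that each task body runs on its own stack, so local is private to the task while global stays in the enclosing frame and is shared.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical parallel counterpart of foo(): one thread per loop iteration.
// Returns the final value of the shared counter.
int foo_parallel(int n) {
    assert(n >= 0);

    // Lives in foo_parallel()'s stack frame and is shared by every task.
    // std::atomic makes the concurrent increments well defined.
    std::atomic<int> global{0};

    std::vector<std::thread> tasks;
    tasks.reserve(static_cast<std::size_t>(n));

    for (int i = 0; i < n; i++) {
        tasks.emplace_back([&global] {
            int local = 0;  // lives on this thread's own stack: private to the task
            global++;       // shared: would be a data race with a plain int
            local++;        // private: no synchronization needed
            (void)local;
        });
    }

    // Wait for every task before reading the result.
    for (auto &t : tasks) {
        t.join();
    }
    return global.load();
}
```

Calling foo_parallel(8) returns 8. If global were a plain int, the increments from different threads would race, which is the kind of problem a thread checker working on the parallel program can see because global and local no longer share a stack frame.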

Case 2

The variable callers_global is declared in the stack frame of the caller function caller(). caller() calls foo(), and foo() contains a parallel site. This case is subtler: under x86 calling conventions, the arguments are pushed onto the stack when caller() calls foo(), so they physically reside in caller()'s stack frame even though they belong to foo()'s declaration scope. As in the first case, the Dependencies tool does not report potential data races for variables declared in foo()'s scope. Because items in caller()'s stack frame may be in foo()'s scope, the tool also does not report problems with items in caller()'s stack frame, to avoid generating false positives. The following example resembles the one above, except that the declaration of int global is removed from foo(), global becomes a reference parameter, and the definition of caller() is added at the end:

void foo(int &global){  // global now refers to a variable in the caller's stack frame
    // the rest of foo() is the same as in the previous example, minus the int global declaration
    ANNOTATE_SITE_BEGIN(a);
    for (int i=0; i<n; i++){
        ANNOTATE_ITERATION_TASK(task_a);
        int local = 0;
        global++;  // race condition not reported by the Dependencies tool
        local++;   // not a race condition: local to the task
        fum(&global, &local);
    }
    ANNOTATE_SITE_END();
}

void caller(void){ int callers_global = 0; foo(callers_global); }

One way to work around this limitation of the Dependencies analysis is to introduce additional stack frames between the declaration of global and its use inside the task. To do this, wrap the annotated site in two nested C++11 lambda expressions; the two nested calls create two stack frames (see the help topic Enabling C++11 Lambda Expression Support):

void foo(void){
    int global = 0;
    [&](){        // first extra stack frame
        [&](){    // second extra stack frame
            ANNOTATE_SITE_BEGIN(b);
            for (int i=0; i<20; i++){
                ANNOTATE_ITERATION_TASK(MyTask2);
                int local = 0;
                global++;
                local++;
            }
            ANNOTATE_SITE_END();
        }();
    }();
}
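The following sketch packages the same workaround so that it also compiles on a machine where the Intel Advisor annotation header is not installed. The name foo_with_frames, the parameter n, and the empty fallback macros are assumptions made for illustration; the fallbacks only keep the code buildable and perform no annotation. Whether the two lambda calls survive as separate stack frames depends on the build: an optimizing compiler may inline them, so this sketch assumes a non-optimized build for the Dependencies analysis.

```cpp
#include <cassert>

// Use the real annotation header when the compiler can find it.
#if defined(__has_include)
#  if __has_include(<advisor-annotate.h>)
#    include <advisor-annotate.h>
#  endif
#endif

// Assumption: empty stand-ins so the sketch builds without Intel Advisor.
#ifndef ANNOTATE_SITE_BEGIN
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_ITERATION_TASK(name)
#  define ANNOTATE_SITE_END()
#endif

// Hypothetical variant of foo() that returns the shared counter so the
// serial behavior can be checked.
int foo_with_frames(int n) {
    int global = 0;          // declared two call frames above the parallel site
    [&]() {                  // first extra stack frame
        [&]() {              // second extra stack frame
            ANNOTATE_SITE_BEGIN(b);
            for (int i = 0; i < n; i++) {
                ANNOTATE_ITERATION_TASK(MyTask2);
                int local = 0;   // private to the task
                global++;        // shared by all tasks
                local++;
            }
            ANNOTATE_SITE_END();
        }();
    }();
    assert(global == n || n < 0);  // the wrappers do not change the serial result
    return global;
}
```

Calling foo_with_frames(20) returns 20, the same value the unwrapped loop produces, so the wrappers change only where the site sits on the call stack and not what the program computes.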

These cases show why you need to run Intel® Inspector after you convert annotations to parallel code. Intel® Inspector can catch remaining data races that Intel Advisor cannot detect.
