Update Documentation authored by Camille Coti
```
[...]
Built /home/users/coti/x86/tau2/x86_64/lib/Makefile.tau-starpu-pthread
***************** DONE ************************
```
TAU (executables and libraries) is installed in a directory named after the current machine's architecture, such as `x86_64` or `ibm64linux`.
A few notes:
- This enables support for pthreads. Some form of thread support is mandatory, since StarPU is multithreaded.
- Support for CUDA can be added with options such as `-cupti` and `-cuda`. I don't have first-hand experience with the OpenCL support, but I know it exists.
- The last configure command that was executed is kept in `.last_config`. All the configure commands that were executed previously are kept in `.all_config`.
I have put an example in `examples/starpu`. Compile it with `make`; this creates an executable `mult`, which you run with:
```
tau_exec -T starpu,serial ./mult
```
The `-T` option selects the TAU library you want to use. A library is generated for every configuration that was executed. Support for MPI is enabled by default, so if you are not using MPI, you need to pass `serial`. You can see which library `tau_exec` picks by adding `-v`:
```
$ STARPU_NCPU=8 tau_exec -T starpu,serial -v ./mult
Program to run : ./mult
Matching bindings:
shared-starpu-cupti-pthread shared-starpu-pthread
Using:
shared-starpu-pthread
[...]
```
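The invocation above can also be scripted. The following is a minimal Python sketch, not part of TAU: the helper names `tau_exec_command` and `run_profiled` are hypothetical, and it only relies on the `-T` and `-v` options and the `STARPU_NCPU` variable shown in this section.

```python
import os
import subprocess


def tau_exec_command(program, tags=("starpu", "serial"), verbose=False):
    """Build the tau_exec argument list (hypothetical helper, not a TAU API)."""
    cmd = ["tau_exec", "-T", ",".join(tags)]
    if verbose:
        cmd.append("-v")  # ask tau_exec to print the library it selects
    cmd.append(program)
    return cmd


def run_profiled(program, ncpu=None, **kwargs):
    """Run the program under tau_exec, optionally setting STARPU_NCPU."""
    env = dict(os.environ)
    if ncpu is not None:
        env["STARPU_NCPU"] = str(ncpu)
    # Assumes tau_exec is on the PATH; raises CalledProcessError on failure.
    return subprocess.run(tau_exec_command(program, **kwargs), env=env, check=True)
```

Calling `run_profiled("./mult", ncpu=8, verbose=True)` would correspond to the verbose command shown above, assuming `tau_exec` is on the `PATH`.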
At the end of the execution, profile files are generated. Their names have the form `profile.<node>.<context>.<thread>`: the node is the MPI rank (here always 0, since MPI is not used) and the last field is a "thread rank".
```
coti@voltar:~/x86/tau2/examples/starpu$ ls
Makefile mult mult.o profile.0.0.1 profile.0.0.11 profile.0.0.13 profile.0.0.15 profile.0.0.3 profile.0.0.5 profile.0.0.7 profile.0.0.9
README mult.c profile.0.0.0 profile.0.0.10 profile.0.0.12 profile.0.0.14 profile.0.0.2 profile.0.0.4 profile.0.0.6 profile.0.0.8
```
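To check which threads produced a profile, the file names can be decoded programmatically. This is a small Python sketch under the assumption that every profile file follows the `profile.<node>.<context>.<thread>` pattern visible in the listing; the function names are illustrative only.

```python
import re

# Assumed naming pattern: profile.<node>.<context>.<thread>
PROFILE_RE = re.compile(r"^profile\.(\d+)\.(\d+)\.(\d+)$")


def parse_profile_name(name):
    """Return (node, context, thread) for a profile file name, else None."""
    match = PROFILE_RE.match(name)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def threads_per_node(names):
    """Map each node (MPI rank) to the sorted thread ranks that have a profile."""
    result = {}
    for name in names:
        parsed = parse_profile_name(name)
        if parsed is None:
            continue  # skip non-profile files such as Makefile or mult.c
        node, _context, thread = parsed
        result.setdefault(node, []).append(thread)
    return {node: sorted(threads) for node, threads in result.items()}
```

Passing the result of `os.listdir(".")` from the example directory to `threads_per_node` gives one entry for node 0 with the thread ranks found there.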
You can see the text-based representation of the profile using `pprof`:
```
coti@voltar:~/x86/tau2/examples/starpu$ pprof
Reading Profile files in profile.*
NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time Exclusive Inclusive #Call #Subrs Inclusive Name
msec total msec usec/call
---------------------------------------------------------------------------------------
100.0 0.978 6,807 1 1 6807516 .TAU application
100.0 2 6,806 1 1 6806538 taupreload_main
100.0 6,797 6,804 2 24 3402198 StarPU init
0.1 6 6 11 0 583 pthread_join
0.0 0.652 0.652 12 0 54 pthread_create
[...]
FUNCTION SUMMARY (total):
---------------------------------------------------------------------------------------
%Time Exclusive Inclusive #Call #Subrs Inclusive Name
msec total msec usec/call
---------------------------------------------------------------------------------------
100.0 18,719 1:45.807 16 32 6612963 .TAU application
45.7 41,873 48,340 32 32 1510649 StarPU_transfer
25.0 26,408 26,408 4 0 6602033 [PTHREAD] addr=<0x7fca6c9e5970>
6.4 2 6,806 1 1 6806538 taupreload_main
6.4 6,797 6,804 2 24 3402198 StarPU init
6.1 6,467 6,467 16 0 404212 StarPU exec cpu_mult
3.9 0.538 4,156 8 8 519503 [PTHREAD] _starpu_cpu_worker
3.9 4,152 4,155 8 8 519436 StarPU driver
1.3 12 1,376 3 3 458888 [PTHREAD] _starpu_cuda_worker
1.3 1,366 1,366 11 3 124262 StarPU driver init
0.0 6 6 11 0 583 pthread_join
0.0 0.809 0.809 15 0 54 pthread_create
FUNCTION SUMMARY (mean):
---------------------------------------------------------------------------------------
%Time Exclusive Inclusive #Call #Subrs Inclusive Name
msec total msec usec/call
---------------------------------------------------------------------------------------
100.0 1,169 6,612 1 2 6612963 .TAU application
45.7 2,617 3,021 2 2 1510649 StarPU_transfer
25.0 1,650 1,650 0.25 0 6602033 [PTHREAD] addr=<0x7fca6c9e5970>
6.4 0.134 425 0.0625 0.0625 6806538 taupreload_main
6.4 424 425 0.125 1.5 3402198 StarPU init
6.1 404 404 1 0 404212 StarPU exec cpu_mult
3.9 0.0336 259 0.5 0.5 519503 [PTHREAD] _starpu_cpu_worker
3.9 259 259 0.5 0.5 519436 StarPU driver
1.3 0.789 86 0.1875 0.1875 458888 [PTHREAD] _starpu_cuda_worker
1.3 85 85 0.6875 0.1875 124262 StarPU driver init
0.0 0.401 0.401 0.6875 0 583 pthread_join
0.0 0.0506 0.0506 0.9375 0 54 pthread_create
```
A few notes:
- Entries starting with `[PTHREAD]` are provided by the pthread support.
- Entries starting with `StarPU` are provided by the StarPU support.
- When a kernel is executed, its execution time is reported in an entry such as `StarPU exec cpu_mult`, where `cpu_mult` is the name of the kernel.
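If you want to pull the kernel entries out of the `pprof` text output, for example to compare runs, something like the following Python sketch could work. It assumes the seven-column layout shown above (the name is everything after the sixth column) and only reads the call count and the per-call time; the function names are hypothetical.

```python
def parse_pprof_row(line):
    """Parse one data row of a pprof table; return None for headers and rules."""
    fields = line.split(None, 6)  # six numeric columns, then the entry name
    if len(fields) < 7:
        return None
    try:
        float(fields[0])  # %Time; header rows fail here
        calls = float(fields[3].replace(",", ""))
        usec_per_call = float(fields[5].replace(",", ""))
    except ValueError:
        return None
    return {"name": fields[6], "calls": calls, "usec_per_call": usec_per_call}


def kernel_rows(pprof_text, section="FUNCTION SUMMARY (mean):"):
    """Return {kernel name: row} for the 'StarPU exec' entries of one section."""
    prefix = "StarPU exec "
    rows = {}
    in_section = False
    for line in pprof_text.splitlines():
        stripped = line.strip()
        # Per-thread tables start with NODE, summaries with FUNCTION SUMMARY.
        if stripped.startswith(("NODE ", "FUNCTION SUMMARY")):
            in_section = stripped == section
            continue
        if not in_section:
            continue
        row = parse_pprof_row(stripped)
        if row is not None and row["name"].startswith(prefix):
            kernel = row["name"][len(prefix):].split()[0]
            rows[kernel] = row
    return rows
```

The text to parse can be obtained with, for instance, `subprocess.run(["pprof"], capture_output=True, text=True).stdout` in the directory that holds the profile files.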
More information on the entries is available using the `-a` option:
```
coti@voltar:~/x86/tau2/examples/starpu$ pprof -a
Reading Profile files in profile.*
[...]
FUNCTION SUMMARY (mean):
---------------------------------------------------------------------------------------
%Time Exclusive Inclusive #Call #Subrs Inclusive Name
msec total msec usec/call
---------------------------------------------------------------------------------------
100.0 1,169 6,612 1 2 6612963 .TAU application
45.7 2,617 3,021 2 2 1510649 StarPU_transfer [{ memnode 0 }]
25.0 1,650 1,650 0.25 0 6602033 [PTHREAD] addr=<0x7fca6c9e5970> [{(unknown)} {0, 0}]
6.4 0.134 425 0.0625 0.0625 6806538 taupreload_main
6.4 424 425 0.125 1.5 3402198 StarPU init
6.1 404 404 1 0 404212 StarPU exec cpu_mult [{CPU:0} function 0x40118d { /home/users/coti/x86/tau2/examples/starpu/mult.c:88 }]
3.9 0.0336 259 0.5 0.5 519503 [PTHREAD] _starpu_cpu_worker [{/home/users/coti/x86/starpu/src/drivers/cpu/driver_cpu.c} {692, 0}]
1.3 0.789 86 0.1875 0.1875 458888 [PTHREAD] _starpu_cuda_worker [{/home/users/coti/x86/starpu/src/drivers/cuda/driver_cuda.c} {2230, 0}]
0.5 32 32 0.0625 0.0625 519587 StarPU driver [{CPU:0}]
0.5 32 32 0.0625 0.0625 519541 StarPU driver [{CPU:1}]
```
For instance, the `StarPU exec cpu_mult` entry now shows the file and line number where `cpu_mult` is defined (`mult.c:88`).
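The source location can be extracted from such an entry with a regular expression. The sketch below is in Python and assumes the `{ /absolute/path:line }` form seen in the `StarPU exec` entry above; `source_location` is an illustrative name, not a TAU function.

```python
import re

# Matches "{ /some/path/file.c:88 }"; requires an absolute path so that
# fields such as "{CPU:0}" are not mistaken for a source location.
SOURCE_RE = re.compile(r"\{\s*(/[^\s:{}]+):(\d+)\s*\}")


def source_location(entry_name):
    """Return (file, line) from a `pprof -a` entry name, or None if absent."""
    match = SOURCE_RE.search(entry_name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

Entries that carry no `file:line` pair in that form, such as `StarPU driver [{CPU:0}]`, yield `None`.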