
Using the TAU Skel Plugin

Kevin Huck edited this page Aug 21, 2020 · 6 revisions



Overview

The TAU "Skel" plugin captures MPI collective operations and POSIX I/O operations in a trace. The resulting trace is used to generate a skeleton proxy application. That application replicates the communication and I/O patterns of an application that cannot otherwise be shared, for example because it lives in a private repository or is export controlled. This page explains how to configure TAU to use the Skel plugin and how to use the plugin to generate the skeleton data.

The trace data is written as a directory of JSON files that (currently) use the Python3 Plotly Gantt chart format. We are considering switching to the Google Trace Event format. Each function event (one trace record) contains data like the following:

[
{"timestamp": 1597984506044769, "duration": 10243, "step": 0, "type": "MPI", "function": "MPI_Comm_split", "comm_in": "0x651860", "color": 1, "key": 0, "comm_out": "0x0272a3a0"},
{"timestamp": 1597984506055084, "duration": 4028, "step": 0, "type": "POSIX", "function": "fopen64", "path": "simulation/settings-files.json", "mode": "r", "return": "0x2736780"},
{"timestamp": 1597984506059153, "duration": 45, "step": 0, "type": "POSIX", "function": "read", "fd": 30, "pathname": "simulation/settings-files.json", "return": 389},
{"timestamp": 1597984506059324, "duration": 10, "step": 0, "type": "POSIX", "function": "fclose", "fp": "0x2736780", "pathname": "simulation/settings-files.json"},
{"timestamp": 1597984506059587, "duration": 2107, "step": 0, "type": "MPI", "function": "MPI_Cart_create", "comm": "0x272a3a0", "ndims": 3, "dims": [4,3,3], "periods": [1,1,1], "reorder": 0, "comm_out": "0x2737d00"},
{"timestamp": 1597984506078406, "duration": 4663, "step": 0, "type": "MPI", "function": "MPI_Comm_dup", "comm_in": "0x272a3a0", "comm_out": "0x028292e0"}, 
{"timestamp": 1597984506084255, "duration": 1136, "step": 0, "type": "POSIX", "function": "fopen64", "path": "adios2.xml", "mode": "r", "return": "0x282a700"},
{"timestamp": 1597984506085429, "duration": 62, "step": 0, "type": "POSIX", "function": "read", "fd": 30, "pathname": "adios2.xml", "return": 5447}, 
{"timestamp": 1597984506085524, "duration": 5, "step": 0, "type": "POSIX", "function": "read", "fd": 30, "pathname": "adios2.xml", "return": 0},
{"timestamp": 1597984506085532, "duration": 101, "step": 0, "type": "POSIX", "function": "fclose", "fp": "0x282a700", "pathname": "adios2.xml"},
{"timestamp": 1597984506085713, "duration": 37, "step": 0, "type": "MPI", "function": "MPI_Bcast", "size": 8, "root": 0, "comm": "0x028292e0"},
{"timestamp": 1597984506085755, "duration": 32, "step": 0, "type": "MPI", "function": "MPI_Bcast", "size": 5447, "root": 0, "comm": "0x028292e0"},
{"timestamp": 1597984506086357, "duration": 404, "step": 0, "type": "MPI", "function": "MPI_Comm_dup", "comm_in": "0x28292e0", "comm_out": "0x0284c970"},
{"timestamp": 1597984506087581, "duration": 19, "step": 0, "type": "MPI", "function": "MPI_Barrier", "comm": "0x0284c970"},

{"timestamp": 1597984510084519, "duration": 1, "step": 0, "type": "none", "function": "program exit"}
]
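To get a quick overview of a trace, you can tally the records per function. The following Python sketch is not part of TAU, and `summarize` is a hypothetical helper. It assumes each trace file holds a single JSON array like the one above and that durations are in microseconds. It also skips the synthetic "program exit" record.

```python
import json
from collections import defaultdict


def summarize(records):
    """Return {(type, function): (call_count, total_duration)} for a list of
    Skel-style trace records, skipping records whose type is "none"."""
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        if rec.get("type") == "none":
            continue  # e.g. the final "program exit" marker
        key = (rec["type"], rec["function"])
        totals[key][0] += 1
        totals[key][1] += rec.get("duration", 0)  # assumed microseconds
    return {key: tuple(val) for key, val in totals.items()}


# A few records copied from the sample trace above.
sample = json.loads("""[
 {"timestamp": 1597984506085429, "duration": 62, "step": 0, "type": "POSIX", "function": "read", "fd": 30, "pathname": "adios2.xml", "return": 5447},
 {"timestamp": 1597984506085713, "duration": 37, "step": 0, "type": "MPI", "function": "MPI_Bcast", "size": 8, "root": 0, "comm": "0x028292e0"},
 {"timestamp": 1597984506085755, "duration": 32, "step": 0, "type": "MPI", "function": "MPI_Bcast", "size": 5447, "root": 0, "comm": "0x028292e0"},
 {"timestamp": 1597984510084519, "duration": 1, "step": 0, "type": "none", "function": "program exit"}
]""")

for (kind, func), (calls, total_us) in sorted(summarize(sample).items()):
    print(f"{kind:5} {func:12} calls={calls} total={total_us} us")
```

To summarize a real run, pass in the parsed contents of one of the per-rank files instead of the inline sample.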

The MPI functions are captured through the MPI profiling interface (PMPI), the interposition mechanism built into the MPI standard. The POSIX functions are captured by using dynamic library support (dl) to replace the real POSIX functions (e.g., open, close, read) with wrapped versions. As a result, no instrumentation or modification of the original application is required.
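The wrapping idea can be illustrated with a small Python analogy. This is only an analogy: TAU does the real work in C at the library level, not by patching Python functions. The wrapper records a timestamp and duration, forwards the call to the real function, and appends one record in roughly the Skel format. The names `interpose` and `TRACE` are hypothetical.

```python
import functools
import os
import time

TRACE = []  # hypothetical in-memory buffer of Skel-style records


def interpose(kind, real_func):
    """Return a wrapper that calls real_func and then appends one trace
    record (analogy for a preloaded wrapper library, not TAU's code)."""
    @functools.wraps(real_func)
    def wrapper(*args, **kwargs):
        start_ns = time.time_ns()
        result = real_func(*args, **kwargs)  # forward to the real implementation
        end_ns = time.time_ns()
        TRACE.append({
            "timestamp": start_ns // 1000,  # microseconds since the epoch
            "duration": (end_ns - start_ns) // 1000,
            "step": 0,
            "type": kind,
            "function": real_func.__name__,
        })
        return result
    return wrapper


# Shadow os.read/os.write with wrapped versions, then restore the originals.
real_read, real_write = os.read, os.write
os.read = interpose("POSIX", real_read)
os.write = interpose("POSIX", real_write)
try:
    r, w = os.pipe()
    os.write(w, b"hello")
    data = os.read(r, 5)
    os.close(r)
    os.close(w)
finally:
    os.read, os.write = real_read, real_write

print(data, [rec["function"] for rec in TRACE])
```

As with the real plugin, the code doing the reads and writes is unchanged; only the functions it resolves to are swapped.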

Configuring and building TAU from scratch

The Skel plugin does not add any dependencies to TAU, but TAU must be configured with MPI support and the I/O wrapper. To configure and build TAU from source, do the following (this assumes the GCC compilers, which are the default):

# Clone the public mirror of the TAU master branch
git clone https://github.com/UO-OACISS/tau2.git
# Enter the TAU directory
cd tau2
# Configure
./configure -mpi -iowrapper -pthread
# Make
make -j install
# Add the TAU utilities (e.g. tau_exec) to your path
export PATH="$PATH:$(pwd)/ibm64linux/bin"

The ibm64linux architecture directory is used on POWER9 systems such as the ORNL Summit machine. On other architectures (e.g., x86_64 or ARM64), replace ibm64linux with the directory name TAU uses for your platform.
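If you are unsure which architecture directory your build produced, you can search for it. This Python sketch assumes the layout `<tau_root>/<arch>/bin/tau_exec` that the commands above produce. The helpers `find_tau_bin_dirs` and `path_exports` are hypothetical, not TAU utilities. The demo builds a throwaway tree that mimics a POWER9 build.

```python
import tempfile
from pathlib import Path


def find_tau_bin_dirs(tau_root):
    """Return every <arch>/bin directory under tau_root that contains tau_exec
    (assumed layout: <tau_root>/<arch>/bin/tau_exec)."""
    return sorted(p.parent for p in Path(tau_root).glob("*/bin/tau_exec")
                  if p.is_file())


def path_exports(tau_root):
    """Format one shell 'export PATH=...' line per directory found."""
    return [f'export PATH="$PATH:{d.resolve()}"'
            for d in find_tau_bin_dirs(tau_root)]


# Demo on a throwaway directory tree that mimics an ibm64linux build.
with tempfile.TemporaryDirectory() as tmp:
    fake_bin = Path(tmp, "ibm64linux", "bin")
    fake_bin.mkdir(parents=True)
    (fake_bin / "tau_exec").touch()
    found = [d.parent.name for d in find_tau_bin_dirs(tmp)]

print(found)
```

On a real checkout, `print("\n".join(path_exports("tau2")))` would print the export line(s) to paste into your shell.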

Building TAU with Spack

If you use Spack, installing TAU with this support is even easier. The I/O wrapper and pthread support are enabled by default, but to be explicit, do the following:

spack install tau@develop +mpi +io +pthreads
spack load tau

To avoid building default TAU dependencies that the Skel support does not need, disable them explicitly:

spack install tau@develop +mpi +io +pthreads ~binutils ~otf2 ~pdt 
spack load tau

Using the plugin support

To extract an MPI collective and POSIX I/O trace from an application, insert the tau_exec wrapper script between the arguments of your launcher command (jsrun, mpirun, or srun) and the application executable. For example, with a vanilla Open MPI installation and the ADIOS2 Gray-Scott tutorial example application, you would run something like the following:

[khuck@delphi gray-scott]$ mpirun -np 36 \
tau_exec -T mpi,pthread \
-io -skel \
build-delphi/gray-scott simulation/settings-files.json

Simulation writes data using engine type:              BP4
========================================
grid:             256x256x256
steps:            10
plotgap:          1
F:                0.01
k:                0.048
dt:               2
Du:               0.2
Dv:               0.1
noise:            1e-07
output:           gs.bp
adios_config:     adios2.xml
process layout:   4x3x3
local grid size:  64x86x86
========================================
Simulation at step 1 writing output step     1
Simulation at step 2 writing output step     2
Simulation at step 3 writing output step     3
Simulation at step 4 writing output step     4
Simulation at step 5 writing output step     5
Simulation at step 6 writing output step     6
Simulation at step 7 writing output step     7
Simulation at step 8 writing output step     8
Simulation at step 9 writing output step     9
Simulation at step 10 writing output step     10
[khuck@delphi gray-scott]$

In this example, the tau_exec script uses LD_PRELOAD to load the necessary TAU libraries. The -T mpi,pthread option tells the script which TAU configuration to use. It is not required for Spack installations or when only one TAU configuration is available. The -io flag enables the POSIX I/O wrapper, and the -skel flag enables TAU's Skel plugin.
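If you launch many runs from a script, the same insertion can be done programmatically. This Python sketch uses a hypothetical helper, `skel_command`. It places tau_exec and the flags described above between the launcher arguments and the application. `-T` is optional, following the note above about Spack installations and single configurations.

```python
import shlex


def skel_command(launcher, app, tau_config="mpi,pthread", capture_io=True):
    """Build an argv list with tau_exec inserted between the launcher
    arguments and the application.

    launcher:   e.g. ["mpirun", "-np", "36"], or a jsrun/srun prefix
    app:        the executable followed by its own arguments
    tau_config: value for -T; pass None to omit -T entirely
    capture_io: add -io to enable the POSIX I/O wrapper
    """
    tau = ["tau_exec"]
    if tau_config:
        tau += ["-T", tau_config]
    if capture_io:
        tau.append("-io")
    tau.append("-skel")  # enable the Skel plugin
    return list(launcher) + tau + list(app)


cmd = skel_command(["mpirun", "-np", "36"],
                   ["build-delphi/gray-scott", "simulation/settings-files.json"])
print(shlex.join(cmd))
```

The printed line matches the mpirun example above. Passing `cmd` to `subprocess.run` would start the run.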

After execution, you should see a skel directory containing one trace file per MPI rank:

[khuck@delphi gray-scott]$ ls skel
rank00000.trace  rank00008.trace  rank00016.trace  rank00024.trace  rank00032.trace
rank00001.trace  rank00009.trace  rank00017.trace  rank00025.trace  rank00033.trace
rank00002.trace  rank00010.trace  rank00018.trace  rank00026.trace  rank00034.trace
rank00003.trace  rank00011.trace  rank00019.trace  rank00027.trace  rank00035.trace
rank00004.trace  rank00012.trace  rank00020.trace  rank00028.trace
rank00005.trace  rank00013.trace  rank00021.trace  rank00029.trace
rank00006.trace  rank00014.trace  rank00022.trace  rank00030.trace
rank00007.trace  rank00015.trace  rank00023.trace  rank00031.trace
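The per-rank files can be loaded together for inspection. The following Python sketch makes several assumptions. It assumes each file holds one JSON array like the sample in the Overview. It also assumes timestamps and durations are microseconds; read that way, the sample timestamps fall in August 2020. The sketch merges all ranks into a single file in the Google Trace Event format mentioned above, using "complete" events (ph "X") with one process ID per rank. `load_skel_dir` and `to_trace_events` are hypothetical helpers, not TAU tools. The demo writes two tiny rank files to a throwaway directory.

```python
import json
import re
import tempfile
from pathlib import Path

RANK_FILE = re.compile(r"rank(\d+)\.trace$")


def load_skel_dir(skel_dir):
    """Map rank -> list of records for every rankNNNNN.trace file,
    assuming each file holds a single JSON array."""
    traces = {}
    for path in Path(skel_dir).iterdir():
        m = RANK_FILE.match(path.name)
        if m:
            traces[int(m.group(1))] = json.loads(path.read_text())
    return dict(sorted(traces.items()))


def to_trace_events(traces):
    """Convert records to Trace Event complete events, one pid per rank.
    Trace Event ts/dur are microseconds, matching the assumed Skel units."""
    events = []
    for rank, records in traces.items():
        for rec in records:
            args = {k: v for k, v in rec.items()
                    if k not in ("timestamp", "duration", "type", "function")}
            events.append({"name": rec["function"], "cat": rec["type"],
                           "ph": "X", "ts": rec["timestamp"],
                           "dur": rec["duration"], "pid": rank, "tid": 0,
                           "args": args})
    return {"traceEvents": events, "displayTimeUnit": "ms"}


# Demo: two tiny rank files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for rank in (0, 1):
        Path(tmp, f"rank{rank:05d}.trace").write_text(json.dumps([
            {"timestamp": 1597984506087581 + rank, "duration": 19, "step": 0,
             "type": "MPI", "function": "MPI_Barrier", "comm": "0x0284c970"}]))
    traces = load_skel_dir(tmp)
    chrome = to_trace_events(traces)

print(sorted(traces), len(chrome["traceEvents"]))
```

For a real run, call `load_skel_dir("skel")` and write the result of `to_trace_events` with `json.dump`. A viewer that understands the Trace Event format can then display it. Comparing `sorted(traces)` with `range(len(traces))` is a cheap way to spot missing rank files.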

Installation and usage on Summit

TAU has been preconfigured and built on Summit for use with the PGI 19.9 and GCC 7.4 compilers. The builds are located in /gpfs/alpine/world-shared/gen010/khuck/tau2.installations/skel-pgi19.9 and /gpfs/alpine/world-shared/gen010/khuck/tau2.installations/skel-gcc7.4, respectively. Sample data from a very small, short XGC1 run that uses ADIOS2 to write periodic output is available in /gpfs/alpine/world-shared/gen010/khuck/tau2.installations/xgc-sample-data. Here is an example of how I used the PGI build. In my job script, I did:

...
source /gpfs/alpine/world-shared/gen010/khuck/tau2.installations/skel-pgi19.9/sourceme.sh

export TAU_EXEC="tau_exec -T pgi,mpi,pthread -skel"
export XGC_EXEC=/ccs/home/khuck/src/XGC-Devel/xgc-build/bin/xgc-es-cpp-gpu

jsrun --nrs 48 --tasks_per_rs 1 --cpu_per_rs 7 --gpu_per_rs 1 \
--rs_per_host 6 --latency_priority gpu-cpu --launch_distribution cyclic --bind packed:7 \
${TAU_EXEC} ${XGC_EXEC}
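As a sanity check on the jsrun line above, 48 resource sets at 6 per host need 8 Summit nodes. On each node, 6 resource sets of 7 cores and 1 GPU each use 42 cores and 6 GPUs. The Python sketch below does that arithmetic. The per-node limits (42 usable POWER9 cores and 6 GPUs) are an assumption based on Summit's published node configuration. `jsrun_layout` is a hypothetical helper, not a jsrun feature.

```python
import math

# Assumed Summit node resources available to jsrun:
# 42 usable POWER9 cores (2 of 44 reserved for the system) and 6 V100 GPUs.
CORES_PER_NODE = 42
GPUS_PER_NODE = 6


def jsrun_layout(nrs, tasks_per_rs, cpu_per_rs, gpu_per_rs, rs_per_host):
    """Return (nodes, mpi_ranks) after checking that one host's resource
    sets fit within a single node's cores and GPUs."""
    if rs_per_host * cpu_per_rs > CORES_PER_NODE:
        raise ValueError("too many cores requested per host")
    if rs_per_host * gpu_per_rs > GPUS_PER_NODE:
        raise ValueError("too many GPUs requested per host")
    return math.ceil(nrs / rs_per_host), nrs * tasks_per_rs


nodes, ranks = jsrun_layout(nrs=48, tasks_per_rs=1, cpu_per_rs=7,
                            gpu_per_rs=1, rs_per_host=6)
print(nodes, ranks)
```

With one task per resource set this run has 48 MPI ranks. Following the naming shown earlier, the skel directory should therefore hold rank00000.trace through rank00047.trace.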