7.0-rc1 (revision v7.0-rc1)
Application Sampling

This document describes how to use the sampling options within Score-P.

Introduction

Score-P supports sampling, which can be used concurrently with instrumentation to generate profiles and traces. In the following, we describe how sampling differs from instrumentation. Reading this will help you interpret the resulting performance data. If you are already familiar with how sampling works, you can skip this introduction.

In our context, sampling is a technique to capture the behavior and performance of programs. We interrupt the running program at a specified interval (the sampling period) and capture its current state (i.e., the current call stack) together with performance metrics (e.g., PAPI counters). The obtained data is then stored as a trace or a profile and can be used to analyze the behavior of the sampled program.

Before version 2.0 of Score-P, only instrumentation-based performance analysis was possible. Instrumentation relies on callbacks to the measurement environment at instrumentation points, e.g., function enter and exit. The resulting trace or profile shows the exact runtimes of the functions, augmented with performance data and communication information. However, instrumentation introduces a constant overhead at each instrumentation point. For small instrumented functions, this constant overhead can outweigh the time spent in the function itself.

Sampling avoids this overhead; moreover, the overhead that sampling itself introduces can be controlled by setting the sampling rate. However, the resulting performance data is "fuzzier": not every function call is captured, so the data should be analyzed carefully. Depending on the duration of a function call and the sampling period, the call may or may not be included in the gathered performance data. Statistically, however, the profile information is correct. In addition, the sampling rate lets you regulate the trade-off between overhead and accuracy, which is not possible with instrumentation.
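The statistical argument above can be made concrete under idealized assumptions: a fixed sampling period T, and sample points whose phase is independent of what the program is doing. These are illustrative assumptions, not a description of how Score-P schedules its samples.

```latex
% T: sampling period, d: duration of one call, D: total time spent in a function,
% n: number of samples that land in that function
P(\text{call of duration } d \text{ is sampled}) = \min\!\left(\frac{d}{T},\, 1\right),
\qquad
\mathbb{E}[n] = \frac{D}{T},
\qquad
\hat{D} = n\,T \quad\text{with}\quad \mathbb{E}[\hat{D}] = D
```

For example, with T = 1 ms a single 0.1 ms call is captured with probability 0.1, while a function that accumulates D = 50 ms over many calls receives about 50 samples, so n·T estimates its total time of about 50 ms. Shortening T increases the number of samples, and with it both accuracy and overhead.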

Score-P supports both instrumentation and sampling. This allows you, for example, to get a statistical overview of your program while also analyzing its communication behavior. If a sample hits a function that is known to the measurement environment via instrumentation (e.g., by OPARI2), the sample refers to the same function in both the trace and the profile.

Prerequisites

This version of Score-P provides support for sampling. To enable sampling, several prerequisites have to be met.

Configure Options

libunwind

If libunwind is not installed in a standard directory, you can provide the following flags in the configure step:

  --with-libunwind=(yes|no|<Path to libunwind installation>)
                          If you want to build scorep with libunwind but do
                          not have a libunwind in a standard location, you
                          need to explicitly specify the directory where it is
                          installed. On non-cross-compile systems we search
                          the system include and lib paths per default [yes];
                          on cross-compile systems, however, you have to
                          specify a path [no]. --with-libunwind is a shorthand
                          for --with-libunwind-include=<Path/include> and
                          --with-libunwind-lib=<Path/lib>. If these shorthand
                          assumptions are not correct, you can use the
                          explicit include and lib options directly.
  --with-libunwind-include=<Path to libunwind headers>
  --with-libunwind-lib=<Path to libunwind libraries>
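The shorthand assumes headers in <Path>/include and libraries in <Path>/lib. On systems that install libraries into lib64 instead, the explicit options are needed. The following sketch picks between the two forms for a given installation prefix; the function name libunwind_flags and the lib64 fallback are illustrative assumptions, not part of Score-P's build system.

```shell
# Hypothetical helper: print the libunwind configure flags for an install prefix.
# Assumption: headers are in <prefix>/include, the library in <prefix>/lib or <prefix>/lib64.
libunwind_flags() {
    prefix=$1
    if [ ! -f "$prefix/include/libunwind.h" ]; then
        echo "libunwind.h not found under $prefix/include" >&2
        return 1
    fi
    if ls "$prefix"/lib/libunwind.* >/dev/null 2>&1; then
        # Layout matches the shorthand assumption (<prefix>/include, <prefix>/lib).
        printf '%s\n' "--with-libunwind=$prefix"
    elif ls "$prefix"/lib64/libunwind.* >/dev/null 2>&1; then
        # Library is not in <prefix>/lib, so spell out both directories explicitly.
        printf '%s\n' "--with-libunwind-include=$prefix/include --with-libunwind-lib=$prefix/lib64"
    else
        echo "no libunwind library found in $prefix/lib or $prefix/lib64" >&2
        return 1
    fi
}
```

A possible use, with /opt/libunwind as an example path: ./configure $(libunwind_flags /opt/libunwind). The command substitution is deliberately unquoted so that the two explicit options become separate arguments; this assumes the prefix contains no spaces.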

Sampling Related Score-P Measurement Configuration Variables

The following lists the Score-P measurement configuration variables which are related to sampling. Please refer to the individual variables for a more detailed description.
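Before starting a measurement, it can help to check which sampling-related variables are actually set in your shell. A minimal sketch with a hypothetical function name; it only matches SCOREP_ENABLE_UNWINDING and names starting with SCOREP_SAMPLING_, the variables used in the examples of this document.

```shell
# Hypothetical helper: list the exported sampling-related Score-P variables,
# sorted by name. Other SCOREP_* variables (e.g. SCOREP_TOTAL_MEMORY) are ignored.
show_sampling_config() {
    env | grep -E '^SCOREP_(ENABLE_UNWINDING|SAMPLING_[A-Z_]*)=' | sort
}
```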

Use Cases

Enable unwinding in instrumented programs

In addition to the instrumented regions, you now also see from where an instrumented region was called. A pure MPI instrumentation, for example, does not tell you which functions issued the communication calls. With unwinding enabled, this information is revealed and stored in the trace or profile.

Instrument your program, e.g., with MPI instrumentation enabled.

scorep mpicc my_mpi_code.c -o my_mpi_application

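Set the following environment variables. The empty value of SCOREP_SAMPLING_EVENTS disables sampling, so call stacks are recorded only at the instrumentation points: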

export SCOREP_ENABLE_UNWINDING=true
export SCOREP_SAMPLING_EVENTS=

Run your program

mpirun -np 16 ./my_mpi_application
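Instead of exporting the variables into your whole shell session, you can scope them to a single launch. A minimal sketch with a hypothetical function name; the wrapped command must be an external program such as mpirun, because POSIX leaves the effect of such assignment prefixes on shell functions unspecified. Whether the launcher passes the variables on to all MPI ranks depends on your MPI implementation, just as with exported variables.

```shell
# Hypothetical wrapper: run one command with unwinding enabled and sampling
# disabled (empty SCOREP_SAMPLING_EVENTS). The assignments are placed only in
# the environment of the launched command; the calling shell is not changed.
run_with_unwinding_only() {
    SCOREP_ENABLE_UNWINDING=true SCOREP_SAMPLING_EVENTS= "$@"
}
```

Usage: run_with_unwinding_only mpirun -np 16 ./my_mpi_application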

Instrument a hybrid parallel program and enable sampling

In this example, you avoid the potentially enormous overhead of compiler instrumentation while still seeing statistical occurrences of small code regions. The NAS Parallel Benchmark BT-MZ, for example, calls small subroutines within OpenMP parallel regions, which increase the measurement overhead significantly when compiler instrumentation is enabled.

Instrument your program, e.g., with MPI and OpenMP instrumentation enabled and compiler instrumentation disabled:

scorep --nocompiler mpicc -fopenmp my_hybrid_code.c -o my_hybrid_application

Note: If you use the GNU compiler and shared Score-P libraries, you may get errors due to undefined references, depending on your GCC version. In this case, add --no-as-needed to your scorep command line (as a scorep option, i.e., before the compiler name). This flag adds a GNU ld linker flag that fixes undefined references when using shared Score-P libraries. The problem occurs on systems that use --as-needed as the linker default. It will be handled transparently in future releases of Score-P.

Set the following environment variables:

export SCOREP_ENABLE_UNWINDING=true

If you want to use a sampling event and period that differ from the default settings, additionally set:

export SCOREP_SAMPLING_EVENTS=PAPI_TOT_CYC@1000000

Run your program

mpirun -np 16 ./my_hybrid_application
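An entry of SCOREP_SAMPLING_EVENTS such as PAPI_TOT_CYC@1000000 has the form event@period. For a cycle counter, a sample is taken every period cycles, so the resulting sampling rate depends on the clock frequency. The following sketch, with a hypothetical function name, splits one entry and estimates samples per second for a thread running at a clock frequency you supply; the 2.5 GHz used below is an assumed value, and real clock rates vary with frequency scaling.

```shell
# Hypothetical helper: split one "event@period" entry and estimate samples per
# second, assuming the event counts cycles at clock_hz (supplied by the caller).
estimate_sampling_rate() {
    spec=$1
    clock_hz=$2
    case $spec in
        *@*) ;;
        *) echo "expected event@period, got '$spec'" >&2; return 1 ;;
    esac
    event=${spec%@*}    # text before the last '@'
    period=${spec#*@}   # text after the first '@'
    case $period in
        ''|*[!0-9]*) echo "period must be a positive integer: '$period'" >&2; return 1 ;;
    esac
    if [ "$period" -eq 0 ]; then
        echo "period must be greater than zero" >&2
        return 1
    fi
    printf '%s %s %s\n' "$event" "$period" "$((clock_hz / period))"
}
```

For example, estimate_sampling_rate PAPI_TOT_CYC@1000000 2500000000 prints "PAPI_TOT_CYC 1000000 2500", i.e., one sample every 0.4 ms at the assumed clock; the perf_cycles@2000000 setting used in the BT-MZ example below gives 1250 samples per second at the same clock.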

Test Environment

Example

Instrument NAS BT-MZ code

cd <NAS_BT_MZ_SRC_DIR>
vim config/make.def

Prepend the Score-P wrapper to your MPI Fortran compiler:

MPIF77 = scorep mpif77
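The make.def edit above can also be scripted instead of done interactively. A minimal sketch with a hypothetical function name; it assumes the MPIF77 line has the form MPIF77 = <compiler> and leaves the line unchanged if it already starts with scorep, so running it twice is harmless.

```shell
# Hypothetical helper: prepend "scorep" to the MPIF77 line of an NPB make.def.
add_scorep_wrapper() {
    makedef=$1
    tmp=$(mktemp) || return 1
    # Skip lines that already use scorep; otherwise insert "scorep " after "MPIF77 = ".
    sed '/^MPIF77[[:space:]]*=[[:space:]]*scorep[[:space:]]/!s/^\(MPIF77[[:space:]]*=[[:space:]]*\)/\1scorep /' \
        "$makedef" > "$tmp" && cat "$tmp" > "$makedef"   # cat keeps the file's permissions
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Usage, from <NAS_BT_MZ_SRC_DIR>: add_scorep_wrapper config/make.def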

Recompile the NAS BT-MZ code.

make clean
make bt-mz CLASS=C NPROCS=128

Run instrumented binary

cd bin
sbatch run.slurm

Batch script example:

#!/bin/bash
#SBATCH -J NAS_BT_C_128x2
#SBATCH --nodes=32
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=2
#SBATCH --time=00:30:00
export OMP_NUM_THREADS=2
export NPB_MZ_BLOAD=0
export SCOREP_ENABLE_TRACING=true
export SCOREP_ENABLE_PROFILING=false
export SCOREP_ENABLE_UNWINDING=true
export SCOREP_TOTAL_MEMORY=200M
export SCOREP_SAMPLING_EVENTS=perf_cycles@2000000
export SCOREP_EXPERIMENT_DIRECTORY='bt-mz_C.128x2_trace_unwinding'
srun ./bt-mz_C.128
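The job layout in this script yields 32 × 4 = 128 MPI ranks with 2 OpenMP threads each, which matches NPROCS=128 from the build step and the 128x2 in the job name. If you adapt the script, a quick check like the following can catch mismatches. The function is a hypothetical helper, and it assumes you want one CPU per OpenMP thread, as the script above does.

```shell
# Hypothetical sanity check: nodes x tasks-per-node must equal the process count
# the binary was built for, and cpus-per-task should equal OMP_NUM_THREADS.
check_bt_mz_layout() {
    nodes=$1 tasks_per_node=$2 cpus_per_task=$3 nprocs=$4 omp_threads=$5
    ranks=$((nodes * tasks_per_node))
    if [ "$ranks" -ne "$nprocs" ]; then
        echo "layout gives $ranks ranks, binary expects $nprocs" >&2
        return 1
    fi
    if [ "$cpus_per_task" -ne "$omp_threads" ]; then
        echo "cpus-per-task ($cpus_per_task) != OMP_NUM_THREADS ($omp_threads)" >&2
        return 1
    fi
    echo "$ranks ranks x $omp_threads threads = $((ranks * omp_threads)) cores"
}
```

For the script above, check_bt_mz_layout 32 4 2 128 2 prints "128 ranks x 2 threads = 256 cores".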