9.0-rc2 (revision ee0aaf9c9)
This document describes how to use the sampling options within Score-P.
Score-P supports sampling, which can be used concurrently with instrumentation to generate profiles and traces. In the following, we describe how sampling differs from instrumentation. Reading this text will help you interpret the resulting performance data. However, if you already know how sampling works, you can skip the preface.
In our context, we understand sampling as a technique to capture the behavior and performance of programs. We interrupt the running program at a specified interval (the sampling period) and capture the current state of the program (i.e., the current stack) and performance metrics (e.g., PAPI). The obtained data is then stored as a trace or a profile and can be used to analyze the behavior of the sampled program.
Before version 2.0 of Score-P, only instrumentation-based performance analysis was possible. Such instrumentation relies on callbacks to the measurement environment (instrumentation points), e.g., a function enter or exit. The resulting trace or profile presents the exact runtimes of the functions, augmented with performance data and communication information. However, instrumentation introduces a constant overhead for each instrumentation point. For small instrumented functions, this constant overhead can dominate the function's own runtime.
Sampling avoids this overwhelming overhead. Moreover, the overhead introduced by sampling can be controlled by setting the sampling rate. However, the resulting performance data is more "fuzzy": not every function call is captured, so the resulting data should be analyzed carefully. Depending on the duration of a function and the sampling period, a function call might or might not be included in the gathered performance data. Statistically, however, the profile information is correct. Additionally, the sampling rate lets you regulate the trade-off between overhead and correctness, which is not possible with instrumentation.
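To make the trade-off concrete, the following sketch estimates how many samples per second a cycle-based sampling period produces. The clock frequency and the period are assumed values chosen for illustration; with DVFS or idle states the real rate will deviate from this estimate.

```shell
# Assumed nominal core frequency in Hz (2.5 GHz); adjust to your CPU.
CPU_HZ=2500000000
# Assumed sampling period: one sample every 10,000,000 cycles.
PERIOD_CYCLES=10000000

# Samples per second and per thread if the core runs at CPU_HZ without idling.
SAMPLES_PER_SEC=$(( CPU_HZ / PERIOD_CYCLES ))
echo "approx. ${SAMPLES_PER_SEC} samples per second"
```

A shorter period yields more samples and therefore finer statistics, at the cost of more interrupts and higher overhead.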
In Score-P, we support both instrumentation and sampling. This allows you, for example, to get a statistical overview of your program while also analyzing its communication behavior. If a sample hits a function that is known to the measurement environment via instrumentation (e.g., by OPARI2), the sample will show the same function in the trace and the profile.
This version of Score-P provides support for sampling. To enable sampling, several prerequisites have to be met.
libunwind:
In addition to the usual configuration process of Score-P, libunwind is needed. libunwind can be installed using a standard package manager or by downloading the latest version from http://download.savannah.gnu.org/releases/libunwind/
This library must be available on your system to enable sampling. In our tests, we used the most current stable version (1.1), as previous versions might result in segmentation faults.
Sampling Sources:
Sampling sources generate interrupts that trigger a sample. We interface with three different interrupt generators, which can be chosen at runtime.
Interval timer:
Interval timers are POSIX compliant but have a major drawback: they cannot be used for multi-threaded programs, only for single-threaded ones. We check for setitimer, which is provided by sys/time.h.
PAPI:
We interface with the PAPI library if it is found in the configure phase. The PAPI interrupt source uses overflowing performance counters to interrupt the program. This source can be used in multi-threaded programs. Due to limitations of the PAPI library, PAPI counters will not be available if PAPI sampling is enabled. However, you can use perf metrics instead.
perf:
perf is comparable to PAPI but much more low-level. We use the system call directly. This source can be used in multi-threaded programs. PAPI counters are available if perf is used as an interrupt source. Currently, we only provide a cycle-based overflow counter via perf.
We recommend using PAPI or perf as interrupt sources. However, both have a specific disadvantage when power saving techniques such as DVFS or idle states are active on a system: in this case, a constant sampling interval cannot be guaranteed. If, for example, an application calls a sleep routine, the cycle counter might not increase because the CPU might switch to an idle state. This can also influence the resulting data. Such idle times can also be introduced by OpenMP runtimes. They can be avoided by setting the block times accordingly or by setting the environment variable OMP_WAIT_POLICY to ACTIVE.
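As a minimal sketch of the OpenMP workaround described above, the wait policy can be set in the shell before the application is launched. OMP_WAIT_POLICY is a standard OpenMP environment variable; whether waiting threads actually spin depends on the OpenMP runtime in use.

```shell
# Ask the OpenMP runtime to keep waiting threads active (spinning)
# instead of letting them sleep, so that the cycle counter keeps advancing.
export OMP_WAIT_POLICY=ACTIVE
```

Note that actively waiting threads consume CPU time, so this setting can change the power and runtime behavior of the application.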
If libunwind is not installed in a standard directory, you can provide the following flags in the configure step:
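The flag listing is missing from this text. The sketch below assumes that the Score-P configure script accepts a --with-libunwind option that takes the installation prefix; check ./configure --help of your Score-P version for the exact option names. The path /opt/libunwind is a hypothetical install location.

```shell
# Hypothetical libunwind installation prefix; replace with your own path.
LIBUNWIND_ROOT=/opt/libunwind

# Assumed option name; verify with ./configure --help.
CONFIGURE_ARGS="--with-libunwind=${LIBUNWIND_ROOT}"

# Print the command line instead of running it, so it can be reviewed first.
echo "./configure ${CONFIGURE_ARGS}"
```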
The following lists the Score-P measurement configuration variables which are related to sampling. Please refer to the individual variables for a more detailed description.
SCOREP_ENABLE_UNWINDING
SCOREP_SAMPLING_EVENTS
SCOREP_SAMPLING_SEP
SCOREP_TRACING_CONVERT_CALLING_CONTEXT_EVENTS
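A minimal sketch of how these variables might be set together in a shell is given here. The values are assumptions for illustration only (the event name perf_cycles and the period in particular); consult the description of each variable for the values supported by your installation.

```shell
# Turn on stack unwinding, which sampling relies on.
export SCOREP_ENABLE_UNWINDING=true

# Sampling event and period in the form <event>@<period>.
# Assumed here: the perf cycle counter, one sample every 10,000,000 cycles.
export SCOREP_SAMPLING_EVENTS=perf_cycles@10000000

# Separator used when more than one sampling event is listed (assumed value).
export SCOREP_SAMPLING_SEP=,

# Keep calling-context events in the trace as they are (assumed value).
export SCOREP_TRACING_CONVERT_CALLING_CONTEXT_EVENTS=false
```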
In addition to the instrumentation data, you now see from where each instrumented region has been called. A pure MPI instrumentation, for example, does not tell you which functions issued the communication calls. With unwinding enabled, this information is revealed and stored in the trace or profile.
Instrument your program, e.g., with MPI instrumentation enabled.
Set the following environment variables:
Run your program
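The environment settings for the steps above are missing from this text. One possible sketch, assuming that an empty SCOREP_SAMPLING_EVENTS disables the interrupt sources so that only the instrumented events are unwound:

```shell
# Enable unwinding so that calling contexts are recorded for instrumented events.
export SCOREP_ENABLE_UNWINDING=true

# Assumption: an empty event list disables sampling interrupts,
# leaving only unwinding for the instrumented regions.
export SCOREP_SAMPLING_EVENTS=
```

After setting these variables, start the instrumented program as usual.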
In this example, you get rid of a possibly enormous compiler instrumentation overhead, but you are still able to see statistical occurrences of small code regions. The NAS Parallel Benchmark BT-MZ, for example, uses small subfunctions within OpenMP parallel functions that increase the measurement overhead significantly when compiler instrumentation is enabled.
Instrument your program, e.g., with MPI and OpenMP instrumentation enabled.
Note: If you use the GNU compiler and shared libraries of Score-P, you might get errors due to undefined references, depending on your gcc version. In that case, add --no-as-needed to your scorep command line. This adds a GNU ld linker flag that fixes undefined references when using shared Score-P libraries. The problem occurs on systems that use --as-needed as the linker default. It will be handled transparently in future releases of Score-P.
Set the following environment variables:
If you want to use a sampling event and period that differ from the default settings, additionally set:
Run your program
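The environment settings for the steps above are missing from this text. A minimal sketch could look as follows; the event name PAPI_TOT_CYC and the period are assumed example values, and the second variable is only needed when you do not want the default event and period.

```shell
# Enable unwinding; with the default sampling event this also enables sampling.
export SCOREP_ENABLE_UNWINDING=true

# Optional: choose a different event and period, <event>@<period>.
# Assumed example: PAPI total-cycles counter, one sample every 1,000,000 cycles.
export SCOREP_SAMPLING_EVENTS=PAPI_TOT_CYC@1000000
```

Keep in mind that PAPI counters are not available as metrics while PAPI is used as the interrupt source.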
Example
Add the Score-P wrapper to your MPI Fortran compiler.
Recompile the NAS BT-MZ code.
Batch script example:
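The original batch script is missing from this text. The following is only a sketch of what such a script might contain: the scheduler directives, the process and thread counts, the sampling period, and the binary path ./bin/bt-mz.C.x are all hypothetical and must be adapted to your system and build.

```shell
#!/bin/bash
# Hypothetical resource requests for a SLURM-like scheduler; adapt or remove.
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4

# OpenMP threads per MPI rank (assumed value).
export OMP_NUM_THREADS=4
# Keep waiting OpenMP threads active so the cycle counter keeps advancing.
export OMP_WAIT_POLICY=ACTIVE

# Enable unwinding and sampling in Score-P.
export SCOREP_ENABLE_UNWINDING=true
# Assumed event and period: perf cycle counter, every 2,000,000 cycles.
export SCOREP_SAMPLING_EVENTS=perf_cycles@2000000

# Hypothetical path to the recompiled BT-MZ binary.
BT_MZ=./bin/bt-mz.C.x
if [ -x "$BT_MZ" ]; then
    mpirun -np 4 "$BT_MZ"
else
    echo "benchmark binary not found: $BT_MZ" >&2
fi
```

The guard around mpirun only keeps the sketch from failing when the binary is absent; a production script would normally launch the benchmark unconditionally.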