Scalasca  (Scalasca 2.6, revision 748ac9e9)
Scalable Performance Analysis of Large-Scale Applications
Introduction

Supercomputing is a key technology of modern science and engineering, indispensable for solving critical problems of high complexity. However, as the number of cores on modern supercomputers increases from generation to generation, HPC applications are required to harness much higher degrees of parallelism to satisfy their growing demand for computing power. Therefore—as a prerequisite for the productive use of today's large-scale computing systems—the HPC community needs powerful and robust performance analysis tools that make the optimization of parallel applications both more effective and more efficient.

The Scalasca Trace Tools developed at the Jülich Supercomputing Centre are a collection of trace-based performance analysis tools that have been specifically designed for use on large-scale systems featuring hundreds of thousands of CPU cores, but are also well suited to smaller HPC platforms. A distinctive feature of the Scalasca Trace Tools is their scalable automatic trace-analysis component, which identifies wait states that occur, for example, as a result of unevenly distributed workloads [6]. Especially when trying to scale communication-intensive applications to large process counts, such wait states can present severe challenges to achieving good performance. Besides merely identifying wait states, the trace analyzer is also able to pinpoint their root causes (i.e., delays) [2] and to identify the activities on the critical path of the target application [3], highlighting those routines which determine the length of the program execution and therefore constitute the best candidates for optimization.

The Scalasca Trace Tools analyses currently focus on applications using MPI [11], OpenMP [15], POSIX threads [7], or hybrid MPI+OpenMP/Pthreads parallelization. Traces from applications using CUDA [13], OpenCL [8], or OpenACC [14] parallelization can be handled as long as they do not contain any device activities (i.e., only host-side events have been measured), but no specific support for these paradigms has been implemented yet. Thus, analysis results from such traces need to be interpreted with care, but may nevertheless provide useful insights. We intend to add better support for these accelerator programming models in the future.

Unlike previous versions of the Scalasca toolset—which used a custom measurement system and trace data format—the Scalasca Trace Tools 2.x release series is based on the community-driven instrumentation and measurement infrastructure Score-P [10]. The Score-P software is jointly developed by a consortium of partners from Germany and the US, and supports a number of complementary performance analysis tools through the use of the common data formats CUBE4 for profiles and the Open Trace Format 2 (OTF2) [5] for event trace data. This significantly improves interoperability between Scalasca and other performance analysis tool suites such as Vampir [9] and TAU [18]. Nevertheless, backward compatibility with Scalasca 1.x is maintained where possible. For example, the Scalasca trace analyzer is still able to process trace measurements generated by the measurement system of the Scalasca 1.x release series.

This user guide is intended to address the needs of users who are new to Scalasca as well as those already familiar with previous versions of the Scalasca toolset. Both user groups are recommended to work through Chapter Getting started to become familiar with the intended Scalasca analysis workflow in general, and to learn about the changes compared to the Scalasca 1.x release series, which are highlighted where appropriate. Later chapters then provide more in-depth reference information for the individual Scalasca commands and tools, and can be consulted when necessary.
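
To give a first impression of that workflow, the sketch below shows a typical command sequence for a simple MPI application. The application name myapp, the process count, the launcher command, and the resulting experiment directory name are purely illustrative and may differ on your system; Chapter Getting started describes each step in full.

    scorep mpicc -o myapp myapp.c                 # build with Score-P instrumentation
    scalasca -analyze -t mpirun -np 64 ./myapp    # run measurement with trace collection and analysis
    scalasca -examine scorep_myapp_64_trace       # interactively examine the generated analysis report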



