9.0-rc2 (revision ee0aaf9c9)
The general work-flow for performance analysis with Score-P is:
To invoke scorep-score, you must provide the filename of a CUBE4 profile as an argument. The basic command looks like this:
scorep-score profile.cubex
The output of the command may look like this (taking an MPI/OpenMP hybrid application as an example):
Estimated aggregate size of event trace:                   20MB
Estimated requirements for largest trace buffer (max_buf): 20MB
Estimated memory requirements (SCOREP_TOTAL_MEMORY):       24MB
(hint: When tracing set SCOREP_TOTAL_MEMORY=24MB to avoid intermediate flushes
 or reduce requirements using USR regions filters.)

flt type   max_buf[B]  visits time[s] time[%] time/visit[us] region
    ALL    19,377,048 786,577   27.48   100.0          34.93 ALL
    USR    16,039,680 668,320    0.36     1.3           0.53 USR
    OMP     3,328,344 117,881   26.92    98.0         228.37 OMP
    COM         9,024     376    0.20     0.7         532.17 COM
    SCOREP         41       1    0.00     0.0          13.82 SCOREP
The first line of the output gives an estimation of the total size of the trace, aggregated over all processes. This information is useful for estimating the space required on disk. In the given example, the estimated total size of the event trace is 20MB.
The second line prints an estimate of the memory required by a single process for the trace. The memory that Score-P reserves on each process at application start must be large enough to hold that process's trace in order to avoid flushes during runtime, because flushes heavily disturb measurements. In addition to the trace, Score-P needs some memory to maintain internal data structures. Thus, the third line provides an estimate of the total amount of memory required on each process. The per-process memory that Score-P reserves is set via the environment variable SCOREP_TOTAL_MEMORY. In the given example, the per-process memory should be at least 24MB.
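As a hedged sketch, the estimate can be carried into the tracing run by exporting the variable before launching the instrumented application. SCOREP_ENABLE_TRACING and SCOREP_ENABLE_PROFILING are Score-P measurement variables that are not discussed in this section; the launcher, process count, and binary name in the commented line are placeholders.

```shell
# Reserve the per-process memory suggested by scorep-score (24MB in the example).
export SCOREP_TOTAL_MEMORY=24MB

# Enable tracing; profiling is switched off here only to keep the run
# focused on the trace (this choice is an assumption, not a requirement).
export SCOREP_ENABLE_PROFILING=false
export SCOREP_ENABLE_TRACING=true

# Hypothetical launch of an instrumented binary; adapt launcher and name.
# mpirun -n 4 ./my_instrumented_app
```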
Below these estimates, scorep-score prints a table that shows how the trace memory requirements and the runtime are distributed among certain function groups. The column max_buf[B] shows how much trace buffer is needed on a single process, in bytes. The column time[s] shows how much execution time was spent in regions of that group in seconds, the column time[%] shows the fraction of the overall runtime that was used by this group, and the column time/visit[us] shows the average time per visit in microseconds.
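To see which group dominates the trace buffer, the table can be post-processed with standard tools. The following is a minimal sketch, not part of Score-P: the helper name buffer_share and the file name score.txt are hypothetical, and the script assumes the column layout shown in the example output.

```shell
# Print each row's share of the ALL trace buffer, reading scorep-score
# output on stdin. Fields are counted from the right, so an empty or a
# populated flt column makes no difference.
buffer_share() {
  awk '
    NF >= 7 && $(NF-5) ~ /^[0-9,]+$/ {
      bytes = $(NF-5)
      gsub(",", "", bytes)          # drop thousands separators
      bytes = bytes + 0             # force a numeric value
      name[++n] = $NF
      buf[n] = bytes
      if ($NF == "ALL") total = bytes
    }
    END {
      if (total > 0)
        for (i = 1; i <= n; i++)
          printf "%s %.1f\n", name[i], 100 * buf[i] / total
    }'
}

# Usage with a saved report (file name is a placeholder):
#   scorep-score profile.cubex > score.txt
#   buffer_share < score.txt
```

For the example table, this reports that the USR group accounts for about 82.8% of the per-process buffer, which makes USR regions the natural candidates for filtering.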
The following groups exist:
SCOREP: This group aggregates activities within the measurement system.
For a more detailed output, which shows the data for every region, you can use the -r option. The command could look like this:
scorep-score profile.cubex -r
This command adds information about the used buffer sizes and execution time of every region to the table. The additional lines of the output may look like this:
flt type   max_buf[B]  visits time[s] time[%] time/visit[us] region
    COM            24       4    0.00     0.0          67.78 Init
    COM            24       4    0.00     0.0          81.20 main
    USR            24       4    0.12     2.0       30931.14 InitializeMatrix
    COM            24       4    0.05     0.8       12604.78 CheckError
    USR            24       4    0.00     0.0          23.76 PrintResults
    COM            24       4    0.01     0.2        3441.83 Finish
    COM            24       4    0.48     7.7      120338.17 Jacobi
The region name is displayed in the column named region. The column type shows to which group the region belongs. In the example above, the function main belongs to group COM, requires 24 bytes per process, and used 0.00 s of execution time. By default, the regions are sorted by their buffer requirements. With the option -s
By default scorep-score uses demangled function names. However, if you want to map data to tools which use mangled names you might want to display mangled names. Furthermore, if you have trouble with function signatures that contain characters that also have a wildcard meaning, defining filters on mangled names might be easier. To display mangled names instead of demangled names, you can use the -m flag, e.g.,
scorep-score profile.cubex -r -m
If you have a filter file, you can test the effect of your filter on the estimated trace. To do so, pass the -f option followed by the file name of your filter. For example, if your filter file is named myfilter, the command looks like this:
scorep-score profile.cubex -f myfilter
An example output is:
Estimated aggregate size of event trace:                   7kB
Estimated requirements for largest trace buffer (max_buf): 1806 bytes
Estimated memory requirements (SCOREP_TOTAL_MEMORY):       5MB
(hint: When tracing set SCOREP_TOTAL_MEMORY=5MB to avoid intermediate flushes
 or reduce requirements using USR regions filters.)

flt type   max_buf[B]  visits time[s] time[%] time/visit[us] region
 -  ALL         2,093     172    5.17   100.0       30066.64 ALL
 -  MPI         1,805     124    4.20    81.3       33910.31 MPI
 -  COM           240      40    0.84    16.3       21092.44 COM
 -  USR            48       8    0.12     2.4       15360.71 USR
 -  SCOREP         41       4    0.00     0.0          13.82 SCOREP
 *  ALL         1,805     124    4.20    81.3       33910.31 ALL-FLT
 -  MPI         1,805     124    4.20    81.3       33910.31 MPI-FLT
 -  SCOREP         41       4    0.00     0.0          13.82 SCOREP-FLT
 +  FLT           288      48    0.97    18.7       20137.15 FLT
Now the output estimates the total trace size and the required memory per process that would result if you applied the provided filter to the measurement run that records the trace. A new group FLT appears, which contains all regions that are filtered. Under max_buf[B], the group FLT displays by how much the memory requirements per process are reduced. Furthermore, the groups whose names end in -FLT, like ALL-FLT, contain only the unfiltered regions of the original group. For example, USR-FLT contains all regions of group USR that are not filtered.
In addition, the column flt is no longer empty but contains a symbol that indicates how the group is affected by the filter. A '-' means 'not filtered', a '+' means 'filtered', and a '*' appears in front of groups that can potentially be affected by the filter.
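The relation between the ALL, FLT, and ALL-FLT rows can be checked with a little arithmetic: subtracting the FLT buffer from the ALL buffer should give the ALL-FLT value. The sketch below does this with awk on a saved report; the helper name remaining_buffer is hypothetical and the column layout of the example output is assumed.

```shell
# Subtract the FLT row from the ALL row of a "scorep-score -f" report read
# on stdin and print the remaining max_buf in bytes.
remaining_buffer() {
  awk '
    # Convert a value such as "2,093" to the number 2093.
    function to_bytes(s) { gsub(",", "", s); return s + 0 }
    NF >= 7 && $NF == "ALL" { all = to_bytes($(NF-5)) }
    NF >= 7 && $NF == "FLT" { flt = to_bytes($(NF-5)) }
    END { print all - flt }'
}
```

With the example output, 2,093 - 288 = 1,805 bytes, which matches the max_buf[B] value printed for ALL-FLT.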
You may combine the -f option with the -r option. In this case, a '+' or '-' in front of each function indicates whether that function is filtered.
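For illustration, a small filter file and the combined call might look as follows. This is a sketch under assumptions: the two excluded region names are simply the USR regions from the per-region listing above, chosen as an example rather than as the filter that produced the sample output, and the scoring step runs only if scorep-score and profile.cubex are present.

```shell
# Write a small Score-P filter file; the excluded region names are
# illustrative picks, not a recommendation.
cat > myfilter <<'EOF'
SCOREP_REGION_NAMES_BEGIN
  EXCLUDE
    InitializeMatrix
    PrintResults
SCOREP_REGION_NAMES_END
EOF

# Score the profile with the filter and show the per-region markers,
# but only if the tool and a profile are actually available.
if command -v scorep-score >/dev/null 2>&1 && [ -f profile.cubex ]; then
  scorep-score profile.cubex -f myfilter -r
fi
```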
Recording additional metrics, e.g., hardware counters, may significantly increase the trace size, because additional metric values are stored for many events. To estimate the effect of these metrics, you may add the -c option followed by the number of metrics you want to record, e.g.,
scorep-score profile.cubex -c 3
would mean that scorep-score estimates the disk and memory requirements for the case that you record 3 additional metrics.
With the -g option, scorep-score can generate an initial filter file in Score-P filter file format. It provides a starting point that the user can adapt to their requirements.
The user can provide an optional list of parameters to control the inclusion heuristic of the filter file generation. A valid parameter list has the form KEY=VALUE[,KEY=VALUE]*. By default, the following control parameters are used:
bufferpercent=1,timepervisit=1
A region is included in the filter file (i.e., excluded from measurement) if it matches all of the given conditions, with the following keys:
As an alternative to using the heuristics to limit the set of included regions, the parameter all creates a filter that includes all filterable regions. This filter file serves as a starting point for a fully manual approach: the user decides which regions to keep without having to copy and paste from the scoring output. To avoid interfering with the heuristics mode, this file is named max_scorep.filter. Because this mode already contains every region, there is no need for iterative behavior or for including filtered regions specified via -f. Additional calls with this option back up existing files to preserve any changes.