perf¶
Brendan Gregg’s blog about perf.
Prerequisites¶
- If you want to see kernel symbols: `sudo sysctl -w kernel.kptr_restrict=0`.
- You probably also need to install the following packages: `sudo apt-get install libelf-dev libunwind-dev libaudit-dev`
- Make sure you use these flags during compilation: `-g2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer`
- Make sure you run `perf script` with `sudo` (as it needs access to kallsyms).
Counting events: `perf stat -e`¶
`perf stat` counts various (primarily CPU) events.
- List all events: `perf list`.
- Example: `perf stat -e cycles,instructions,cache-references,cache-misses,bus-cycles -a sleep 10`.
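To turn raw counts like the ones in the example above into ratios, one option is to ask `perf stat` for machine-readable output with `-x,` and post-process it. The sketch below assumes the default CSV field order (counter value, unit, event name, ...) and plain event names such as `cycles`. The helper name `summarize_stat` and the sample numbers are made up for illustration.

```shell
# Hypothetical helper: derive IPC and the cache-miss ratio from `perf stat -x,` output.
# Real input could be produced with, for example:
#   perf stat -x, -o stat.csv -e cycles,instructions,cache-references,cache-misses -a sleep 10
summarize_stat() {
    awk -F, '
        # Keep only rows whose first field is an integer count
        # (skips comment lines and "<not counted>" rows).
        $1 ~ /^[0-9]+$/ { count[$3] = $1 }
        END {
            if (count["cycles"] > 0)
                printf "IPC %.2f\n", count["instructions"] / count["cycles"]
            if (count["cache-references"] > 0)
                printf "cache-miss ratio %.2f%%\n", 100 * count["cache-misses"] / count["cache-references"]
        }' "$1"
}

# Made-up sample in the assumed field order, only to show the arithmetic.
stat_sample=$(mktemp)
cat > "$stat_sample" <<'EOF'
1000000,,cycles,10000000,100.00,,
2500000,,instructions,10000000,100.00,2.50,insn per cycle
40000,,cache-references,10000000,100.00,,
10000,,cache-misses,10000000,100.00,25.00,of all cache refs
EOF

summarize_stat "$stat_sample"
# IPC 2.50
# cache-miss ratio 25.00%
```

Event names can differ between machines (for example with modifiers or on hybrid CPUs), in which case the lookups in the `END` block would need adjusting.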
Profiling / Sampling: `perf top` & `perf record`¶
perf top¶
You can use `perf top` without recording `perf.data` (similarly to `top`/`htop`):
- `-t`: thread id, or `-p (pgrep -f "^/home/.*/bin/worker ")`: process id (the parentheses are fish command substitution; in bash use `$(...)`).
- `-d`: refresh interval in seconds.
- `--call-graph dwarf`: collect and show backtraces.
Record `perf.data`¶
Alternatively, you can record `perf.data` and then investigate it.
Automatically start and finish¶
Run the command below in one terminal. It will wait for a “signal” to start recording, and then for another “signal” to stop recording.
In a second terminal, run the command wrapped in `nc` (the `nc` will signal the first command to start and stop recording):
After it finishes, the `perf record` started in the first terminal should have produced a `perf.data` file.
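The same start/stop idea can be sketched with named pipes instead of `nc`. This is a swapped-in technique, not the `nc` commands referred to above. It relies only on the fact that `perf record -- CMD` records until `CMD` exits. The helper name `wait_then_record` and all paths are hypothetical.

```shell
# Hypothetical helper: wait for a start signal, then run a recorder until a stop signal.
# Usage: wait_then_record START_FIFO STOP_FIFO RECORDER [ARGS...]
wait_then_record() {
    start_fifo=$1
    stop_fifo=$2
    shift 2
    # Block until another terminal writes anything to the start FIFO.
    cat "$start_fifo" > /dev/null
    # The recorded workload is "block until the stop FIFO is written",
    # so the recorder exits (and perf writes perf.data) when the stop signal arrives.
    "$@" -- sh -c 'cat "$0" > /dev/null' "$stop_fifo"
}

# First terminal (placeholder paths):
#   mkfifo /tmp/perf-start /tmp/perf-stop
#   wait_then_record /tmp/perf-start /tmp/perf-stop sudo perf record -a -g
# Second terminal, around the command of interest:
#   echo go > /tmp/perf-start; ./your-command; echo stop > /tmp/perf-stop
```

Because the signalling is just a write to a FIFO, the second terminal can wrap any command without knowing about perf.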
CPU counters¶
Alternatively, it is possible to sample/record a specific counter:
# Sample CPU stack traces, once every 10,000 Level 1 data cache misses, for 5 seconds:
perf record -e L1-dcache-load-misses -c 10000 -ag -- sleep 5
Tracing / Probes: `perf probe --add`¶
The main idea behind dynamic tracing is that sometimes there is no existing counter or function for what you want to measure, so you can add your own probes in both the Linux kernel and user code, and then trace them (`perf record -e`). You can add them dynamically (without recompiling the code) as well as statically (the code has to be modified).
See the `perf-probe` man page for more info.
Collect `perf.data`¶
Let’s say the binary/library that has the symbol we are interested in is:
First, find the name of the symbol that we would like to trace. For C programs this is easy. For C++, we ask perf to dump all symbols in mangled form:
Then we can inspect:
And finally add the probe:
Then record data as suggested in the output above: `perf record -e probe_libkernelLib:_ZN14ArenaAllocator8allocateEm -aR sleep 1`.
Cleanup:
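A minimal sketch of the list / add / record / delete sequence described in this section, using `perf probe` options from its man page (`-x`, `--funcs`, `--no-demangle`, `--add`, `--del`). The library path is a hypothetical placeholder, the group name `probe_libkernelLib` is taken from the record command above, and the wrapper functions exist only so the commands can be dry-run with `PERF=echo`.

```shell
# PERF can be overridden (e.g. PERF=echo for a dry run); the default needs root.
perf_cmd() { ${PERF:-sudo perf} "$@"; }

# Hypothetical placeholders; substitute your own library and symbol.
lib=/path/to/libkernelLib.so
sym=_ZN14ArenaAllocator8allocateEm

# Dump the functions that can be probed, in mangled form.
list_symbols() { perf_cmd probe -x "$1" --funcs --no-demangle; }
# Create a probe on one (mangled) symbol of the library.
add_probe() { perf_cmd probe -x "$1" --no-demangle --add "$2"; }
# Remove probes matching GROUP:EVENT (wildcards allowed).
del_probe() { perf_cmd probe --del "$1"; }

# Typical sequence:
#   list_symbols "$lib" | grep ArenaAllocator
#   add_probe "$lib" "$sym"
#   perf_cmd record -e "probe_libkernelLib:$sym" -aR sleep 1
#   del_probe 'probe_libkernelLib:*'
```

Running with `PERF=echo` first is a cheap way to check the exact command lines before touching the system's probe list.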
Viewing / Visualising of `perf.data`¶
UIs¶
- pprof Web UI, Graphviz UI, FlameChart UI, Disassembly UI
- perf report: Use one of the:
- perf annotate: `perf annotate`
- Firefox Profiler: `perf script -F +pid > /tmp/test.perf`
- Speedscope
- Flamegraph
    bn="gen2-worker5-F2987"; mw=0.01; suffix="$mw"; sudo perf script | ./FlameGraph/stackcollapse-perf.pl > $bn.perf.data.folded; \
    cat $bn.perf.data.folded | ./FlameGraph/flamegraph.pl --minwidth $mw --width 2250 --title "$bn $suffix" --reverse --inverted > ./$bn-$suffix.perf-icicle.svg && \
    cat $bn.perf.data.folded | ./FlameGraph/flamegraph.pl --minwidth $mw --width 2250 --title "$bn $suffix" > ./$bn-$suffix.perf-flame.svg

    set bn "gen2-F2987.1"; set mw 0.01; set suffix "$mw"; sudo perf script | ~/devel/FlameGraph/stackcollapse-perf.pl > $bn.perf.data.folded; \
    cat $bn.perf.data.folded | ~/devel/FlameGraph/flamegraph.pl --minwidth $mw --width 2250 --title "$bn $suffix" --reverse --inverted > ./$bn-$suffix.perf-icicle.svg && \
    cat $bn.perf.data.folded | ~/devel/FlameGraph/flamegraph.pl --minwidth $mw --width 2250 --title "$bn $suffix" > ./$bn-$suffix.perf-flame.svg
What is FlameGraph?
- Read more about FlameGraph here.
- Get flame graph: `git clone git@github.com:brendangregg/FlameGraph.git`
- You probably want to make this adjustment (otherwise it collapses different C++ template functions, which should not be collapsed):

    diff --git a/stackcollapse-perf.pl b/stackcollapse-perf.pl
    index fd3c78e..afc7be9 100755
    --- a/stackcollapse-perf.pl
    +++ b/stackcollapse-perf.pl
    @@ -79,8 +79,8 @@ my $include_pname = 1; # include process names in stacks
     my $include_pid = 0; # include process ID with process name
     my $include_tid = 0; # include process & thread ID with process name
     my $include_addrs = 0; # include raw address where a symbol can't be found
    -my $tidy_java = 1; # condense Java signatures
    -my $tidy_generic = 1; # clean up function names a little
    +my $tidy_java = 0; # condense Java signatures
    +my $tidy_generic = 0; # clean up function names a little
     my $target_pname; # target process name from perf invocation
     my $event_filter = ""; # event type filter, defaults to first encountered event
     my $event_defaulted = 0; # whether we defaulted to an event (none provided)
- Inferno-Flamegraph / FlameGraph-rs
- FlameScope
- perf timechart
- KDAB Hotspot
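The `.perf.data.folded` file written by `stackcollapse-perf.pl` in the Flamegraph commands above is also useful without any UI. Each line is a semicolon-separated stack followed by a space and a sample count, so a quick "top functions by self time" list can be sketched with `awk`. The helper name `top_leaves` and the sample stacks below are made up for illustration.

```shell
# Hypothetical helper: top N functions by self samples in a folded-stacks file.
# Usage: top_leaves FILE [N]
top_leaves() {
    awk '{
        samples = $NF                  # the last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)     # drop the count, keep the frames (which may contain spaces)
        n = split(stack, frames, ";")
        self[frames[n]] += samples     # the last frame is the leaf, i.e. where the sample landed
    }
    END { for (f in self) printf "%d %s\n", self[f], f }' "$1" |
        sort -k1,1nr -k2 | head -n "${2:-10}"
}

# Made-up folded stacks, only to show the aggregation.
folded_sample=$(mktemp)
cat > "$folded_sample" <<'EOF'
worker;main;run;ArenaAllocator::allocate 30
worker;main;run;parse 12
worker;main;run;parse;ArenaAllocator::allocate 8
worker;main;idle 5
EOF

top_leaves "$folded_sample" 2
# 38 ArenaAllocator::allocate
# 12 parse
```

This counts self samples only; time spent in callees is attributed to the callee, which is the same information the top edge of a flame graph shows.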