perf¶
Brendan Gregg’s blog about perf.
Prerequisites¶
- If you want to see kernel symbols: sudo sysctl -w kernel.kptr_restrict=0.
- You probably also need to install the following packages:
  sudo apt-get install libelf-dev libunwind-dev libaudit-dev
- Make sure you use these flags during compilation: -g2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer
- Make sure you run perf script with sudo (as it needs access to kallsyms).
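A quick way to check the first prerequisite before profiling. This is a minimal sketch: kptr_hint is a hypothetical helper name, and it only looks at the kernel.kptr_restrict value mentioned above.

```shell
# Map a kernel.kptr_restrict value to a short hint.
# 0 is the value the prerequisites above ask for.
kptr_hint() {
    case "$1" in
        0)   echo "ok: kernel.kptr_restrict=0" ;;
        1|2) echo "restricted: run 'sudo sysctl -w kernel.kptr_restrict=0'" ;;
        *)   echo "unknown value: $1" ;;
    esac
}

# Check the current setting if the sysctl file is readable.
f=/proc/sys/kernel/kptr_restrict
if [ -r "$f" ]; then
    kptr_hint "$(cat "$f")"
fi
```

If it prints "restricted", run the sysctl command from the list above and check again.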
Counting events — perf stat -e¶
It counts various events, primarily CPU ones.
- List all events: perf list.
- Example: perf stat -e cycles,instructions,cache-references,cache-misses,bus-cycles -a sleep 10.
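To turn the raw counts into ratios, perf stat can emit CSV with -x, and write it to a file with -o. The sketch below assumes the CSV layout described in perf-stat(1) (counter value, unit, event name, ...) and plain event names such as cycles; on some machines the names carry a prefix or suffix and the matches need adjusting. perf_stat_summary and stat.csv are hypothetical names.

```shell
# Summarise `perf stat -x,` CSV read from stdin.
# Assumed columns: 1 = counter value, 2 = unit, 3 = event name.
perf_stat_summary() {
    awk -F, '
        $3 == "cycles"           { cycles = $1 }
        $3 == "instructions"     { instructions = $1 }
        $3 == "cache-references" { refs = $1 }
        $3 == "cache-misses"     { misses = $1 }
        END {
            if (cycles > 0) printf "IPC: %.2f\n", instructions / cycles
            if (refs > 0)   printf "cache miss ratio: %.2f%%\n", 100 * misses / refs
        }'
}

# Usage:
#   sudo perf stat -x, -o stat.csv -e cycles,instructions,cache-references,cache-misses -a sleep 10
#   perf_stat_summary < stat.csv
```

Lines that do not name one of the four events (for example the comment header perf writes into the file) are ignored.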
Profiling / Sampling — perf top & perf record¶
perf top¶
You can use perf without recording perf.data (similarly to top/htop):
- -t: thread ID, or -p (pgrep -f "^/home/.*/bin/worker "): process ID. The (pgrep ...) form is fish syntax; in bash, use $(pgrep ...).
- -d: refresh interval in seconds.
- --call-graph dwarf: collect and show backtraces.
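The options above can be combined into one invocation. A minimal sketch for bash/POSIX sh: run and perf_top_pids are hypothetical helper names, and the default interval of 2 seconds is an arbitrary choice.

```shell
# With DRY_RUN set, print the command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi
}

# $1 = comma-separated PID list, $2 = refresh interval in seconds (default: 2)
perf_top_pids() {
    [ -n "${1:-}" ] || { echo "usage: perf_top_pids PIDS [SECONDS]" >&2; return 1; }
    run sudo perf top -p "$1" -d "${2:-2}" --call-graph dwarf
}

# Usage (pgrep -d, joins all matching PIDs with commas):
#   perf_top_pids "$(pgrep -d, -f '^/home/.*/bin/worker ')" 5
```

Setting DRY_RUN=1 prints the final command line, which is handy for checking which PIDs were picked up before starting perf.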
Record perf.data¶
Alternatively, you can record perf.data and then investigate it.
Automatically start and finish¶
Run the command below in one terminal. It waits for a “signal” to start recording, and then for another “signal” to stop recording.
In a second terminal, run the command wrapped in nc (nc signals the first command to start and stop recording):
After it finishes, the perf record started in the first terminal should have produced a perf.data file.
CPU counters¶
Alternatively, it is possible to sample/record a specific counter:
# Sample CPU stack traces, once every 10,000 Level 1 data cache misses, for 5 seconds:
perf record -e L1-dcache-load-misses -c 10000 -ag -- sleep 5
Tracing / Probes — perf probe --add¶
The main idea behind dynamic tracing is that sometimes there is no existing counter or function for what you want
to measure, so you can add your own probes to both the Linux kernel and user code, and then trace them
(perf record -e). You can add probes dynamically (without recompiling the code) as well as statically (the code has to be modified).
See the perf-probe man page for more information.
Collect perf.data¶
Let’s say the binary or library containing the symbol we are interested in is:
First, find the name of the symbol we would like to trace. For C programs this is easy. For C++, we ask perf to dump all symbols in mangled form:
Then we can inspect:
And finally add probe:
And then record the data as suggested in the output above: perf record -e probe_libkernelLib:_ZN14ArenaAllocator8allocateEm -aR sleep 1.
Cleanup:
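The steps above can be sketched as shell functions. This is a sketch under assumptions, not the exact commands: /path/to/libkernelLib.so is a placeholder path, the function names are hypothetical, and it relies on perf probe's -x, --funcs, --no-demangle, --add and --del options. The "inspect" step is left out.

```shell
# With DRY_RUN set, print the command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi
}

LIB=${LIB:-/path/to/libkernelLib.so}         # placeholder, set to your library
SYM=${SYM:-_ZN14ArenaAllocator8allocateEm}   # mangled name from the text above
EVENT=${EVENT:-probe_libkernelLib:$SYM}      # event name as reported when adding the probe

# 1. Dump all symbols in mangled form (pipe through grep to narrow it down).
probe_list()    { run sudo perf probe -x "$LIB" --funcs --no-demangle; }
# 2. Add a probe on the mangled symbol.
probe_add()     { run sudo perf probe -x "$LIB" --no-demangle --add "$SYM"; }
# 3. Record system-wide for one second.
probe_record()  { run sudo perf record -e "$EVENT" -aR sleep 1; }
# 4. Remove the probe again.
probe_cleanup() { run sudo perf probe --del "$EVENT"; }
```

For example, `probe_list | grep allocate` narrows the symbol dump, and `DRY_RUN=1 probe_add` prints the command without touching the system.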
Viewing / Visualising of perf.data¶
UIs¶
- pprof Web UI, Graphviz UI, FlameChart UI, Disassembly UI
- perf report: Use one of the:
- perf annotate: perf annotate
- Firefox Profiler: perf script -F +pid > /tmp/test.perf
- Speedscope
- Flamegraph
  # bash
  bn="gen2-worker5-F2987"; mw=0.01; suffix="$mw"; sudo perf script | ./FlameGraph/stackcollapse-perf.pl > $bn.perf.data.folded; \
  cat $bn.perf.data.folded | ./FlameGraph/flamegraph.pl --minwidth $mw --width 2250 --title "$bn $suffix" --reverse --inverted > ./$bn-$suffix.perf-icicle.svg && \
  cat $bn.perf.data.folded | ./FlameGraph/flamegraph.pl --minwidth $mw --width 2250 --title "$bn $suffix" > ./$bn-$suffix.perf-flame.svg
  # fish
  set bn "gen2-F2987.1"; set mw 0.01; set suffix "$mw"; sudo perf script | ~/devel/FlameGraph/stackcollapse-perf.pl > $bn.perf.data.folded; \
  cat $bn.perf.data.folded | ~/devel/FlameGraph/flamegraph.pl --minwidth $mw --width 2250 --title "$bn $suffix" --reverse --inverted > ./$bn-$suffix.perf-icicle.svg && \
  cat $bn.perf.data.folded | ~/devel/FlameGraph/flamegraph.pl --minwidth $mw --width 2250 --title "$bn $suffix" > ./$bn-$suffix.perf-flame.svg
What is FlameGraph?
- Read more about FlameGraph here.
- Get flame graph: git clone git@github.com:brendangregg/FlameGraph.git
- You probably want to make this adjustment (otherwise it collapses different template C++ functions, which should not be collapsed):
  diff --git a/stackcollapse-perf.pl b/stackcollapse-perf.pl
  index fd3c78e..afc7be9 100755
  --- a/stackcollapse-perf.pl
  +++ b/stackcollapse-perf.pl
  @@ -79,8 +79,8 @@ my $include_pname = 1; # include process names in stacks
   my $include_pid = 0; # include process ID with process name
   my $include_tid = 0; # include process & thread ID with process name
   my $include_addrs = 0; # include raw address where a symbol can't be found
  -my $tidy_java = 1; # condense Java signatures
  -my $tidy_generic = 1; # clean up function names a little
  +my $tidy_java = 0; # condense Java signatures
  +my $tidy_generic = 0; # clean up function names a little
   my $target_pname; # target process name from perf invocation
   my $event_filter = ""; # event type filter, defaults to first encountered event
   my $event_defaulted = 0; # whether we defaulted to an event (none provided)
- Inferno-Flamegraph / FlameGraph-rs
- FlameScope
- perf timechart
- KDAB Hotspot