Linux perf is a very powerful performance analysis tool. It consists of two parts:
It is build around events which you can list with perf list. There are hardware (cycles, L1 cache misses, etc.) and software events (branches, faults, etc.). These events can be sampled in various ways:
perf record -F 99 sleep 4: Collects samples onperf_event’s which get approximately capped at 99 samples a second (could be more or less depending on c-state of the cpus)perf record -e '{cycles,cache-misses}:S' sleep 4: Collects samples on the event group leadercycles
Userspace Binary
There are two main ways to use the userspace binary:
perf stat
perf stat runs perf in a counting mode with minimal active overhead. It simply aggregates events and outputs sums at a fixed interval
hilldani@hilldani-mobl:~$ sudo perf stat -I 5000 -e cycles,instructions
# time counts unit events
5.010436323 1359010193 cycles
5.010436323 1040197178 instructions # 0.77 insn per cycle
10.029055772 462597075 cycles
10.029055772 283970127 instructions # 0.61 insn per cycle
15.047830993 1326929294 cycles
15.047830993 1054771130 instructions # 0.79 insn per cycle
15.639174290 17915015 cycles
15.639174290 5086827 instructions # 0.28 insn per cycle
perf record
perf record actively collects context for every sample and does not aggregate. It then has to be postprocessed by perf script.
hilldani@hilldani-mobl:~$ sudo perf record -F 99 -ag -e cycles sleep 4
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.303 MB perf.data (289 samples) ]
hilldani@hilldani-mobl:~$ sudo perf script
perf 128927 [000] 23729.724910: 1 cycles:
ffffffff8100ab03 __intel_pmu_enable_all.constprop.0+0x43 ([kernel.kallsyms])
ffffffff8124f282 event_function+0x82 ([kernel.kallsyms])
ffffffff8124971f remote_function+0x3f ([kernel.kallsyms])
ffffffff8119b23c generic_exec_single+0x4c ([kernel.kallsyms])
ffffffff8119b34b smp_call_function_single+0xdb ([kernel.kallsyms])
ffffffff8124fb74 event_function_call+0x114 ([kernel.kallsyms])
ffffffff81248952 perf_event_for_each_child+0x32 ([kernel.kallsyms])
ffffffff8125712b _perf_ioctl+0x20b ([kernel.kallsyms])
ffffffff8125789d perf_ioctl+0x3d ([kernel.kallsyms])
ffffffff81324838 __x64_sys_ioctl+0x88 ([kernel.kallsyms])
ffffffff81ed9e38 do_syscall_64+0x38 ([kernel.kallsyms])
ffffffff82000099 entry_SYSCALL_64_after_hwframe+0x61 ([kernel.kallsyms])
7f749433a3ab ruserok_af+0x4b (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
559ffa7f472a __evlist__enable+0x1ea (/usr/local/bin/perf)
559ffa760f1e cmd_record+0x212e (/usr/local/bin/perf)
559ffa7de453 run_builtin+0x73 (/usr/local/bin/perf)
559ffa74723c main+0x67c (/usr/local/bin/perf)
7f749424a083 putenv+0xf3 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
Raw perf script output can be processed by whatever program you want. Here it shows the first sample from cpu 0 had only 1 cycles (it just started), and was running perf at the time of the sample. Below we get a call stack of what function we were in inside of perf at that moment
Kernel API’s
The perf binary is simply a wrapper around perf syscalls into the linux kernel.
fd = syscall(__NR_perf_event_open, ...);