CPI flamegraphs

Performance engineering is often split into three groups:

  1. Software
  2. Deployment
  3. Hardware

Each group works independently to make a workload as fast as possible. Software folks look at big O complexity. Deployment folks try to pick the best operating systems, containers, and geographic distribution. Hardware folks try to pick the fastest hardware with the lowest power requirements (the classic power-in, money-out machine). Each discipline offers a valuable angle, but each has tunnel vision and misses the complete picture of performance engineering.

Teaching observability as an integral part of the software process can solve this problem. Take, for example, the task of JSON processing. Many software developers approach this problem from a big O perspective. For Java alone there is a litany of solutions.

However, few software developers see it from a hardware perspective. Recent vector (SIMD) instruction sets can perform in parallel many operations that used to be sequential (see simdjson). Let’s try to close this knowledge gap. Flamegraphs, a visualization made popular by Brendan Gregg, are often used to characterize software.

The data for this chart can be collected with the following perf command:

sudo perf record -F 99 -ag sleep 60

But this doesn’t show any hardware insights. Let’s monitor instruction pipeline efficiency on top of this. Now the command looks like:

sudo perf record -F 99 -ag -e "{cycles,instructions}" sleep 60

This samples call stacks, CPU cycles, and instructions approximately 99 times a second (it can be less if the CPU is in a power-saving state). Now if we regenerate the graph but color by CPI (cycles per instruction), we can see which parts of our code flow quickly through the CPU and which are less efficient and require more CPU cycles to complete.
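To make the coloring step concrete, here is a minimal sketch of how CPI per stack could be computed and mapped to a color. It assumes the perf data has already been reduced to two folded-stack texts (one `frame;frame;frame count` line per stack), one holding cycle totals and one holding instruction totals; how you get there depends on your perf version and post-processing scripts. The function names, the sample stacks and counts, and the `low`/`high` CPI thresholds are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def parse_folded(text):
    """Parse 'frame;frame;frame count' lines into {stack: total_count}."""
    totals = defaultdict(int)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field; the rest is the stack.
        stack, _, count = line.rpartition(" ")
        totals[stack] += int(count)
    return dict(totals)


def cpi_by_stack(cycles, instructions):
    """Return {stack: cycles / instructions}, skipping stacks with no instructions."""
    return {
        stack: cycles[stack] / instructions[stack]
        for stack in cycles
        if instructions.get(stack, 0) > 0
    }


def cpi_to_color(cpi, low=0.5, high=4.0):
    """Map CPI to a hex color: blue at or below `low`, red at or above `high`.

    The thresholds are arbitrary placeholders, not recommended values.
    """
    t = (cpi - low) / (high - low)
    t = max(0.0, min(1.0, t))  # clamp to [0, 1]
    red = round(255 * t)
    blue = round(255 * (1 - t))
    return f"#{red:02x}00{blue:02x}"


# Hypothetical folded data, invented for this example.
cycles_folded = "node;main;readFile 8000\nstress-ng;main;spin 30000\n"
instructions_folded = "node;main;readFile 2000\nstress-ng;main;spin 40000\n"

cpi = cpi_by_stack(parse_folded(cycles_folded), parse_folded(instructions_folded))
for stack, value in sorted(cpi.items()):
    print(f"{stack}: CPI={value:.2f} color={cpi_to_color(value)}")
```

A linear blue-to-red ramp is only one possible choice; the point is that the frame's width still comes from the cycle count while its color comes from the ratio.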

Originally it looked like the stress-ng processes should be the targets of optimization, but now it becomes clear that I/O is causing major CPU stalls inside the Node process. Flamegraphs show where your CPU spends the most time but don’t convey which areas have the highest potential for improvement.
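The difference between "widest" and "least efficient" can be sketched in a few lines. The per-process cycle and instruction totals below are invented to mirror the situation described above; they are not measurements, and `rank_processes` is a hypothetical helper rather than part of any tool.

```python
def rank_processes(samples):
    """Turn {process: (cycles, instructions)} into rows with cycle share and CPI."""
    total_cycles = sum(cycles for cycles, _ in samples.values())
    rows = []
    for name, (cycles, instructions) in samples.items():
        rows.append({
            "process": name,
            "cycle_share": cycles / total_cycles,  # flamegraph width
            "cpi": cycles / instructions,          # flamegraph color
        })
    return rows


# Hypothetical totals, invented for this example.
samples = {
    "stress-ng": (60000, 80000),
    "node": (30000, 6000),
    "sshd": (10000, 10000),
}

rows = rank_processes(samples)
widest = max(rows, key=lambda r: r["cycle_share"])["process"]
least_efficient = max(rows, key=lambda r: r["cpi"])["process"]

print("most cycles:", widest)
print("highest CPI:", least_efficient)
```

With these made-up numbers the process consuming the most cycles is not the one with the highest CPI, which is exactly the case where coloring by CPI changes what you would look at first.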