operf, callgrind & kcachegrind

Output from valgrind's callgrind can be combined with that from opreport to recover accurate call counts, as an alternative to using gprof.

The simple recipe for using operf with kcachegrind was:

[Compile with -g]
$ operf -gl ./a.out
$ op2kcg -o out.dat
$ kcachegrind out.dat &

To use it with callgrind, this becomes:

[Compile with -g]
$ operf -gl ./a.out
$ valgrind --tool=callgrind --compress-strings=no --separate-recs=1 --callgrind-out-file=callgrind.out ./a.out
$ op2kcg -cg -o out.dat
$ kcachegrind out.dat &

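The parser in op2kcg is rather simple. By default callgrind replaces repeated file and function names in its output with numeric identifiers, which this parser cannot read, hence --compress-strings=no is required.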

The output should be identical to that produced in the gprof example.

KCachegrind screenshot
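The callgrind recipe above could be wrapped in a small shell function, as a rough sketch. The names run, profile_with_callgrind and DRY_RUN are hypothetical and not part of operf, valgrind or op2kcg; the commands and flags are exactly those of the recipe. By default the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: wraps the operf + callgrind + op2kcg recipe.
# run, profile_with_callgrind and DRY_RUN are made-up names for illustration.

# Print the command when DRY_RUN is 1 (the default here), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: program compiled with -g (default ./a.out)
# $2: output file for kcachegrind (default out.dat)
profile_with_callgrind() {
    prog=${1:-./a.out}
    out=${2:-out.dat}
    run operf -gl "$prog" &&
    run valgrind --tool=callgrind --compress-strings=no --separate-recs=1 \
        --callgrind-out-file=callgrind.out "$prog" &&
    run op2kcg -cg -o "$out"
}
```

With the function defined, DRY_RUN=0 profile_with_callgrind ./a.out out.dat would run the three steps for real, stopping at the first failure, after which kcachegrind out.dat can be started as before.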

gprof vs callgrind

Which of gprof and callgrind is better for this analysis?

The results from each would be expected to be identical. Whereas gprof works by adding extra code around each function call, callgrind works by running the code on an emulated CPU and trapping call instructions. The disadvantage of gprof is the need to recompile, whereas the disadvantage of callgrind is a run-time overhead even larger than gprof's. The nature of the overhead differs too: gprof's is mostly per function call, whereas callgrind's is continuous.

For the simple example here, gprof increased the run time by about 25%, whereas callgrind increased it by a factor of over 15. In neither case would one want to pay much attention to timing data from such a run.
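To put the figures quoted above in concrete terms, the sketch below scales an unprofiled run time by the two factors. The function name estimate_overhead and the 10 second baseline are assumptions for illustration only; the factors 1.25 and 15 are those measured for this one simple example (15 being a lower bound) and should not be expected to carry over to other programs.

```shell
# Hypothetical helper: scale an unprofiled run time by the overheads
# quoted for this example (gprof about +25%, callgrind over 15x).
estimate_overhead() {
    base=${1:-10}   # unprofiled run time in seconds (assumed value)
    awk -v t="$base" 'BEGIN {
        printf "gprof:     about %.1f s\n", t * 1.25
        printf "callgrind: over  %.1f s\n", t * 15
    }'
}
```

For a 10 second run, estimate_overhead 10 reports about 12.5 s under gprof and over 150.0 s under callgrind.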