Definitely the combination of callgrind (valgrind --tool=callgrind) and KCachegrind, or the combination of Hotspot and perf.
I have toyed with Intel's VTune, but I found it very hard to get running, so it's discouraging before you even start. That said, if you need a lot of info on cache behavior etc., VTune is fantastic.
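For what it's worth, a session with either of the first two combinations looks roughly like this. Treat it as a sketch: `./myapp` is just a placeholder for your own binary (ideally built with `-g`), the output file names are my choice, and the `command -v` guards are only there so the snippet does nothing when a tool is not installed.

```shell
# Placeholder target: point APP at your own binary, built with debug info (-g).
APP="${APP:-./myapp}"

# Option 1: callgrind + KCachegrind
if command -v valgrind >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Run the program under callgrind and write the profile to a fixed file name.
    valgrind --tool=callgrind --callgrind-out-file=callgrind.out "$APP"
    # Then open the result in the GUI:
    #   kcachegrind callgrind.out
fi

# Option 2: perf + Hotspot
if command -v perf >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Sample the program and record call stacks using DWARF unwinding.
    perf record --call-graph dwarf -o perf.data "$APP"
    # Then open the result in the GUI:
    #   hotspot perf.data
fi
```

Keep in mind that callgrind runs the program in a simulator, so it is much slower than a native run, while perf samples the program at close to full speed.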