It’s been a few years, but if I recall correctly, variation in the sampling itself makes the aliasing worse, not better. At the very least it should be no different.
It sounds like one of your concerns is being catastrophically unlucky with the sampling rate:
> This is because you run into aliasing with the samples being collected at unhelpful points.
I interpret this as you saying "we sample at times T, 2T, ..., but there's a hotspot that hits at T+0.001, T+0.003, ..., T+0.999, 2T+0.001, ..., and we never get to visit it". I'll grant that this could happen, although it seems contrived, but my claim is that by sampling at "T +/- epsilon, 2T +/- 2 epsilon, ..." sooner or later you're going to start hitting that hotspot. And before too long, if the CPU is executing that code, say, 5% of the time, you're going to hit it in, on average, 5% of your samples.
It'll be aliased, sure, but in a way that smears across all frequency bins instead of getting missed. You won't be able to recover the true frequency (at least not without fancy sparse methods) but why do you care? The important question is "where is the CPU spending its time" and not "is this function being entered 100000 times per second or 100001 times".
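To make that concrete, here's a toy simulation (all numbers invented for illustration): a hotspot occupies the first 5% of a repeating 1000-tick period, and we sample at a nominal interval close to, but not locked to, that period, with a few ticks of jitter. The accumulated drift sweeps the sample phase across the whole period, so the hotspot shows up in roughly 5% of samples:

```python
import random

# Toy model: hotspot is live for ticks [0, 50) of each 1000-tick period (5%),
# and we sample every ~997 ticks with "T +/- epsilon" style jitter, so the
# sample phase drifts across the period instead of staying stuck.

PERIOD = 1000        # length of the program's repeating behavior, in ticks
HOT = 50             # hotspot occupies 5% of each period
SAMPLE = 997         # nominal sampling interval, near but not at PERIOD

random.seed(0)
t, hits, n = 0, 0, 100_000
for _ in range(n):
    t += SAMPLE + random.randint(-5, 5)   # jittered sample time
    hits += (t % PERIOD) < HOT            # did this sample land in the hotspot?

ratio = hits / n
print(ratio)   # comes out near 0.05, the hotspot's true share
```

The frequency information is scrambled, but the time-share estimate converges anyway.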
Here's another general objection. The things being sampled are square waves: a function is either in the call stack or it's not, the program counter is either at a particular location or it's not, and so on. That means you're going to have energy at all odd partials, which you'll have to account for somehow, but however you do it, it's not going to reflect the underlying behavior.
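For what it's worth, you can see those odd partials directly (a sketch using NumPy; the specific wave is invented): the magnitude spectrum of a 50%-duty square wave has energy at the fundamental and every odd harmonic, falling off only as 1/n, with essentially nothing at the even harmonics:

```python
import numpy as np

# A +/-1 square wave with period 64 samples over a 1024-sample window,
# so its fundamental lands exactly on FFT bin 16 (= 1024 / 64).
N = 1024
t = np.arange(N)
square = np.where((t % 64) < 32, 1.0, -1.0)

spectrum = np.abs(np.fft.rfft(square)) / N

print(spectrum[16])   # fundamental, ~0.64 (about 2/pi)
print(spectrum[48])   # 3rd harmonic, ~1/3 of the fundamental
print(spectrum[80])   # 5th harmonic, ~1/5 of the fundamental
print(spectrum[32])   # 2nd harmonic, essentially zero
```

With harmonics decaying that slowly, no finite sample rate puts you above "twice the highest frequency present".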
It’s not about being unlucky, it’s the second thing you point out. It’s not the difference between 100kHz and 100.001kHz. You’re going to roughly alias at multiples of the sampling frequency, so you can’t tell between 2kHz, 10kHz, and 100kHz, which means that even if you collect enough signal, your 1MHz signal with code running at 1GHz is potentially going to look like a 10kHz signal, leading you to make the wrong conclusions about where time is being spent.
Remember - profiling tools aren’t giving you a frequency bin and indicating which samples fall there; they’re giving you samples and estimating frequency from them. Most people aren’t optimizing code that runs millions of times per second, so it’s not a common problem, but all sorts of wrong conclusions start to get made.
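The folding being described is easy to state: with a perfectly regular sample rate fs, a tone at f is indistinguishable from one at f shifted by any whole multiple of fs. A small sketch (the helper name is mine):

```python
def alias(f, fs):
    """Apparent frequency of a tone at f when sampled at exactly fs (both in Hz)."""
    r = f % fs               # fold down by whole multiples of fs
    return min(r, fs - r)    # reflect into the observable range [0, fs/2]

# With a rock-solid 1kHz sample rate, 2kHz, 10kHz, and even 1MHz all
# fold to the same apparent frequency (DC):
print(alias(2_000, 1_000), alias(10_000, 1_000), alias(1_000_000, 1_000))

# A signal 1Hz off from a multiple of fs aliases to 1Hz instead:
print(alias(100_001, 1_000))
```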
> so you can’t tell between 2kHz, 10kHz, and 100kHz
Only when your sample rate is perfect!
Let's say we have a 1GHz system executing one instruction per cycle, a signal at 1MHz, and we're measuring at 1kHz. If we're rock-solid at 1kHz then, as you said, we can't distinguish 1MHz from 2kHz, but if we're off by a little bit, things change.
Let's say the 1MHz events are at times (in nanoseconds) 50, 1050, 2050, and so on, and the 1kHz sampler triggers at 0, 1000000, 2000000, and so on. You'll obviously never see that event.
But suppose there's a little bit of noise in the sample timing and we're just under 1kHz, so now the samples are at 0, 1000001, 2000001, 3000005, etc. Sooner or later we're going to hit that 1MHz event, and we're not going to hit it at the same rate as we would a 2kHz event, a 10kHz event, and so on.
We might not hit it that often, but we'll hit it sooner or later, and we'll also hit a lot of its neighbors.
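That's easy to check numerically. A sketch with the numbers above, times in nanoseconds (the 10ns event width is my invention): a perfectly regular 1kHz sampler never lands on the 1MHz event, but a few tens of nanoseconds of jitter lets the accumulated drift walk the sample phase across it:

```python
import random

# A 1MHz event fires at t = 50, 1050, 2050, ... and stays live for 10ns.
# We sample at a nominal 1kHz, i.e. every 1_000_000 ns, plus jitter.

def in_event(t):
    return 50 <= (t % 1_000) < 60   # live for 10ns out of every 1000ns

def run(jitter, n=200_000):
    random.seed(1)
    t, hits = 0, 0
    for _ in range(n):
        t += 1_000_000 + random.randint(-jitter, jitter)
        hits += in_event(t)
    return hits

exact, jittered = run(0), run(25)
print(exact, jittered)   # rock-solid sampling never hits; jittered sampling does
```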
You’re not describing anything different from how Nyquist pops up in analog signals either: sampling and signal will always have jitter and noise. I would recommend reading the linked PDF if you haven’t done so, as it gives a number of examples of how Nyquist can screw up your intuition.
I read it when it was first posted, and skimmed it again just now to be sure, and my conclusion is unchanged: Nyquist has nothing useful to say about sampling profilers. I think the author would agree with me. See, for example, the discussion of EKG signals, or this bit from the intro:
> The difficulty with the Nyquist-Shannon sampling theorem is that it is based on the notion that the signal to be sampled must be perfectly band limited. This property of the theorem is unfortunate because no real world signal is truly and perfectly band limited.
whose relevance I'll get to below.
(I'm assuming that by "sampling profiler" you mean the usual practice of recording on a cadence the call stack / program counter / etc of some code; if this is not what you meant please clarify)
If you approach sampling profiling from a signals viewpoint, what's the signal that you're sampling? I see it as a collection of pulse waves of varying (and irregular!) duty cycles, each pulse wave corresponding to a yes/no answer to "is the program inside this function / executing this instruction / etc". At any given sample we collect our data and that tells us which waves are high; all others will be low.
Nyquist, as the above quote points out, only applies to perfectly band limited signals. Not only are pulse waves not perfectly band limited, they're actually extremely not band limited, with energy at harmonics going all the way up. Right away that should tell you that Nyquist is the wrong way to be thinking about things!
And furthermore, Nyquist tells you what you need if you want to reconstruct the original signal, but why do you want to do that in the first place? Do you actually care about the phase of the wave and the timings of the pulses, or do you just care about how often the wave is high? (i.e. how often any bit of code is executed). I don't think I've ever cared about the phase and timings when profiling, but I do care very much about how often the wave is high.
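The distinction is easy to demonstrate: if all you want is how often each wave is high, sampling recovers it directly, with phase and timing discarded entirely. A toy sketch (the function names and their time shares are invented):

```python
import random

# Toy model of the viewpoint above: each function is a pulse wave
# ("am I on the stack right now?"), and each sample reads which wave is high.

def stack_at(t):
    # A synthetic program: "hot" runs during 5% of each 1000-tick period,
    # "warm" during the next 20%, and "other" the rest of the time.
    phase = t % 1000
    if phase < 50:
        return "hot"
    elif phase < 250:
        return "warm"
    return "other"

random.seed(0)
counts = {"hot": 0, "warm": 0, "other": 0}
n = 100_000
for _ in range(n):
    counts[stack_at(random.randrange(10**9))] += 1   # sample at random times

shares = {k: v / n for k, v in counts.items()}
print(shares)   # close to hot=0.05, warm=0.20, other=0.75
```

The pulse phases never enter into it; the sample counts alone answer "where is the CPU spending its time".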