I started dressing nice at work, reasoning that looking sharp would buy me a few seconds or minutes of grace to allow my social deficiencies to catch up - just in case an executive decided to ask me a question.
Of course, that never happened for months, then years, until the one day I went in wearing cargo pants and a gothy synth-band shirt and was greeted by a delegation of executives from out of town engaging everyone in small talk…
I worked for a downtown firm for a while that loosened up the dress code a little, so I didn't always wear my jacket in, though cargo pants and a rock T would definitely have led to an HR meeting. One day I had to borrow a jacket from someone when I went to a nearby studio for a TV interview :-)
Great article. Personally I have been learning more about the mathematics of beyond-CLT scenarios (fat tails, infinite variance, etc.).
The great philosophical question is why CLT applies so universally. The article explains it well as a consequence of the averaging process.
Alternatively, I've read that natural processes tend to exhibit Gaussian behaviour because there is a tendency towards equilibrium: forces, homeostasis, central potentials, and so on. This equilibrium drives the measurable quantity into the central region.
For processes such as prices in financial markets, with complicated feedback loops and reflexivity (in the Soros sense), the probability mass tends to end up in the non-central region, where the CLT does not apply.
The key principle is that you get CLT when a bunch of random factors add. Which happens in lots of places.
In finance, the effects of random factors tend to multiply. So you get a log-normal curve.
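A quick way to see both regimes is to simulate them: adding many small independent shocks gives a roughly normal sum, while compounding the same shocks multiplicatively gives a right-skewed, roughly log-normal product (take logs and the symmetry comes back). A sketch in Python; the ±2% uniform shocks are an arbitrary illustrative choice, not anything from the thread:

```python
import numpy as np

rng = np.random.default_rng(42)
n_factors, n_samples = 200, 50_000

# hypothetical small random shocks, uniform in +-2% (illustrative scale)
shocks = rng.uniform(-0.02, 0.02, size=(n_samples, n_factors))

additive = shocks.sum(axis=1)                   # adding shocks -> roughly normal
multiplicative = np.prod(1.0 + shocks, axis=1)  # compounding shocks -> roughly log-normal

def sample_skewness(a):
    c = a - a.mean()
    return (c ** 3).mean() / a.std() ** 3

# the sum is symmetric, the product is right-skewed, and the *log* of the
# product is a sum again, so it recovers the symmetry
skew_add = sample_skewness(additive)
skew_mult = sample_skewness(multiplicative)
skew_logmult = sample_skewness(np.log(multiplicative))
```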
As Taleb points out, though, the underlying assumptions behind log-normal break in large market movements, because in large movements things that were uncorrelated become correlated. The result is fat tails, where extreme combinations of events (aka "black swans") become far more likely than naively expected.
I know you know that and were just simplifying. Just wanted this fact to be better known for practitioners. Your comment on multiplicative processes is spot on.
Absolutely. The effect of straightforward correlations is a change in the variance, which can be measured in finance.
The effect of the nonlinear changing correlations is that future global behavior can't be predicted from local observations without a very sophisticated model.
The standard framing defines the Gaussian as this special object with a nice PDF, then presents the CLT as a surprising property it happens to have. But convolution of densities is the fundamental operation. If you keep convolving any finite-variance distribution with itself, the shape converges, and we called the limit "normal." The Gaussian is a fixed point of iterated convolution under √n rescaling. It earned its name by being the thing you inevitably get, not by having elegant closed-form properties.
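You can watch that fixed point attract in a few lines: take any lopsided finite-variance pmf, convolve it with itself repeatedly, standardize, and measure how far the result sits from the standard normal CDF. A rough sketch; the starting pmf and the crude sup-distance are just my illustrative choices:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def sup_distance_to_gaussian(pmf):
    # crude Kolmogorov-style distance between the standardized pmf and N(0,1)
    x = np.arange(len(pmf), dtype=float)
    mu = (x * pmf).sum()
    sd = sqrt(((x - mu) ** 2 * pmf).sum())
    cdf = np.cumsum(pmf)
    z = (x - mu) / sd  # standardize: the sqrt(n)-type rescaling
    return float(max(abs(cdf[i] - norm_cdf(z[i])) for i in range(len(pmf))))

# deliberately lopsided pmf on {0, 1, 2}; any finite-variance shape would do
p = np.array([0.6, 0.3, 0.1])

dist = p.copy()
distances = [sup_distance_to_gaussian(dist)]
for _ in range(6):
    dist = np.convolve(dist, dist)  # self-convolution doubles the summand count
    distances.append(sup_distance_to_gaussian(dist))
# the shape is pulled toward the Gaussian: the distance shrinks toward 0
```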
The most interesting assumptions to relax are the independence assumptions. They're way more permissive than the textbook version suggests. You need dependence to decay fast enough, and mixing conditions (α-mixing, strong mixing) give you exactly that: correlations that die off let the CLT go through essentially unchanged. Where it genuinely breaks is long-range dependence: fractionally integrated processes, Hurst parameter above 0.5, where autocorrelations decay hyperbolically instead of exponentially. There the √n normalization is wrong, you get different scaling exponents, and sometimes non-Gaussian limits.
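A minimal illustration of the first half of that, with arbitrary parameter choices on my part: an AR(1) process is dependent but geometrically mixing, and the sample mean is still asymptotically normal under √n scaling; what changes is that the limiting variance is the long-run variance (the sum of all autocovariances), not the marginal variance.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n, reps = 0.6, 1000, 4000  # illustrative values, chosen arbitrarily

# AR(1) with unit-variance innovations: dependent, but correlations decay
# geometrically (a textbook strong-mixing example)
eps = rng.standard_normal((reps, n))
x = np.zeros((reps, n))
for t in range(1, n):
    x[:, t] = rho * x[:, t - 1] + eps[:, t]

scaled_means = np.sqrt(n) * x.mean(axis=1)
# the CLT still holds, but the limiting variance is the long-run variance
# 1 / (1 - rho)^2 (= 6.25 here), not the marginal variance 1 / (1 - rho^2)
long_run_var = 1.0 / (1.0 - rho) ** 2
emp_var = scaled_means.var()
```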
There are also interesting higher order terms. The √n is specifically the rate that zeroes out the higher-order cumulants. Skewness (third cumulant) decays at 1/√n, excess kurtosis at 1/n, and so on up. Edgeworth expansions formalize this as an asymptotic series in powers of 1/√n with cumulant-dependent coefficients. So the Gaussian is the leading term of that expansion, and Edgeworth tells you the rate and structure of convergence to it.
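The cumulant bookkeeping is easy to verify exactly with a discrete distribution, since the pmf of an iid sum is just an n-fold self-convolution. The toy distribution below is an arbitrary choice of mine:

```python
import numpy as np

def nfold(pmf, n):
    # pmf of the sum of n iid copies = n-fold self-convolution
    out = np.array([1.0])
    for _ in range(n):
        out = np.convolve(out, pmf)
    return out

def std_moments(pmf):
    # standardized skewness and excess kurtosis of a pmf on {0, 1, 2, ...}
    x = np.arange(len(pmf), dtype=float)
    c = x - (x * pmf).sum()
    var = (c ** 2 * pmf).sum()
    skew = (c ** 3 * pmf).sum() / var ** 1.5
    exkurt = (c ** 4 * pmf).sum() / var ** 2 - 3.0
    return skew, exkurt

p = np.array([0.5, 0.3, 0.2])  # arbitrary skewed distribution on {0, 1, 2}

s1, k1 = std_moments(nfold(p, 1))
s4, k4 = std_moments(nfold(p, 4))
# cumulants of independent sums add, so quadrupling n exactly halves the
# standardized skewness (1/sqrt(n) decay) and quarters the excess kurtosis (1/n)
```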
It is the not knowing, the unknown unknowns and known unknowns which result in the max entropy distribution's appearance. When we know more, it is not Gaussian. That is known.
Exactly this. From this perspective, the CLT can be restated as: "it's interesting that when you add up a sufficiently large number of independent random variables, then even if you have a lot of specific detailed knowledge about each of those variables, in the end all you know about their sum is its mean and variance. But at least you do reliably know that much."
Came here basically looking to see this explanation. The normal dist is [approximately] common when summing lots of things we don't understand; otherwise, it isn't, really.
>natural processes tend to exhibit Gaussian behaviour
To me it results from two factors: 1. the Gaussian is the max-entropy distribution for a given variance, and 2. variance is the model of energy-limited behaviour, whereas physical processes are always under some energy limit. Basically it is the 2nd law.
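Point 1 is easy to sanity-check numerically: fix the variance at 1 and compare closed-form differential entropies of a few familiar densities. Picking Uniform and Laplace as the competitors is my choice; any other unit-variance density would also lose to the Gaussian.

```python
from math import log, pi, e, sqrt

sigma2 = 1.0  # compare everything at the same variance

# closed-form differential entropies (in nats) at unit variance
h_gauss = 0.5 * log(2 * pi * e * sigma2)      # N(0, 1)
h_uniform = log(2 * sqrt(3 * sigma2))         # Uniform[-sqrt(3), sqrt(3)], variance 1
h_laplace = 1.0 + log(2 * sqrt(sigma2 / 2))   # Laplace(0, b) with 2b^2 = 1

# among all densities with variance 1, the Gaussian has the largest entropy
```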
AFAIK they still dominate on clock rate, which I was surprised to see when doing some back of the envelope calculations regarding core counts.
I felt my 8-core i9 9900K was inadequate, so I shopped around for something AMD, and IIRC the core-count multiplier of the chip I found was dominated by the clock-rate multiplier, so it's possible that at full utilization my i9 is still towards the best I can get at the price.
Not sure if I’m the typical consumer in this case however.
Your 9900K at 5 GHz does run slower than a Ryzen 9800X3D at 5 GHz. A lot slower (1700 single-core Geekbench vs 3300, and just about any benchmark will tell the same story). Clock speed alone doesn't mean anything.
>8 Cores and 16 processing threads, based on AMD "Zen 5" architecture
which is the same thread geometry as my 9900K.
My main concerns at the time were:
1. More cores for running large workloads on k8s since I had just upgraded to 128G RAM
2. More thread level parallelism for my C++ code
Naively I thought that, ceteris paribus and assuming good L1 cache utilization, having more physical cores with a higher clock rate would be the ticket for 2.
Does the 9800X3D have a wider pipeline or is it some other microarchitectural feature that makes it faster?
You don't even need to go into the pipeline details. The 9800X3D has 8x the L2 cache, 6x the L3 cache, and 2x the memory bandwidth of the now 8-year-old i9 9900K. 3D V-Cache is pretty cool.
I purposely picked a CPU with the same thread geometry as your 9900K to avoid calls of "apples & oranges" or whatever. If you want more threads, the 9950X is right there in the same socket. Or Core Ultra 9 285k. Either of which will run circles around a 9900K in code compilation.
I think my i9 was released right after the Spectre and Meltdown mitigations in 2019, but I seem to remember even more recent vulns in that family… so that could also be a factor.
I replied to the sibling comment: I was making simplifying assumptions for two specific use cases and naively treated physical cores and clock rate as my variables.
Making funny memes of my friends, mainly. ChatGPT won't touch that; I haven't tried with Claude yet, but Grok keeps the group chat flush with laughing emojis.
That's all I use it for, really: things out of alignment with the other platforms, which IMO are better on every other metric (except having a sense of humour, of course).
Perhaps the lesson here is to upgrade your use case for AIs! All that power and that's your stumbling block? LOL, no disrespect.
Sure, I have no problem with what you're doing, and as things evolve I'm sure there'll be no problem, but there's countless other apps designed to do exactly what you've said.
As a Canadian I strongly felt it was GG to the Democrats when they didn’t run a second, competitive, knives-out primary for VP Harris.
For the second time, the party apparatus coalesced around a candidate who was ultimately trounced by someone wrongly considered unelectable.
Even if it was just theatre in the end, having a dramatic primary where the VP won would have made her look stronger and given her a chance to claw back some of the swing voters.
Or could have made her look worse because of the mud slinging between the candidates in the primary debates. You know that any criticism of a candidate by her competitors would have been trumpeted and distorted by Trump.
It feels like my era of education, 2012-2020 (a couple of degrees over that time), really deemphasized perf tuning; I even heard a few times that it was practically useless these days.
I had a computer organization course that came close but mostly just described microarchitecture and its historical development, not so much the practical ways to exploit it.
Actually taking the time to sit down and poke around with techniques was mind blowing. I grew up during the golden age of CPU and OS advancements 90s-00s and the rush from seeing ‘instructions per cycle’ > 1 captured a bit of that magic that CRUD app dev and wrangling k8s just doesn’t have.
It's not worth optimising if you don't have a problem. Focus your effort.
If this code runs once per week at midnight and needs to finish by 5am, and it currently takes 18 minutes, the fact that it could take 40 seconds isn't actually important, so spending meaningful engineering effort to go from 18 minutes to 40 seconds is a waste.
On the other hand, if the code runs on every toaster when it's started and ideally would finish before the toast pops up, but currently takes 4 minutes, then even getting it down to 2.5 minutes will make more customers happy [also, why the fuck are we running software in the toaster? But that's beside the point] and might well be worth doing.
The classic UX examples given are much closer to the latter category. When I type fast, the symbols ought to appear immediately, for example; if you can't do that, then you have a performance problem and optimisation is appropriate. But so much of what software engineers do all day isn't in that space and doesn't need to prioritise performance, so optimisation shouldn't be a priority.
>so much of what software engineers do all day isn't in that space
Seems to me that critical infra that supports a lot of modern computing is in that space though.
If you want to develop that depth of knowledge you need to go into HPC/scientific computing, trading, or accelerator hardware. I didn't get into this sometimes-crazy industry to NOT learn stuff and push the limits of my computer.
I’m glad I know about those applications now, but I wonder how much of a disservice we did to the industry by just focusing on frameworks and abstraction especially now that you can just sling a lot of that out with a prompt…
In the era of kubernetes and edge servers and everything running on battery power, that distinction between need and want becomes much fuzzier because of course we can bin pack the more efficient one better or preserve another five minutes of standby time even if the wall clock behavior is moot.
And I'd also argue that if you wait to use a skill until the need is dire, then you will 1) be shit at doing it and fail to achieve your goal, and 2) not have spent enough time on the cost/benefit analysis to know when things have changed over from want to need. Like the blind people I allude to in my top-level comment.
I went to a top-ten school. I had one semester of circuit design, one of EE, and a couple of computer architecture courses that went over MIPS and writing assembly.
I think there was some sort of curriculum transition going on with the introductory classes, though, because the difficulty from one homework assignment to the next in that first year of CS 1XX classes was pretty choppy. A friend and I made a game of one-upmanship, adding our own constraints to the easier assignments to make them more interesting, like taking larger inputs than the requirements called for and timing execution.
At my first job out of school the application was glacially slow, and I learned half of what I know about optimization in a short stint there through trial and error. It was a couple of jobs in before I ever got pushback and had to learn the human-factors element. But it (the optimization, balanced against readability, robustness, and extensibility) was a way I have always made pedestrian work more interesting. There are whole classes of code smells that also carry performance penalties, and at the peak of my restlessness I needed those to keep my sanity without irritating coworkers. I'm just cleaning up this messy code, nothing to see here.
Reading release notes for other tools bragging on their improvements. Dev tools and frameworks are more forthcoming about how and what than consumer apps, but there are standouts from time to time. I read a ton of SIGPLAN proceedings during that era. Fortune favors the prepared mind and you look a lot smarter when you’re confronting a problem or opportunity with a primed pump rather than coming in cold (being friendly with other disciplines in your company also helps there).