I wrote this years ago (the go.mod said Go 1.12) just for fun. I thought I already had it in a Gist or on GitHub, and I uploaded it yesterday in response to this post.
One thing I remember trying was writing it in assembly… which actually made things worse, because Go can't inline functions written in assembly. But that's how I learned Go's assembler syntax.
https://github.com/ncruces/fastmath/blob/main/fast.go
Also see this StackOverflow question:
https://stackoverflow.com/questions/32042673/optimized-low-a...