None of those are particularly optimized to make the most of the hardware capabi...

Voultapher · on Oct 6, 2023

Sorry, but what exactly do you base that on? They are very much designed to exploit the out-of-order, speculative, superscalar nature of modern CPUs. vqsort is pure, well-done SIMD, and yet how do you explain that graph https://github.com/Voultapher/sort-research-rs/blob/main/wri... if your claims are true? Also, there are reasons why SIMD is not a good fit for every situation.

janwas · on Oct 7, 2023

That's before the performance bugfix we talked about, right? (We re-initialized the RNG on every sort.)

Vqsort performance on M1 is indeed about half that of AVX-512. The NEON instruction set is missing some important operations for Quicksort, including Compress and popcount of vector masks.
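For readers unfamiliar with the two operations mentioned here, the following is a scalar sketch of what they do inside one Quicksort partition step. It is not vqsort's code: the function names `compress` and `partition_step` are hypothetical, and plain Python lists stand in for vector lanes and mask bits.

```python
def compress(lanes, mask):
    """Scalar model of a SIMD Compress: pack the selected lanes to the front."""
    return [x for x, keep in zip(lanes, mask) if keep]


def partition_step(lanes, pivot):
    """Partition one 'vector' of keys around a pivot (hypothetical helper)."""
    # Compare every lane against the pivot, producing one mask bit per lane.
    mask = [x < pivot for x in lanes]
    # Compress gathers the lanes below the pivot, then the remaining lanes.
    less = compress(lanes, mask)
    rest = compress(lanes, [not keep for keep in mask])
    # The popcount of the mask says how far to advance the output position.
    n_less = sum(mask)
    return less, rest, n_less


less, rest, n_less = partition_step([5, 1, 8, 3], pivot=4)
```

With hardware support, each of these is a single instruction per vector. Without it, they have to be emulated with several instructions, which would be consistent with the gap described above.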

Voultapher · on Oct 12, 2023

Yes, like everything before the addendum, the results are for the same tested vqsort version without the bug-fix, as I motivate in the section "Author's conclusion and opinion" at the end. I suspect, though, that even the new version will not outperform crumsort or ipnsort on this machine.

mgaunard · on Oct 7, 2023

There are plenty of other SIMD sort libraries, some of which are specifically optimized for AVX2, AVX512, NEON or SVE.

janwas · on Oct 6, 2023

Vqsort is indeed using SIMD :)

mgaunard · on Oct 7, 2023

I just looked at the link he gave and it wasn't.

Intel publishes its own fast library for this, which isn't even in the benchmark list.

janwas · on Oct 7, 2023

Measurements (by the same author) for Intel's x86-simd-sort and vqsort are here: https://github.com/Voultapher/sort-research-rs/blob/main/wri...

mgaunard · on Oct 7, 2023

Pretty impressive results on the new version of vqsort.

janwas · on Oct 8, 2023

Thanks :) We fixed a performance bug (re-seeding an RNG on each sort).
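As an illustration of this class of bug, here is a minimal sketch of the difference between re-seeding a generator on every call and seeding it once. It is not vqsort's code: the function names, the seed value, and the use of the RNG for pivot selection are all assumptions made for the example.

```python
import random

SEED = 12345  # arbitrary value chosen for this sketch

# Seeded once, then reused across calls.
_shared_rng = random.Random(SEED)


def pick_pivot_reseeding(values):
    """Pays the RNG setup cost on every call (the pattern to avoid)."""
    rng = random.Random(SEED)  # re-initialized each time
    return values[rng.randrange(len(values))]


def pick_pivot_shared(values):
    """Reuses the already-seeded generator, so each call only draws a number."""
    return values[_shared_rng.randrange(len(values))]


data = [9, 4, 7, 1, 8]
pivot_a = pick_pivot_reseeding(data)
pivot_b = pick_pivot_shared(data)
```

The setup cost is paid once per sort in the first variant, so it matters most when many small inputs are sorted.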