Hacker News

Unfortunately, SIMD in Rust tends to be pretty painful if you want to do graceful runtime detection of a given SIMD extension (instead of making it a hard requirement for your program to run at all).

The major problem is that Rust essentially requires you to annotate every (!) function in your whole call stack with e.g. `#[target_feature(enable = "avx2")]` to make sure the SIMD intrinsics actually get inlined (if they're not inlined, the performance is horrible, which defeats the point of using SIMD). This makes it very hard to build any reasonable abstractions, because you need to hardcode this all over your code.

You can't have e.g. a `DataStructure<S>` where `S` is the SIMD ISA, so that `DataStructure<AVX2>` or `DataStructure<SSE>` gives you a nicely specialized version for a given instruction set. Instead you need to copy-paste the whole thing with changed `target_feature` attributes (or use a procedural macro that does the copy-pasting for you) and end up with two entirely separate `DataStructureAVX2` and `DataStructureSSE` types.
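To make the annotation burden concrete, here's a minimal sketch (all function names are invented, and the AVX2 kernel is a trivial stand-in for real vectorized work): the attribute has to be repeated on every helper in the kernel's call chain, while a safe wrapper does the runtime dispatch.

```rust
// Hypothetical sketch; `sum`/`horizontal_add` are invented names,
// not from any real crate.
#[cfg(target_arch = "x86_64")]
mod avx2 {
    use std::arch::x86_64::*;

    #[target_feature(enable = "avx2")]
    pub unsafe fn sum(data: &[i32; 8]) -> i32 {
        let v = _mm256_loadu_si256(data.as_ptr() as *const __m256i);
        horizontal_add(v)
    }

    // The attribute must be repeated on this helper too; drop it and
    // the intrinsics may no longer end up inlined into an AVX2 context,
    // silently wrecking the codegen.
    #[target_feature(enable = "avx2")]
    unsafe fn horizontal_add(v: __m256i) -> i32 {
        let mut out = [0i32; 8];
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
        out.iter().sum()
    }
}

pub fn sum(data: &[i32; 8]) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound: the feature was verified at runtime.
            return unsafe { avx2::sum(data) };
        }
    }
    data.iter().sum()
}
```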



It's not quite that dire. The `memchr` crate uses abstraction to limit the code duplication: https://github.com/BurntSushi/memchr/blob/1230fc5c638a4d922f...

That is, the `memchr` crate has a `Vector` trait that is generic over the vector type, which is essentially your `DataStructure<S>` where `S` is the ISA. (Being generic over the vector type rather than the ISA isn't load-bearing; I could have done the ISA instead. But in `memchr`'s case, the vector type implies the ISA.)

It relies on `#[inline(always)]` to work. But you can write the algorithm generically once: https://github.com/BurntSushi/memchr/blob/1230fc5c638a4d922f...

And the entry point into a specific instantiation of the generic algorithm is where you apply `#[target_feature(enable = "foo")]`: https://github.com/BurntSushi/memchr/blob/1230fc5c638a4d922f...
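Putting those pieces together, a condensed sketch of the pattern might look like this (the trait and function names are invented for illustration, not `memchr`'s actual API, and the trait is far smaller than a real one would be):

```rust
// One generic algorithm, #[inline(always)] internals, and a single
// #[target_feature] entry point per ISA.
trait Vector: Copy {
    const LANES: usize;
    unsafe fn splat(byte: u8) -> Self;
    unsafe fn load_unaligned(ptr: *const u8) -> Self;
    // Bitmask with bit i set iff lane i equals the needle lane.
    unsafe fn eq_mask(self, needle: Self) -> u32;
}

// Written once, generically; #[inline(always)] is what lets this body
// be compiled inside the #[target_feature] entry point below.
#[inline(always)]
unsafe fn find_generic<V: Vector>(haystack: &[u8], needle: u8) -> Option<usize> {
    let vneedle = V::splat(needle);
    let mut i = 0;
    while i + V::LANES <= haystack.len() {
        let chunk = V::load_unaligned(haystack.as_ptr().add(i));
        let mask = chunk.eq_mask(vneedle);
        if mask != 0 {
            return Some(i + mask.trailing_zeros() as usize);
        }
        i += V::LANES;
    }
    // Scalar scan for the tail that doesn't fill a whole vector.
    haystack[i..].iter().position(|&b| b == needle).map(|j| i + j)
}

#[cfg(target_arch = "x86_64")]
mod avx2 {
    use super::*;
    use std::arch::x86_64::*;

    impl Vector for __m256i {
        const LANES: usize = 32;
        #[inline(always)]
        unsafe fn splat(byte: u8) -> Self {
            _mm256_set1_epi8(byte as i8)
        }
        #[inline(always)]
        unsafe fn load_unaligned(ptr: *const u8) -> Self {
            _mm256_loadu_si256(ptr as *const __m256i)
        }
        #[inline(always)]
        unsafe fn eq_mask(self, needle: Self) -> u32 {
            _mm256_movemask_epi8(_mm256_cmpeq_epi8(self, needle)) as u32
        }
    }

    // The one place the feature attribute is applied: the monomorphized
    // instantiation of the generic algorithm.
    #[target_feature(enable = "avx2")]
    pub unsafe fn find(haystack: &[u8], needle: u8) -> Option<usize> {
        find_generic::<__m256i>(haystack, needle)
    }
}

pub fn find(haystack: &[u8], needle: u8) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound: the feature was checked at runtime.
            return unsafe { avx2::find(haystack, needle) };
        }
    }
    haystack.iter().position(|&b| b == needle)
}
```

The `unsafe` on the trait methods and the generic function is what point 4 of the reply below this complains about: the safety obligation ("only call this when the right feature is available") has to be threaded through by convention rather than checked by the compiler.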


> It relies on `#[inline(always)]` to work.

This is a workaround, but it's still very painful, and there are still many problems with it:

1) you need to design the whole thing with this in mind; you can't just take some existing code that's generic over a `T` and get it to use SIMD,

2) `#[inline(always)]` can't be used with `#[target_feature]` at the same time,

3) you need `#[inline(always)]` on your whole call stack (I often don't want to inline everything, but I still want to propagate that we're in a SIMD-safe context, which you'd normally do with `#[target_feature]`, but that's incompatible with being able to abstract over things),

4) you have to sprinkle `unsafe` everywhere (which is especially annoying since we recently gained the ability to call a lot of the intrinsics without any `unsafe`),

5) you have to manually make sure you're not calling something you should not be calling (`#[target_feature]` is great here because it prevents you from accidentally calling e.g. AVX2 intrinsics if you're not on the AVX2 codepath),

6) for any mistake you make (e.g. a missing annotation somewhere), the compiler will just silently not inline the intrinsics, completely killing your performance, and you're entirely on your own debugging it.

I work with SIMD a lot in Rust, and unfortunately currently it's a miserable experience. It's not that bad when you're writing e.g. an algorithm which fits in a single function, but once you want to write something bigger with more sophisticated abstractions and/or you want to integrate SIMD into existing code then it becomes a world of pain.


I don't deny that there are downsides. But your original comment was too absolute, to the point of being misleading; hence my clarification. It is possible to avoid the duplication and build some sound abstractions.

I use the same technique in the `aho-corasick` crate (which is more sophisticated than `memchr`), and it works there too.



