Hacker News

Unfortunately, SIMD in Rust tends to be pretty painful if you want to do graceful runtime detection of a given SIMD extension (instead of making it a hard requirement for your program to run at all).

The major problem is that Rust essentially requires you to annotate every (!) function in your whole call stack with e.g. `#[target_feature(enable = "avx2")]` to make sure the SIMD intrinsics actually get inlined (if they're not inlined, the performance is horrible, which defeats the point of using SIMD). This makes it very hard to build any reasonable abstractions, because you need to hardcode this all over your code.

You can't have e.g. a `DataStructure<S>` where `S` is the SIMD ISA, so that `DataStructure<AVX2>` or `DataStructure<SSE>` gives you a nicely specialized version for a given instruction set. Instead you need to copy-paste the whole thing with changed `target_feature` attributes (or use a procedural macro that does the copy-pasting for you) and end up with two entirely separate `DataStructureAVX2` and `DataStructureSSE` types.
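To make the annotation burden concrete, here's a minimal sketch (all function names are invented, and the AVX2 kernel is a trivial stand-in for real vectorized work): the attribute has to be repeated on every helper in the kernel's call chain, while a safe wrapper does the runtime dispatch.

```rust
// Hypothetical sketch; `sum`/`horizontal_add` are invented names,
// not from any real crate.
#[cfg(target_arch = "x86_64")]
mod avx2 {
    use std::arch::x86_64::*;

    #[target_feature(enable = "avx2")]
    pub unsafe fn sum(data: &[i32; 8]) -> i32 {
        let v = _mm256_loadu_si256(data.as_ptr() as *const __m256i);
        horizontal_add(v)
    }

    // The attribute must be repeated on this helper too; drop it and
    // the intrinsics may no longer end up inlined into an AVX2 context,
    // silently wrecking the codegen.
    #[target_feature(enable = "avx2")]
    unsafe fn horizontal_add(v: __m256i) -> i32 {
        let mut out = [0i32; 8];
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
        out.iter().sum()
    }
}

pub fn sum(data: &[i32; 8]) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound: the feature was verified at runtime.
            return unsafe { avx2::sum(data) };
        }
    }
    data.iter().sum()
}
```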



It's not quite that dire. The `memchr` crate uses abstraction to limit the code duplication: https://github.com/BurntSushi/memchr/blob/1230fc5c638a4d922f...

That is, the `memchr` crate has a `Vector` trait that is generic over the vector type, which is essentially your `DataStructure<S>` where `S` is the ISA. (Being generic over the vector type rather than the ISA isn't load-bearing; I could have done the ISA instead. But in `memchr`'s case, the vector type implies the ISA.)

It relies on `#[inline(always)]` to work. But you can write the algorithm generically once: https://github.com/BurntSushi/memchr/blob/1230fc5c638a4d922f...

And the entry point into a specific instantiation of the generic algorithm is where you apply `#[target_feature(enable = "foo")]`: https://github.com/BurntSushi/memchr/blob/1230fc5c638a4d922f...
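Putting those pieces together, a condensed sketch of the pattern might look like this (the trait and function names are invented for illustration, not `memchr`'s actual API, and the trait is far smaller than a real one would be):

```rust
// One generic algorithm, #[inline(always)] internals, and a single
// #[target_feature] entry point per ISA.
trait Vector: Copy {
    const LANES: usize;
    unsafe fn splat(byte: u8) -> Self;
    unsafe fn load_unaligned(ptr: *const u8) -> Self;
    // Bitmask with bit i set iff lane i equals the needle lane.
    unsafe fn eq_mask(self, needle: Self) -> u32;
}

// Written once, generically; #[inline(always)] is what lets this body
// be compiled inside the #[target_feature] entry point below.
#[inline(always)]
unsafe fn find_generic<V: Vector>(haystack: &[u8], needle: u8) -> Option<usize> {
    let vneedle = V::splat(needle);
    let mut i = 0;
    while i + V::LANES <= haystack.len() {
        let chunk = V::load_unaligned(haystack.as_ptr().add(i));
        let mask = chunk.eq_mask(vneedle);
        if mask != 0 {
            return Some(i + mask.trailing_zeros() as usize);
        }
        i += V::LANES;
    }
    // Scalar scan for the tail that doesn't fill a whole vector.
    haystack[i..].iter().position(|&b| b == needle).map(|j| i + j)
}

#[cfg(target_arch = "x86_64")]
mod avx2 {
    use super::*;
    use std::arch::x86_64::*;

    impl Vector for __m256i {
        const LANES: usize = 32;
        #[inline(always)]
        unsafe fn splat(byte: u8) -> Self {
            _mm256_set1_epi8(byte as i8)
        }
        #[inline(always)]
        unsafe fn load_unaligned(ptr: *const u8) -> Self {
            _mm256_loadu_si256(ptr as *const __m256i)
        }
        #[inline(always)]
        unsafe fn eq_mask(self, needle: Self) -> u32 {
            _mm256_movemask_epi8(_mm256_cmpeq_epi8(self, needle)) as u32
        }
    }

    // The one place the feature attribute is applied: the monomorphized
    // instantiation of the generic algorithm.
    #[target_feature(enable = "avx2")]
    pub unsafe fn find(haystack: &[u8], needle: u8) -> Option<usize> {
        find_generic::<__m256i>(haystack, needle)
    }
}

pub fn find(haystack: &[u8], needle: u8) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound: the feature was checked at runtime.
            return unsafe { avx2::find(haystack, needle) };
        }
    }
    haystack.iter().position(|&b| b == needle)
}
```

The `unsafe` on the trait methods and the generic function is what point 4 of the reply below this complains about: the safety obligation ("only call this when the right feature is available") has to be threaded through by convention rather than checked by the compiler.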


> It relies on `#[inline(always)]` to work.

This is a workaround, but it's still very painful, and there are still many problems with it:

1) you need to design the whole thing with this in mind; you can't just take some existing code that's generic over a `T` and get it to use SIMD,

2) `#[inline(always)]` can't be used with `#[target_feature]` at the same time,

3) you need `#[inline(always)]` on your whole call stack (I often don't want to inline everything, but I still want to propagate that we're in a SIMD-safe context, which you'd normally do with `#[target_feature]`, but that's incompatible with being able to abstract over things),

4) you have to sprinkle `unsafe` everywhere (which is especially annoying since we recently gained the ability to call a lot of the intrinsics without any `unsafe`),

5) you have to manually make sure you're not calling something you should not be calling (`#[target_feature]` is great here because it prevents you from accidentally calling e.g. AVX2 intrinsics if you're not on the AVX2 codepath),

6) for any mistake you make (e.g. a missing annotation somewhere), the compiler will just silently not inline the intrinsics, completely killing your performance, and you're entirely on your own debugging it.

I work with SIMD a lot in Rust, and unfortunately currently it's a miserable experience. It's not that bad when you're writing e.g. an algorithm which fits in a single function, but once you want to write something bigger with more sophisticated abstractions and/or you want to integrate SIMD into existing code then it becomes a world of pain.


I don't deny that there are downsides. But your original comment was too absolute, to the point of being misleading; hence my clarification. It is possible to avoid the duplication and build some sound abstractions.

I use the same technique in the `aho-corasick` crate (which is more sophisticated than `memchr`), and it works there too.



