The time was taken by the code around the assembly language. It was an abstraction that someone happened to write inefficiently. The fact that someone can write a costly abstraction doesn't take away from Rust, any more than the possibility of doing the same in C++ would take away from whether C++ has them.
It was caused by the code around it, but the actual CPU cycles stalled out in the assembly language, making it much harder to find the problem.
That's how I understood the post.
Anyway, there's nothing wrong with Rust FFI. The overhead was there because this function wanted to support two options and didn't implement that in the best way.
Fixing it took wizardry.