
Good for training, definitely a bad idea for inference. But if you're spending that much money, why not just buy the equivalent in GPUs? You could buy ten 12GB 3060s for that price.
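Back of the envelope, assuming roughly $300 per used 12GB 3060 (my assumed street price, not a quote):

  # Rough cost/VRAM comparison; the GPU price is an assumption.
  gpu_price_usd = 300   # assumed price per used 12GB RTX 3060
  gpu_vram_gb = 12
  count = 10
  print(f"${gpu_price_usd * count} buys {gpu_vram_gb * count} GB of aggregate VRAM")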


Powering ten 3060s, and finding a machine that can physically accept ten GPUs, become non-negligible hurdles to overcome.
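For a sense of scale, a minimal power sketch (170 W is the 3060's rated board power; the platform overhead is an assumption):

  # Rough power budget for a hypothetical ten-GPU box.
  gpu_tdp_w = 170       # RTX 3060 rated board power
  count = 10
  platform_w = 200      # assumed CPU, fans, drives, PSU losses
  total_w = gpu_tdp_w * count + platform_w
  print(f"~{total_w} W under load")  # ~1900 W, over a 15 A / 120 V circuit's ~1800 W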


For LLM developers, is there really no advantage to having a big block of unified memory, rather than a bunch of devices with a small amount of memory each?
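You can still run one big model across many small cards; e.g. Hugging Face Accelerate shards layers across GPUs via device_map="auto". A minimal sketch (the model id and 4-bit quantization are illustrative), with the caveat that the hop between cards at each shard boundary adds latency a single unified pool avoids:

  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

  model_id = "meta-llama/Llama-2-70b-hf"  # illustrative; any causal LM works

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  # device_map="auto" spreads layers across all visible GPUs, so a model
  # too big for any one 12 GB card can still load across ten of them.
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      device_map="auto",
      quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # fit in ~120 GB
  )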


MoE inference wouldn't be terrible. That being said, there's not a good MoE model in the 70-160B range as far as I'm aware.
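For intuition on why MoE suits a big-but-slow memory pool: decode speed is roughly memory bandwidth divided by the bytes of weights read per token, and an MoE model only reads the routed experts. A rough ceiling, using Mixtral 8x7B's ~13B active parameters per token (the bandwidth figure and fp16 weights are assumptions):

  # Upper bound on decode throughput: bandwidth / bytes read per token.
  bandwidth_gb_s = 400      # assumed unified-memory bandwidth
  active_params = 13e9      # Mixtral 8x7B activates ~13B of its ~47B params
  bytes_per_param = 2       # fp16 weights
  bytes_per_token = active_params * bytes_per_param
  print(f"~{bandwidth_gb_s * 1e9 / bytes_per_token:.0f} tokens/s ceiling")  # ~15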



