I've run inference on Intel Arc and it works just fine, so I'm not sure what you're talking about. I certainly didn't need Docker! I've never tried anything on AMD yet.
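For what it's worth, here's roughly what the no-Docker path looks like on Arc. This is just a sketch: it assumes a recent PyTorch build with XPU support (or intel_extension_for_pytorch installed), and the model name is only an example, not what I actually ran.

```python
# Minimal sketch: plain Hugging Face inference on an Intel Arc card, no Docker.
# Assumes a PyTorch build with XPU support; falls back to CPU otherwise.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

model_name = "Qwen/Qwen2.5-1.5B-Instruct"  # example only; pick whatever fits your VRAM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to(device)

inputs = tokenizer("Why is VRAM the bottleneck for local inference?", return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```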
I had the 16GB Arc, and it ran inference at the speed I expected, fitting roughly twice as many samples per batch as my 8GB card, which is about what you'd expect.
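Rough napkin math on why that happens: the batch scales with whatever VRAM is left over once the weights are resident, so a bigger card buys you a roughly proportionally bigger batch. Every number below is an illustrative assumption, not a measurement from my cards.

```python
# Back-of-the-envelope: batch size is whatever fits in VRAM after the weights.
# All numbers are illustrative assumptions, not measurements.
def max_batch_size(vram_gb: float, weights_gb: float, per_sample_gb: float) -> int:
    """Samples that fit once the weights and a bit of runtime overhead are resident."""
    overhead_gb = 0.5                      # assumed runtime/context overhead
    free_gb = vram_gb - weights_gb - overhead_gb
    return max(int(free_gb // per_sample_gb), 0)

weights_gb = 1.8      # e.g. a small quantized model
per_sample_gb = 0.35  # assumed activations/KV cache per sequence in the batch

print(max_batch_size(8, weights_gb, per_sample_gb))   # 8GB card  -> ~16
print(max_batch_size(16, weights_gb, per_sample_gb))  # 16GB card -> ~39
```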
Once the model is on the card, the disk is out of the picture: having enough VRAM to hold the model, the tokenizer, and everything else means nothing has to spill anywhere. And realistically, when I'm running loads on my 24GB 3090, the CPU sits maybe 4% over idle. My bottleneck for running large models is VRAM, not anything else.
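You can sanity-check that yourself with a few lines. Sketch only: it assumes an NVIDIA card (so torch.cuda) and psutil installed; call report() before loading and again once the model is on the GPU and generating.

```python
# Print how much VRAM this process has allocated and how busy the CPU is.
# Assumes torch with CUDA and psutil; numbers come from your own run, not mine.
import torch
import psutil

def report(tag: str) -> None:
    used_gb = torch.cuda.memory_allocated() / 1e9
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    cpu_pct = psutil.cpu_percent(interval=1.0)  # averaged over one second
    print(f"{tag}: VRAM {used_gb:.1f}/{total_gb:.1f} GB, CPU {cpu_pct:.0f}%")

report("before loading")
# ... load the model onto the GPU and start generating, then call:
report("model loaded / generating")
```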
If I needed to train (from scratch or otherwise), I'd just rent time somewhere, even with a 128GB card locally, because for training, more tensor throughput is obviously better.
And you're getting downvoted because LM Studio, llama.cpp, and sd-webui all run just fine for inference on our non-datacenter, non-NVLink GPUs at a fifteenth of the cost.
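Case in point, the whole llama.cpp route is a handful of lines through its Python bindings. Sketch only: the GGUF path is a placeholder, and n_gpu_layers=-1 assumes the quantized model actually fits in VRAM.

```python
# Minimal llama-cpp-python sketch: offload every layer to a consumer GPU and run a completion.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload all layers; assumes the model fits in VRAM
    n_ctx=4096,        # context window
)

out = llm("Explain why consumer GPUs are fine for inference.", max_tokens=128)
print(out["choices"][0]["text"])
```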