That is a really good point, and the 3090s do have a surprising amount of VRAM. For many smaller models that's sufficient. However, where I work (without going into specifics), the models are large enough that total VRAM is crucial, and so are the PCIe lanes feeding each card, the speed of local storage, and the networking both between cards on the same node and between nodes.
The moment a model grows bigger than any single GPU's VRAM, training it becomes orders of magnitude more difficult.
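To make that concrete, here's a minimal back-of-envelope sketch. It uses the common rule of thumb that mixed-precision training with Adam needs roughly 16 bytes of state per parameter (bf16 weights + fp32 master copy + two optimizer moments + gradients), before counting activations; the model sizes and the 24 GB figure for a 3090 are just illustrative:

```python
# Rough estimate of GPU memory needed to train a model with Adam in
# mixed precision. ~16 bytes/param is a common rule of thumb:
# 2 (bf16 weights) + 4 (fp32 master) + 4 + 4 (Adam moments) + 2 (grads).
# Activations come on top of this, so real numbers are higher.

def training_memory_gb(num_params: float, bytes_per_param: int = 16) -> float:
    """Approximate memory for weights, gradients, and optimizer state."""
    return num_params * bytes_per_param / 1e9

if __name__ == "__main__":
    for name, params in [("7B", 7e9), ("70B", 70e9)]:
        need = training_memory_gb(params)
        print(f"{name}: ~{need:.0f} GB of training state, "
              f"~{need / 24:.0f}x one 24 GB RTX 3090 (activations excluded)")
```

Even a 7B model wants over 100 GB of training state, so it has to be sharded across cards, and at that point the PCIe/NVLink and inter-node bandwidth mentioned above become the bottleneck rather than any one card's VRAM.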