
This part caught my eye:

"Using a half-precision FSDP full shard with a 1024 sequence length and a micro batch size of 2 required 63GB of VRAM on each of the eight A100 80 GB GPUs. The training, lasting three epochs, took just 20 minutes. The total cost for the VM was $8.88 per hour, resulting in $3, not including the time for experiments and bug fixes."

I wondered where you could rent cycles on a machine like that. A quick Google search found that AWS offers the p4d.24xlarge: the on-demand price is $20.1755 per hour, while Spot is only $8.99 (I guess it's gone up?).

Cool to know I could fine-tune for only ~$3.
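For what it's worth, the quoted numbers check out, assuming the 20-minute run is billed at the hourly rate:

```python
# Back-of-envelope check of the quoted figures (rates from the comments above)
spot_rate = 8.88     # $/hr for the 8x A100 80GB VM quoted in the article
run_minutes = 20     # three epochs of training

cost = spot_rate * run_minutes / 60
print(f"${cost:.2f}")  # the "~$3" figure, before experiments and bug fixes
```

At the AWS on-demand rate of $20.1755/hr the same run would be about $6.73, so it stays cheap either way.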



I've been using vast.ai for a very long time. It's basically a GPU marketplace where people rent and lease GPUs. There are a lot of VMs with 4090s, and beasts like 8x A100 80GB are also available from time to time.


I used vast.ai for some fine-tuning just a few days ago. It is indeed pretty great, though some servers fail to start up properly or have weird performance issues. I also wish they had more templates to try.


Yeah, it works pretty well for the price; you just need to be comfortable running code and putting data on random people's computers (which I am, for certain things). Someone on HN posted a script, or a snippet of its output, from mass-testing vast.ai servers for connectivity and configuration and auto-labeling them via their API. Wish I could find it now... maybe with the search?


There are all these 8x 4090 machines on vast.ai running in ASRock EPYC servers, and I just want to know where the hell they're all coming from. I want to see pictures of these setups: there are no off-the-shelf 4090s with blower coolers, and water-cooling that many cards together takes a lot of custom hardware. And the backstories, because these are 4090s rather than datacenter cards. Are hobbyists just building octo-GPU $18k EPYC rigs for fun? (I even saw one with 9x 4090s! Gotta use up those OCuLink PCIe lanes.) It's not ex-mining hardware, since the 4090 landed after the Ethereum proof-of-stake changeover.

I've been looking for an answer to this every time I check out the current vast.ai console.


There were some posts recently about 4090s being mass-imported into China, with the chips being desoldered and moved onto lower-height reference boards with blower fans.


I think TensorDock and vast.ai are cheaper than AWS. Lambda Labs can be as well, but they seem to only offer reserved instances now.


We are building dstack.ai, an open-source tool that helps run anything on vast.ai and TensorDock. Happy to hear your feedback.


Happy user of dstack.ai here. The simplicity of spinning up a machine with my required set of GPUs and memory, from my vendor of choice, and getting an endpoint I can easily access via SSH and VS Code has been game-changing for me.

I once had some trouble setting it up, and the founder literally got on a Zoom call with me to help navigate through things. Couldn't recommend them enough!


runpod.io is another good-and-cheap option


It caught mine too. I'm weighing several alternatives for "fine-tuning the model fine-tuning", meaning the back-and-forth, trial-and-error work that precedes running the full training set.

My goal is to fine-tune a model on our codebase. I find RAG too orthopedic; I'd really like to train the model on what each part of the code is and how we do things, and see how it responds with a more complete perspective that goes beyond context.

The options I've considered for pre-fine-tuning:

- using a service like vast.ai, RunPod, Gradient, or similar

- using Google Colab

- getting a more powerful MacBook, an M3 Max with plenty of RAM


Excuse the ignorance, but are you using these instances to fine-tune a "fresh install" of a model, and then, once you've finished fine-tuning, downloading the whole model from the instance for use somewhere else?


First I download the weights of the base pre-trained model to the VM instance, then I upload my data there. Next I run either a LoRA or a full fine-tune. When training finishes, I download the adapters (for LoRA) or the full weights (for a full fine-tune) from the VM and run inference on a much less expensive instance (usually a 3090).
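The reason it's worth distinguishing the two downloads is size. A LoRA adapter replaces each adapted d×d weight matrix with two low-rank factors A (r×d) and B (d×r), so you only pull back a few megabytes instead of the full model. A rough sketch, where the model dimensions are assumptions for a Llama-7B-style architecture, not figures from this thread:

```python
def lora_adapter_params(n_layers, d_model, rank, mats_per_layer):
    # Each adapted square weight matrix (d_model x d_model) gains two
    # low-rank factors: A (rank x d_model) and B (d_model x rank),
    # i.e. 2 * rank * d_model trainable parameters per matrix.
    return n_layers * mats_per_layer * 2 * rank * d_model

# Assumed dimensions: 32 layers, hidden size 4096, rank-16 LoRA on the
# q and v projections (a common but not universal choice).
adapter = lora_adapter_params(n_layers=32, d_model=4096, rank=16, mats_per_layer=2)
full_params = 7e9

print(f"adapter: {adapter/1e6:.1f}M params, ~{adapter*2/1e6:.0f} MB in fp16")
print(f"full:    {full_params/1e9:.0f}B params, ~{full_params*2/1e9:.1f} GB in fp16")
```

With these assumptions the adapter is roughly a thousandth the size of the full weights, which is why the LoRA round trip off the rented VM is so much cheaper than shipping a fully fine-tuned model.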


Check out other prices on https://gpumonger.com/

Disclosure: I collected the data and built the site myself. It has a ton of comparison data for GPU clouds.



