
This part caught my eye:

"Using a half-precision FSDP full shard with a 1024 sequence length and a micro batch size of 2 required 63GB of VRAM on each of the eight A100 80 GB GPUs. The training, lasting three epochs, took just 20 minutes. The total cost for the VM was $8.88 per hour, resulting in $3, not including the time for experiments and bug fixes."

I wondered where you could rent cycles on a machine like that. A quick Google search found that AWS offers the p4d.24xlarge: the on-demand price is $20.1755 per hour, while Spot is only $8.99 (I guess it's gone up?).

Cool to know I could fine-tune for only ~$3.
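For what it's worth, the quoted numbers check out, assuming the 20-minute run is billed at the hourly rate:

```python
# Back-of-envelope check of the quoted figures (rates from the comments above)
spot_rate = 8.88     # $/hr for the 8x A100 80GB VM quoted in the article
run_minutes = 20     # three epochs of training

cost = spot_rate * run_minutes / 60
print(f"${cost:.2f}")  # the "~$3" figure, before experiments and bug fixes
```

At the AWS on-demand rate of $20.1755/hr the same run would be about $6.73, so it stays cheap either way.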



I've been using vast.ai for a very long time. It's basically a GPU marketplace where people rent and lease GPUs. There are a lot of VMs with 4090s, and beasts like 8x A100 80GB are also available from time to time.


I used vast.ai for some fine-tuning just a few days ago. It is indeed pretty great, though some servers fail to start up properly or have weird performance issues. I also wish they had more templates to try.


Yeah, it works pretty well for the price; you just need to be comfortable running code and putting data on random people's computers (which I am, for certain things). Someone on HN posted a script, or a snippet of its output, from mass-testing vast.ai servers for connectivity and configuration and auto-labeling them via their API. Wish I could find it now... maybe with the search?


There are all these 8x 4090 machines on vast.ai running in ASRock EPYC servers, and I just want to know where the hell they're all coming from. I want to see pictures of these setups: there are no off-the-shelf 4090s with blower coolers, and water-cooling that many cards together takes a lot of custom hardware. And the backstories, because these are 4090s rather than datacenter cards. Are hobbyists just building octo-GPU $18k EPYC rigs for fun? (I even saw one with 9x 4090s! Gotta use up those OCuLink PCIe lanes.) It's not ex-mining hardware, since the 4090 landed after the Ethereum proof-of-stake changeover.

I've been looking for an answer to this every time I check out the current vast.ai console.


There were some posts recently about 4090s being mass-imported into China, with the chips being desoldered and moved onto lower-height reference boards with blower fans.


I think TensorDock and vast.ai are cheaper than AWS. Lambda Labs can be as well, but they seem to only offer reserved instances now.


We are building dstack.ai, an open-source tool that helps run anything on vast.ai and TensorDock. Happy to hear your feedback.


Happy user of dstack.ai here. The simplicity of spinning up a machine with my required set of GPUs and memory, from my vendor of choice, and getting an endpoint I can easily access via SSH and VS Code has been game-changing for me.

I once had some trouble setting it up, and the founder literally got on a Zoom call with me to help navigate through things. Couldn't recommend them enough!


runpod.io is another good-and-cheap option


It caught mine too. I'm weighing several alternatives for "fine-tuning the model fine-tuning", meaning the back-and-forth, trial-and-error work that precedes running the full training set.

My goal is to fine-tune a model on our codebase. I find RAG too orthopedic; I'd really like to train the model on what each part of the code is and how we do things, and see how it responds with a more complete perspective that goes beyond context.

The options I've considered for pre-fine-tuning:

- using a service like vast.ai, RunPod, Gradient, or similar

- using Google Colab

- getting a more powerful MacBook, an M3 Max with plenty of RAM


Excuse the ignorance, but are you using these instances to fine-tune a "fresh install" of a model, and then, once you've finished fine-tuning, downloading the whole model from the instance for use somewhere else?


First I download the weights of the base pre-trained model to the VM instance, then I upload my data there. Next I run either a LoRA or a full fine-tune. When training finishes, I download the adapters (for LoRA) or the full weights (for a full fine-tune) from the VM and run inference on a much less expensive instance (usually a 3090).
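The reason it's worth distinguishing the two downloads is size. A LoRA adapter replaces each adapted d×d weight matrix with two low-rank factors A (r×d) and B (d×r), so you only pull back a few megabytes instead of the full model. A rough sketch, where the model dimensions are assumptions for a Llama-7B-style architecture, not figures from this thread:

```python
def lora_adapter_params(n_layers, d_model, rank, mats_per_layer):
    # Each adapted square weight matrix (d_model x d_model) gains two
    # low-rank factors: A (rank x d_model) and B (d_model x rank),
    # i.e. 2 * rank * d_model trainable parameters per matrix.
    return n_layers * mats_per_layer * 2 * rank * d_model

# Assumed dimensions: 32 layers, hidden size 4096, rank-16 LoRA on the
# q and v projections (a common but not universal choice).
adapter = lora_adapter_params(n_layers=32, d_model=4096, rank=16, mats_per_layer=2)
full_params = 7e9

print(f"adapter: {adapter/1e6:.1f}M params, ~{adapter*2/1e6:.0f} MB in fp16")
print(f"full:    {full_params/1e9:.0f}B params, ~{full_params*2/1e9:.1f} GB in fp16")
```

With these assumptions the adapter is roughly a thousandth the size of the full weights, which is why the LoRA round trip off the rented VM is so much cheaper than shipping a fully fine-tuned model.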


Check out other prices on https://gpumonger.com/

Disclosure: I collected the data and built the site myself. It has a ton of comparison data for GPU clouds.



