Show HN: Seamless ML training on AWS Spot instances (spotml.io)
6 points by vishnukool on Nov 13, 2021 | 3 comments


Take advantage of the MemVerge Fault Tolerance Service early access program to mitigate the risk of terminations, which can save you a lot of heartache and money.

What you get:
1. Memory Machine software with Fault Tolerance Service.
2. Free professional services needed to configure various services for automated recovery and restart.
3. A free license for Memory Machine for 6 months.
4. Free white glove support for 6 months.

What you have to do:
1. Deploy a non-fault-tolerant and/or long-running workload on AWS.
2. Pay for services (AWS) not provided by MemVerge.
3. Provide ongoing feedback to MemVerge about the status of the deployment and operation.

frank.berry@memverge.com

Learn more at https://bigmemorycloud.com/


SpotML is a command line tool that automatically manages ML training on AWS spot instances. It lets you handle spot interruptions by resuming training using the latest checkpoint.
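
To make the "resume from the latest checkpoint" idea concrete, here is a minimal sketch in plain Python. It is not SpotML's or Spotty's actual implementation; the function names (`save_checkpoint`, `load_checkpoint`, `train`), the JSON checkpoint format, and the `stop_after` parameter that simulates an interruption are all assumptions made for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the checkpoint atomically so an interruption cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename within the same directory


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(path, total_steps, checkpoint_every=10, stop_after=None):
    """Toy training loop (hypothetical); stop_after simulates a spot interruption."""
    state = load_checkpoint(path)  # resume from the latest checkpoint, if any
    steps_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_run >= stop_after:
            return state  # simulated interruption: exit without saving
        state["step"] += 1
        state["loss"] = 1.0 / state["step"]  # placeholder for a real training step
        steps_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final save once training completes
    return state
```

Calling `train` again with the same path after an interruption picks up from the last saved step, so only the work done since the most recent checkpoint is lost. A real training job would save model and optimizer state rather than a small JSON dictionary.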

Documentation link to try it out: https://docs.spotml.io/getting-started

Looking for feedback from early testers. You would be an ideal candidate if you have a side project that you are paying to train with your own money.

Acknowledgement: SpotML is built on top of the existing open-source library Spotty: https://github.com/spotty-cloud/spotty


Looks interesting, will try it out later this week for my pet project.



