Opinion: Don’t put your business data in s3 objects! It can work if all you will...

Terretta · on Aug 25, 2021

You might not have to make a SQL engine immediately.

- S3 Select: https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectOb...

- Athena: https://towardsdatascience.com/query-data-from-s3-files-usin...

anothernewdude · on Aug 26, 2021

Athena is a bad tool: if you have too much data or your queries take too long, it will just error out and give you empty results, and charge you for the privilege. It's cheaper and less painful to host your own Presto if you need it.

How much is too much data? It depends. On things you won't be told. Varies day to day. Totally random.

Hermitian909 · on Aug 25, 2021

> you will be stuck reimplementing a sql engine in your app code.

Only if you're intent on keeping the s3-only architecture when it no longer makes sense, and the transition to SQL need not be hard if you've been disciplined about your data structures from the get-go.

I've been part of a transition from s3-only to s3/sql when business requirements changed. It only took us a week to set up a db, set up a workflow manager to insert the necessary data into the db whenever something was uploaded to s3, and run some batch jobs for data that already existed. Total data in s3 was ~1 PB.
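To make the "insert into the db when something is uploaded" step concrete, here is a minimal sketch, not what that team actually ran. It assumes the workflow manager hands you an S3 event notification as a dict and uses SQLite as a stand-in for whatever db you pick. The table name `s3_objects` and the helpers `open_index` and `index_event` are hypothetical.

```python
import sqlite3
from urllib.parse import unquote_plus


def open_index(path=":memory:"):
    """Open (or create) the index db. SQLite is only a stand-in here."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS s3_objects (
               bucket TEXT NOT NULL,
               key    TEXT NOT NULL,
               size   INTEGER,
               etag   TEXT,
               PRIMARY KEY (bucket, key)
           )"""
    )
    return conn


def index_event(conn, event):
    """Upsert one row per created object in an S3 event notification."""
    rows = []
    for record in event.get("Records", []):
        # Skip deletes and other non-upload events.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        obj = s3["object"]
        rows.append((
            s3["bucket"]["name"],
            unquote_plus(obj["key"]),  # keys arrive URL-encoded in notifications
            obj.get("size"),
            obj.get("eTag"),
        ))
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO s3_objects VALUES (?, ?, ?, ?)", rows
        )
    return len(rows)
```

The upsert keeps the handler idempotent, so the batch job for data that already exists can feed the same function with synthetic events built from a bucket listing.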

IMO it's all about whether your problem space fits well with s3-only, and sometimes it does!

cweagans · on Aug 25, 2021

+1 for SQLite. You might take a look at https://github.com/rqlite/rqlite if you really need replication. SQLite can go pretty far on its own though.

otoolep · on Aug 25, 2021

rqlite author here, happy to answer questions about it. I should note that rqlite is a distributed database that uses SQLite as its database engine. This means in practice it does replicate a SQLite database to each node on the rqlite cluster -- but it's not a SQLite replication system per se.

https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md