An easy to maintain stack from my experience that almost anyone can do: - S3 for...

hipadev23 · on Aug 11, 2024

Every time I look at S3/Glue/Athena I can’t help but feel that the Glue layer shouldn’t be necessary and should instead just be part of Athena’s DDL.

ianburrell · on Aug 11, 2024

Athena is a query engine and can use multiple catalogs. It forwards DDL queries to the catalog. Glue is the default catalog.

mulmen · on Aug 13, 2024

Glue catalogs can be used by other query engines as well. Separating schema from compute is the foundational concept behind a data lake.

fifilura · on Aug 12, 2024

Exactly, and same as my comment below (parquet+iceberg+s3).

And yes, Athena is a part of that. We also use dbt, but mostly as a place to commit and push queries. And I agree with the other question about Glue: it is the ugliest part.

I guess a +1 is not up to Hacker News standards, but I still want to give it some strength, given that we came up with the same solution independently.

OutOfHere · on Aug 11, 2024

Is DBT really necessary? (serious question) If so, why? What would go wrong by skipping it?

moltar · on Aug 11, 2024

No, not necessary at all. You can write queries and CTEs and create views in Athena/Glue by hand, if that’s what you prefer.

OutOfHere · on Aug 11, 2024

I mean what does DBT offer me here that makes it worthwhile?

moltar · on Aug 12, 2024

- an established convention for project organization

- a tool to run lots of SQL queries at scale

- a tool to create and update views in the correct graph order to avoid dependency issues (e.g. removing a column from a view that another view still depends on).

- SQL codegen / templating using Jinja

- an ecosystem of packages that provide useful utility macros. E.g. every project eventually needs a calendar. Just look at that SQL statement to generate one. It’s gnarly.

- a test runner on data to ensure quality and contract adherence, so that bad data is caught before it breaks downstream consumers.
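To make the "correct graph order" point concrete, here is a minimal sketch of the idea in plain Python. It is not how dbt is implemented, and the view names are hypothetical; it only shows that once you know which view reads from which, a topological sort gives a safe order in which to create or update them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical view names. Each view maps to the set of views it selects from.
dependencies = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "daily_revenue": {"orders_enriched"},
}


def build_order(deps):
    """Return view names so that every view comes after the views it reads from."""
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
print(order)
```

If two views depend on each other, `static_order()` raises `graphlib.CycleError`, which is roughly the kind of mistake you would otherwise only discover when a `CREATE VIEW` fails halfway through a manual run.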

itsoktocry · on Aug 12, 2024

It offers a well organized SQL project. You have to store the scripts somewhere.

You may not need it, but I find it really useful.