DuckDB is an excellent OLAP DB. I have had customers who had an S3 data lake of par...

adammarples · 2025-10-24T18:59:36 1761332376

I have been playing with DuckLake today, and I have to confess I don't quite get what it does that DuckDB doesn't already do, given that DuckDB can run on top of Parquet files quite happily without this extension...

RobinL · 2025-10-24T20:01:39 1761336099

Its main purpose is to solve the problem of upserts to a data lake, because upsert operations on file-based data storage are a real pain.
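
To illustrate the pain, here is a minimal sketch of what an upsert looks like when the only primitive you have is "write a whole file". CSV stands in for Parquet so it runs with just the standard library, and `upsert_file` and the `id` key column are hypothetical names of mine, not anything from DuckDB or DuckLake:

```python
import csv
import os


def upsert_file(path, new_rows, key="id"):
    """Upsert rows into a data file by rewriting the entire file.

    Assumption: the file is small enough to hold in memory, and rows
    are dicts whose values are strings (as csv.DictReader returns them).
    """
    # Every existing row has to be read, even if only one row changes.
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        merged = {row[key]: row for row in reader}

    # Apply updates (existing keys) and inserts (new keys) in memory.
    for row in new_rows:
        merged[row[key]] = row

    # Write a complete replacement file, then swap it into place.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(merged.values())
    os.replace(tmp_path, path)

    return len(merged)
```

Changing one row costs a full read and a full rewrite, and nothing here coordinates two writers doing this at the same time, which is roughly the gap a catalog layer on top of the files is meant to fill.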

mrtimo · 2025-10-24T16:44:19 1761324259

I have experience with DuckDB but not Databricks... From the perspective of a company, is a tool like Databricks more "secure" than DuckDB? If my company adopts DuckDB as a data lake, how do we secure it?

rapatel0 · 2025-10-24T17:12:02 1761325922

DuckDB can run as a local instance that points to Parquet files in an S3 bucket. So your "auth" can live on the layer that gives permissions to access that bucket.
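
A rough sketch of that setup in Python, assuming the `duckdb` package is installed and the machine already has AWS credentials that the bucket's IAM policy accepts. The bucket name, prefix, secret name, and both helper functions are made up for illustration, and the `credential_chain` provider may need DuckDB's `aws` extension to be available:

```python
def parquet_glob(bucket, prefix):
    """Build an s3:// glob covering every Parquet file under a prefix."""
    return f"s3://{bucket}/{prefix.strip('/')}/*.parquet"


def count_rows(bucket, prefix):
    """Count rows across the Parquet files, using ambient AWS credentials."""
    import duckdb  # third-party; imported lazily so the helper above has no dependency

    con = duckdb.connect()  # in-memory local instance, no server involved
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    # Let DuckDB pick up credentials from the environment or instance role.
    # What the query may read is decided by the bucket's permissions.
    con.execute("CREATE OR REPLACE SECRET s3_creds (TYPE s3, PROVIDER credential_chain)")
    query = f"SELECT count(*) FROM read_parquet('{parquet_glob(bucket, prefix)}')"
    return con.execute(query).fetchone()[0]
```

Something like `count_rows("my-bucket", "events")` would then only succeed for users whose credentials are allowed to read that prefix, so access control stays in the bucket policy rather than in DuckDB.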

lopatin · 2025-10-24T17:08:20 1761325700

DuckDB is great, but it’s barely OLAP, right? A key part of OLAP is “online”. Since the writer process blocks any other processes from doing reads, calling it OLAP is a stretch, I think.

ansgri · 2025-10-24T18:17:47 1761329867

Isn't the "Online" part here about getting results immediately after a query, as opposed to overnight batch reports? So if you don't completely overwhelm DuckDB with writes, it still qualifies. The quality you're describing is something like "realtime analytics", which is a whole other category: ClickHouse doesn't qualify (batching updates, merging, etc., though it's clearly OLAP), Druid does.

lopatin · 2025-10-24T18:47:30 1761331650

Huh yeah looks like I was totally wrong about what online meant. So yeah DuckDB is OLAP. Not that anyone was asking me in the first place. Carry on :)

sdairs · 2025-10-24T18:49:06 1761331746

ClickHouse is the market leader in real-time analytics so it's an interesting take that you don't think it qualifies.

ansgri · 2025-10-24T21:42:33 1761342153

For a certain definition of realtime, certainly (as would any system with bounded ingestion latency), but it’s not low-latency streaming realtime. Tens of seconds or more can pass before new data becomes visible in queries in normal operation. There’s batching, there’s merging, and its overall architecture prioritizes throughput over latency.