The naming strikes me as quite unfortunate. Kdb+ is a (time-series) database that supports the languages K and SQL. Naming a product in a similar space KSQL was already a questionable idea. Jumping through the hoops of renaming only to choose ksqlDB (which, arguably, is even more confusing) is, well, unfortunate.
It’s a long list of caveats about how your queries will silently fail if you don’t guarantee all sorts of properties of the data.
Anyone considering making Kafka the center of their data infrastructure should also consider using a conventional column-store database instead. The only advantage of Kafka in this comparison is better latency (seconds vs minutes). If you can live with a minute or two of latency, the advantages of a real database are MANY.
There is a major difference between Kafka and databases: Kafka is for passing data through, and databases are for storing it. Also, KSQL/ksqlDB is not intended to be a general tool for transforming all your data, but to help you avoid writing custom services (unless I'm getting things wrong).
Now, should Kafka be the central place for routing data? I think that's the main reason it exists. Should it be used for permanently storing data like a conventional database? Absolutely not.
By design, it will store data temporarily in case a consumer goes offline, but you can also use it as a backpressure buffer/queue for slow receivers that can't keep up with fast data ingestion (e.g. syslog -> kafka -> logstash).
I really think most people want a traditional SQL database that has a streaming query that gives real-time updates. I honestly don't know enough about databases to understand why that isn't a simple enough feature to implement.
I don't think that KDB does, either?
My understanding of this in KDB, and I could be wrong (I'm not an expert), is that you write to the tickerplant. Clients subscribing to the real-time process will see the data in real time, and then eventually it gets asynchronously written to another DB which allows querying.
I guess you'd think of it as eventual consistency.
(Please correct me if I'm wrong or have misunderstood the above)
You’re right. You don’t get read-your-own-writes guarantees in a typical Kdb tickerplant architecture, as the event streams are propagated to the real-time nodes asynchronously.
AIUI you can only write to the database indirectly, through Kafka messages which ksqlDB then consumes and materializes into tables with SQL-like syntax.
So if you emit an event (like "New User Created") and have a ksqlDB table that summarizes the user count, then Kafka having accepted the event does not mean the ksqlDB table has also been updated.
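Roughly this setup, sketched in ksqlDB syntax (the topic and column names here are made up for illustration, so treat it as a sketch rather than something copy-pasteable):

    -- Stream backed by a Kafka topic
    CREATE STREAM user_events (user_id VARCHAR, event_type VARCHAR)
      WITH (KAFKA_TOPIC='user_events', VALUE_FORMAT='JSON');

    -- Continuously maintained aggregate over that stream
    CREATE TABLE user_counts AS
      SELECT event_type, COUNT(*) AS cnt
      FROM user_events
      GROUP BY event_type;

    -- A pull query against the table may not yet reflect an event
    -- the broker has already acknowledged to the producer.
    SELECT cnt FROM user_counts WHERE event_type = 'New User Created';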
Contrast that with a traditional database, where a COMMIT returning in one session means a second session can immediately read the data.
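Something like the following, which you'd expect to just work on any read-committed SQL database (table name made up):

    -- Session 1
    BEGIN;
    INSERT INTO users (name) VALUES ('alice');
    COMMIT;

    -- Session 2, immediately afterwards: the committed row is visible
    SELECT COUNT(*) FROM users;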
That's standard behaviour in every analytics database that talks to kafka that I can think of. Kafka doesn't really encourage ACK/NACK type processing patterns. Accepting an event usually means the consumer has successfully read it and staged it for whatever is meant to happen next, not that the operation is completed.
Now, if it were possible for it to accept an event from Kafka without guaranteeing that the event eventually makes it into the materialized view (i.e. it may be lost), that'd be a problem.
> Now, if it were possible for it to accept an event from Kafka without guaranteeing that the event eventually makes it into the materialized view (i.e. it may be lost), that'd be a problem.
This is usually not a problem these days, as it's possible to guarantee exactly-once ingestion using Kafka offsets.
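The usual trick, as I understand it, is to commit the consumed offset in the same transaction as the derived data, so a replay after a crash can't double-apply a batch. A plain-SQL sketch (table and column names are hypothetical):

    BEGIN;
    INSERT INTO events (user_id, event_type) VALUES (42, 'New User Created');
    -- Record how far we've read, atomically with the data itself
    UPDATE kafka_offsets
       SET last_offset = 1337
     WHERE topic = 'user_events' AND kafka_partition = 0;
    COMMIT;
    -- On restart, resume from kafka_offsets; a replayed batch either
    -- commits once or not at all, so ingestion is effectively exactly-once.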
IIRC ClickHouse still doesn't have this guarantee. It guarantees exactly-once ingestion, but not that the ingested event gets processed all the way through to the view. And if the processing fails, that event won't be retried and is now gone.
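For context, the ClickHouse setup being described is roughly this (my own sketch, names made up): a Kafka engine table acts as the consumer, and a materialized view moves rows into a MergeTree table. If the view's SELECT fails mid-batch, the offsets may already have been committed and those rows aren't retried.

    CREATE TABLE events_queue (user_id UInt64, event_type String)
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'broker:9092',
             kafka_topic_list = 'user_events',
             kafka_group_name = 'clickhouse',
             kafka_format = 'JSONEachRow';

    CREATE TABLE events (user_id UInt64, event_type String)
    ENGINE = MergeTree ORDER BY user_id;

    CREATE MATERIALIZED VIEW events_mv TO events AS
      SELECT user_id, event_type FROM events_queue;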