In the last few months mentions of Datalog have increased; I wondered how it d...

refset · on Feb 15, 2023

XTDB, which is mentioned in the post, is subtly different from the other Clojure-based Datalog systems in this respect, because its Datalog engine executes in terms of multi-way joins using a "Worst-Case Optimal Join" implementation that is ideal for graph processing (vs. a tree of binary hash joins). Therefore, based on statistics and query planning heuristics, it will often perform graph pattern matching before resolving the logic/horn clauses. (source: I work on the XTDB team)
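For anyone unfamiliar with worst-case optimal joins, here is a minimal Python sketch of the textbook "Generic Join" idea on the triangle query Q(a, b, c) :- R(a, b), S(b, c), T(a, c). It is not XTDB's code, and the function names are hypothetical. Instead of materialising R ⋈ S first (which on skewed graphs can be much larger than the final answer), it binds one variable at a time by intersecting the candidates allowed by every relation that mentions that variable:

```python
from collections import defaultdict

def index(edges):
    """Map each source node to the set of its targets (a one-level trie)."""
    idx = defaultdict(set)
    for src, dst in edges:
        idx[src].add(dst)
    return dict(idx)

def triangles_generic_join(R, S, T):
    """Hypothetical helper: Q(a, b, c) :- R(a, b), S(b, c), T(a, c), variable order a, b, c."""
    R_idx, S_idx, T_idx = index(R), index(S), index(T)
    out = []
    # a must be a source in both R and T.
    for a in R_idx.keys() & T_idx.keys():
        # b must satisfy R(a, b) and also be a source in S.
        for b in R_idx[a] & set(S_idx):
            # c must satisfy both S(b, c) and T(a, c): one set intersection, no intermediate join result.
            for c in S_idx[b] & T_idx[a]:
                out.append((a, b, c))
    return out

edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(triangles_generic_join(edges, edges, edges))  # [(1, 2, 3)]
```

A binary-join plan would first build every (a, b, c) path from R ⋈ S and only then filter by T; the intersections above avoid that intermediate blow-up.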

eternalban · on Feb 15, 2023

Interesting architecture:

https://raw.githubusercontent.com/xtdb/xtdb/master/docs/conc...

Btw, is that 'RocksDB or ?' for the local store current, or can other storage engines be plugged in?

P.S. This is Datomic's architecture, for comparison:

https://docs.datomic.com/on-prem/images/clientarch_orig.svg

refset · on Feb 15, 2023

Hmm, it seems there's some SVG rendering issue when viewing directly. It should say "LMDB" in that empty spot. It renders okay for me on the GitHub readme: https://github.com/xtdb/xtdb (and here https://docs.xtdb.com/concepts/what-is-xtdb/)

felixyz · on Feb 15, 2023

I did an interview [1] with Kevin Feeney, one of the founders (no longer active) of TerminusDB, which goes into some depth on the difference between RDF stores and (property) graph databases, where the former is more closely aligned with Datalog and logic programming. There are links to a really excellent series of blog posts by Kevin on this topic in the show notes.

[1] https://thesearch.space/episodes/5-kevin-feeney-on-terminusd...

forks · on Feb 15, 2023

I love The Search Space. Waiting patiently for new episodes!

felixyz · on Feb 16, 2023

Thank you! They're coming, I promise :)

YeGoblynQueenne · on Feb 17, 2023

Second that!

westurner · on Feb 15, 2023

With RDF* and SPARQL* ("RDF-star" and "SPARQL-star"), how are triple (or quad) stores still distinct from property graphs?
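To make the question concrete, here is a minimal Python sketch (hypothetical `Triple`/`Edge` types, not any library's API) of mapping a property-graph edge with properties onto RDF-star, where the quoted triple << s p o >> becomes the subject of the annotation triples:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Triple:
    s: object  # an IRI string, or another Triple (an RDF-star quoted triple)
    p: str
    o: object

@dataclass
class Edge:
    # Hypothetical property-graph edge: a labelled relationship carrying its own key/value properties.
    src: str
    label: str
    dst: str
    props: dict = field(default_factory=dict)

def edge_to_rdf_star(edge):
    """Map one edge to a base triple plus annotations on the quoted triple."""
    base = Triple(edge.src, edge.label, edge.dst)
    # Annotating a quoted triple does not assert it, so the base triple is emitted too.
    # Each edge property becomes:  << src label dst >> key value
    return [base] + [Triple(base, key, value) for key, value in edge.props.items()]

for t in edge_to_rdf_star(Edge(":alice", ":knows", ":bob", {":since": 2010})):
    print(t)
```

One commonly cited remaining difference shows up here: a quoted triple is identified purely by its (s, p, o) content, so two distinct edges with the same endpoints and label collapse onto the same annotated triple, whereas property graphs give each edge its own identity.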

RDFS and SHACL (and OWL) are optional in a triple store, which expects the predicate to be a URI and the subject to be a URI (or a blank node); the object can be a URI, a blank node, or a literal with a datatype and an optional language tag:

  (?s ?p ?o <datatype> [lang])

  (?subject:URI, ?predicate:URI, ?object:datatype, object_datatype, [object_language])

RDFS introduces rdfs:domain and rdfs:range type restrictions for Properties, and rdfs:Class and rdfs:subClassOf.
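Since this thread is about Datalog, it may help to see that these RDFS features are essentially Horn clauses over a single triple relation. A minimal Python sketch of a subset of the RDFS entailment rules (rdfs2 domain, rdfs3 range, rdfs9 and rdfs11 subClassOf), evaluated naively to a fixpoint; the prefixed-name strings and example data are hypothetical, and literal-vs-IRI checks are ignored for brevity:

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
SUBCLASS = "rdfs:subClassOf"

def rdfs_closure(triples):
    """Apply the rules below until no new (s, p, o) facts appear (naive evaluation)."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == DOMAIN:    # t(X, rdf:type, C) :- t(P, rdfs:domain, C), t(X, P, _).
                new |= {(x, RDF_TYPE, o) for (x, q, _y) in facts if q == s}
            elif p == RANGE:   # t(Y, rdf:type, C) :- t(P, rdfs:range, C), t(_, P, Y).
                new |= {(y, RDF_TYPE, o) for (_x, q, y) in facts if q == s}
            elif p == SUBCLASS:
                # t(X, rdf:type, D) :- t(C, rdfs:subClassOf, D), t(X, rdf:type, C).
                new |= {(x, RDF_TYPE, o) for (x, q, c) in facts if q == RDF_TYPE and c == s}
                # t(C, rdfs:subClassOf, E) :- t(C, rdfs:subClassOf, D), t(D, rdfs:subClassOf, E).
                new |= {(s, SUBCLASS, e) for (d, q, e) in facts if q == SUBCLASS and d == o}
        if new <= facts:
            return facts
        facts |= new

# Hypothetical example data.
closure = rdfs_closure({
    (":wrote", DOMAIN, ":Person"),
    (":wrote", RANGE, ":ScholarlyArticle"),
    (":ScholarlyArticle", SUBCLASS, ":Article"),
    (":Article", SUBCLASS, ":Thing"),
    ("#alice", ":wrote", "#xyz"),
})
print(sorted(o for s, p, o in closure if s == "#xyz" and p == RDF_TYPE))
# [':Article', ':ScholarlyArticle', ':Thing']
```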

`a` is shorthand for `rdf:type` (in Turtle and SPARQL), which does not require RDFS:

  ("#xyz", a,        "https://schema.org/Thing")
  ("#xyz", rdf:type, "https://schema.org/Thing")

Quad stores add a graph ID, a URI `?g`, for named graphs:

  (?g ?s ?p ?o)

  ("https://example.org/ns/graphs/0", "#xyz", a, "https://schema.org/Thing")

  ("https://example.org/ns/graphs/1", "#xyz", a, "https://schema.org/ScholarlyArticle")
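A minimal Python sketch of matching (?g ?s ?p ?o) patterns against the two quads above, with `None` standing in for an unbound ?variable (the `Quad` and `match` names are hypothetical, not a particular store's API). Leaving ?g unbound spans all named graphs; binding it restricts the match to one graph:

```python
from typing import NamedTuple

class Quad(NamedTuple):
    g: str
    s: str
    p: str
    o: str

def match(quads, g=None, s=None, p=None, o=None):
    """Return the quads matching the pattern; None acts like an unbound ?variable."""
    pattern = (g, s, p, o)
    return [q for q in quads
            if all(want is None or want == have for want, have in zip(pattern, q))]

store = [
    Quad("https://example.org/ns/graphs/0", "#xyz", "rdf:type", "https://schema.org/Thing"),
    Quad("https://example.org/ns/graphs/1", "#xyz", "rdf:type", "https://schema.org/ScholarlyArticle"),
]

print(len(match(store, s="#xyz", p="rdf:type")))  # 2: ?g unbound, so both graphs match
print([q.o for q in match(store, g="https://example.org/ns/graphs/1")])
# ['https://schema.org/ScholarlyArticle']
```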

There's a W3C WG (Working Group) revising many of the W3C Linked Data specs to support RDF-star: https://www.w3.org/groups/wg/rdf-star

It looks like they ended up needing to update most of the current specs: https://www.w3.org/groups/wg/rdf-star/tools

"RDF-star and SPARQL-star" (Draft Community Group Report; 08 December 2022) https://w3c.github.io/rdf-star/cg-spec/editors_draft.html

GH topics: rdf-star, rdfstar: https://github.com/topics/rdf-star, https://github.com/topics/rdfstar

pyDatalog (https://github.com/pcarbonn/pyDatalog) does Datalog with SQLAlchemy, e.g. on top of just an SQLite database; it has apparently been superseded by IDP-Z3: https://gitlab.com/krr/IDP-Z3/
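This is not how pyDatalog works internally; it is a swapped-in illustration of the underlying idea that a recursive Datalog rule can be run by plain SQLite, here via a `WITH RECURSIVE` common table expression using Python's standard `sqlite3` module. The `parent` table, `ancestors_of` helper, and data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (child TEXT, parent TEXT)")
# Hypothetical example data: (child, parent) pairs.
conn.executemany("INSERT INTO parent VALUES (?, ?)",
                 [("carol", "bob"), ("bob", "alice"), ("dave", "carol")])

# ancestor(X, Y) :- parent(X, Y).
# ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
ANCESTORS = """
WITH RECURSIVE ancestor(x, y) AS (
    SELECT child, parent FROM parent
    UNION  -- set semantics: duplicates are dropped, so cyclic data still terminates
    SELECT p.child, a.y
    FROM parent AS p JOIN ancestor AS a ON p.parent = a.x
)
SELECT y FROM ancestor WHERE x = ? ORDER BY y
"""

def ancestors_of(person):
    return [row[0] for row in conn.execute(ANCESTORS, (person,))]

print(ancestors_of("dave"))  # ['alice', 'bob', 'carol']
```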

From https://twitter.com/westurner/status/1000516851984723968 :

> A feature comparison of SQL w/ EAV, SPARQL/SPARUL, [SPARQL12 SPARQL-star, [T-SPARQL, SPARQLMT,]], Cypher, Gremlin, GraphQL, and Datalog would be a useful resource for evaluating graph query languages.

> I'd probably use unstructured text search to identify the relevant resources first.

noduerme · on Feb 15, 2023

That's a really neat example of something I'm not familiar with. Going up a tree from child to parent is often the heaviest part of dealing with regular datasets, and usually requires a mix of queries and application logic. Flattening the data along some pattern like that is of course always possible in a relational DB, but it's not usually efficient, especially for write-heavy workloads. Lateral joins and window partitions can help. But this seems like an interesting approach to removing the app code completely.
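For context on how a Datalog engine can walk child-to-parent chains without app-side loops: recursive rules are typically evaluated bottom-up, and the standard refinement is semi-naive evaluation, which in each round only joins the facts derived in the previous round (the "delta"). A minimal in-memory Python sketch of that textbook technique, not the specific engine from the article; the function name and data are hypothetical:

```python
def ancestors_semi_naive(parent):
    """Compute ancestor(X, Y) from (child, parent) pairs.

    ancestor(X, Y) :- parent(X, Y).
    ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
    """
    # Index parent facts by the parent, to find the children X of Z quickly.
    children_of = {}
    for child, par in parent:
        children_of.setdefault(par, set()).add(child)

    ancestor = set(parent)  # base rule
    delta = set(parent)     # facts that were new in the last round
    while delta:
        new = set()
        # Only the delta is joined: older ancestor facts were already joined in earlier rounds.
        for z, y in delta:
            for x in children_of.get(z, ()):
                if (x, y) not in ancestor:
                    new.add((x, y))
        ancestor |= new
        delta = new
    return ancestor

# Hypothetical example data.
facts = ancestors_semi_naive({("dave", "carol"), ("carol", "bob"), ("bob", "alice")})
print(sorted(y for x, y in facts if x == "dave"))  # ['alice', 'bob', 'carol']
```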

flyingsilverfin · on Feb 15, 2023

I work on TypeDB (https://vaticle.com/typedb), and it sits somewhere at this intersection. The exposed query language has elements of both logic programming constructs and graph-like structures. Both amount to a kind of "constraint" programming.

rapnie · on Feb 15, 2023

A quick peek suggests it's along similar lines to TerminusDB, sort of, though that has WOQL [0]. At this point I start to worry again about all the different kinds and flavours of query languages that are emerging.

[0] https://en.wikipedia.org/wiki/TerminusDB#Query_language

convolvatron · on Feb 15, 2023

Why would you worry? This space has been occupied by the default for so long; it's refreshing to see people experiment with what might be possible.

cmrdporcupine · on Feb 15, 2023

Agreed; my only concern is that whatever emerges here has conceptual clarity and doesn't get bastardized by people who haven't studied the foundations of the relational model.

I have this fear because there's a history of that: novel query languages and DB platforms tossing in network/hierarchical/"document"/object-oriented features and creating a dog's breakfast that loses the compositional/expressive power of the relational algebra. Think MongoDB or Redis. Conceptually a big mess.

RDF itself has a history of this as well. Appeals to novelty.

Or even Google's F1, which smashes hierarchical tree-structured protobufs into a SQL DB, and so has really weird behaviour on joins and projections.

Well, whatever, you know my opinions on this stuff, I think :-)

At this point I'd settle for (or ask for) a network-available tuplestore that just receives relational-algebra operators from a client, optimizes and executes them, and returns pure tuples; the client side could then formulate whatever query language (or API) it wanted on top of that. I started playing with building something like that between the two jobs, but never got far.
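As a rough illustration of that split (hypothetical names throughout, not an existing library), the client could ship a query as a plain tree of relational-algebra operators, and the server would evaluate it and hand back bare tuples. This Python sketch leaves out the optimizer entirely and restricts selection to attribute = value, so the tree stays serializable data rather than arbitrary client code:

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical operator vocabulary a client could serialize and send to a tuple server.
@dataclass(frozen=True)
class Scan:
    relation: str

@dataclass(frozen=True)
class Select:   # σ[attr = value]
    child: "Expr"
    attr: str
    value: object

@dataclass(frozen=True)
class Project:  # π[attrs]
    child: "Expr"
    attrs: Tuple[str, ...]

@dataclass(frozen=True)
class Join:     # natural join on shared attribute names
    left: "Expr"
    right: "Expr"

Expr = Union[Scan, Select, Project, Join]

def rows(*dicts):
    """Build a relation: a set of rows, each a frozenset of (attribute, value) pairs."""
    return {frozenset(d.items()) for d in dicts}

def evaluate(expr, db):
    """Server side: evaluate an operator tree against db (relation name -> set of rows)."""
    if isinstance(expr, Scan):
        return set(db[expr.relation])
    if isinstance(expr, Select):
        return {r for r in evaluate(expr.child, db) if (expr.attr, expr.value) in r}
    if isinstance(expr, Project):
        return {frozenset((a, v) for a, v in r if a in expr.attrs)
                for r in evaluate(expr.child, db)}
    if isinstance(expr, Join):
        right = [dict(r) for r in evaluate(expr.right, db)]
        out = set()
        for lrow in evaluate(expr.left, db):
            ld = dict(lrow)
            for rd in right:
                # Rows join when they agree on every attribute they share.
                if all(rd[k] == v for k, v in ld.items() if k in rd):
                    out.add(frozenset({**ld, **rd}.items()))
        return out
    raise TypeError(f"unknown operator: {expr!r}")

# Hypothetical example data.
db = {
    "employee": rows({"name": "ann", "dept": "db"}, {"name": "bo", "dept": "ml"}),
    "dept": rows({"dept": "db", "floor": 3}, {"dept": "ml", "floor": 5}),
}
query = Project(Select(Join(Scan("employee"), Scan("dept")), "floor", 3), ("name",))
print([dict(r) for r in evaluate(query, db)])  # [{'name': 'ann'}]
```

Because every operator takes and returns a set of tuples, the operators compose freely, which is the property the comment above is worried about losing.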

felixyz · on Feb 15, 2023

I really like TypeDB! Haven't been able to use it for anything serious yet, but have a couple of projects brewing where it might fit :)

cmrdporcupine · on Feb 15, 2023

You might be interested in https://relational.ai/

Treats graph edges as binary relations ("graph normal form"), has a Datalog-ish language. Built for managing large interconnected knowledge sets in a declarative way.
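A minimal Python sketch of the general idea of storing everything as binary relations, as I read the description above. This is an illustration only, not RelationalAI's Rel language or API, and all names and data are hypothetical. Wide records are split into one (entity, value) relation per attribute, so a `knows` edge and a `name` attribute are stored the same way and queried with the same join:

```python
from collections import defaultdict

def to_binary_relations(entities):
    """Hypothetical helper: split {id: {attr: value}} records into one binary relation per attribute.

    List/set/tuple values are treated as multi-valued (e.g. several `knows` edges).
    """
    relations = defaultdict(set)
    for eid, attrs in entities.items():
        for attr, value in attrs.items():
            values = value if isinstance(value, (list, set, tuple)) else [value]
            for v in values:
                relations[attr].add((eid, v))
    return dict(relations)

def names_known_by(rel, eid):
    """Join knows(eid, B) with name(B, N); a missing attribute is simply an absent pair, not a NULL."""
    name = dict(rel["name"])  # assumes name is functional: at most one value per entity
    return sorted(name[b] for a, b in rel["knows"] if a == eid and b in name)

# Hypothetical example data.
people = {
    "p1": {"name": "Ann", "knows": ["p2", "p3"]},
    "p2": {"name": "Bo"},
    "p3": {"name": "Cy", "knows": ["p1"]},
}
rel = to_binary_relations(people)
print(names_known_by(rel, "p1"))  # ['Bo', 'Cy']
```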

I recommend this talk: https://www.youtube.com/watch?v=WRHy7M30mM4

felixyz · on Feb 15, 2023

Great project, not open source alas. This is another great talk about RelationalAI (and its precursor), highlighting how using powerful databases can simplify complex applications: https://www.hytradboi.com/2022/experience-report-building-en...

cmrdporcupine · on Feb 15, 2023

Not speaking on behalf of the company, but... It remains difficult to do open source and also pay people.

felixyz · on Feb 16, 2023

Yes, sorry, I didn't mean that to come off as "if it's not open source, it's not worthy of attention".

In the case of RelationalAI, as far as I understand there is no way to even try it out without becoming a customer? I just wish there were, because I really do like the approach!