
Mentions of Datalog have increased over the last few months, so I wondered how it differs from graph databases and found a clear answer on SO [1]. I am not a practitioner, but I find graph databases and clause-based approaches interesting.

[1] https://stackoverflow.com/questions/29192927/a-graph-db-vs-a... (2015)



XTDB, which is mentioned in the post, is subtly different from the other Clojure-based Datalog systems in this respect, because its Datalog engine executes queries as multi-way joins using a "Worst-Case Optimal Join" implementation that is ideal for graph processing (vs. a tree of binary hash joins). Therefore, based on statistics and query planning heuristics, it will often perform graph pattern matching before resolving the logic/Horn clauses. (source: I work on the XTDB team)
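
To illustrate the difference between the two join strategies mentioned above, here is a minimal Python sketch of a triangle query over a single edge relation. It is not XTDB's implementation; the function names and the sample data are made up for illustration. The binary plan materializes all two-hop paths before filtering, while the variable-at-a-time plan intersects the candidate sets for the last variable, which is the basic idea behind worst-case optimal joins.

```python
from collections import defaultdict

def _index_by_source(edges):
    """Build an adjacency index: source node -> set of target nodes."""
    by_src = defaultdict(set)
    for src, dst in edges:
        by_src[src].add(dst)
    return dict(by_src)

def triangles_binary(edges):
    """Binary-join plan: join E(a,b) with E(b,c), then filter on E(a,c)."""
    by_src = _index_by_source(edges)
    # Intermediate result: every two-hop path, which can be much larger
    # than the final answer.
    paths = [(a, b, c) for a, b in edges for c in by_src.get(b, ())]
    edge_set = set(edges)
    return sorted(p for p in paths if (p[0], p[2]) in edge_set)

def triangles_multiway(edges):
    """Variable-at-a-time plan: bind a, then b, then intersect candidates for c."""
    by_src = _index_by_source(edges)
    out = []
    for a, targets in by_src.items():
        for b in targets:
            # c must satisfy E(b,c) and E(a,c) at the same time.
            for c in by_src.get(b, set()) & targets:
                out.append((a, b, c))
    return sorted(out)

# Hypothetical sample graph with one triangle: 1 -> 2 -> 3 and 1 -> 3.
sample_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
```

Both functions return the same answer; they differ only in how much intermediate data they build on the way there.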


Interesting architecture:

https://raw.githubusercontent.com/xtdb/xtdb/master/docs/conc...

Btw, is that 'RocksDB or ?' for the local store current, or can other storage engines be plugged in?

P.S. This is Datomic's architecture for comparison:

https://docs.datomic.com/on-prem/images/clientarch_orig.svg


Hmm, it seems there's some SVG rendering issue when viewing directly. It should say "LMDB" in that empty spot. It renders okay for me on the GitHub readme: https://github.com/xtdb/xtdb (and here https://docs.xtdb.com/concepts/what-is-xtdb/)


I did an interview [1] with Kevin Feeney, one of the founders (no longer active) of TerminusDB, which goes into some depth about the difference between RDF stores and (property) graph databases, where the former is more closely aligned with Datalog and logic programming. There are links to a really excellent series of blog posts by Kevin on this topic in the show notes.

[1] https://thesearch.space/episodes/5-kevin-feeney-on-terminusd...


I love The Search Space. Waiting patiently for new episodes!


Thank you! They're coming, I promise :)


Second that!


With RDF* and SPARQL* ("RDF-star" and "SPARQL-star") how are triple (or quad) stores still distinct from property graphs?

RDFS and SHACL (and OWL) are optional in a triple store, which expects the subject and predicate to be string URIs; the object carries a datatype and an optional language tag:

  (?s ?p ?o <datatype> [lang])

  (?subject:URI, ?predicate:URI, ?object:datatype, object_datatype, [object_language])

RDFS introduces rdfs:domain and rdfs:range type restrictions for Properties, and rdfs:Class and rdfs:subClassOf.

`a` is shorthand for `rdf:type`, which does not require RDFS:

  ("#xyz", a,        "https://schema.org/Thing")
  ("#xyz", rdf:type, "https://schema.org/Thing")

Quad stores have a graph_id string URI "?g" for Named Graphs:

  (?g ?s ?p ?o)

  ("https://example.org/ns/graphs/0", "#xyz", a, "https://schema.org/Thing")

  ("https://example.org/ns/graphs/1", "#xyz", a, "https://schema.org/ScholarlyArticle")
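
As a rough illustration of the (?g ?s ?p ?o) shape above, here is a minimal Python sketch that stores quads as plain tuples and matches them against a pattern. The `Quad` type and `match` helper are hypothetical names for this example, not the API of any triple store; datatypes and language tags are left out for brevity.

```python
from typing import NamedTuple

# Full URI that the `a` shorthand expands to.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

class Quad(NamedTuple):
    graph: str      # named-graph URI (?g)
    subject: str    # ?s
    predicate: str  # ?p
    obj: str        # ?o

def match(quads, graph=None, subject=None, predicate=None, obj=None):
    """Return the quads that fit the pattern; None acts as a wildcard variable."""
    pattern = (graph, subject, predicate, obj)
    return [
        q for q in quads
        if all(want is None or want == have for want, have in zip(pattern, q))
    ]

quads = [
    Quad("https://example.org/ns/graphs/0", "#xyz", RDF_TYPE, "https://schema.org/Thing"),
    Quad("https://example.org/ns/graphs/1", "#xyz", RDF_TYPE, "https://schema.org/ScholarlyArticle"),
]

# The same subject has a different type in each named graph.
in_graph_1 = match(quads, graph="https://example.org/ns/graphs/1")
```

Dropping the `graph` field gives the plain triple case.
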

There's a W3C WG (Working Group) revising many of the W3C Linked Data specs to support RDF-star: https://www.w3.org/groups/wg/rdf-star

Looks like they ended up needing to update most of the current specs: https://www.w3.org/groups/wg/rdf-star/tools

"RDF-star and SPARQL-star" (Draft Community Group Report; 08 December 2022) https://w3c.github.io/rdf-star/cg-spec/editors_draft.html

GH topics: rdf-star, rdfstar: https://github.com/topics/rdf-star, https://github.com/topics/rdfstar

pyDatalog does datalog with SQLAlchemy and e.g. just the SQLite database: https://github.com/pcarbonn/pyDatalog ; and it is apparently superseded by IDP-Z3: https://gitlab.com/krr/IDP-Z3/

From https://twitter.com/westurner/status/1000516851984723968 :

> A feature comparison of SQL w/ EAV, SPARQL/SPARUL, [SPARQL12 SPARQL-star, [T-SPARQL, SPARQLMT,]], Cypher, Gremlin, GraphQL, and Datalog would be a useful resource for evaluating graph query languages.

> I'd probably use unstructured text search to identify the relevant resources first.


That's a really neat example of something I'm not familiar with. Going up a tree from child to parent is often the heaviest part of dealing with regular datasets, and usually requires a mix of queries and application logic. The idea of flattening the data along some pattern like that is of course always possible in a relational db, but it's not usually efficient, especially not for heavy writing. Lateral joins and window partitions can help. But this seems like an interesting approach to removing the app code completely.
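
For anyone curious what "removing the app code" looks like, here is a minimal Python sketch of how a Datalog engine might evaluate a recursive child-to-ancestor rule as a fixpoint. The rule names and the `(child, parent)` fact layout are assumptions for this example, and real engines are far more sophisticated.

```python
def ancestors(parent_facts):
    """Least fixpoint of two Datalog-style rules over (child, parent) facts:

        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    """
    parent_facts = set(parent_facts)
    ancestor = set(parent_facts)   # first rule: every parent is an ancestor
    delta = set(parent_facts)      # facts derived in the previous round
    while delta:
        # Second rule, applied only to the newly derived facts.
        derived = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in delta
            if y == y2
        }
        delta = derived - ancestor
        ancestor |= delta
    return ancestor

# Hypothetical three-level tree: c's parent is b, b's parent is a.
facts = {("c", "b"), ("b", "a")}
all_ancestors = ancestors(facts)
```

The loop stops once a round derives nothing new, so the walk up the tree needs no application-side recursion.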


I work on TypeDB (https://vaticle.com/typedb), and it sits somewhere at this intersection. The exposed query language has elements of both logic programming constructs and graph-like structures. Both amount to a kind of "constraint" programming.


A quick peek shows it seems along similar lines as TerminusDB, sorta kinda, but they have WOQL [0]. At this point I'm starting to worry again about all the different kinds and flavours of query languages that are emerging.

[0] https://en.wikipedia.org/wiki/TerminusDB#Query_language


Why would you worry? This space has been occupied by the default for so long, it's refreshing to see people experiment with what might be possible.


Agree, only concern is that whatever emerges here has conceptual clarity and doesn't get bastardized by people who haven't studied the foundations of the relational model.

I have this fear because there's a history of that with novel query languages and DB platforms tossing in network/hierarchical/"document"/object-oriented features, and creating a dog's breakfast which loses the compositional/expressive power of the relational algebra. Think MongoDB or Redis. Conceptually a big mess.

RDF itself has a history of this as well. Appeals to novelty.

Or even Google's F1, which smashes hierarchical tree-structured protobufs into a SQL DB, and so has really weird behaviour on joins and projections.

Well, whatever, you know my opinions on this stuff, I think :-)

At this point I'd settle for (or ask for) a network-available tuplestore which just receives relational-algebraic operators from a client, optimizes and executes them, and returns pure tuples, so the client side could formulate whatever query language (or API) it wanted on top of that. I started playing with building something like that between the two jobs, but never got far.
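
A minimal Python sketch of that idea, with relations as sets of tuples and three composable operators. The operator names and positional-column convention are assumptions for illustration only; there is no network layer or optimizer here.

```python
def select(rel, pred):
    """Selection: keep the tuples for which pred(tuple) is true."""
    return {t for t in rel if pred(t)}

def project(rel, *cols):
    """Projection: keep only the given column positions."""
    return {tuple(t[c] for c in cols) for t in rel}

def join(left, right, lcol, rcol):
    """Equi-join on left[lcol] == right[rcol]; output is left tuple + right tuple."""
    index = {}
    for r in right:
        index.setdefault(r[rcol], []).append(r)
    return {l + r for l in left for r in index.get(l[lcol], [])}

# Hypothetical relations: (name, dept_id) and (dept_id, dept_name).
employees = {("alice", 1), ("bob", 2)}
departments = {(1, "eng"), (2, "ops")}

# A client-side query composed purely from operators.
eng_names = project(
    select(join(employees, departments, 1, 0), lambda t: t[3] == "eng"),
    0,
)
```

Because every operator takes and returns a set of tuples, any query language on the client only has to compile down to a tree of these calls.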


I really like TypeDB! Haven't been able to use it for anything serious yet, but have a couple of projects brewing where it might fit :)


You might be interested in https://relational.ai/

Treats graph edges as binary relations ("graph normal form"), has a Datalog-ish language. Built for managing large interconnected knowledge sets in a declarative way.

I recommend this talk: https://www.youtube.com/watch?v=WRHy7M30mM4


Great project, not open source alas. This is another great talk about RelationalAI (and its precursor), highlighting how using powerful databases can simplify complex applications: https://www.hytradboi.com/2022/experience-report-building-en...


Not speaking on behalf of the company, but... It remains difficult to do open source and also pay people.


Yes, sorry, I didn't mean that to come off as "if it's not open source, it's not worthy of attention".

In the case of RelationalAI, as far as I understand there is no way to even try it out without becoming a customer? I just wish there were, because I really do like the approach!



