Knowledge graphs beyond the hype: Getting knowledge in and out of graphs and databases

Knowledge graphs are hyped. We can officially say this now, since Gartner included knowledge graphs in the 2018 hype cycle for emerging technologies. Not that we had to wait for Gartner: declaring this the “Year of the Graph” was our opener for 2018. Like anyone active in the field, we see both the opportunity and the threat in this: with hype comes confusion.

Knowledge graphs are real. They have been for the last 20 years at least. Knowledge graphs, in their original definition and incarnation, have been about knowledge representation and reasoning. Things such as controlled vocabularies, taxonomies, schemas, and ontologies have all been part of this, built on a Semantic Web foundation of standards and practices.

So, what’s changed? How come the likes of Airbnb, Amazon, Google, LinkedIn, Uber, and Zalando sport knowledge graphs in their core business? How come Amazon and Microsoft joined the crowd of graph database vendors with their latest products? And how can you make this work?

Knowledge graphs before they were cool

Knowledge graphs sound cool and all. But what are they, exactly? It may sound like a naive question, but actually getting definitions right is how you build a knowledge graph. From taxonomies to ontologies — essentially, schemas and rules of varying complexity — that’s how people have been doing it for years.

RDF, the standard used to encode these schemas, has a graph structure. So, calling knowledge encoded on top of a graph structure a “knowledge graph” sounds natural. And the people doing this, the data modelers, have been called knowledge engineers, or ontologists.
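To make the "graph structure" point concrete, here is a minimal sketch in plain Python, not a real RDF library. RDF statements are triples of subject, predicate, and object; the `ex:` names below are hypothetical identifiers invented for illustration, and `to_graph` is a helper made up for this sketch.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); all ex: names are illustrative.
triples = [
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:worksFor", "ex:Acme"),
    ("ex:Acme", "rdf:type", "ex:Company"),
]

def to_graph(triples):
    """Group triples by subject: each triple is a labeled edge subject -> object."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)

graph = to_graph(triples)
for node, edges in graph.items():
    for predicate, target in edges:
        print(f"{node} --{predicate}--> {target}")
```

Every triple is one edge, so a set of triples is already a directed, labeled graph. Schema statements (such as which class something belongs to) live in the same graph as the data.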

There can be many applications for these knowledge graphs — from cataloguing items, to data integration and publishing on the web, to complex reasoning. For some of the most prominent ones, you can look at schema.org, Airbnb, Amazon, Diffbot, Google, LinkedIn, Uber, and Zalando. This is why people seasoned in knowledge graphs sneer at the hype.

[Figure: Gartner's 2018 hype cycle for emerging technologies]

Like any data modeling, this is hard and complicated work. It must take into account many stakeholders and views of the world, manage provenance and schema drift, and so on. Add reasoning and web scale to the mix, and things easily get out of hand, which may explain why, up until recently, this approach was not the most popular in the real world.

Going schema-less, on the other hand, has been and still is popular. Going schema-less can get you started quickly; it’s simpler and more flexible, at least up to a certain point. The simplicity of not using a schema can be deceiving though. Because, in the end, whatever your domain, a schema will exist. Schema-on-read? Fine. But no schema at all?

You may not know your schema well enough a priori. It may be complex, and it may evolve. But it will exist. So, ignoring or downplaying schema does not solve any problem; it only makes things worse. Issues will lurk, and cost you time and money, as they hamper the developers and analysts who try to build applications and derive insights on top of a fuzzy blob of data.

The point then is not to throw schema away, but to make it functional, flexible, and interchangeable. RDF is pretty good at this, as it also underlies standardized formats for data exchange, such as JSON-LD. RDF can also be used for lightweight schema and schema-less approaches, and data integration, by the way.
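As a rough illustration of RDF-style data traveling as ordinary JSON, here is a small sketch using only Python's standard `json` module. The document uses the schema.org vocabulary; the `example.org` identifiers are made up, and `naive_triples` is a hypothetical helper that only handles this flat shape. It is not a conforming JSON-LD processor.

```python
import json

# A small JSON-LD document using schema.org terms; the @id values are made up.
doc = {
    "@context": "https://schema.org",
    "@id": "https://example.org/people/alice",
    "@type": "Person",
    "name": "Alice",
    "worksFor": {"@id": "https://example.org/orgs/acme"},
}

def naive_triples(node):
    """Flatten one top-level node into (subject, predicate, object) tuples.

    Assumption: a single flat node whose nested values are plain literals
    or {"@id": ...} references. Real JSON-LD needs a proper processor.
    """
    subject = node["@id"]
    out = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        predicate = "rdf:type" if key == "@type" else key
        obj = value["@id"] if isinstance(value, dict) else value
        out.append((subject, predicate, obj))
    return out

serialized = json.dumps(doc, indent=2)           # exchange format: plain JSON
triples = naive_triples(json.loads(serialized))  # graph view of the same data
for t in triples:
    print(t)
```

The same payload is readable both as regular JSON by applications that ignore the `@` keys and as graph statements by tools that do not.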

Getting knowledge into or out of graphs

So, what’s with the hype? How can a 20-year-old technology be on the emerging slope of the infamous hype cycle? Hype is real, too, as is the reason for this. It’s the same story as the meteoric rise of the AI hype: It’s not so much that things have changed in the approach, it’s more that the data and compute power are there now to make it work at scale.

Plus, the AI itself helps. Or, to be more precise, the kind of bottom-up, machine learning-based AI that gets the hype these days. Knowledge graphs essentially are AI, too. Just another kind. Not some hyped-up-to-now AI, but the symbolic, top-down, rule-based kind. The hitherto unpopular kind.

It’s not that this approach does not have its limitations. It’s hard to encode knowledge about complex domains in a functional way, and to reason about it at scale. So, the machine learning way of doing things, just like the schema-less way, got popular. And for good reasons, too.

With the big data explosion, and the rise of NoSQL, something else started happening, too. Tools and databases for non-RDF graphs appeared in the market, and started finding success. These graphs, known as labeled property graphs (LPG), are simpler and less verbose. They either lack schema, or have only basic schema capabilities compared to RDF.
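The "less verbose" claim can be sketched in plain Python under some loud assumptions. The `Node` and `Edge` classes, the `lpg_to_triples` helper, and the way an edge with properties is turned into an extra statement node are all invented for illustration; this is one of several possible encodings, not how any particular database does it.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    label: str
    props: dict = field(default_factory=dict)

@dataclass
class Edge:
    src: str
    dst: str
    label: str
    props: dict = field(default_factory=dict)

def lpg_to_triples(nodes, edges):
    """Spell out a property graph as triples (illustrative encoding only)."""
    triples = []
    for n in nodes:
        triples.append((n.id, "rdf:type", n.label))
        for key, value in n.props.items():
            triples.append((n.id, key, value))
    for i, e in enumerate(edges):
        if not e.props:
            # A plain edge maps to a single triple.
            triples.append((e.src, e.label, e.dst))
            continue
        # An edge carrying properties gets its own node to hang them on.
        stmt = f"_:edge{i}"
        triples.append((stmt, "from", e.src))
        triples.append((stmt, "to", e.dst))
        triples.append((stmt, "rdf:type", e.label))
        for key, value in e.props.items():
            triples.append((stmt, key, value))
    return triples

nodes = [
    Node("alice", "Person", {"name": "Alice"}),
    Node("acme", "Company", {"name": "Acme"}),
]
edges = [Edge("alice", "acme", "WORKS_FOR", {"since": 2017})]

triples = lpg_to_triples(nodes, edges)
print(len(nodes) + len(edges), "LPG elements ->", len(triples), "triples")
```

In this toy case, three property graph elements expand into eight triples. The property graph keeps attributes inside nodes and edges, while the triple form states each one separately.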

And they typically perform better for operational applications, graph algorithms, or graph analytics. Lately, graphs are starting to be used for machine learning, too. These are all very useful things.

Algorithms, analytics and machine learning can provide insights about graphs, with some common use cases being fraud detection or recommendations. You could therefore say that such techniques and applications get knowledge out of graphs, bottom-up. RDF graphs on the other hand get knowledge into graphs, top-down.
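The two directions can be contrasted in a short, simplified Python sketch. The data and the `ex:` names are hypothetical. `infer_types` applies a single subclass rule, standing in for top-down reasoning, and `in_degree` is a trivial stand-in for the graph analytics used in cases such as fraud detection.

```python
from collections import Counter

# Hypothetical facts; all ex: identifiers are made up for illustration.
triples = {
    ("ex:Alice", "rdf:type", "ex:Employee"),
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
    ("ex:Alice", "ex:paid", "ex:Shop1"),
    ("ex:Bob", "ex:paid", "ex:Shop1"),
    ("ex:Carol", "ex:paid", "ex:Shop1"),
}

def infer_types(triples):
    """Top-down: if x is a C and C is a subclass of D, then x is a D.

    The rule is reapplied until no new facts appear.
    """
    facts = set(triples)
    while True:
        new = {
            (x, "rdf:type", d)
            for (x, p1, c) in facts if p1 == "rdf:type"
            for (c2, p2, d) in facts if p2 == "rdfs:subClassOf" and c2 == c
        } - facts
        if not new:
            return facts
        facts |= new

def in_degree(triples, predicate):
    """Bottom-up: count incoming edges of one kind per target node."""
    return Counter(o for (_, p, o) in triples if p == predicate)

inferred = infer_types(triples)
print(("ex:Alice", "rdf:type", "ex:Person") in inferred)
print(in_degree(triples, "ex:paid").most_common(1))
```

The first function adds a fact that follows from the schema but was never stated. The second surfaces a pattern in the data, here a node receiving unusually many payments, that no one encoded as a rule.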

So, are bottom-up graphs knowledge graphs, too?

As a knowledge engineer would say, it’s a matter of semantics. It’s tempting to ride the knowledge graph hype. But in the end, a lack of clarity serves no one. Graph algorithms, graph analytics, and graph-based machine learning and insights are all good, accurate terms. And they are not mutually exclusive with “traditional” knowledge graphs either.

All the prominent use cases we mentioned earlier are based on a combination of approaches. Having a knowledge graph and populating it using machine learning for example has helped build the biggest knowledge graph ever — at least in terms of instances, if not entities. And it’s what AI pioneers like DeepMind are researching, as well.

Some things old, some things new, and some things borrowed for graph databases

As usual, the choice of approach and tool to use for your graph depends on your use case. This also applies to graph databases, which we have been closely monitoring as they evolve, with new vendors and capabilities being added rapidly.

Last week at Strata, both the winner and the runner-up for the Most Disruptive Startup award were graph databases: TigerGraph and Memgraph. In case you needed more proof of how rapidly progress is being made in the field, there you have it. Both startups are no more than a couple of years old, by the way.

For TigerGraph, which came out of stealth in September 2017, this has been a very active year. Today, TigerGraph is announcing a new release. And it’s got some things old, some things new, and some things borrowed — though we could not really spot anything blue.