Large language models (LLMs) are good at learning from unstructured data. But a lot of the proprietary value that enterprises hold is locked up inside relational databases, spreadsheets, and other structured file types.
Large enterprises have long used knowledge graphs to better understand underlying relationships between data points, but these graphs are difficult to build and maintain, requiring effort on the part of developers, data engineers, and subject matter experts who know what the data actually means.
Knowledge graphs are a layer of connective tissue that sits on top of raw data stores, turning information into contextually meaningful knowledge. So in theory, they’d be a great way to help LLMs understand the meaning of corporate data sets, making it easier and more efficient for companies to find relevant data to embed into queries, and making the LLMs themselves faster and more accurate.
In June 2023, Gartner researchers wrote that data and analytics leaders must leverage the power of LLMs with the robustness of knowledge graphs for fault-tolerant AI applications. Vendors took the ball and ran with it. The first major announcement came in September 2023 from graph database company NebulaGraph, whose Graph RAG tool makes it easier for companies to use knowledge graphs as part of their retrieval-augmented generation (RAG) implementations.
With RAG, instead of sending a bare question to an LLM, a company adds context to that question by embedding relevant documents or information retrieved from a vector database. Without RAG, LLMs know only what they were trained on. With RAG, companies can add up-to-date information, or information unique to their business. For example, if an LLM is asked to provide information about a company’s product, manuals for that product and other reference materials would be extremely helpful.
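In code, the basic RAG pattern is simple: embed the documents, retrieve the ones closest to the question, and prepend them to the prompt. The sketch below uses a toy bag-of-words "embedding" and hypothetical documents purely for illustration; a production system would use a neural embedding model and a vector database.

```python
import math
from collections import Counter

# Toy "embedding": bag-of-words term counts. A real RAG system would use
# a neural embedding model and a vector database instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical product documentation standing in for proprietary data.
documents = [
    "The X200 router supports firmware updates over the admin console.",
    "Quarterly revenue figures are published in the investor portal.",
    "To reset the X200 router, hold the recessed button for ten seconds.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Prepend retrieved context to the question before sending it to an LLM."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How do I reset the X200 router?"))
```

The key point: the LLM never sees the whole corpus, only the handful of passages the retriever judged relevant to this particular question.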
Microsoft announced its GraphRAG project in February then open sourced it in July. Graph database company Neo4j also built an LLM Graph Transformer tool, which it donated to the open source LangChain project in March. In April, this tool was integrated into Google Cloud and Vertex AI as part of Google’s GraphRAG implementation.
Most recently, in early December, Amazon also announced GraphRAG support in Amazon Neptune Analytics, as part of Amazon Bedrock Knowledge Bases.
With all this activity, it’s no surprise that in November, Gartner put GraphRAG on its 2024 hype cycle for gen AI, halfway up the slope to the peak of inflated expectations. Gartner says it’ll take GraphRAG two to five years to reach maturity. By comparison, autonomous agents, located just below GraphRAG on the hype cycle, will take five to 10 years.
GraphRAG improves the accuracy, reliability, and explainability of RAG systems, Gartner says, but the downside is that integrating knowledge graphs with gen AI models can be technically complex and computationally expensive. Not to mention that knowledge graphs by themselves aren’t for the faint of heart.
“I’ve been in the data space for 20 years, and for at least half of it, people have been trying to make knowledge graphs the way to go,” says Matt Aslett, director of research, data, and analytics at ISG Research.
There are some organizations that have invested in the technology, he adds, like large media and publishing companies, or pharmaceutical firms working on drug discovery. Novartis, for example, uses a graph database to link its internal data to an external database of research abstracts. The goal is to link genes, diseases, and compounds in order to accelerate drug discovery.
And Intuit has built its security knowledge platform on a knowledge graph using Neo4j’s technology, with 75 million database updates fed into the graph per hour. But most enterprises aren’t using knowledge graphs, says Aslett. Companies that need to bring data together typically run one-off data integration projects instead.
“If you’ve gone through the knowledge graph process, then it’d make sense to have that information available to your AI project as well,” he adds. “But if you haven’t yet, then you’ve got this whole big project to go through first to get your information into a knowledge graph.”
In the past, that would be a daunting proposition. But now gen AI is being used to help create these knowledge graphs, accelerating the virtuous cycle that turns corporate data into actionable insights, and improving LLM accuracy while reducing cost and latency.
A demand for better supply
A knowledge graph can be built into a database, sit on top of a database, link multiple databases together, and even pull in information from other sources, all without changing the underlying data structures.
In traditional relational databases, the relationships between data points are part of the structure of the database itself, and are typically limited to key pieces of information. For example, customer records might be linked to individual transactions by having a common customer identification number. And those transactions could, in turn, be linked to a product database via a common product ID.
But figuring out that a particular group of customers all have the same preferences is a bit trickier, and things get even more complex when the relationships are more subtle.
Making all these relationships explicit via a knowledge graph makes it easier to extract all the relevant information when it comes time to provide an LLM with the context it needs to answer a question, producing more accurate results.
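The retrieval step then becomes a graph traversal rather than a similarity search. The sketch below, with a hand-built toy graph and illustrative entity and relation names, shows the idea: start at the entity a question is about and collect the facts within a couple of hops to feed the LLM. A production system would run an equivalent query against a graph database such as Neo4j.

```python
# Toy knowledge graph as an adjacency list of (relation, target) pairs.
# Entities and relations here are illustrative, not from any real schema.
graph = {
    "Customer:Ada": [("PURCHASED", "Product:X200"), ("SEGMENT", "Segment:Prosumer")],
    "Customer:Bo": [("PURCHASED", "Product:X200"), ("SEGMENT", "Segment:Prosumer")],
    "Product:X200": [("CATEGORY", "Category:Routers")],
    "Segment:Prosumer": [("PREFERS", "Category:Routers")],
}

def neighborhood(entity: str, depth: int = 2) -> set[tuple[str, str, str]]:
    """Collect (subject, relation, object) triples within `depth` hops of an entity."""
    triples: set[tuple[str, str, str]] = set()
    frontier = {entity}
    for _ in range(depth):
        nxt = set()
        for node in frontier:
            for rel, target in graph.get(node, []):
                if (node, rel, target) not in triples:
                    triples.add((node, rel, target))
                    nxt.add(target)
        frontier = nxt
    return triples

# Facts to embed in an LLM prompt about one specific customer:
for s, r, o in sorted(neighborhood("Customer:Ada")):
    print(s, r, o)
```

Because the connections are explicit, the traversal pulls in indirectly related facts (the product’s category, the segment’s preferences) that a pure vector similarity search over raw records could easily miss.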
Enterprises typically use RAG embeddings to augment LLM queries with their proprietary knowledge, but experts estimate that accuracy typically tops out at around 70%.
“Approaches like traditional retrieval augmented generation often can’t achieve greater than 80% accuracy,” says Daniel Bukowski, CTO at Data2, a software startup working on the accuracy problem. “While this might be adequate for some uses, many industries and situations require at or near 99%.”
LLMs are optimized for unstructured data, adds Sudhir Hasbe, COO at Neo4j. “But a lot of enterprise data is structured, too. So how do you bring that structured and unstructured data together to answer questions? You want to be able to go and get the answer, and, more importantly, explain why you got the answer.”
Knowledge graphs reduce hallucinations, he says, but they also help solve the explainability challenge. Knowledge graphs sit on top of traditional databases, providing a layer of connection and deeper understanding, says Anant Adya, EVP at Infosys. “You can do better contextual search,” he says. “And it helps you drive better insights.”
Infosys is now running proof of concepts to use knowledge graphs to combine the knowledge the company has gathered over many years with gen AI tools. “We’re identifying those use cases where they can make a bigger impact,” he says. They include automated knowledge extraction, budgeting, procurement, and enterprise planning. “But it’s very early,” he adds. “It’s still not in production.”
One company that’s deployed a knowledge graph to improve gen AI performance, and written about it publicly, is LinkedIn. In a paper published in April, LinkedIn reported that combining RAG with a knowledge graph helped it improve the accuracy of a customer service gen AI application by 78%. And over the preceding six months, the combination was used by LinkedIn’s customer service team, reducing the median per-issue resolution time by 29%.
Reducing cost and latency
When gen AI functionality is added to enterprise workflows, queries are typically augmented with relevant information, often pulled from a vector database. And the more information added to the query, the more context the LLM has to produce a response.
“But the more context and documents I provide, the RAG gets bigger and bigger,” says Vamsi Duvvuri, technology, media, and entertainment, and telecommunications AI leader at EY. “And my system gets slower and slower.” Plus, gen AI vendors often charge by the token; the more information their models process, the higher the costs.
According to an April Microsoft research paper, GraphRAG required up to 97% fewer tokens while still providing more comprehensive answers than standard RAG.
When a knowledge graph is used as part of the RAG infrastructure, explicit connections can be used to quickly zero in on the most relevant information. “It becomes very efficient,” says Duvvuri. And companies are taking advantage of this, he says. “The hard question is how many of those solutions are seen in production, which is quite rare. But that’s true of a lot of gen AI applications.”
Putting LLMs to use
The challenge with knowledge graphs is that they take real expertise to create. That’s particularly true for large, complex data sets, exactly the ones where knowledge graphs are most needed. Much of the hard work of building a knowledge graph lies in constructing the ontology: defining terms, deciding on classifications, and figuring out that two disparate pieces of data are somehow related. “And this is something that gen AI can be good at,” says ISG’s Aslett. Some vendors are already trying to offer this capability, he says, but the tools are still in the early stages of development.
Before gen AI, companies used to try to create knowledge graphs with machine learning. “We used to use natural language processing to create knowledge graphs, using named entity recognition and creating relationships using co-occurrence,” says Duvvuri. “The creation was very time-consuming because the NLP pipeline had to be trained. It was a high-effort way to get to it.”
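The co-occurrence step Duvvuri describes can be sketched in a few lines. This toy version assumes the entities have already been tagged (real pipelines used a trained NER model for that, which was the expensive part), then links any two entities appearing in the same sentence; the entity list and corpus below are illustrative.

```python
import itertools
from collections import Counter

# In a real pipeline a trained NER model would produce this set;
# here we assume the entities are already known.
known_entities = {"Novartis", "Neo4j", "LinkedIn", "GraphRAG"}

def cooccurrence_edges(sentences: list[str]) -> Counter:
    """Link any two known entities that appear in the same sentence.

    Edge weight = number of sentences the pair shares."""
    edges: Counter = Counter()
    for sentence in sentences:
        found = sorted(e for e in known_entities if e in sentence)
        for a, b in itertools.combinations(found, 2):
            edges[(a, b)] += 1
    return edges

# Illustrative mini-corpus:
corpus = [
    "Novartis built its research graph on Neo4j.",
    "LinkedIn combined GraphRAG-style retrieval with its support workflow.",
    "Neo4j contributed tooling that GraphRAG implementations build on.",
]
print(cooccurrence_edges(corpus))
```

The weakness of the approach is visible even at this scale: co-occurrence says two entities are related but not how, which is why the relation-labeling work fell to humans or to brittle rule sets.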
Today, LLMs significantly reduce the time required to create knowledge graphs.
“I’ve personally created knowledge graphs using large language models,” he says. “It’s a great way to extract relationships. The power of a knowledge graph is accelerated by using a large language model, and adding a knowledge graph to an LLM accelerates its performance and improves the cost as well.”
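The LLM-based approach typically works by prompting the model to emit structured triples and parsing its output into graph edges. The sketch below shows the parsing side; the prompt template and the canned model response are hypothetical stand-ins, since a real pipeline would call an LLM API (for example via LangChain’s LLM Graph Transformer mentioned above).

```python
# Hypothetical prompt asking a model for pipe-delimited triples.
PROMPT_TEMPLATE = (
    "Extract entity relationships from the text below as lines of the form\n"
    "subject | relation | object\n\nText: {text}"
)

def parse_triples(llm_output: str) -> list[tuple[str, str, str]]:
    """Parse 'subject | relation | object' lines into graph-ready triples,
    skipping malformed or empty lines."""
    triples = []
    for line in llm_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples

# Stand-in for a model's response to the prompt above (illustrative only):
canned_response = """\
Novartis | USES | graph database
graph database | LINKS | research abstracts
"""
print(parse_triples(canned_response))
```

In practice, the parsed triples are then loaded into a graph database, with humans reviewing the model’s proposed entities and relations before they become part of the ontology.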
Pierre Liang, professor of accounting at Carnegie Mellon University’s Tepper School of Business, says gen AI has an uncanny ability to generate knowledge that wasn’t possible before. “I’ve seen examples of this in my lab,” he says. “The opportunities for corporations in using LLMs to help us generate and use knowledge graphs are very promising.”