Why Traditional RAG Search is Inadequate for Security
Here is how traditional RAG search works (Google search and Perplexity essentially work this way): you take a large set of documents (the entire internet), chunk them, convert the chunks into embeddings, and store them in a vector database. When the user searches, you retrieve the n closest matching chunks from the embedding index. There are several other flavors of RAG, such as agentic RAG and graph RAG, but they all share the same basic architecture.
Traditional RAG Arch
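The pipeline described above can be sketched in a few dozen lines. This is a minimal toy, assuming a hash-based bag-of-words embedding in place of a real embedding model and an in-memory list in place of a real vector database; the names (`chunk`, `VectorStore`, etc.) are illustrative, not any particular library's API. It also shows the crux of the sourcing problem: official and unofficial text land in the same index, and retrieval ranks purely by similarity.

```python
import hashlib
import math

def chunk(text, size=50):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text, dim=64):
    """Toy bag-of-words hash embedding (a stand-in for a real embedding model)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Dot product of two unit-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

class VectorStore:
    """In-memory stand-in for a vector database."""
    def __init__(self):
        self.items = []  # list of (embedding, chunk_text)

    def add(self, text):
        self.items.append((embed(text), text))

    def search(self, query, n=3):
        """Return the n chunks closest to the query embedding."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [text for _, text in ranked[:n]]

# Index documents from mixed, unvetted sources -- official text and a
# random blog post go into the same store with no provenance attached:
store = VectorStore()
for doc in ["PCI DSS requires strong passwords with twelve characters minimum",
            "A blog post claims eight characters is enough for PCI compliance"]:
    for c in chunk(doc):
        store.add(c)

results = store.search("PCI password length requirement", n=2)
```

Note that nothing in `search` can tell the official chunk from the blog chunk; whichever embeds closer to the query wins, which is exactly why verification against the source document falls back on the user.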
Problems with traditional RAG search
Sourcing
Unlike consumer search use cases, security and compliance work depends heavily on accuracy and sourcing. Acting on the wrong sources is highly consequential. In RAG search, you have little control over which documents RAG selects at inference time. Our compliance customers have complained again and again that their workflow is not complete until they take the results from a Google or Perplexity search and cmd+F through the official document to verify their accuracy.
For example, take a simple search for PCI password requirements. Since none of the answers come from official PCI documentation, a compliance engineer has to spend additional time searching the official documentation to verify whether those answers are correct.
Accuracy
Hand in hand with sourcing come accuracy issues. If the user query is well known and well represented in the internet corpus, then traditional RAG methods are fairly accurate, since the answer is sourced from multiple internet URLs.
However, accuracy falls apart exactly when it is needed most: when the question is not a well-known one. We will lay out the accuracy issues as we discuss the alternate architecture we use at Transilience AI.
Alternate Architecture
The alternate architecture we propose mirrors how a cybersecurity consultant approaches the problem. When asked a question, the consultant surgically reads the relevant documents (or already has knowledge of the official documents and answers) and answers only from the official, authoritative documents. A good consultant will also give you the control numbers and page numbers for the answer.
Transilience RAG Architecture
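The consultant behavior described above can be sketched as retrieval constrained to an official corpus whose passages carry citation metadata. This is an illustrative sketch, not the actual Transilience implementation: the corpus entries, the word-overlap scoring, and the page numbers are all placeholder assumptions (PCI DSS v4.0 control 8.3.6 does set a 12-character minimum, but the page numbers here are invented for the example).

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    source: str   # official document name
    control: str  # control number within that document
    page: int     # page number (illustrative values below)

# Hypothetical corpus limited to official, authoritative documents only.
OFFICIAL_CORPUS = [
    Passage("Passwords must contain a minimum of 12 characters.",
            source="PCI DSS v4.0", control="8.3.6", page=182),
    Passage("Accounts are locked out after no more than 10 invalid attempts.",
            source="PCI DSS v4.0", control="8.3.4", page=180),
]

def tokens(s):
    """Lowercase word set with basic punctuation stripped."""
    return set(s.lower().replace(".", " ").replace(",", " ").split())

def consultant_answer(query, corpus):
    """Return the best-matching official passage *with* its citation,
    or refuse to answer rather than fall back to unofficial sources."""
    q_words = tokens(query)
    best, best_score = None, 0
    for p in corpus:
        score = len(q_words & tokens(p.text))
        if score > best_score:
            best, best_score = p, score
    if best is None:
        return "No answer found in the official documents."
    return f"{best.text} [{best.source}, control {best.control}, p. {best.page}]"

answer = consultant_answer("minimum password characters", OFFICIAL_CORPUS)
```

The two design points the sketch captures are that every answer carries its control and page citation, and that a query with no match in the official corpus gets a refusal instead of a plausible-sounding answer from elsewhere.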
Let's test the accuracy of both approaches on a couple of deeper questions.
Example 1 - PCI Assessment Findings
Here is the official PCI documentation on assessment finding types. There are four possible assessment findings.
PCI Official answer
Perplexity answer - Perplexity gives a fifth possible type that is not an option
The Transilience cyber consultant answer gets it right.
Transilience Cyber Consultant Answer
Example 2 - RBI
Let's take the RBI requirements for co-operative banks. Here is a snippet from the official document.
Official snippet
Here is the Perplexity answer
Perplexity answer - inaccurate
Here is the answer from the Transilience cybersecurity consultant, with references -
Answer from Transilience Cyber Consultant.
We are introducing this architecture as a beta feature in the Transilience cybersecurity consultant application. Just like our other apps (vulnerability and threat intelligence) that power our backend, we are offering it as a free app for cyber professionals to use.
For commercial use, or to use it on your own custom documentation, please contact hello@transilience.ai
Transilience AI backend team
Smritika Sadhukhan, Venkat Pothamsetty
Frontend team
Muzaffar Hossain, Garima Sadhnani