Retrieval-Augmented Generation (RAG)

RAG is a specialized AI technique that enhances the performance and accuracy of language models by retrieving relevant context from connected knowledge sources before generating a response. Instead of relying entirely on pre-trained model knowledge, RAG dynamically pulls up-to-date, grounded information from documents, databases, or APIs, ensuring that responses are accurate, relevant, and actionable.

RAG blocks are most useful in high-context or high-precision environments, where relying solely on a model’s memory may lead to hallucinations or incorrect assumptions.

🔑 Key Concepts

RAG Modules are built around the idea of combining search-based retrieval with language generation. They allow LLMs to consult external knowledge in real time, improving factual accuracy and contextual relevance.

RAG is particularly effective for:

  • Structured knowledge bases (e.g., PDF process manuals, FAQs, documentation)
  • Support automation
  • Enterprise Q&A
  • Decision-making systems
  • Document summarization

How RAG Works

  • Triggered Contextual Query: When a query is received, the RAG module retrieves supporting content from a connected knowledge source (e.g., PDF, Notion, Slack, HubSpot).

  • Retrieval Stack: The retrieval logic uses vector search (semantic similarity) and keyword-based search (e.g., BM25) to find relevant data; a conceptual sketch follows this list.

  • Reranking: Retrieved items are optionally reranked for contextual relevance, ensuring the best content is prioritized.

  • Synthesis: The reranked results are passed into an LLM to synthesize a final response grounded in the retrieved material.
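
The exact retrieval stack is internal to INTELLITHING, but as a rough illustration, a hybrid retriever can fuse keyword and semantic relevance scores before handing results to the reranker. The sketch below is purely conceptual: bm25_scores, embed, and cosine_similarity are hypothetical helpers, not part of the product's API.

# Illustrative only: fuse keyword (BM25-style) and semantic (vector) relevance.
# bm25_scores, embed, and cosine_similarity are hypothetical helpers.
def hybrid_retrieve(query, documents, top_k=5, alpha=0.5):
    keyword_scores = bm25_scores(query, documents)        # lexical relevance per document
    query_vec = embed(query)
    semantic_scores = [cosine_similarity(query_vec, embed(doc)) for doc in documents]

    # Weighted fusion of the two signals, then keep the top_k documents
    scored = [
        (doc, alpha * kw + (1 - alpha) * sem)
        for doc, kw, sem in zip(documents, keyword_scores, semantic_scores)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored[:top_k]]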

Standard Node Pipeline in RAG Blocks

RAG modules follow a consistent node-based structure within workflows:

  1. Ingest Configuration – Initializes the module with access credentials and knowledge source metadata.
  2. Retrieve Nodes – Finds documents or content relevant to the input query.
  3. Rerank Nodes – Optimizes result ordering for contextual fit.
  4. Synthesize Nodes – Uses an LLM to generate a response or insight based on the retrieved content (a compact sketch follows this list).
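
Conceptually, the four nodes form a single data flow from configuration to answer. The compact sketch below is illustrative only (the function names are placeholders); the detailed per-step pseudocode appears under "Behind the Scenes" further down.

# Conceptual data flow through a RAG block's nodes (names are illustrative).
def run_rag_block(query):
    index = ingest_configuration()       # 1. load credentials and the knowledge source
    nodes = retrieve(index, query)       # 2. find content relevant to the query
    top_nodes = rerank(nodes, query)     # 3. reorder by contextual fit (optional)
    return synthesize(top_nodes, query)  # 4. generate a grounded response with an LLM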

RAG modules often work in combination with Data Connectors, especially when content is dynamic or sourced from tools like Slack, Notion, GitHub, or HubSpot.

Why Use RAG Modules

  • Reduce hallucination and improve model trustworthiness.
  • Maintain access to real-time or external information not stored within the model.
  • Enable context-aware conversations and intelligent decision support.
  • Ground outputs in auditable source material (e.g., PDF pages, Notion blocks, Slack messages); a citation sketch follows this list.
  • Enhance retrieval precision with BM25, ColBERT, or vector embeddings.
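
Because each retrieved node can carry source metadata, the synthesized answer can point back to the exact material it was grounded in. The sketch below is a conceptual illustration and assumes a simple dict-based node structure, not INTELLITHING's internal representation.

# Illustrative only: attach source citations to a synthesized answer.
# Assumes each node is a dict with a "metadata" entry such as
# {"source": "returns_manual.pdf", "page": 12}.
def build_cited_answer(answer_text, retrieved_nodes):
    citations = []
    for node in retrieved_nodes:
        meta = node.get("metadata", {})
        citations.append(f'{meta.get("source", "unknown")} (p. {meta.get("page", "?")})')
    return answer_text + "\n\nSources: " + "; ".join(citations)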

⚙️ Configuration

  1. Head to the block editor and drop a module from the RAG section (e.g., Knowledge Base, Notion, Slack, etc.).
  2. Click on the block to configure its access credentials and knowledge source (e.g., file uploads, page IDs, tokens); an illustrative configuration sketch follows this list.

    ℹ️ Each RAG module has a specific configuration pattern. Refer to individual block documentation for details.

  3. Save the block.
  4. View or modify the auto-generated workflow in the workflow editor.
  5. Optionally, use bridges to connect RAG outputs to LLM blocks or analytics components.
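
The exact fields vary by module, so treat the following purely as an illustration of the kind of information a RAG block configuration captures; it is not the product's actual schema.

# Hypothetical example of the information a RAG block configuration holds;
# the real fields are defined per module in the block editor.
rag_block_config = {
    "source_type": "knowledge_base",               # e.g., knowledge_base, notion, slack
    "credentials": {"api_token": "<your-token>"},  # access credentials
    "knowledge_source": {"file_paths": ["manuals/returns_policy.pdf"]},
    "retrieval": {"top_k": 5, "use_reranker": True},
}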

For more on building and customizing workflows, see the Workflow Documentation.

🔍 Behind the Scenes: How RAG Operates

INTELLITHING’s RAG modules follow a well-structured, multi-step process designed for efficiency, modularity, and precision. Here’s a simplified view of the internal mechanics, represented as annotated pseudocode.

Step 1: Ingest Configuration

# Load configuration (tokens, paths, database references, etc.)
config = load_config()

# Initialize vector index or document store if available
if storage.exists():
    index = load_vector_index_from_storage()
else:
    documents = load_documents_from_directory()
    index = create_index_from_documents(documents)
    persist_index(index)

Step 2: Retrieve Relevant Nodes

# Capture the user query
query = get_query_from_context_or_event()

# Create retriever
retriever = index.as_retriever(top_k=5)

# Retrieve documents/nodes relevant to the query
retrieved_nodes = retriever.retrieve(query)

Step 3: Rerank Results (Optional)

# Use LLM-based reranker to sort results by semantic and contextual relevance
reranker = LLMReranker(top_n=3)
reranked_nodes = reranker.rerank(retrieved_nodes, query)

Step 4: Synthesize Response

# Combine top reranked nodes into a coherent answer
synthesizer = CompactAndRefine()
response = synthesizer.summarize(query=query, nodes=reranked_nodes)

# Return the final response to the workflow
return response

What This Means for You

  • Storage-Backed Indexing: If data was indexed previously, it loads instantly—saving compute and time.
  • Retriever Logic: Uses semantic search (e.g., vector embeddings) to find relevant context.
  • Reranking Engine: Uses a small LLM internally to boost the relevance of the retrieved data (a conceptual sketch follows this list).
  • Response Synthesizer: Carefully combines retrieved knowledge into a clear, user-ready output.
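
The reranker’s internals aren’t exposed, but conceptually an LLM-based reranker asks a model to score each retrieved node against the query and keeps only the highest-scoring ones. The sketch below is a conceptual illustration: llm_score_relevance is a hypothetical helper returning a 0–1 relevance score, and node.text is an assumed attribute.

# Conceptual sketch of LLM-based reranking; llm_score_relevance is hypothetical.
def llm_rerank(retrieved_nodes, query, top_n=3):
    scored = [
        (node, llm_score_relevance(query, node.text))    # LLM rates each node's relevance
        for node in retrieved_nodes
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most relevant first
    return [node for node, _ in scored[:top_n]]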

This modular design ensures that every RAG block in INTELLITHING is extensible, explainable, and robust—whether you're grounding answers in product docs, customer tickets, or process manuals.