In this post, I will first define LlamaIndex and RAG (Retrieval-Augmented Generation), then describe the basic architecture of the pipeline I am building, and finally implement the concept in Python.
LlamaIndex#
LlamaIndex is a powerful data framework designed to connect custom data sources (text, HTML, PDF, etc.) to Large Language Models.
It acts as an interface that manages the interaction with LLMs: it loads data from a source and creates an index from that data, which is then used to respond to user queries.
RAG (Retrieval-Augmented Generation)#
LLMs are trained on publicly available data but have no access to your private data, meaning they were never trained on it. RAG (Retrieval-Augmented Generation) bridges this gap by supplying your private data to the LLM, so you can build LLM applications on top of your own data (both structured and unstructured).
In short, RAG is a technique for querying over both structured and unstructured documents using a large language model (LLM).
Architecture#
Here, I will explain the architecture of a basic RAG (Retrieval-Augmented Generation) pipeline designed for summarization and Q&A tasks using query engines, shown in the architecture diagram below.
Here’s an explanation of the components and the flow:
Document#
The input data source, which could be one or more PDF documents. The pipeline processes these documents to build indexes.
Vector Index#
An index created from the document for Q&A tasks. It enables fast similarity searches using vector embeddings of the document content.
Connected to the Q&A Query Engine for retrieving context-relevant information.
Summary Index#
An index generated specifically for summarization tasks. It processes and organizes the document content to facilitate quick and accurate summarization.
Q&A Query Engine#
This query engine interacts with the Vector Index to answer specific questions by retrieving the most relevant document sections.
Summarization Query Engine#
This query engine interacts with the Summary Index to provide concise and coherent summaries of the document content.
Router (RouterQueryEngine)#
The central component that receives the query from the user. It dynamically selects the appropriate query engine (Q&A or Summarization) based on the nature of the query, and returns the response from the chosen engine back to the user.
Query and Response#
Query#
The input from the user, which could be a question or a request for summarization.
Response#
The processed output generated by the selected query engine.
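Before wiring this up with LlamaIndex, the flow can be sketched in plain Python. This is an illustrative sketch with made-up names, not the LlamaIndex API: a router receives the query, a selector decides which engine fits the intent, and the chosen engine produces the response.

```python
# Illustrative sketch of the router flow (hypothetical names, not the LlamaIndex API)
class Router:
    def __init__(self, engines):
        # engines is a mapping such as {"summary": ..., "qa": ...}
        self.engines = engines

    def query(self, question: str) -> str:
        # In the real pipeline an LLM-based selector makes this decision;
        # here a naive keyword check stands in for it.
        name = "summary" if "summar" in question.lower() else "qa"
        return self.engines[name].query(question)
```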
Step-by-Step Implementation#
Here, I will write the code that implements the concept discussed above.
Step 1: Load Environment Variables#
The first step is to install the necessary dependencies. Here, I am using pip to install them: pip install llama-index llama-index-core python-dotenv. Then python-dotenv is used to load environment variables (such as API keys) from a .env file.
```python
from dotenv import load_dotenv

load_dotenv()
```
Here, from the .env file we load OPENAI_API_KEY=your-key-goes-here to connect to the OpenAI LLM.
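If you want to confirm the key was actually picked up before going further, a quick optional check like this avoids a confusing authentication error later:

```python
import os

from dotenv import load_dotenv

load_dotenv()

# Fail fast if the key is missing from the .env file or the environment
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
```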
Step 2: Load Documents#
Use SimpleDirectoryReader to load documents from a directory. Here, we specify a folder named pdf/ where the documents are stored. I have downloaded The Google PageRank Algorithm PDF and placed it inside the pdf/ folder.
```python
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("pdf/").load_data()
```
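As a quick sanity check (an optional snippet, assuming the pdf/ folder contains the downloaded paper), you can confirm what was loaded before indexing:

```python
# Optional: confirm how many document objects were loaded and where they came from
print(f"Loaded {len(documents)} document object(s)")
print(documents[0].metadata)  # typically includes file_name and, for PDFs, page_label
```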
Step 3: Split Documents into Chunks#
Large documents are split into smaller, manageable chunks (or "nodes") using SentenceSplitter. This improves retrieval efficiency and accuracy, since smaller chunks fit within the model's context window and can be matched more precisely to a query.
```python
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
```
Here, each node contains up to 1024 tokens, balancing granularity and processing time.
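To get a feel for how the splitter chunked the paper, an optional inspection snippet might look like this:

```python
# Optional: inspect the chunking result
print(f"Split {len(documents)} document object(s) into {len(nodes)} nodes")
print(nodes[0].get_content()[:200])  # preview the first chunk
```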
Step 4: Configure the LLM and Embedding Models#
Here, I configure OpenAI's gpt-3.5-turbo as the primary language model and text-embedding-ada-002 for generating embeddings. These settings are applied globally using the Settings object.
```python
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
```
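If you want to verify the embedding model is reachable with your API key, an optional check is to embed a short test string; text-embedding-ada-002 should return a 1536-dimensional vector:

```python
# Optional: verify the embedding model works before building indices
test_embedding = Settings.embed_model.get_text_embedding("PageRank test sentence")
print(len(test_embedding))  # expected: 1536 for text-embedding-ada-002
```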
Step 5: Build Indices#
Two indices are created:
- SummaryIndex for summarizing documents.
- VectorStoreIndex for semantic search and retrieval.
```python
# Step 5: Build indices for summarization and vector-based retrieval
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)     # Creates a summarization index
vector_index = VectorStoreIndex(nodes)  # Creates a vector-based index for semantic search
```
Step 6: Create Query Engines#
Query engines enable us to interact with the indices. The summarization engine uses a tree summarization
approach, while the vector engine supports general queries.
```python
# Step 6: Create query engines from the indices
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",  # Use tree summarization mode
    use_async=True,  # Speed up query generation with asynchronous processing
)
vector_query_engine = vector_index.as_query_engine()  # Standard query engine for the vector index
```
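By default the vector query engine retrieves only a small number of top-matching nodes per query; if answers seem to miss context, one optional tweak is to retrieve more chunks:

```python
# Optional: retrieve more chunks per query for broader context
vector_query_engine = vector_index.as_query_engine(similarity_top_k=3)
```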
Step 7: Define Query Tools#
Tools are abstractions that associate a query engine with a description; the router uses these descriptions to decide which engine should handle a given query.
```python
# Step 7: Define tools for summarization and specific queries
from llama_index.core.tools import QueryEngineTool

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Useful for summarization questions related to The Google PageRank Algorithm",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Get the important concept from the paper",
)
```
Step 8: Combine Tools into a Router Query Engine#
A RouterQueryEngine selects the appropriate tool for each query using a selector, such as LLMSingleSelector. This setup makes the pipeline flexible and scalable.
```python
# Step 8: Combine tools into a router query engine
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),  # Selector to route queries to the appropriate tool
    query_engine_tools=[
        summary_tool,  # Tool for summarization
        vector_tool,   # Tool for specific questions
    ],
    verbose=True,  # Enable verbose output for debugging
)
```
Step 9: Query the Documents#
Finally, we query the documents using the router query engine.
The engine automatically selects the best tool to process each query.
```python
response = query_engine.query("What is the summary of the document?")
print("Summary Response:", str(response))

response = query_engine.query("Who is the author of the paper and when was it published?")
print("Author and Date Response:", str(response))

response = query_engine.query("What is the paper about?")
print("Paper is about:", str(response))
```
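Responses also carry the retrieved source nodes, which is handy for checking which parts of the paper an answer came from. A minimal optional inspection (the metadata keys depend on what the PDF loader populated) could look like this:

```python
# Optional: inspect which chunks the last response was built from
for source in response.source_nodes:
    print(source.score, source.node.metadata.get("file_name"), source.node.metadata.get("page_label"))
```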
Let's put it all together:
```python
from dotenv import load_dotenv
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, SummaryIndex
from llama_index.llms.openai import OpenAI
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

# Step 1: Load environment variables from a .env file
load_dotenv()

# Step 2: Load documents from the specified directory
documents = SimpleDirectoryReader("pdf/").load_data()

# Step 3: Split documents into smaller chunks (nodes) for efficient processing
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

# Step 4: Configure the LLM and embedding models
Settings.llm = OpenAI(model="gpt-3.5-turbo")  # Language model for NLP tasks
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")  # Embedding model for vector representation

# Step 5: Build indices for summarization and vector-based retrieval
summary_index = SummaryIndex(nodes)     # Creates a summarization index
vector_index = VectorStoreIndex(nodes)  # Creates a vector-based index for semantic search

# Step 6: Create query engines from the indices
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",  # Use tree summarization mode
    use_async=True,  # Speed up query generation with asynchronous processing
)
vector_query_engine = vector_index.as_query_engine()  # Standard query engine for the vector index

# Step 7: Define tools for summarization and specific queries
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Useful for summarization questions related to The Google PageRank Algorithm",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Get the important concept from the paper",
)

# Step 8: Combine tools into a router query engine
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),  # Selector to route queries to the appropriate tool
    query_engine_tools=[
        summary_tool,  # Tool for summarization
        vector_tool,   # Tool for specific questions
    ],
    verbose=True,  # Enable verbose output for debugging
)

# Step 9: Query the documents using the router query engine
response = query_engine.query("What is the summary of the document?")
print("Summary Response:", str(response))

response = query_engine.query("Who is the author of the paper and when was it published?")
print("Author and Date Response:", str(response))

response = query_engine.query("What is the paper about?")
print("Paper is about:", str(response))
```
Response
```
/Users/prakash/code/lab/lamaindex/.venv/bin/python /Users/prakash/code/lab/lamaindex/project1.py
Selecting query engine 1: This choice focuses on extracting the important concept from the paper, which is essential for summarizing the document..
Summary Response: The document discusses the importance of outbound links for ranking on Google, emphasizing the significance of high-quality links. It mentions the impact of Google PageRank on optimization strategies and predicts continued use of the PageRank concept in various applications. Additionally, it highlights the application of the PageRank algorithm beyond the web, such as in social impact analysis and text summarization. The document also introduces the basic framework of PageRank and its enhancements by researchers.
Selecting query engine 1: This choice focuses on extracting the important concept from the paper, which is more relevant to identifying the author and publication date..
Author and Date Response: The author of the paper is J. He, and it was published in 2023.
Selecting query engine 0: The question is asking for the usefulness of the information related to The Google PageRank Algorithm, which aligns with choice 1..
Paper is about: The information provided discusses the PageRank algorithm, its mathematical foundation, and its applications beyond web page ranking. It explains how PageRank works as a random walk model on a directed graph, determining the importance of nodes based on their connectivity. The algorithm has been extended to various domains like social network analysis, link recommendation, and prediction. Additionally, it introduces the concept of personalized PageRank for tailored recommendations and explores the potential of community discovery within networks. The text also touches upon the future prospects of PageRank and its continued relevance in information retrieval and ranking algorithms.
Process finished with exit code 0
```
In this post, I built a basic Retrieval-Augmented Generation (RAG) pipeline using LlamaIndex to handle both summarization and Q&A tasks.
I explored the concept of RAG through a step-by-step guide and demonstrated how to load documents, split them into chunks,
create vector and summary indexes, and set up query engines.
I then implemented a dynamic query router that selects the appropriate engine based on the user’s query type.
GitHub Repo: https://github.com/dev-scripts/Building-a-RAG-Pipeline-for-Summarization-and-Q-A-with-Llamaindex