Simple RAG Using Python, Langchain, OpenAI and Chroma

Lintang Gilang Pratama
7 min read · Jun 7, 2024


The Retrieval Augmented Generation (RAG) technique is gaining significant attention. RAG combines the retrieval of information from external sources with the capabilities of a Large Language Model (LLM) to create new, relevant, and informative sentences. This technique is typically applied in systems utilizing LLM models and Vector Databases.

An analogy for RAG is having a library within your computer. When you want to know something, RAG will open these “books,” read their contents, and compile the relevant information into an easily understandable answer.

For example, if you ask, “What is the capital of Indonesia?”, the RAG system searches these “books,” retrieves the passage that mentions Indonesia’s capital, and composes the answer: “The capital of Indonesia is Jakarta.”

This article introduces how to build a simple RAG system using Python, Langchain, OpenAI, and Chroma. Below is a step-by-step guide to building an end-to-end RAG solution.

Install Libraries

!pip install langchain
!pip install langchain-community langchain-core
!pip install -U langchain-openai
!pip install langchain-chroma
  1. The OpenAI API is a service that allows developers to access and use OpenAI’s large language models (LLMs) in their own applications.
  2. LangChain is an open-source framework that makes it easier for developers to build LLM applications.
  3. ChromaDB is an open-source vector database specifically designed to store and manage vector representations of text data.
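Before running any of the code below, an OpenAI API key must be available. As a minimal sketch, assuming the key is entered interactively and stored in the standard OPENAI_API_KEY environment variable:

import os
from getpass import getpass

# Prompt for the key so it is never hard-coded in the notebook.
# OPENAI_API_KEY is the environment variable the OpenAI SDK reads by default.
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")

# The variable used throughout the rest of this article
openai_api_key = os.environ["OPENAI_API_KEY"]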

Import Libraries

# To split long text into smaller sections based on specific characters
from langchain.text_splitter import RecursiveCharacterTextSplitter

# To create prompt templates
from langchain.prompts import PromptTemplate

# To combine the Retriever with the QA chain
from langchain.chains import RetrievalQA

# To generate embeddings using OpenAI's embedding models
from langchain_openai import OpenAIEmbeddings

# To interact with OpenAI's large language models (LLMs) in a conversational manner
from langchain_openai import ChatOpenAI

# To use Chroma as the vector store through LangChain
from langchain_chroma import Chroma

# To tidy up print output
import pprint

Above are the libraries used, each with a brief comment describing its purpose.

Data Preparation

texts = [
    "There is ample evidence to show that the Earth is round.",
    "First, images from satellites orbiting the Earth clearly depict our planet's round shape.",
    "Additionally, during a lunar eclipse, the Earth's shadow cast on the Moon is always curved, which can only happen if the Earth is round.",
    "Navigation of ships also provides evidence, as ships moving away from the shore gradually disappear from view bottom first, indicating the Earth's curved surface.",
    "Observing stars from different parts of the world shows that constellations change positions due to the Earth's curvature.",
    "Eratosthenes' ancient experiment measuring the shadow lengths at two different locations in Egypt also provided strong evidence of the Earth's curvature.",
    "If the Earth were flat, the shadow lengths would be the same in both places.",
    "Airplane flights support this fact, as long-distance flight paths often curve rather than follow a straight line to take advantage of the Earth's curvature.",
    "The horizon phenomenon also shows that we cannot see very distant objects because the Earth curves.",
    "GPS satellites that help us navigate can only function optimally if the Earth is round.",
    "Gravity experiments show that gravity pulls towards the center of mass, causing the Earth to be round.",
    "Photos from the Apollo missions that landed on the Moon also show the Earth's round shape from a distance.",
    "Weather observations from satellites show cloud movement and storm patterns consistent with a round Earth.",
    "The light we see at dawn and dusk also indicates the Earth's curvature.",
    "The height of radio towers and antennas is determined by considering the Earth's curvature to optimize signal range.",
    "Experiments using high-flying drones show a curved horizon.",
    "International space missions, like the ISS, also show the Earth as round from low orbit.",
    "The phenomenon of tides is also related to the gravity of a round Earth.",
    "Satellite communication systems orbiting the Earth require coordination that considers the planet's curvature.",
    "Geodesy research, the science of measuring and mapping the Earth, also shows that the Earth is a geoid, or round with slight deviations at the poles.",
    "Moreover, the pattern of day and night distribution around the world is only possible if the Earth is round.",
    "All this evidence consistently supports the fact that our Earth is round."
]

# Combine all elements in the list into a single string with newline as the separator
combined_text = "\n".join(texts)

# Split the combined text on "\n" so each sentence becomes its own chunk.
# chunk_size=1 forces a split at every separator, producing one Document
# object (with a "page_content" attribute) per sentence.
text_splitter = RecursiveCharacterTextSplitter(separators=["\n"], chunk_size=1, chunk_overlap=0)
texts = text_splitter.create_documents([combined_text])

Above, the combined text is split so that each sentence becomes a separate Document object with a page_content attribute.
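To verify the split, you can print a few of the resulting chunks (a quick check, not part of the original pipeline):

# Inspect the first few chunks; each should hold exactly one sentence
for doc in texts[:3]:
    print(doc.page_content)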

Embedding

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

This process generates numerical representations (embeddings) of text data using a pretrained OpenAI embedding model. These embeddings capture the semantic relationships between words and can be used for various natural language processing (NLP) tasks.

The output is in the form of floating-point number vectors. The distance between two vectors measures their relatedness. A small distance indicates a high level of relatedness, while a large distance indicates a low level of relatedness.
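As a quick illustration, you can embed a single sentence and inspect the vector (the dimensionality depends on the model; OpenAI's default embedding model returns 1536 dimensions):

# Embed one sentence and look at the resulting vector
vector = embeddings.embed_query("The Earth is round.")
print(len(vector))   # dimensionality, e.g. 1536 for the default model
print(vector[:5])    # first few floating-point components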

Store to Chroma DB

persist_directory = "chroma_db"

# Save to local disk
db = Chroma.from_documents(
    documents=texts, embedding=embeddings, persist_directory=persist_directory
)

# Load from local disk
db = Chroma(persist_directory=persist_directory, embedding_function=embeddings)

# Testing
query = "Why is the Earth round like a ball?"
docs = db.similarity_search(query)
print(docs)
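If you also want to see how close each match is, Chroma exposes a scored variant; note that the score is a distance, so lower values mean more similar:

# Retrieve the top matches together with their distance scores
docs_and_scores = db.similarity_search_with_score(query, k=3)
for doc, score in docs_and_scores:
    print(f"{score:.4f}  {doc.page_content}")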

Prompt Engineering

template = """

Role: You are a Scientist.
Input: Use the following context to answer the question.
Context: {context}
Question: {question}
Steps: Answer politely and say I hope you are healthy, then focus on answering the question.
Expectation: Provide accurate and relevant answers based on the context provided.
Narrowing:
1. Limit your responses to the context given. Focus only on questions about Earth.
2. If you don't know the answer, just say you don't know.
3. If the question is not about the Earth, just say "let's talk about the Earth."

Answer:

"""

# {context} is filled with the chunks retrieved from the vector database
# based on their similarity to the question
# {question} is the question asked of the application

PROMPT = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)

This section designs the prompt template, preferably using the RISEN framework, a prompt-engineering approach. RISEN stands for Role, Input, Steps, Expectation, Narrowing, and each part is labeled in the template above.
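To sanity-check the template before wiring it into a chain, you can format it with placeholder values (the values below are illustrative only):

# Fill the template with dummy values to preview the final prompt text
print(PROMPT.format(
    context="The Earth is round.",
    question="What shape is the Earth?"
))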

Define LLM

# Define the parameter values.
temperature = 0.2

param = {
    "top_p": 0.4,
    "frequency_penalty": 0.1,
    "presence_penalty": 0.7
}

# Create an LLM object with the specified parameters.
llm = ChatOpenAI(
    temperature=temperature,
    api_key=openai_api_key,
    model_kwargs=param
)

This process configures the sampling parameters of the LLM that will be used (a short comparison sketch follows the list):

  1. temperature: Controls the randomness and creativity of the model's responses. Higher values produce more varied and unpredictable output, while lower values produce more conservative and predictable output. Adjust this according to the level of creativity you want.
  2. top_p: Controls diversity via nucleus sampling: the model considers only the smallest set of tokens whose cumulative probability exceeds top_p. Higher values allow more diverse responses, while lower values keep the output more focused and predictable.
  3. frequency_penalty: Penalizes tokens in proportion to how often they have already appeared in the output. Higher values reduce repetition, while lower values allow more of it.
  4. presence_penalty: Penalizes tokens that have already appeared at all in the text so far, encouraging the model to move on to new topics. Higher values push the output toward new subjects, while lower values let it stay on the same ones.
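As a minimal sketch of the temperature effect, the same question can be sent to the configured model and to a more creative variant (outputs vary between runs; the variant below is hypothetical, not part of the original setup):

# Compare the configured llm (temperature=0.2) with a higher-temperature variant
creative_llm = ChatOpenAI(
    temperature=0.9,
    api_key=openai_api_key,
    model_kwargs=param
)

print(llm.invoke("Describe the Earth in one sentence.").content)
print(creative_llm.invoke("Describe the Earth in one sentence.").content)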

RetrievalQA

qa_with_source = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 5}),
    chain_type_kwargs={"prompt": PROMPT},
    return_source_documents=True,
)

RetrievalQA is a chain for question-answering tasks that uses an index to retrieve relevant documents or text snippets, making it suitable for simple question-answering applications. It combines a retriever with a QA chain: the retriever fetches documents (here, the five most similar chunks, set by "k": 5), and the QA chain answers the question based on them. The "stuff" chain type simply stuffs all retrieved documents into the prompt's {context} slot.

Asking Question

pprint.pprint(
    qa_with_source("Earth is Flat !!!")
)
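With return_source_documents=True, the chain returns a dictionary containing the query, the generated answer, and the retrieved chunks; a small sketch of pulling out just the answer:

# The returned dict has "query", "result", and "source_documents" keys
response = qa_with_source("Earth is Flat !!!")
print(response["result"])                  # the generated answer
for doc in response["source_documents"]:
    print(doc.page_content)                # the retrieved context chunks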

What Next?

  1. Create a User Interface with Streamlit: Develop a user-friendly interface using Streamlit, a Python library for creating interactive web applications. This interface will let users type in questions and receive answers; a minimal sketch follows this list.
  2. Integrate with FastAPI: Combine the code with FastAPI, a Python web framework for building APIs quickly.
  3. Store Questions and Answers in a Database: Set up a database (e.g., SQLite, PostgreSQL) to store questions and their corresponding answers.
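As a minimal, hypothetical sketch of step 1, assuming the chain built above lives in a module named rag_app (that module name is an assumption, not part of the original code):

# app.py - a minimal Streamlit front end for the QA chain
import streamlit as st

from rag_app import qa_with_source  # hypothetical module holding the chain

st.title("Simple RAG Demo")
question = st.text_input("Ask a question about the Earth:")

if question:
    response = qa_with_source(question)
    st.write(response["result"])

Run it with: streamlit run app.py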

Best Regards

Lintang Gilang Pratama
