Chroma
Quick Summary
DeepEval allows you to evaluate your Chroma retriever and optimize retrieval hyperparameters like top-K, embedding model, and similarity function.
To get started, install Chroma through the CLI using the following command:
pip install chromadb
Chroma is a lightweight and scalable vector database designed for fast and efficient retrieval in RAG applications. It provides an intuitive API for managing embeddings and performing similarity search. Learn more about Chroma here.
This diagram illustrates how Chroma fits into your RAG pipeline as a retriever.
Setup Chroma
To get started, initialize your Chroma client and create a collection.
import chromadb
# Initialize Chroma client
client = chromadb.PersistentClient(path="./chroma_db")
# Create or load a collection
collection = client.get_or_create_collection(name="rag_documents")
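If you also want to experiment with the similarity function hyperparameter, Chroma lets you pick a collection's distance function at creation time. Below is a sketch, assuming your Chroma version supports the hnsw:space collection metadata key (the collection name here is illustrative):
# Create a collection that uses cosine distance instead of the default squared-L2 distance
cosine_collection = client.get_or_create_collection(
    name="rag_documents_cosine",
    metadata={"hnsw:space": "cosine"}  # other supported values include "l2" and "ip"
)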
Next, define an embedding model to convert your document chunks into vectors before storing them in Chroma.
# Load an embedding model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
# Example document chunks
document_chunks = [
    "Chroma is a lightweight vector database for AI applications.",
    "RAG improves AI-generated responses with retrieved context.",
    "Vector search enables high-precision semantic retrieval.",
    ...
]
# Store chunks with embeddings in Chroma
for i, chunk in enumerate(document_chunks):
    embedding = model.encode(chunk).tolist()  # Convert text to vector
    collection.add(
        ids=[str(i)],                  # Unique ID for each document
        embeddings=[embedding],        # Vector representation
        metadatas=[{"text": chunk}]    # Store original text as metadata
    )
To use Chroma as the vector database and retriever in your RAG pipeline, retrieve relevant context from your collection based on user input and incorporate it into your prompt template. This provides your model with the necessary information for accurate and well-informed responses.
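Here is a minimal sketch of that flow under the setup above; generate() is a placeholder for your own LLM call and the prompt wording is illustrative (the evaluation section below walks through the same steps in more detail):
def answer(query):
    # Retrieve the most similar chunks from the Chroma collection
    query_embedding = model.encode(query).tolist()
    results = collection.query(query_embeddings=[query_embedding], n_results=3)
    supporting_context = "\n".join(m["text"] for m in results["metadatas"][0])

    # Incorporate the retrieved context into the prompt
    prompt = f"Answer the question using the supporting context.\n\nContext:\n{supporting_context}\n\nQuestion: {query}"
    return generate(prompt)  # Replace generate() with your LLM call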
Evaluating Chroma Retrieval
Evaluating your Chroma retriever consists of two steps:
- Preparing an input query along with the expected LLM response, and using the input to generate a response from your RAG pipeline to create an LLMTestCase containing the input, actual output, expected output, and retrieval context.
- Evaluating the test case using a selection of retrieval metrics.
An LLMTestCase allows you to create unit tests for your LLM applications, helping you identify specific weaknesses in your RAG application.
Preparing your Test Case
Since the first step in generating a response from your RAG pipeline is retrieving the relevant retrieval_context from your Chroma collection, begin by performing this retrieval for your input query.
def search(query):
    query_embedding = model.encode(query).tolist()
    res = collection.query(
        query_embeddings=[query_embedding],
        n_results=3  # Retrieve top-K matches
    )
    # Return the retrieved chunk texts as a list of strings
    return [m["text"] for m in res["metadatas"][0]]
query = "How does Chroma work?"
retrieval_context = search(query)
Next, pass the retrieved context into your LLM's prompt template to generate a response.
prompt = """
Answer the user question based on the supporting context.
User Question:
{input}
Supporting Context:
{retrieval_context}
"""
actual_output = generate(prompt) # Replace with your LLM function
print(actual_output)
Let's examine the actual_output generated by our RAG pipeline:
Chroma is a lightweight vector database designed for AI applications, enabling fast semantic retrieval.
Finally, create an LLMTestCase using the input and expected output you prepared, along with the actual output and retrieval context you generated.
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input=query,
    actual_output=actual_output,
    retrieval_context=retrieval_context,
    expected_output="Chroma is an efficient vector database for AI applications, optimized for semantic search and retrieval.",
)
Running Evaluations
To run evaluations on the LLMTestCase, we first need to define relevant deepeval metrics to evaluate the Chroma retriever: contextual recall, contextual precision, and contextual relevancy.
These contextual metrics help assess your retriever. For more retriever evaluation details, check out this guide.
from deepeval.metrics import (
    ContextualRecallMetric,
    ContextualPrecisionMetric,
    ContextualRelevancyMetric,
)

contextual_recall = ContextualRecallMetric()
contextual_precision = ContextualPrecisionMetric()
contextual_relevancy = ContextualRelevancyMetric()
Finally, pass the test case and metrics into the evaluate function to begin the evaluation.
from deepeval import evaluate

evaluate(
    [test_case],
    metrics=[contextual_recall, contextual_precision, contextual_relevancy]
)
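If you prefer to run these checks as part of a test suite instead, DeepEval also integrates with pytest through assert_test; below is a minimal sketch (the file and test names are illustrative):
# Save as test_chroma_retriever.py and run with: deepeval test run test_chroma_retriever.py
from deepeval import assert_test

def test_chroma_retrieval():
    assert_test(test_case, [contextual_recall, contextual_precision, contextual_relevancy])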
Improving Chroma Retrieval
Below is a table outlining the hypothetical metric scores for your evaluation run.
| Metric | Score |
| --- | --- |
| Contextual Precision | 0.85 |
| Contextual Recall | 0.92 |
| Contextual Relevancy | 0.44 |
Each contextual metric evaluates a specific hyperparameter. To learn more about this, read this guide on RAG evaluation.
To improve your Chroma retriever, you'll need to experiment with various hyperparameters and prepare LLMTestCases using generations from different retriever versions.
Ultimately, analyzing improvements and regressions in contextual metric scores (the three metrics defined above) will help you determine the optimal hyperparameter combination for your Chroma retriever.
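For example, a rough sweep over top-K might look like the sketch below; it reuses the model, collection, prompt_template, generate() placeholder, and metric objects defined earlier, and only varies n_results:
for k in [1, 3, 5, 10]:
    # Re-retrieve with a different top-K value
    res = collection.query(query_embeddings=[model.encode(query).tolist()], n_results=k)
    retrieval_context = [m["text"] for m in res["metadatas"][0]]

    # Re-generate and re-evaluate with the same metrics for a fair comparison
    actual_output = generate(prompt_template.format(input=query, retrieval_context="\n".join(retrieval_context)))
    test_case = LLMTestCase(
        input=query,
        actual_output=actual_output,
        retrieval_context=retrieval_context,
        expected_output="Chroma is an efficient vector database for AI applications, optimized for semantic search and retrieval.",
    )
    evaluate([test_case], metrics=[contextual_recall, contextual_precision, contextual_relevancy])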
For a more detailed guide on tuning your retriever’s hyperparameters, check out this guide.