Multiple Retrieval Sources
Often times you may want to do retrieval over multiple sources. These can be different vectorstores (where one contains information about topic X and the other contains info about topic Y). They could also be completely different databases altogether!
A key part is is doing as much of the retrieval in parrelel as possible. This will keep the latency as low as possible. Luckily, LangChain Expression Language supports parrellism out of the box.
Let's take a look where we do retrieval over a SQL database and a vectorstore.
from langchain.chat_models import ChatOpenAI
API Reference:
Set up SQL query
from langchain.utilities import SQLDatabase
from langchain.chains import create_sql_query_chain
db = SQLDatabase.from_uri("sqlite:///../../../../../notebooks/Chinook.db")
query_chain = create_sql_query_chain(ChatOpenAI(temperature=0), db)
API Reference:
Set up vectorstore
from langchain.indexes import VectorstoreIndexCreator
from langchain.schema.document import Document
index_creator = VectorstoreIndexCreator()
index = index_creator.from_documents([Document(page_content="Foo")])
retriever = index.vectorstore.as_retriever()
API Reference:
Combine
from langchain.prompts import ChatPromptTemplate
system_message = """Use the information from the below two sources to answer any questions.
Source 1: a SQL database about employee data
<source1>
{source1}
</source1>
Source 2: a text database of random information
<source2>
{source2}
</source2>
"""
prompt = ChatPromptTemplate.from_messages([("system", system_message), ("human", "{question}")])
API Reference:
full_chain = {
"source1": {"question": lambda x: x["question"]} | query_chain | db.run,
"source2": (lambda x: x['question']) | retriever,
"question": lambda x: x['question'],
} | prompt | ChatOpenAI()
response = full_chain.invoke({"question":"How many Employees are there"})
print(response)
Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1
content='There are 8 employees.' additional_kwargs={} example=False