Vald
Vald is a highly scalable, distributed, fast approximate nearest neighbor (ANN) dense vector search engine.
This notebook shows how to use functionality related to the Vald database.
To run this notebook, you need a running Vald cluster. See the Get Started guide in the Vald documentation for more information.
See the installation instructions for the Python client.
pip install vald-client-python
Basic Example
from langchain.document_loaders import TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Vald
raw_documents = TextLoader('state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
embeddings = HuggingFaceEmbeddings()
db = Vald.from_documents(documents, embeddings, host="localhost", port=8080)
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
docs[0].page_content
Similarity search by vector
embedding_vector = embeddings.embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
docs[0].page_content
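As a point of reference, the sketch below shows the exact (brute-force) version of the lookup that a vector search approximates: compute the distance from the query vector to every stored vector and keep the closest k. It is not how Vald works internally, since Vald uses an approximate index rather than a full scan. The function name exact_top_k, the sample vectors, and the choice of Euclidean distance are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def exact_top_k(
    query_vec: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 4,
) -> List[Tuple[int, float]]:
    """Return (index, distance) for the k stored vectors closest to query_vec.

    Hypothetical helper: scans every vector, so it is exact but slow
    for large collections.
    """
    distances = [(i, math.dist(query_vec, v)) for i, v in enumerate(vectors)]
    distances.sort(key=lambda pair: pair[1])  # smallest distance first
    return distances[:k]


# Toy 2-dimensional vectors, made up for this example.
stored = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]
nearest = exact_top_k([1.0, 1.0], stored, k=2)
for index, distance in nearest:
    print(index, round(distance, 4))
```

An ANN engine trades a small amount of accuracy against this exact ranking for much faster lookups on large collections.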
Similarity search with score
docs_and_scores = db.similarity_search_with_score(query)
docs_and_scores[0]
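Each entry returned by similarity_search_with_score is a (document, score) pair, so the results can be unpacked in a loop. The sketch below formats such pairs for display. FakeDocument and summarize_hits are hypothetical names introduced here so the example runs without a Vald cluster, and the sample scores are placeholders, not real output. Whether a lower or higher score means a closer match depends on the distance metric the cluster is configured with.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document; only page_content is used."""

    page_content: str


def summarize_hits(
    docs_and_scores: Sequence[Tuple[FakeDocument, float]],
    max_chars: int = 60,
) -> List[str]:
    """Format (document, score) pairs as one line per hit, in the order given."""
    lines = []
    for rank, (doc, score) in enumerate(docs_and_scores, start=1):
        snippet = doc.page_content[:max_chars].replace("\n", " ")
        lines.append(f"{rank}. score={score:.4f} {snippet}")
    return lines


# Placeholder data, not real search results.
sample = [
    (FakeDocument("first chunk of text"), 0.125),
    (FakeDocument("second chunk of text"), 0.5),
]
for line in summarize_hits(sample):
    print(line)
```

With a live cluster, the same function could be called as summarize_hits(docs_and_scores), since it only reads the page_content attribute.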
Maximal Marginal Relevance Search (MMR)
In addition to similarity search, the retriever object can also use MMR as its search type.
retriever = db.as_retriever(search_type="mmr")
retriever.get_relevant_documents(query)
Or use max_marginal_relevance_search directly:
db.max_marginal_relevance_search(query, k=2, fetch_k=10)
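To make the idea behind MMR concrete, here is a minimal pure-Python sketch of the selection step, assuming cosine similarity and a set of candidate vectors that has already been fetched (the role fetch_k plays above). It is an illustration of the general technique, not the code LangChain or Vald runs. The names mmr_select, cosine, and unit are hypothetical; lambda_mult is modeled on the trade-off parameter of the same name in LangChain's MMR interface.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query_vec: Sequence[float],
    candidate_vecs: Sequence[Sequence[float]],
    k: int = 2,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance.

    lambda_mult=1.0 ranks purely by relevance to the query; lower values
    penalize candidates that resemble ones already selected.
    """
    selected: List[int] = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:

        def mmr_score(i: int) -> float:
            relevance = cosine(query_vec, candidate_vecs[i])
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


def unit(degrees: float) -> List[float]:
    """Unit vector in 2D at the given angle (toy data for the example)."""
    r = math.radians(degrees)
    return [math.cos(r), math.sin(r)]


query_vec = unit(0)
# Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but distinct.
candidates = [unit(10), unit(12), unit(-40)]
picked = mmr_select(query_vec, candidates, k=2, lambda_mult=0.5)
print(picked)
```

In this toy setup, a plain similarity ranking would return the two near-duplicate candidates, while MMR keeps the most relevant one and then prefers the distinct candidate.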