Hugging Face Local Pipelines
Hugging Face models can be run locally through the HuggingFacePipeline
class.
The Hugging Face Model Hub hosts over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.
These can be called from LangChain either through this local pipeline wrapper or by calling their hosted inference endpoints through the HuggingFaceHub class. For more information on the hosted pipelines, see the HuggingFaceHub notebook.
To use, you should have the transformers
python package installed, as well as pytorch. You can also install xformer
for a more memory-efficient attention implementation.
%pip install transformers --quiet
Load the model
from langchain.llms import HuggingFacePipeline
llm = HuggingFacePipeline.from_model_id(
model_id="bigscience/bloom-1b7",
task="text-generation",
model_kwargs={"temperature": 0, "max_length": 64},
)
API Reference:
Create Chain
With the model loaded into memory, you can compose it with a prompt to form a chain.
from langchain.prompts import PromptTemplate
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)
chain = prompt | llm
question = "What is electroencephalography?"
print(chain.invoke({"question": question}))
API Reference:
First, we need to understand what is an electroencephalogram. An electroencephalogram is a recording of brain activity. It is a recording of brain activity that is made by placing electrodes on the scalp. The electrodes are placed