LLMs
Features (natively supported)
All LLMs implement the Runnable interface, which comes with default implementations of all methods, i.e. `ainvoke`, `batch`, `abatch`, `stream`, `astream`. This gives all LLMs basic support for async, streaming and batch, which by default is implemented as below:
- Async support defaults to calling the respective sync method in asyncio's default thread pool executor. This lets other async functions in your application make progress while the LLM is being executed, by moving this call to a background thread.
- Streaming support defaults to returning an `Iterator` (or `AsyncIterator` in the case of async streaming) of a single value, the final result returned by the underlying LLM provider. This obviously doesn't give you token-by-token streaming, which requires native support from the LLM provider, but ensures your code that expects an iterator of tokens can work for any of our LLM integrations.
- Batch support defaults to calling the underlying LLM in parallel for each input by making use of a thread pool executor (in the sync batch case) or `asyncio.gather` (in the async batch case). The concurrency can be controlled with the `max_concurrency` key in `RunnableConfig`.
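For illustration, here is a minimal sketch of these methods in action, assuming the `langchain` package is installed and an OpenAI API key is set in the environment; the prompts and the `max_concurrency` value are arbitrary:

```python
import asyncio

from langchain.llms import OpenAI

llm = OpenAI()  # any integration from the table below exposes the same interface

# invoke: one prompt in, one completion out.
print(llm.invoke("Tell me a joke."))

# stream: yields chunks as they arrive. Without native streaming support,
# this falls back to a single chunk containing the final result.
for chunk in llm.stream("Tell me a joke."):
    print(chunk, end="", flush=True)

# batch: runs the inputs in parallel; max_concurrency caps the parallelism.
print(llm.batch(["Tell me a joke.", "Tell me a poem."], config={"max_concurrency": 2}))

async def main() -> None:
    # ainvoke: without a native async implementation, this runs the sync
    # method in asyncio's default thread pool executor.
    print(await llm.ainvoke("Tell me a joke."))

asyncio.run(main())
```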
Each LLM integration can optionally provide native implementations for async, streaming or batch, which, for providers that support it, can be more efficient. The table shows, for each integration, which features have been implemented with native support.
Model | Invoke | Async invoke | Stream | Async stream | Batch | Async batch |
---|---|---|---|---|---|---|
AI21 | β | β | β | β | β | β |
AlephAlpha | β | β | β | β | β | β |
AmazonAPIGateway | β | β | β | β | β | β |
Anthropic | β | β | β | β | β | β |
Anyscale | β | β | β | β | β | β |
Aviary | β | β | β | β | β | β |
AzureMLOnlineEndpoint | β | β | β | β | β | β |
AzureOpenAI | β | β | β | β | β | β |
Banana | β | β | β | β | β | β |
Baseten | β | β | β | β | β | β |
Beam | β | β | β | β | β | β |
Bedrock | β | β | β | β | β | β |
CTransformers | β | β | β | β | β | β |
CTranslate2 | β | β | β | β | β | β |
CerebriumAI | β | β | β | β | β | β |
ChatGLM | β | β | β | β | β | β |
Clarifai | β | β | β | β | β | β |
Cohere | β | β | β | β | β | β |
Databricks | β | β | β | β | β | β |
DeepInfra | β | β | β | β | β | β |
DeepSparse | β | β | β | β | β | β |
EdenAI | β | β | β | β | β | β |
Fireworks | β | β | β | β | β | β |
FireworksChat | β | β | β | β | β | β |
ForefrontAI | β | β | β | β | β | β |
GPT4All | β | β | β | β | β | β |
GooglePalm | β | β | β | β | β | β |
GooseAI | β | β | β | β | β | β |
GradientLLM | β | β | β | β | β | β |
HuggingFaceEndpoint | β | β | β | β | β | β |
HuggingFaceHub | β | β | β | β | β | β |
HuggingFacePipeline | β | β | β | β | β | β |
HuggingFaceTextGenInference | β | β | β | β | β | β |
HumanInputLLM | β | β | β | β | β | β |
JavelinAIGateway | β | β | β | β | β | β |
KoboldApiLLM | β | β | β | β | β | β |
LlamaCpp | β | β | β | β | β | β |
ManifestWrapper | β | β | β | β | β | β |
Minimax | β | β | β | β | β | β |
MlflowAIGateway | β | β | β | β | β | β |
Modal | β | β | β | β | β | β |
MosaicML | β | β | β | β | β | β |
NIBittensorLLM | β | β | β | β | β | β |
NLPCloud | β | β | β | β | β | β |
Nebula | β | β | β | β | β | β |
OctoAIEndpoint | β | β | β | β | β | β |
Ollama | β | β | β | β | β | β |
OpaquePrompts | β | β | β | β | β | β |
OpenAI | β | β | β | β | β | β |
OpenLLM | β | β | β | β | β | β |
OpenLM | β | β | β | β | β | β |
Petals | β | β | β | β | β | β |
PipelineAI | β | β | β | β | β | β |
Predibase | β | β | β | β | β | β |
PredictionGuard | β | β | β | β | β | β |
PromptLayerOpenAI | β | β | β | β | β | β |
QianfanLLMEndpoint | β | β | β | β | β | β |
RWKV | β | β | β | β | β | β |
Replicate | β | β | β | β | β | β |
SagemakerEndpoint | β | β | β | β | β | β |
SelfHostedHuggingFaceLLM | β | β | β | β | β | β |
SelfHostedPipeline | β | β | β | β | β | β |
StochasticAI | β | β | β | β | β | β |
TextGen | β | β | β | β | β | β |
TitanTakeoff | β | β | β | β | β | β |
Tongyi | β | β | β | β | β | β |
VLLM | β | β | β | β | β | β |
VLLMOpenAI | β | β | β | β | β | β |
VertexAI | β | β | β | β | β | β |
VertexAIModelGarden | β | β | β | β | β | β |
Writer | β | β | β | β | β | β |
Xinference | β | β | β | β | β | β |
AI21
AI21 Studio provides API access to Jurassic-2 large language models.
Aleph Alpha
The Luminous series is a family of large language models.
Amazon API Gateway
Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the "front door" for applications to access data, business logic, or functionality from your backend services. Using API Gateway, you can create RESTful APIs and WebSocket APIs that enable real-time two-way communication applications. API Gateway supports containerized and serverless workloads, as well as web applications.
Anyscale
Anyscale is a fully-managed Ray platform, on which you can build, deploy, and manage scalable AI and Python applications.
Azure ML
Azure ML is a platform used to build, train, and deploy machine learning models. Users can explore the types of models to deploy in the Model Catalog, which provides Azure Foundation Models and OpenAI Models. Azure Foundation Models include various open-source models and popular Hugging Face models. Users can also import models of their liking into AzureML.
Azure OpenAI
This notebook goes over how to use LangChain with Azure OpenAI.
Baidu Qianfan
Baidu AI Cloud Qianfan Platform is a one-stop large model development and service operation platform for enterprise developers. Qianfan provides not only models such as Wenxin Yiyan (ERNIE-Bot) and third-party open-source models, but also various AI development tools and a complete development environment, making it easy for customers to use and develop large model applications.
Banana
Banana is focused on building machine learning infrastructure.
Baseten
Baseten provides all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently.
Beam
Calls the Beam API wrapper to deploy and make subsequent calls to an instance of the gpt2 LLM in a cloud deployment. Requires installation of the Beam library and registration of a Beam Client ID and Client Secret. Calling the wrapper creates and runs an instance of the model, returning text related to the prompt. Additional calls can then be made by directly calling the Beam API.
Bedrock
Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.
Bittensor
Bittensor is a mining network, similar to Bitcoin, that includes built-in incentives designed to encourage miners to contribute compute + knowledge.
CerebriumAI
Cerebrium is an AWS SageMaker alternative. It also provides API access to several LLMs.
ChatGLM
ChatGLM-6B is an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. Thanks to quantization, users can deploy it locally on consumer-grade graphics cards (only 6GB of GPU memory is required at the INT4 quantization level).
Clarifai
Clarifai is an AI platform that covers the full AI lifecycle, from data exploration and data labeling to model training, evaluation, and inference.
Cohere
Cohere is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.
C Transformers
The C Transformers library provides Python bindings for GGML models.
CTranslate2
CTranslate2 is a C++ and Python library for efficient inference with Transformer models.
Databricks
The Databricks Lakehouse Platform unifies data, analytics, and AI on one platform.
DeepInfra
DeepInfra provides several LLMs.
DeepSparse
This page covers how to use the DeepSparse inference runtime within LangChain.
Eden AI
Eden AI is revolutionizing the AI landscape by uniting the best AI providers, empowering users to unlock limitless possibilities and tap into the true potential of artificial intelligence. With an all-in-one comprehensive and hassle-free platform, it allows users to deploy AI features to production lightning fast, enabling effortless access to the full breadth of AI capabilities via a single API. (website: edenai.co)
Fireworks
Fireworks accelerates product development on generative AI by creating an innovative AI experiment and production platform.
ForefrontAI
The Forefront platform gives you the ability to fine-tune and use open source large language models.
GCP Vertex AI
Note: This is separate from the Google PaLM integration; it exposes the Vertex AI PaLM API on Google Cloud.
GooseAI
GooseAI is a fully managed NLP-as-a-Service, delivered via API, that provides access to a range of models.
GPT4All
GitHub: nomic-ai/gpt4all is an ecosystem of open-source chatbots trained on massive collections of clean assistant data, including code, stories and dialogue.
Gradient
Gradient allows you to fine-tune and get completions from LLMs with a simple web API.
Hugging Face Hub
The Hugging Face Hub is a platform with over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.
Hugging Face Local Pipelines
Hugging Face models can be run locally through the HuggingFacePipeline class.
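For example, a minimal sketch of loading a small local model through this class, assuming `transformers` is installed; the model id and generation settings here are illustrative:

```python
from langchain.llms import HuggingFacePipeline

# Download gpt2 from the Hub and wrap it in a text-generation pipeline.
llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 32},  # illustrative generation setting
)
print(llm.invoke("Once upon a time"))
```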
Huggingface TextGen Inference
Text Generation Inference is a Rust, Python and gRPC server for text generation inference. Used in production at Hugging Face to power its LLM api-inference widgets.
Javelin AI Gateway Tutorial
This Jupyter Notebook will explore how to interact with the Javelin AI Gateway using the Python SDK.
JSONFormer
JSONFormer is a library that wraps local HuggingFace pipeline models for structured decoding of a subset of the JSON Schema.
KoboldAI API
KoboldAI is a "browser-based front-end for AI-assisted writing with multiple local & remote AI models...". It has a public and local API that can be used in LangChain.
Llama.cpp
llama-cpp-python is a Python binding for llama.cpp.
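As a minimal sketch, assuming `llama-cpp-python` is installed and a model file has already been downloaded to the (hypothetical) path below:

```python
from langchain.llms import LlamaCpp

# Load a local model file; the path and context size are placeholders.
llm = LlamaCpp(model_path="/path/to/model.gguf", n_ctx=2048)
print(llm.invoke("Q: Name the planets in the solar system. A:"))
```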
LLM Caching integrations
This notebook covers how to cache results of individual LLM calls using different caches.
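As a taste of what that notebook covers, here is a minimal sketch of enabling the in-memory cache, assuming an OpenAI API key is set:

```python
import langchain
from langchain.cache import InMemoryCache
from langchain.llms import OpenAI

# Route all LLM calls through an in-memory cache: repeated identical prompts
# return the stored result instead of re-calling the provider.
langchain.llm_cache = InMemoryCache()

llm = OpenAI()
llm.invoke("Tell me a joke.")  # first call hits the API
llm.invoke("Tell me a joke.")  # second call is served from the cache
```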
Manifest
This notebook goes over how to use Manifest and LangChain.
Minimax
Minimax is a Chinese startup that provides natural language processing models for companies and individuals.
Modal
The Modal cloud platform provides convenient, on-demand access to serverless cloud compute from Python scripts on your local computer.
MosaicML
MosaicML offers a managed inference service. You can either use a variety of open source models, or deploy your own.
NLP Cloud
NLP Cloud serves high-performance pre-trained or custom models for NER, sentiment analysis, classification, summarization, paraphrasing, grammar and spelling correction, keyword and keyphrase extraction, chatbots, product description and ad generation, intent classification, text generation, image generation, blog post generation, code generation, question answering, automatic speech recognition, machine translation, language detection, semantic search, semantic similarity, tokenization, POS tagging, embeddings, and dependency parsing. It is ready for production, served through a REST API.
OctoAI
OctoML is a service with efficient compute. It enables users to integrate their choice of AI models into applications. The OctoAI compute service helps you run, tune, and scale AI applications.
Ollama
Ollama allows you to run open-source large language models, such as Llama 2, locally.
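For example, a minimal sketch, assuming the Ollama server is running locally and the llama2 model has been pulled (`ollama pull llama2`):

```python
from langchain.llms import Ollama

# Connects to the local Ollama server (http://localhost:11434 by default).
llm = Ollama(model="llama2")
print(llm.invoke("Why is the sky blue?"))
```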
OpaquePrompts
OpaquePrompts is a service that enables applications to leverage the power of language models without compromising user privacy. Designed for composability and ease of integration into existing applications and services, OpaquePrompts is consumable via a simple Python library as well as through LangChain. Perhaps more importantly, OpaquePrompts leverages the power of confidential computing to ensure that even the OpaquePrompts service itself cannot access the data it is protecting.
OpenAI
OpenAI offers a spectrum of models with different levels of power suitable for different tasks.
OpenLLM
🦾 OpenLLM is an open platform for operating large language models (LLMs) in production. It enables developers to easily run inference with any open-source LLMs, deploy to the cloud or on-premises, and build powerful AI apps.
OpenLM
OpenLM is a zero-dependency OpenAI-compatible LLM provider that can call different inference endpoints directly via HTTP.
Petals
Petals runs 100B+ language models at home, BitTorrent-style.
PipelineAI
PipelineAI allows you to run your ML models at scale in the cloud. It also provides API access to several LLMs.
Predibase
Predibase allows you to train, fine-tune, and deploy any ML model, from linear regression to large language models.
Prediction Guard
Basic LLM usage
PromptLayer OpenAI
PromptLayer is the first platform that allows you to track, manage, and share your GPT prompt engineering. PromptLayer acts as middleware between your code and OpenAI's Python library.
RELLM
RELLM is a library that wraps local Hugging Face pipeline models for structured decoding.
Replicate
Replicate runs machine learning models in the cloud. We have a library of open-source models that you can run with a few lines of code. If you're building your own machine learning models, Replicate makes it easy to deploy them at scale.
Runhouse
Runhouse allows remote compute and data across environments and users. See the Runhouse docs.
SageMakerEndpoint
Amazon SageMaker is a system that can build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows.
StochasticAI
The Stochastic Acceleration Platform aims to simplify the life cycle of a deep learning model, from uploading and versioning the model, through training, compression and acceleration, to putting it into production.
Nebula (Symbl.ai)
Nebula is a large language model (LLM) built by Symbl.ai. It is trained to perform generative tasks on human conversations. Nebula excels at modeling the nuanced details of a conversation and performing tasks on the conversation.
TextGen
GitHub: oobabooga/text-generation-webui is a Gradio web UI for running large language models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA.
Titan Takeoff
TitanML helps businesses build and deploy better, smaller, cheaper, and faster NLP models through its training, compression, and inference optimization platform.
Tongyi Qwen
Tongyi Qwen is a large-scale language model developed by Alibaba's Damo Academy. It is capable of understanding user intent through natural language understanding and semantic analysis, based on user input in natural language. It provides services and assistance to users in different domains and tasks. By providing clear and detailed instructions, you can obtain results that better align with your expectations.
vLLM
vLLM is a fast and easy-to-use library for LLM inference and serving.
Writer
Writer is a platform for generating different kinds of language content.
Xorbits Inference (Xinference)
Xinference is a powerful and versatile library designed to serve LLMs.