vLLM download | SourceForge.net

vLLM is a fast and easy-to-use library for LLM inference and serving. High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.

Features

State-of-the-art serving throughput
Efficient management of attention key and value memory with PagedAttention
Continuous batching of incoming requests
Optimized CUDA kernels
Seamless integration with popular HuggingFace models
Tensor parallelism support for distributed inference

Project Samples

Project Activity

See All Activity >

License

Apache License V2.0

Follow vLLM

vLLM Web Site

Other Useful Business Software

Our Free Plans just got better! | Auth0

With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now

Rate This Project

User Reviews

Be the first to post a review of vLLM!

Additional Project Details

Programming Language

Python

Related Categories

Python Large Language Models (LLM), Python LLM Inference Tool

Registered

2023-08-21

Similar Business Software

LM-Kit.NET

LM-Kit.NET is a cutting-edge, high-level inference SDK designed specifically to bring the advanced capabilities of Large Language Models (LLM) into the C# ecosystem. Tailored for developers working within .NET, LM-Kit.NET provides a comprehensive suite of powerful Generative AI tools, making...

See Software
Vertex AI

Build, deploy, and scale machine learning (ML) models faster, with fully managed ML tools for any use case. Through Vertex AI Workbench, Vertex AI is natively integrated with BigQuery, Dataproc, and Spark. You can use BigQuery ML to create and execute machine learning models in BigQuery...

See Software
Google AI Studio

Google AI Studio is a unified development platform that helps teams explore, build, and deploy applications using Google’s most advanced AI models, including Gemini 3. It brings text, image, audio, and video models together in one interactive playground. With vibe coding, developers can use...

See Software
VLLM

VLLM is a high-performance library designed to facilitate efficient inference and serving of Large Language Models (LLMs). Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry. It offers...

See Software
RunPod

RunPod offers a cloud-based platform designed for running AI workloads, focusing on providing scalable, on-demand GPU resources to accelerate machine learning (ML) model training and inference. With its diverse selection of powerful GPUs like the NVIDIA A100, RTX 3090, and H100, RunPod supports...

See Software
Ministral 3B

Mistral AI introduced two state-of-the-art models for on-device computing and edge use cases, named "les Ministraux": Ministral 3B and Ministral 8B. These models set a new frontier in knowledge, commonsense reasoning, function-calling, and efficiency in the sub-10B category. They can be used or...

See Software

Report inappropriate content

vLLM

A high-throughput and memory-efficient inference and serving engine

Get an email when there's a new version of vLLM

Features

Project Samples

Project Activity

Categories

License

Follow vLLM

User Reviews

Additional Project Details

Programming Language

Related Categories

Registered