Second State vs. vLLM

About (Second State)

Fast, lightweight, portable, Rust-powered, and OpenAI-compatible. We work with cloud providers, especially edge cloud/CDN compute providers, to support microservices for web apps. Use cases include AI inference, database access, CRM, e-commerce, workflow management, and server-side rendering. We also work with streaming frameworks and databases to support embedded serverless functions for data filtering and analytics; these serverless functions can run as database UDFs or be embedded in data ingest or query result streams.

Take full advantage of GPUs: write once, run anywhere. Get started with the Llama 2 series of models on your own device in five minutes. Retrieval-augmented generation (RAG) is a popular approach to building AI agents with external knowledge bases. You can also create an HTTP microservice for image classification that runs YOLO and MediaPipe models at native GPU speed.
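Because the runtime exposes an OpenAI-compatible API, a standard OpenAI client can talk to a locally hosted model. The sketch below is a minimal illustration only: the local port and the model name are assumptions for this example, not documented Second State defaults.

# Minimal sketch: chatting with a local OpenAI-compatible endpoint.
# The base_url, port, and model name are assumptions for illustration,
# not documented Second State defaults.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local endpoint
    api_key="not-needed",                 # local servers typically ignore the key
)

reply = client.chat.completions.create(
    model="llama-2-7b-chat",  # assumed model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is WebAssembly in one sentence?"},
    ],
)
print(reply.choices[0].message.content)

The same client code would work unchanged against OpenAI itself or any other OpenAI-compatible backend, which is the practical benefit of the compatibility claim.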

About (vLLM)

vLLM is a high-performance library for efficient inference and serving of large language models (LLMs). Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry. It offers state-of-the-art serving throughput by efficiently managing attention key and value memory through its PagedAttention mechanism. It supports continuous batching of incoming requests and uses optimized CUDA kernels, including integration with FlashAttention and FlashInfer, to speed up model execution. Additionally, vLLM provides quantization support for GPTQ, AWQ, INT4, INT8, and FP8, as well as speculative decoding. Users benefit from seamless integration with popular Hugging Face models, support for various decoding algorithms such as parallel sampling and beam search, and compatibility with NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs, and more.
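To make the workflow concrete, here is a minimal sketch of vLLM's offline batched-inference API, following the pattern in its public quickstart. The model name is an assumption; any Hugging Face checkpoint supported by vLLM could be substituted.

# Minimal sketch of vLLM offline inference, after the public quickstart.
# The model name is an assumption chosen because it is small.
from vllm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "PagedAttention improves LLM serving by",
]
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")     # assumed model for illustration
outputs = llm.generate(prompts, params)  # requests are batched internally

for out in outputs:
    print(f"Prompt: {out.prompt!r}")
    print(f"Generated: {out.outputs[0].text!r}")

vLLM also ships an OpenAI-compatible HTTP server, so the client sketch shown above for Second State could target a vLLM backend as well.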

Platforms Supported (both products)

Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook

Audience (Second State)

Developers in search of a runtime solution to build cloud-native applications

Audience (vLLM)

AI infrastructure engineers looking for a solution to optimize the deployment and serving of large-scale language models in production environments

Support (both products)

Phone Support
24/7 Live Support
Online

API (both products)

Offers API

Pricing (both products)

No pricing information available. A free version and a free trial are offered.

Reviews/Ratings

Neither product has been reviewed yet.

Training (both products)

Documentation
Webinars
Live Online
In Person

Company Information (Second State)

Second State
United States
www.secondstate.io

Company Information (vLLM)

vLLM
United States
docs.vllm.ai/en/latest/

Alternatives

OpenVINO (Intel)
LM-Kit.NET (LM-Kit)
Vertex AI (Google)
Ministral 3B (Mistral AI)

Integrations (both products)

Docker
Kubernetes
OpenAI
ChatGPT
Database Mart
Discord
Filecoin
GitHub
GitLab
Hugging Face
KServe
Llama 2
NVIDIA DRIVE
Nebula Graph
Node.js
Oasis Parcel
Polkadot
Slack
VMware Cloud
WebAssembly