Best Open Source Go Large Language Models (LLM)

Go Large Language Models (LLM)

Large Language Models (LLM) Go Clear Filters

Browse free open source Go Large Language Models (LLM) and projects below. Use the toggles on the left to filter open source Go Large Language Models (LLM) by OS, license, language, programming language, and project status.

Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
Easily Host LLMs and Web Apps on Cloud Run
Run everything from popular models with on-demand NVIDIA L4 GPUs to web apps without infrastructure management.

Run frontend and backend services, batch jobs, host LLMs, and queue processing workloads without the need to manage infrastructure. Cloud Run gives you on-demand GPU access for hosting LLMs and running real-time AI—with 5-second cold starts and automatic scale-to-zero so you only pay for actual usage. New customers get $300 in free credit to start.

Try Cloud Run Free
1

LocalAI

Self-hosted, community-driven, local OpenAI compatible API

Self-hosted, community-driven, local OpenAI compatible API. Drop-in replacement for OpenAI running LLMs on consumer-grade hardware. Free Open Source OpenAI alternative. No GPU is required. Runs ggml, GPTQ, onnx, TF compatible models: llama, gpt4all, rwkv, whisper, vicuna, koala, gpt4all-j, cerebras, falcon, dolly, starcoder, and many others. LocalAI is a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs (and not only) locally or on-prem with consumer-grade hardware, supporting multiple model families that are compatible with the ggml format. Does not require GPU.

Downloads: 19 This Week

Last Update: 2026-02-07
See Project
2

Qwen2.5-Coder

Qwen2.5-Coder is the code version of Qwen2.5, the large language model

Qwen2.5-Coder, developed by QwenLM, is an advanced open-source code generation model designed for developers seeking powerful and diverse coding capabilities. It includes multiple model sizes—ranging from 0.5B to 32B parameters—providing solutions for a wide array of coding needs. The model supports over 92 programming languages and offers exceptional performance in generating code, debugging, and mathematical problem-solving. Qwen2.5-Coder, with its long context length of 128K tokens, is ideal for a variety of use cases, from simple code assistants to complex programming scenarios, matching the capabilities of models like GPT-4o.

1 Review

Downloads: 7 This Week

Last Update: 2025-03-04
See Project
3

KubeAI

Private Open AI on Kubernetes

Get inferencing running on Kubernetes: LLMs, Embeddings, Speech-to-Text. KubeAI serves an OpenAI compatible HTTP API. Admins can configure ML models by using the Model Kubernetes Custom Resources. KubeAI can be thought of as a Model Operator (See Operator Pattern) that manages vLLM and Ollama servers.

Downloads: 0 This Week

Last Update: 2025-12-03
See Project
4

LLaMA.go

llama.go is like llama.cpp in pure Golang

llama.go is like llama.cpp in pure Golang. The code of the project is based on the legendary ggml.cpp framework of Georgi Gerganov written in C++ with the same attitude to performance and elegance. Both models store FP32 weights, so you'll needs at least 32Gb of RAM (not VRAM or GPU RAM) for LLaMA-7B. Double to 64Gb for LLaMA-13B.

Downloads: 0 This Week

Last Update: 2023-08-25
See Project
Build AI Apps with Gemini 3 on Vertex AI
Access Google’s most capable multimodal models. Train, test, and deploy AI with 200+ foundation models on one platform.

Vertex AI gives developers access to Gemini 3—Google’s most advanced reasoning and coding model—plus 200+ foundation models including Claude, Llama, and Gemma. Build generative AI apps with Vertex AI Studio, customize with fine-tuning, and deploy to production with enterprise-grade MLOps. New customers get $300 in free credits.

Try Vertex AI Free
5

Zep

Zep: A long-term memory store for LLM / Chatbot applications

Easily add relevant documents, chat history memory & rich user data to your LLM app's prompts. Understands chat messages, roles, and user metadata, not just texts and embeddings. Zep Memory and VectorStore implementations are shipped with your favorite frameworks: LangChain, LangChain.js, LlamaIndex, and more. Automatically embed texts and messages using state-of-the-art opeb source models, OpenAI, or bring your own vectors. Zep’s local embedding models and async enrichment ensure a snappy user experience.

Downloads: 0 This Week

Last Update: 2025-09-11
See Project
6

aqueduct LLM

Aqueduct allows you to run LLM and ML workloads on any infrastructure

Aqueduct is an MLOps framework that allows you to define and deploy machine learning and LLM workloads on any cloud infrastructure. Aqueduct is an open-source MLOps framework that allows you to write code in vanilla Python, run that code on any cloud infrastructure you'd like to use, and gain visibility into the execution and performance of your models and predictions. Aqueduct's Python native API allows you to define ML tasks in regular Python code. You can connect Aqueduct to your existing cloud infrastructure (docs), and Aqueduct will seamlessly move your code from your laptop to the cloud or between different cloud infrastructure layers. Aqueduct provides a single interface to running machine learning tasks on your existing cloud infrastructure — Kubernetes, Spark, Lambda, etc. From the same Python API, you can run code across any or all of these systems seamlessly and gain visibility into how your code is performing.

Downloads: 0 This Week

Last Update: 2023-08-25
See Project