Maxim
Maxim is an agent simulation, evaluation, and observability platform that empowers modern AI teams to deploy agents with quality, reliability, and speed.
Maxim's end-to-end evaluation and data management stack covers every stage of the AI lifecycle, from prompt engineering to pre & post release testing and observability, data-set creation & management, and fine-tuning.
Use Maxim to simulate and test your multi-turn workflows on a wide variety of scenarios and across different user personas before taking your application to production.
Features:
Agent Simulation
Agent Evaluation
Prompt Playground
Logging/Tracing Workflows
Custom Evaluators- AI, Programmatic and Statistical
Dataset Curation
Human-in-the-loop
Use Case:
Simulate and test AI agents
Evals for agentic workflows: pre and post-release
Tracing and debugging multi-agent workflows
Real-time alerts on performance and quality
Creating robust datasets for evals and fine-tuning
Human-in-the-loop workflows
Learn more
Literal AI
Literal AI is a collaborative platform designed to assist engineering and product teams in developing production-grade Large Language Model (LLM) applications. It offers a suite of tools for observability, evaluation, and analytics, enabling efficient tracking, optimization, and integration of prompt versions. Key features include multimodal logging, encompassing vision, audio, and video, prompt management with versioning and AB testing capabilities, and a prompt playground for testing multiple LLM providers and configurations. Literal AI integrates seamlessly with various LLM providers and AI frameworks, such as OpenAI, LangChain, and LlamaIndex, and provides SDKs in Python and TypeScript for easy instrumentation of code. The platform also supports the creation of experiments against datasets, facilitating continuous improvement and preventing regressions in LLM applications.
Learn more
Langfuse
Langfuse is an open source LLM engineering platform to help teams collaboratively debug, analyze and iterate on their LLM Applications.
Observability: Instrument your app and start ingesting traces to Langfuse
Langfuse UI: Inspect and debug complex logs and user sessions
Prompts: Manage, version and deploy prompts from within Langfuse
Analytics: Track metrics (LLM cost, latency, quality) and gain insights from dashboards & data exports
Evals: Collect and calculate scores for your LLM completions
Experiments: Track and test app behavior before deploying a new version
Why Langfuse?
- Open source
- Model and framework agnostic
- Built for production
- Incrementally adoptable - start with a single LLM call or integration, then expand to full tracing of complex chains/agents
- Use GET API to build downstream use cases and export data
Learn more
Arize Phoenix
Phoenix is an open-source observability library designed for experimentation, evaluation, and troubleshooting. It allows AI engineers and data scientists to quickly visualize their data, evaluate performance, track down issues, and export data to improve. Phoenix is built by Arize AI, the company behind the industry-leading AI observability platform, and a set of core contributors. Phoenix works with OpenTelemetry and OpenInference instrumentation. The main Phoenix package is arize-phoenix. We offer several helper packages for specific use cases. Our semantic layer is to add LLM telemetry to OpenTelemetry. Automatically instrumenting popular packages. Phoenix's open-source library supports tracing for AI applications, via manual instrumentation or through integrations with LlamaIndex, Langchain, OpenAI, and others. LLM tracing records the paths taken by requests as they propagate through multiple steps or components of an LLM application.
Learn more