Search Results for "wimdows open source benchmark"

Showing 699 open source projects for "wimdows open source benchmark"

1

Open Source Vizier

Python-based research interface for blackbox optimization

Open Source (OSS) Vizier is a Python-based interface for blackbox optimization and research, based on Google’s original internal Vizier, one of the first hyperparameter tuning services designed to work at scale. It lets a user set up an OSS Vizier server, which hosts blackbox optimization algorithms and serves multiple clients simultaneously, in a fault-tolerant manner, as they tune their objective functions.

Downloads: 0 This Week

Last Update: 2025-02-01
See Project
2

Benchmark

A microbenchmark support library

A library to benchmark code snippets, similar to unit tests.

Downloads: 0 This Week

Last Update: 2026-01-21
See Project
3

Open LLMs

A list of open LLMs available for commercial use

Open LLMs, by the author of applied-ml, is a curated directory of open large language models (LLMs) that are available for commercial or open-source use. Rather than covering proprietary or closed-source LLMs, this repo focuses on freely available or permissively licensed models that practitioners can download, run, fine-tune, or integrate without restrictive licensing.

Downloads: 2 This Week

Last Update: 2025-12-10
See Project
4

MLPerf

Reference implementations of MLPerf™ training benchmarks

This is a repository of reference implementations for the MLPerf training benchmarks. These implementations are valid as starting points for benchmark implementations but are not fully optimized and are not intended to be used for "real" performance measurements of software frameworks or hardware. Benchmarking the performance of training ML models on a wide variety of use cases, software, and hardware drives AI performance across the tech industry. The MLPerf Training working group draws on...

Downloads: 1 This Week

Last Update: 2024-08-16
See Project
5

hyperfine

A command-line benchmarking tool

A command-line benchmarking tool. Statistical analysis across multiple runs. Support for arbitrary shell commands. Constant feedback about the benchmark progress and current estimates. Warmup runs can be executed before the actual benchmark. Cache-clearing commands can be set up before each timing run. Statistical outlier detection to detect interference from other programs and caching effects. Export results to various formats: CSV, JSON, Markdown, AsciiDoc. Parameterized benchmarks (e.g....

Downloads: 2 This Week

Last Update: 2025-11-18
See Project
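
The workflow described in the hyperfine entry (untimed warmup runs, then several timed runs summarized statistically) can be illustrated with a minimal Python sketch. This is not hyperfine's implementation or interface; the function names `benchmark` and `summarize` and their parameters are hypothetical, chosen only for illustration.

```python
import statistics
import time


def summarize(times):
    """Return basic statistics for a list of timings in seconds."""
    return {
        "mean": statistics.mean(times),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
        "min": min(times),
        "max": max(times),
    }


def benchmark(fn, warmup=3, runs=10):
    """Call fn() `warmup` times untimed, then time it `runs` times."""
    for _ in range(warmup):
        fn()  # warmup call: its duration is discarded
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # Hypothetical workload, used only to have something to measure.
    timings = benchmark(lambda: sum(range(100_000)))
    stats = summarize(timings)
    print(f"mean {stats['mean']:.6f} s, stdev {stats['stdev']:.6f} s")
    print(f"range {stats['min']:.6f} s to {stats['max']:.6f} s")
```

Reporting the spread next to the mean matters because a single timing can be distorted by caching or by other programs running at the same time.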
6

XMRig

RandomX, KawPow, CryptoNight, AstroBWT and GhostRider unified miner

XMRig is a high-performance, open-source, cross-platform unified CPU/GPU miner for RandomX, KawPow, CryptoNight, and AstroBWT. It also serves as a RandomX benchmark and stratum proxy. Official binaries are available for Windows, Linux, macOS, and FreeBSD.

1 Review

Downloads: 25 This Week

Last Update: 2025-12-23
See Project
7

BEIR

A Heterogeneous Benchmark for Information Retrieval

BEIR is a benchmark framework for evaluating information retrieval models across various datasets and tasks, including document ranking and question answering.

Downloads: 0 This Week

Last Update: 2025-06-04
See Project
8

BenchmarkTools.jl

A benchmarking framework for the Julia language

BenchmarkTools makes performance tracking of Julia code easy by supplying a framework for writing and running groups of benchmarks as well as comparing benchmark results. This package is used to write and run the benchmarks found in BaseBenchmarks.jl. The CI infrastructure for automated performance testing of the Julia language is not in this package but can be found in Nanosoldier.jl. Our story begins with two packages, "Benchmarks" and "BenchmarkTrackers". The Benchmarks package...

Downloads: 0 This Week

Last Update: 2025-10-21
See Project
9

GLM-4.6

Agentic, Reasoning, and Coding (ARC) foundation models

GLM-4.6 is the latest iteration of Zhipu AI’s foundation model, delivering significant advancements over GLM-4.5. It introduces an extended 200K token context window, enabling more sophisticated long-context reasoning and agentic workflows. The model achieves superior coding performance, excelling in benchmarks and practical coding assistants such as Claude Code, Cline, Roo Code, and Kilo Code. Its reasoning capabilities have been strengthened, including improved tool usage during inference...

Downloads: 61 This Week

Last Update: 2026-02-01
See Project
10

AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

AgentBench is an open-source benchmark designed to evaluate the capabilities of large language models when used as autonomous agents. Unlike traditional language model benchmarks that focus on static text tasks, AgentBench measures how models perform in interactive environments that require planning, reasoning, and decision-making. The benchmark includes multiple environments that simulate realistic scenarios such as web interaction, database querying, and problem solving tasks. ...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
11

AICGSecEval

A.S.E (AICGSecEval) is a repository-level AI-generated code security

AICGSecEval is an open-source benchmark framework designed to evaluate the security of code generated by artificial intelligence systems. The project was developed to address concerns that AI-assisted programming tools may produce insecure code containing vulnerabilities such as injection flaws or unsafe logic. The framework constructs evaluation tasks based on real-world software repositories and known vulnerability cases derived from CVE records.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
12

MTEB

MTEB: Massive Text Embedding Benchmark

Text embeddings are commonly evaluated on a small set of datasets from a single task not covering their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be equally well applied to other tasks like clustering or reranking. This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding...

Downloads: 5 This Week

Last Update: 1 day ago
See Project
13

LongBench

LongBench v2 and LongBench (ACL 25'&24')

LongBench is a comprehensive benchmark designed to evaluate the ability of large language models to understand and reason over very long textual contexts. Traditional language model benchmarks typically evaluate tasks involving relatively short inputs, which does not reflect many real-world applications such as analyzing large documents or entire code repositories. LongBench addresses this gap by providing datasets that require models to process and reason over long sequences of text across...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
14

kube-bench

Checks whether Kubernetes is deployed securely

kube-bench is a tool that checks whether Kubernetes is deployed securely by running the checks documented in the CIS Kubernetes Benchmark. Trivy, the all-in-one cloud-native security scanner, can be deployed as a Kubernetes Operator inside a cluster. Both the Trivy CLI and the Trivy Operator support CIS Kubernetes Benchmark scanning, among several other features. There are multiple ways to run kube-bench. You can run kube-bench inside a pod, but it will need access to the host's PID...

Downloads: 0 This Week

Last Update: 2026-02-20
See Project
15

Crow Framework

A Fast and Easy to use microframework for the web

Crow is a C++ framework for creating HTTP or WebSocket web services. It uses routing similar to Python's Flask, which makes it easy to use. It is also extremely fast, beating multiple existing C++ frameworks as well as non-C++ frameworks. Crow is provided free of charge courtesy of everyone who is donating their money, time, and expertise to keep it going. The 1000-mile journey begins with a single step. Get started by installing Crow and...

Downloads: 21 This Week

Last Update: 2026-02-12
See Project
16

DeepSeek-OCR 2

Visual Causal Flow

DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...

Downloads: 10 This Week

Last Update: 2026-02-03
See Project
17

HumanEval

Code for the paper "Evaluating Large Language Models Trained on Code"

human-eval is a benchmark dataset and evaluation framework created by OpenAI for measuring the ability of language models to generate correct code. It consists of hand-written programming problems with unit tests, designed to assess functional correctness rather than superficial metrics like text similarity. Each task includes a natural language prompt and a function signature, requiring the model to generate an implementation that passes all provided tests. The benchmark has become a...

Downloads: 1 This Week

Last Update: 3 days ago
See Project
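
The functional-correctness idea in the HumanEval entry (a prompt with a function signature, a generated implementation, and unit tests that must all pass) can be sketched in a few lines of Python. This is an illustrative assumption, not the human-eval harness: the helper `passes_tests` and the sample task are hypothetical, and this sketch runs the code with plain `exec`, which is unsafe for untrusted model output and would need sandboxing in practice.

```python
def passes_tests(prompt, completion, test_code, entry_point):
    """Return True if prompt + completion passes every assertion in test_code.

    test_code is expected to define check(candidate), which raises on failure.
    """
    namespace = {}
    try:
        exec(prompt + completion, namespace)  # define the candidate function
        exec(test_code, namespace)            # define check(candidate)
        namespace["check"](namespace[entry_point])
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all count as a fail.
        return False
    return True


# Hypothetical task in the prompt-plus-signature style described above.
PROMPT = 'def add(a, b):\n    """Return the sum of a and b."""\n'
TESTS = (
    "def check(candidate):\n"
    "    assert candidate(1, 2) == 3\n"
    "    assert candidate(-1, 1) == 0\n"
)

if __name__ == "__main__":
    print(passes_tests(PROMPT, "    return a + b\n", TESTS, "add"))
    print(passes_tests(PROMPT, "    return a - b\n", TESTS, "add"))
```

Scoring by test outcome rather than text similarity means two differently written completions are treated the same as long as both behave correctly.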
18

Meta Agents Research Environments (ARE)

Meta Agents Research Environments is a comprehensive platform

Meta Agents Research Environments (ARE) is a simulation and benchmarking platform. It is designed to evaluate AI agents in dynamic, evolving, multi-step tasks. Unlike static benchmarks, ARE supports environments where agents must adapt to changes over time and reason over sequences of actions. It interacts with applications and faces uncertainty. The included Gaia2 benchmark offers 800 scenarios across multiple “universes”. It can test reasoning, memory, tool use, and adaptability....

Downloads: 0 This Week

Last Update: 2026-01-23
See Project
19

JMH Gradle Plugin

Integrates the JMH benchmarking framework with Gradle

The JMH Gradle Plugin provides integration of the Java Microbenchmark Harness (JMH) into Gradle builds, enabling developers to write and run performance benchmarks directly in their projects. JMH is the de facto standard for writing accurate and reliable Java microbenchmarks, and this plugin automates tasks like generating benchmark sources, compiling them with the required JMH support classes, and packaging runnable benchmark jars. It simplifies the workflow by handling classpath setup and...

Downloads: 0 This Week

Last Update: 2025-09-03
See Project
20

Hallucination Leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations

Hallucination Leaderboard is an open research project that tracks and compares the tendency of large language models to produce hallucinated or inaccurate information when generating summaries. The project provides a standardized benchmark that evaluates different models using a dedicated hallucination detection system known as the Hallucination Evaluation Model. Each model is tested on document summarization tasks to measure how often generated responses introduce information that is not supported by the original source material. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
21

SDGym

Benchmarking synthetic data generation methods

The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating synthetic data. It measures performance and memory usage across different synthetic data modeling techniques, including classical statistics, deep learning, and more. The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its synthesizers, datasets, or metrics for benchmarking. You can also customize the process to include your own work. Select any of the publicly available datasets from the...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
22

CodeGeeX

CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)

CodeGeeX is a large-scale multilingual code generation model with 13 billion parameters, trained on 850B tokens across more than 20 programming languages. Developed with MindSpore and later made PyTorch-compatible, it is capable of multilingual code generation, cross-lingual code translation, code completion, summarization, and explanation. It has been benchmarked on HumanEval-X, a multilingual program synthesis benchmark introduced alongside the model, and achieves state-of-the-art...

Downloads: 7 This Week

Last Update: 3 days ago
See Project
23

LZ4

Extremely fast compression algorithm

...It features an extremely fast decoder, with speed in multiple GB/s per core (~1 byte/cycle). A high-compression derivative, called LZ4_HC, is available, trading customizable CPU time for compression ratio. The LZ4 library is provided as open-source software under a BSD license. This benchmark simulates a simple "static content transfer" scenario, such as OS kernel compression or a video game's static assets (text/images/tables/scripts/etc.) that are loaded from flash memory, HDD, or SSD. In this case, compression time is completely ignored, because content developers compress the data only once and usually do not care about its computational cost. ...

Downloads: 271 This Week

Last Update: 2024-07-22
See Project
24

KonaBess

A GPU overclock & undervolt tool for various Snapdragon chips

KonaBess is a straightforward application designed to customize GPU frequency and voltage tables without the need for kernel recompilation. The application achieves customization by unpacking the Boot/Vendor Boot image, decompiling and editing the relevant dtb (device tree binary) files, and finally repacking and flashing the modified image. The extent of improvement varies, with some users reporting a 25% reduction in power consumption in a graphics benchmark (4.2 W to 3.2 W) after undervolting...

Downloads: 45 This Week

Last Update: 2025-10-03
See Project
25

Drill

Drill is an HTTP load testing application written in Rust

Drill is an HTTP load-testing application written in Rust. The main goal of this project is to build a really lightweight tool as an alternative to others that require a JVM and other dependencies. You can write benchmark files in YAML format describing everything you want to test. The syntax was inspired by Ansible because it is really easy to use and extend. As you can see, you can play with interpolations in different ways. This will let you specify a benchmark with different requests and...

Downloads: 0 This Week

Last Update: 2025-12-29
See Project