Showing 1010 open source projects for "data quality"

View related business solutions
  • Level Up Your Cyber Defense with External Threat Management Icon
    Level Up Your Cyber Defense with External Threat Management

    See every risk before it hits. From exposed data to dark web chatter. All in one unified view.

    Move beyond alerts. Gain full visibility, context, and control over your external attack surface to stay ahead of every threat.
    Try for Free
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    Build gen AI apps with an all-in-one modern database: MongoDB Atlas

    MongoDB Atlas provides built-in vector search and a flexible document model so developers can build, scale, and run gen AI apps without stitching together multiple databases. From LLM integration to semantic search, Atlas simplifies your AI architecture—and it’s free to get started.
    Start Free
  • 1
    DQO Data Quality Operations Center

    DQO Data Quality Operations Center

    Data Quality Operations Center

    DQO is an DataOps friendly data quality monitoring tool with customizable data quality checks and data quality dashboards. DQO comes with around 100 predefined data quality checks which helps you monitor the quality of your data. Table and column-level checks which allows writing your own SQL queries. Daily and monthly date partition testing. Data segmentation by up to 9 different data streams. Build-in scheduling. Calculation of data quality KPIs which can be displayed on multiple built...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    data-diff

    data-diff

    Efficiently diff rows across two different databases

    We're excited to announce the launch of a new open-source product, data-diff that makes comparing datasets across databases fast at any scale. data-diff automates data quality checks for data replication and migration. In modern data platforms, data is constantly moving between systems, and at the modern data volume and complexity, systems go out of sync all the time. Until now, there has not been any tooling to ensure that when the data is correctly copied. Replicating data at scale, across...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    Synthetic Data Kit

    Synthetic Data Kit

    Tool for generating high quality Synthetic datasets

    Synthetic Data Kit is a CLI-centric toolkit for generating high-quality synthetic datasets to fine-tune Llama models, with an emphasis on producing reasoning traces and QA pairs that line up with modern instruction-tuning formats. It ships an opinionated, modular workflow that covers ingesting heterogeneous sources (documents, transcripts), prompting models to create labeled examples, and exporting to fine-tuning schemas with minimal glue code. The kit’s design goal is to shorten the “data prep...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Cookiecutter Data Science

    Cookiecutter Data Science

    Project structure for doing and sharing data science work

    A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. When we think about data analysis, we often think just about the resulting reports, insights, or visualizations. While these end products are generally the main event, it's easy to focus on making the products look nice and ignore the quality of the code that generates them. Because these end products are created programmatically, code quality is still important! And we're not talking...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Keep company data safe with Chrome Enterprise Icon
    Keep company data safe with Chrome Enterprise

    Protect your business with AI policies and data loss prevention in the browser

    Make AI work your way with Chrome Enterprise. Block unapproved sites and set custom data controls that align with your company's policies.
    Download Chrome
  • 5
    Data-Juicer

    Data-Juicer

    Data processing for and with foundation models

    Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    dbt-re-data

    dbt-re-data

    re_data - fix data issues before your users & CEO would discover them

    re_data is an open-source data reliability framework for the modern data stack. Currently, re_data focuses on observing the dbt project (together with underlaying data warehouse - Postgres, BigQuery, Snowflake, Redshift). Data transformations in re_data are implemented and exposed as models & macros in this dbt package. Gather all relevant outputs about your data in one place using our cloud. Invite your team and debug it easily from there. Go back in time, and see your past metadata. Set up...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Synthetic Data Vault (SDV)

    Synthetic Data Vault (SDV)

    Synthetic Data Generation for tabular, relational and time series data

    The Synthetic Data Vault (SDV) is a Synthetic Data Generation ecosystem of libraries that allows users to easily learn single-table, multi-table and timeseries datasets to later on generate new Synthetic Data that has the same format and statistical properties as the original dataset. Synthetic data can then be used to supplement, augment and in some cases replace real data when training Machine Learning models. Additionally, it enables the testing of Machine Learning or other data dependent...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    LosslessCut

    LosslessCut

    The swiss army knife of lossless video/audio editing

    ... losing quality. Or you can add a music or subtitle track to your video without needing to encode. Everything is extremely fast because it does an almost direct data copy, fueled by the awesome FFmpeg which does all the grunt work. Lossless merge/concatenation of arbitrary files (with identical codecs parameters, e.g. from the same camera). Lossless stream editing: Combine arbitrary tracks from multiple files (ex. add music or subtitle track to a video file).
    Downloads: 284 This Week
    Last Update:
    See Project
  • 9
    Wan2.2

    Wan2.2

    Wan2.2: Open and Advanced Large-Scale Video Generative Model

    Wan2.2 is a major upgrade to the Wan series of open and advanced large-scale video generative models, incorporating cutting-edge innovations to boost video generation quality and efficiency. It introduces a Mixture-of-Experts (MoE) architecture that splits the denoising process across specialized expert models, increasing total model capacity without raising computational costs. Wan2.2 integrates meticulously curated cinematic aesthetic data, enabling precise control over lighting, composition...
    Downloads: 90 This Week
    Last Update:
    See Project
  • Simple, Secure Domain Registration Icon
    Simple, Secure Domain Registration

    Get your domain at wholesale price. Cloudflare offers simple, secure registration with no markups, plus free DNS, CDN, and SSL integration.

    Register or renew your domain and pay only what we pay. No markups, hidden fees, or surprise add-ons. Choose from over 400 TLDs (.com, .ai, .dev). Every domain is integrated with Cloudflare's industry-leading DNS, CDN, and free SSL to make your site faster and more secure. Simple, secure, at-cost domain registration.
    Sign up for free
  • 10
    LaTeX2e Kernel Code Repository
    LaTeX is a high-quality typesetting system; it includes features designed for the production of technical and scientific documentation. LaTeX is the de facto standard for the communication and publication of scientific documents. LaTeX is available as free software.
    Downloads: 70 This Week
    Last Update:
    See Project
  • 11
    DB Browser for SQLite

    DB Browser for SQLite

    The DB Browser for SQLite

    DB Browser for SQLite (DB4S) is a high quality, visual, open source tool to create, design, and edit database files compatible with SQLite. DB4S is for users and developers who want to create, search, and edit databases. DB4S uses a familiar spreadsheet-like interface, and complicated SQL commands do not have to be learned. This program is not a visual shell for the sqlite command line tool, and does not require familiarity with SQL commands. It is a tool to be used by both developers and end...
    Downloads: 86 This Week
    Last Update:
    See Project
  • 12
    ZXing

    ZXing

    Barcode scanning library for Java, Android

    ZXing or “Zebra Crossing” is an open source multi-format 1D/2D barcode image processing library that’s been implemented in Java, and also comes with ports to other languages. It currently supports the following formats: UPC-A and UPC-E EAN-8 and EAN-13 Code 39 Code 93 Code 128 ITF Codabar RSS-14 (all variants) RSS Expanded (most variants) QR Code Data Matrix Aztec ('beta' quality) PDF 417 ('alpha' quality) MaxiCode ZXing is made up of several modules, including a core image...
    Downloads: 65 This Week
    Last Update:
    See Project
  • 13
    Grafana

    Grafana

    Leading open-source visualization and observability platform

    Grafana Open Source is a leading open-source visualization and observability platform that lets you query, visualize, alert on, and explore your data—regardless of where it’s stored. With support for 100+ data source plugins (such as Prometheus, Loki, Elasticsearch, InfluxDB, SQL/NoSQL databases, OpenTelemetry, and more), you can unify metrics, logs, traces, and other observability signals in one place. Grafana OSS empowers you to build dynamic, reusable dashboards with rich visualizations...
    Downloads: 47 This Week
    Last Update:
    See Project
  • 14
    CSV Lint

    CSV Lint

    CSV Lint plug-in for Notepad++ for syntax highlighting

    CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files. Use CSV Lint for metadata discovery, technical data validation, and reformatting on tabular data files. It is not meant to be a replacement for spreadsheet programs like Excel or SPSS, but rather...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 15
    Mumble

    Mumble

    Mumble is an open-source, low-latency, high quality voice chat

    Mumble is an open-source, low-latency, high-quality voice chat software. There are two modules in Mumble; the client (mumble) and the server (murmur). The client works on Windows, Linux, FreeBSD, OpenBSD, and macOS, while the server should work on anything Qt can be installed on. Low-latency and high-quality voice-chat program written on top of Qt and Opus. Administrators appreciate Mumble for being able to self-host and have control over data security and privacy. Some make use...
    Downloads: 28 This Week
    Last Update:
    See Project
  • 16
    MinerU

    MinerU

    A high-quality tool for convert PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 17
    Another Redis Desktop Manager

    Another Redis Desktop Manager

    A faster, better and more stable Redis desktop manager

    AnotherRedisDesktopManager is a cross-platform GUI client for Redis that simplifies connecting, browsing, and manipulating data. It supports standalone, Sentinel, and Cluster modes, plus SSH tunneling and ACL credentials for secure access in varied environments. The UI provides tree and table views of keys with inline editors for strings, hashes, lists, sets, sorted sets, and streams, including TTL management and batch operations. Built-in monitoring lets you watch stats, slow logs, and command...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 18
    StabilityMatrix

    StabilityMatrix

    Multi-Platform Package Manager for Stable Diffusion

    StabilityMatrix is a project that helps organize, evaluate, and compare generative AI models and their behavior across prompts, datasets, or configuration settings. It provides a framework to run experiments systematically—capturing inputs, model configurations, outputs, and metrics—so researchers and practitioners can reason about differences in quality, robustness, and failure modes. The repository often bundles tooling for automated prompt sweeping, scoring heuristics (such as diversity...
    Downloads: 24 This Week
    Last Update:
    See Project
  • 19
    Animity

    Animity

    Use the Android app to watch anime on your phone without ads

    An Android app to watch anime on your phone without ads. Animity is an anime streaming app that provides high-quality streaming of your favorite anime shows. It's designed to be easy to use, so you can quickly find the anime you want to watch and start streaming. With Animity, you can also sync your account with Anilist, so you can keep track of your favorite shows and manage your anime watchlist. Whether you're a die-hard anime fan or a casual viewer, Animity has something for everyone...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 20
    Middleware

    Middleware

    Open-source DORA metrics platform for engineering teams

    Bring more visibility to your engineering pipeline, get the right data & actionable insights to unclog bottlenecks, ensuring smooth software delivery.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 21
    AdGuardHome

    AdGuardHome

    Network-wide ads and trackers blocking DNS server

    ... and are updated on a regular basis, guaranteeing the best filtering quality. Protecting your personal data is our top priority. With AdGuard, you and your sensitive data will be safe from any online tracker and analytics system that may attempt to steal your data while surfing the web. Use the Family protection mode to block access to all websites with adult content and enforce safe search in the browser, in addition to the regular perks of ad blocking and browsing security.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 22
    DeepSeek-V3.2-Exp

    DeepSeek-V3.2-Exp

    An experimental version of DeepSeek model

    DeepSeek-V3.2-Exp is an experimental release of the DeepSeek model family, intended as a stepping stone toward the next generation architecture. The key innovation in this version is DeepSeek Sparse Attention (DSA), a sparse attention mechanism that aims to optimize training and inference efficiency in long-context settings without degrading output quality. According to the authors, they aligned the training setup of V3.2-Exp with V3.1-Terminus so that benchmark results remain largely...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 23
    Matplotlib

    Matplotlib

    matplotlib: plotting with Python

    Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. Matplotlib ships with several add-on toolkits, including 3D plotting with mplot3d, axes helpers in axes_grid1 and axis helpers in axisartist. A large number of third party packages extend and build on Matplotlib functionality, including several higher-level plotting interfaces (seaborn, HoloViews, ggplot, ...), and a...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 24
    Best-of Python

    Best-of Python

    A ranked list of awesome Python open-source libraries

    This curated list contains 390 awesome open-source projects with a total of 1.4M stars grouped into 28 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from GitHub and different package managers. If you like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! Ranked list of awesome python libraries for web development...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 25
    WebP Codec

    WebP Codec

    Library to encode and decode images in WebP format

    libwebp is the reference codec library for Google’s WebP image format, providing both encoding and decoding along with command-line tools. It supplies cwebp to compress images into WebP and dwebp to decompress them back, making it easy to test quality/size trade-offs across presets and tuning parameters. The GitHub repository is a mirror; the canonical source of truth lives on Chromium’s git, and developer docs are hosted on WebP’s portal. The project underpins WebP support across browsers...
    Downloads: 10 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.