Showing 226 open source projects for "cleaning"

  • 1
    janitor

    Simple tools for data cleaning in R

    janitor provides simple, convenient tools for data cleaning, formatting, and exploration in R. It is especially useful for cleaning messy data frames, removing duplicates, formatting column names, and producing frequency tables in a tidy workflow.
    Downloads: 0 This Week
  • 2
    Czkawka

    Multi functional app to find duplicates, empty folders, similar images

    Czkawka (Polish for “hiccup”) is a lightning‑fast, multi‑purpose file cleaning tool written in Rust. It helps users declutter storage by finding duplicate files, similar images or audio, empty folders, and unusually large files through CPU‑efficient multithreading. It is available in both GUI (GTK‑based) and CLI versions.
    Downloads: 612 This Week
  • 3
    AI Data Science Team

    An AI-powered data science team of agents

    AI Data Science Team is a Python library and agent ecosystem designed to accelerate and automate common data science workflows by modeling them as specialized AI “agents” that can be orchestrated to perform tasks like data cleaning, transformation, analysis, visualization, and machine learning. It provides a modular agent framework where each agent focuses on a step in the typical data science pipeline — for example, loading data from CSV/Excel files, cleaning and wrangling messy datasets, engineering predictive features, building models with AutoML, connecting to SQL databases, and producing visual outputs — all driven by natural language or programmatic instructions. ...
    Downloads: 1 This Week
  • 4
    litlyx

    Analytics for developers: set up analytics in 30 seconds

    The easiest, developer-centric analytics tool. Litlyx is an open-source, self-hostable analytics solution for modern frameworks.
    Downloads: 0 This Week
  • 5
    MeshLab

    The open source mesh processing system

    ...The open source system for processing and editing 3D triangular meshes. It provides a set of tools for editing, cleaning, healing, inspecting, rendering, texturing and converting meshes. It offers features for processing raw data produced by 3D digitization tools/devices and for preparing models for 3D printing.
    Downloads: 48 This Week
  • 6
    FDUPES

    FDUPES is a program for identifying or deleting duplicate files

    ...Because it operates directly on file content rather than just filenames, fdupes can accurately detect true copies and guide cleaning operations in data cleanup or migration tasks. It’s a simple, efficient, and widely used utility on Unix-like systems, appreciated by administrators, developers, and power users.
    Downloads: 11 This Week
  • 7
    NYC Taxi Data

    Import public NYC taxi and for-hire vehicle (Uber, Lyft)

    ...It collects and preprocesses large-scale trip datasets (fares, pickup/dropoff, timestamps, locations, passenger counts) to enable data analysis, modeling, and visualization efforts. The project includes scripts and notebooks for cleaning and filtering the raw data, memory-efficient processing for large CSV/Parquet files, and aggregation workflows (e.g. trips per hour, heatmaps of pickups/dropoffs). It also contains example analyses—spatial and temporal visualizations like maps, time-series plots, and hotspot detection—highlighting insights such as patterns of demand, peak times, and geospatial distributions. ...
    Downloads: 2 This Week
  • 8
    Windows Cleaner Utility

    A Windows batch script that cleans temporary files from your PC

    A Windows batch script that cleans temporary files from your PC. If you've used Windows for a long time without cleaning it up, or you don't know how to clean it, you will be amazed at how much space this tool can free up for you! Download "WindowsCleanerUtility.bat" or "WindowsCleanerUtility-x86_64.exe" from Releases, right-click it, and select "Run as administrator". Then choose any option from the menu provided.
    Downloads: 25 This Week
  • 9
    CSV Lint

    CSV Lint plug-in for Notepad++ for syntax highlighting

    CSV Lint is a plug-in for Notepad++ that provides syntax highlighting, CSV validation, automatic column and datatype detection, support for fixed-width datasets, datetime format and decimal separator conversion, sorting, unique value counts, and conversion to XML, JSON, SQL, etc. It is a plugin for data cleaning and working with messy data files. Use CSV Lint for metadata discovery, technical data validation, and reformatting of tabular data files. It is not meant to replace spreadsheet programs like Excel or SPSS; rather, it is a quality control tool to examine, verify, or polish a dataset before further processing.
    Downloads: 52 This Week
  • 10
    nb-clean

    Clean Jupyter notebooks of outputs, metadata, and empty cells

    ...Note that the Git filter and pre-commit hook work differently, with different effects on your working directory. The pre-commit hook operates on the notebook on disk, cleaning the copy in your working directory. The Git filter cleans notebooks as they are added to the index, leaving the copy in your working directory dirty.
    Downloads: 0 This Week
  • 11
    airgeddon

    This is a multi-use bash script for Linux systems

    ..."DoS Pursuit mode" is available to avoid AP channel hopping (also available for DoS performed in Evil Twin attacks). Full support for 2.4 GHz and 5 GHz bands. Assisted capture of Handshake files and PMKIDs for WPA/WPA2 personal networks. Cleaning and optimizing of captured Handshake files. Offline password decryption of captured WPA/WPA2 files for personal networks (Handshakes and PMKIDs) using dictionary, brute-force, and rule-based attacks with the aircrack, crunch, and hashcat tools. Decryption of captured enterprise network passwords based on the John the Ripper, crunch, asleap, and hashcat tools. ...
    Downloads: 51 This Week
  • 12
    ExtractThinker

    ExtractThinker is a Document Intelligence library for LLMs

    ExtractThinker is a tool designed to facilitate the extraction and analysis of information from various data sources, aiding in data processing and knowledge discovery.
    Downloads: 0 This Week
  • 13
    labelme Image Polygonal Annotation

    Image polygonal annotation with Python

    ...It is written in Python and uses Qt for its graphical interface. Features include image annotation for polygons, rectangles, circles, lines, and points; image flag annotation for classification and cleaning; video annotation; GUI customization (predefined labels/flags, auto-saving, label validation, etc.); exporting VOC-format datasets for semantic/instance segmentation; and exporting COCO-format datasets for instance segmentation. The first time you run labelme, it will create a config file in ~/.labelmerc. ...
    Downloads: 9 This Week
  • 14
    Perfect Roadmap To Learn Data Science

    Basic To Intermediate Python data science guide

    ...What makes it particularly valuable is its holistic nature: rather than focusing only on modeling or theory, it also addresses the broader lifecycle of data science work, including data ingestion, cleaning, EDA, feature engineering, model building, validation, and deployment.
    Downloads: 0 This Week
  • 15
    DOLMA

    Data and tools for generating and inspecting OLMo pre-training data

    Dolma (short for "Data to feed OLMo's Appetite") is a framework designed to manage large-scale datasets for training and fine-tuning language models efficiently.
    Downloads: 0 This Week
  • 16
    Laravel Chunk Upload

    The basic implementation for chunk upload with multiple providers

    Laravel Chunk Upload simplifies chunked uploads with support for multiple JavaScript libraries atop Laravel's file upload system, designed with a minimal memory footprint. Features include cross-domain request support, automatic cleaning, and intuitive usage.
    Downloads: 0 This Week
  • 17
    The Data Engineering Handbook

    Links to everything you'd ever want to learn about data engineering

    ...Rather than being a code project itself, it’s a learning handbook that links to books, articles, tutorials, community groups, boot camps, and real-world project examples that collectively form a roadmap to mastering data engineering skills. It includes beginner and intermediate boot camps, interview guides, data cleaning and transformation resources, and curated lists of newsletters and industry communities, making it useful both for self-study and technical interview preparation. The repository is actively maintained and widely starred, reflecting its role as a go-to reference for newcomers and experienced practitioners alike.
    Downloads: 3 This Week
  • 18
    Crowbook LaTeX

    Converts books written in Markdown to HTML, LaTeX/PDF and EPUB

    Crowbook's aim is to allow you to write a book in Markdown without worrying about formatting or typography and let the program generate HTML, PDF and EPUB output for you. Its focus is novels and fiction, and the default settings should (hopefully) generate readable books with correct typography without requiring you to worry about it.
    Downloads: 1 This Week
  • 19
    Mac Cleaner CLI

    Scan and remove junk files, caches, logs, and more

    Mac Cleaner CLI is a free and open-source terminal-based utility that helps users scan, identify, and remove unnecessary files from their macOS systems to reclaim storage space and keep systems tidy. Through a simple command-line interface, the tool performs deep scans to find caches, temporary files, logs, browser data, and other clutter, presenting results in an organized interactive menu where users can choose exactly what to clean. It emphasizes safety by allowing users to exclude...
    Downloads: 3 This Week
  • 20
    Hotkeys JS

    A robust JavaScript library for capturing keyboard input

    ...Because it has no external dependencies and a small footprint, it drops easily into existing codebases. Its focus on developer ergonomics makes defining, managing, and cleaning up shortcuts straightforward.
    Downloads: 2 This Week
  • 21
    Open Interpreter

    A natural language interface for computers

    Open Interpreter is an open-source tool that provides a natural-language interface for interacting with your computer. It lets large language models (LLMs) run code locally (Python, JavaScript, shell, etc.), enabling you to ask your computer to do tasks like data analysis, file manipulation, browsing, etc. in human terms (“chat with your computer”), with safeguards. Runs locally or via configured remote LLM servers/inference backends, giving flexibility to use models you trust or have...
    Downloads: 15 This Week
  • 22
    Java Tablesaw

    Java dataframe and visualization library

    Tablesaw is a dataframe and visualization library that supports loading, cleaning, transforming, filtering, and summarizing data. If you work with data in Java, it may save you time and effort. Tablesaw also supports descriptive statistics and can be used to prepare data for machine learning libraries such as Smile, Tribuo, H2O.ai, and DL4J. It imports data from RDBMS, Excel, CSV, TSV, JSON, HTML, or fixed-width text files, whether local or remote (HTTP, S3, etc.).
    Downloads: 2 This Week
  • 23
    All-in-RAG

    Big Model Application Development Practice 1

    All-in-RAG is an open-source educational project designed to teach developers how to build applications using retrieval-augmented generation techniques. The repository provides a structured learning path that covers both theoretical foundations and practical implementation steps for RAG systems. It explains the full development pipeline required to create knowledge-aware AI assistants, including data preparation, document indexing, vector embedding generation, and retrieval strategies. The...
    Downloads: 0 This Week
  • 24
    Practical Machine Learning with Python

    Master the essential skills needed to recognize and solve problems

    Practical Machine Learning with Python is a comprehensive repository built to accompany a project-centered guide for applying machine learning techniques to real-world problems using Python’s mature data science ecosystem. It centralizes example code, datasets, model pipelines, and explanatory notebooks that teach users how to approach problems from data ingestion and cleaning all the way through feature engineering, model selection, evaluation, tuning, and production-ready deployment patterns. The repository emphasizes end-to-end workflows rather than isolated code snippets, showing how to handle common challenges like class imbalance, overfitting, hyperparameter optimization, and interpretability. By leveraging popular Python libraries such as pandas, scikit-learn, XGBoost, and visualization tools, it illustrates how to build reproducible and robust solutions that scale beyond small demos.
    Downloads: 0 This Week
  • 25
    stylish-haskell

    Haskell code prettifier

    A simple Haskell code prettifier. The goal is not to format all of the code in a file, since I find that kind of tool often "gets in the way". However, manually cleaning up import statements, etc. gets tedious very quickly. This tool tries to help where necessary without getting in the way. It aligns and sorts import statements; groups and wraps {-# LANGUAGE #-} pragmas and can remove (some) redundant pragmas; removes trailing whitespace; aligns branches in case expressions and fields in records; converts line endings (customizable); replaces tabs with four spaces (turned off by default); replaces some ASCII sequences with their Unicode equivalents (turned off by default); and formats data constructors and fields in records.
    Downloads: 0 This Week