Showing 307 open source projects for "big data"

  • 1
    data.table

    Extends base R’s data.frame for high-performance data manipulation

    data.table is an R package that extends base R’s data.frame for high-performance data manipulation. It offers concise syntax, blazing speed, and memory-efficient operations. It supports fast file reading/writing, joins, grouping, reshaping, and updates by reference. It is heavily used in large-data workflows, big-data analysis in R, and production pipelines. It provides extremely efficient grouping, aggregation, and summarization, and can handle very large datasets (hundreds of millions to billions of rows) in memory...
    Downloads: 0 This Week
  • 2
    pandas

    Fast, flexible and powerful Python data analysis toolkit

    pandas is a Python data analysis library that provides high-performance, user-friendly data structures and data analysis tools for the Python programming language. It enables you to carry out entire data analysis workflows in Python without having to switch to a more domain-specific language. pandas can significantly increase performance, productivity, and collaboration in Python data analysis. pandas is continuously being developed to be a fundamental high-level building...
    Downloads: 75 This Week
  • 3
    Brave for iOS

    Brave iOS Browser

    The best online privacy. Search and browse privately, turning your back on Big Tech. By default, Brave blocks trackers and annoying ads on the websites you visit, so you can forget about ads following you wherever you browse. Get ad blocking, incognito windows, private search, and even a VPN, all in one download. Quickly import bookmarks, extensions, and even saved passwords. The best of your old browser, but more secure. And it will only take you a minute to change...
    Downloads: 73 This Week
  • 4
    Umbrel

    A beautiful personal server OS for Raspberry Pi or any Linux distro

    Run your personal server with a Bitcoin and Lightning node in your home, self-host open source apps like Nextcloud and Matrix to break away from big tech, and take full control of your data. For free. All our interactions on the internet today are mediated by a few companies who offer “free” services in exchange for storing our data on their servers to spy on us. Running a personal server fundamentally changes that. Your and your family’s photos, videos, files, notes, passwords, everything, have...
    Downloads: 87 This Week
  • 5
    TimescaleDB

    An open-source time-series SQL database optimized for fast ingest

    TimescaleDB is the open-source relational database for time-series and analytics. Build powerful data-intensive applications. Become instantly productive with full SQL. Rely on the same PostgreSQL you know, love, and trust. Hyperfunctions make time series easier. Achieve 10-100x faster queries than vanilla PostgreSQL, InfluxDB, or MongoDB. Write millions of data points per second per node. Horizontally scale to petabytes. Don’t worry about cardinality. Simplify your stack, ask more complex...
    Downloads: 40 This Week
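TimescaleDB's `time_bucket` hyperfunction groups rows into fixed-width time intervals, similar to a `GROUP BY` on a truncated timestamp. The plain-Python sketch below only illustrates that idea; it is not TimescaleDB's implementation. It assumes buckets aligned to the Unix epoch (TimescaleDB's own default origin may differ for some widths), and the helpers `time_bucket` and `bucket_avg` here are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_bucket(width: timedelta, ts: datetime) -> datetime:
    """Floor ts to the start of its bucket, counting whole buckets from the Unix epoch."""
    buckets_since_epoch = (ts - EPOCH) // width  # timedelta // timedelta -> int
    return EPOCH + buckets_since_epoch * width


def bucket_avg(width: timedelta, rows):
    """Average (timestamp, value) pairs per bucket, like AVG(value) ... GROUP BY bucket."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [running total, count]
    for ts, value in rows:
        acc = sums[time_bucket(width, ts)]
        acc[0] += value
        acc[1] += 1
    return {bucket: total / count for bucket, (total, count) in sorted(sums.items())}


rows = [
    (datetime(2024, 1, 1, 10, 1, tzinfo=timezone.utc), 20.0),
    (datetime(2024, 1, 1, 10, 4, tzinfo=timezone.utc), 22.0),
    (datetime(2024, 1, 1, 10, 7, tzinfo=timezone.utc), 30.0),
]
result = bucket_avg(timedelta(minutes=5), rows)
for bucket, avg in result.items():
    print(bucket.isoformat(), avg)
# 2024-01-01T10:00:00+00:00 21.0
# 2024-01-01T10:05:00+00:00 30.0
```

In the database itself, the equivalent work happens inside the SQL engine rather than in application code.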
  • 6
    BFG Repo-Cleaner

    Remove large or troublesome blobs

    The BFG is a simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository history. You can use it to remove crazy big files and to remove passwords, credentials, and other private data. The git-filter-branch command is enormously powerful and can do things that the BFG can't, but the BFG is much better for the tasks above because it is faster and simpler. The BFG isn't particularly clever, but is focused on making the above tasks easy. If you need...
    Downloads: 18 This Week
  • 7
    QRCoder

    A pure C# Open Source QR Code implementation

    ... to other libraries or network stacks. (Like QR code generators that rely on online services, which makes them vulnerable or slow in some cases.) Although simplicity is one of the main goals, QRCoder is really flexible in both "output formats" and "payload types". Payload types? Yes, QRCoder brings its own "payload generator", which helps you create many different payload types to generate special QR codes like "WiFi QR Codes", "Girocodes", "SwissQRCodes" and many more.
    Downloads: 28 This Week
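A "payload" is just the structured text that the QR code encodes. As an illustration of what a WiFi payload generator produces, the Python sketch below builds a string in the widely used `WIFI:T:…;S:…;P:…;;` convention that many phone camera apps recognize, backslash-escaping the characters that are special in that format. This is not QRCoder's C# API; the function names `wifi_payload` and `escape_wifi` are hypothetical, and the exact field set a given scanner accepts may vary.

```python
def escape_wifi(value: str) -> str:
    """Backslash-escape characters that are special in WIFI: payloads (\\ ; , : ")."""
    special = '\\;,:"'
    return "".join("\\" + ch if ch in special else ch for ch in value)


def wifi_payload(ssid: str, password: str, auth: str = "WPA", hidden: bool = False) -> str:
    """Build the text a 'WiFi QR code' encodes, e.g. WIFI:T:WPA;S:home;P:secret;;"""
    parts = [f"T:{auth}", f"S:{escape_wifi(ssid)}"]
    if auth != "nopass":  # open networks carry no password field
        parts.append(f"P:{escape_wifi(password)}")
    if hidden:
        parts.append("H:true")
    return "WIFI:" + ";".join(parts) + ";;"


print(wifi_payload("Home;Net", "p@ss:word"))  # WIFI:T:WPA;S:Home\;Net;P:p@ss\:word;;
```

The resulting string would then be handed to any QR encoder; the payload logic is independent of how the code is rendered.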
  • 8
    Apache HBase

    Get random, realtime read/write access to your Big Data

    Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache...
    Downloads: 7 This Week
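The Bigtable paper describes the data model as a sparse, sorted map from (row key, column, timestamp) to value, and HBase follows the same model, with columns written as `family:qualifier`. The toy in-memory Python sketch below illustrates that model only: versioned cells, "newest version wins" reads, and range scans over sorted row keys. It is not the HBase client API, and the class and method names are hypothetical.

```python
import bisect
from collections import defaultdict


class VersionedTable:
    """Toy sparse map: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self):
        self._rows = {}         # row key -> column -> {timestamp: value}
        self._sorted_keys = []  # row keys kept in lexicographic order for scans

    def put(self, row, column, timestamp, value):
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)
            self._rows[row] = defaultdict(dict)
        self._rows[row][column][timestamp] = value  # a new version, old ones kept

    def get(self, row, column, as_of=None):
        """Return the newest value at or before as_of (the newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if as_of is None or ts <= as_of]
        return versions[max(candidates)] if candidates else None

    def scan(self, start, stop):
        """Yield row keys in [start, stop) in sorted order."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, stop)
        yield from self._sorted_keys[lo:hi]


t = VersionedTable()
t.put("com.example/a", "cf:title", 1, "v1")
t.put("com.example/a", "cf:title", 2, "v2")
t.put("com.example/b", "cf:title", 1, "other")
print(t.get("com.example/a", "cf:title"))           # v2
print(t.get("com.example/a", "cf:title", as_of=1))  # v1
print(list(t.scan("com.example/", "com.example/z")))  # ['com.example/a', 'com.example/b']
```

Keeping row keys sorted is what makes key-range scans cheap; in the real systems that ordering is also what lets tables be split into contiguous row ranges across servers.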
  • 9
    testng

    TestNG testing framework

    TestNG is a testing framework inspired by JUnit and NUnit that introduces new functionality to make it more powerful and easier to use. Run your tests in arbitrarily big thread pools with various policies available (all methods in their own thread, one thread per test class, etc.).
    Downloads: 9 This Week
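To make the two thread-pool policies above concrete, the Python sketch below runs test methods either as individual tasks ("methods") or grouped so each class's methods run sequentially in one task ("classes"). TestNG itself is a Java framework configured in its own way; this is only a language-neutral illustration of the scheduling idea, and `run_suite`, `discover_tests`, and the policy names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def discover_tests(cls):
    """Hypothetical discovery rule: methods whose names start with 'test'."""
    return sorted(n for n in dir(cls) if n.startswith("test"))


def run_suite(test_classes, policy="methods", pool_size=4):
    """Run tests in a thread pool. Returns {(class, method): thread that ran it}.

    policy="methods": every test method is submitted as its own task.
    policy="classes": all methods of a class run sequentially in one task.
    """
    results = {}
    lock = threading.Lock()

    def run_one(cls, name):
        getattr(cls(), name)()  # a failing assert raises AssertionError
        with lock:
            results[(cls.__name__, name)] = threading.current_thread().name

    def run_class(cls):
        for name in discover_tests(cls):
            run_one(cls, name)

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        if policy == "methods":
            futures = [pool.submit(run_one, c, n) for c in test_classes for n in discover_tests(c)]
        elif policy == "classes":
            futures = [pool.submit(run_class, c) for c in test_classes]
        else:
            raise ValueError(f"unknown policy: {policy}")
        for future in futures:
            future.result()  # re-raise any failure from a worker thread
    return results


class MathTests:
    def test_add(self):
        assert 1 + 1 == 2

    def test_mul(self):
        assert 2 * 3 == 6


class StringTests:
    def test_upper(self):
        assert "a".upper() == "A"


by_class = run_suite([MathTests, StringTests], policy="classes")
by_method = run_suite([MathTests, StringTests], policy="methods")
```

The "classes" policy trades parallelism for isolation: tests that share per-class state never run concurrently with each other.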
  • 10
    marimo

    A reactive notebook for Python

    marimo is an open-source reactive notebook for Python: reproducible, git-friendly, executable as a script, and shareable as an app. marimo notebooks are extremely interactive, designed for collaboration, and fit for the modern Pythonista. Run one cell and marimo reacts by automatically running affected cells, eliminating the error-prone chore of managing notebook state. marimo's reactive UI elements, like data frame GUIs and plots...
    Downloads: 8 This Week
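The reactive behavior described above ("run one cell and marimo reacts by automatically running affected cells") amounts to tracking which names each cell defines and references, then re-running the dependents of a changed cell in dependency order. The Python sketch below is a toy version of that idea, not marimo's runtime: marimo works out these dependencies by analyzing each cell's code, whereas here they are declared by hand, and `ReactiveNotebook` is a hypothetical class.

```python
from graphlib import TopologicalSorter


class ReactiveNotebook:
    """Toy reactive runtime: cells read and write one shared namespace."""

    def __init__(self):
        self.cells = {}  # cell id -> (func, names it defines, names it references)
        self.ns = {}     # shared variable namespace
        self.log = []    # order in which cells actually executed

    def add(self, cell_id, func, defines=(), refs=()):
        self.cells[cell_id] = (func, set(defines), set(refs))

    def _dependents(self, cell_id):
        """Cells that (transitively) reference names defined by cell_id."""
        seen, stack = set(), [cell_id]
        while stack:
            defined = self.cells[stack.pop()][1]
            for other, (_, _, refs) in self.cells.items():
                if other not in seen and refs & defined:
                    seen.add(other)
                    stack.append(other)
        return seen

    def run(self, cell_id):
        affected = {cell_id} | self._dependents(cell_id)
        # Each affected cell depends on the affected cells that define names it uses.
        graph = {
            c: {p for p in affected if self.cells[p][1] & self.cells[c][2]}
            for c in affected
        }
        for c in TopologicalSorter(graph).static_order():
            self.cells[c][0](self.ns)
            self.log.append(c)


nb = ReactiveNotebook()
nb.add("a", lambda ns: ns.update(x=2), defines=["x"])
nb.add("b", lambda ns: ns.update(y=ns["x"] * 10), defines=["y"], refs=["x"])
nb.add("c", lambda ns: ns.update(z=ns["y"] + 1), defines=["z"], refs=["y"])
nb.run("a")
print(nb.ns["z"], nb.log)  # 21 ['a', 'b', 'c']
nb.add("a", lambda ns: ns.update(x=5), defines=["x"])  # "edit" cell a
nb.run("a")
print(nb.ns["z"])  # 51
```

Because stale cells are always re-run, the namespace never holds values computed from outdated inputs, which is the "managing notebook state" chore the description refers to.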
  • 11
    StartOS

    Linux server OS optimized for self-hosting

    ... emphasis on encryption and self-ownership, StartOS offers a holistic alternative to Big Tech infrastructure.
    Downloads: 12 This Week
  • 12
    XCharts

    A charting and data visualization library for Unity

    A charting and data visualization library for Unity: a powerful, easy-to-use, parameter-configurable chart plugin built on UGUI. It offers visual configuration of parameters, real-time preview of effects, and pure code drawing without additional resources. It supports ten built-in charts such as line chart, column chart, pie chart, radar...
    Downloads: 5 This Week
  • 13
    Planetiler

    Flexible tool to build planet-scale vector tilesets

    ... into an MBTiles (SQLite) or PMTiles file that can be served using tools like TileServer GL or Martin or even queried directly from the browser. See awesome-vector-tiles for more projects that work with data in this format. Planetiler works by mapping input elements to vector tile features, flattening them into a big list, and then sorting by tile ID to group them into tiles.
    Downloads: 10 This Week
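The last sentence above describes a map, flatten, sort, and group pipeline. The in-memory Python sketch below shows that shape for point features only: each point is mapped to the standard Web Mercator (XYZ) tile that contains it, the results are flattened into one list, sorted by tile, and grouped so each tile's features end up together. Planetiler itself is written in Java, handles full geometries, and uses its own tile ID encoding and sorting; here the `(zoom, x, y)` tuple stands in as a hypothetical tile ID.

```python
import math
from itertools import groupby


def tile_for(lon, lat, zoom):
    """Web Mercator (XYZ) tile containing a point at the given zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the east edge or at extreme latitudes stay in range.
    return zoom, min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def build_tiles(points, zoom):
    """Map points to (tile, feature), flatten into one list, sort by tile, group per tile."""
    features = [(tile_for(lon, lat, zoom), name) for name, lon, lat in points]
    features.sort(key=lambda item: item[0])  # stable: input order kept within a tile
    return {
        tile: [name for _, name in group]
        for tile, group in groupby(features, key=lambda item: item[0])
    }


points = [("Paris", 2.35, 48.86), ("London", -0.13, 51.51), ("Lyon", 4.84, 45.76)]
tiles = build_tiles(points, zoom=4)
print(tiles)  # {(4, 7, 5): ['London'], (4, 8, 5): ['Paris', 'Lyon']}
```

Sorting before grouping is what lets this approach scale: once features are ordered by tile, each tile can be encoded in a single sequential pass without holding every tile in memory at once.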
  • 14
    Volcano

    A Cloud Native Batch System (Project under CNCF)

    Volcano is a batch system built on Kubernetes. It provides a suite of mechanisms that are commonly required by many classes of batch and elastic workloads, including machine learning/deep learning, bioinformatics/genomics, and other "big data" applications. These types of applications typically run on generalized domain frameworks like TensorFlow, Spark, Ray, PyTorch, MPI, etc., which Volcano integrates with. Volcano builds upon a decade and a half of experience running a wide variety of high...
    Downloads: 12 This Week
  • 15
    Streamlink

    Streamlink is a CLI utility that pipes video streams

    Streamlink is a command-line utility that pipes video streams from various services into a video player, such as VLC. The main purpose of Streamlink is to avoid resource-heavy and unoptimized websites, while still allowing the user to enjoy various streamed content. There is also an API available for developers who want access to the stream data. Streamlink is built upon a plugin system that allows support for new services to be easily added. Most of the big streaming services are supported...
    Downloads: 8 This Week
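A plugin system like the one described above typically maps URL patterns to handler classes, so supporting a new service means adding one class. The Python sketch below shows that registry pattern in miniature; it does not reproduce Streamlink's actual plugin API. The `plugin` decorator, the `ExampleTV` class, and the `example-tv.invalid` URLs are all hypothetical.

```python
import re

# Registry of (compiled URL pattern, plugin class), checked in registration order.
PLUGINS = []


def plugin(pattern):
    """Class decorator: register a plugin class for URLs matching `pattern`."""
    def register(cls):
        PLUGINS.append((re.compile(pattern), cls))
        return cls
    return register


@plugin(r"https?://(?:www\.)?example-tv\.invalid/live/(?P<channel>\w+)")
class ExampleTV:
    """Hypothetical plugin for a made-up streaming service."""

    def __init__(self, match):
        self.channel = match.group("channel")

    def streams(self):
        # A real plugin would query the service; this returns a placeholder URL.
        return {"best": f"https://example-tv.invalid/{self.channel}/best.m3u8"}


def resolve(url):
    """Return an instance of the first registered plugin whose pattern matches url."""
    for pattern, cls in PLUGINS:
        match = pattern.match(url)
        if match:
            return cls(match)
    raise LookupError(f"no plugin can handle {url}")


print(resolve("https://example-tv.invalid/live/news").streams())
# {'best': 'https://example-tv.invalid/news/best.m3u8'}
```

The core resolver never changes when a service is added; only the registry grows, which is the property the description highlights.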
  • 16
    FinMind

    Open data: more than 50 kinds of financial data

    In the era of big data, data is the foundation of everything. We collect more than 50 kinds of Taiwan stock-related information and provide download, online analysis, and backtesting. Whether or not you write programs, you can download data through the API provided by FinMind or directly from the website. Once the data is available, you can perform statistical analysis, regression analysis, time series analysis, machine learning, and deep learning. For individual stocks, provide...
    Downloads: 3 This Week
  • 17
    Redash

    Connect to any data source, easily visualize and share your data

    Redash is an essential tool to help you make sense of your data. It allows everyone, regardless of their level of technical know-how, to harness the power of data. SQL users connect, query, visualize, and share data easily and efficiently, allowing everyone in their organization to use the data. Redash combines the power and comfort of an SQL client with the collaborative benefits of a cloud-based service. It lets you create big, beautiful, and easy-to-digest visualizations on dashboards for better...
    Downloads: 6 This Week
  • 18
    Grafana Alloy

    OpenTelemetry Collector distribution with programmable pipelines

    ... “big tent” collector that’s compatible with the most popular open-source observability ecosystems and includes enterprise-grade features to simplify operating at scale in a modern cloud-native infrastructure.
    Downloads: 5 This Week
  • 19
    Apache Doris

    MPP-based interactive SQL data warehousing for reporting and analysis

    Apache Doris is a modern MPP analytical database product. It can provide sub-second queries and efficient real-time data analysis. With its distributed architecture, it supports datasets of up to 10 PB and is easy to operate. Apache Doris can meet various data analysis demands, including historical data reports, real-time data analysis, interactive data analysis, and exploratory data analysis. Make your data analysis easier! It supports standard SQL and is compatible with the MySQL protocol...
    Downloads: 2 This Week
  • 20
    Fluid

    Fluid, elastic data abstraction and acceleration for BigData/AI apps

    Fluid provides elastic data abstraction and acceleration for BigData/AI applications in the cloud. It provides a Dataset abstraction for underlying heterogeneous data sources, with multidimensional management in a cloud environment. It enables dataset warmup and acceleration for data-intensive applications by using a distributed cache in Kubernetes, with observability, portability, and scalability. It takes the characteristics of the application and data into consideration for cloud application/dataset scheduling to improve...
    Downloads: 2 This Week
  • 21
    Vue Json Pretty

    A JSON tree view component that is easy to use

    A Vue component for rendering JSON data as a tree structure. The CSS file is included separately and needs to be imported manually. You can either import CSS globally in your app (if supported by your framework) or directly from the component.
    Downloads: 2 This Week
  • 22
    Vespa

    The open big data serving engine

    Make AI-driven decisions using your data, in real-time. At any scale, with unbeatable performance. Vespa is a full-featured text search engine and supports both regular text search and fast approximate vector search (ANN). This makes it easy to create high-performing search applications at any scale, whether you want to use traditional techniques or a modern vector-based approach. You can even combine both approaches efficiently in the same query, something no other engine can do...
    Downloads: 2 This Week
  • 23
    Genie

    Distributed Big Data Orchestration Service

    Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop, and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications that run on them.
    Downloads: 0 This Week
  • 24
    PHP7

    PHP7 / Laravel Multi-format Streaming Parser

    When it comes to parsing XML/CSV/JSON/... documents, there are two approaches to consider. DOM loading loads the entire document into memory, making it easy to navigate and parse, and as such provides maximum flexibility for developers. Streaming iterates through the document like a cursor, stopping at each element along the way, thus avoiding memory overkill. As a result, for big files, callbacks are executed while the file is still downloading, which is much more efficient as far as memory...
    Downloads: 3 This Week
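The DOM-versus-streaming distinction above is language-independent. The Python sketch below uses the standard library's `xml.etree.ElementTree` to contrast the two: `fromstring` builds the whole tree before anything happens, while `iterparse` reports each element as soon as it is complete, so a per-element callback can run before the rest of the input has been read. It does not show the PHP7 project's own API; `dom_total`, `streaming_total`, and the `<item>`/`<price>` layout are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def dom_total(xml_bytes):
    """DOM style: parse the whole document into a tree first, then navigate it."""
    root = ET.fromstring(xml_bytes)
    return sum(float(item.findtext("price")) for item in root.iter("item"))


def streaming_total(stream, on_item=None):
    """Streaming style: handle each <item> as soon as its end tag is read."""
    total = 0.0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "item":
            price = float(elem.findtext("price"))
            total += price
            if on_item:
                on_item(elem.findtext("name"), price)  # per-element callback
            elem.clear()  # drop the finished element's contents to limit memory use
    return total


data = (b"<items>"
        b"<item><name>a</name><price>1.5</price></item>"
        b"<item><name>b</name><price>2.5</price></item>"
        b"</items>")
seen = []
dom = dom_total(data)
streamed = streaming_total(io.BytesIO(data), lambda name, price: seen.append(name))
print(dom, streamed, seen)  # 4.0 4.0 ['a', 'b']
```

`iterparse` accepts any file-like object, so the same loop could consume a network response incrementally instead of an in-memory buffer.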
  • 25
    Apache Spark

    A unified analytics engine for large-scale data processing

    ... (microbatches) and Structured Streaming, it delivers low-latency event processing suitable for real-time analytics. The built-in MLlib library provides scalable machine learning algorithms, while GraphX enables graph computations integrated with data pipelines. Spark supports multiple languages—Scala, Java, Python, R—and connects with many storage systems like HDFS, S3, Cassandra, and streaming platforms like Kafka, making it a versatile choice for big data workloads in analytics, ETL, and data science.
    Downloads: 3 This Week
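The micro-batch model mentioned above treats a stream as a sequence of small batch jobs whose results are folded into running state. The Python sketch below illustrates that idea with a running word count; it is not Spark's API. For simplicity it cuts batches by a fixed count, whereas Spark's micro-batches are driven by its trigger settings rather than a record count, and `micro_batches` and `running_word_counts` are hypothetical helpers.

```python
from collections import Counter
from itertools import islice


def micro_batches(events, batch_size):
    """Chop a (possibly unbounded) iterator of events into small lists."""
    it = iter(events)
    while batch := list(islice(it, batch_size)):
        yield batch


def running_word_counts(lines, batch_size=2):
    """Process each micro-batch as a small batch job and fold it into running state."""
    state = Counter()
    for batch in micro_batches(lines, batch_size):
        batch_counts = Counter(word for line in batch for word in line.split())
        state.update(batch_counts)  # merge this batch's result into the total
        yield dict(state)  # snapshot emitted after each micro-batch


snapshots = list(running_word_counts(["a b", "b c", "c c"], batch_size=2))
print(snapshots)  # [{'a': 1, 'b': 2, 'c': 1}, {'a': 1, 'b': 2, 'c': 3}]
```

Smaller batches lower latency but add per-batch overhead; that trade-off is what the trigger interval controls in a real streaming engine.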