Alternatives to Apache Doris
Compare Apache Doris alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Apache Doris in 2026. Compare features, ratings, user reviews, pricing, and more from Apache Doris competitors and alternatives in order to make an informed decision for your business.
-
1
Google Cloud BigQuery
Google
BigQuery is a serverless, multicloud data warehouse that simplifies the process of working with all types of data so you can focus on getting valuable business insights quickly. At the core of Google’s data cloud, BigQuery allows you to simplify data integration, cost effectively and securely scale analytics, share rich data experiences with built-in business intelligence, and train and deploy ML models with a simple SQL interface, helping to make your organization’s operations more data-driven. Gemini in BigQuery offers AI-driven tools for assistance and collaboration, such as code suggestions, visual data preparation, and smart recommendations designed to boost efficiency and reduce costs. BigQuery delivers an integrated platform featuring SQL, a notebook, and a natural language-based canvas interface, catering to data professionals with varying coding expertise. This unified workspace streamlines the entire analytics process. -
2
StarTree
StarTree
StarTree, powered by Apache Pinot™, is a fully managed real-time analytics platform built for customer-facing applications that demand instant insights on the freshest data. Unlike traditional data warehouses or OLTP databases, which are optimized for back-office reporting or transactions, StarTree is engineered for real-time OLAP at true scale, meaning:
- Data Volume: query performance sustained at petabyte scale
- Ingest Rates: millions of events per second, continuously indexed for freshness
- Concurrency: thousands to millions of simultaneous users served with sub-second latency
With StarTree, businesses deliver always-fresh insights at interactive speed, enabling applications that personalize, monitor, and act in real time.
Starting Price: Free
-
3
Striim
Striim
Data integration for your hybrid cloud. Modern, reliable data integration across your private and public cloud. All in real-time with change data capture and data streams. Built by the executive & technical team from GoldenGate Software, Striim brings decades of experience in mission-critical enterprise workloads. Striim scales out as a distributed platform in your environment or in the cloud. Scalability is fully configurable by your team. Striim is fully secure with HIPAA and GDPR compliance. Built ground up for modern enterprise workloads in the cloud or on-premise. Drag and drop to create data flows between your sources and targets. Process, enrich, and analyze your streaming data with real-time SQL queries. -
4
Snowflake
Snowflake
Snowflake is a comprehensive AI Data Cloud platform designed to eliminate data silos and simplify data architectures, enabling organizations to get more value from their data. The platform offers interoperable storage that provides near-infinite scale and access to diverse data sources, both inside and outside Snowflake. Its elastic compute engine delivers high performance for any number of users, workloads, and data volumes with seamless scalability. Snowflake’s Cortex AI accelerates enterprise AI by providing secure access to leading large language models (LLMs) and data chat services. The platform’s cloud services automate complex resource management, ensuring reliability and cost efficiency. Trusted by over 11,000 global customers across industries, Snowflake helps businesses collaborate on data, build data applications, and maintain a competitive edge.
Starting Price: $2 compute/month
-
5
Databend
Databend
Databend is a modern, cloud-native data warehouse built to deliver high-performance, cost-efficient analytics for large-scale data processing. It is designed with an elastic architecture that scales dynamically to meet the demands of different workloads, ensuring efficient resource utilization and lower operational costs. Written in Rust, Databend offers exceptional performance through features like vectorized query execution and columnar storage, which optimize data retrieval and processing speeds. Its cloud-first design enables seamless integration with cloud platforms, and it emphasizes reliability, data consistency, and fault tolerance. Databend is an open source solution, making it a flexible and accessible choice for data teams looking to handle big data analytics in the cloud.
Starting Price: Free
-
6
Oxla
Oxla
Purpose-built for compute, memory, and storage efficiency, Oxla is a self-hosted data warehouse optimized for large-scale, low-latency analytics with robust time-series support. Cloud data warehouses aren’t for everyone. At scale, long-term cloud compute costs outweigh short-term infrastructure savings, and regulated industries require full control over data beyond VPC and BYOC deployments. Oxla outperforms both legacy and cloud warehouses through efficiency, enabling scale for growing datasets with predictable costs, on-prem or in any cloud. Easily deploy, run, and maintain Oxla with Docker and YAML to power diverse workloads in a single, self-hosted data warehouse.
Starting Price: $50 per CPU core / monthly
-
7
SelectDB
SelectDB
SelectDB is a modern data warehouse based on Apache Doris that supports fast query analysis on large-scale real-time data. From ClickHouse to Apache Doris: moving from a separated data lake and warehouse architecture to a unified lakehouse. Kuaishou's OLAP system handles nearly 1 billion query requests every day, providing data services for multiple scenarios. Because the original separated lake and warehouse architecture suffered from storage redundancy, resource contention, complicated governance, and difficult query tuning, the team introduced an Apache Doris lakehouse, combining Doris's materialized view rewriting with automated services to achieve high-performance queries and flexible data governance. Write real-time data within seconds, and synchronize streaming data from databases and data streams. The storage engine supports real-time upsert, append, and pre-aggregation.
Starting Price: $0.22 per hour
-
8
VeloDB
VeloDB
Powered by Apache Doris, VeloDB is a modern data warehouse for lightning-fast analytics on real-time data at scale. Push-based micro-batch and pull-based streaming data ingestion within seconds. Storage engine with real-time upsert, append, and pre-aggregation. Unparalleled performance in both real-time data serving and interactive ad-hoc queries. Not just structured but also semi-structured data. Not just real-time analytics but also batch processing. Not just queries against internal data but also a federated query engine for accessing external data lakes and databases. Distributed design to support linear scalability. Whether deployed on-premise or as a cloud service, with storage and compute separated or integrated, resource usage can be flexibly and efficiently adjusted according to workload requirements. Built on and fully compatible with open source Apache Doris. Supports the MySQL protocol, functions, and SQL for easy integration with other data tools.
-
9
Apache Druid
Druid
Apache Druid is an open source distributed data store. Druid’s core design combines ideas from data warehouses, timeseries databases, and search systems to create a high performance real-time analytics database for a broad range of use cases. Druid merges key characteristics of each of the 3 systems into its ingestion layer, storage format, querying layer, and core architecture. Druid stores and compresses each column individually, and only needs to read the ones needed for a particular query, which supports fast scans, rankings, and groupBys. Druid creates inverted indexes for string values for fast search and filter. Out-of-the-box connectors for Apache Kafka, HDFS, AWS S3, stream processors, and more. Druid intelligently partitions data based on time and time-based queries are significantly faster than traditional databases. Scale up or down by just adding or removing servers, and Druid automatically rebalances. Fault-tolerant architecture routes around server failures. -
10
Databricks Data Intelligence Platform
Databricks
The Databricks Data Intelligence Platform allows your entire organization to use data and AI. It’s built on a lakehouse to provide an open, unified foundation for all data and governance, and is powered by a Data Intelligence Engine that understands the uniqueness of your data. The winners in every industry will be data and AI companies. From ETL to data warehousing to generative AI, Databricks helps you simplify and accelerate your data and AI goals. Databricks combines generative AI with the unification benefits of a lakehouse to power a Data Intelligence Engine that understands the unique semantics of your data. This allows the Databricks Platform to automatically optimize performance and manage infrastructure in ways unique to your business. The Data Intelligence Engine understands your organization’s language, so search and discovery of new data is as easy as asking a question like you would to a coworker. -
11
Apache Pinot
Apache Software Foundation
Pinot is designed to answer OLAP queries with low latency on immutable data. Pluggable indexing technologies: Sorted Index, Bitmap Index, Inverted Index. Joins are currently not supported, but this limitation can be overcome by using Trino or PrestoDB for querying. A SQL-like language supports selection, aggregation, filtering, group by, order by, and distinct queries on data. A deployment consists of both offline and real-time tables. Use the real-time table only to cover segments for which offline data may not be available yet. Detect the right anomalies by customizing the anomaly detection flow and notification flow.
-
12
SingleStore
SingleStore
SingleStore (formerly MemSQL) is a distributed, highly-scalable SQL database that can run anywhere. We deliver maximum performance for transactional and analytical workloads with familiar relational models. SingleStore is a scalable SQL database that ingests data continuously to perform operational analytics for the front lines of your business. Ingest millions of events per second with ACID transactions while simultaneously analyzing billions of rows of data in relational SQL, JSON, geospatial, and full-text search formats. SingleStore delivers ultimate data ingestion performance at scale and supports built-in batch loading and real-time data pipelines. SingleStore lets you achieve ultra fast query response across both live and historical data using familiar ANSI SQL. Perform ad hoc analysis with business intelligence tools, run machine learning algorithms for real-time scoring, and perform geoanalytic queries in real time.
Starting Price: $0.69 per hour
-
13
StarRocks
StarRocks
Whether you're working with a single table or multiple, you'll experience at least 300% better performance on StarRocks compared to other popular solutions. From streaming data to data capture, with a rich set of connectors, you can ingest data into StarRocks in real time for the freshest insights. A query engine that adapts to your use cases. Without moving your data or rewriting SQL, StarRocks provides the flexibility to scale your analytics on demand with ease. StarRocks enables a rapid journey from data to insight. StarRocks' performance is unmatched and provides a unified OLAP solution covering the most popular data analytics scenarios. StarRocks' built-in memory-and-disk-based caching framework is specifically designed to minimize the I/O overhead of fetching data from external storage to accelerate query performance.
Starting Price: Free
-
14
Imply
Imply
Imply is a real-time analytics platform built on Apache Druid, designed to handle large-scale, high-performance OLAP (Online Analytical Processing) workloads. It offers real-time data ingestion, fast query performance, and the ability to perform complex analytical queries on massive datasets with low latency. Imply is tailored for organizations that need interactive analytics, real-time dashboards, and data-driven decision-making at scale. It provides a user-friendly interface for data exploration, along with advanced features such as multi-tenancy, fine-grained access controls, and operational insights. With its distributed architecture and scalability, Imply is well-suited for use cases in streaming data analytics, business intelligence, and real-time monitoring across industries. -
15
Materialize
Materialize
Materialize is a reactive database that delivers incremental view updates. We help developers easily build with streaming data using standard SQL. Materialize can connect to many different external sources of data without pre-processing. Connect directly to streaming sources like Kafka, Postgres databases, CDC, or historical sources of data like files or S3. Materialize allows you to query, join, and transform data sources in standard SQL and presents the results as incrementally updated materialized views. Queries are maintained and continually updated as new data streams in. With incrementally updated views, developers can easily build data visualizations or real-time applications. Building with streaming data can be as simple as writing a few lines of SQL.
Starting Price: $0.98 per hour
-
16
Timeplus
Timeplus
Timeplus is a simple, powerful, and cost-efficient stream processing platform. All in a single binary, easily deployed anywhere. We help data teams process streaming and historical data quickly and intuitively, in organizations of all sizes and industries. Lightweight, single binary, without dependencies. End-to-end analytic streaming and historical functionalities. 1/10 the cost of similar open source frameworks. Turn real-time market and transaction data into real-time insights. Leverage append-only streams and key-value streams to monitor financial data. Implement real-time feature pipelines using Timeplus. One platform for all infrastructure logs, metrics, and traces, the three pillars supporting observability. In Timeplus, we support a wide range of data sources in our web console UI. You can also push data via REST API, or create external streams without copying data into Timeplus.
Starting Price: $199 per month
-
17
Exasol
Exasol
With an in-memory, columnar database and MPP architecture, you can query billions of rows in seconds. Queries are distributed across all nodes in a cluster, providing linear scalability for more users and advanced analytics. MPP, in-memory, and columnar storage add up to the fastest database built for data analytics. With SaaS, cloud, on premises and hybrid deployment options you can analyze data wherever it lives. Automatic query tuning reduces maintenance and overhead. Seamless integrations and performance efficiency gets you more power at a fraction of normal infrastructure costs. Smart, in-memory query processing allowed this social networking company to boost performance, processing 10B data sets a year. A single data repository and speed engine to accelerate critical analytics, delivering improved patient outcome and bottom line. -
18
Arroyo
Arroyo
Scale from zero to millions of events per second. Arroyo ships as a single, compact binary. Run locally on macOS or Linux for development, and deploy to production with Docker or Kubernetes. Arroyo is a new kind of stream processing engine, built from the ground up to make real-time easier than batch. Arroyo was designed from the start so that anyone with SQL experience can build reliable, efficient, and correct streaming pipelines. Data scientists and engineers can build end-to-end real-time applications, models, and dashboards, without a separate team of streaming experts. Transform, filter, aggregate, and join data streams by writing SQL, with sub-second results. Your streaming pipelines shouldn't page someone just because Kubernetes decided to reschedule your pods. Arroyo is built to run in modern, elastic cloud environments, from simple container runtimes like Fargate to large, distributed deployments on Kubernetes.
-
19
Infobright DB
IgniteTech
Infobright DB is a high-performance enterprise database leveraging a columnar storage engine to enable business analysts to dissect data efficiently and obtain reports more quickly. Infobright DB can be deployed on-premise or in the cloud. Store and analyze big data for interactive business intelligence and complex queries. Improve query performance, reduce storage cost, and increase overall efficiency in business analytics and reporting. Easily store up to several hundred TB of data, a volume traditionally not achievable with conventional databases. Run big data applications and eliminate indexing and partitioning, with zero administrative overhead. With the volumes of machine data exploding, IgniteTech’s Infobright DB is specifically designed to achieve high performance for large volumes of machine-generated data. Manage complex ad hoc analytic environments without the database administration required by other products.
-
20
Kinetica
Kinetica
A scalable cloud database for real-time analysis on large and streaming datasets. Kinetica is designed to harness modern vectorized processors to be orders of magnitude faster and more efficient for real-time spatial and temporal workloads. Track and gain intelligence from billions of moving objects in real-time. Vectorization unlocks new levels of performance for analytics on spatial and time series data at scale. Ingest and query at the same time to act on real-time events. Kinetica's lockless architecture and distributed ingestion ensures data is available to query as soon as it lands. Vectorized processing enables you to do more with less. More power allows for simpler data structures, which lead to lower storage costs, more flexibility and less time engineering your data. Vectorized processing opens the door to amazingly fast analytics and detailed visualization of moving objects at scale. -
21
Aerospike
Aerospike
Aerospike is the global leader in next-generation, real-time NoSQL data solutions for any scale. Aerospike enterprises overcome seemingly impossible data bottlenecks to compete and win with a fraction of the infrastructure complexity and cost of legacy NoSQL databases. Aerospike’s patented Hybrid Memory Architecture™ delivers an unbreakable competitive advantage by unlocking the full potential of modern hardware, delivering previously unimaginable value from vast amounts of data at the edge, to the core and in the cloud. Aerospike empowers customers to instantly fight fraud; dramatically increase shopping cart size; deploy global digital payment networks; and deliver instant, one-to-one personalization for millions of customers. Aerospike customers include Airtel, Banca d’Italia, Nielsen, PayPal, Snap, Verizon Media and Wayfair. The company is headquartered in Mountain View, Calif., with additional locations in London; Bengaluru, India; and Tel Aviv, Israel. -
22
Hydra
Hydra
Hydra is an open source, column-oriented Postgres. Query billions of rows instantly, no code changes. Hydra parallelizes and vectorizes aggregates (COUNT, SUM, AVG) to deliver the speed you’ve always wanted on Postgres. Boost performance at every size! Set up Hydra in 5 minutes without changing your syntax, tools, data model, or extensions. Use Hydra Cloud for fully managed operations and smooth sailing. Different industries have different needs. Get better analytics with powerful Postgres extensions, custom functions, and take control. Built by you, for you. Hydra is the fastest Postgres in the market for analytics. Boost performance with columnar storage, vectorization, and query parallelization. -
23
Presto
Presto Foundation
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. For data engineers who struggle with managing multiple query languages and interfaces to siloed databases and storage, Presto is the fast and reliable engine that provides one simple ANSI SQL interface for all your data analytics and your open lakehouse. Different engines for different workloads means you will have to re-platform down the road. With Presto, you get one familiar ANSI SQL language and one engine for your data analytics so you don't need to graduate to another lakehouse engine. Presto can be used for interactive and batch workloads, small and large amounts of data, and scales from a few to thousands of users. Presto gives you one simple ANSI SQL interface for all of your data in various siloed data systems, helping you join your data ecosystem together.
-
24
QuestDB
QuestDB
QuestDB is a relational column-oriented database designed for time series and event data. It uses SQL with extensions for time series to assist with real-time analytics. These pages cover core concepts of QuestDB, including setup steps, usage guides, and reference documentation for syntax, APIs and configuration. This section describes the architecture of QuestDB, how it stores and queries data, and introduces features and capabilities unique to the system. Designated timestamp is a core feature that enables time-oriented language capabilities and partitioning. Symbol type makes storing and retrieving repetitive strings efficient. Storage model describes how QuestDB stores records and partitions within tables. Indexes can be used for faster read access on specific columns. Partitions can be used for significant performance benefits on calculations and queries. SQL extensions allow performant time series analysis with a concise syntax. -
25
Tiger Data
Tiger Data
Tiger Data is the creator of TimescaleDB, the world’s leading PostgreSQL-based time-series and analytics database. It provides a modern data platform purpose-built for developers, devices, and AI agents. Designed to extend PostgreSQL beyond traditional limits, Tiger Data offers built-in primitives for time-series data, search, materialization, and scale. With features like auto-partitioning, hybrid storage, and compression, it helps teams query billions of rows in milliseconds while cutting infrastructure costs. Tiger Cloud delivers these capabilities as a fully managed, elastic environment with enterprise-grade security and compliance. Trusted by innovators like Cloudflare, Toyota, Polymarket, and Hugging Face, Tiger Data powers real-time analytics, observability, and intelligent automation across industries.
Starting Price: $30 per month
-
26
CelerData Cloud
CelerData
CelerData is a high-performance SQL engine built to power analytics directly on data lakehouses, eliminating the need for traditional data‐warehouse ingestion pipelines. It delivers sub-second query performance at scale, supports on-the‐fly JOINs without costly denormalization, and simplifies architecture by allowing users to run demanding workloads on open format tables. Built on the open source engine StarRocks, the platform outperforms legacy query engines like Trino, ClickHouse, and Apache Druid in latency, concurrency, and cost-efficiency. With a cloud-managed service that runs in your own VPC, you retain infrastructure control and data ownership while CelerData handles maintenance and optimization. The platform is positioned to power real-time OLAP, business intelligence, and customer-facing analytics use cases and is trusted by enterprise customers (including names such as Pinterest, Coinbase, and Fanatics) who have achieved significant latency reductions and cost savings. -
27
DoubleCloud
DoubleCloud
Save time & costs by streamlining data pipelines with zero-maintenance open source solutions. From ingestion to visualization, all are integrated, fully managed, and highly reliable, so your engineers will love working with data. You choose whether to use any of DoubleCloud’s managed open source services or leverage the full power of the platform, including data storage, orchestration, ELT, and real-time visualization. We provide leading open source services like ClickHouse, Kafka, and Airflow, with deployment on Amazon Web Services or Google Cloud. Our no-code ELT tool allows real-time data syncing between systems, fast, serverless, and seamlessly integrated with your existing infrastructure. With our managed open-source data visualization you can simply visualize your data in real time by building charts and dashboards. We’ve designed our platform to make the day-to-day life of engineers more convenient.
Starting Price: $0.024 per 1 GB per month
-
28
DeltaStream
DeltaStream
DeltaStream is a unified serverless stream processing platform that integrates with streaming storage services. Think of it as the compute layer on top of your streaming storage. It provides the functionality of streaming analytics (stream processing) and streaming databases, along with additional features, to form a complete platform to manage, process, secure, and share streaming data. DeltaStream provides a SQL-based interface where you can easily create stream processing applications such as streaming pipelines, materialized views, microservices, and many more. It has a pluggable processing engine and currently uses Apache Flink as its primary stream processing engine. DeltaStream is more than just a query processing layer on top of Kafka or Kinesis. It brings relational database concepts to the data streaming world, including namespacing and role-based access control, enabling you to securely access, process, and share your streaming data regardless of where it is stored.
-
29
Google Cloud Datastream
Google
Serverless and easy-to-use change data capture and replication service. Access to streaming data from MySQL, PostgreSQL, AlloyDB, SQL Server, and Oracle databases. Near real-time analytics in BigQuery. Easy-to-use setup with built-in secure connectivity for faster time-to-value. A serverless platform that automatically scales, with no resources to provision or manage. Log-based mechanism to reduce the load and potential disruption on source databases. Synchronize data across heterogeneous databases, storage systems, and applications reliably, with low latency, while minimizing impact on source performance. Get up and running fast with a serverless and easy-to-use service that seamlessly scales up or down, and has no infrastructure to manage. Connect and integrate data across your organization with the best of Google Cloud services like BigQuery, Spanner, Dataflow, and Data Fusion. -
30
Trino
Trino
Trino is a query engine that runs at ludicrous speed. It is a fast distributed SQL query engine for big data analytics that helps you explore your data universe. Trino is a highly parallel and distributed query engine that is built from the ground up for efficient, low-latency analytics. The largest organizations in the world use Trino to query exabyte-scale data lakes and massive data warehouses alike. It supports diverse use cases: ad-hoc analytics at interactive speeds, massive multi-hour batch queries, and high-volume apps that perform sub-second queries. Trino is an ANSI SQL-compliant query engine that works with BI tools such as R, Tableau, Power BI, Superset, and many others. You can natively query data in Hadoop, S3, Cassandra, MySQL, and many others, without the need for complex, slow, and error-prone processes for copying the data. Access data from multiple systems within a single query.
Starting Price: Free
-
31
BigObject
BigObject
At the heart of our innovation is in-data computing, a technology designed to process large amounts of data efficiently. Our flagship product, BigObject, embodies this core technology; it’s a time series database developed with the goal of high-speed storage and handling of massive data. With our core technology of in-data computing, we launched BigObject, which can quickly and continuously handle non-stop and all aspects of data streams. BigObject is a time series database developed with the goal of high-speed storage and analysis of massive data. It boasts excellent performance and powerful complex query capabilities. Extending the relational data structure to a time-series model structure, it utilizes in-data computing to optimize the database’s performance. Our core technology is an abstract model in which all data is kept in an infinite and persistent memory space for both storage and computing. -
32
R2 SQL
Cloudflare
R2 SQL is Cloudflare’s serverless, distributed analytics query engine (currently in open beta) that enables you to run SQL queries over Apache Iceberg tables stored in R2 Data Catalog without needing to manage your own compute clusters. It is built to efficiently query large volumes of data by leveraging metadata pruning, partition-level statistics, file and row-group filtering, and Cloudflare’s globally distributed compute infrastructure to parallelize execution. The system works by integrating with R2 object storage and an Iceberg catalog layer, so you can ingest data via Cloudflare Pipelines into Iceberg tables, and then query that data with minimal overhead. Queries can be issued via the Wrangler CLI or HTTP API (with an API token granting permissions across R2 SQL, Data Catalog, and storage). During the open beta period, using R2 SQL itself is not billed; only storage and standard R2 operations incur charges.
Starting Price: Free
-
33
Citus
Citus Data
Citus gives you the Postgres you love, plus the superpower of distributed tables. 100% open source. Now with schema-based and row-based sharding, plus Postgres 16 support. Scale Postgres by distributing data & queries. You can start with a single Citus node, then add nodes & rebalance shards when you need to grow. Speed up queries by 20x to 300x (or more) through parallelism, keeping more data in memory, higher I/O bandwidth, and columnar compression. Citus is an extension (not a fork) to the latest Postgres versions, so you can use your familiar SQL toolset & leverage your Postgres expertise. Reduce your infrastructure headaches by using a single database for both your transactional and analytical workloads. Download and use Citus open source for free. You can manage Citus yourself, embrace open source, and help us improve Citus via GitHub. Focus on your application & forget about your database. Run your app on Citus in the cloud with Azure Cosmos DB for PostgreSQL.
Starting Price: $0.27 per hour
-
34
Decodable
Decodable
No more low-level code and stitching together complex systems. Build and deploy pipelines in minutes with SQL. A data engineering service that makes it easy for developers and data engineers to build and deploy real-time data pipelines for data-driven applications. Pre-built connectors for messaging systems, storage systems, and database engines make it easy to connect and discover available data. For each connection you make, you get a stream to or from the system. With Decodable you can build your pipelines with SQL. Pipelines use streams to send data to, or receive data from, your connections. You can also use streams to connect pipelines together to handle the most complex processing tasks. Observe your pipelines to ensure data keeps flowing. Create curated streams for other teams. Define retention policies on streams to avoid data loss during external system failures. Real-time health and performance metrics let you know everything’s working.
Starting Price: $0.20 per task per hour
-
35
Statsbot
Statsbot
Connect your database, let Statsbot generate data relationships from your database automatically, and get your first insights in a matter of minutes. Statsbot works with terabytes of raw data from hundreds of data sources. You don’t need to optimize or transform data beforehand. Define your transformations using SQL and JavaScript. No need to learn proprietary languages or navigate complex interfaces. Data storage is cheap, but querying data is expensive. Statsbot makes your queries cost effective using a pre-aggregations layer on top of your data. Statsbot is designed to work with raw data stored in modern data warehouses. It not only queries and transforms data on the fly, it also runs background jobs to optimize and pre-aggregate heavy calculations. By utilizing a transformation layer, Statsbot pre-aggregates heavy queries in the background. It makes even very complicated reports load blazingly fast.
Starting Price: $39 per month
-
36
Yellowbrick
Yellowbrick Data
Data Warehousing Without Limits. While legacy platforms like Netezza struggle to stay relevant, and cloud-only options like Snowflake suffer from a reliance on VMs running on commodity hardware, Yellowbrick shatters ceilings on price/performance and deployment flexibility across on-premises and cloud environments. Get 100X Performance: let thousands of users run ad hoc queries 10x-100x faster than any legacy or cloud-only data warehouse, on PBs of data. Plus, query real-time and at-rest data simultaneously. Deploy Anywhere: deploy applications on-premises, in multiple public clouds, or both, with the same data and performance everywhere (and no data egress charges). Save Millions: pay a fraction of what other options charge via fixed-cost subscriptions for budget certainty; the more queries you run, the lower the cost per query. -
37
Apache Iceberg
Apache Software Foundation
Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time. Iceberg supports flexible SQL commands to merge new data, update existing rows, and perform targeted deletes. Iceberg can eagerly rewrite data files for read performance, or it can use delete deltas for faster updates. Iceberg handles the tedious and error-prone task of producing partition values for rows in a table and skips unnecessary partitions and files automatically. No extra filters are needed for fast queries, and the table layout can be updated as data or queries change. Starting Price: Free -
38
Amazon Managed Service for Apache Flink
Amazon
Thousands of customers use Amazon Managed Service for Apache Flink to run stream processing applications. With Amazon Managed Service for Apache Flink, you can transform and analyze streaming data in real time using Apache Flink and integrate applications with other AWS services. There are no servers and clusters to manage, and there is no computing and storage infrastructure to set up. You pay only for the resources you use. Build and run Apache Flink applications without setting up infrastructure or managing resources and clusters. Process gigabytes of data per second with subsecond latencies and respond to events in real time. Deploy highly available and durable applications with Multi-AZ deployments and APIs for application lifecycle management. Develop applications that transform and deliver data to Amazon Simple Storage Service (Amazon S3), Amazon OpenSearch Service, and more. Starting Price: $0.11 per hour
-
39
Greenplum
Greenplum Database
Greenplum Database® is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte-scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer, delivering high analytical query performance on large data volumes. The Greenplum Database® project is released under the Apache 2 license. We want to thank all our current community contributors and are interested in all new potential contributions. For the Greenplum Database community, no contribution is too small; we encourage all types of contributions. An open-source massively parallel data platform for analytics, machine learning and AI. Rapidly create and deploy models for complex applications in cybersecurity, predictive maintenance, risk management, fraud detection, and many other areas. Experience the fully featured, integrated, open source analytics platform. -
40
Tinybird
Tinybird
Query and shape your data using Pipes, a new way to chain SQL queries inspired by Python Notebooks. Designed to reduce complexity without sacrificing performance. By splitting your query into different nodes, you simplify development and maintenance. Activate your production-ready API endpoints with one click. Transformations occur on the fly, so you will always work with the latest data. Share access securely to your data in one click and get fast and consistent results. Apart from providing monitoring tools, Tinybird scales linearly: don't worry about traffic spikes. Imagine if you could turn, in a matter of minutes, any Data Stream or CSV file into a fully secured real-time analytics API endpoint. We believe in high-frequency decision-making for all organizations in all industries including retail, manufacturing, telecommunications, government, advertising, entertainment, healthcare, and financial services. Starting Price: $0.07 per processed GB -
41
Tabular
Tabular
Tabular is an open table store from the creators of Apache Iceberg. Connect multiple computing engines and frameworks. Decrease query time and storage costs by up to 50%. Centralize enforcement of data access (RBAC) policies. Connect any query engine or framework, including Athena, BigQuery, Redshift, Snowflake, Databricks, Trino, Spark, and Python. Smart compaction, clustering, and other automated data services reduce storage costs and query times by up to 50%. Unify data access at the database or table. RBAC controls are simple to manage, consistently enforced, and easy to audit. Centralize your security down to the table. Tabular is easy to use plus it features high-powered ingestion, performance, and RBAC under the hood. Tabular gives you the flexibility to work with multiple “best of breed” compute engines based on their strengths. Assign privileges at the data warehouse database, table, or column level. Starting Price: $100 per month -
42
ClickHouse
ClickHouse
ClickHouse is a fast open-source OLAP database management system. It is column-oriented and allows analytical reports to be generated using SQL queries in real time. ClickHouse's performance exceeds that of comparable column-oriented database management systems currently available on the market. It processes hundreds of millions to more than a billion rows and tens of gigabytes of data per single server per second. ClickHouse uses all available hardware to its full potential to process each query as fast as possible. Peak processing performance for a single query stands at more than 2 terabytes per second (after decompression, only used columns). In a distributed setup, reads are automatically balanced among healthy replicas to avoid increasing latency. ClickHouse supports multi-master asynchronous replication and can be deployed across multiple datacenters. All nodes are equal, which avoids single points of failure. -
43
Apache Kylin
Apache Software Foundation
Apache Kylin™ is an open source, distributed Analytical Data Warehouse for Big Data; it was designed to provide OLAP (Online Analytical Processing) capability in the big data era. By renovating the multi-dimensional cube and precalculation technology on Hadoop and Spark, Kylin is able to achieve near constant query speed regardless of the ever-growing data volume. Reducing query latency from minutes to sub-second, Kylin brings online analytics back to big data. Kylin can analyze more than 10 billion rows in less than a second. No more waiting on reports for critical decisions. Kylin connects data on Hadoop to BI tools like Tableau, PowerBI/Excel, MSTR, QlikSense, Hue and SuperSet, making BI on Hadoop faster than ever. As an Analytical Data Warehouse, Kylin offers ANSI SQL on Hadoop/Spark and supports most ANSI SQL query functions. Kylin can support thousands of interactive queries at the same time, thanks to the low resource consumption of each query. -
44
Upsolver
Upsolver
Upsolver makes it incredibly simple to build a governed data lake and to manage, integrate and prepare streaming data for analysis. Define pipelines using only SQL on auto-generated schema-on-read. Easy visual IDE to accelerate building pipelines. Add Upserts and Deletes to data lake tables. Blend streaming and large-scale batch data. Automated schema evolution and reprocessing from previous state. Automatic orchestration of pipelines (no DAGs). Fully-managed execution at scale. Strong consistency guarantee over object storage. Near-zero maintenance overhead for analytics-ready data. Built-in hygiene for data lake tables including columnar formats, partitioning, compaction and vacuuming. 100,000 events per second (billions daily) at low cost. Continuous lock-free compaction to avoid “small files” problem. Parquet-based tables for fast queries. -
45
Apache Impala
Apache
Impala provides low latency and high concurrency for BI/analytic queries on the Hadoop ecosystem, including Iceberg, open data formats, and most cloud storage options. Impala also scales linearly, even in multitenant environments. Impala is integrated with native Hadoop security and Kerberos for authentication, and via the Ranger module, you can ensure that the right users and applications are authorized for the right data. Utilize the same file and data formats and metadata, security, and resource management frameworks as your Hadoop deployment, with no redundant infrastructure or data conversion/duplication. For Apache Hive users, Impala utilizes the same metadata and ODBC driver. Like Hive, Impala supports SQL, so you don't have to worry about reinventing the implementation wheel. With Impala, more users, whether using SQL queries or BI applications, can interact with more data through a single repository and metadata stored from source through analysis. Starting Price: Free -
46
Hitachi Streaming Data Platform
Hitachi
The Hitachi Streaming Data Platform (SDP) is a real-time data processing system designed to analyze large volumes of time-sequenced data as it is generated. By leveraging in-memory and incremental computational processing, SDP enables swift analysis without the delays associated with traditional stored data processing. Users can define summary analysis scenarios using Continuous Query Language (CQL), similar to SQL, allowing for flexible and programmable data analysis without the need for custom applications. The platform's architecture comprises components such as development servers, data-transfer servers, data-analysis servers, and dashboard servers, facilitating scalable and efficient data processing workflows. SDP's modular design supports various data input and output formats, including text files and HTTP packets, and integrates with visualization tools like RTView for real-time monitoring. -
47
BigLake
Google
BigLake is a storage engine that unifies data warehouses and lakes by enabling BigQuery and open-source frameworks like Spark to access data with fine-grained access control. BigLake provides accelerated query performance across multi-cloud storage and open formats such as Apache Iceberg. Store a single copy of data with uniform features across data warehouses & lakes. Fine-grained access control and multi-cloud governance over distributed data. Seamless integration with open-source analytics tools and open data formats. Unlock analytics on distributed data regardless of where and how it’s stored, while choosing the best analytics tools, open source or cloud-native, over a single copy of data. Fine-grained access control across open source engines like Apache Spark, Presto, and Trino, and open formats such as Parquet. Performant queries over data lakes powered by BigQuery. Integrates with Dataplex to provide management at scale, including logical data organization. Starting Price: $5 per TB -
48
IBM Cloud SQL Query
IBM
Serverless, interactive querying for analyzing data in IBM Cloud Object Storage. Query your data directly where it is stored; there's no ETL, no databases, and no infrastructure to manage. IBM Cloud SQL Query uses Apache Spark, an open-source, fast, extensible, in-memory data processing engine optimized for low latency and ad hoc analysis of data. No ETL or schema definition needed to enable SQL queries. Analyze data where it sits in IBM Cloud Object Storage using our query editor and REST API. Run as many queries as you need; with pay-per-query pricing, you pay only for the data scanned. Compress or partition data to drive savings and performance. IBM Cloud SQL Query is highly available and executes queries using compute resources across multiple facilities. IBM Cloud SQL Query supports a variety of data formats such as CSV, JSON and Parquet, and allows for standard ANSI SQL. Starting Price: $5.00/Terabyte-Month
-
49
OpenText Analytics Database
OpenText
OpenText Analytics Database is a high-performance, scalable analytics platform that enables organizations to analyze massive data sets quickly and cost-effectively. It supports real-time analytics and in-database machine learning to deliver actionable business insights. The platform can be deployed flexibly across hybrid, multi-cloud, and on-premises environments to optimize infrastructure and reduce total cost of ownership. Its massively parallel processing (MPP) architecture handles complex queries efficiently, regardless of data size. OpenText Analytics Database also features compatibility with data lakehouse architectures, supporting formats like Parquet and ORC. With built-in machine learning and broad language support, it empowers users from SQL experts to Python developers to derive predictive insights.
-
50
Nebula Graph
vesoft
The graph database built for super large-scale graphs with milliseconds of latency. We are continuing to collaborate with the community to prepare, popularize and promote the graph database. Nebula Graph only allows authenticated access via role-based access control. Nebula Graph supports multiple storage engine types, and the query language can be extended to support new algorithms. Nebula Graph provides low-latency reads and writes, while still maintaining high throughput to simplify the most complex data sets. With a shared-nothing distributed architecture, Nebula Graph offers linear scalability. Nebula Graph's SQL-like query language is easy to understand and powerful enough to meet complex business needs. With horizontal scalability and a snapshot feature, Nebula Graph guarantees high availability even in case of failures. Large Internet companies like JD, Meituan, and Xiaohongshu have deployed Nebula Graph in production environments.