Alternatives to Apache Kylin
Compare Apache Kylin alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Apache Kylin in 2026. Compare features, ratings, user reviews, pricing, and more from Apache Kylin competitors and alternatives in order to make an informed decision for your business.
-
1
Teradata VantageCloud
Teradata
Teradata VantageCloud: The complete cloud analytics and data platform for AI. Teradata VantageCloud is an enterprise-grade, cloud-native data and analytics platform that unifies data management, advanced analytics, and AI/ML capabilities in a single environment. Designed for scalability and flexibility, VantageCloud supports multi-cloud and hybrid deployments, enabling organizations to manage structured and semi-structured data across AWS, Azure, Google Cloud, and on-premises systems. It offers full ANSI SQL support, integrates with open-source tools like Python and R, and provides built-in governance for secure, trusted AI. VantageCloud empowers users to run complex queries, build data pipelines, and operationalize machine learning models—all while maintaining interoperability with modern data ecosystems. -
2
Google Cloud BigQuery
Google
BigQuery is a serverless, multicloud data warehouse that simplifies the process of working with all types of data so you can focus on getting valuable business insights quickly. At the core of Google’s data cloud, BigQuery allows you to simplify data integration, cost effectively and securely scale analytics, share rich data experiences with built-in business intelligence, and train and deploy ML models with a simple SQL interface, helping to make your organization’s operations more data-driven. Gemini in BigQuery offers AI-driven tools for assistance and collaboration, such as code suggestions, visual data preparation, and smart recommendations designed to boost efficiency and reduce costs. BigQuery delivers an integrated platform featuring SQL, a notebook, and a natural language-based canvas interface, catering to data professionals with varying coding expertise. This unified workspace streamlines the entire analytics process. -
3
StarTree
StarTree
StarTree, powered by Apache Pinot™, is a fully managed real-time analytics platform built for customer-facing applications that demand instant insights on the freshest data. Unlike traditional data warehouses or OLTP databases, which are optimized for back-office reporting or transactions, StarTree is engineered for real-time OLAP at true scale across three dimensions. Data volume: query performance sustained at petabyte scale. Ingest rates: millions of events per second, continuously indexed for freshness. Concurrency: thousands to millions of simultaneous users served with sub-second latency. With StarTree, businesses deliver always-fresh insights at interactive speed, enabling applications that personalize, monitor, and act in real time. Starting Price: Free -
4
Amazon Redshift
Amazon
More customers pick Amazon Redshift than any other cloud data warehouse. Redshift powers analytical workloads for Fortune 500 companies, startups, and everything in between. Companies like Lyft have grown with Redshift from startups to multi-billion dollar enterprises. No other data warehouse makes it as easy to gain new insights from all your data. With Redshift you can query petabytes of structured and semi-structured data across your data warehouse, operational database, and your data lake using standard SQL. Redshift lets you easily save the results of your queries back to your S3 data lake using open formats like Apache Parquet to further analyze from other analytics services like Amazon EMR, Amazon Athena, and Amazon SageMaker. Redshift is the world’s fastest cloud data warehouse and gets faster every year. For performance intensive workloads you can use the new RA3 instances to get up to 3x the performance of any cloud data warehouse. Starting Price: $0.25 per hour -
5
Apache Pinot
Apache Software Foundation
Pinot is designed to answer OLAP queries with low latency on immutable data. Pluggable indexing technologies include the sorted index, bitmap index, and inverted index. Joins are currently not supported, but this limitation can be worked around by querying through Trino or PrestoDB. An SQL-like language supports selection, aggregation, filtering, group by, order by, and distinct queries on data. A table can consist of both an offline and a real-time table; use the real-time table only to cover segments for which offline data may not be available yet. Detect the right anomalies by customizing the anomaly detection flow and the notification flow. -
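The offline/real-time split described for Pinot can be pictured with a small sketch. This is a toy illustration only, not Pinot's implementation or API: the function name `query_hybrid`, the `ts` field, and the choice of the latest offline timestamp as the boundary are all assumptions made for the example.

```python
def query_hybrid(offline_rows, realtime_rows):
    """Merge two row sources so each point in time is served by exactly one.

    Assumption (hypothetical, for illustration): the boundary is the latest
    timestamp present in the offline data. Offline rows answer everything up
    to the boundary; real-time rows only cover what comes after it.
    """
    if not offline_rows:
        return list(realtime_rows)
    time_boundary = max(row["ts"] for row in offline_rows)
    from_offline = [row for row in offline_rows if row["ts"] <= time_boundary]
    from_realtime = [row for row in realtime_rows if row["ts"] > time_boundary]
    return from_offline + from_realtime


# Usage: ts=2 exists in both sources, but is counted only once (from offline).
offline = [{"ts": 1, "clicks": 10}, {"ts": 2, "clicks": 20}]
realtime = [{"ts": 2, "clicks": 20}, {"ts": 3, "clicks": 30}]
merged = query_hybrid(offline, realtime)
total_clicks = sum(row["clicks"] for row in merged)
```

The point of the boundary is that overlapping time ranges are not double-counted when both sources hold the same events.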
6
SSAS
Microsoft
Installed as an on-premises server instance, SQL Server Analysis Services supports tabular models at all compatibility levels (depending on version), multidimensional models, data mining, and Power Pivot for SharePoint. A typical implementation workflow includes installing a SQL Server Analysis Services instance, creating a tabular or multidimensional data model, deploying the model as a database to a server instance, processing the database to load it with data, and then assigning permissions to allow data access. When ready to go, the data model can be accessed by any client application supporting Analysis Services as a data source. Models are populated with data from external data systems, usually data warehouses hosted on a SQL Server or Oracle relational database engine (Tabular models support additional data source types). -
7
Trino
Trino
Trino is a query engine that runs at ludicrous speed. It is a fast, distributed SQL query engine for big data analytics that helps you explore your data universe. Trino is a highly parallel and distributed query engine that is built from the ground up for efficient, low-latency analytics. The largest organizations in the world use Trino to query exabyte-scale data lakes and massive data warehouses alike. It supports diverse use cases: ad-hoc analytics at interactive speeds, massive multi-hour batch queries, and high-volume apps that perform sub-second queries. Trino is an ANSI SQL-compliant query engine that works with BI tools such as R, Tableau, Power BI, Superset, and many others. You can natively query data in Hadoop, S3, Cassandra, MySQL, and many others, without the need for complex, slow, and error-prone processes for copying the data. Access data from multiple systems within a single query. Starting Price: Free -
8
SelectDB
SelectDB
SelectDB is a modern data warehouse based on Apache Doris that supports fast query analysis on large-scale real-time data. From ClickHouse to Apache Doris: Kuaishou upgraded from a separated lake and warehouse architecture to a unified lakehouse. The Kuaishou OLAP system handles nearly 1 billion query requests every day, providing data services for multiple scenarios. Because the original separated architecture suffered from storage redundancy, resource contention, complicated governance, and queries that were hard to tune, the team introduced an Apache Doris lakehouse, combined with Doris's materialized view rewriting and automated services, to achieve high-performance queries and flexible data governance. Write real-time data within seconds, and synchronize streaming data from databases and data streams. The storage engine supports real-time update, real-time append, and real-time pre-aggregation. Starting Price: $0.22 per hour -
9
IBM Db2 Big SQL
IBM
A hybrid SQL-on-Hadoop engine delivering advanced, security-rich data query across enterprise big data sources, including Hadoop, object storage and data warehouses. IBM Db2 Big SQL is an enterprise-grade, hybrid ANSI-compliant SQL-on-Hadoop engine, delivering massively parallel processing (MPP) and advanced data query. Db2 Big SQL offers a single database connection or query for disparate sources such as Hadoop HDFS and WebHDFS, RDBMS, NoSQL databases, and object stores. Benefit from low latency, high performance, data security, SQL compatibility, and federation capabilities to run ad hoc and complex queries. Db2 Big SQL is now available in 2 variations. It can be integrated with Cloudera Data Platform, or accessed as a cloud-native service on the IBM Cloud Pak® for Data platform. Access and analyze data and perform queries on batch and real-time data across sources, like Hadoop, object stores and data warehouses. -
10
QuerySurge
RTTS
QuerySurge leverages AI to automate the data validation and ETL testing of Big Data, Data Warehouses, Business Intelligence Reports and Enterprise Apps/ERPs with full DevOps functionality for continuous testing. Use cases: Data Warehouse & ETL Testing; Hadoop & NoSQL Testing; DevOps for Data / Continuous Testing; Data Migration Testing; BI Report Testing; Enterprise App/ERP Testing. QuerySurge features: Projects (multi-project support); AI (automatically create data validation tests based on data mappings); Smart Query Wizards (create tests visually, without writing SQL); Data Quality at Speed (automate the launch, execution, and comparison, and see results quickly); Test across 200+ platforms (Data Warehouses, Hadoop & NoSQL lakes, databases, flat files, XML, JSON, BI Reports); DevOps for Data & Continuous Testing (RESTful API with 60+ calls and integration with all mainstream solutions); Data Analytics & Data Intelligence (analytics dashboard and reports). -
11
Apache Phoenix
Apache Software Foundation
Apache Phoenix enables OLTP and operational analytics in Hadoop for low-latency applications by combining the best of both worlds: the power of standard SQL and JDBC APIs with full ACID transaction capabilities, and the flexibility of late-bound, schema-on-read capabilities from the NoSQL world by leveraging HBase as its backing store. Apache Phoenix is fully integrated with other Hadoop products such as Spark, Hive, Pig, Flume, and MapReduce. Become the trusted data platform for OLTP and operational analytics for Hadoop through well-defined, industry-standard APIs. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. Starting Price: Free -
12
Apache Trafodion
Apache Software Foundation
Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop. Full-functioned ANSI SQL language support. JDBC/ODBC connectivity for Linux/Windows clients. Distributed ACID transaction protection across multiple statements, tables, and rows. Performance improvements for OLTP workloads with compile-time and run-time optimizations. Support for large data sets using a parallel-aware query optimizer. Reuse existing SQL skills and improve developer productivity. Distributed ACID transactions guarantee data consistency across multiple rows and tables. Interoperability with existing tools and applications. Hadoop and Linux distribution neutral. Easy to add to your existing Hadoop infrastructure. Starting Price: Free -
13
Apache Spark
Apache Software Foundation
Apache Spark™ is a unified analytics engine for large-scale data processing. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python, R, and SQL shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application. Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources. You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources. -
14
Apache Doris
The Apache Software Foundation
Apache Doris is a modern data warehouse for real-time analytics. It delivers lightning-fast analytics on real-time data at scale. Push-based micro-batch and pull-based streaming data ingestion within a second. Storage engine with real-time upsert, append and pre-aggregation. Optimized for high-concurrency and high-throughput queries with a columnar storage engine, MPP architecture, cost-based query optimizer, and vectorized execution engine. Federated querying of data lakes such as Hive, Iceberg and Hudi, and databases such as MySQL and PostgreSQL. Compound data types such as Array, Map and JSON. Variant data type to support auto data type inference of JSON data. NGram bloomfilter and inverted index for text searches. Distributed design for linear scalability. Workload isolation and tiered storage for efficient resource management. Supports shared-nothing clusters as well as separation of storage and compute. Starting Price: Free -
15
Apache Druid
Druid
Apache Druid is an open source distributed data store. Druid’s core design combines ideas from data warehouses, timeseries databases, and search systems to create a high performance real-time analytics database for a broad range of use cases. Druid merges key characteristics of each of the 3 systems into its ingestion layer, storage format, querying layer, and core architecture. Druid stores and compresses each column individually, and only needs to read the ones needed for a particular query, which supports fast scans, rankings, and groupBys. Druid creates inverted indexes for string values for fast search and filter. Out-of-the-box connectors for Apache Kafka, HDFS, AWS S3, stream processors, and more. Druid intelligently partitions data based on time and time-based queries are significantly faster than traditional databases. Scale up or down by just adding or removing servers, and Druid automatically rebalances. Fault-tolerant architecture routes around server failures. -
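The two storage ideas named in the Druid description, per-column storage and inverted indexes on string values, can be sketched in a few lines. This is a toy model under stated assumptions, not Druid's storage format or API: the class `ColumnStore`, its methods, and the sample fields are hypothetical names chosen for the example.

```python
from collections import defaultdict


class ColumnStore:
    """Toy column store: one list per column, plus an inverted index
    (value -> set of row ids) for every string-valued column."""

    def __init__(self, rows):
        self.columns = defaultdict(list)
        self.inverted = defaultdict(lambda: defaultdict(set))
        for row_id, row in enumerate(rows):
            for name, value in row.items():
                self.columns[name].append(value)
                if isinstance(value, str):
                    self.inverted[name][value].add(row_id)

    def matching_rows(self, column, value):
        # Index lookup: no scan over the filtered column is needed.
        return sorted(self.inverted[column].get(value, set()))

    def sum_where(self, metric, column, value):
        # Only the metric column is read; other columns are never touched.
        metric_values = self.columns[metric]
        return sum(metric_values[i] for i in self.matching_rows(column, value))


# Usage with made-up event rows.
store = ColumnStore([
    {"country": "US", "page": "home", "clicks": 3},
    {"country": "DE", "page": "home", "clicks": 5},
    {"country": "US", "page": "docs", "clicks": 7},
])
us_clicks = store.sum_where("clicks", "country", "US")
```

A filter plus aggregate here touches one index entry and one metric column, which is the intuition behind the fast scans and filters the description mentions.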
16
GeoSpock
GeoSpock
GeoSpock enables data fusion for the connected world with GeoSpock DB – the space-time analytics database. GeoSpock DB is a unique, cloud-native database optimised for querying for real-world use cases, able to fuse multiple sources of Internet of Things (IoT) data together to unlock its full value, whilst simultaneously reducing complexity and cost. GeoSpock DB enables efficient storage, data fusion, and rapid programmatic access to data, and allows you to run ANSI SQL queries and connect to analytics tools via JDBC/ODBC connectors. Users are able to perform analysis and share insights using familiar toolsets, with support for common BI tools (such as Tableau™, Amazon QuickSight™, and Microsoft Power BI™), and Data Science and Machine Learning environments (including Python Notebooks and Apache Spark). The database can also be integrated with internal applications and web services – with compatibility for open-source and visualisation libraries such as Kepler and Cesium.js. -
17
BigLake
Google
BigLake is a storage engine that unifies data warehouses and lakes by enabling BigQuery and open-source frameworks like Spark to access data with fine-grained access control. BigLake provides accelerated query performance across multi-cloud storage and open formats such as Apache Iceberg. Store a single copy of data with uniform features across data warehouses & lakes. Fine-grained access control and multi-cloud governance over distributed data. Seamless integration with open-source analytics tools and open data formats. Unlock analytics on distributed data regardless of where and how it’s stored, while choosing the best analytics tools, open source or cloud-native over a single copy of data. Fine-grained access control across open source engines like Apache Spark, Presto, and Trino, and open formats such as Parquet. Performant queries over data lakes powered by BigQuery. Integrates with Dataplex to provide management at scale, including logical data organization. Starting Price: $5 per TB -
18
Azure Synapse Analytics
Microsoft
Azure Synapse is Azure SQL Data Warehouse evolved. Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless or provisioned resources—at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs. -
19
Apache Impala
Apache
Impala provides low latency and high concurrency for BI/analytic queries on the Hadoop ecosystem, including Iceberg, open data formats, and most cloud storage options. Impala also scales linearly, even in multitenant environments. Impala is integrated with native Hadoop security and Kerberos for authentication, and via the Ranger module, you can ensure that the right users and applications are authorized for the right data. Utilize the same file and data formats and metadata, security, and resource management frameworks as your Hadoop deployment, with no redundant infrastructure or data conversion/duplication. For Apache Hive users, Impala utilizes the same metadata and ODBC driver. Like Hive, Impala supports SQL, so you don't have to worry about reinventing the implementation wheel. With Impala, more users, whether using SQL queries or BI applications, can interact with more data through a single repository and metadata stored from source through analysis. Starting Price: Free -
20
CelerData Cloud
CelerData
CelerData is a high-performance SQL engine built to power analytics directly on data lakehouses, eliminating the need for traditional data‐warehouse ingestion pipelines. It delivers sub-second query performance at scale, supports on-the‐fly JOINs without costly denormalization, and simplifies architecture by allowing users to run demanding workloads on open format tables. Built on the open source engine StarRocks, the platform outperforms legacy query engines like Trino, ClickHouse, and Apache Druid in latency, concurrency, and cost-efficiency. With a cloud-managed service that runs in your own VPC, you retain infrastructure control and data ownership while CelerData handles maintenance and optimization. The platform is positioned to power real-time OLAP, business intelligence, and customer-facing analytics use cases and is trusted by enterprise customers (including names such as Pinterest, Coinbase, and Fanatics) who have achieved significant latency reductions and cost savings. -
21
E-MapReduce
Alibaba
EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm. Alibaba Cloud Elastic MapReduce (EMR) is a big data processing solution that runs on the Alibaba Cloud platform. EMR is built on Alibaba Cloud ECS instances and is based on open-source Apache Hadoop and Apache Spark. EMR allows you to use the Hadoop and Spark ecosystem components, such as Apache Hive, Apache Kafka, Flink, Druid, and TensorFlow, to analyze and process data. You can use EMR to process data stored on different Alibaba Cloud data storage services, such as Object Storage Service (OSS), Log Service (SLS), and Relational Database Service (RDS). You can quickly create clusters without the need to configure hardware and software. All maintenance operations are completed on its web interface. -
22
Firebolt
Firebolt Analytics
Firebolt delivers extreme speed and elasticity at any scale solving your impossible data challenges. Firebolt has completely redesigned the cloud data warehouse to deliver a super fast, incredibly efficient analytics experience at any scale. An order-of-magnitude leap in performance means you can analyze much more data at higher granularity with lightning fast queries. Easily scale up or down to support any workload, amount of data and concurrent users. At Firebolt we believe that data warehouses should be much easier to use than what we’re used to. That's why we focus on turning everything that used to be complicated and labor intensive into simple tasks. Cloud data warehouse providers profit from the cloud resources you consume. We don’t! Finally, a pricing model that is fair, transparent, and allows you to scale without breaking the bank. -
23
Presto
Presto Foundation
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. For data engineers who struggle with managing multiple query languages and interfaces to siloed databases and storage, Presto is the fast and reliable engine that provides one simple ANSI SQL interface for all your data analytics and your open lakehouse. Different engines for different workloads mean you will have to re-platform down the road. With Presto, you get 1 familiar ANSI SQL language and 1 engine for your data analytics so you don't need to graduate to another lakehouse engine. Presto can be used for interactive and batch workloads, small and large amounts of data, and scales from a few to thousands of users. Presto gives you one simple ANSI SQL interface for all of your data in various siloed data systems, helping you join your data ecosystem together. -
24
Oracle Big Data SQL Cloud Service enables organizations to immediately analyze data across Apache Hadoop, NoSQL and Oracle Database leveraging their existing SQL skills, security policies and applications with extreme performance. From simplifying data science efforts to unlocking data lakes, Big Data SQL makes the benefits of Big Data available to the largest group of end users possible. Big Data SQL gives users a single location to catalog and secure data in Hadoop and NoSQL systems, Oracle Database. Seamless metadata integration and queries which join data from Oracle Database with data from Hadoop and NoSQL databases. Utilities and conversion routines support automatic mappings from metadata stored in HCatalog (or the Hive Metastore) to Oracle Tables. Enhanced access parameters give administrators the flexibility to control column mapping and data access behavior. Multiple cluster support enables one Oracle Database to query multiple Hadoop clusters and/or NoSQL systems.
-
25
ZetaAnalytics
Halliburton
The ZetaAnalytics product requires a compatible database appliance for its Data Warehouse. Landmark has qualified the ZetaAnalytics software using Teradata, EMC Greenplum, and IBM Netezza. Please see the ZetaAnalytics Release Notes for the most up to date qualified versions. Before installing and configuring ZetaAnalytics software, ensure that the Data Warehouse you use for drilling data is created and running. Scripts to create the various Zeta-specific database components within the Data Warehouse will need to be run as part of the installation process. These require database administrator (DBA) rights. The ZetaAnalytics product requires Apache Hadoop for model scoring and real-time streaming. If you do not already have an Apache Hadoop cluster installed in your environment, please install it before running the ZetaAnalytics installer, which will prompt you for the name and port number of your Hadoop Name Server and Map Reducer. -
26
Databend
Databend
Databend is a modern, cloud-native data warehouse built to deliver high-performance, cost-efficient analytics for large-scale data processing. It is designed with an elastic architecture that scales dynamically to meet the demands of different workloads, ensuring efficient resource utilization and lower operational costs. Written in Rust, Databend offers exceptional performance through features like vectorized query execution and columnar storage, which optimize data retrieval and processing speeds. Its cloud-first design enables seamless integration with cloud platforms, and it emphasizes reliability, data consistency, and fault tolerance. Databend is an open source solution, making it a flexible and accessible choice for data teams looking to handle big data analytics in the cloud. Starting Price: Free -
27
Oracle Big Data Service
Oracle
Oracle Big Data Service makes it easy for customers to deploy Hadoop clusters of all sizes, with VM shapes ranging from 1 OCPU to a dedicated bare metal environment. Customers choose between high-performance NVMe storage or cost-effective block storage, and can grow or shrink their clusters. Quickly create Hadoop-based data lakes to extend or complement customer data warehouses, and ensure that all data is both accessible and managed cost-effectively. Query, visualize and transform data so data scientists can build machine learning models using the included notebook with its R, Python and SQL support. Move customer-managed Hadoop clusters to a fully-managed cloud-based service, reducing management costs and improving resource utilization. Starting Price: $0.1344 per hour -
28
Azure HDInsight
Microsoft
Run popular open-source frameworks—including Apache Hadoop, Spark, Hive, Kafka, and more—using Azure HDInsight, a customizable, enterprise-grade service for open-source analytics. Effortlessly process massive amounts of data and get all the benefits of the broad open-source project ecosystem with the global scale of Azure. Easily migrate your big data workloads and processing to the cloud. Open-source projects and clusters are easy to spin up quickly without the need to install hardware or manage infrastructure. Big data clusters reduce costs through autoscaling and pricing tiers that allow you to pay for only what you use. Enterprise-grade security and industry-leading compliance with more than 30 certifications helps protect your data. Optimized components for open-source technologies such as Hadoop and Spark keep you up to date. -
29
EspressReport ES
Quadbase Systems
EspressReport ES (Enterprise Server) is web- and desktop-based software that allows users to develop interactive data visualizations and reports. The platform offers full Java EE integration, data access to Big Data sources (Hadoop, Spark, and MongoDB), ad-hoc queries and reports, online map support, mobile compatibility, an alert monitor, and many other features. -
30
Apache Mahout
Apache Software Foundation
Apache Mahout is a powerful, scalable, and versatile machine learning library designed for distributed data processing. It offers a comprehensive set of algorithms for various tasks, including classification, clustering, recommendation, and pattern mining. Built on top of the Apache Hadoop ecosystem, Mahout leverages MapReduce and Spark to enable data processing on large-scale datasets. Apache Mahout(TM) is a distributed linear algebra framework and mathematically expressive Scala DSL designed to let mathematicians, statisticians, and data scientists quickly implement their own algorithms. Apache Spark is the recommended out-of-the-box distributed back-end or can be extended to other distributed backends. Matrix computations are a fundamental part of many scientific and engineering applications, including machine learning, computer vision, and data analysis. Apache Mahout is designed to handle large-scale data processing by leveraging the power of Hadoop and Spark. -
31
Apache Hive
Apache Software Foundation
The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive. Apache Hive is an open source project run by volunteers at the Apache Software Foundation. Previously it was a subproject of Apache® Hadoop®, but has now graduated to become a top-level project of its own. We encourage you to learn about the project and contribute your expertise. Traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data. Hive provides the necessary SQL abstraction to integrate SQL-like queries (HiveQL) into the underlying Java without the need to implement queries in the low-level Java API. -
32
IBM Analytics Engine provides an architecture for Hadoop clusters that decouples the compute and storage tiers. Instead of a permanent cluster formed of dual-purpose nodes, the Analytics Engine allows users to store data in an object storage layer such as IBM Cloud Object Storage and spins up clusters of compute nodes when needed. Separating compute from storage helps to transform the flexibility, scalability and maintainability of big data analytics platforms. Build on an ODPi-compliant stack with pioneering data science tools and the broader Apache Hadoop and Apache Spark ecosystem. Define clusters based on your application's requirements. Choose the appropriate software pack, version, and size of the cluster. Use as long as required and delete as soon as an application finishes jobs. Configure clusters with third-party analytics libraries and packages. Deploy workloads from IBM Cloud services like machine learning. Starting Price: $0.014 per hour
-
33
jethro
jethro
Data-driven decision-making has unleashed a surge of business data and a rise in user demand to analyze it. This trend drives IT departments to migrate off expensive Enterprise Data Warehouses (EDW) toward cost-effective Big Data platforms like Hadoop or AWS. These new platforms come with a Total Cost of Ownership (TCO) that is about 10 times lower. They are not ideal for interactive BI applications, however, as they fail to match the high performance and user concurrency of legacy EDWs. For this exact reason, we developed Jethro. Customers use Jethro for interactive BI on Big Data. Jethro is a transparent middle tier that requires no changes to existing apps or data. It is self-driving with no maintenance required. Jethro is compatible with BI tools like Tableau, Qlik, and MicroStrategy and is data source agnostic. Jethro delivers on the demands of business users, allowing thousands of concurrent users to run complicated queries over billions of records. -
34
HugeGraph
HugeGraph
HugeGraph is a fast and highly scalable graph database. Billions of vertices and edges can be easily stored in and queried from HugeGraph thanks to its excellent OLTP ability. Because it complies with the Apache TinkerPop 3 framework, various complicated graph queries can be expressed in Gremlin, a powerful graph traversal language. Its features include: compliance with Apache TinkerPop 3, supporting Gremlin; schema metadata management, including VertexLabel, EdgeLabel, PropertyKey and IndexLabel; multi-type indexes, supporting exact queries, range queries and complex combined-condition queries; a plug-in backend store driver framework that currently supports RocksDB, Cassandra, ScyllaDB, HBase and MySQL and makes it easy to add other backend store drivers if needed; and integration with Hadoop/Spark. HugeGraph relies on the TinkerPop framework and refers to the storage structure of Titan and the schema definition of DataStax. -
35
Imply
Imply
Imply is a real-time analytics platform built on Apache Druid, designed to handle large-scale, high-performance OLAP (Online Analytical Processing) workloads. It offers real-time data ingestion, fast query performance, and the ability to perform complex analytical queries on massive datasets with low latency. Imply is tailored for organizations that need interactive analytics, real-time dashboards, and data-driven decision-making at scale. It provides a user-friendly interface for data exploration, along with advanced features such as multi-tenancy, fine-grained access controls, and operational insights. With its distributed architecture and scalability, Imply is well-suited for use cases in streaming data analytics, business intelligence, and real-time monitoring across industries. -
36
Apache Bigtop
Apache Software Foundation
Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components. Bigtop supports a wide range of components/projects, including, but not limited to, Hadoop, HBase and Spark. Bigtop packages Hadoop RPMs and DEBs, so that you can manage and maintain your Hadoop cluster. Bigtop provides an integrated smoke testing framework, alongside a suite of over 50 test files. Bigtop provides Vagrant recipes, raw images, and (work-in-progress) Docker recipes for deploying Hadoop from zero. Bigtop supports many operating systems, including Debian, Ubuntu, CentOS, Fedora, openSUSE and many others. Bigtop includes tools and a framework for testing at various levels (packaging, platform, runtime, etc.) for both initial deployments and upgrade scenarios for the entire data platform, not just the individual components. -
37
Apache Sentry
Apache Software Foundation
Apache Sentry™ is a system for enforcing fine-grained, role-based authorization to data and metadata stored on a Hadoop cluster. Apache Sentry graduated from the Incubator in March 2016 and is now a Top-Level Apache project. Apache Sentry is a granular, role-based authorization module for Hadoop. Sentry provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications on a Hadoop cluster. Sentry currently works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala and HDFS (limited to Hive table data). Sentry is designed to be a pluggable authorization engine for Hadoop components. It allows you to define authorization rules to validate a user or application’s access requests for Hadoop resources. Sentry is highly modular and can support authorization for a wide variety of data models in Hadoop. -
38
Apache Drill
The Apache Software Foundation
Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage -
39
OpenText Analytics Database is a high-performance, scalable analytics platform that enables organizations to analyze massive data sets quickly and cost-effectively. It supports real-time analytics and in-database machine learning to deliver actionable business insights. The platform can be deployed flexibly across hybrid, multi-cloud, and on-premises environments to optimize infrastructure and reduce total cost of ownership. Its massively parallel processing (MPP) architecture handles complex queries efficiently, regardless of data size. OpenText Analytics Database also features compatibility with data lakehouse architectures, supporting formats like Parquet and ORC. With built-in machine learning and broad language support, it empowers users from SQL experts to Python developers to derive predictive insights.
-
40
SingleStore
SingleStore
SingleStore (formerly MemSQL) is a distributed, highly-scalable SQL database that can run anywhere. We deliver maximum performance for transactional and analytical workloads with familiar relational models. SingleStore is a scalable SQL database that ingests data continuously to perform operational analytics for the front lines of your business. Ingest millions of events per second with ACID transactions while simultaneously analyzing billions of rows of data in relational SQL, JSON, geospatial, and full-text search formats. SingleStore delivers ultimate data ingestion performance at scale and supports built in batch loading and real time data pipelines. SingleStore lets you achieve ultra fast query response across both live and historical data using familiar ANSI SQL. Perform ad hoc analysis with business intelligence tools, run machine learning algorithms for real-time scoring, perform geoanalytic queries in real time. Starting Price: $0.69 per hour -
41
Hadoop
Apache Software Foundation
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures. A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop PoweredBy wiki page. Apache Hadoop 3.3.4 incorporates a number of significant enhancements over the previous major release line (hadoop-3.2). -
42
Apache Ranger
The Apache Software Foundation
Apache Ranger™ is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. The vision with Ranger is to provide comprehensive security across the Apache Hadoop ecosystem. With the advent of Apache YARN, the Hadoop platform can now support a true data lake architecture. Enterprises can potentially run multiple workloads in a multi-tenant environment. Data security within Hadoop needs to evolve to support multiple use cases for data access, while also providing a framework for central administration of security policies and monitoring of user access. Ranger offers centralized security administration to manage all security-related tasks in a central UI or using REST APIs; fine-grained authorization for specific actions or operations on a Hadoop component or tool, managed through a central administration tool; a standardized authorization method across all Hadoop components; and enhanced support for different authorization methods, such as role-based access control. -
43
The Ocient Hyperscale Data Warehouse transforms and loads data in seconds, enables organizations to store and analyze more data, and executes queries on hyperscale datasets up to 50x faster. To deliver next-generation data analytics, Ocient completely reimagined its data warehouse design to deliver rapid, continuous analysis of complex, hyperscale datasets. The Ocient Hyperscale Data Warehouse brings storage adjacent to compute to maximize performance on industry-standard hardware, enables users to transform, stream or load data directly, and returns previously infeasible queries in seconds. Optimized for industry standard hardware, Ocient has benchmarked query performance levels up to 50x better than competing products. The Ocient Hyperscale Data Warehouse empowers next-generation data analytics solutions in key areas where existing solutions fall short.
-
44
ClickHouse
ClickHouse
ClickHouse is a fast open-source OLAP database management system. It is column-oriented and allows users to generate analytical reports using SQL queries in real time. ClickHouse's performance exceeds comparable column-oriented database management systems currently available on the market. It processes hundreds of millions to more than a billion rows and tens of gigabytes of data per single server per second. ClickHouse uses all available hardware to its full potential to process each query as fast as possible. Peak processing performance for a single query stands at more than 2 terabytes per second (after decompression, only used columns). In a distributed setup, reads are automatically balanced among healthy replicas to avoid increasing latency. ClickHouse supports multi-master asynchronous replication and can be deployed across multiple datacenters. All nodes are equal, which avoids single points of failure. -
45
Greenplum
Greenplum Database
Greenplum Database® is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer delivering high analytical query performance on large data volumes. The Greenplum Database® project is released under the Apache 2 license. We want to thank all our current community contributors and are interested in all new potential contributions. For the Greenplum Database community no contribution is too small, and we encourage all types of contributions. An open-source massively parallel data platform for analytics, machine learning and AI. Rapidly create and deploy models for complex applications in cybersecurity, predictive maintenance, risk management, fraud detection, and many other areas. Experience the fully featured, integrated, open source analytics platform. -
46
VeloDB
VeloDB
Powered by Apache Doris, VeloDB is a modern data warehouse for lightning-fast analytics on real-time data at scale. Push-based micro-batch and pull-based streaming data ingestion within seconds. Storage engine with real-time upsert, append, and pre-aggregation. Unparalleled performance in both real-time data serving and interactive ad-hoc queries. Not just structured but also semi-structured data. Not just real-time analytics but also batch processing. It not only runs queries against internal data but also works as a federated query engine to access external data lakes and databases. Distributed design to support linear scalability. Whether deployed on-premises or as a cloud service, with storage and compute separated or integrated, resource usage can be flexibly and efficiently adjusted according to workload requirements. Built on and fully compatible with open source Apache Doris. Supports the MySQL protocol, functions, and SQL for easy integration with other data tools. -
47
SAP BW/4HANA
SAP
SAP BW/4HANA is a packaged data warehouse based on SAP HANA. As the on-premise data warehouse layer of SAP’s Business Technology Platform, it allows you to consolidate data across the enterprise to get a consistent, agreed-upon view of your data. Streamline processes and support innovations with a single source for real-time insights. Based on SAP HANA, our next-generation data warehouse solution can help you capitalize on the full value of all your data from SAP applications or third-party solutions, as well as unstructured, geospatial, or Hadoop-based data. Transform data practices to gain the efficiency and agility to deploy live insights at scale, both on premise and in the cloud. Drive digitization across all lines of business with a Big Data warehouse, while leveraging digital business platform solutions from SAP. -
48
MLlib
Apache Software Foundation
Apache Spark's MLlib is a scalable machine learning library that integrates seamlessly with Spark's APIs, supporting Java, Scala, Python, and R. It offers a comprehensive suite of algorithms and utilities, including classification, regression, clustering, collaborative filtering, and tools for constructing machine learning pipelines. MLlib's high-quality algorithms leverage Spark's iterative computation capabilities, delivering performance up to 100 times faster than traditional MapReduce implementations. It is designed to operate across diverse environments, running on Hadoop, Apache Mesos, Kubernetes, standalone clusters, or in the cloud, and accessing various data sources such as HDFS, HBase, and local files. This flexibility makes MLlib a robust solution for scalable and efficient machine learning tasks within the Apache Spark ecosystem. -
49
Baidu Palo
Baidu AI Cloud
Palo helps enterprises create a PB-level MPP architecture data warehouse service within several minutes and import massive data from RDS, BOS, and BMR, so that they can perform multi-dimensional analytics on big data. Palo is compatible with mainstream BI tools. Data analysts can analyze and display the data visually and gain insights quickly to assist decision-making. It has an industry-leading MPP query engine, with column storage, intelligent indexing, and vector execution functions. It can also provide in-library analytics, window functions, and other advanced analytics functions. You can create a materialized view and change the table structure without suspending service. It supports flexible and efficient data recovery. -
50
Deeplearning4j
Deeplearning4j
DL4J takes advantage of the latest distributed computing frameworks including Apache Spark and Hadoop to accelerate training. On multi-GPUs, it is equal to Caffe in performance. The libraries are completely open-source, Apache 2.0, and maintained by the developer community and Konduit team. Deeplearning4j is written in Java and is compatible with any JVM language, such as Scala, Clojure, or Kotlin. The underlying computations are written in C, C++, and CUDA. Keras will serve as the Python API. Eclipse Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Apache Spark, DL4J brings AI to business environments for use on distributed GPUs and CPUs. There are a lot of parameters to adjust when you're training a deep-learning network. We've done our best to explain them, so that Deeplearning4j can serve as a DIY tool for Java, Scala, Clojure, and Kotlin programmers.