Alternatives to Apache Accumulo
Compare Apache Accumulo alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Apache Accumulo in 2026. Compare features, ratings, user reviews, pricing, and more from Apache Accumulo competitors and alternatives in order to make an informed decision for your business.
-
1
Amazon DynamoDB
Amazon
Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It's a fully managed, multi-region, multi-master, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. DynamoDB can handle more than 10 trillion requests per day and can support peaks of more than 20 million requests per second. Many of the world's fastest-growing businesses such as Lyft, Airbnb, and Redfin as well as enterprises such as Samsung, Toyota, and Capital One depend on the scale and performance of DynamoDB to support their mission-critical workloads. Focus on driving innovation with no operational overhead. Build out your game platform with player data, session history, and leaderboards for millions of concurrent users. Use design patterns for deploying shopping carts, workflow engines, inventory tracking, and customer profiles. DynamoDB supports high-traffic, extreme-scaled events. -
2
Redis
Redis Labs
Redis Labs: home of Redis. Redis Enterprise is the best version of Redis. Go beyond cache; try Redis Enterprise free in the cloud using NoSQL & data caching with the world’s fastest in-memory database. Run Redis at scale with enterprise-grade resiliency, massive scalability, ease of management, and operational simplicity. DevOps love Redis in the Cloud. Developers can access enhanced data structures, a variety of modules, and rapid innovation with faster time to market. CIOs love the confidence of working with 99.999% uptime, best-in-class security, and expert support from the creators of Redis. Implement relational databases, active-active geo-distribution, built-in conflict distribution for simple and complex data types, and reads/writes in multiple geo regions to the same data set. Redis Enterprise offers flexible deployment options: cloud, on-prem, and hybrid. Redis JSON, Redis Java, Python Redis, Redis on Kubernetes & Redis GUI best practices. Starting Price: Free -
3
Apache HBase
The Apache Software Foundation
Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Automatic failover support between RegionServers. Easy to use Java API for client access. Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options. Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX. -
4
ArcadeDB
ArcadeDB
ArcadeDB is an open-source, next-generation multi-model database. Forget Polyglot Persistence: store graphs, documents, key-value pairs, search engine indexes, vectors, and time-series data all in one database with native support for every model. No translation layers, no performance penalties. Process over 10 million records per second. Traversal speed stays constant whether your database has hundreds or billions of records. Query in the language you prefer: SQL, Cypher, Gremlin, GraphQL, MongoDB API, or Java. Deploy ArcadeDB embedded in your JVM application, on a standalone server, or distributed across multiple nodes with Raft Consensus for high availability. Fully ACID-compliant. Super lightweight. Apache 2.0 licensed, free for production and commercial use. Starting Price: Free -
5
HerdDB
Diennea
HerdDB is a distributed SQL database implemented in Java. It has been designed to be embeddable in any Java Virtual Machine. It is optimized for fast writes and primary key read/update access patterns. HerdDB is designed to manage hundreds of tables. It is simple to add and remove hosts and to reconfigure tablespaces to easily distribute the load across multiple systems. HerdDB leverages Apache ZooKeeper and Apache BookKeeper to build a fully replicated, shared-nothing architecture without any single point of failure. At the low level HerdDB is very similar to a key-value NoSQL database. On top of that, an SQL abstraction layer and JDBC driver support enable every user to leverage existing know-how and port existing applications to HerdDB. At Diennea we developed EmailSuccess, a powerful MTA (Mail Transfer Agent), designed to deliver millions of email messages per hour to inboxes all around the world, -
6
GridGain
GridGain Systems
The enterprise-grade platform built on Apache Ignite that provides in-memory speed and massive scalability for data-intensive applications and real-time data access across datastores and applications. Upgrade from Ignite to GridGain with no code changes and deploy your clusters securely at global scale with zero downtime. Perform rolling upgrades of your production clusters with no impact on application availability. Replicate across globally distributed data centers to load balance workloads and prevent downtime from regional outages. Secure your data at rest and in motion, and ensure compliance with security and privacy standards. Easily integrate with your organization's authentication and authorization system. Enable full data and user activity auditing. Create automated schedules for full and incremental backups. Restore your cluster to the last stable state with snapshots and point-in-time recovery. -
7
FoundationDB
FoundationDB
FoundationDB is multi-model, meaning you can store many types of data in a single database. All data is safely stored, distributed, and replicated in the Key-Value Store component. FoundationDB is easy to install, grow, and manage. It has a distributed architecture that gracefully scales out and handles faults while acting like a single ACID database. FoundationDB provides amazing performance on commodity hardware, allowing you to support very heavy loads at low cost. FoundationDB has been running in production for years and has been hardened with lessons learned. Backing FoundationDB up is an unmatched testing system based on a deterministic simulation engine. We encourage your participation in our open-source community! Join us in technical and user discussions on the community forums, and learn how to contribute. -
8
etcd
etcd
etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node. Store data in hierarchically organized directories, as in a standard filesystem. Watch specific keys or directories for changes and react to changes in values. -
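The directory-and-watch model described above can be sketched with a small in-memory stand-in. This is only an illustration of the idea, not etcd's client API: the `ToyStore` class, its methods, and the `/config/...` keys are all hypothetical names, and nothing here is distributed or replicated.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

# Callback receives the changed key and its new value (None means deleted).
WatchCallback = Callable[[str, Optional[str]], None]


class ToyStore:
    """Hypothetical in-memory store with path-like keys and prefix watches."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[WatchCallback]] = defaultdict(list)

    def watch(self, prefix: str, callback: WatchCallback) -> None:
        """Register a callback for every change under the given key prefix."""
        self._watchers[prefix].append(callback)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._notify(key, value)

    def delete(self, key: str) -> None:
        # Only notify if the key actually existed.
        if self._data.pop(key, None) is not None:
            self._notify(key, None)

    def list_dir(self, prefix: str) -> Dict[str, str]:
        """Return all keys under a 'directory', sorted by key."""
        return {k: v for k, v in sorted(self._data.items()) if k.startswith(prefix)}

    def _notify(self, key: str, value: Optional[str]) -> None:
        for prefix, callbacks in self._watchers.items():
            if key.startswith(prefix):
                for callback in callbacks:
                    callback(key, value)


store = ToyStore()
events = []
store.watch("/config/db/", lambda key, value: events.append((key, value)))

store.put("/config/db/host", "10.0.0.5")
store.put("/config/cache/ttl", "30")      # outside the watched directory
store.put("/config/db/port", "5432")
store.delete("/config/db/port")

print(events)
print(store.list_dir("/config/db/"))
```

Only changes under `/config/db/` reach the watcher, which is the behavior a client would rely on to react to configuration updates for one subsystem.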
9
InterSystems IRIS
InterSystems
InterSystems IRIS is a complete cloud-first data platform that includes a multi-model transactional data management engine, an application development platform, an interoperability engine, and an open analytics platform. It is the next generation of our proven data management software. It includes the capabilities of InterSystems Cache and Ensemble, plus a wealth of exciting new capabilities to make it easy to build and deploy cloud-based, analytics-intensive enterprise applications with even greater performance and scalability. InterSystems IRIS provides a set of APIs to operate on transactional persistent data simultaneously: key-value, relational, object, document, multidimensional. Data can be managed by SQL, Java, node.js, .NET, C++, Python, and the native server-side ObjectScript language. InterSystems IRIS includes -
10
LeanXcale
LeanXcale
LeanXcale is a fast and scalable database that combines the characteristics of SQL and NoSQL. It is built to ingest massive batch and real-time data pipelines and make the data available through SQL or GIS for any use, such as operational applications, analytics, dashboarding, or machine learning processing. No matter what stack you use, LeanXcale provides you with both SQL and NoSQL interfaces. The KiVi storage engine is a relational key-value data store. Users can access the data not only through the standard SQL API but also through a direct ACID key-value interface. This key-value interface allows users to perform data ingestion at very high rates and very efficiently by avoiding SQL processing overhead. The highly scalable, efficient, distributed storage engine distributes data across the cluster to improve performance and increase reliability. Starting Price: $0.127 per GB per month -
11
OrbitDB
OrbitDB
OrbitDB is a serverless, distributed, peer-to-peer database that utilizes IPFS for data storage and Libp2p Pubsub for automatic synchronization across peers. It employs Merkle-CRDTs to ensure conflict-free database writes and merges, making it suitable for decentralized applications, blockchain integrations, and local-first web apps. OrbitDB offers various database types tailored to different use cases: 'events' for immutable append-only logs, 'documents' for JSON document storage indexed by a specified key, 'keyvalue' for traditional key-value pairs, and 'keyvalue-indexed' for LevelDB-indexed key-value data. All these databases are built atop OpLog, an immutable, cryptographically verifiable, operation-based CRDT structure. The JavaScript implementation supports both browser and Node.js environments, with a Go version maintained by the Berty project. Starting Price: Free -
12
Apache Cassandra
Apache Software Foundation
The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages. -
13
Google Cloud Bigtable
Google
Google Cloud Bigtable is a fully managed, scalable NoSQL database service for large analytical and operational workloads. Fast and performant: Use Cloud Bigtable as the storage engine that grows with you from your first gigabyte to petabyte-scale for low-latency applications as well as high-throughput data processing and analytics. Seamless scaling and replication: Start with a single node per cluster, and seamlessly scale to hundreds of nodes dynamically supporting peak demand. Replication also adds high availability and workload isolation for live serving apps. Simple and integrated: Fully managed service that integrates easily with big data tools like Hadoop, Dataflow, and Dataproc. Plus, support for the open source HBase API standard makes it easy for development teams to get started. -
14
ScyllaDB
ScyllaDB
ScyllaDB is the database for data-intensive apps that require high performance and low latency. It enables teams to harness the ever-increasing computing power of modern infrastructures – eliminating barriers to scale as data grows. Unlike any other database, ScyllaDB is a distributed NoSQL database fully compatible with Apache Cassandra and Amazon DynamoDB, yet is built with deep architectural advancements that enable exceptional end-user experiences at radically lower costs. Over 400 game-changing companies like Disney+ Hotstar, Expedia, FireEye, Discord, Zillow, Starbucks, Comcast, and Samsung use ScyllaDB for their toughest database challenges. ScyllaDB is available as free open source software, a fully-supported enterprise product, and a fully managed database-as-a-service (DBaaS) on multiple cloud providers. -
15
Infinispan
Infinispan
Infinispan is an open-source in-memory data grid that offers flexible deployment options and robust capabilities for storing, managing, and processing data. Infinispan provides a key/value data store that can hold all types of data, from Java objects to plain text. Infinispan distributes your data across elastically scalable clusters to guarantee high availability and fault tolerance, whether you use Infinispan as a volatile cache or a persistent data store. Infinispan turbocharges applications by storing data closer to processing logic, which reduces latency and increases throughput. Available as a Java library, you simply add Infinispan to your application dependencies and then you’re ready to store data in the same memory space as the executing code. -
16
Apache Sentry
Apache Software Foundation
Apache Sentry™ is a system for enforcing fine-grained, role-based authorization to data and metadata stored on a Hadoop cluster. Apache Sentry graduated from the Incubator in March 2016 and is now a Top-Level Apache project. Sentry provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications on a Hadoop cluster. Sentry currently works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala and HDFS (limited to Hive table data). Sentry is designed to be a pluggable authorization engine for Hadoop components. It allows you to define authorization rules to validate a user or application’s access requests for Hadoop resources. Sentry is highly modular and can support authorization for a wide variety of data models in Hadoop. -
17
BoltDB
BoltDB
Bolt is a pure Go key/value store inspired by Howard Chu's LMDB project. The goal of the project is to provide a simple, fast, and reliable database for projects that don't require a full database server such as Postgres or MySQL. Since Bolt is meant to be used as such a low-level piece of functionality, simplicity is key. The API will be small and only focus on getting values and setting values. That's it. The original goal of Bolt was to provide a simple pure Go key/value store and to not bloat the code with extraneous features. To that end, the project has been a success. However, this limited scope also means that the project is complete. Maintaining an open source database requires an immense amount of time and energy. Changes to the code can have unintended and sometimes catastrophic effects so even simple changes require hours and hours of careful testing and validation. -
18
Apache Geode
Apache
Build high-speed, data-intensive applications that elastically meet performance requirements at any scale. Take advantage of Apache Geode's unique technology that blends advanced techniques for data replication, partitioning and distributed processing. Apache Geode provides a database-like consistency model, reliable transaction processing and a shared-nothing architecture to maintain very low latency performance with high concurrency processing. Data can easily be partitioned (sharded) or replicated between nodes allowing performance to scale as needed. Durability is ensured through redundant in-memory copies and disk-based persistence. Super fast write-ahead-logging (WAL) persistence with a shared-nothing architecture that is optimized for fast parallel recovery of nodes or an entire cluster. -
19
Speedb
Speedb
The next-generation key-value storage engine. Speedb is 100% RocksDB compatible, enhancing stability, efficiency, and overall performance. Join the Hive, Speedb’s open-source community, to interact, improve, and share knowledge and best practices on RocksDB. Speedb is a compatible alternative for LevelDB and RocksDB users who would like to take their application to the next level. When using event streaming platforms like Kafka, Flink, Spark, Splunk, Elastic, or others, consider using Speedb to enhance their performance. The increase in metadata in modern data sets is causing significant performance issues for many applications. With Speedb you can keep costs low and ensure your applications continue to run smoothly even under heavy loads. When it comes to making a choice to upgrade or deploy a new key-value store with your platform, Speedb is up for the challenge. By seamlessly integrating Speedb's advanced key-value storage engine with your projects, you'll experience immediate relief. Starting Price: Free -
20
JanusGraph
JanusGraph
JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. JanusGraph is a project under The Linux Foundation, and includes participants from Expero, Google, GRAKN.AI, Hortonworks, IBM and Amazon. Elastic and linear scalability for a growing data and user base. Data distribution and replication for performance and fault tolerance. Multi-datacenter high availability and hot backups. All functionality is totally free. No need to buy commercial licenses. JanusGraph is fully open source under the Apache 2 license. JanusGraph is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time. Support for ACID and eventual consistency. In addition to online transactional processing (OLTP), JanusGraph supports global graph analytics (OLAP) with its Apache Spark integration. -
21
Apache Trafodion
Apache Software Foundation
Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop. Full-functioned ANSI SQL language support. JDBC/ODBC connectivity for Linux/Windows clients. Distributed ACID transaction protection across multiple statements, tables, and rows. Performance improvements for OLTP workloads with compile-time and run-time optimizations. Support for large data sets using a parallel-aware query optimizer. Reuse existing SQL skills and improve developer productivity. Interoperability with existing tools and applications. Hadoop and Linux distribution neutral. Easy to add to your existing Hadoop infrastructure. Starting Price: Free -
22
InterSystems Caché
InterSystems
InterSystems Caché® is a high-performance database that powers transaction processing applications around the world. It is used for everything from mapping a billion stars in the Milky Way, to processing a billion equity trades in a day, to managing smart energy grids. Caché is a multi-model (object, relational, key-value) DBMS and application server developed by InterSystems. InterSystems Caché provides several APIs to operate with same data simultaneously: key-value, relational, object, document, multi-dimensional. Data can be managed via SQL, Java, node.js, .NET, C++, Python. Caché also provides an application server which hosts web apps (CSP), REST, SOAP, web sockets and other types of TCP access for Caché data. -
23
E-MapReduce
Alibaba
EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm. Alibaba Cloud Elastic MapReduce (EMR) is a big data processing solution that runs on the Alibaba Cloud platform. EMR is built on Alibaba Cloud ECS instances and is based on open-source Apache Hadoop and Apache Spark. EMR allows you to use the Hadoop and Spark ecosystem components, such as Apache Hive, Apache Kafka, Flink, Druid, and TensorFlow, to analyze and process data. You can use EMR to process data stored on different Alibaba Cloud data storage service, such as Object Storage Service (OSS), Log Service (SLS), and Relational Database Service (RDS). You can quickly create clusters without the need to configure hardware and software. All maintenance operations are completed on its Web interface. -
24
IBM Analytics Engine provides an architecture for Hadoop clusters that decouples the compute and storage tiers. Instead of a permanent cluster formed of dual-purpose nodes, the Analytics Engine allows users to store data in an object storage layer such as IBM Cloud Object Storage and spins up clusters of compute nodes when needed. Separating compute from storage helps to transform the flexibility, scalability and maintainability of big data analytics platforms. Build on an ODPi-compliant stack with pioneering data science tools and the broader Apache Hadoop and Apache Spark ecosystem. Define clusters based on your application's requirements. Choose the appropriate software pack, version, and size of the cluster. Use as long as required and delete as soon as an application finishes jobs. Configure clusters with third-party analytics libraries and packages. Deploy workloads from IBM Cloud services like machine learning. Starting Price: $0.014 per hour
-
25
Lucid KV
Lucid KV
Lucid is currently in development. The goal is a fast, secure, distributed key-value store accessible through an HTTP API, with planned features including persistence, encryption, WebSocket streaming, and replication. Intended use cases include private key storage, IoT (collecting and saving statistics data), distributed caching, service discovery, distributed configuration, and blob storage. -
26
LevelDB
Google
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. Keys and values are arbitrary byte arrays. Data is stored sorted by key. Callers can provide a custom comparison function to override the sort order. Multiple changes can be made in one atomic batch. Users can create a transient snapshot to get a consistent view of data. Forward and backward iteration is supported over the data. Data is automatically compressed using the Snappy compression library. External activity (file system operations etc.) is relayed through a virtual interface so users can customize the operating system interactions. We use a database with a million entries. Each entry has a 16 byte key, and a 100 byte value. Values used by the benchmark compress to about half their original size. We list the performance of reading sequentially in both the forward and reverse direction, and also the performance of a random lookup. -
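The operations listed above (sorted keys, atomic batches, snapshots, forward and backward iteration) can be illustrated with a toy in-memory model. This is a sketch of the concepts only, not LevelDB's API or its on-disk design: `ToyOrderedKV` and its method names are hypothetical, and "atomic" here just means a batch is validated in full before any operation is applied.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class ToyOrderedKV:
    """Hypothetical in-memory model of an ordered byte-string key-value store."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []          # kept sorted at all times
        self._vals: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def delete(self, key: bytes) -> None:
        if key in self._vals:
            del self._vals[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def get(self, key: bytes) -> Optional[bytes]:
        return self._vals.get(key)

    def write_batch(self, ops: List[tuple]) -> None:
        """Validate every operation first, then apply them all together."""
        for op in ops:
            if op[0] == "put" and len(op) == 3:
                continue
            if op[0] == "delete" and len(op) == 2:
                continue
            raise ValueError(f"invalid batch operation: {op!r}")
        for op in ops:
            if op[0] == "put":
                self.put(op[1], op[2])
            else:
                self.delete(op[1])

    def snapshot(self) -> "ToyOrderedKV":
        """Return a frozen copy that later writes do not affect."""
        snap = ToyOrderedKV()
        snap._keys = list(self._keys)
        snap._vals = dict(self._vals)
        return snap

    def scan(self, reverse: bool = False) -> Iterator[Tuple[bytes, bytes]]:
        """Iterate over all entries in key order, forward or backward."""
        keys = reversed(self._keys) if reverse else iter(self._keys)
        for key in keys:
            yield key, self._vals[key]


db = ToyOrderedKV()
db.write_batch([("put", b"k2", b"two"), ("put", b"k1", b"one"), ("put", b"k3", b"three")])
snap = db.snapshot()
db.write_batch([("delete", b"k1"), ("put", b"k4", b"four")])

forward = [key for key, _ in db.scan()]
backward = [key for key, _ in db.scan(reverse=True)]
snapshot_keys = [key for key, _ in snap.scan()]
print(forward, backward, snapshot_keys)
```

The snapshot still sees `k1` after the second batch deletes it, which is the "consistent view" property the description refers to.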
27
Apache TinkerPop
Apache Software Foundation
Apache TinkerPop™ is a graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP). Gremlin is the graph traversal language of Apache TinkerPop. Gremlin is a functional, data-flow language that enables users to succinctly express complex traversals on (or queries of) their application's property graph. Every Gremlin traversal is composed of a sequence of (potentially nested) steps. A graph is a structure composed of vertices and edges. Both vertices and edges can have an arbitrary number of key/value pairs called properties. Vertices denote discrete objects such as a person, a place, or an event. Edges denote relationships between vertices. For instance, a person may know another person, have been involved in an event, and/or have recently been at a particular place. If a user's domain is composed of a heterogeneous set of objects (vertices) that can be related to one another in a multitude of ways (edges). Starting Price: Free -
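The property-graph model described above (vertices, edges, key/value properties, and traversals built from a sequence of steps) can be sketched in plain Python. This is not the TinkerPop or Gremlin API: `ToyGraph`, `Vertex`, `Edge`, and the sample data are hypothetical. The chained `out` calls are loosely analogous to a Gremlin traversal such as `g.V(1).out('knows').out('knows').values('name')`.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Vertex:
    vid: int
    label: str                                   # e.g. "person", "event"
    properties: Dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    label: str                                   # e.g. "knows", "attended"
    out_v: int                                   # source vertex id
    in_v: int                                    # target vertex id
    properties: Dict[str, object] = field(default_factory=dict)


class ToyGraph:
    """Hypothetical in-memory property graph with two traversal steps."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: List[Edge] = []

    def add_vertex(self, vid: int, label: str, **props: object) -> None:
        self.vertices[vid] = Vertex(vid, label, dict(props))

    def add_edge(self, label: str, out_v: int, in_v: int, **props: object) -> None:
        self.edges.append(Edge(label, out_v, in_v, dict(props)))

    def out(self, vertex_ids: Iterable[int], edge_label: str) -> List[int]:
        """Step: follow outgoing edges with the given label."""
        ids = set(vertex_ids)
        return [e.in_v for e in self.edges if e.label == edge_label and e.out_v in ids]

    def values(self, vertex_ids: Iterable[int], key: str) -> List[object]:
        """Step: read one property from each vertex."""
        return [self.vertices[v].properties[key] for v in vertex_ids]


g = ToyGraph()
g.add_vertex(1, "person", name="alice")
g.add_vertex(2, "person", name="bob")
g.add_vertex(3, "person", name="carol")
g.add_vertex(4, "event", name="launch party")
g.add_edge("knows", 1, 2, since=2019)
g.add_edge("knows", 2, 3, since=2021)
g.add_edge("attended", 1, 4)

# Two "knows" hops from alice, then read the name property.
friends_of_friends = g.values(g.out(g.out([1], "knows"), "knows"), "name")
events_attended = g.values(g.out([1], "attended"), "name")
print(friends_of_friends, events_attended)
```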
28
Hazelcast
Hazelcast
In-Memory Computing Platform. The digital world is different. Microseconds matter. That's why the world's largest organizations rely on us to power their most time-sensitive applications at scale. New data-enabled applications can deliver transformative business power – if they meet today’s requirement of immediacy. Hazelcast solutions complement virtually any database to deliver results that are significantly faster than a traditional system of record. Hazelcast’s distributed architecture provides redundancy for continuous cluster up-time and always available data to serve the most demanding applications. Capacity grows elastically with demand, without compromising performance or availability. The fastest in-memory data grid, combined with third-generation high-speed event processing, delivered through the cloud. -
29
Valkey
Valkey
Valkey is an open source, high-performance key/value datastore that supports a variety of workloads, such as caching and message queues, and can act as a primary database. It is backed by the Linux Foundation, ensuring it will remain open source forever. Valkey can run as either a standalone daemon or in a cluster, with options for replication and high availability. It natively supports a rich collection of datatypes, including strings, numbers, hashes, lists, sets, sorted sets, bitmaps, hyperloglogs, and more. You can operate on data structures in-place with an expressive collection of commands. Valkey also supports native extensibility with built-in scripting support for Lua and supports module plugins to create new commands, data types, and more. Valkey 8.1 introduces several performance improvements that reduce latency, increase throughput, and lower memory usage. Starting Price: Free -
30
ArangoDB
ArangoDB
Natively store data for graph, document and search needs. Utilize feature-rich access with one query language. Map data natively to the database and access it with the best patterns for the job – traversals, joins, search, ranking, geospatial, aggregations – you name it. Polyglot persistence without the costs. Easily design, scale and adapt your architectures to changing needs and with much less effort. Combine the flexibility of JSON with semantic search and graph technology for next generation feature extraction even for large datasets. -
31
Aerospike
Aerospike
Aerospike is the global leader in next-generation, real-time NoSQL data solutions for any scale. Aerospike enterprises overcome seemingly impossible data bottlenecks to compete and win with a fraction of the infrastructure complexity and cost of legacy NoSQL databases. Aerospike’s patented Hybrid Memory Architecture™ delivers an unbreakable competitive advantage by unlocking the full potential of modern hardware, delivering previously unimaginable value from vast amounts of data at the edge, to the core and in the cloud. Aerospike empowers customers to instantly fight fraud; dramatically increase shopping cart size; deploy global digital payment networks; and deliver instant, one-to-one personalization for millions of customers. Aerospike customers include Airtel, Banca d’Italia, Nielsen, PayPal, Snap, Verizon Media and Wayfair. The company is headquartered in Mountain View, Calif., with additional locations in London; Bengaluru, India; and Tel Aviv, Israel. -
32
Azure Cosmos DB
Microsoft
Azure Cosmos DB is a fully managed NoSQL database service for modern app development with guaranteed single-digit millisecond response times and 99.999-percent availability backed by SLAs, automatic and instant scalability, and open source APIs for MongoDB and Cassandra. Enjoy fast writes and reads anywhere in the world with turnkey multi-master global distribution. Reduce time to insight by running near-real time analytics and AI on the operational data within your Azure Cosmos DB NoSQL database. Azure Synapse Link for Azure Cosmos DB seamlessly integrates with Azure Synapse Analytics without data movement or diminishing the performance of your operational data store. -
33
Spark Streaming
Apache Software Foundation
Spark Streaming brings Apache Spark's language-integrated API to stream processing, letting you write streaming jobs the same way you write batch jobs. It supports Java, Scala and Python. Spark Streaming recovers both lost work and operator state (e.g. sliding windows) out of the box, without any extra code on your part. By running on Spark, Spark Streaming lets you reuse the same code for batch processing, join streams against historical data, or run ad-hoc queries on stream state. Build powerful interactive applications, not just analytics. Spark Streaming is developed as part of Apache Spark. It thus gets tested and updated with each Spark release. You can run Spark Streaming on Spark's standalone cluster mode or other supported cluster resource managers. It also includes a local run mode for development. In production, Spark Streaming uses ZooKeeper and HDFS for high availability. -
34
RocksDB
RocksDB
RocksDB uses a log-structured database engine, written entirely in C++, for maximum performance. Keys and values are just arbitrarily-sized byte streams. RocksDB is optimized for fast, low-latency storage such as flash drives and high-speed disk drives. RocksDB exploits the full potential of high read/write rates offered by flash or RAM. RocksDB provides everything from basic operations, such as opening and closing a database and reading and writing, to more advanced operations such as merging and compaction filters. RocksDB is adaptable to different workloads. From database storage engines such as MyRocks to application data caching to embedded workloads, RocksDB can be used for a variety of data needs. -
35
upscaledb
upscaledb
upscaledb is a fast key-value database which optimizes storage and algorithms for your specific data types. Optional compression further reduces file size and I/O, and can keep more data in memory to increase performance and scalability when running full-table scans to query and analyze the data. upscaledb can be used to build all functions of a typical SQL database, tailored to the specific needs of your application, and directly linked into your program. Its blazingly fast analytical functions and database cursors make it a natural fit to process data whenever a SQL database is not fast enough. Applications using upscaledb are deployed on tens of millions of desktops, but also on cloud instances, cell phones and other embedded devices. This benchmark runs a full-table scan over 50 million records and retrieves the maximum. The records are configured as uint32 values. -
36
MLlib
Apache Software Foundation
Apache Spark's MLlib is a scalable machine learning library that integrates seamlessly with Spark's APIs, supporting Java, Scala, Python, and R. It offers a comprehensive suite of algorithms and utilities, including classification, regression, clustering, collaborative filtering, and tools for constructing machine learning pipelines. MLlib's high-quality algorithms leverage Spark's iterative computation capabilities, delivering performance up to 100 times faster than traditional MapReduce implementations. It is designed to operate across diverse environments, running on Hadoop, Apache Mesos, Kubernetes, standalone clusters, or in the cloud, and accessing various data sources such as HDFS, HBase, and local files. This flexibility makes MLlib a robust solution for scalable and efficient machine learning tasks within the Apache Spark ecosystem. -
37
Hadoop
Apache Software Foundation
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures. A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop PoweredBy wiki page. Apache Hadoop 3.3.4 incorporates a number of significant enhancements over the previous major release line (hadoop-3.2). -
38
Yandex Data Proc
Yandex
You select the size of the cluster, node capacity, and a set of services, and Yandex Data Proc automatically creates and configures Spark and Hadoop clusters and other components. Collaborate by using Zeppelin notebooks and other web apps via a UI proxy. You get full control of your cluster with root permissions for each VM. Install your own applications and libraries on running clusters without having to restart them. Yandex Data Proc uses instance groups to automatically increase or decrease computing resources of compute subclusters based on CPU usage indicators. Data Proc allows you to create managed Hive clusters, which can reduce the probability of failures and losses caused by metadata unavailability. Save time on building ETL pipelines and pipelines for training and developing models, as well as describing other iterative tasks. The Data Proc operator is already built into Apache Airflow. Starting Price: $0.19 per hour -
39
Amazon EMR
Amazon
Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open-source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. With EMR you can run petabyte-scale analysis at less than half the cost of traditional on-premises solutions and over 3x faster than standard Apache Spark. For short-running jobs, you can spin up and spin down clusters and pay per second for the instances used. For long-running workloads, you can create highly available clusters that automatically scale to meet demand. If you have existing on-premises deployments of open-source tools such as Apache Spark and Apache Hive, you can also run EMR clusters on AWS Outposts. Analyze data using open-source ML frameworks such as Apache Spark MLlib, TensorFlow, and Apache MXNet. Connect to Amazon SageMaker Studio for large-scale model training, analysis, and reporting. -
40
GigaSpaces
GigaSpaces
eRAG (enterprise RAG) combines the power of real-time operational data with GPT’s fantastic user experience: Chat spontaneously and get immediate answers grounded in a unique understanding of your operational data. With its sophisticated semantic reasoning capabilities, eRAG ensures you get accurate, consistent answers. It answers complex, cross-system questions instantly, supports decisions with suggestions, challenges, and next steps. eRAG connects your business data with external events, so that you can weigh the effect of new tax legislation or weather disruptions on your operations. eRAG combines all your operational data sources so you can get a full, unified picture of your business, offering measurable revenue and efficiency outcomes. Through a self-serve UI, IT teams can connect SQL-based databases like Oracle, PostgreSQL, SAP and other systems in just a few clicks. And you can get up and running in 2–3 weeks - no data prep needed. -
41
VMware Tanzu GemFire
Broadcom
VMware Tanzu GemFire is a distributed, in-memory, key-value store that performs read and write operations at blazingly fast speeds. It offers highly available parallel message queues, continuous availability, and an event-driven architecture you can scale dynamically, with no downtime. As your data size requirements increase to support high-performance, real-time apps, Tanzu GemFire can scale linearly with ease. Traditional databases are often too brittle or unreliable for use with microservices. That’s why every modern distributed architecture needs a cache! With Tanzu GemFire, applications get low-latency responses to data access requests, and always return fresh data. Your applications can subscribe to real-time events to react to changes immediately. Tanzu GemFire’s continuous queries notify your application when new data is available, which reduces the overhead on your SQL database. -
42
Apache Knox
Apache Software Foundation
The Knox API Gateway is designed as a reverse proxy with consideration for pluggability in the areas of policy enforcement, through providers and the backend services for which it proxies requests. Policy enforcement ranges from authentication/federation, authorization, audit, dispatch, hostmapping and content rewrite rules. Policy is enforced through a chain of providers that are defined within the topology deployment descriptor for each Apache Hadoop cluster gated by Knox. The cluster definition is also defined within the topology deployment descriptor and provides the Knox Gateway with the layout of the cluster for purposes of routing and translation between user facing URLs and cluster internals. Each Apache Hadoop cluster that is protected by Knox has its set of REST APIs represented by a single cluster specific application context path. This allows the Knox Gateway to both protect multiple clusters and present the REST API consumer with a single endpoint. -
43
Apache Mahout
Apache Software Foundation
Apache Mahout is a powerful, scalable, and versatile machine learning library designed for distributed data processing. It offers a comprehensive set of algorithms for various tasks, including classification, clustering, recommendation, and pattern mining. Built on top of the Apache Hadoop ecosystem, Mahout leverages MapReduce and Spark to enable data processing on large-scale datasets. Apache Mahout(TM) is a distributed linear algebra framework and mathematically expressive Scala DSL designed to let mathematicians, statisticians, and data scientists quickly implement their own algorithms. Apache Spark is the recommended out-of-the-box distributed back-end or can be extended to other distributed backends. Matrix computations are a fundamental part of many scientific and engineering applications, including machine learning, computer vision, and data analysis. Apache Mahout is designed to handle large-scale data processing by leveraging the power of Hadoop and Spark. -
44
Focus on developing data stream processing applications and don’t waste time maintaining the infrastructure. Managed Service for Apache Kafka is responsible for managing Zookeeper brokers and clusters, configuring clusters, and updating their versions. Distribute your cluster brokers across different availability zones and set the replication factor to ensure the desired level of fault tolerance. The service analyzes the metrics and status of the cluster and automatically replaces a node if it fails. For each topic, you can set the replication factor, log cleanup policy, compression type, and maximum number of messages to make better use of computing, network, and disk resources. You can add brokers to your cluster with the click of a button to improve its performance, or change the class of high-availability hosts without stopping them or losing any data.
-
45
DataStax
DataStax
The Open, Multi-Cloud Stack for Modern Data Apps. Built on open-source Apache Cassandra™. Global-scale and 100% uptime without vendor lock-in. Deploy on multi-cloud, on-prem, open-source, and Kubernetes. Elastic and pay-as-you-go for improved TCO. Start building faster with Stargate APIs for NoSQL, real-time, reactive, JSON, REST, and GraphQL. Skip the complexity of multiple OSS projects and APIs that don’t scale. Ideal for commerce, mobile, AI/ML, IoT, microservices, social, gaming, and richly interactive applications that must scale up and scale down with demand. Get building modern data applications with Astra, a database-as-a-service powered by Apache Cassandra™. Use REST, GraphQL, and JSON with your favorite full-stack framework. Richly interactive apps that are elastic and viral-ready from Day 1. Pay-as-you-go Apache Cassandra DBaaS that scales effortlessly and affordably. -
46
Apache Spark
Apache Software Foundation
Apache Spark™ is a unified analytics engine for large-scale data processing. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python, R, and SQL shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application. Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources. You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources. -
47
Oracle Berkeley DB
Oracle
Berkeley DB is a family of embedded key-value database libraries providing scalable high-performance data management services to applications. The Berkeley DB products use simple function-call APIs for data access and management. Berkeley DB enables the development of custom data management solutions, without the overhead traditionally associated with such custom projects. Berkeley DB provides a collection of well-proven building-block technologies that can be configured to address any application need from the hand-held device to the data center, from a local storage solution to a world-wide distributed one, from kilobytes to petabytes. -
48
Apache Ignite
Apache Ignite
Use Ignite as a traditional SQL database by leveraging JDBC drivers, ODBC drivers, or the native SQL APIs that are available for Java, C#, C++, Python, and other programming languages. Seamlessly join, group, aggregate, and order your distributed in-memory and on-disk data. Accelerate your existing applications by 100x using Ignite as an in-memory cache or in-memory data grid that is deployed over one or more external databases. Think of a cache that you can query with SQL, transact, and compute on. Build modern applications that support transactional and analytical workloads by using Ignite as a database that scales beyond the available memory capacity. Ignite allocates memory for your hot data and goes to disk whenever applications query cold records. Execute kilobyte-size custom code over petabytes of data. Turn your Ignite database into a distributed supercomputer for low-latency calculations, complex analytics, and machine learning. -
49
eXtremeDB
McObject
How is platform-independent eXtremeDB different? - Hybrid data storage. Unlike other IMDS, eXtremeDB can be all-in-memory, all-persistent, or have a mix of in-memory tables and persistent tables - Active Replication Fabric™ is unique to eXtremeDB, offering bidirectional replication, multi-tier replication (e.g. edge-to-gateway-to-gateway-to-cloud), compression to maximize limited bandwidth networks and more - Row & Columnar Flexibility for Time Series Data supports database designs that combine row-based and column-based layouts, in order to best leverage the CPU cache speed - Embedded and Client/Server. Fast, flexible eXtremeDB is data management wherever you need it, and can be deployed as an embedded database system, and/or as a client/server database system - A hard real-time deterministic option in eXtremeDB/rt. Designed for use in resource-constrained, mission-critical embedded systems. Found in everything from routers to satellites to trains to stock markets worldwide -
50
Yugabyte
Yugabyte
The Leading High-Performance Distributed SQL Database. Open source, cloud native relational DB for powering global, internet-scale apps. Single-Digit Millisecond Latency. Build blazing-fast cloud applications by serving queries directly from the DB. Massive Scale. Achieve millions of transactions per second and store multiple TBs of data per node. Geo-Distribution. Deploy across regions and clouds with synchronous or multi-master replication. Built for Cloud Native Architectures. Develop, deploy and operationalize modern applications faster than ever before with YugabyteDB. Gain Developer Agility. Leverage the full power of PostgreSQL-compatible SQL and distributed ACID transactions. Operate Resilient Services. Ensure continuous availability even when underlying compute, storage or network fails. Scale On-Demand. Add and remove nodes at will. Say no to over-provisioned clusters forever. Lower User Latency.