Alternatives to lakeFS

Compare lakeFS alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to lakeFS in 2025. Compare features, ratings, user reviews, pricing, and more from lakeFS competitors and alternatives in order to make an informed decision for your business.

  • 1
    Minitab Connect
    The best insights are based on the most complete, most accurate, and most timely data. Minitab Connect empowers data users from across the enterprise with self-serve tools to transform diverse data into a governed network of data pipelines, feed analytics initiatives and foster organization-wide collaboration. Users can effortlessly blend and explore data from databases, cloud and on-premise apps, unstructured data, spreadsheets, and more. Flexible, automated workflows accelerate every step of the data integration process, while powerful data preparation and visualization tools help yield transformative insights. Flexible, intuitive data integration tools let users connect and blend data from a variety of internal and external sources, like data warehouses, data lakes, IoT devices, SaaS applications, cloud storage, spreadsheets, and email.
  • 2
    FileCloud (CodeLathe)
    #1 Enterprise File Sharing, Sync, Backup & Remote Access. Get complete data ownership, control and governance. Self-host it on-premises or on cloud. Run your own private Dropbox-like file sharing and sync solution, integrated with your IT infrastructure and storage. We host FileCloud for you on a world class infrastructure in the region of your choice. No installation. We take care of all the technical details. Run FileCloud on your infrastructure, with full control over your data. Self-host FileCloud on AWS, AWS GovCloud and Azure. Pre-built FileCloud images are available on AWS and Azure marketplaces. Supports local storage (Disk, Network Shares, CIFS/NFS) and cloud storage. Can connect to multiple storage endpoints. Supports AWS S3, Azure Blob, Wasabi, EMC ECS and other S3 compatible storage systems. Both primary (managed) and file gateway (Network share) storage modes are supported.
    Starting Price: $50.00/year/user
  • 3
    Delta Lake
    Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. Data lakes typically have multiple data pipelines reading and writing data concurrently, and data engineers have to go through a tedious process to ensure data integrity, due to the lack of transactions. Delta Lake brings ACID transactions to your data lakes. It provides serializability, the strongest isolation level. Learn more at Diving into Delta Lake: Unpacking the Transaction Log. In big data, even the metadata itself can be "big data". Delta Lake treats metadata just like data, leveraging Spark's distributed processing power to handle all its metadata. As a result, Delta Lake can handle petabyte-scale tables with billions of partitions and files with ease. Delta Lake provides snapshots of data, enabling developers to access and revert to earlier versions of data for audits, rollbacks, or to reproduce experiments.
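    To make the transaction-log and snapshot behavior concrete, here is a minimal PySpark sketch (the local table path is a placeholder, and delta-spark is assumed installed): two ACID writes followed by a time-travel read of the earlier version.

    ```python
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    # Local Spark session with the Delta Lake extensions enabled.
    builder = (
        SparkSession.builder.appName("delta-demo")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    )
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    path = "/tmp/delta/events"  # placeholder table location

    # Each write is an ACID transaction recorded in the Delta transaction log.
    spark.range(100).write.format("delta").mode("overwrite").save(path)
    spark.range(100, 200).write.format("delta").mode("append").save(path)

    # Time travel: read the snapshot as of version 0 for audits or rollbacks.
    v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
    print(v0.count())  # 100, the row count before the append
    ```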
  • 4
    Azure Blob Storage
    Massively scalable and secure object storage for cloud-native workloads, archives, data lakes, high-performance computing, and machine learning. Azure Blob Storage helps you create data lakes for your analytics needs, and provides storage to build powerful cloud-native and mobile apps. Optimize costs with tiered storage for your long-term data, and flexibly scale up for high-performance computing and machine learning workloads. Blob storage is built from the ground up to support the scale, security, and availability needs of mobile, web, and cloud-native application developers. Use it as a cornerstone for serverless architectures such as Azure Functions. Blob storage supports the most popular development frameworks, including Java, .NET, Python, and Node.js, and is the only cloud storage service that offers a premium, SSD-based object storage tier for low-latency and interactive scenarios.
    Starting Price: $0.00099
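    As a sketch of the developer-facing API described above, the azure-storage-blob Python SDK can upload an object and move it to a cooler tier; the connection string, container, and blob names are placeholders.

    ```python
    from azure.storage.blob import BlobServiceClient, StandardBlobTier

    # Placeholder connection string from the storage account's access keys.
    service = BlobServiceClient.from_connection_string("<connection-string>")
    container = service.get_container_client("my-container")

    # Create or overwrite a blob.
    container.upload_blob(name="logs/2025/01/app.log", data=b"hello lake", overwrite=True)

    # Cost optimization: move long-term data to the Cool access tier.
    blob = container.get_blob_client("logs/2025/01/app.log")
    blob.set_standard_blob_tier(StandardBlobTier.Cool)

    # Read the blob back.
    print(blob.download_blob().readall())
    ```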
  • 5
    BigLake (Google)
    BigLake is a storage engine that unifies data warehouses and lakes by enabling BigQuery and open-source frameworks like Spark to access data with fine-grained access control. BigLake provides accelerated query performance across multi-cloud storage and open formats such as Apache Iceberg. Store a single copy of data with uniform features across data warehouses & lakes. Fine-grained access control and multi-cloud governance over distributed data. Seamless integration with open-source analytics tools and open data formats. Unlock analytics on distributed data regardless of where and how it’s stored, while choosing the best analytics tools, open source or cloud-native over a single copy of data. Fine-grained access control across open source engines like Apache Spark, Presto, and Trino, and open formats such as Parquet. Performant queries over data lakes powered by BigQuery. Integrates with Dataplex to provide management at scale, including logical data organization.
    Starting Price: $5 per TB
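    Since BigLake tables are queried through BigQuery like any other table, a minimal sketch with the google-cloud-bigquery client looks like this; the project, dataset, and table names are hypothetical.

    ```python
    from google.cloud import bigquery

    # BigLake tables are addressed like ordinary BigQuery tables; fine-grained
    # access control is enforced server-side. Names below are hypothetical.
    client = bigquery.Client(project="my-project")

    sql = """
        SELECT event_date, COUNT(*) AS events
        FROM `my-project.lake.iceberg_events`
        GROUP BY event_date
        ORDER BY event_date
    """
    for row in client.query(sql).result():
        print(row.event_date, row.events)
    ```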
  • 6
    Cribl Search
    Cribl Search delivers next-generation search-in-place technology, empowering users to explore, discover, and analyze data that was previously impossible – directly at its source, across any cloud, even data locked behind APIs. Effortlessly search your Cribl Lake or sift through data in major object stores like AWS S3, Amazon Security Lake, Azure Blob, and Google Cloud Storage, and enrich your insights by querying dozens of live API endpoints from various SaaS providers. The power of Cribl Search lies in its strategic approach: forward only the critical data to your systems of analysis, thus avoiding the cost of expensive storage. With native support for platforms such as Amazon Security Lake, AWS S3, Azure Blob, and Google Cloud Storage, Cribl Search delivers a first-of-its-kind ability to seamlessly analyze all data right at its source. Cribl Search allows users to search and analyze data wherever it is located, from debug logs at the edge to archived data in cold storage.
  • 7
    ELCA Smart Data Lake Builder
    Classical Data Lakes are often reduced to basic but cheap raw data storage, neglecting significant aspects like transformation, data quality and security. These topics are left to data scientists, who end up spending up to 80% of their time acquiring, understanding and cleaning data before they can start using their core competencies. In addition, classical Data Lakes are often implemented by separate departments using different standards and tools, which makes it harder to implement comprehensive analytical use cases. Smart Data Lakes solve these various issues by providing architectural and methodical guidelines, together with an efficient tool to build a strong high-quality data foundation. Smart Data Lakes are at the core of any modern analytics platform. Their structure easily integrates prevalent Data Science tools and open source technologies, as well as AI and ML. Their storage is cheap and scalable, supporting both unstructured data and complex data structures.
    Starting Price: Free
  • 8
    Dremio
    Dremio delivers lightning-fast queries and a self-service semantic layer directly on your data lake storage. No moving data to proprietary data warehouses, no cubes, no aggregation tables or extracts. Just flexibility and control for data architects, and self-service for data consumers. Dremio technologies like Data Reflections, Columnar Cloud Cache (C3) and Predictive Pipelining work alongside Apache Arrow to make queries on your data lake storage very, very fast. An abstraction layer enables IT to apply security and business meaning, while enabling analysts and data scientists to explore data and derive new virtual datasets. Dremio’s semantic layer is an integrated, searchable catalog that indexes all of your metadata, so business users can easily make sense of your data. Virtual datasets and spaces make up the semantic layer, and are all indexed and searchable.
  • 9
    Azure Data Lake
    Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance. It also integrates seamlessly with operational stores and data warehouses so you can extend current data applications. We’ve drawn on the experience of working with enterprise customers and running some of the largest scale processing and analytics in the world for Microsoft businesses like Office 365, Xbox Live, Azure, Windows, Bing, and Skype. Azure Data Lake solves many of the productivity and scalability challenges that prevent you from maximizing the value of your data.
  • 10
    Electrik.Ai
    Automatically ingest marketing data into any data warehouse or cloud file storage of your choice, such as BigQuery, Snowflake, Redshift, Azure SQL, AWS S3, Azure Data Lake, or Google Cloud Storage, with our fully managed ETL pipelines in the cloud. Our hosted marketing data warehouse integrates all your marketing data and provides ad insights, cross-channel attribution, content insights, competitor insights, and more. Our customer data platform performs identity resolution in real time across data sources, enabling a unified view of the customer and their journey. Electrik.AI is a cloud-based marketing analytics software and full-service platform. Electrik.AI’s Google Analytics Hit Data Extractor extracts and enriches the unsampled hit-level data sent to Google Analytics from the website or application and periodically ships it to your desired destination database, data warehouse, or file/data lake.
    Starting Price: $49 per month
  • 11
    Lentiq
    Lentiq is a collaborative data lake as a service environment that’s built to enable small teams to do big things. Quickly run data science, machine learning and data analysis at scale in the cloud of your choice. With Lentiq, your teams can ingest data in real time and then process, clean and share it. From there, Lentiq makes it possible to build, train and share models internally. Simply put, data teams can collaborate with Lentiq and innovate with no restrictions. Data lakes are storage and processing environments, which provide ML, ETL, schema-on-read querying capabilities and so much more. Are you working on some data science magic? You definitely need a data lake. In the Post-Hadoop era, the big, centralized data lake is a thing of the past. With Lentiq, we use data pools, which are multi-cloud, interconnected mini-data lakes. They work together to give you a stable, secure and fast data science environment.
  • 12
    Upsolver
    Upsolver makes it incredibly simple to build a governed data lake and to manage, integrate and prepare streaming data for analysis. Define pipelines using only SQL on auto-generated schema-on-read. Easy visual IDE to accelerate building pipelines. Add Upserts and Deletes to data lake tables. Blend streaming and large-scale batch data. Automated schema evolution and reprocessing from previous state. Automatic orchestration of pipelines (no DAGs). Fully-managed execution at scale. Strong consistency guarantee over object storage. Near-zero maintenance overhead for analytics-ready data. Built-in hygiene for data lake tables including columnar formats, partitioning, compaction and vacuuming. 100,000 events per second (billions daily) at low cost. Continuous lock-free compaction to avoid “small files” problem. Parquet-based tables for fast queries.
  • 13
    Alibaba Cloud Data Lake Formation
    A data lake is a centralized repository used for big data and AI computing. It allows you to store structured and unstructured data at any scale. Data Lake Formation (DLF) is a key component of the cloud-native data lake framework. DLF provides an easy way to build a cloud-native data lake. It seamlessly integrates with a variety of compute engines and allows you to manage the metadata in data lakes in a centralized manner and control enterprise-class permissions. Systematically collects structured, semi-structured, and unstructured data and supports massive data storage. Uses an architecture that separates computing from storage. You can plan resources on demand at low costs. This improves data processing efficiency to meet the rapidly changing business requirements. DLF can automatically discover and collect metadata from multiple engines and manage the metadata in a centralized manner to solve the data silo issues.
  • 14
    Qlik Data Integration
    The Qlik Data Integration platform for managed data lakes automates the process of providing continuously updated, accurate, and trusted data sets for business analytics. Data engineers have the agility to quickly add new sources and ensure success at every step of the data lake pipeline from real-time data ingestion, to refinement, provisioning, and governance. A simple and universal solution for continually ingesting enterprise data into popular data lakes in real-time. A model-driven approach for quickly designing, building, and managing data lakes on-premises or in the cloud. Deliver a smart enterprise-scale data catalog to securely share all of your derived data sets with business users.
  • 15
    Data Lakes on AWS
    Many Amazon Web Services (AWS) customers require a data storage and analytics solution that offers more agility and flexibility than traditional data management systems. A data lake is a new and increasingly popular way to store and analyze data because it allows companies to manage multiple data types from a wide variety of sources, and store this data, structured and unstructured, in a centralized repository. The AWS Cloud provides many of the building blocks required to help customers implement a secure, flexible, and cost-effective data lake. These include AWS managed services that help ingest, store, find, process, and analyze both structured and unstructured data. To support our customers as they build data lakes, AWS offers the data lake solution, which is an automated reference implementation that deploys a highly available, cost-effective data lake architecture on the AWS Cloud along with a user-friendly console for searching and requesting datasets.
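    As a minimal sketch of the building-block approach, here is boto3 running an Athena query over data stored in S3; the database, table, and results bucket are hypothetical.

    ```python
    import time

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Start a query against a hypothetical Glue database and table backed by S3.
    qid = athena.start_query_execution(
        QueryString="SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page",
        QueryExecutionContext={"Database": "my_lake"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state == "SUCCEEDED":
        for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
            print([col.get("VarCharValue") for col in row["Data"]])
    ```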
  • 16
    Azure Data Lake Analytics
    Easily develop and run massively parallel data transformation and processing programs in U-SQL, R, Python, and .NET over petabytes of data. With no infrastructure to manage, you can process data on demand, scale instantly, and only pay per job. Process big data jobs in seconds with Azure Data Lake Analytics. There is no infrastructure to worry about because there are no servers, virtual machines, or clusters to wait for, manage, or tune. Instantly scale the processing power, measured in Azure Data Lake Analytics Units (AU), from one to thousands for each job. You only pay for the processing that you use per job. Act on all of your data with optimized data virtualization of your relational sources such as Azure SQL Database and Azure Synapse Analytics. Your queries are automatically optimized by moving processing close to the source data without data movement, which maximizes performance and minimizes latency.
    Starting Price: $2 per hour
  • 17
    Cazena
    Cazena’s Instant Data Lake accelerates time to analytics and AI/ML from months to minutes. Powered by its patented automated data platform, Cazena delivers the first SaaS experience for data lakes. Zero operations required. Enterprises need a data lake that easily supports all of their data and tools for analytics, machine learning and AI. To be effective, a data lake must offer secure data ingestion, flexible data storage, access and identity management, tool integration, optimization and more. Cloud data lakes are complicated to do yourself, which is why they require expensive teams. Cazena’s Instant Cloud Data Lakes are instantly production-ready for data loading and analytics. Everything is automated, supported on Cazena’s SaaS Platform with continuous Ops and self-service access via the Cazena SaaS Console. Cazena's Instant Data Lakes are turnkey and production-ready for secure data ingest, storage and analytics.
  • 18
    Dimodelo
    Stay focused on delivering valuable and impressive reporting, analytics, and insights instead of being stuck in data warehouse code. Don’t let your data warehouse become a jumble of hundreds of hard-to-maintain pipelines, notebooks, stored procedures, tables, and views. Dimodelo DW Studio dramatically reduces the effort required to design, build, deploy, and run a data warehouse. Design, generate, and deploy a data warehouse targeting Azure Synapse Analytics. Utilizing parallel bulk loads, in-memory tables, and a best-practice architecture built on Azure Data Lake, PolyBase, and Azure Synapse Analytics, Dimodelo Data Warehouse Studio delivers a high-performance, modern data warehouse in the cloud.
    Starting Price: $899 per month
  • 19
    Azure Storage Explorer
    Manage your storage accounts in multiple subscriptions across all Azure regions, Azure Stack, and Azure Government. Add new features and capabilities with extensions to manage even more of your cloud storage needs. Accessible, intuitive, and feature-rich graphical user interface (GUI) for full management of cloud storage resources. Securely access your data using Azure AD and fine-tuned access control list (ACL) permissions. Efficiently connect and manage your Azure storage service accounts and resources across subscriptions and organizations. Create, delete, view, edit, and manage resources for Azure Storage, Azure Data Lake Storage, and Azure managed disks. Seamlessly view, search, and interact with your data and resources using an intuitive interface. Improved accessibility with multiple screen reader options, high contrast themes, and hot keys on Windows and macOS.
  • 20
    SelectDB
    SelectDB is a modern data warehouse based on Apache Doris that supports rapid query analysis on large-scale real-time data. Migrating from ClickHouse to Apache Doris enables separating storage from compute and upgrading to a lakehouse architecture. Kuaishou's OLAP system, for example, carries nearly 1 billion query requests every day, providing data services for multiple scenarios. Because the original lake-warehouse architecture suffered from storage redundancy, resource contention, complicated governance, and difficult query tuning, Apache Doris was introduced as the lakehouse layer, combining Doris's materialized-view rewriting capability and automated services to achieve high-performance queries and flexible data governance. Write real-time data in seconds and synchronize streaming data from databases and data streams. The storage engine supports real-time updates, appends, and pre-aggregation.
    Starting Price: $0.22 per hour
  • 21
    Ganymede
    Metadata such as instrument settings, last date of service, which user performed the analysis, and experiment time is often not tracked. Raw data is lost; analyses cannot be modified or re-run without substantial effort. Lack of traceability makes meta-analyses difficult. Even just entering the primary analysis results becomes a drag on scientists' productivity. With Ganymede, raw data is saved in the cloud and analysis is automated, with traceability in between. Data can then go into ELNs/LIMS, Excel, analysis apps, pipelines, anything. We also build a data lake of this as we go. All your raw data, analyzed data, metadata, and even the internal data from your integrated apps is saved forever in a single cloud data lake. Run analyses automatically and add metadata automatically. Push results into any app or pipeline, or even back to instruments for control.
  • 22
    SAS Data Loader for Hadoop
    Load your data into or out of Hadoop and data lakes. Prep it so it's ready for reports, visualizations or advanced analytics – all inside the data lakes. And do it all yourself, quickly and easily. Makes it easy to access, transform and manage data stored in Hadoop or data lakes with a web-based interface that reduces training requirements. Built from the ground up to manage big data on Hadoop or in data lakes; not repurposed from existing IT-focused tools. Lets you group multiple directives to run simultaneously or one after the other. Schedule and automate directives using the exposed Public API. Enables you to share and secure directives. Call them from SAS Data Integration Studio, uniting technical and nontechnical user activities. Includes built-in directives – casing, gender and pattern analysis, field extraction, match-merge and cluster-survive. Profiling runs in-parallel on the Hadoop cluster for better performance.
  • 23
    Tarsal
    Tarsal's infinite scalability means as your organization grows, Tarsal grows with you. Tarsal makes it easy for you to switch where you're sending data; today's SIEM data is tomorrow's data lake data, all with one click. Keep your SIEM and gradually migrate analytics over to a data lake. You don't have to rip anything out to use Tarsal. Some analytics just won't run on your SIEM. Use Tarsal to have query-ready data on a data lake. Your SIEM is one of the biggest line items in your budget. Use Tarsal to send some of that data to your data lake. Tarsal is the first highly scalable ETL data pipeline built for security teams. Easily exfil terabytes of data in just a few clicks, with instant normalization, and route that data to your desired destination.
  • 24
    IBM watsonx.data
    Put your data to work, wherever it resides, with the open, hybrid data lakehouse for AI and analytics. Connect your data from anywhere, in any format, and access it through a single point of entry with a shared metadata layer. Optimize workloads for price and performance by pairing the right workloads with the right query engine. Embed natural-language semantic search without the need for SQL, so you can unlock generative AI insights faster. Manage and prepare trusted data to improve the relevance and precision of your AI applications. Use all your data, everywhere. With the speed of a data warehouse, the flexibility of a data lake, and special features to support AI, watsonx.data can help you scale AI and analytics across your business. Choose the right engines for your workloads. Flexibly manage cost, performance, and capability with access to multiple open engines including Presto, Presto C++, Spark, Milvus, and more.
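    Because watsonx.data exposes Presto engines, a standard Presto client can query it. A minimal sketch using the presto-python-client package, with a hypothetical endpoint, catalog, and credentials:

    ```python
    import prestodb

    # Hypothetical host, catalog, schema, and credentials for a watsonx.data
    # Presto engine.
    conn = prestodb.dbapi.connect(
        host="presto.example.watsonx.cloud",
        port=8443,
        user="analyst",
        catalog="iceberg_data",
        schema="sales",
        http_scheme="https",
        auth=prestodb.auth.BasicAuthentication("analyst", "<api-key>"),
    )
    cur = conn.cursor()
    cur.execute("SELECT region, SUM(amount) AS total FROM orders GROUP BY region")
    for row in cur.fetchall():
        print(row)
    ```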
  • 25
    NooBaa (Red Hat)
    NooBaa is a software-driven infrastructure that enables agility, flexibility, and hybrid cloud capabilities. A deployment takes 5 minutes from download to an operational system. With unprecedented flexibility, pay-as-you-go pricing, and incredible management simplicity, NooBaa represents an entirely new approach to managing the explosive growth of data. NooBaa can consume data from AWS S3, Microsoft Azure Blobs, Google Storage, or any AWS S3-compatible private cloud storage. Eliminate vendor lock-in, allowing your application software stack to be independent of the underlying infrastructure. This independence also creates the interoperability required for fast migration or expansion of workloads. It allows you to run a specific workload on a specific platform without worrying about the storage. NooBaa provides an AWS S3-compatible API, the de facto standard, independent of any specific vendor or location.
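    Because NooBaa exposes an AWS S3-compatible API, any stock S3 SDK works once pointed at a NooBaa endpoint. A minimal boto3 sketch; the endpoint URL and keys are placeholders.

    ```python
    import boto3

    # Point a standard S3 client at the NooBaa endpoint (placeholder values).
    s3 = boto3.client(
        "s3",
        endpoint_url="https://noobaa.example.com",
        aws_access_key_id="<noobaa-access-key>",
        aws_secret_access_key="<noobaa-secret-key>",
    )

    s3.create_bucket(Bucket="demo-bucket")
    s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"portable across backends")
    print(s3.get_object(Bucket="demo-bucket", Key="hello.txt")["Body"].read())
    ```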
  • 26
    Cloud Storage Manager (SmiKar Software)
    Azure storage consumption is growing at an incredible pace, even faster than originally predicted. Organizations have an ever-growing data footprint and are therefore eager to take advantage of Azure and its limitless supply of storage and resources. However, as an organization’s storage requirements grow, it’s easy to lose track of where all the storage is being consumed, which also means the Azure storage cost keeps going up, often causing cost blowouts. With Cloud Storage Manager you will be able to instantly see where all your storage is going, allowing you to take back control and save money. Cloud Storage Manager provides you with an Azure Explorer-like view of all your Azure Blobs and what resides in your Azure Files. From this view you can see details of each individual Blob, including Blob size, the date the Azure Blob was created and last modified, and which storage tier the Blob is currently in.
    Starting Price: $500
  • 27
    Apache Doris (The Apache Software Foundation)
    Apache Doris is a modern data warehouse for real-time analytics. It delivers lightning-fast analytics on real-time data at scale. Push-based micro-batch and pull-based streaming data ingestion within a second. Storage engine with real-time upsert, append, and pre-aggregation. Optimized for high-concurrency and high-throughput queries with a columnar storage engine, MPP architecture, cost-based query optimizer, and vectorized execution engine. Federated querying of data lakes such as Hive, Iceberg, and Hudi, and databases such as MySQL and PostgreSQL. Compound data types such as Array, Map, and JSON. Variant data type to support automatic type inference for JSON data. NGram bloom filter and inverted index for text searches. Distributed design for linear scalability. Workload isolation and tiered storage for efficient resource management. Supports shared-nothing clusters as well as separation of storage and compute.
    Starting Price: Free
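    Doris frontends speak the MySQL protocol, so any MySQL client can run queries. A minimal sketch with pymysql, assuming the conventional FE query port 9030 and a hypothetical table:

    ```python
    import pymysql

    # Doris FE nodes accept MySQL-protocol connections (placeholder host/creds).
    conn = pymysql.connect(host="doris-fe.example.com", port=9030,
                           user="root", password="", database="demo")
    with conn.cursor() as cur:
        # High-concurrency aggregation served by the columnar MPP engine.
        cur.execute("""
            SELECT user_id, COUNT(*) AS events
            FROM user_events
            GROUP BY user_id
            ORDER BY events DESC
            LIMIT 10
        """)
        for row in cur.fetchall():
            print(row)
    ```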
  • 28
    Oracle Big Data Service
    Oracle Big Data Service makes it easy for customers to deploy Hadoop clusters of all sizes, with VM shapes ranging from 1 OCPU to a dedicated bare metal environment. Customers choose between high-performance NVMe storage or cost-effective block storage, and can grow or shrink their clusters. Quickly create Hadoop-based data lakes to extend or complement customer data warehouses, and ensure that all data is both accessible and managed cost-effectively. Query, visualize, and transform data so data scientists can build machine learning models using the included notebook with its R, Python, and SQL support. Move customer-managed Hadoop clusters to a fully managed cloud-based service, reducing management costs and improving resource utilization.
    Starting Price: $0.1344 per hour
  • 29
    Azure Chaos Studio
    Improve application resilience with chaos engineering and testing by deliberately introducing faults that simulate real-world outages. Azure Chaos Studio is a fully managed chaos engineering experimentation platform for accelerating the discovery of hard-to-find problems, from late-stage development through production. Disrupt your apps intentionally to identify gaps and plan mitigations before your customers are impacted by a problem. Experiment by subjecting your Azure apps to real or simulated faults in a controlled manner to better understand application resilience. Observe how your apps will respond to real-world disruptions such as network latency, an unexpected storage outage, expiring secrets, or even a full data center outage with chaos engineering and testing. Validate product quality when and where it makes sense for your organization. Take advantage of a hypothesis-based approach to drive application resilience with integrated chaos in your CI/CD pipeline.
    Starting Price: $0.10 per action-minute
  • 30
    Qlik Compose
    Qlik Compose for Data Warehouses provides a modern approach by automating and optimizing data warehouse creation and operation. Qlik Compose automates designing the warehouse, generating ETL code, and quickly applying updates, all whilst leveraging best practices and proven design patterns. Qlik Compose for Data Warehouses dramatically reduces the time, cost and risk of BI projects, whether on-premises or in the cloud. Qlik Compose for Data Lakes automates your data pipelines to create analytics-ready data sets. By automating data ingestion, schema creation, and continual updates, organizations realize faster time-to-value from their existing data lake investments.
  • 31
    Etleap
    Etleap was built from the ground up on AWS to support Redshift and Snowflake data warehouses and S3/Glue data lakes. Their solution simplifies and automates ETL by offering fully managed ETL-as-a-service. Etleap's data wrangler and modeling tools let users control how data is transformed for analysis, without writing any code. Etleap monitors and maintains data pipelines for availability and completeness, eliminating the need for constant maintenance, and centralizes data from 50+ disparate sources and silos into your data warehouse or data lake.
  • 32
    Hyper Historian
    ICONICS’ Hyper Historian™ is an advanced 64-bit high-speed, reliable, and robust historian. Designed for the most mission-critical applications, Hyper Historian's advanced high compression algorithm delivers unparalleled performance with very efficient use of resources. Hyper Historian integrates with our ISA-95-compliant asset database and the latest big data technologies, including Azure SQL, Microsoft Data Lakes, Kafka, and Hadoop. This makes Hyper Historian the most efficient and secure real-time plant historian for any Microsoft operating system. Hyper Historian includes a module for automatic or manual insertion of data, empowering users to import historical or log data from databases, other historians, or intermittently connected field devices and equipment. This also provides for greatly increased reliability in capturing all data, even when network disruptions occur. Leverage rapid collection for enterprise-wide storage.
  • 33
    Observo AI
    Observo AI is an AI-native data pipeline platform designed to address the challenges of managing vast amounts of telemetry data in security and DevOps operations. By leveraging machine learning and agentic AI, Observo AI automates data optimization, enabling enterprises to process AI-generated data more efficiently, securely, and cost-effectively. It reduces data processing costs by over 50% and accelerates incident response times by more than 40%. Observo AI's features include intelligent data deduplication and compression, real-time anomaly detection, and dynamic data routing to appropriate storage or analysis tools. It also enriches data streams with contextual information to enhance threat detection accuracy while minimizing false positives. Observo AI offers a searchable cloud data lake for efficient data storage and retrieval.
  • 34
    AWS HealthLake
    Extract meaning from unstructured data with integrated Amazon Comprehend Medical for easy search and querying. Make predictions on health data using Amazon Athena queries, Amazon SageMaker ML models, and Amazon QuickSight analytics. Support interoperable standards such as the Fast Healthcare Interoperability Resources (FHIR). Run medical imaging applications in the cloud to increase scale and reduce costs. AWS HealthLake is a HIPAA-eligible service offering healthcare and life sciences companies a chronological view of individual or patient population health data for query and analytics at scale. Analyze population health trends, predict outcomes, and manage costs with advanced analytics tools and ML models. Identify opportunities to close gaps in care and deliver targeted interventions with a longitudinal view of patient journeys. Apply advanced analytics and ML to newly structured data to optimize appointment scheduling and reduce unnecessary procedures.
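    A minimal boto3 sketch of the service surface, creating and listing FHIR R4 data stores; the data store name is a placeholder.

    ```python
    import boto3

    healthlake = boto3.client("healthlake", region_name="us-east-1")

    # Create a FHIR R4 data store (name is a placeholder).
    resp = healthlake.create_fhir_datastore(
        DatastoreName="patient-records",
        DatastoreTypeVersion="R4",
    )
    print(resp["DatastoreId"], resp["DatastoreStatus"])

    # List existing data stores and their statuses.
    for ds in healthlake.list_fhir_datastores()["DatastorePropertiesList"]:
        print(ds["DatastoreName"], ds["DatastoreStatus"])
    ```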
  • 35
    Archon Data Store (Platform 3 Solutions)
    Archon Data Store™ is a powerful and secure open-source based archive lakehouse platform designed to store, manage, and provide insights from massive volumes of data. With its compliance features and minimal footprint, it enables large-scale search, processing, and analysis of structured, unstructured, & semi-structured data across your organization. Archon Data Store combines the best features of data warehouses and data lakes into a single, simplified platform. This unified approach eliminates data silos, streamlining data engineering, analytics, data science, and machine learning workflows. Through metadata centralization, optimized data storage, and distributed computing, Archon Data Store maintains data integrity. Its common approach to data management, security, and governance helps you operate more efficiently and innovate faster. Archon Data Store provides a single platform for archiving and analyzing all your organization's data while delivering operational efficiencies.
  • 36
    Google Cloud Data Fusion
    Open core, delivering hybrid and multi-cloud integration. Data Fusion is built using open source project CDAP, and this open core ensures data pipeline portability for users. CDAP’s broad integration with on-premises and public cloud platforms gives Cloud Data Fusion users the ability to break down silos and deliver insights that were previously inaccessible. Integrated with Google’s industry-leading big data tools. Data Fusion’s integration with Google Cloud simplifies data security and ensures data is immediately available for analysis. Whether you’re curating a data lake with Cloud Storage and Dataproc, moving data into BigQuery for data warehousing, or transforming data to land it in a relational store like Cloud Spanner, Cloud Data Fusion’s integration makes development and iteration fast and easy.
  • 37
    Altada (Altada Technology Solutions)
    The Altada AI Platform empowers our customers to leverage their data fabric to achieve extraordinary outcomes in automation and data-driven decision making. We provide a complete view of the data supply chain using ingestion, indexing, data remediations and inference, enabling businesses to scale, increase profitability and realise measurable impact. Ingesting data from client data lakes to our secure storage systems through a robust scalable data pipeline. Classifying, categorizing and validating the scanned documents in a matter of seconds using advanced image classification techniques along with NLP techniques. A query interface and personalizable dashboard will allow the user to search the data and present the results in a readable format to bookmark, filter or restructure the view.
  • 38
    PuppyGraph
    PuppyGraph empowers you to seamlessly query one or multiple data stores as a unified graph model. Graph databases are expensive, take months to set up, and need a dedicated team. Traditional graph databases can take hours to run multi-hop queries and struggle beyond 100GB of data. A separate graph database complicates your architecture with brittle ETLs and inflates your total cost of ownership (TCO). Connect to any data source anywhere. Cross-cloud and cross-region graph analytics. No complex ETLs or data replication is required. PuppyGraph enables you to query your data as a graph by directly connecting to your data warehouses and lakes. This eliminates the need to build and maintain time-consuming ETL pipelines needed with a traditional graph database setup. No more waiting for data and failed ETL processes. PuppyGraph eradicates graph scalability issues by separating computation and storage.
    Starting Price: Free
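    Assuming a PuppyGraph deployment with its Gremlin endpoint enabled, a minimal gremlinpython sketch of a multi-hop traversal over warehouse/lake data; the host, credentials, and schema are hypothetical.

    ```python
    from gremlin_python.driver import client, serializer

    # Hypothetical PuppyGraph Gremlin endpoint and credentials.
    gremlin = client.Client(
        "ws://puppygraph.example.com:8182/gremlin",
        "g",
        username="puppygraph",
        password="<password>",
        message_serializer=serializer.GraphSONSerializersV3d0(),
    )

    # A multi-hop traversal executed directly against lake tables, no ETL copy.
    hops = gremlin.submit(
        "g.V().hasLabel('account').out('pays').out('pays').count()"
    ).all().result()
    print(hops)
    gremlin.close()
    ```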
  • 39
    Symantec Cloud Workload Protection
    Many applications and services running in public clouds use Amazon S3 buckets and Azure Blob storage. Over time, storage can become contaminated with malware, misconfigured buckets can allow data breaches, and unclassified sensitive data can result in compliance violations and fines. CWP for Storage automatically discovers and scans Amazon S3 buckets and Azure Blobs to keep cloud storage clean and secure. CWP for Storage DLP applies Symantec DLP policy to Amazon S3 to discover and classify sensitive information. AWS Tags can be applied as needed for remediation and further actions over time. Cloud security posture management (CSPM) for Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Containers improve agility; however, they also bring public cloud security challenges and vulnerabilities that increase risk.
  • 40
    Quantarium
    Built on the foundation of real AI, Quantarium’s innovative-yet-explainable solutions enable more accurate decision making, comprehensively spanning valuations, analytics, propensity models and portfolio optimization. The most accurate real estate insights into property values and trends instantly. Industry-leading highly scalable and resilient next-generation cloud Infrastructure. Quantarium’s adaptive AI computer vision technology is trained on millions of real estate images, and its knowledge is then incorporated into a range of QVM-based solutions. An asset within the Quantarium Data Lake, our managed data set is the most comprehensive and dynamic in the real estate industry. A machine-generated and AI-enhanced data set, curated by AI scientists, data scientists, software engineers, and industry experts, this is the new standard in real estate information. Quantarium combines deep domain expertise, self-learning technology, and innovative computer vision.
  • 41
    Auguria
    Auguria is a cloud-native security data platform that harnesses human-machine teaming to extract the 1 percent of event data that matters from billions of logs in real time by cleansing, denoising, and ranking security events. At its core is the Auguria Security Knowledge Layer, a vector database and embedding engine built on an ontology distilled from decades of real-world SecOps experience, which semantically groups trillions of events into investigation-worthy insights. Without requiring expert data engineering, users can connect any data source to an automated pipeline that prioritizes, filters, and routes events to SIEM, XDR, data lakes, or object storage. Auguria continuously updates its state-of-the-art AI models with new security signals and state-specific context, provides anomaly scoring and justifications for each event, and delivers real-time dashboards and analytics to accelerate incident triage, threat hunting, and compliance.
  • 42
    SAP IQ
    Enhance in-the-moment decision-making with SAP IQ, our columnar relational database management system (RDBMS) optimized for Big Data analytics. Gain maximum speed, power, and security, while supporting extreme-scale enterprise data warehousing and Big Data analytics, with this affordable, efficient RDBMS for SAP Business Technology Platform. Deploy as a fully managed cloud service on a major hyperscale platform. Ingest, store, and query large data volumes within a relational data lake that provides native object storage for most file types. Provide a compatible, fully managed cloud option for SAP IQ customers that extends existing Sybase investments. Simplify the migration of existing SAP IQ databases to the cloud. Make Big Data available faster to applications and people for in-the-moment decisions.
  • 43
    Trino
    Trino is a query engine that runs at ludicrous speed. It is a fast, distributed SQL query engine for big data analytics that helps you explore your data universe. Trino is a highly parallel and distributed query engine built from the ground up for efficient, low-latency analytics. The largest organizations in the world use Trino to query exabyte-scale data lakes and massive data warehouses alike. It supports diverse use cases: ad-hoc analytics at interactive speeds, massive multi-hour batch queries, and high-volume apps that perform sub-second queries. Trino is an ANSI SQL-compliant query engine that works with BI tools such as R, Tableau, Power BI, Superset, and many others. You can natively query data in Hadoop, S3, Cassandra, MySQL, and many others, without the need for complex, slow, and error-prone processes for copying the data. Access data from multiple systems within a single query.
    Starting Price: Free
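    A minimal sketch with the trino Python client, showing a single federated query joining a lake catalog with an operational database; the host, catalogs, and tables are hypothetical.

    ```python
    import trino

    # Connect to a Trino coordinator (placeholder host and catalogs).
    conn = trino.dbapi.connect(
        host="trino.example.com", port=8080, user="analyst",
        catalog="hive", schema="default",
    )
    cur = conn.cursor()

    # One query spanning S3-backed lake data and a MySQL table, no copying.
    cur.execute("""
        SELECT o.region, COUNT(*) AS orders
        FROM hive.sales.orders AS o
        JOIN mysql.crm.customers AS c ON o.customer_id = c.id
        GROUP BY o.region
    """)
    print(cur.fetchall())
    ```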
  • 44
    Cribl Lake
    Storage that doesn’t lock data in. Get up and running fast with a managed data lake. Easily store, access, and retrieve data, without being a data expert. Cribl Lake keeps you from drowning in data. Easily store, manage, enforce policy on, and access data when you need it. Dive into the future with open formats and unified retention, security, and access control policies. Let Cribl handle the heavy lifting so data can be usable and valuable to the teams and tools that need it. Minutes, not months, to get up and running with Cribl Lake. Zero configuration with automated provisioning and out-of-the-box integrations. Streamline workflows with Stream and Edge for powerful data ingestion and routing. Cribl Search unifies queries no matter where data is stored, so you can get value from data without delays. Take an easy path to collecting and storing data for long-term retention. Comply with legal and business requirements for data retention by defining specific retention periods.
  • 45
    Deep Lake (activeloop)
    Generative AI may be new, but we've been building for this day for the past 5 years. Deep Lake thus combines the power of both data lakes and vector databases to build and fine-tune enterprise-grade, LLM-based solutions, and iteratively improve them over time. Vector search alone does not solve retrieval. To solve it, you need serverless queries over multi-modal data, including embeddings and metadata. Filter, search, and more from the cloud or your laptop. Visualize and understand your data, as well as the embeddings. Track and compare versions over time to improve your data and your model. Competitive businesses are not built on OpenAI APIs. Fine-tune your LLMs on your data. Efficiently stream data from remote storage to the GPUs as models are trained. Deep Lake datasets are visualized right in your browser or Jupyter Notebook. Instantly retrieve different versions of your data, materialize new datasets via queries on the fly, and stream them to PyTorch or TensorFlow.
    Starting Price: $995 per month
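    A minimal sketch of the Deep Lake 3.x Python API, creating a local multi-modal dataset with built-in versioning; the path and tensor layout are illustrative.

    ```python
    import numpy as np

    import deeplake

    # Local placeholder path; cloud paths such as 's3://...' work the same way.
    ds = deeplake.empty("./demo_lake", overwrite=True)
    ds.create_tensor("images", htype="image", sample_compression="jpeg")
    ds.create_tensor("embeddings", htype="embedding")

    # Append multi-modal samples; commits create retrievable versions.
    with ds:
        ds.images.append(np.zeros((8, 8, 3), dtype=np.uint8))
        ds.embeddings.append(np.random.rand(128).astype(np.float32))
    ds.commit("first version")
    ```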
  • 46
    Locus (EQ Works)
    With multiple environments to work in, Locus provides a streamlined path to geospatial data for everyone: simplified deep analysis for tech-challenged marketers, deep query analysis for data scientists and analysts, and top-level metrics for data-driven execs hungry to find their next success. This provides the most secure, efficient, and seamless way to connect other data sources or your data lake to LOCUS. Connection Hub has integrated data lineage governance and transformation capabilities built in, allowing for further integration with tools such as LOCUS Notebook and LOCUS QL. EQ builds its own directed acyclic graph (DAG) processor on top of the popular Apache Airflow framework. The DAG Builder has been engineered to crunch (and munch) your geospatial workflows with over twenty built-in helper stages.
  • 47
    MovingLake
    MovingLake provides state-of-the-art real-time data connectors for infrastructure, hospitality, and e-commerce. Power your data warehouse, databases, and data lakes, as well as your microservices using the same API connectors, and get consistent data across all your systems. Make data-driven decisions faster with MovingLake!
  • 48
    Azure FXT Edge Filer
    Create cloud-integrated hybrid storage that works with your existing network-attached storage (NAS) and Azure Blob Storage. This on-premises caching appliance optimizes access to data in your datacenter, in Azure, or across a wide-area network (WAN). A combination of software and hardware, Microsoft Azure FXT Edge Filer delivers high throughput and low latency for hybrid storage infrastructure supporting high-performance computing (HPC) workloads. Scale-out clustering provides non-disruptive NAS performance scaling. Join up to 24 FXT nodes per cluster to scale to millions of IOPS and hundreds of GB/s. When you need performance and scale in file-based workloads, Azure FXT Edge Filer keeps your data on the fastest path to processing resources. Managing data storage is easy with Azure FXT Edge Filer. Shift aging data to Azure Blob Storage to keep it easily accessible with minimal latency. Balance on-premises and cloud storage.
  • 49
    Iterative
    AI teams face challenges that require new technologies. We build these technologies. Existing data warehouses and data lakes do not fit unstructured datasets like text, images, and videos. AI goes hand in hand with software development. Built with data scientists, ML engineers, and data engineers in mind. Don’t reinvent the wheel! Fast and cost-efficient path to production. Your data is always stored by you. Your models are trained on your machines. Studio is an extension of GitHub, GitLab, or Bitbucket. Sign up for the online SaaS version or contact us to get an on-premises installation.
  • 50
    DataLakeHouse.io
    DataLakeHouse.io (DLH.io) Data Sync provides replication and synchronization of operational systems (on-premise and cloud-based SaaS) data into destinations of their choosing, primarily Cloud Data Warehouses. Built for marketing teams and really any data team at any size organization, DLH.io enables business cases for building single-source-of-truth data repositories such as dimensional data warehouses, data vault 2.0, and other machine learning workloads. Use cases are technical and functional, including ELT, ETL, Data Warehouse, Pipeline, Analytics, AI & Machine Learning, Data, Marketing, Sales, Retail, FinTech, Restaurant, Manufacturing, Public Sector, and more. DataLakeHouse.io is on a mission to orchestrate data for every organization, particularly those desiring to become data-driven or those continuing their data-driven strategy journey. DataLakeHouse.io (aka DLH.io) enables hundreds of companies to manage their cloud data warehousing and analytics solutions.
    Starting Price: $99