Kylo
Kylo is an open source enterprise-ready data lake management software platform for self-service data ingest and data preparation with integrated metadata management, governance, security and best practices inspired by Think Big's 150+ big data implementation projects. Self-service data ingest with data cleansing, validation, and automatic profiling. Wrangle data with visual sql and an interactive transform through a simple user interface. Search and explore data and metadata, view lineage, and profile statistics. Monitor health of feeds and services in the data lake. Track SLAs and troubleshoot performance. Design batch or streaming pipeline templates in Apache NiFi and register with Kylo to enable user self-service. Organizations can expend significant engineering effort moving data into Hadoop yet struggle to maintain governance and data quality. Kylo dramatically simplifies data ingest by shifting ingest to data owners through a simple guided UI.
Learn more
AWS Lake Formation
AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake lets you break down data silos and combine different types of analytics to gain insights and guide better business decisions. Setting up and managing data lakes today involves a lot of manual, complicated, and time-consuming tasks. This work includes loading data from diverse sources, monitoring those data flows, setting up partitions, turning on encryption and managing keys, defining transformation jobs and monitoring their operation, reorganizing data into a columnar format, deduplicating redundant data, and matching linked records. Once data has been loaded into the data lake, you need to grant fine-grained access to datasets, and audit access over time across a wide range of analytics and machine learning (ML) tools and services.
Learn more
BigLake
BigLake is a storage engine that unifies data warehouses and lakes by enabling BigQuery and open-source frameworks like Spark to access data with fine-grained access control. BigLake provides accelerated query performance across multi-cloud storage and open formats such as Apache Iceberg. Store a single copy of data with uniform features across data warehouses & lakes. Fine-grained access control and multi-cloud governance over distributed data. Seamless integration with open-source analytics tools and open data formats. Unlock analytics on distributed data regardless of where and how it’s stored, while choosing the best analytics tools, open source or cloud-native over a single copy of data. Fine-grained access control across open source engines like Apache Spark, Presto, and Trino, and open formats such as Parquet. Performant queries over data lakes powered by BigQuery. Integrates with Dataplex to provide management at scale, including logical data organization.
Learn more
Archon Data Store
Archon Data Store™ is a powerful and secure open-source based archive lakehouse platform designed to store, manage, and provide insights from massive volumes of data. With its compliance features and minimal footprint, it enables large-scale search, processing, and analysis of structured, unstructured, & semi-structured data across your organization. Archon Data Store combines the best features of data warehouses and data lakes into a single, simplified platform. This unified approach eliminates data silos, streamlining data engineering, analytics, data science, and machine learning workflows. Through metadata centralization, optimized data storage, and distributed computing, Archon Data Store maintains data integrity. Its common approach to data management, security, and governance helps you operate more efficiently and innovate faster. Archon Data Store provides a single platform for archiving and analyzing all your organization's data while delivering operational efficiencies.
Learn more