117 Integrations with Apache Hive

View a list of Apache Hive integrations and software that integrates with Apache Hive below. Compare the best Apache Hive integrations as well as features, ratings, user reviews, and pricing of software that integrates with Apache Hive. Here are the current Apache Hive integrations in 2025:

  • 1
    Jotform

    Trusted by over 25 million users, Jotform is an all-in-one, no-code platform that simplifies data collection, automation, and online sales. Using its drag-and-drop Form Builder, businesses can create customized forms and surveys to collect leads, payments, and e-signatures. With 10,000+ templates and advanced features like conditional logic and 200+ integrations, Jotform streamlines workflows. Jotform's AI-powered Agents provide real-time customer support, guiding users through form submissions, answering questions, and ensuring a smooth experience while reducing manual intervention. These AI agents learn from interactions to improve responses, enhancing efficiency and customer satisfaction. The platform also includes a Store Builder to sell products and services, accept payments through 30+ gateways, and tools like Approvals and Report Builder to automate workflows and generate actionable insights.
    Starting Price: $34 per month
  • 2
    DbVisualizer

    DbVisualizer is one of the world's most popular database editors. With almost 7 million downloads and Pro users in 150 countries worldwide, it won't disappoint you. Free and Pro versions are available. Developers, analysts, and DBAs use it to elevate their SQL experience with modern tools to visualize and manage their databases, schemas, objects, and table data, auto-generate, write, and optimize queries, and so much more. It connects to all popular databases, such as MySQL, PostgreSQL, SQL Server, Oracle, Cassandra, Snowflake, SQLite, BigQuery, and 30+ others, and runs on all popular OSes (Windows, macOS, and Linux). It includes a powerful SQL editor with intelligent autocomplete, visual query builders, variables, and more. You can fully control window layouts, key bindings, and the UI theme, mark scripts and database objects as favorites for quick access, or even work outside of DbVisualizer. DbVisualizer is also built to meet rigorous security standards, all configurable within the product.
    Starting Price: Free
  • 3
    Omniscope Evo
    Visokio builds Omniscope Evo, complete and extensible BI software for data processing, analytics and reporting. A smart experience on any device. Start from any data in any shape, load, edit, blend, transform while visually exploring it, extract insights through ML algorithms, automate your data workflows, and publish interactive reports and dashboards to share your findings. Omniscope is not only an all-in-one BI tool with a responsive UX on all modern devices, but also a powerful and extensible platform: you can augment data workflows with Python / R scripts and enhance reports with any JS visualisation. Whether you’re a data manager, scientist or analyst, Omniscope is your complete solution: from data, through analytics to visualisation.
    Starting Price: $59/month/user
  • 4
    DataGrip

    JetBrains

    Meet DataGrip, our new database IDE tailored to suit the specific needs of professional SQL developers. It allows you to execute queries in different modes and provides a local history that keeps track of all your activity and protects you from losing your work. It lets you jump to any table, view, or procedure by its name via the corresponding action, or directly from its usages in the SQL code. It gives you extended insight into how your queries work and into database engine behavior, so you can make your queries more efficient. DataGrip provides context-sensitive code completion, helping you write SQL code faster. Completion is aware of the table structure, foreign keys, and even database objects created in the code you're editing. DataGrip detects probable bugs in your code and suggests the best options to fix them on the fly. It will immediately let you know about unresolved objects and keywords used as identifiers, and always offers a way to fix the problems.
    Starting Price: $199 per year
  • 5
    DBeaver

    Free multi-platform database tool for developers, database administrators, analysts, and all people who need to work with databases. Supports all popular databases: MySQL, PostgreSQL, SQLite, Oracle, DB2, SQL Server, Sybase, MS Access, Teradata, Firebird, Apache Hive, Phoenix, Presto, etc. Recent releases have added a Copy As format configuration editor, extra configuration for the filter dialog (performance), fixed sorting by column for small fetch sizes, case-insensitive filters, top/bottom dividers in the plaintext view, a data editor fix for column names that conflict with alias names, a fixed duplicate row(s) command for multiple selected rows, the Edit sub-menu restored to the context menu, configurable column auto-sizing, a dictionary viewer fix for read-only connections, and configurable highlighting of the current/selected row.
  • 6
    DataClarity Unlimited Analytics
    DataClarity Unlimited Analytics is the only free modern embeddable data and analytics platform in the world that provides a self-service, powerful, secure and seamless end-to-end experience. Highlights: SIMPLIFIED DATA INTEGRATION – Easily connect, join, curate, cache and catalog diverse data through drag and drop, custom SQL builder & AI-powered data profiling. | INTERACTIVE DASHBOARDS – craft compelling reports using 80 stunning visualizations, geospatial maps and flexibility to bring your own charts. | REAL-TIME ANALYSIS – Perform advanced analysis & data exploration using drill-down, drill-through, filters, built-in statistical & predictive models, or your own Python and R code. | SEAMLESS APPLICATION INTEGRATION – Achieve smooth integration with versatile APIs, tailor-made configurations & flexible embedding features. | SECURITY & GOVERNANCE – Ensure adherence to your security guidelines, governance standards, multitenancy, row-level data protection, and Single Sign-On (SSO).
    Starting Price: FREE
  • 7
    Dataiku

    Dataiku is an advanced data science and machine learning platform designed to enable teams to build, deploy, and manage AI and analytics projects at scale. It empowers users, from data scientists to business analysts, to collaboratively create data pipelines, develop machine learning models, and prepare data using both visual and coding interfaces. Dataiku supports the entire AI lifecycle, offering tools for data preparation, model training, deployment, and monitoring. The platform also includes integrations for advanced capabilities like generative AI, helping organizations innovate and deploy AI solutions across industries.
  • 8
    IBM API Connect
    IBM API Connect is a scalable API solution that helps organizations implement a robust API strategy by creating, exposing, managing and monetizing an entire API ecosystem across multiple clouds. As businesses embrace their digital transformation journey, APIs become critical to unlock the value of business data and assets. With increasing adoption of APIs, consistency and governance are needed across the enterprise. API Connect aims to help businesses to get new features to market fast, maintain continuous availability, meet changing user needs, and spur innovation.
  • 9
    Sifflet

    Automatically cover thousands of tables with ML-based anomaly detection and 50+ custom metrics. Comprehensive data and metadata monitoring. Exhaustive mapping of all dependencies between assets, from ingestion to BI. Enhanced productivity and collaboration between data engineers and data consumers. Sifflet seamlessly integrates into your data sources and preferred tools and can run on AWS, Google Cloud Platform, and Microsoft Azure. Keep an eye on the health of your data and alert the team when quality criteria aren’t met. Set up fundamental coverage of all your tables in a few clicks, and configure the frequency of runs, their criticality, and even customized notifications at the same time. Leverage ML-based rules to detect any anomaly in your data, with no need for an initial configuration. A unique model for each rule learns from historical data and from user feedback. Complement the automated rules with a library of 50+ templates that can be applied to any asset.
  • 10
    Activeeon ProActive
    The solution provided by Activeeon is suited to fit modern challenges such as the growth of data, new infrastructures, cloud strategy evolving, new application architecture, etc. It provides orchestration and scheduling to automate and build a solid base for future growth. ProActive Workflows & Scheduling is a java-based cross-platform workflow scheduler and resource manager that is able to run workflow tasks in multiple languages and multiple environments (Windows, Linux, Mac, Unix, etc). ProActive Resource Manager makes compute resources available for task execution. It handles on-premises and cloud compute resources in an elastic, on-demand and distributed fashion. ProActive AI Orchestration from Activeeon empowers data engineers and data scientists with a simple, portable and scalable solution for machine learning pipelines. It provides pre-built and customizable tasks that enable automation within the machine learning lifecycle, which helps data scientists and IT Operations work.
    Starting Price: $10,000
  • 11
    StarfishETL

    StarfishETL is an Integration Platform as a Service (iPaaS), and although “integration” is in the name, it’s capable of much more. An iPaaS lives in the cloud and can integrate different systems by using their APIs. This makes it adaptable beyond integration for migration, data governance, and data cleansing. Unlike traditional integration apps, StarfishETL provides low-code mapping and powerful scripting tools to manage, personalize, and manipulate data at scale. Features: - Drag and drop mapping - AI-powered connections - Purpose built integrations - Extensibility through scripting - Secure on-premises connections - Scalable data capacity
    Starting Price: $400/month
  • 12
    Aqua Data Studio

    AquaFold, an Idera, Inc. company

    Aqua Data Studio is a multiple-platform, integrated development environment (IDE) for data. It provides benefits to a variety of data-centric roles, allowing them to manage a wide range of data sources. Aqua Data Studio provides scalable, cross-platform data management, supporting IT and data-centric specialists, including developers, database administrators, data analysts, data modelers, and data architects. It simplifies tedious tasks involving SQL queries, data, result sets, schemas, data models, files, instances, servers, and automation. Aqua Data Studio can be installed on the three popular operating systems: Microsoft Windows, Apple macOS, and Linux. The graphical user interface can be displayed in several of the most widely spoken languages: English, Spanish, French, German, Korean, Portuguese, Japanese, and Chinese. Aqua Data Studio supports over 40 of the most popular data source platforms, including relational, NoSQL, and managed cloud data sources.
    Starting Price: $499 per user per year
  • 13
    IRI CoSort

    IRI, The CoSort Company

    What is CoSort? IRI CoSort® is a fast, affordable, and easy-to-use sort/merge/report utility, and a full-featured data transformation and preparation package. The world's first sort product off the mainframe, CoSort continues to deliver maximum price-performance and functional versatility for the manipulation and blending of big data sources. CoSort also powers the IRI Voracity data management platform and many third-party tools. What does CoSort do? CoSort runs multi-threaded sort/merge jobs AND many other high-volume (big data) manipulations separately, or in combination. It can also cleanse, mask, convert, and report at the same time. Self-documenting 4GL scripts supported in Eclipse™ help you speed up or replace legacy sort, ETL, and BI tools; COBOL and SQL programs; plus Hadoop, Perl, Python, and other batch jobs. Use CoSort to sort, join, aggregate, and load 2-20X faster than data wrangling and BI tools, 10x faster than SQL transforms, and 6x faster than most ETL tools.
    Starting Price: $4,000 perpetual use
  • 14
    Hackolade

    Hackolade Studio is a powerful data modeling platform that supports a wide range of technologies including relational SQL and NoSQL databases, cloud data warehouses, APIs, streaming platforms, and data exchange formats. Designed for modern data architecture, it enables users to visually design, document, and evolve schemas across systems like Oracle, PostgreSQL, Databricks, Snowflake, MongoDB, Cassandra, DynamoDB, Neo4j, Kafka (with Confluent Schema Registry), OpenAPI, GraphQL, and more. Hackolade Studio offers forward and reverse engineering, schema versioning, model validation, and integration with metadata catalogs such as Unity Catalog and Collibra. It empowers data architects, engineers, and governance teams to collaborate on consistent, governed, and scalable data models. Whether building data products, managing API contracts, or ensuring regulatory compliance, Hackolade Studio streamlines the process in one unified interface.
    Starting Price: €175 per month
  • 15
    Union Cloud

    Union.ai

    Union.ai is an award-winning, Flyte-based data and ML orchestrator for scalable, reproducible ML pipelines. With Union.ai, you can write your code locally and easily deploy pipelines to remote Kubernetes clusters. “Flyte’s scalability, data lineage, and caching capabilities enable us to train hundreds of models on petabytes of geospatial data, giving us an edge in our business.” — Arno, CTO at Blackshark.ai. “With Flyte, we want to give the power back to biologists. We want to stand up something that they can play around with different parameters for their models because not every … parameter is fixed. We want to make sure we are giving them the power to run the analyses.” — Krishna Yeramsetty, Principal Data Scientist at Infinome. “Flyte plays a vital role as a key component of Gojek's ML Platform by providing exactly that.” — Pradithya Aria Pura, Principal Engineer at Gojek.
    Starting Price: Free (Flyte)
  • 16
    Apache Iceberg

    Apache Software Foundation

    Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time. Iceberg supports flexible SQL commands to merge new data, update existing rows, and perform targeted deletes. Iceberg can eagerly rewrite data files for read performance, or it can use delete deltas for faster updates. Iceberg handles the tedious and error-prone task of producing partition values for rows in a table and skips unnecessary partitions and files automatically. No extra filters are needed for fast queries, and the table layout can be updated as data or queries change.
    Starting Price: Free
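For illustration of the MERGE and hidden-partitioning behavior described above, here is a minimal, hypothetical PySpark sketch; the Iceberg runtime package version, the "demo" catalog, and the demo.db.events table are assumptions, not taken from this listing.

```python
from pyspark.sql import SparkSession

# Assumed coordinates: Iceberg Spark runtime on the classpath, a Hive-backed
# catalog named "demo", and a table demo.db.events keyed by event_id.
spark = (
    SparkSession.builder
    .appName("iceberg-merge-sketch")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hive")
    .getOrCreate()
)

# Stage some updates, then merge them in: Iceberg derives partition values per
# row, so no manual partition columns or extra filters are needed.
spark.createDataFrame(
    [(1, "click"), (2, "view")], ["event_id", "event_type"]
).createOrReplaceTempView("updates")

spark.sql("""
    MERGE INTO demo.db.events AS t
    USING updates AS s
    ON t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```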
  • 17
    Datameer

    Datameer revolutionizes data transformation with a low-code approach, trusted by top global enterprises. Craft, transform, and publish data seamlessly with no-code and SQL options, simplifying complex data engineering tasks. Empower your data teams to make informed decisions confidently while saving costs and ensuring responsible self-service analytics. Speed up your analytics workflow by transforming datasets to answer ad-hoc questions and support operational dashboards. Empower everyone on your team with our SQL or drag-and-drop tools to transform your data in an intuitive and collaborative workspace. And best of all, everything happens in Snowflake. Datameer is designed and optimized for Snowflake to reduce data movement and increase platform adoption. Some of the problems Datameer solves: analytics that isn't accessible, drowning in backlog, and long development cycles.
  • 18
    DreamFactory

    DreamFactory Software

    DreamFactory Software is the fastest way to build secure, internal REST APIs. Instantly generate APIs from any database with built-in enterprise security controls, operating on-premises, air-gapped, or in the cloud. Develop 4x faster, save 70% on new projects, remove project management uncertainty, focus talent on truly critical issues, win more clients, and integrate with newer & legacy technologies instantly as needed. DreamFactory is the easiest and fastest way to automatically generate, publish, manage, and secure REST APIs, convert SOAP to REST, and aggregate disparate data sources through a single API platform. See why companies like Disney, Bosch, Netgear, T-Mobile, Intel, and many more are embracing DreamFactory's innovative platform to get a competitive edge. Start a hosted trial or talk to our engineers to get access to an on-prem environment!
    Starting Price: $1500/month
  • 19
    ClicData

    ClicData is the world's first 100% cloud-based Business Intelligence and data management software. With our included data warehouse, you can easily cleanse, combine, transform, and merge any data from any data source. Create interactive and self-updated dashboards that you can share with your manager, your team, or customers in multiple ways: email delivery schedule, export, or even dynamic dashboards via our LiveLinks. With ClicData, automate everything: data connection, data refresh and management, and scheduling routines.
    Starting Price: $25.00/month
  • 20
    Immuta

    Immuta is the market leader in secure Data Access, providing data teams one universal platform to control access to analytical data sets in the cloud. Only Immuta can automate access to data by discovering, securing, and monitoring data. Data-driven organizations around the world trust Immuta to speed time to data, safely share more data with more users, and mitigate the risk of data leaks and breaches. Founded in 2015, Immuta is headquartered in Boston, MA. Immuta is the fastest way for algorithm-driven enterprises to accelerate the development and control of machine learning and advanced analytics. The company's hyperscale data management platform provides data scientists with rapid, personalized data access to dramatically improve the creation, deployment and auditability of machine learning and AI.
  • 21
    Coginiti

    Coginiti, the AI-enabled enterprise data workspace, empowers everyone to get consistent answers fast to any business question. Accelerating the analytic development lifecycle from development to certification, Coginiti makes it easy for you to search and find approved metrics for your use case. Coginiti integrates all the functionality you need to build, approve, version, and curate analytics across all business domains for reuse, all while adhering to your data governance policy and standards. Data and analytic teams in the insurance, financial services, healthcare, and retail/consumer packaged goods industries trust Coginiti’s collaborative data workspace to deliver value to their customers.
    Starting Price: $189/user/year
  • 22
    Rational BI

    Spend less time preparing your data and more time analyzing it. Not only can you build better looking and more accurate reports, you can centralize all your data gathering, analytics and data science in a single interface, accessible to everyone in the organization. Import all your data no matter where it lives. Whether you’re looking to build scheduled reports from your Excel files, cross-reference data between files and databases or turn your data into SQL queryable databases, Rational BI gives you all the tools you need. Discover the signals hidden in your data, make it available without delay and move ahead of your competition. Magnify the analytics capabilities of your organization through business intelligence that makes it easy to find the latest up-to-date data and analyze it through an interface that delights both data scientists and casual data consumers.
    Starting Price: $129 per month
  • 23
    IBM App Connect
    Improve speed and quality of application integration with AI and automation. IBM® App Connect instantly connects applications and data from existing systems and modern technologies across all environments. App Connect offers enterprise service bus (ESB) and agile integration architecture (AIA) microservices deployment of integration artifacts, allowing businesses to deploy to a multitude of flexible integration patterns. Integration and AI create an engaging experience that allows customers to file online insurance claims more easily and accurately. Open banking APIs are being adopted across the globe and leading the way towards an open data economy that empowers users and unlocks innovation. Continuum of care is a concept involving an integrated system that guides and tracks patients over time through a comprehensive array of health services spanning all levels of intensity.
    Starting Price: $500 per month
  • 24
    IBM Cloud Pak for Integration
    IBM Cloud Pak for Integration® is a hybrid integration platform with an automated, closed-loop approach that supports multiple styles of integration within a single, unified experience. Unlock business data and assets as APIs, connect cloud and on-premise applications, reliably move data with enterprise messaging, deliver real-time event interactions, transfer data across any cloud and deploy and scale with cloud-native architecture and shared foundational services, all with end-to-end enterprise-grade security and encryption. Achieve the best results from integration with an automated, closed-loop and multi-style approach. Apply targeted innovations to automate integrations, such as natural language–powered integration flows, AI-assisted mapping and RPA, and use company-specific operational data to continuously improve integrations, enhance API test generation, workload balancing and more.
    Starting Price: $934 per month
  • 25
    Google Cloud Data Catalog
    A fully managed and highly scalable data discovery and metadata management service. New customers get $300 in free credits to spend on Google Cloud during the Free Trial. All customers get up to 1 MiB of business or ingested metadata storage and 1 million API calls, free of charge. Pinpoint your data with a simple but powerful faceted-search interface. Sync technical metadata automatically and create schematized tags for business metadata. Tag sensitive data automatically, through Cloud Data Loss Prevention (DLP) integration. Get access immediately then scale without infrastructure to set up or manage. Empower any user on the team to find or tag data with a powerful UI, built with the same search technology as Gmail, or via API access. Data Catalog is fully managed, so you can start and scale effortlessly. Enforce data security policies and maintain compliance through Cloud IAM and Cloud DLP integrations.
    Starting Price: $100 per GiB per month
  • 26
    Causal

    Build models 10x faster, connect them directly to your data, and share them with interactive dashboards and beautiful visuals. Causal's formulas are in plain English: no cell references or obscure syntax, and a single Causal formula can do the work of tens or even hundreds of spreadsheet formulas. Causal's built-in scenarios feature lets you easily set up and compare what-if scenarios, and you can work with ranges ("5 to 10") to understand the full range of possible outcomes of your model. Startups use Causal to calculate runway, track KPIs, plan employee compensation, and build investor-ready financial models for fundraising. Generate beautiful charts and tables without spending hours on customisation and configuration. Easily switch between different time scales and summary views.
    Starting Price: $50 per user per month
  • 27
    Flyte

    Union.ai

    The workflow automation platform for complex, mission-critical data and ML processes at scale. Flyte makes it easy to create concurrent, scalable, and maintainable workflows for machine learning and data processing. Flyte is used in production at Lyft, Spotify, Freenome, and others. At Lyft, Flyte has been serving production model training and data processing for over four years, becoming the de-facto platform for teams like pricing, locations, ETA, mapping, autonomous, and more. In fact, Flyte manages over 10,000 unique workflows at Lyft, totaling over 1,000,000 executions every month, 20 million tasks, and 40 million containers. Flyte has been battle-tested at Lyft, Spotify, Freenome, and others. It is entirely open source with an Apache 2.0 license under the Linux Foundation, with a cross-industry overseeing committee. Configuring machine learning and data workflows can get complex and error-prone with YAML; Flyte lets you define them as typed Python code instead (see the sketch after this entry).
    Starting Price: Free
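As an illustration of the programming model, a tiny workflow written with flytekit's Python API; the task bodies and names are made up for this sketch, not production code.

```python
from flytekit import task, workflow

@task
def double(x: int) -> int:
    # Each task is a typed, containerizable unit of work.
    return x * 2

@task
def add(a: int, b: int) -> int:
    return a + b

@workflow
def pipeline(x: int = 3) -> int:
    # The workflow function wires task outputs to inputs; Flyte builds the DAG from it.
    return add(a=double(x=x), b=x)

if __name__ == "__main__":
    # Workflows run locally as plain Python; `pyflyte run` submits them to a cluster.
    print(pipeline(x=5))
```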
  • 28
    Ascend

    Ascend gives data teams a unified and automated platform to ingest, transform, and orchestrate their entire data engineering and analytics engineering workloads, 10X faster than ever before. Ascend helps gridlocked teams break through constraints to build, manage, and optimize the increasing number of data workloads required. Backed by DataAware intelligence, Ascend works continuously in the background to guarantee data integrity and optimize data workloads, reducing time spent on maintenance by up to 90%. Build, iterate on, and run data transformations easily with Ascend’s multi-language flex-code interface enabling the use of SQL, Python, Java, and Scala interchangeably. Quickly view data lineage, data profiles, job and user logs, system health, and other critical workload metrics at a glance. Ascend delivers native connections to a growing library of common data sources with our Flex-Code data connectors.
    Starting Price: $0.98 per DFC
  • 29
    Predibase

    Declarative machine learning systems provide the best of flexibility and simplicity to enable the fastest way to operationalize state-of-the-art models. Users focus on specifying the “what”, and the system figures out the “how”. Start with smart defaults, but iterate on parameters as much as you’d like, down to the level of code. Our team pioneered declarative machine learning systems in industry, with Ludwig at Uber and Overton at Apple. Choose from our menu of prebuilt data connectors that support your databases, data warehouses, lakehouses, and object storage. Train state-of-the-art deep learning models without the pain of managing infrastructure. Automated machine learning that strikes the balance of flexibility and control, all in a declarative fashion. With a declarative approach, finally train and deploy models as quickly as you want (a minimal sketch of the declarative style follows this entry).
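To illustrate the declarative style, a minimal sketch using the open source Ludwig library mentioned above; the config, column names, and CSV path are hypothetical.

```python
from ludwig.api import LudwigModel

# A declarative config: you state what the inputs and outputs are,
# and the system decides how to preprocess, build, and train the model.
config = {
    "input_features": [{"name": "review_text", "type": "text"}],
    "output_features": [{"name": "sentiment", "type": "category"}],
}

model = LudwigModel(config)
# train() returns training statistics, preprocessed data info, and the output directory.
train_stats, _, output_dir = model.train(dataset="reviews.csv")
print(output_dir)
```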
  • 30
    Secoda

    With Secoda AI on top of your metadata, you can now get contextual search results from across your tables, columns, dashboards, metrics, and queries. Secoda AI can also help you generate documentation and queries from your metadata, saving your team hundreds of hours of mundane work and redundant data requests. Easily search across all columns, tables, dashboards, events, and metrics. AI-powered search lets you ask any question to your data and get a contextual answer, fast. Get answers to questions. Integrate data discovery into your workflow without disrupting it with our API. Perform bulk updates, tag PII data, manage tech debt, build custom integrations, identify the least used resources, and more. Eliminate manual error and have total trust in your knowledge repository.
    Starting Price: $50 per user per month
  • 31
    Apache Doris

    The Apache Software Foundation

    Apache Doris is a modern data warehouse for real-time analytics. It delivers lightning-fast analytics on real-time data at scale. Push-based micro-batch and pull-based streaming data ingestion within a second. Storage engine with real-time upsert, append, and pre-aggregation. Optimized for high-concurrency and high-throughput queries with a columnar storage engine, MPP architecture, cost-based query optimizer, and vectorized execution engine. Federated querying of data lakes such as Hive, Iceberg, and Hudi, and databases such as MySQL and PostgreSQL. Compound data types such as Array, Map, and JSON. Variant data type to support automatic data type inference of JSON data. NGram bloom filter and inverted index for text searches. Distributed design for linear scalability. Workload isolation and tiered storage for efficient resource management. Supports shared-nothing clusters as well as separation of storage and compute (a minimal connection sketch follows this entry).
    Starting Price: Free
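Because Doris speaks the MySQL wire protocol, a stock MySQL client is enough to try it. A minimal sketch, where the host, credentials, database, and table are placeholders and 9030 is the usual frontend query port.

```python
import pymysql  # standard MySQL client; Apache Doris is MySQL-protocol compatible

conn = pymysql.connect(
    host="doris-fe.example.com",  # placeholder frontend host
    port=9030,                    # typical FE query port; adjust for your cluster
    user="root",
    password="",
    database="demo",
)
try:
    with conn.cursor() as cur:
        # A typical aggregate served by the columnar, MPP execution engine.
        cur.execute("SELECT dt, COUNT(*) FROM events GROUP BY dt ORDER BY dt")
        for row in cur.fetchall():
            print(row)
finally:
    conn.close()
```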
  • 32
    Hue

    Hue brings the best querying experience with the most intelligent autocomplete and query editor components. The tables and storage browsers leverage your existing data catalog knowledge transparently, helping users find the correct data among thousands of databases and self-document it. Assist users with their SQL queries and leverage rich previews for links, sharing from the editor directly in Slack. Several apps, each one specialized in a certain type of querying, are available. Data sources can be explored first via the browsers. The editor shines for SQL queries. It comes with intelligent autocomplete, risk alerts, and self-service troubleshooting. Dashboards focus on visualizing indexed data but can also query SQL databases. You can now search for certain cell values in the table and the results are highlighted. To sharpen your SQL editing experience, Hue comes with one of the best SQL autocompletes on the planet.
    Starting Price: Free
  • 33
    Yandex Data Proc
    You select the size of the cluster, node capacity, and a set of services, and Yandex Data Proc automatically creates and configures Spark and Hadoop clusters and other components. Collaborate by using Zeppelin notebooks and other web apps via a UI proxy. You get full control of your cluster with root permissions for each VM. Install your own applications and libraries on running clusters without having to restart them. Yandex Data Proc uses instance groups to automatically increase or decrease computing resources of compute subclusters based on CPU usage indicators. Data Proc allows you to create managed Hive clusters, which can reduce the probability of failures and losses caused by metadata unavailability. Save time on building ETL pipelines and pipelines for training and developing models, as well as describing other iterative tasks. The Data Proc operator is already built into Apache Airflow.
    Starting Price: $0.19 per hour
  • 34
    Apache Impala
    Impala provides low latency and high concurrency for BI/analytic queries on the Hadoop ecosystem, including Iceberg, open data formats, and most cloud storage options. Impala also scales linearly, even in multitenant environments. Impala is integrated with native Hadoop security and Kerberos for authentication, and via the Ranger module, you can ensure that the right users and applications are authorized for the right data. Utilize the same file and data formats and metadata, security, and resource management frameworks as your Hadoop deployment, with no redundant infrastructure or data conversion/duplication. For Apache Hive users, Impala utilizes the same metadata and ODBC driver. Like Hive, Impala supports SQL, so you don't have to worry about reinventing the wheel (see the sketch after this entry). With Impala, more users, whether using SQL queries or BI applications, can interact with more data through a single repository and metadata stored from source through analysis.
    Starting Price: Free
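A minimal query sketch using the impyla client; the hostname and table are placeholders, and 21050 is Impala's default HiveServer2-compatible port.

```python
from impala.dbapi import connect  # impyla: a DB-API client for Impala (and HiveServer2)

# Placeholder coordinator host; existing Hive Metastore tables are queryable as-is.
conn = connect(host="impala-coordinator.example.com", port=21050)
cur = conn.cursor()
cur.execute("SELECT browser, COUNT(*) AS hits FROM web_logs GROUP BY browser")
for browser, hits in cur.fetchall():
    print(browser, hits)
cur.close()
conn.close()
```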
  • 35
    StarRocks

    Whether you're working with a single table or multiple, you'll experience at least 300% better performance on StarRocks compared to other popular solutions. From streaming data to data capture, with a rich set of connectors, you can ingest data into StarRocks in real time for the freshest insights. A query engine that adapts to your use cases: without moving your data or rewriting SQL, StarRocks provides the flexibility to scale your analytics on demand with ease. StarRocks enables a rapid journey from data to insight. Its performance is unmatched, and it provides a unified OLAP solution covering the most popular data analytics scenarios. StarRocks' built-in memory-and-disk-based caching framework is specifically designed to minimize the I/O overhead of fetching data from external storage to accelerate query performance.
    Starting Price: Free
  • 36
    Apache Phoenix

    Apache Software Foundation

    Apache Phoenix enables OLTP and operational analytics in Hadoop for low-latency applications by combining the best of both worlds: the power of standard SQL and JDBC APIs with full ACID transaction capabilities, and the flexibility of late-bound, schema-on-read capabilities from the NoSQL world, leveraging HBase as its backing store. Apache Phoenix is fully integrated with other Hadoop products such as Spark, Hive, Pig, Flume, and MapReduce. It aims to become the trusted data platform for OLTP and operational analytics for Hadoop through well-defined, industry-standard APIs. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets (see the sketch after this entry). Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.
    Starting Price: Free
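A minimal sketch against the Phoenix Query Server using the phoenixdb Python client; the URL and table are placeholders, and 8765 is the Query Server's default port.

```python
import phoenixdb  # Python client for the Apache Phoenix Query Server

# Placeholder Query Server URL; Phoenix compiles each statement into HBase scans.
conn = phoenixdb.connect("http://phoenix-queryserver.example.com:8765/", autocommit=True)
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS users (id BIGINT PRIMARY KEY, name VARCHAR)")
cur.execute("UPSERT INTO users VALUES (1, 'Ada')")   # UPSERT is Phoenix's insert/update
cur.execute("SELECT id, name FROM users")
print(cur.fetchall())
conn.close()
```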
  • 37
    Stackable

    The Stackable data platform was designed with openness and flexibility in mind. It provides you with a curated selection of the best open source data apps like Apache Kafka, Apache Druid, Trino, and Apache Spark. While other current offerings either push their proprietary solutions or deepen vendor lock-in, Stackable takes a different approach. All data apps work together seamlessly and can be added or removed in no time. Based on Kubernetes, it runs everywhere, on-prem or in the cloud. stackablectl and a Kubernetes cluster are all you need to run your first Stackable data platform; within minutes, you will be ready to start working with your data. Similar to kubectl, stackablectl is designed to easily interface with the Stackable Data Platform. Use the command line utility to deploy and manage Stackable data apps on Kubernetes. With stackablectl, you can create, delete, and update components.
    Starting Price: Free
  • 38
    Inferyx

    Move past application silos, cost overruns, and skill obsolescence to scale faster with our intelligent data and analytics platform. Built to perform data management and advanced analytics, it helps you scale across the technology landscape. Our architecture understands how data flows and transforms throughout its lifecycle, enabling the development of future-proof enterprise AI applications. The platform is highly modular and extensible, handles multifold components, and is designed to scale with a multi-tenant architecture. Analyzing complex data structures is made easy using advanced data visualization, resulting in enhanced enterprise AI app development in an intuitive, low-code predictive platform. Our unique hybrid multi-cloud platform is built using open source community software, which makes it immensely adaptive, highly secure, and essentially low-cost.
    Starting Price: Free
  • 39
    DataHub

    DataHub is an open source metadata platform designed to streamline data discovery, observability, and governance across diverse data ecosystems. It enables organizations to effortlessly discover trustworthy data, with experiences tailored for each person, and eliminates breaking changes with detailed cross-platform and column-level lineage. DataHub builds confidence in your data by providing a comprehensive view of business, operational, and technical context, all in one place. The platform offers automated data quality checks and AI-driven anomaly detection, notifying teams when issues arise and centralizing incident tracking. With detailed lineage, documentation, and ownership information, DataHub facilitates swift issue resolution. It also automates governance programs by classifying assets as they evolve, minimizing manual work through GenAI documentation, AI-driven classification, and smart propagation. DataHub's extensible architecture supports over 70 native integrations.
    Starting Price: Free
  • 40
    Alteryx

    Step into a new era of analytics with the Alteryx AI Platform. Empower your organization with automated data preparation, AI-powered analytics, and approachable machine learning — all with embedded governance and security. Welcome to the future of data-driven decisions for every user, every team, every step of the way. Empower your teams with an easy, intuitive user experience allowing everyone to create analytic solutions that improve productivity, efficiency, and the bottom line. Build an analytics culture with an end-to-end cloud analytics platform and transform data into insights with self-service data prep, machine learning, and AI-generated insights. Reduce risk and ensure your data is fully protected with the latest security standards and certifications. Connect to your data and applications with open API standards.
  • 41
    Protegrity

    Our platform allows businesses to use data—including its application in advanced analytics, machine learning, and AI—to do great things without worrying about putting customers, employees, or intellectual property at risk. The Protegrity Data Protection Platform doesn't just secure data—it simultaneously classifies and discovers data while protecting it. You can't protect what you don't know you have. Our platform first classifies data, allowing users to categorize the type of data that can mostly be in the public domain. With those classifications established, the platform then leverages machine learning algorithms to discover that type of data. Classification and discovery finds the data that needs to be protected. Whether encrypting, tokenizing, or applying privacy methods, the platform secures the data behind the many operational systems that drive the day-to-day functions of business, as well as the analytical systems behind decision-making.
  • 42
    Algonomy

    Algonomy’s real-time CDP helps marketers provide consistent, personalized customer engagement in the moment. It enables real-time activation of audiences by unifying customer identities across online and offline channels for a single view. It is built for retail, with the ability to track customer behavior using 1,200+ measures and dimensions out of the box. It uses ML algorithms to create micro-segments, surface deep insights, and uncover marketing opportunities throughout the customer lifecycle.
  • 43
    Ataccama ONE
    Ataccama reinvents the way data is managed to create value on an enterprise scale. Unifying Data Governance, Data Quality, and Master Data Management into a single, AI-powered fabric across hybrid and Cloud environments, Ataccama gives your business and data teams the ability to innovate with unprecedented speed while maintaining trust, security, and governance of your data.
  • 44
    IBM Cloud Mass Data Migration
    IBM Cloud® Mass Data Migration uses storage devices with 120 TB of usable capacity to accelerate moving data to the cloud and overcome common transfer challenges like high costs, long transfer times and security concerns — all in a single service. Using a single IBM Cloud Mass Data Migration device, you can migrate up to 120 TB of data (at RAID-6) in just days, as opposed to weeks or months using traditional data-transfer methods. Whether you need to migrate a few terabytes or many petabytes of data, you have the flexibility to request one or multiple devices to accommodate your workload. Moving large data sets can be an expensive and time-consuming process. Use an IBM Cloud Mass Data Migration device at your location for just 50 USD per day. IBM sends you a preconfigured device for you to simply connect, ingest data and then ship back to IBM for offload into IBM Cloud Object Storage. Once offloaded, enjoy immediate access to your data in the cloud while IBM securely wipes the device.
    Starting Price: $50 per day
  • 45
    E-MapReduce
    EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems such as Hadoop, Spark, Kafka, Flink, and Storm. Alibaba Cloud Elastic MapReduce (EMR) is a big data processing solution that runs on the Alibaba Cloud platform. EMR is built on Alibaba Cloud ECS instances and is based on open-source Apache Hadoop and Apache Spark. EMR allows you to use the Hadoop and Spark ecosystem components, such as Apache Hive, Apache Kafka, Flink, Druid, and TensorFlow, to analyze and process data. You can use EMR to process data stored on different Alibaba Cloud data storage services, such as Object Storage Service (OSS), Log Service (SLS), and Relational Database Service (RDS). You can quickly create clusters without the need to configure hardware and software. All maintenance operations are completed on its web interface.
  • 46
    Apache Ranger

    The Apache Software Foundation

    Apache Ranger™ is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. The vision with Ranger is to provide comprehensive security across the Apache Hadoop ecosystem. With the advent of Apache YARN, the Hadoop platform can now support a true data lake architecture, and enterprises can run multiple workloads in a multi-tenant environment. Data security within Hadoop needs to evolve to support multiple use cases for data access, while also providing a framework for central administration of security policies and monitoring of user access. Ranger offers centralized security administration to manage all security-related tasks in a central UI or using REST APIs (see the sketch after this entry); fine-grained authorization to perform a specific action and/or operation with a Hadoop component or tool, managed through a central administration tool; a standardized authorization method across all Hadoop components; and enhanced support for different authorization methods, such as role-based access control.
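As an illustration of the REST-API side, a sketch that lists policies from a Ranger Admin instance; the host, port, credentials, and even the exact path are assumptions based on a typical deployment, so check your own Ranger Admin documentation before relying on them.

```python
import requests
from requests.auth import HTTPBasicAuth

# Assumed Ranger Admin endpoint and public v2 REST path; adjust for your environment.
RANGER_ADMIN = "http://ranger-admin.example.com:6080"

resp = requests.get(
    f"{RANGER_ADMIN}/service/public/v2/api/policy",
    auth=HTTPBasicAuth("admin", "change-me"),
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()
for policy in resp.json():
    # Each policy names the service it guards and the resources/users it covers.
    print(policy.get("service"), "->", policy.get("name"))
```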
  • 47
    Vaultspeed

    VaultSpeed

    Experience faster data warehouse automation. The Vaultspeed automation tool is built on the Data Vault 2.0 standard and a decade of hands-on experience in data integration projects. Get support for all Data Vault 2.0 objects and implementation options. Generate quality code fast for all scenarios in a Data Vault 2.0 integration system. Plug Vaultspeed into your current set-up and leverage your investments in tools and knowledge. Get guaranteed compliance with the latest Data Vault 2.0 standard. We are in continuous interaction with Scalefree, the body of knowledge for the Data Vault 2.0 community. The Data Vault 2.0 modelling approach strips the model components to their bare minimum so they can be loaded through the same loading pattern (repeatable pattern) and have the same database structure. Vaultspeed works with a template system, which understands the structure of the object types, and easy-to-set configuration parameters.
    Starting Price: €600 per user per month
  • 48
    PHEMI Health DataLab
    The PHEMI Trustworthy Health DataLab is a unique, cloud-based, integrated big data management system that allows healthcare organizations to enhance innovation and generate value from healthcare data by simplifying the ingestion and de-identification of data with NSA/military-grade governance, privacy, and security built in. Conventional products simply lock down data; PHEMI goes further, solving privacy and security challenges and addressing the urgent need to secure, govern, curate, and control access to privacy-sensitive personal healthcare information (PHI). This improves data sharing and collaboration inside and outside of an enterprise, without compromising the privacy of sensitive information or increasing administrative burden. PHEMI Trustworthy Health DataLab can scale to any size of organization, is easy to deploy and manage, connects to hundreds of data sources, and integrates with popular data science and business analysis tools.
  • 49
    SQLyog

    Webyog

    A powerful MySQL development and administration solution. SQLyog Ultimate enables database developers, administrators, and architects to visually compare, optimize, and document schemas. SQLyog Ultimate includes a power tool to automate and schedule the synchronization of data between two MySQL hosts. Create the job definition file using the interactive wizard. The tool does not require any installation on the MySQL hosts; you can use any host to run the tool. SQLyog Ultimate includes a power tool to interactively synchronize data. Run the tool in attended mode to compare data from source and target before taking action. Using the intuitive display, compare data on source and target for every row to decide whether, and in which direction, it should be synchronized. SQLyog Ultimate includes a power tool to interactively compare and synchronize schema. View the differences between tables, indexes, columns, and routines of two databases.
  • 50
    Apache Avro

    Apache Software Foundation

    Apache Avro™ is a data serialization system. Avro provides rich data structures; a compact, fast, binary data format; a container file to store persistent data; and remote procedure call (RPC). It also provides simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement RPC protocols; code generation is an optional optimization, only worth implementing for statically typed languages. Avro relies on schemas. When Avro data is read, the schema used when writing it is always present. This permits each datum to be written with no per-value overheads, making serialization both fast and small. This also facilitates use with dynamic, scripting languages, since data, together with its schema, is fully self-describing. When Avro data is stored in a file, its schema is stored with it, so that files may be processed later by any program (see the sketch after this entry). If the program reading the data expects a different schema, this can be easily resolved.
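A minimal sketch with the official avro Python package showing the self-describing container file; the record schema, field names, and file name are illustrative only.

```python
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# Illustrative record schema; the names are made up for the example.
schema = avro.schema.parse("""
{"type": "record", "name": "User",
 "fields": [{"name": "name", "type": "string"},
            {"name": "favorite_number", "type": ["int", "null"]}]}
""")

# Writing: the schema is embedded in the container file, so each datum is stored
# with no per-value schema overhead and the file remains self-describing.
writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": None})
writer.close()

# Reading: the embedded writer schema is used to decode every record.
reader = DataFileReader(open("users.avro", "rb"), DatumReader())
for user in reader:
    print(user)
reader.close()
```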