Page 3 | Best Open Source Linux Big Data Tools 2026

Big Data Tools for Linux

1

HSRA

Hadoop spliced read aligner for RNA-seq data

HSRA is a MapReduce-based parallel tool for mapping reads from RNA sequencing (RNA-seq) experiments. RNA-seq analyses typically begin by mapping reads to a reference genome to determine the location from which each read originated, which is a very time-consuming step. This tool allows bioinformatics researchers to efficiently distribute their mapping tasks over the nodes of a cluster by combining a fast multithreaded spliced aligner (HISAT2) with Apache Hadoop, a distributed computing framework for scalable Big Data processing. HSRA currently supports single-end and paired-end read alignments from FASTQ/FASTA datasets. Moreover, our tool uses the Hadoop Sequence Parser (HSP) library to efficiently read input datasets stored on the Hadoop Distributed File System (HDFS), and it can process datasets compressed with the Gzip and BZip2 codecs.

Downloads: 0 This Week

Last Update: 2019-01-23
See Project
2

LEACrypt

TTAK.KO-12.0223 Lightweight Encryption Algorithm Tool

The Lightweight Encryption Algorithm (also known as LEA) is a 128-bit block cipher developed by South Korea in 2013 to provide confidentiality in high-speed environments such as big data and cloud computing, as well as lightweight environments such as IoT devices and mobile devices. LEA is one of the cryptographic algorithms approved by the Korean Cryptographic Module Validation Program (KCMVP) and is a national standard of the Republic of Korea (KS X 3246). LEA is included in the ISO/IEC 29192-2:2019 standard (Information security - Lightweight cryptography - Part 2: Block ciphers). This project is licensed under the ISC License. Copyright © 2020-2021 ALBANESE Research Lab. Source code: https://github.com/pedroalbanese/leacrypt Visit: http://albanese.atwebpages.com

Downloads: 0 This Week

Last Update: 2022-12-16
See Project
3

LogicalSets

Integrated Comprehensive Data Architecture & Methodology

This is an advanced data architecture and methodology: a comprehensive Enterprise Resource Management System and a reusable database with rules for customization. While being a data-driven transaction processing engine, this system has very advanced reporting capabilities. The design eliminates up to 90% of business logic due to the way the data is structured. It uses a concept called Table Sets, with a compound key that tells the programmer which table set, which record, and which applet will view or edit the data. Developed in SAP PowerDesigner for (Sybase) SQL Anywhere. Don't let the date fool you, this system is ahead of its time.

Downloads: 0 This Week

Last Update: 2021-12-06
See Project
4

MapReduce Brazil

Aggregates MapReduce projects

Nowadays the production and storage of Big Data is common, both in academia and in industry. To process this huge amount of data, the use of high-performance platforms and programming models like MapReduce is essential.
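
As a rough illustration of the MapReduce programming model mentioned above, here is a minimal single-process sketch in Python. It is not taken from any project in this listing; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework such as Hadoop would run the map and reduce steps in parallel across cluster nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data big clusters", "data data everywhere"])
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the work can in principle be split across machines, which is the property the model relies on.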

Downloads: 0 This Week

Last Update: 2015-08-26
See Project
5

MarDRe

MapReduce-based tool to remove duplicate DNA reads

MarDRe is a de novo MapReduce-based parallel tool to remove duplicate and near-duplicate DNA reads through the clustering of single-end and paired-end sequences from FASTQ/FASTA datasets. This tool allows bioinformaticians to avoid analyzing unnecessary reads, reducing the time of subsequent procedures with the dataset. MarDRe is the Big Data counterpart of ParDRe, which employs HPC technologies (i.e., hybrid MPI/multithreading) to reduce runtime on multicore systems. MarDRe instead takes advantage of the MapReduce programming model to significantly improve on ParDRe's performance on distributed systems, especially on cloud-based infrastructures. Written in pure Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for Big Data processing.
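
The idea of removing duplicate and near-duplicate reads through clustering can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MarDRe's Java implementation: it assumes reads are clustered by a shared prefix and that two equal-length reads count as near-duplicates when they differ in at most `max_mismatches` positions. The names `dedup_reads`, `prefix_len`, and `max_mismatches` are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Count the positions at which two equal-length reads differ."""
    return sum(x != y for x, y in zip(a, b))


def dedup_reads(reads, prefix_len=4, max_mismatches=1):
    # "Map" step (assumed strategy): key every read by its prefix so that
    # candidate duplicates land in the same cluster.
    clusters = defaultdict(list)
    for read in reads:
        clusters[read[:prefix_len]].append(read)

    # "Reduce" step: inside each cluster keep one representative per group
    # of near-identical reads and drop the rest.
    kept = []
    for cluster in clusters.values():
        representatives = []
        for read in cluster:
            is_duplicate = any(
                len(read) == len(rep) and hamming(read, rep) <= max_mismatches
                for rep in representatives
            )
            if not is_duplicate:
                representatives.append(read)
        kept.extend(representatives)
    return kept


reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "TTGCAAGC"]
unique = dedup_reads(reads)
```

Since clusters are independent of each other, each one could be handed to a separate reducer in a distributed setting.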

Downloads: 0 This Week

Last Update: 2019-01-23
See Project
6

Modin

Scale your Pandas workflows by changing a single line of code

Scale your pandas workflow by changing a single line of code. Modin uses Ray, Dask or Unidist to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. It is not necessary to know in advance the available hardware resources in order to use Modin. Additionally, it is not necessary to specify how to distribute or place data. Modin acts as a drop-in replacement for pandas, which means that you can continue using your previous pandas notebooks, unchanged, while experiencing a considerable speedup thanks to Modin, even on a single machine. Once you’ve changed your import statement, you’re ready to use Modin just like you would pandas.

Downloads: 0 This Week

Last Update: 2025-10-02
See Project
7

Nebula Graph

A distributed, fast open-source graph database

The graph database built for super large-scale graphs with milliseconds of latency. Optimized SUBGRAPH and FIND PATH for better performance. Optimized query paths to reduce redundant paths and time complexity. Optimized the method to get properties for better performance of MATCH statements. Nebula Graph adopts the Apache 2.0 license, one of the most permissive free software licenses in the world. Free as in freedom, because, under the Apache 2.0 license, you can use, copy, modify and redistribute Nebula Graph, even for commercial purposes, all without asking for permission. We believe that great open source projects are not built in isolation, but rather by a community of contributors. We welcome contributions to Nebula Graph from anyone regardless of skill level or background in software development. If you have an idea for a feature you would like to see added, or you have identified a bug that needs fixing, please don't hesitate to submit an issue to our GitHub repository.

Downloads: 0 This Week

Last Update: 2024-05-17
See Project
8

Neuro

The Neuro cryptocurrency

The Neuro NRO cryptocurrency is designed to support solutions of machine learning tasks, big data and neural networks. Neuro is a scientific-technical project uniting scientists, engineers and programmers inspired by the idea to build something big, kind and bright. From the first stages of work, we will be engaged in the development of new architectures and algorithms of neural networks. Someday we will undoubtedly enter the annual ImageNet Challenge contest to compete with such giants as GoogLeNet Inception and Microsoft ResNet. At further stages of the work, we will adapt the neural networks to calculate molecular interactions in protein environments. Our system will help to look for new types of drugs for cancer, Alzheimer's and other serious problems of modern medicine. We plan to make a serious contribution to the increase of human life expectancy.

Downloads: 0 This Week

Last Update: 2019-07-29
See Project
9

OCW Test - Out of Commerce Works

Program for out of commerce works detection

The OCW Test program is designed to assist in detecting out-of-commerce works, taking as reference a list of works from a specific bibliographic catalog. In this first version, the program operates on the book identifiers of the library of the Complutense University of Madrid. However, the program can be adapted to work on any bibliographic catalog.

Downloads: 0 This Week

Last Update: 2019-03-24
See Project
10

Oasis Development Tool

OASIS Development Tool

The OASIS Development Tool is an innovative IDE for code generation, code debugging, and visual coding using the OASIS Programming Language. The OASIS Programming Language is a 4GL concurrency and database language that runs on a distributed OASIS Runtime Machine Environment (RME) as interpreted OASIS Scripts, sequenced into OASIS Polyglot Runtime Components (PRC) with just-in-time patterns. The IDE is designed specifically for the OASIS Programming Language and is built around the concept of visual, online, data-centric, concurrent, and runtime code. It offers a number of visual drag-and-drop coding features. The tool is by no means a representative of the cyclical UML model-and-code concept, but rather a replacement for it. It focuses on (team-based) system engineering, metaprogramming, visual coding, concurrent processing, databases, and Big Data.

Downloads: 0 This Week

Last Update: 2015-02-15
See Project
11

Oblivious Bloom Intersection

Oblivious Bloom Intersection

This page is about the PSI implementation described in the paper: When Private Set Intersection Meets Big Data: An Efficient and Scalable Protocol (CCS 2013).

Downloads: 0 This Week

Last Update: 2013-08-22
See Project
12

Occursions

Fast customizable time series web database for big data like log files

Our goal is to create the world's fastest extendable, non-transactional time series database for big data (you know, for kids)! Log file indexing is our initial focus: for example, append-only ASCII files produced by libraries like Log4J, or files containing FIX messages or JSON objects. Occursions was built by a small team sick of creating hacks to remotely copy and/or grep through tons of large log files. We use it to index around a terabyte of new log data per day. You can use it too. Who doesn't have 'just too many' log files? Occursions asynchronously tails log files and indexes each line as it is written to disk, so you don't even have to wait a second after an event happens to search for it. Occursions uses custom disk-backed data structures to create and search its indexes, so it uses CPU, memory and disk very efficiently. You can extend Occursions with shared libraries to support your own file formats, even binary file formats!
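
The tail-and-index approach described above can be illustrated with a small Python sketch. It is an assumption-laden toy, not Occursions code: it keeps a plain in-memory dictionary instead of custom disk-backed structures, polls synchronously, and uses an `io.BytesIO` buffer to stand in for a growing log file. The `LineIndexer` class and its methods are hypothetical names.

```python
import io
from collections import defaultdict


class LineIndexer:
    """Index complete lines of an append-only stream by whitespace token."""

    def __init__(self, stream):
        self.stream = stream
        self.offset = 0                  # first byte not yet indexed
        self.index = defaultdict(list)   # token -> start offsets of lines

    def poll(self):
        """Index lines appended since the last poll; return how many."""
        self.stream.seek(self.offset)
        added = 0
        while True:
            start = self.stream.tell()
            line = self.stream.readline()
            if not line.endswith(b"\n"):
                break                    # nothing new, or a partial line
            for token in set(line.decode("ascii", "replace").split()):
                self.index[token].append(start)
            self.offset = self.stream.tell()
            added += 1
        return added

    def search(self, token):
        """Return the text of every indexed line containing the token."""
        lines = []
        for start in self.index.get(token, []):
            self.stream.seek(start)
            raw = self.stream.readline()
            lines.append(raw.decode("ascii", "replace").rstrip("\n"))
        return lines


log = io.BytesIO()
indexer = LineIndexer(log)


def append(data):
    """Simulate a logger appending bytes to the end of the file."""
    log.seek(0, io.SEEK_END)
    log.write(data)


append(b"INFO start\nERROR disk full\nERROR par")
first = indexer.poll()    # the trailing partial line is left for later
append(b"tial line done\n")
second = indexer.poll()
hits = indexer.search("ERROR")
```

Storing line offsets rather than line text keeps the index small and lets a search read matching lines straight from the log.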

Downloads: 0 This Week

Last Update: 2013-05-30
See Project
13

PROPER

PROPER is a package for visual evaluation of ranking classifiers for biological big data mining studies in the mathematical language MATLAB. It is an efficient tool for optimization and comparison of the state-of-the-art ranking classifiers by generating over 20 different high quality two- and three-dimensional performance curves.

Downloads: 0 This Week

Last Update: 2015-06-06
See Project
14

PROPER

PROPER is a package for visual evaluation of ranking classifiers for biological big data mining studies in the mathematical language MATLAB. It is an efficient tool for optimization and comparison of the state-of-the-art ranking classifiers by generating over 20 different high quality two- and three-dimensional performance curves.

Downloads: 0 This Week

Last Update: 2015-03-06
See Project
15

PanoramaServer

Open Source Panorama Server for free virtual tour of 360 degrees views

Ideal for creating virtual tours from panoramic views of all sorts: property exhibitions for real estate brokers and agents, tour guides for indoor and outdoor venues, information on public or private facilities for curators, travel journals for tourists, backdrops for storytelling, treasure-hunt-style games, big data mining for patterns through computer vision in artificial intelligence, and more. It is like creating your own Google Maps Street View. All the user needs is photos in equirectangular (panorama) format, taken with the 3D cameras commonly used on site. PanoramaServer can reference these images to create virtual tours with 360-degree views, in which viewers can navigate to different locations, view information, and so on. If the server is made available to the general public over the internet, you can even share the link to your virtual trips. PanoramaServer is free and open source.

Downloads: 0 This Week

Last Update: 2018-09-07
See Project
16

R Hadoop for Big Data

Download Free Associated R open source script files for big data analy

Download free associated R open source script files for big data analysis with Hadoop and R. These are R script source files from Ram Venkat, from a past Meetup we held at:
http://www.meetup.com/R-Matlab-Users/events/85160532/
There is also a long video and a PDF of the PowerPoint presentation slides, with R files, at:
http://quantlabs.net/blog/2012/11/how-to-use-hadoop-and-r-for-big-data-parallel-processing-free-download-pdf/
Download the source files from:
http://quantlabs.net/blog/2012/11/download-free-associated-r-open-source-script-files-for-big-data-analysis-with-hadoop-and-r-rstats-hadoop/

Downloads: 0 This Week

Last Update: 2015-06-04
See Project
17

Random Bits Regression

Random Bits Regression is a strong general predictor.

We propose an accurate, robust and fast general predictor (RBR) for regression and classification in the big data era. The application of this method is very broad, from science to industry, finance and health. The accuracy and robustness improvement of our method over existing methods could bring huge benefits in some critical applications, for example natural disaster prediction, stock price prediction, and personal or population disease prediction. The speed of our method not only allows big data analysis but also enables real-time recognition and prediction. The RBR framework also hints at the mechanism of brain function and leads to a "wide learning" hypothesis. We believe that this method will make a great impact and enable many downstream applications.

Downloads: 0 This Week

Last Update: 2016-12-04
See Project
18

Redis Desktop Manager

Cross-platform GUI management tool for Redis

Redis Desktop Manager is a fast, open source Redis database management application based on Qt 5. It's available for Windows, Linux and macOS and offers an easy-to-use GUI to access your Redis DB. With Redis Desktop Manager you can perform basic operations such as viewing keys as a tree, running CRUD operations on keys, and executing commands via shell. It also supports SSL/TLS encryption, SSH tunnels and cloud Redis instances, such as Amazon ElastiCache, Microsoft Azure Redis Cache and Redis Labs.

1 Review

Downloads: 0 This Week

Last Update: 2018-10-11
See Project
19

Relation Tags

Source code for using Relation Tags.

Source code for using Relation Tags. It is part of the VocabularyMem project but can be used separately. Relation Tags are tags that can be related to one another. For example, the tag "Paris" and the tag "France" can be linked by the relation "is part of". This code was written from scratch and can define which type of relation is used, based on the most elementary mathematical properties. Reading the "Relation Tags guide for programmers" is strongly recommended. The source zip also contains dialogs for setting the properties of these extended tags. All of these dialog files end with either "...dlg.cpp" or "...dlg.h". Please read the "readme" file. Using a binary matrix class such as BinMatrix is recommended in order to have enough speed for calculating implicit relations in a system of bogus tags with big data. The code needs to be compiled with C++11 and the Qt libraries.
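
To show why a binary matrix helps when computing implicit relations, here is a minimal Python sketch. It is not the project's C++ code and does not reproduce the BinMatrix class: it assumes the relation is transitive, stores each matrix row as an integer bitmask, and applies Warshall's algorithm. The extra tag "Europe" and the helper names `relate`, `is_related`, and `transitive_closure` are hypothetical.

```python
def transitive_closure(rows):
    """Warshall's algorithm on a binary matrix, one int bitmask per row."""
    rows = list(rows)
    n = len(rows)
    for k in range(n):
        for i in range(n):
            if (rows[i] >> k) & 1:       # i is related to k ...
                rows[i] |= rows[k]       # ... so i inherits k's relations
    return rows


tags = ["Paris", "France", "Europe"]
index = {tag: position for position, tag in enumerate(tags)}
rows = [0] * len(tags)


def relate(a, b):
    """Record the explicit relation 'a is part of b'."""
    rows[index[a]] |= 1 << index[b]


def is_related(matrix, a, b):
    """Check whether 'a is part of b' holds in the given matrix."""
    return bool((matrix[index[a]] >> index[b]) & 1)


relate("Paris", "France")
relate("France", "Europe")
closed = transitive_closure(rows)
```

The bitwise OR merges a whole row of relations in one operation, which is the kind of speed advantage a dedicated binary matrix class would provide on large tag sets.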

Downloads: 0 This Week

Last Update: 2015-08-11
See Project
20

Sample Level Musical Timeline

Sample Level Modulation of Musical Timeline

Sample Level Modulation of Musical Timeline, by Mingfeng Zhang, Dept. of Electrical and Computer Engineering, University of Rochester. In this toolbox we provide signal processing tools to allocate music events (samples of musical notes) to specified time locations with sample-level accuracy. In this implementation, we use computational tools to add micro-timing variations to J.S. Bach four-part chorales as a "visualizer" for big data. By extracting data patterns from multiple time scales, we implement a tool with which musicians can perform the big data at different resolutions. This toolbox needs the following supporting toolboxes:
MIDI TOOLBOX https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/miditoolbox
MIR TOOLBOX https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox
Please add the paths of these two toolboxes in MATLAB. Please also read the project document file (readme.doc/pdf) for more details.

Downloads: 0 This Week

Last Update: 2015-07-02
See Project
21

SentimentAnalysis-Rick&Morty

Rick & Morty Sentiment Analysis - End-of-Degree Project - UNIR

The remarkable progress in the field of Big Data has driven the development of new technologies in natural language processing and data analysis. Text mining is a fascinating application of data analysis that extracts relevant information from related writings in different linguistic contexts. Within natural language processing, sentiment analysis and classification therefore stands out as a key application supported by text mining. By extracting information from textual data, it becomes possible to identify and comprehend the sentiments and emotions conveyed. In this end-of-degree project, we use Python to analyze and classify the dialogue of characters in an English-language television series, "Rick and Morty". The objective is to identify and categorize the feelings and emotions expressed in the text, comparing the human perception of the characters' personalities with the results obtained using natural language processing techniques.

Downloads: 0 This Week

Last Update: 2023-07-12
See Project
22

Snowplow Analytics

Enterprise-strength marketing and product analytics platform

Snowplow is ideal for data teams who want to manage the collection and warehousing of data across all their platforms and products.

Downloads: 0 This Week

Last Update: 2022-01-31
See Project
23

TensorBase

TensorBase is a new big data warehouse with modern efforts

TensorBase hopes that open source does not become a copy game. TensorBase is clearly opposed to forking communities, reinventing wheels, or hacking traffic for so-called reputation (like GitHub stars). After some thought, we decided to temporarily leave the general data warehousing field. If you want to learn how a database system can be built, how to apply modern Rust to the high-performance field, or how to embed a lightweight data analysis system into your own larger one, you can still try, ask about, or contribute to TensorBase. The committers are still around the community. We will help you with all kinds of interesting things pursued in the project by us and maybe by you. We still maintain the project and look forward to meeting more database geniuses in this world, although no new features will be added in the near future.

Downloads: 0 This Week

Last Update: 2022-07-25
See Project
24

Universal Java Matrix Package

sparse and dense matrix, linear algebra, visualization, big data

The Universal Java Matrix Package (UJMP) is an open source Java library which provides sparse and dense matrix classes, as well as a large number of calculations for linear algebra such as matrix multiplication or matrix inverse. Operations such as mean, correlation, standard deviation, replacement of missing values or the calculation of mutual information are supported, too. The Universal Java Matrix Package provides various visualization methods, import and export filters for a large number of file formats, and even the possibility to link to JDBC databases. Multi-dimensional matrices as well as generic matrices with a specified object type are supported and very large matrices can be handled even when they do not fit into memory.

1 Review

Downloads: 0 This Week

Last Update: 2015-08-19
See Project
25

Vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python

Data science solutions, insights, dashboards, machine learning, deployment. We start at 100GB. Vaex is a high-performance Python library for lazy Out-of-Core data frames (similar to Pandas), to visualize and explore big tabular datasets. It calculates statistics such as mean, sum, count and standard deviation on an N-dimensional grid for more than a billion (10^9) samples/rows per second. Visualization is done using histograms, density plots and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero memory copy policy and lazy computations for best performance (no memory wasted). Cut development time by 80%. Your prototype is your solution. Create automatic pipelines for any model.

Downloads: 0 This Week

Last Update: 2023-07-31
See Project