Open Source Linux Natural Language Processing (NLP) Tools

Natural Language Processing (NLP) Tools for Linux

View 26 business solutions

Browse free open source Natural Language Processing (NLP) tools and projects for Linux below. Use the toggles on the left to filter open source Natural Language Processing (NLP) tools by OS, license, language, programming language, and project status.

  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    MeCab is a fast and customizable Japanese morphological analyzer. MeCab is designed for generic purpose and applied to variety of NLP tasks, such as Kana-Kanji conversion. MeCab provides parameter estimation functionalities based on CRFs and HMM
    Leader badge
    Downloads: 2,254 This Week
    Last Update:
    See Project
  • 2
    OpenVINO

    OpenVINO

    OpenVINO™ Toolkit repository

    OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. Boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common tasks. Use models trained with popular frameworks like TensorFlow, PyTorch and more. Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud. This open-source version includes several components: namely Model Optimizer, OpenVINO™ Runtime, Post-Training Optimization Tool, as well as CPU, GPU, MYRIAD, multi device and heterogeneous plugins to accelerate deep learning inferencing on Intel® CPUs and Intel® Processor Graphics. It supports pre-trained models from the Open Model Zoo, along with 100+ open source and public models in popular formats such as TensorFlow, ONNX, PaddlePaddle, MXNet, Caffe, Kaldi.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 3
    Open Interpreter

    Open Interpreter

    A natural language interface for computers

    Open Interpreter is an open-source tool that provides a natural-language interface for interacting with your computer. It lets large language models (LLMs) run code locally (Python, JavaScript, shell, etc.), enabling you to ask your computer to do tasks like data analysis, file manipulation, browsing, etc. in human terms (“chat with your computer”), with safeguards. Runs locally or via configured remote LLM servers/inference backends, giving flexibility to use models you trust or have locally. It prompts you to approve code before executing, and supports both online LLM models and local inference servers. It seeks to combine convenience (like ChatGPT’s code interpreter) with control and flexibility by running on your own machine.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 4
    tidytext

    tidytext

    Text mining using tidy tools

    tidytext brings tidy data principles to text mining by converting text into a tidy data frame format. It provides tools for tokenization, sentiment analysis, n‑gram creation, and term‑document matrices, enabling interoperability with dplyr, ggplot2, and other tidyverse workflows.
    Downloads: 15 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 5
    spaCy

    spaCy

    Industrial-strength Natural Language Processing (NLP)

    spaCy is a library built on the very latest research for advanced Natural Language Processing (NLP) in Python and Cython. Since its inception it was designed to be used for real world applications-- for building real products and gathering real insights. It comes with pretrained statistical models and word vectors, convolutional neural network models, easy deep learning integration and so much more. spaCy is the fastest syntactic parser in the world according to independent benchmarks, with an accuracy within 1% of the best available. It's blazing fast, easy to install and comes with a simple and productive API.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 6
    Ciphey

    Ciphey

    Decrypt encryptions without knowing the key or cipher

    Fully automated decryption/decoding/cracking tool using natural language processing & artificial intelligence, along with some common sense. You don't know, you just know it's possibly encrypted. Ciphey will figure it out for you. Ciphey can solve most things in 3 seconds or less. Ciphey aims to be a tool to automate a lot of decryptions & decodings such as multiple base encodings, classical ciphers, hashes or more advanced cryptography. If you don't know much about cryptography, or you want to quickly check the ciphertext before working on it yourself, Ciphey is for you. The technical part. Ciphey uses a custom-built artificial intelligence module (AuSearch) with a Cipher Detection Interface to approximate what something is encrypted with. And then a custom-built, customizable natural language processing Language Checker Interface, which can detect when the given text becomes plaintext.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 7
    Super comprehensive deep learning notes

    Super comprehensive deep learning notes

    Super Comprehensive Deep Learning Notes

    Super comprehensive deep learning notes is a massive and well-structured collection of deep learning notebooks that serve as a comprehensive study resource for anyone wanting to learn or reinforce concepts in computer vision, natural language processing, deep learning architectures, and even large-model agents. The repository contains hundreds of Jupyter notebooks that are richly annotated and organized by topic, progressing from basic Python and PyTorch fundamentals to advanced neural network designs like ResNet, transformers, and object detection algorithms. It’s not just a dry code repository; it includes theoretical explanations alongside hands-on examples, loss function explorations, optimization routines, and full end-to-end experiments on real datasets, making it highly suitable for both self-study and classroom use.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 8
    Virastyar

    Virastyar

    Virastyar is an spell checker for low-resource languages

    Virastyar is a free and open-source (FOSS) spell checker. It stands upon the shoulders of many free/libre/open-source (FLOSS) libraries developed for processing low-resource languages, especially Persian and RTL languages Publications: Kashefi, O., Nasri, M., & Kanani, K. (2010). Towards Automatic Persian Spell Checking. SCICT. Kashefi, O., Sharifi, M., & Minaie, B. (2013). A novel string distance metric for ranking Persian respelling suggestions. Natural Language Engineering, 19(2), 259-284. Rasooli, M. S., Kahefi, O., & Minaei-Bidgoli, B. (2011). Effect of adaptive spell checking in Persian. In NLP-KE Contributors: Omid Kashefi Azadeh Zamanifar Masoumeh Mashaiekhi Meisam Pourafzal Reza Refaei Mohammad Hedayati Kamiar Kanani Mehrdad Senobari Sina Iravanin Mohammad Sadegh Rasooli Mohsen Hoseinalizadeh Mitra Nasri Alireza Dehlaghi Fatemeh Ahmadi Neda PourMorteza
    Downloads: 29 This Week
    Last Update:
    See Project
  • 9
    AWS Toolkit for Visual Studio Code

    AWS Toolkit for Visual Studio Code

    Local Lambda debug, CodeWhisperer, SAM/CFN syntax, etc.

    The AWS Toolkit extension for Visual Studio Code enables you to interact with Amazon Web Services (AWS). Try the AWS Code Sample Catalog to start coding with the AWS SDK. The AWS Explorer provides access to the AWS services that you can work with when using the Toolkit. To see the AWS Explorer, choose the AWS icon in the Activity bar. The Developer Tools panel is a section for developer-focused tooling curated for working in an IDE. The Developer Tools panel can be found underneath the AWS Explorer when the AWS icon is selected in the Activity bar. The AWS CDK Explorer enables you to work with AWS Cloud Development Kit (CDK) applications. It shows a top-level view of your CDK applications that have been synthesized in your workspace. Amazon CodeWhisperer provides inline code suggestions using machine learning and natural language processing on the contents of your current file. Supported languages include Java, Python and Javascript.
    Downloads: 4 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 10
    Docspell

    Docspell

    Assist in organizing your piles of documents

    Docspell is a personal document organizer. Or sometimes called a "Document Management System" (DMS). You'll need a scanner to convert your papers into files. Docspell can then assist in organizing the resulting mess. It can unify your files from scanners, emails, and other sources. It is targeted for home use, i.e. families, households, and also for smaller groups/companies. You can associate tags, set correspondent,s and lots of other predefined and custom metadata. If your documents are associated with such metadata, you can quickly find them later using the search feature. However adding this manually is a tedious task. Docspell can help by suggesting correspondents, guessing tags or finding dates using machine learning. It can learn metadata from existing documents and find things using NLP. This makes adding metadata to your documents a lot easier. For machine learning, it relies on the free (GPL) Stanford Core NLP library.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 11
    VADER

    VADER

    Lexicon and rule-based sentiment analysis tool

    VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool designed for analyzing the sentiment of text, particularly in social media and short text formats. It is optimized for quick and accurate analysis of positive, negative, and neutral sentiments.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    compromise

    compromise

    Modest natural-language processing

    Language is complicated and there's a gazillion words. Compromise is a javascript library that interprets and pre-parses text and makes some reasonable decisions so things are way easier. Compromise tries its best to parse text. it is small, quick, and often good-enough. It is not as smart as you'd think. Conjugate and negate verbs in any tense. Play between plural, singular and possessive forms. Interpret plain-text numbers. Handle implicit terms. Use it on the client-side or as an es-module. compromise is 180kb (minified). It's pretty fast. It can run on keypress. It works mainly by conjugating all forms of a basic word list. Decide how words get interpreted or make heavier changes with a compromise-plugin. Parse text without running POS-tagging. Pre-parse any match statements for faster lookups. It is not the most accurate, or clever nlp library, but found its niche as an easy, small library that can run everywhere.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    Bolt NLP

    Bolt NLP

    Bolt is a deep learning library with high performance

    Bolt is a high-performance deep learning inference framework developed by Huawei Noah's Ark Lab. It is designed to optimize and accelerate the deployment of deep learning models across various hardware platforms. Bolt is a light-weight library for deep learning. Bolt, as a universal deployment tool for all kinds of neural networks, aims to automate the deployment pipeline and achieve extreme acceleration. Bolt has been widely deployed and used in many departments of HUAWEI company, such as 2012 Laboratory, CBG and HUAWEI Product Lines. If you have questions or suggestions, you can submit issue.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    ChatGLM.cpp

    ChatGLM.cpp

    C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

    ChatGLM.cpp is a C++ implementation of the ChatGLM-6B model, enabling efficient local inference without requiring a Python environment. It is optimized for running on consumer hardware.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    Datasets

    Datasets

    Hub of ready-to-use datasets for ML models

    Datasets is a library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. We also feature a deep integration with the Hugging Face Hub, allowing you to easily load and share a dataset with the wider NLP community. There are currently over 2658 datasets, and more than 34 metrics available. Datasets naturally frees the user from RAM memory limitation, all datasets are memory-mapped using an efficient zero-serialization cost backend (Apache Arrow). Smart caching: never wait for your data to process several times.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 16
    Hazm

    Hazm

    Persian NLP Toolkit

    Hazm is a natural language processing (NLP) library for Persian text, offering various tools for text preprocessing, tokenization, part-of-speech tagging, and more.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 17
    NVIDIA NeMo

    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI architectures are typically large and require a lot of data and compute for training. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node mixed-precision training. Supported models: Jasper, QuartzNet, CitriNet, Conformer-CTC, Conformer-Transducer, Squeezeformer-CTC, Squeezeformer-Transducer, ContextNet, LSTM-Transducer (RNNT), LSTM-CTC. NGC collection of pre-trained speech processing models.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 18
    Subliminal Blaster 4

    Subliminal Blaster 4

    Subliminal Blaster Powered 4 - Mude seus Hábitos! Change your habits

    Subliminal Blaster is a NLP software that shows text subliminal messages in your computer screen while you use it normaly for your activities. It re-programs your mind in a subconscious level while you exercite your conscious with your activities like browsing, working, watching video and others. Subliminal Blaster é um software de PNL que exibe mensagens subliminares na tela do PC enquanto você utiliza normalmente para suas atividades. Ele reprograma sua mente a nível subconsciente enquanto você exercita seu consciente em suas atividades. WE ARE NOW ON VERSION 4! Please support the project by donating bitcoins 1GRYGnSmpuU1ZuXodn2H9UVEpVRBx5CTL2 Or dogecoins! DBfkGrdLvmpbYQzcRCm9KLUuPk9Zigjjod Would you like to contribute? Go to our Facebook page! https://www.facebook.com/SubliminalBlasterIntl/
    Leader badge
    Downloads: 24 This Week
    Last Update:
    See Project
  • 19
    OpenNLP provides the organizational structure for coordinating several different projects which approach some aspect of Natural Language Processing. OpenNLP also defines a set of Java interfaces and implements some basic infrastructure for NLP compon
    Leader badge
    Downloads: 19 This Week
    Last Update:
    See Project
  • 20
    Detoxify

    Detoxify

    Trained models & code to predict toxic comments

    Detoxify is a deep learning-based tool for detecting and filtering toxic language in online conversations, leveraging Transformer models for high accuracy.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    Diffgram

    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    From ingesting data to exploring it, annotating it, and managing workflows. Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is world’s first truly open source training data platform that focuses on giving its users an unlimited experience. This is aimed to reduce your data labeling bills and increase your Training Data Quality. Training Data is the art of supervising machines through data. This includes the activities of annotation, which produces structured data; ready to be consumed by a machine learning model. Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    Dragonfire

    Dragonfire

    The open-source virtual assistant for Ubuntu based Linux distributions

    Dragonfire is the open-source virtual assistant project for Ubuntu-based Linux distributions. Her main objective is to serve as a command and control interface to the helmet user. So that you will be able to give orders just by using your voice commands and your eye movements. That makes the helmet handsfree. We are planning to ship Dragonfire as a preinstalled software package on DragonOS Linux Distribution. DragonOS will be a Linux distribution specially designed for the helmet. It will contain various software packages for controlling the helmet. It will be the first of its kind. Dragonfire uses Mozilla DeepSpeech to understand your voice commands and Festival Speech Synthesis System to handle text-to-speech tasks.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    Stanford CoreNLP

    Stanford CoreNLP

    Stanford CoreNLP, a Java suite of core NLP tools

    CoreNLP is your one stop shop for natural language processing in Java! CoreNLP enables users to derive linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations. CoreNLP currently supports 6 languages, Arabic, Chinese, English, French, German, and Spanish. The centerpiece of CoreNLP is the pipeline. Pipelines take in raw text, run a series of NLP annotators on the text, and produce a final set of annotations. Pipelines produce CoreDocuments, data objects that contain all of the annotation information, accessible with a simple API, and serializable to a Google Protocol Buffer. CoreNLP generates a variety of linguistic annotations, including parts of speech, named entities, dependency parses, and coreference.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    AIVA (A.I. Virtual Assistant)

    AIVA (A.I. Virtual Assistant)

    AIVA (A.I. Virtual Assistant): General-purpose virtual assistant

    AIVA is a general-purpose virtual assistant designed for developers, enabling the creation of customizable AI assistants for various applications.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    Chinese-XLNet

    Chinese-XLNet

    Chinese XLNet pre-trained model

    Chinese-XLNet is a Chinese language pre-trained model based on the XLNet architecture, providing an advanced foundation for natural language processing tasks in Mandarin and other Chinese dialects. Unlike traditional masked language modeling, XLNet uses a permutation language modeling objective that captures bidirectional context more effectively by training over all possible token orderings, yielding richer contextual representations. This model is trained on large-scale Chinese text datasets to learn linguistic patterns, long-range dependencies, and semantic nuance typical of Chinese writing, making it useful for tasks like text classification, question answering, named entity recognition, and language generation. Chinese-XLNet offers an alternative to models like BERT by emphasizing autoregressive and permutation-based learning, which can lead to performance improvements on certain benchmarks and tasks.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB