Open Source Java Natural Language Processing (NLP) Tools

Browse free open source Java Natural Language Processing (NLP) Tools and projects below.

  • 1
    Stanford CoreNLP

    Stanford CoreNLP, a Java suite of core NLP tools

    CoreNLP is a one-stop shop for natural language processing in Java. It derives linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations. CoreNLP currently supports six languages: Arabic, Chinese, English, French, German, and Spanish. The centerpiece of CoreNLP is the pipeline: a pipeline takes in raw text, runs a series of NLP annotators on it, and produces a final set of annotations. Pipelines produce CoreDocuments, data objects that contain all of the annotation information, are accessible through a simple API, and can be serialized to a Google Protocol Buffer.
    Downloads: 5 This Week
  • 2
    OpenNLP provides the organizational structure for coordinating several different projects, each of which approaches some aspect of natural language processing. OpenNLP also defines a set of Java interfaces and implements some basic infrastructure for NLP components.
    Downloads: 21 This Week
  • 3

    BioC

    We describe a simple XML format for sharing text documents and annotations

    A minimalist approach to sharing text documents and data annotations that allows a large number of different annotation types to be represented. The project files contain:
    - simple code to hold, read, and write data and to perform sample processing;
    - BioC-formatted corpora;
    - BioC tools that work with BioC corpora.
    BioC goals: simplicity, interoperability, broad use, and reuse. Learning to use a format, or a software module that processes that format, should require little investment. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for text mining.
    Downloads: 21 This Week
  • 4
    masmt

    A framework for multi-agent system development

    MaSMT is a Java-based multi-agent system development framework, designed especially for developing an English-to-Sinhala machine translation system. Through its architecture, MaSMT can also be used to develop any other multi-agent system.
    References:
    - B. Hettige, A. S. Karunananda, G. Rzevski, "Multi-agent solution for managing complexity in English to Sinhala Machine Translation", International Journal of Design & Nature and Ecodynamics, Volume 11, Issue 2, 2016, pp. 88–96.
    - B. Hettige, A. S. Karunananda, G. Rzevski, "MaSMT: A Multi-agent System Development Framework for English-Sinhala Machine Translation", International Journal of Computational Linguistics and Natural Language Processing (IJCLNLP), Volume 2, Issue 7, July 2013.
    Downloads: 5 This Week
  • 5
    This OHNLP project has released "pipelines" contributed by members of the OHNLP Consortium. The pipelines are based on the Apache UIMA framework. medKAT/P, MedCoref, MedTagger, MedXN, and cTAKES are licensed under the Apache License V2.0; MedTime is licensed under the GNU General Public License version 3.0 (GPLv3). cTAKES development has moved to apache.org (see http://ctakes.apache.org/).
    Downloads: 3 This Week
  • 6
    Next Generation Programming

    Compose Software Without Writing Any Programming Code

    "Next Generation Programming - Programming Without Coding Software" is a drag-and-drop wizard for creating simple or complex applications without writing code in any programming language. The software is written in Java and aimed at both novice and expert programmers, who can build software with visual tools such as drag-and-drop components and visual editors. It can be used to compose simple or complex applications: database programs, circuit designs, and generated code uploaded to chips (ESP8266, ESP32) for the designed circuits. It is much simpler to use than PWCT (https://sourceforge.net/projects/doublesvsoop/) and has more features, such as SCADA. Start by looking at the examples on the website; this is the quickest way to learn the software's features and how to use it. More information (documents, videos, examples): negep.epizy.com
    Downloads: 9 This Week
  • 7
    Welsh Natural Language Toolkit
    The project supports the Welsh language technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules: (a) word segmentation, for separating text into words; (b) sentence boundary disambiguation, for finding sentence boundaries; (c) a part-of-speech tagger, for determining the part of speech of each word; and (d) a morphological analyser, for identifying the root form (lemma) of words. The modules are written in Java and "wrapped" for execution under the General Architecture for Text Engineering (GATE) framework. The project also includes CYMRIE, a Welsh adaptation of the GATE ANNIE named entity recognition (NER) application, covering entities such as persons, organisations, locations, and date and time expressions. Version 2.x: the CYMRIE pipeline is accessible via an API, a standalone GUI, and a CLI, and it has also been adapted for Twitter.
    Downloads: 6 This Week
  • 8
    Maximum entropy is a powerful method for constructing statistical models of classification tasks, such as part of speech tagging in Natural Language Processing. Several example applications using maxent can be found in the OpenNLP Tools Library.
    Downloads: 3 This Week
  • 9
    MutationFinder is a biomedical natural language processing (NLP) system for extracting mentions of point mutations from free text. As an information extraction system, MutationFinder achieves high performance (99% precision and 81% recall on blind test data).
    Downloads: 4 This Week
  • 10
    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source, cross-platform, Unicode- and XML-based text analysis environment and graphical client, supporting Windows, Linux, and Mac OS X. It can also be used online as a J2EE-compliant web portal (GWT-based) with built-in access control. TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full-text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, co-occurrence analysis, etc.) based on R packages (http://www.r-project.org).
    Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en
    Scientific background (Textométrie project website): http://textometrie.ens-lyon.fr/?lang=en
    Full description (TEI Tools wiki): http://wiki.tei-c.org/index.php/TXM
    Downloads: 4 This Week
  • 11
    JWNL is a Java API for accessing the WordNet relational dictionary. WordNet is widely used for developing NLP applications, and a Java API such as JWNL makes it easier for developers to build NLP applications in Java.
    Downloads: 1 This Week
  • 12
    Common Resource Grep - crgrep

    Common Resource Grep

    CRGREP searches for matching text in databases, various document formats, archives, and other difficult-to-access resources. It is a command-line tool for name and content text matching in database tables, plain files, MS Office documents, PDFs, archives, MP3 audio, image metadata, scanned documents, Maven dependencies, and web resources. CRGREP searches resources within resources in any combination and to any depth, for example text within a document within a zip archive. Binary downloads and discussion are hosted here (https://sourceforge.net/p/crgrep/discussion/); development and issue tracking take place at https://bitbucket.org/cryanfuse/crgrep
    Downloads: 1 This Week
  • 13
    MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition along with sample applications (identification, NLP, etc.) of its use, implemented in Java.
    Downloads: 1 This Week
  • 14
    Welsh Natural Language Toolkit

    WNLT is a suite of open source natural language modules for the Welsh language

    The project supports the Welsh language technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules: (a) word segmentation, for separating text into words; (b) sentence boundary disambiguation, for finding sentence boundaries; (c) a part-of-speech tagger, for determining the part of speech of each word; and (d) a morphological analyser, for identifying the root form (lemma) of words. The modules are written in Java and "wrapped" for execution under the General Architecture for Text Engineering (GATE) framework. The project also includes CYMRIE, a Welsh adaptation of the GATE ANNIE named entity recognition (NER) application, covering entities such as persons, organisations, locations, and date and time expressions.
    Downloads: 3 This Week
  • 15
    mbFXWords

    Analyze text: skim-read subject, predicate, and object; search other PDFs.

    Version 1.04. Applies and builds upon Apache OpenNLP. Supports English, French, and German files. A JavaFX application that runs on Oracle Java Runtime Environment 8, which includes JavaFX. NLP extensions:
    - Divide sentences into subclauses (segmentation).
    - Divide plain text into subject, predicate, and object.
    - Count words (stemming).
    - Search for similar content in PDFs.
    Outputs the subject, predicate, and object of sentences in PDF and plain-text files. Provides a comfortable GUI and automatic language detection.
    Downloads: 3 This Week
  • 16

    Service Grid - Language Grid Base System

    SOA infrastructure initially developed by the NICT Language Grid Project

    Service Grid is an infrastructure for accumulating and sharing Web services. Resources with complicated intellectual property issues are wrapped as Web services and shared on the Service Grid. If you release software that uses the software of this project, please include the following statement in the documentation or on the website:
    * This software uses the [SOFTWARE] by the Language Grid project (http://langrid.org/).
    where [SOFTWARE] is one of:
    * Service Grid Server Software (http://langrid.org/oss-project/en/service_grid.html)
    * Language Service Development Libraries (http://langrid.org/oss-project/en/language_service.html)
    * Language Grid Toolbox (http://langrid.org/oss-project/en/toolbox.html)
    If you publish a paper using the software of this project, please cite the following book:
    * Toru Ishida (Ed.), The Language Grid: Service-Oriented Collective Intelligence for Language Resource Interoperability. Springer, 2011. ISBN 978-3-642-21177-5.
    Downloads: 2 This Week
  • 17
    Open Pandora's Box

    Pandora is an artificially intelligent, web-based bot

    Pandora is an artificially intelligent, web-based bot written in Java. It has a component-based AI architecture that includes database memory, XML, voice, voice recognition, chat, IRC, HTTP, Wiktionary, Freebase, consciousness, language, GUI, applet, web, JSP, and Android components.
    Downloads: 1 This Week
  • 18

    OPTIMA cidoc-crm Semantic Annotation

    Semantic annotation of archaeology reports with respect to CIDOC-CRM

    The semantic annotation system OPTIMA is the result of Andreas Vlachidis's PhD work (supervised by Prof. Douglas Tudhope, University of Glamorgan, UK). OPTIMA performs the NLP tasks of named entity recognition, relation extraction, negation detection, and word sense disambiguation using hand-crafted rules and SKOS terminological resources (English Heritage Thesauri and Glossaries). The resulting semantic annotations are associated with classes of the CIDOC Conceptual Reference Model (CRM) (ISO 21127:2006) and its archaeological extension, CRM-EH. OPTIMA also targets the detection and recognition of contextual relations between CRM entities; such relations are modeled with respect to the CRM-EH archaeology extension. The pipeline targets the CIDOC-CRM entities E19.Physical_Object, E53.Place, E49.Time_Appellation, and E57.Material; the CRM-EH entities EHE1001.Context_Event, EHE1002.Production_Event, and EHE1004.Deposition_Event; and the P45.consists_of material property.
    Downloads: 1 This Week
  • 19

    TextProcessor

    A Java package to preprocess text datasets for subsequent text analysis

    The TextProcessor Java package is a text processing toolkit that provides frequently used text processing functions such as stemming, removing stop words, generating a term vocabulary, and calculating the term-document frequency matrix. Basic topic mining models such as LDA and sparse NMF are also supported. The package can also generate feature files from a given text dataset in LDA and LIBSVM formats for subsequent procedures such as classification or clustering. The toolkit is being extended for more advanced text analysis tasks based on natural language processing techniques.
    Downloads: 1 This Week
  • 20
    The Aikernel is an intelligence server and cell runtime environment that uses natural language processing and other pattern matching, with Activators, Contexts, and Concepts, to allow multitasking between installed cells.
    Downloads: 0 This Week
  • 21
    AminePlatform

    Amine is a Multi-Layer Platform for the Development of Intelligent Systems

    Amine is a multi-layer, open source Java platform for artificial intelligence, dedicated to the development of various kinds of intelligent systems and agents (knowledge-based, ontology-based, conceptual graph (CG) based, natural language processing, reasoning and learning, etc.). Ontologies and knowledge bases (KBs) can be created and manipulated with various processes. CG theory is used as the main knowledge representation language. Amine provides two languages: PROLOG+CG, which extends PROLOG with CG and Amine modules, and SYNERGY, a visual language based on activation/propagation that treats CGs as activable/executable graphs. For more detail, see //amine-platform.sourceforge.net/
    Downloads: 0 This Week
  • 22
    Ansj Chinese word segmentation

    Ansj word segmentation

    A genuine Java implementation of ICT, with faster word segmentation than the open-source version of ICT. It provides Chinese word segmentation, name recognition, part-of-speech tagging, and user-defined dictionaries, based on n-gram + CRF + HMM. Segmentation speed reaches about 2 million words per second (tested on a MacBook Air), and accuracy can exceed 96%. It currently implements Chinese word segmentation, Chinese name recognition, user-defined dictionaries, keyword extraction, automatic summarization, and keyword tagging. It can be applied to natural language processing and related areas, and it suits projects that require high-quality word segmentation.
    Downloads: 0 This Week
  • 23
    Apache OpenNLP

    Apache OpenNLP is a machine learning-based NLP library that provides tools for text-processing tasks such as tokenization, sentence segmentation, and named entity recognition.
    Downloads: 0 This Week
  • 24
    AutoSummary uses Natural Language Processing to generate a contextually-relevant synopsis of plain text. It uses statistical and rule-based methods for part-of-speech tagging, word sense disambiguation, sentence deconstruction and semantic analysis.
    Downloads: 0 This Week
  • 25

    Bermuda Text-to-Speech

    This project includes basic NLP and DSP techniques for Text-to-Speech

    See the TTS demo (http://rslp.racai.ro/index.php?page=tts). This project, written entirely in Java, includes a set of tools and methods designed to enable multilingual text-to-speech (TTS) synthesis. We currently support English and Romanian, and we plan to train more models soon and make them available for download. To read more about our other NLP and TTS tools, see http://nlptools.racai.ro.
    Downloads: 0 This Week
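The Stanford CoreNLP entry (1) describes a pipeline that takes in raw text, runs a series of annotators, and collects the results in a document object. The sketch below illustrates only that architectural pattern with two toy annotators in plain Java. It is not CoreNLP's API: the names PipelineSketch, Annotator, Document, and the "sentences"/"tokens" layers are hypothetical, and the regex-based splitter and tokenizer are deliberately naive stand-ins for real annotators.

```java
import java.util.*;

/** Hypothetical sketch of an annotator pipeline: each stage reads and adds layers on a shared document. */
class PipelineSketch {
    /** Minimal document: raw text plus named annotation layers, kept in insertion order. */
    static final class Document {
        final String text;
        final Map<String, List<String>> layers = new LinkedHashMap<>();
        Document(String text) { this.text = text; }
    }

    interface Annotator { void annotate(Document doc); }

    /** Toy sentence splitter: splits on whitespace that follows '.', '!' or '?'. */
    static final Annotator SENTENCES = doc ->
        doc.layers.put("sentences", Arrays.asList(doc.text.trim().split("(?<=[.!?])\\s+")));

    /** Toy tokenizer: reads the "sentences" layer, so it must run after SENTENCES. */
    static final Annotator TOKENS = doc -> {
        List<String> tokens = new ArrayList<>();
        for (String sentence : doc.layers.get("sentences")) {
            // Split on whitespace, and just before punctuation so it becomes its own token.
            for (String t : sentence.split("\\s+|(?=[.,!?])")) {
                if (!t.isBlank()) tokens.add(t);
            }
        }
        doc.layers.put("tokens", tokens);
    };

    /** Runs the annotators in the given order over a fresh document. */
    static Document run(String text, List<Annotator> annotators) {
        Document doc = new Document(text);
        for (Annotator a : annotators) a.annotate(doc);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run("Hello world. It works!", List.of(SENTENCES, TOKENS));
        System.out.println(doc.layers.get("sentences")); // [Hello world., It works!]
        System.out.println(doc.layers.get("tokens"));    // [Hello, world, ., It, works, !]
    }
}
```

Because each annotator reads layers written by earlier ones, order matters: running TOKENS before SENTENCES in this sketch fails with a NullPointerException, which is the kind of dependency a real pipeline has to check when it is configured.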
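BioC (entry 3) is a minimalist XML format for text documents and their annotations. The sketch below builds one BioC-style document with the JDK's DOM and Transformer APIs. The element layout (collection with source, date, and key; document with id; passage with offset and text; annotation with an infon, a location, and text) reflects our understanding of the BioC DTD and should be checked against the official BioC.dtd before relying on it. The class name BioCSketch and all metadata values are hypothetical placeholders.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch: serialize one BioC-style document with a single annotated passage. */
class BioCSketch {
    /** Appends <name>text</name> to parent, or an empty element when text is null. */
    static Element child(Document d, Element parent, String name, String text) {
        Element e = d.createElement(name);
        if (text != null) e.setTextContent(text);
        parent.appendChild(e);
        return e;
    }

    /** One collection, one document, one passage, one annotation covering [start, start + length). */
    static String build(String docId, String passageText, int start, int length, String type) throws Exception {
        Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element collection = d.createElement("collection");
        d.appendChild(collection);
        child(d, collection, "source", "example");   // placeholder collection metadata
        child(d, collection, "date", "20240101");
        child(d, collection, "key", "example.key");
        Element document = child(d, collection, "document", null);
        child(d, document, "id", docId);
        Element passage = child(d, document, "passage", null);
        child(d, passage, "offset", "0");             // passage at offset 0, so annotation offsets index passageText
        child(d, passage, "text", passageText);
        Element annotation = child(d, passage, "annotation", null);
        annotation.setAttribute("id", "A1");
        child(d, annotation, "infon", type).setAttribute("key", "type");
        Element location = child(d, annotation, "location", null);
        location.setAttribute("offset", Integer.toString(start));
        location.setAttribute("length", Integer.toString(length));
        child(d, annotation, "text", passageText.substring(start, start + length));

        Transformer t = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(d), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build("doc1", "The BRCA1 gene", 4, 5, "gene"));
    }
}
```

Keeping the annotated span both as character offsets (location) and as literal text lets a consumer verify that offsets still line up with the passage, which is one way a simple format stays interoperable across tools.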
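Entry 8 describes maximum entropy modeling for classification tasks such as part-of-speech tagging. A conditional maximum entropy model over binary features has the same form as multinomial logistic regression: p(y | x) is proportional to exp(sum of w[y][f] over the active features f). The sketch below trains such a model by plain batch gradient ascent on the conditional log-likelihood. It is a hypothetical illustration, not the maxent package's API or its training algorithm (classic maxent toolkits often use iterative scaling or quasi-Newton optimizers); the class name, learning rate, iteration count, and toy features are assumptions.

```java
import java.util.*;

/** Hypothetical sketch: conditional maxent classifier over binary string features. */
class MaxentSketch {
    private final Map<String, Integer> featureIndex = new HashMap<>();
    private final List<String> labels = new ArrayList<>();
    private double[][] weights; // weights[label][feature]

    /** Batch gradient ascent on the conditional log-likelihood; no smoothing or regularization. */
    void train(List<Set<String>> examples, List<String> gold, int iterations, double learningRate) {
        for (int i = 0; i < examples.size(); i++) {
            for (String f : examples.get(i)) featureIndex.putIfAbsent(f, featureIndex.size());
            if (!labels.contains(gold.get(i))) labels.add(gold.get(i));
        }
        weights = new double[labels.size()][featureIndex.size()];
        for (int it = 0; it < iterations; it++) {
            double[][] gradient = new double[labels.size()][featureIndex.size()];
            for (int i = 0; i < examples.size(); i++) {
                double[] p = probabilities(examples.get(i));
                int y = labels.indexOf(gold.get(i));
                for (String f : examples.get(i)) {
                    int j = featureIndex.get(f);
                    // Observed count minus the count expected under the current model.
                    for (int k = 0; k < labels.size(); k++) gradient[k][j] += (k == y ? 1.0 : 0.0) - p[k];
                }
            }
            for (int k = 0; k < labels.size(); k++)
                for (int j = 0; j < featureIndex.size(); j++) weights[k][j] += learningRate * gradient[k][j];
        }
    }

    /** Softmax over label scores; features never seen in training are ignored. */
    double[] probabilities(Set<String> features) {
        double[] scores = new double[labels.size()];
        for (int k = 0; k < labels.size(); k++)
            for (String f : features) {
                Integer j = featureIndex.get(f);
                if (j != null) scores[k] += weights[k][j];
            }
        double max = Arrays.stream(scores).max().orElse(0.0); // subtract max for numerical stability
        double sum = 0.0;
        for (int k = 0; k < scores.length; k++) { scores[k] = Math.exp(scores[k] - max); sum += scores[k]; }
        for (int k = 0; k < scores.length; k++) scores[k] /= sum;
        return scores;
    }

    String predict(Set<String> features) {
        double[] p = probabilities(features);
        int best = 0;
        for (int k = 1; k < p.length; k++) if (p[k] > p[best]) best = k;
        return labels.get(best);
    }

    public static void main(String[] args) {
        List<Set<String>> xs = List.of(
            Set.of("suffix=ly", "word=quickly"), Set.of("suffix=ly", "word=slowly"),
            Set.of("suffix=ed", "word=jumped"), Set.of("suffix=ed", "word=walked"));
        List<String> ys = List.of("RB", "RB", "VBD", "VBD");
        MaxentSketch model = new MaxentSketch();
        model.train(xs, ys, 100, 0.5);
        System.out.println(model.predict(Set.of("suffix=ly", "word=happily"))); // prints RB
    }
}
```

At the optimum of the log-likelihood, the gradient term above is zero, meaning each feature's observed count equals its model expectation; this moment-matching condition is what characterizes the maximum entropy solution. Because no regularization is applied, weights keep growing on perfectly separable data such as this toy set.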
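MutationFinder (entry 9) extracts point-mutation mentions from free text. As a heavily simplified, hypothetical illustration of pattern-based extraction, the sketch below matches only the compact one-letter notation wild-type residue, position, mutant residue (for example A123G). It is not MutationFinder's rule set, which covers many more surface forms, and the precision and recall quoted in the entry apply only to the real system. The class and method names are assumptions.

```java
import java.util.*;
import java.util.regex.*;

/** Hypothetical sketch: find point mutations written as <wild-type><position><mutant>, e.g. "A123G". */
class PointMutationSketch {
    // The 20 standard amino acids in one-letter code.
    private static final String AA = "[ACDEFGHIKLMNPQRSTVWY]";
    private static final Pattern ONE_LETTER =
        Pattern.compile("\\b(" + AA + ")([1-9][0-9]*)(" + AA + ")\\b");

    static final class Mutation {
        final char wildType;
        final int position;
        final char mutant;
        Mutation(char wildType, int position, char mutant) {
            this.wildType = wildType;
            this.position = position;
            this.mutant = mutant;
        }
        @Override public String toString() { return "" + wildType + position + mutant; }
    }

    static List<Mutation> extract(String text) {
        List<Mutation> found = new ArrayList<>();
        Matcher m = ONE_LETTER.matcher(text);
        while (m.find()) {
            char wt = m.group(1).charAt(0);
            char mt = m.group(3).charAt(0);
            // Same residue before and after (e.g. "A12A") is not a substitution.
            if (wt != mt) found.add(new Mutation(wt, Integer.parseInt(m.group(2)), mt));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "The E6V substitution and the A123G variant were studied; see Figure S1 and T4 lysozyme.";
        for (Mutation mu : extract(text)) System.out.println(mu); // prints E6V, then A123G
    }
}
```

Even this single pattern shows where precision is lost in practice: tokens such as cell-line or figure identifiers can collide with the notation, which is why a production system layers many patterns and filters.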
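TXM (entry 10) offers concordances, collocate search, and frequency lists on top of the CQP search engine. To show what a concordance line is, the sketch below produces keyword-in-context (KWIC) lines over whitespace-separated tokens. It is a hypothetical, much simpler illustration that uses neither CQP nor TXM; the whitespace tokenization, case-insensitive matching, and window size are assumptions.

```java
import java.util.*;

/** Hypothetical sketch of a keyword-in-context (KWIC) concordance over whitespace tokens. */
class KwicSketch {
    /** Returns one line per occurrence: up to `window` tokens on each side, keyword in brackets. */
    static List<String> concordance(String text, String keyword, int window) {
        String[] tokens = text.trim().split("\\s+");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equalsIgnoreCase(keyword)) continue;
            String left = String.join(" ", Arrays.copyOfRange(tokens, Math.max(0, i - window), i));
            String right = String.join(" ",
                Arrays.copyOfRange(tokens, i + 1, Math.min(tokens.length, i + 1 + window)));
            lines.add((left + " [" + tokens[i] + "] " + right).trim());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "the cat sat on the mat and the dog sat on the cat";
        concordance(text, "sat", 2).forEach(System.out::println);
        // the cat [sat] on the
        // the dog [sat] on the
    }
}
```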
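CRGREP (entry 12) searches resources within resources, such as text in a document inside a zip archive. The recursion behind that idea can be sketched with the JDK's java.util.zip package alone: when an entry is itself a zip, descend into it; otherwise scan it line by line. This is a hypothetical sketch, not CRGREP's code; it treats only entries whose names end in ".zip" as archives, assumes every other entry is UTF-8 text, and the "!/" path separator is an arbitrary choice.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

/** Hypothetical sketch: grep for a literal string inside a zip, descending into nested zips. */
class NestedZipGrep {
    static List<String> search(InputStream in, String path, String needle) throws IOException {
        List<String> hits = new ArrayList<>();
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) continue;
            String entryPath = path + "!/" + entry.getName();
            byte[] content = zip.readAllBytes(); // stops at the end of the current entry
            if (entry.getName().endsWith(".zip")) {
                hits.addAll(search(new ByteArrayInputStream(content), entryPath, needle)); // recurse
            } else {
                String[] lines = new String(content, StandardCharsets.UTF_8).split("\n");
                for (int i = 0; i < lines.length; i++) {
                    if (lines[i].contains(needle)) hits.add(entryPath + ":" + (i + 1) + ": " + lines[i]);
                }
            }
        }
        return hits;
    }

    /** Builds an in-memory zip from entry name to bytes (used for the demo). */
    static byte[] zipOf(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> f : files.entrySet()) {
                out.putNextEntry(new ZipEntry(f.getKey()));
                out.write(f.getValue());
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] inner = zipOf(Map.of("notes.txt", "alpha\nneedle here\n".getBytes(StandardCharsets.UTF_8)));
        byte[] outer = zipOf(Map.of("inner.zip", inner, "readme.txt", "no match".getBytes(StandardCharsets.UTF_8)));
        search(new ByteArrayInputStream(outer), "outer.zip", "needle").forEach(System.out::println);
        // outer.zip!/inner.zip!/notes.txt:2: needle here
    }
}
```

Reading each nested archive fully into memory keeps the sketch short; a tool meant for arbitrary depth and large files would more likely stream entries instead.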
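TextProcessor (entry 19) lists stop-word removal, vocabulary generation, and a term-document frequency matrix among its preprocessing functions. A minimal, hypothetical sketch of those three steps in plain Java follows. It is not the package's API: the tokenizer (split on anything that is not a letter or digit), the tiny stop-word list, and all names are assumptions, and stemming is omitted.

```java
import java.util.*;

/** Hypothetical sketch: stop-word removal, vocabulary, and a term-document count matrix. */
class TermDocSketch {
    // Illustrative stop-word list only; real toolkits ship much larger ones.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "is", "on");

    /** Lowercases, splits on non-letter/non-digit runs, and drops stop words. */
    static List<String> tokens(String doc) {
        List<String> out = new ArrayList<>();
        for (String t : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) out.add(t);
        }
        return out;
    }

    /** Vocabulary in first-seen order, mapping each term to its row index. */
    static Map<String, Integer> vocabulary(List<String> docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String d : docs) for (String t : tokens(d)) vocab.putIfAbsent(t, vocab.size());
        return vocab;
    }

    /** matrix[term][doc] = number of occurrences of the term in the document. */
    static int[][] termDocMatrix(List<String> docs, Map<String, Integer> vocab) {
        int[][] m = new int[vocab.size()][docs.size()];
        for (int d = 0; d < docs.size(); d++) {
            for (String t : tokens(docs.get(d))) {
                Integer row = vocab.get(t);
                if (row != null) m[row][d]++;
            }
        }
        return m;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("The cat sat on the mat.", "A cat and a dog.");
        Map<String, Integer> vocab = vocabulary(docs);
        int[][] m = termDocMatrix(docs, vocab);
        for (Map.Entry<String, Integer> e : vocab.entrySet())
            System.out.println(e.getKey() + " " + Arrays.toString(m[e.getValue()]));
        // cat [1, 1]
        // sat [1, 0]
        // mat [1, 0]
        // dog [0, 1]
    }
}
```

A matrix like this is the usual input to topic models such as LDA or NMF, and each column can also be written out as one sparse feature line for tools that consume the LIBSVM format.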