Best Open Source Java Linguistics Software

Java Linguistics Software

Browse free open source Java Linguistics Software and projects below. Use the toggles on the left to filter open source Java Linguistics Software by OS, license, language, programming language, and project status.

1

WordNetSQL

WordNet database in various SQL formats

2 Reviews

Downloads: 42 This Week

Last Update: 2014-02-16
See Project
2

Wordcorr

Data management for comparative linguistics

Wordcorr automates the tedious and risky process of tabulating and managing the sound correspondences used in working out the historical development of natural languages. Initial support was from NSF.

4 Reviews

Downloads: 14 This Week

Last Update: 2013-01-05
See Project
3

sgmweka

Weka wrapper for the SGM toolkit for text classification and modeling.

Weka wrapper for the SGM toolkit for text classification and modeling. Provides Sparse Generative Models for scalable and accurate text classification and modeling for use in high-speed and large-scale text mining. Has lower time complexity of classification than comparable software due to inference based on sparse model representation and use of an inverted index. The provided .zip file is in the Weka package format, giving access to text classification. Other functions are usable through either Java command-line commands or class inclusion into Java projects.

Downloads: 15 This Week

Last Update: 2016-06-23
See Project
4

TXM

Unicode XML TEI text analysis platform

TXM is a free and open-source, cross-platform, Unicode- and XML-based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard-compliant web portal (GWT-based) with built-in access control. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full-text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, co-occurrence analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site: http://textometrie.ens-lyon.fr/?lang=en Read a full description at the TEI Tools wiki: http://wiki.tei-c.org/index.php/TXM

Downloads: 13 This Week

Last Update: 2024-12-09
See Project
5

srt-translator

Subtitle translator from one natural language to another.

Translates subtitles in SubRip format from one natural language to another. It is based on Google Translate, used without the API and therefore without payment. The translator has automatic and manual spell checkers.

Downloads: 11 This Week

Last Update: 2016-07-19
See Project
6

oopinyinguide

OO Pinyin Guide is a Java extension for OpenOffice 3 or higher. It enables the user to add pinyin transliteration over Chinese characters inside a text document. This tool can be useful for people learning or teaching Chinese.

3 Reviews

Downloads: 5 This Week

Last Update: 2013-04-29
See Project
7

LaBB-CAT

A linguistic annotation store

LaBB-CAT is a browser-based linguistics research tool that stores recordings and regular-expression-searchable text transcripts of interviews. The search results, entire transcripts, and media can be viewed or exported in a variety of formats.

Downloads: 7 This Week

Last Update: 13 hours ago
See Project
8

XML-Print

XML-Print: typesetting arbitrary XML documents in high quality

"XML-Print" is a joint project of the FH Worms (Prof. Marc W. Küster) and the University of Trier (Prof. Claudine Moulin) with support from TU Darmstadt (Prof. Andrea Rapp). Its goal is the creation of an XML formatter designed especially for the needs of the “Digital Humanities”. The project is funded by the DFG. Please visit https://sites.google.com/a/budabe.eu/xmlprint_de/kontakt and let us know what you think about XML-Print: Does it meet your expectations? What is missing? Do you use it regularly? Thank you.

1 Review

Downloads: 3 This Week

Last Update: 2017-01-05
See Project
9

Ghawwas_V4

An open source system for Arabic corpora processing

Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions:
a. Frequency lists for single words and N-grams
b. Concordance
c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient)
d. Lexical pattern search
e. Frequency profile comparison of two corpora based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient
f. Accepts Windows and UTF-8 character encodings
g. Accepts TXT, DOC, DOCX, RTF and HTML formats
h. Exports the processing results in CSV file format
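Several of the collocation measures named above can be computed directly from co-occurrence counts. The following is a minimal sketch, not Ghawwas's own code: it assumes the textbook definitions of MI (pointwise mutual information, base 2), Dice, and Log Dice (14 + log2 of Dice), and the class name, method names, and counts are hypothetical.

```java
public class CollocationSketch {

    // Base-2 logarithm helper.
    public static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // MI (pointwise mutual information): log2( f(x,y) * N / (f(x) * f(y)) ).
    // fxy = co-occurrence count, fx and fy = word counts, n = corpus size.
    public static double mutualInformation(long fxy, long fx, long fy, long n) {
        return log2((double) fxy * n / ((double) fx * fy));
    }

    // Dice coefficient: 2 * f(x,y) / (f(x) + f(y)).
    public static double dice(long fxy, long fx, long fy) {
        return 2.0 * fxy / (fx + fy);
    }

    // Log Dice: 14 + log2(Dice).
    public static double logDice(long fxy, long fx, long fy) {
        return 14 + log2(dice(fxy, fx, fy));
    }

    public static void main(String[] args) {
        // Hypothetical counts, chosen only for illustration.
        long fxy = 8, fx = 16, fy = 32, n = 1024;
        System.out.println("MI       = " + mutualInformation(fxy, fx, fy, n));
        System.out.println("Dice     = " + dice(fxy, fx, fy));
        System.out.println("Log Dice = " + logDice(fxy, fx, fy));
    }
}
```

With these counts the MI score works out to log2(16) = 4 and the Dice coefficient to 1/3. Other tools may use different smoothing or logarithm bases, so scores are not necessarily comparable across systems.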

1 Review

Downloads: 2 This Week

Last Update: 2018-12-09
See Project
10

Helsinki Finite-State Technology

The Helsinki Finite-State Transducer toolkit is intended for processing natural language morphologies. The toolkit is demonstrated by wide-coverage implementations of a number of languages of varying morphological complexity.

Downloads: 2 This Week

Last Update: 2017-09-14
See Project
11

Dutch sentiment analysis engine

A module for determining the sentiment of a piece of Dutch text

This application was developed by Incentro to satisfy requests by clients for a sentiment analyser for the Dutch language. It is currently in its alpha stage and we expect to have a beta release by November 2012. If you would like to help with the development or testing of this product, please contact us at +31[0]15 76 40 750 or info {at} incentro.com.

1 Review

Downloads: 1 This Week

Last Update: 2016-10-06
See Project
12

IceNLP

IceNLP is an open source Natural Language Processing (NLP) toolkit for analyzing and processing Icelandic text. The toolkit is implemented in Java.

1 Review

Downloads: 1 This Week

Last Update: 2018-04-13
See Project
13

JAVA Arabic Stemmer

A Java class that stems Arabic words

A Java Arabic stemmer based on Shereen Khoja's algorithm. This Java class offers a function called stemWrod, which takes an Arabic word and returns its stem.

1 Review

Downloads: 1 This Week

Last Update: 2013-05-30
See Project
14

TIES

A smart search engine for medical documents

TIES (Text Information Extraction System) is a clinical text search engine that uses Natural Language Processing techniques to extract medical concepts from free-text clinical reports. It provides secure de-identified access to this information and has built-in collaboration tools and honest broker functionality. It is licensed for academic use under the BSD license. For commercial use, please contact Nexi at http://nexihub.com NOTICE: This software and forum are no longer maintained as of 8/15/2019. You are free to continue to use this software for academic purposes under the BSD license.

1 Review

Downloads: 1 This Week

Last Update: 2019-09-09
See Project
15

GramLab

The Gramlab project aims to provide companies with free, open-source software tools that can be used by developers who are not specialists in language processing. Note: The GLabCorpus Manager tool requires a SolR server to be installed. To download it and for more information, please go to the Files section.

Downloads: 1 This Week

Last Update: 2016-03-10
See Project
16

HermeneutiX

Your graphical tool for Syntactic/Semantic Structure Analysis of texts

HermeneutiX is a tool for diagramming syntactic and semantic structures of complex (not necessarily foreign-language) texts (e.g. bible or other historical excerpts). HermeneutiX is now part of SciToS (the scientific tool set). Starting with version 2.0.0, HermeneutiX can be found on GitHub. Please check out the release summary: https://github.com/scientific-tool-set/scitos/releases For an introduction, check out this video: https://youtu.be/uQjewyG0Ad8 PS: To run a Java application such as HermeneutiX (i.e. SciToS) you need a Java Runtime Environment (JRE). HermeneutiX is currently built to be compatible down to JRE version 6. You may download the current JRE here: http://www.java.com/en/download

Downloads: 1 This Week

Last Update: 2017-09-28
See Project
17

OPTIMA cidoc-crm Semantic Annotation

Semantic annotation of archaeology reports with respect to CIDOC-CRM

The semantic annotation system OPTIMA is the result of Andreas Vlachidis's PhD work (supervised by Prof. Douglas Tudhope, University of Glamorgan, UK). OPTIMA performs the NLP tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and SKOS terminological resources (English Heritage Thesauri and Glossaries). The resulting semantic annotations are associated with classes of the (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) and its archaeological extension, CRM-EH. OPTIMA is also targeted at the detection and recognition of contextual relations between CRM entities. Such relations are modeled with respect to the CRM-EH archaeology extension. The pipeline targets the CIDOC-CRM entities E19.Physical_Object, E53.Place, E49.Time_Appellation and E57.Material, and the CRM-EH entities EHE1001.Context_Event, EHE1002.Production_Event, EHE1004.Deposition_Event and the P45.consists_of material property.

Downloads: 1 This Week

Last Update: 2015-10-11
See Project
18

TF-IDF Measure

TF-IDF.jar is a Java Archive file for measuring the TF-IDF of each document in a document collection (corpus). The jar can be used to (a) get all the terms in the corpus, (b) get the document frequency (DF) and inverse document frequency (IDF) of all the terms in the corpus, (c) get the TF-IDF of each document in the corpus, and (d) get each term with its frequency (number of occurrences), term frequency (TF) and TF-IDF in every document.
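The quantities listed above (DF, IDF, TF and TF-IDF) can be illustrated with a short sketch. This is not the API of TF-IDF.jar, which is not documented here. It assumes one common formulation (TF as count divided by document length, IDF as the natural log of N/DF), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class TfIdfSketch {

    // Term frequency: occurrences of the term divided by the document length.
    public static double tf(List<String> doc, String term) {
        if (doc.isEmpty()) return 0.0;
        long count = doc.stream().filter(term::equals).count();
        return (double) count / doc.size();
    }

    // Document frequency: number of documents that contain the term.
    public static int df(List<List<String>> corpus, String term) {
        int n = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) n++;
        }
        return n;
    }

    // Inverse document frequency: ln(N / DF); 0 when no document has the term.
    public static double idf(List<List<String>> corpus, String term) {
        int d = df(corpus, term);
        return d == 0 ? 0.0 : Math.log((double) corpus.size() / d);
    }

    // TF-IDF of a term in one document of the corpus.
    public static double tfIdf(List<String> doc, List<List<String>> corpus, String term) {
        return tf(doc, term) * idf(corpus, term);
    }

    public static void main(String[] args) {
        // Tiny pre-tokenized corpus, for illustration only.
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("the", "cat", "sat"),
            Arrays.asList("the", "dog", "barked"),
            Arrays.asList("the", "cat", "purred"));
        for (String term : Arrays.asList("the", "cat", "dog")) {
            System.out.println(term + ": DF=" + df(corpus, term)
                + " IDF=" + idf(corpus, term)
                + " TF-IDF(doc 0)=" + tfIdf(corpus.get(0), corpus, term));
        }
    }
}
```

A term that appears in every document (here "the") gets an IDF of 0, so its TF-IDF is 0 in every document. Real tools often add smoothing or use a different logarithm base.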

Downloads: 1 This Week

Last Update: 2015-12-17
See Project
19

iLastic

Query, integrate and manipulate data using natural languages.

iLastic is an open-source framework to query, integrate and manipulate any type of data in English. Extract, transform and merge information from the web, databases, files or any other data repository using a language you already know... English

Downloads: 1 This Week

Last Update: 2013-10-31
See Project
20

korpus

Corpus Linguistics Software

Software for corpus linguistics, including a corpus text editor, web-based search, etc. The project was created for the Belarusian Corpus but can be used for other languages with some adaptation.

Downloads: 1 This Week

Last Update: 2021-02-02
See Project
21

leXkit: Generic XML-based Dictionary CMS

leXkit is a client-server dictionary editing environment that makes editing easier for lexicographers, who do not need to be aware of technical issues. Entry meta-information is used to provide advanced functionality, such as context-dependent tasks.

Downloads: 1 This Week

Last Update: 2013-04-19
See Project
22

ARARSS

Downloads: 0 This Week

Last Update: 2019-01-01
See Project
23

Annoschemer

Annoschemer is a small tool for easy editing of MMAX2 annotation schemes.

Downloads: 0 This Week

Last Update: 2014-07-15
See Project
24

AraRooter

Find Arabic Root Word

Using Machine Learning, AraRooter finds the three-lettered root of any Arabic lemma with around 84% accuracy.

Downloads: 0 This Week

Last Update: 2013-06-19
See Project
25

Arabic Morphology& Sentacs coding

This project aimed to create a framework and binary data format for an etymological Arabic system. It will no longer be hosted on SourceForge because the terms of use designate me as an enemy, so I am prohibited from using SourceForge services.

Downloads: 0 This Week

Last Update: 2016-07-22
See Project