Page 2 | Best Open Source Linguistics Software 2026

Linguistics Software


1

BANNER Named Entity Recognition System

BANNER is a named entity recognition system intended primarily for biomedical text. It uses conditional random fields as the primary recognition engine and includes a wide survey of the best techniques described in recent literature.

Downloads: 0 This Week

Last Update: 2015-07-30
See Project
2

BioLemmatizer

Lemmatization tool for morphological analysis of biomedical literature

The BioLemmatizer is a domain-specific lemmatization tool for the morphological analysis of biomedical literature. It is tailored to the biological domain through integration of several published lexical resources related to molecular biology. It focuses on the inflectional morphology of English, including the plural forms of nouns, the conjugations of verbs, and the comparative and superlative forms of adjectives and adverbs. README: https://sourceforge.net/projects/biolemmatizer/files/

The BioLemmatizer 1.2 release adds optional functionality to normalize British English spellings into American English spellings and then retrieve the corresponding lemmas.

If you use the BioLemmatizer to support academic research, please cite the following paper: Haibin Liu, Tom Christiansen, William A. Baumgartner Jr., and Karin Verspoor. BioLemmatizer: a lemmatization tool for morphological processing of biomedical text. Journal of Biomedical Semantics 2012, 3:3.

Downloads: 0 This Week

Last Update: 2013-10-23
See Project
3

Board Game Language

Board Game Language (BGL, pronounced "bagel") is a natural language syntax programming language for first-time programmers. It uses board games as a metaphor for programming concepts, with the goal of teaching users the foundations of programming.

Downloads: 0 This Week

Last Update: 2014-06-23
See Project
4

C4 - Christian's C++ Code Collection

C4 is a C++ class library for analyzing sound files, particularly spoken and sung phonations. C4 provides features such as frequency analysis, pitch extraction, or calculation of voice quality parameters (e.g. alpha ratio, HNR, jitter, etc.).

Downloads: 0 This Week

Last Update: 2015-03-19
See Project
5

CHALICE

Connecting Historical Authorities with Links, Contexts and Entities. CHALICE is a historic placename gazetteer for the UK, published as Linked Data and linked to other widely-used sources of placename reference information on the semantic web.

Downloads: 0 This Week

Last Update: 2013-04-26
See Project
6

CRFSharp

CRFSharp is a .NET (C#) implementation of Conditional Random Fields

CRFSharp (aka CRF#) is a .NET (C#) implementation of Conditional Random Fields, a machine learning algorithm for learning from labeled sequences of examples. It is widely used in natural language processing (NLP) tasks such as word breaking, POS tagging, named entity recognition, and query chunking. CRF#'s main algorithm is the same as that of CRF++, written by Taku Kudo. It encodes model parameters using L-BFGS. It also offers significant improvements over CRF++, such as fully parallel encoding and optimized memory usage. When training on a corpus, CRF# makes full use of multi-core CPUs and uses very little memory compared with CRF++, and its memory use grows smoothly and slowly as the size of the training corpus and the number of tags increase. With multi-threaded processing, CRF# is better suited than CRF++ to training on large data and tag sets. For example, on a machine with 64 GB of memory, CRF# quickly encodes a model with more than 450 million features.

Downloads: 0 This Week

Last Update: 2015-08-03
See Project
7

Chaski

Distributed phrase-based machine translation training tool based on Hadoop.

Downloads: 0 This Week

Last Update: 2013-04-26
See Project
8

Communication Supporting System

Downloads: 0 This Week

Last Update: 2015-03-26
See Project
9

Communication Supporting System

Downloads: 0 This Week

Last Update: 2013-05-29
See Project
10

ConTextKit

ConTextKit is a Java-based implementation of Wendy Chapman's ConText algorithm for annotating the context of medical documents, specifically negation, temporality, and experiencer.

Downloads: 0 This Week

Last Update: 2014-06-24
See Project
11

CoocViewer

Viewer for co-occurrences and positional co-occurrences

A demo is available at: http://coocviewer.sourceforge.net/coocviewer/index.php

Downloads: 0 This Week

Last Update: 2013-11-08
See Project
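
The CoocViewer entry above does not say how co-occurrences are computed. As a rough illustration of the general idea only, the sketch below counts positional co-occurrences, meaning how often one word appears at a given offset from another. The function name `positional_cooccurrences`, the `window` parameter, and the sample sentence are assumptions made for this example and are not taken from CoocViewer.

```python
from collections import Counter


def positional_cooccurrences(tokens, window=2):
    """Count (word, neighbour, offset) triples within +/- window positions.

    Hypothetical helper for illustration; not CoocViewer's actual method.
    """
    counts = Counter()
    for i, word in enumerate(tokens):
        for offset in range(-window, window + 1):
            j = i + offset
            # Skip the word itself and positions outside the token list.
            if offset != 0 and 0 <= j < len(tokens):
                counts[(word, tokens[j], offset)] += 1
    return counts


tokens = "new york is a big city in new york state".split()
counts = positional_cooccurrences(tokens, window=1)
print(counts[("new", "york", 1)])
```

Summing the counts over all offsets for a word pair would give plain, non-positional co-occurrence counts.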
12

CorpSe

CORPSE (CORPus SEarch) is a powerful search engine written in Java. The aim is to provide an efficient implementation of a word level inverted index search with various cool functions that can be used on very large corpora.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-26
See Project
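
The CorpSe entry above mentions a word-level inverted index. A minimal sketch of that data structure, in Python rather than CorpSe's Java, might look like the following. The function names `build_index` and `search` and the sample documents are assumptions for illustration and do not reflect CorpSe's actual API.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        # Intersect posting lists so only documents with all words remain.
        result &= set(index.get(word, []))
    return sorted(result)


docs = ["the cat sat", "the dog sat", "a cat ran"]
index = build_index(docs)
print(search(index, "the sat"))
```

A lookup touches only the posting lists of the query words instead of scanning every document, which is why this structure is suited to very large corpora.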
13

DawNLITE

DawNLITE is a Natural-Language-based Image Transmoding Engine. The software transforms an image to a video as recorded by a virtual camera panning and zooming over the image, following a natural language text description of the image.

Downloads: 0 This Week

Last Update: 2013-04-18
See Project
14

Dictionary Additions Management System

Dictionary Additions Management System (DAMS) is a collection of open source translation dictionaries. These files are compatible with the Open Translation Engine (OTE). For more info, see http://sourceforge.net/projects/ote/

1 Review

Downloads: 0 This Week

Last Update: 2013-04-17
See Project
15

Donatus Parsing Tools for Portuguese

Donatus is an on-going project consisting of Python, NLTK-based tools and grammars for deep parsing and syntactical annotation of Brazilian Portuguese corpora. It includes a user-friendly graphical user interface for building syntactic parsers with the NLTK, providing some additional functionalities.

Downloads: 0 This Week

Last Update: 2016-08-28
See Project
16

Dutch sentiment analysis engine

A module for determining the sentiment of a piece of Dutch text

This application was developed by Incentro to satisfy requests by clients for a sentiment analyser for the Dutch language. It is currently in its alpha stage and we expect to have a beta release by November 2012. If you would like to help with the development or testing of this product, please contact us at +31[0]15 76 40 750 or info {at} incentro.com.

1 Review

Downloads: 0 This Week

Last Update: 2016-10-06
See Project
17

ELIA (eye-tracking for psycholinguistics)

ELIA (Eyegaze Language Integration Analysis) supports the analysis of eye-tracking data for studies in language processing. ELIA eases early analysis of data to enable iterative development of experiments in response to spoken language.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-24
See Project
18

ElixirFM

Functional Arabic Morphology

ElixirFM is a high-level implementation of Functional Arabic Morphology. The core of ElixirFM is written in Haskell, while interfaces in Python and Perl support lexicon editing and other interactions. http://github.com/otakar-smrz/elixir-fm

1 Review

Downloads: 0 This Week

Last Update: 2016-06-28
See Project
19

Encode Arabic

Encode Arabic provides tools for encoding and decoding Arabic in Haskell, Python, Perl, or LaTeX. Interprets the ArabTeX notation to generate original orthography or phonetic transcription. Supports Buckwalter and other romanizations. Converts legacy byte encodings into Unicode. http://github.com/otakar-smrz/encode-arabic

1 Review

Downloads: 0 This Week

Last Update: 2016-06-28
See Project
20

EyeMap - Eye Movement Data Analyzer

EyeMap is a visualization and analysis tool for eye movement data from text reading. It can process Unicode text, proportional and non-proportional fonts, and spaced and unspaced reading materials, so it supports a variety of languages and experimental methods.

1 Review

Downloads: 0 This Week

Last Update: 2013-08-10
See Project
21

FALCON - Text Search Java Project

JSON-based text search Java project

What is it?

Falcon Search is a Java API and tool for searching inside documents. It was originally started under the project "HAWK Search" to search the content of PDF files. Searching with this tool is query-based rather than word-based, unlike most document search tools or document readers. It also handles jumbled word order within a query and spelling mistakes. Techniques commonly used in this project include natural language processing, information extraction, and a question-answering architecture.

Latest version

Details of the latest version can be found on the project website: http://geekdadaji.com

Contact details

Creator: Swapnil A Jadhav (saj1919)
Email: dadajibudhau@gmail.com
Website: http://geekdadaji.com
License: CC BY-NC 4.0

Downloads: 0 This Week

Last Update: 2014-04-18
See Project
22

Genie

Genie is a highly sophisticated cognitive child-machine. Genie at its core is an artificial intelligence project, focusing on creating a new form of life.

Downloads: 0 This Week

Last Update: 2013-04-26
See Project
23

Genre Classification for SA languages

The goal of this project is to investigate optimal ways to do genre classification for the ten indigenous South African languages. Funded by Dept of Arts and Culture of the SA Government. http://www.trifonius.co.za/projects/genre-classification

Downloads: 0 This Week

Last Update: 2022-05-25
See Project
24

Google Translate PHP

Free Google Translate API PHP Package

A simple and effective PHP library for translating text using Google Translate without needing an API key. It allows developers to integrate real-time translation features into their applications with minimal setup and supports multiple languages, leveraging Google Translate’s unofficial endpoint.

Downloads: 0 This Week

Last Update: 2025-05-23
See Project
25

HAWK - PDF Text Search Java Project

This project is no longer supported; see FalconSearch instead

This project is no longer supported. See FalconSearch instead: https://sourceforge.net/projects/falcontextsearch/

Downloads: 0 This Week

Last Update: 2014-04-19
See Project