Open Source Windows Linguistics Software - Page 3

Linguistics Software for Windows

  • 1

    Ghawwas_V4

    An open source system for Arabic corpora processing

    Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions:
    a. Frequency lists for single words and N-grams
    b. Concordance
    c. Collocation (MI, Chi-squared, LL, T-score, Z-score, Dice, Log Dice, Weirdness Coefficient)
    d. Lexical pattern search
    e. Frequency profile comparison of two corpora based on MI, Chi-squared, LL, T-score, Z-score, Dice, Log Dice, Weirdness Coefficient
    f. Accepts Windows and UTF-8 character encodings
    g. Accepts TXT, DOC, DOCX, RTF and HTML formats
    h. Exports the processing results in CSV format
    Downloads: 4 This Week
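Several of the collocation measures named in the Ghawwas entry have widely used textbook definitions. The sketch below computes four of them (MI, T-score, Dice, Log Dice) from raw counts. It is only an illustration under stated assumptions: the function name `collocation_scores` and its parameters are hypothetical, and nothing here is taken from the Ghawwas source code, whose exact formulas may differ.

```python
import math


def collocation_scores(f_xy, f_x, f_y, n):
    """Score a word pair from raw counts (hypothetical helper, not Ghawwas code).

    f_xy: co-occurrence count of the pair
    f_x, f_y: individual counts of each word
    n: total number of tokens in the corpus
    """
    # Co-occurrence count expected if the two words were independent.
    expected = f_x * f_y / n
    # Pointwise mutual information, base 2.
    mi = math.log2(f_xy / expected)
    # T-score: observed minus expected, scaled by sqrt(observed).
    t_score = (f_xy - expected) / math.sqrt(f_xy)
    # Dice coefficient and its log-scaled variant (14 + log2 Dice).
    dice = 2 * f_xy / (f_x + f_y)
    log_dice = 14 + math.log2(dice)
    return {"mi": mi, "t_score": t_score, "dice": dice, "log_dice": log_dice}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(collocation_scores(f_xy=8, f_x=16, f_y=16, n=1024))
```

MI tends to favor rare pairs, while T-score favors frequent ones, which is one reason tools of this kind usually offer several measures side by side.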
  • 2
    DSL-KeyPad

    Tool for multilingual input on Latin & Cyrillic scripts and more.

    “DSL KeyPad” is a utility written in AutoHotkey 2.0 for inputting a wide range of characters using hotkeys and auxiliary functions. Its primary focus is on enhancing input capabilities for Latin and Cyrillic scripts, allowing typing in multiple languages without a separate keyboard layout for each language. It requires the common QWERTY (English US) or ЙЦУКЕН (Russian) keyboard layout. More than 4,700 Unicode characters are available. Additionally, it supports typing in Germanic runes, Glagolitic, Old Turkic, Old Permic, Phoenician, Carian, Lycian, Ugaritic, etc.
    Downloads: 7 This Week
  • 3
    Helsinki Finite-State Technology
    The Helsinki Finite-State Transducer toolkit is intended for processing natural language morphologies. The toolkit is demonstrated by wide-coverage implementations of a number of languages of varying morphological complexity.
    Downloads: 7 This Week
  • 4
    HanNanum - Korean POS Tagger
    HanNanum is a Korean morphological analyzer and POS tagger. The new Java version adopts a plug-in, component-based architecture for flexible use. It provides workflows for morphological analysis, POS tagging, noun extraction, etc. Contact: kschoi@kaist.ac.kr hjjeong@world.kaist.ac.kr
    Downloads: 2 This Week
  • 5
    mgiza has moved to GitHub: https://github.com/moses-smt/mgiza
    Downloads: 5 This Week
  • 6
    WordCount

    Count frequency of single, 2-word and 3-word clusters in a text

    The program can read a text file and count the occurrences of single words and clusters of 2 and 3 words. The resulting list will be sorted in descending order (highest frequency on top).
    Downloads: 5 This Week
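The counting described in the WordCount entry (single words plus 2-word and 3-word clusters, sorted by descending frequency) can be sketched in a few lines. This is a minimal illustration, not WordCount's own code: the function names, the lowercasing, and the `\w+` tokenization are all assumptions made for the example.

```python
import re
from collections import Counter


def count_clusters(text, n):
    """Count n-word clusters in text (hypothetical helper).

    Assumes lowercase folding and \\w+ tokenization; the real
    program may split words differently.
    """
    words = re.findall(r"\w+", text.lower())
    # Zip n staggered views of the word list to form sliding windows.
    windows = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows)


def sorted_counts(counter):
    """Return (cluster, count) pairs, highest frequency first, ties alphabetical."""
    return sorted(counter.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = "the cat sat on the mat the cat ran"
    for size in (1, 2, 3):
        print(size, sorted_counts(count_clusters(sample, size))[:3])
```

To process a file, read it first (for example with `open(path, encoding="utf-8").read()`) and pass the resulting string to `count_clusters`.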
  • 7

    Arabic Corpus

    Text categorization, Arabic language processing, language modeling

    The Arabic Corpus was compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora should cite the two main references:
    (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani (2011), Evaluation of Topic Identification Methods on Arabic Corpora, Journal of Digital Information Management, vol. 9, no. 5, pp. 185-192.
    (2) For the Khaleej-2004 corpus: M. Abbas, K. Smaili (2005), Comparison of Topic Identification Methods for Arabic Language, RANLP05: Recent Advances in Natural Language Processing, pp. 14-17, 21-23 September 2005, Borovets, Bulgaria.
    More useful references: https://sites.google.com/site/mouradabbas9/corpora
    Downloads: 4 This Week
  • 8

    BuckTagger

    User-assisted tool for Arabic stem entry to Buckwalter Morpho Analyzer

    Using rules written in a Drools decision table, BuckTagger determines the correct Buckwalter tag based on morphological properties of the input, either extracted automatically or given by the user. At the moment, BuckTagger is not complete; it can only handle input that is:
    - Uninflected
    - In lexical form, i.e., no clitics or affixes
    - A perfect or imperfect verb
    - Preferably diacritized/vocalized on the first and second-to-last letters
    The interface is in Arabic. See the README for more details. There is much room for development. Feel free to comment.
    Downloads: 4 This Week
  • 9

    TEI LingSIG

    Production space for the TEI Linguistics SIG

    This used to be the experimentation and production space for the Special Interest Group (SIG) of the Text Encoding Initiative (TEI) called "TEI for Linguists", LingSIG for short. Currently, this is a storage place for documents produced by the SIG. Use https://github.com/LingSIG to access the current production space.
    Downloads: 4 This Week
  • 10
    ALECSO Spell Checker

    Arabic Spell Checker by ALECSO

    Arabic Spell Checker by ALECSO (the Arab League Educational, Cultural and Scientific Organization), based on Hunspell.
    Downloads: 2 This Week
  • 11
    Aligns parallel corpora at the sentence level.
    Downloads: 2 This Week
  • 12
    Cunei is a data-driven machine translation system that builds dynamic, statistical models based on instances of known translations found in a corpus.
    Downloads: 2 This Week
  • 13
    Grammar-multi is most useful for languages whose words have many forms (more highly inflected languages) and in which grammatical agreement (and other syntactic connections) within a sentence is more important and more apparent. Help from linguists is needed. The program is not meant for everyday use, but to show that Grammar works. If you would like a Grammar version for your language, tell me.
    Downloads: 2 This Week
  • 14
    JInsect
    The JINSECT toolkit is a Java-based toolkit and library that supports and demonstrates the use of n-gram graphs within Natural Language Processing applications, ranging from summarization and summary evaluation to text classification and indexing.
    Downloads: 1 This Week
  • 15
    Phần mềm Dịch tiếng Anh

    Translate English-Vietnamese & Dictionary - Online & Free

    English Vietnamese Translator and Dictionary
    - Translate text from Word, PDF, websites...
    - Translate text from images, videos, programs...
    Free English-Vietnamese translation software. You can translate text directly on any website, or enter the text to be translated. For accurate results, translate phrase by phrase or sentence by sentence. While browsing the web, just double-click a word or select a passage with the mouse to see the translation. The software can translate English to Vietnamese or Vietnamese to English. Requirement: .NET Framework 4.5.2 or later. Website: https://vikitranslator.com
    Downloads: 1 This Week
  • 16

    TML - Text Mining Library for LSA & CMM

    TML is a Java Library for LSA and extracting Concept Maps from text

    TML has moved to http://www.villalon.cl/tml.html and the code to https://github.com/villalon/tml
    Downloads: 1 This Week
  • 17
    Better PO Editor is an editor for .po files, which are used to generate the compiled gettext .mo files that many programs and websites use to localize their user interface. It offers great features... It's worth a try! PLEASE NOTE: the project has moved to GitHub: see https://github.com/mlocati/betterpoeditor/releases
    Downloads: 2 This Week
  • 18
    This is a database of Arabic roots and their derivatives in voweled and unvoweled forms, along with stems. The database is extracted from the well-known classical Arabic dictionary "تاج العروس من جواهر القاموس".
    Downloads: 3 This Week
  • 19
    Based on the Buckwalter Morphological Analyzer (Version 1.0) for doing Arabic stemming and POS tagging. Includes a rewrite of the original Perl script, with better documentation and more flexible options, and a C++ interface (usable as a library or app).
    Downloads: 3 This Week
  • 20
    Color to Word

    Turn colors into words

    The program turns a color into a list of 10 words, obtained with a custom-designed algorithm based on letter shape and position in the alphabet.
    - Click inside the frame on the left to pick a color through the color chooser window
    - The program matches the color against the colors corresponding to a list of all the English words contained in the file wordcolor.txt
    - The first 10 matches appear in the frame on the right
    - Right-click - Copy to copy the word matches and the RGB values
    This version comes with a text file (wordcolor.txt) containing all the English words followed by the Red, Green, Blue channel values for the corresponding color. The colors were obtained through a modified version of the program "Text to Color" by the same author, available for download on GitHub and SourceForge on the profile page of Fonazza-Stent. The next version (coming soon) will include a tool to convert a custom word list into a word+color list named wordcolor.txt
    Downloads: 3 This Week
  • 21

    KSUCCA Corpus

    A 50-million-token corpus of Classical Arabic.

    King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50-million-token annotated corpus of Classical Arabic texts from the pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until the early eleventh century CE), which is the period of pure Classical Arabic. The main aim of this corpus is to support the study of the distributional lexical semantics of the words of the Quran. However, it can be used for other research purposes, such as:
      • Arabic linguistics, which includes lexical, morphological, syntactic, semantic and pragmatic research.
      • Arabic computational linguistics, which includes lexical, morphological, syntactic, semantic and pragmatic research, including their various applications.
      • Arabic language teaching for both Arabs and non-Arabs.
      • Artificial intelligence.
      • Natural language processing.
      • Information retrieval.
      • Question answering.
      • Machine translation.
    Downloads: 3 This Week
  • 22

    Linguistic Analyzer

    The Linguistic Analyzer is a tool for corpus analysis and comparison

    The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud Islamic University. It can be used for corpus analysis and comparison in terms of several linguistic features, such as frequency list generation, concordances, collocation extraction, the difference between two words, and keyword identification.
    Downloads: 3 This Week
  • 23

    NetBeans Dictionaries

    Additional dictionary files for the NetBeans spellchecker.
    Downloads: 3 This Week
  • 24
    PronunDict

    a pronunciation dictionary of American English

    PronunDict is both a reverse phonetic dictionary (searching by pronunciation) and a standard one to search by spelling. Pronunciation is transcribed with IPA symbols. It runs on Windows, and should also work with Wine on Linux and macOS.
    NEW: PronunDict for French project page: https://sourceforge.net/projects/pronundict-french/
    Acknowledgement: This app uses two external dictionaries (bundled with it):
    1. AmEPD: the American English Pronunciation Dictionary by Reece H. Dunn (https://github.com/rhdunn/amepd)
    2. CMUdict: the Carnegie Mellon University Pronouncing Dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict)
    Warning: Unfortunately, this dictionary is not flawless. There are some errors in the dictionary entries. Also, marking of stressed syllables does not always work perfectly, because syllable boundaries are only guessed based on a set of possible syllable onsets.
    You can follow PronunDict on Twitter: https://twitter.com/PronunDict
    Downloads: 3 This Week
  • 25
    Substitution Cipher Toolkit

    Substitution cipher toolkit (encryption/decryption + automatic cracking)

    This substitution cipher toolkit enables you to encrypt and decrypt texts with a substitution cipher, to gather statistics for a specific language, and to crack encrypted texts both manually and automatically. All functions can be accessed via an easy-to-use graphical user interface.
    Downloads: 3 This Week
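For readers unfamiliar with the technique behind the Substitution Cipher Toolkit entry, the sketch below shows a simple monoalphabetic substitution cipher and the letter-frequency statistics that manual or automatic cracking typically starts from. It is an illustration only, not the toolkit's code: the function names, the example key, and the restriction to the 26 ASCII letters are all assumptions.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def make_tables(key):
    """Build encryption and decryption tables from a 26-letter key (hypothetical helper)."""
    if sorted(key) != list(ALPHABET):
        raise ValueError("key must be a permutation of a-z")
    plain = ALPHABET + ALPHABET.upper()
    cipher = key + key.upper()
    # Characters outside the tables (spaces, punctuation) pass through unchanged.
    return str.maketrans(plain, cipher), str.maketrans(cipher, plain)


def letter_frequencies(text):
    """Relative frequency of each ASCII letter, most common first."""
    letters = [ch for ch in text.lower() if ch in ALPHABET]
    if not letters:
        return {}
    total = len(letters)
    return {ch: count / total for ch, count in Counter(letters).most_common()}


if __name__ == "__main__":
    # Example key chosen arbitrarily for illustration.
    encrypt, decrypt = make_tables("qwertyuiopasdfghjklzxcvbnm")
    secret = "Hello, World".translate(encrypt)
    print(secret)                     # ciphertext
    print(secret.translate(decrypt))  # round trip back to the plaintext
    print(letter_frequencies(secret))
```

Because each plaintext letter always maps to the same ciphertext letter, the frequency profile of the ciphertext mirrors that of the underlying language, which is why language statistics are useful for cracking.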