Showing 39 open source projects for "arabic corpus csv"

  • 1
    Dawarich

    Self-hostable alternative to Google Timeline

    Dawarich is a self-hostable web application for storing and visualizing your location history, offered as an alternative to Google Timeline.
    Downloads: 13 This Week
  • 2
    PDF Bookmark Extractor Arabic

    Extract PDF bookmarks to CSV files

    This program extracts bookmarks from PDF files and saves them to a CSV file that can be opened in Excel. The file iepdf32.dll must be downloaded and placed in the same folder as the program.
    Downloads: 0 This Week
  • 3
    OmegaT - multiplatform CAT tool

    The free computer aided translation (CAT) tool for professionals

    OmegaT is a free and open source multiplatform Computer Assisted Translation tool with fuzzy matching, translation memory, keyword search, glossaries, and translation leveraging into updated projects.
    Downloads: 1,557 This Week
  • 4
    TextSeek

    Professional full-text desktop search tool

    TextSeek is a professional full-text desktop search tool. Unlike filename search tools such as Everything and Listary, TextSeek searches both filenames and file contents quickly. It supports PDF, Word, Excel, PowerPoint, RTF, and other formats. The software runs directly, with no extra packages to install.
    Downloads: 6 This Week
  • 5
    Downloads: 0 This Week
  • 6

    Linguistic Analyzer

    The Linguistic Analyzer is a tool for corpus analysis and comparison

    The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud Islamic University. It can be used for corpus analysis and comparison in terms of several linguistic characteristics, such as frequency list generation, concordances, collocation extraction, the difference between two words, and keyword identification.
    Downloads: 0 This Week
  • 7
    43 queries on various topics for the Information Retrieval Collection. The corpus is built from the OSAC corpus of journalistic texts, consisting of 4763 articles retrieved from Arabic BBC News. https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/
    Downloads: 0 This Week
  • 8
    This corpus contains 10 essays comprising 752 sentences (4,160 words in total). The essays were selected from different collections of partially or fully diacritized Arabic texts, all of which are available in the Tashkeela corpus. Texts in this corpus have been used in the evaluation of the AGD checker. There are two types of texts in this corpus: (1) texts without inserted errors, used to evaluate AGD at detecting and correcting errors that are not known before the checking process; (2) texts with inserted errors, used to evaluate AGD's ability to discover errors inserted into entirely correct essays.
    Downloads: 0 This Week
  • 9

    KSUCCA Corpus

    A 50-million-token corpus of Classical Arabic.

    The King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50-million-token annotated corpus of Classical Arabic texts from the pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until the early eleventh century CE), which is the period of pure Classical Arabic. The main aim of this corpus is to support the study of the distributional lexical semantics of the words of the Quran.
    Downloads: 10 This Week
  • 10
    Vehicle Weighbridge Software

    Vehicle Weighing Software

    Veighsoft is weighbridge application software for vehicle weight management in mining, logistics, industrial plants, ports, and roadways, for both private and roadway use. It can be used with all types of full weighbridges, axle weighbridges, and wheel weighing pads. It can transfer data to a cloud, shared, or main server database in online and offline mode.
    Downloads: 0 This Week
  • 11

    Arabic Corpus

    Text categorization, Arabic language processing, language modeling

    The Arabic Corpus, compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M. ...
    Downloads: 3 This Week
  • 12

    Queries for OSAC (Arabic) Corpus

    43 Queries for Arabic Information Retrieval Collection

    43 queries on various topics for the Information Retrieval Collection. The corpus is built from the OSAC corpus of journalistic texts, consisting of 4763 articles retrieved from Arabic BBC News. https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/
    Downloads: 0 This Week
  • 13

    Ghawwas_V4

    An open source system for Arabic corpora processing

    ...Accept Windows and UTF-8 character encodings; g. Accept TXT, DOC, DOCX, RTF and HTML formats; h. Export the processing results in CSV file format
    Downloads: 8 This Week
  • 14
    Tashkeela: Arabic diacritization corpus

    Tashkeela: Arabic diacritization corpus (vocalized texts)

    Tashkeela: Arabic diacritization corpus, a resource of vocalized Arabic texts. Contains vocalized Arabic text in plain-text format; 75.6 million words. Please cite this resource as: T. Zerrouki, A. Balla, Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems, Data in Brief (2017), http://dx.doi.org/10.1016/j.dib.2017.01.011
    Downloads: 3 This Week
  • 15

    groundcad

    CAD 2D for Land Surveying and Civil

    Operating systems: Windows, Linux, Mac OS. 2D drawing: point, line, polygon, rectangle, circle, ellipse, text, and image, with thickness, color, hatch... Drawing commands: rotation, scale, explode... Labeling of point, line, polygon, rectangle, circle (name, code, comment, XYZ, length, angle, area, radius...). Area calculation by points or by object. Import/export module: images (bmp, jpg, png), CSV, TXT, SVG, DXF R12, SOKKIA SDR33, TOPCON FC4 GTS7, XML, LANDXML, LEICA GSI8 GSI16
    Downloads: 5 This Week
  • 16
    DatacenterManager

    UNIX Performance Monitoring / Trend Analysis Java Software

    Remotely inventory and poll UNIX servers in seconds, without installing extra software on your servers, using only SSH and plain old UNIX commands. https://sites.google.com/site/ronuitzaandam/ Your entire datacenter can be automatically inventoried by supplying a hostname, username, and password for each server, either one by one or via an automated CSV host-list import file. This software works well alongside other tools such as WinSCP and PuTTY.
    Downloads: 0 This Week
  • 17

    PADIC

    A multilingual Parallel Arabic DIalectal Corpus

    PADIC (Parallel Arabic DIalectal Corpus) is a multi-dialectal corpus built in the framework of the National Research Project "TORJMAN", led by the Scientific and Technical Research Center for the Development of Arabic Language and funded by the Algerian Ministry of Higher Education and Scientific Research. PADIC covers 6 dialects (two Algerian dialects, from the cities of Algiers and Annaba, plus Palestinian, Syrian, Tunisian, and Moroccan) and MSA.
    Downloads: 2 This Week
  • 18

    Arabic business corpora

    Arabic business and management corpus

    This corpus is made up of 3 subcorpora: 1) Management corpus: 400 articles by chairmen and CEOs of Arabic companies in the Middle East. 2) Economics news: 400 news articles from different Arabic online newspapers. 3) Stock market news: 400 articles collected from investing.com. The main corpus contains 1200 articles. The articles have been tagged using the Stanford Arabic Part of Speech Tagger.
    Downloads: 0 This Week
  • 19

    Classical Arabic Corpus

    A corpus containing more than 1 M distinct Arabic words.

    This project was developed as part of a master's thesis titled "Edit Distance Adapted to Natural Language Words". The project consists of three parts. First, the corpus, which gathers more than one million distinct Arabic words. Second, the text files of the Arabic resources. Third, the index file, which presents some information about these resources. Additional details about these parts are available in the README file.
    Downloads: 0 This Week
  • 20
    Osman Arabic Text Readability

    Open Source tool for Arabic text readability

    We present OSMAN (Open Source Metric for Measuring Arabic Narratives), a novel open source Arabic readability metric and tool. The open source Java tool allows users to calculate readability for Arabic text (with and without diacritics). The tool provides methods to split the text into words and sentences and to count syllables, Faseeh letters, and hard and complex words, in addition to adding diacritics (vocalizing text). This makes the tool useful for researchers and educators working with Arabic text....
    Downloads: 0 This Week
  • 21
    The AFEWC corpus is a multilingual collection of comparable text articles in Arabic, French, and English. The three articles in each triple cover the same topic (aligned at the article level). The AFEWC corpus is collected from Wikipedia and is available for free for research purposes only. It is composed of 40K aligned articles, 91.3M English words, 57.8M French words, 22M Arabic words, 2.8M English unique words, 1.9M French unique words, and 1.5M Arabic unique words. ...
    Downloads: 0 This Week
  • 22
    Pootle, Virtaal & Translate Toolkit

    Localization tools built by localizers for localizers

    Tools for localization: - Pootle: web based translation management system. - Virtaal: Computer Aided Translation (CAT) tool. - Translate Toolkit: QA, format conversion and support (PO, Java .properties, OpenOffice, Mozilla, XLIFF, TMX, TBX, CSV, Qt .ts).
    Downloads: 54 This Week
  • 23
    phpMyAdmin

    A software tool to bring MySQL to the Web

    phpMyAdmin is a tool written in PHP intended to handle the administration of MySQL over the Web. Currently it can create and drop databases, create/drop/alter tables, delete/edit/add columns, execute any SQL statement, manage indexes on columns.
    Downloads: 1 This Week
  • 24

    DBF converter (unicode support)

    DBF loading and converting, supporting Persian and Arabic Unicode text

    There are several small C# applications for importing and exporting DBF (FoxPro database) files, but none of them supports Farsi or Arabic Unicode text, so I extended one of them to add it. It has one known drawback: it cannot convert Farsi Unicode text to the DBF format as well as needed. The whole project is written in C# and the solution is attached here, so you can change and modify it.
    Downloads: 0 This Week
  • 25

    Arabic Named Entity Gazetteer

    Arabic Named Entity Gazetteer

    ...To use this corpus, please cite the following publication: F. Alotaibi and M. Lee, "Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia", in Proceedings of IJCNLP, pp. 392-400, Nagoya, Japan, October 2013. Author URLs: http://www.cs.bham.ac.uk/~fsa081/index.html and http://fsalotaibi.kau.edu.sa Email: fsalotaibi {AT} kau.edu.sa, fsa081 {AT} cs.bham.ac.uk
    Downloads: 0 This Week