Showing 33 open source projects for "indexing chinese texts"

  • 1
    Anna’s Archive

    Comprehensive search engine for books, papers, comics, magazines

    Anna’s Archive is a large-scale open-source search engine and data aggregation platform designed to index and provide access to a vast collection of books, academic papers, comics, magazines, and other digital texts through a unified interface. The project includes all the infrastructure required to run a full instance locally or in production, combining web servers, databases, and search indexing systems into a scalable architecture. It relies heavily on technologies such as Elasticsearch for search functionality and MariaDB for structured data storage, enabling fast and efficient querying across massive datasets. ...
    Downloads: 114 This Week
  • 2
    Text Search Engine

    A text search engine that supports mixed Chinese and English search

    Text-Search-Engine is a JavaScript-based lightweight search engine that enables full-text search functionality. It allows developers to implement fast search indexing and retrieval in web applications.
    Downloads: 0 This Week
  • 3
    shuyuan

    Reading book source

    shuyuan is a project oriented around reading and knowledge consumption, especially targeting large-scale text content such as books, articles, or educational material. The name suggests “academy” or “study hall,” and the tool aims to help users ingest, organize, and manage reading content — possibly offering features like text parsing, annotation, metadata generation, translation, or storage for later reference. The repository is set up to support document ingestion, indexing, and maybe some...
    Downloads: 0 This Week
  • 4
    PageIndex

    Document Index for Vectorless, Reasoning-based RAG

    PageIndex is an innovative open-source framework that reimagines retrieval-augmented generation (RAG) by eliminating conventional vector similarity search and instead building hierarchical semantic indexes that mirror a document’s natural structure. Rather than chunking text and embedding it into a vector database, PageIndex constructs a tree-structured index — similar to a detailed, AI-enhanced table of contents — that a large language model can traverse to locate the most relevant sections...
    Downloads: 0 This Week
  • 5
    Awesome English Ebooks

    Curated list of freely available English-language magazine issues

    awesome-english-ebooks is a curated list that collects high-quality, English-language ebooks across programming, computer science, mathematics, and related technical domains. The repository organizes links by topic and technology so learners can quickly find foundational texts, deep dives, and practical handbooks relevant to their goals. Entries often include notes about edition, format, or prerequisite knowledge, helping readers gauge where a book fits in a learning path. Because it lives...
    Downloads: 0 This Week
  • 6
    AnyTXT Searcher

    A powerful desktop full-text search engine, like a local Google.

    AnyTXT Searcher is a desktop full-text search application for fast document retrieval. It works like a Google search engine for your local disks and is much faster than Windows Search. A built-in document parsing engine extracts the text of commonly used file formats without requiring any other software, and a built-in high-speed indexing system stores the metadata of the...
    Downloads: 5,366 This Week
  • 7

    askimo

    AI desktop app with local RAG, privacy-first, multi-model support

    Askimo is an open-source, privacy-first AI desktop application designed to help users work with multiple AI models from a single, consistent interface. It supports popular AI providers such as OpenAI, Anthropic (Claude), Gemini, Ollama, LocalAI, Docker AI, LM Studio, and xAI, allowing users to switch models easily without vendor lock-in. A core feature of Askimo is Retrieval-Augmented Generation (RAG). Users can connect the app to local files, documents, and project folders so the AI can...
    Downloads: 18 This Week
  • 8
    San-Libs

    Fixes scanner errors

    San-Libs fixes the device-detection error that occurs when the scanner driver is already installed on Linux systems that use the SANE backends. The San-Libs utility verifies the SANE libraries and sets up the configuration files so that your scanner is detected.
    Downloads: 12 This Week
  • 9
    Qwen2.5

    Open source large language model by Alibaba

    Qwen2.5 is a series of large language models developed by the Qwen team at Alibaba Cloud, designed to enhance natural language understanding and generation across multiple languages. The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation...
    Downloads: 29 This Week
  • 10
    MyBox

    Easy tools for PDF, images, files, networks, data, and media

    Topics: javafx-desktop-apps, pdf, image, ocr, icc, barcode, color-palette, text, bytes, markdown, html, archive, compress, digest, video, audio, editor, converter, media.

    Repository: https://github.com/Mararsh/MyBox

    Self-contained packages need neither a Java environment nor installation. Jar packages need Java 16 or higher.
    Downloads: 0 This Week
  • 11
    Math Model

    Code, resources, and templates for mathematical modeling

    Math_Model is a repository collecting resources, code, and algorithm templates for mathematical modeling and modeling competitions (e.g., Chinese modeling contests and US undergraduate modeling competitions). It includes LaTeX templates for writing solutions, records of past contest problems and winning solutions, and MATLAB (.m script) implementations of optimization, intelligent algorithms, numerical methods, and model frameworks. In effect, it is a curated library of modeling code, papers,...
    Downloads: 0 This Week
  • 12
    PyCAPGE

    PyCAPGE - Python Classic Adventure Point and Click Game Engine

    PyCAPGE (Python Classic Adventure Point and Click Game Engine) is a versatile, open-source framework for creating retro-style 2D graphic adventures with Python and Pygame. Inspired by the golden age of SCUMM games, it features a customizable 9-verb interface and robust inventory management. Key features include a scene manager supporting parallax scrolling, walk-behind masks, and depth-based character scaling. It implements intelligent pathfinding to navigate complex...
    Downloads: 0 This Week
  • 13
    Tria Sistema Operatiu

    Helps you find, download & burn the best Operating System for any PC

    Tria O.S. detects the hardware and software specs of the PC it runs on. You can load this information or enter it manually, and the program tells you which operating system would be best for that PC and its hardware. You can then see how the result changes if you add RAM, add an SSD, change the graphics card, etc. REQUIREMENTS: On Linux you need Gambas 3.3 or later, so install the gambas3 package before installing Tria. The 90's EDITION needs 160 MB (192 MB on...
    Downloads: 7 This Week
  • 14

    Syntax Untangler

    Teach your students how to figure out tricky texts in any language.

    A Web-based activity that asks the learner to visually mark up a short primary text in any language in order to improve small-scale reading skills. Students get instant feedback on their actions. Instructors use a Web-based authoring interface to write and publish their content and questions in any language (Unicode).
    Downloads: 0 This Week
  • 15
    cocoNLP

    A Chinese information extraction tool

    cocoNLP is a lightweight natural-language processing toolkit geared toward practical information extraction from raw text, especially for Chinese and mixed Chinese–English content. Instead of requiring a heavy pipeline, it focuses on quick wins such as extracting names, places, organizations, emails, phone numbers, and dates directly from unstructured sentences. The project blends pattern-based methods with NLP heuristics, giving developers dependable results for real-world texts like chats, comments, and user-generated content. ...
    Downloads: 0 This Week
  • 16
    OpenSearchServer Search Engine

    An open source search engine with a RESTful API and crawlers

    OpenSearchServer is a powerful, enterprise-class search engine. Using the web user interface, the crawlers (web, file, database, etc.), and the client libraries (REST API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, indexing of documents (PDF, Office, etc.), web scraping, etc. OpenSearchServer runs on...
    Downloads: 0 This Week
  • 17
    Riot search

    Open-source, distributed, simple, and efficient search engine written in Go

    An open-source, distributed, simple, and efficient full-text search engine written in Go. Efficient indexing and search: 1M microblog posts (500 MB of data) are indexed in 28 seconds, with a 1.65 ms search response time and 19K search QPS. Supports logical (boolean) search. Supports Chinese word segmentation (concurrent segmentation with the gse package at 27 MB/s). Supports computing how close the keywords appear to one another in a text (token proximity). Supports BM25 relevance scoring.
    Downloads: 0 This Week
  • 18
    jspforum-simple

    A simplified forum application based on Java EE: struts+spring+ibatis

    A simplified forum application built on a full, professional Java EE stack: Struts + Spring + iBATIS (SSB) / Hibernate (SSH). It is implemented as a prototype web forum / BBS and provides all the functionality. Note: the WAR file can be imported directly into Eclipse or other Java EE IDEs, and it includes all the source code. IMPORTANT: when launching the app on the server, please reconfigure the fields in log4j.properties +...
    Downloads: 0 This Week
  • 19
    Marcion

    The study environment of ancient languages (Coptic, Greek, Latin)

    Marcion is software that provides a study environment for ancient languages (especially Coptic, Greek, and Latin), along with many tools and resources (dictionaries, grammars, texts). Although Marcion focuses on the study of Gnosticism and early Christianity, it is a universal library that works with various file formats and lets you collect, organize, and back up texts of any kind. Overview of Gnostic sources in the Coptic language delivered with Marcion: Nag Hammadi Library; Berlin Codex; Codex...
    Downloads: 45 This Week
  • 20

    Arabic Desktop Search Engine

    desktop search engine

    This is a desktop search engine that targets Arabic but also works with other languages. It uses Lucene.Net to index and search HTML documents and was developed with Visual Studio 2013. Source code: http://www.mediafire.com/download/p3lcez1h93pcpd8/ArDesktopSearch_SourceCode.7z The application strips Arabic diacritics when indexing HTML files, and it can highlight matching text, with or without diacritics, using the EasyMark highlighter JavaScript plugin in an embedded browser that displays the search results.
    Downloads: 0 This Week
  • 21
    Piwigo

    Open source photo library software

    Piwigo is an online photo gallery and media library software. It comes with powerful features for organizing, sharing and publishing your media files to the web. Organize your content with unlimited albums and sub-albums, tags and other indexing fields. Manage users and permissions, create a private or public web gallery. Customize your gallery with themes. Extend the features with plugins. Manage your digital content easily: photos, videos, audio files and more! Download latest...
    Downloads: 0 This Week
  • 22
    HotShots

    A screenshot and annotation tool

    HotShots is an application for capturing screens and saving them in a variety of image formats, as well as adding annotations and graphical data (arrows, lines, text, ...).
    Downloads: 6 This Week
  • 23
    FirteX

    FirteX is a high-performance, full-featured text indexing and retrieval platform. It provides a flexible and practical experimental platform for researchers, as well as a scalable platform for Web search development. It is very fast and offers good support for Chi...
    Downloads: 0 This Week
  • 24
    Wukong

    Highly customizable full-text search engine

    Efficient indexing and searching: 1M Weibo posts (500 MB of data) are indexed in 28 seconds, with a 1.65 ms search response time and 19K search QPS. Supports Chinese word segmentation (concurrent segmentation with the sego package at 27 MB/s). Supports computing how close the keywords appear to one another in a text (token proximity).
    Downloads: 0 This Week
  • 25
    QuickClip

    Recording your clipboard activity has never been so easy.

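    QuickClip automatically saves everything you put on the clipboard, so you can easily clip any text, pictures, or even screenshots (using the Print Screen key) and access them later. Choose "Copy" in any application, or press Ctrl+C or Print Screen, and the text and images on the clipboard are saved automatically. The program is portable and needs no plug-ins, so it can be carried on a USB flash drive. Note: only some of the planned features were implemented before the author's winter break ended; unimplemented features are unavailable in the program. In addition, the program does no error handling for the configuration file QuickClip.ini, so please don't tamper with it, or the program may crash. If an error occurs, delete QuickClip.ini to restore the default settings.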
    Downloads: 0 This Week
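Several entries above (Text Search Engine, Riot search, Wukong) advertise full-text search over mixed Chinese and English text. Riot and Wukong segment Chinese with dictionary-based packages (gse, sego); the minimal sketch below swaps in a simpler technique, overlapping character bigrams for Han text plus lowercase word tokens for Latin text, to show how such text can feed an inverted index. All names are hypothetical and not taken from any listed project.

```python
import re
from collections import defaultdict

# Runs of CJK Unified Ideographs (basic block) or runs of ASCII letters/digits.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Split mixed Chinese/English text into index terms.

    Hypothetical scheme: Latin runs become lowercase words; Han runs become
    overlapping character bigrams (a lone Han character stays a unigram).
    """
    tokens: list[str] = []
    for run in TOKEN_RE.findall(text):
        if run.isascii():
            tokens.append(run.lower())
        elif len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: map each term to the ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """AND query: ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

For example, `tokenize("中文分词 with Go")` yields `["中文", "文分", "分词", "with", "go"]`. Bigrams avoid the need for a dictionary at the cost of a larger index and some spurious matches; a dictionary segmenter such as gse produces real words instead.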
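Riot search lists BM25 relevance scoring. A minimal sketch of standard Okapi BM25 over pre-tokenized documents follows; the parameter defaults (k1 = 1.2, b = 0.75) and the non-negative IDF variant are common choices assumed here, not values taken from Riot.

```python
import math
from collections import Counter


def bm25_scores(docs: list[list[str]], query: list[str],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every tokenized document for a tokenized query.

    IDF uses the non-negative variant ln(1 + (N - n + 0.5) / (n + 0.5)),
    where N is the number of documents and n the number containing the term.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(doc) / avgdl) if avgdl else k1
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation controlled by k1.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

With documents of equal length, one that mentions a query term twice outscores one that mentions it once, but by less than a factor of two, because k1 saturates repeated occurrences.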
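Riot search and Wukong also mention token proximity, i.e. how close the query keywords appear to one another in a document. The listing does not give their exact formula, so the sketch below assumes one plausible definition: pick one occurrence of each query token, in query order, so that the sum of the gaps between consecutive tokens is as small as possible. The function name and the definition are hypothetical.

```python
from typing import Optional


def token_proximity(doc_tokens: list[str], query_tokens: list[str]) -> Optional[int]:
    """Smallest total distance between consecutive query tokens in a document.

    Hypothetical definition: choose one position for each query token so that
    sum(|pos[i + 1] - pos[i]|) is minimal, via dynamic programming over the
    occurrence positions. Returns None if any query token is missing.
    """
    positions: dict[str, list[int]] = {}
    for pos, tok in enumerate(doc_tokens):
        positions.setdefault(tok, []).append(pos)
    if not query_tokens or any(t not in positions for t in query_tokens):
        return None
    # best[p] = minimal cost of a chain of query tokens ending at position p.
    best = {p: 0 for p in positions[query_tokens[0]]}
    for tok in query_tokens[1:]:
        best = {
            p: min(cost + abs(p - prev) for prev, cost in best.items())
            for p in positions[tok]
        }
    return min(best.values())
```

A smaller value means the keywords sit closer together, so a ranker could combine it with a relevance score such as BM25 to favor documents where the query terms appear as a phrase.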
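The Arabic Desktop Search Engine entry describes stripping Arabic diacritics at indexing time while still highlighting matches in the original, vocalized text. The project itself uses Lucene.Net and the EasyMark plugin; the sketch below only illustrates the idea in plain Python, assuming that "diacritics" means the harakat U+064B to U+0652 plus the superscript alef U+0670. Function names are hypothetical.

```python
import re

# Assumed diacritic set: harakat U+064B (fathatan) .. U+0652 (sukun), plus
# the superscript alef U+0670.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Remove the assumed Arabic diacritic marks, e.g. before indexing."""
    return DIACRITICS.sub("", text)


def find_matches(text: str, query: str) -> list[tuple[int, int]]:
    """Return (start, end) spans in the ORIGINAL text that match the query,
    ignoring diacritics on both sides, so a highlighter can mark them."""
    # kept[i] is the index in `text` of the i-th non-diacritic character.
    kept = [i for i, ch in enumerate(text) if not DIACRITICS.match(ch)]
    plain = "".join(text[i] for i in kept)
    needle = strip_diacritics(query)
    spans: list[tuple[int, int]] = []
    if not needle:
        return spans
    start = plain.find(needle)
    while start != -1:
        end = start + len(needle)
        # Extend past any diacritics attached to the last matched letter.
        orig_end = kept[end - 1] + 1
        while orig_end < len(text) and DIACRITICS.match(text[orig_end]):
            orig_end += 1
        spans.append((kept[start], orig_end))
        start = plain.find(needle, end)
    return spans
```

Mapping positions back through `kept` is what lets the index store diacritic-free text while the highlighted span still covers the vowel marks in the displayed document.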
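cocoNLP is described as extracting emails, phone numbers, and similar items with pattern-based methods. The sketch below shows that style of extraction with two regular expressions; it does not use cocoNLP's actual API, and the patterns (including the assumption that phone numbers are 11-digit mainland-China mobile numbers) are simplified for illustration.

```python
import re

# Simplified, hypothetical patterns; real-world email and phone validation
# is considerably more involved.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed format: 11 digits starting with 13-19, optional +86 prefix.
# The lookarounds stop matches inside longer digit strings.
CN_MOBILE_RE = re.compile(r"(?<!\d)(?:\+?86[- ]?)?(1[3-9]\d{9})(?!\d)")


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Pull email addresses and mobile numbers out of free (mixed) text."""
    return {
        "emails": EMAIL_RE.findall(text),
        # findall returns only the captured group, i.e. the number without +86.
        "phones": CN_MOBILE_RE.findall(text),
    }
```

Because Chinese characters and full-width punctuation fall outside both character classes, the patterns work directly on mixed Chinese and English text without prior segmentation.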