Showing 48 open source projects for "pdf index"

  • 1

    Create Index from PDF

    PDF Indexing Script: Searches PDF for words, records page numbers

    This Python script helps automate the process of creating an index for a PDF document. It reads a list of words from a text file, searches through each page of the PDF, and records the page numbers where each word appears. The script accounts for the first 24 pages of the PDF that use Roman numerals (i-xxiv) and adjusts the page numbers accordingly. It is designed to be case-insensitive, ensuring that variations in capitalization do not affect the search results. As it processes the PDF...
    Downloads: 0 This Week
  • 2

    Sphinx

    Main repository for the Sphinx documentation builder

    Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, written by Georg Brandl and licensed under the BSD license. It was originally created for the Python documentation, and it has excellent facilities for the documentation of software projects in a range of languages. Of course, this site is also created from reStructuredText sources using Sphinx! HTML (including Windows HTML Help), LaTeX (for printable PDF versions), ePub, Texinfo, manual pages, plain text...
    Downloads: 7 This Week
  • 3

    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.
    Downloads: 8 This Week
  • 4

    Papermerge

    Open Source Document Management System for Digital Archives

    Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full text, tags and metadata-based search. Papermerge is free and open...
    Downloads: 7 This Week
  • 5

    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    ... search index, and finally answer the user question with an LLM agent.
    Downloads: 6 This Week
  • 6

    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 4 This Week
  • 7

    AnyTXT Searcher

    A Powerful Desktop Full-Text Search Engine, Just Like Local Google.

    .... You can quickly find any text in any file on your disk with AnyTXT in about 0.1 seconds. It works on Windows 11, 10, 8, 7, Vista, XP, 2008, 2012, 2016, 2022... AnyTXT Searcher supports the following file formats: Plain text (txt, cpp, py, html, etc.), Microsoft OneNote (one), Microsoft Word (doc, docx), Microsoft Excel (xls, xlsx), Microsoft PowerPoint (ppt, pptx), PDF, WPS Office (wps, et, dps), EBook (epub, mobi, azw3, fb2, etc.), Mind Map Format (lighten, mmap, mm, xmind, etc.), OFD .....
    Downloads: 4,138 This Week
  • 8

    Chordii

    Easy lead sheets from text input

    ChordPro creates elegant, staffless lead sheets for musicians who need only chords and lyrics. It processes plain text input in ChordPro format and is a rewrite of the old but still popular Chord/Chordii programs.
    Downloads: 36 This Week
  • 9

    Lexifinder

    A tool to create the analytical index of a manuscript

    Lexifinder is a free and open source tool to automate the creation of an analytical index of a manuscript, based on a natural language processing model. First, convert your Docx or ODT file into a PDF. Choose the output text file, set the similarity index, and choose your desired keywords. Lexifinder will include in the index all words whose meaning resembles that of at least one keyword. The similarity index spans from 1 to 100 and expresses the degree of resemblance required for a noun...
    Downloads: 4 This Week
  • 10

    WIKINDX

    Virtual Research Environment / On-line Bibliography Manager

    Reference management, bibliography management, citations and a whole lot more. Designed by academics for academics, under continuous development since 2003, and used by both individuals and major research institutions worldwide, WIKINDX is a Virtual Research Environment (an enhanced on-line bibliography manager) storing searchable references, notes, files, citations, ideas, and more. An integrated WYSIWYG word processor exports formatted articles to RTF and HTML. Plugins include a...
    Downloads: 21 This Week
  • 11

    WA2L/WinTools

    End User Tools for Windows.

    Some end user utilities for the Windows operating system. The utilities can be called through the "Send To" context menu when right-clicking a file or directory in Explorer, or through the Windows "Start Menu". The package can be installed as a portable application and does not need admin rights. ◆ UTILITIES - https://sourceforge.net/projects/wa2l-wintools/files/ → README ◆ FEATURES - https://wa2l-wintools.sourceforge.net/man1/wintools.1.html -...
    Downloads: 10 This Week
  • 12

    TextSeek

    Professional full-text desktop search tool

    TextSeek is a professional full-text desktop search tool. Unlike filename search tools such as Everything and Listary, TextSeek searches both filenames and file contents easily and quickly. It supports PDF, Word, Excel, PowerPoint, RTF and other formats. The software runs directly, with no extra packages to install.
    Downloads: 4 This Week
  • 13

    miRDeep*

    MiRDeep*

    Please cite: An, J., Lai, J., Lehman, M.L. and Nelson, C.C. (2013) miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data. Nucleic Acids Res, 41, 727-737. We will create an index for you if you tell us your species of interest (j.an@qut.edu.au). To download the command-line version, "MDS_command_line_Vxx.zip", click "Browse All Files". For plant miRNA prediction, see miRPlant on SourceForge.
    Downloads: 5 This Week
  • 14

    File System Crawler for Elasticsearch

    Elasticsearch File System Crawler (FS Crawler)

    This crawler helps to index binary documents such as PDF, Open Office, and MS Office files. It crawls a local file system (or a mounted drive), indexing new files, updating existing ones, and removing old ones. It can also crawl remote file systems over SSH/FTP, and it offers a REST interface that lets you “upload” your binary documents to Elasticsearch.
    Downloads: 1 This Week
  • 15

    node-html-pdf

    HTML to PDF converter that uses phantomjs

    HTML to PDF converter that uses phantomjs. html-pdf can read the header or footer either out of the footer and header config object or out of the HTML source. You can either set a default header & footer or overwrite that by appending a page number (1-based index) to the id="pageHeader" attribute of an HTML tag. You can use any combination of those tags. The library tries to find any element that contains the pageHeader or pageFooter id prefix. The full options object gets converted to JSON...
    Downloads: 0 This Week
  • 16

    Paperless-ng

    A supercharged version of paperless, scan, index and archive docs

    Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. I wrote this to make “going paperless” easier. I do not have to...
    Downloads: 0 This Week
  • 17

    IEC104-RTU-Simulator

    IEC104 RTU simulator

    IEC 104 RTU simulator is a program that simulates the operation of an RTU (remote terminal unit) or server as defined by the IEC 60870-5-104 protocol. It can simulate any number of RTUs or servers. Simulated RTUs can be connected to the same or different SCADA master stations. IO signals are indexed and grouped using index numbers. You can send IO signals from all RTUs to the connected SCADA master stations at once by using an index number. It is written in Python 3 and the code supports both...
    Downloads: 0 This Week
  • 18

    OpenSearchServer Search Engine

    An open source search engine with RESTful API and crawlers

    OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you will be able to quickly and easily integrate advanced full-text search capabilities into your application: full-text with basic semantic, join queries, boolean queries, facet and filter, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on Windows...
    Downloads: 26 This Week
  • 19

    Docmenta

    Single Source Publishing Web-Application

    Docmenta is a Java web-application for single source publishing and help authoring. The application allows collaborative creation of documentation, e-books and online-help. Supported output formats are PDF, HTML, WebHelp, EPUB (eBook) and DocBook. For more information, visit: http://www.docmenta.org
    Downloads: 1 This Week
  • 20

    ms-small-basic-dev-guide

    Command reference for MSB (Microsoft Small Basic)

    Revised - 2017.10.13 This is a "Developer Command Reference Guide" for MSB (Microsoft Small Basic) divided into 12 PDF sections. There are 11 subject areas plus 1 reference doc: master command list and reference charts (color, ASCII, music, and math). 1) Includes master API & reference charts 2) 11 individual subject areas 3) Complete doc set merged for mobile users 4) 12 tab 3 ring binder index page This set of documents is in its **finished format**. Only occasional corrections...
    Downloads: 1 This Week
  • 21

    jvqa

    Video Quality Assessment in Java

    jvqa 1.0-alpha-8 March 3, 2015 Video Quality Assessment in Java. Built upon Java Native Access for Avisynth - jnavi (https://sourceforge.net/projects/jnavi). Based on the Fast Structural Similarity index proposed by Chen and Bovik: http://live.ece.utexas.edu/publications/2011/chen_rtip_2011.pdf Implements the original, variance-based SSIM, Multi-Scale SSIM, Fast SSIM, 2, 3 and 4-Component Weighted SSIM, and Gradient Magnitude Similarity Deviation. Indexes may be customized...
    Downloads: 0 This Week
  • 22

    IndexFile (IFile)

    IFile, PHP based framework for indexing and search in the documents

    Index documents using the Lucene Search Engine or MySQL Full-Text. IFile supports many types of documents: Rich Text Format (.rtf); Moving Picture Expert Group-1/2 Audio Layer 3 (.mp3); Joint Photographic Experts Group (.jpg - .jpeg); Tagged Image File Format (.tiff); Microsoft Word 97-2000 (.doc); Microsoft Word 2003-2007 (.docx); Microsoft Excel 97-2000 (.xls); Microsoft Excel 2003-2007 (.xlsx); Microsoft PowerPoint 2003-2007 (.pptx); OpenOffice.org Writer (.odt...
    Downloads: 2 This Week
  • 23

    Personalized Search Engine

    Personalized Search Engine for Your Files

    ... also extract text content from files of many widely used file types such as pdf, doc, ppt, and mp3 to improve the index quality.
    Downloads: 0 This Week
  • 24

    eLibrary

    Personalized Search Engine for Commonly Used Files

    ... text content from files of many widely used file types such as pdf, doc, ppt, and mp3 to improve the index quality.
    Downloads: 0 This Week
  • 25

    SearchServer2

    Local and Remote Meta Search Engine

    Searchserver2 creates an index of your local filesystem, indexes RSS feeds and text files, and sends remote requests to other search engines like Google or YouTube. Searchserver is a standalone web server written in .NET and contains a web page for searching. Content such as 3D parts, images, movies (YouTube and local), and audio plays directly in the browser window. Movies and PDFs are shown as thumbnails.
    Downloads: 1 This Week