Search Results for "docx metadata extraction"

Showing 41 open source projects for "docx metadata extraction"

View related business solutions
  • ConnectWise Cybersecurity Management for MSPs Icon
    ConnectWise Cybersecurity Management for MSPs

    Software and support solutions to protect your clients’ critical business assets

    ConnectWise SIEM (formerly Perch) offers threat detection and response backed by an in-house Security Operations Center (SOC). Defend against business email compromise, account takeovers, and see beyond your network traffic. Our team of threat analysts does all the tedium for you, eliminating the noise and sending only identified and verified treats to action on. Built with multi-tenancy, ConnectWise SIEM helps you keep clients safe with the best threat intel on the market.
  • Cybersecurity Management Software for MSPs Icon
    Cybersecurity Management Software for MSPs

    Secure your clients from cyber threats.

    Define and Deliver Comprehensive Cybersecurity Services. Security threats continue to grow, and your clients are most likely at risk. Small- to medium-sized businesses (SMBs) are targeted by 64% of all cyberattacks, and 62% of them admit lacking in-house expertise to deal with security issues. Now technology solution providers (TSPs) are a prime target. Enter ConnectWise Cybersecurity Management (formerly ConnectWise Fortify) — the advanced cybersecurity solution you need to deliver the managed detection and response protection your clients require. Whether you’re talking to prospects or clients, we provide you with the right insights and data to support your cybersecurity conversation. From client-facing reports to technical guidance, we reduce the noise by guiding you through what’s really needed to demonstrate the value of enhanced strategy.
  • 1
    dude uncomplicated data extraction

    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your python scripts to dude scrape command.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    yt-dlp

    yt-dlp

    A youtube-dl fork with additional features and fixes

    yt-dlp is a youtube-dl fork based on the now inactive youtube-dlc. The main focus of this project is adding new features and patches while also keeping up to date with the original project
    Downloads: 194 This Week
    Last Update:
    See Project
  • 3
    Trafilatura

    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    Amplify

    Amplify

    Automatic enrichment, enhancement, and explanation of your data

    Amplify attaches afterburners to your data. Amplify explains metadata extraction, classification, tagging, and reporting. Eriches derivative data generation like thumbnails, previews, conversions, etc. Enhances batteries-included value-adds like data quality reports, image augmentation, OCR, translations, etc. Amplify leverages the decentralized compute provided by Bacalhau to magically enrich your data. A built-in suite of pipelines decides what your data is and how to best improve upon...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Total Network Visibility for Network Engineers and IT Managers Icon
    Total Network Visibility for Network Engineers and IT Managers

    Network monitoring and troubleshooting is hard. TotalView makes it easy.

    This means every device on your network, and every interface on every device is automatically analyzed for performance, errors, QoS, and configuration.
  • 5
    GROBID

    GROBID

    A machine learning software for extracting information

    ... here covers the usual bibliographical information (e.g. title, abstract, authors, affiliations, keywords, etc.). References extraction and parsing from articles in PDF format, around .87 F1-score against on an independent PubMed Central set of 1943 PDF containing 90,125 references, and around .89 on a similar bioRxiv set of 2000 PDF (using the Deep Learning citation model). All the usual publication metadata are covered (including DOI, PMID, etc.).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    tika-python

    tika-python

    Python binding to the Apache Tika™ REST services

    A Python port of the Apache Tika library that makes Tika available using the Tika REST Server. This makes Apache Tika available as a Python library, installable via Setuptools, Pip and easy to install. To use this library, you need to have Java 7+ installed on your system as tika-python starts up the Tika REST server in the background. To get this working in a disconnected environment, download a tika server file (both tika-server.jar and tika-server.jar.md5, which can be found here) and set...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    AnyTXT Searcher

    AnyTXT Searcher

    A Powerful Desktop Full-Text Search Engine, Just Like Local Google.

    AnyTXT Searcher is a powerful file full-text search engine, a desktop search application for fast document retrieval. Just like a local disk Google search engine, much faster than Windows Search, it is your ideal desktop file content full-text search engine. It has a powerful document parsing engine built in, which extracts the text of commonly used file formats without installing any other software, and combines the built-in high-speed indexing system to store the metadata of the text...
    Leader badge
    Downloads: 2,547 This Week
    Last Update:
    See Project
  • 8
    OpenKM Document Management - DMS

    OpenKM Document Management - DMS

    Document Management System and Content Management System

    OpenKM is a electronic document management system and record management system EDRMS ( DMS, RMS, CMS ). It provides modern and flexible architecture that meet today's IT demands, based on open technology (Java, Tomcat, GWT, Lucene, Hibernate, Spring and jBPM), powerful and scalable multiplatform application. OpenKM is a Web 2.0 application that works with Internet Explorer, Firefox, Safari and Opera. Can be configured in major DMBS like Oracle, PostgreSQL and MySQL among others. Due to...
    Leader badge
    Downloads: 925 This Week
    Last Update:
    See Project
  • 9
    DocWire SDK

    DocWire SDK

    Award-winning modern data processing in C++17/20

    DocWire SDK, a standout C++17/20 data processing tool, has received award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. The upcoming integration of C++17 and C++20 will bring advanced functionalities, particularly in areas like HTTP capabilities and web data extraction. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document...
    Downloads: 10 This Week
    Last Update:
    See Project
  • Recruit and Manage your Workforce Icon
    Recruit and Manage your Workforce

    Evolia makes it easier to hire, schedule and track time worked by frontline in medium and large-sized businesses.

    Evolia is a web and mobile platform that connects enterprises with 1000’s of local shift workers and offers free workforce scheduling and time and attendance solutions. Is your business on Evolia?
  • 10

    joy of text

    Editor with scripting language, security features & system interfaces.

    Jot was developed general purpose editor for large CAD files. It's command-driven UI requires no mode switching and hence requires fewer keystrokes to get a typical job done. It is particularly useful for checking and cross-referencing between several source, intermediate and output files - a common requirement for CAD work. But jot's usefulness doesn't stop there. It's sophisticated search features can, for example, be used for interactive data mining or automating the extraction of numerical...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    AICtools

    AICtools

    Workflow and set of tools for Automated Imagery Classification

    AICtools is a GIS workflow and set of tools to facilitate Automated Imagery Classification (AIC) and analysis of surface features over time. Allows bulk processing of large data sets, including automated metadata processing/filtering, compressed archive extraction and file manipulation, raster band compositing, pre-processing, mosaicking, clipping. Automates a subset of operations involved in classification of satellite imagery and the associated raster calculations used for time trend analysis.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    XL-Parser

    XL-Parser

    XL-Parser is a tool for data extraction and analysis.

    XL-Parser provides a bunch of functions for data extraction and analysis. It also provides web log analysis features like a tool for detection of suspicious activities. More details and screenshots on http://le-tools.com.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 13
    PDF To Text Watcher

    PDF To Text Watcher

    Profile-based watcher for automated processing of PDF tiles to text.

    Watches folders to automate transforming PDF files into text with optional metadata extraction. Requires the XPDF tools, which you must source separately. Lets you set up multiple profiles, modify profiles 'hot' without saving and move or delete the source PDFs after processing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    Semantic Assistants

    Natural Language Processing (NLP) for the Masses

    Semantic Assistants support users in content retrieval, analysis, and development, by offering context-sensitive NLP services directly integrated in standard desktop clients, like a word processor, and web information systems, like a wiki.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Roomba

    Roomba

    A Node.js tool to examine the correctness of Open Data Metadata

    Linked Open Data (LOD) has emerged as one of the largest collection of interlinked datasets on the web. Benefiting from this mine of data requires the existence of descriptive information about each dataset in the accompanying metadata. Such meta information is currently very limited to few data portals where they are usually provided manually thus giving little or bad quality insights. To address this issue, we propose a scalable automatic approach for extracting, validating and generating...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Newspaper3k

    Newspaper3k

    News, full-text, and article metadata extraction in Python 3

    Inspired by requests for its simplicity and powered by lxml for its speed. Newspaper is an amazing python library for extracting & curating articles. Newspaper delivers Instapaper style article extraction. Newspaper is a Python3 library! If you are certain that an entire news source is in one language, go ahead and use the same api. Works in 10+ languages, English, Chinese, German, Arabic, and more! On python3 you must install newspaper3k, not newspaper. newspaper is our python2 library...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17

    biomed_XML_parsers

    Parsers to extract biomedical XML and inject into SQL

    These parsers are intended to support data and metadata extraction from various centralized biomedical resources, with the intention of injecting this data into SQL databases via standardized schemas. These are adaptations of the Medline parser by Martijn Schuemie, (https://github.com/OHDSI/MedlineXmlToDatabase) to whom credit should be given.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    The National Library of New Zealand's Metadata Extraction Tool automatically extracts preservation-related metadata from digital files, then output that metadata in XML formats. It can be used through a graphical user interface or command-line interface. Please take the latest code from 'https://github.com/DIA-NZ/Metadata-Extraction-Tool.git'. The code on source forge will not be updated henceforth as it is moved to github.
    Downloads: 28 This Week
    Last Update:
    See Project
  • 19
    OpenSearchServer Extractor

    OpenSearchServer Extractor

    A RESTFul/JSON Web Service for text and metata extraction

    An open source RESTFul Web Service for text , meta-data extraction and analysis. oss-text-extractor supports various binary formats: Word processor (doc, docx, odt, rtf) Spreadsheet (xls, xlsx, ods) Presentation (ppt, pptx, odp) Publishing (pdf, pub) Web (rss, html/xhtml) Medias (audio, images) Others (vsd, text)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    DCTFinder

    DCTFinder

    Extract title and creation time from web page.

    Web pages do not offer reliable metadata concerning their creation date and time. However, getting the document creation time is a necessary step for allowing to apply temporal normalization systems to web pages. DCTFinder is a system that parses a web page and extracts from its content the title and the creation date of this web page. DCTFinder combines heuristic title detection, supervised learning with Conditional Random Fields (CRFs) for document date extraction, and rule-based creation...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Simple general-purpose metadata extraction API with support for popular multimedia metadata formats such as EXIF and ID3.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22

    audiokonverter

    audio file converter for KDE3 and KDE4

    audiokonverter is a service menu (and also a commandline utility) to easily convert from MP3, OGG, AMR, AAC, M4A, FLAC, WMA, RealAudio, Musepack, Wavpack, WAV and movies to MP3, OGG, M4A, WAV and FLAC in Konqueror by right-clicking on them. It needs ffmpeg/avconv, oggenc, oggdec, faac, faad, flac, mplayer and lame to work. id3lib is optional for full functionality. Also optional are vorbis-tools and metaflac for handling other metadata. See README how/where to get that software. Be sure about...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    CMIS Input plugin for Pentaho

    CMIS Input plugin for Pentaho

    Allows querying Content Management Systems that use the CMIS.

    Imagine being able to extract from your Enterprise Content Management System, all the metadata of your documents using simple queries with a query language very close to the traditional SQL. Imagine using the information extracted for statistical purposes, for creating reports and, more generally, to analyse your document archives in a way unthinkable until now with the current tools available. All this is possible within the Pentaho Suite, the Open Source Business Intelligence platform, which...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    TMX editor

    TMX editor

    Open, edit .tmx files and set specific metadata

    Open, edit .tmx files and set metadata to make them of better use in the most CAT programs. Very light program written in C++, enables you to open (almost) any types of bilingual .tmx file, edit it, split and merge segments, shift the segments up and down as well as delete them. In-built QA checker helps to identify wrong alignments. The most important is the ability to add and edit metadata - proper ones as well as those embodied in opened tmx. Finally you can save the TMX in a well-structured...
    Leader badge
    Downloads: 26 This Week
    Last Update:
    See Project
  • 25
    cde4php - Cross Database Engine for PHP

    cde4php - Cross Database Engine for PHP

    Uniform Database Abstraction for PHP Development

    Debby has replaced CDE in the Tina4Stack, you may want to check it out at http://tina4.com CDE is a PHP class which implements the general database functions in PHP and provides a common SQL platform for php development where developers change their databases but not their code. Supports Firebird, MySQL,Oracle,SQLite, MSSQL(both drivers),CUBRID,ODBC. CDE now supports date uniformity, param passing & BLOB handling across all the databases supported. CDE is not a replacement for PDO, in...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next