Search Results for "extract email addresses from web pages"

Showing 150 open source projects for "extract email addresses from web pages"

1

Web Spider, Web Crawler, Email Extractor

Free tool that extracts emails, phone numbers, and custom text from the web using Java regex

The Files section includes WebCrawlerMySQL.jar, which supports a MySQL connection. Free web spider and crawler that extracts information from the web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed.
- Free web spider, parser, extractor, and crawler
- Extraction of emails, phone numbers, and custom text from the web
- Export to Excel file
- Data saved to Derby and MySQL databases
- Written in Java, cross-platform

Also see the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/
Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
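As a rough illustration of the regex-based email extraction this entry describes, here is a minimal Python sketch. It is not the project's code (the project itself is written in Java); the pattern, the `extract_emails` helper, and the sample HTML are all assumptions made for the example, and the pattern is deliberately simplified rather than a full address validator.

```python
import re

# Simplified, assumed pattern: catches common address shapes only,
# not every form allowed by the email RFCs.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return unique email-like strings in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(html):
        key = match.lower()  # treat addresses as case-insensitive duplicates
        if key not in seen:
            seen.add(key)
            result.append(match)
    return result


# Hypothetical page fragment used only for demonstration.
sample = (
    '<p>Contact <a href="mailto:info@example.com">info@example.com</a> '
    "or sales@example.org.</p>"
)
print(extract_emails(sample))
```

A real crawler would fetch pages over HTTP and feed each response body to a function like this; deduplicating on the lowercased address keeps the output stable when the same contact appears in both a `mailto:` link and the visible text.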

Downloads: 5 This Week

Last Update: 2025-11-23
See Project
2

Toutatis

Extract public Instagram account information from usernames

Toutatis is an open source command-line tool designed to extract publicly available information from Instagram accounts. It helps users gather various data points from a target profile by querying Instagram using a username or account ID. The tool can retrieve details such as profile metadata, follower counts, biography information, and other publicly accessible account attributes. In addition to basic profile data, Toutatis can also reveal contact details that may be publicly exposed, including email addresses and phone numbers associated with the account. ...

Downloads: 11 This Week

Last Update: 4 days ago
See Project
3

MCP Server RAG Web Browser

An MCP Server for the RAG Web Browser Actor

The MCP Server for the RAG Web Browser Actor allows AI assistants and LLMs to perform web searches and extract information from web pages. It facilitates interaction with the web, enabling up-to-date context retrieval for AI applications.

Downloads: 0 This Week

Last Update: 2025-08-21
See Project
4

Spider

High-performance Rust web crawler and scraper for large-scale data

Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents.

Downloads: 3 This Week

Last Update: 18 hours ago
See Project
5

Geziyor

Blazing fast Go framework for web crawling and data scraping tasks

Geziyor is a high-performance web crawling and web scraping framework built for the Go programming language. It is designed to help developers crawl websites and extract structured information from web pages efficiently. It focuses on speed and scalability, allowing large numbers of requests to be processed concurrently. Geziyor supports use cases such as data mining, monitoring web content, and automated testing workflows. ...

Downloads: 0 This Week

Last Update: 10 hours ago
See Project
6

Zotero

Tool to help you collect, organize, annotate, cite, and share research

...The software has a plugin architecture and a connector for web browsers to enable one-click capturing of sources from library catalogs, academic databases, and other sites.

Downloads: 3 This Week

Last Update: 2026-03-09
See Project
7

DuckDuckGo for iOS and Mac

DuckDuckGo Browser for iPhone and Mac

...The Fire Button allows users to clear browsing data instantly with a single tap. Email Protection helps hide personal email addresses from trackers and marketers. Overall, DuckDuckGo offers simple, strong privacy protections built directly into the browsing experience.

Downloads: 22 This Week

Last Update: 18 hours ago
See Project
8

AI-Crawler

Crawl a website starting from a URL, find relevant pages

AI Crawler is an experimental AI-powered web crawling and data extraction tool that uses natural language prompts to guide the discovery and retrieval of relevant information across websites. Unlike traditional web scrapers that rely on static selectors and manual scripting, it uses AI to dynamically identify and prioritize pages based on user intent, making it more flexible and resilient to changes in website structure. Users can define their data requirements in plain English, and the...

Downloads: 1 This Week

Last Update: 2026-03-18
See Project
9

QueryList

Progressive PHP web crawler framework with jQuery-like DOM parsing

QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
10

videodl

Lightweight Python tool for downloading videos from many platforms

Videodl is a lightweight video downloader implemented entirely in Python that allows users to retrieve videos from a wide range of online media platforms. It focuses on providing a fast and simple way to parse video pages and download media files, often prioritizing high-definition versions without watermarks when available. It supports numerous video platforms across both Chinese and international streaming ecosystems, enabling users to fetch content from many popular services through a...

Downloads: 1 This Week

Last Update: 2026-03-11
See Project
11

OSINT Framework

OSINT Framework

OSINT-Framework is a web-based intelligence resource map designed to help investigators and researchers quickly locate free open-source intelligence tools and data sources. Rather than functioning as an automated scanner, it organizes hundreds of OSINT resources into a structured, navigable interface grouped by investigation type, such as usernames, email addresses, domains, and social media.

Downloads: 69 This Week

Last Update: 1 day ago
See Project
12

AgentQL MCP

Model Context Protocol server that integrates AgentQL's data

The AgentQL MCP Server is a Model Context Protocol (MCP) server that integrates AgentQL's data extraction capabilities, enabling users to extract structured data from web pages using natural language prompts.

Downloads: 0 This Week

Last Update: 2025-04-08
See Project
13

browserable

Open source and self-hostable browser automation library for AI agents

Browserable is an open-source browser automation framework designed specifically for AI agents that need to interact with web interfaces in a human-like way. The project provides tools that allow automated agents to navigate websites, click buttons, fill out forms, and extract information from pages without manual scripting of each step. Built primarily in JavaScript, the framework offers both a developer-friendly SDK and a REST API that allow integration with AI applications and automation pipelines. ...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
14

newspaper4k

Python library for scraping and analyzing online news articles easily

Newspaper4k is a Python library designed for extracting, processing, and analyzing news articles from websites. It is a continuation and active fork of the original newspaper3k library, which had stopped receiving updates, with the goal of keeping the ecosystem maintained while adding improvements and bug fixes. It provides developers with tools to automatically download web pages, extract the main article content, and collect associated metadata such as titles, authors, images, and publication dates. ...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
15

crawley

The unix-way web crawler

Crawls web pages and prints any link it can find.
- Fast HTML SAX parser (powered by golang.org/x/net/html)
- Small (below 1500 SLOC), idiomatic, 100% test-covered codebase
- Grabs most useful resource URLs (pics, videos, audios, forms, etc.)
- Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted)
- Scan depth (limited by starting host and path, 0 by default) can be configured
- Can crawl rules and sitemaps from robots.txt
- Brute mode - scan HTML comments for...
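The idea of emitting each discovered URL once, with fragments omitted, can be sketched in Python using only the standard library. This is an illustration of the technique, not crawley's implementation (crawley is written in Go); the `LinkCollector` class, the `collect_links` helper, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect unique absolute URLs from href/src attributes."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.seen = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                absolute = urljoin(self.base_url, value)
                url, _fragment = urldefrag(absolute)  # drop "#..." part
                if url not in self.seen:
                    self.seen.add(url)
                    self.links.append(url)


def collect_links(html: str, base_url: str) -> list[str]:
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page: two links differ only by fragment.
page = '<a href="/docs#intro">Docs</a><a href="/docs#api">API</a><img src="logo.png">'
print(collect_links(page, "https://example.com/"))
```

Stripping the fragment before the uniqueness check is what makes `/docs#intro` and `/docs#api` collapse into a single entry.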

Downloads: 0 This Week

Last Update: 2026-03-14
See Project
16

DataExtract

Extracts Data Types Like Email Addresses From All Kinds Of Files

DataExtract is a program that scans files of many different types (text, PDF, Word, Excel, etc.) and extracts structured patterns from them, such as email addresses and phone numbers.

Downloads: 0 This Week

Last Update: 2025-01-15
See Project
17

Sparrow

Structured data extraction and instruction calling with ML, LLM

Sparrow is an open-source platform designed to extract structured information from documents, images, and other unstructured data sources using machine learning and large language models. The system focuses on transforming complex documents such as invoices, receipts, forms, and scanned pages into structured formats like JSON that can be processed by downstream applications. It combines several components, including OCR pipelines, vision-language models, and LLM-based reasoning modules to...

Downloads: 0 This Week

Last Update: 2026-03-04
See Project
18

Browserbase MCP Server

Allow LLMs to control a browser with Browserbase and Stagehand

Browserbase MCP Server is a server implementation of the Model Context Protocol (MCP) that enables large language models to interact with web browsers programmatically through cloud-based automation. The project provides a standardized interface for connecting AI systems to real-world web environments, allowing them to navigate pages, extract structured data, and perform user-like actions such as clicking, typing, and form submission. It leverages Browserbase infrastructure along with...

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
19

spider_collection

Collection of Python web scraping scripts for data extraction tasks

spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages. ...

Downloads: 1 This Week

Last Update: 7 days ago
See Project
20

Synapse

Matrix reference homeserver

Matrix is an ambitious new ecosystem for open federated Instant Messaging and VoIP. Everything in Matrix happens in a room. Rooms are distributed and do not exist on any single server. Rooms can be located using convenience aliases like #matrix:matrix.org or #test:localhost:8448. Synapse is currently in rapid development, but as of version 0.5 we believe it is sufficiently stable to be run as an internet-facing service for real usage! Create and manage fully distributed chat rooms with no...

Downloads: 4 This Week

Last Update: 2026-03-24
See Project
21

JS Analyzer

Burp Suite extension for JavaScript static analysis

JS Analyzer is a powerful static analysis tool implemented as a Burp Suite extension that helps security researchers and web developers automatically uncover important artifacts in JavaScript files during web application testing. It parses JavaScript responses intercepted by Burp Suite and intelligently extracts API endpoints, full URLs (including cloud storage links), secrets like API keys or tokens, and email addresses while filtering out noise from irrelevant code patterns. ...

Downloads: 0 This Week

Last Update: 2026-01-28
See Project
22

Python-Spider

Python3 web crawler practice

Python-Spider is a repository of examples for writing web spiders and crawlers in Python, part of a broader learning and resource collection by its author. The code and documentation are oriented toward beginner and intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically. As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, possibly concurrency or scheduling to crawl multiple pages, and techniques to handle common web-scraping issues. ...

Downloads: 0 This Week

Last Update: 2025-12-08
See Project
23

watercrawl

AI-ready web crawler that extracts and structures website content

WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. ...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
24

WebHarvest - web data extraction tool

Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.

14 Reviews

Downloads: 2 This Week

Last Update: 2025-10-27
See Project
25

Critical

Extract & Inline Critical-path CSS in HTML pages

Critical extracts & inlines critical-path (above-the-fold) CSS from HTML.
- Generate and inline critical-path CSS
- Generate critical-path CSS
- Generate and minify critical-path CSS
- Generate, minify and inline critical-path CSS
- Generate and return output via callback
- Generate and return output via promise

When your site is adaptive and you want to deliver critical CSS for multiple screen resolutions, this is a useful option. Note: your final output will be minified so as to eliminate duplicate...

Downloads: 0 This Week

Last Update: 2024-09-23
See Project