RobotsTxt

This is a high-performance, production-tested library for parsing robots.txt files and evaluating their rules against crawler user agents. It implements the core semantics of the Robots Exclusion Protocol: user-agent sections, Allow/Disallow directives, wildcard handling, and precedence rules. The code is optimized for speed and low memory use, so large crawls can evaluate millions of URLs quickly. It also focuses on correctness: edge cases such as overlapping patterns and longest-match resolution are handled consistently. Consumers integrate it to decide whether a specific URL may be fetched by a particular bot name, and to respect crawl-delay or Sitemap hints where applicable. The library serves both search-scale crawlers and smaller tools that need a reliable decision engine for polite crawling.
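As a rough illustration of the "user-agent sections" idea, the sketch below picks the rule group for a bot name and falls back to the `*` group when no section names that bot. The `Group` struct and the `ToLower` and `SelectGroup` helpers are hypothetical names made up for this example and are not the library's API. It also assumes a simplification: the first matching group wins, and groups with the same name are not merged.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical type for this sketch; not part of the library.
struct Group {
  std::string user_agent;             // product token from a User-agent line, or "*"
  std::vector<std::string> disallow;  // Disallow patterns (kept minimal here)
};

// Lowercases an ASCII string so product tokens compare case-insensitively.
std::string ToLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Returns the first group whose user_agent equals `bot` (ignoring case),
// otherwise the first "*" group, otherwise nullptr (no rules apply).
const Group* SelectGroup(const std::vector<Group>& groups,
                         const std::string& bot) {
  const Group* fallback = nullptr;
  const std::string wanted = ToLower(bot);
  for (const Group& g : groups) {
    if (ToLower(g.user_agent) == wanted) return &g;
    if (g.user_agent == "*" && fallback == nullptr) fallback = &g;
  }
  return fallback;
}
```

A caller would run `SelectGroup` once per robots.txt file and bot name, then evaluate URLs against the returned group only.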

Features

Fast parser and matcher for Allow/Disallow rules
Correct handling of wildcards and longest-match precedence
User-agent specific rule sections with sensible fallbacks
Low-overhead evaluation for high-throughput crawlers
Support for common extensions like Sitemap hints
Clear API to check URL fetch permissions per bot name
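The wildcard and longest-match features listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: `Rule`, `Matches`, and `IsAllowed` are hypothetical names. It assumes `*` matches any run of characters, a trailing `$` anchors the pattern at the end of the path, the longest matching pattern wins, and Allow wins a tie.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical type for this sketch; not part of the library.
struct Rule {
  bool allow;           // true for Allow, false for Disallow
  std::string pattern;  // path pattern; may contain '*' and a trailing '$'
};

// Returns true if `pattern` matches `path`. Without a trailing '$' the
// pattern only has to match a prefix, modeled here by appending '*'.
bool Matches(std::string_view pattern, std::string_view path) {
  std::string full(pattern);
  if (!full.empty() && full.back() == '$') {
    full.pop_back();      // anchored: the whole path must match
  } else {
    full.push_back('*');  // unanchored: any suffix is accepted
  }
  constexpr std::size_t npos = std::string::npos;
  std::size_t p = 0, s = 0, star = npos, mark = 0;
  while (s < path.size()) {
    if (p < full.size() && full[p] == '*') {
      star = p++;  // remember the wildcard and where it started matching
      mark = s;
    } else if (p < full.size() && full[p] == path[s]) {
      ++p;
      ++s;
    } else if (star != npos) {
      p = star + 1;  // backtrack: let the last '*' absorb one more character
      s = ++mark;
    } else {
      return false;
    }
  }
  while (p < full.size() && full[p] == '*') ++p;
  return p == full.size();
}

// Longest matching pattern decides; Allow wins ties; no match means allowed.
// Empty patterns are skipped, since an empty Disallow restricts nothing.
bool IsAllowed(const std::vector<Rule>& rules, std::string_view path) {
  bool allowed = true;
  bool found = false;
  std::size_t best_len = 0;
  for (const Rule& r : rules) {
    if (r.pattern.empty() || !Matches(r.pattern, path)) continue;
    const std::size_t len = r.pattern.size();
    if (!found || len > best_len || (len == best_len && r.allow)) {
      found = true;
      best_len = len;
      allowed = r.allow;
    }
  }
  return allowed;
}
```

With the rules `Disallow: /private/` and `Allow: /private/public/`, a path under `/private/public/` is allowed because the Allow pattern is longer, while other paths under `/private/` stay blocked.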

License

Apache License V2.0

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

C++

Related Categories

C++ Robotics Software

Registered

2025-10-09

RobotsTxt

The repository contains Google's robots.txt parser
