Newspaper is a Python library for extracting and curating articles, inspired by requests for its simplicity and powered by lxml for its speed. It delivers Instapaper-style article extraction and works in 10+ languages, including English, Chinese, German, and Arabic. If you are certain that an entire news source is in one language, you can use the same API for the whole source.

Newspaper is a Python 3 library. On Python 3 you must install newspaper3k, not newspaper; the package named newspaper is the Python 2 library. Installing with pip is simple, but you will run into fixable issues if you are installing on Ubuntu.

Source objects are an abstraction of online news media websites like CNN or ESPN. You can initialize them in two different ways. Building a Source extracts its categories, feeds, articles, brand, and description for you. You can also pass configuration parameters such as language and browser_user_agent.
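As a rough illustration of building a Source with configuration parameters, the sketch below passes options as keyword arguments to newspaper.build(). The helper names source_options, build_source, and demo are hypothetical and exist only for this example; the memoize_articles option and the CNN URL are assumptions not stated in the description above. The library is imported lazily so the option helper can be tried without it installed.

```python
# Install the Python 3 package first (the package named "newspaper" targets Python 2):
#     pip3 install newspaper3k


def source_options(language="en", user_agent=None, memoize_articles=False):
    """Collect keyword options to hand to newspaper.build().

    Hypothetical helper for this sketch; it is not part of the library.
    """
    options = {"language": language, "memoize_articles": memoize_articles}
    if user_agent is not None:
        # browser_user_agent is the configuration name used by the library.
        options["browser_user_agent"] = user_agent
    return options


def build_source(url, **options):
    """Build a Source for a news site. Requires the library and network access."""
    import newspaper  # lazy import: only needed when a Source is actually built

    return newspaper.build(url, **options)


def demo():
    """Print what building a Source extracted (assumed example URL)."""
    paper = build_source("http://cnn.com", **source_options(language="en"))
    print(paper.brand)
    print(paper.description)
    print(paper.size(), "articles found")
    print(paper.category_urls()[:5])
    print(paper.feed_urls()[:5])
```

Calling demo() needs network access, and the results depend on the live site, so no expected output is shown here.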

Features

  • Multi-threaded article download framework
  • News url identification
  • Text extraction from html
  • Top image extraction from html
  • All image extraction from html
  • Keyword extraction from text
  • Summary extraction from text
  • Author extraction from text
  • Google trending terms extraction
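Several of the features above map onto the Article object and the news_pool helper. The following is a minimal sketch, not a definitive recipe: the function names article_record, extract_article, and download_sources are hypothetical, the URLs a caller passes in are assumptions, and the keyword and summary step (nlp()) additionally assumes the NLTK tokenizer data has been installed.

```python
def article_record(article):
    """Copy the extracted fields off a parsed Article-like object into a dict.

    Hypothetical helper for this sketch; it only reads attributes.
    """
    return {
        "title": article.title,
        "authors": list(article.authors),   # author extraction
        "text": article.text,               # text extraction from html
        "top_image": article.top_image,     # top image extraction
        "images": sorted(article.images),   # all image extraction
        "keywords": list(article.keywords), # keyword extraction (after nlp())
        "summary": article.summary,         # summary extraction (after nlp())
    }


def extract_article(url, language="en"):
    """Download, parse, and analyse a single article. Requires network access."""
    from newspaper import Article  # lazy import: needs newspaper3k installed

    article = Article(url, language=language)
    article.download()
    article.parse()
    article.nlp()  # assumes the NLTK 'punkt' data is available
    return article_record(article)


def download_sources(urls, threads_per_source=2):
    """Download the articles of several sources with the multi-threaded pool."""
    import newspaper
    from newspaper import news_pool

    papers = [newspaper.build(url) for url in urls]
    news_pool.set(papers, threads_per_source=threads_per_source)
    news_pool.join()  # blocks until every queued download has finished
    return papers
```

After download_sources() returns, each article in a source's articles list still has to be parsed with parse() before its text and images are available.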

License

MIT License

Additional Project Details

Operating Systems

Mac

Programming Language

Python

Related Categories

Python MARC and Book Library Metadata, Python Metadata Editors

Registered

2021-05-26