Trafilatura v2.0.0

Breaking changes:
- Python 3.6 and 3.7 deprecated (#709)
- bare_extraction():
  - now returns an instance of the Document class by default
  - as_dict deprecation warning → use .as_dict() method on return value (#730)
- bare_extraction() and extract(): no_fallback deprecation warning → use fast instead (#730)
- downloads: remove decode argument in fetch_url() → use fetch_response instead (#724)
- deprecated graphical user interface now removed (#713)
- extraction: move max_tree_size parameter to settings.cfg (#742)
- use type hinting (#721, #723, #748)
- see Python and CLI deprecations in the docs
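
The argument changes above could be handled in calling code roughly as follows. This is a minimal sketch, not part of Trafilatura: `translate_legacy_kwargs` and `bare_extraction_compat` are hypothetical helper names, and the sketch assumes that `fast` accepts the same boolean that `no_fallback` did and that `bare_extraction()` may return `None` when nothing is extracted.

```python
# Old keyword -> new keyword, taken from the breaking-changes list.
RENAMED_KWARGS = {"no_fallback": "fast"}


def translate_legacy_kwargs(legacy):
    """Return (kwargs, want_dict) suitable for a 2.0-style call.

    Hypothetical helper: renames deprecated keywords and strips as_dict,
    reporting whether the caller asked for a dictionary.
    """
    kwargs = dict(legacy)
    for old, new in RENAMED_KWARGS.items():
        if old in kwargs:
            value = kwargs.pop(old)
            # An explicitly passed new-style keyword takes precedence.
            kwargs.setdefault(new, value)
    want_dict = bool(kwargs.pop("as_dict", False))
    return kwargs, want_dict


def bare_extraction_compat(html, **legacy):
    """Call bare_extraction() with old-style keywords (hypothetical wrapper)."""
    # Imported lazily so the translation helper works without the package.
    from trafilatura import bare_extraction  # assumes trafilatura >= 2.0

    kwargs, want_dict = translate_legacy_kwargs(legacy)
    document = bare_extraction(html, **kwargs)
    if document is None or not want_dict:
        return document
    # The Document instance converts itself, replacing the as_dict argument.
    return document.as_dict()
```

A call such as `bare_extraction_compat(html, no_fallback=True, as_dict=True)` would then be forwarded as `bare_extraction(html, fast=True)` followed by `.as_dict()` on the result. Updating the call sites directly is the cleaner long-term fix; a wrapper like this only eases a gradual migration.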

Fixes:
- set options.source before raising error on empty doc tree by @dmoklaf (#707)
- robust encoding in options.source (#717)
- more robust mapping for conversion to HTML (#721)
- CLI downloads: use all information in settings file (#734)
- downloads: cleaner urllib3 code (#736)
- refine table markdown output by @unsleepy22 (#752)
- extraction fix: images in text nodes by @unsleepy22 (#757)

Metadata:
- more robust URL extraction (#710)

Command-line interface:
- CLI: print URLs early for feeds and sitemaps with --list with @gremid (#744)
- CLI: add 126 exit code for high error ratio (#747)
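
A script that drives the CLI could branch on the new exit status along these lines. This is a sketch under stated assumptions: `describe_exit` and `run_batch` are hypothetical names, the `trafilatura` command is assumed to be on the PATH, and the error ratio that triggers code 126 is not stated in these notes.

```python
import subprocess

# Exit status named in the release notes for a high error ratio.
HIGH_ERROR_RATIO = 126


def describe_exit(returncode):
    """Map a CLI exit status to a short label (hypothetical helper)."""
    if returncode == 0:
        return "ok"
    if returncode == HIGH_ERROR_RATIO:
        return "high error ratio"
    return f"failed ({returncode})"


def run_batch(url_list_path, output_dir):
    """Process a file of URLs with the CLI and label the outcome."""
    completed = subprocess.run(
        ["trafilatura", "-i", url_list_path, "-o", output_dir],
        check=False,  # inspect the status ourselves instead of raising
    )
    return describe_exit(completed.returncode)
```

Keeping the status mapping in its own function lets a pipeline decide separately whether a high error ratio should abort the run or only be logged.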

Maintenance:
- remove already deprecated functions and args (#716)
- add type hints (#723, #728)
- setup: use pyproject.toml file (#715)
- simplify code (#708, #709, #727)
- better debug messages in main_extractor (#714)
- evaluation: review data, update packages, add magic_html (#731)
- setup: explicit exports through __all__ (#740)
- tests: extend coverage (#753)

Documentation:
- fix link in docs/index.html by @nzw0301 (#711)
- remove docs from published packages (#743)
- update docs (#745)

Source: README.md, updated 2024-12-03