ArchiveBox v0.6.2
Name Modified Size Downloads / Week
Electron-ArchiveBox-macOS-x64-0.6.2.app.zip 2021-04-14 80.3 MB
archivebox-0.6.2.tar.gz 2021-04-14 413.6 kB
archivebox-0.6.2-py3-none-any.whl 2021-04-14 489.4 kB
archivebox--0.6.2-1.big_sur.bottle.tar.gz 2021-04-14 12.0 MB
archivebox_0.6.2-1_all.deb 2021-04-14 288.7 kB
README.md 2021-04-14 4.7 kB
v0.6.2_ _10x performance gain, new Admin UI _ CLI features, and more source code.tar.gz 2021-04-14 394.1 kB
v0.6.2_ _10x performance gain, new Admin UI _ CLI features, and more source code.zip 2021-04-14 489.2 kB
Totals: 8 Items   94.4 MB 0

New features

  • new ArchiveResult log in the admin web UI, with full editing ability of individual extractor outputs + list of outputs under each Snapshot admin entry
  • ability to save multiple snapshots of the same URL over time using new Re-snapshot button
  • add init --quick and server --quick-init options to quickly update the db version without doing a full re-init (for users with large archive collections this will make version upgrades a lot faster / less painful)
  • add new archivebox setup command and archivebox init --setup flag to aid in automatically installing dependencies and creating a superuser during initial setup
  • new SNAPSHOTS_PER_PAGE=40 and MEDIA_MAX_SIZE=750m config options
  • allow hotlinking directly to specific extractor output on the snapshot detail page using URL #hash e.g. /archive/<timestamp>/index.html#git
  • add ability to view the snapshot matching a given URL by visiting /archive/https://example.com/some/url, which redirects to /archive/<timestamp>/index.html (also works without the scheme, e.g. /archive/example.com)
  • [#660] add ability to tag URLs while adding them via the web UI and via the CLI using archivebox add --tag=tag1,tag2,tag3 ...
  • [#659] add back ability to override visual styling with custom HTML and CSS using new config option CUSTOM_TEMPLATES_DIR
  • ability to add and remove multiple tags at once from the snapshot admin using autocompleting dropdown
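
The URL patterns and the tagging command from the list above can be sketched in a few lines of Python. This is only an illustration: the helper names (snapshot_hotlink, snapshot_lookup_path, add_command) and the sample timestamp are hypothetical and not part of ArchiveBox, and passing the URLs to archivebox add as positional arguments is an assumption about the part the note elides with "...".

```python
from typing import List


def snapshot_hotlink(timestamp: str, extractor: str) -> str:
    """Path to one extractor's output on a snapshot detail page (uses the URL #hash)."""
    return f"/archive/{timestamp}/index.html#{extractor}"


def snapshot_lookup_path(url: str) -> str:
    """Path that, per the notes, redirects to the snapshot matching `url`.

    The URL may be given with or without its scheme.
    """
    return f"/archive/{url}"


def add_command(urls: List[str], tags: List[str]) -> List[str]:
    """Argument list for tagging URLs while adding them via the CLI.

    Assumption: URLs are passed as positional arguments after the flag.
    """
    return ["archivebox", "add", f"--tag={','.join(tags)}", *urls]


# Hypothetical timestamp, for illustration only.
print(snapshot_hotlink("1618358400", "git"))
print(snapshot_lookup_path("example.com"))
print(add_command(["https://example.com"], ["tag1", "tag2", "tag3"]))
```

The list returned by add_command could be handed to subprocess.run on a machine where archivebox is installed.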

Enhancements

  • lots of performance improvements! (in testing with 100k entries, the main index was brought down from 10-14 second load times to ~110ms once cache warms up)
  • full text search now works on the public snapshot list
  • dates and times are now localized to your browser's timezone instead of showing in UTC
  • integrity and correctness improvements to readability, mercury, warc, and other extractors
  • video subtitles and description are now added to the full-text search index as well (including youtube's autogenerated transcripts in all languages)
  • log all errors with full tracebacks to new data/logs/errors.log file (so users no longer have to run in --debug mode to see error details)
  • better archivebox schedule logging and changed logfile location to ./logs/schedule.log
  • better docker-compose setup experience with sonic config example in docker-compose.yml
  • add Django Debug Toolbar + djdt_flamegraph for developers to profile UI performance
  • add --overwrite flag support to archivebox schedule; archived URLs get re-added the same way as with add --overwrite
  • [#644] remove network requests to CDNs for bootstrap and jquery by inlining them instead
  • [#647] allow filtering by ArchiveResult status in the Snapshot admin UI to select only links that have been archived or not archived
  • [#550] kill all orphan child processes after each extractor finishes to prevent dangling chromium/node subprocesses and memory leaks
  • 3276434 add new SEARCH_BACKEND_TIMEOUT config option to tune the amount of time the search backend can take before it gives up
  • more diagnostic info added to the Snapshot admin view including most recent status code, content type, detected server, etc
  • make the order of the table columns, layout, and spacing the same on the public view and private view (also remove DataTable, we're not using it)
  • better snapshot grid page (faster load times, nicer CSS for tags and cards, more actions supported and metadata shown)
  • added Cache-Control headers to dramatically speed up load times by caching favicons, screenshots, etc. in browsers/upstreams
  • new project releases page https://releases.archivebox.io and demo url https://demo.archivebox.io
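
To illustrate what a setting like SEARCH_BACKEND_TIMEOUT bounds, here is a minimal sketch of giving up on a slow search call after a fixed number of seconds. It is not ArchiveBox's implementation: search_with_timeout, slow_backend, and the empty-list fallback are all hypothetical names and choices made for the example.

```python
import concurrent.futures
import time
from typing import Callable, List


def search_with_timeout(search_fn: Callable[[str], List[str]],
                        query: str,
                        timeout_seconds: float) -> List[str]:
    """Run search_fn(query), returning [] if it takes longer than timeout_seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(search_fn, query)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # Give up on the backend; the caller gets no results instead of hanging.
        return []
    finally:
        # Do not block on the worker thread; it finishes in the background.
        executor.shutdown(wait=False)


def slow_backend(query: str) -> List[str]:
    """Stand-in for a search backend that responds slowly."""
    time.sleep(0.5)
    return [query]


print(search_with_timeout(slow_backend, "example", timeout_seconds=0.05))
print(search_with_timeout(slow_backend, "example", timeout_seconds=2.0))
```

A thread cannot be forcibly stopped in Python, so this approach only stops waiting for the result; the slow call itself still runs to completion in the background.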

Bugfixes

  • [#673] fix searching by URL substring in Snapshot admin list
  • [#658] fix Snapshot admin action buttons not working in Safari and some other browsers
  • [#678] fix AssertionError when archivebox would attempt to archive with CHROME_BINARY=None when Chrome was not found on the host system
  • [#654] fix some issues with sonic attempting to index massive text blobs or binary blobs on some pages and hanging
  • [#674] fix UTF-8 encoding problems with file reading/writing on Windows (supporting a Python pkg on Windows is unreasonably painful y'all)
  • [#433] fix deleted items sometimes reappearing on next import/update
  • [#473] fix issue preventing use of archivebox python API inside raw REPL (not using archivebox shell)
  • fix stdin/stdout/stderr handling for some edge cases in Docker/Docker-Compose
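
The kind of Windows encoding problem mentioned in #674 typically arises because Python's file APIs default to the locale encoding (often cp1252 on Windows) rather than UTF-8. The sketch below shows the general remedy of passing the encoding explicitly; it is not necessarily the exact change made in ArchiveBox, and the helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_text_utf8(path: Path, text: str) -> None:
    """Write text as UTF-8 regardless of the platform's default encoding."""
    path.write_text(text, encoding="utf-8")


def read_text_utf8(path: Path) -> str:
    """Read text as UTF-8 regardless of the platform's default encoding."""
    return path.read_text(encoding="utf-8")


# Round-trip a title containing non-ASCII characters through a temp file.
with tempfile.TemporaryDirectory() as tmp_dir:
    index_file = Path(tmp_dir) / "index.txt"
    write_text_utf8(index_file, "Café 日本語 snapshot")
    round_tripped = read_text_utf8(index_file)

print(round_tripped == "Café 日本語 snapshot")
```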

Source: README.md, updated 2021-04-14