We are pleased to announce version 0.11.0 of ACHE Crawler! Besides several technical improvements, this is the very first ACHE release under the Apache License 2.0 (ALv2).

The following is a detailed log of the major changes since the last version:

  • Removed dependency on Weka and reimplemented all machine-learning code using SMILE.
  • Added option to skip cross-validation in the ache buildModel command
  • Added option to configure the maximum number of features in the ache buildModel command
  • Changed license from GNU GPL to Apache 2.0
  • Added tool (ache run ReplayCrawl) to replay old crawls using a new configuration file
  • Added near-duplicate page detection using min-hashing and LSH
  • Added support for the ELASTIC format in the Kafka data format (issue [#155])
  • Upgraded react-scripts to get rid of a vulnerable transitive dependency (hoek:4.2.0)
  • Upgraded npm version to 5.8.0 in the Gradle build script
  • Changed the smile target page classifier to use Platt scaling only when the 'relevance_threshold' parameter is provided in the pageclassifier.yml file
  • Added Ansible scripts for automatic deployment
  • Added RocksDB-based target repository (RocksDBTargetRepository)
  • Fixed bug in ache-dashboard that prevented the search page from reloading on a browser page refresh (issue [#163])
  • Added support for Elasticsearch 6.x (issue [#158])
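
For readers unfamiliar with the near-duplicate detection item in the list above, the following is a minimal Python sketch of the general min-hashing and LSH technique. It is an illustration only and not ACHE's actual code; the function names, the word-shingle size, the number of hash functions, and the number of bands are all assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (k is an assumed default)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=64):
    """MinHash signature: the minimum hash value for each seeded hash function."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def lsh_candidate_pairs(signatures, bands=16):
    """Return pairs of document ids whose signatures agree on at least one band."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        rows = len(sig) // bands  # signature rows per band
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical page texts, used only to exercise the sketch.
    pages = {
        "a": "ACHE is a focused web crawler that collects pages on a topic",
        "b": "ACHE is a focused web crawler that collects pages on a topic today",
        "c": "Completely unrelated text about cooking pasta with tomato sauce",
    }
    sigs = {name: minhash_signature(shingles(text)) for name, text in pages.items()}
    print(estimated_jaccard(sigs["a"], sigs["b"]))
    print(lsh_candidate_pairs(sigs))
```

Banding trades precision for recall: more bands with fewer rows each make it more likely that moderately similar pages share a bucket, while fewer, wider bands keep only very close matches as candidates.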
Source: README.md, updated 2018-06-01