| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| README.md | 2018-06-01 | 1.3 kB | |
| v0.11.0.tar.gz | 2018-06-01 | 15.7 MB | |
| v0.11.0.zip | 2018-06-01 | 16.3 MB | |
| Totals: 3 Items | | 32.1 MB | 0 |

We are pleased to announce version 0.11.0 of ACHE Crawler! Besides several technical improvements, this is the very first ACHE release under the Apache License 2.0 (APLv2).

The following is a detailed log of the major changes since the last version:

- Removed dependency on Weka and reimplemented all machine-learning code using SMILE.
- Added option to skip cross-validation on `ache buildModel` command
- Added option to configure max number of features on `ache buildModel` command
- Changed license from GNU GPL to Apache 2.0
- Added tool (ache run ReplayCrawl) to replay old crawls using a new configuration file
- Added near-duplicate page detection using min-hashing and LSH
- Support ELASTIC format in Kafka data format (issue [#155])
- Upgrade react-scripts to get rid of vulnerable transitive dependency (hoek:4.2.0)
- Upgrade npm version to 5.8.0 on gradle build script
- Changed `smile` target page classifier to use Platt's scaling only when the parameter 'relevance_threshold' is provided in the `pageclassifier.yml` file.
- Added Ansible scripts for automatic deployment
- Added RocksDB-based target repository (RocksDBTargetRepository)
- Fixed bug in ache-dashboard that prevented the search page from reloading on browser page refresh (issue [#163])
- Support Elasticsearch 6.x (issue [#158])
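
For readers unfamiliar with the near-duplicate detection technique mentioned in the list above, the following is a minimal, self-contained sketch of min-hashing combined with LSH banding. It is not ACHE's implementation (ACHE is written in Java, and its code is not shown here). All function names, the 3-word shingle size, the 64 hash functions, and the 16 bands are illustrative assumptions.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (k=3 is an assumed default)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    # Deterministic 64-bit hash; the seed selects one function of the hash family.
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=64):
    """For each hash function, keep the minimum hash value over all shingles."""
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidates(signatures, bands=16):
    """Split each signature into bands; pages sharing any band become candidates.

    `signatures` maps a page id to its min-hash signature.
    """
    buckets = defaultdict(list)
    for page_id, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].append(page_id)
    pairs = set()
    for ids in buckets.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add(tuple(sorted((ids[i], ids[j]))))
    return pairs


if __name__ == "__main__":
    # Hypothetical page texts, used only for illustration.
    pages = {
        "page1": "ACHE is a focused web crawler that collects pages on a topic",
        "page2": "ACHE is a focused web crawler that collects pages on one topic",
        "page3": "An unrelated page about cooking pasta with tomato sauce",
    }
    sigs = {pid: minhash_signature(shingles(text)) for pid, text in pages.items()}
    print("candidate pairs:", lsh_candidates(sigs))
    print("page1 vs page2:", estimated_jaccard(sigs["page1"], sigs["page2"]))
```

The banding step is what avoids comparing every pair of pages: only pages that collide in at least one band are compared in full. More bands with fewer rows each make the filter more permissive, while fewer bands with more rows make it stricter.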