Name | Modified | Size | Downloads / Week |
---|---|---|---|
py_data_juicer-1.3.1-py3-none-any.whl | 2025-04-11 | 475.1 kB | |
README.md | 2025-04-11 | 1.4 kB | |
Release v1.3.1_ added HumanOPs _ fixed some bugs source code.tar.gz | 2025-04-11 | 31.7 MB | |
Release v1.3.1_ added HumanOPs _ fixed some bugs source code.zip | 2025-04-11 | 32.2 MB | |
Totals: 4 Items | | 64.4 MB | 0 |
Major Updates
- 💥 Prototype implementation of HumanOps (annotation). [#617] Included features:
  - boilerplate code for supporting Label Studio powered human annotation ops
  - a reference implementation of human preference annotation
  - a Label Studio service script that can start a local instance using Docker or pip, whichever is available
  - reference configs and data
  - an event-driven and notification mixins framework for ops
New OPs
- `extract_tables_from_html_mapper`: extract tables from HTML texts. [#634]
- `general_fused_op`: an explicitly fused operator designed to execute multiple sequential operations (OPs) on the same batch, enabling fine-grained control over data processing. [#626]
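The idea behind an explicitly fused operator can be sketched in plain Python. This is only an illustration of running several OPs back to back on one batch; the helper names (`fuse`, `strip_mapper`, `lowercase_mapper`, `non_empty_filter`) and the column-oriented batch layout are assumptions made for this example and are not Data-Juicer's actual API or config schema.

```python
from typing import Callable, Dict, List

# Assumed batch layout for this sketch: column name -> list of values.
Batch = Dict[str, List[str]]
Op = Callable[[Batch], Batch]


def strip_mapper(batch: Batch) -> Batch:
    """Hypothetical mapper: trim surrounding whitespace from each text."""
    return {**batch, "text": [t.strip() for t in batch["text"]]}


def lowercase_mapper(batch: Batch) -> Batch:
    """Hypothetical mapper: lowercase each text."""
    return {**batch, "text": [t.lower() for t in batch["text"]]}


def non_empty_filter(batch: Batch) -> Batch:
    """Hypothetical filter: drop rows whose text is empty."""
    keep = [bool(t) for t in batch["text"]]
    return {
        name: [value for value, ok in zip(values, keep) if ok]
        for name, values in batch.items()
    }


def fuse(ops: List[Op]) -> Op:
    """Combine several ops into one that runs them in order on the same batch."""

    def fused(batch: Batch) -> Batch:
        for op in ops:
            batch = op(batch)
        return batch

    return fused


fused_op = fuse([strip_mapper, lowercase_mapper, non_empty_filter])
result = fused_op({"text": ["  Hello ", "   ", "WORLD"]})
print(result)  # {'text': ['hello', 'world']}
```

Because the batch is handed from one op to the next inside a single call, the order of the list passed to `fuse` fully determines the processing order, which is the kind of fine-grained control the release note describes.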
Bug Fixes
- fix dataset builder initialization failure [#630]
- update Executor references from Executor to DefaultExecutor [#632] [#633]
- switch the backend of `plt` to avoid sub-process/thread error [#633]
- fix some boundary condition bugs in several deduplicators [#635] [#637]
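For readers who hit similar sub-process or thread errors in their own scripts, a common approach is to select a non-interactive matplotlib backend before `pyplot` is imported. The sketch below assumes the `Agg` backend; the release note does not say which backend the fix switched to, and `select_headless_backend` is a hypothetical helper written for this example.

```python
import os


def select_headless_backend(backend: str = "Agg") -> str:
    """Choose a non-interactive matplotlib backend (assumed here: Agg).

    Must run before `matplotlib.pyplot` is imported for the first time.
    Returns the name of the backend that is in effect.
    """
    # Child processes inherit this variable, so they pick the same backend.
    os.environ["MPLBACKEND"] = backend
    try:
        import matplotlib
    except ImportError:
        # matplotlib is not installed; only the environment variable is set.
        return backend
    matplotlib.use(backend)
    return matplotlib.get_backend()


active_backend = select_headless_backend()
print(active_backend)
```

Non-interactive backends render to files rather than opening GUI windows, so they do not depend on a GUI event loop running in the main thread.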
Others
- check the dataset when loading, to support passing a dataset to the `DefaultExecutor.run` method. [#633]
- update docs to highlight the light env installation part. [#636]
Acknowledgement
- @liuyuhanalex helped add a new OP and fix some of the boundary condition bugs. [#634] [#635]
Full Changelog: https://github.com/modelscope/data-juicer/compare/v1.3.0...v1.3.1