Chonkie v0.5.0

🚨 Breaking changes

  • In all chunkers except TokenChunker, the tokenizer argument has been renamed to tokenizer_or_token_counter, reflecting that these chunkers now also accept callable token counters.
  • Setting chunk_overlap > 0 now emits a deprecation warning (DeprecatedWarning). Use OverlapRefinery instead, which is faster and more flexible.
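Because the renamed argument accepts either a tokenizer or a plain callable, the chunker has to turn whichever it receives into a "count tokens in this text" function. The sketch below is a hypothetical illustration in plain Python, not Chonkie's internal code: resolve_counter, word_count and CharTokenizer are made-up names, and recognizing a tokenizer by its encode() method is our assumption.

```python
from typing import Callable, Protocol, Union


class HasEncode(Protocol):
    """Assumed tokenizer shape: anything with encode(text) -> list of tokens."""

    def encode(self, text: str) -> list: ...


def word_count(text: str) -> int:
    """A simple callable token counter: counts whitespace-separated words."""
    return len(text.split())


def resolve_counter(
    tokenizer_or_token_counter: Union[HasEncode, Callable[[str], int]],
) -> Callable[[str], int]:
    """Hypothetical dispatch: wrap a tokenizer, or use a callable as-is."""
    if hasattr(tokenizer_or_token_counter, "encode"):
        tok = tokenizer_or_token_counter
        # Count tokens by encoding and measuring the result.
        return lambda text: len(tok.encode(text))
    if callable(tokenizer_or_token_counter):
        # Already a (str) -> int function; use it directly.
        return tokenizer_or_token_counter
    raise TypeError("expected a tokenizer with .encode() or a callable (str) -> int")


class CharTokenizer:
    """Toy tokenizer for the example: one token per character."""

    def encode(self, text: str) -> list:
        return list(text)


count_words = resolve_counter(word_count)
count_chars = resolve_counter(CharTokenizer())
print(count_words("chunk me up please"))  # 4
print(count_chars("chonk"))  # 5
```

With Chonkie installed, a function like word_count would be passed straight to a chunker as tokenizer_or_token_counter=word_count.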

✨ Highlights

  • All chunkers now support a return_type="texts" parameter, which makes the chunker return a plain List[str] instead of Chunk objects. This skips the metadata stored in the Chunk dataclass and saves a little memory.
  • All chunkers accept a Callable for tokenizer_or_token_counter, so you can pass in a function defined as def token_counter(text: str) -> int: ...
  • All chunkers that use delimiters (e.g. SentenceChunker, RecursiveChunker, LateChunker) support include_delim="next", which places the delimiter at the start of the next chunk. This is useful for processing Markdown files correctly.
  • Added initial support for Chef, Chonkie's family of pre-processing classes, starting with TextChef, which loads and pre-processes text and Markdown files.
  • All Chunk dataclasses have to_dict and from_dict methods for converting between Chunk objects and dicts. This is especially useful for storing chunks as JSON or JSON Lines files.
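To see why delimiter placement matters for Markdown, the hypothetical helper below splits a string on a delimiter and attaches it to the previous piece, the next piece, or neither. It is a toy that only mimics the behavior described above; split_keep_delim is a made-up name, and the "prev" and None options are shown for comparison, not as a statement of Chonkie's defaults.

```python
from typing import List, Literal, Optional


def split_keep_delim(
    text: str,
    delim: str,
    include_delim: Optional[Literal["prev", "next"]] = "prev",
) -> List[str]:
    """Split text on delim, attaching the delimiter to the previous piece,
    the next piece, or dropping it (include_delim=None)."""
    parts = text.split(delim)
    if include_delim == "prev":
        # Delimiter stays at the end of the piece before it.
        pieces = [p + delim for p in parts[:-1]] + [parts[-1]]
    elif include_delim == "next":
        # Delimiter moves to the start of the piece after it.
        pieces = [parts[0]] + [delim + p for p in parts[1:]]
    else:
        pieces = parts
    # Drop empty strings, e.g. when the text starts or ends with delim.
    return [p for p in pieces if p]


md = "# Intro\nHello.\n## Usage\nRun it."
print(split_keep_delim(md, "\n#", "prev"))
# ['# Intro\nHello.\n#', '# Usage\nRun it.']
print(split_keep_delim(md, "\n#", "next"))
# ['# Intro\nHello.', '\n## Usage\nRun it.']
```

With "prev", the first "#" of the "## Usage" heading dangles at the end of the Intro chunk; with "next", the heading stays intact at the start of its own chunk.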
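A to_dict/from_dict pair makes a JSON Lines round trip straightforward: one dict per line on the way out, one Chunk per line on the way back. The sketch below uses a stand-in dataclass rather than Chonkie's own Chunk; its fields (text, start_index, end_index, token_count) and the dump_jsonl/load_jsonl helpers are assumptions for illustration.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, TextIO


@dataclass
class Chunk:
    """Hypothetical stand-in for Chonkie's Chunk; field names are assumptions."""

    text: str
    start_index: int
    end_index: int
    token_count: int

    def to_dict(self) -> Dict[str, Any]:
        # Plain dict of all fields, ready for json.dumps.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Chunk":
        # Rebuild the dataclass from a dict produced by to_dict.
        return cls(**data)


def dump_jsonl(chunks: List[Chunk], fp: TextIO) -> None:
    """Write one JSON object per line (JSON Lines)."""
    for chunk in chunks:
        fp.write(json.dumps(chunk.to_dict()) + "\n")


def load_jsonl(fp: TextIO) -> List[Chunk]:
    """Read JSON Lines back into Chunk objects, skipping blank lines."""
    return [Chunk.from_dict(json.loads(line)) for line in fp if line.strip()]


chunks = [Chunk("Hello there.", 0, 12, 3), Chunk("General Kenobi.", 13, 28, 4)]
buf = io.StringIO()
dump_jsonl(chunks, buf)
buf.seek(0)
restored = load_jsonl(buf)
print(restored == chunks)  # True
```

In practice the StringIO buffer would be a file opened in text mode, e.g. with open("chunks.jsonl", "w").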

What's Changed

Full Changelog: https://github.com/chonkie-ai/chonkie/compare/v0.4.1...v0.5.0

Source: README.md, updated 2025-02-17