## Breaking changes
- Updated the minimum supported Python version to 3.9.
- All prompts for our built-in LLM-based metrics are updated. Now all of them have "Output your thought process first, and then provide your final answer" as the last sentence to make sure LLM evaluators actually do the chain-of-thought reasoning. This may affect the output scores as well.
- Fixed a typo in a module name: `langcheck.utils.progess_bar` is renamed to `langcheck.utils.progress_bar`.
- Default prompts for `langcheck.en.toxicity` and `langcheck.ja.toxicity` are updated. Refer to [#136] for a comparison with the original prompt. You can fall back to the old prompts by specifying `eval_prompt_version="v1"` as an argument (see the sketch after this list).
- Updated the arguments for `langcheck.augment.rephrase`. It now takes an `EvalClient` instead of directly taking OpenAI parameters.
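Below is a minimal sketch of these last two changes: falling back to the v1 toxicity prompt and passing an eval client to `langcheck.augment.rephrase`. The `eval_prompt_version="v1"` keyword comes from this release; the `eval_model` and `eval_client` argument names and the `OpenAIEvalClient` setup are assumptions based on the existing eval-client interface, not confirmed signatures.

```python
from langcheck.metrics.en import toxicity
from langcheck.metrics.eval_clients import OpenAIEvalClient
import langcheck.augment

# Assumes OPENAI_API_KEY is set in the environment
client = OpenAIEvalClient()

# Fall back to the pre-0.8.0 (v1) toxicity prompt instead of the updated default
toxicity_result = toxicity(
    ["You are completely useless."],
    eval_model=client,         # assumed argument name for the eval client
    eval_prompt_version="v1",  # new in v0.8.0
)

# rephrase() now takes an EvalClient rather than raw OpenAI parameters
rephrased = langcheck.augment.rephrase(
    ["How do I reset my password?"],
    eval_client=client,        # assumed argument name
)
```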
## New Features
- Added `langcheck.metrics.custom_text_quality`. With the functions in this module, you can build your own LLM-based metrics with custom prompts. See the documentation for details.
- Added support for some local LLMs as evaluators:
  - `LlamaEvalClient`
  - `PrometheusEvalClient`
- Added new text augmentations (see the first sketch after this list):
  - `jailbreak_templates` augmentation with the following templates: `basic`, `chatgpt_dan`, `chatgpt_good_vs_evil`, `john` and `universal_adversarial_suffix` (EN); `basic`, `chatgpt_good_vs_evil` and `john` (JA)
  - `payload_splitting` (EN, JA)
  - `to_full_width` (EN)
  - `conv_kana` (JA)
- Added new LLM-based built-in metrics for both EN & JA languages:
  - `answer_correctness`
  - `answer_safety`
  - `personal_data_leakage`
  - `hate_speech`
  - `adult_content`
  - `harmful_activity`
- Added "Simulated Annotators", a confidence score estimation method proposed in the paper *Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement*. You can use it by adding `calculated_confidence=True` to `langcheck.metrics.en.pairwise_comparison` (see the second sketch after this list).
- Supported embedding-based metrics (e.g. `semantic_similarity`) with async OpenAI-based eval clients.
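As a rough illustration of the new augmentations, here is a sketch that assumes they are exposed as top-level functions in `langcheck.augment` (like `langcheck.augment.rephrase`) and follow a list-of-texts-in, list-of-texts-out pattern; the `templates` keyword for selecting jailbreak templates is an assumed name, not a confirmed one.

```python
import langcheck.augment

prompts = ["Summarize the quarterly report in two sentences."]

# Split the instruction into fragments to probe prompt-injection defenses
split_prompts = langcheck.augment.payload_splitting(prompts)

# Convert ASCII characters to their full-width equivalents (EN)
full_width_prompts = langcheck.augment.to_full_width(prompts)

# Wrap each prompt in jailbreak templates; `templates` is an assumed keyword
jailbreak_prompts = langcheck.augment.jailbreak_templates(
    prompts, templates=["basic", "chatgpt_dan"]
)
```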
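And a sketch of the Simulated Annotators confidence scores: the `calculated_confidence=True` flag is the one introduced in this release, while the remaining argument names assume the existing `pairwise_comparison` interface (outputs from models A and B plus the shared prompts, judged by an OpenAI-based eval client).

```python
from langcheck.metrics.en import pairwise_comparison
from langcheck.metrics.eval_clients import OpenAIEvalClient

# Assumes OPENAI_API_KEY is set in the environment
client = OpenAIEvalClient()

result = pairwise_comparison(
    generated_outputs_a=["Paris is the capital of France."],
    generated_outputs_b=["The capital of France is Marseille."],
    prompts=["What is the capital of France?"],
    eval_model=client,           # assumed argument name for the eval client
    calculated_confidence=True,  # new in v0.8.0: Simulated Annotators confidence
)
print(result)
```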
## Bug Fixes
- Added error handling code in `OpenAIEvalClient` and `GeminiAIEvalClient` so that they just return `None` even if they fail in the function calling step.
- Updated `langcheck.metrics.pairwise_comparison` to accept lists containing `None` as source texts.
- Fixed an error in `langcheck.augment.synonym` caused by a missing `nltk` package.
- Fixed an issue with decoding UTF-8 texts in some environments.
- Fixed typos in documentation.
Full Changelog: https://github.com/citadel-ai/langcheck/compare/v0.7.1...v0.8.0