## Breaking changes
- Updated the minimum supported Python version to 3.9.
- All prompts for our built-in LLM-based metrics are updated. Now all of them have "Output your thought process first, and then provide your final answer" as the last sentence to make sure LLM evaluators actually do the chain-of-thought reasoning. This may affect the output scores as well.
- Fixed a typo in a module name: `langcheck.utils.progess_bar` is renamed to `langcheck.utils.progress_bar`.
- Default prompts for `langcheck.en.toxicity` and `langcheck.ja.toxicity` are updated. Refer to [#136] for a comparison with the original prompt. You can fall back to the old prompts by specifying `eval_prompt_version="v1"` as an argument (see the sketch after this list).
- Updated the arguments for `langcheck.augment.rephrase`. It now takes an `EvalClient` instead of directly taking OpenAI parameters.
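Below is a minimal sketch of these last two changes: falling back to the v1 toxicity prompt and passing an eval client to `langcheck.augment.rephrase`. The `eval_prompt_version="v1"` keyword comes from this release; the `eval_model` and `eval_client` argument names and the `OpenAIEvalClient` setup are assumptions based on the existing eval-client interface, not confirmed signatures.

```python
from langcheck.metrics.en import toxicity
from langcheck.metrics.eval_clients import OpenAIEvalClient
import langcheck.augment

# Assumes OPENAI_API_KEY is set in the environment
client = OpenAIEvalClient()

# Fall back to the pre-0.8.0 (v1) toxicity prompt instead of the updated default
toxicity_result = toxicity(
    ["You are completely useless."],
    eval_model=client,         # assumed argument name for the eval client
    eval_prompt_version="v1",  # new in v0.8.0
)

# rephrase() now takes an EvalClient rather than raw OpenAI parameters
rephrased = langcheck.augment.rephrase(
    ["How do I reset my password?"],
    eval_client=client,        # assumed argument name
)
```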
## New Features
- Added `langcheck.metrics.custom_text_quality`. With the functions in this module, you can build your own LLM-based metrics with custom prompts. See the documentation for details.
- Added support for some local LLMs as evaluators:
  - `LlamaEvalClient`
  - `PrometheusEvalClient`
- Added new text augmentations (see the first sketch after this list):
  - `jailbreak_templates` augmentation with the following templates: `basic`, `chatgpt_dan`, `chatgpt_good_vs_evil`, `john` and `universal_adversarial_suffix` (EN); `basic`, `chatgpt_good_vs_evil` and `john` (JA)
  - `payload_splitting` (EN, JA)
  - `to_full_width` (EN)
  - `conv_kana` (JA)
- Added new LLM-based built-in metrics for both EN & JA languages:
  - `answer_correctness`
  - `answer_safety`
  - `personal_data_leakage`
  - `hate_speech`
  - `adult_content`
  - `harmful_activity`
- Added "Simulated Annotators", a confidence score estimation method proposed in the paper *Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement*. You can use it by adding `calculated_confidence=True` to `langcheck.metrics.en.pairwise_comparison` (see the second sketch after this list).
- Supported embedding-based metrics (e.g. `semantic_similarity`) with async OpenAI-based eval clients.
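As a rough illustration of the new augmentations, here is a sketch that assumes they are exposed as top-level functions in `langcheck.augment` (like `langcheck.augment.rephrase`) and follow a list-of-texts-in, list-of-texts-out pattern; the `templates` keyword for selecting jailbreak templates is an assumed name, not a confirmed one.

```python
import langcheck.augment

prompts = ["Summarize the quarterly report in two sentences."]

# Split the instruction into fragments to probe prompt-injection defenses
split_prompts = langcheck.augment.payload_splitting(prompts)

# Convert ASCII characters to their full-width equivalents (EN)
full_width_prompts = langcheck.augment.to_full_width(prompts)

# Wrap each prompt in jailbreak templates; `templates` is an assumed keyword
jailbreak_prompts = langcheck.augment.jailbreak_templates(
    prompts, templates=["basic", "chatgpt_dan"]
)
```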
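And a sketch of the Simulated Annotators confidence scores: the `calculated_confidence=True` flag is the one introduced in this release, while the remaining argument names assume the existing `pairwise_comparison` interface (outputs from models A and B plus the shared prompts, judged by an OpenAI-based eval client).

```python
from langcheck.metrics.en import pairwise_comparison
from langcheck.metrics.eval_clients import OpenAIEvalClient

# Assumes OPENAI_API_KEY is set in the environment
client = OpenAIEvalClient()

result = pairwise_comparison(
    generated_outputs_a=["Paris is the capital of France."],
    generated_outputs_b=["The capital of France is Marseille."],
    prompts=["What is the capital of France?"],
    eval_model=client,           # assumed argument name for the eval client
    calculated_confidence=True,  # new in v0.8.0: Simulated Annotators confidence
)
print(result)
```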
## Bug Fixes
- Added error handling code in `OpenAIEvalClient` and `GeminiAIEvalClient` so that they just return `None` even if they fail in the function calling step.
- Updated `langcheck.metrics.pairwise_comparison` to accept lists containing `None` as source texts.
- Fixed an error in `langcheck.augment.synonym` caused by a missing `nltk` package.
- Fixed an issue with decoding UTF-8 texts in some environments.
- Fixed typos in documentation.
Full Changelog: https://github.com/citadel-ai/langcheck/compare/v0.7.1...v0.8.0