LangCheck v0.8.0 (2024-10-29)

Breaking Changes

  • Updated the minimum supported Python version to 3.9.
  • All prompts for our built-in LLM-based metrics have been updated. They now all include "Output your thought process first, and then provide your final answer as the last sentence." so that LLM evaluators actually perform chain-of-thought reasoning. This may affect the output scores as well.
  • Fixed a typo in a module name: langcheck.utils.progess_bar has been renamed to langcheck.utils.progress_bar.
  • Updated the default prompts for langcheck.en.toxicity and langcheck.ja.toxicity. Refer to [#136] for a comparison with the original prompt. You can fall back to the old prompts by passing eval_prompt_version="v1" as an argument (see the sketch after this list).
  • Updated the arguments for langcheck.augment.rephrase. It now takes an EvalClient instead of taking OpenAI parameters directly.
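
As a rough migration sketch for the last two items: eval_prompt_version comes from the notes above, while the no-argument OpenAIEvalClient() constructor, the eval_model parameter, and the eval_client parameter name for rephrase are assumptions; check the documentation for the exact signatures.

    import langcheck
    import langcheck.augment
    from langcheck.metrics.eval_clients import OpenAIEvalClient

    # Reads the OpenAI API key from the OPENAI_API_KEY environment variable.
    client = OpenAIEvalClient()

    generated_outputs = ["You are such an idiot.", "Happy to help!"]

    # Fall back to the pre-v0.8.0 toxicity prompt.
    toxicity = langcheck.metrics.en.toxicity(
        generated_outputs, eval_model=client, eval_prompt_version="v1")

    # rephrase now takes an EvalClient instead of OpenAI parameters.
    rephrased = langcheck.augment.rephrase(
        ["How do I reset my password?"], eval_client=client)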

New Features

  • Added langcheck.metrics.custom_text_quality. With the functions in this module, you can build your own LLM-based metrics with custom prompts; see the documentation for details and the sketch after this list.
  • Added support for some local LLMs as evaluators:
    • LlamaEvalClient
    • PrometheusEvalClient
  • Added new text augmentations:
    • jailbreak_templates augmentation with the following templates:
      • basic, chatgpt_dan, chatgpt_good_vs_evil, john and universal_adversarial_suffix (EN)
      • basic, chatgpt_good_vs_evil and john (JA)
    • payload_splitting (EN, JA)
    • to_full_width (EN)
    • conv_kana (JA)
  • Added new LLM-based built-in metrics for both EN & JA languages:
    • answer_correctness
    • answer_safety
    • personal_data_leakage
    • hate_speech
    • adult_content
    • harmful_activity
  • Added "Simulated Annotators", a confidence score estimating method proposed in paper Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement. You can use that by adding calculated_confidence=True for langcheck.metrics.en.pairwise_comparison.
  • Added support for embedding-based metrics (e.g. semantic_similarity) with async OpenAI-based eval clients.
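
A minimal sketch of a custom metric built with custom_text_quality. The function name custom_evaluator and its parameters (metric_name, score_map, template_path, language) are assumptions, not confirmed by these notes; see the documentation for the actual API.

    from langcheck.metrics.custom_text_quality import custom_evaluator
    from langcheck.metrics.eval_clients import OpenAIEvalClient

    client = OpenAIEvalClient()

    # Map each category the LLM evaluator can output to a numeric score.
    politeness = custom_evaluator(
        generated_outputs=["Thank you for reaching out, happy to help!"],
        prompts=["How do I return an item?"],
        sources=None,
        reference_outputs=None,
        eval_model=client,
        metric_name="politeness",
        score_map={"Polite": 1.0, "Neutral": 0.5, "Impolite": 0.0},
        template_path="politeness_prompt.j2",  # your custom prompt template
        language="en")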

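And a sketch of the new confidence scores for pairwise comparison: calculated_confidence comes from the release note above, while the other argument names and the no-argument OpenAIEvalClient() constructor are assumptions.

    import langcheck
    from langcheck.metrics.eval_clients import OpenAIEvalClient

    client = OpenAIEvalClient()

    # Compare two models' answers to the same prompt. With
    # calculated_confidence=True, the "Simulated Annotators" method also
    # estimates a confidence score for each pairwise judgement.
    result = langcheck.metrics.en.pairwise_comparison(
        generated_outputs_a=["Mount Everest, at 8,849 m above sea level."],
        generated_outputs_b=["K2 is the tallest mountain in the world."],
        prompts=["What is the tallest mountain in the world?"],
        eval_model=client,
        calculated_confidence=True)
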
Bug Fixes

  • Added error handling code in OpenAIEvalClient and GeminiAIEvalClient so that they simply return None if they fail in the function calling step.
  • Updated langcheck.metrics.pairwise_comparison to accept lists containing None as source texts.
  • Fixed an error in langcheck.augment.synonym caused by a missing nltk package.
  • Fixed an issue with decoding UTF-8 text in some environments.
  • Fixed typos in documentation.

Full Changelog: https://github.com/citadel-ai/langcheck/compare/v0.7.1...v0.8.0
