Documentation

Factly is a modern CLI tool designed to evaluate the factuality of Large Language Models (LLMs) on the MMLU (Massive Multitask Language Understanding) benchmark. It provides a robust framework for prompt engineering experiments and factual accuracy assessment.

Overview

Features

  • Evaluate LLM factuality on the MMLU benchmark with detailed results

  • Support for various prompt engineering experiments via configurable system instructions

  • Generate comparative visualizations of factuality scores across models and prompts

  • Structured output for easy analysis and comparison

  • Built with modern Python tooling (Python 3.12, uv, click, pydantic)

  • Extensible and reproducible evaluation workflows

Note

Currently, only OpenAI models are supported.

Quick Start

# Run factuality evaluation with default settings
factly evaluate

# Run evaluation and generate plots
factly evaluate --plot

# Get help on all available options
factly evaluate --help

That’s it! The tool uses optimized default parameters and saves all outputs to the output directory.

For more advanced usage, including saving results and evaluation, see the Usage Guide.

Note

For detailed installation instructions, please see the Installation Guide. And for usage instructions, use cases, examples, and advanced configuration options, please see the Usage Guide.


Full Table of Contents

The User Guide

This part of the documentation, which is mostly prose, begins with some background information about Factly, then focuses on step-by-step instructions for getting the most out of Factly.

The Community Guide

This part of the documentation, which is mostly prose, details the Factly ecosystem and community.

The API Documentation / Guide

If you are looking for information on a specific function, class, method, or algorithm, this part of the documentation is for you.

The Contributor Guide

If you want to contribute to the project, this part of the documentation is for you.

Support

Should you have any question, any remark, or if you find a bug, or if there is something you can’t do with the Factly, please open an issue.

Project Information

Factly is released under the MIT License, its documentation lives at Read the Docs, the code on GitHub, and the latest release on PyPI. It’s rigorously tested on Python 3.12+.

If you’d like to contribute to Factly you’re most welcome!

Indices and tables