Documentation
Factly is a modern CLI tool designed to evaluate the factuality of Large Language Models (LLMs) on the Massive Multitask Language Understanding (MMLU) benchmark. It provides a robust framework for prompt engineering experiments and factual accuracy assessment.
Overview
Features
Evaluate LLM factuality on the MMLU benchmark with detailed results
Support for various prompt engineering experiments via configurable system instructions
Generate comparative visualizations of factuality scores across models and prompts
Structured output for easy analysis and comparison
Built with modern Python tooling (Python 3.12, uv, click, pydantic)
Extensible and reproducible evaluation workflows
Note
Currently, only OpenAI models are supported.
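To make the idea of comparing factuality scores across prompts concrete, here is a minimal sketch. It assumes the score is plain multiple-choice accuracy, which this page does not confirm. The `GradedAnswer` record and the `accuracy_by_prompt` helper are hypothetical names for illustration and are not part of Factly's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GradedAnswer:
    """One graded MMLU-style question (hypothetical record, not Factly's schema)."""

    prompt_name: str  # label of the system instruction that was used
    subject: str      # MMLU subject, e.g. "astronomy"
    predicted: str    # choice the model picked: "A", "B", "C" or "D"
    correct: str      # reference choice from the benchmark


def accuracy_by_prompt(answers: list[GradedAnswer]) -> dict[str, float]:
    """Return the fraction of correctly answered questions per prompt label.

    Assumption: the score is simple accuracy over multiple-choice answers.
    """
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for answer in answers:
        totals[answer.prompt_name] += 1
        # Compare choices case-insensitively and ignore stray whitespace.
        if answer.predicted.strip().upper() == answer.correct.strip().upper():
            hits[answer.prompt_name] += 1
    return {name: hits[name] / totals[name] for name in totals}
```

Grouping by prompt label keeps the comparison between system instructions explicit; the same pattern would work for grouping by model name or by subject.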
Quick Start
# Run MMLU evaluation with default settings
factly mmlu
# Run MMLU evaluation and generate plots
factly mmlu --plot
# Get help on all available options
factly mmlu --help
# Get help on all available commands
factly --help
That’s it! The tool uses optimized default parameters and saves all outputs to the output directory.
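If you want to start the same commands from a Python script, for example inside a larger experiment pipeline, the sketch below shows one possible approach. It uses only the `mmlu` subcommand and the `--plot` flag shown above. The helper names `build_factly_command` and `run_factly` are hypothetical, and the sketch assumes the `factly` executable is installed and on your `PATH`.

```python
import shutil
import subprocess


def build_factly_command(plot: bool = False) -> list[str]:
    """Build the argument list for `factly mmlu`, optionally with `--plot`."""
    command = ["factly", "mmlu"]
    if plot:
        command.append("--plot")
    return command


def run_factly(plot: bool = False) -> int:
    """Run the command and return its exit code.

    Raises FileNotFoundError if the `factly` executable is not on PATH
    (assumption: Factly is installed as a console script named `factly`).
    """
    if shutil.which("factly") is None:
        raise FileNotFoundError("factly executable not found on PATH")
    completed = subprocess.run(build_factly_command(plot), check=False)
    return completed.returncode
```

Calling `run_factly(plot=True)` corresponds to running `factly mmlu --plot` in a shell. Passing the arguments as a list avoids shell quoting issues.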
Note
For detailed installation instructions, see the Installation Guide. For usage instructions, use cases, examples, and advanced configuration options, see the Usage Guide.
Full Table of Contents
The User Guide
This part of the documentation, which is mostly prose, begins with some background information about Factly, then focuses on step-by-step instructions for getting the most out of Factly.
The Community Guide
This part of the documentation, which is mostly prose, details the Factly ecosystem and community.
The API Documentation / Guide
If you are looking for information on a specific function, class, method, or algorithm, this part of the documentation is for you.
The Contributor Guide
If you want to contribute to the project, this part of the documentation is for you.
Support
If you have a question or a remark, find a bug, or run into something you can’t do with Factly, please open an issue.
Project Information
Factly is released under the MIT License. Its documentation lives at Read the Docs, the code on GitHub, and the latest release on PyPI. It’s rigorously tested on Python 3.12+.
If you’d like to contribute to Factly, you’re most welcome!