Usage

This document provides detailed instructions for using the factly command-line tool.

Command Line Interface

The primary entrypoint for Factly is the factly command-line interface, which provides tools to evaluate the factuality of Large Language Models (LLMs) on the MMLU benchmark.

Basic Usage

# Run factuality evaluation with default settings
factly evaluate

# Run evaluation and generate plots
factly evaluate --plot

# Get help on all available options
factly evaluate --help

# List available MMLU tasks
factly list-tasks

Command Structure

Factly commands follow this general form:

factly [OPTIONS] COMMAND [ARGS]...

Main Commands:

evaluate: Run factuality evaluation on MMLU benchmark
list-tasks: List all available MMLU tasks

Common Global Options:

--help: Show help message and exit
--version: Show version and exit

Command Line Options for `evaluate`

Option	Description	Default
`--model TEXT`	The OpenAI model to use for evaluation	From `.env` or `gpt-4o`
`--tasks TEXT`	MMLU task categories to evaluate (can be repeated)	All tasks
`--n-shots INTEGER`	Number of examples for few-shot learning	`0`
`--workers INTEGER`	Maximum number of concurrent API requests	Auto-detected based on system resources
`--instructions TEXT`	Path to YAML file with system instruction variants	`./instructions.yaml`
`--plot`	Generate visualization plots
`--plot-path`	Path to save the plot	`./outputs/factuality-<model>-t<count>.png`
`--verbose`	Enable verbose output
`--help`	Show help message and exit

Command Line Options for `list-tasks`

Option	Description	Default
`--help`	Show help message and exit

Advanced Usage

Task Selection

You can select specific MMLU tasks to evaluate:

# Evaluate a specific model on selected MMLU tasks
factly evaluate --model gpt-4o --tasks mathematics --tasks high_school_us_history

# Evaluate on STEM tasks only
factly evaluate --tasks STEM

# Evaluate on business-related tasks
factly evaluate --tasks BUSINESS

Few-Shot Learning

Configure the number of examples provided for few-shot learning:

# Zero-shot evaluation (default)
factly evaluate --n-shots 0

# 3-shot evaluation
factly evaluate --n-shots 3

# 5-shot evaluation
factly evaluate --n-shots 5
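
To illustrate what the --n-shots value controls, here is a minimal Python sketch of few-shot prompt construction. It is not Factly's actual prompt builder: the Example class, format_example, build_prompt, and the prompt layout are all hypothetical and only show the general idea of prepending n solved examples to the question under test.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical container for one multiple-choice question."""
    question: str
    choices: list
    answer: str  # letter of the correct choice, e.g. "B"


def format_example(ex: Example, include_answer: bool) -> str:
    # Render the question, its lettered choices, and optionally the answer.
    lines = [ex.question]
    lines += [f"{letter}. {choice}" for letter, choice in zip("ABCD", ex.choices)]
    lines.append(f"Answer: {ex.answer}" if include_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(shots: list, target: Example, n_shots: int) -> str:
    # Prepend the first n_shots solved examples; n_shots=0 gives a zero-shot prompt.
    blocks = [format_example(ex, include_answer=True) for ex in shots[:n_shots]]
    blocks.append(format_example(target, include_answer=False))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    shots = [
        Example("2 + 2 = ?", ["3", "4", "5", "6"], "B"),
        Example("3 * 3 = ?", ["6", "7", "8", "9"], "D"),
    ]
    target = Example("5 - 1 = ?", ["4", "3", "2", "1"], "A")
    print(build_prompt(shots, target, n_shots=2))
```

With n_shots=0 the prompt contains only the target question, which corresponds to the zero-shot default.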

Performance Optimization

Factly uses asynchronous concurrent processing to maximize evaluation throughput. It evaluates multiple questions concurrently for each model, which reduces total evaluation time. You can control the concurrency level with the --workers option:

# Auto-determine optimal concurrency (default)
factly evaluate --tasks STEM

# Set concurrency level explicitly (process 20 questions in parallel)
factly evaluate --tasks STEM --workers 20

The implementation uses asyncio with semaphores to cap the number of in-flight requests, and it detects available system resources automatically to choose a default limit.
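
The general pattern of semaphore-bounded concurrency can be sketched as follows. This is an illustration under stated assumptions, not Factly's source code: evaluate_question and evaluate_all are hypothetical names, and the API request is simulated with a short sleep.

```python
import asyncio


async def evaluate_question(question: str) -> str:
    """Hypothetical stand-in for a single API request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"answer to {question}"


async def evaluate_all(questions: list, workers: int) -> list:
    # The semaphore allows at most `workers` requests to run at once.
    semaphore = asyncio.Semaphore(workers)

    async def bounded(question: str) -> str:
        async with semaphore:
            return await evaluate_question(question)

    # gather() returns results in the same order as the input questions.
    return await asyncio.gather(*(bounded(q) for q in questions))


if __name__ == "__main__":
    questions = [f"q{i}" for i in range(50)]
    results = asyncio.run(evaluate_all(questions, workers=20))
    print(len(results))
```

A higher workers value increases throughput until the API's rate limits or the network become the bottleneck, which is why an explicit cap is useful.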

System Instructions

Factly supports different system instructions for prompt engineering experiments:

# Use the default instruction from instructions.yaml in the current directory
factly evaluate

# Use custom instructions defined in the ~/path/to/instructions.yaml file
factly evaluate --instructions ~/path/to/instructions.yaml

By default, instructions are read from the instructions.yaml file in the current directory. Each instruction should provide a different way to guide the model's behavior when responding to questions.

Examples

Basic Evaluation

# Run basic evaluation with default settings
factly evaluate

# Run evaluation and generate plots
factly evaluate --plot

# Run verbose evaluation with plots
factly evaluate --verbose --plot

Subject-Specific Evaluation

# Evaluate mathematics knowledge
factly evaluate --tasks mathematics --n-shots 3 --plot

# Evaluate humanities subjects
factly evaluate --tasks high_school_european_history --tasks high_school_us_history --plot

# Evaluate computer science knowledge
factly evaluate --tasks computer_science --verbose --plot

Customized Evaluation

# Customize API settings and system instruction
export OPENAI_API_BASE=https://your-proxy.example.com/v1
factly evaluate --model gpt-4o-mini --instructions ~/path/to/instructions.yaml

Environment Variables

Instead of specifying command-line arguments each time, you can set environment variables in the .env file:

# API Configuration
OPENAI_API_KEY=your_api_key_here
OPENAI_MODEL=gpt-4o
OPENAI_API_BASE=your_api_base_url  # Optional
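
The options table lists the default for --model as "From .env or gpt-4o". One plausible reading of that precedence is sketched below in Python. The resolve_model function is hypothetical, and the ordering (command-line value first, then OPENAI_MODEL, then gpt-4o) is an assumption rather than confirmed Factly behavior.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "gpt-4o"  # fallback named in the options table


def resolve_model(cli_value: Optional[str],
                  env: Optional[Mapping[str, str]] = None) -> str:
    # Assumed precedence: --model flag, then OPENAI_MODEL, then the default.
    env = os.environ if env is None else env
    return cli_value or env.get("OPENAI_MODEL") or DEFAULT_MODEL


if __name__ == "__main__":
    print(resolve_model(None, {}))
    print(resolve_model(None, {"OPENAI_MODEL": "gpt-4o-mini"}))
    print(resolve_model("gpt-4o", {"OPENAI_MODEL": "gpt-4o-mini"}))
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.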