Usage

This document provides detailed instructions for using the factly command-line tool.

Command Line Interface

The primary entrypoint for Factly is the factly command-line interface, which provides tools to evaluate the factuality of Large Language Models (LLMs) on the MMLU benchmark.

Basic Usage

# Run factuality evaluation with default settings
factly evaluate

# Run evaluation and generate plots
factly evaluate --plot

# Get help on all available options
factly evaluate --help

# List available MMLU tasks
factly list-tasks

Command Structure

Factly provides the following commands:

factly [OPTIONS] COMMAND [ARGS]...

Main Commands:

  • evaluate: Run factuality evaluation on MMLU benchmark

  • list-tasks: List all available MMLU tasks

Common Global Options:

  • --help: Show help message and exit

  • --version: Show version and exit

Command Line Options for evaluate

Option

Description

Default

--model TEXT

The OpenAI model to use for evaluation

From .env or gpt-4o

--tasks TEXT

MMLU task categories to evaluate (can be repeated)

All tasks

--n-shots INTEGER

Number of examples for few-shot learning

0

--workers INTEGER

Maximum number of concurrent API requests

Auto-detected based on system resources

--instruction TEXT

Path to YAML file with system instruction variants.

./instructions.yaml

--plot

Generate visualization plots

--plot-path

Path to save the plot

./outputs/factuality-<model>-t<count>.png)

--verbose

Enable verbose output

--help

Show help message and exit

Command Line Options for list-tasks

Option

Description

Default

--help

Show help message and exit

Advanced Usage

Task Selection

You can select specific MMLU tasks to evaluate:

# Evaluate specific model on selected MMLU tasks
factly evaluate --model gpt-4o --tasks mathematics --tasks high_school_us_history

# Evaluate on STEM tasks only
factly evaluate --tasks STEM

# Evaluate on business-related tasks
factly evaluate --tasks BUSINESS

Few-Shot Learning

Configure the number of examples provided for few-shot learning:

# Zero-shot evaluation (default)
factly evaluate --n-shots 0

# 3-shot evaluation
factly evaluate --n-shots 3

# 5-shot evaluation
factly evaluate --n-shots 5

Performance Optimization

Factly uses asynchronous concurrent processing to maximize evaluation throughput. It evaluates multiple questions concurrently for each model, significantly reducing total evaluation time. You can control the concurrency level with the --workers parameter:

# Auto-determine optimal concurrency (default)
factly evaluate --tasks STEM

# Set concurrency level explicitly (process 20 questions in parallel)
factly evaluate --tasks STEM --workers 20

The implementation uses asyncio and semaphores for controlled concurrency with automatic resource detection for optimal performance across different environments.

System Instructions

Factly supports different system instructions for prompt engineering experiments:

# Use the default instruction from instructions.yaml in current directory
factly evaluate

# Use a custom instructions defined in ~/path/to/instructions.yaml file
factly evaluate --instructions ~/path/to/instructions.yaml

By default instructions should be defined in the instructions.yaml file in current directory. Each instruction should provide a different way to guide the model’s behavior when responding to questions.

Examples

Basic Evaluation

# Run basic evaluation with default settings
factly evaluate

# Run evaluation and generate plots
factly evaluate --plot

# Run verbose evaluation with plots
factly evaluate --verbose --plot

Subject-Specific Evaluation

# Evaluate mathematics knowledge
factly evaluate --tasks mathematics --n-shots 3 --plot

# Evaluate humanities subjects
factly evaluate --tasks high_school_european_history --tasks high_school_us_history --plot

# Evaluate computer science knowledge
factly evaluate --tasks computer_science --verbose --plot

Customized Evaluation

# Customize API settings and system instruction
export OPENAI_API_KEY=https://your-proxy.example.com/v1
factly evaluate --model gpt-4o-mini --instructions ~/path/to/instructions.yaml

Environment Variables

Instead of specifying command-line arguments each time, you can set environment variables in the .env file:

# API Configuration
OPENAI_API_KEY=your_api_key_here
OPENAI_MODEL=gpt-4o
OPENAI_API_BASE=your_api_base_url  # Optional