Maintainers’ Guide
This document outlines essential guidelines for maintaining the Factly project. It provides instructions for testing, building, and deploying the package, as well as managing CI workflows.
Overview
The Factly project is a CLI tool for evaluating the factuality of LLMs on the MMLU benchmark. This guide assumes familiarity with GitHub Actions, uv, and common Python development workflows.
Key configurations:
Python Versions Supported: >= 3.12 (tested on 3.12 and 3.13)
Dependency Management: uv version 6.x
Primary Dependencies: click, datasets, deepeval, pandas, matplotlib
Documentation Tool: sphinx with Read the Docs theme
Testing Tools: pytest, coverage
Linting Tools: ruff (for linting and formatting)
Development Environment
Prerequisites
To work on the Factly project, you need:
Python 3.12 or higher
uv (for dependency management)
Git
Setting Up
Clone the repository and install dependencies:
git clone https://github.com/sergeyklay/factly.git
cd factly
# Copy and configure environment
cp .env.example .env
# Edit .env with your API keys and configuration
# Install dependencies with development tools
uv sync --locked
Testing the Project
Unit tests and coverage reporting are managed using pytest and coverage.
Running Tests Locally
Run tests using pytest:
# Run all tests
uv run coverage erase
uv run coverage run -m pytest
Generate Coverage Reports
Generate HTML, XML, and LCOV coverage reports:
uv run coverage combine
uv run coverage report
uv run coverage html
uv run coverage xml
uv run coverage lcov
This will create reports in the coverage/ directory with subdirectories for each format, as configured in pyproject.toml.
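To act on the XML report in a script (for example, to compare coverage against a local threshold), the overall line rate can be read from the report's root element. The following is a minimal sketch, assuming the Cobertura-style XML that `coverage xml` writes, whose root `coverage` element carries a `line-rate` attribute. The helper name `total_line_rate` and the sample XML are illustrative and not part of Factly.

```python
import xml.etree.ElementTree as ET


def total_line_rate(xml_text: str) -> float:
    """Return the overall line coverage (0.0 to 1.0) from a coverage XML report."""
    root = ET.fromstring(xml_text)
    # The root <coverage> element stores the aggregate rate as a string.
    return float(root.attrib["line-rate"])


# Illustrative sample; a real report would be read from the file written by
# `coverage xml` (its location depends on the settings in pyproject.toml).
sample = '<coverage line-rate="0.875" branch-rate="0"><packages/></coverage>'
rate = total_line_rate(sample)
print(f"{rate:.1%}")  # 87.5%
```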
CI Workflow
Tests are executed automatically on supported platforms and Python versions (3.12 and 3.13) via GitHub Actions.
The CI workflow includes:
Code formatting verification
Linting checks
Unit tests with coverage reporting
Coverage report upload to Codecov
Building the Package
The factly package is built using hatchling as specified in pyproject.toml.
Local Build
Build the package:
# Build the package
uv build
Verify the built package:
uv pip install dist/*.whl
factly --version
Documentation Management
Documentation is written using sphinx with the Read the Docs theme.
Building Documentation Locally
Install documentation dependencies:
uv sync --locked --no-default-groups --group docs
Build the documentation:
# Navigate to docs directory
cd docs
# Build HTML documentation
make html
Or build directly with sphinx:
# Build HTML documentation
python -m sphinx \
--jobs auto \
--builder html \
--nitpicky \
--show-traceback \
--fail-on-warning \
--doctree-dir docs/build/doctrees \
docs/source docs/build/html
View the documentation:
# On macOS
open docs/build/html/index.html
# On Linux
xdg-open docs/build/html/index.html
# On Windows
start docs/build/html/index.html
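As a platform-independent alternative to the three commands above, the standard-library `webbrowser` module can open the built page. This is a minimal sketch, assuming the HTML output lives in `docs/build/html` relative to the current directory; the function names are illustrative.

```python
import webbrowser
from pathlib import Path


def docs_index_uri(build_dir: str = "docs/build/html") -> str:
    """Return a file:// URI for the built documentation's index page."""
    # resolve() makes the path absolute, which as_uri() requires.
    return (Path(build_dir) / "index.html").resolve().as_uri()


def open_docs(build_dir: str = "docs/build/html") -> bool:
    """Open the built docs in the default browser; True if a browser launched."""
    return webbrowser.open(docs_index_uri(build_dir))
```

Calling `open_docs()` from the repository root opens the index page on macOS, Linux, and Windows alike.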
Other Documentation Formats
The docs Makefile supports various output formats:
cd docs
make epub # Build EPUB documentation
make man # Build man pages
make clean # Clean build directory
Without make, use these sphinx-build commands:
cd docs
# Build EPUB documentation
sphinx-build -b epub source build/epub
# Build man pages
sphinx-build -b man source build/man
# Clean build directory
rm -rf build/
CI Workflow
The docs workflow automatically builds and validates documentation on pushes and pull requests. See .github/workflows/docs.yml.
Linting and Code Quality Checks
Code quality is enforced using ruff, which handles both linting and formatting.
Running Locally
Lint and format code:
# Sort imports and format code
uv run ruff check --select I --fix ./
uv run ruff format --target-version py312 ./
# Check formatting without making changes
uv run ruff format --diff --target-version py312 ./
# Run linter without making changes
uv run ruff check --target-version py312 --preview ./
Pre-commit Hooks
The project uses pre-commit hooks to ensure code quality before commits:
# Install pre-commit hooks
pre-commit install
# Run pre-commit hooks on all files
pre-commit run --all-files
CI Workflow
The CI workflow in .github/workflows/ci.yml includes formatting and linting checks. Pull requests with formatting issues will show the diff of improperly formatted files.
Release Process
Steps for Release
Ensure all tests pass and documentation builds successfully
Update version in pyproject.toml and __init__.py
Update CHANGELOG.rst with the changes in the new version
Tag the version using git and push the tag to GitHub:
git tag -a v1.x.y -m "Release v1.x.y"
git push origin v1.x.y
Build and publish the package:
uv build
uv publish
CI Workflow
The release workflow is triggered when a new tag matching the pattern v* is pushed to GitHub. It builds the package and publishes it to PyPI.
Continuous Integration and Deployment
CI/CD is managed via GitHub Actions, with workflows for:
Testing: Ensures functionality and compatibility across Python 3.12 and 3.13 on Ubuntu
Linting: Maintains code quality with ruff
Documentation: Validates and builds project documentation
Building: Verifies the package’s integrity
Release: Publishes the package to PyPI
The CI workflow includes:
Caching of dependencies to speed up builds
Automatic code formatting verification
Coverage reporting to Codecov
JUnit XML test results
Development Guidelines
Code Style
The project follows the style enforced by ruff. Key style points:
Line length: 88 characters
Target Python version: 3.12
Use 4 spaces for indentation
Follow PEP 8 with some customizations in pyproject.toml
Type Annotations
Use type annotations for all function parameters and return values:
def process_results(
    scores: dict[str, float],
    threshold: float = 0.7
) -> list[str]:
    """Process evaluation results."""
    # Implementation
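For reference, a fully annotated and runnable function might look like the following. This is only a sketch: the name `subjects_above_threshold` and its body are assumptions made for illustration (it returns the keys whose score meets the threshold, sorted by name), not Factly's actual implementation.

```python
def subjects_above_threshold(
    scores: dict[str, float],
    threshold: float = 0.7,
) -> list[str]:
    """Return the names whose score is at least ``threshold``, sorted by name."""
    return sorted(name for name, score in scores.items() if score >= threshold)


passed = subjects_above_threshold({"physics": 0.82, "law": 0.55, "biology": 0.7})
print(passed)  # ['biology', 'physics']
```

Built-in generics such as `dict[str, float]` and `list[str]` need no `typing` imports on the Python versions this project supports.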
Documentation Standards
Use Google-style docstrings for all public functions, classes, and methods
Include examples in docstrings where appropriate
Keep the documentation up-to-date with code changes
Example docstring:
def calculate_factuality_score(responses: list[str]) -> float:
    """Calculate the factuality score based on responses.

    Args:
        responses: List of model responses to evaluate

    Returns:
        A float between 0 and 1 representing factuality score
    """
    # Implementation
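Where an example helps, the Google-style `Examples:` section can hold a doctest so that the example is checked automatically. The sketch below uses a hypothetical helper, `fraction_correct`, invented here to illustrate the docstring layout; it is not part of Factly's API.

```python
import doctest


def fraction_correct(predictions: list[str], answers: list[str]) -> float:
    """Return the fraction of predictions that match the expected answers.

    Args:
        predictions: Answer labels produced by the model, one per question.
        answers: Expected answer labels, in the same order.

    Returns:
        A float between 0 and 1; 0.0 when there are no questions.

    Raises:
        ValueError: If the two lists differ in length.

    Examples:
        >>> fraction_correct(["A", "C", "B", "D"], ["A", "B", "B", "D"])
        0.75
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


if __name__ == "__main__":
    # Runs the example in the docstring as a test.
    doctest.testmod()
```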
Troubleshooting
Common Development Issues
uv environment issues:
# Recreate the virtual environment
rm -rf .venv
uv venv
uv sync
Pre-commit hook failures:
# Update pre-commit hooks
uv run pre-commit autoupdate
# Run hooks manually
uv run pre-commit run --all-files
Documentation build errors:
# Clean build directory
cd docs
make clean
cd ..
# Rebuild with verbose output (run from the repository root)
uv run sphinx-build -v --nitpicky --show-traceback --fail-on-warning --builder html docs/source docs/build/html
Test failures:
# Run tests with verbose output
uv run pytest -vvv ./factly ./tests
# Run a specific test
uv run pytest -vvv ./tests/test_specific_file.py::test_specific_function
Cleaning build artifacts without make:
# Remove Python cache files
find ./ -name '__pycache__' -delete -o -name '*.pyc' -delete
# Remove pytest cache
rm -rf ./.pytest_cache
# Remove coverage reports
rm -rf ./coverage