Description: This repository contains all the software artifacts from the MSc in Artificial Intelligence Individual Project on using LLMs for performing proof repair in Lean 4, including any modules that were developed as a product of the project and records of experiments performed to yield the results in the final report.
To access and run the scripts provided in this repository, follow these steps to set up:

1. Clone the repository:

   ```
   git clone https://github.com/monkeytim19/proof-repair-LLM-Lean4.git
   ```

2. Navigate into the project directory:

   ```
   cd proof-repair-LLM-Lean4
   ```

3. If using the pipeline, install its dependencies:

   ```
   pip install -r pipeline_requirements.txt
   ```

4. If running experiments with the LLMs and models, install their dependencies:

   ```
   pip install -r finetune_inference_requirements.txt
   ```
The pipeline consists of three main components: the `tracer`, the `scraper`, and the `verifier`. Basic instructions on how to use each of them are provided below.

Before any part of the pipeline can be used, it is necessary to fill in all the necessary details in `pipeline.config.py`.
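As a rough illustration, the settings in `pipeline.config.py` might look like the sketch below. Only the constant names `FILENAMES_DIR` and `DATA_DIR` are mentioned later in this README; the paths (and any further settings the file may contain) are placeholder assumptions.

```python
# Hypothetical sketch of pipeline.config.py. Only the constant names
# FILENAMES_DIR and DATA_DIR appear in this README; the paths below
# (and any other settings) are placeholder assumptions.
FILENAMES_DIR = "pipeline/filenames"  # directory holding the FOO.txt file lists
DATA_DIR = "pipeline/data"            # directory where preprocessed CSVs are saved
```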
The `tracer` can retrieve information on theorems from all `.lean` files in the directory that is provided to it. To run it:

```
python -m pipeline.tracer.trace -d RELATIVE/PATH/FROM/TARGET/REPO/ROOT/TO/DIR
```
The `scraper` scrapes data from a Git repository that corresponds to a Lean 4 project, preprocesses it, and prepares it for use in AI model training. To scrape data with it, take the following steps:

1. Create a `FOO.txt` file within the `FILENAMES_DIR` directory that was previously specified in `pipeline.config.py`. Within this file, include all the relative paths from the target repository root to the `.lean` files or subdirectories that you wish to scrape data from, with one entry per line.

2. Run:

   ```
   python -m pipeline.scraper.collect -f FOO
   ```
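For illustration, a `FOO.txt` listing might look like the following; these paths are hypothetical examples, not files guaranteed to exist in any particular target repository:

```
Mathlib/Algebra/Group/Basic.lean
Mathlib/Order/Lattice.lean
Mathlib/Topology/
```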
After scraping the raw data, it is possible to aggregate and preprocess it via:

```
python -m pipeline.scraper.preprocess -d BAR.csv
```

The preprocessed data will then be saved as `BAR.csv` within the `DATA_DIR` directory from `pipeline.config.py`. To ensure the soundness of the dataset, it is strongly recommended to use the `verifier` to check that every extracted reference/ground-truth proof is indeed valid relative to its theorem statement.
Finally, it is also possible to perform data splitting with the module. For a fully randomised split:

```
python -m pipeline.scraper.split -d BAR.csv -v NUM_VALIDATION -t NUM_TEST -r
```

Alternatively, to split the data by grouping the datapoints by their source `.lean` file:

```
python -m pipeline.scraper.split -d BAR.csv -v NUM_VALIDATION -t NUM_TEST
```
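The difference between the two split modes can be sketched as follows. This is an illustrative reimplementation, not the module's actual code, and it assumes each datapoint carries a `filepath` field (one of the columns the verifier expects):

```python
import random

def random_split(rows, num_valid, num_test, seed=0):
    """Fully randomised split: shuffle datapoints, then slice."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    valid = shuffled[:num_valid]
    test = shuffled[num_valid:num_valid + num_test]
    train = shuffled[num_valid + num_test:]
    return train, valid, test

def by_file_split(rows, num_valid, num_test, seed=0):
    """Grouped split: all datapoints from one .lean file land in one split.

    Because whole files are assigned at once, the validation and test
    splits may slightly overshoot the requested sizes.
    """
    files = sorted({r["filepath"] for r in rows})
    random.Random(seed).shuffle(files)
    train, valid, test = [], [], []
    for f in files:
        group = [r for r in rows if r["filepath"] == f]
        if len(valid) < num_valid:
            valid += group
        elif len(test) < num_test:
            test += group
        else:
            train += group
    return train, valid, test
```

The grouped variant prevents leakage between splits: no two splits ever contain datapoints drawn from the same source file.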
The `verifier` can verify the validity of theorem proofs by substituting each proof attempt into the original `.lean` file and asking Lean's kernel to check it. Note that the attempted proofs must be part of a `.csv` file with the additional columns `proof`, `thm_name`, `filepath`, `statement`, and `commit`. To run it:

```
python -m pipeline.verifier.verify -d RELATIVE/PATH/TO/CSV/FILE -c COLUMN_NAME_TO_VERIFY
```

If the `-c` flag is not specified, the `verifier` will by default verify the column with the heading `predicted_proof`.
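A minimal sketch of producing a CSV in the column layout the verifier expects. The column names come from this README; the row values below are purely hypothetical placeholders:

```python
import csv

# One hypothetical proof attempt; every value here is a placeholder.
rows = [
    {
        "proof": "by simp",                        # attempted proof
        "thm_name": "my_theorem",                  # theorem name
        "filepath": "Mathlib/Algebra/Basic.lean",  # source .lean file
        "statement": "theorem my_theorem : 1 + 1 = 2",
        "commit": "0123abc",                       # commit hash
    }
]

with open("attempts.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["proof", "thm_name", "filepath", "statement", "commit"]
    )
    writer.writeheader()
    writer.writerows(rows)
```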
All the models that have been trained are located in subdirectories named after them within the `model_training` directory.
To perform any fine-tuning or inference with the LLMs, first navigate to the `finetuning_inference` directory via:

```
cd finetuning_inference
```

Then open the file `run_commands.txt`. From this file, you can copy the appropriate command and run it in bash. For instance, to fine-tune the ReProver model using the `train-valid` split for training and the `test` split for evaluation, taken from the `random` data split, you can run:

```
model_training/train.sh -m reprover -n -t train-valid -v test -d random
```
In general, for fine-tuning you can run:

```
model_training/train.sh -m MODEL_NAME -k WANDB_API_KEY -n -t TRAIN_SPLIT -v EVAL_SPLIT -d DATA_SPLIT
```

Note that, for this to run, both `proof_repair_data/DATA_SPLIT/TRAIN_SPLIT.csv` and `proof_repair_data/DATA_SPLIT/EVAL_SPLIT.csv` must exist. It also requires that the directory `model_training/MODEL_NAME` exists and contains a `main.py` file, which should define the `finetune` and `inference` functions.
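A minimal sketch of what `model_training/MODEL_NAME/main.py` could look like. Only the requirement that `finetune` and `inference` exist comes from this README; the signatures and bodies below are illustrative assumptions:

```python
# Hypothetical skeleton for model_training/MODEL_NAME/main.py. The README
# only states that `finetune` and `inference` functions must exist; the
# signatures and bodies below are illustrative assumptions.

def finetune(train_csv: str, eval_csv: str) -> None:
    """Fine-tune the model on train_csv while evaluating on eval_csv."""
    # A real implementation would load the two CSV splits, build the
    # model, and run the training loop here.
    print(f"fine-tuning on {train_csv}, evaluating on {eval_csv}")


def inference(test_csv: str, checkpoint: str) -> list:
    """Generate a repaired-proof prediction for each row of test_csv."""
    # A real implementation would load `checkpoint` and generate one
    # predicted proof per datapoint; this stub returns no predictions.
    return []
```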
For performing inference with a model, you can run:

```
model_training/inference.sh -m MODEL_NAME -i TEST_SPLIT -d DATA_SPLIT -c RELATIVE/PATH/TO/MODEL_CKPT_DIR/FROM/MODEL_NAME
```

Again, this requires that `proof_repair_data/DATA_SPLIT/TEST_SPLIT.csv` exists and that `RELATIVE/PATH/TO/MODEL_CKPT_DIR/FROM/MODEL_NAME` refers to the model checkpoint used to run the inference.

If you want to run inference on a pre-trained base model, then replace `RELATIVE/PATH/TO/MODEL_CKPT_DIR/FROM/MODEL_NAME` directly with the model's path on HuggingFace. For instance, to perform inference with the base ReProver generator model, you can run:

```
model_training/inference.sh -m reprover -i test -d by_file -c kaiyuy/leandojo-lean4-tacgen-byt5-small -n
```
All experiment results are within the `./experiments` directory, where each subdirectory contains the test predictions from inference, as well as the indices of the datapoints in mathlib4-repair that were either successfully or unsuccessfully repaired by the model.

Checkpoints for the LLMs that have already been fine-tuned are saved as a collection on HuggingFace and can be accessed via: https://huggingface.co/collections/tcwong/proof-repair-llm-lean4-66de36a17a044c1fbdeaf8de