
ToolMaker

This repository contains the official code for the paper:

LLM Agents Making Agent Tools
Georg Wölflein, Dyke Ferber, Daniel Truhn, Ognjen Arandjelović and Jakob N. Kather
arXiv, Feb 2025.

Abstract: Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks by dynamically utilising external software components. However, these tools must be implemented in advance by human developers, hindering the applicability of LLM agents in domains which demand large numbers of highly specialised tools, like in life sciences and medicine. Motivated by the growing trend of scientific studies accompanied by public code repositories, we propose ToolMaker, a novel agentic framework that autonomously transforms papers with code into LLM-compatible tools. Given a short task description and a repository URL, ToolMaker autonomously installs required dependencies and generates code to perform the task, using a closed-loop self-correction mechanism to iteratively diagnose and rectify errors. To evaluate our approach, we introduce a benchmark comprising 15 diverse and complex computational tasks spanning both medical and non-medical domains with over 100 unit tests to objectively assess tool correctness and robustness. ToolMaker correctly implements 80% of the tasks, substantially outperforming current state-of-the-art software engineering agents. ToolMaker therefore is a step towards fully autonomous agent-based scientific workflows.

Overview

Installation

First, install uv. Then, create the virtual environment and install the dependencies with:

uv sync

Also, create a .env file in the root directory with the following content:

OPENAI_API_KEY=sk-proj-...  # your OpenAI API key (required to run toolmaker)
HF_TOKEN=hf_...  # your Hugging Face API key (required for some benchmark tools)
CUDA_VISIBLE_DEVICES=0  # if you have a GPU
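toolmaker reads these values from the environment at runtime, so a missing key may only surface partway through a long run. As a purely illustrative sketch (not part of the repository), the stdlib-only script below checks the .env file up front; the required/optional split simply mirrors the comments above:

# check_env.py -- illustrative sketch, not part of the repository.
# Reads the .env file and warns about missing keys before starting a long run.
from pathlib import Path

REQUIRED = ["OPENAI_API_KEY"]                    # needed for every toolmaker run
OPTIONAL = ["HF_TOKEN", "CUDA_VISIBLE_DEVICES"]  # only needed for some benchmark tools / GPU use

env = {}
for line in Path(".env").read_text().splitlines():
    line = line.split("#", 1)[0].strip()  # drop inline comments
    if "=" in line:
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()

for key in REQUIRED:
    if not env.get(key):
        print(f"missing required variable: {key}")
for key in OPTIONAL:
    if not env.get(key):
        print(f"optional variable not set: {key}")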

Usage

First, use toolmaker to install the repository (replace $TOOL with the path to the tool definition file, e.g. benchmark/tasks/uni_extract_features.yaml):

uv run python -m toolmaker install $TOOL --name my_tool_installed

Then, use toolmaker to create the tool:

uv run python -m toolmaker create $TOOL --name my_tool --installed my_tool_installed

Finally, you can run the tool on one of the test cases:

uv run python -m toolmaker run my_tool --name kather100k_muc

Here, kather100k_muc is the name of a test case defined in the tool definition file. See benchmark/README.md for details on how tools and their test cases are defined.
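If you want to run these three steps for several task definitions, they can be chained with a small wrapper around the CLI commands shown above. The following is an illustrative sketch only; the script name and helper functions are not part of the repository:

# run_task.py -- illustrative sketch wrapping the CLI commands shown above,
# so that install/create/run can be scripted for several task definitions.
import subprocess

def run(args: list[str]) -> None:
    print("+", " ".join(args))
    subprocess.run(args, check=True)  # raise if any step fails

def make_tool(task_yaml: str, name: str, test_case: str) -> None:
    # Step 1: install the repository described in the task definition.
    run(["uv", "run", "python", "-m", "toolmaker", "install", task_yaml,
         "--name", f"{name}_installed"])
    # Step 2: create the tool on top of the installed environment.
    run(["uv", "run", "python", "-m", "toolmaker", "create", task_yaml,
         "--name", name, "--installed", f"{name}_installed"])
    # Step 3: run the tool on one of its test cases.
    run(["uv", "run", "python", "-m", "toolmaker", "run", name,
         "--name", test_case])

if __name__ == "__main__":
    # Example invocation matching the walkthrough above.
    make_tool("benchmark/tasks/uni_extract_features.yaml", "my_tool", "kather100k_muc")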

Visualize trajectory

To visualize the trajectory of the tool creation process (showing actions, LLM calls, etc.), use the following command:

uv run python -m toolmaker.utils.visualize -i tool_output/tools/my_tool/logs.jsonl -o my_tool.html

This will create a my_tool.html file in the current directory, which you can open in your browser.
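The logs.jsonl file is in JSON Lines format (one JSON object per line). Its exact record schema is internal to toolmaker, so the illustrative sketch below only summarises the file generically, counting records and the top-level fields they contain:

# inspect_logs.py -- illustrative sketch, not part of the repository.
# Summarises a JSON Lines log file without assuming anything about its schema.
import json
from collections import Counter
from pathlib import Path

log_path = Path("tool_output/tools/my_tool/logs.jsonl")

key_counts = Counter()
records = 0
for line in log_path.read_text().splitlines():
    if not line.strip():
        continue
    record = json.loads(line)          # one JSON object per line
    records += 1
    key_counts.update(record.keys())   # which top-level fields appear, and how often

print(f"{records} records in {log_path}")
for key, count in key_counts.most_common():
    print(f"  {key}: {count}")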

Benchmarking

To run the unit tests that constitute the benchmark, use the following command (note that this requires the benchmark dependency group to be installed via uv sync --group benchmark):

uv run python -m pytest benchmark/tests --junit-xml=benchmark.xml -m cached  # only run cached tests (faster)

This will create a benchmark.xml file containing the test results in JUnit-style XML format.
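For a quick pass/fail summary without opening the XML in a CI viewer, the JUnit format can be parsed with the standard library. The following is an illustrative sketch, not part of the repository:

# summarise_benchmark.py -- illustrative sketch, not part of the repository.
# Parses the JUnit-style benchmark.xml produced by pytest and prints a summary.
import xml.etree.ElementTree as ET

tree = ET.parse("benchmark.xml")
passed = failed = errored = skipped = 0

for case in tree.iter("testcase"):
    if case.find("failure") is not None:
        failed += 1
    elif case.find("error") is not None:
        errored += 1
    elif case.find("skipped") is not None:
        skipped += 1
    else:
        passed += 1

total = passed + failed + errored + skipped
print(f"{passed}/{total} passed ({failed} failed, {errored} errors, {skipped} skipped)")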

Unit tests

To run toolmaker's own unit tests (not to be confused with the unit tests in the benchmark), use the following command:

uv run python -m pytest tests

Reference

If you find our work useful in your research, or if you use parts of this code, please consider citing our preprint:

@misc{wolflein2025toolmaker,
  author        = {W\"{o}lflein, Georg and Ferber, Dyke and Truhn, Daniel and Arandjelovi\'{c}, Ognjen and Kather, Jakob Nikolas},
  title         = {{LLM} Agents Making Agent Tools},
  year          = {2025},
  eprint        = {2502.11705},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2502.11705}
}