This repository contains source code for the demos and attacks we present in our paper Security of AI Agents.

SecurityLab-UCD/ai-agent-security

ai-agent-security

This repository contains source code for the encryption defense demos we present in our paper Security of AI Agents. The code for the sandbox defense and its evaluation can be found in our fork of AgentBench.

Requirements

Python 3.8 or above

Setup

env.sh lets Python find our modules. Source it from the repository root directory:

source ./env.sh

Install dependencies

pip install -r requirements.txt

Generate homomorphic encryption data

  • Run python HE_data.py -h from inside HE_data/ to see the options for changing the generated ciphertexts.
cd HE_data && python HE_data.py && cd ../
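For readers new to homomorphic encryption, the following is a toy sketch of the underlying idea: arithmetic performed on ciphertexts carries over to the plaintexts. It uses a textbook Paillier scheme with tiny hard-coded primes, which is an assumption made purely for illustration. It is not the encryptor used by HE_data.py, it is not secure, and it supports only sums (the demo in this repository also supports products). All names below (`encrypt`, `decrypt`, `add_encrypted`) are hypothetical.

```python
import math
import random

# Toy Paillier parameters (hypothetical, far too small for real use).
P, Q = 61, 53
N = P * Q                                            # plaintext modulus (3233)
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                                 # modular inverse, Python 3.8+


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N using fresh randomness."""
    if not 0 <= m < N:
        raise ValueError("plaintext out of range")
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:                      # r must be invertible mod N
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext from a ciphertext."""
    return ((pow(c, LAM, N_SQ) - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying two ciphertexts adds their plaintexts modulo N."""
    return (c1 * c2) % N_SQ


if __name__ == "__main__":
    total = add_encrypted(encrypt(150), encrypt(200))
    print(decrypt(total))
```

In this toy scheme results wrap around modulo N, which illustrates one reason an encrypted computation can require results to stay inside a fixed range. The actual limit in this repository comes from its own encryptor settings.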

Run Demos

To run agents that use OpenAI LLMs for reasoning, set this environment variable first:

export OPENAI_API_KEY="<key>"
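A script that wraps the demos may want to fail early with a readable message when the key is missing. The following is a minimal sketch of such a check. The helper name `get_openai_api_key` is hypothetical and not part of this repository.

```python
import os
from typing import Mapping, Optional


def get_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY, or raise a readable error if it is unset or empty."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            'OPENAI_API_KEY is not set; run: export OPENAI_API_KEY="<key>"'
        )
    return key


if __name__ == "__main__":
    try:
        get_openai_api_key()
        print("OPENAI_API_KEY found")
    except RuntimeError as err:
        print(err)
```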

SSN Agent Demo

To run the agent

python agents/ssn_agent.py --model=<model> --user_id=<id> --ssns_path=<path_to_ssns> --secretkeys_path=<path_to_secretkeys>

When prompting, write "number" instead of "SSN" or "social security number" so that the model's alignment does not refuse the request. You can ask for groups of digits, such as the first three or the last four.

Example prompt: What are the first three digits of my number?
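If prompts are generated programmatically, the wording rule above can be applied automatically. The following is a small sketch, assuming a hypothetical helper `neutralize_prompt` that is not part of this repository. It only rewrites the two terms named above.

```python
import re

# Matches "SSN" or "social security number" as whole words, in any letter case.
_SSN_TERMS = re.compile(r"\b(?:social\s+security\s+number|ssn)\b", re.IGNORECASE)


def neutralize_prompt(prompt: str) -> str:
    """Replace SSN wording with the neutral word "number"."""
    return _SSN_TERMS.sub("number", prompt)


if __name__ == "__main__":
    print(neutralize_prompt("What are the last four digits of my SSN?"))
```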

Homomorphic Encryption Agent Demo

To run the agent

python agents/HE_agent.py --model=<model>

When prompting, state "sum" or "product" explicitly, because the postprocessing step depends on it. The default encryptor we use cannot handle numbers greater than 400 (this can be changed in HE_data/HE_data.py), so keep calculation results in the range 0 to 400 inclusive.

Example prompt: What is the product of indices 0 and 1?

  • Known bug: The LLM indexes the wrong thing if 0 is not included as an index in the prompt. Make sure the first index you write in the prompt is 0.
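The prompting constraints above (an explicit "sum" or "product", index 0 written first, results within 0 to 400) can be checked before a prompt is sent. The following is a sketch under stated assumptions: the helpers `build_he_prompt` and `fits_default_range` are hypothetical, the prompt wording simply mirrors the example prompt, and support for more than two indices is an assumption rather than something the demo is known to handle.

```python
import math
from typing import Sequence

MAX_RESULT = 400  # default limit stated above; adjustable in HE_data/HE_data.py


def build_he_prompt(operation: str, indices: Sequence[int]) -> str:
    """Build a prompt that names the operation and lists index 0 first."""
    if operation not in ("sum", "product"):
        raise ValueError('operation must be "sum" or "product"')
    if len(indices) < 2:
        raise ValueError("need at least two indices")
    if indices[0] != 0:
        raise ValueError("the first index must be 0 (see the known bug above)")
    listed = ", ".join(str(i) for i in indices[:-1]) + " and " + str(indices[-1])
    return f"What is the {operation} of indices {listed}?"


def fits_default_range(operation: str, values: Sequence[int]) -> bool:
    """Check that the plaintext result would stay within 0..MAX_RESULT."""
    if operation not in ("sum", "product"):
        raise ValueError('operation must be "sum" or "product"')
    result = sum(values) if operation == "sum" else math.prod(values)
    return 0 <= result <= MAX_RESULT


if __name__ == "__main__":
    print(build_he_prompt("product", [0, 1]))
```

`fits_default_range` is only useful when the plaintext values are known to the caller, for example when testing against data you generated yourself.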

Tests

To run tests

# Create ciphertext files if you haven't already
cd HE_data && python HE_data.py && cd ../

# Run tests
pytest tests/*

Cite

@inproceedings{he2025aiagent,
    author = {He, Yifeng and Wang, Ethan and Rong, Yuyang and Cheng, Zifei and Chen, Hao},
    title = {Security of AI Agents},  
    booktitle = {International Workshop on Responsible AI Engineering (RAIE)},
    date = {2025-04-29},
    address = {Ottawa, Ontario, Canada},
    doi = {10.48550/arXiv.2406.08689},
}
