- I’m currently a third-year PhD candidate at Georgia Tech.
- I work on safety alignment for large language models. In particular, I am interested in red-teaming attacks and defenses for LLMs.
I push myself to publish a high-quality paper roughly every three months.
Here is the first paper I wrote in 2025 (hopefully not the only one...).
- [2025/1/30] Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation arXiv [paper] [code]
Here are the papers I wrote in 2024.
- [2024/9/26] Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey arXiv [paper] [repo]
- [2024/9/3] Booster: Tackling harmful fine-tuning for large language models via attenuating harmful perturbation ICLR2025 [paper] [code] [Openreview]
- [2024/8/18] Antidote: Post-fine-tuning safety alignment for large language models against harmful fine-tuning arXiv [paper]
- [2024/5/28] Lazy safety alignment for large language models against harmful fine-tuning NeurIPS2024 [paper] [code]
- [2024/2/2] Vaccine: Perturbation-aware alignment for large language models against harmful fine-tuning NeurIPS2024 [paper] [code]