
Hi there 👋 I am Tiansheng Huang

  • I’m currently a third-year PhD candidate at Georgia Tech.
  • I work on safety alignment for large language models. In particular, I am interested in red-teaming attacks and defenses for LLMs.

Selected Publications

I try to push myself to publish high-quality papers at a pace of roughly one every three months.

Here is the first paper I wrote in 2025 (with more to come...).

  • [2025/1/30] Virus: Harmful Fine-tuning Attack for Large Language Models bypassing Guardrail Moderation arXiv [paper] [code]

Here are the papers I wrote in 2024.

  • [2024/9/26] Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey arXiv [paper] [repo]
  • [2024/9/3] Booster: Tackling harmful fine-tuning for large language models via attenuating harmful perturbation ICLR2025 [paper] [code] [Openreview]
  • [2024/8/18] Antidote: Post-fine-tuning safety alignment for large language models against harmful fine-tuning arXiv [paper]
  • [2024/5/28] Lazy safety alignment for large language models against harmful fine-tuning NeurIPS2024 [paper] [code]
  • [2024/2/2] Vaccine: Perturbation-aware alignment for large language models against harmful fine-tuning NeurIPS2024 [paper] [code]

Pinned Repositories

  1. git-disl/Virus

    Official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation"

  2. git-disl/awesome_LLM-harmful-fine-tuning-papers

    A survey on harmful fine-tuning attacks for large language models

  3. git-disl/Booster

    Official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation" (ICLR2025)

  4. git-disl/Lisa

    Official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024)

  5. git-disl/Vaccine

    Official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)

  6. Silencer