PAIR Interpretability

This repo contains code and articles on PAIR interpretability projects.

Scalable Influence and Fact Tracing for Large Language Model Pretraining (ICLR'25)

See the blog post for a light introduction to the paper. There is also a public demo and a dedicated GitHub repo. The full paper is Scalable Influence and Fact Tracing for Large Language Model Pretraining -- Tyler Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, Ian Tenney (RH)

Racing Thoughts: Explaining Large Language Model Contextualization Errors (NAACL'25)

Racing Thoughts: Explaining Contextualization Errors Within Large Language Models -- Michael A. Lepori, Mike Mozer, Asma Ghandeharioun (RH)

Who's asking? User personas and the mechanics of latent misalignment (NeurIPS'24)

Who's asking? User personas and the mechanics of latent misalignment -- Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, Lucas Dixon, at NeurIPS'24.

Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models (ICML'24)

The Patchscopes mini-site & the interactive explorable contain a brief introduction to the longer paper (ICML'24) by Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, Lucas Dixon.

Visualizing and Measuring the Geometry of BERT

bert-tree and context-atlas are repos for two interactive blog posts and visualizations accompanying the paper Visualizing and Measuring the Geometry of BERT:

Language, trees, and geometry in neural networks explores the geometry of syntactic information in BERT (bert-tree)
Language, Context, and Geometry in Neural Networks explores semantics and context in BERT. See the accompanying tool, Context Atlas, for more details (context-atlas)

Deep dreaming on text

text-dream contains experiments and tools for deep dreaming on text.

LinguisticLens

data-synth-syntax contains LinguisticLens, a tool for visualizing generated text data.
