This list focuses on neuron-level interpretability techniques in LLMs. A separate list focuses on understanding the internal mechanisms of LLMs.

Definition of neurons: a column or row of a weight matrix in an attention or FFN layer. Please refer to "Transformer Feed-Forward Layers Are Key-Value Memories" for FFN neurons, and "Neuron-Level Knowledge Attribution in Large Language Models" for attention neurons.

Conference paper recommendations: please open an issue or contact me.
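The FFN-neuron definition above (from "Transformer Feed-Forward Layers Are Key-Value Memories") can be illustrated with a minimal sketch: neuron i corresponds to the i-th column of the FFN input matrix (its "key") paired with the i-th row of the output matrix (its "value"). The dimensions and variable names below are hypothetical toy choices, not taken from any specific model, and the sketch covers FFN neurons only.

```python
import numpy as np

# Toy dimensions (hypothetical; real models use d_model in the thousands).
d_model, d_ff = 8, 32
rng = np.random.default_rng(0)
W_in = rng.standard_normal((d_model, d_ff))   # column i = neuron i's "key"
W_out = rng.standard_normal((d_ff, d_model))  # row i    = neuron i's "value"

x = rng.standard_normal(d_model)              # a residual-stream vector

# FFN forward pass: one scalar activation per neuron, then a weighted
# sum of value vectors written back into the residual stream.
a = np.maximum(x @ W_in, 0.0)                 # ReLU activations, shape (d_ff,)
full_output = a @ W_out                       # FFN output, shape (d_model,)

# Neuron i's individual contribution: its activation times its value row.
i = 3
contribution_i = a[i] * W_out[i]

# The per-neuron contributions sum exactly to the FFN output, which is
# what makes neuron-level attribution well defined for FFN layers.
assert np.allclose(sum(a[j] * W_out[j] for j in range(d_ff)), full_output)
```

This additive decomposition is the reason many papers below can attribute an FFN layer's output to individual neurons and edit them independently.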
- [arxiv] [2025.5]
- Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models [arxiv] [2025.2]
- Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing [arxiv] [2025.1]
- Scaling Automatic Neuron Description [Transluce AI] [2024.10]
- Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis [EMNLP 2024] [2024.9]
- Confidence Regulation Neurons in Language Models [NeurIPS 2024] [2024.6]
- [ACL 2024] [2024.6]
- Neuron-Level Knowledge Attribution in Large Language Models [EMNLP 2024] [2024.6]
- Wasserstein Distances, Neuronal Entanglement, and Sparsity [ICLR 2025] [2024.5]
- Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [ACL 2024] [2024.2]
- What does the Knowledge Neuron Thesis Have to do with Knowledge? [ICLR 2024] [2023.11]
- Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level [DeepMind] [2023.12]
- Neurons in Large Language Models: Dead, N-gram, Positional [ACL 2024] [2023.9]
- Finding Neurons in a Haystack: Case Studies with Sparse Probing [TMLR 2024] [2023.5]
- Language models can explain neurons in language models [OpenAI] [2023.5]
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations [arxiv] [2023.5]
- [Anthropic] [2022.9]
- Neuron-level Interpretation of Deep NLP Models: A Survey [survey] [2022.8]
- Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space [EMNLP 2022] [2022.3]
- Knowledge Neurons in Pretrained Transformers [ACL 2021] [2021.4]
- Transformer Feed-Forward Layers Are Key-Value Memories [EMNLP 2021] [2020.12]