The bulk of the work can be reproduced in the llava_sae notebook. The notebook:
- Loads a hooked version of LLaVA so that we can explore and intervene on model activations in the residual stream
- Loads a pretrained SAE for Gemma-2B using SAELens
- Runs the model on ImageNet images with the prompt "describe the image."
- Uses the pretrained SAE to analyze the resulting activations
- Provides a function that selects an interpretable feature and subtracts it from the residual stream of the original LLaVA model. The notebook includes some examples.
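
The feature-subtraction step can be sketched roughly as follows. This is a minimal NumPy illustration, not the notebook's actual function: it assumes a standard ReLU SAE with encoder weights `W_enc` of shape `(d_model, n_features)` and decoder weights `W_dec` of shape `(n_features, d_model)`, and the names `sae_encode` and `subtract_feature` are hypothetical. In the notebook, the equivalent edit would be applied to the residual stream inside a forward hook on the hooked LLaVA model.

```python
import numpy as np


def sae_encode(resid, W_enc, b_enc, b_dec):
    """Feature activations of an assumed ReLU SAE, one row per token position."""
    return np.maximum((resid - b_dec) @ W_enc + b_enc, 0.0)


def subtract_feature(resid, W_enc, b_enc, W_dec, b_dec, feature_idx, scale=1.0):
    """Remove one SAE feature's decoded contribution from the residual stream.

    resid: (seq_len, d_model) residual-stream activations.
    Returns an array of the same shape.
    """
    acts = sae_encode(resid, W_enc, b_enc, b_dec)            # (seq_len, n_features)
    # Contribution of the chosen feature: activation times its decoder direction.
    contribution = acts[:, [feature_idx]] * W_dec[feature_idx]  # (seq_len, d_model)
    return resid - scale * contribution


# Toy example: identity encoder/decoder, so feature i reads and writes dimension i.
d_model = 3
W_enc = np.eye(d_model)
W_dec = np.eye(d_model)
b_enc = np.zeros(d_model)
b_dec = np.zeros(d_model)

resid = np.array([[1.0, 2.0, -1.0],
                  [0.0, 3.0, 4.0]])

edited = subtract_feature(resid, W_enc, b_enc, W_dec, b_dec, feature_idx=1)
print(edited)
```

Token positions where the chosen feature is inactive are left unchanged, since their contribution term is zero. The `scale` argument is an added convenience for partial or amplified subtraction and is not taken from the notebook.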