Mechanistic Interpretability of VLMs on Spatial Relational Reasoning

Vision–Language Models (VLMs) perform well at recognizing objects but struggle to grasp spatial relations. We analyze LLaVA-1.5-7B on a controlled synthetic 2D dataset using the logit lens, linear probing, and attention diagnostics. On balanced binary relational questions, the model exhibits a strong "yes" bias. Logit-lens and occlusion analyses show that this bias depends on visual context rather than language priors alone. Linear probing reveals that the correct relational label becomes linearly decodable by the mid-layers, even though the final output contradicts this internal information. Attention patterns indicate reliance on object existence rather than relational cues, pointing to an underlying existence bias. We distinguish the "yes" bias, the observable tendency to answer "yes", from the existence bias, the mechanism driving that behavior.
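The linear-probing step above can be illustrated with a minimal sketch. The code below trains a logistic-regression probe on synthetic "hidden states" in which the label is linearly encoded along one direction; in the actual study the features would be per-layer LLaVA-1.5-7B activations and the labels the ground-truth relational answers. All names and parameters here (`make_hidden_states`, `fit_linear_probe`, the feature dimension, the signal strength) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hidden_states(n=400, d=64, signal=2.0):
    """Synthetic stand-in for layer activations: the binary label is
    linearly encoded along the first feature dimension."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    X[:, 0] += signal * (2 * y - 1)  # inject a linearly decodable signal
    return X, y

def fit_linear_probe(X, y, lr=0.1, steps=500):
    """Logistic-regression probe trained with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        g = p - y                               # gradient of the log loss
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def probe_accuracy(X, y, w, b):
    pred = (X @ w + b) > 0
    return (pred == y).mean()

# Train on the first 300 examples, evaluate on the held-out 100.
X, y = make_hidden_states()
w, b = fit_linear_probe(X[:300], y[:300])
acc = probe_accuracy(X[300:], y[300:], w, b)
print(f"probe accuracy: {acc:.2f}")
```

If a layer's activations yield high held-out probe accuracy while the model's final answer is wrong, that mismatch is exactly the "decodable but contradicted" pattern described above.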

📄 Download Report