Publications
The Reliability Gap in Benchmark Auditing: Distribution Shift and Scale as Failure Modes of Contamination Detection
Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics
Jailbreaking Vision-Language Models Through the Visual Modality