
Efficient LLM Moderation with Multi-Layer Latent Prototype

Maciej Chrabąszcz, Filip Szatkowski, Bartosz Wójcik, Jan Dubiński, Tomasz Trzciński, Sebastian Cygert

ICML 2026

The Reliability Gap in Benchmark Auditing: Distribution Shift and Scale as Failure Modes of Contamination Detection

Wojciech Zarzecki, Jan Dubiński, Sebastian Cygert

ECML 2026

Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics

Maciej Chrabąszcz, Aleksander Szymczyk, Marcin Sendera, Tomasz Trzciński, Sebastian Cygert

arXiv Preprint

Jailbreaking Vision-Language Models Through the Visual Modality

Aharon Azulay, Jan Dubiński, Zhuoyun Li, Atharv Mittal, Yossi Gandelsman

ICML 2026

Conditioned Activation Transport for T2I Safety Steering

Maciej Chrabąszcz, Aleksander Szymczyk, Jan Dubiński, Tomasz Trzciński, Franziska Boenisch, Adam Dziedzic

arXiv Preprint