Sivan Doveh

I am a Research Scientist at Google (Mountain View, CA) in the Creative Camera group. Previously, I was a Postdoctoral Researcher at Stanford University. I hold a PhD in Computer Science from the Weizmann Institute of Science, where I was supervised by Prof. Shimon Ullman. My doctoral research studied how vision-language models function, exploring their core mechanisms, strengths, and limitations, mainly by developing new data and training approaches.

I earned my Master’s degree in Electrical Engineering from Tel Aviv University and my Bachelor’s degree in Electrical Engineering from Ben-Gurion University of the Negev (BGU). Alongside my academic studies, I have also worked at Google, Applied Materials, and IBM Research.

I am actively looking for student collaborators in the area of multi-modal learning.

Contact: sdoveh [at] gmail [dot] com

Recent News

02/26: PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies accepted to ICLR, 2026 (work done at Stanford).
07/25: Teaching VLMs to Localize Specific Objects from In-context Examples (IPLoc) accepted to ICCV, 2025.
05/25: Invited talk at BIU CS Multi-Modal Day.
05/25: Invited talk at New Tech Event.
04/25: Our workshop "Long Multi-Scene Video Foundations: Generation, Understanding and Evaluation" got accepted at ICCV 2025.
04/25: Invited talk at 14th Israel Machine Vision Conference (IMVC) 2025.
02/25: Two papers accepted to CVPR 2025 workshops.
01/25: LiveXiv accepted to ICLR, 2025.
12/24: Our workshop "What's Next in Multi-Modal Foundation Models" got accepted to CVPR 2025.
09/24: Invited talk at TU Graz.
04/24: Invited talk at 13th Israel Machine Vision Conference (IMVC) 2024.
12/23: Our workshop "What's Next in Multi-Modal Foundation Models" got accepted to CVPR 2024.

Selected Publications