Adversarial Robustness

Building AI systems that are robust to adversarial attacks and distribution shifts.

Overview

Modern vision-language models achieve remarkable performance on standard benchmarks, but they remain vulnerable to adversarial perturbations and distribution shifts. My research develops detection methods with provable guarantees and robust training procedures.


Current Projects

KoALA: KL-L0 Adversarial Detector via Label Agreement

Status: Under review at ICLR 2026 (preprint on arXiv)
Key insight: Combining KL divergence with L0-based label agreement enables formal detection guarantees.

What we did:

  • Designed a dual-head detector that flags inputs when prediction heads disagree (see the sketch after this list)
  • Established formal detection guarantees with rigorous mathematical proofs
  • Validated on ResNet-18 and CLIP (ViT-B/32) across standard adversarial benchmarks
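
As a rough illustration of the dual-head idea, the sketch below flags an input when the two heads' softmax distributions diverge (symmetric KL) or their top-1 labels disagree. The function name, the threshold, and the exact combination rule are illustrative placeholders and not KoALA's published criterion.

```python
import torch
import torch.nn.functional as F

def koala_style_flag(logits_a: torch.Tensor,
                     logits_b: torch.Tensor,
                     kl_threshold: float = 0.5) -> torch.Tensor:
    """Flag inputs whose two prediction heads disagree (illustrative sketch).

    Combines (i) symmetric KL divergence between the heads' softmax outputs
    and (ii) top-1 label agreement. Threshold and combination rule are
    placeholders, not the paper's actual criterion.
    """
    p = F.softmax(logits_a, dim=-1)  # head A class distribution
    q = F.softmax(logits_b, dim=-1)  # head B class distribution

    # Symmetric KL divergence per input: 0.5 * (KL(p||q) + KL(q||p)).
    sym_kl = 0.5 * (F.kl_div(q.log(), p, reduction="none").sum(-1)
                    + F.kl_div(p.log(), q, reduction="none").sum(-1))

    # Label agreement: do the two heads predict the same top-1 class?
    labels_agree = p.argmax(dim=-1).eq(q.argmax(dim=-1))

    # Flag as potentially adversarial when the labels differ or the
    # distributions diverge beyond the threshold.
    return (~labels_agree) | (sym_kl > kl_threshold)
```

On a batch of logits of shape [N, C], this returns a boolean tensor of shape [N] marking the flagged inputs.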

Why it matters: Most adversarial detection methods lack formal guarantees — they work empirically but can be bypassed by adaptive attacks. KoALA provides provable bounds on detection performance.


Research Questions I’m Exploring

  1. Certified robustness for VLMs — Can we extend randomized smoothing to vision-language models? (A sketch of the standard image-only baseline follows this list.)
  2. Distribution shift detection — How do we distinguish adversarial inputs from natural distribution shifts?
  3. Robust fine-tuning — Does adversarial training transfer across modalities?
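
For background on question 1, here is a minimal sketch of standard randomized smoothing for an image classifier (in the spirit of Cohen et al., 2019): the smoothed classifier's prediction is a majority vote over Gaussian-perturbed copies of the input. The function name and parameters are illustrative; the statistical certification step and any vision-language extension are omitted.

```python
import torch

def smoothed_predict(model, x: torch.Tensor, sigma: float = 0.25,
                     n_samples: int = 100, num_classes: int = 1000) -> int:
    """Majority-vote prediction of a Gaussian-smoothed classifier.

    Standard randomized smoothing sketch: sample isotropic Gaussian noise,
    classify each noisy copy, and return the most frequent class. The
    certification of a robustness radius is omitted for brevity.
    """
    # x: a single image tensor with a batch dimension, e.g. shape [1, 3, 224, 224].
    counts = torch.zeros(num_classes, dtype=torch.long)
    with torch.no_grad():
        for _ in range(n_samples):
            noisy = x + sigma * torch.randn_like(x)    # perturb the input
            pred = model(noisy).argmax(dim=-1).item()  # hard label for this sample
            counts[pred] += 1
    return int(counts.argmax())
```

Extending this to vision-language models would require deciding what to smooth over (image pixels, text tokens, or both), which is exactly the open question.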