Research Notes

Thoughts on vision-language models, multimodal learning, and AI robustness