VLMs Topic Collection

AI Picks: Quick Survey of Multimodal Vision-Language Model (VLM) Papers (arXiv): 2024.08.20-2024.08.25

Contents:
1. LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task
2. Evaluating Attribute Comprehension in Large Vision-Language Models
3. PropSAM: A P

How Much Do You Know About Vision-Language Models (VLMs)?

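Over the last few years, natural language processing and computer vision have both advanced by leaps and bounds, so machines can now not only read text but also understand images. Combining the two fields gave rise to vision-language models (VLMs), which process visual information and text data at the same time. VLMs are the new favorite of the AI world: they handle jobs that need both looking at a picture and reading text, such as captioning images, answering questions about an image, or generating images from a text description. In the past, these jobs all had to rely on different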

AI Picks: Quick Survey of Multimodal Vision-Language Model (VLM) Papers (arXiv): 2024.08.15-2024.08.20

Contents:
1. Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification
2. Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
3. HiRED: Atte

AI Picks: Quick Survey of Multimodal Vision-Language Model (VLM) Papers (arXiv): 2024.05.25-2024.05.31

Contents:
1. Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations
2. Bootstrap3D: Improving 3D Content Creation with Synthetic Data
3. Video-MME: The First-Ever Compreh

AI Picks: Quick Survey of Multimodal Vision-Language Model (VLM) Papers (arXiv): 2024.05.10-2024.05.20

Contents:
1. Diff-BGM: A Diffusion Model for Video Background Music Generation
2. Rethinking Overlooked Aspects in Vision-Language Models
3. Unifying 3D Vision-Language Understanding via Promptable Queries
4

AI Picks: Quick Survey of Multimodal Vision-Language Model (VLM) Papers (arXiv): 2024.04.10-2024.04.15

Contents:
1. Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models
2. Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection
3. UNIAA:

AI Picks: Quick Survey of Multimodal Vision-Language Model (VLM) Papers (arXiv): 2024.04.05-2024.04.10

Contents:
1. BRAVE: Broadening the visual encoding of vision-language models
2. ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling
3. MedRG: Medical Report Grounding with

AI Picks: Quick Survey of Multimodal Vision-Language Model (VLM) Papers (arXiv): 2024.03.31-2024.04.05

Contents:
1. Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
2. DeViDe: Faceted medical knowledge for improved medical vision-language pre-training
3. Is CLIP

AI Picks: Quick Survey of Multimodal Vision-Language Model (VLM) Papers (arXiv): 2024.03.10-2024.03.15

Paper list:
1. 3D-VLA: A 3D Vision-Language-Action Generative World Model
2. PosSAM: Panoptic Open-vocabulary Segment Anything
3. Anomaly Detection by Adapting a pre-trained Vision Language Model
4. Introducing

AI Picks: Quick Survey of Multimodal Vision-Language Model (VLM) Papers (arXiv): 2024.03.05-2024.03.10

Paper list:
1. RESTORE: Towards Feature Shift for Vision-Language Prompt Learning
2. In-context Prompt Learning for Test-time Vision Recognition with Frozen Vision-language Model
3. DeepSeek-VL: Towards Real-

AI Picks: Quick Survey of Multimodal Vision-Language Model (VLM) Papers (arXiv): 2024.03.01-2024.03.05

Paper list:
1. CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments
2. Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
3. MADTP: