A Complete Collection of Visual Question Answering (VQA) Resources

2024-04-20 11:18

This article presents a complete collection of Visual Question Answering (VQA) resources, in the hope that it offers a useful reference for interested developers.

Visual Question Answering (VQA): A Zhuanzhi Curated Collection

  • Getting Started
  • Advanced Papers
    • Attention-Based
    • Knowledge-based
    • Memory Network
    • Video QA
  • Surveys
  • Tutorial
  • Dataset
  • Code
  • Domain Experts

Getting Started

  • Deep-Learning-Based VQA (Visual Question Answering) Techniques
    • [https://zhuanlan.zhihu.com/p/22530291]
  • A Panoramic Overview of Visual Question Answering: From Datasets to Techniques
    • [https://mp.weixin.qq.com/s/dyor9bv2y0VyX7woMDVLkA]
  • Paper Reading Notes: Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
    • [http://www.jianshu.com/p/5bf03d1fadfa]
  • How Far Are We from AI That Can Answer Questions About Images? Facebook Pushes Toward Visual Dialog
    • [https://www.leiphone.com/news/201711/4B9cNlCINsVyPdTw.html]
  • Image Question Answering
    • [http://www.cnblogs.com/ranjiewen/p/7604468.html]
  • Hands-On Deep Learning: Image Question Answering
    • [https://zhuanlan.zhihu.com/p/20899091]
  • First-Place Technical Report from the 2017 VQA Challenge
    • [https://zhuanlan.zhihu.com/p/29688475]
  • Deep Learning Builds a Bridge Between Vision and Language
    • [http://www.msra.cn/zh-cn/news/features/vision-and-language-20170713]

Advanced Papers

  • Kushal Kafle and Christopher Kanan, Visual Question Answering: Datasets, Algorithms, and Future Challenges, Computer Vision and Image Understanding, 2017.
    • [https://arxiv.org/abs/1610.01465]
  • Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick, CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning, CVPR 2017.
    • [http://vision.stanford.edu/pdf/johnson2017cvpr.pdf]
  • Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick, Inferring and Executing Programs for Visual Reasoning, arXiv:1705.03633, 2017. [https://arxiv.org/abs/1705.03633]
  • Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko, Learning to Reason: End-to-End Module Networks for Visual Question Answering, arXiv:1704.05526, 2017. [https://arxiv.org/abs/1704.05526]
  • Adam Santoro, David Raposo, David G.T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap, A simple neural network module for relational reasoning, arXiv:1706.01427, 2017. [https://arxiv.org/abs/1706.01427]
  • Hedi Ben-younes, Remi Cadene, Matthieu Cord, Nicolas Thome, MUTAN: Multimodal Tucker Fusion for Visual Question Answering, arXiv:1705.06676, 2017. [https://arxiv.org/pdf/1705.06676.pdf] [https://github.com/Cadene/vqa.pytorch]
  • Vahid Kazemi, Ali Elqursh, Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering, arXiv:1704.03162, 2017. [https://arxiv.org/abs/1704.03162] [https://github.com/Cyanogenoid/pytorch-vqa]
  • Kushal Kafle, and Christopher Kanan. An Analysis of Visual Question Answering Algorithms. arXiv:1703.09684, 2017. [https://arxiv.org/abs/1703.09684]
  • Hyeonseob Nam, Jung-Woo Ha, Jeonghee Kim, Dual Attention Networks for Multimodal Reasoning and Matching, arXiv:1611.00471, 2016. [https://arxiv.org/abs/1611.00471]
  • Jin-Hwa Kim, Kyoung Woon On, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang, Hadamard Product for Low-rank Bilinear Pooling, arXiv:1610.04325, 2016. [https://arxiv.org/abs/1610.04325]
  • Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach, Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, arXiv:1606.01847, 2016. [https://arxiv.org/abs/1606.01847] [https://github.com/akirafukui/vqa-mcb]
  • Kuniaki Saito, Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada, DualNet: Domain-Invariant Network for Visual Question Answering. arXiv:1606.06108v1, 2016. [https://arxiv.org/pdf/1606.06108.pdf]
  • Arijit Ray, Gordon Christie, Mohit Bansal, Dhruv Batra, Devi Parikh, Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions, arXiv:1606.06622, 2016. [https://arxiv.org/pdf/1606.06622v1.pdf]
  • Hyeonwoo Noh, Bohyung Han, Training Recurrent Answering Units with Joint Loss Minimization for VQA, arXiv:1606.03647, 2016. [http://arxiv.org/abs/1606.03647v1]
  • Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh, Hierarchical Question-Image Co-Attention for Visual Question Answering, arXiv:1606.00061, 2016. [https://arxiv.org/pdf/1606.00061v2.pdf] [https://github.com/jiasenlu/HieCoAttenVQA]
  • Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang, Multimodal Residual Learning for Visual QA, arXiv:1606.01455, 2016. [https://arxiv.org/pdf/1606.01455v1.pdf]
  • Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel, Anthony Dick, FVQA: Fact-based Visual Question Answering, arXiv:1606.05433, 2016. [https://arxiv.org/pdf/1606.05433.pdf]
  • Ilija Ilievski, Shuicheng Yan, Jiashi Feng, A Focused Dynamic Attention Model for Visual Question Answering, arXiv:1604.01485. [https://arxiv.org/pdf/1604.01485v1.pdf]
  • Yuke Zhu, Oliver Groth, Michael Bernstein, Li Fei-Fei, Visual7W: Grounded Question Answering in Images, CVPR 2016. [http://arxiv.org/abs/1511.03416]
  • Hyeonwoo Noh, Paul Hongsuck Seo, and Bohyung Han, Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction, CVPR 2016. [http://arxiv.org/pdf/1511.05756.pdf]
  • Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein, Learning to Compose Neural Networks for Question Answering, NAACL 2016. [http://arxiv.org/pdf/1601.01705.pdf]
  • Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein, Deep compositional question answering with neural module networks, CVPR 2016. [https://arxiv.org/abs/1511.02799]
  • Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Smola, Stacked Attention Networks for Image Question Answering, CVPR 2016. [http://arxiv.org/abs/1511.02274] [https://github.com/JamesChuanggg/san-torch]
  • Kevin J. Shih, Saurabh Singh, Derek Hoiem, Where To Look: Focus Regions for Visual Question Answering, CVPR 2016. [http://arxiv.org/pdf/1511.07394v2.pdf]
  • Kan Chen, Jiang Wang, Liang-Chieh Chen, Haoyuan Gao, Wei Xu, Ram Nevatia, ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering, arXiv:1511.05960v1, Nov 2015. [http://arxiv.org/pdf/1511.05960v1.pdf]
  • Huijuan Xu, Kate Saenko, Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering, arXiv:1511.05234v1, Nov 2015. [http://arxiv.org/abs/1511.05234]
  • Kushal Kafle and Christopher Kanan, Answer-Type Prediction for Visual Question Answering, CVPR 2016. [http://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Kafle_Answer-Type_Prediction_for_CVPR_2016_paper.html]
  • Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, VQA: Visual Question Answering, ICCV 2015. [http://arxiv.org/pdf/1505.00468] [https://github.com/JamesChuanggg/VQA-tensorflow]
  • Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus, Simple Baseline for Visual Question Answering, arXiv:1512.02167v2, Dec 2015. [http://arxiv.org/abs/1512.02167]
  • Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu, Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering, NIPS 2015. [http://arxiv.org/pdf/1505.05612.pdf]
  • Mateusz Malinowski, Marcus Rohrbach, Mario Fritz, Ask Your Neurons: A Neural-based Approach to Answering Questions about Images, ICCV 2015. [http://arxiv.org/pdf/1505.01121v3.pdf]
  • Mengye Ren, Ryan Kiros, Richard Zemel, Exploring Models and Data for Image Question Answering, ICML 2015. [http://arxiv.org/pdf/1505.02074.pdf]
  • Mateusz Malinowski, Mario Fritz, Towards a Visual Turing Challenge, NIPS Workshop 2015. [http://arxiv.org/abs/1410.8027]
  • Mateusz Malinowski, Mario Fritz, A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input, NIPS 2014. [http://arxiv.org/pdf/1410.0210v4.pdf]

Attention-Based

  • Hedi Ben-younes, Remi Cadene, Matthieu Cord, Nicolas Thome, MUTAN: Multimodal Tucker Fusion for Visual Question Answering, arXiv:1705.06676, 2017. [https://arxiv.org/pdf/1705.06676.pdf] [https://github.com/Cadene/vqa.pytorch]
  • Jin-Hwa Kim, Kyoung Woon On, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang, Hadamard Product for Low-rank Bilinear Pooling, arXiv:1610.04325, 2016. [https://arxiv.org/abs/1610.04325]
  • Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach, Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, arXiv:1606.01847, 2016. [https://arxiv.org/abs/1606.01847]
  • Hyeonwoo Noh, Bohyung Han, Training Recurrent Answering Units with Joint Loss Minimization for VQA, arXiv:1606.03647, 2016. [http://arxiv.org/abs/1606.03647v1]
  • Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh, Hierarchical Question-Image Co-Attention for Visual Question Answering, arXiv:1606.00061, 2016. [https://arxiv.org/pdf/1606.00061v2.pdf]
  • Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Smola, Stacked Attention Networks for Image Question Answering, CVPR 2016. [http://arxiv.org/abs/1511.02274]
  • Ilija Ilievski, Shuicheng Yan, Jiashi Feng, A Focused Dynamic Attention Model for Visual Question Answering, arXiv:1604.01485. [https://arxiv.org/pdf/1604.01485v1.pdf]
  • Kan Chen, Jiang Wang, Liang-Chieh Chen, Haoyuan Gao, Wei Xu, Ram Nevatia, ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering, arXiv:1511.05960v1, Nov 2015. [http://arxiv.org/pdf/1511.05960v1.pdf]
  • Huijuan Xu, Kate Saenko, Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering, arXiv:1511.05234v1, Nov 2015. [http://arxiv.org/abs/1511.05234]
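
The common thread in the papers above is a question-guided soft attention over spatial image features. The sketch below shows a single attention hop with hypothetical dimensions; the actual papers differ in how they compute, stack, and combine the attention maps.

```python
# One question-guided attention hop over spatial image features (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHop(nn.Module):
    def __init__(self, q_dim=1024, v_dim=2048, att_dim=512):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, att_dim)
        self.v_proj = nn.Linear(v_dim, att_dim)
        self.score = nn.Linear(att_dim, 1)

    def forward(self, q, v):
        # q: (batch, q_dim) question vector
        # v: (batch, regions, v_dim) spatial CNN features (e.g. a 14x14 grid)
        joint = torch.tanh(self.v_proj(v) + self.q_proj(q).unsqueeze(1))
        alpha = F.softmax(self.score(joint), dim=1)  # (batch, regions, 1)
        attended = (alpha * v).sum(dim=1)            # weighted sum of regions
        return attended, alpha

hop = AttentionHop()
ctx, weights = hop(torch.randn(8, 1024), torch.randn(8, 196, 2048))
```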

Knowledge-based

  • Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel, Anthony Dick, FVQA: Fact-based Visual Question Answering, arXiv:1606.05433, 2016. [https://arxiv.org/pdf/1606.05433.pdf]
  • Qi Wu, Peng Wang, Chunhua Shen, Anton van den Hengel, Anthony Dick, Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources, CVPR 2016. [http://arxiv.org/abs/1511.06973]
  • Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel, Anthony Dick, Explicit Knowledge-based Reasoning for Visual Question Answering, arXiv:1511.02570v2, Nov 2015. [http://arxiv.org/abs/1511.02570]
  • Yuke Zhu, Ce Zhang, Christopher Ré, Li Fei-Fei, Building a Large-scale Multimodal Knowledge Base System for Answering Visual Queries, arXiv:1507.05670, Nov 2015. [http://arxiv.org/abs/1507.05670]

Memory Network

  • Caiming Xiong, Stephen Merity, Richard Socher, Dynamic Memory Networks for Visual and Textual Question Answering, ICML 2016. [http://arxiv.org/abs/1603.01417]
  • Aiwen Jiang, Fang Wang, Fatih Porikli, Yi Li, Compositional Memory for Visual Question Answering, arXiv:1511.05676v1, Nov 2015. [http://arxiv.org/abs/1511.05676]
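
Both memory-network papers above iterate an attend-and-update loop over a set of "facts" (image regions or sentence encodings). Below is a toy two-hop sketch of that loop using a GRU cell for the memory update; it is purely illustrative and is not the episodic-memory module from either paper.

```python
# Toy multi-hop memory update over a set of facts (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryHops(nn.Module):
    def __init__(self, dim=512, hops=2):
        super().__init__()
        self.hops = hops
        self.update = nn.GRUCell(dim, dim)  # refines memory after each hop

    def forward(self, facts, question):
        # facts: (batch, n, dim) visual/textual facts; question: (batch, dim)
        memory = question
        for _ in range(self.hops):
            scores = torch.bmm(facts, memory.unsqueeze(2))  # dot-product attention
            alpha = F.softmax(scores, dim=1)                # (batch, n, 1)
            context = (alpha * facts).sum(dim=1)            # attended facts
            memory = self.update(context, memory)           # GRU memory update
        return memory

mem = MemoryHops()
out = mem(torch.randn(8, 196, 512), torch.randn(8, 512))
```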

Video QA

  • Kuo-Hao Zeng, Tseng-Hung Chen, Ching-Yao Chuang, Yuan-Hong Liao, Juan Carlos Niebles, Min Sun, Leveraging Video Descriptions to Learn Video Question Answering, AAAI 2017. [https://arxiv.org/abs/1611.04021]
  • Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, Sanja Fidler, MovieQA: Understanding Stories in Movies through Question-Answering, CVPR 2016. [http://arxiv.org/abs/1512.02902]
  • Linchao Zhu, Zhongwen Xu, Yi Yang, Alexander G. Hauptmann, Uncovering Temporal Context for Video Question and Answering, arXiv:1511.04670v1, Nov 2015. [http://arxiv.org/abs/1511.04670]
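
Compared with image QA, the papers above must additionally summarize a sequence of frames. A tiny sketch of the common recipe, assuming precomputed per-frame CNN features: run a temporal RNN over the frames and reuse the resulting vector wherever an image-QA model expects an image embedding.

```python
# Summarize sampled frame features with a temporal GRU (illustrative shapes).
import torch
import torch.nn as nn

frame_rnn = nn.GRU(input_size=2048, hidden_size=1024, batch_first=True)
frames = torch.randn(8, 30, 2048)    # 30 sampled frames of CNN features
_, video_vec = frame_rnn(frames)     # (1, batch, 1024) temporal summary
video_vec = video_vec.squeeze(0)     # use like the image vector in image QA
```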

Surveys

  • Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, and Anton van den Hengel, Visual Question Answering: A Survey of Methods and Datasets, Computer Vision and Image Understanding, 2017.
    • [https://arxiv.org/abs/1607.05910]
  • Mateusz Malinowski, Mario Fritz, Tutorial on Answering Questions about Images with Deep Learning
    • [https://arxiv.org/abs/1610.01076]
  • Survey of Visual Question Answering: Datasets and Techniques
    • [https://arxiv.org/abs/1705.03865]
  • Visual Question Answering: Datasets, Algorithms, and Future Challenges
    • [https://arxiv.org/abs/1610.01465]

Tutorial

  • CVPR 2017 VQA Challenge Workshop (includes many slide decks)
    • [http://www.visualqa.org/workshop.html]
  • CVPR 2016 VQA Challenge Workshop
    • [http://www.visualqa.org/vqa_v1_workshop.html]
  • Tutorial on Answering Questions about Images with Deep Learning
    • [https://arxiv.org/pdf/1610.01076.pdf]
  • Visual Question Answering Demo in Python Notebook
    • [http://iamaaditya.github.io/2016/04/visual_question_answering_demo_notebook]
  • Tutorial on Question Answering about Images
    • [https://www.linkedin.com/pulse/tutorial-question-answering-images-mateusz-malinowski/]

Dataset

  • Visual7W: Grounded Question Answering in Images
    • homepage: http://web.stanford.edu/~yukez/visual7w/
    • github: https://github.com/yukezhu/visual7w-toolkit
    • github: https://github.com/yukezhu/visual7w-qa-models
  • DAQUAR
    • [http://www.cs.toronto.edu/~mren/imageqa/results/]
  • COCO-QA
    • [http://www.cs.toronto.edu/~mren/imageqa/data/cocoqa/]
  • The VQA Dataset
    • [http://visualqa.org/]
  • FM-IQA
    • [http://idl.baidu.com/FM-IQA.html]
  • Visual Genome
    • [http://visualgenome.org/]
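
As a hedged example of working with these datasets, the snippet below reads the question and annotation JSON files distributed for the VQA dataset (visualqa.org). The filenames and fields shown match the commonly documented v2 schema, but verify them against the version you download.

```python
# Read VQA v2 question/annotation JSON files (schema assumed; verify locally).
import json

with open("v2_OpenEnded_mscoco_train2014_questions.json") as f:
    questions = json.load(f)["questions"]      # list of question records
with open("v2_mscoco_train2014_annotations.json") as f:
    annotations = json.load(f)["annotations"]  # aligned by question_id

# Index the consensus answer for each question.
answers_by_qid = {a["question_id"]: a["multiple_choice_answer"]
                  for a in annotations}
q = questions[0]
print(q["image_id"], q["question"], answers_by_qid[q["question_id"]])
```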

Code

  • VQA Demo: Visual Question Answering Demo on pretrained model
    • [https://github.com/iamaaditya/VQA_Demo]
    • [http://iamaaditya.github.io/research/]
  • deep-qa: Implementation of a Convolutional Neural Network for factoid QA on the answer sentence selection task
    • [https://github.com/aseveryn/deep-qa]
  • YodaQA: A Question Answering system built on top of the Apache UIMA framework
    • [http://ailao.eu/yodaqa/]
    • [https://github.com/brmson/yodaqa]
  • insuranceQA-cnn-lstm: TensorFlow and Theano CNN code for insurance QA
    • [https://github.com/white127/insuranceQA-cnn-lstm]
  • Tensorflow Implementation of Deeper LSTM+ normalized CNN for Visual Question Answering
    • [https://github.com/JamesChuanggg/VQA-tensorflow]
  • Visual Question Answering with Keras
    • [https://anantzoid.github.io/VQA-Keras-Visual-Question-Answering/]
  • Deep Learning Models for Question Answering with Keras
    • [http://sujitpal.blogspot.jp/2016/10/deep-learning-models-for-question.html]
  • Deep QA: Using deep learning to answer Aristo's science questions
    • [https://github.com/allenai/deep_qa]
  • Visual Question Answering in Pytorch
    • [https://github.com/Cadene/vqa.pytorch]

Domain Experts

  • Qi Wu
    • [https://researchers.adelaide.edu.au/profile/qi.wu01]
  • Bolei Zhou 周博磊
    • [http://people.csail.mit.edu/bzhou/]
  • Stanislaw Antol
    • [https://computing.ece.vt.edu/~santol/]
  • Jin-Hwa Kim
    • [https://bi.snu.ac.kr/~jhkim/]
  • Vahid Kazemi
    • [http://www.csc.kth.se/~vahidk/index.html]
  • Justin Johnson
    • [http://cs.stanford.edu/people/jcjohns/]
  • Ilija Ilievski
    • [https://ilija139.github.io/]

This concludes this collection of Visual Question Answering (VQA) resources; we hope the materials recommended here prove helpful!


