Multimodal generation refers to combining multiple modalities, such as images, video, and text, to automatically generate text descriptions that match human visual perception and cultural conventions. This time I have collected 16 papers from the multimodal generation field, and I hope they are useful for your study.

3D caption papers
1. Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds
2. X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Image caption papers
3. A Comprehensive Survey of Deep Learning for Image Captioning
4. Image Captioning with Semantic Attention
5. Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
6. Learning to Evaluate Image Captioning
7. SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
8. Show and Tell: A Neural Image Caption Generator
9. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Multi-task caption papers
10. CLIP4Caption - CLIP for Video Caption
11. VisualGPT: Data-efficient
………………………………