… whether they are semantically related. This model is then used to filter weakly related pairs out of billions of image-text pairs, producing the LAIT dataset (Large-scale weAk-supervised Image-Text), which contains 10 million images whose captions are 13 words long on average.

[Figure: https://uploader.shimo.im/f/u3awaRh8G8wYYga8.png!thumbnail]
Samples from the LAIT dataset

4. The ImageBERT Model

[Figure: https://uploader.shimo.im/f/PZ5V0YZq89Q21jEI.png!thumbnail]

As shown in the figure above, the overall architecture of ImageBERT is similar to that of BERT: both adopt the Transformer as the underlying structure.
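To make the BERT-like design concrete, below is a minimal PyTorch sketch of a cross-modal Transformer encoder in this spirit: text tokens and image region (RoI) features are projected into a shared embedding space, tagged with segment and position embeddings, concatenated into one sequence, and passed through a standard Transformer encoder. This is an illustrative sketch, not the authors' implementation; the class name CrossModalEncoder, the 2048-dimensional region features, and all hyper-parameter values are assumptions for demonstration.

```python
# Illustrative sketch only: a BERT-style encoder over a joint text + image-region sequence.
# Hidden size, layer count, vocabulary size and the 2048-d RoI features are assumed values.
import torch
import torch.nn as nn


class CrossModalEncoder(nn.Module):
    def __init__(self, vocab_size=30522, hidden=768, n_layers=12, n_heads=12,
                 roi_dim=2048, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, hidden)   # text token embeddings
        self.img_proj = nn.Linear(roi_dim, hidden)        # project region features into the text space
        self.seg_emb = nn.Embedding(2, hidden)            # segment 0 = text, segment 1 = image
        self.pos_emb = nn.Embedding(max_len, hidden)      # position embeddings over the joint sequence
        self.norm = nn.LayerNorm(hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=n_heads,
                                           dim_feedforward=4 * hidden, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, text_ids, roi_feats):
        # text_ids: (B, T) token ids; roi_feats: (B, R, roi_dim) detected region features
        B, T = text_ids.shape
        R = roi_feats.shape[1]
        txt = self.tok_emb(text_ids) + self.seg_emb(torch.zeros_like(text_ids))
        img = self.img_proj(roi_feats) + self.seg_emb(
            torch.ones(B, R, dtype=torch.long, device=roi_feats.device))
        x = torch.cat([txt, img], dim=1)                  # one joint sequence of length T + R
        pos = torch.arange(T + R, device=text_ids.device).unsqueeze(0)
        x = self.norm(x + self.pos_emb(pos))
        return self.encoder(x)                            # (B, T + R, hidden)


# Example: a batch of 2 samples, each with 16 text tokens and 36 detected regions.
model = CrossModalEncoder()
out = model(torch.randint(0, 30522, (2, 16)), torch.randn(2, 36, 2048))
print(out.shape)  # torch.Size([2, 52, 768])
```

Concatenating the two modalities into a single sequence lets every self-attention layer mix image and text information directly, which is the main difference from running two separate single-modal encoders.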
………………………………