Despite significant efforts, cutting-edge video segmentation methods remain
sensitive to occlusion and rapid movement, due to their reliance on the
appearance of objects in the form of object embeddings, which are vulnerable to
these disturbances. A common solution is to use optical flow to provide motion
information, but optical flow essentially considers only pixel-level motion, which still
relies on appearance similarity and hence is often inaccurate under occlusion
and fast movement. In this work, we study instance-level motion and present
InstMove, which stands for Instance Motion for Object-centric Video
Segmentation. In contrast to pixel-level motion approaches, InstMove relies mainly on
instance-level motion information that is free from image feature embeddings
and has a physical interpretation, making it more accurate and more robust
to occlusion and fast-moving objects. To better suit video
segmentation tasks, InstMove uses instance masks to model the physical presence
of an object and learns the dynamic model through a memory network to predict
its position and shape in the next frame. With only a few lines of code,
InstMove can be integrated into current SOTA methods for three different video
segmentation tasks and boost their performance. Specifically, we improve on the
prior art by 1.5 AP on the OVIS dataset, which features heavy occlusions, and
by 4.9 AP on the YouTubeVIS-Long dataset, which mainly contains fast-moving objects.
These results suggest that instance-level motion is robust and accurate, and
hence serves as a powerful solution in complex scenarios for object-centric
video segmentation.
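
To make the idea of instance-level motion concrete, here is a minimal sketch of predicting an object's next-frame mask from its previous masks and using that prediction as a motion cue during association. It is not the InstMove implementation: the abstract describes a dynamic model learned through a memory network, whereas this sketch swaps in a plain constant-velocity shift of the mask centroid. All names (`predict_next_mask`, `mask_iou`, `association_score`, `motion_weight`, `square`) are hypothetical, and masks are represented as sets of pixel coordinates purely to keep the example dependency-free.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, col)
Mask = Set[Pixel]        # an instance mask as the set of its foreground pixels


def square(top: int, left: int, size: int) -> Mask:
    """Toy helper: a filled square mask, used only for the example below."""
    return {(r, c) for r in range(top, top + size) for c in range(left, left + size)}


def centroid(mask: Mask) -> Tuple[float, float]:
    """Mean (row, col) of the mask's foreground pixels."""
    if not mask:
        raise ValueError("empty mask has no centroid")
    n = len(mask)
    return (sum(r for r, _ in mask) / n, sum(c for _, c in mask) / n)


def predict_next_mask(prev: Mask, curr: Mask) -> Mask:
    """Assumed stand-in for a learned motion model: shift the current mask
    by the centroid displacement observed between the last two frames."""
    (pr, pc), (cr, cc) = centroid(prev), centroid(curr)
    dr, dc = round(cr - pr), round(cc - pc)
    return {(r + dr, c + dc) for r, c in curr}


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two masks (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def association_score(appearance_sim: float, predicted: Mask, candidate: Mask,
                      motion_weight: float = 0.5) -> float:
    """Blend an appearance similarity with a motion cue that uses no image
    features: the overlap between the predicted and the candidate mask."""
    motion_sim = mask_iou(predicted, candidate)
    return (1.0 - motion_weight) * appearance_sim + motion_weight * motion_sim


# An object moving two pixels to the right per frame.
prev, curr = square(0, 0, 3), square(0, 2, 3)
pred = predict_next_mask(prev, curr)

# Even if occlusion leaves the appearance similarity low, the candidate that
# sits where the motion model expects the object still gets a usable score.
score = association_score(0.2, pred, square(0, 4, 3))
```

The point of the sketch is only that the motion cue is computed from masks alone, so it does not degrade when the object's appearance does. How the actual method weights or fuses the two cues is not stated in the abstract.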
Paper link: http://arxiv.org/pdf/2303.08132v1
Original article by fendouai. If you repost it, please credit the source: https://panchuang.net/2023/03/15/instmove%ef%bc%9a%e9%92%88%e5%af%b9%e4%bb%a5%e5%af%b9%e8%b1%a1%e4%b8%ba%e4%b8%ad%e5%bf%83%e7%9a%84%e8%a7%86%e9%a2%91%e5%88%86%e5%89%b2%e7%9a%84%e5%ae%9e%e4%be%8b%e7%a7%bb%e5%8a%a8%e6%8a%80%e6%9c%af/