ABSTRACT
In the last decade, we have seen a significant rise of deep neural networks in image processing tasks and many other research areas. However, while various neural architectures have successfully solved numerous tasks, they constantly demand more and more processing time and training data. Moreover, the current trend of using existing pre-trained architectures merely as backbones and attaching new processing branches on top not only increases this demand but also diminishes the explainability of the whole model.
Our research focuses on combinations of explainable building blocks for image processing tasks, such as object tracking. We propose a combination of Mask R-CNN, a state-of-the-art object detection and segmentation neural network, with our previously published method of sparse feature tracking [16]. Such a combination allows us to track objects by connecting detected masks using the proposed sparse feature tracklets. However, this method cannot recover from complete object occlusions and has to be assisted by object re-identification.
To this end, this paper uses our feature tracking method for a slightly different task: an unsupervised extraction of object representations that we can directly use to fine-tune an object re-identification algorithm; see Fig. 1 for a visualisation. As we already have to use object masks in the object tracking, our approach utilises this additional information as an alpha channel of the object representations, which further increases the precision of the re-identification. An additional benefit is that our fine-tuning method can be employed even in a fully online scenario.
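The two ideas summarised in the abstract, connecting detected masks through sparse feature tracklets and storing the mask as an alpha channel of the object representation, can be illustrated with a minimal sketch. It is not the implementation from the paper: the greedy vote-based linking, the tracklet format, and the names `link_masks` and `rgba_crop` are assumptions made for illustration only, and plain nested lists stand in for image arrays to keep the example dependency-free.

```python
from collections import Counter


def link_masks(prev_masks, curr_masks, tracklets):
    """Link each current mask to the previous mask sharing the most tracklets.

    prev_masks, curr_masks: lists of masks, each a list of rows of 0/1 values.
    tracklets: iterable of ((x0, y0), (x1, y1)) pixel positions of one sparse
    feature in the previous and the current frame (hypothetical format).
    Returns a dict {current mask index: previous mask index}.
    """
    votes = Counter()
    for (x0, y0), (x1, y1) in tracklets:
        for i, prev in enumerate(prev_masks):
            if not prev[y0][x0]:
                continue
            for j, curr in enumerate(curr_masks):
                if curr[y1][x1]:
                    votes[(j, i)] += 1
    links = {}
    # Greedy choice: the best-supported previous mask wins for each current
    # mask. This does not enforce a one-to-one assignment.
    for (j, i), _count in votes.most_common():
        links.setdefault(j, i)
    return links


def rgba_crop(frame, mask):
    """Crop the frame to the mask's bounding box and append the mask as alpha.

    frame: rows of (r, g, b) tuples; mask: rows of 0/1 values, same size.
    Returns rows of (r, g, b, a) tuples with a = 255 inside the mask, else 0.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask is empty")
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return [
        [
            tuple(frame[y][x]) + (255 if mask[y][x] else 0,)
            for x in range(cols[0], cols[-1] + 1)
        ]
        for y in range(rows[0], rows[-1] + 1)
    ]
```

In this sketch, a sequence of linked masks would yield a set of RGBA crops of one object, which is the kind of unsupervised object representation the abstract proposes to use for fine-tuning a re-identification model.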
REFERENCES
- Alina Bialkowski, Simon Denman, Sridha Sridharan, Clinton Fookes, and Patrick Lucey. 2012. A database for person re-identification in multi-camera surveillance networks. In 2012 International Conference on Digital Image Computing Techniques and Applications (DICTA). IEEE, 1–8.
- Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. 2010. Brief: Binary robust independent elementary features. In ECCV. Springer.
- Andrea Colombari, Andrea Fusiello, and Vittorio Murino. 2007. Segmentation and tracking of multiple video objects. Pattern Recognition 40, 4 (2007), 1307–1317.
- Afshin Dehghan, Shayan Modiri Assari, and Mubarak Shah. 2015. Gmmcp tracker: Globally optimal generalized maximum multi clique problem for multiple object tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4091–4099.
- Pedro F Felzenszwalb, Ross B Girshick, David McAllester, and Deva Ramanan. 2009. Object detection with discriminatively trained part-based models. IEEE transactions on pattern analysis and machine intelligence 32, 9 (2009), 1627–1645.
- Christopher G Harris and Mike Stephens. 1988. A combined corner and edge detector. In Alvey vision conference, Vol. 15. Citeseer, 10–5244.
- Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision. 2961–2969.
- Martin Hirzer, Csaba Beleznai, Peter M Roth, and Horst Bischof. 2011. Person re-identification by descriptive and discriminative classification. In Scandinavian conference on Image analysis. Springer, 91–102.
- Harold W Kuhn. 1955. The Hungarian method for the assignment problem. Naval research logistics quarterly 2, 1-2 (1955), 83–97.
- José Lezama, Karteek Alahari, Josef Sivic, and Ivan Laptev. 2011. Track to the future: Spatio-temporal video segmentation with long-range motion cues. In CVPR 2011. IEEE, 3369–3376.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In European conference on computer vision. Springer, 740–755.
- David G Lowe. 2004. Distinctive image features from scale-invariant keypoints. International journal of computer vision 60, 2 (2004), 91–110.
- Bruce D Lucas and Takeo Kanade. 1981. An iterative image registration technique with an application to stereo vision. In IJCAI.
- Niki Martinel, Christian Micheloni, and Claudio Piciarelli. 2012. Distributed signature fusion for person re-identification. In 2012 Sixth International Conference on Distributed Smart Cameras (ICDSC). IEEE, 1–6.
- A. Milan, L. Leal-Taixé, I. Reid, S. Roth, and K. Schindler. 2016. MOT16: A Benchmark for Multi-Object Tracking. arXiv:1603.00831 [cs] (March 2016). http://arxiv.org/abs/1603.00831
- Petr Pulc. 2019. Hierarchical Motion Tracking for UHD video processing. GitHub repository (2019). https://github.com/petrpulc/gpu_orb_tracker
- Petr Pulc. 2021. Mask R-CNN Pedestrian Tracklets. https://doi.org/10.34740/KAGGLE/DS/1376245
- Petr Pulc and Martin Holeňa. 2018. Hierarchical Motion Tracking Using Matching of Sparse Features. In 2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS). IEEE, 449–456.
- Joseph Redmon and Ali Farhadi. 2018. YOLOv3: An Incremental Improvement. arXiv (2018).
- Edward Rosten and Tom Drummond. 2006. Machine learning for high-speed corner detection. In ECCV. Springer, 430–443.
- Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. ORB: An efficient alternative to SIFT or SURF. In ICCV. IEEE, 2564–2571.
- Jianbo Shi and Carlo Tomasi. 1993. Good features to track. Technical Report. Cornell University.
- Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. 2015. Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE international conference on computer vision. 4489–4497.
- Qing Wang, Feng Chen, Wenli Xu, and Ming-Hsuan Yang. 2011. An experimental comparison of online object-tracking algorithms. In Wavelets and Sparsity XIV, Vol. 8138. International Society for Optics and Photonics.
- Shu Wang, Huchuan Lu, Fan Yang, and Ming-Hsuan Yang. 2011. Superpixel tracking. In 2011 International Conference on Computer Vision. IEEE, 1323–1330.
- Taiqing Wang, Shaogang Gong, Xiatian Zhu, and Shengjin Wang. 2014. Person re-identification by video ranking. In European conference on computer vision. Springer, 688–703.
- Nicolai Wojke and Alex Bewley. 2018. Deep cosine metric learning for person re-identification. In 2018 IEEE winter conference on applications of computer vision (WACV). IEEE, 748–756.
- Nicolai Wojke, Alex Bewley, and Dietrich Paulus. 2017. Simple Online and Realtime Tracking with a Deep Association Metric. In 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 3645–3649. https://doi.org/10.1109/ICIP.2017.8296962
- Ning Xu, Linjie Yang, Yuchen Fan, Dingcheng Yue, Yuchen Liang, Jianchao Yang, and Thomas Huang. 2018. Youtube-vos: A large-scale video object segmentation benchmark. arXiv preprint arXiv:1809.03327 (2018).
- Linjie Yang, Yuchen Fan, and Ning Xu. 2019. Video instance segmentation. CoRR abs/1905.04804 (2019). https://arxiv.org/abs/1905.04804
- Yifu Zhang, Chunyu Wang, Xinggang Wang, Wenjun Zeng, and Wenyu Liu. 2020. FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking. arXiv preprint arXiv:2004.01888 (2020).
- Zongpu Zhang, Yang Hua, Tao Song, Zhengui Xue, Ruhui Ma, Neil Robertson, and Haibing Guan. 2018. Tracking-assisted Weakly Supervised Online Visual Object Segmentation in Unconstrained Videos. In Proceedings of the 26th ACM international conference on Multimedia. 941–949.
- Liang Zheng, Zhi Bie, Yifan Sun, Jingdong Wang, Chi Su, Shengjin Wang, and Qi Tian. 2016. Mars: A video benchmark for large-scale person re-identification. In European Conference on Computer Vision. Springer, 868–884.
Index Terms
- Unsupervised Construction of Task-Specific Datasets for Object Re-identification