Categorizing Video Datasets: Video Object Detection, Multiple and Single Object Tracking

© 2024 by IJETT Journal
Volume 72, Issue 3
Year of Publication: 2024
Authors: Sara Bouraya, Abdessamad Belangour
DOI: 10.14445/22315381/IJETT-V72I3P110

How to Cite?

Sara Bouraya, Abdessamad Belangour, "Categorizing Video Datasets: Video Object Detection, Multiple and Single Object Tracking," International Journal of Engineering Trends and Technology, vol. 72, no. 3, pp. 99-105, 2024. Crossref, https://doi.org/10.14445/22315381/IJETT-V72I3P110

Abstract
Video object detection, single object tracking, and multiple object tracking are crucial tasks in computer vision, enabling a wide range of real-world applications. The success of algorithms for these tasks relies heavily on the availability of high-quality datasets for training and evaluation. This paper presents a comprehensive categorization of datasets specifically designed for video object detection, single object tracking, and multiple object tracking. Object detection and tracking are fundamental problems in the field, and accurate, diverse datasets are essential for training and evaluating the corresponding algorithms effectively. By analyzing the characteristics of the datasets in each of these three categories, this paper serves as a resource for driving advances in object detection and tracking algorithms and systems, since accurate and diverse datasets are pivotal to robust and efficient solutions across the many applications of computer vision.

Keywords
Multiple Object Tracking, Single Object Tracking, Video Object Detection, Video Dataset, VOD Dataset, MOT Dataset, SOT Dataset.
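
To make the categorization concrete, the sketch below shows one minimal way such a task-based taxonomy could be represented in Python. It is an illustrative example only, not the paper's actual scheme: the class VideoDataset, the CATALOG list, and the helper by_task are hypothetical names invented here, and the dataset-to-task assignments use a handful of well-known public benchmarks purely as examples.

from dataclasses import dataclass

@dataclass
class VideoDataset:
    name: str    # benchmark name
    task: str    # one of "VOD", "SOT", "MOT"
    domain: str  # rough scene domain, e.g. "generic", "driving"

# Hypothetical catalog; the entries below are well-known public benchmarks
# chosen for illustration, not the paper's full listing.
CATALOG = [
    VideoDataset("ImageNet VID", "VOD", "generic"),
    VideoDataset("UA-DETRAC", "VOD", "driving"),
    VideoDataset("MOT17", "MOT", "pedestrian"),
    VideoDataset("KITTI", "MOT", "driving"),
    VideoDataset("LaSOT", "SOT", "generic"),
    VideoDataset("GOT-10k", "SOT", "generic"),
]

def by_task(catalog, task):
    """Return the datasets in `catalog` annotated for the given task."""
    return [d for d in catalog if d.task == task]

for task in ("VOD", "SOT", "MOT"):
    print(task + ":", ", ".join(d.name for d in by_task(CATALOG, task)))

Grouping benchmarks this way makes it straightforward to filter by task or domain when selecting training and evaluation data.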
