Abstract
The diversity and wide deployment of video recorders produce a large volume of video data, whose effective use requires a video indexing process. This process, however, suffers from a major problem: the semantic gap between the extracted low-level features and the ground truth. The ontology paradigm offers a promising way to overcome this gap, but no naming syntax convention has been followed in the concept creation step, which constitutes a second problem. In this paper, we address both issues and develop a full video surveillance ontology that follows a formal naming syntax convention and semantics and that addresses queries from both academic research and industrial applications. In addition, we propose an ontology video surveillance indexing and retrieval system (OVIS) that uses a set of Semantic Web Rule Language (SWRL) rules to bridge the semantic gap. Existing indexing systems rely essentially on low-level features and use the ontology paradigm only to support this process by representing the surveillance domain. In contrast, OVIS is built on the SWRL rules, and our experiments show that the approach leads to promising results on the top video evaluation benchmarks and points to new directions for future development.
Cite this article
Kazi Tani, M.Y., Ghomari, A., Lablack, A. et al. OVIS: ontology video surveillance indexing and retrieval system. Int J Multimed Info Retr 6, 295–316 (2017). https://doi.org/10.1007/s13735-017-0133-z
DOI: https://doi.org/10.1007/s13735-017-0133-z