Abstract
Kernel methods such as Support Vector Machines are widely applied to classification problems, including concept detection in video. Nonetheless, issues such as modeling the specific distance functions of feature descriptors, or the temporal sequence of features, in the kernel have received comparatively little attention in multimedia research. We review work on kernels for commonly used MPEG-7 visual features and propose a kernel for matching temporal sequences of these features. The sequence kernel is based on ideas from string matching, but it does not require discretization of the input feature vectors and it handles partial matches and gaps. Evaluation on the TRECVID 2007 high-level feature extraction data set shows that the sequence kernel clearly outperforms the radial basis function (RBF) kernel and the MPEG-7 visual feature kernels using only single key frames.
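The abstract does not define the proposed kernel, so the following is only a minimal sketch of the general idea it names: a string-matching-style comparison of two sequences of real-valued feature vectors that needs no discretization and tolerates gaps and partial matches. It uses an LCS-style dynamic program. The function names, the RBF base similarity, and the `gamma` and `threshold` parameters are all assumptions made for illustration; this is not the kernel defined in the paper, and a score built this way is not guaranteed to be positive semi-definite.

```python
import numpy as np

def rbf_similarity(x, y, gamma=1.0):
    """Similarity in (0, 1] between two feature vectors (assumed base measure)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def sequence_similarity(a, b, gamma=1.0, threshold=0.5):
    """LCS-style alignment score between two sequences of feature vectors.

    Two elements match when their similarity reaches `threshold`; a match
    adds that similarity to the score. Skipping an element (a gap) is free.
    The score is divided by the shorter sequence length, so it lies in [0, 1].
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0
    score = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = rbf_similarity(a[i - 1], b[j - 1], gamma)
            # Extend the diagonal alignment only if the pair is similar enough.
            match = score[i - 1, j - 1] + s if s >= threshold else 0.0
            # Otherwise carry over the best score obtained by skipping an element.
            score[i, j] = max(match, score[i - 1, j], score[i, j - 1])
    return float(score[n, m] / min(n, m))

# Hypothetical key-frame feature sequences (2-D vectors for brevity).
shot_a = [[0, 0], [1, 1], [2, 2]]
shot_b = [[0, 0], [5, 5], [1, 1], [2, 2]]   # same as shot_a with one extra frame
shot_c = [[0, 0], [9, 9], [2, 2]]           # middle frame differs
print(sequence_similarity(shot_a, shot_b))
print(sequence_similarity(shot_a, shot_c))
```

Because matching is decided on continuous similarities rather than on equality of symbols, the feature vectors never have to be quantized into an alphabet. A matrix of such scores could be passed to an SVM implementation that accepts precomputed kernels, subject to the caveat about positive semi-definiteness above.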
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bailer, W. (2011). A Feature Sequence Kernel for Video Concept Classification. In: Lee, KT., Tsai, WH., Liao, HY.M., Chen, T., Hsieh, JW., Tseng, CC. (eds) Advances in Multimedia Modeling. MMM 2011. Lecture Notes in Computer Science, vol 6523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17832-0_34
DOI: https://doi.org/10.1007/978-3-642-17832-0_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17831-3
Online ISBN: 978-3-642-17832-0
eBook Packages: Computer Science (R0)