Multiview fusion for activity recognition using deep neural networks
Rahul Kavi, Vinod Kulathumani, Fnu Rohit, Vlad Kecojevic
Abstract
Convolutional neural networks (ConvNets) coupled with long short-term memory (LSTM) networks have recently been shown to be effective for video classification, as they combine the automatic feature extraction capabilities of a neural network with additional memory in the temporal domain. This paper shows how multiview fusion can be applied to such a ConvNet-LSTM architecture, and two different fusion techniques are presented. The system is first evaluated in the context of a driver activity recognition system using data collected in a multicamera driving simulator. These results show a significant improvement in accuracy with multiview fusion and also show that deep learning outperforms a traditional approach based on spatiotemporal features, without requiring any background subtraction. The system is further validated on a publicly available multiview action recognition dataset with 12 action classes and 8 camera views.
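To make the pipeline described above concrete, the following is a minimal sketch of a per-view ConvNet + LSTM classifier with score-level fusion across camera views. It is not the authors' implementation: the use of PyTorch, the layer sizes, the averaging fusion rule, and names such as ConvNetLSTM and fuse_views are illustrative assumptions.

```python
# Sketch (assumed PyTorch): a small ConvNet extracts per-frame features,
# an LSTM aggregates them over time, and per-view class scores are fused
# by averaging (one possible late-fusion rule; the paper evaluates two).
import torch
import torch.nn as nn

class ConvNetLSTM(nn.Module):
    """Per-view model: frame-level ConvNet features -> LSTM -> class scores."""
    def __init__(self, num_classes, feat_dim=128, hidden_dim=256):
        super().__init__()
        self.convnet = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),             # -> (B*T, 64, 1, 1)
        )
        self.proj = nn.Linear(64, feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, clip):                     # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        frames = clip.flatten(0, 1)              # (B*T, 3, H, W)
        feats = self.convnet(frames).flatten(1)  # (B*T, 64)
        feats = self.proj(feats).view(b, t, -1)  # (B, T, feat_dim)
        out, _ = self.lstm(feats)                # (B, T, hidden_dim)
        return self.classifier(out[:, -1])       # scores from the last time step


def fuse_views(per_view_scores):
    """Score-level fusion: average class scores over all camera views."""
    return torch.stack(per_view_scores, dim=0).mean(dim=0)


if __name__ == "__main__":
    num_views, num_classes = 8, 12               # e.g., an 8-view, 12-class dataset
    model = ConvNetLSTM(num_classes)
    # Dummy clips: batch of 2, 16 frames, 64x64 RGB, one tensor per view.
    clips = [torch.randn(2, 16, 3, 64, 64) for _ in range(num_views)]
    fused = fuse_views([model(c) for c in clips])
    print(fused.argmax(dim=1))                   # predicted class per batch item
```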
© 2016 SPIE and IS&T 1017-9909/2016/$25.00
Rahul Kavi, Vinod Kulathumani, Fnu Rohit, and Vlad Kecojevic "Multiview fusion for activity recognition using deep neural networks," Journal of Electronic Imaging 25(4), 043010 (18 July 2016). https://doi.org/10.1117/1.JEI.25.4.043010
Published: 18 July 2016
CITATIONS
Cited by 27 scholarly publications and 1 patent.
KEYWORDS
Cameras, Neural networks, Data fusion, Convolution, Video, Imaging systems, Image fusion
