Elsevier

Pattern Recognition

Volume 36, Issue 3, March 2003, Pages 585-601

Recent developments in human motion analysis

https://doi.org/10.1016/S0031-3203(02)00100-0

Abstract

Visual analysis of human motion is currently one of the most active research topics in computer vision. This strong interest is driven by a wide spectrum of promising applications in many areas such as virtual reality, smart surveillance, perceptual interface, etc. Human motion analysis concerns the detection, tracking and recognition of people, and more generally, the understanding of human behaviors, from image sequences involving humans. This paper provides a comprehensive survey of research on computer-vision-based human motion analysis. The emphasis is on three major issues involved in a general human motion analysis system, namely human detection, tracking and activity understanding. Various methods for each issue are discussed in order to examine the state of the art. Finally, some research challenges and future directions are discussed.

Introduction

As one of the most active research areas in computer vision, visual analysis of human motion attempts to detect, track and identify people, and more generally, to interpret human behaviors, from image sequences involving humans. Human motion analysis has attracted great interest from computer vision researchers due to its promising applications in many areas such as visual surveillance, perceptual user interfaces, content-based image storage and retrieval, video conferencing, athletic performance analysis, virtual reality, etc.

Human motion analysis has been investigated under several large research projects worldwide. For example, the Defense Advanced Research Projects Agency (DARPA) funded a multi-institution project on Video Surveillance and Monitoring (VSAM) [1], whose purpose was to develop automatic video understanding technology that would enable a single human operator to monitor activities over complex areas such as battlefields and civilian scenes. The real-time visual surveillance system W4 [2] combined shape analysis and tracking, and built appearance models of people, which enabled it to detect and track multiple people and monitor their activities in an outdoor environment, even in the presence of occlusions. Researchers in the UK have also done much research on tracking vehicles and people and recognizing their interactions [3]. In addition, companies such as IBM and Microsoft are also investing in research on human motion analysis [4], [5].

In recent years, human motion analysis has been featured in a number of leading international journals such as International Journal of Computer Vision (IJCV), Computer Vision and Image Understanding (CVIU), IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), and Image and Vision Computing (IVC), as well as prestigious international conferences and workshops such as International Conference on Computer Vision (ICCV), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), European Conference on Computer Vision (ECCV), Workshop on Applications of Computer Vision (WACV), and IEEE International Workshop on Visual Surveillance (IWVS).

All the above activities have demonstrated a great and growing interest in human motion analysis from the pattern recognition and computer vision community. The primary purpose of this paper is thus to review the recent developments in this exciting research area, especially the progress since previous such reviews.

Human motion analysis has a wide range of potential applications, such as smart surveillance, advanced user interfaces, and motion-based diagnosis [6].

The strong need for smart surveillance systems [7], [8] stems from security-sensitive areas such as banks, department stores, parking lots, and borders. Surveillance cameras are already prevalent in commercial establishments, but camera outputs are usually recorded on tapes or stored in video archives. These video data are currently used only “after the fact” as a forensic tool, losing their primary benefit as an active, real-time medium. What is needed is real-time analysis of surveillance data to alert security officers to a burglary in progress, or to a suspicious individual wandering around a parking lot. Research on tracking and recognition of faces [9], [10], [11], [12] and gait [13], [14], [15], [16] is now strongly motivated by access control. Beyond the obvious security applications, smart surveillance has also been proposed to measure traffic flow, monitor pedestrian congestion in public spaces [17], [18], compile consumer demographics in shopping malls, etc.

Another important application domain is advanced user interfaces, in which human motion analysis is usually used to provide control and command. Generally speaking, communication among people is mainly realized by speech, so speech understanding was already widely used in early human–machine interfaces. However, it is limited by environmental noise and distance. Vision can complement speech recognition and natural language understanding to enable more natural and intelligent communication between humans and machines, because gestures, body poses, facial expressions, etc., provide more detailed cues [19], [20], [21], [22]. Hence, future machines must be able to sense the surrounding environment independently, e.g., detecting human presence and interpreting human behavior. Other applications in the user interface domain include sign-language translation, gesture-driven controls, and signaling in high-noise environments such as factories and airports [23].

For the analysis and training of athletic performance, it is particularly useful to segment the various body parts of a person in an image, track the movement of joints over an image sequence, and recover the underlying 3-D body structure. With the development of digital libraries, interpreting video sequences automatically through content-based indexing will save tremendous human effort in sorting and retrieving images or video in huge databases. Traditional gait analysis [24], [25], [26] aims at providing medical diagnosis and treatment support, while human gait can also be used as a new biometric feature for personal identification [13], [14], [15], [16]. Other applications of vision-based motion analysis include personalized training systems for various sports, medical diagnostics of orthopedic patients, choreography of dance and ballet, etc.

In addition, human motion analysis is important in other related areas. For instance, typical applications in virtual reality include chat-rooms, games, virtual studios, character animation, teleconferencing, etc. Computer games [27], in particular, have become very prevalent in entertainment. People may be surprised at the realism of virtual humans and simulated actions in computer games. In fact, this realism benefits greatly from computer graphics work on devising realistic models of human bodies and synthesizing human movement, based on knowledge of body model acquisition, body pose recovery, human behavior analysis, etc. Also, model-based image coding (e.g., encoding the pose of the tracked face in more detail than the uninteresting background in a videophone setting) can yield very low bit-rate video compression for more effective image storage and transmission.

The importance and popularity of human motion analysis has led to several previous surveys. Each such survey is discussed in the following in order to put the current review in context.

The earliest relevant review was probably that of Aggarwal et al. [28]. It covered various methods used in articulated and elastic non-rigid motion prior to 1994. For articulated motion, approaches with and without a priori shape models were described.

Cedras and Shah [29] presented an overview of methods for motion extraction prior to 1995, in which human motion analysis was discussed in terms of action recognition, recognition of body parts, and body configuration estimation.

Aggarwal and Cai gave another survey of human motion analysis [30], which covered work prior to 1997. Their latest review [31], covering 69 publications, was an extension of their workshop paper [30] and provided an overview of the various tasks involved in motion analysis of the human body prior to 1998. It focused on three major areas related to interpreting human motion: (a) motion analysis involving human body parts, (b) tracking moving humans from a single view or multiple camera perspectives, and (c) recognizing human activities from image sequences.

A similar survey by Gavrila [6] described the work in human motion analysis prior to 1998. Its emphasis was on discussing various methodologies that were grouped into 2-D approaches with or without explicit shape models and 3-D approaches. It concluded with two main future directions in 3-D tracking and action recognition.

Recently, Pentland [32] reviewed the state of the art of “looking at people”, centering on person identification, surveillance/monitoring, 3-D methods, and smart rooms/perceptual user interfaces. The paper was not intended to survey current work on human motion analysis, but it touched on several interesting topics in human motion analysis and its applications.

The latest survey of computer-vision-based human motion capture was presented by Moeslund and Granum [33]. Its focus was on a general overview based on the taxonomy of system functionalities, viz., initialization, tracking, pose estimation and recognition. It covered the achievements from 1980 into the first half of 2000. In addition, a number of general assumptions used in this research field were identified and suggestions for future research directions were offered.

The growing interest in human motion analysis has led to significant progress in recent years, especially on high-level vision issues such as human activity and behavior understanding. This paper provides a comprehensive survey of work on human motion analysis from 1989 onwards; approximately 70% of the references discussed were published after 1996. In contrast to the previous reviews, the current review focuses on the most recent developments, especially on intermediate-level and high-level vision issues.

To discuss the topic conveniently, surveys usually select different taxonomies for grouping individual papers, depending on their purposes. Unlike previous reviews, we give a more general overview of the overall process of a human motion analysis system, shown in Fig. 1. Three major tasks in this process (namely human detection, human tracking and human behavior understanding) are of particular concern. Although they overlap somewhat (e.g., motion detection is used during tracking), this general classification provides a good framework for discussion throughout this survey.

Most past work in human motion analysis has concentrated on tracking and action recognition. As in earlier reviews, we introduce both processes in detail. We also review relevant work on motion segmentation, used in human detection, and on semantic description of behaviors, used in human activity interpretation. Compared with previous reviews, we include more comprehensive discussions of research challenges and future open directions in vision-based human motion analysis.

Instead of detailed summaries of individual publications, our emphasis is on discussing various methods for different tasks involved in a general human motion analysis system. Each issue will be accordingly divided into sub-processes or categories of various methods to examine the state of the art, and only the principles of each group of methods are described in this paper.

Unlike previous reviews, this paper is clearly organized in a hierarchical manner from low-level vision, intermediate-level vision to high-level vision according to the general framework of human motion analysis. This, we believe, will help the readers, especially newcomers to this area, not only to obtain an understanding of the state of the art in human motion analysis but also to appreciate the major components of a general human motion analysis system and the inter-component links.

In summary, the primary purpose and contributions of this paper are as follows (when compared with the existing survey papers on human motion analysis):


(1) This paper aims to provide a comprehensive survey of the most recent developments in vision-based human motion analysis. It covers the latest research ranging mainly from 1997 to 2001. It thus contains many new references not found in previous surveys.

(2) Unlike previous reviews, this paper is organized in a hierarchical manner (from low-level vision, intermediate-level vision, to high-level vision) according to a general framework of human motion analysis systems.

(3) Unlike other reviews, this paper selects a taxonomy based on functionalities including detection, tracking and behavior understanding within human motion analysis systems.

(4) This paper focuses more on overall methods and general characteristics involved in the above three issues (functionalities), so each issue is accordingly divided into sub-processes and categories of approaches so as to provide more detailed discussions.

(5) In contrast to past surveys, we provide a detailed introduction to motion segmentation and object classification (an important basis for human motion analysis systems) and to semantic description of behaviors (an interesting direction that has recently received increasing attention).

(6) We also provide more detailed discussions of research challenges and future research directions in human motion analysis than earlier reviews.


The remainder of this paper is organized as follows. Section 2 reviews the work on human detection including motion segmentation and moving object classification. Section 3 covers human tracking, which is divided into four categories of methods: model-based, region-based, active-contour-based and feature-based. The paper then extends the discussion to the recognition and description of human activities in image sequences in Section 4. Section 5 analyzes some challenges and presents some possible directions for future research at length. Section 6 concludes this paper.

Detection

Nearly every system of vision-based human motion analysis starts with human detection. Human detection aims at segmenting regions corresponding to people from the rest of an image. It is a significant issue in a human motion analysis system since the subsequent processes such as tracking and action recognition are greatly dependent on it. This process usually involves motion segmentation and object classification.
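As a rough illustration of the motion-segmentation step, the sketch below performs background subtraction against an exponentially updated running-average background, a simpler relative of the adaptive background models listed in the references (e.g., Stauffer and Grimson's adaptive mixture models). It assumes grayscale frames given as nested lists of intensities; the function names and the `alpha` and `threshold` values are hypothetical illustration choices, not parameters taken from this survey.

```python
from typing import List

Frame = List[List[float]]  # grayscale image as rows of pixel intensities (assumed format)
Mask = List[List[bool]]


def foreground_mask(background: Frame, frame: Frame, threshold: float = 25.0) -> Mask:
    """Label a pixel as moving foreground if it differs enough from the background."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Running average B <- (1 - alpha) * B + alpha * I, so slow lighting changes are absorbed."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]
```

In a full system the mask would typically be cleaned (e.g., morphologically) and grouped into connected regions before an object-classification step decides which regions are people. Note that a blind running average also slowly absorbs a person who stands still, which is one motivation for richer adaptive models.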

Tracking

Object tracking in video streams has been a popular topic in computer vision. Tracking is a particularly important issue in human motion analysis since it prepares data for pose estimation and action recognition. In contrast to human detection, human tracking is a higher-level computer vision problem. However, tracking algorithms for human motion analysis usually overlap considerably with motion segmentation during processing.
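To make the link between segmentation and tracking concrete, here is a minimal sketch that assumes each segmented person is summarized by the centroid of its foreground region. Detections in a new frame are greedily matched to existing tracks by nearest centroid distance within a gating radius. The class `CentroidTracker` and its `max_dist` parameter are hypothetical; this is a simple region-based association scheme rather than any specific tracker from the surveyed literature, and it ignores occlusion, which real systems such as W4 must handle.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class CentroidTracker:
    """Hypothetical region-based tracker: one centroid per segmented person."""

    def __init__(self, max_dist: float = 20.0) -> None:
        self.max_dist = max_dist            # gating radius in pixels (assumed value)
        self.tracks: Dict[int, Point] = {}
        self._next_id = 0                   # track IDs are never reused

    def update(self, detections: List[Point]) -> Dict[int, Point]:
        # All (distance, track id, detection index) pairs, closest first.
        pairs = sorted(
            (math.dist(p, d), tid, j)
            for tid, p in self.tracks.items()
            for j, d in enumerate(detections)
        )
        matched: Dict[int, Point] = {}
        used = set()
        for dist, tid, j in pairs:
            if dist > self.max_dist:
                break                       # remaining pairs are even farther apart
            if tid in matched or j in used:
                continue                    # track or detection already assigned
            matched[tid] = detections[j]
            used.add(j)
        # Unmatched detections start new tracks; unmatched tracks are dropped.
        for j, d in enumerate(detections):
            if j not in used:
                matched[self._next_id] = d
                self._next_id += 1
        self.tracks = matched
        return dict(self.tracks)
```

Greedy matching keeps the sketch short; an optimal assignment (e.g., the Hungarian method) or a motion-prediction step would be natural refinements.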

Behavior understanding

After the moving humans have been successfully tracked from frame to frame in an image sequence, the problem of understanding human behaviors follows naturally. Behavior understanding involves action recognition and description. As a final or long-term goal, human behavior understanding can guide the development of many human motion analysis systems. In our opinion, it will be the most important area of future research in human motion analysis.
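One common way to recognize an action is to compare an observed feature sequence against labeled templates while tolerating differences in speed, which is what dynamic time warping (DTW) does. The sketch below is an illustrative assumption, not a method described in this section: each action is reduced to a hypothetical 1-D feature per frame (e.g., a joint angle), and an unknown sequence receives the label of the nearest template under DTW distance. The function names and template data are made up for illustration.

```python
import math
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest alignment: repeat a frame of b, repeat a frame of a, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify_action(sequence: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Nearest-template classification under DTW distance."""
    return min(templates, key=lambda label: dtw_distance(sequence, templates[label]))
```

For example, a slowly performed "raise arm" sequence such as `[0, 0, 1, 2, 2, 3, 4, 4]` aligns with the template `[0, 1, 2, 3, 4]` at zero cost, because DTW lets one template frame match several observed frames.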

Behavior understanding is to

Discussions

Although a large amount of work has been done in human motion analysis, many issues are still open and deserve further research, especially in the following areas:

(1) Segmentation. Fast and accurate motion segmentation is a significant but difficult problem. Images captured in dynamic environments are often affected by many factors such as weather, lighting, clutter, shadows, occlusion, and even camera motion. Taking shadows as an example, they may either be in contact with the

Conclusions

Computer-vision-based human motion analysis has become an active research area. It is strongly driven by many promising applications such as smart surveillance, virtual reality, advanced user interfaces, etc. Recent technical developments have demonstrated that visual systems can successfully deal with complex human movements. It is exciting to see many researchers gradually carrying their achievements into more intelligent practical applications.

Bearing in mind a general processing

Acknowledgements

The authors would like to thank H. Z. Ning and the referee for their valuable suggestions. This work is supported in part by NSFC (Grant No. 69825105 and 60105002), and the Institute of Automation (Grant No. 1M01J02), Chinese Academy of Sciences.

References (163)

  • J. Badenas et al., Motion-based segmentation and region tracking in image sequences, Pattern Recognition (2001).
  • A. Baumberg et al., Generating spatio-temporal models from examples, Image Vision Comput. (1996).
  • R.T. Collins et al., A system for video surveillance and monitoring: VSAM final report, CMU-RI-TR-00-12, Technical...
  • I. Haritaoglu et al., W4: real-time surveillance of people and their activities, IEEE Trans. Pattern Anal. Mach. Intell. (2000).
  • C. Maggioni et al., Gesture computer: history, design, and applications.
  • W. Freeman, C. Weissman, Television control by hand gestures, Proceedings of the International Conference on Automatic...
  • R.T. Collins et al., Introduction to the special section on video surveillance, IEEE Trans. Pattern Anal. Mach. Intell. (2000).
  • S. Maybank et al., Introduction to special section on visual surveillance, Int. J. Comput. Vision (2000).
  • J. Steffens, E. Elagin, H. Neven, Person Spotter-fast and robust system for human detection, tracking and recognition,...
  • J. Yang, A. Waibel, A real-time face tracker, Proceedings of the IEEE CS Workshop on Applications of Computer Vision,...
  • B. Moghaddam, W. Wahid, A. Pentland, Beyond eigenfaces: probabilistic matching for face recognition, Proceedings of the...
  • C. Wang, M.S. Brandstein, A hybrid real-time face tracking system, Proceedings of the International Conference on...
  • J.J. Little et al., Recognizing people by their gait: the shape of motion, J. Comput. Vision Res. (1998).
  • J.D. Shutler, M.S. Nixon, C.J. Harris, Statistical gait recognition via velocity moments, Proceedings of the IEE...
  • P.S. Huang et al., Human gait recognition in canonical space using temporal templates, Proc. IEE Vision Image Signal Process. (1999).
  • D. Cunado, M.S. Nixon, J.N. Carter, Automatic gait recognition via model-based evidence gathering, Proceedings of the...
  • B.A. Boghossian et al., Image processing system for pedestrian monitoring using neural classification of normal motion patterns, Meas. Control (1999).
  • B.A. Boghossian, S.A. Velastin, Motion-based machine vision techniques for the management of large crowds, Proceedings...
  • Y. Li, S. Ma, H. Lu, Human posture recognition using multi-scale morphological method and Kalman motion...
  • J. Segen, S. Kumar, Shadow gestures: 3D hand pose estimation using a single camera, Proceedings of the IEEE CS...
  • M-H. Yang, N. Ahuja, Recognizing hand gesture using motion trajectories, Proceedings of the IEEE CS Conference on...
  • Y. Cui, J.J. Weng, Hand segmentation using learning-based prediction and verification for hand sign recognition,...
  • M. Turk, Visual interaction with lifelike characters, Proceedings of the IEEE International Conference on Automatic...
  • H.M. Lakany, G.M. Haycs, M. Hazlewood, S.J. Hillman, Human walking: tracking and analysis, Proceedings of the IEE...
  • M. Köhle, D. Merkl, J. Kastner, Clinical gait analysis by neural networks: issues and experiences, Proceedings of the...
  • D. Meyer, J. Denzler, H. Niemann, Model based extraction of articulated objects in image sequences for gait...
  • W. Freeman et al., Computer vision for computer games, Proceedings of the International Conference on Automatic Face...
  • J.K. Aggarwal, Q. Cai, W. Liao, B. Sabata, Articulated and elastic non-rigid motion: a review, Proceedings of the IEEE...
  • J.K. Aggarwal, Q. Cai, Human motion analysis: a review, Proceedings of the IEEE Workshop on Motion of Non-Rigid and...
  • A. Pentland, Looking at people: sensing for ubiquitous and wearable computing, IEEE Trans. Pattern Anal. Mach. Intell. (2000).
  • K.P. Karmann, A. Brandt, Moving object recognition using an adaptive background memory, in: V. Cappellini (Ed.),...
  • M. Kilger, A shadow handler in a video-based real-time traffic monitoring system, Proceedings of the IEEE Workshop on...
  • Y.H. Yang et al., The background primal sketch: an approach for tracking moving objects, Mach. Vision Appl. (1992).
  • C.R. Wren et al., Pfinder: real-time tracking of the human body, IEEE Trans. Pattern Anal. Mach. Intell. (1997).
  • C. Stauffer, W. Grimson, Adaptive background mixture models for real-time tracking, Proceedings of the IEEE CS...
  • S. Arseneau, J.R. Cooperstock, Real-time image segmentation for action recognition, Proceedings of the IEEE Pacific Rim...
  • H.Z. Sun, T. Feng, T.N. Tan, Robust extraction of moving objects from image sequences, Proceedings of the Fourth Asian...
  • A. Elgammal, D. Harwood, L.S. David, Nonparametric background model for background subtraction, Proceedings of the...
  • A.J. Lipton, H. Fujiyoshi, R.S. Patil, Moving target classification and tracking from real-time video, Proceedings of...
  • C. Anderson, P. Bert, G. Vander Wal, Change detection and tracking using pyramids transformation techniques,...

    About the Author—LIANG WANG received his B.Sc. (1997) and M.Sc. (2000) in the Department of Electronics Engineering and Information Science from Anhui University, China. He is currently a Ph.D. candidate in the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. He has published more than six papers in major national journals and international conferences. His main research interests include computer vision, pattern recognition, digital image processing and analysis, multimedia, visual surveillance, etc.

    About the Author—WEIMING HU received his Ph.D. Degree from the Department of Computer Science and Engineering, Zhejiang University, China. From April 1998 to March 2000, he worked as a Postdoctoral Research Fellow at the Institute of Computer Science and Technology, Founder Research and Design Center, Peking University. Since April 2000, he has worked at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, as an Associate Professor. His research interests are in visual surveillance and monitoring of dynamic scenes, neural networks, 3-D computer graphics, physical design of ICs, and map publishing systems. He has published more than 20 papers in major national journals, such as Science in China, Chinese Journal of Computers, Chinese Journal of Software, and Chinese Journal of Semiconductors.

    About the Author—TIENIU TAN received his B.Sc. (1984) in electronic engineering from Xi'an Jiaotong University, China, and M.Sc. (1986), DIC (1986) and Ph.D. (1989) in Electronic Engineering from Imperial College of Science, Technology and Medicine, London, UK. In October 1989, he joined the Computational Vision Group at the Department of Computer Science, The University of Reading, England, where he worked as Research Fellow, Senior Research Fellow and Lecturer. In January 1998, he returned to China to join the National Laboratory of Pattern Recognition, the Institute of Automation of the Chinese Academy of Sciences, Beijing, China. He is currently Professor and Director of the National Laboratory of Pattern Recognition as well as President of the Institute of Automation. Dr. Tan has published widely on image processing, computer vision and pattern recognition. He is a Senior Member of the IEEE and was an elected member of the Executive Committee of the British Machine Vision Association and Society for Pattern Recognition (1996–1997). He serves as a referee for many major national and international journals and conferences. He is an Associate Editor of the International Journal of Pattern Recognition, the Asia Editor of the International Journal of Image and Vision Computing and is a founding co-chair of the IEEE International Workshop on Visual Surveillance. His current research interests include speech and image processing, machine and computer vision, pattern recognition, multimedia, and robotics.
