research-article

Spatiotemporal Models for Data-Anomaly Detection in Dynamic Environmental Monitoring Campaigns

Published: 01 August 2011

Abstract

The ecological sciences have benefited greatly from recent advances in wireless sensor technologies. These technologies allow researchers to deploy networks of automated sensors, which can monitor a landscape at very fine temporal and spatial scales. However, these networks are subject to harsh conditions, which lead to malfunctions in individual sensors and failures in network communications. The resulting data streams often exhibit incorrect data measurements and missing values. Identifying and correcting these is time-consuming and error-prone. We present a method for real-time automated data quality control (QC) that exploits the spatial and temporal correlations in the data to distinguish sensor failures from valid observations. The model adapts to each deployment site by learning a Bayesian network structure that captures spatial relationships between sensors, and it extends the structure to a dynamic Bayesian network to incorporate temporal correlations. This model is able to flag faulty observations and predict the true values of the missing or corrupt readings. The performance of the model is evaluated on data collected by the SensorScope Project. The results show that the spatiotemporal model demonstrates clear advantages over models that include only temporal or only spatial correlations, and that the model is capable of accurately imputing corrupted values.
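The flag-and-impute idea in the abstract can be illustrated with a deliberately small sketch. It is not the authors' model: instead of a learned dynamic Bayesian network, it assumes a single hidden sensor state (working or faulty), a prediction formed as a fixed blend of the sensor's previous value and the mean of its neighbors, and Gaussian observation noise that is narrow for a working sensor and broad for a faulty one. Every name and parameter value here (`predict_from_context`, `posterior_faulty`, `working_std`, `faulty_std`, `prior_faulty`, `temporal_weight`) is a hypothetical choice made for illustration.

```python
import math


def log_gaussian(x, mean, std):
    """Log density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def predict_from_context(previous_value, neighbor_values, temporal_weight=0.5):
    """Blend the temporal cue (previous reading) with the spatial cue (neighbor mean).

    Assumption: a fixed linear blend stands in for the learned network.
    """
    spatial = sum(neighbor_values) / len(neighbor_values)
    return temporal_weight * previous_value + (1.0 - temporal_weight) * spatial


def posterior_faulty(observed, predicted, working_std=0.5, faulty_std=10.0,
                     prior_faulty=0.05):
    """Posterior probability that the sensor is faulty, by Bayes' rule.

    Computed in log space with a numerically stable sigmoid so that
    extreme readings do not cause underflow or division by zero.
    """
    log_working = math.log(1.0 - prior_faulty) + log_gaussian(observed, predicted, working_std)
    log_faulty = math.log(prior_faulty) + log_gaussian(observed, predicted, faulty_std)
    d = log_faulty - log_working
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def quality_control(observed, previous_value, neighbor_values, threshold=0.5):
    """Flag a reading and, if flagged, replace it with the predicted value."""
    predicted = predict_from_context(previous_value, neighbor_values)
    p_faulty = posterior_faulty(observed, predicted)
    flagged = p_faulty > threshold
    cleaned = predicted if flagged else observed
    return flagged, cleaned, p_faulty


if __name__ == "__main__":
    # A plausible reading and an implausible spike, same context.
    print(quality_control(12.1, 12.0, [11.8, 12.3]))
    print(quality_control(35.0, 12.0, [11.8, 12.3]))
```

In this sketch a reading close to what its temporal and spatial context suggests keeps a low fault posterior, while a large spike is flagged and replaced by the prediction. The paper's model learns these relationships per site rather than fixing them by hand.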

                Reviews

                Esfandiar Haghverdi

                Portable sensor stations allow for the transport of equipment to sites of interest and make it possible to observe ecological phenomena at any desired spatial granularity. In addition, these networks operate at fine time resolution, thus generating a huge amount of data in need of automated data cleaning analysis (for example, remedial actions in case of sensor malfunction or outage). In this paper, the authors consider one such network, the SensorScope Station at the École Polytechnique Fédérale de Lausanne in Switzerland, and in particular focus on air temperature data. The proposed data cleaning is based on a machine learning approach. Moreover, an adaptive quality control system is proposed that exploits both spatial and temporal relationships among multiple sensors in a site. After section 1's brief introduction to data analysis problems and challenges in multi-sensor networks, the authors provide a detailed introduction to the SensorScope system and data in sections 2 and 3. This is followed by a discussion of the hybrid Bayesian networks used to model the air temperature data. Such models are very appropriate for the task, as they contain both continuous and discrete variables. Despite the power of such models, they are static and cannot represent the dynamics of transitions from one time slice to another. To incorporate the temporal aspects, the Bayesian model is augmented by adding Markovian lag variables for each true temperature variable, thus changing the model to a dynamic Bayesian network, which is a spatiotemporal model. This is detailed in sections 4 and 5 with great rigor. The learned spatial models are then validated (using a series of leave-one-out prediction tests), and the efficacy of the dynamic Bayesian network model is evaluated in comparison with spatial and temporal quality control models on real data from SensorScope, justifying the pursuit of a spatiotemporal model.

                In addition, section 6 contains the performance evaluation of the model in terms of type I and II errors through noise-injection experiments. Following section 7's thorough discussion of related work, the final section (8) presents concluding remarks and future research directions. There are several interesting points in this paper: the use of hybrid Bayesian networks in modeling the spatial data; the use of dynamic Bayesian networks for complete (spatiotemporal) data modeling; and, perhaps the most important, the formulation of an automated quality control process in the domain of environmental monitoring sensor networks. Even though these models have been successfully applied to air temperature sensor data, they might face unexpected challenges when applied to other kinds of data, for example, wind velocity or soil moisture. However, it is highly likely that they can face these challenges successfully, as their structural flexibility will allow for other types of correlations to be incorporated.

                Online Computing Reviews Service
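                The noise-injection evaluation mentioned in the review can be sketched roughly as follows. This is only an illustration of the protocol (corrupt known-clean data, run a detector, count type I and type II errors), not a reproduction of the paper's experiments. The data are synthetic, the detector is a stand-in that compares each reading with the median of the other sensors at the same time step, and all names and parameter values (`make_clean_readings`, `inject_noise`, `flag_reading`, `error_rates`, the fault rate, offset, and threshold) are assumptions made for this example.

```python
import math
import random
import statistics


def make_clean_readings(num_sensors=5, num_steps=200, seed=0):
    """Synthetic air-temperature-like data: a shared daily cycle plus small sensor noise."""
    rng = random.Random(seed)
    readings = []
    for t in range(num_steps):
        base = 10.0 + 5.0 * math.sin(2.0 * math.pi * t / 48.0)
        readings.append([base + rng.gauss(0.0, 0.2) for _ in range(num_sensors)])
    return readings


def inject_noise(readings, fault_rate=0.05, offset=8.0, seed=1):
    """Corrupt a random subset of readings and record which ones were corrupted."""
    rng = random.Random(seed)
    corrupted = [row[:] for row in readings]
    labels = [[False] * len(row) for row in readings]
    for t, row in enumerate(corrupted):
        for s in range(len(row)):
            if rng.random() < fault_rate:
                row[s] += rng.choice((-offset, offset))
                labels[t][s] = True
    return corrupted, labels


def flag_reading(row, s, threshold=3.0):
    """Stand-in detector: flag sensor s if it is far from the median of the others."""
    others = row[:s] + row[s + 1:]
    return abs(row[s] - statistics.median(others)) > threshold


def error_rates(corrupted, labels, threshold=3.0):
    """Return (type I rate, type II rate): false alarms on clean data, misses on faults."""
    false_pos = false_neg = num_clean = num_faulty = 0
    for row, row_labels in zip(corrupted, labels):
        for s, is_faulty in enumerate(row_labels):
            flagged = flag_reading(row, s, threshold)
            if is_faulty:
                num_faulty += 1
                false_neg += not flagged
            else:
                num_clean += 1
                false_pos += flagged
    type_1 = false_pos / num_clean if num_clean else 0.0
    type_2 = false_neg / num_faulty if num_faulty else 0.0
    return type_1, type_2


if __name__ == "__main__":
    corrupted, labels = inject_noise(make_clean_readings())
    type_1, type_2 = error_rates(corrupted, labels)
    print(f"type I rate: {type_1:.3f}, type II rate: {type_2:.3f}")
```

                Because the injected faults are labeled, both error rates can be computed directly. Swapping a different detector into `flag_reading` would allow a comparison of the kind the review describes between spatial, temporal, and spatiotemporal models.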

                • Published in

                  ACM Transactions on Sensor Networks, Volume 8, Issue 1
                  August 2011
                  247 pages
                  ISSN: 1550-4859
                  EISSN: 1550-4867
                  DOI: 10.1145/1993042

                  Copyright © 2011 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 1 August 2011
                  • Accepted: 1 October 2010
                  • Revised: 1 September 2010
                  • Received: 1 March 2009

                  Qualifiers

                  • research-article
                  • Research
                  • Refereed
