Abstract
The ecological sciences have benefited greatly from recent advances in wireless sensor technologies. These technologies allow researchers to deploy networks of automated sensors that can monitor a landscape at very fine temporal and spatial scales. However, these networks are subject to harsh conditions, which lead to malfunctions in individual sensors and failures in network communications. The resulting data streams often contain incorrect measurements and missing values, and identifying and correcting them is time-consuming and error-prone. We present a method for real-time automated data quality control (QC) that exploits the spatial and temporal correlations in the data to distinguish sensor failures from valid observations. The model adapts to each deployment site by learning a Bayesian network structure that captures spatial relationships between sensors, and it extends that structure to a dynamic Bayesian network to incorporate temporal correlations. The model can flag faulty observations and predict the true values of missing or corrupt readings. Its performance is evaluated on data collected by the SensorScope Project. The results show that the spatiotemporal model has clear advantages over models that include only temporal or only spatial correlations, and that it can accurately impute corrupted values.
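The idea of combining spatial and temporal correlations to flag and impute readings can be illustrated with a deliberately simplified sketch. This is not the dynamic Bayesian network described above: it assumes a single linear-Gaussian predictor with two fixed parents (the sensor's own previous reading and one neighboring sensor's current reading), with no structure learning. The data are synthetic, and the names `fit_linear_gaussian`, `run_qc`, and the threshold `k` are hypothetical choices made only for this example.

```python
import numpy as np


def fit_linear_gaussian(target, parents):
    """Least-squares fit of target ~ parents + bias.

    Returns the weight vector and the residual standard deviation.
    """
    X = np.column_stack([parents, np.ones(len(target))])
    w, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ w
    return w, resid.std(ddof=X.shape[1])


def run_qc(target, neighbor, w, sigma, k=4.0):
    """Flag readings far from the prediction and replace them with it.

    The prediction uses the previous *cleaned* value (temporal parent) and
    the neighbor's current value (spatial parent). Index 0 has no temporal
    parent and is passed through unchanged.
    """
    cleaned = np.asarray(target, dtype=float).copy()
    faulty = np.zeros(len(cleaned), dtype=bool)
    for t in range(1, len(cleaned)):
        pred = w[0] * cleaned[t - 1] + w[1] * neighbor[t] + w[2]
        if np.isnan(target[t]) or abs(target[t] - pred) > k * sigma:
            faulty[t] = True
            cleaned[t] = pred  # impute with the model's prediction
    return faulty, cleaned


# Synthetic data (an assumption for illustration, not SensorScope data):
# two sensors observing the same slow cycle with independent noise.
rng = np.random.default_rng(0)
steps = np.arange(600)
signal = 10.0 + 5.0 * np.sin(2.0 * np.pi * steps / 144.0)
sensor_a = signal + rng.normal(0.0, 0.2, size=steps.size)
sensor_b = signal + 1.0 + rng.normal(0.0, 0.2, size=steps.size)

# Fit on the first 300 steps, which are treated as fault-free.
parents = np.column_stack([sensor_a[:299], sensor_b[1:300]])
w, sigma = fit_linear_gaussian(sensor_a[1:300], parents)

# Inject a stuck-value fault and a missing reading, then run QC.
corrupted = sensor_a.copy()
corrupted[400:405] = 40.0
corrupted[450] = np.nan
faulty, cleaned = run_qc(corrupted, sensor_b, w, sigma)
```

Because each prediction feeds on the previously cleaned value, a run of consecutive faulty readings is bridged by the neighbor's signal instead of propagating the corrupt value forward. A purely temporal predictor would lack that anchor, and a purely spatial one would ignore the sensor's own recent history.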
Index Terms
- Spatiotemporal Models for Data-Anomaly Detection in Dynamic Environmental Monitoring Campaigns