Abstract
Understanding building occupancy is critical to a wide array of applications including natural hazards loss analysis, green building technologies, and population distribution modeling. Because directly monitoring buildings is expensive, scientists also rely on a wide and disparate array of ancillary and open source information, including subject matter expertise, survey data, and remote sensing information. These data are fused using data harmonization methods, a loose collection of formal and informal techniques for combining data to create viable content for building occupancy estimation. In this paper, we add to the current state of the art by introducing the population data tables (PDT), a Bayesian model and informatics system for systematically arranging data and harmonization techniques into a consistent, transparent, knowledge learning framework that retains, in the final estimate, the uncertainty arising from data, expert judgment, and model parameterization. PDT aims to estimate ambient occupancy in units of people/1000 ft² for a number of building types at the national and sub-national level, with the goal of providing global coverage. We present the PDT model, situate the work within the larger community, and report on the progress of this multi-year project.
Notes
For example, the PAGER and Hazus systems (https://www.fema.gov/hazus) estimate building class populations as weighted linear combinations of demographic classes found in sources like the US Census, Dun and Bradstreet data, and the CIA World Factbook data [see, e.g., Table 1 in Jaiswal et al. (2010) and Table 13.2 in the Hazus Version 2 Manual (FEMA 2011)].
The symbol | means “given.” The expression is read as the “probability of θ given the observed information.”
This claim arises from the fact that any local, building-specific data would also inform the global model and, under normal inference methods, should not produce wider ranges.
We assume that if there were fewer than 400 the report would say “nearly 400” instead of “nearly 500”.
Technically, the observational model is a truncated Gaussian with a lower limit of 0 people/1000 ft².
Note that the total effects do not sum to one because the individual interactions between variables are unknown and counted multiple times in the interaction effect indices.
Area can also be treated as a distribution when the actual square footage is uncertain.
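The truncated Gaussian observational model mentioned in the notes above can be illustrated with a minimal sketch. The function names and the example values for the mean and standard deviation are hypothetical and chosen only for illustration; they are not taken from the PDT model itself.

```python
import math


def normal_pdf(z):
    """Standard normal density at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_gaussian_pdf(x, mu, sigma, lower=0.0):
    """Density of a Gaussian(mu, sigma) truncated to [lower, +inf).

    x, mu, sigma and lower share the same units (here, people/1000 ft^2).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x < lower:
        # No probability mass below the truncation point.
        return 0.0
    # Probability mass of the untruncated Gaussian that lies above `lower`;
    # dividing by it renormalizes the density so it integrates to one.
    mass_above = 1.0 - normal_cdf((lower - mu) / sigma)
    return normal_pdf((x - mu) / sigma) / (sigma * mass_above)


if __name__ == "__main__":
    # Hypothetical observation model: mean 2.0 and standard deviation 1.5
    # people/1000 ft^2 (illustrative values only).
    for occupancy in (-1.0, 0.0, 1.0, 2.0, 4.0):
        print(occupancy, truncated_gaussian_pdf(occupancy, mu=2.0, sigma=1.5))
```

Truncating at zero keeps the model from assigning probability to negative occupancy, and the renormalization matters most when the mean is within a standard deviation or two of zero.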
References
Albert I, Donnet S, Guihenneuc-Jouyaux C, Low-Choy S, Mengersen K, Rousseau J (2012) Combining expert opinions in prior elicitation. Bayesian Anal 7(3):503–532
Axhausen K, Zimmermann A, Schönfelder S, Rindsfüser G, Haupt T (2000) Observing the rhythms of daily life: a six-week travel diary. Transportation 29(2):95–124
Badan Pusat Statistik (2010) Household by floor area of dwelling unit and households member size. http://sp2010.bps.go.id/index.php/site/tabel?tid=334&wid=1100000000. Accessed Apr 2014
Beckers Hospital Review (2015) http://www.beckershospitalreview.com/capacity-management/8-statistics-on-hospital-capacity.html. Accessed Nov 2015
Beresovsky V, Burt C, Parsons V, Schenker N, Mutter R (2011) Application of hierarchical Bayesian models with poststratification for small area estimation from complex survey data. Am Stat Assoc Jt Stat Meet Miami, FL
Berger J (2010) Statistical decision theory and Bayesian analysis. Springer, New York
Berlin Metropolitan School (BMS) (2015) metropolitanschool.com/home. Accessed Nov 2015
Bernardo J (2003) Bayesian statistics. In: Viertl R (ed) Encyclopedia of life support systems, probability and statistics. UNESCO, Oxford
Bhaduri B, Bright E, Coleman P, Urban M (2007) LandScan USA: a high-resolution geospatial and temporal modeling approach for population distribution and dynamics. GeoJournal 69(1):103–117
Billari F, Graziani R, Melilli E (2012) Stochastic population forecasts based on conditional expert opinions. J R Stat Soc Ser A (Stat Soc) 175(2):491–511
Bolstad W (2007) Introduction to Bayesian statistics. Wiley, Hoboken
Bryant J, Graham P (2013) Bayesian demographic accounts: subnational population estimation using multiple data sources. Bayesian Anal 8(3):591–622. doi:10.1214/13-BA820
Buckland S, Newman K, Thomas L, Koesters N (2004) State-space models for the dynamics of wild animal populations. Ecol Model 171(1–2):157–175
Cooke R (1991) Opinion and subjective probability in science. Oxford University Press, New York
Dell’Acqua F, Gamba P, Jaiswal K (2013) Spatial aspects of building and population exposure data and their implications for global earthquake exposure modeling. Nat Hazards 68(3):1291–1309
Earle P, Wald D, Jaiswal K, Allen T, Hearne M, Marano K, Hotovec A, Fee J (2009) Prompt assessment of global earthquakes for response (PAGER): a system for rapidly determining the impact of earthquakes worldwide. United States Geological Survey
Eguchi R, Goltz J, Seligson H, Flores P, Blais N, Heaton T, Bortugno E (1997) Real-time loss estimation as an emergency response decision support system: the Early Post-Earthquake Damage Assessment Tool (EPEDAT). Earthq Spectra 13(4):815–833
Elliott M, Little R (2000) A Bayesian approach to combining information from a census, a coverage measurement survey, and demographic analysis. J Am Stat Assoc 95(450):351–362
FEMA (2011) Hazus 2.0 Manual. https://www.fema.gov/media-library/assets/documents/21879. Accessed June 2014
French S (2011) Aggregating expert judgment. Rev Real Acad Cienc Exactas Fis Nat Ser A Mat 105(1):181–206
Furukawa Y, Curless B, Seitz SM, Szeliski R (2009) Reconstructing building interiors from images. In: IEEE 12th international conference on computer vision
Gamba P, Cavalca D, Jaiswal K, Huyck C, Crowley H (2012) The GED4GEM project: development of a global exposure database for the global earthquake model initiative. In: 15th world conference on earthquake engineering, Lisbon, Portugal
Garthwaite P, Kadane J, O’Hagan A (2005) Statistical methods for eliciting probability distributions. J Am Stat Assoc 100(470):680–700
GEM (2014) Global earthquake model. http://www.globalquakemodel.org/. Accessed Apr 2014
Genest C, Weerahandi S, Zidek J (1984) Aggregating opinions through logarithmic pooling. Theor Decis 17(1):61–70
Gonzalez M, Hidalgo H, Barabasi A (2008) Understanding individual human mobility patterns. Nature 453(7196):779–782
Heid I, Kuchenhoff H, Miles J, Kreienbrock L, Wichmann H (2004) Two dimensions of measurement error: classical and Berkson error in residential radon exposure assessment. J Expos Anal Environ Epidemiol 14(5):365–377
Herrmann C, Metzler J (2013) Density estimation in aerial images of large crowds for automatic people counting. In: SPIE proceedings: ISR processing III: image exploitation, Baltimore, MD
Hong T, Lin H-W (2013) Occupant behavior: impact on energy use of private offices. Berkeley National Laboratory and the Green Energy and Environment Laboratories, Industrial Technology Research Institute, Taiwan, ROC
Illinois Department of Public Health (IDPH) (2012) John H. Stroger Hospital of Cook County Profile. http://app.idph.state.il.us/files/BMI/2012%20Hosp%20Profiles/5272.pdf. Accessed Sept 2015
Jaiswal K, Wald D (2008) Creating a global building inventory for earthquake loss assessment and risk management. United States Geological Survey
Jaiswal K, Wald D (2010) Development of a semi-empirical loss model within the USGS Prompt Assessment of Global Earthquakes for Response (PAGER) system. United States Geological Survey
Jaiswal K, Wald D, Earle P, Porter K, Hearne M (2009) Earthquake casualty models within the USGS Prompt Assessment of Global Earthquakes for Response (PAGER) system. In: Second international workshop on disaster casualties, University of Cambridge, UK
Jaiswal K, Wald D, Porter K (2010) A global building inventory for earthquake loss estimation and risk management. Earthq Spectra 26(3):731
Jaiswal K, Wald D, Earle P, Porter K, Hearne M (2011) Earthquake casualty models within the USGS Prompt Assessment of Global Earthquakes for Response (PAGER) system. Human casualties in earthquakes. Springer, Berlin, pp 83–94
Johnston RJ, Pattie CJ (1993) Entropy-maximizing and the iterative proportional fitting procedure. Prof Geogr 45(3):317
Joshi B (2008) Prisons and the rights of detainees: a photo exhibition on prison conditions in Nepal. Office of the High Commissioner for Human Rights in Nepal, Nepal
Kim JR, Muller JP (2002) 3D reconstruction from very high resolution satellite stereo and its application to object identification. In: International society for photogrammetry and remote sensing, symposium on geospatial theory, processing and applications, vol 34(4)
Kolendo A, Frumkin P (2012) Case study: the Art Institute of Chicago and the decision to start building. The Harris School of Public Policy at the University of Chicago
Luo Y, Gavrilova M (2006) 3D building reconstruction from LIDAR data. In: Gavrilova M, Gervasi O, Kumar V et al (eds) Computational science and its applications—ICCSA 2006, vol 3980. Springer, Berlin, pp 431–439
Martani C, Lee D, Robinson P, Britter R, Ratti C (2012) ENERNET: studying the dynamic relationship between building occupancy and energy consumption. Energy Build 47:584–591
Melfi R, Rosenblum B, Nordman B, Christensen K (2011) Measuring building occupancy using existing network infrastructure. In: Proceedings of the 2011 international green computing conference and workshops. IEEE Computer Society, pp 1–8
Meyn S, Surana A, Lin Y, Oggianu S, Narayanan S, Frewen T (2009) A sensor-utility-network method for estimation of occupancy distribution in buildings. In: 48th IEEE conference on decision and control
Ministry of Education Istanbul (MEI) (2015) tayfursokmenio.meb.k12.tr/tema/. Accessed Sept 2015
Morton A (2013) A process model for capturing museum population dynamics mathematics. California State Polytechnic University
Mugglin A, Carlin B (1998) Hierarchical modeling in geographic information systems: population interpolation over incompatible zones. J Agric Biol Environ Stat 3(2):111–130
Mugglin A, Carlin B, Gelfand A (2000) Fully model-based approaches for spatially misaligned data. J Am Stat Assoc 95(451):877
Ng E (2010) Designing high-density cities for social and environmental sustainability. EarthScan, London
Nigerian MDG Information System (NIS) (2015) nmis.mdgs.gov.ng. Accessed Sept 2015
Noulas A, Scellato S, Lambiotte R, Pontil M, Mascolo C (2012) A tale of many cities: universal patterns in human urban mobility. PLoS ONE 7(5):e37027. doi:10.1371/journal.pone.0037027
Phillips L (1999) Group elicitation of probability distributions: Are many heads better than one? In: Shanteau J, Mellors B, Schum D (eds) Decision science and technology: reflections on the contributions of Ward Edwards. Kluwer Academic Publishers, Norwell
Press J (2003) Subjective and objective Bayesian statistics: principles, models, and applications, 2nd edn. Wiley series in probability and statistics. Wiley, New York
Pujol G (2007) Sensitivity package, R package version 1.1
Raftery AE, Li N, Ševčíková H, Gerland P, Heilig GK (2012) Bayesian probabilistic population projections for all countries. Proc Natl Acad Sci 109:13915–13921
Royal London Hospital (RLH) (2015) The Royal London Hospital Quality Report. http://www.cqc.org.uk/sites/default/files/new_reports/AAAC0234.pdf. Accessed Sept 2015
Saltelli A, Tarantola S, Chan K (1999) A quantitative model-independent method for global sensitivity analysis of model output. Technometrics 41(1):39–56
Schlich R, Axhausen K (2003) Habitual travel behaviour: evidence from a six-week travel diary. Transportation 30(1):13–36
Sharpe E, Skeggs T, McNaught S, Saraceno V, Stapley-Brown V (2013) Visitor figures 2013. The art newspaper. Allemandi Publishing, New York
St. Nicholas School (SNS) (2015) stnicholas.com.br/highlights.php. Accessed Sept 2015
Stewart R, White D, Urban M, Morton A, Webster C, Stoyanov M, Bright E, Bhaduri B (2013) Uncertainty quantification techniques for population density estimates derived from sparse open source data. Proc SPIE Geospatial InfoFusion III (refereed) 8747:874705
Stewart R, Piburn J, Weber E, Urban M, Morton A, Thakur G, Bhaduri B (2016) Can social media play a role in developing building occupancy curves? In: Advances in Geocomputation: Geocomputation 2015—The 13th International Conference (in press)
Sutton P, Elvidge C, Obremski T (2003) Building and evaluating models to estimate ambient population density. Photogramm Eng Remote Sens 69(5):545–553
Tan Z, Xi W (2003) Bayesian analysis with consideration of data uncertainty in a specific scenario. Reliab Eng Syst Saf 79(1):17–31
Tehran Streetview (2015) http://map.tehran.ir/streetview/?lang=en. Accessed Oct 2015
Thakur GS, Bhaduri BL, Piburn JO, Sims KM, Stewart RN, Urban ML (2015) PlanetSense: a real-time streaming and spatio-temporal analytics platform for gathering geo-spatial intelligence from open source data. Computers and society. In: ACM SigSpatial conference, Seattle
Trendafiloski G, Wyss M, Rosset P (2011) Loss estimation module in the second generation software QLARM. Human casualties in earthquakes. Springer, Berlin, pp 95–106
United Nations Economic Commission for Europe (2013) Country profiles on housing and land management. ECE/HBP/176, United Nations, Geneva Switzerland
Wald D, Jaiswal K, So E, Gracia D, Marano K, Lin K, Hearne M, Greene M, D’Ayala D, Crowley H, Gamba P, Porter K (2011) The role of PAGER in improving global hazard, building, and loss inventories. Seismological Society of America Annual Meeting, Memphis (TN), Seismological Research Letters
WHE (2014) World housing encyclopedia project. http://www.world-housing.net/. Accessed 19 Nov 2014
Wisse B, Bedford T, Quigley J (2008) Expert judgement combination using moment methods. Reliab Eng Syst Saf 93(5):675–686
Wyss M, Tollis S, Rosset P, Pacchiani F (2013) Approximate model for worldwide building stock in three size categories. Report for world agency of planetary monitoring and earthquake risk reduction, Global Assessment Report on Disaster Risk Reduction, The United Nations Office for Disaster Risk Reduction
Yang DB, Gonzalez-Banos HH, Guibas LJ (2003) Counting people in crowds with a real-time network of simple image sensors. In: Proceedings ninth IEEE international conference on computer vision 2003
Acknowledgments
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
Cite this article
Stewart, R., Urban, M., Duchscherer, S. et al. A Bayesian machine learning model for estimating building occupancy from open source data. Nat Hazards 81, 1929–1956 (2016). https://doi.org/10.1007/s11069-016-2164-9
DOI: https://doi.org/10.1007/s11069-016-2164-9