A Bayesian machine learning model for estimating building occupancy from open source data

  • Original Paper
Natural Hazards

Abstract

Understanding building occupancy is critical to a wide array of applications, including natural hazards loss analysis, green building technologies, and population distribution modeling. Because directly monitoring buildings is expensive, scientists also rely on a wide and disparate array of ancillary and open source information, including subject matter expertise, survey data, and remote sensing information. These data are fused using data harmonization methods, a loose collection of formal and informal techniques for combining data into viable content for building occupancy estimation. In this paper, we add to the current state of the art by introducing the population data tables (PDT), a Bayesian model and informatics system for systematically arranging data and harmonization techniques into a consistent, transparent, knowledge learning framework that retains, in the final estimate, the uncertainty emerging from data, expert judgment, and model parameterization. PDT aims to estimate ambient occupancy in units of people/1000 ft² for a number of building types at the national and sub-national level, with the goal of providing global coverage. We present the PDT model, situate the work within the larger community, and report on the progress of this multi-year project.

Notes

  1. For example, the PAGER and Hazus systems (https://www.fema.gov/hazus) estimate building class populations as weighted linear combinations of demographic classes found in sources like the US Census, Dun and Bradstreet data, and the CIA World Factbook data [see, e.g., Table 1 in Jaiswal et al. (2010) and Table 13.2 in the Hazus Version 2 Manual (FEMA 2011)].

  2. The symbol | means “given.” The expression is read as “the probability of θ given the observed information.”

  3. This claim arises from the fact that any local, building-specific data would also inform the global model and, under normal inference methods, should not produce wider ranges.

  4. We assume that if there were fewer than 400 the report would say “nearly 400” instead of “nearly 500”.

  5. Technically, the observational model is a truncated Gaussian with a lower limit of 0 people/1000 ft².

  6. Note that the total effects do not sum to one because the individual interactions between variables are unknown and counted multiple times in the interaction effect indices.

  7. Area can also be a distribution when the actual square footage is uncertain.
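
As an illustration of Notes 5 and 7, the following is a minimal Python sketch, not the PDT implementation. It assumes a Gaussian truncated below at 0 for occupancy density (people/1000 ft2), sampled by simple rejection, and, purely as an assumption for this example, a second truncated Gaussian for uncertain floor area. The function names and all numeric values are hypothetical and chosen only to show how the two uncertainties propagate into a people count.

```python
import random
import statistics


def sample_truncated_gaussian(mean, sd, rng, lower=0.0):
    """Draw one value from a Gaussian truncated below at `lower` (rejection sampling)."""
    while True:
        x = rng.gauss(mean, sd)
        if x >= lower:
            return x


def sample_occupancy(density_mean, density_sd, area_mean, area_sd, n, seed=0):
    """Monte Carlo draws of the number of people in one building.

    Density is in people/1000 ft^2 and area is in ft^2. Treating area as a
    truncated Gaussian is an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        density = sample_truncated_gaussian(density_mean, density_sd, rng)
        area = sample_truncated_gaussian(area_mean, area_sd, rng)
        draws.append(density * area / 1000.0)
    return draws


# Hypothetical inputs: density 2.0 +/- 1.5 people/1000 ft^2, area 50,000 +/- 5,000 ft^2.
draws = sample_occupancy(2.0, 1.5, 50_000.0, 5_000.0, n=10_000)
print("mean occupancy (people):", round(statistics.mean(draws), 1))
```

Because of the truncation at 0, the mean of the sampled densities sits above the nominal 2.0, so the mean occupancy is larger than the naive product of the two input means. Rejection sampling is adequate here because only a small share of the Gaussian mass lies below 0; a heavily truncated case would call for an inverse-CDF sampler instead.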

References

  • Albert I, Donnet S, Guihenneuc-Jouyaux C, Low-Choy S, Mengersen K, Rousseau J (2012) Combining expert opinions in prior elicitation. Bayesian Anal 7(3):503–532

  • Axhausen K, Zimmermann A, Schönfelder S, Rindsfüser G, Haupt T (2000) Observing the rhythms of daily life: a six-week travel diary. Transportation 29(2):95–124

  • Badan Pusat Statistik (2010) Household by floor area of dwelling unit and households member size at http://sp2010.bps.go.id/index.php/site/tabel?tid=334&wid=1100000000. Accessed Apr 2014

  • Beckers Hospital Review (2015) http://www.beckershospitalreview.com/capacity-management/8-statistics-on-hospital-capacity.html. Accessed Nov 2015

  • Beresovsky V, Burt C, Parsons V, Schenker N, Mutter R (2011) Application of hierarchical Bayesian models with poststratification for small area estimation from complex survey data. Am Stat Assoc Jt Stat Meet Miami, FL

  • Berger J (2010) Statistical decision theory and Bayesian analysis. Springer, New York

  • Berlin Metropolitan School (BMS) (2015) metropolitanschool.com/home. Accessed Nov 2015

  • Bernardo J (2003) Bayesian statistics. In: Viertl R (ed) Encyclopedia of life support systems, probability and statistics. UNESCO, Oxford

  • Bhaduri B, Bright E, Coleman P, Urban M (2007) LandScan USA: a high-resolution geospatial and temporal modeling approach for population distribution and dynamics. GeoJournal 69(1):103–117

  • Billari F, Graziani R, Melilli E (2012) Stochastic population forecasts based on conditional expert opinions. J R Stat Soc Ser A (Stat Soc) 175(2):491–511

  • Bolstad W (2007) Introduction to Bayesian statistics. Wiley, Hoboken

  • Bryant J, Graham P (2013) Bayesian demographic accounts: subnational population estimation using multiple data sources. Bayesian Anal 8(3):591–622. doi:10.1214/13-BA820

  • Buckland S, Newman K, Thomas L, Koesters N (2004) State-space models for the dynamics of wild animal populations. Ecol Model 171(1–2):157–175

  • Cooke R (1991) Opinion and subjective probability in science. Oxford University Press, New York

  • Dell’Acqua F, Gamba P, Jaiswal K (2013) Spatial aspects of building and population exposure data and their implications for global earthquake exposure modeling. Nat Hazards 68(3):1291–1309

  • Earle P, Wald D, Jaiswal K, Allen T, Hearne M, Marano K, Hotovec A, Fee J (2009) Prompt assessment of global earthquakes for response (PAGER): a system for rapidly determining the impact of earthquakes worldwide. United States Geological Survey

  • Eguchi R, Goltz J, Seligson H, Flores P, Blais N, Heaton T, Bortugno E (1997) Real-time loss estimation as an emergency response decision support system: the Early Post-Earthquake Damage Assessment Tool (EPEDAT). Earthq Spectra 13(4):815–833

  • Elliott M, Little R (2000) A Bayesian approach to combining information from a census, a coverage measurement survey, and demographic analysis. J Am Stat Assoc 95(450):351–362

  • FEMA (2011) Hazus 2.0 Manual. https://www.fema.gov/media-library/assets/documents/21879. Accessed June 2014

  • French S (2011) Aggregating expert judgment. Rev Real Acad Cienc Exactas Fis Nat Ser A Mat 105(1):181–206

  • Furukawa Y, Curless B, Seitz SM, Szeliski R (2009) Reconstructing building interiors from images. In: IEEE 12th international conference on computer vision

  • Gamba P, Cavalca D, Jaiswal K, Huyck C, Crowley H (2012) The GED4GEM project: development of a global exposure database for the global earthquake model initiative. In: 15th world conference on earthquake engineering, Lisbon, Portugal

  • Garthwaite P, Kadane J, O’Hagan A (2005) Statistical methods for eliciting probability distributions. J Am Stat Assoc 100(470):680–700

  • GEM (2014) Global earthquake model. http://www.globalquakemodel.org/. Accessed Apr 2014

  • Genest C, Weerahandi S, Zidek J (1984) Aggregating opinions through logarithmic pooling. Theor Decis 17(1):61–70

  • Gonzalez M, Hidalgo H, Barabasi A (2008) Understanding individual human mobility patterns. Nature 453(7196):779–782

  • Heid I, Kuchenhoff H, Miles J, Kreienbrock L, Wichmann H (2004) Two dimensions of measurement error: classical and Berkson error in residential radon exposure assessment. J Expos Anal Environ Epidemiol 14(5):365–377

  • Herrmann C, Metzler J (2013) Density estimation in aerial images of large crowds for automatic people counting. In: SPIE proceedings: ISR processing III: image exploitation, Baltimore, MD

  • Hong T, Lin H-W (2013) Occupant behavior: impact on energy use of private offices. Berkeley National Laboratory and the Green Energy and Environment Laboratories, Industrial Technology Research Institute, Taiwan, ROC

  • Illinois Department of Public Health (IDPH) (2012) John H. Stroger Hospital of Cook County Profile. http://app.idph.state.il.us/files/BMI/2012%20Hosp%20Profiles/5272.pdf. Accessed Sept 2015

  • Jaiswal K, Wald D (2008) Creating a global building inventory for earthquake loss assessment and risk management. United States Geological Survey

  • Jaiswal K, Wald D (2010) Development of a semi-empirical loss model within the USGS Prompt Assessment of Global Earthquakes for Response (PAGER) system. United States Geological Survey

  • Jaiswal K, Wald D, Earle P, Porter K, Hearne M (2009) Earthquake casualty models within the USGS Prompt Assessment of Global Earthquakes for Response (PAGER) system. In: Second international workshop on disaster casualties, University of Cambridge, UK

  • Jaiswal K, Wald D, Porter K (2010) A global building inventory for earthquake loss estimation and risk management. Earthq Spectra 26(3):731

  • Jaiswal K, Wald D, Earle P, Porter K, Hearne M (2011) Earthquake casualty models within the USGS Prompt Assessment of Global Earthquakes for Response (PAGER) system. Human casualties in earthquakes. Springer, Berlin, pp 83–94

  • Johnston RJ, Pattie CJ (1993) Entropy-maximizing and the iterative proportional fitting procedure. Prof Geogr 45(3):317

  • Joshi B (2008) Prisons and the rights of detainees: a photo exhibition on prison conditions in Nepal. Office of the High Commissioner for Human Rights in Nepal, Nepal

  • Kim JR, Muller JP (2002) 3D reconstruction from very high resolution satellite stereo and its application to object identification. In: International society for photogrammetry and remote sensing, symposium on geospatial theory, processing and applications, vol 34(4)

  • Kolendo A, Frumkin P (2012) Case study: the Art Institute of Chicago and the decision to start building. The Harris School of Public Policy at the University of Chicago

  • Luo Y, Gavrilova M (2006) 3D building reconstruction from LIDAR data. In: Gavrilova M, Gervasi O, Kumar V et al (eds) Computational science and its applications—ICCSA 2006, vol 3980. Springer, Berlin, pp 431–439

  • Martani C, Lee D, Robinson P, Britter R, Ratti C (2012) ENERNET: studying the dynamic relationship between building occupancy and energy consumption. Energy Build 47:584–591

  • Melfi R, Rosenblum B, Nordman B, Christensen K (2011) Measuring building occupancy using existing network infrastructure. In: Proceedings of the 2011 international green computing conference and workshops. IEEE Computer Society, pp 1–8

  • Meyn S, Surana A, Lin Y, Oggianu S, Narayanan S, Frewen T (2009) A sensor-utility-network method for estimation of occupancy distribution in buildings. In: 48th IEEE conference on decision and control

  • Ministry of Education Istanbul (MEI) (2015) tayfursokmenio.meb.k12.tr/tema/. Accessed Sept 2015

  • Morton A (2013) A process model for capturing museum population dynamics mathematics. California State Polytechnic University

  • Mugglin A, Carlin B (1998) Hierarchical modeling in geographic information systems: population interpolation over incompatible zones. J Agric Biol Environ Stat 3(2):111–130

  • Mugglin A, Carlin B, Gelfand A (2000) Fully model-based approaches for spatially misaligned data. J Am Stat Assoc 95(451):877

  • Ng E (2010) Designing high-density cities for social and environmental sustainability. EarthScan, London

  • Nigerian MDG Information System (NIS) (2015) nmis.mdgs.gov.ng. Accessed Sept 2015

  • Noulas A, Scellato S, Lambiotte R, Pontil M, Mascolo C (2012) A tale of many cities: universal patterns in human urban mobility. PLoS ONE 7(5):e37027. doi:10.1371/journal.pone.0037027

  • Phillips L (1999) Group elicitation of probability distributions: Are many heads better than one? In: Shanteau J, Mellors B, Schum D (eds) Decision science and technology: reflections on the contributions of Ward Edwards. Kluwer Academic Publishers, Norwell

  • Press J (2003) Subjective and objective Bayesian statistics: principles, models, and applications, 2nd edn. Wiley series in probability and statistics. Wiley, New York

  • Pujol G (2007) Sensitivity package, R package version 1.1

  • Raftery AE, Li N, Ševčíková H, Gerland P, Heilig GK (2012) Bayesian probabilistic population projections for all countries. Proceedings of the National Academy of Sciences 109:13915–13921

  • Royal London Hospital (RLH) (2015) The Royal London Hospital Quality Report. http://www.cqc.org.uk/sites/default/files/new_reports/AAAC0234.pdf. Accessed Sept 2015

  • Saltelli A, Tarantola S, Chan K (1999) A quantitative model-independent method for global sensitivity analysis of model output. Technometrics 41(1):39–56

  • Schlich R, Axhausen K (2003) Habitual travel behaviour: evidence from a six-week travel diary. Transportation 30(1):13–36

  • Sharpe E, Skeggs T, McNaught S, Saraceno V, Stapley-Brown V (2013) Visitor figures 2013. The art newspaper. Allemandi Publishing, New York

  • St. Nicholas School (SNS) (2015). stnicholas.com.br/highlights.php. Accessed Sept 2015

  • Stewart R, White D, Urban M, Morton A, Webster C, Stoyanov M, Bright E, Bhaduri B (2013) Uncertainty quantification techniques for population density estimates derived from sparse open source data. Proc SPIE Geospatial InfoFusion III (refereed) 8747:874705

  • Stewart R, Piburn J, Weber E, Urban M, Morton A, Thakur G, Bhaduri B (2016) Can social media play a role in developing building occupancy curves, Advances in Geocomputation: Geocomputation 2015—The 13th International Conference (in press)

  • Sutton P, Elvidge C, Obremski T (2003) Building and evaluating models to estimate ambient population density. Photogramm Eng Remote Sens 69(5):545–553

  • Tan Z, Xi W (2003) Bayesian analysis with consideration of data uncertainty in a specific scenario. Reliab Eng Syst Saf 79(1):17–31

  • Tehran Streetview (2015). http://map.tehran.ir/streetview/?lang=en. Accessed Oct 2015

  • Thakur GS, Bhaduri BL, Piburn JO, Sims KM, Stewart RN, Urban ML (2015) PlanetSense: a real-time streaming and spatio-temporal analytics platform for gathering geo-spatial intelligence from open source data. Computers and society. In: ACM SigSpatial conference, Seattle

  • Trendafiloski G, Wyss M, Rosset P (2011) Loss estimation module in the second generation software QLARM. Human casualties in earthquakes. Springer, Berlin, pp 95–106

  • United Nations Economic Commission for Europe (2013) Country profiles on housing and land management. ECE/HBP/176, United Nations, Geneva Switzerland 

  • Wald D, Jaiswal K, So E, Gracia D, Marano K, Lin K, Hearne M, Greene M, D’Ayala D, Crowley H, Gamba P, Porter K (2011) The role of PAGER in improving global hazard, building, and loss inventories. Seismological Society of America Annual Meeting, Memphis (TE), Seismological Research Letters

  • WHE (2014) World housing encyclopedia project. http://www.world-housing.net/. Accessed 19 Nov 2014

  • Wisse B, Bedford T, Quigley J (2008) Expert judgement combination using moment methods. Reliab Eng Syst Saf 93(5):675–686

  • Wyss M, Tollis S, Rosset P, Pacchiani F (2013) Approximate model for worldwide building stock in three size categories. Report for world agency of planetary monitoring and earthquake risk reduction, Global Assessment Report on Disaster Risk Reduction, The United Nations Office for Disaster Risk Reduction

  • Yang DB, Gonzalez-Banos HH, Guibas LJ (2003) Counting people in crowds with a real-time network of simple image sensors. In: Proceedings ninth IEEE international conference on computer vision 2003

Acknowledgments

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

Author information

Corresponding author

Correspondence to Robert Stewart.

About this article

Cite this article

Stewart, R., Urban, M., Duchscherer, S. et al. A Bayesian machine learning model for estimating building occupancy from open source data. Nat Hazards 81, 1929–1956 (2016). https://doi.org/10.1007/s11069-016-2164-9

  • DOI: https://doi.org/10.1007/s11069-016-2164-9
