Introduction

Criticisms of green buildings tend to revolve around the widespread marketing use of “greenwashing” – the advancement of unsubstantiated claims of environmental performance that are, at best, ambiguous and, at worst, false. Greenwashing is nothing new, but it is most common in industries and products that have a long or complex production cycle (Laufer 2003). For these products, it is difficult for a consumer to discern the exact nature of all materials used in their production and it is difficult to test claims of performance (Howarth, Haddad, and Paton 2000). For commercial buildings, even when actual energy consumption data is available, accurately comparing relative energy performance is complicated by the variation in physical characteristics, spatial orientation, occupancy and occupant behavior, and installed systems.

The confusion around measuring energy efficiency in buildings has been exacerbated by an over-reliance on single rule-of-thumb metrics, particularly energy use intensity (EUI), and eco-labels as proxies for energy performance. This ambiguity surrounding performance measures creates significant uncertainty for capital markets in assessing the value and risk implications of energy efficient buildings. The literature on the economic effects of green buildings has focused primarily on the price and occupancy effects of eco-labels (see, for example, Eichholtz, Kok, and Quigley 2010; Fuerst and McAllister 2011; Kahn and Kok 2013). Recent work has begun to examine the link between price impacts and energy consumption, but this research has been limited by incomplete data on actual energy consumption at the building level and relevant construction type, locational, and occupancy characteristics (Brounen and Kok 2011; Fuerst, McAllister, van de Wetering, and Wyatt 2011; Fuerst, van de Wetering, and Wyatt 2013).

The recent proliferation of energy disclosure policies in U.S. and global cities has begun to generate significant new streams of data on patterns of energy consumption in buildings (Burr, Keicher, and Lawrence 2013). The logic behind these policies is predicated on the power of measurement and information to shift awareness and market behavior around energy consumption and thus generate greater demand for more energy efficient properties (Kontokosta 2013). The mandatory reporting of energy consumption and other building characteristics allows the marketplace to identify better and worse energy performers based on a comparison of actual energy intensity to that predicted by regression models, a normalization process that controls for observed building characteristics. To date, one of the most significant recognition programs for energy efficiency is the U.S. Environmental Protection Agency’s (EPA) Energy Star rating system for commercial buildings, launched in 1999. Using data from the Commercial Building Energy Consumption Survey (CBECS), this tool normalizes for multiple building characteristics and weather to predict expected energy consumption and produces a percentile rank based on a frequency distribution of energy efficiency for similar space types.

This research seeks to improve methods to measure and model energy performance as distinct from typical energy intensity and efficiency metrics and to develop a multi-factor index of building energy performance. The purpose is to more precisely quantify relative energy performance, a critical first step in developing data-driven market diffusion and deployment strategies, including recognition and reward programs, targeted incentives and regulations, and evidence-based design guidelines, along with enhanced risk mitigation for investors.

Using a robust multivariate regression model, this paper examines the determinants of office building energy intensity across numerous physical, occupancy, and parcel characteristics (Chung et al. 2006; Griffith et al. 2008; Kahn, Kok, and Quigley 2013; Tso and Yau 2007; Turiel 1987; Xu et al. 2013; Zhao and Magoulès 2012). Based on an unprecedented dataset of actual building energy consumption from New York City’s Local Law 84, a predictive model is created to develop a building energy performance index for commercial office buildings. This work tests the significance of modifications to the EPA Energy Star methodology, and explores the implications of a market-specific benchmarking system that models predicted efficiency based on actual building energy consumption patterns in particular cities and markets.

Benchmarking Methods and Energy Performance Models – an Overview

Methods for comparing building energy performance can be divided into four categories (Olofsson, Meier, and Lamberts 2004). As Sartor et al. (2000) describe, these methods can be based on simulation models, point systems, end-use metrics, and regression models. Simulation-based models rely on energy simulation tools, such as DOE-2, to estimate an energy use profile based on specified design and use parameters. While such tools are quite useful in the building design process, differentials between actual and simulated energy consumption patterns limit the functionality of such a method in estimating a particular building’s performance with respect to similar buildings over time (Ahmad and Culp 2006). These ex ante simulation models are limited by post-design changes in operations and building management, tenancy and occupant use patterns, and other variations in climate conditions beyond those modeled.

Points-based models utilize a checklist and scoring system to account for a wide variety of design components and simulated performance characteristics. The most well-known of these systems, the U.S. Green Building Council’s Leadership in Energy and Environmental Design (LEED) rating system, assigns points for incorporating certain sustainable design practices, material selections, and construction methods into the construction and operations of a new or existing building (Gowri 2004). This type of system acts as an important guide for building design and operations and provides a clear and recognizable symbol of a building’s overall quality and environmental impact (Kontokosta 2011). However, these points-based methods provide little quantitative information to compare the relative energy performance of buildings with different point totals (levels of certification) or between those that have been certified and those that have not. For instance, it would not be accurate to claim that a LEED Gold certified building is more energy efficient than a LEED Silver certified building based purely on the certification level, nor that a LEED certified building is, necessarily, more energy efficient than a non-LEED certified building (Newsham, Mancini, and Birt 2009). However, the LEED rating system is regularly modified and updated and may incorporate a greater proportion of performance metrics over time (Todd, Pyke, and Tufts 2013).

End-use metrics focus on the performance of individual systems and provide “rules of thumb” for building operations. The end-use metric approach utilizes simple bivariate relationships to provide an approximation of the operational energy efficiency of specific systems. These metrics are typically used by building and facilities managers as an intuitive guide to approximate the proper functioning of specific equipment. However, since this method does not control for a full range of factors simultaneously, it is of only marginal use in comparing the relative efficiency of two (or more) buildings. Energy use intensity is the most common metric used to describe whole building intensity, and has been criticized because EUI is often misinterpreted as an indicator of energy efficiency. While this metric is straightforward to communicate, it is ineffective as a reliable measure of relative energy performance when nontrivial variations exist in physical, occupancy, and other conditions across buildings.

Statistical or regression-based benchmarking methods provide a straightforward means to control for numerous building characteristics, thus allowing for a more precise identification of suitable control or comparison groups (Monts and Blissett 1982). This benchmarking methodology does, however, necessitate a large sample of building data for the estimates to be statistically significant, a constraint that has limited the usefulness of such methods in the past (Federspiel, Zhang, and Arens 2002). For example, the U.S. EPA Energy Star building rating system utilizes the CBECS sample as the reference data set for its statistical models, although this survey contains only 498 useable observations for office buildings distributed across nine U.S. census regions (U.S. EPA 2007). While CBECS provides a rich picture of a national sample of buildings, the relatively few observations limit estimation techniques at a finer granularity, namely cities, markets, or regions, and can produce unreliable estimates across a heterogeneous sample of buildings.

The efficacy of regression-based benchmarking as a building energy performance measure is contingent upon several elements. First, sufficient data must be available to perform regression analyses and to achieve reasonable model significance. Second, the data available must include a range of building characteristics, with a level of confidence in the accuracy of the data provided. Robust, quality-controlled data are a prerequisite for the efficacy of regression-based benchmarking models. New sources of building energy data, as well as physical, operational, and occupancy characteristics, have emerged that have the potential to improve the understanding of patterns of building energy consumption (City of New York 2012; Kontokosta 2013). Third, the model specification must capture the appropriate nature of the relationship between the dependent and control variables, specifically addressing concerns that arise when using linear models to fit non-linear relationships and issues of collinearity across independent variables. Fourth, model results must be distilled into a common measure that can be easily understood and interpreted by a wide range of stakeholders, including building owners and managers, tenants, and investors and lenders.

Understanding relative energy performance, therefore, necessitates both an appropriate reference group from which to benchmark expected energy consumption and a model that accurately captures and accounts for similar characteristics between buildings to establish appropriate peer groups. Here, a commercial building energy performance measure is created to support operations and decision-making for building owners, investors, tenants, and managers by comparing energy consumption back to a reference group of matched buildings. It is expected that this measure can be used to compare a building’s energy performance against actual data for similar buildings, monitor changes over time, and evaluate a range of energy conservation measures (Barley et al. 2005). It also provides an initial screen to evaluate buildings where the potential for energy improvements may be highest.

Material and Methods

Data Description

The data for this study were collected through New York City’s Local Law 84 (LL84) energy disclosure mandate, which requires annual energy consumption reporting for large buildings. Local Law 84 defines “covered” buildings (buildings subject to the law) as:

  (i) a building that exceeds 50,000 gross square feet,

  (ii) two or more buildings on the same tax lot that together exceed 100,000 gross square feet, or

  (iii) two or more buildings held in the condominium form of ownership that are governed by the same board of managers and that together exceed 100,000 gross square feet (Footnote 1).

In total, LL84 applies to more than 15,000 privately-owned properties across the five boroughs of New York City. These buildings account for approximately 45 % of the City’s total energy consumption and represent more than 2.9 billion square feet of space. For comparison, buildings subject to this law in New York City represent more space than exists in the entire urban center (Core Based Statistical Area) of Chicago, and account for a majority of the space subject to disclosure policies across the country (City of New York 2013).

This analysis includes all buildings for which energy data were submitted by August 1st, 2011, which account for 64 % of covered buildings. The data reflect actual whole-building annual energy consumption for the year 2010. The LL84 database was then geocoded and merged with land use and geographic data at the tax lot level from the Primary Land Use Tax Lot Output (hereafter PLUTO) data file of the New York City Department of City Planning. The PLUTO dataset includes building and tax lot characteristics from files maintained by the Department of City Planning, the Department of Finance, the Department of City-wide Administrative Services, and the Landmarks Preservation Commission. The initial merged database contained 59 variables from PLUTO and 166 variables from the LL84 database as entered through the EPA’s Portfolio Manager tool.

Initially, the database contained information for 10,201 tax lots. However, substantial cleaning was performed to adjust for errors and omissions in data entry by building owners and their representatives through the EPA Portfolio Manager interface. An observation was dropped from the analysis if it exhibited one of the following characteristics:

  • Tax lot located outside of the political boundaries of the five boroughs of New York City

  • Missing designation of Facility Type (primary building use)

  • Missing (or insufficient) energy consumption information

  • Missing building size information

  • Duplicate entries

After removing observations with energy use intensities below the 1st percentile and above the 99th percentile by primary space type to eliminate outliers, the resultant cleaned database contains 8,648 buildings of various use types. Additional measures were taken to ensure the accuracy and validity of the remaining observations, including means testing with a quality-controlled sample of the dataset (New York City 2012). The dropped observations are indicative of a range of data entry errors and reflect, in some cases, uncertainty in how to report non-typical circumstances relating to building and meter configuration. For instance, two buildings on two separate parcels, but with a shared meter, might have been entered with all of the energy consumption attributed to just one building (and thus resulting in zero energy consumption for the other) or by arbitrarily allocating energy use between the two buildings.
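The percentile-based trimming step described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used to clean the LL84 data: the record fields `space_type` and `source_eui` are hypothetical names, and the inclusive percentile interpolation is an assumption, since the text does not state which percentile definition was applied.

```python
from collections import defaultdict
from statistics import quantiles


def trim_eui_outliers(records, lower_pct=1, upper_pct=99):
    """Drop records whose EUI falls outside the given percentiles of its space type.

    `records` is a list of dicts with hypothetical keys "space_type" and
    "source_eui". Each space type needs at least two observations, otherwise
    statistics.quantiles raises StatisticsError.
    """
    euis_by_type = defaultdict(list)
    for rec in records:
        euis_by_type[rec["space_type"]].append(rec["source_eui"])

    bounds = {}
    for space_type, euis in euis_by_type.items():
        # n=100 returns 99 cut points; index 0 is the 1st percentile.
        cuts = quantiles(euis, n=100, method="inclusive")
        bounds[space_type] = (cuts[lower_pct - 1], cuts[upper_pct - 1])

    kept = []
    for rec in records:
        low, high = bounds[rec["space_type"]]
        if low <= rec["source_eui"] <= high:
            kept.append(rec)
    return kept
```

Computing the bounds within each space type, rather than across the pooled sample, keeps a building type with characteristically high intensity from being trimmed simply because it differs from the others.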

Data from the CoStar Group were geocoded and integrated into the merged LL84-PLUTO database using a multi-dimensional matching technique that included Borough-Block-Lot (BBL), Building Identification Number (BIN), latitude and longitude coordinates, and, if necessary, address. The CoStar database provides additional variables not included in the LL84 dataset, such as building class, LEED certification status, construction material type, and annual operating expenses, which allow for a more comprehensive set of controls in the regression models. After merging and cleaning the two datasets, the resultant sample (hereafter LL84-PLUTO-CoStar) includes 685 office properties, all located in the borough of Manhattan. Figure 1 presents the spatial distribution of buildings for which LL84 and CoStar data are available.

Fig. 1. Map of Properties Contained in LL84 and CoStar Databases
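A multi-key matching procedure of the kind described for the CoStar merge might look like the sketch below. The field names, the order in which keys are tried (BBL, then BIN, then coordinates, then address), the coordinate rounding, and the address normalization are all assumptions made for illustration; the text specifies only which identifiers were used and that address was a last resort.

```python
def normalize_address(address):
    """Uppercase, strip periods, and collapse whitespace (a deliberately crude rule)."""
    return " ".join(address.upper().replace(".", "").split())


def candidate_keys(rec, coord_decimals=4):
    """Return (key_type, value) pairs for a record, in assumed priority order."""
    keys = []
    if rec.get("bbl"):
        keys.append(("bbl", rec["bbl"]))
    if rec.get("bin"):
        keys.append(("bin", rec["bin"]))
    if rec.get("lat") is not None and rec.get("lon") is not None:
        # Rounding tolerance is hypothetical; it treats nearby points as identical.
        coords = (round(rec["lat"], coord_decimals), round(rec["lon"], coord_decimals))
        keys.append(("coord", coords))
    if rec.get("address"):
        keys.append(("address", normalize_address(rec["address"])))
    return keys


def build_indexes(ll84_records):
    """Index LL84 records by every available identifier (first record wins on ties)."""
    indexes = {"bbl": {}, "bin": {}, "coord": {}, "address": {}}
    for rec in ll84_records:
        for key_type, value in candidate_keys(rec):
            indexes[key_type].setdefault(value, rec)
    return indexes


def match_record(costar_rec, indexes):
    """Return (key_type, ll84_record) for the first identifier that matches, else (None, None)."""
    for key_type, value in candidate_keys(costar_rec):
        hit = indexes[key_type].get(value)
        if hit is not None:
            return key_type, hit
    return None, None
```

Returning the key type alongside the match makes it possible to audit how many records were joined on the weaker identifiers.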

Descriptive Statistics

The 8,648 buildings in the full, cleaned LL84 dataset account for over 1.4 billion square feet of space, with multi-family buildings representing approximately 63 % of the total space. For the 948 office buildings included in the reported data, the median source EUI is 213.3 and the mean is 233.8 with a standard deviation of 111.0. The median EUI for New York City office buildings is within 1.6 % of the weather-adjusted primary EUI for office buildings in the Northeast region, according to the 2003 CBECS. For the merged sample of 685 office buildings that form the basis of this analysis, the median source EUI is 211.0 and the mean is 224.1 with a standard deviation of 92.4. Therefore, the sample of buildings for which CoStar data is available consists of buildings, on average, with lower EUIs than the full sample of office buildings collected through LL84, and a t-test indicates that this represents a statistically significant difference.

Figures 2, 3, 4, and 5 below provide a descriptive analysis of the type and quantity of space included in the dataset and patterns of energy consumption and intensity, as measured by source EUI unless otherwise indicated. Since billed site energy can include both primary and secondary energy forms, comparing buildings based on site energy does not account for thermodynamic equivalency of different fuel types. Source energy, through the use of conversion factors, includes the energy used (or lost) during generation and transmission in addition to energy consumed at the site (for more information on site-source energy conversion, see Deru and Torcellini 2007).
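The site-to-source conversion can be illustrated with a short sketch. The conversion factors passed in are placeholders chosen for easy arithmetic, not values from this paper or from the EPA; the function and parameter names are likewise hypothetical.

```python
def source_eui(site_kbtu_by_fuel, gross_floor_area_sqft, source_site_ratio):
    """Source EUI (kBtu per square foot) from site energy by fuel.

    site_kbtu_by_fuel:  e.g. {"electricity": 1000.0, "natural_gas": 500.0}
    source_site_ratio:  fuel -> multiplier covering generation and transmission
                        losses (caller-supplied; no defaults are assumed here)
    """
    source_kbtu = sum(
        kbtu * source_site_ratio[fuel] for fuel, kbtu in site_kbtu_by_fuel.items()
    )
    return source_kbtu / gross_floor_area_sqft


# Placeholder ratios for illustration only.
example_ratios = {"electricity": 3.0, "natural_gas": 1.0}
example_eui = source_eui(
    {"electricity": 1000.0, "natural_gas": 500.0}, 100.0, example_ratios
)
```

Two buildings with identical site energy per square foot can therefore have quite different source EUIs if one relies more heavily on a secondary energy form such as grid electricity.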

Fig. 2. Histogram of Building Size, Office Buildings Only, Full LL84 Sample (n = 948)

Fig. 3. Histogram of Source EUI with Kernel Density Plot, Office Buildings Only, Full LL84 Sample (n = 948)

Fig. 4. Histogram of Source EUI, LL84-PLUTO-CoStar Sample (n = 685)

Fig. 5. Spatial Distribution of Office Buildings by Source Energy Use Intensity (n = 948)

Fig. 6. Scatterplot of PC Density and Worker Density

Figure 5 shows the spatial distribution of office buildings by source EUI. Clusters of high EUI buildings are evident, particularly in the Midtown and Downtown submarkets. The highlighted areas demonstrate the cluster of high EUI buildings from 34th to 59th Streets, particularly from Park Avenue to Seventh Avenue, and the difficulties in reporting energy consumption for multiple buildings on a lot, evidenced by the conditions in the upper box. The identification and visualization of spatial clustering can yield important insights for targeted policies, infrastructure investment, and potential applications of distributed energy generation.

Methodology

This section presents the empirical methodology used to analyze the determinants of building energy intensity, based on the merged LL84, PLUTO, and CoStar data, to develop a predictive model of building energy performance, and to establish a benchmarking framework for the comparison of peer-group energy efficiency. In addition to developing an energy performance index, the predictive model can be used to estimate energy efficiency and consumption patterns in buildings for which actual energy data is not available. This could be particularly useful in modeling non-compliant buildings and buildings below the 50,000 square foot size threshold in New York City, as well as in estimating building consumption in other cities.

The Limitations of Current Methods

The proliferation of city energy disclosure policies, beginning with New York City, now allows for the development of market-specific benchmarking models. This is a significant advancement, as characteristics of building inventories in various markets differ substantially and thus models and resultant coefficient estimates predicated on a national sample (intrinsically diverse in building type, climate conditions, occupancy and use characteristics, among other factors) may not hold within specific markets. As importantly, relationships between variables exhibit different patterns within different building classifications, and the assumption of linear correlations between variables often fails. Market-specific benchmarks are also an important component of understanding – and improving – a city’s energy profile (and, by extension, carbon emissions from buildings). Similarly, investors and building owners often evaluate asset performance within a market-specific context; adding a localized energy benchmarking system should help to catalyze investments in more efficient properties and/or energy improvements.

The most widely-used benchmarking tool, the U.S. EPA Energy Star rating system, is based on a national sample of commercial building energy consumption, derived from 2003 CBECS data. According to the U.S. EPA (2007), the model specification is given by:

$$ y=\alpha +{\beta}_1\,\mathit{logCWorkerDensity}+{\beta}_2\,\mathit{logCOpHours}+{\beta}_3\,\mathit{PCDensity}+{\beta}_4\,\mathit{logFloorArea}+{\beta}_5\,\mathit{HDD}+{\beta}_6\,\mathit{CDD}+\varepsilon $$

Where:

logCWorkerDensity :

the natural log of the number of workers in the building per 1,000 square feet (centered on the mean value for the sample)

logCOpHours :

the natural log of the average weekly operating hours for the building (centered on the mean value for the sample)

PCDensity :

the number of personal computers per 1,000 square feet (Footnote 2)

logFloorArea :

the natural log of the gross floor area of the building

HDD :

number of annual Heating Degree Days

CDD :

number of annual Cooling Degree Days

The dependent variable, y, is the source EUI and CBECS data are used as the reference sample. While this survey provides important insights into building energy use and building characteristics across a diverse sample of buildings representing several use types, the scope of the survey limits the sample size for any particular building type. Of particular relevance to this study, the CBECS dataset provides only 498 buildings eligible for use in developing the basis for the Energy Star office building rating methodology.
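As an illustration of how the regressors defined above are constructed for a single building, consider the following sketch. The dictionary keys are hypothetical field names, and "centered on the mean value for the sample" is interpreted here as subtracting the sample mean of the logged variable, which is an assumption about the EPA procedure rather than a documented detail.

```python
import math


def energy_star_regressors(building, mean_log_worker_density, mean_log_op_hours):
    """Right-hand-side terms of the specification for one building.

    The two mean_log_* arguments are the reference-sample means of the logged
    variables; they are inputs here, not values taken from CBECS or LL84.
    """
    return {
        "logCWorkerDensity": math.log(building["workers_per_1000_sqft"])
        - mean_log_worker_density,
        "logCOpHours": math.log(building["weekly_operating_hours"])
        - mean_log_op_hours,
        "PCDensity": building["pcs_per_1000_sqft"],
        "logFloorArea": math.log(building["gross_floor_area_sqft"]),
        "HDD": building["hdd"],
        "CDD": building["cdd"],
    }
```

Because the centering constants come from the reference sample, switching the reference from a national sample to a single city changes the regressors themselves, not only the estimated coefficients.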

After estimating the Energy Star model using the New York City LL84 data as the reference set, it is found to explain less than 11.0 % of the variation of weather-normalized EUI in New York City office buildings, compared to an R-squared value of approximately 0.33 when using the national CBECS sample as the reference data. This difference in explanatory power may represent a possible mis-specification of the Energy Star model and suggests limitations to using this methodology as a tool to compare peer building performance within the New York City market. Three of the four variables are found to be significant (Footnote 3). The lack of significance of the WorkerDensity variable may be explained, in part, by its collinearity with the PCDensity variable. This relationship is highlighted in Fig. 6 and reflected in the Pearson coefficient of 0.85. In addition, there appears to be a reporting default bias, with a higher proportion of buildings reporting the default values for occupancy-related variables, including worker density and operating hours. This may reflect the difficulty for some in collecting these data, and may bias estimation results.
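The collinearity diagnostic mentioned above can be reproduced in a few lines. The sketch below computes a Pearson coefficient from paired observations and, as an added illustration not drawn from the paper, the variance inflation factor implied by that coefficient in the special case of a model containing only those two regressors.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def two_regressor_vif(r):
    """VIF when a regressor's only companion is one other regressor with correlation r."""
    return 1.0 / (1.0 - r * r)
```

With r = 0.85, `two_regressor_vif` gives roughly 3.6, meaning the sampling variance of each coefficient would be about 3.6 times what it would be if the two variables were uncorrelated. This applies only to the two-regressor case and says nothing about VIFs in a larger specification.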

Another important caveat in the Energy Star methodology is that it censors the FloorArea variable at 200,000 square feet; thus, all buildings larger than 200,000 square feet are entered into the model as being equal to 200,000 square feet (U.S. EPA, 2007). While this reflects the relatively few buildings larger than 200,000 square feet in the CBECS national office sample, such censoring in the NYC context affects 333 of 685 buildings (48.6 %) in the sample used for this analysis. This right censoring of the data negates observed differences in the relationship between building size and EUI. Figure 7 below shows a scatterplot of EUI by building size, with three linear fit lines (one each for buildings less than 200,000 square feet, 200,000 to 1,000,000 square feet, and greater than 1,000,000 square feet, respectively). The linear fit lines demonstrate both varying slopes and intercepts for the three bins, indicating that there is no statistically significant relationship between building size and EUI for buildings less than 200,000 square feet, a positive relationship for buildings 200,000 to 1,000,000 square feet, and a positive discontinuity above 1,000,000 square feet. This censoring may result in a biased coefficient estimate for building size and, therefore, a lower degree of confidence in the prediction model output as specified.
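The effect of the censoring rule is easy to see in code. The following sketch applies the 200,000 square foot cap before taking the log, using hypothetical function names, and reports the share of a sample that the cap affects.

```python
import math

CENSOR_SQFT = 200_000  # cap stated in the text (U.S. EPA 2007)


def censored_log_floor_area(gross_floor_area_sqft, cap=CENSOR_SQFT):
    """Natural log of floor area after right-censoring at the cap."""
    return math.log(min(gross_floor_area_sqft, cap))


def share_censored(floor_areas, cap=CENSOR_SQFT):
    """Fraction of buildings whose floor area exceeds the cap."""
    return sum(1 for area in floor_areas if area > cap) / len(floor_areas)


# A 250,000 sq ft building and a 1,500,000 sq ft building become indistinguishable.
same_value = censored_log_floor_area(250_000) == censored_log_floor_area(1_500_000)
```

The reported count of 333 censored buildings out of 685 corresponds to 333 / 685 ≈ 0.486, so for nearly half of this sample the size regressor carries no information.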

Fig. 7. Scatterplot of Building Size (Gross Floor Area) and Source EUI, with Linear Fit Line

To visualize the implications of using CBECS and the LL84 data as the reference samples for benchmarking energy performance, Fig. 8 shows a scatterplot comparing Energy Star scores for New York City office buildings, with the values of the x-axis calculated using CBECS data as the reference, and the y-axis calculated using New York City LL84 data as the reference. In other words, the y-axis values can be interpreted as the Energy Star score for each building calculated against other New York City buildings only. In both cases, the same methodology was used, which is the model developed by the EPA for calculating Energy Star scores presented above.
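Why the same building can receive different scores under different reference samples can be sketched as follows. This is a simplification under stated assumptions: the score is taken as a plain empirical percentile rank of the ratio of actual to predicted EUI, which stands in for, and may differ in detail from, the EPA's documented scoring procedure. The function names and the sample ratios are hypothetical.

```python
def efficiency_ratio(actual_eui, predicted_eui):
    """Ratio below 1 means the building uses less energy than the model predicts."""
    return actual_eui / predicted_eui


def percentile_score(ratio, reference_ratios):
    """Score from 1 to 100: percent of reference buildings with a worse (higher) ratio."""
    worse = sum(1 for r in reference_ratios if r > ratio)
    score = round(100 * worse / len(reference_ratios))
    return max(1, min(100, score))


# Hypothetical reference distributions: the second is shifted toward better performers.
national_reference = [0.6, 0.9, 1.0, 1.2, 1.5]
local_reference = [0.5, 0.6, 0.8, 0.85, 1.0]

score_national = percentile_score(0.9, national_reference)
score_local = percentile_score(0.9, local_reference)
```

In this toy example the building outperforms three of five buildings in the first reference set but only one of five in the second, so its score falls from 60 to 20 even though its own consumption is unchanged.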

Fig. 8. Scatterplot of Energy Star Scores using CBECS data as the baseline (x-axis) and Energy Star Scores using LL84 data as the market-specific baseline (y-axis)

The pattern of the scatterplot in Fig. 8 reveals the extent of the difference in Energy Star scores using the local market data as a baseline. While scores for very poorly performing and very well performing buildings remain relatively consistent, there is a wide variation among building scores in the middle quartiles. This reflects both the better performance, on average, of New York City buildings than the national sample, and the potential limitations of the methodology used currently to calculate Energy Star scores. The four variables – building size, personal computer density, worker density, and operating hours – used to match buildings in the current model specification fail to capture important differences across buildings that are relevant when looking at specific markets. For instance, while building age is not included in the Energy Star model, it is, as discussed below, a significant variable in understanding consumption patterns in New York City buildings. Given the range of building ages in New York City, this omission can lead to uncertainty in the results when using a model that excludes this characteristic.

Determinants of Commercial Building Energy Intensity

Here, the factors that drive building energy intensity are examined using robust multivariate regression techniques (Griffith et al. 2008; Tso and Yau 2007; Turiel 1987). Based on the model coefficients and tests of significance, the model output is used to predict building energy intensity and provide the computational basis for a building energy performance index from New York City data (Chung et al. 2006; Hernandez et al. 2008). It should be noted that there are limitations to using multiple regression modeling to predict building energy intensity (Tso and Yau 2007). For example, potential bias in the coefficient estimates may result from collinearity between the explanatory variables (as discussed above). However, this method is found to be appropriate given the nature and extent of the data and the purpose of this analysis. These limitations are accounted for in this analysis through appropriate model specification and logarithmic transformation of certain included variables.

For this analysis, building energy intensity is estimated using the following model specification:

$$ y=\alpha +\beta\,\mathit{BLDG}_i+\gamma\,\mathit{OCC}_i+\phi\,\mathit{CONST}_i+\delta\,\mathit{ECO}_i+\varepsilon $$

where BLDG consists of a range of physical building characteristics, including age, size, and lot location; OCC includes occupancy variables such as worker density and operating hours; CONST accounts for construction material and renovation status; ECO represents fuel type mix and the presence of eco-labels including LEED certification and Energy Star rating; β, γ, ϕ and δ are vectors of parameters to be estimated; and ε is the error term. The dependent variable, y, is the natural log of source EUI.
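Because the dependent variable is the natural log of source EUI, a fitted model of this form yields a prediction on the log scale that must be exponentiated to recover EUI. The sketch below shows that step with hypothetical coefficient and variable names; none of the numbers are estimates from this study.

```python
import math


def predict_source_eui(intercept, coefficients, features):
    """Back-transform a semi-log linear prediction to source EUI.

    coefficients and features are dicts keyed by variable name. Features absent
    from the building's record are treated as zero (e.g. an unset dummy).
    """
    log_eui = intercept + sum(
        coef * features.get(name, 0.0) for name, coef in coefficients.items()
    )
    # Simple exponentiation targets the median of EUI given the features, not
    # the mean; no retransformation correction is applied in this sketch.
    return math.exp(log_eui)


# Hypothetical values chosen so the arithmetic is easy to follow.
example_prediction = predict_source_eui(
    intercept=math.log(200.0),
    coefficients={"built_since_1990": math.log(1.4)},
    features={"built_since_1990": 1.0},
)
```

Here a baseline of 200 is scaled by a factor of 1.4 for the dummy variable, giving a predicted source EUI of 280.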

Due to significant differences in energy consumption across building types, and the space type-specific variables available in the LL84 database, individual models are needed for each primary building type, although certain variables are common across building types. These include:

Building Size :

Gross floor area of the building (as reported through LL84)

Building Age :

Age of building based on year built from PLUTO database (natural log)

Renovation :

A binary variable equal to 1 if the building has been renovated, and 0 otherwise (based on a variable from the PLUTO database)

Number of Floors :

Total number of floors in the building from PLUTO database

As-built Floor Area Ratio :

The actual, as-built floor area ratio (FAR) of the building from the PLUTO database. The FAR is calculated as the building area divided by the lot area

Inside Lot :

A binary variable for whether the building is an inside lot or corner lot (based on a variable from the PLUTO database)

Primary Energy Source :

A binary variable equal to 1 for the dominant fuel type/energy source in the building, and equal to 0 otherwise. A fuel type/energy source is considered dominant if it accounts for more than 50 % of the building’s total site energy consumption (LL84)

% Leased :

The percentage of space leased (CoStar data)

LEED Certification :

Binary variables for any certification or for specific certification levels (CoStar data)

Energy Star Label :

A binary variable equal to 1 if the building has an Energy Star label (CoStar data)

LEED & Energy Star :

An interaction variable equal to 1 if the building is both LEED Certified and Energy Star Labeled, and 0 otherwise (CoStar data)

For office buildings, the following variables are added to the base model variables above:

Office Worker Density :

Number of employees per 1,000 square feet (default value = 2.3) (LL84)

Office Weekly Operating Hours :

Number of hours per week that a building is occupied by at least 75 % of the tenant employees (default value = 65 h per week) (LL84)

% Air-conditioned Space :

Proportion of space that is air-conditioned (LL84)

% Data Center :

Proportion of space classified as data center use (LL84)

Building Class :

Class A, B, or C as a measure of building quality (CoStar data)

Construction Material :

Structural construction type (CoStar data)

Individual models for the full sample of office buildings, only Class A buildings, and only Class B buildings are estimated using linear regression techniques with Huber-White robust standard errors. Each model uses the natural log of weather normalized source EUI as the dependent variable. This semi-logarithmic transformation allows the coefficient estimates for independent variables to be interpreted as the effect of a one unit change on the percentage change in the dependent variable. Continuous variables, such as building size and worker density, are recoded to categorical variables or transformed to log values to account for observed nonlinearity and heteroskedasticity in the dataset. Variance inflation factors (VIF) are calculated to detect possible multi-collinearity issues. It should also be noted that in addition to the data cleaning steps described above, lots with multiple buildings are excluded from the analysis due to aggregated data reporting for these lots. Furthermore, it should be noted that possible explanatory variables for energy intensity, such as building systems (type of HVAC equipment, for instance) and variables relating to tenant firm type and occupant behavior are not included due to data limitations. Descriptive statistics are provided in Fig. 9, and the estimation results are presented below.
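For reference, the two diagnostics named above have standard textbook forms, given here as a reminder rather than as a statement of the exact variant used in this analysis. The variance inflation factor for regressor j, where R_j^2 is the coefficient of determination from regressing that variable on all other regressors, is:

$$ \mathit{VIF}_j=\frac{1}{1-R_j^2} $$

The basic heteroskedasticity-robust (Huber-White) covariance estimator, with X the regressor matrix and \hat{\varepsilon}_i the OLS residuals, is:

$$ \widehat{\mathrm{Var}}\left(\hat{\beta}\right)={\left({X}^{\top }X\right)}^{-1}{X}^{\top}\,\mathrm{diag}\left({\hat{\varepsilon}}_i^2\right)X{\left({X}^{\top }X\right)}^{-1} $$

Finite-sample corrections to this estimator exist; the text does not indicate whether one was applied.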

Fig. 9. Descriptive Statistics, by Building Class, Office Buildings, LL84-PLUTO-CoStar Sample (n = 685)

Results

Determinants of Commercial Building Energy Intensity

The full model explains approximately 33 % of the variation in energy intensity across office buildings, as shown in the regression output in Fig. 10. This model incorporates a unique range of building characteristics, including building class, lot location, usage patterns, and construction material, although the limited availability of systems-level data and data on tenant type constrains its explanatory power. The coefficient estimates shown can be approximated as the percent change in source EUI for a unit change in the independent variable. It should be noted that the precise interpretation of the relationship between dummy variables and the dependent variable is given by exp(c)-1 where c is the coefficient value (Halvorsen and Palmquist 1980). Variance Inflation Factors are found to be in the acceptable range, with no individual variable having a VIF higher than 2.0.
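The gap between the approximate and precise interpretations of a dummy-variable coefficient grows with the size of the coefficient, as the following short sketch shows. The coefficient values are arbitrary examples, not estimates from Fig. 10.

```python
import math


def dummy_effect_percent(c):
    """Exact percent change in the dependent variable when a dummy switches from 0 to 1."""
    return (math.exp(c) - 1.0) * 100.0


# Compare the naive reading (100 * c) with the exact effect for a few example values.
comparison = {c: (100.0 * c, dummy_effect_percent(c)) for c in (0.05, 0.20, 0.40)}
```

For a small coefficient such as 0.05 the two readings nearly coincide (5.0 % versus about 5.1 %), while for 0.40 the exact effect is about 49.2 % rather than 40 %.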

Fig. 10

Regression Results, log Source EUI as Dependent Variable. ***Significant at the 99 % confidence level, **Significant at the 95 % confidence level, *Significant at the 90 % confidence level

Examining the results of the regression analysis presented in Fig. 10, several variables are found to be statistically significant at or above the 95 % confidence level. Notably, building age is negatively correlated with EUI: buildings built since 1990 are found to have higher energy intensities than those more than 80 years old (the reference category in the model). Newer buildings consume approximately 40 % more energy per square foot, on average, than buildings built before 1930, controlling for the factors included in the model. Buildings built between 1930 and 1990 do not demonstrate a statistically significant difference from those built before 1930. These findings are consistent across, and within, building classes and are consistent with a recent study of the energy intensity of buildings constructed over the past two decades (Andrews and Krogmann 2009). These older buildings were constructed at a time when natural light and air were the primary means of space lighting and ventilation, facades were typically designed with lower glazing ratios (on the order of 20–40 % versus 50–75 % for modern glass curtainwall structures), and relatively thick masonry walls created high thermal mass (Oldfield, Trabucco, and Wood 2009). Together, these factors may contribute to the relatively lower energy intensity of pre-1930s office buildings when compared to those constructed since the 1990s.

This is a significant result that warrants additional analysis. In particular, it is possible that omitted variable bias may influence this finding, as older buildings may systematically have less energy-intensive tenants/users and the technologies installed in such buildings may result in lower energy demand than in newer buildings. In addition, the newest building in the sample was completed in 2007 (reflecting broader economic conditions post-2007), which is prior to the adoption of increased building code energy efficiency standards in 2008 (referencing ASHRAE 90.1-2004) and the passage of Local Law 85, which created the New York City Energy Conservation Code (effective as of 2010). Therefore, this finding does not represent the possible effects of these more stringent energy code requirements. However, given the controls in this model, the results suggest older existing buildings may present an untapped opportunity for energy efficient space in industrial-era cities with a significant inventory of older and pre-World War II buildings and raise questions regarding the effects of modern office building design preferences and systems.

Several physical building characteristics, such as size, space type, and location, merit attention. First, no relationship is found between building floor area and source EUI. After controlling for the other variables in the model, larger buildings are not found to have a statistically significant difference in EUI from smaller (between 50,000 and 200,000 square feet) buildings. For Class A buildings, however, larger buildings are found to be more efficient, controlling for other factors, than buildings less than 200,000 square feet, with buildings greater than 1,000,000 square feet being the most efficient on a per square foot basis. Arguments regarding the expected efficiency of larger buildings can be made in either direction. On the one hand, larger buildings should have efficiencies of scale, reflected in the fact that the marginal EUI attributed to base building systems should decrease, up to a point, as square footage is added and systems operations can be optimized. On the other hand, larger buildings require additional vertical conveyance systems and taller buildings require more energy to circulate people, water, and air. In support of the latter argument, the results indicate that taller buildings do, in fact, have higher EUIs, with each additional floor increasing EUI by approximately 1.0 %. To control for building massing, the as-built floor area ratio of the building is included. This variable is found to be negatively correlated with EUI, suggesting that lower and more densely constructed buildings on a particular parcel are more efficient than towers with small footprints relative to lot size.

Controlling for multiple uses in a building is an important element of accurate building comparison and benchmarking. In the LL84 database, the primary space type is based on which use accounts for more than 50 % of total gross floor area. Space characteristics for any additional, secondary uses are also provided, and these can be intensive users of energy. In particular, the proportion of data center space in an office building is an important determinant of whole-building energy consumption (Pan, Yin, and Huang 2008). Therefore, understanding the energy use of data centers within office buildings is a critical component of an accurate performance index for comparing building energy efficiency in commercial markets. Here, the coefficient for the percent of data center space is positive and statistically significant. The magnitude of the coefficient indicates that for every 10 % increase in data center space within a building, the building’s EUI increases by approximately 34 %.

The unique database employed in this study also allows for the analysis of certain parcel-level characteristics that could have an impact on building energy intensity. One variable from the PLUTO dataset that is included in the model is a measure for whether the building is on an inside lot with adjacent buildings on either side. The coefficient for the inside lot dummy variable indicates that buildings on interior lots have a lower EUI than corner buildings or freestanding buildings, by approximately 9.0 %. This is an important, but expected, finding: a majority of heat loss in high-rise buildings is through the building’s vertical envelope rather than its roof. When buildings share party walls, and thus have conditioned space on either side, thermal loss is minimized through the building envelope (Al-Homoud and Mohammad 2005).

Occupancy variables are shown to be key drivers of building energy efficiency. Worker densities vary considerably across usage and tenant types in office buildings, from relatively low-density law firm use (averaging 250 to 400 gross square feet per employee), for example, to very high-density trading floors or call centers. Understanding and controlling for occupant density and the operational hours of a building are critical to the reliable and effective identification and comparison of peer groups of buildings (see Footnote 4). The regression results shown in Fig. 10 indicate that the coefficients for both weekly operating hours and worker density are positive and significant. As expected, this finding indicates that as operating hours or occupant density increases, so does the EUI of the building, after controlling for the factors included in the model. Therefore, it is shown that buildings with more people working longer hours have higher EUIs. This reinforces previous empirical evidence on building energy consumption from CBECS data and highlights the importance of understanding building usage and occupant characteristics in attempting peer-to-peer building efficiency comparisons (Santin et al. 2009). A quadratic term for the operating hours is included in the model to account for non-linearity in the relationship between operating hours and EUI (see Fig. 11). The coefficient is found to be negative, indicating a threshold effect for buildings reporting higher than average operating schedules. However, it is unclear whether those properties reporting the maximum number of operating hours (168 per week) are fully operational 24/7 or if this represents a reporting error due to ambiguity around how to calculate operating hours.
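To make the threshold interpretation concrete, the operating-hours portion of the model can be written as follows, where $h$ is weekly operating hours and $\beta_h$ and $\beta_{hh}$ denote the coefficients on the linear and quadratic terms (notation introduced here for illustration only):

$$ \ln\left(\mathrm{EUI}\right) = \dots + {\beta}_h h + {\beta}_{hh} h^2 + \dots $$

$$ \frac{\partial \ln\left(\mathrm{EUI}\right)}{\partial h} = {\beta}_h + 2{\beta}_{hh} h $$

With $\beta_h > 0$ and $\beta_{hh} < 0$, as reported above, the marginal effect of an additional operating hour declines as $h$ increases and reaches zero at:

$$ h^{*} = -\frac{{\beta}_h}{2{\beta}_{hh}} $$

Whether $h^{*}$ falls within the observed range of operating hours (up to 168 per week) depends on the estimated magnitudes in Fig. 10.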

Fig. 11

Scatterplot of Operating Hours and Source EUI with Quadratic Fit

The merged CoStar data allow for a robust analysis of the energy consumption effects of LEED certification status (Certified, Silver, Gold, or Platinum), construction material type (structural steel and curtainwall, masonry, or reinforced concrete), and building class (A, B, or C).

It is unclear precisely what the expected effect of these variables should be on building energy efficiency. Buildings with a LEED certification of any level, particularly those attaining the more stringent ratings of Gold or Platinum, should be more efficient, given the sustainable features required to secure the certification. However, studies have shown that LEED certified buildings are not necessarily better energy performers than their non-certified counterparts (Newsham, Mancini, and Birt 2009; Turner and Frankel 2008). In New York City, where city building energy codes are more stringent, on average, than many other major markets in the U.S., the relative improvement from following LEED guidelines for energy consumption may be minimal. Also, LEED certified properties tend to be newer, Class A office properties that attract higher intensity energy users as tenants.

Based on thermal envelope and insulation properties (R-values for wall assemblies), it can be expected that masonry buildings, which typically have a lower glazing ratio than curtainwall buildings, should be more energy efficient than comparable buildings of other construction types. The effect of reinforced concrete structures is dependent on the facade treatment, and therefore it is unclear what the effect of this type of construction will be on energy performance. In addition, a significant issue in both structural steel frame buildings with concrete slabs and reinforced concrete buildings is the exposed slab edges, which provide little insulation (R) value and are a source of heat transfer and loss.

Finally, the class of the building may have a relationship to energy efficiency. Class A buildings tend to be newer, to attract higher energy consumption tenants, and to be constructed with steel or reinforced concrete and curtainwall construction. These buildings also tend to have more amenities, including elaborate lobby and atrium spaces, which may further increase energy consumption. Class C buildings are often older, interior lot buildings with masonry construction and lower glazing ratios. The technology installations may also be less sophisticated, with lower power requirements, thus limiting overall building energy consumption from IT infrastructure and plug load demand.

The results show that buildings that are LEED certified do not have statistically significant differences in EUI when compared to non-certified buildings. Here, buildings are grouped either as having any level of certification or none at all. Although the coefficient for LEED certification is positive, it is not statistically significant, indicating that a LEED certification does not, by itself, signal that a building is more energy efficient when controlling for other factors. When the individual levels of certification are included – Certified, Silver, Gold, and Platinum – the coefficients are found to be negative for Gold and Platinum, but not statistically significant. The reliability of these coefficient estimates, however, is limited by the small sample size of LEED certified buildings in the database (n = 38), a constraint that is magnified when grouping by certification level (since, for instance, there is only one LEED Platinum building in the New York City sample). Figure 12 shows the median source EUI and Energy Star scores for LEED and non-LEED buildings. This descriptive analysis does indicate significantly higher Energy Star scores for LEED certified buildings, but also higher source EUIs, with the exception of LEED Gold buildings.

Fig. 12

Median Source EUI and Energy Star Score by LEED Certification Level. Note: There is only one LEED Platinum building in the sample; that category is omitted

Buildings with an Energy Star label are found to be approximately 11 % more energy efficient than non-labeled buildings. This is somewhat consistent with the requirements for the Energy Star label, which require that a building demonstrate energy performance, using actual data, in the top 25 % for that facility type in the national CBECS sample. However, for the national sample of buildings with Energy Star scores, buildings with an Energy Star label (a score of 75 or higher) have, on average, a 35 % lower EUI. This discrepancy warrants further investigation.

Interestingly, the interaction term for buildings that are both LEED certified and Energy Star labeled is negative and significant, indicating that these buildings are more efficient than non-certified or non-labeled buildings. The magnitude of the coefficient also reveals that buildings that have both of these eco-labels are more efficient than buildings that are only Energy Star labeled. This suggests that LEED certified buildings that have emphasized energy efficiency in the design process are more efficient than buildings that only focused on energy efficiency, and not on the other factors addressed in the LEED certification process. The greater focus on energy requirements in the most recent iteration of the LEED rating system should support more consistent energy performance in LEED buildings going forward (Todd, Pyke, and Tufts 2013).

Construction materials affect building energy performance along a number of dimensions. Even after controlling for building age, masonry buildings have lower EUIs than structural steel frame buildings, by approximately 7 %, suggesting that the proliferation of glass curtainwall systems may require additional consideration be given to the potential implications of such office (and increasingly, residential) design trends on building energy performance.

Finally, Class C buildings are found to have substantially lower EUIs than Class A buildings, on the order of 26 % less energy consumption per square foot. This is an important finding, and one that is critical for targeting policies and incentives to certain building types. This result warrants further analysis, as the explanation for the relative efficiency of Class C buildings with respect to Class A buildings could hinge on several factors. First, Class A, and to a lesser extent Class B, buildings are often newer, taller, of glass curtain wall/structural steel frame design, and either freestanding or on corner lots. It may be that the impact of such physical and locational variables is greater when they are present together than when looking at each factor independently (recall that the model controls for such factors). Second, it is possible that this result is capturing the effect of some unobserved variable(s) in the model. Third, and related to the previous point, it is believed that the findings for Class B and Class C represent the effects of occupiers on building EUI. Higher-quality buildings will typically command higher rents, which can be associated with a number of factors, including higher intensity tenant energy users. Analytics and information technology-based tenants, for instance, may require higher energy consumption to conduct their business than more traditional firms, and these energy demands may not be fully reflected in the operating hours or worker density of the building. The energy performance effects of occupier type and lease structure are important areas for future research.

Discussion

Developing a Commercial Building Energy Performance Index

Using the coefficient estimates from the regression results above, the predictive capacity of the models can be tested using actual data from observations in the LL84 database. The predicted model values are compared to the actual EUIs based on building characteristics in the LL84-PLUTO-CoStar dataset. This dataset provides a suitable basis for a more robust methodology that accounts for additional building characteristics (physical, human, and energy-related) to improve peer group comparisons across buildings and create an energy performance index for commercial office buildings.

To explore the potential to expand the model and make it more robust in the New York City context, here I present a methodology for a commercial building energy performance index. The predictive model described above provides the basis for developing a market-specific benchmark for office buildings and, potentially, other facility types, particularly multi-family residential. It is noted that the model presented here is preliminary, and would be strengthened considerably by adding additional information on building systems, design, and tenant characteristics. Building energy audit data could be used to supplement the LL84 and PLUTO databases and create a more robust predictive model.

The model uses the coefficient estimates for variables presented above that are statistically significant at the 95 % confidence level. The predicted building EUI (in the log form used as the dependent variable) for building i is given by the intercept plus the sum, over all statistically significant variables, of the actual variable value multiplied by the coefficient estimate for that variable, as specified below:

$$ \widehat{y_i} = \alpha + {\displaystyle \sum_{k=1}^n}{\beta}_k{X}_{k i} = \alpha + {\beta}_1{X}_{1 i}+{\beta}_2{X}_{2 i}+\dots +{\beta}_n{X}_{n i} $$
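The prediction step can be sketched in a few lines of Python. The variable names, intercept, and coefficient values below are hypothetical placeholders rather than the estimates in Fig. 10, and the back-transformation with `exp` is an assumption that follows from the log-transformed dependent variable; it ignores any retransformation bias correction, which the text does not discuss.

```python
import math

# Hypothetical estimates for a log(source EUI) model. Names and values are
# illustrative assumptions, not the coefficients reported in Fig. 10.
INTERCEPT = 5.0
COEFFICIENTS = {
    "floors": 0.01,              # per additional floor
    "inside_lot": -0.09,         # dummy: 1 if the building is on an interior lot
    "energy_star_label": -0.11,  # dummy: 1 if the building holds the label
}


def predict_log_eui(building: dict) -> float:
    """Linear predictor: intercept plus the sum of coefficient * variable value.
    Variables missing from the record are treated as zero."""
    return INTERCEPT + sum(
        beta * building.get(name, 0.0) for name, beta in COEFFICIENTS.items()
    )


def predict_eui(building: dict) -> float:
    """Back-transform the log-scale prediction to EUI units (no bias correction)."""
    return math.exp(predict_log_eui(building))


example = {"floors": 20, "inside_lot": 1, "energy_star_label": 0}
print(round(predict_log_eui(example), 3), round(predict_eui(example), 1))
```

Restricting `COEFFICIENTS` to the statistically significant variables mirrors the selection rule described above.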

Figures 13 and 14 show a scatterplot of actual versus predicted EUI values and a quantile-quantile (Q-Q) plot, respectively, for office buildings in the sample. The Q-Q plots display the relationship of actual EUI values to those predicted by the model for the entire LL84-PLUTO-CoStar sample. To clarify the interpretation of the graphs, if the predictive model were perfectly accurate in predicting building EUI, then all dots would fall on the upward-sloping diagonal line. The model tends to under-estimate EUI for less efficient (higher EUI) buildings, indicating that there are unobserved variables that are driving energy consumption in higher EUI buildings. These factors could include the type of HVAC and mechanical systems, occupier information, and the performance and operations of property and building managers.

Fig. 13

Scatterplot, Actual v. Predicted Energy Use Intensity, Office Buildings

Fig. 14

Q-Q Plot, Actual v. Predicted EUI, Office Buildings

To estimate whether a building is “inefficient” or is being operated as efficiently as can be expected, given physical, location, and occupancy characteristics, the ratio of actual to predicted source EUI, labeled the Energy Efficiency Ratio (EER), is calculated. The EER is given by:

$$ \mathrm{EER} = \frac{\text{Actual Source EUI}}{\text{Predicted Source EUI}} $$

A lower value indicates a particular building is more efficient than predicted, and the opposite for higher values. Figure 15 below shows a cumulative distribution plot of EER values for the sample. It should be noted that 80 % of the buildings have EER values of 1.06 or less and that 60 % of buildings in the sample have EER values between 0.67 and 1.06.
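The EER calculation and the cumulative shares read from a distribution plot can be illustrated with a small sketch. The actual and predicted EUI pairs are made-up values for demonstration, and the helper names are hypothetical.

```python
def energy_efficiency_ratio(actual_eui: float, predicted_eui: float) -> float:
    """EER = actual source EUI / predicted source EUI.
    Values below 1 mean the building uses less energy than the model predicts."""
    if predicted_eui <= 0:
        raise ValueError("predicted EUI must be positive")
    return actual_eui / predicted_eui


def share_at_or_below(values, threshold):
    """Empirical cumulative share of values that are <= threshold."""
    return sum(v <= threshold for v in values) / len(values)


# Hypothetical (actual, predicted) source EUI pairs, for illustration only.
pairs = [(150.0, 200.0), (210.0, 200.0), (260.0, 200.0), (180.0, 200.0), (400.0, 250.0)]
eers = [energy_efficiency_ratio(actual, predicted) for actual, predicted in pairs]

print([round(e, 2) for e in eers])
print(share_at_or_below(eers, 1.06))
```

Evaluating `share_at_or_below` over a grid of thresholds traces out the kind of cumulative distribution curve described above.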

Fig. 15

Cumulative EER Distribution

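Finally, to calculate the normalized building performance index, the EER values are then converted to a standardized score (z-score), where x is a building's EER value and μ and σ are the mean and standard deviation of EER values in the sample, given by: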

$$ \boldsymbol{z} = \frac{\boldsymbol{x} - \boldsymbol{\mu}}{\boldsymbol{\sigma}} $$

The standardized scores are then converted to a percentile rank order, creating an index value on a 1 to 100 scale, labeled here as the New York City Energy Performance Index (NYC-EPI). Buildings with higher index values are considered better performers. The distribution of NYC-EPI relative to Source EUI is presented in Fig. 16. The overlay is a local polynomial smoothed line derived by weighted least squares regression (Cleveland 1979; Muller 1987). The smoothed line provides a visual relationship between the NYC-EPI and Source EUI values. Figure 16 demonstrates the importance of both a model of energy intensity that captures a broad range of physical, occupancy, and locational characteristics and benchmarking to an appropriate reference case. Of note is the wide variation in NYC-EPI values for the range of Source EUIs between 180 and 250, a result that highlights the limitations of using EUI as an effective comparative metric of energy performance.
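A minimal sketch of the standardization and ranking steps follows. Several details are assumptions, since the text does not specify them: the population standard deviation is used, the ranking is inverted so that a lower EER (better performance) receives a higher index, tied buildings share an index value, and the percentile is rounded up to an integer between 1 and 100.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values as (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


def performance_index(eers):
    """Map EER values to a 1-100 index on which higher means better.
    Each building's index is the percentage of buildings whose standardized
    EER is equal or higher (that is, equal or worse), rounded up."""
    z = z_scores(eers)
    n = len(z)
    index = []
    for zi in z:
        count = sum(zj >= zi for zj in z)         # includes the building itself
        index.append((100 * count + n - 1) // n)  # integer ceiling of 100*count/n
    return index


# Hypothetical EER values, for illustration only.
print(performance_index([0.75, 1.05, 1.3, 0.9, 1.6]))
```

Because standardization is a monotonic transformation, ranking the raw EER values would give the same ordering; the z-score step is kept here to follow the sequence described in the text.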

Fig. 16

Scatterplot of NYC Energy Performance Index (NYC-EPI) and Source EUI, with local polynomial smoothed line

Cluster Analysis

Building on the normalized energy performance index values, cluster analysis provides a useful tool for segmenting buildings into bins based on their relative energy performance. In this case, k-means partition clustering is used to group buildings based on their energy performance index as developed above across four categories of performance. This binning could be thought of as a means to assign “grades” to buildings based on their energy performance, from ‘A’ to ‘D’. Figure 17 shows scatterplots for building size and source EUI based on group classification, as an indicator of the differentiation across performance clusters. The lowest performing group includes a greater proportion of newer buildings and demonstrates a higher dispersion in EUI when compared to the other groups. There is significant overlap across the clusters, suggesting that additional factors are needed to identify common characteristics of buildings with similar energy performance levels. Considering the model accounts for approximately one-third of the variation in EUI, this is not unexpected.
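The grading idea can be sketched with a one-dimensional k-means (Lloyd's algorithm) applied to index values. This is an illustrative implementation under stated assumptions, not the procedure used to produce Fig. 17: the text does not specify the software, the initialization, or the distance measure, so the sketch initializes centroids at evenly spaced order statistics and uses absolute distance. The input values are hypothetical.

```python
def kmeans_1d(values, k=4, max_iter=100):
    """Lloyd's algorithm on scalar values; returns (labels, centroids).
    Centroids start at evenly spaced order statistics (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: attach each value to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: abs(v - centroids[j])) for v in values
        ]
        # Update step: recompute each centroid as the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_labels == labels and new_centroids == centroids:
            break  # converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids


def assign_grades(index_values):
    """Cluster into four groups and grade them 'A' (highest index) to 'D'."""
    labels, centroids = kmeans_1d(index_values, k=4)
    order = sorted(range(4), key=lambda j: centroids[j], reverse=True)
    grade_of = {cluster: "ABCD"[rank] for rank, cluster in enumerate(order)}
    return [grade_of[lab] for lab in labels]


# Hypothetical index values, for illustration only.
sample = [5, 8, 12, 30, 33, 36, 60, 62, 65, 90, 93, 97]
print(assign_grades(sample))
```

K-means results depend on the initialization, so a production analysis would typically compare several starting points.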

Fig. 17

Scatterplot of Building Size and Source EUI, by EER Cluster

It should be noted that the variation in physical characteristics between the best-performing cluster and the worst-performing cluster is minimal. The median floor area, worker density, personal computer density, operating hours, and age are nearly identical between the two clusters. There is, however, a more significant difference in primary fuel type, represented in Fig. 18 below. Figure 18 shows electric energy consumption as a percent of total consumption for each of the four identified clusters, an outcome partially captured in the use of Source EUI and the conversion factors for electric energy.

Fig. 18

Scatterplot of Electric Energy Consumption as % of Total Consumption and Source EUI, Weighted by Building Size, by EER Cluster

There are numerous applications of these clustering techniques. First, following the calculation of the energy performance index, clusters can be used to establish performance “grades” for buildings that would allow for a more intuitive interpretation of this methodology by the general public. Second, clustering can provide a tool for targeting policies, either based on relative performance or similar building characteristics, or both. Third, the clusters can provide suitable comparison groups for measuring improvements in energy efficiency over time.

Implications and Conclusions

Appropriate building energy performance metrics and an accurate method for comparing building energy efficiency provide the necessary foundation for market transformation predicated on transparent, competitive markets through energy use disclosure policies. This paper uses new data sources to demonstrate methodologies to improve benchmarking models and to develop a more reliable and comprehensive index of building energy performance. The findings presented here illustrate the limitations of current models and the potential constraints of using national data samples as reference baselines, and provide the basis for a new tool for evaluating and comparing building energy performance of office buildings. Several implications are worth noting:

  • Collecting appropriate data

    Data limitations continue to constrain the effectiveness of regression-based benchmarking systems and prediction models. For example, the results of this analysis demonstrate the significance of variables such as construction material type, secondary uses (such as data centers) and spatial controls, specifically those describing the relationship of a building to its surroundings. In densely urbanized areas, the spatial relationship between buildings is an important element for understanding thermal properties and solar exposure, and thus significant to measuring relative building energy use. More importantly, the effects of systems-level data and expanded occupant data – particularly types of uses, improved measures of occupant density, and utility payment structures – should be explored further to determine the influence of these factors in explaining differences in energy consumption across physically similar buildings.

    As important as the question of what to measure and collect, data quality and reliability are significant concerns with building energy data. The reliability of energy consumption figures and key occupancy variables such as worker density and operating hours remains somewhat uncertain. Even variables integral to the baseline energy efficiency metrics, such as gross floor area, suffer from mis-measurement. As there are few clear and universal definitions and data standards for this type of information, improvements to data quality through standardization, normalization, and automation could be significant.

  • Market-specific benchmarking models

    The diffusion of energy disclosure policies, both in the U.S. and globally, has begun to create a wealth of non-self-selected, market-specific building energy data. These data increase regional sample sizes and present an opportunity to develop market-specific benchmarking tools that can be updated more frequently than national samples. As demonstrated here, the use of a national sample of buildings to estimate a benchmarking model has limitations, as predictive model coefficients derived from a national sample may fail when applied to a more localized, heterogeneous sample of buildings. The issue of building size, for instance, is indicative of the need to calibrate predictive models using market-specific data. In addition, asset-specific decisions on whether to retrofit a building or make other energy-related investments are affected by the relative performance of that asset within a particular market or submarket. A reliable benchmarking tool at the local level will help to facilitate such decisions. It remains to be determined how best to integrate national and market-specific benchmarking models.

  • Performance segmenting and clustering of the market

    Segmenting the market by energy performance and building characteristics is needed to optimally allocate limited resources for incentives and can provide a baseline for targeting policies and programs to improve energy efficiency in buildings with the lowest marginal cost of efficiency gains. While the clustering on physical characteristics and basic occupancy variables presented here offers little clear differentiation across energy performance groups, it is expected that clustering will become more useful as more information is obtained on tenant characteristics, installed systems, and ownership and capital structure for a particular building. In turn, policies, incentives, and financing programs can be targeted and tailored to the needs of various market segments.

The findings presented here provide new insights into the drivers of energy consumption in large commercial office buildings. While additional data on tenants and building systems would improve the predictive power of the model, the market-specific building energy performance index described in this paper can be a useful tool for developing a more robust peer-group measure to compare and analyze relative energy efficiency across buildings and markets.