Introduction

Prediction of respiratory function is an essential component of understanding respiratory status in daily clinical practice, but prediction based on able-bodied values has limitations in persons with spinal cord injury (SCI). Respiratory function in SCI is reduced due to partial or complete impairment of the respiratory muscles [1, 2]. This impairment depends on the completeness and level of the lesion [3, 4]. Immediately after the onset of the injury, respiratory function is reduced, but during inpatient rehabilitation and the first year thereafter it can partly recover [1, 5, 6, 7, 8, 9]. At some point this initial improvement turns into a decline [5] which exceeds the normal age-related decline [10]. This age-related decline in able-bodied persons is mainly related to gender and height [11, 12, 13]. In persons with SCI, the decline in respiratory function in the first years after rehabilitation also appears to be associated with higher body mass index, lower inspiratory muscle strength and reduced physical fitness [6, 10, 14]. In general, large interpersonal differences in changes of respiratory function are possible [10]. The intention of this study is to promote regular measurement of respiratory function by spirometry and also to give clinicians an additional tool for better individualized interpretation of the measured values. SCI-specific prediction models can help identify patients whose respiratory function is below the norm (i.e. predicted values), so that clinicians can deliver targeted interventions to this group of patients. Individual pulmonary diagnostics and therapy are therefore central treatment targets to maintain health and quality of life [15].

In 2012 a reference value calculator was developed to obtain an overview of respiratory function status while taking into account the spinal cord lesion level [15]. To develop these initial models, 440 persons were tested of whom 150 were between six months and two years post injury; a period when lung function is known to improve after a SCI. As such, it is unclear whether these models are accurate for persons several years post injury.

Therefore, the aim of this study was to test the accuracy of the 2012 models in persons with long-term SCI and if necessary develop and validate new prediction models of respiratory function. We hypothesize that the lesion-specific reference models published in 2012 [15] are not sufficiently accurate for long-term injured persons and that new models are required to improve decision making about treatment [16].

Methods

Design and setting

A multicenter, cross-sectional study with 10 SCI rehabilitation centers was performed (The Netherlands (n = 8), Australia (n = 1), Switzerland (n = 1)).

In the Netherlands, data from the research program ‘Active Lifestyle Rehabilitation Interventions in aging Spinal Cord Injury’ (ALLRISC) were used [17]. Reporting follows the TRIPOD checklist [18].

Study population

Inclusion criteria for this study were: long-term SCI (>2 years after injury, median 20 years) due to trauma, at least 18 years of age, motor complete injury (American Spinal Injury Association Impairment Scale (AIS) A and B), and right and left motor level between C4 and T12. Only data from persons with motor complete injury were analyzed to increase the homogeneity of the sample.

Persons with acute or chronic respiratory diseases were excluded, as well as persons with severe scoliosis or progressive neurological diseases and persons using bronchodilators or any other medication that could have adversely affected respiratory function at the time of assessment. Persons with respiratory failure, ventilator or tracheostomy dependency, traumatic brain injury or mental disorders were also excluded. Persons with obstructive sleep apnea were not excluded.

Procedure

The outcome measures were forced vital capacity (FVC), forced expiratory volume in one second (FEV1), peak expiratory flow (PEF), maximal inspiratory pressure (PImax) and maximal expiratory pressure (PEmax). Gender, age, height (measured or self-reported, without using arm span as a surrogate), weight and time post injury (TPI) at the time of the respiratory function measurement were recorded. Information on smoking history was also collected. Each center entered its anonymized data in a centralized study database. In the Swiss center the data were collected from November 2002 to June 2018, in the Dutch centers from November 2011 to February 2014 and in the Australian center from April 1996 to May 2014. The lung function measurements in the Netherlands and in Australia were performed for research purposes [19, 20]. The lung function measurements in Switzerland were performed during clinical practice. Measurements of respiratory function were performed only once per participant in all centers by appropriately trained personnel according to the ATS/ERS guidelines [21]. Participants sat upright in their own wheelchair and breathed through a mouthpiece while wearing a nose clip. Each measurement was repeated until three reproducible measurements were registered, and the highest value was used for further analysis. The spirometers for measurement of FVC, FEV1, and PEF were calibrated daily. To measure FVC and FEV1, the participants were instructed to exhale fully from total lung capacity. The maximum airflow during this forced expiration was measured to assess PEF. PImax and PEmax were measured using a respiratory pressure meter (Micro RPM, Micro Medical, Hoechberg, Germany). For these measurements the participants carried out their maximal in- and expiratory maneuvers from residual volume and total lung capacity, respectively [22]. Abdominal binders were removed for all measurements.

Statistical analyses

Demographic data are presented as frequency or as median (25% and 75% quartile). Depending on the distribution of the data, an independent t-test or a Mann–Whitney U test was used to investigate differences between the model and validation samples for all numerical data. For categorical data, differences between the model and validation samples were compared using chi-square tests. Lesion level was treated as a categorical variable with four lesion groups: two for tetraplegia (lesion levels C4–C5 and C6–C8) and two for paraplegia (lesion levels T1–T6 and T6–T12). These lesion level groups were built based on the level of innervation of the main respiratory muscles. Lesion group T1–T6 was the reference group.

All analyses except the multilevel regression analyses were performed using SPSS (Version 18.0.3, IBM, Somers, NY, USA). Statistical significance was set at alpha ≤ 0.05. The multilevel regression analyses were performed with the multilevel modelling program MLwin (MLwin, version 1.1; Center for Multilevel Modelling, Institute for Education, London, UK) [23, 24].

The project was divided into three phases as illustrated in Fig. 1.

Fig. 1 Flow-chart of the project

Phase 1 - Accuracy of the statistical models published in 2012

Bland–Altman plots were used [25] to quantify the agreement and to evaluate bias between the predicted values (calculated with the lesion-specific reference values published in 2012 [15]) and the measured respiratory function values. The differences between measured and predicted values were plotted against their mean, together with the mean difference and the limits of agreement (mean ± 1.96 standard deviations (SD) of the difference). A visual examination of the plot allowed us to evaluate the global agreement between the two methods [25]. Adding a regression line of the difference and its confidence interval limits to the Bland–Altman plots assisted with describing any proportional difference [25]. To test how much the measured test scores are spread around the ‘true’ score, the standard error of measurement (SEM) was calculated. T-tests were performed to test for proportional bias in the old models (measured values versus predicted values). Intraclass correlation coefficients (ICCs; two-way random, absolute agreement) were calculated to determine the correlation between the predicted and measured respiratory function values. ICC values indicate the following reliability: less than 0.5 poor, between 0.5 and 0.75 moderate, between 0.75 and 0.9 good, greater than 0.9 excellent [26]. ICCs and Bland–Altman analyses are both appropriate for reliability analysis, and it is recommended that both are used [27].
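The Bland–Altman quantities described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bland_altman` and the example values are hypothetical, the regression of the difference on the mean is fitted by ordinary least squares, and the confidence limits of the regression line are omitted. It is not the SPSS procedure used in the study.

```python
import statistics


def bland_altman(measured, predicted):
    """Bias, 95% limits of agreement and the regression of difference on mean."""
    diffs = [m - p for m, p in zip(measured, predicted)]
    means = [(m + p) / 2 for m, p in zip(measured, predicted)]

    bias = statistics.mean(diffs)   # mean difference (measured - predicted)
    sd = statistics.stdev(diffs)    # sample SD of the differences
    lower = bias - 1.96 * sd        # lower limit of agreement
    upper = bias + 1.96 * sd        # upper limit of agreement

    # Ordinary least squares: difference = intercept + slope * mean.
    # A slope clearly different from zero suggests a proportional difference.
    mean_x = statistics.mean(means)
    sxx = sum((x - mean_x) ** 2 for x in means)
    sxy = sum((x - mean_x) * (d - bias) for x, d in zip(means, diffs))
    slope = sxy / sxx
    intercept = bias - slope * mean_x

    return {"bias": bias, "sd": sd, "lower": lower, "upper": upper,
            "slope": slope, "intercept": intercept}


# Hypothetical FVC values in litres, for illustration only.
measured_fvc = [2.1, 3.4, 1.8, 4.0, 2.9, 3.6]
predicted_fvc = [2.6, 3.3, 2.5, 3.5, 3.0, 3.2]
result = bland_altman(measured_fvc, predicted_fvc)
print(result)
```

In this made-up sample the slope is positive, i.e. the difference grows with the mean, which is the kind of pattern the regression line in a Bland–Altman plot is meant to reveal.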

Data splitting procedure (preparation for phase 2 and 3)

The data were split into two samples: (1) a sample to develop the new statistical models, the ‘model sample’ (80%) (phase 2), and (2) a sample to cross-validate the reliability of the models, the ‘validation sample’ (20%) (phase 3). Random numbers were generated and the data were split on the basis of the order of these numbers (model 80% and validation 20%) [28, 29].
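A random 80/20 split of this kind could look as follows. This is a minimal sketch, not the authors' procedure: the function name `split_sample`, the fixed seed and the rounding of the cut point are assumptions, and the resulting group sizes need not match those reported in Table 1.

```python
import random


def split_sample(participant_ids, model_fraction=0.8, seed=2021):
    """Assign a random number to each participant, order by it, cut at 80%."""
    rng = random.Random(seed)  # fixed seed (an assumption) makes the split reproducible
    keyed = sorted((rng.random(), pid) for pid in participant_ids)
    ordered = [pid for _, pid in keyed]
    cut = round(model_fraction * len(ordered))
    return ordered[:cut], ordered[cut:]


# 613 participants were analysed; the IDs 1..613 are placeholders.
model_ids, validation_ids = split_sample(range(1, 614))
print(len(model_ids), len(validation_ids))
```

Ordering by an independent random number, rather than by any recorded variable, keeps the split unrelated to center, lesion level or outcome.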

Phase 2 - Development of lesion-specific reference models

For the development of the new lesion-specific reference models we principally followed the 2012 statistical procedure but now added country as an additional level. For each of the respiratory function parameters FVC, FEV1, PEF, PImax, and PEmax, one model was developed to determine the relationship of personal and lesion characteristics with respiratory function. The hierarchy in the data was as follows: individual participants (level 1), who were grouped in the participating centers (level 2) and the participating countries (level 3). In order to calculate the influence of the lesion level, three dummy variables were used, and the lesion group T1–T6, which had the most participants, was defined as the reference group. Further factors potentially influencing respiratory function, such as gender (male = 1, female = 0), age (years), height (cm), weight (kg) and TPI (years), were added to a basic univariate multilevel regression equation. Information about smoking history (pack-years (years), ever (0 = never smoker, 1 = ever smoker), former (0 = no former smoker, 1 = former smoker) or current smoker (0 = no current smoker, 1 = current smoker)) was added to the basic univariate multilevel regression equation separately. Independent variables with p-values ≤ 0.1 were included in a subsequent multivariable equation. Model fit was assessed with the −2 log likelihood of the equations. A backward selection procedure was then carried out, excluding non-significant determinants (p ≥ 0.05) in order to create a final multivariable equation.
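To make the dummy coding concrete, the sketch below encodes the four lesion groups with three dummy variables (T1–T6 as reference) and evaluates the fixed part of a regression equation. All names (`lesion_dummies`, `predict`) and all coefficient values are hypothetical placeholders chosen for easy arithmetic; they are not the coefficients of Table 3, and the random effects for center and country are ignored.

```python
LESION_GROUPS = ("C4-C5", "C6-C8", "T1-T6", "T6-T12")  # group labels as given in the text
REFERENCE_GROUP = "T1-T6"


def lesion_dummies(group):
    """Three 0/1 dummy variables; the reference group is coded as all zeros."""
    if group not in LESION_GROUPS:
        raise ValueError(f"unknown lesion group: {group}")
    return {f"lesion_{g}": int(group == g)
            for g in LESION_GROUPS if g != REFERENCE_GROUP}


def predict(intercept, coefficients, predictors):
    """Fixed part of the model: intercept plus the sum of beta * predictor value."""
    return intercept + sum(coefficients[name] * value
                           for name, value in predictors.items())


# Hypothetical placeholder coefficients, NOT taken from Table 3.
intercept = 3.0
coefficients = {"lesion_C4-C5": -1.0, "lesion_C6-C8": -0.5,
                "lesion_T6-T12": 0.25, "male": 0.5}

predictors = {**lesion_dummies("C4-C5"), "male": 1}
print(predict(intercept, coefficients, predictors))  # 3.0 - 1.0 + 0.5 = 2.5
```

With this coding, each lesion coefficient is read as the difference from the T1–T6 reference group, holding the other predictors constant.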

The predictive ability of each of the five models was judged based on the adjusted R2 (explained variance) as a statistical measure of accuracy.

Phase 3 - Accuracy of the lesion-specific reference models

The validation sample was used to test the predictive value of the developed models. As in phase 1 of the project, the predicted respiratory function values (calculated with the new models) were compared with the measured respiratory function values. Adding a regression line of the difference and its confidence interval limits to the Bland–Altman plots assisted with describing any proportional difference [25]. T-tests were performed to test for proportional bias in the new models (measured values versus predicted values). Residual plots were also used for the graphical illustration of the validation. The SEM and ICCs were calculated.
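The ICC and SEM used in phases 1 and 3 can be illustrated as follows. The text specifies a two-way random, absolute agreement ICC but not whether single or average measures were used, nor the SEM formula; the sketch assumes the single-measures form and the common definition SEM = SD × √(1 − ICC). Function names are hypothetical, and the reliability labels follow the thresholds given in the Methods.

```python
import math
import statistics


def icc_absolute_agreement(measured, predicted):
    """ICC, two-way random effects, absolute agreement, single measures (assumed form)."""
    rows = list(zip(measured, predicted))
    n, k = len(rows), 2                      # n participants, k = 2 "raters"
    grand = statistics.mean(v for row in rows for v in row)
    row_means = [statistics.mean(row) for row in rows]
    col_means = [statistics.mean(col) for col in zip(*rows)]

    ss_rows = k * sum((rm - grand) ** 2 for rm in row_means)
    ss_cols = n * sum((cm - grand) ** 2 for cm in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


def standard_error_of_measurement(values, icc):
    """SEM = SD * sqrt(1 - ICC); this definition is an assumption."""
    return statistics.stdev(values) * math.sqrt(max(0.0, 1.0 - icc))


def reliability_label(icc):
    """Thresholds as stated in the Methods."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# A constant offset of 1 unit lowers absolute agreement although the ranking is identical.
icc = icc_absolute_agreement([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(round(icc, 3), reliability_label(icc))
```

The example shows why absolute agreement was a sensible choice for comparing measured and predicted values: a systematic over- or underestimation reduces the ICC even when the two series are perfectly correlated.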

Results

A total of 613 participants were analysed; 346 participants (56%) with tetraplegia and 267 (44%) with paraplegia. The Swiss rehabilitation center provided data from 304 persons, the Dutch 215, and the Australian center 94 persons with long-term SCI. Demographic data are presented in Table 1. No differences in personal and lesion characteristics were found between the model- and the validation sample (Table 1). No measurements of respiratory muscle pressures from the Dutch centers and no measurements of PEF from the Australian center were available.

Table 1 Characteristics of participants of the whole sample (n = 613), divided into a model and a validation sample

Phase 1 - Accuracy of the statistical models published in 2012

The predictions of the models published in 2012 did not match the measured data of persons with long-term SCI well. The t-tests for proportional bias (measured versus predicted values) were significant for FEV1 (p = 0.033), PEF (p < 0.001), PImax (p < 0.001) and PEmax (p < 0.001), and almost significant for FVC (p = 0.053). This means that the measured values are significantly lower or higher than the predicted values. Thus, there is a systematic difference in the old models for all five respiratory function parameters, and the models give an over- or underestimation. Figure 2 demonstrates this proportional bias for all of the variables examined. The 2012 equations underestimated the actual values in the lower range and overestimated them in the upper range for all five parameters. The ICCs for the lung volumes (0.62–0.64) were moderate, but the ICCs for the respiratory muscle strength values were poor, with 0.41 for PImax and 0.40 for PEmax (Table 2) [26]. The residual plots and scatterplots can be found in the online supplement [25, 30, 31].

Fig. 2 (phase 1). Bland–Altman plots of differences between the measured respiratory function values (FVC, FEV1, PEF, PImax, and PEmax) of each participant and the calculated respiratory function values using the lesion-specific reference models published in 2012. The bold line represents the proportional bias and dashed lines show the limits of agreement. The dotted, diagonal line represents the regression line, and confidence interval limits are presented as continuous, diagonal lines. FVC regression line (95% CI): y = −2.27 (4.545 to 4.717) + 0.48 (0.832 to 0.997) *x. FEV1 regression line (95% CI): y = −1.81 (3.786 to 3.928) + 0.46 (0.810 to 0.981) *x. PEF regression line (95% CI): y = −3.76 (7.537 to 7.856) + 0.52 (0.959 to 1.128) *x. PImax regression line (95% CI): y = −59.5 (93.529 to 98.852) + 0.67 (0.838 to 0.995) *x. PEmax regression line (95% CI): y = −54.66 (85.585 to 91.181) + 0.67 (0.839 to 0.997) *x

Table 2 Phase 1: Accuracy for prediction of lung function and respiratory muscle strength for the whole sample of persons with long-term SCI with the models published in 2012

Phase 2 - Development of lesion-specific reference models

Due to the poor accuracy of the models published in 2012 for long-term SCI, new statistical models to predict respiratory reference values were developed with the model sample (80% of the total sample) (Table 3). The remaining candidate predictors after the backward regression models for all five respiratory function parameters were lesion level and gender. Age, height, weight, and TPI had an additional significant influence on a sub-set of parameters only. We explored different forms of TPI in advance (observed, linear, logarithmic, exponential fittings) and did not find substantial differences between the four fittings. Based on those plots we decided that other fittings than the linear fitting did not improve the models. All significant predictors can be seen in Table 3.

Table 3 Phase 2: Regression coefficients (β) and 95% CI of the new models from the multilevel regression analysis of respiratory function parameters

The predictors included in the new models explained 69–78% of the variance (R2) for lung volumes and 69% for both respiratory muscle strength parameters (Table 3).

Phase 3 - Accuracy of the lesion-specific reference models

The reliability of the new models was tested with the ‘validation sample’ (20% of the total sample) (Fig. 1). The t-tests for proportional bias (measured versus predicted values) were significant for FVC (p < 0.001), FEV1 (p < 0.001) and PEF (p < 0.001), meaning that the measured values are significantly lower or higher than the predicted values, but not for PImax (p = 0.43) or PEmax (p = 0.41). Thus, there is a systematic difference with an over- or underestimation in the new models for all three lung function parameters but not for the respiratory muscle strength parameters. Figure 3 shows the predicted and measured respiratory function values in Bland–Altman plots. The ICCs between the measured and predicted respiratory function values for the new models ranged from 0.28 (PImax) to 0.55 (PEF), i.e., between poor and moderate reliability (Table 4) [26]. The residual plots and scatterplots can be found in the online supplement [25, 30, 31].

Fig. 3 (phase 3). Bland–Altman plots of differences between the measured respiratory function values (FVC, FEV1, PEF, PImax, and PEmax) of each participant and the calculated respiratory function values using the newly developed models. The bold line represents the proportional bias and the dashed, horizontal lines show the limits of agreement. The dotted, diagonal line represents the regression line, and confidence interval limits are presented as continuous, diagonal lines. FVC regression line (95% CI): y = −4.57 (−5.08 to −4.06) + 1.22 (1.07 to 1.37) *x. FEV1 regression line (95% CI): y = −3.29 (−3.70 to −2.87) + 1.08 (0.94 to 1.23) *x. PEF regression line (95% CI): y = −6.54 (−7.60 to −5.47) + 1.02 (0.85 to 1.20) *x. PImax regression line (95% CI): y = −99.05 (−125.42 to −72.67) + 1.26 (0.95 to 1.57) *x. PEmax regression line (95% CI): y = −79.40 (−100.56 to −58.25) + 1.14 (0.86 to 1.41) *x

Table 4 Phase 3: Accuracy for prediction of lung function and respiratory muscle strength for the validation sample with the developed statistical models

Discussion

The prediction models published in 2012 proved not to be accurate enough for persons with long-term SCI, and new models needed to be developed. The reason is that the 2012 equations underestimated the actual values in the lower range and overestimated them in the upper range for all five respiratory function parameters. For clinical practice, comparison of measured values with population-specific reference values is important in order to prescribe preventive treatment. One possible treatment could be to increase inspiratory muscle strength through respiratory muscle training, as shown in a previous publication of our research group in which we also used reference values [32]. Using lesion-specific relative values of respiratory function is much more sensitive in the SCI population than using only absolute values or reference values from able-bodied persons [32]. For lung function we did not find better models for persons with long-term SCI, but the models for respiratory muscle strength improved.

Phase 1 - Accuracy of the statistical models published in 2012

There was a need to test the accuracy of the first published models for persons with more than two years TPI. The ICCs for the lung volumes were moderate, and the Bland–Altman plots showed only small differences between the measured and the predicted values (Table 2). According to the ATS/ERS documents, the acceptable difference between the measured and the predicted value should be below 0.15 L for FVC, should not exceed 20% for FEV1 and should be below 0.67 L/s for PEF [33]. In our results the differences between measured and predicted values are within these acceptable limits; however, the 95% limits of agreement for FVC are between 2.12 L and 1.96 L, which represents a wide range (Table 2). For PImax and PEmax the ICCs were poor, and the differences and limits of agreement between the measured and the predicted values were relatively wide (Table 2). Normal ranges for respiratory muscle strength are wide, and the inter-individual differences between measurements of muscle strength are considerably greater than for lung function [34]. Due to these findings we judged the old models not to be accurate for a long-term SCI population, especially for respiratory muscle strength.

Phase 2 - Development of lesion-specific reference models

New models to predict respiratory function values in long-term SCI were developed with lesion level, gender and weight as the main candidate predictors. There are some parallels between the ‘old’ and the newly developed models but also some fundamental differences. Lesion level, as an SCI-specific parameter, is important for all models and, similar to able-bodied persons, gender also had an influence on all models (Table 3) [15]. Typically, women have smaller vital capacities and maximal expiratory flow rates, reduced airway diameters and smaller diffusion surfaces than age- and height-matched men [11]. The effect of height is similar between both models [15] (Table 3); only for PEF was there no additional increase with increasing height (Table 3). In able-bodied persons a 1% increase in height corresponds to a 2.5% increase in FVC and FEV1 [13].

In the ‘old’ models, increasing age had a negative effect on all five respiratory function parameters [15]. In the newly developed models, age had a negative effect only on FEV1, with a decrease of 15 ml per year (Table 3). This is about half the age-related decline shown in the old models and in able-bodied persons, where FEV1 declines by up to 30 ml per year [12, 35]. The aging lung is likely to have experienced exposures to environmental toxins and reductions in physiological capacity [12], and chest-wall compliance is reduced due to stiffening of the rib cage [1]. In the ‘old’ models, TPI was positively associated with PEmax [15]. The newly developed models for the long-term SCI population showed that the longer the TPI, the lower the FVC (Table 3). For FVC, TPI was an even stronger predictor than age. Large interpersonal differences in change of FVC can occur in the first five years after rehabilitation [10]. To estimate the influence of 10 years of aging or TPI instead of 1 year, a clinician can multiply the beta of age or TPI by 10 (Table 3). Smoking history conferred no predictive power, even though it is well known that starting smoking is related to a rapid decline of lung function while quitting smoking has a beneficial effect on lung function [36]. Our findings are supported by literature in which smoking history also showed no differences over time for FVC and FEV1 in males with traumatic tetraplegia and in persons with either tetra- or paraplegia with AIS A-D [10, 37]. In our study, all participants with chronic lung diseases were excluded and thus, an underrepresentation of the average population-based respiratory function status may occur. When estimating smoking history, recall bias may be an issue due to the retrospective design among ex-smokers [38]. Recall bias means that when persons remember past events, they usually do not have a complete or accurate picture of what happened. However, in another investigation, self-reports of regular, former or never smoking were found to be usually accurate, and the validity of self-reported smoking seems to be similar among persons of different ages and socioeconomic groups [39].
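As a worked example of scaling a coefficient to a longer time span, the age effect on FEV1 reported above (a decrease of 15 ml per year) can be multiplied by 10 years. The symbol β_age is introduced here only for illustration, and the calculation assumes the linear age effect of the model holds over the whole decade.

```latex
\Delta \mathrm{FEV_1} = \beta_{\text{age}} \times 10\ \text{years}
                      = (-15\ \text{ml/year}) \times 10\ \text{years}
                      = -150\ \text{ml}
```

The same multiplication applies to the TPI coefficient for FVC, using the value given in Table 3.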

Phase 3 - Accuracy of the lesion-specific reference models

An important aspect of prediction is to consider whether a regression model can be used reliably in persons with comparable characteristics. The ICCs for the lung volume parameters were poor, and the Bland–Altman plots for FVC and FEV1 showed relatively wide differences and limits of agreement between the measured and the predicted values (Table 4). According to the ATS/ERS documents, the acceptable difference between the measured and the predicted value should be below 0.15 L for FVC, should not exceed 20% for FEV1 and should be below 0.67 L/s for PEF [33]. In our study the difference between the measured and the predicted values for FVC (−0.45 L) is higher than the acceptable difference of 0.15 L; for FEV1 (−0.31 L) and PEF (−0.53 L/s) the differences are within the acceptable range but are higher than in the validation of the old models (Table 4). A positive trend was evident in all five Bland–Altman plots, as illustrated by the regression line of the difference and its confidence interval limits [25] (Fig. 3). We conclude that for lung volume models in long-term SCI, other possible candidate predictors need to be evaluated in future research. The list of further predictors is long, ranging from clinical (e.g. co-morbidities) and laboratory (e.g. chest X-ray) predictors to social/psychological (e.g. ethnicity, motivation) predictors [40]. With the currently available candidate predictors, lung function cannot be well predicted. As such, the ‘old’ models appeared to be more accurate for predicting FVC and FEV1.
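The comparison against the acceptable differences can be written as a small check. This is a sketch under assumptions: the function names are hypothetical, the 20% criterion for FEV1 is interpreted as a percentage of the predicted value, and the predicted FEV1 of 2.0 L in the example is a made-up placeholder. The mean differences for FVC and PEF are the values quoted in the text.

```python
FVC_LIMIT_L = 0.15          # acceptable difference for FVC, litres
PEF_LIMIT_L_PER_S = 0.67    # acceptable difference for PEF, litres per second
FEV1_LIMIT_PERCENT = 20.0   # acceptable difference for FEV1, percent


def within_absolute_limit(mean_difference, limit):
    """True if the absolute measured-minus-predicted difference is below the limit."""
    return abs(mean_difference) < limit


def fev1_within_limit(mean_difference_l, predicted_l):
    """Assumption: the 20% criterion is taken relative to the predicted value."""
    return abs(mean_difference_l) / predicted_l * 100 <= FEV1_LIMIT_PERCENT


print(within_absolute_limit(-0.45, FVC_LIMIT_L))        # FVC: -0.45 L -> False
print(within_absolute_limit(-0.53, PEF_LIMIT_L_PER_S))  # PEF: -0.53 L/s -> True
print(fev1_within_limit(-0.31, predicted_l=2.0))        # hypothetical predicted FEV1
```

The sign of the difference is ignored because the criteria concern the size of the disagreement, not its direction.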

The use of different prediction equations can lead to different interpretations, with results differing by model and geographical site of assessment [41]. The Global Lung Function Initiative (GLI) provides standardized spirometry reference equations for able-bodied persons aged from 3 to 95 years to improve comparability and accuracy [42]. A comparison of the GLI and another two commonly used spirometry prediction equations in a group of healthy Kenyan volunteers showed that no equation consistently provides accurate estimates of normal lung function [43]. Even if the exposure to environmental pollutants in the Kenyan population is not comparable with that of the reference population, the study brings into question the validity of these major published spirometry prediction equations [43]. The best accuracy is achieved with detailed information about age and height. A few months' age difference can affect the predicted values by up to 8.5% [44]. Also, a 1 cm error in height can lead to an error in the predicted value of 6% [42].

Clinical relevance

Prediction models for respiratory function are useful for individually assessing respiratory function in persons with different levels of injury. By measuring respiratory function yearly and comparing the measurements with the predicted values, we can potentially identify lung dysfunction and deterioration with aging, posture, obesity or ascension of neurological level.

Strengths and limitations

The strengths of this study are the appropriate set of respiratory function values and the dataset that best represents the target patient population (Table 1) [45]. Data of several laboratories are combined in which the techniques are performed in the same way as in daily routine so that sources of variability are minimized [45].

The development of these new models involved some compromise as data of respiratory muscle strength and PEF were only available in two of three countries. Ideally, reference values are calculated with models derived from measurements observed in a representative sample of a clearly defined and, as much as possible, corresponding population [46, 47]. With our study size of 276 and 393 participants for respiratory muscle strength and PEF (Table 3), respectively, these criteria can still be fulfilled.

The lung function measurements in the Netherlands and in Australia were performed specifically for research purposes, which carries the risk of selection bias but had the advantage of standardized measurement protocols.

The period of inclusion of participants, from 1996 to 2018, is long, and treatment regimens changed during this time; e.g. new respiratory muscle training devices are used in clinical practice and more is known about the effect of in- and expiratory muscle strength training. However, since this is not a training or longitudinal study (one measurement per participant only), this may have only a marginal influence on the results.

Differences in TPI may have contributed to the heterogeneity of the results. In the Australian and Swiss centers the TPI started at two years, in contrast to the Dutch centers, where the TPI started at 10 years. Since TPI was included as a continuous parameter in the analysis, this should not be a limitation.

Our focus was on the most obvious and easy-to-assess candidate predictors, for reasons of feasibility in daily clinical practice. The research team selected the candidate variables based on the available literature in the same population [15, 20] and in able-bodied persons [47]. In future, the Delphi method, as a structured communication technique, could help to update and reach consensus on candidate predictors [40]. Although our models still do not contain all potentially relevant candidate predictors [40], our large sample in general reflects a wide range of levels of, e.g., activity and sports among the participants.

For the graphical illustration of the validation, Bland–Altman plots (Figs. 2 and 3) and residual plots (online supplement) were used. There is a discussion about the difference between these two statistical methods with the argument that in Bland-Altman plots a systematic proportional bias can occur [48]. When looking at Figs. 2 and 3, in fact this type of bias is obvious. At lower respiratory function values the predicted values are higher than the measured values while at higher respiratory function values the predicted values are lower than the measured values.

Conclusion

The respiratory function prediction models published in 2012 proved not to be accurate enough for persons with long-term SCI. Thus, new statistical models were developed to predict respiratory function in persons injured more than two years ago. In summary, we did not find better models for lung function in long-term SCI, but those for respiratory muscle strength showed better accuracy.