Abstract
Mediation analysis is an important statistical approach for evaluating the relationships among observed variables. The most commonly used models for mediation analysis handle single-valued variables. However, there are several circumstances (e.g., dimensionality reduction of large datasets, clinical patient courses, repeated measures, masked data, uncertain data) in which the collected information can be represented more naturally by means of intervals. In these cases, standard mediation analyses can be ill-suited. Although interval-valued variables can be transformed into standard single-valued variables, such procedures may mask some relevant information provided by the intervals. In this article, we present a novel and simple model (IMedA) for mediation analysis on interval-valued variables, based on both the symbolic regression approach and the regression-based mediation framework. We also generalize Stolzenberg’s decomposition of effects to cope with interval-valued data, and we introduce a specific variance-based decomposition procedure to descriptively evaluate the sizes of such effects. Finally, to better highlight the features of IMedA, we apply our model to a real case study from behavioral contexts.
Notes
In both SEM-ML and SEM-WLS procedures, interval variables were defined as standard single-valued variables in terms of centers and ranges. The R code for the Lavaan mediation model for interval variables is provided as Supplementary Material.
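The center and range representation mentioned in this note can be sketched as follows. This is only an illustration, not the authors' supplementary code: the function names are hypothetical, and the range is assumed here to be the half-width of the interval (some authors use the full width instead).

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def to_center_range(intervals: List[Interval]) -> Tuple[List[float], List[float]]:
    """Split intervals into two single-valued variables: centers and ranges."""
    centers: List[float] = []
    ranges: List[float] = []
    for lower, upper in intervals:
        if lower > upper:
            raise ValueError(f"invalid interval: [{lower}, {upper}]")
        centers.append((lower + upper) / 2.0)
        # ASSUMPTION: "range" is taken as the half-width of the interval.
        ranges.append((upper - lower) / 2.0)
    return centers, ranges


def from_center_range(centers: List[float], ranges: List[float]) -> List[Interval]:
    """Rebuild the intervals from their centers and half-width ranges."""
    return [(c - r, c + r) for c, r in zip(centers, ranges)]


# Hypothetical interval-valued ratings for three respondents.
ratings = [(1.0, 3.0), (2.0, 2.0), (0.5, 4.5)]
centers, ranges = to_center_range(ratings)
```

Under this convention the mapping is lossless: `from_center_range(centers, ranges)` returns the original bounds, so each interval variable can be passed to single-valued software as a pair of ordinary columns.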
References
Alarcon GM (2011) A meta-analysis of burnout with job demands, resources, and attitudes. J Vocat Behav 79(2):549–562
Alkhamisi M (2010) Simulation study of new estimators combining the SUR ridge regression and the restricted least squares methodologies. Stat Pap 51(3):651–672
Alwin DF, Hauser RM (1975) The decomposition of effects in path analysis. Am Sociol Rev 40:37–47
Arndt S, Turvey C, Coryell WH, Dawson JD, Leon AC, Akiskal HS (2000) Charting patients’ course: a comparison of statistics used to summarize patient course in longitudinal and repeated measures studies. J Psychiatr Res 34(2):105–113
Augustin T (2002) Expected utility within a generalized concept of probability: a comprehensive framework for decision making under ambiguity. Stat Pap 43(1):5–22
Avanzi L, van Dick R, Fraccaroli F, Sarchielli G (2012) The downside of organizational identification: relations between identification, workaholism and well-being. Work Stress 26(3):289–307
Avanzi L, Balducci C, Fraccaroli F (2013) Contribution to the Italian validation of the Copenhagen Burnout Inventory (CBI). Psicol Della Salute 2:120–135
Baron RM, Kenny DA (1986) The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol 51(6):1173–1182
Billard L, Diday E (2002) Symbolic regression analysis. In: Jajuga K, Sokołowski A, Bock HH (eds) Classification, clustering, and data analysis: recent advances and applications. Springer, Berlin, pp 281–288
Billard L, Diday E (2003) From the statistics of data to the statistics of knowledge: symbolic data analysis. J Am Stat Assoc 98(462):470–487
Blanco-Fernández A, Winker P (2016) Data generation processes and statistical management of interval data. AStA Adv Stat Anal 100(4):475–494
Blanco-Fernández A, Colubi A, García-Bárzana M (2013a) A set arithmetic-based linear regression model for modelling interval-valued responses through real-valued variables. Inf Sci 247:109–122
Blanco-Fernández A, Colubi A, González-Rodríguez G (2013b) Linear regression analysis for interval-valued data based on set arithmetic: a review. Towards advanced data analysis by combining soft computing and statistics. Springer, Berlin, pp 19–31
Bliese PD, Castro CA (2000) Role clarity, work overload and organizational support: multilevel evidence of the importance of support. Work Stress 14(1):65–73
Bollen KA, Stine R (1990) Direct and indirect effects: classical and bootstrap estimates of variability. Sociol Methodol 20(1):115–140
Caffo B, Chen S, Stewart W, Bolla K, Yousem D, Davatzikos C, Schwartz BS (2008) Are brain volumes based on magnetic resonance imaging mediators of the associations of cumulative lead dose with cognitive function? Am J Epidemiol 167(4):429–437
Calcagnì A, Lombardi L (2014) Dynamic fuzzy rating tracker (DYFRAT): a novel methodology for modeling real-time dynamic cognitive processes in rating scales. Appl Soft Comput 24:948–961
Calcagnì A, Lombardi L, Sulpizio S (2017) Analyzing spatial data from mouse tracker methodology: an entropic approach. Behav Res Methods 1–19. doi:10.3758/s13428-016-0839-5
Carpita M, Ciavolino E (2017) A generalized maximum entropy estimator to simple linear measurement error model with a composite indicator. Adv Data Anal Classif 11(1):139–158
Choi JY, Lee MJ (2016) Regression discontinuity: review with extensions. Stat Pap, pp 1–30
Claessens BJ, Van Eerde W, Rutte CG, Roe RA (2004) Planning behavior and perceived control of time at work. J Organ Behav 25(8):937–950
Couso I, Dubois D (2014) Statistical reasoning with set-valued information: Ontic vs. epistemic views. Int J Approx Reason 55(7):1502–1518
Diday E, Noirhomme-Fraiture M et al (2008) Symbolic data analysis and the SODAS software. Wiley, New York
Edwards JA, Webster S, Van Laar D, Easton S (2008) Psychometric analysis of the UK Health and Safety Executive’s management standards work-related stress indicator tool. Work Stress 22(2):96–107
Edwards JR, Lambert LS (2007) Methods for integrating moderation and mediation: a general analytical framework using moderated path analysis. Psychol Methods 12(1):1–22
Everitt B (1995) The analysis of repeated measures: a practical review with examples. Stat 44(1):113–135
Fairchild AJ, MacKinnon DP, Taborga MP, Taylor AB (2009) R2 effect-size measures for mediation analysis. Behav Res Methods 41(2):486–498
Fields GS (2003) Accounting for income inequality and its change: a new method, with application to the distribution of earnings in the United States. Res Labor Econ 22:1–38
Fishburn PC (1973) Interval representations for interval orders and semiorders. J Math Psychol 10(1):91–105
Fisher CD, To ML (2012) Using experience sampling methodology in organizational behavior. J Organ Behav 33(7):865–877
Frison L, Pocock SJ (1992) Repeated measures in clinical trials: analysis using mean summary statistics and its implications for design. Stat Med 11(13):1685–1704
Gómez G, Calle ML, Oller R (2004) Frequentist and Bayesian approaches for interval-censored data. Stat Pap 45(2):139–173
Halff HM, Ortony A, Anderson RC (1976) A context-sensitive representation of word meanings. Memory Cogn 4(4):378–383
Hayes AF, Preacher KJ (2010) Quantifying and testing indirect effects in simple mediation models when the constituent paths are nonlinear. Multivar Behav Res 45(4):627–660
Imai K, Van Dyk DA (2004) Causal inference with general treatment regimes. J Am Stat Assoc 99(467):854–866
Imai K, Keele L, Yamamoto T (2010) Identification, inference and sensitivity analysis for causal mediation effects. Stat Sci 25(1):51–71
Johnson A, Mulder B, Sijbinga A, Hulsebos L (2012) Action as a window to perception: measuring attention with mouse movements. Appl Cogn Psychol 26(5):802–809
Judd CM, Kenny DA (1981) Process analysis: estimating mediation in treatment evaluations. Eval Rev 5(5):602–619
Kiers HA (2002) Setting up alternating least squares and iterative majorization algorithms for solving various matrix optimization problems. Comput Stat Data Anal 41(1):157–170
Kim H, Park H (2007) Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics 23(12):1495–1502
Kristal AR, Glanz K, Tilley BC, Li S (2000) Mediating factors in dietary change: understanding the impact of a worksite nutrition intervention. Health Educ Behav 27(1):112–125
Kristensen TS, Borritz M, Villadsen E, Christensen KB (2005) The Copenhagen Burnout Inventory: a new tool for the assessment of burnout. Work Stress 19(3):192–207
Lima Neto EdA, de Carvalho FdA (2008) Centre and range method for fitting a linear regression model to symbolic interval data. Comput Stat Data Anal 52(3):1500–1515
Lima Neto EdA, de Carvalho FdA (2010) Constrained linear regression models for symbolic interval-valued variables. Comput Stat Data Anal 54(2):333–347
Little R (1993) Statistical analysis of masked data. J Off Stat 9(2):407–426
Luce RD (1956) Semiorders and a theory of utility discrimination. Econometrica 24(2):178–191
Luo P, Geng Z (2016) Causal mediation analysis for survival outcome with unobserved mediator-outcome confounders. Comput Stat Data Anal 93:336–347
MacKinnon D (2008) Introduction to statistical mediation analysis. Routledge, New York
MacKinnon DP, Fairchild AJ (2009) Current directions in mediation analysis. Curr Dir Psychol Sci 18(1):16–20
Mood A, Graybill F (1950) Introduction to the theory of statistics. McGraw-Hill, New York
Moore RE (1966) Interval analysis. Prentice-Hall, Englewood Cliffs
Nkurunziza S, Ejaz Ahmed S (2011) Estimation strategies for the regression coefficient parameter matrix in multivariate multiple regression. Stat Neerl 65(4):387–406
Parchami A, Taheri SM, Mashinchi M (2012) Testing fuzzy hypotheses based on vague observations: a p-value approach. Stat Pap 53(2):469–484
Preacher KJ, Hayes AF (2008) Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behav Res Methods 40(3):879–891
Preacher KJ, Kelley K (2011) Effect size measures for mediation models: quantitative strategies for communicating indirect effects. Psychol Methods 16(2):93
Rosseel Y (2012) Lavaan: an R package for structural equation modeling. J Stat Softw 48(2):1–36
Salicone S (2007) Measurement uncertainty: an approach via the mathematical theory of evidence. Springer, New York
Sawyer JE (1992) Goal and process clarity: specification of multiple constructs of role ambiguity and a structural equation model of their antecedents and consequences. J Appl Psychol 77(2):130
Seibold DR, McPhee RD (1979) Commonality analysis: a method for decomposing explained variance in multiple regression analyses. Human Commun Res 5(4):355–365
Senn S, Stevens L, Chaturvedi N (2000) Repeated measures in clinical trials: simple strategies for analysis using summary measures. Stat Med 19(6):861–877
Sobel ME (1982) Asymptotic confidence intervals for indirect effects in structural equation models. Sociol Methodol 13(1982):290–312
Stolzenberg RM (1980) The measurement and decomposition of causal effects in nonlinear and nonadditive models. Sociol Methodol 11:459–488
Sutton S (1998) Predicting and explaining intentions and behavior: How well are we doing? J Appl Soc Psychol 28(15):1317–1338
Takane Y, Young FW, De Leeuw J (1977) Nonmetric individual differences multidimensional scaling: an alternating least squares method with optimal scaling features. Psychometrika 42(1):7–67
Taris TW, de Lange AH, Kompier MA (2010) Research methods in occupational health psychology. In: Houdmont J, Leka S (eds) Occupational health psychology. Wiley-Blackwell, Hoboken, pp 269–297
Taylor AB, MacKinnon D, Tein JY (2008) Tests of the three-path mediated effect. Organ Res Methods 11(2):241–269
Timmerman ME, Kiers HA (2002) Three-way component analysis with smoothness constraints. Comput Stat Data Anal 40(3):447–470
Toderi S, Balducci C, Edwards JA, Sarchielli G, Broccoli M, Mancini G (2013) Psychometric properties of the UK and Italian versions of the HSE stress indicator tool. Eur J Psychol Assess 29(1):72–79
Wardle J, Carnell S, Haworth CM, Farooqi IS, O’Rahilly S, Plomin R (2008) Obesity associated genetic variation in FTO is associated with diminished satiety. J Clin Endocrinol Metab 93(9):3640–3643
Yahya W, Olaifa J (2014) A note on ridge regression modeling techniques. Electron J Appl Stat Anal 7(2):343–361
Yuan KH, Cheng Y, Maxwell S (2013) Moderation analysis using a two-level regression model. Psychometrika 79(4):701–732
Zhang Z, Wang L (2013) Methods for mediation analysis with missing data. Psychometrika 78(1):154–184
Appendices
Appendix A: Solutions for IMedA model
where vec(.) is the linear operator that converts an \(n \times k\) matrix into a \(kn \times 1\) vector, \(\otimes \) denotes the Kronecker product, \(\mathbf {I}_k\) is a \(k \times k\) identity matrix, and \(\mathbf {1}\) is an \(n\times k\) matrix of all ones.
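The appendix's display equations are not reproduced here, so the following is only a sketch of how the two operators interact, not the IMedA estimating equations. It checks the standard identity \(\mathrm{vec}(\mathbf{X}\mathbf{B}) = (\mathbf{I}_k \otimes \mathbf{X})\,\mathrm{vec}(\mathbf{B})\), which is the usual way a regression with \(k\) response columns is rewritten as a single stacked system. The matrices `X` and `B` and all helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def vec(A: Matrix) -> List[float]:
    """Stack the columns of an n x k matrix into a vector of length kn."""
    n, k = len(A), len(A[0])
    return [A[i][j] for j in range(k) for i in range(n)]


def kron(A: Matrix, B: Matrix) -> Matrix:
    """Kronecker product: every entry of A scales a full copy of B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def identity(k: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    inner, cols = len(B), len(B[0])
    return [[sum(row[t] * B[t][j] for t in range(inner)) for j in range(cols)]
            for row in A]


def matvec(A: Matrix, v: List[float]) -> List[float]:
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Hypothetical design matrix (n = 3, p = 2) and coefficients (p = 2, k = 3).
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[1.0, 0.5, -1.0], [2.0, 0.0, 3.0]]

lhs = vec(matmul(X, B))                      # vec(XB), length kn = 9
rhs = matvec(kron(identity(3), X), vec(B))   # (I_k kron X) vec(B)
```

Both sides give the same stacked vector, because \(\mathbf{I}_k \otimes \mathbf{X}\) is block diagonal with one copy of `X` per response column.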
Appendix B: Decomposition of effects for IMedA model
In order to derive direct and indirect effects for the IMedA model, we proceed as follows. Consider the regression systems \(\mathcal {S}_1\) and \(\mathcal {S}_2\) shown in Eq. 1:
First, substitute the equations of \({\mathbf{M}^\mathbf{c}}\) and \({{\mathbf{M}^\mathbf{r}}}\) into \({{\mathbf{y}^\mathbf{c}}}\) and \({{\mathbf{y}^\mathbf{r}}}\), as follows:
Multiplying through and expanding terms, we obtain the following reduced-form system \(\mathcal {S'}_2\):
Next, taking the partial derivatives of \({{\mathbf{y}^\mathbf{c}}}\) and \({{\mathbf{y}^\mathbf{r}}}\) with respect to \({{\mathbf{x}^\mathbf{c}}}\) and \({{\mathbf{x}^\mathbf{r}}}\) we have the equations for the total effect (TE) of the model, as follows:
Finally, collecting and simplifying the ensuing terms, we obtain the following equations for TE:
which are in the general form of \(\mathrm{TE} = \mathrm{DE}_c + \mathrm{DE}_r + \mathrm{IE}_{c/c} + \mathrm{IE}_{c/r} + \mathrm{IE}_{r/c} + \mathrm{IE}_{r/r}\). Note that the equation \(\mathrm{TE}^{y^r}\) for \({{\mathbf{y}^\mathbf{r}}}\) is obtained as a linear combination of \(\mathrm{TE}^{y^c}\) through the parameter \(\delta \).
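The substitute-then-differentiate logic of this appendix can be illustrated on a toy system. The sketch below is not the IMedA model: it assumes a single mediator whose center and range each depend linearly on both \(x^c\) and \(x^r\), with hypothetical coefficient names, and it labels each indirect effect as (predictor component)/(mediator component), which may differ from the article's subscript convention. It only shows that the derivative of the reduced form splits into two direct and four indirect terms.

```python
from typing import Dict

Params = Dict[str, float]


def mediator(x_c: float, x_r: float, p: Params):
    """Hypothetical mediator equations for the center and range components."""
    m_c = p["a_cc"] * x_c + p["a_rc"] * x_r
    m_r = p["a_cr"] * x_c + p["a_rr"] * x_r
    return m_c, m_r


def outcome_center(x_c: float, x_r: float, p: Params) -> float:
    """Reduced form: the mediator equations are substituted into y^c."""
    m_c, m_r = mediator(x_c, x_r, p)
    return p["d_c"] * x_c + p["d_r"] * x_r + p["b_c"] * m_c + p["b_r"] * m_r


def decomposition(p: Params) -> Dict[str, float]:
    """Analytic direct and indirect effects obtained via the chain rule."""
    return {
        "DE_c": p["d_c"],
        "DE_r": p["d_r"],
        "IE_c/c": p["a_cc"] * p["b_c"],  # x^c -> m^c -> y^c
        "IE_c/r": p["a_cr"] * p["b_r"],  # x^c -> m^r -> y^c
        "IE_r/c": p["a_rc"] * p["b_c"],  # x^r -> m^c -> y^c
        "IE_r/r": p["a_rr"] * p["b_r"],  # x^r -> m^r -> y^c
    }


def total_effect_numeric(p: Params, h: float = 1.0) -> float:
    """Sum of finite-difference partial derivatives of y^c w.r.t. x^c and x^r."""
    base = outcome_center(0.0, 0.0, p)
    d_xc = (outcome_center(h, 0.0, p) - base) / h
    d_xr = (outcome_center(0.0, h, p) - base) / h
    return d_xc + d_xr


params = {"a_cc": 0.5, "a_cr": 0.25, "a_rc": -0.5, "a_rr": 1.0,
          "b_c": 2.0, "b_r": 0.5, "d_c": 1.0, "d_r": -0.25}
effects = decomposition(params)
total_analytic = sum(effects.values())
total_numeric = total_effect_numeric(params)
```

Because the toy system is linear, the finite differences match the analytic sum of the six terms up to floating-point rounding.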
Appendix C: Decomposition of variance for IMedA model
Considering the reduced form system \(\mathcal {S'}_2\):
the following identities hold:
after noticing that:
where \({{\mathrm{cov}}}(.)\) and \({{\mathrm{var}}}(.)\) indicate the covariance and variance operators, \(\circ \) denotes the Hadamard product, and \(\mathbf {1}_k\) is a \(k\times 1\) vector of all ones. The following properties hold:
where \(\omega _1 = \Vert {{\mathbf{y}^\mathbf{c}}} - {{\mathbf{y}^\mathbf{c}}}^* \Vert ^2 /~ \Vert {{\mathbf{y}^\mathbf{c}}} - {\overline{\mathbf{y}^\mathbf{c}}} \Vert ^2\) and \(\omega _2 = \Vert {{\mathbf{y}^\mathbf{r}}} - {{\mathbf{y}^\mathbf{r}}}^* \Vert ^2 /~ \Vert {{\mathbf{y}^\mathbf{r}}} - {\overline{\mathbf{y}^\mathbf{r}}} \Vert ^2\). Note that \({\bar{\mathbf{y^c}}}\) and \({\bar{\mathbf{y^r}}}\) are \(n\times 1\) vectors containing the mean values of \(\mathbf {{y^c}}\) and \(\mathbf {{y^r}}\), whereas \({{\mathbf{y}^\mathbf{c}}}^*\) and \({{\mathbf{y}^\mathbf{r}}}^*\) refer to the estimated reduced equations in \(\mathcal {S'}_2\) without the residual terms \({\varvec{\epsilon }}^c\) and \({\varvec{\epsilon }}^r\). Note also that the terms on the right-hand side of Eq. C4 denote the variance explained by the reduced system \(\mathcal {S'}_2\).
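As a rough illustration of a covariance-based variance decomposition and of the ratio \(\omega\) defined above, the sketch below uses a generic linear equation \(y = \sum_j \beta_j x_j + \epsilon\) rather than the reduced system \(\mathcal{S'}_2\) itself, whose identities are not reproduced here. It relies only on the linearity of covariance: the shares \(\mathrm{cov}(\beta_j x_j, y)/\mathrm{var}(y)\), together with the residual share, sum to one. The data and function names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def cov(u: Sequence[float], v: Sequence[float]) -> float:
    """Sample covariance (n - 1 denominator); cov(u, u) is the variance."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def variance_shares(columns: List[List[float]], coefs: List[float],
                    y: List[float], residuals: List[float]) -> List[float]:
    """Share of var(y) attributed to each term beta_j * x_j, then the residual."""
    var_y = cov(y, y)
    shares = [cov([b * x for x in col], y) / var_y
              for b, col in zip(coefs, columns)]
    shares.append(cov(residuals, y) / var_y)
    return shares


def omega(y: List[float], y_star: List[float]) -> float:
    """||y - y*||^2 / ||y - mean(y)||^2, as in the definition of omega."""
    y_bar = fmean(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_star))
    tss = sum((a - y_bar) ** 2 for a in y)
    return rss / tss


# Hypothetical predictors, coefficients and residuals.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
beta = [0.5, 1.5]
eps = [0.1, -0.2, 0.0, 0.2, -0.1]

y_star = [beta[0] * a + beta[1] * b for a, b in zip(x1, x2)]  # systematic part
y = [s + e for s, e in zip(y_star, eps)]                      # observed outcome

shares = variance_shares([x1, x2], beta, y, eps)
unexplained = omega(y, y_star)
```

Here `shares[:-1]` plays the role of the descriptive effect sizes, while `unexplained` is the proportion of total variation not reproduced by the systematic part.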
Appendix D: Bias-corrected and accelerated (BCa) bootstrap procedure
The bias-corrected and accelerated (BCa) bootstrap is a resampling procedure commonly adopted in mediation analysis. More precisely, Q samples (with \(Q\ge 1000\)) of size n are drawn row-wise at random (with replacement) from the original matrices \({\mathbf{M}^\mathbf{c}}\), \({{\mathbf{M}^\mathbf{r}}}\) and original vectors \({{\mathbf{y}^\mathbf{c}}}\), \({{\mathbf{y}^\mathbf{r}}}\). For the \(q\)-th sample, the mediation parameters are estimated by applying the IMedA procedure to the sample matrices \({\mathbf{M}^\mathbf{c}}_q\), \({{\mathbf{M}^\mathbf{r}}}_q\) and vectors \({{\mathbf{y}^\mathbf{c}}}_q\), \({{\mathbf{y}^\mathbf{r}}}_q\). This step is repeated Q times. The ensuing sample parameter distributions are then used to compute the standard errors or BCa-based confidence intervals (95% CIs) for every estimated parameter in the model. In what follows we briefly describe how the BCa-based CIs can be obtained. For the sake of generality, consider the i-th parameter \(\widehat{{\varvec{\gamma }}^c}_i\), with \(\widehat{{\varvec{\gamma }}^c}_i^* = \widehat{ {\varvec{\gamma }}^c}_{i1}, \widehat{ {\varvec{\gamma }}^c}_{i2}, \ldots , \widehat{ {\varvec{\gamma }}^c}_{iQ}\) denoting its empirical bootstrap distribution. The 95% BCa-based confidence interval for this parameter takes the form \([~ \widehat{ {\varvec{\gamma }}^c}^*_{i_g}, \widehat{ {\varvec{\gamma }}^c}^*_{i_v}~]\), where g and v are, in turn, computed as follows:
where \(\Phi _\mathcal {N}\) is the cumulative normal distribution function, \(z_{\alpha /2} = -1.96\) and \(z_{1-\alpha /2} = 1.96\) for \(\alpha =0.05\), and \(z_{\phi _2} = \Phi _\mathcal {N}^{-1}(\phi _2,0,1)\), with \(\Phi _\mathcal {N}^{-1}\) being the inverse cumulative normal distribution function. Note that the term \(z_{\phi _2}\) measures the median bias of the bootstrap distribution \(\widehat{ \gamma }^*_i\), where \(\phi _2\) is computed as follows:
whereas the acceleration term \(\phi _1\), which measures the rate of change of the standard deviation of \(\widehat{ \gamma }^*_i\), is computed as follows:
Note that when \(\phi _1 = 0\) and \(z_{\phi _2} = 0\) (no acceleration and no median bias), the BCa-based CIs simply reduce to the standard percentile CIs.
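The procedure described in this appendix can be sketched with the textbook BCa construction. This is a minimal illustration under stated assumptions, not the authors' implementation: it resamples a plain list of observations and applies a generic scalar `estimator` instead of re-fitting IMedA, it estimates the acceleration with a leave-one-out jackknife, and all function and variable names are hypothetical. In the code, `z0` corresponds to \(z_{\phi_2}\) and `acc` to \(\phi_1\).

```python
import random
from statistics import NormalDist, fmean
from typing import Callable, List, Tuple


def bca_interval(data: List[float],
                 estimator: Callable[[List[float]], float],
                 n_boot: int = 2000,
                 alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Bias-corrected and accelerated bootstrap CI for a scalar estimator."""
    rng = random.Random(seed)
    norm = NormalDist()  # standard normal: mean 0, standard deviation 1
    n = len(data)
    theta_hat = estimator(data)

    # Bootstrap distribution: resample observations with replacement.
    boot = sorted(estimator([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(n_boot))

    # Bias correction: normal quantile of the share of replicates below the
    # original estimate (clamped so that the inverse CDF stays finite).
    share = sum(b < theta_hat for b in boot) / n_boot
    share = min(max(share, 1.0 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(share)

    # ASSUMPTION: acceleration estimated from leave-one-out jackknife values.
    jack = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    j_bar = fmean(jack)
    num = sum((j_bar - j) ** 3 for j in jack)
    den = 6.0 * sum((j_bar - j) ** 2 for j in jack) ** 1.5
    acc = num / den if den > 0 else 0.0

    def adjusted_level(z: float) -> float:
        # With z0 = 0 and acc = 0 this is the plain percentile level.
        return norm.cdf(z0 + (z0 + z) / (1.0 - acc * (z0 + z)))

    def quantile(level: float) -> float:
        index = min(max(int(level * n_boot), 0), n_boot - 1)
        return boot[index]

    lower = quantile(adjusted_level(norm.inv_cdf(alpha / 2)))
    upper = quantile(adjusted_level(norm.inv_cdf(1 - alpha / 2)))
    return lower, upper


# Hypothetical sample; the estimator here is simply the mean.
sample_rng = random.Random(1)
sample = [sample_rng.gauss(10.0, 2.0) for _ in range(50)]
ci_low, ci_high = bca_interval(sample, fmean)
```

To use it for a mediation parameter, `estimator` would have to re-fit the model on each resampled set of rows and return the parameter of interest; that step is deliberately left out of this sketch.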
Cite this article
Calcagnì, A., Lombardi, L., Avanzi, L. et al. Multiple mediation analysis for interval-valued data. Stat Papers 61, 347–369 (2020). https://doi.org/10.1007/s00362-017-0940-6
DOI: https://doi.org/10.1007/s00362-017-0940-6