Abstract
In recent years, natural stimuli such as audio excerpts or video streams have received increasing attention in neuroimaging studies. Compared with conventional simple, idealized, and repeated artificial stimuli, natural stimuli contain more non-repeating, dynamic, and complex information that is closer to real life. However, there is no direct correspondence between such stimuli and any sensory or cognitive function of the brain, which makes it difficult to apply traditional hypothesis-driven analysis methods (e.g., the general linear model (GLM)). Moreover, traditional data-driven methods (e.g., independent component analysis (ICA)) lack quantitative modeling of the stimuli, which may limit their analytical power. In this paper, we propose a sparse representation-based decoding framework to explore the neural correlates between computational audio features and functional brain activity under free-listening conditions. First, we adopt a biologically plausible auditory saliency feature to quantitatively model the audio excerpts, and we develop a sparse representation/dictionary learning method to learn an over-complete dictionary basis of brain activity patterns. Then, we reconstruct the auditory saliency features from the learned fMRI-derived dictionaries. After that, a group-wise analysis procedure is conducted to identify the associated brain regions and networks. Experiments show that the auditory saliency feature can be well decoded from brain activity patterns by our method, and that the identified brain regions and networks are consistent and meaningful. Finally, our method is evaluated against the ICA method, and the experimental results demonstrate its superiority.
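The two core steps summarized above (learning a dictionary of temporal brain activity patterns, then reconstructing a stimulus feature from that dictionary) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the ISTA-based sparse coder, the least-squares dictionary update, the L1-penalized decoding step, the synthetic data, and all names (`sparse_code`, `learn_dictionary`, `decode_feature`, `run_toy_example`, `lam`) are assumptions chosen for illustration. The toy dictionary is also kept small rather than over-complete so that the example runs quickly.

```python
import numpy as np


def soft_threshold(x, thr):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)


def sparse_code(D, S, lam=0.1, n_steps=50):
    """ISTA for min_A 0.5*||S - D A||_F^2 + lam*||A||_1, started from A = 0."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], S.shape[1]))
    for _ in range(n_steps):
        A = soft_threshold(A - step * (D.T @ (D @ A - S)), step * lam)
    return A


def learn_dictionary(S, n_atoms, lam=0.1, n_iter=20, seed=0):
    """Alternate sparse coding and a least-squares dictionary update so S ~ D @ A.

    S is (time points x voxels); columns of D are temporal atoms and rows of A
    are the corresponding sparse spatial loadings.
    """
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((S.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        A = sparse_code(D, S, lam)
        # Dictionary update: D = S A^T (A A^T + eps I)^-1, then unit-norm atoms.
        G = A @ A.T + 1e-6 * np.eye(n_atoms)
        D = np.linalg.solve(G, A @ S.T).T
        norms = np.linalg.norm(D, axis=0, keepdims=True)
        D /= np.where(norms > 1e-12, norms, 1.0)  # leave unused (zero) atoms as they are
    return D, sparse_code(D, S, lam)


def decode_feature(D, f, lam=0.1):
    """Reconstruct a z-scored feature time course as a sparse combination of atoms."""
    f = (f - f.mean()) / f.std()
    w = sparse_code(D, f[:, None], lam)[:, 0]
    f_hat = D @ w
    return f_hat, np.corrcoef(f, f_hat)[0, 1]


def run_toy_example(seed=0):
    """Synthetic stand-in for fMRI data: one latent time course plays the 'saliency' role."""
    rng = np.random.default_rng(seed)
    n_time, n_voxels = 60, 200
    saliency = np.convolve(rng.standard_normal(n_time), np.ones(5) / 5, mode="same")
    latent = np.column_stack([saliency, rng.standard_normal((n_time, 2))])
    loadings = rng.standard_normal((3, n_voxels)) * (rng.random((3, n_voxels)) < 0.3)
    S = latent @ loadings + 0.1 * rng.standard_normal((n_time, n_voxels))
    D, A = learn_dictionary(S, n_atoms=8)
    _, r = decode_feature(D, saliency)
    return S, D, A, r


if __name__ == "__main__":
    S, D, A, r = run_toy_example()
    print("relative reconstruction error:", np.linalg.norm(S - D @ A) / np.linalg.norm(S))
    print("correlation between true and decoded feature:", r)
```

In this sketch the correlation between the true and decoded feature plays the role of a decoding accuracy, and the rows of `A` that belong to atoms with non-zero decoding weights indicate which voxels contribute to the reconstruction. A group-wise analysis would then have to aggregate such maps across subjects, which is not shown here.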
Acknowledgements
This work was supported by the National Key R&D Program of China under contract No. 2017YFB1002201. S. Zhao was supported by the Fundamental Research Funds for the Central Universities under grant 3102017zy030 and the China Postdoctoral Science Foundation under grant 2017M613206. J. Han was supported by the National Science Foundation of China under grants 61473231 and 61522207. T. Liu was supported by NIH R01 DA-033393, NIH R01 AG-042599, NSF CAREER Award IIS-1149260, NSF CBET-1302089, NSF BCS-1439051, and NSF DBI-1564736. L. Guo was supported by the National Science Foundation of China under grant 61333017.
Cite this article
Zhao, S., Han, J., Jiang, X. et al. Decoding Auditory Saliency from Brain Activity Patterns during Free Listening to Naturalistic Audio Excerpts. Neuroinform 16, 309–324 (2018). https://doi.org/10.1007/s12021-018-9358-0
DOI: https://doi.org/10.1007/s12021-018-9358-0