ABSTRACT
The speech modality is a rich source of personal information. As such, speech detection is a fundamental function of many social sensing applications. Even the mere amount of speech present in our surroundings can give indications about our sociability and communication patterns. In this work, we present and evaluate a speech detection approach utilizing dictionary learning and sparse signal representation. By transforming noisy audio data into a sparse representation using a dictionary learned from clean speech data, we show that speech and non-speech can be discriminated even in low signal-to-noise conditions with up to 92% accuracy. In addition to an evaluation with simulated data, we evaluate the algorithm on a real-world data set recorded during firefighting missions. We show that the speech activity of firefighters can be detected with 85% accuracy using a smartphone placed in the firefighting jacket.
Index Terms
- Robust voice activity detection for social sensing