tutorial

Robust voice activity detection for social sensing

Authors:
Sebastian Feese, ETH Zurich, Zurich, Switzerland
Gerhard Tröster, ETH Zurich, Zurich, Switzerland
UbiComp '13 Adjunct: Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication, September 2013, Pages 931–938, https://doi.org/10.1145/2494091.2497347

Published: 08 September 2013

ABSTRACT

The speech modality is a rich source of personal information. As such, speech detection is a fundamental function of many social sensing applications. The mere amount of speech present in our surroundings can give indications about our sociability and communication patterns. In this work, we present and evaluate a speech detection approach utilizing dictionary learning and sparse signal representation. By transforming the noisy audio data to a sparse representation with a dictionary learned from clean speech data, we show that speech and non-speech can be discriminated even in low signal-to-noise conditions with up to 92% accuracy. In addition to an evaluation with simulated data, we evaluate the algorithm on a real-world data set recorded during firefighting missions. We show that the speech activity of firefighters can be detected with 85% accuracy when using a smartphone placed in the firefighting jacket.
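
The general idea of discriminating speech from non-speech through a sparse representation can be illustrated with a small sketch. The abstract does not state the audio features, the dictionary size, the sparse coder, or the classifier, so everything below is an assumption made for illustration: a fixed set of sinusoids stands in for a dictionary learned from clean speech, plain matching pursuit is used as a generic sparse coder, and a hypothetical threshold on the fraction of frame energy explained by the sparse approximation serves as the decision rule.

```python
import math
import random

FRAME_LEN = 64  # hypothetical frame length in samples


def unit(vec):
    """Scale a non-zero vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def matching_pursuit_residual(frame, dictionary, n_atoms):
    """Greedily subtract the n_atoms best-matching unit-norm atoms from the frame."""
    residual = list(frame)
    for _ in range(n_atoms):
        # Correlate the current residual with every atom.
        corrs = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(corrs)), key=lambda k: abs(corrs[k]))
        # Remove the contribution of the best-matching atom.
        residual = [r - corrs[best] * a for r, a in zip(residual, dictionary[best])]
    return residual


def explained_energy(frame, dictionary, n_atoms=2):
    """Fraction of the frame energy captured by the sparse approximation."""
    total = sum(x * x for x in frame)
    if total == 0.0:
        return 0.0
    residual = matching_pursuit_residual(frame, dictionary, n_atoms)
    return 1.0 - sum(r * r for r in residual) / total


def is_speech(frame, dictionary, threshold=0.5):
    """Hypothetical decision rule: frames well explained by the dictionary count as speech."""
    return explained_energy(frame, dictionary) >= threshold


# Stand-in dictionary (assumption): four unit-norm sinusoids instead of atoms
# learned from clean speech data.
dictionary = [
    unit([math.sin(2 * math.pi * k * n / FRAME_LEN) for n in range(FRAME_LEN)])
    for k in (2, 4, 6, 8)
]

rng = random.Random(0)  # fixed seed so the toy data is reproducible

# Toy "speech-like" frame: a combination of two atoms plus weak noise.
speech_like = [
    dictionary[0][n] + 0.5 * dictionary[2][n] + rng.gauss(0.0, 0.02)
    for n in range(FRAME_LEN)
]
# Toy "non-speech" frame: white Gaussian noise.
noise_only = [rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]

print(is_speech(speech_like, dictionary), is_speech(noise_only, dictionary))
```

In this toy setting, a frame built from dictionary atoms is captured almost entirely by a few coefficients, whereas white noise spreads its energy across directions the dictionary does not cover. A real system would learn the dictionary from clean speech and would likely use a trained classifier on the sparse coefficients rather than a fixed threshold.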


Published in
UbiComp '13 Adjunct: Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication
September 2013
1608 pages
ISBN: 9781450322157
DOI: 10.1145/2494091
General Chairs: Friedemann Mattern (ETH Zurich, CH), Silvia Santini (TU Darmstadt, DE)
Program Chairs: John F. Canny (UC Berkeley, US), Marc Langheinrich (Università della Svizzera italiana, CH), Jun Rekimoto (University of Tokyo, JP)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
communication pattern
human behavior observation
robust speech detection
social sensing
Qualifiers
- tutorial
Acceptance Rates
UbiComp '13 Adjunct Paper Acceptance Rate: 254 of 399 submissions, 64%
Overall Acceptance Rate: 764 of 2,912 submissions, 26%