ABSTRACT
Speech signals carry information not only about the speaker's identity and the spoken language, but also about the acquisition device used to record them. It is therefore reasonable to identify the acquisition device by analyzing the recorded speech signal. To this end, random spectral features (RSFs) are proposed as an intrinsic fingerprint suitable for device identification. The RSFs are extracted from each speech signal by first averaging its spectrogram along the time axis and then projecting the resulting mean spectrogram onto a Gaussian random matrix of compatible dimensions. Applying a sparse-representation-based classifier to the device RSFs yields a state-of-the-art identification accuracy of 95.55% on a set of 8 telephone handsets from the Lincoln-Labs Handset Database (LLHDB).
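The two extraction steps described in the abstract (time-averaging the spectrogram, then a Gaussian random projection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the Hann window, the frame length and hop, the use of a magnitude spectrogram, the projected dimension, and the 1/sqrt(dim) scaling are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def mean_spectrum(signal, frame_len=256, hop=128):
    """Average the magnitude spectrogram of `signal` along the time axis.

    frame_len, hop and the Hann window are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames: (n_frames, frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude spectrogram: (n_frames, frame_len // 2 + 1)
    spectrogram = np.abs(np.fft.rfft(frames, axis=1))
    # Collapse the time axis, leaving one value per frequency bin
    return spectrogram.mean(axis=0)


def gaussian_projection(n_bins, dim, seed=0):
    """Draw a (dim, n_bins) matrix with i.i.d. zero-mean Gaussian entries.

    The 1/sqrt(dim) scaling is a common convention, assumed here.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, n_bins)) / np.sqrt(dim)


def random_spectral_features(signal, projection, frame_len=256, hop=128):
    """Project the time-averaged spectrum onto the random matrix."""
    return projection @ mean_spectrum(signal, frame_len, hop)


# Usage on synthetic noise standing in for a recorded utterance
demo_signal = np.random.default_rng(1).standard_normal(8000)
demo_projection = gaussian_projection(n_bins=256 // 2 + 1, dim=32, seed=0)
demo_rsf = random_spectral_features(demo_signal, demo_projection)
print(demo_rsf.shape)
```

The same projection matrix has to be reused for every recording so that the resulting feature vectors are comparable. Fixing the seed is one simple way to do that in this sketch.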
Index Terms
- Automatic telephone handset identification by sparse representation of random spectral features