Abstract
Human speech carries paralinguistic information that is used in many speech applications, such as automatic speech recognition, speaker recognition, and speaker verification. Detecting gender from voice is considered one of the essential tasks for such applications. To build a model from a training set, a set of relevant speech features is extracted to distinguish gender (i.e., female or male) from the speech signal. This paper compares the performance of the proposed neural network (NN) model when it is given different features extracted from the speech signal, namely mel-frequency cepstral coefficients (MFCC) and the mel spectrogram. Experiments are carried out on the Mozilla Common Voice dataset to evaluate the performance of the network. They show that the combination of the MFCC and mel feature sets gives the best accuracy, 94.32%.
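The relationship between the two feature sets can be illustrated with a minimal sketch. It uses only the Python standard library and processes a single frame: a log-mel vector is computed from the frame's power spectrum, MFCCs are obtained as the DCT-II of that vector, and the two are concatenated. Everything here is an assumption made for illustration and not the authors' pipeline: the function names, the HTK-style mel formula, the frame length, the 8 kHz sample rate, the choice of 20 mel bands and 13 coefficients, and above all the reading of "combination" as plain concatenation. In practice a library such as librosa (`librosa.feature.melspectrogram`, `librosa.feature.mfcc`) would normally do this work.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel scale (an assumed convention, not taken from the paper).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    # Naive O(n^2) DFT power spectrum of one frame; fine for a short sketch.
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_mels, n_fft, sample_rate):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    mel_max = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_hz])
    return bank

def log_mel(frame, sample_rate, n_mels):
    # Log of the mel-band energies for one frame (one mel spectrogram column).
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sample_rate)
    return [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]

def mfcc_from_log_mel(log_mel_energies, n_mfcc):
    # Unnormalised DCT-II of the log-mel energies.
    n = len(log_mel_energies)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n)
                for j, e in enumerate(log_mel_energies))
            for c in range(n_mfcc)]

def combined_features(frame, sample_rate=8000, n_mels=20, n_mfcc=13):
    # Assumed combination: MFCCs followed by the log-mel vector.
    lm = log_mel(frame, sample_rate, n_mels)
    return mfcc_from_log_mel(lm, n_mfcc) + lm

# Synthetic 200 Hz tone standing in for one frame of voiced speech.
frame = [math.sin(2 * math.pi * 200.0 * i / 8000) for i in range(256)]
features = combined_features(frame)
print(len(features))  # 13 MFCCs + 20 log-mel values = 33
```

A classifier such as the NN described above would receive one such vector per frame, or a summary over frames. How the frames are aggregated is not stated in the abstract, so that step is left out of the sketch.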
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chachadi, K., Nirmala, S.R. (2022). Voice-Based Gender Recognition Using Neural Network. In: Joshi, A., Mahmud, M., Ragel, R.G., Thakur, N.V. (eds) Information and Communication Technology for Competitive Strategies (ICTCS 2020). Lecture Notes in Networks and Systems, vol 191. Springer, Singapore. https://doi.org/10.1007/978-981-16-0739-4_70
DOI: https://doi.org/10.1007/978-981-16-0739-4_70
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0738-7
Online ISBN: 978-981-16-0739-4
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)