Optimizing Speech Emotion Recognition Using Convolutional Neural Network Technique
Keywords:
Deep Learning; Feature Extraction; Datasets; Convolutional Neural Network; Speech Emotion Recognition
Abstract
This study investigates the recognition and interpretation of emotional states from speech using technology-driven emotional analysis. Because people with mental health disorders frequently struggle to regulate their emotions, accurate emotion recognition is essential for delivering individualized care: counsellors and mental health specialists can use such technology to better understand a person's emotional state and tailor their interventions accordingly. The study focuses on deep learning methods, specifically Convolutional Neural Networks (CNNs), for Speech Emotion Recognition (SER). Despite notable progress, reducing computational complexity while attaining high accuracy remains a challenge in this field. To address these issues, the paper evaluates SER on well-known datasets, including RAVDESS, CREMA, SAVEE, and TESS. The methodology compares the performance of five distinct feature extraction strategies and identifies Root Mean Square Error (RMSE) as the most effective; using RMSE features, the CNN-based model attains accuracy scores of 93.11% and 96.07%. The model applies this feature extraction technique with Conv1D layers, enabling real-time emotion recognition. The results underscore the importance of deliberate choices in feature extraction techniques and dataset selection for improving SER accuracy. By combining CNN models with carefully optimized feature extraction, the study advances SER technology, and it also highlights the need for improved modelling techniques and further investigation of diverse datasets to address the field's remaining challenges.
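To make the pipeline described above concrete, the following is a minimal sketch of RMSE-based feature extraction feeding a Conv1D classifier. It assumes the RMSE feature corresponds to frame-wise root-mean-square energy (computed here with librosa.feature.rms) and uses Keras Conv1D layers; the frame parameters, layer sizes, and eight-class output are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers

FRAME_LENGTH = 2048   # assumed analysis window (samples)
HOP_LENGTH = 512      # assumed hop between frames (samples)
MAX_FRAMES = 128      # fixed sequence length fed to the Conv1D model
NUM_CLASSES = 8       # e.g. the 8 emotion labels in RAVDESS

def extract_rmse(path, sr=22050):
    """Load an audio clip and return a fixed-length RMS-energy sequence."""
    y, _ = librosa.load(path, sr=sr)
    rmse = librosa.feature.rms(y=y, frame_length=FRAME_LENGTH,
                               hop_length=HOP_LENGTH)[0]  # shape: (n_frames,)
    # Pad or truncate so every clip yields exactly MAX_FRAMES values.
    if len(rmse) < MAX_FRAMES:
        rmse = np.pad(rmse, (0, MAX_FRAMES - len(rmse)))
    else:
        rmse = rmse[:MAX_FRAMES]
    return rmse.reshape(MAX_FRAMES, 1)  # (time steps, channels) for Conv1D

def build_model():
    """Small Conv1D classifier over the RMS-energy sequence."""
    model = tf.keras.Sequential([
        layers.Input(shape=(MAX_FRAMES, 1)),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage sketch (wav_paths and labels are placeholders):
# X = np.stack([extract_rmse(p) for p in wav_paths])
# model = build_model()
# model.fit(X, labels, epochs=50, validation_split=0.2)
```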