In this blog, I demonstrate how to convert speech to text using Python, with the help of the SpeechRecognition library and the PyAudio library.
By Dhilip Subramanian, Data Scientist and AI Enthusiast
Speech is the most common means of communication, and most of the world's population relies on it to communicate with one another. A speech recognition system translates spoken language into text. There are many real-life examples of speech recognition systems; for example, Apple's Siri recognizes speech and transcribes it into text.
Hidden Markov Models (HMMs) and deep neural network models are used to convert audio into text. A full, detailed description of that process is beyond the scope of this blog.
The SpeechRecognition library supports several speech recognition APIs; in this blog I used the Google speech recognition API, which converts speech into text. For more details, please check the SpeechRecognition documentation.
!pip install SpeechRecognition
Steps for converting an audio file into text:
# Import the library
import speech_recognition as sr

# Initialize the recognizer class (used for recognizing the speech)
r = sr.Recognizer()

# Open the audio file as the source and read all of it into audio_text
# (record() reads the whole file; listen() would stop at the first pause)
with sr.AudioFile('I-dont-know.wav') as source:
    audio_text = r.record(source)

# recognize_google() raises RequestError if the API is unreachable and
# UnknownValueError if the speech cannot be understood, hence the exception handling
try:
    # Use Google speech recognition
    text = r.recognize_google(audio_text)
    print('Converting audio transcripts into text ...')
    print(text)
except (sr.UnknownValueError, sr.RequestError):
    print('Sorry.. run again...')
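Before sending a file to the recognizer, it can help to confirm that it is a readable, uncompressed WAV file and to see how long it is. The following is a minimal sketch that uses only Python's standard wave module; describe_wav is a hypothetical helper name chosen for illustration and is not part of SpeechRecognition.

```python
import wave


def describe_wav(path):
    """Return basic properties of an uncompressed (PCM) WAV file.

    Hypothetical helper for illustration; it is not part of SpeechRecognition.
    wave.open() raises wave.Error for files it cannot parse.
    """
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": sample_rate,
            # Duration is the number of frames divided by frames per second
            "duration_seconds": frame_count / sample_rate,
        }
```

Calling describe_wav('I-dont-know.wav') on the file used above would report its channel count, sample width, sample rate and duration, which is a quick sanity check when a transcription comes back empty.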
For example, if we want to read a French-language audio file, we need to add the language option to recognize_google(). The rest of the code remains the same. Please refer to the documentation for more details.
# Adding the French language option
text = r.recognize_google(audio_text, language="fr-FR")
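The language option takes a language tag such as "fr-FR". When switching between languages often, a small lookup can keep the tags in one place. This is only a sketch: LANGUAGE_TAGS and transcribe_in are hypothetical names, and the recognizer argument is assumed to be an sr.Recognizer() instance like r above.

```python
# Hypothetical lookup table. The tags are the ones used in this post plus
# "en-US", which recognize_google() uses when no language is given.
LANGUAGE_TAGS = {
    "english": "en-US",
    "french": "fr-FR",
    "tamil": "ta-IN",
}


def transcribe_in(recognizer, audio, language_name):
    """Transcribe audio with Google speech recognition in a named language.

    recognizer is assumed to be an sr.Recognizer(); audio is the object
    returned by its record() or listen() method.
    """
    try:
        tag = LANGUAGE_TAGS[language_name.lower()]
    except KeyError:
        raise ValueError(f"No language tag configured for {language_name!r}") from None
    return recognizer.recognize_google(audio, language=tag)
```

With the variables from the code above, the French example would become text = transcribe_in(r, audio_text, "french").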
Steps for converting microphone speech into text:
!pip install PyAudio
# Import the library
import speech_recognition as sr

# Initialize the recognizer class (used for recognizing the speech)
r = sr.Recognizer()

# Use the microphone as the source and store the captured speech in audio_text
with sr.Microphone() as source:
    print("Talk")
    audio_text = r.listen(source)
    print("Time over, thanks")

# recognize_google() raises RequestError if the API is unreachable and
# UnknownValueError if the speech cannot be understood, hence the exception handling
try:
    # Use Google speech recognition
    print("Text: " + r.recognize_google(audio_text))
except (sr.UnknownValueError, sr.RequestError):
    print("Sorry, I did not get that")
I just said “How are you?”
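Microphone input is sensitive to background noise, and listen() waits until it hears a phrase. Below is a minimal sketch of one way to handle both, using the adjust_for_ambient_noise() method and the timeout and phrase_time_limit arguments of listen() from the SpeechRecognition library. The listen_once helper and its default values are hypothetical choices for illustration, not tuned settings.

```python
def listen_once(recognizer, source, calibrate_seconds=1, timeout=5, phrase_time_limit=10):
    """Calibrate for background noise, then capture a single phrase.

    recognizer is assumed to be an sr.Recognizer() and source an opened
    sr.Microphone(). The default durations are illustrative, not tuned.
    """
    # Sample the background noise so the energy threshold fits the room
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    print("Talk")
    # timeout: seconds to wait for speech to start (listen() raises
    # WaitTimeoutError if nothing is heard in that time)
    # phrase_time_limit: maximum seconds to record once speech has started
    audio = recognizer.listen(source, timeout=timeout, phrase_time_limit=phrase_time_limit)
    print("Time over, thanks")
    return audio


# Usage sketch (needs SpeechRecognition, PyAudio and a working microphone):
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.Microphone() as source:
#       audio_text = listen_once(r, source)
```

Passing the recognizer and source in as arguments keeps the helper independent of the hardware, so the same function works with any source the library accepts.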
Again, we need to add the required language option in recognize_google(). I am speaking in Tamil, an Indian language, and adding “ta-IN” as the language option.
# Adding the Tamil language option
print("Text: " + r.recognize_google(audio_text, language="ta-IN"))
I just said “how are you” in Tamil, and it printed the text in Tamil accurately.
The Google speech recognition API is an easy way to convert speech into text, but it requires an internet connection to operate.
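Because each call goes over the network, a dropped connection surfaces as an sr.RequestError. One possible way to cope with brief outages is to retry a few times before giving up. The helper below is a generic sketch and not part of SpeechRecognition; call_with_retry and its defaults are hypothetical.

```python
import time


def call_with_retry(func, retry_on, attempts=3, delay_seconds=1.0):
    """Call func() and retry when it raises retry_on.

    Hypothetical helper: makes at most `attempts` calls, sleeping
    delay_seconds between them, and re-raises the last error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts, let the caller see the error
            time.sleep(delay_seconds)


# Usage sketch (assumes r and audio_text from the code above):
#   text = call_with_retry(lambda: r.recognize_google(audio_text), sr.RequestError)
```

Only the network error is retried here. An sr.UnknownValueError means the audio itself was not understood, so repeating the same request is unlikely to help.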
In this blog, we have seen how to convert speech into text using the Google speech recognition API. This is very helpful for NLP projects, especially those that handle audio transcript data. If you have anything to add, please feel free to leave a comment!
Thanks for reading. Keep learning and stay tuned for more!