Introduction
The spoken language identifier is a service that tries to determine the language spoken in an audio recording.
The model currently supports 8 languages: English, Spanish, Italian, French, German, Portuguese, Dutch, and Russian.
Supported audio formats: WAV, FLAC, OGG.
Technology
The model uses convolutional and recurrent neural networks trained on tens of hours of speech data. It is an end-to-end model: it takes the raw waveform as input and makes no assumptions about the phonetics or grammar of the languages considered. Instead, it learns all the relevant audio features from the data. Its output is a probability distribution over the languages the model recognizes.
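To make the idea of a probability distribution over languages concrete, here is a minimal Python sketch. It assumes the network's final layer yields one raw score per language and that a softmax turns those scores into probabilities; the softmax step, the function names, and the example scores are illustrative assumptions, not details of the actual service.

```python
import math

# The eight languages listed in the introduction.
LANGUAGES = ["English", "Spanish", "Italian", "French",
             "German", "Portuguese", "Dutch", "Russian"]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_languages(scores):
    """Return (language, probability) pairs, most likely first."""
    probabilities = softmax(scores)
    pairs = zip(LANGUAGES, probabilities)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for a single recording (made up for illustration).
example_scores = [0.2, 3.1, 1.5, 0.4, -0.7, 2.0, -1.2, 0.0]
ranking = rank_languages(example_scores)
best_language, best_probability = ranking[0]
```

Keeping the full ranking rather than only the top entry lets a caller see when two languages are close and treat the prediction with more caution.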
You can use it to classify recordings as short as 1 second and as long as a minute. Note that the longer the recording, the higher the accuracy of the prediction. For 20-second recordings the accuracy is about 95%, while for 5-second samples it is just over 80%.
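Before submitting a recording, it can help to check that its length falls inside the 1 to 60 second range mentioned above. The sketch below does this for WAV files using Python's standard wave module. The helper names and the 16 kHz sample rate of the test signal are assumptions chosen for illustration; FLAC and OGG files would need a third-party decoder and are not covered here.

```python
import io
import wave

# Duration limits taken from the text: 1 second up to a minute.
MIN_SECONDS = 1.0
MAX_SECONDS = 60.0


def wav_duration_seconds(source):
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_supported_length(seconds):
    """True if the duration lies within the supported range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


def make_silent_wav(seconds, sample_rate=16000):
    """Build an in-memory mono 16-bit WAV of silence (for demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


# Usage: a 2-second clip is inside the supported range.
clip_seconds = wav_duration_seconds(make_silent_wav(2))
clip_ok = is_supported_length(clip_seconds)
```

Given the accuracy figures above, a caller who can choose the clip length would probably prefer something closer to 20 seconds than to the 1-second minimum.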
I want it
If this technology interests you, please have a look at our API, available on RapidAPI.