Automatic language identifier
Information on the language identifier
A language identifier is an automatic classifier. It calculates the similarity of a text to previously supplied reference texts.
It creates an n-dimensional representation of the text (Vector Space Model), using the statistical properties of the byte sequences found in the text as coordinates, and does the same for each reference text. The input text therefore has a precise position in the n-dimensional space, and the reference text closest to it is the one it most resembles.
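As a rough illustration of this idea, here is a minimal Python sketch. It is not the actual implementation: it assumes byte trigram counts as the coordinates and cosine similarity as the measure of closeness, neither of which is specified above. The function names and the tiny REFERENCES table are hypothetical and exist only for this example.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping byte n-grams in the UTF-8 encoding of the text.

    Each distinct n-gram is one dimension of the vector space;
    its count is the coordinate along that dimension.
    """
    data = text.encode("utf-8")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (assumed metric)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile resembles nothing
    return dot / (norm_a * norm_b)


def identify(text: str, references: dict, n: int = 3) -> str:
    """Return the label of the reference text closest to the input text."""
    profile = ngram_profile(text, n)
    return max(
        references,
        key=lambda label: cosine_similarity(
            profile, ngram_profile(references[label], n)
        ),
    )


# Hypothetical reference texts; a real system would use far larger samples.
REFERENCES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog "
               "runs away from the house where the children are playing",
    "italian": "la volpe veloce salta sopra il cane pigro e poi il cane "
               "corre via dalla casa dove i bambini stanno giocando",
}

if __name__ == "__main__":
    print(identify("the children are playing with the dog", REFERENCES))
    print(identify("i bambini stanno giocando con il cane", REFERENCES))
```

Because the labels are arbitrary strings, the same sketch would classify by topic rather than by language if the reference texts were grouped by subject.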
Why have we developed this?
This technology is an integral part of a web spider that extracts information useful to our translators.
As an automatic classifier, it can also determine which category a document belongs to, given example documents for each category. For this reason, we also use it to classify our correspondence and to identify the topic of a written text in a language we do not understand.
I want it!
If you are interested in this technology, please read more on Translated Labs and our services for natural language processing.
Explore our experiments
The Language Identifier automatically detects the language of a written text. It can also be used to identify the topic of a written text in a language you do not understand.
What do the words airplane, bird, and helicopter have in common? This application searches for semantic relationships in a text by analyzing the statistical properties of words.
What happens when you translate an English sentence into Japanese and then back into English, over and over, as if in an infinite loop? Well, give it a try! And don't forget to share the funniest results with your friends.