Jun 19, 2015
Thanks to a collaboration with the PuntuEUS Foundation, the Twitter social network has taken the leap to the .EUS domain, which is the expression of the Basque language community. Moreover, thanks to data provided by CodeSyntax, Twitter will soon be able to identify tweets written in Basque.

But this is not the only step that Twitter (now Twitter.eus as well!) will be taking in support of the Basque community. Although Twitter's interface has been available in Basque since 2012, Twitter does not recognise Basque in individual tweets, and the language metadata is wrong: Basque tweets are routinely misidentified as Indonesian, Finnish, or some other language. Now, thanks to the collaboration with the PuntuEUS Foundation, Twitter is working to resolve this, and it will soon be able to identify tweets written in Basque. And we at CodeSyntax will assist in this effort.

Since 2011, CodeSyntax has maintained the UMAP project, which tracks and filters the activity of Basque tweeters and offers, among other data, Trending Topics specific to the Basque community, something Twitter doesn't provide by itself.



For instance, according to our data, there were 7,569 active users tweeting in Basque on Twitter in April. These active users in the community sent 653,108 tweets, of which 50.70% were in Basque. Umap performs language detection in real time, so PuntuEUS and Twitter Spain asked us for help in the form of a corpus of language-tagged tweets. They needed several thousand; we sent 200,000. We hope it will suffice.
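To put those April figures in perspective, here is a minimal Python sketch of the arithmetic behind them. The constants are the numbers quoted above; the function names are hypothetical and used only for illustration, and since the 50.70% share is already rounded, the resulting tweet count is an approximation rather than an exact figure from Umap.

```python
# Figures quoted in this article for April (not pulled from any API).
ACTIVE_USERS = 7_569
TOTAL_TWEETS = 653_108
BASQUE_SHARE_PERCENT = 50.70  # rounded to two decimals in the article


def estimate_basque_tweets(total_tweets: int, share_percent: float) -> int:
    """Approximate number of tweets in Basque, given the rounded share."""
    return round(total_tweets * share_percent / 100)


def tweets_per_user(total_tweets: int, active_users: int) -> float:
    """Average number of tweets sent per active user in the period."""
    return total_tweets / active_users


basque_tweets = estimate_basque_tweets(TOTAL_TWEETS, BASQUE_SHARE_PERCENT)
average = tweets_per_user(TOTAL_TWEETS, ACTIVE_USERS)

print(f"Approximate tweets in Basque: {basque_tweets:,}")
print(f"Average tweets per active user: {average:.2f}")
```

In other words, roughly 331,000 of the month's tweets were in Basque, and each active user sent about 86 tweets on average.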

Basque is not the only language for which CodeSyntax has developed Twitter-related language technology. We have made a similar effort for Welsh with the Ffrwti website.
