Twitter will be able to detect and mark up Basque, in collaboration with PuntuEUS and CodeSyntax

Jun 19, 2015
Thanks to a collaboration with the PuntuEUS Foundation, the Twitter social network has taken the leap to the .EUS domain, which is the expression of the Basque language community. Moreover, thanks to data provided by CodeSyntax, Twitter will soon be able to identify tweets written in Basque.
Josu Waliño of PuntuEUS and Alex Gibelalde of Twitter Spain (Pic: PuntuEUS)


But this is not the only step that Twitter (now Twitter.eus as well!) will be taking in support of the Basque community. Although Twitter's interface has been available in Basque since 2012, Twitter does not recognise Basque in individual tweets, and the language metadata is wrong: Basque tweets are always misidentified as some other language, such as Indonesian or Finnish. Now, thanks to the collaboration with the PuntuEUS Foundation, Twitter is working to resolve this, and it will soon be able to identify tweets written in Basque. We at CodeSyntax will assist in this effort.

Since 2011, CodeSyntax has maintained the UMAP project, which tracks and filters the activity of Basque tweeters and offers, among other data, Trending Topics specific to the Basque community, something Twitter does not provide itself.



For instance, according to our data, 7,569 active users tweeted in Basque on Twitter in April. These active users in the community sent 653,108 tweets, of which 50.70% were in Basque. UMAP performs language detection in real time, so PuntuEUS and Twitter Spain asked us for help in the form of a corpus of tagged tweets. They needed several thousand; we sent 200,000. We hope it will suffice.
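To give an idea of why a corpus of tagged tweets is useful, here is a minimal sketch of one common way to train a language identifier from tagged text: a character-trigram model with add-one smoothing. This is an illustration only, not the method used by UMAP or by Twitter, which this article does not describe. The function names and the six sample sentences are invented for the example; "eu" and "es" are the ISO 639-1 codes for Basque and Spanish.

```python
from collections import Counter
import math

# Hypothetical toy corpus of (text, language tag) pairs, invented for
# illustration. A real corpus would hold thousands of tagged tweets.
SAMPLE_TWEETS = [
    ("Kaixo, zer moduz zaude gaur?", "eu"),
    ("Eskerrik asko zure laguntzagatik", "eu"),
    ("Gaur eguraldi ona dago Donostian", "eu"),
    ("Hola, ¿cómo estás hoy?", "es"),
    ("Muchas gracias por tu ayuda", "es"),
    ("Hoy hace buen tiempo en Madrid", "es"),
]

def trigrams(text):
    """Return the character trigrams of a lowercased, space-padded string."""
    padded = "  " + text.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train(tagged_tweets):
    """Count trigrams per language tag from (text, tag) pairs."""
    counts = {}
    for text, tag in tagged_tweets:
        counts.setdefault(tag, Counter()).update(trigrams(text))
    return counts

def detect(text, counts):
    """Return the tag whose trigram counts give the text the highest
    add-one-smoothed log-likelihood (None if nothing was trained)."""
    vocab = len({g for c in counts.values() for g in c})
    best_tag, best_score = None, -math.inf
    for tag, c in counts.items():
        total = sum(c.values())
        score = sum(
            math.log((c[g] + 1) / (total + vocab)) for g in trigrams(text)
        )
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag

model = train(SAMPLE_TWEETS)
print(detect("Eskerrik asko", model))
```

With only six sentences the model recognises little beyond what it has already seen. The more tagged tweets it is trained on, the better it copes with short, informal text, which is why a large tagged corpus matters.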

Basque is not the only language for which CodeSyntax has developed Twitter-related language technology. We have made a similar effort for Welsh on the Ffrwti website.
 


Luistxo Fernandez

Project manager.