Twitter will be able to detect and mark up Basque, in collaboration with PuntuEUS and CodeSyntax
But this is not the only step that Twitter (now Twitter.eus as well!) will be taking in support of the Basque community. Although Twitter's interface has been available in Basque since 2012, Twitter does not recognise Basque in individual tweets, so the language metadata are wrong: Basque tweets are always misidentified as Indonesian, Finnish or some other language. Now, thanks to the collaboration with the PuntuEUS Foundation, Twitter is working to resolve this, and it will soon be able to identify tweets written in Basque. And we at CodeSyntax will assist in this effort.
Since 2011, CodeSyntax has maintained the UMAP project, which tracks and filters the activity of Basque tweeters, offering, among other data, Trending Topics specific to the Basque community, something Twitter doesn't do by itself.
For instance, according to our data, there were 7,569 active users tweeting in Basque on Twitter in April. As for the tweets themselves, these active users in the community sent 653,108 tweets, of which 50.70% were in Basque. Language detection is done in real time in Umap, so PuntuEUS and Twitter Spain asked us for help, in the form of a corpus of tagged tweets. They needed several thousand; we sent 200,000. We hope it will suffice.
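To make the figures above a little more concrete, here is a minimal sketch of how a share like that 50.70% can be computed from a corpus of language-tagged tweets. It is only an illustration: the (text, language) pair format, the `language_shares` function and the sample tweets are assumptions made up for this example, not the actual Umap code or the format of the corpus we sent.

```python
from collections import Counter


def language_shares(tagged_tweets):
    """Return the percentage of tweets per language tag.

    `tagged_tweets` is an iterable of (text, language_code) pairs.
    This pair format is a hypothetical one chosen for the example.
    """
    counts = Counter(lang for _text, lang in tagged_tweets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: 100.0 * n / total for lang, n in counts.items()}


# Invented sample data, tagged by hand for illustration only.
sample = [
    ("Kaixo, zer moduz?", "eu"),
    ("Eskerrik asko!", "eu"),
    ("Buenos días a todos", "es"),
    ("Good morning", "en"),
]

shares = language_shares(sample)
print(shares["eu"])  # share of tweets tagged as Basque, in percent

# Rough estimate of the number of Basque tweets in April, derived from
# the rounded figures quoted in the post (653,108 tweets, 50.70% Basque).
approx_basque_tweets = round(653_108 * 50.70 / 100)
print(approx_basque_tweets)
```

Because the published percentage is itself rounded, the second number is only an approximation of the real count.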
Basque is not the only language for which CodeSyntax has developed Twitter-related language technology. We have made a similar effort for Welsh on the Ffrwti website.