Mar 01, 2011
It's been a month since we launched the third version of Umap, the Welsh website Umap Cymraeg, an engine that aims to gather all tweets written in Welsh and build rich information on top of them. We've been following its deployment and reception, and we're overwhelmed. It launched with just 600 users, and its automated system has already detected almost double that number. We are also happy to see Welsh users praising the tool. Not all the credit goes to CodeSyntax, makers of Umap: a Welsh friend, Rhodri ap Dyfrig, has helped greatly with this task. Diolch.

(Image: Umap Cymraeg © cc-by-sa: codesyntax)

Our friend Rhodri has described Umap Cymraeg in posts on his website (Welsh, English), and we'll quote some of his thoughts here:

Some may not quite see the point of duplicating content from Twitter, but the aim is to provide a space in which Welsh tweets are seen in context with each other, rather than in a sea of other content. Umap creates a distinct Welsh-language space out of a very complex and busy Twitter stream.

Apart from being able to dip into a monolingual Welsh Twitter, this has several other spin-off advantages:

  • This is the first time that trends amongst Welsh-language tweets have been identified. From a mapping perspective alone this is useful, but it can also be harnessed for marketing, discovery, brand/news monitoring and other uses.
  • The "Umap's top news/links" feed creates a new and unique way of seeing what the conversation is about at any moment in time. No service has previously been able to automatically gauge what is making waves online in the Welsh language. Of course, it can also show that little is being shared, or that the range of things shared in Welsh is limited, but this is great for building a better discussion in the Welsh language. Spot the gaps, and we can then find ways to fill them.
    (Image: Umap Cymraeg news section © cc-by-sa: codesyntax)

  • Umap can work as a first point of contact for new Twitter users who wish to discover other Welsh speakers. Over 1,100 users are now being followed, with the 50 busiest ranked on a single page. This is already being used as a signpost to where the Welsh-language conversation is happening.
  • Umap can be built upon. All published tweets are archived and searchable. Twitter only displays them for a limited time, while Umap displays them until there is not enough space for them all. This archive is easily searchable and could be valuable as a research tool or as a corpus of informal Welsh-language usage. In future I hope that developers will be able to use this data for secondary applications of all sorts. After all, one of the great things about Twitter is that services like Umap can be built on top of it, allowing for many innovations.

However, Umap Cymraeg is not perfect. At present, from rough calculations and mere observation, it seems to recognise about 60-65% of Welsh-language tweets. Problems arise when tweets are short or when they mix Welsh with another language. It is, however, very good at not publishing tweets which aren't in Welsh, with relatively few English-only tweets coming through. We hope to improve this rate as we go on, but even the current rate seems acceptable for providing an overview of the public discussion in Welsh.

We hope to keep improving Umap Cymraeg. Some features, such as the News section, will be shared with the Catalan and Basque sites (as well as with possible future launches). Others need to be language-specific: in particular, we're not yet happy with the Welsh detection algorithm. More tests are coming, and we hope to produce better results soon.

If you think, as Rhodri did in Wales, that developing a Twitter filter for your language is a good idea, please contact us.
