Hatebase is a joint project of the Sentinel Project for Genocide Prevention and the Dark Data Project that is described on its website as an "online repository of structured, multilingual, usage-based hate speech". It analyzes speech and written content (including radio transcripts, transcripts of spoken web content, tweets, and articles) and identifies hate speech patterns in that content in order to predict potential regional violence.[1]
The introduction of Hatebase was announced on the Sentinel Project blog on March 25, 2013.[2][3] The initiative is led by Timothy Quinn of the Dark Data Project.[4][2]
In an article for Foreign Policy, Joshua Keating described Hatebase as follows: "There are two main features to Hatebase. The first is a Wikipedia-like interface which allows users to identify hate speech terms by region and the group they refer to. This could have some value for researchers, but Hatebase's developers are especially excited by the second main feature, which allows users to identify instances when they've heard these terms used."[5] Both that article and an article about Hatebase in Maclean's cited the example of the Rwandan genocide: in the months leading up to it, radio stations sought to dehumanize Tutsis in the eyes of Hutus by repeatedly referring to Tutsis as cockroaches.[4]
The regional and multilingual focus of the site was deemed particularly useful for identifying words that can be construed as hateful in some languages and contexts but that outsiders would not recognize. One example is the word "sakkiliya" in Sinhalese (a language spoken in Sri Lanka), used to refer to a Tamil person as 'a very unhygienic or uncultured person'.[6] Another is the Rwandan radio stations' references to Tutsis as cockroaches, which an outsider might simply take as evidence that the region was suffering from a literal cockroach infestation.[7][5] This relates to the challenge of distinguishing subtly different uses of the same or similar words, where one use connotes hate and the other does not.[5] In the context of language that equates humans with pollution or stains, this is also called the human stain problem.
Another related challenge is to control for the ambient level of casual hate speech in a society (for example, in YouTube comments): in some societies and contexts, hateful language may not be accompanied or followed by violence, whereas in others it might be. For this reason, the evidence was considered valuable only in conjunction with other evidence about the risk and threat of violence, and the project concentrated its efforts on mapping hate speech in regions with a history of violence.[5]
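The idea of controlling for an ambient level can be illustrated with a minimal Python sketch. It is not Hatebase's actual method, which the sources cited here do not describe in detail; the function names, the per-1,000-token rate, and the `ratio` threshold are all assumptions made for illustration. The sketch compares each region's current rate of lexicon-term sightings with that region's own earlier baseline, rather than with a single global cutoff.

```python
from collections import Counter

def sighting_rates(documents, lexicon):
    """Return lexicon-term sightings per 1,000 tokens for each region.

    `documents` is an iterable of (region, text) pairs and `lexicon` is a
    set of lowercase terms. Both are hypothetical inputs for illustration.
    Tokenization is a plain whitespace split, so punctuation is not handled.
    """
    hits = Counter()
    tokens = Counter()
    for region, text in documents:
        words = text.lower().split()
        tokens[region] += len(words)
        hits[region] += sum(1 for w in words if w in lexicon)
    return {r: 1000.0 * hits[r] / tokens[r] for r in tokens if tokens[r]}

def elevated_regions(current, baseline, ratio=2.0):
    """Return regions whose current rate exceeds `ratio` times their baseline.

    The threshold `ratio` is an arbitrary illustrative choice. A region with
    no baseline entry is treated as having a baseline rate of zero.
    """
    return sorted(
        region
        for region, rate in current.items()
        if rate > ratio * baseline.get(region, 0.0)
    )
```

Under this scheme, a region with a persistently high but stable rate is not flagged, while a region whose rate rises sharply relative to its own past is. Matching single words in this way does not resolve the ambiguity problem described above, since a literal and a hateful use of the same word are counted alike, and any such signal would still need to be read alongside other evidence about the risk of violence.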
Hatebase provided an application programming interface (API), which is now retired;[1] a PHP wrapper/SDK is available on GitHub.[8] Information about the API can be found at Programmable Web[9] and Mashape.[10]
The launch of Hatebase was covered in Wired magazine,[6] and the story was picked up and discussed on Slashdot.[11] Hatebase was also covered in two Canadian publications, Metro News[7] and the weekly Maclean's.[4]
Joshua Keating covered Hatebase in an article for Foreign Policy.[5] A week later, the magazine published a response letter by Gwyneth Sutherlin, a doctoral candidate at the University of Bradford, pointing out potential problems and limitations of the approach used by Hatebase.[12]
On September 10, 2019, TechCrunch published a feature about Hatebase called "Hatebase catalogues the world’s hate speech in real time so you don’t have to".[13]