HealthMap: Global news surveillance for infectious diseases
While Western countries usually have effective and comprehensive systems for monitoring infectious diseases, the countries most at risk often lack the systems for quality reporting. HealthMap is a Google-funded research project that tries to counter this by using publicly available online information from non-official sources, such as news outlets, discussion forums and blogs, to give early warnings of disease outbreaks. The system is described in a recent paper in the open access journal PLoS Medicine. It is already running, available free of charge to the public, and used by 20,000 individuals monthly.
Taken from the report, the heat map below shows A) the number of outbreaks reported in English-language sources, in raw numbers, from October 1, 2006 to July 18, 2007 and B) the reported outbreaks adjusted for population. The numbers were collected to investigate the nature of bias in the reporting of infectious diseases:

In this study, the researchers found a host of biases skewing their data:
“There was a clear bias towards increased reporting from countries with higher numbers of media outlets, more developed public health resources, and greater availability of electronic communication infrastructure (approximated by number of Internet hosts) (Figure 3B). These trends are highly relevant for users of the system, and thus the individual impact of these factors on surveillance will form the basis of a detailed user guide currently under development.”
Furthermore, they also acknowledge other potential problems:
“While local news sources may report on incidents involving a few cases that would not be picked up at the national level, such sources may be less reliable, lacking resources and training, and may report stories without adequate confirmation. Furthermore, other biases may be intentionally introduced for political reasons through disinformation campaigns (false positives) or state censorship of information relating to outbreaks (false negatives).
[..]
We found that pathogen diversity was substantial across news sources, with 141 unique infectious disease categories reported through the Google News feed alone (Table 1). We found the frequency of reports about particular pathogens to be related not to their associated morbidity or mortality impact, but rather to the direct or potential economic and social disruption caused by the outbreak.”
However, that is just the nature of researching patterns of human behavior, particularly as it is reported by news media. Identifying the bias goes a long way towards controlling for it - and as far as disease outbreaks go, knowing about the bias may be enough. Getting as many reports as possible could be sufficient, depending on what you need.
My immediate thought was that it would be interesting to track how the different diseases develop over time, to see whether one outbreak follows another, and perhaps to draw links between outbreaks as they happen. Another idea would be to look for outbreaks that are improbably close to each other, in either time or space, to detect whether a disease is spreading. The advantage of getting the data out visually like this is that people can apply a host of pattern-detection methods that only humans are capable of, and what they find can then be implemented as algorithms.
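To make the "improbably close" idea a bit more concrete, here is a rough Python sketch of how one might flag pairs of reports of the same disease that fall near each other in both space and time. Everything in it is my own assumption for illustration: the Report record, the 500 km and 14 day thresholds, and the made-up sample reports. It is not how HealthMap works, and a real analysis would also need a statistical test of what counts as "improbable".

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Report:
    """Hypothetical outbreak report: disease name, date and coordinates."""
    disease: str
    day: date
    lat: float
    lon: float


def haversine_km(a, b):
    """Great-circle distance between two reports in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km: mean Earth radius


def close_pairs(reports, max_km=500.0, max_days=14):
    """Return pairs of same-disease reports that are near in space and time.

    The thresholds are arbitrary illustration values, not epidemiological ones.
    """
    pairs = []
    for a, b in combinations(reports, 2):
        if a.disease != b.disease:
            continue
        near_in_time = abs((a.day - b.day).days) <= max_days
        if near_in_time and haversine_km(a, b) <= max_km:
            pairs.append((a, b))
    return pairs


# Made-up sample data, not real outbreaks.
reports = [
    Report("disease A", date(2007, 1, 3), 0.0, 0.0),
    Report("disease A", date(2007, 1, 10), 0.0, 1.0),     # about 111 km away, a week later
    Report("disease A", date(2007, 1, 12), 40.0, 100.0),  # same disease, far away
    Report("disease B", date(2007, 1, 5), 0.0, 0.5),      # nearby, but a different disease
]

for first, second in close_pairs(reports):
    print(first.disease, first.day, "and", second.day,
          round(haversine_km(first, second)), "km apart")
```

Comparing every pair of reports is fine for a few thousand reports but gets slow beyond that. It is only meant to show the shape of the idea.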
Then again, I’m not an epidemiologist, so it could be that there are already extremely advanced systems for this.
Read more in Wired: Google maps disease outbreaks. Furthermore, Google.org has its own blog post (Google.org is Google’s philanthropic organization).



