The 150 million tweets published on Twitter every day produce a massive bank of data. Some of it is about music, some about products, some about politics, and some about what people are eating for breakfast.
But when there’s a measles outbreak or a flu epidemic, people post information about their symptoms, their concerns, and their treatments.
Far from being useless online noise, these tweets can be used as an online tracker for disease outbreaks – in real time. By looking at online trends in patient-reported flu-like symptoms, one million anecdotes can provide useful information on the location and severity of a disease outbreak.
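As a rough illustration of the idea, the sketch below counts posts that mention flu-like symptoms, grouped by day and location. It is a minimal example under stated assumptions, not a description of how any real tracker works: the keyword list, the function names and the sample posts are all hypothetical, and plain keyword matching is far cruder than what a production system would need.

```python
from collections import Counter
from datetime import date

# Hypothetical keyword list; a real system would need a much richer vocabulary.
SYMPTOM_KEYWORDS = ("flu", "fever", "cough", "sore throat")


def mentions_symptom(text):
    """Return True if the post text contains any symptom keyword.

    This is a naive substring match, so it will also fire on unrelated
    words such as "influence" and on posts that merely discuss the flu.
    """
    lowered = text.lower()
    return any(keyword in lowered for keyword in SYMPTOM_KEYWORDS)


def count_symptom_posts(posts):
    """Count symptom-mentioning posts per (day, location) pair.

    `posts` is an iterable of (day, location, text) tuples.
    """
    counts = Counter()
    for day, location, text in posts:
        if mentions_symptom(text):
            counts[(day, location)] += 1
    return counts


# Made-up sample posts, for illustration only.
SAMPLE_POSTS = [
    (date(2011, 1, 3), "London", "Stuck in bed with a fever and a cough"),
    (date(2011, 1, 3), "London", "Great gig last night!"),
    (date(2011, 1, 3), "Paris", "I think I have the flu"),
    (date(2011, 1, 4), "London", "Sore throat again, staying home"),
    (date(2011, 1, 4), "London", "Whole office is off with flu"),
]

if __name__ == "__main__":
    daily_counts = count_symptom_posts(SAMPLE_POSTS)
    for (day, location), n in sorted(daily_counts.items()):
        print(day, location, n)
```

A rising count for one location from one day to the next is the kind of signal a tracker would watch for, although a rise in mentions is not the same thing as a rise in confirmed cases.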
Even at the height of a pandemic, when the full arsenal of health authorities’ surveillance systems has been mobilised, the data they publish is a couple of weeks behind the real trend.
This is where online data mining can help. Google Flu Trends has been shown to reveal the pattern of seasonal flu cases reported by health agencies. Official data based on confirmed cases of flu fits neatly over the chart produced by Google’s online monitoring.
This matters. Having two weeks’ notice of a pandemic gives more time to prepare and improves the likelihood that the disease can be contained.
A word of caution
However, some have cautioned against relying on Google Flu Trends or Twitter, given that influenza-like symptoms may not be caused by the flu virus. There is also the risk that a health scare will produce an online ‘buzz’ about a particular subject, even when it turns out to be a false alarm.
So, perhaps health authorities will hold off on abandoning their tried and tested networks of infectious disease reporting.
Having said that, analysing the data contained in millions of tweets is a new science and will become more precise over time. Reliability also improves as the data set grows, meaning that the increasing volume of tweets, status updates and blog posts will continue to provide a more robust information source.
The question is: should health authorities be mining social media sites for data to support their traditional disease surveillance systems?