Friday, December 30, 2011

Operational Research Consulting & Data Journalism

As data becomes more accessible and visualisation tools become more widely available and user-friendly, data journalism is heating up. I've been following the Guardian's Data Blog enthusiastically; it is full of interesting information relevant to current affairs, backed by plenty of facts and data.

This article sets out a 10-point guide to data journalism. I particularly like point 5:
Data journalism is 80% perspiration, 10% great idea, 10% output
The Prezi under point 5 explains how data is used to support news: the angles to consider when mashing datasets together, the technical challenges of working with data, and the iterative calculation and QA process, all of which finally gets turned into beautiful output using the various (mostly free) visualisation tools.

This is practically the same process that an Operational Research consulting project follows - or any application of OR, or of Science in general:
  • Understand what the problem/question is
  • Create a hypothesis to be proven or disproved
  • Define what data is needed for the quest
  • Get the data
  • Clean it, and wrangle it into a form that is usable for analysis
  • Analyse/calculate to come to some conclusion - hence proving or disproving the hypothesis
  • Compare it to subject matter experts' view on what the likely answer should be (sanity check)
  • Refine the analysis until satisfied
  • Shape the output message so it can be easily understood by the audience
  • Communicate the findings
  • Throughout the process, keep communicating with the audience to make sure they are engaged and understand (in principle) what you're trying to do, so that they are not unpleasantly surprised when the final answer is presented
  • Better yet, to ensure smooth change management if your solution is to be implemented, work closely with the end users from the start of designing the solution, and then implement and test it with them, so that they believe in the solution because they were part of the creation process.
As the Flowing Data blog points out, this is what statisticians do. I will add that this is what Science does in general. I will also say that in practice the first step, "understanding what the problem/question is", often takes 70-80% of the time. The technical 'doing' that follows, which our academic institutions thoroughly prepare us for (and that preparation is needed), is relatively easy by comparison.
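
To make the middle of that list a little more concrete, here is a minimal sketch in Python of the "clean it, analyse it, sanity check it" steps. Everything in it is made up for illustration: the records, the region names, the function names and the "plausible range" are my assumptions, not anything taken from the Guardian's or anyone else's actual workflow.

```python
from statistics import mean

# Hypothetical raw records as they might arrive from a source: (region, value).
RAW_ROWS = [
    ("North", "12.5"),
    ("north ", "14.0"),
    ("South", "n/a"),
    ("South", "9.5"),
    ("South", ""),
    ("East", "30.0"),
]


def clean(rows):
    """Normalise region names and drop rows whose value is not a number."""
    cleaned = []
    for region, value in rows:
        try:
            cleaned.append((region.strip().title(), float(value)))
        except ValueError:
            continue  # unusable value, e.g. "n/a" or an empty string
    return cleaned


def analyse(rows):
    """Compute the average value per region."""
    by_region = {}
    for region, value in rows:
        by_region.setdefault(region, []).append(value)
    return {region: mean(values) for region, values in by_region.items()}


def sanity_check(results, expected_range):
    """Return regions whose average falls outside the range experts find plausible."""
    low, high = expected_range
    return sorted(r for r, avg in results.items() if not low <= avg <= high)


averages = analyse(clean(RAW_ROWS))
# The range stands in for the subject matter experts' view (an assumed value).
suspicious = sanity_check(averages, expected_range=(5.0, 20.0))
print(averages)
print("Check with the experts before publishing:", suspicious)
```

The point of the last step is not that the flagged figure is wrong, only that it differs from what the experts expect, which is the cue to go back and refine the analysis (or the expectation) before shaping the output.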

For those interested in the how of data journalism, read this about the work that went into reporting on the 2011 London Riots. Fascinating social media analytics at work. Not easy. Impressive and very interdisciplinary.

P.S. Most of this post has been sitting as a draft since the summer, hence the references to 'old' news. It's still relevant, so why not.