Tuesday, November 5, 2013

From Operations Research to Data Science

In the last post, I wrote about how good it is to see OR linked as a skillset to data science. However, do note that OR is only one part of the DS skillset. OR ≠ Data Science. So how does an Operations Researcher transition to a Data Scientist?

The O'Reilly book I talked about in the last post briefly suggests a few things for an OR person to learn more about: some of the new Bayesian / Monte Carlo Statistics methods, broad programming skills, data warehouse architecture for big data technology, and business skills "to be able to intelligently collaborate with (or lead) others on a data science team".


For those looking to upgrade, here are my quick thoughts on where to start.

Bayesian Data Analysis: Andrew Gelman from Columbia is running a course on Bayesian Data Analysis *right now*, with Google+ Hangout sessions. Looks very interesting.

Programming skills: see my previous post on learning R and Python - the languages of data science.

Big data architecture: in my experience, first understand the layers of a normal data warehouse architecture, then broaden to the enterprise BI architecture stack, then learn about the new bits for addressing the "big" aspect. I was fortunate to have led a fairly big project in this area, and had the opportunity to work with some great data warehouse architects and enterprise BI architects and learn a ton from them. I'm not sure what the best self-learning material is other than the typical read-a-lot. Wikipedia doesn't seem to cut it, and the best material that helped me isn't publicly available. Hmm...I will have to think about this - topic for another post perhaps. In the meantime, Pivotal seems to do a fairly good job on their blog of explaining the bits of "big" data technology in simple, practical terms.

Business skills: I think this only applies to academics (sorry for the generalisation). For the practitioners, i.e. OR people working in and with businesses, that's a fundamental part of our jobs.

Operations Research is a skillset of a Data Scientist

... according to O'Reilly, yes, it is. 
This is perhaps the clearest I've seen anyone link OR to Data Science. Or perhaps, depending on how you read it...it shows that some Data Scientists are OR people. OR is a subset of Data Science skills.

Data Scientist (DS) - a very popular label that seems to be associated with people kind of like us-OR-people these days (just like "analytics" has been for the last few years), but no one is completely sure exactly what it is. As a result, many of us are reluctant to call ourselves data scientists, or don't know how to make the transition to be called one (see my next post on where to start). There is the Venn diagram, and examples from famous DS people like Nate Silver and Hilary Mason (who are identified more as statisticians than anything else), but confusion still abounds.

OR has always had a bit of an identity crisis - how many jobs have you seen with the words "operations research" in the title or description? Is "Data Science" here to help?

O'Reilly published a book, titled "Analyzing the Analyzers", based on a survey they ran in mid-2012 of people they consider data scientists, asking "how they viewed their skills, careers, and experiences with prospective employers". The book discusses the results and their implications for people in these related fields. Their goal, best summarised in their own words, is, "in the broad Analytics / Data Science / Big Data / Applied Stats / Machine Learning space, ...to define these new fields better, and we hope the results will help people such as yourself talk about how your skills and your work fit in with everyone else's."

The main result was summarised into a 5X4 matrix (credit: O'Reilly), showing where the survey respondents are in terms of skills / expertise and the label they associate themselves with. 

The skills they grouped under "Math / OR" are: Optimisation, Math, Graphical Models, Bayesian / Monte Carlo Statistics, Algorithms, Simulation. Sounds familiar indeed.

Hooray for the mention of OR as a Data Science skillset!


I recommend reading the full report for more details. Here is a summary to give you a taste:
  • Four data scientist clusters
  • Cases in miscommunication between data scientists and organisations looking to hire
  • Why "T-shaped" data scientists have an advantage in breadth and depth of skills
  • How organisations can apply the survey results to identify, train, integrate, team up, and promote data scientists
(On the last point above: the coverage isn't very comprehensive, so don't expect too much. It's more of a taster.)

Have you got what it takes to call yourself a Data Scientist? OR folks, see my next post on how to upgrade yourself (umm, didn't mean to make you sound like machines).