Monday, December 16, 2013

Hiring one Data Science unicorn is hard enough; hiring a team of them is impossible. To scale is to specialise.

Data Scientists need a large set of skills, including business know-how, modelling and mathematics, plus programming. They are as hard to find as unicorns or superheroes. I know this talent shortage first-hand. Is the solution to create more unicorns, or can we devise a better approach?

In my last role as a managing consultant in the Operations Research and Analytics team of a large global consultancy, I also ran recruitment. I personally spoke to or met 150-200 such candidates, and my recruitment team saw several times that number. I can tell you that not many of them made the cut, because they did not have all of the skills we were looking for. And we were only looking for the first 2.5 of the 5 core skill sets of a data scientist listed below. "Good luck" is what people usually offer in response to this talent-search problem, but I think we can get around the need for unicorns.

The 5 core skills of a Data Scientist 

Expanding on the data science Venn diagram, I think the following 5 skills each deserve closer attention in their own right*.
  • Business consulting (from problem definition to stakeholder and team management) :: what problem to solve
  • Analysis and modelling (maths, stats, physics, OR, engineering, etc.; note that this includes coding) :: how to solve it
  • Communication and visualisation (artistic and functional, learn the visualisation tools) :: how to tell the story
  • Data engineering (take data in, store it, push data out: computer science) :: how to get the data for the solution
  • Programming (for enterprise use at production level, software engineering, integrating into BI systems, automated decision making embedded in operational systems) :: how to make the solution useful to a wider audience

Furthermore, each of the above has its own subfields and specialties, because each is complicated in its own right. It is not possible to be very good at so many things, at least not at scale; at best, one ends up a little above mediocre in most of them. How many sportsmen or sportswomen excel at more than one sport, for example?

It's a lot to ask for one person. So, why ask just one person?

The thing is, these people all exist, have existed, and will exist. They are just separate individuals. They have labels like business analytics consultant, statistician and modeller (operations researchers included), data visualisation expert, DBA and software engineer. Yes, they are also in high demand, but they are not unicorns. If we need data scientists in droves, we need teams, not just a few geniuses.

The future I see follows the age-old relationship advice:

Don't try to change them. Instead, let's change how we work with them.

People should diversify a bit (for instance, a modeller should be able to code), but ultimately they need to specialise in something they are good at. A modeller must be able to prototype on his/her own, which requires coding skills, but s/he shouldn't be expected to produce production-ready code for large-scale applications. Similarly, asking a good modeller to do database administration and ETL tasks is a waste of talent, Hadoop or not.

Specialisation is the reason for humanity's proliferation. Therefore, I'd say it's not the people we need to change, but the system we need to set up so that such a specialised workforce can work together as a team. It's lazy for the analytics field to put its feet up and simply summon one person to provide it all.

As a starter-for-ten, I think the future of our field could be modelled on the make-up of a traditional IT project team:
  • The technical "purists": analytics modeller, data engineer, visualiser, programmer
  • The "bridge": more like a traditional business analyst
  • The "glue": project manager with business consulting skills
There will be complications to address. To name a few (topics for another post):
  • Who should start up your data science team?
  • What's the load balance? (how much of each skill to have)
  • How to coordinate the division of labour?
  • Where should they sit in the organisation?
  • How to prioritise which problems to set them working on?
  • How to manage this team?
  • What's the career path?


Where do you think the analytics field is heading?


Disclaimer:
My views are definitely biased by my background: I am a manager in business analytics consulting, trained in Operations Research and Computer Science.

* My expansion of the data science Venn diagram's 3 skills is based on various articles, such as the O'Reilly guide and Intro to DS skills, and on the requirements listed in numerous current data science job posts.