Showing posts with label Operations Research Careers. Show all posts

Thursday, January 23, 2014

Finally Some Sense on Analytics & Data Science Job Ads

After yesterday's post on the state of the debate on building data science teams (individual vs team approach), it's so refreshing to stumble onto this careers page of Civis Analytics. Great example of Analytics & data science job ads done right. This page alone makes me want to apply to work there!

They actually divide their jobs into: data scientist, engagement analyst, project manager, software engineer!

data science analytics roles done right: software engineer, data scientist, engagement analyst, project manager

How sensible. I like it! 
Nothing like the typical data science job posts, asking for "everything and the kitchen sink".

Wednesday, January 22, 2014

Building Data Science Teams: Individuals vs Team - State of the Debate So Far

Since my last article on "Hiring 1 Data Science unicorn is hard enough, a team is impossible. To scale means to specialise", similar ideas have been expressed by InformationWeek, McKinsey/HBR, and KDnuggets (here, here, here and here).

There has been a ton of great discussion. I attempt to summarise the viewpoints so far:
  • Data Scientists are supposed to have some pretty deep expertise in some pretty hard areas (see diagram). 
  • Is it possible to close this talent gap when we seem to be chasing after superheroes or unicorns? (there are some, but very few)
  • Some (44%) think there should be data science sub-specialisations (which all exist today), and have them work together in a team.
  • Others (44% too) prefer the superhero approach - individuals who have it all

Opinions so far on the approach of team vs individuals to build out a data science team are as follows:

For Team / against individuals:
  • for bigger companies
  • Easier to find all necessary skill-sets
  • Don't fall apart if an individual leaves
  • Jack-of-all trades, master of none; Deep expertise more possible in team
  • Business domain expertise & soft skills are hard to find in math/quant majors

For Individuals / against team:
  • for smaller companies (can't afford)
  • Easier to get things done (no coordination friction)
  • Automation tools will take over data engineering & cleaning from DS jobs, so can concentrate on modelling
  • Higher-ed will turn out DS superstars soon, who will have the combined maths/computing skills

A good team has both Specialists and Generalists.

DS is a field that's evolving fast, and so will these opinions.

You want an all-round DS guy/gal to get you started, or 2-3 of them who round each other off. As your team grows with demand, it will become increasingly difficult to find those all-encompassing individuals, so your team will naturally be people with 1-2 of the DS skills.

If you are still keen to know more about what data scientists do, and who they are, listen to these DS guys talk:
  • Amazon's principal engineer: John Rauser, "What is a career in big data?" - 17 minutes of a very good stepped-back view of data science.
  • Cloudera's director of data science: Josh Wills, "Life as a data scientist" - some good nuggets in there at minutes 10, 16, 25 and 52:
    • "I'm a competent statistician... I'm a competent programmer... I would not say I am good... I am capable of having a conversation at each of those fields with them..."
    • "Scientists get linear regression...but they don't get the difference between linear regression and logistic regression...or the assumptions that underlie the regression models", like normally distributed errors (residuals) for linear regression; it's more of a "mechanical" exercise to turn the crank on the data without understanding the assumptions that support the model
    • Kaggle has "done most of the hard work [for the competitors]". In my opinion, the guys who are competing are good at using the ML tools on a clean'ish data set; but it doesn't exactly test their ability to go from a business problem to a "mental model of the data required" to the type of problem to solve (segmentation, regression, etc...)
    • what stats to learn for someone from the computer engineering side of data science: "learn linear regression, t-tests, confidence intervals, binomial random variables, exponentially distributed random variables, ... the core stuff, really, really well"
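
To make the linear vs logistic regression point a bit more concrete, here is a minimal sketch in plain Python. It is my own illustration, not something from the talk: the toy data (hours studied vs pass/fail) and all function names are hypothetical. It fits a straight line and a logistic curve to the same binary outcome. The straight line happily predicts "probabilities" below 0 and above 1, while the logistic model stays between 0 and 1, which is one reason the two are not interchangeable.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(a + b*x) by gradient ascent on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        a += learning_rate * sum(errors) / n
        b += learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return a, b


# Hypothetical toy data: hours studied and whether the exam was passed (1) or not (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

lin_a, lin_b = fit_linear(hours, passed)
log_a, log_b = fit_logistic(hours, passed)


def linear_prediction(x):
    return lin_a + lin_b * x


def logistic_prediction(x):
    return sigmoid(log_a + log_b * x)


for x in (0, 4.5, 12):
    print(f"hours={x}: linear={linear_prediction(x):.2f}, "
          f"logistic={logistic_prediction(x):.2f}")
```

The gradient ascent loop is only there to keep the sketch dependency-free; in practice you would reach for a statistics package rather than hand-rolling the fit.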

P.S. After writing this all out, it sounds so obvious. But believe me, there has been so much debate around this topic, and I wanted some... sense. Go read those articles linked at the top if you want to know.

Monday, December 16, 2013

Hiring 1 Data Science unicorn is hard enough, a team is impossible. To scale means to specialise.

Data Scientists need a large set of skills, including business know-how, modelling and mathematics, plus programming. They are as hard to find as unicorns, or superheroes. I know this talent shortage first hand. Is the solution to create more unicorns, or can we devise better solutions?

In my last role as a managing consultant in the Operations Research and Analytics team of a large global consultancy, I also ran recruitment. I personally spoke to or met 150-200 such candidates, and my recruitment team saw multiples of this number, so I can tell you that not many of them made the cut. That's because they didn't have all of the skills we were looking for. And we were only looking for the first 2.5 of the 5 core skill-sets of a data scientist below. "Good luck" is what people offer to this talent-search problem, but I think we can get around the unicorns.

The 5 core skills of a Data Scientist 

Expanding on the data science venn diagram, I think the following 5 skills deserve closer attention, separately*.
  • Business consulting (from problem definition to stakeholder and team management) :: what problem to solve
  • Analysis and modelling (maths, stats, physics, OR, engineering, etc. / note this includes coding) :: how to solve it
  • Communication and visualisation (artistic and functional, learn the visualisation tools) :: how to tell the story
  • Data engineering (take data in, store it, push data out: computer science) :: how to get the data for the solution
  • Programming (for enterprise use at production level, software engineering, integrating into BI systems, automated decision making embedded in operational systems) :: how to make the solution useful to a wider audience

Furthermore, each of the above has subfields and specialties, because they are complicated in their own right. It is not possible to be very good at so many things, not at scale anyway; at best one ends up just above mediocrity. How many sportsmen/women excel at more than one sport, for example?

It's a lot to ask for one person. So, why ask just one person?

The thing is, these people all exist, have existed, and will exist. They are just separate individuals. They have labels like business analytics consultants, statisticians and modellers (operations researchers included), data visualisation experts, DBAs and software engineers. Yes, they are also talent in demand, but they are not unicorns. If we need data scientists in droves, we need a team, not just a few geniuses.

The future I see is like the age old relationship advice: 

Don't try to change them. Instead, let's change how we work with them.

People should diversify a bit, for instance a modeller should be able to code, but ultimately they need to specialise in something they are good at. A modeller must be able to prototype on his/her own, which requires coding skills, but s/he shouldn't be expected to produce production-ready code for large scale applications. Similarly, asking a good modeller to do database administration and ETL tasks is a waste of talent, Hadoop or not.

Specialisation is the reason for humanity's proliferation. Therefore, I'd say it's not the people we need to change, but the system that we need to set up to allow such a specialised workforce to team up. It's lazy for the analytics field to put up its feet and just summon one person to provide it all.

As a starter-for-ten, I think the future of our field could be modelled after the traditional IT project group make-up:
  • The technical "purists": analytics modeller, data engineer, visualiser, programmer
  • The "bridge": more like a traditional business analyst
  • The "glue": project manager with business consulting skills
There will be complications to address. To name a few...topics for another post:
  • Who should start up your data science team?
  • What's the load balance? (how much of each skill to have)
  • How to coordinate the division of labour?
  • Where should they sit in the organisation?
  • How to prioritise the problems to set them working on?
  • How to manage this team?
  • What's the career path?

Where do you think the analytics field is heading to?

My views are definitely biased by my background: I am a manager in business analytics consulting, trained in Operations Research and Computer Science.

* My expansion on the data science venn diagram's 3 skills is based on various articles, such as O'Reilly guide, Intro to DS skills, and job requirements on numerous current data science job posts.

Tuesday, November 5, 2013

From Operations Research to Data Science

In the last post, I wrote about how good it is to see OR linked as a skillset to data science. However, do note that OR is only one part of the DS skillsets. OR ≠ Data Science. How does an Operations Researcher transition to a Data Scientist?

There are a few things the O'Reilly book I talked about in the last post briefly mentions as suggestions for an OR person to learn more about: some of the new Bayesian / Monte Carlo Statistics methods, broad programming skills, data warehouse architecture for big data technology, and business skills "to be able to intelligently collaborate with (or lead) others on a data science team".

For those looking to upgrade, here are my quick thoughts on where to start.

Bayesian Data Analysis: Andrew Gelman from Columbia is running a course on Bayesian Data Analysis *right now*, with Google+ Hangout sessions. Looks very interesting.

Programming skills: see my previous post on learning R and Python - the languages of data science.

Big data architecture: in my experience, first understand the layers of a normal data warehouse architecture, then broaden to the enterprise BI architecture stack, then learn about the new bits for addressing the "big" aspect. I was fortunate to have led a fairly big project in this area, and had the opportunity to work with some great data warehouse architects and enterprise BI architects and learn a ton from them. I'm not sure what the best self-learning material is other than the typical read-a-lot. Wikipedia doesn't seem to cut it, and the best material that helped me isn't publicly available. Hmm...I will have to think about this - topic for another post perhaps. In the meantime, Pivotal seems to do a fairly good job in their blog of dumbing down the explanation of the bits for "big" data technology in some practical terms.

Business skills: I think this only applies to academics (sorry for the generalisation). For the practitioners, i.e. OR people working in and with businesses, that's a fundamental part of our jobs.

Operations Research is a skillset of a Data Scientist

... according to O'Reilly, yes, it is. 
This is perhaps the clearest I've seen anyone link OR to Data Science. Or perhaps, depending on how you read it, it shows that some Data Scientists are OR people. OR is a subset of Data Science skills.

Data Scientist (DS) - a very popular label that seems to be associated with people kind of like us-OR-people these days (just like "analytics" has been for the last few years), but no one is completely sure exactly what it is. As a result, many of us are reluctant to call ourselves a data scientist, or don't know how to make the transition to be called one (see my next post on where to start). There is the Venn diagram, and examples from famous DS people like Nate Silver and Hilary Mason (who are identified more as statisticians than anything else), but confusion still abounds.

OR has always had a bit of an identity crisis - how many jobs have you seen with the words "operations research" in the title or description? Is "Data Science" here to help?

O'Reilly published a book, titled "Analyzing the Analyzers", which discusses the results and implications for people in these related fields, based on a survey they ran in mid-2012 of people they consider to be data scientists, covering "how they viewed their skills, careers, and experiences with prospective employers". Their goal, best summarised in their own words, is, "in the broad Analytics / Data Science / Big Data / Applied Stats / Machine Learning space, define these new fields better, and we hope the results will help people such as yourself talk about how your skills and your work fit in with everyone else's."

The main result was summarised into a 5X4 matrix (credit: O'Reilly), showing where the survey respondents are in terms of skills / expertise and the label they associate themselves with. 

The list of skills they grouped under "Math / OR" are: Optimisation, Math, Graphical Models, Bayesian / Monte Carlo Statistics, Algorithms, Simulation. Sounds familiar indeed.

Hooray for the mention of OR as a Data Science skillset!

I recommend reading the full report for more details. Here is a summary to give you a taste:
  • Four data scientist clusters
  • Cases in miscommunication between data scientists and organisations looking to hire
  • Why "T-shaped" data scientists have an advantage in breadth and depth of skills
  • How organisations can apply the survey results to identify, train, integrate, team up, and promote data scientists
(The last point above: it wasn't too comprehensive, so don't expect too much. More of a taster.)

Have you got what it takes to call yourself a Data Scientist? OR folks, see my next post on how to upgrade yourself (umm, didn't mean to make you sound like machines).

Friday, October 25, 2013

How to Learn Python and R, the Data Science Programming Languages, from Beginner to Intermediate and Advanced

The Data Science programming / analytics languages to know are R and Python. If you're in Operations Research or another analytics field that somewhat fits under the "Data Science" hat, you: a) already know them really well, b) want to brush up on them, or c) probably should learn them now. Here I compile my thinking on how to learn R and Python from Beginner to the Intermediate and Advanced levels, based on having tried some of these course materials.

Beginner (doing basic analysis)


Computing for Data Analysis on Coursera and Youtube (weeks 1, 2, 3, 4), by Roger Peng from Johns Hopkins University

  • Summary: It covers the basics: conditional and loop structures, R's syntax, debugging, Object Oriented Programming, and performing basic tasks with R, such as importing data, basic statistical analysis, plotting and regular expressions. See syllabus for more.
  • Time commitment: 11~36 hours total, including: 
    • non-programmers: 4 weeks X [3 hours/week on video + 2~6 hours/week on exercises]
    • programmers: [3 hours of notes reading + 8~16 hours on exercises]
  • Advice for: 
    • non-programmers: Listen to all lectures (videos), make sure you understand all details, and do all the exercises to hone your skills. Programming is all about practicing. Doing the exercises is important. See below for "Advanced".
    • programmers: Don't bother with the videos, go straight to the lecture notes (link). Read the notes - much faster than the videos. If you don't understand something, look up the video and watch, or google the topic. Then do all the exercises. You don't need me to tell you that practice is king (um, and cash too).

The swirl package within R, by the Biostatistics team at Johns Hopkins University
  • Summary: It aims to teach R and Statistics within the R environment itself, through a package called swirl. See the announcement here for more detailed info.
  • I haven't tried this, so I'm not sure how much time it takes or how good it is. However, I think it sounds pretty good, and deserves a mention. I was never a fan of reading books to learn a programming language. Show me the code, or in this case, let me write the code, and get involved, is much more, well, involving.


Google's Python course (link)
  • Summary: It's straight to the meat, no-nonsense stuff, and covers all the important things. Suits my style. Enough said, so see the course page for the syllabus.
  • Time commitment: 8-10 hours
    • including reading notes and doing exercises
  • Note, this is for experienced programmers. There are videos too, but don't bother. The notes on the course page are the same, and it always takes less time to read than watch.

Intermediate (building analytical models)


Data Analysis with R on Coursera and Youtube (plus class notes), by Jeff Leek from Johns Hopkins University
  • Summary: It covers the full modelling cycle, from getting data, to structuring the analysis pipeline, exploring with graphs and statistical analysis, modelling (clustering, regression and trees), and model checking with simulation. It also talks about important statistical watch-outs like p-values, confidence intervals, multiple testing and bootstrapping. More syllabus here.
  • Time commitment: 32~56 hours
    • including 8 weeks X [2~3 hours/week videos + 2~4 hours/week exercises]

Forecasting using R (link), by Rob Hyndman from Monash University in Australia and Revolution Analytics (the enterprise R solution)
  • Summary: topics include "seasonality and trends, exponential smoothing, ARIMA modelling, dynamic regression and state space models, as well as forecast accuracy methods and forecast evaluation techniques such as cross-validation. Some recent developments in each of these areas will be explored" (quoted from course site). Read more there.
  • Note: I haven't done this (just started), so I'm not sure about its time requirement or quality. I'm also not sure if they are planning to make the lectures available. Time will tell on these questions.

Python / Octave:

Machine Learning on Coursera, by Andrew Ng from Stanford University --> My Favourite!
  • Summary: The course actually teaches in the Octave language, but it all can be done in Python. I suppose you can do it twice, first in Octave, and then in Python, if you've got the time. It certainly would solidify your understanding of the material, and Andrew Ng is sure that Octave is rather important in Machine Learning. It assumes some prior knowledge of linear algebra and probability, and refreshes you on some basics. "Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI)." (quoted from the course website)
  • Time commitment: 50~90 hours
    • including 10 weeks X [2~3 hours/week videos + 3~6 hours/week exercises]
  • Note: this course covers a subset of the statistical and modelling principles from the Data Analysis with R course above, but the overall level is more advanced. I enjoyed this course the most.
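
If you want to sanity-check the time commitments quoted above, the arithmetic is simply weeks multiplied by weekly hours. Here is a small sketch that tallies the figures from this post; the variable and function names are just my own illustration, and the numbers are my estimates from above, so adjust them to your own pace.

```python
# (course, weeks, (min, max) video hours per week, (min, max) exercise hours per week)
# All figures are the estimates quoted in this post.
courses = [
    ("Computing for Data Analysis (non-programmers)", 4, (3, 3), (2, 6)),
    ("Data Analysis with R", 8, (2, 3), (2, 4)),
    ("Machine Learning", 10, (2, 3), (3, 6)),
]


def total_hours(weeks, video, exercises):
    """Return the (low, high) total hours for a course."""
    low = weeks * (video[0] + exercises[0])
    high = weeks * (video[1] + exercises[1])
    return low, high


totals = {name: total_hours(weeks, video, exercises)
          for name, weeks, video, exercises in courses}

for name, (low, high) in totals.items():
    print(f"{name}: {low}~{high} hours")

# Rough budget for doing all three courses back to back.
grand_low = sum(low for low, _ in totals.values())
grand_high = sum(high for _, high in totals.values())
print(f"All three: {grand_low}~{grand_high} hours")
```

So working through all three as a non-programmer comes to somewhere between roughly 100 and 180 hours, before counting Google's Python course.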

Advanced (you follow the drift from above)

Advanced = Experienced.
This is true for programming, analytics, and learning any foreign language.

"Just do it", is how you get experienced.

There is no course on this stuff (i.e. being advanced), not without a PhD _plus_ years of field work.

My best suggestion is to use your curiosity. Find a problem. Dig into it.

Plus, work with other people that are really good.

Happy learning!

Sunday, December 30, 2012

Coursera and the analytics talent gap

It's been a while, and ThinkOR is back to blogging about Operational Research and its related themes.

ThinkOR authors are about to start on 3 Coursera courses over the next couple months:

I am not only learning about some new topics for my own benefit, but also interested in assessing how such easily accessible courses could help the so-called 'big data and analytics talent gap' in businesses. As a Business Analytics consultant, this is one of the biggest issues I see my clients facing in today's business world - one wouldn't think about it, if they don't know about it, and once they know about it, they don't know how to get more of it. Obviously, there would need to be some sort of a step progression, such as (just an example without much research at this point):
  1. Statistics One
  2. Data Analysis (with R) and/or Computing for Data Analysis
  3. some sort of programming course, check the computing course catalogue
  4. Focus on one or several of the main OR techniques and their associated tools, such as Discrete Event Simulation, Monte Carlo Simulation, Optimisation, Forecasting, Machine Learning, and the good old Volumetric Modelling, as some examples
  5. and if you are going to work with humongous data sets, Intro to Data Science sounds like a reasonable way to become familiar with the various big data technologies used to apply data science (I suspect this often eludes traditional OR practitioners)
As ThinkOR goes along, we will be blogging about these courses and our learning experience. So far, there has only been very positive feedback. Let's get going!

Merry Christmas and Happy New Year!

Friday, January 30, 2009

ThinkOR's authors looking for Operational Research positions in Europe

ThinkOR's authors are looking for exciting Operational Research and Operations Management work opportunities in western Europe. Aleksey and Dawen are moving from Canada to Europe to further and broaden their work and life experiences.

We are flexible in the cities we reside in and the industries we work in, so long as the problems are interesting and we can contribute to the development of an exciting organization. If you or someone you know is looking to hire English-speaking OR consultants, please contact Dawen at dawen[dot]peng[at]gmail[dot]com and/or Aleksey at aleksey[dot]nozdrynplotnicki[at]gmail[dot]com. Our high-level resumes are available on LinkedIn (Aleksey and Dawen). Detailed resumes and references are available upon request. Your help is greatly appreciated. Advice on OR job hunting in Europe is also welcome.

Since we will be in Europe, if you'd like to meet and chat, we'd certainly be glad to meet more fellow Operational Research professionals. Just shoot over an email to connect.

Thursday, January 22, 2009

Operational Researchers and Industrial Engineers - Top 10 Happiest Professionals

As Operational Research professionals, the kind of work we do is pretty exhilarating. Don't you agree?

Recently at work (a major health care authority), my team did some analysis of the emergency department visits trend. We presented our findings and communicated our recommendations to the senior management based on thorough quantitative analysis. I left the boardroom thinking "geez, who knows when the recommendations will be taken seriously, but if only they would". Then a week later (a week only, can you imagine?!), to my surprise, we heard about the ED changing the physician coverage based on our suggestions and analysis. To take it one step further, they have voluntarily asked for continuous measurement and reporting to see how this has impacted the ED operations. It made my day! However, I should emphasize that we are lucky to be working with progressive clinical staff who are open to quickly trying new suggestions. Cooperative clients make a very happy OR consultant.

It feels good to be useful. I'm sure this is a feeling shared by many other OR professionals. Naturally, we enjoy our line of work, and we are one (or two) of America's top 10 happiest professionals.

At number 7:
Science technicians
Job Description: Use principles and theories of science and mathematics to solve problems in research and development, and to help invent and improve products and processes.

Very happy: 51.0%
Median salary (research scientists): $72,435

At number 9:
Industrial engineers
Job Description: Design, develop, test, and evaluate integrated systems for managing industrial production processes.

Very happy: 48.4%
Median salary: $61,729
So we have some frustrating moments (a lot of them, actually), but when it works according to design, it puts a big smile on my face. :)

Saturday, August 9, 2008

OR Career Path Talk by Jason Goto

Jason Goto came to the University of British Columbia and gave an informal talk & Q&A on career path in Operations Research as part of the INFORMS UBC Student Chapter event series. It was a very open dialogue appreciated by the audience. Here are some highlights from this talk.

Job Market:
Work opportunities in operations research locally in Vancouver are rather limited. They include
  • The big health authorities: Fraser Health Authorities, Coastal Health Authorities, BC Cancer Agency, etc.
  • and maybe some engineering firms, such as Sandwell Engineering
In particular, an OR professional looking for good job opportunities should consider relocating to the east coast or the States. However, we live in Vancouver as a lifestyle choice, so if that's clear, prepare to sacrifice pay and business opportunities.

Jason stressed the importance of a critical mass in an OR group to sustain an OR operation and presence within an organization. If an organization has only a few OR professionals, they may not be able to achieve enough to show the importance of OR; and if someone leaves or goes on vacation, things grind to a halt and will take a long time to get back on track.

OR Consulting:
If an OR professional is thinking of going into consultancy by joining a consulting agency, then those companies will value the consulting/business/soft skills much more than your technical OR skills. However, an OR professional should possess the following skill set:
  • data skills
  • consulting & communication skills (written & verbal)
  • change management
  • empathy - put yourself in others' shoes to help them understand your view
There are quite a few very small OR consulting companies with 1 to 3 people. Mostly they are academics doing consulting on the side. These small outfits don't tend to grow, because it is simply easier to do the work with only a few people, especially if you want the work to be done well, without much administration and supervision.

Jason's operations research consulting company, AnalysisWorks, incorporated in 2000, has been growing 25% a year. He has groomed it to an 8 person outfit - a steady, unaggressive growth since the start.

Starting a consulting company, the first year is the hardest. Everyone is against you if you have no credentials or portfolios to show. It is difficult to get the consulting projects because of that. Especially if you look too young (if you are starting out early), people don't take you seriously. You wonder if you are getting your market value. However, on the other hand, if you start the company when you are older, there are elements pressing against you as well: family, dog, house, pension, etc.  Some people may choose to start a company in groups. This requires careful consideration and an early agreement on who does what, just like in a marriage. If one partner is good at selling and the other good at doing, the two must agree on how they will operate together and the compensation scheme. Otherwise, break-ups could be very bad - again, just like marriages.

In general, it is difficult to get the OR consulting projects. If 10 companies are contacted, 1 may come back with some interest. Most people think the work is good, but do not think it absolutely necessary. The good clients are the ones that really, really need your help, because otherwise their jobs and the company's survival are on the line.

Over-delivery is a consultant's own loss. Clients may have been more than happy with the less than perfect solution, compared to a perfect solution which could have taken hours of the consultant's time - exhausting the budget that way.