Showing posts with label Starting up in Operational Research. Show all posts

Friday, October 25, 2013

How to Learn Python and R, the Data Science Programming Languages, from Beginner to Intermediate and Advanced

The Data Science programming / analytics languages to know are R and Python. If you're in Operations Research or another analytics field that somewhat fits under the "Data Science" hat, you: a) already know them really well, b) want to brush up on them, or c) probably should learn them now. Here I compile my thinking on how to learn R and Python from the Beginner to the Intermediate and Advanced levels, based on having tried some of these course materials.

Beginner (doing basic analysis)


Computing for Data Analysis on Coursera and Youtube (weeks 1, 2, 3, 4), by Roger Peng from Johns Hopkins University

  • Summary: It covers the basics of conditional and loop structures, R's syntax, debugging and Object Oriented Programming, as well as performing basic tasks with R, such as importing data, basic statistical analysis, plotting and regular expressions. See syllabus for more.
  • Time commitment: 11~36 hours total, including: 
    • non-programmers: 4 weeks X [3 hours/week on video + 2~6 hours/week on exercises]
    • programmers: [3 hours of notes reading + 8~16 hours on exercises]
  • Advice for: 
    • non-programmers: Listen to all lectures (videos), make sure you understand all details, and do all the exercises to hone your skills. Programming is all about practicing. Doing the exercises is important. See below for "Advanced".
    • programmers: Don't bother with the videos, go straight to the lecture notes (link). Read the notes - much faster than the videos. If you don't understand something, look up the video and watch, or google the topic. Then do all the exercises. You don't need me to tell you that practice is king (um, and cash too).

The swirl package within R, by the Biostatistics team at Johns Hopkins University
  • Summary: It aims to teach R and Statistics within the R environment itself, through a package called swirl. See the announcement here for more detailed info.
  • I haven't tried this, so I'm not sure how much time it takes or how good it is. However, I think it sounds pretty good, and deserves a mention. I was never a fan of reading books to learn a programming language. Being shown the code, or in this case, writing the code yourself and getting involved, is much more, well, involving.


Google's Python course (link)
  • Summary: It's straight to the meat, no-nonsense stuff, and covers all the important things. Suits my style. Enough said, so see the syllabus on the course page.
  • Time commitment: 8-10 hours
    • including reading notes and doing exercises
  • Note, this is for experienced programmers. There are videos too, but don't bother. The notes on the course page are the same, and it always takes less time to read than watch.

Intermediate (building analytical models)


Data Analysis with R on Coursera and Youtube (plus class notes), by Jeff Leek from Johns Hopkins University
  • Summary: It covers the full modelling cycle, from getting data, to structuring the analysis pipeline, exploring with graphs and statistical analysis, modelling (clustering, regression and trees), and model checking with simulation. It also talks about important statistical watch-outs like p-values, confidence intervals, multiple testing and bootstrapping. More syllabus here.
  • Time commitment: 32~56 hours
    • including 8 weeks X [2~3 hours/week videos + 2~4 hours/week exercises]

Forecasting using R (link), by Rob Hyndman from Monash University in Australia and Revolution Analytics (the enterprise R solution)
  • Summary: topics include "seasonality and trends, exponential smoothing, ARIMA modelling, dynamic regression and state space models, as well as forecast accuracy methods and forecast evaluation techniques such as cross-validation. Some recent developments in each of these areas will be explored" (quoted from course site). Read more there.
  • Note: I haven't done this (just started), so I'm not sure about its time requirement or quality. I'm also not sure if they are planning to make the lectures available. Time will tell on these questions.

Python / Octave:

Machine Learning on Coursera, by Andrew Ng from Stanford University --> My Favourite!
  • Summary: The course actually teaches in the Octave language, but it all can be done in Python. I suppose you can do it twice, first in Octave, and then in Python, if you've got the time. It certainly would solidify your understanding of the material, and Andrew Ng is sure that Octave is rather important in Machine Learning. It assumes some prior knowledge of linear algebra and probability, and refreshes you on some basics. "Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI)." (quoted from the course website)
  • Time commitment: 50~90 hours
    • including 10 weeks X [2~3 hours/week videos + 3~6 hours/week exercises]
  • Note: this course covers a subset of the statistical and modelling principles from the Data Analysis with R course above, but the overall level is more advanced. I enjoyed this course the most.
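As a quick sanity check, the time-commitment ranges quoted for the three Coursera courses above can be re-derived from their weekly figures. The Python sketch below is only illustrative: the helper name `hours_range` is made up for this post, and it assumes every week of a course costs the same number of hours.

```python
def hours_range(weeks, video, exercises):
    """Return the (low, high) total hours for a course.

    `video` and `exercises` are (low, high) hours per week.
    Hypothetical helper, not part of any course material.
    """
    low = weeks * (video[0] + exercises[0])
    high = weeks * (video[1] + exercises[1])
    return low, high


# Weekly figures copied from the time-commitment bullets above.
computing_non_programmers = hours_range(4, (3, 3), (2, 6))   # (20, 36)
data_analysis = hours_range(8, (2, 3), (2, 4))               # (32, 56)
machine_learning = hours_range(10, (2, 3), (3, 6))           # (50, 90)

# Programmers skip the videos: 3 hours of notes plus 8~16 hours of exercises in total.
computing_programmers = (3 + 8, 3 + 16)                      # (11, 19)

# The overall range for Computing for Data Analysis spans both groups.
computing_overall = (
    min(computing_programmers[0], computing_non_programmers[0]),
    max(computing_programmers[1], computing_non_programmers[1]),
)                                                            # (11, 36)

print("Computing for Data Analysis:", computing_overall)
print("Data Analysis with R:", data_analysis)
print("Machine Learning:", machine_learning)
```

The printed ranges match the 11~36, 32~56 and 50~90 hours quoted in the bullets.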

Advanced (you follow the drift from above)

Advanced = Experienced.
This is true for programming, analytics, and learning any foreign language.

"Just do it" is how you get experienced.

There is no course on this stuff (i.e. being advanced), not without a PhD _plus_ years of field work.

My best suggestion is to use your curiosity. Find a problem. Dig into it.

Plus, work with other people who are really good.

Happy learning!

Tuesday, August 27, 2013

More MOOC on Analytics - Coursera

A horde of analytics-related Massive Open Online Courses (MOOCs) is about to start in September. Take your pick of what to learn. Having taken a few Coursera courses now, I would recommend: 1) Don't take too many courses at once, however tempting it is to sign up to all of them, unless you have no other work or projects on the go. This is just to make sure you have a reasonable load and are able to devote enough of your attention to learning the material properly. 2) Make good use of the discussion forums, as they are both a good source of clarifications and a window into other people's perspectives on the material. 3) Do the exercises, programming assignments and quizzes to ensure your understanding of the material.

Linear and Integer Programming
Starts 2 Sept 2013, 9 weeks, 5-7 hours/week
(the basics of mathematical optimisation, a core toolkit in the field of Operations Research)

Statistics One
Starts 22 Sept 2013, 12 weeks, 5-8 hours/week

Introduction to Recommender Systems
Starts 3 Sept 2013, 14 weeks, 4-10 hours/week

Computing for Data Analysis
Starts 23 Sept 2013, 4 weeks, 3-5 hours/week
As I've written before here.

Web Intelligence and Big Data
Starts 26 Aug 2013, 12 weeks, 3-4 hours/week

Thinking Again: How to Reason and Argue
Starts 26 Aug 2013, 12 weeks, 5-6 hours/week
Perhaps a bit off topic, but perhaps not, since all analytics are more or less rooted in proving or disproving arguments, so we better learn how to do it well.
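To put a number on the "reasonable load" advice, here is a rough Python sketch that adds up the weekly hours if you signed up to all six courses listed above. The function names are made up for illustration, and it assumes each course runs for consecutive weeks from its start date at the advertised hours per week.

```python
from datetime import date, timedelta

# (course, start date, length in weeks, min hours/week, max hours/week),
# copied from the list above.
courses = [
    ("Linear and Integer Programming", date(2013, 9, 2), 9, 5, 7),
    ("Statistics One", date(2013, 9, 22), 12, 5, 8),
    ("Introduction to Recommender Systems", date(2013, 9, 3), 14, 4, 10),
    ("Computing for Data Analysis", date(2013, 9, 23), 4, 3, 5),
    ("Web Intelligence and Big Data", date(2013, 8, 26), 12, 3, 4),
    ("Thinking Again: How to Reason and Argue", date(2013, 8, 26), 12, 5, 6),
]


def weekly_load(selected):
    """Return (low, high) hours per week when all selected courses run at once."""
    low = sum(course[3] for course in selected)
    high = sum(course[4] for course in selected)
    return low, high


def overlap_window(selected):
    """Return (start, end) of the period in which every selected course is running.

    Assumes no breaks: a course ends `weeks` weeks after its start date.
    """
    latest_start = max(course[1] for course in selected)
    earliest_end = min(course[1] + timedelta(weeks=course[2]) for course in selected)
    return latest_start, earliest_end


low, high = weekly_load(courses)
start, end = overlap_window(courses)
print(f"All six courses overlap from {start} to {end}: {low}-{high} hours/week")
```

On these figures, taking everything means roughly 25 to 40 hours a week for the four weeks from 23 Sept, when all six courses overlap, which is close to a full-time job.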

Related articles:
Coursera and the Analytics Talent Gap
Starting up in Operational Research: What Programming Languages Should I Learn?

Monday, July 29, 2013

Learn R with Coursera for Data Analysis

Heads up: the Computing for Data Analysis course is running in September 2013.

It will teach you the R language for data analysis. The course is described as:
This course is about learning the fundamental computing skills necessary for effective data analysis. You will learn to program in R and to use R for reading data, writing functions, making informative graphs, and applying modern statistical methods. 
In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment, discuss generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing which includes programming in R, reading data into R, creating informative data graphics, accessing R packages, creating R packages with documentation, writing R functions, debugging, and organizing and commenting R code. Topics in statistical data analysis and optimization will provide working examples.

Related articles:
Coursera and the Analytics Talent Gap
Starting up in Operational Research: What Programming Languages Should I Learn?

Sunday, December 30, 2012

Coursera and the analytics talent gap

It's been a while, and ThinkOR is back to blogging about Operational Research and its related themes.

ThinkOR authors are about to start on 3 Coursera courses over the next couple months:

I am not only learning about some new topics for my own benefit, but am also interested in assessing how such easily accessible courses could help close the so-called 'big data and analytics talent gap' in businesses. As a Business Analytics consultant, I see this as one of the biggest issues my clients face in today's business world - they don't think about it if they don't know about it, and once they know about it, they don't know how to get more of it. Obviously, there would need to be some sort of a step progression, such as (just an example without much research at this point):
  1. Statistics One
  2. Data Analysis (with R) and/or Computing for Data Analysis
  3. some sort of programming course, check the computing course catalogue
  4. Focus on one or several of the main OR techniques and their associated tools, such as Discrete Event Simulation, Monte Carlo Simulation, Optimisation, Forecasting, Machine Learning, and the good old Volumetric Modelling, as some examples
  5. and if you are going to work with humongous data sets, Intro to Data Science sounds like a reasonable way to become familiar with the various big data technologies used to apply data science (I suspect this often eludes traditional OR practitioners)
As ThinkOR goes along, we will be blogging about these courses and our learning experience. So far, there has only been very positive feedback. Let's get going!

Merry Christmas and Happy New Year!

Saturday, August 22, 2009

"Doing good with good OR" - it's not just academic

ThinkOR reader, Tina asks:
I recently graduated from college and am considering going to graduate school to study Operations Research because it is a subject I really like. There's something strangely satisfying about improving the real world with mathematical models. However, I am conflicted about what kind of career path a masters in OR would put me on. In an admittedly naive way, I want to use my education to improve our society. I think that OR can be better applied to many social services to improve efficiency. However, are there currently opportunities like this available? It seems like most of the job market (at least in the US) is for market analysis...something I don't know I'd want to devote my life to, not that there's anything wrong with that.

There is an issue of Interfaces coming out about the sort of thing I'd be interested in doing--"Doing good with good OR", but the contributors so far are all academic. Is this the main option for this kind of research? I would hate to spend two more years getting a masters degree, only to find out that the kind of job I'm looking for doesn't really exist.

ThinkOR's reply to Tina's concerns on non-academic careers in Operations Research that would do good in our society (outside of finance):

It is true that OR can be applied to many social (or non-social) services to improve efficiency, because as long as there is a process in place, OR can be applied to it. The question is to what degree it would help - is the ROI worthwhile? You are right that some of the "Doing Good with Good OR" work seems research oriented. However, I would disagree that market analysis is the only 'career' for OR graduates out there. In fact, health care is the biggest employer of my graduating class in Vancouver, Canada. It is my understanding that health care is employing OR folks more and more in North America, so there you go, a very valid social/public service that is using OR to improve our society.

Also, this website lists quite a few other real-world examples of using OR to do good, some of which are certainly for the good of our society:
  • evacuation planning
  • cancer therapy
  • acquisition prioritization
  • dispatching service vehicles
  • delay management in public transportation
  • design of a house for disabled persons
  • hub location in cargo applications
  • production resetting optimization
  • optimization of the collection and disposal of recyclable waste
See this website "24 Hours Operations Research - operations research clock" for more details on the above projects.

Sunday, June 7, 2009

Starting up in Operational Research: Should I be a generalist or a specialist?

This is part 2 of 3 of the mini-series on "Starting up in Operational Research".

Question 1: What programming languages should I learn?

Question 2: Should I be a generalist or a specialist as an Operational Research professional?
"As an Operational Research professional, are you usually viewed as a "jack of all trades" or do you usually have to specialize in one area like marketing, government, military, logistics, etc.?"

The short answer is:
First of all, there are two different types of "specialization" in Operational Research: industry specialization, and OR technique specialization. When you are a student at the master's level, you cannot afford to specialize in either industry or technique, because there is so much to learn, and it is all somewhat important. However, once you start working as an OR professional, because of the nature of your work / organization, you will almost be forced to specialize in an industry, such as marketing, healthcare, defense, logistics, mining, energy, etc. Personally, though, I would not corner myself into specializing in an OR technique, such as optimization, forecasting, simulation, etc., unless I were an academic. This is because of 'what-if' scenarios for your career. As an OR professional, if you specialize in a technique, you may pigeon-hole yourself into one type of job, which will be difficult to move away from if you ever want to. For example, what if you wanted a change from doing simulation models? Personally, I think specializing in one OR technique could quickly get boring, but that may not be the case for everybody.

Now, let me elaborate a bit more on the above:
As a student of Operational Research (a.k.a. "Operations Research" in North America), there simply isn't enough time to specialize in one field of OR during the studies. At least that was the case for me. My program, Master of Management in Operations Research, run by the Centre for Operations Excellence in the Sauder School of Business, University of British Columbia, is 15 months long. It included 8 months of intensive, mandatory, foundational courses to build up the skills and tools necessary for an Operational Research professional, including but not limited to: optimization, simulation, forecasting, statistical methodology, stochastic processes, decision analysis, operations management and logistics, consulting practices, as well as operations research and management sciences best practices. These are our tools to be a "jack of all trades", and must not be neglected. Then the program included a crucial 4-month (typically) hands-on project, where the student acts as the main consultant on behalf of the school to work with a private or public organization on a relatively high importance OR project, charged with real deliverables to the client. This makes it a "professional degree", instead of an M.Sc. (Master of Science), where the student is expected to do research and produce an academic thesis paper. After the project, the entire program wraps up with another 4 months of courses, this time chosen by the student. This is the opportunity to specialize if you wish. However, I don't believe 4 months of studies can make anyone a "specialist" in anything. It is the future work that you do that will shape you into whatever specialist you may choose to be.

As a professional working in OR, you will be forced into specializing in an industry or a field of business, such as healthcare, unless you go with a large consulting firm that deals with more than just one type of industry. With the big consulting firms, you may get the chance to be exposed to different industries, but you may have to insist. That experience could be invaluable. From my current job hunting experience in the UK, many industries are rather incestuous, such as energy, finance, insurance, and healthcare. Many jobs will require you to have experience in an industry before they would consider you a worthy candidate. I do not agree with it entirely. Even though there is much to be said for prior industry experience, a good management consultant can transcend industries, because his/her expertise is in the problem solving aspect. Industry knowledge can be picked up quickly by a good consultant, not to the level of an expert, but enough to solve the problem efficiently. Not to mention, if an industry keeps hiring from within, then, not to be cliche, it just doesn't have the new blood or the out-of-the-box fresh thinking to approach problems from a different angle. I understand if the hiring manager prefers a candidate with prior industry experience over one without, but to list it as an essential criterion is over the top and short-sighted.

To learn more about the fields that Operational Research plays a major role in, check this out.

Friday, May 15, 2009

Starting up in Operational Research: What Programming Languages Should I Learn?

A ThinkOR blog reader asked me some questions about getting started as an Operational Research professional. The reader is in his final year studying Mathematical Statistics, and is preparing to get into an OR master's degree upon graduation. I think these might be some common questions about starting up in OR, so I'm publishing my answers here as a mini-series on "Starting up in Operational Research".

Question 1: What programming languages should I learn?
"I'm a bit interested in what programs and programming languages you would recommend to learn? May seem like a silly question, but since we use a few different programs in university, I'd like to focus on the programming languages which are most widely used/accepted. Our main tool at the moment is R, with some Matlab thrown in for Econometrics and larger matrix calculations. Would you consider R a well used tool in OR or is it mainly used in academics but not in real life?"

I think which language you learn is less important than learning the basics of programming well, so that you can pick up any language easily in the future.

The reason I say that is because:
  1. The OR application and software world is very fragmented. There are many different applications, and they seem to like having their own proprietary languages. However, most of them are developing a Graphical User Interface (GUI) for non-programmers, since they are often aimed at business users (that's where the money is). That said, if you want to fully utilize a software's potential, especially in the case of simulation software, you'd better learn its own language, which is usually proprietary. This would require you to pick up a new language.
  2. Most of these proprietary languages are quite low level languages. Therefore, it is important to have a solid foundation in programming. It will help you understand the language and learn it fast.
My undergraduate degree was in Computer Science (CS). I learned quite a few different languages, but did not really get to use them in Operational Research. However, my CS education helped me to pick up the programming languages needed in OR very quickly. Here I will list the main languages that I encountered in OR:
  • VBA
  • SAS
  • R
VBA is the number one, since a lot of models are done in Excel. SAS is probably the most used in business and a required skill by many employers. Matlab is more for researchers, I believe. I haven't seen it used in any commercial setting so far. I do believe R is used in some commercial settings though.

Note: needless to say, these are only my thoughts on the topic. Please feel free to chime in.