ThinkOR - Think Operations Research: an exchange corner for the OR professionals, an OR information source for the general public<br />
<br />
<h2>Proud of my Masters Program in OR Winning the George D Smith Prize</h2>
Posted by Dawen on 2015-05-01<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
This year's <a href="http://meetings2.informs.org/wordpress/analytics2015/" target="_blank">INFORMS Business Analytics and Operations Research conference</a> was held in Huntington Beach, California. Excellent talks on really relevant topics. Great conference, networking and top organizers. :)<br />
<br />
Best of all, my former masters program in Operations Research, <a href="https://www.informs.org/About-INFORMS/News-Room/Press-Releases/INFORMS-Awards-2015-UPS-George-D.-Smith-Prize-to-UBC-Sauder-School-s-Centre-for-Operations-Excellence" target="_blank">the Centre for Operations Excellence at the Sauder School of Business, University of British Columbia, won this year's George D Smith Prize</a>. This is the field's top prize recognizing some of the best education programs.<br />
<blockquote class="tr_bq">
<i>The UPS George D. Smith Prize is awarded to an academic department or program for effective and innovative preparation of students to be good practitioners of operations research.</i></blockquote>
<br />
I was honoured to be at the gala event, and ecstatic for the program's win. My career would not have looked quite the same without the program. So happy for them.<br />
<blockquote class="tr_bq">
<i>"I am indebted to my father for living, but to my teacher for living well."</i> - Alexander the Great<br /><br /><i>"There are two kinds of teachers: the kind that fill you with so much quail shot that you can't move, and the kind that just gives you a little prod behind and you jump to the skies."</i> - Robert Frost</blockquote>
</div>
<div class="blogger-post-footer">
This article was originally posted on <a href="http://www.thinkor.org">ThinkOR.org</a>. Share if you like it.</div>
<h2>Spam gets a personal touch: Human 1, Machine 1</h2>
Posted by Dawen on 2014-11-23<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
Blogging and spamming practically come hand in hand. The obvious spam has been pretty well controlled by the major blogging platforms' spam filters, thanks to advances in text analysis and machine learning algorithms. However, the filters are not perfect. Or are they? You be the judge in this case.<br />
<br />
This could be an example of how creative spammers are at combating algorithms.<br />
<br />
Or, it could be an example of a business owner trying to do his own selective SEO (search engine optimization).<br />
<br />
An old post on <a href="http://www.thinkor.org/2012/01/school-uniforms-in-developing-countries.html" target="_blank">mandatory school uniforms</a> got the following spam:<br />
<blockquote class="tr_bq">
<i>I think school uniforms must be compulsory in schools because after one time-investment in the uniform, it prevents the child from the traits of social inequality,inferiority complex etc.And If you have decided to buy the uniform, buy it from Wang Uniforms </i>(link removed)</blockquote>
<br />
I speculate that a human wrote the comment, because it is a sensible comment, and also because of the grammatical, punctuation and spacing errors.<br />
<br />
However, the link, which I removed for this post, does point to a legitimate school uniform maker in the UAE. I suppose there are two possibilities:<br />
1) The uniform business had legitimately read the article, had something genuine to say, and also wanted to promote its own business.<br />
2) The uniform business hired a spammer / mass commenter to do the job for SEO purposes.<br />
<br />
I had a bit of a hard time deciding whether this is spam or not. Since I cannot edit the comment to remove the link, I rejected the comment, especially after I found out that the commenter's profile belonged to some jewelry shop in South East Asia - nothing to do with uniforms.<br />
<br />
Algorithms are never perfect. The underlying uncertainty is why we build algorithms at all. Given that I, the human, had trouble identifying the authenticity of this comment, I'm glad the machine (spam filter) didn't just rule it out.<br />
<br />
So... Human vs Machine: Human 1, Machine 1?<br />
<br />
<br />
P.S. Unrelated, but this is quite funny. Don't be fooled by the title.<br />
<a href="http://bigdatapix.tumblr.com/" target="_blank">Visualizing Big Data</a></div>
<h2>What I learned from a sabbatical year</h2>
Posted by Dawen on 2014-02-11<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
I spent 2013 '<a href="http://en.wikipedia.org/wiki/Overlanding" target="_blank">overlanding</a>' through South America with my partner. 1 year, 1 continent, 1 simple car, 2 people, 13 countries, 40,000 km. After moving from Canada to the UK 5 years ago, and setting up a new life there, we gave up our jobs, salary, friends and all the comforts of life in one of the greatest metropolises in the world. A lot to let go of, but we gained so much more.<br />
<br />
Above all, I learned <b><span style="color: #660000; font-size: large;">how little I need to live on to be happy</span></b>, material-wise. We converted the back of our little van into a bed, so we slept in it on many nights, wild camping at some bizarre and cool spots: 24-hour gas stations and garages, road-side somewhere in the country, a cliff edge by the sea, a lot of central plazas and town squares, in front of police stations (with permission), and once within a secure military compound. The living was rough, and it took some getting used to. I had very few possessions; I was happy; and my eyes were filled with wondrous things throughout the year.<br />
<br />
<span style="color: #660000; font-size: large;"><b>Communities kick ass</b></span> in supporting overlanding travellers of all modes, by car, motorcycle, bicycle, uni-cycle or even by donkey(!). A few hundred people gathered on a Facebook group were the best near real-time information providers. Almost all overlanders are super eager to share information with each other and help, because we've all known a few hard moments on the road. Most people in the online community have never met each other, but are ready to answer questions when asked.<br />
<br />
One can have <span style="color: #660000; font-size: large;"><b>too much of a good thing</b></span>. I loved travelling, and still do. 60+ countries later, my imaginary list is still quite long. Doing a year of pure travel is super fortunate, and I almost don't dare to utter that sometimes I found it hard to drag myself, for the 11th time in 3 months, to drive through yet another beautiful wine country with breath-taking alpine scenery, or more Andean mountain villages, or serene beaches, etc. Managing the trip was a huge challenge, but I also missed work, and the different challenges that come with it, a lot. So, in the evenings I:<br />
<ul>
<li>brushed up on <b>R</b> and some <b>Machine Learning</b> techniques through <a href="http://coursera.org/" target="_blank">Coursera</a> (awesome!)</li>
<li>learned something new, <b>Octave</b>, <b>Python</b>, more <b>Machine Learning</b> techniques</li>
<li>read a lot of blogs on <b>OR</b>, <b>analytics</b> and <b>data science</b></li>
<li>wrote a few <b>blog</b> articles here (definitely neglected when I was working a busy job)</li>
<li>thought long and hard about <b>what I want to do</b> when I get back</li>
</ul>
<br />
A <span style="color: #660000; font-size: large;"><b>bunch of random stuff</b></span> I learned a bit about:<br />
<ul style="text-align: left;">
<li><b>Navigating</b> in places I've never been before. "<i>Don't listen to the British lady</i> (aka the GPS voice)<i>, she's never been to Venezuela</i>", and she's leading us down a dead-end.</li>
<li><b>Spotting and dodging</b> potholes, rocks, livestock, cowboys, donkey carts, tree stumps, burned tires (12-day riot aftermath), a flying ladder (I kid you not: it fell off the truck 15m in front of us at 90km/hr), alignment-breaking and bottom-scraping grooves in the road from heavy Brazilian trucks ...</li>
<li><strike>Making it</strike> <b>Swimming through </b>potholes<b> </b>the size of a swimming pool, with muddy and seemingly endless bottoms, with a 2x4 car that had 6" clearance (nope, didn't get stuck even once! 4x4 is not a necessity for everyone)</li>
<li>Fixing <b>cars</b> and dealing with mechanics, and their other-worldly Spanish</li>
<li>Playing with the <b>police</b> to always avoid paying bribes (wasn't too often)</li>
<li>Finding out just how <b>friendly</b> people are (lots of home-stay invites)</li>
<li>Playing the <b>Quena</b> (Andean flute) is way harder than it seems - sticking with my uke instead</li>
<li><b>Optimising the journey</b> in Travelling Salesman fashion (had to return to the origin to sell the car) - yes, Operations Research is useful in every walk, or drive, of life</li>
<li><b>Optimising decisions</b> under uncertain conditions</li>
<li>And of course, learning <b>Spanish</b>, with all sorts of accents and idioms, and the 13 countries' history, culture, landscape, food and people (P.S. mechanics and old country farmers are really hard to understand)</li>
</ul>
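For the curious, the "return to the origin" routing idea can be sketched in a few lines of Python. This is a toy illustration only, not how we actually planned the trip: the stop names and coordinates are made up, distances are straight lines on a flat map, and trying every ordering only works for a handful of stops.<br />

```python
from itertools import permutations
from math import dist

# Hypothetical stops on a flat map (made-up names and coordinates).
STOPS = {
    "Origin": (0, 0),
    "A": (3, 4),
    "B": (6, 0),
    "C": (3, -4),
}

def tour_length(order, stops):
    """Total straight-line length of visiting the stops in the given order."""
    return sum(dist(stops[a], stops[b]) for a, b in zip(order, order[1:]))

def best_round_trip(stops, origin):
    """Brute force: try every ordering of the other stops, ending back at the origin."""
    others = [name for name in stops if name != origin]
    candidates = ((origin, *perm, origin) for perm in permutations(others))
    best = min(candidates, key=lambda order: tour_length(order, stops))
    return best, tour_length(best, stops)

route, length = best_round_trip(STOPS, "Origin")
print(route, length)
```

With more than about ten stops the number of orderings explodes, which is where proper OR heuristics and solvers come in.<br />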
<br />
<div>
Having finished the year-long journey over a month ago, I was inspired to write this article after reading "<a href="http://blogs.hbr.org/2013/12/why-i-put-my-company-on-a-year-long-sabbatical/" target="_blank">Why I put my company on a year-long sabbatical</a>". This is not a PR article, but one to say that anyone can do this sabbatical thing, and you will learn a ton. You don't need the best car all decked out. You don't need to be young. You don't need to be retired. You don't need to be without kids (met a lot of families, with kids from 6 months to 17 years old). You don't need to have a partner. You don't need to be rich (our all-in costs: £10,000 per person, assuming 2 people sharing). Actually, you will learn how little you need at all. All you need is a bit of discipline to save some money, a bit of guts to throw yourself at it, some luck and common sense to be safe, and a lot of curiosity to explore.</div>
<div>
<br /></div>
<div>
In case you are inspired to consider a sabbatical year, here are some great overlanding resources:</div>
<div>
<a href="http://www.expeditionportal.com/" target="_blank">Expedition Portal</a></div>
<div>
<a href="http://wikioverland.org/" target="_blank">WikiOverland</a></div>
<div>
<a href="http://www.landcruisingadventure.com/" target="_blank">LandCruising Adventure</a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
2014 is going to be great. I have never been more ready.<br />
First step, land an awesome job.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhHRdLOhGT3t9TJ-r3qvIGRq0lsITgKESeu1hFXoRbEW54D5Gs-P3PzP2AeGMBX7-_GeOycdlViP5dK0czjSjZrzrnJPEu_Nerdr0FSApYkQDtIofDBxkItByzXsVMWbqFAfpiGAC06t919/s1600/map-5.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhHRdLOhGT3t9TJ-r3qvIGRq0lsITgKESeu1hFXoRbEW54D5Gs-P3PzP2AeGMBX7-_GeOycdlViP5dK0czjSjZrzrnJPEu_Nerdr0FSApYkQDtIofDBxkItByzXsVMWbqFAfpiGAC06t919/s1600/map-5.jpg" height="181" width="320" /></a></div>
<br /></div>
</div>
<h2>Finally Some Sense on Analytics & Data Science Job Ads</h2>
Posted by Dawen on 2014-01-23<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
After yesterday's post on the state of the debate on <a href="http://www.thinkor.org/2014/01/building-data-science-team-vs-individual-summary.html" target="_blank">building data science teams</a> (individual vs team approach), it's so refreshing to stumble onto this <a href="https://act.civisanalytics.com/page/signup/apply" target="_blank">careers page of Civis Analytics</a>. Great example of Analytics & data science job ads done right. This page alone makes me want to apply to work there!<br />
<br />
They actually divide their jobs into: data scientist, engagement analyst, project manager, software engineer!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQP78XF0wkEpDAuyegoK93x6Pd7J4qdKpEVU6Dm5JXTa2rKEcvqdr6ieFJW32v3c7AxDRLoN6b95UDbiThlObI4-Y2F0sso1mb1gJhXA1zZO-SvgvvPbEWOwWUTYm-Njn052X3Glf7bGqt/s1600/data+science+roles+done+right+-+civis.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img alt="data science analytics roles done right: software engineer, data scientist, engagement analyst, project manager" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQP78XF0wkEpDAuyegoK93x6Pd7J4qdKpEVU6Dm5JXTa2rKEcvqdr6ieFJW32v3c7AxDRLoN6b95UDbiThlObI4-Y2F0sso1mb1gJhXA1zZO-SvgvvPbEWOwWUTYm-Njn052X3Glf7bGqt/s1600/data+science+roles+done+right+-+civis.jpg" height="178" title="data science analytics roles done right" width="320" /></a></div>
<br />
<div>
<br /></div>
<div>
How sensible. I like it! </div>
<div>
Nothing like the typical data science job posts, asking for "<i>everything and the kitchen sink</i>".</div>
<div>
</div>
</div>
<h2>Building Data Science Teams: Individuals vs Team - State of the Debate So Far</h2>
Posted by Dawen on 2014-01-22<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
Since my last article on "<a href="http://www.thinkor.org/2013/12/data-science-unicorns-superheoroes-scale-specialise.html" target="_blank">Hiring 1 Data Science unicorn is hard enough, a team is impossible. To scale means to specialise</a>", similar ideas have been expressed by <a href="http://www.informationweek.com/big-data/big-data-analytics/how-to-build-a-successful-data-science-team-/d/d-id/1113234?" target="_blank">InformationWeek</a>, <a href="http://blogs.hbr.org/2014/01/make-the-most-of-scarce-data-mining-talent/" target="_blank">McKinsey/HBR</a>, and KDnuggets (<a href="http://www.kdnuggets.com/2013/12/unicorn-data-scientists-vs-data-science-teams-discussion.html" target="_blank">here</a>, <a href="http://www.kdnuggets.com/2014/01/biernbaum-data-science-99-percent-too-fast.html" target="_blank">here</a>, <a href="http://www.kdnuggets.com/2013/12/what-is-wrong-with-definition-data-science.html" target="_blank">here</a> and <a href="http://www.kdnuggets.com/2014/01/split-on-data-science-skills-individual-vs-team-approach.html" target="_blank">here</a>).<br />
<div>
<br /></div>
<div>
<a href="http://upload.wikimedia.org/wikipedia/commons/d/db/Data_Science_Venn_Diagram.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="http://upload.wikimedia.org/wikipedia/commons/d/db/Data_Science_Venn_Diagram.png" height="190" width="200" /></a>There has been a ton of great discussion. I attempt to summarise the viewpoints so far: </div>
<div>
<ul style="text-align: left;">
<li>Data Scientists are supposed to have some pretty deep expertise in some pretty hard areas (see diagram). </li>
<li>Is it possible to close this talent gap when we seem to be chasing after superheroes or unicorns? (there are <i>some</i>, but very few)</li>
<li>Some (<a href="http://www.kdnuggets.com/2014/01/split-on-data-science-skills-individual-vs-team-approach.html" target="_blank">44%</a>) think there should be data science sub-specialisations (which all exist today), and have them work together in a team.</li>
<li>Others (44% too) prefer the superhero approach - individuals who have it all</li>
</ul>
</div>
<div>
<h3 style="text-align: left;">
<span style="color: #660000;">Opinions so far on the approach of team vs individuals to build out a data science team are as follows:</span></h3>
</div>
<style type="text/css">
.myOtherTable { background-color:#FFFFE0;border-collapse:collapse;color:#000;table-layout:fixed;width:100%; }
.myOtherTable tr { width:50%; }
.myOtherTable td { border-bottom:1px dotted #BDB76B; padding-right:4px; padding-left:4px;}
</style><br />
<div>
<table class="myOtherTable"><tbody>
<tr><th>For Team / against individuals</th> <th>For Individuals / against team</th> </tr>
<tr> <td>for bigger companies</td> <td>for smaller companies (can't afford)</td></tr>
<tr> <td>Easier to find all necessary skill-sets</td> <td>Easier to get things done (no coordination friction)</td></tr>
<tr> <td>Don't fall apart if an individual leaves</td> <td><br /></td></tr>
<tr> <td>Jack-of-all trades, master of none; Deep expertise more possible in team</td> <td>Automation tools will take over data engineering & cleaning from DS jobs, so can concentrate on modelling</td></tr>
<tr> <td>Business domain expertise & soft skills are hard to find in math/quant majors </td> <td>Higher-ed will turn out DS superstars soon, who will have the combined maths/computing skills</td></tr>
<tr> <td colspan="2"><div style="text-align: center;">
A good team has both Specialists and Generalists</div>
</td> </tr>
<tr> <td colspan="2"><div style="text-align: center;">
DS is a field that's evolving fast, and so will these opinions</div>
</td> </tr>
<tr> <td colspan="2"><div style="text-align: center;">
<b>You want an all-round DS guy/gal to get you started, or 2-3 of them who round each other out. As your team grows with demand, it will become increasingly difficult to find those all-encompassing individuals, so your team will naturally be made up of people with 1-2 of the DS skills.</b></div>
</td> </tr>
</tbody></table>
</div>
<div>
<br /></div>
<br />
If you are still keen to know more about what data scientists do, and who they are, listen to these DS guys talk:<br />
<div style="text-align: left;">
</div>
<ul style="text-align: left;">
<li><a href="https://www.youtube.com/watch?v=0tuEEnL61HM" target="_blank">Amazon</a>'s principal engineer: John Rauser, "What is a career in big data?" - 17 minutes of a <i>very good</i> stepped-back view of data science.</li>
<li><a href="https://www.youtube.com/watch?v=h9vQIPfe2uU" target="_blank">Cloudera</a>'s director of data science: Josh Wills, "Life as a data scientist" - some good nuggets in there at minute 10, 16, 25, 52:</li>
<ul>
<li>"<i>I'm a competent statistician... I'm a competent programmer... I would not say I am good... I am capable of having a conversation at each of those fields with them...</i>"</li>
<li>"<i>Scientists get linear regression...but they don't get the difference between linear regression and logistic regression...or the assumptions that underlie the regression models</i>", like normally distributed errors (residuals, not the variables themselves) for linear regression; it's more of a "<i>mechanical</i>" exercise to turn the crank on the data without understanding the assumptions that support the model</li>
<li>Kaggle has<i> </i>"<i>done most of the hard work </i>[for the competitors]<i>". </i>In my opinion, the guys who are competing are good at using the ML tools on a clean'ish data set; but it doesn't exactly test their ability to go from a business problem to a "<i>mental model of the data required</i>" to the type of problem to solve (segmentation, regression, etc...)</li>
<li>what stats to learn for someone from the computer engineering side of data science: "<i>learn linear regression, t-tests, confidence intervals, binomial random variables, exponentially distributed random variables, ... the core stuff, really, really well</i>"</li>
</ul>
</ul>
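On the point about regression assumptions, here is a small sketch of my own (not from the talk) in plain Python. The data is simulated and every number in it is made up: the predictor is deliberately skewed (exponential), while the noise is normal. Least squares still recovers the line, which illustrates that the usual normality assumption concerns the errors rather than the variables.<br />

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed-form formulas)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

rng = random.Random(42)  # fixed seed so the run is repeatable

# Skewed, clearly non-normal predictor (made-up simulation settings).
xs = [rng.expovariate(1.0) for _ in range(2000)]
# True relationship y = 1 + 2x, plus normally distributed noise.
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(round(intercept, 2), round(slope, 2))
```

The residuals list is the thing to inspect (with a histogram or a Q-Q plot, say) when checking the normality assumption, not the raw x or y values.<br />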
<br />
P.S. After writing this all out, it sounds so obvious. But believe me, there has been so much debate around this topic, and I wanted some... sense. Go read those articles linked at the top if you want to know.</div>
<h2>Hiring 1 Data Science unicorn is hard enough, a team is impossible. To scale means to specialise.</h2>
Posted by Dawen on 2013-12-16<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
Data Scientists need a large set of skills, including business know-how, modelling and mathematics, plus programming. They are as hard to find as <a href="http://www.informationweek.com/big-data/big-data-analytics/are-you-recruiting-a-data-scientist-or-unicorn/d/d-id/899843" target="_blank">unicorns</a>, or superheroes. I know this talent shortage first hand. Is the solution to create more unicorns, or can we devise better solutions?<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXG9-X-8IFqnjR0ZSIXQmsHRhjXfoijog7smooUIa0pNMpntCdlGNvhmxA9pVsd5bVqv7mZNaSSpr0ZfUWNiqHoydTxq7PIZotXM_DL3Wt_eLfN-made3cj_EHCwm8dQvwNXlAOipFl2h0/s1600/superheroes.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXG9-X-8IFqnjR0ZSIXQmsHRhjXfoijog7smooUIa0pNMpntCdlGNvhmxA9pVsd5bVqv7mZNaSSpr0ZfUWNiqHoydTxq7PIZotXM_DL3Wt_eLfN-made3cj_EHCwm8dQvwNXlAOipFl2h0/s200/superheroes.jpg" width="200" /></a>In my last role as a managing consultant in the Operations Research and Analytics team of a large global consultancy, I also ran recruitment. Having personally spoken to or met 150-200 such candidates (my recruitment team saw multiples of this number), I can tell you that not many of them made the cut. That's because they didn't have all of the skills we were looking for. And <b>we were only looking for the first 2.5 of the 5 core skill-sets of a data scientist below</b>. "<i>Good luck</i>" is what people offer to this talent-search problem, but I think we can get around the unicorns.<br />
<br />
<h3>
<b><i>The 5 core skills of a Data Scientist</i></b> </h3>
Expanding on the data science Venn diagram, I think the following 5 skills deserve closer attention, separately*.<br />
<ul style="text-align: left;">
<li><b>Business consulting</b> (from problem definition to stakeholder and team management) :: <i>what problem to solve</i></li>
<li><b>Analysis and modelling</b> (maths, stats, physics, OR, engineering, etc. / note this includes coding) :: <i>how to solve it</i></li>
<li><b>Communication and visualisation</b> (artistic and functional, learn the visualisation tools) :: <i>how to tell the story</i></li>
<li><b>Data engineering</b> (take data in, store it, push data out: computer science) :: <i>how to get the data for the solution</i></li>
<li><b>Programming</b> (for enterprise use at production level, software engineering, integrating into BI systems, automated decision making embedded in operational systems) :: <i>how to make the solution useful to a wider audience</i></li>
</ul>
<br />
Furthermore, each of the above has subfields and specialties, because they are complicated in their own right. It is not possible to be very good at so many things, at least not at scale; at best, one ends up mediocre at most of them. How many sportsmen and sportswomen excel at more than one sport, for example?<br />
<br />
<h3 style="text-align: left;">
It's a lot to ask for one person. So, why ask just one person?</h3>
The thing is, these people all exist, have existed, and will exist. They are just separate individuals. They have labels like <b><i>business analytics consultants, statisticians and modellers (operations researchers included), data visualisation experts, DBAs and software engineers</i></b>. Yes, they are also talents in demand, but they are not unicorns. If we need data scientists in droves, we need a team, not just a few geniuses.<br />
<br />
<div style="text-align: left;">
The future I see is like the age old relationship advice: </div>
<h3>
<i>Don't try to change them. </i>Instead, let's change how <i>we</i> work with <i>them</i>.</h3>
People should diversify a bit, for instance a modeller should be able to code, but ultimately they need to specialise in something they are good at. A modeller must be able to prototype on his/her own, which requires coding skills, but s/he shouldn't be expected to produce production-ready code for large scale applications. Similarly, asking a good modeller to do database administration and ETL tasks is a waste of talent, Hadoop or not.<br />
<br />
Specialisation is the reason for humanity's proliferation. Therefore, I'd say it's not the people we need to change, but the system that we need to set up to allow such a specialised workforce to team up together. It's lazy for the analytics field to put up its feet and just summon one person to provide it all.<br />
<br />
As a starter-for-ten, I think the future of our field could be modelled after the traditional IT project group make-up:<br />
<ul style="text-align: left;">
<li>The technical "<b>purists</b>": analytics modeller, data engineer, visualiser, programmer</li>
<li>The "<b>bridge</b>": more like a traditional business analyst</li>
<li>The "<b>glue</b>": project manager with business consulting skills</li>
</ul>
There will be complications to address. To name a few...topics for another post:<br />
<ul style="text-align: left;">
<li>Who should start up your data science team?</li>
<li>What's the load balance? (how much of each skill to have)</li>
<li>How to coordinate the division of labour?</li>
<li>Where should they sit in the organisation?</li>
<li>How to prioritise the problems to set them working on?</li>
<li>How to manage this team?</li>
<li>What's the career path?</li>
</ul>
<br />
<br />
Where do you think the analytics field is heading?<br />
<br />
<br />
Disclaimer:<br />
My views are definitely biased by my background: I am a manager in business analytics consulting, trained in Operations Research and Computer Science.<br />
<br />
* My expansion on the data science Venn diagram's 3 skills is based on various articles, such as the <a href="http://strata.oreilly.com/2013/06/theres-more-than-one-kind-of-data-scientist.html" target="_blank">O'Reilly guide</a> and <a href="https://docs.google.com/document/d/17xYa-IL1qKvFZBp9EpfMZqzYIx4Ec6xAPYYZwmFMARo/pub#h.x050eqfzxhq5" target="_blank">Intro to DS skills</a>, and on the job requirements in numerous current data science job posts.</div>
<h2>From Operations Research to Data Science</h2>
Posted by Dawen on 2013-11-05<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
In the last post, I wrote about how good it is to see <a href="http://www.thinkor.org/2013/11/operations-research-is-skillset-of-data-science.html" target="_blank">OR linked as a skillset to data science</a>. However, do note that OR is only one part of the DS skillsets. OR ≠ Data Science. How does an Operations Researcher transition to a Data Scientist?<br />
<div>
<br /></div>
<div>
The <a href="http://strata.oreilly.com/2013/06/theres-more-than-one-kind-of-data-scientist.html" target="_blank">O'Reilly book</a> I talked about in the <a href="http://www.thinkor.org/2013/11/operations-research-is-skillset-of-data-science.html" target="_blank">last post</a> briefly suggests a few things for an OR person to learn more about: some of the new Bayesian / Monte Carlo Statistics methods, broad programming skills, data warehouse architecture for big data technology, and business skills "<i>to be able to intelligently collaborate with (or lead) others on a data science team</i>".<br />
<br />
<br />
For those looking to upgrade, here are my quick thoughts on where to start.</div>
<div>
<br />
<b>Bayesian</b> Data Analysis: Andrew Gelman from Columbia is running a <a href="http://andrewgelman.com/2013/10/17/g-hangout-test-run-bda-course/" target="_blank">course on Bayesian Data Analysis</a> *right now*, with Google+ Hangout sessions. Looks very interesting. </div>
<div>
<b><br /></b>
<b>Programming skills</b>: see my previous post on <a href="http://www.thinkor.org/2013/10/how-to-learn-python-and-r-data-science.html" target="_blank">learning R and Python - the languages of data science</a>.</div>
<div>
<b><br /></b>
<b>Big data architecture</b>: in my experience, first understand the layers of a normal data warehouse architecture, then broaden to the enterprise BI architecture stack, then learn about the new bits for addressing the "big" aspect. I was fortunate to have led a fairly big project in this area, and had the opportunity to work with some great data warehouse architects and enterprise BI architects and learn a ton from them. I'm not sure what the best self-learning material is other than the typical read-a-lot. Wikipedia doesn't seem to cut it, and the material that helped me most isn't publicly available. Hmm...I will have to think about this - topic for another post perhaps. In the meantime, <i>Pivotal</i> seems to do a fairly good job in their blog of explaining <a href="http://blog.gopivotal.com/products/demystifying-hadoop-in-5-pictures" target="_blank">the bits for "big" data technology</a> in simple, practical terms.</div>
<div>
<b><br /></b>
<b>Business skills</b>: I think this only applies to academics (sorry for the generalisation). For the practitioners, i.e. OR people working in and with businesses, that's a fundamental part of our jobs.</div>
<div>
<br /></div>
</div>
<h2>Operations Research is a skillset of a Data Scientist</h2>
Posted by Dawen on 2013-11-05<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
... according to <a href="http://strata.oreilly.com/2013/06/theres-more-than-one-kind-of-data-scientist.html">O'Reilly</a>, yes, it is. <div>
This is perhaps the clearest I've seen anyone link OR to Data Science. Or perhaps, depending on how you read it...it shows that some Data Scientists are OR people. OR is a subset of Data Science skills.</div>
<div>
<br />Data Scientist (DS) - a very popular label that seems to be associated with people kind of like us-OR-people these days (just like "analytics" has been for the last few years), but no one is completely sure exactly what it is. As a result, many of us are reluctant to call ourselves data scientists, or don't know how to make the transition to be called one (<span style="background-color: white;"><a href="http://www.thinkor.org/2013/11/from-operations-research-to-data-science.html" target="_blank">see my next post on where to start</a>)</span>. There is the <a href="https://s3.amazonaws.com/aws.drewconway.com/viz/venn_diagram/data_science.html" target="_blank">Venn diagram</a>, and examples from famous DS people like <a href="http://en.wikipedia.org/wiki/Nate_Silver" target="_blank">Nate Silver</a> and <a href="http://www.hilarymason.com/" target="_blank">Hilary Mason</a> (who are identified more as statisticians than anything else), but confusion still abounds.<br /><br />OR has always had a bit of an identity crisis - how many jobs have you seen with the words "operations research" in the title or description? Is "Data Science" here to help?<br /><br />O'Reilly published a <a href="http://oreilly.com/data/stratareports/analyzing-the-analyzers.csp" target="_blank">book</a>, titled "<i>Analyzing the Analyzers</i>", which discusses the results and implications for people in these related fields, based on a survey they ran in mid-2012 of people they consider to be data scientists, asking "how they viewed their skills, careers, and experiences with prospective employers". 
Their goal, best summarised in their own words, are, "<i>in the broad Analytics / Data Science / Big Data / Applied Stats / Machine Learning space, ...to define these new fields better, and we hope the results will help people such as yourself talk about how your skills and your work fit in with everyone else's.</i>"<br /><br />The main result was summarised into a 5X4 matrix (credit: O'Reilly), showing where the survey respondents are in terms of skills / expertise and the label they associate themselves with. <div class="separator" style="clear: both; text-align: center;">
<a href="http://s.radar.oreilly.com/wp-files/5/2013/06/Screen-Shot-2013-06-24-at-12.05.55-PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="380" src="http://s.radar.oreilly.com/wp-files/5/2013/06/Screen-Shot-2013-06-24-at-12.05.55-PM.png" width="400" /></a></div>
<div>
<br />The skills they grouped under "Math / OR" are: Optimisation, Math, Graphical Models, Bayesian / Monte Carlo Statistics, Algorithms, Simulation. Sounds familiar indeed.</div>
<div>
<br />Hooray for the mention of OR as a Data Science skillset! <br /><br /><br />I recommend reading the <a href="http://oreilly.com/data/stratareports/analyzing-the-analyzers.csp" target="_blank">full report</a> for more details. Here is a summary to give you a taste:<br /><blockquote class="tr_bq">
<ul style="text-align: left;">
<li>Four data scientist clusters</li>
<li>Cases in miscommunication between data scientists and organisations looking to hire</li>
<li>Why "T-shaped" data scientists have an advantage in breadth and depth of skills</li>
<li>How organisations can apply the survey results to identify, train, integrate, team up, and promote data scientists</li>
</ul>
</blockquote>
(On the last point above: the report's coverage isn't very comprehensive, so don't expect too much. It's more of a taster.)</div>
<div>
<br />Have you got what it takes to call yourself a Data Scientist? OR folks, see my <a href="http://www.thinkor.org/2013/11/from-operations-research-to-data-science.html" target="_blank">next post on how to upgrade yourself</a> (umm, didn't mean to make you sound like machines).</div>
</div>
<div>
<br /></div>
</div>
Dawenhttp://www.blogger.com/profile/07152350276687825418noreply@blogger.com0tag:blogger.com,1999:blog-5764305054861361597.post-83105951282738705032013-10-25T14:54:00.000+01:002013-11-05T13:25:33.187+00:00How to Learn Python and R, the Data Science Programming Languages, from Beginner to Intermediate and Advanced<div dir="ltr" style="text-align: left;" trbidi="on">
<i>The</i> Data Science programming / analytics languages to know are R and Python. If you're in Operations Research or another analytics field that somewhat fits under the "Data Science" hat, you either a) already know them really well, b) want to brush up on them, or c) probably should learn them now. Here I compile my thinking on how to learn R and Python from the Beginner to the Intermediate and Advanced levels, based on having tried some of these course materials.<br />
<br />
<h2 style="text-align: left;">
Beginner (doing basic analysis)</h2>
<h3 style="text-align: left;">
<i>R</i>:</h3>
<b><i style="background-color: #f1c232;">Computing for Data Analysis</i></b> on <a href="https://class.coursera.org/compdata-003/class" target="_blank">Coursera</a> and Youtube (weeks <a href="http://goo.gl/8HBAS" target="_blank">1</a>, <a href="http://goo.gl/kxVft" target="_blank">2</a>, <a href="http://goo.gl/OkDQO" target="_blank">3</a>, <a href="http://goo.gl/emkML" target="_blank">4</a>), by Roger Peng from Johns Hopkins University<br />
<br />
<ul style="text-align: left;">
<li><i><u>Summary</u></i>: It covers the basics of conditional and loop structures, R's syntax, debugging, Object Oriented Programming, and performing basic tasks with R, such as importing data, basic statistical analysis, plotting and regular expressions. See the <a href="https://class.coursera.org/compdata-003/wiki/view?page=syllabus" target="_blank">syllabus</a> for more.</li>
<li><i><u>Time</u> commitment</i>: 11~36 hours total, including: </li>
<ul>
<li><i>non-programmers</i>: 4 weeks X [3 hours/week on video + 2~6 hours/week on exercises]</li>
<li><i>programmers</i>: [3 hours of notes reading + 8~16 hours on exercises]</li>
</ul>
<li><i><u>Advice</u></i> for: </li>
<ul>
<li><i>non-programmers</i>: Listen to all lectures (videos), make sure you understand all details, and do all the exercises to hone your skills. Programming is all about practice. Doing the exercises is important. See below for "Advanced".</li>
<li><i>programmers</i>: Don't bother with the videos, go straight to the lecture notes (link). Read the notes - much faster than the videos. If you don't understand something, look up the video and watch it, or google the topic. Then do all the exercises. You don't need me to tell you that practice is king (um, and cash too).</li>
</ul>
</ul>
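The time ranges above are simple arithmetic on the weekly figures. Here is a minimal Python sketch of that sum, using only the numbers quoted in the list; the function and variable names are made up for illustration and are not part of any course material.<br />

```python
def hours_range(weeks, video_per_week, exercises_per_week):
    """Total (low, high) hours for a course paced in weekly chunks.

    video_per_week and exercises_per_week are (low, high) tuples of hours.
    """
    low = weeks * (video_per_week[0] + exercises_per_week[0])
    high = weeks * (video_per_week[1] + exercises_per_week[1])
    return low, high


# Non-programmers: 4 weeks x [3 hours video + 2~6 hours exercises]
non_programmers = hours_range(4, (3, 3), (2, 6))

# Programmers: 3 hours of notes reading + 8~16 hours of exercises (not weekly)
programmers = (3 + 8, 3 + 16)

# Overall range quoted for the course: lowest low to highest high
overall = (min(non_programmers[0], programmers[0]),
           max(non_programmers[1], programmers[1]))

print("non-programmers:", non_programmers)
print("programmers:", programmers)
print("overall:", overall)
```

The same helper reproduces the ranges quoted for the weekly-paced courses further down: <code>hours_range(8, (2, 3), (2, 4))</code> for Data Analysis with R and <code>hours_range(10, (2, 3), (3, 6))</code> for Machine Learning.<br />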
<div>
<br /></div>
<b><i style="background-color: orange;">The swirl package within R</i></b>, by the Biostatistics team at Johns Hopkins University<br />
<ul style="text-align: left;">
<li><u><i>Summary</i></u>: It aims to teach R and Statistics within the R environment itself, through a package called <a href="http://ncarchedi.github.io/swirl/" target="_blank">swirl</a>. See the announcement <a href="http://simplystatistics.org/2013/09/27/announcing-statistics-with-interactive-r-learning-software-environment/" target="_blank">here</a> for more detailed info.</li>
<li>I haven't tried this, so I'm not sure how much time it takes or how good it is. However, I think it sounds pretty good, and deserves a mention. I was never a fan of reading books to learn a programming language. Being shown the code, or in this case writing the code myself and getting involved, is much more, well, involving.</li>
</ul>
<br />
<h3 style="text-align: left;">
<i>Python:</i></h3>
<b><i style="background-color: orange;">Google's Python course</i></b> (<a href="https://developers.google.com/edu/python/" target="_blank">link</a>)<br />
<ul style="text-align: left;">
<li><i><u>Summary</u></i>: It's straight to the meat, no-nonsense stuff, and covers all the important things. Suits my style. Enough said, so see the <a href="https://developers.google.com/edu/python/" target="_blank">course page</a> for the syllabus. </li>
<li><u><i>Time commitment</i></u>: 8-10 hours</li>
<ul>
<li>including reading notes and doing exercises</li>
</ul>
<li>Note, this is for experienced programmers. There are videos too, but don't bother. The notes on the course page are the same, and it always takes less time to read than watch.</li>
</ul>
<br />
<h2 style="text-align: left;">
Intermediate (building analytical models)</h2>
<h3 style="text-align: left;">
<i>R:</i></h3>
<b><i style="background-color: orange;">Data Analysis with R</i></b> on <a href="https://www.coursera.org/course/dataanalysis" target="_blank">Coursera</a> and <a href="https://www.youtube.com/user/jtleek2007/featured"><span id="goog_827905032"></span>Youtube<span id="goog_827905033"></span></a> (plus <a href="https://github.com/jtleek/dataanalysis/" target="_blank">class notes</a>), by Jeff Leek from Johns Hopkins University<br />
<ul style="text-align: left;">
<li><u><i>Summary</i></u>: It covers the full modelling cycle, from getting data, to structuring the analysis pipeline, exploring with graphs and statistical analysis, modelling (clustering, regression and trees), and model checking with simulation. It also talks about important statistical watch-outs like p-values, confidence intervals, multiple testing and bootstrapping. More syllabus <a href="https://www.coursera.org/course/dataanalysis">here</a>.</li>
<li><u><i>Time</i></u> commitment: 32~56 hours</li>
<ul>
<li>including 8 weeks X [2~3 hours/week videos + 2~4 hours/week exercises]</li>
</ul>
</ul>
<br />
<span style="background-color: orange;"><b><i>
Forecasting using R</i></b></span> (<a href="http://robjhyndman.com/hyndsight/revolutionr2013/" target="_blank">link</a>), by Rob Hyndman from Monash University in Australia and <a href="http://www.revolutionanalytics.com/" target="_blank">Revolution Analytics</a> (the enterprise R solution)<br />
<div>
<ul>
<li>Summary: topics include "seasonality and trends, exponential smoothing, ARIMA modelling, dynamic regression and state space models, as well as forecast accuracy methods and forecast evaluation techniques such as cross-validation. Some recent developments in each of these areas will be explored" (quoted from course site). Read more <a href="http://robjhyndman.com/hyndsight/revolutionr2013/" target="_blank">there</a>.</li>
<li>Note: I haven't done this (just started), so I'm not sure about its time requirement or quality. I'm also not sure if they are planning to make the lectures available. Time will tell on these questions.</li>
</ul>
<div>
<br /></div>
</div>
<h3 style="text-align: left;">
<i>Python / Octave:</i></h3>
<b><i style="background-color: orange;">Machine Learning</i></b> on <a href="https://class.coursera.org/ml-003/class">Coursera</a>, by Andrew Ng from Stanford University --> My Favourite!<br />
<ul style="text-align: left;">
<li><i><u>Summary</u></i>: The course actually teaches in the Octave language, but it all can be done in Python. I suppose you can do it twice, first in Octave, and then in Python, if you've got the time. It certainly would solidify your understanding of the material, and Andrew Ng is sure that Octave is rather important in Machine Learning. It assumes some prior knowledge of linear algebra and probability, and refreshes you on some basics. "Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI)." (quoted from the course website)</li>
<li><i><u>Time</u></i> commitment: 50~90 hours</li>
<ul>
<li>including 10 weeks X [2~3 hours/week videos + 3~6 hours/week exercises]</li>
</ul>
<li>Note: this course covers a subset of the statistical and modelling principles from the Data Analysis with R course above, but the overall level is more advanced. I enjoyed this course the most.</li>
</ul>
<div>
<br /></div>
<div>
<h2 style="text-align: left;">
Advanced (you follow the drift from above)</h2>
Advanced = Experienced. <br />
This is true for programming, analytics, and learning any foreign language.<br />
<br />
"<i><b>Just do it</b></i>", is how you get experienced.<br />
<br />
There is no course on this stuff (i.e. being advanced), not without a PhD <u>plus</u> years of field work.<br />
<br />
My best suggestion is to use your curiosity. Find a problem. Dig into it. <br />
<br />
Plus, work with other people who are really good.<br />
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Happy learning!</div>
<div>
<br /></div>
<div>
<br /></div>
</div>
</div>
Dawenhttp://www.blogger.com/profile/07152350276687825418noreply@blogger.com5tag:blogger.com,1999:blog-5764305054861361597.post-15584185779921181092013-10-07T05:43:00.001+01:002013-10-07T05:43:18.404+01:00The most efficient pizza shop - Ugi's in Buenos Aires<div dir="ltr" style="text-align: left;" trbidi="on">
<a href="http://www.argentinaindependent.com/images/edition054/ugis/ugis04.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="169" src="http://www.argentinaindependent.com/images/edition054/ugis/ugis04.jpg" width="200" /></a>This <a href="http://operationsroom.wordpress.com/2013/10/03/speed-variety-tradeoffs-in-fast-food/" target="_blank">blog post</a> by the Operations Room, one of my favourite operations blogs, reminded me that I should write about the Argentine pizza chain, <a href="http://es.wikipedia.org/wiki/Ugi's" target="_blank">Ugi's</a>. Its pizza was pretty good, but while visiting Buenos Aires (BA) I was more fascinated by how it runs its stores and by its business model.<br />
<br />
<h3 style="text-align: left;">
<b><span style="color: #660000;">The business model</span></b></h3>
<b><br /></b>
<b>The product</b>:<br />
They sold one pizza. Exactly one type: mozzarella on tomato sauce on pizza dough.<br />
No variations.<br />
<br />
<b>Size</b>:<br />
The whole pizza. Or by the slice.<br />
<br />
<b>Extras</b>:<br />
You can add condiments like chilli peppers and oregano, after you get the pizza, for free.<br />
The cardboard box is extra.<br />
<br />
<b>Environment</b>:<br />
Basic and bare. No frills.<br />
There is basically standing room only with very few seats and tables in the shops. Most people do take out.<br />
<br />
--------------------------------------------------------------<br />
<b>The USP</b> (Unique Selling Point):<br />
Cheap.<br />
Fast.<br />
<br />
<b>The result</b>:<br />
Very popular! Probably the longest queue for food in BA.<br />
--------------------------------------------------------------<br />
<div>
<br /></div>
<div>
<h3>
<b><span style="color: #660000;">The operation</span></b></h3>
</div>
<div>
Each shop had a big oven and 2 guys making the pizzas. That seems to be it.</div>
<div>
<br /></div>
<div>
<b>Tasks of the pizza maker</b>:</div>
<div>
1. Work the dough and spin it out to lay onto a wooden pizza pan.</div>
<div>
2. Ladle the tomato sauce from a big pot onto the dough, and spread it over the dough in a circular motion with the bottom of the ladle.</div>
<div>
3. Cut a brick-sized block of mozzarella from a giant block. Split it in half, and chuck it in the middle of the sauced dough. (It melts nicely and spreads somewhat evenly.)</div>
<div>
4. Put the pizza into the oven.</div>
<div>
5. Check on the other pizzas in the oven. Take them out onto the table when ready.</div>
<div>
<br /></div>
<div>
<b>Tasks of the pizza giver</b>:</div>
<div>
6. Slice the pizza.</div>
<div>
7. Box it. Or put a slice on a plate.</div>
<div>
8. Hand it to the customer.</div>
<div>
9. Take money.</div>
<div>
<br /></div>
<div>
<div>
I could have stared at the guy making pizzas for hours. It was so well practised and smooth, since it's the only thing he makes all day long, by the hundreds, every day. It reminded me of some of the best-run factories I've visited. Precise. Lean and Mean. The simple business model makes it possible.</div>
</div>
<div>
<br /></div>
<h3 style="text-align: left;">
<b><span style="color: #660000;">Change / Improve?</span></b></h3>
<div>
Do you find yourself asking, "Given their popularity, why don't they add 1 or 2 more flavours, like pepperoni or something?" or...why should they change anything?</div>
<div>
<ul style="text-align: left;">
<li>I think for one, it would trade off speed with variety.</li>
<li>Secondly, they are already maxing out their capacity, so why add more? There is unlikely to be more revenue to be had, and I can't comment on profitability.</li>
<li>Thirdly, given their popularity, why should they change any of it? The customers clearly like it the way it is.</li>
</ul>
</div>
<div>
<br /></div>
<div>
You can read more about Ugi's <a href="http://www.argentinaindependent.com/reviews/thegrill/ugi%E2%80%99s-the-bargain-basement-of-pizzas/" target="_blank">here</a>.<br />
<br />
As much as I love the parrilladas Argentinas, do have a Ugi's pizza next time you're in BA!</div>
</div>
Dawenhttp://www.blogger.com/profile/07152350276687825418noreply@blogger.com0tag:blogger.com,1999:blog-5764305054861361597.post-24989762263058351152013-08-29T17:29:00.000+01:002013-08-29T17:29:22.323+01:00Colombia's national agricultural strike can use some facts and numbers to aid the public discussions<div dir="ltr" style="text-align: left;" trbidi="on">
Today sees the largest march yet in the cities of Colombia during the country's nationwide agricultural <i>paro</i> (strike) by the <i>campesinos</i> (farmers). The strike has lasted 10 days so far. Students (suspending classes), truck drivers and health workers, together with miners, potato and other small farmers, and coffee growers, will all march today. The situation is fairly grave. You can read about the demands of the farmers <a href="http://www.eltiempo.com/colombia/boyaca/peticiones-de-los-campesinos-al-gobierno-nacional_13007020-4" target="_blank">here</a>, but in summary, they are asking for financial help from the government for seeds, fertilisers, fuel and highway tolls. (FYI: Colombia's highway tolls have been the most expensive on our year-long South America trip so far.)<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYeuksxDWVoNsBGzn23B_p66hkYqRJydk8QrMff-P_SI31njHHFjXr_suTZHoqC8jSoEhkF2Fj2iaavIPV94_rvXxJxPpHOKGUwviCoVFJ3k8xOWkdsdu-kgk9eUwg72vXksMAr99qO7aL/s1600/Bogota+march+Colombia+paro+2013.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYeuksxDWVoNsBGzn23B_p66hkYqRJydk8QrMff-P_SI31njHHFjXr_suTZHoqC8jSoEhkF2Fj2iaavIPV94_rvXxJxPpHOKGUwviCoVFJ3k8xOWkdsdu-kgk9eUwg72vXksMAr99qO7aL/s320/Bogota+march+Colombia+paro+2013.jpg" width="316" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
Photo source: ElTiempo.com & NoticiasRCN.com, screen captures from <a href="http://www.eltiempo.com/colombia/bogota/manifestaciones-en-bogota_13027528-4" target="_blank">here</a> and <a href="http://www.noticiasrcn.com/nacional-bogota/bogota-se-prepara-un-dia-marchas" target="_blank">here</a></div>
<br />
<h3 style="text-align: left;">
<b><span style="color: #660000;">A lot of emotions, but how much is all this financial help?</span></b></h3>
Not clear.<br />
<br />
All the reporting by newspapers like <a href="http://www.eltiempo.com/colombia/boyaca/ARTICULO-WEB-NEW_NOTA_INTERIOR-13012522.html" target="_blank">this</a>, <a href="http://www.ntn24.com/noticias/colombia-two-minutes-why-there-general-strike-103748" target="_blank">this</a> and <a href="http://www.noticiasrcn.com/nacional-bogota/bogota-se-prepara-un-dia-marchas" target="_blank">this</a> only ever talks about whether the farmers and the government have reached an agreement, that the road blocks continue, that food prices are rising due to shortages, that there are more blocks and marches, etc. Nothing concrete about how much all this financial help amounts to, not even a summarised high-level figure.<br />
<br />
Minimum: half a million US dollars.<br />
<br />
The best I could gather was from <a href="http://www.eltiempo.com/politica/paro-en-colombia-propuestas-del-gobierno_13027527-4" target="_blank">this article</a>, that the government has asked for 1 billion pesos (~half a million USD) to be added to the agricultural budget. My best guess is that this is the minimum, because Colombia is not that expensive, but it's not that cheap either. Nothing in the news tells me what the range around this ballpark figure could be, though.<br />
<br />
<br />
More likely: 20 million US dollars.<br />
<a href="http://www.eltiempo.com/colombia/boyaca/peticiones-de-los-campesinos-al-gobierno-nacional_13007020-4" target="_blank">This article</a> suggests that after a one-day strike by the farmers earlier this year, this is the amount the government promised, but it never materialised.<br />
<br />
Note: none of the figures above states the time period. Is it over a year, or many years? No clue.<br />
Also note the large range between the two figures.<br />
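<br />
To put rough numbers on that range, here is a small Python sketch of the conversion. The exchange rate of about 1,900 pesos per US dollar is an assumption chosen for illustration, not a figure from any of the articles; the two amounts are the ones quoted above.<br />

```python
# Assumption for illustration only: roughly 1,900 Colombian pesos per US dollar.
ASSUMED_COP_PER_USD = 1900


def pesos_to_usd(pesos, cop_per_usd=ASSUMED_COP_PER_USD):
    """Convert Colombian pesos to US dollars at the given (assumed) rate."""
    return pesos / cop_per_usd


# "Minimum" figure: 1 billion pesos requested for the agricultural budget
budget_request_usd = pesos_to_usd(1_000_000_000)

# "More likely" figure: the amount promised after the earlier one-day strike
earlier_promise_usd = 20_000_000

# How many times larger the second figure is than the first
ratio = earlier_promise_usd / budget_request_usd

print(f"Budget request: about {budget_request_usd:,.0f} USD")
print(f"Earlier promise is about {ratio:.0f} times larger")
```

At that assumed rate the two figures differ by a factor in the tens, which is the gap a published cost breakdown would help to explain.<br />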
<div>
<br /></div>
<h3>
<b><span style="color: #660000;">Facts and numbers can help bring neutrality to the conflict.</span></b></h3>
<div>
To me as an outsider, there appears to be strong popular support for the farmers in the country, though it is all based on sentiment. It is an <i>us the people versus them the government</i> situation. </div>
<div>
<br /></div>
<div>
Very few facts and figures are being discussed. Had the government analysed the costs of the agricultural financial help being asked for, and let the people know the consequences and the financial size of the negotiation talks, people would have an idea of the degree to which their government can afford to help the farmers. These farmers are indeed quite poor, so images of their leathery and sunken faces shrouded in an earthy poncho arouse a lot of sympathy from the media and social media. </div>
<div>
<br /></div>
<div>
Without concrete numbers, people are acting solely on emotions. Instead of posing the question, <i>"can we as a nation help the farmers, and how much"</i>, people are pointing fingers at the government for having forgotten about the country's farmers. The government has next to no sympathy from the people. Aggressive riots continue and are met by hordes of police in full riot gear. Bloody conflicts ensue day after day. People are getting angrier. There is only black and white, us and them in the story. No neutrality.</div>
<div>
<br /></div>
<h3>
<b><span style="color: #660000;">Is the government missing a great opportunity to introduce changes to its tax system?</span></b></h3>
In my opinion, this is a great opportunity for the government to make small, stepwise changes to its tax system. That is, IF they can outline the costs involved in the financial help to the farmers, so that the people can understand the consequences. Since there is strong popular support for the farmers, and assuming the government runs the country with money from taxation of the people it serves, the government should be asking the Colombian people to fund the farmers' agricultural activities through taxation.<br />
<br />
The country's economy is growing a lot, despite a biased international image of a dangerous drug land. Its city people have a relatively high standard of living compared to its neighbours in South America. As tourists, we feel its prices and infrastructure are comparable to countries like Chile and Argentina. As the country grows more, its tax system is going to need reforms to fund all the nation's spending, as taxation is <a href="http://en.wikipedia.org/wiki/Taxation_in_Colombia" target="_blank">relatively low</a> right now compared to western standards. I cannot think of a better opportunity to introduce such changes. That said, there is a lot I don't know about the country!<br />
<br />
<h3>
<b><span style="color: #660000;">Finally, where are the road blocks? Can I get around?</span></b></h3>
<div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmR7OZQ1riMKtcMf3m5EbaTTKKGpI1bd4MJ2zvOtH5uAI2TLM0XkCJDghYVaaGDC2LSk_UGcaH-tCaJhZL_qARKkMshfHxJIfYYI_fgUKAdLLSXM-ediMY8RVwWT8UegSsqjFDDIfGV42y/s1600/Numearl+767+road+block+update.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmR7OZQ1riMKtcMf3m5EbaTTKKGpI1bd4MJ2zvOtH5uAI2TLM0XkCJDghYVaaGDC2LSk_UGcaH-tCaJhZL_qARKkMshfHxJIfYYI_fgUKAdLLSXM-ediMY8RVwWT8UegSsqjFDDIfGV42y/s200/Numearl+767+road+block+update.jpg" width="95" /></a>Read this list, updated daily.<br />
<br />
Nope, there is no map.<br />
<br />
I'm thankful for the information provided by the <a href="https://www.facebook.com/Numeral767" target="_blank">helpful service #767</a>. At a glance though, without knowing the country's cities and towns super well, I struggle to know whether there are any open roads towards my destination.<br />
<br />
Neither do I have a visual context of the scale of the road blocks.<br />
<br />
<br /></div>
<div>
As we sit here in Bogota, Colombia, totally stuck and unable to leave the capital due to road blocks of burning rubber tires, rocks and fallen tree stumps set up by protesting farmers, we are selfishly annoyed by the disturbance to our year-long South American trip in a mini, self-made <i>casa rodante</i> (house on wheels). We don't want to risk driving through these road blocks, as our friends on the road had firsthand experience going through them. Although they ultimately pleaded their way through the blocks safely, they did also report incidents of rock throwing, tire puncturing, window breaking, etc., done to other cars. Some Colombians say the farmers won't do anything to us and our car, since we are foreigners, but tense conflicts do not always leave room for reason. Therefore, we're waiting it out on our friend's couch.<br />
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-6AzytBLs2jrjNutQpDnNhg4UrAbefHXkF1tvsyy87a_gy520E1fu_sKowxq7cMF7e1WEoDZw54U7cEZZBGQUK19qEz_cdg9HHuKljhUlvocFnb1-HCQoCC2zbbXEo3LFhm3ps4yLIDso/s1600/Colombia+paro+bloqueos.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-6AzytBLs2jrjNutQpDnNhg4UrAbefHXkF1tvsyy87a_gy520E1fu_sKowxq7cMF7e1WEoDZw54U7cEZZBGQUK19qEz_cdg9HHuKljhUlvocFnb1-HCQoCC2zbbXEo3LFhm3ps4yLIDso/s320/Colombia+paro+bloqueos.jpg" width="318" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
Photo source: ElTiempo.com, screen captures from <a href="http://www.eltiempo.com/Multimedia/especiales/esp_video/paronacional/" target="_blank">here</a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<b><span style="color: magenta;">Special thanks</span></b> to Angela, William, Julian and Sergio for generously accommodating us in and around Bogota. Colombians are such kind and generous people, helping us completely out of the blue, and in this case, accommodating us simply having met us in hotels or on the roads. Thank you. I wish for peace in your country.</div>
Dawenhttp://www.blogger.com/profile/07152350276687825418noreply@blogger.com0tag:blogger.com,1999:blog-5764305054861361597.post-60078895276904588242013-08-27T05:28:00.001+01:002013-11-05T13:26:01.527+00:00More MOOC on Analytics - Coursera<div dir="ltr" style="text-align: left;" trbidi="on">
A horde of analytics-related Massive Open Online Courses (MOOCs) are about to start in September. Take your pick on what to learn. Having taken a few Coursera courses now, I would recommend: 1) Don't take too many courses at once, however tempting it is to sign up to all of them, unless you have no other work or projects on the go. This is just to make sure you have a reasonable load and are able to devote enough of your attention to learning the material properly. 2) Make good use of the discussion forums, as they are both a good source of clarifications and a window into other people's perspectives on the material. 3) Do the exercises, programming assignments and quizzes to ensure your understanding of the material.<br />
<br />
<a href="https://www.coursera.org/course/linearprogramming" target="_blank">Linear and Integer Programming</a><br />
Starts 2 Sept 2013, 9 weeks, 5-7 hours/week <br />
(the basics of mathematical optimisation, a core toolkit in the field of Operations Research)<br />
<br />
<a href="https://www.coursera.org/course/stats1" target="_blank">Statistics One</a><br />
Starts 22 Sept 2013, 12 weeks, 5-8 hours/week<br />
<br />
<a href="https://www.coursera.org/course/recsys" target="_blank">Introduction to Recommender Systems</a><br />
Starts 3 Sept 2013, 14 weeks, 4-10 hours/week <br />
<br />
<a href="https://www.coursera.org/course/compdata" target="_blank">Computing for Data Analysis</a><br />
Starts 23 Sept 2013, 4 weeks, 3-5 hours/week <br />
As I've written before <a href="http://www.thinkor.org/2013/07/learn-r-with-coursera-for-data-analysis.html" target="_blank">here</a>. <br />
<br />
<a href="https://www.coursera.org/course/bigdata" target="_blank">Web Intelligence and Big Data</a><br />
Starts 26 Aug 2013, 12 weeks, 3-4 hours/week<br />
<br />
<a href="https://www.coursera.org/course/thinkagain" target="_blank">Thinking Again: How to Reason and Argue</a><br />
Starts 26 Aug 2013, 12 weeks, 5-6 hours/week <br />
Perhaps a bit off topic, but perhaps not, since all analytics are more or less rooted in proving or disproving arguments, so we better learn how to do it well.<br />
<br />
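To see why signing up to everything is a bad idea, here is a rough Python sketch that adds up the weekly hours listed above. It makes a simplifying assumption that all selected courses run in the same weeks (their start dates actually differ by up to four weeks), and the names in the code are hypothetical.<br />

```python
# (course, weeks, low hours/week, high hours/week), copied from the list above
courses = [
    ("Linear and Integer Programming", 9, 5, 7),
    ("Statistics One", 12, 5, 8),
    ("Introduction to Recommender Systems", 14, 4, 10),
    ("Computing for Data Analysis", 4, 3, 5),
    ("Web Intelligence and Big Data", 12, 3, 4),
    ("Thinking Again: How to Reason and Argue", 12, 5, 6),
]


def weekly_load(selected):
    """Combined (low, high) hours per week, assuming the courses fully overlap."""
    low = sum(course[2] for course in selected)
    high = sum(course[3] for course in selected)
    return low, high


def total_hours(course):
    """Total (low, high) hours for one course over its whole run."""
    _, weeks, low, high = course
    return weeks * low, weeks * high


all_six = weekly_load(courses)
print("All six at once:", all_six, "hours/week")
for course in courses:
    print(course[0], total_hours(course), "hours in total")
```

Passing a shorter list to <code>weekly_load</code> shows how quickly the load drops when you pick two or three courses instead of all six.<br />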
<br />
Related articles:<br />
<a href="http://www.thinkor.org/2012/12/coursera-and-analytics-talent-gap.html">Coursera and the Analytics Talent Gap</a><br />
<a href="http://www.thinkor.org/2009/05/starting-up-in-operational-research.html">Starting up in Operational Research: What Programming Languages Should I Learn?</a></div>
Dawenhttp://www.blogger.com/profile/07152350276687825418noreply@blogger.com0tag:blogger.com,1999:blog-5764305054861361597.post-9671016171459883352013-08-14T02:00:00.000+01:002013-11-05T13:31:54.928+00:00Everybody likes to predict, but nobody likes being predictable, nor told what to do<div dir="ltr" style="text-align: left;" trbidi="on">
The Netflix algorithm is in the news again.<br />
<a href="http://www.wired.com/underwire/2013/08/qq_netflix-algorithm/" target="_blank">The Science Behind the Netflix Algorithms That Decide What You’ll Watch Next</a><br />
<br />
Netflix finds rating predictions are no longer as important, trumped by current viewing behaviour, i.e. what you are watching now. However, browse through the comments and, again, you will see a generally negative reaction. Some people really hate being told what to watch, even if it's just a recommendation. Others say Netflix sucks, because it recommends things they've watched elsewhere. That sounds like a lack of understanding: if you don't tell Netflix you've watched something already, then how could it know?<br />
<br />
As "big data" gets more media attention, it is reaching a wider audience who don't yet understand how algorithms work, but only know there are algorithms everywhere in their life, and it's scary to them. The lack of understanding seems to create fear and resentment.<br />
<div>
<br /></div>
<div>
LinkedIn and Facebook's recommendation systems for helping people find colleagues or friends they may know are generally well received, yet these film recommendation systems aren't. The difference between them might point to the success criteria for rolling out such recommendation systems.</div>
</div>
Dawenhttp://www.blogger.com/profile/07152350276687825418noreply@blogger.com0tag:blogger.com,1999:blog-5764305054861361597.post-80838983797429068832013-08-13T02:00:00.000+01:002013-11-05T13:39:17.116+00:00Machine Learning in Movie Script Analysis Rouses Angry Reactions<div dir="ltr" style="text-align: left;" trbidi="on">
An application of Machine Learning has been in the news lately: movie script analysis.<br />
<a href="http://www.nytimes.com/2013/05/06/business/media/solving-equation-of-a-hit-film-script-with-data.html?pagewanted=all&_r=2&" target="_blank">Solving Equation of a Hit Film Script, With Data</a><br />
<br />
They "compare the story structure and genre of a draft script with those of released movies, looking for clues to box-office success". However, the comments reveal that the general population (or at least the commenters) dislikes the concept, fearing that it stifles creativity.<br />
<br />
Comments like these sum up the overall sentiment:<br />
<blockquote class="tr_bq">
<i>"Using old data to presage a current idea is both terrible and foolish. It is to writing what Denny's is to fine dining - mediocrity run wild." </i> </blockquote>
<blockquote class="tr_bq">
<i>"Data crunchers will take the art out of everything. Paint-by-numbers." </i></blockquote>
<br />
Ouch.<br />
You be the judge whether this is a good application or not.<br />
<br />
I tend to lean towards answers like this one from the comments (sadly, this was only one of two positive comments at the time of my reading; the other was from the CEO of the script analysis business):<br />
<blockquote class="tr_bq">
<i>"<span style="background-color: rgba(255, 255, 255, 0); text-align: -webkit-auto;">I'm sure people have all sots of assumptions about what audiences like already. This data could be a tool to look deeper into these assumptions. Film makers have always wondered about consumer taste. It is a business. When commerce and art mix, there are inevitable compromises. This tool helps people see possible preferences based on past behavior. Information should never frighten us. It is how this information is applied that most deserves our attention.</span>"</i> </blockquote>
<br />
I also think it never helps the image of machine learning practitioners when the journalist paints the one in the story with an antagonist's brush, with descriptions such as "chain-smoking" and "<span style="background-color: white; font-family: georgia, 'times new roman', times, serif; font-size: 15px; line-height: 22px;">taking a chug of Diet Dr Pepper followed by a gulp of Diet Coke and a drag on a Camel". It reminded me somewhat of <a href="http://www.thinkor.org/2008/09/numerati-casting-or-folks-in-evil-light.html" target="_blank">another writer's style when covering analytics</a>.</span></div>
Posted by <a href="http://www.blogger.com/profile/07152350276687825418">Dawen</a><br />
<br />
Our labels: data scientist vs statisticians (or OR)<br />
Published 2013-08-12, updated 2013-11-05<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
A perennial discussion of identities in the world of analytics is making the rounds on the blogs of statisticians. Or wait a second, what should we call them?<br />
<a href="http://simplystatistics.org/2013/08/08/data-scientist-is-just-a-sexed-up-word-for-statistician/" target="_blank">Data scientist is just a sexed up word for statistician</a><br />
<div>
<br /></div>
Data Scientists, Statisticians, Applied Mathematicians, Operational Researchers... just to name a few of the labels people in the field of analytics might apply to themselves. How shall we label ourselves? I couldn't agree more with Nate Silver:<br />
<blockquote class="tr_bq">
"<i>Just do good work and call yourself whatever you want</i>."</blockquote>
<br /></div>
Posted by <a href="http://www.blogger.com/profile/07152350276687825418">Dawen</a><br />
<br />
Value chain trumps good design - ColaLife<br />
Published 2013-08-12<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
Babies in Africa suffer and die from diarrhoea, even though it's easily treatable with medicines that cost pennies. The problem is getting the medicine into mothers' hands: a supply chain problem in rural and sparsely populated areas.<br />
<br />
Here comes <a href="http://www.bbc.co.uk/news/magazine-23348408" target="_blank">ColaLife: Turning profits into healthy babies</a>.<br />
<br />
Designing medicine packaging to fit into the gaps between Coca-Cola bottles is ingenious, but understanding the value chain, so that everyone who touches the medicine's supply chain has an incentive to ensure its stock and flow, is even more important.<br />
<br />
<span style="color: #660000;"><b>If there is only one message to take away, I would choose:</b></span><br />
<span style="color: #660000;"><b>"<span style="background-color: rgba(255, 255, 255, 0); text-align: -webkit-auto;">What's in it for me?" </span></b></span><br />
<span style="background-color: rgba(255, 255, 255, 0); text-align: -webkit-auto;">Always ask this to make sure there is a hard incentive for all players to participate. Free giveaways are often not valued, resulting in poorly managed resources and a relatively low success rate. Ample training, and advertising to build awareness and encourage effective usage, are also key for product / technology adoption.</span></div>
Posted by <a href="http://www.blogger.com/profile/07152350276687825418">Dawen</a><br />
<br />
The Slightly Rosier Side of Gambling Analytics<br />
Published 2013-08-03<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
Just after I posted about <a href="http://www.thinkor.org/2013/08/the-ugly-side-of-analytics-casino.html" target="_blank">the ugly side of analytics - casino loyalty programmes</a>, the Guardian's DataBlog caught my eye with an article on a rosier side of gambling analytics: <span style="background-color: rgba(255, 255, 255, 0);"><a href="http://www.theguardian.com/news/datablog/2013/aug/01/uk-firm-uses-machine-learning-fight-gambling-addiction" target="_blank">UK technology firm uses machine learning to combat gambling addiction</a>.</span><br />
<span style="background-color: rgba(255, 255, 255, 0);"><br /></span>
<span style="background-color: rgba(255, 255, 255, 0);">Of course, a business is still a business. It needs to be profitable, so there are reasons beyond just "let's be good". Below is my take on the reasons for "them", the <strike>gamblers</strike> clients, and the reasons for "us", the casinos. Note that I have simply assumed the machine learning study is sponsored by the casinos.</span><br /><br />
<h3 style="text-align: left;">
<span style="color: #660000;">Just for "them":</span></h3>
<div>
Casinos too have a corporate social responsibility (CSR). Helping pathological gamblers, or identifying at-risk players before they become pathological gamblers, is a nice thing to do.</div>
<div>
<br />
<h3 style="text-align: left;">
<span style="color: #660000;">For "them" and for "us":</span></h3>
</div>
<div>
More for everyone! They get to play more, and we get to profit more. Having more people play a little for longer is better for the casino than having them play a lot for a short time before they end up on self-exclusion lists. (I'm not sure which is the lesser evil of the two though...) <br />
That's the business case. It's not all soft and cuddly like the CSR. Well, ok, business cases almost never are. <br />
<blockquote class="tr_bq">
"<span style="background-color: rgba(255, 255, 255, 0); text-align: -webkit-auto;"><i>If you can help that player have long term sustainable activity, then over the long term that customer will be of more value to you than if they make a short term loss, decide they are out of control and withdraw completely</i>"</span></blockquote>
<br />
<h3 style="text-align: left;">
<span style="color: #660000;">Just for "us":</span></h3>
Minimising gambling problems helps keep the country's regulators off the companies' backs, so they don't have to relocate when the country's regulations tighten. Relocation = cost. A lot of it.<br />
Plus,
<blockquote><i>"And there's also brand reputation for the operator. No company wants to be named in a case study of extreme gambling addiction, to be named in relation to a problem gambler losing their house"</i></blockquote>
<br />
<br />
<h3 style="text-align: left;">
<span style="color: #660000;">A side note: This reaffirmed why I don't gamble...it's a lose-win situation.</span></h3>
<blockquote class="tr_bq">
<i><span style="background-color: rgba(255, 255, 255, 0);">"A lot of casino games operate around a return-to-player rate (RTP) whereby if the customer pays, say £100, the game would be set up to pay back an average of £90. Different games will have different RTPs, and there are a few schools of thought on whether certain rates have different impacts on somebody's likelihood of becoming addicted. </span><span style="background-color: rgba(255, 255, 255, 0);">Some believe that if you lose really quickly, you'll be out of funds very quickly and will leave, and that a higher RTP will keep people on site, but others disagree"</span></i></blockquote>
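<br />
To see why I call it a lose-win situation, here is a rough sketch of the arithmetic behind the RTP quote. It assumes a fixed RTP and a player who re-wagers their whole bankroll every round; both are my simplifications, and the function names are made up for illustration rather than taken from the article.<br />

```python
def expected_return(total_wagered: float, rtp: float) -> float:
    """Average amount paid back to the player for a given total wagered."""
    if not 0.0 <= rtp <= 1.0:
        raise ValueError("rtp must be between 0 and 1")
    return total_wagered * rtp


def expected_bankroll(bankroll: float, rtp: float, rounds: int) -> float:
    """Expected bankroll after re-wagering the entire bankroll for `rounds` rounds.

    Assumption: every round pays back `rtp` times the stake on average,
    independently of earlier rounds.
    """
    return bankroll * rtp ** rounds


if __name__ == "__main__":
    # The quote's example: 100 pounds wagered at a 90% RTP.
    print(expected_return(100, 0.90))
    # Hypothetical comparison of a lower and a higher RTP over 10 rounds.
    for rtp in (0.90, 0.97):
        print(rtp, round(expected_bankroll(100, rtp, 10), 2))
```

A higher RTP only slows the expected decline; as long as the RTP is below 100%, the expected bankroll shrinks with every round played.<br />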
<br />
I highly recommend reading the full article on the <a href="http://www.theguardian.com/news/datablog/2013/aug/01/uk-firm-uses-machine-learning-fight-gambling-addiction" target="_blank">DataBlog</a>.<br />
<br />
<div style="background-repeat: no-repeat no-repeat; border-collapse: collapse; margin-bottom: 13px; padding: 0px; text-align: left;">
<span style="background-color: rgba(255, 255, 255, 0);"><br /></span></div>
</div>
</div>
Posted by <a href="http://www.blogger.com/profile/07152350276687825418">Dawen</a><br />
<br />
The Ugly Side of Analytics - Casino Customer Loyalty<br />
Published 2013-08-01<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
While I was listening to <a href="http://www.thisamericanlife.org/radio-archives/episode/466/blackjack" target="_blank"><i>This American Life</i>'s episode "<i>Blackjack</i>"</a>, Act 2 had me saying out loud in the car, "oh no, they did not!" The "they" is Caesars Entertainment Corporation (the casino operator), and yes, they have a customer loyalty programme that they use to "attract more customers", and they claim it's no different from such programmes in industries like supermarkets, hotels, airlines or dry cleaners.<br />
<br />
Well...there is a wee bit of difference.<br />
<h3 style="text-align: left;">
<span style="color: #660000;">No one is addicted to dry cleaning.</span></h3>
<div>
I am saddened that analytics is used to help the casino loyalty programme and hurt pathological gamblers. The show indicates that the programme identifies "high value customers" using loyalty cards, tracks all their spend and results, and then offers them the "right" rewards to keep them coming back. Most addicted gamblers are "high value customers". The bigger the loser, the bigger the reward. Rewards range from drinks and meals, hotel suites and trips to casinos (if you don't live nearby) to gifts like handbags and diamonds.</div>
<div>
<br /></div>
<div>
Analytics and Operational Research is supposed to be the <i><a href="http://www.scienceofbetter.org/" target="_blank">Science of Better</a></i>.</div>
<div>
<br /></div>
<h3 style="text-align: left;">
<span style="color: #660000;">I'd like to call on all professionals in the analytics field to reflect on the moral goodness, or lack thereof, in your work.</span></h3>
<div>
There is still hope though. If casinos can use analytics to identify problem gamblers, then others can too. Given pathological gambling is a mental health issue, is it time for NGOs or governments to catch up with technology and get their hands on those loyalty card data?</div>
</div>
Posted by <a href="http://www.blogger.com/profile/07152350276687825418">Dawen</a><br />
<br />
Learn R with Coursera for Data Analysis<br />
Published 2013-07-29, updated 2013-11-05<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
Heads up: the <a href="https://www.coursera.org/course/compdata" target="_blank">Computing for Data Analysis</a> course is running in September 2013.<br />
<br />
It will teach you the R language for data analysis. The course is described as:<br />
<blockquote class="tr_bq">
<span style="color: #741b47;">This course is about learning the fundamental computing skills necessary for effective data analysis. You will learn to program in R and to use R for reading data, writing functions, making informative graphs, and applying modern statistical methods. </span></blockquote>
<blockquote class="tr_bq">
<span style="color: #741b47;">In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment, discuss generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing which includes programming in R, reading data into R, creating informative data graphics, accessing R packages, creating R packages with documentation, writing R functions, debugging, and organizing and commenting R code. Topics in statistical data analysis and optimization will provide working examples.</span></blockquote>
<br />
<br />
Related articles:<br />
<a href="http://www.thinkor.org/2012/12/coursera-and-analytics-talent-gap.html" target="_blank">Coursera and the Analytics Talent Gap</a><br />
<a href="http://www.thinkor.org/2009/05/starting-up-in-operational-research.html" target="_blank">Starting up in Operational Research: What Programming Languages Should I Learn?</a></div>
Posted by <a href="http://www.blogger.com/profile/07152350276687825418">Dawen</a><br />
<br />
Even Google can't get their numbers straight<br />
Published 2013-07-28, updated 2013-07-29<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
Google has a great many entities and <a href="http://en.wikipedia.org/wiki/List_of_Google_products" target="_blank">products</a>, either grown within the organisation or acquired externally. It appears that even Google, the leader in Data Science and Analytics, cannot get the numbers straight across its products: Google Analytics vs. Blogger.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgKaq7hbu2DryMU4uwdgni8VXrgChYj-25c1DU5mWb2N2gLT2GDtN4mpQCaYE5GC0LEF-L2ZvyB8ZjuJ6SIEQwYgGYpBqCu1YaoE8vAAkREL30r_Heft1fTWTLpM93zUPebkftqCWYbN052/s1600/blogger.jpeg" imageanchor="1" style="clear: left; display: inline !important; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="104" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgKaq7hbu2DryMU4uwdgni8VXrgChYj-25c1DU5mWb2N2gLT2GDtN4mpQCaYE5GC0LEF-L2ZvyB8ZjuJ6SIEQwYgGYpBqCu1YaoE8vAAkREL30r_Heft1fTWTLpM93zUPebkftqCWYbN052/s200/blogger.jpeg" width="140" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3C_Q8vvgq3Ff9ns9u3YOie0eA8K0BlSPUKIHt2u92C5lZzENDPFxXao51e80abVt4l_4EJrc9vr-7nM7ZVis9vB_GvRE6bhS-Gc4bRSpWZ0wfTQ7WbZkYA1ic5GVJGaEoZxHII4TqqulO/s1600/GoogleAnalytics.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="140" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3C_Q8vvgq3Ff9ns9u3YOie0eA8K0BlSPUKIHt2u92C5lZzENDPFxXao51e80abVt4l_4EJrc9vr-7nM7ZVis9vB_GvRE6bhS-Gc4bRSpWZ0wfTQ7WbZkYA1ic5GVJGaEoZxHII4TqqulO/s200/GoogleAnalytics.png" width="140" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<h3 style="text-align: left;">
<span style="color: #660000;">Is this blog really </span><i style="color: #660000;">that</i><span style="color: #660000;"> popular? Really?</span></h3>
While checking this blog's traffic numbers with Blogger's built-in "Stats" function, I was surprised to see that the blog seemed to be really popular, even though I have not been good (sorry!) at writing much for some time. As an ex-SEO'er, I had an inkling that something was not right. Up comes Google Analytics.<br />
<br />
<h3>
<span style="color: #660000;">Blogger Stats numbers are 4.5 times bigger than </span><span style="color: #660000;">Google Analytics'.</span></h3>
After checking my Google Analytics (GA) numbers, I was really surprised to see that the Blogger Pageview numbers were 4.5 times bigger than the GA numbers. That is a staggering difference!<br />
<br />
After some research on the web, I concluded that:<br />
<ol style="text-align: left;">
<li>GA is much closer to the truth (but not quite completely true, see 3 below).</li>
<li>Blogger stats include all kinds of bot traffic, so they are heavily inflated (GA tries to filter most of it out).</li>
<li>GA cannot count any traffic if the user has <a href="https://support.google.com/analytics/answer/1008065?hl=en" rel="nofollow" target="_blank">disabled JavaScript</a>. Some folks suggest it undercounts traffic by 50%, but there is no hard evidence to back that up, so take it with a grain of salt.</li>
<li>Blogger seems set on reporting only Pageviews, not any other useful metrics, such as Visits or Unique Visitors. Not sure why.</li>
<li>This blog has probably been targeted by a spam bot. Upon closer look, one of the bots probably comes from a particular Dutch ISP.</li>
</ol>
<div>
<br /></div>
<h3 style="text-align: left;">
<span style="color: #660000;">Share best practice and be consistent.</span></h3>
<div>
I would have expected Google, the leader in Data Science and Analytics, to share best practice amongst its entities and products, such as reporting on key metrics (not just Pageviews).</div>
<div>
<br /></div>
<div>
I would also have expected Google to be able to keep a consistent set of numbers amongst its entities and products. That doesn't appear to be the case either.</div>
<div>
<br /></div>
<br />
More often than not, the majority of a Business Intelligence (BI) analyst's time is spent verifying and reconciling numbers across various reports. Major BI vendors sell applications that promise to reduce such activities and increase business confidence in the numbers in the data warehouse. However, this is still a major challenge for most companies, as evidenced here. Without a good and reliable data source, the validity of any subsequent analysis is heavily undermined.<br />
<br />
Let's try to stay consistent.<br />
That goes for the metric choice, <i>and</i> the numbers.<br />
<br />
<br />
FYI: if you want to find out whether your site is being attacked by spam bots, and by whom, <a href="http://davebuesing.com/google-analytics-spam-traffic-bots/" target="_blank">read this helpful post</a>.<br />
<div>
<br /></div>
</div>
Posted by <a href="http://www.blogger.com/profile/07152350276687825418">Dawen</a><br />
<br />
7.2% raise for 1,000 best paid Ontario public sector employees<br />
Published 2013-04-06<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglZGiLVQ1WEGZzA6g1M6c6h9TvRoVp8Wz08CpKNFhndu1JG_XCeWTQrE1sjtNMbQr-cE46QLf3_wQLNkUiDru9q42_naqGYdifWEkAW8b5-WZwfNp5TqA0U9C6vs1anXCDcD9VjDigsrQk/s1600/graphv2.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="217" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglZGiLVQ1WEGZzA6g1M6c6h9TvRoVp8Wz08CpKNFhndu1JG_XCeWTQrE1sjtNMbQr-cE46QLf3_wQLNkUiDru9q42_naqGYdifWEkAW8b5-WZwfNp5TqA0U9C6vs1anXCDcD9VjDigsrQk/s400/graphv2.png" width="100%" /></a></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;"><br /></span></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;"><br /></span></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">The
top 1,000 employees with the highest package (salary + taxable
benefits) in the Ontario Public Sector Salary Disclosure, the
so-called “Sunshine List”, saw an average increase of almost
$25,000 in 2012 compared to the previous year, an increase of 7.2%,
much higher than the bottom half of the 80,000-strong list which saw
an increase of only 2.2%.</span></div>
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Is
this cause for alarm? Highly paid CEOs are fully in the public
spotlight, and the many, many school principals have their pay closely
monitored, but what about the highly paid individuals near, but not
at, the top? The data shows that for them, 2012 was a good year.</span></div>
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Every
year since 1996, the Ontario Ministry of Finance has released a list
of all public sector employees who earned more than $100,000 in the
previous year.</span></div>
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;"><b>Oversight</b></span></div>
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">We can
all see that “Sunshine List” champion Thomas Mitchell, President
&amp; CEO of Ontario Power Generation, took a pay cut this year, but
with the list approaching 100,000 names, more sophisticated,
data-driven oversight is possible.</span></div>
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Government-friendly
observers point out that the average salary on the list has decreased,
just like last year, but that is a red herring. Anyone can add over
9,000 people earning just over $100k to a list with an average salary
of $129k and bring down the average. As the list continues to grow
from the bottom, we can expect the average salary to decline, without
this being any indicator of public fiscal discipline.</span></div>
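<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">The dilution effect described above can be sketched in a few lines. The 9,000 newcomers and the $129k average come from the paragraph above; the 79,000 existing names and the $101k newcomer salary are illustrative assumptions, not figures from the disclosure, and the function name is made up for this example.</span></div>

```python
def average_after_additions(old_count, old_average, new_count, new_average):
    """Average salary of a list after new, lower-paid entries join it."""
    total = old_count * old_average + new_count * new_average
    return total / (old_count + new_count)


if __name__ == "__main__":
    # Assumed: 79,000 existing names and newcomers earning $101k on average.
    updated = average_after_additions(79_000, 129_000, 9_000, 101_000)
    print(f"Average falls from $129,000 to ${updated:,.0f}")
```

<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Nobody in this sketch took a pay cut, yet the average drops, which is why a falling average says little about fiscal discipline.</span></div>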
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Opposition
partisans will lament the increasing growth of the list, 9,000 more
this year and 7,500 the year before. This is again misleading. The
pyramid shape of any organisation tells us that there are more people
as you move down the salary brackets. With a perfectly reasonable
average salary growth at just over 2.5%, 9,600 employees graduated to
the “Sunshine List” this year after having earned around $98k
last year. Probably more than 9,600 employees currently earning
around $98k will be new additions to the list next year, and more the
year after. Inflation and economic growth will ensure that the list
grows, and the pyramid shape will ensure that it grows faster.</span></div>
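<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">As a rough check on the "around $98k" figure, the sketch below computes the lowest salary that crosses the $100,000 threshold after a given raise, and counts how many of a set of salaries would newly join the list. The 2.5% growth rate is taken from the paragraph above; the sample salaries and function names are hypothetical.</span></div>

```python
def crossing_salary(threshold, growth):
    """Lowest salary last year that reaches `threshold` after growing by `growth`."""
    return threshold / (1 + growth)


def count_new_entrants(last_year_salaries, threshold, growth):
    """Count salaries that were below the threshold last year and reach it after the raise."""
    cutoff = crossing_salary(threshold, growth)
    return sum(1 for salary in last_year_salaries if cutoff <= salary < threshold)


if __name__ == "__main__":
    print(round(crossing_salary(100_000, 0.025), 2))
    # Hypothetical salaries just below and above the threshold.
    sample = [95_000, 97_000, 98_000, 99_500, 101_000]
    print(count_new_entrants(sample, 100_000, 0.025))
```

<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Because the band just below the threshold holds more people than the band above it, the number of new entrants can be expected to rise each year even with modest raises.</span></div>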
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;"><b>Top
1,000</b></span></div>
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">So who
are these lucky 1,000 who on average made 7.2% more in 2012?</span></div>
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">This
year the 1,000 best packages on the list included:</span></div>
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">583
individuals working in hospitals</span></div>
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">176
Pathologists</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">50
Chief Executive Officers</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">66
Vice-Presidents (Senior, Executive, etc.)</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">79
Psychiatrists</span></div>
</li>
</ul>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">86
employees in electricity</span></div>
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">56
Vice-Presidents (Senior, Executive, etc.)</span></div>
</li>
</ul>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">144
working at Universities</span></div>
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">100
Professors</span></div>
</li>
</ul>
</li>
</ul>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;"><b>Big
raises</b></span></div>
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Of the
1,000, 737 can be matched exactly by name and organisation type to
last year. 92 of those fortunate souls saw an increase of over 25%!
At the top of the pack was Mohamed Abelaziz Elbestawi, Vice-President
Research/Professor at McMaster University, whose reported salary paid
rose from $266k in 2011 to $506k in 2012! Trung Kien Mai, a
Pathologist at The Ottawa Hospital, saw his salary paid move from
$306k in 2011 to $515k in 2012!</span></div>
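<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">The matching step can be sketched roughly as follows. This is not the code used for the analysis: it assumes each year's list has been reduced to a dictionary keyed by (name, organisation type), and the function name and the third and fourth records in the demo are invented for illustration. The two named salaries are the rounded figures quoted above.</span></div>

```python
def big_raises(last_year, this_year, threshold=0.25):
    """Return {key: fractional increase} for records found in both years
    whose pay rose by more than `threshold`."""
    result = {}
    for key, new_pay in this_year.items():
        old_pay = last_year.get(key)
        if old_pay is None or old_pay <= 0:
            continue  # no exact match last year, so no raise can be computed
        increase = (new_pay - old_pay) / old_pay
        if increase > threshold:
            result[key] = increase
    return result


if __name__ == "__main__":
    pay_2011 = {
        ("Mohamed Abelaziz Elbestawi", "Universities"): 266_000,
        ("Trung Kien Mai", "Hospitals"): 306_000,
        ("Hypothetical Employee", "Hospitals"): 300_000,  # invented record
    }
    pay_2012 = {
        ("Mohamed Abelaziz Elbestawi", "Universities"): 506_000,
        ("Trung Kien Mai", "Hospitals"): 515_000,
        ("Hypothetical Employee", "Hospitals"): 309_000,  # invented record
        ("Unmatched Newcomer", "Electricity"): 400_000,   # invented record
    }
    for (name, org), increase in big_raises(pay_2011, pay_2012).items():
        print(f"{name} ({org}): {increase:.0%}")
```

<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Exact matching on name and organisation type is deliberately strict: anyone who changed employer type or whose name is spelled differently between years drops out, which is one reason only 737 of the 1,000 could be matched.</span></div>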
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Of
those 92 with big raises:</span></div>
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">83
work in hospitals</span></div>
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">50
are Pathologists</span></div>
</li>
</ul>
</li>
</ul>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;"><b>More
questions</b></span></div>
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">At
this point, this analysis raises more questions than it answers, but
that is to be expected from an analysis of this salary disclosure
data. The Public Salary Disclosure Act can help us find questions,
not answers. What we do know is that:</span></div>
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Salaries
near the top grew substantially</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Those
salaries grew much more, even on a percentage basis, than those at the bottom</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Growth
was higher than expected given slow economic growth</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Some
individuals can be shown to have experienced extraordinary raises</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Calibri, sans-serif;">Pathologists
do well, and 2012 was a particularly good year for some</span><br />
<span style="font-family: Calibri, sans-serif;"><br /></span></div>
</li>
</ul>
<span style="font-family: Calibri, sans-serif;">Source: </span><a href="http://www.fin.gov.on.ca/en/publications/salarydisclosure/pssd/">http://www.fin.gov.on.ca/en/publications/salarydisclosure/pssd/</a>
Posted by <a href="http://www.blogger.com/profile/09105534198004682280">Anonymous</a><br />
<br />
Timberland customer care &amp; operations - I approve!<br />
Published 2013-04-06<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
Buying a brand is buying quality - that's especially true for outdoor equipment.<br />
<br />
With this belief, I purchased a pair of Timberland hiking boots that said "Waterproof" on a piece of official-looking metal attached to them. I then ended up with wet feet during an 8-day trek in Patagonia where it often rains - that sucked.<br />
<br />
With my toes literally swimming in water inside the boots after a soppy wet day of hiking 19km, I was not a happy camper. However, my perception of Timberland later took a 180-degree turn for the better.<br />
<br />
I had bought the boots in a <a href="http://www.bata.com/" target="_blank">Bata</a> store in southern Chile. After using them extensively and being thoroughly disappointed, I came across a <a href="http://www.timberland.com/" target="_blank">Timberland</a> brand store 2,500km away from where I bought them, still in Chile. I went in and complained about these supposedly "waterproof" boots, and I was offered the chance to exchange them for a brand new pair that is indeed waterproof, paying only the small price difference between the two pairs.<br />
<br />
This is operationally remarkable:<br />
<br />
<div style="text-align: left;">
<span style="color: #660000; font-size: large;"><b>Different stores (Bata vs Timberland)</b></span></div>
<div>
I bought them in Bata, which is a popular international brand that happens to carry Timberland boots. However, I was able to exchange them in a Timberland own-brand store. Given that the receipt I got from the Timberland store says "Bata" on it, I suspect the two are operated by the same company. However, as a western audience, can you imagine buying something at Gap and then returning it at Banana Republic (same parent company)?</div>
<div>
<br /></div>
<div>
<div style="text-align: left;">
<span style="color: #660000; font-size: large;"><b>Different cities and provinces</b></span></div>
</div>
I don't know what it's like in the US, but in Canada, returns and exchanges wouldn't be possible across provincial borders. Yet, in this case, it was not a problem.<br />
<br />
<br />
<div style="text-align: left;">
<span style="color: #660000; font-size: large;"><b>After the 14-day exchange period without the paper receipt</b></span></div>
<div style="text-align: left;">
It was at least 3 weeks after the original purchase date, while the receipt stated a 14-day exchange period. I also hadn't kept the paper receipt (trying to travel light), but I had a photo of it on my phone, which I was able to email to them so they could process the exchange. Again, can you imagine this happening in a western country? </div>
<div style="text-align: left;">
<br /></div>
<br />
<div style="text-align: left;">
<b><span style="font-size: large;"><span style="color: #660000;">"Waterproof" <span style="background-color: white; font-family: arial, sans-serif; line-height: 16px;">≠</span> </span><span style="color: #660000;">"Gore-Tex"</span></span></b></div>
<div>
Finally, for everyone's learning: apparently, if it only says "waterproof", it's not waterproof. Only if it says "Gore-Tex" is it actually waterproof.</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
I went into the Timberland store only to vent my frustration. I was positively flabbergasted when they offered to exchange the boots for a new pair. Not only is the customer care commendable, but the fact that this could happen operationally is something I would never have expected. They basically went against all the rules I know of that would make this infeasible in western countries. Yet the teens who worked at the Timberland store were willing to find ways to help me, a foreigner with broken Spanish, so that I would have this outstanding experience and be happy with a fairly expensive pair of hiking boots. How they keep the books straight on this transaction is beyond me, 'cause surely they are running Bata and Timberland as two separate business entities. </div>
<div>
<br /></div>
<div>
The result: Timberland now has a new loyal customer. This is an outstanding example of great customer care made possible by some well-integrated and smooth operations.</div>
</div>
<div class="blogger-post-footer">
This article was originally posted on <a href="http://www.thinkor.org">ThinkOR.org</a>. Share if you like it.</div>Dawenhttp://www.blogger.com/profile/07152350276687825418noreply@blogger.com2tag:blogger.com,1999:blog-5764305054861361597.post-46665539204523549032012-12-30T16:50:00.002+00:002013-11-05T13:27:53.957+00:00Coursera and the analytics talent gap<div dir="ltr" style="text-align: left;" trbidi="on">
It's been a while, and ThinkOR is back to blogging about Operational Research and its related themes.<br />
<br />
ThinkOR authors are about to start on 3 <a href="https://www.coursera.org/" target="_blank">Coursera</a> courses over the next couple of months:<br />
<ul style="text-align: left;"><a href="https://s3.amazonaws.com/coursera/topics/dataanalysis/small-icon.hover.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="112" src="https://s3.amazonaws.com/coursera/topics/dataanalysis/small-icon.hover.png" width="200" /></a>
<li><a href="https://www.coursera.org/course/nlangp" target="_blank">Natural Language Processing</a></li>
<li><a href="https://www.coursera.org/course/ml" target="_blank">Machine Learning</a></li>
<li><a href="https://www.coursera.org/course/dataanalysis" target="_blank">Data Analysis (with R)</a> (to refresh)</li>
</ul>
<br />
I am not only learning about some new topics for my own benefit, but am also interested in assessing how such easily accessible courses could help close the so-called 'big data and analytics talent gap' in businesses. As a Business Analytics consultant, I see this as one of the biggest issues my clients face in today's business world - those who don't know about analytics don't think about it, and once they know about it, they don't know how to get more of it. Obviously, there would need to be some sort of a step progression, such as (just an example without much research at this point):<br />
<ol style="text-align: left;">
<li><a href="https://www.coursera.org/course/stats1" target="_blank">Statistics One</a></li>
<li><a href="https://www.coursera.org/course/dataanalysis" target="_blank">Data Analysis (with R)</a> and/or <a href="https://www.coursera.org/course/compdata" target="_blank">Computing for Data Analysis</a></li>
<li>some sort of programming course, check the <a href="https://www.coursera.org/category/cs-programming" target="_blank">computing course catalogue</a></li>
<li>Focus on one or several of the main OR techniques and their associated tools, such as Discrete Event Simulation, Monte Carlo Simulation, Optimisation, Forecasting, <a href="https://www.coursera.org/course/ml" target="_blank">Machine Learning</a>, and the good old Volumetric Modelling, as some examples</li>
<li>and if you are going to work with humongous data sets, <a href="https://www.coursera.org/course/datasci" target="_blank">Intro to Data Science</a> sounds like a reasonable way to become familiar with the various big data technologies used to apply data science (I suspect this often eludes traditional OR practitioners)</li>
</ol>
As ThinkOR goes along, we will be blogging about these courses and our learning experience. So far, there has only been <i>very</i> positive feedback. Let's get going!<br />
<br />
Merry Christmas and Happy New Year!</div>
<div class="blogger-post-footer">
This article was originally posted on <a href="http://www.thinkor.org">ThinkOR.org</a>. Share if you like it.</div>Dawenhttp://www.blogger.com/profile/07152350276687825418noreply@blogger.com0tag:blogger.com,1999:blog-5764305054861361597.post-56800987079344413422012-05-31T21:44:00.002+01:002012-05-31T21:44:13.743+01:00Consistent Education Divide in CitiesThe Daily Viz <a href="http://thedailyviz.com/jp/growing-education-divide-in-cities/?utm_source=rss&utm_medium=rss&utm_campaign=growing-education-divide-in-cities">brought</a> <a href="http://www.nytimes.com/interactive/2012/05/29/us/growing-education-divide-in-cities.html?ref=us">this to my attention</a>. It's a visual by the New York Times showing how the distribution of cities by proportion of adults with college degrees has changed over the last 40 years.<br />
<br />
Nicely formatted and presented, though my ability to compare the distributions side-by-side is a little bit limited.<br />
<br />
The key story that this visual is telling is that the average has moved from 12% to 32%, but that the number of cities more than 5% above or below the average has increased substantially. "College graduates are more unevenly distributed in the top 100 metropolitan areas now than they were four decades ago." But I'm not sure if it's as simple as that.<br />
<br />
Suppose I was measuring trees. Species one was 10 feet tall on average and species two was 100 feet tall. If the first tended to vary between 7 feet and 13 feet, but the latter tended to vary from 85 feet to 115 feet, I wouldn't remark on how much more variable the taller trees were. In species one, no tree is more than 3 feet from the average, but in species two, presumably many are. Is this a sign that species two is more unevenly distributed? Not really. Species one varies up and down by 30%, whereas species two does so by 15%.<br />
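<br />
The tree arithmetic can be written out as a small sketch. The helper name <code>relative_spread</code> is only an illustration made up for this example; it divides half the width of each range by its midpoint:<br />

```python
def relative_spread(low, high):
    """Half the width of a range, as a fraction of the range's midpoint."""
    midpoint = (low + high) / 2
    half_width = (high - low) / 2
    return half_width / midpoint


# Ranges taken from the tree example in the text.
species_one = relative_spread(7, 13)      # 3 feet either side of 10 feet
species_two = relative_spread(85, 115)    # 15 feet either side of 100 feet

print(f"species one: {species_one:.0%}, species two: {species_two:.0%}")
```

Species two has the wider range in feet, yet the smaller spread relative to its own height.<br />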
<br />
So I asked myself, given that the average proportion of adults with college degrees has nearly tripled to 32%, has their variability increased proportionally? Now that these trees are 32 feet tall, it seems strange to still measure their "unevenness" by how many of them are between 27 and 37.<br />
<br />
So I reached for a statistic, the <a href="http://en.wikipedia.org/wiki/Coefficient_of_variation">Coefficient of Variation</a> (the standard deviation divided by the mean). Reading the data off the charts by eye (so not precisely the correct data), I calculate a coefficient of 0.25 in 1970 and 0.22 in 2010. The variation in the data as a proportion of the average has gone down in the last four decades.<br />
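<br />
Here is a minimal sketch of why the two measures can disagree. The city shares below are hypothetical numbers invented for illustration, not the NYT data or my eyeballed figures; the second list is simply the first scaled so that its average moves from 12 to 32:<br />

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def count_far_from_mean(values, threshold=5):
    """Count values lying more than `threshold` points from the mean."""
    centre = mean(values)
    return sum(1 for v in values if abs(v - centre) > threshold)


# Hypothetical college-degree shares (in %) for five made-up cities.
shares_then = [9, 11, 12, 13, 15]                  # average of 12
shares_now = [s * 32 / 12 for s in shares_then]    # same shape, average of 32

for label, shares in (("then", shares_then), ("now", shares_now)):
    print(
        label,
        "cities more than 5 points from average:", count_far_from_mean(shares),
        "coefficient of variation:", round(coefficient_of_variation(shares), 3),
    )
```

In this made-up case the count of cities more than 5 points from the average goes up purely because everything got bigger, while the coefficient of variation does not move at all.<br />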
<br />
Again, the NYT concludes that "College graduates are more unevenly distributed in the top 100 metropolitan areas now than they were four decades ago.", but I would argue that if anything they are slightly more <i>evenly</i> spread than before and <i>not</i> remarkably so.<div class="blogger-post-footer">
This article was originally posted on <a href="http://www.thinkor.org">ThinkOR.org</a>. Share if you like it.</div>Anonymoushttp://www.blogger.com/profile/09105534198004682280noreply@blogger.com0tag:blogger.com,1999:blog-5764305054861361597.post-41864523444458304322012-04-28T00:46:00.000+01:002013-07-29T23:55:29.515+01:00Data Journalism<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<br />
<div class="MsoNormal">
I've recently started following the Guardian's <a href="http://www.guardian.co.uk/news/datablog">Data Blog</a>,
but I was a little disappointed with their <a href="http://www.guardian.co.uk/news/datablog/2012/apr/27/grammar-schools-intake-kent">recent article on grammar schools in the UK</a>.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
My understanding is that grammar schools are a subset of
schools in the UK that supposedly offer entry on a meritocratic basis and
deliver higher quality education. Depending on your political leanings you
either believe that grammar schools reinforce the class division in the UK by
giving entry disproportionately to the already higher class and then giving
them a better education, or you believe that grammar schools enable class mobility
by delivering a better education to bright lower class students who could not
otherwise afford such a thing. As an outsider in the UK I’m not qualified to
hold an opinion here, but I suspect that naturally each extreme fails to
appreciate some nuanced details.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The article appears to have pulled off a classic
journalist's ploy:</div>
<div class="MsoNormal">
</div>
<ol>
<li>Present a statistical analysis of the data in a leading
way without drawing conclusions</li>
<li>Quote somebody else's opinion on the topic</li>
</ol>
<br />
<div class="MsoNormal">
Essentially you can deliver opinion supported by the
apparent full weight of objective statistical analysis without having to put
your name to the conclusions which might not hold up to rigorous challenge.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Notice that one of the opinions is much stronger than
the other. Notice also that Rosemary Joyce's note has very complex implications
which are not at all explored for the reader. Even I’m not sure if she has a
point.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I could offer a very different view of the same data:</div>
<div class="MsoNormal">
</div>
<ul>
<li>14 of 32 schools favoured those not privately educated,
giving fewer than 6% of offers to the privately educated</li>
<li>Taking 24 of the 32 schools (3/4) with the lowest
privately educated proportions, the average was 6%, the same as the overall
population</li>
<li>Removing the two clear outliers in the data,
"Tonbridge Grammar School" and "The Judd School", the
privately educated averaged 8.9% overall</li>
</ul>
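<div class="MsoNormal">
For readers who want to try this kind of re-cut themselves, here is a rough sketch of recomputing a pooled share with and without chosen schools. The school names and offer counts are entirely made up for illustration and are not the Kent figures; the function name is hypothetical too.</div>

```python
def pooled_private_share(schools, exclude=()):
    """Privately educated offers as a fraction of all offers.

    `schools` maps a school name to (private_offers, total_offers).
    Schools named in `exclude` are left out of both sums.
    """
    kept = [counts for name, counts in schools.items() if name not in exclude]
    private = sum(p for p, _ in kept)
    total = sum(t for _, t in kept)
    return private / total


# Made-up counts, NOT the real data from the article.
offers = {
    "School A": (5, 100),
    "School B": (8, 100),
    "School C": (40, 100),   # an obvious outlier in this toy data
}

overall = pooled_private_share(offers)
trimmed = pooled_private_share(offers, exclude={"School C"})

print(f"all schools: {overall:.1%}, without School C: {trimmed:.1%}")
```

Pooling the offers weights each school by its size, which is a choice; averaging the per-school percentages instead would give a different number.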
<br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I feel the key fact I’m missing is: What % of students in
Kent who scored well on the 11-plus exam were privately educated? How does this
compare to the 10.89%? How does this compare to the 8.9% removing outliers? Is
there a social bias in the offers?</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I’m also missing any information about how these numbers
have been changing with time. Simon Murphy complains that the government is not
taking steps to improve the chances of poor children, and yet for all I know
that 10.89% was maybe 12% last year.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
What about this “local context” anyhow? How do these percentages
compare at a finer level of granularity than county-wide? How do these
percentages compare to applications?</div>
<div class="MsoNormal" style="text-indent: 0cm;">
<br /></div>
<div class="MsoNormal">
Is this a story of a county-wide bias, or just the story of
two bad apples and a handful of not-so-good ones? I think I know what The Guardian
wants me to think. Data Journalism is still Journalism, I suppose.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
For my readers, I ask, why do you suppose the 10.89% number
is the only one in the text of the article to two decimal places?</div>
<br /></div>
<div class="blogger-post-footer">
This article was originally posted on <a href="http://www.thinkor.org">ThinkOR.org</a>. Share if you like it.</div>Anonymoushttp://www.blogger.com/profile/09105534198004682280noreply@blogger.com0