Thursday, August 29, 2013

Colombia's national agricultural strike could use some facts and numbers to aid the public discussion

Today sees the largest march yet in Colombia's cities during the country's nationwide agricultural paro (strike) by the campesinos (farmers). The strike has lasted 10 days so far. Students (suspending classes), truck drivers and health workers, together with miners, potato growers, other small farmers and coffee growers, will all march today. The situation is fairly grave. You can read about the demands of the farmers here, but in summary, they are asking for financial help from the government for seeds, fertilisers, fuel and highway tolls. (FYI: Colombia's highway tolls have been the highest in our year-long South America trip so far.)


A lot of emotions, but how much is all this financial help?

Not clear.

All the reporting by newspapers like this, this and this only ever talks about whether the farmers and the government have reached an agreement, that the road blocks continue, that food prices are rising due to shortages, that there are more blocks and marches, etc. There is nothing concrete about how much all this financial help amounts to, not even a summarised high-level figure.

Minimum: half a million US dollars.

The best I could gather was from this article, which says the government has asked for 1 billion pesos (~half a million USD) to be added to the agricultural budget. My best guess is that this is the minimum, because Colombia is not that expensive, but it's not that cheap either. Nothing in the news tells me what the range around this ballpark figure could be, though.

More likely: 20 million US dollars.
This article suggests that earlier this year, when the farmers held a one-day strike, this is the amount the government promised but never delivered.

Note: neither of the figures above states the time period. Is it over a year, or many years? No clue.
Also note the large range between the two figures.
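
To make that range concrete, here is a small sketch using only the two figures quoted above. The exchange rate it prints is simply the one implied by my own rounding of "1 billion pesos" to "half a million USD", not an official rate, and the variable and function names are made up for illustration.

```python
# Figures quoted in this post; everything else is derived from them.
PESOS_REQUESTED = 1_000_000_000   # "1 billion pesos" added to the budget
LOW_ESTIMATE_USD = 500_000        # "~half a million USD" (minimum guess)
HIGH_ESTIMATE_USD = 20_000_000    # amount reportedly promised earlier this year


def implied_pesos_per_usd(pesos: float, usd: float) -> float:
    """Exchange rate implied by a peso amount and its rough USD equivalent."""
    return pesos / usd


def range_ratio(low: float, high: float) -> float:
    """How many times larger the high estimate is than the low one."""
    return high / low


if __name__ == "__main__":
    rate = implied_pesos_per_usd(PESOS_REQUESTED, LOW_ESTIMATE_USD)
    ratio = range_ratio(LOW_ESTIMATE_USD, HIGH_ESTIMATE_USD)
    print(f"Implied rate: {rate:,.0f} pesos per USD")
    print(f"High estimate is {ratio:.0f}x the low estimate")
```

A 40-fold gap between the two guesses is exactly why a published, official figure would help the discussion.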

Facts and numbers can help bring neutrality to the conflict.

To me as an outsider, there appears to be strong popular support for the farmers in the country, though it is all based on sentiment. It is an "us the people" versus "them the government" situation.

Very few facts and figures are being discussed. Had the government analysed the costs of the agricultural financial help being asked for, and let the people know the consequences and the financial size of the negotiation talks, people would then have an idea of the degree to which their government can afford to help the farmers. These farmers are indeed quite poor, so images of their leathery and sunken faces shrouded in earthy ponchos arouse a lot of sympathy from the media and social media.

Without concrete numbers, people are acting solely on emotions. Instead of posing the question, "can we as a nation help the farmers, and how much", it has become a finger-pointing exercise, with the people accusing the government of having forgotten about the country's farmers. The government has next to no sympathy from the people. Aggressive riots continue and are met by hordes of police in full riot gear. Bloody conflicts ensue day after day. People are getting angrier. There is only black and white, us and them in the story. No neutrality.

Is the government missing a great opportunity to introduce changes to its tax system?

In my opinion, this is a great opportunity for the government to make small, stepwise changes to its tax system. That is, IF they can outline the costs involved in the financial help to the farmers, so that the people can understand the consequences. Since there is strong popular support for the farmers, and assuming the government runs the country with money from taxation of the people it serves, the government should be asking the Colombian people to fund the farmers' agricultural activities through taxation.

The country's economy is growing a lot, despite a biased international image of a dangerous drug land. Its city people have a relatively high standard of living compared to those of its neighbours in South America. As tourists, we feel its prices and infrastructure are comparable to those of countries like Chile and Argentina. As the country grows more, its tax system is going to need reforms to fund all the nation's spending, as taxation is relatively low right now by Western standards. I cannot think of a better opportunity to introduce such changes. That said, there is a lot I don't know about the country!

Finally, where are the road blocks? Can I get around?

Read this list, updated daily.

Nope, there is no map.

I'm thankful for the information provided by the helpful service #767. At a glance though, without knowing the country's cities and towns super well, I struggle to know whether there are any open roads towards my destination.

Neither do I have a visual context of the scale of the road blocks.

As we sit here in Bogota, Colombia, totally stuck and unable to leave the capital due to road blocks of burning rubber tires, rocks and fallen tree stumps set up by protesting farmers, we are selfishly annoyed by the disturbance to our year-long South American trip in a mini, self-made casa rodante (house on wheels). We don't want to risk driving through these road blocks, as our friends on the road had firsthand experience going through them. Although they ultimately pleaded their way through the blocks safely, they did also report incidents of rock throwing, tire puncturing, window breaking, etc., done to other cars. Some Colombians say the farmers won't do anything to us and our car, since we are foreigners, but tense conflicts do not always leave room for reason. Therefore, we're waiting it out on our friend's couch.


Special thanks to Angela, William, Julian and Sergio for generously accommodating us in and around Bogota. Colombians are such kind and generous people, helping us completely out of the blue, and in this case, accommodating us after simply having met us in hotels or on the road. Thank you. I wish for peace in your country.

Tuesday, August 27, 2013

More MOOCs on Analytics - Coursera

A horde of analytics-related Massive Open Online Courses (MOOCs) are about to start in September. Take your pick of what to learn. Having taken a few Coursera courses now, I would recommend: 1) Don't take too many courses at once, however tempting it is to sign up to all of them, unless you have no other work or projects on the go. This is just to make sure you have a reasonable load and are able to devote enough of your attention to learning the material properly. 2) Make good use of the discussion forums, as they are both a good source of clarifications and a window into other people's perspectives on the material. 3) Do the exercises, programming assignments and quizzes to ensure your understanding of the material.

Linear and Integer Programming
Starts 2 Sept 2013, 9 weeks, 5-7 hours/week
(the basics of mathematical optimisation, a core toolkit in the field of Operations Research)

Statistics One
Starts 22 Sept 2013, 12 weeks, 5-8 hours/week

Introduction to Recommender Systems
Starts 3 Sept 2013, 14 weeks, 4-10 hours/week

Computing for Data Analysis
Starts 23 Sept 2013, 4 weeks, 3-5 hours/week
As I've written before here.

Web Intelligence and Big Data
Starts 26 Aug 2013, 12 weeks, 3-4 hours/week

Thinking Again: How to Reason and Argue
Starts 26 Aug 2013, 12 weeks, 5-6 hours/week
Perhaps a bit off topic, but perhaps not, since all analytics is more or less rooted in proving or disproving arguments, so we had better learn how to do it well.

Related articles:
Coursera and the Analytics Talent Gap
Starting up in Operational Research: What Programming Languages Should I Learn?

Wednesday, August 14, 2013

Everybody likes to predict, but nobody likes being predictable, nor told what to do

The Netflix algorithm is in the news again.
The Science Behind the Netflix Algorithms That Decide What You’ll Watch Next

Netflix finds rating predictions are no longer as important, trumped by current viewing behaviour, i.e. what you are watching now. However, browse through the comments and, once again, you will see a generally negative reaction. Some people really hate being told what to watch, even if it's just a recommendation. Others say Netflix sucks, because it recommends things they've watched elsewhere. That sounds like a lack of understanding: if you don't tell Netflix you've watched something already, then how could it know?

As "big data" gets more media attention, it is reaching a wider audience who don't yet understand how algorithms work, but only know there are algorithms everywhere in their life, and it's scary to them. The lack of understanding seems to create fear and resentment.

LinkedIn's and Facebook's recommendation systems for helping people find colleagues or friends they may know are generally well received, yet these film recommendation systems aren't. The difference between them might highlight the success criteria for rolling out such recommendation systems.

Tuesday, August 13, 2013

Machine Learning in Movie Script Analysis Rouses Angry Reactions

An application of Machine Learning has been in the news lately: movie script analysis.
Solving Equation of a Hit Film Script, With Data

They "compare the story structure and genre of a draft script with those of released movies, looking for clues to box-office success". However, the comments reveal that the general population (or at least the commenters) dislikes the concept, fearing that it stifles creativity.

Comments like these sum up the overall sentiment:
"Using old data to presage a current idea is both terrible and foolish. It is to writing what Denny's is to fine dining - mediocrity run wild."   
"Data crunchers will take the art out of everything. Paint-by-numbers."  

You be the judge whether this is a good application or not.

I tend to lean towards answers like this one from the comments (sadly this was only 1 of 2 positive comments at the time of my reading; the other one was from the CEO of the script analysis business):
"I'm sure people have all sots of assumptions about what audiences like already. This data could be a tool to look deeper into these assumptions. Film makers have always wondered about consumer taste. It is a business. When commerce and art mix, there are inevitable compromises. This tool helps people see possible preferences based on past behavior. Information should never frighten us. It is how this information is applied that most deserves our attention." 

I think it also never helps the image of such machine learning practitioners when the journalist paints them with an antagonistic brush, using details such as "chain-smoking" and "taking a chug of Diet Dr Pepper followed by a gulp of Diet Coke and a drag on a Camel". It reminded me somewhat of another writer's style when covering analytics.

Monday, August 12, 2013

Our labels: data scientists vs statisticians (or OR)

A perennial discussion of identities in the world of analytics is making the rounds on the blogs of statisticians. Or wait a second, what should we call them?
Data scientist is just a sexed up word for statistician

Data Scientists, Statisticians, Applied Mathematicians, Operational Researchers... just to name a few of the labels one might apply to oneself in the field of analytics. How shall we label ourselves? I can't agree more with Nate Silver,
"Just do good work and call yourself whatever you want."

Value chain trumps good design - ColaLife

Babies in Africa suffer and die from diarrhoea, but it's easily treatable with medicines that cost pennies. The problem is getting the medicine into the mothers' hands: a supply chain problem in a rural and sparsely populated area.

Here comes ColaLife: Turning profits into healthy babies.

Inventing medicine packaging that fits into the gaps between Coca-Cola bottles is ingenious, but understanding the value chain, so that all hands that touch the supply chain of the medicine have an incentive to ensure its stock and flow, is even more important.

If there is only one message to take away, I would choose:
"What's in it for me?" 
Always ask this to make sure there is a hard incentive for all players to participate. Free give-aways are often not valued, resulting in poorly managed resources and a relatively low success rate. Ample training and advertising for awareness and effective usage are also key for product / technology adoption.

Saturday, August 3, 2013

The Slightly Rosier Side of Gambling Analytics

Having posted about the ugly side of analytics (casino loyalty programmes), I was drawn to an article on the Guardian's DataBlog about a rosier side of gambling analytics, where a UK technology firm uses machine learning to combat gambling addiction.

Of course, a business is still a business. It needs to be profitable, so there are reasons beyond just "let's be good". I list below my take on the reasons for "them", the gambler clients, and the reasons for "us", the casinos. Note that I have simply assumed the machine learning study is sponsored by the casinos.

Just for "them":

Casinos too have a corporate social responsibility (CSR). Helping pathological gamblers, or identifying people before they become one, is a nice thing to do.

For "them" and for "us":

More for everyone! They get to play more, and we get to profit more. More people playing a little for longer is better than people playing a lot for a short time and then landing on self-exclusion lists. (I'm not sure which is the lesser evil of the two, though...)
That's the business case. It's not all soft and cuddly like the CSR. Well, ok, business cases almost never are.
"If you can help that player have long term sustainable activity, then over the long term that customer will be of more value to you than if they make a short term loss, decide they are out of control and withdraw completely"

Just for "us":

Minimising gambling problems helps keep the country's regulators off the companies' backs, so they don't have to relocate when the country's regulations tighten. Relocation = cost. A lot of it.
"And there's also brand reputation for the operator. No company wants to be named in a case study of extreme gambling addiction, to be named in relation to a problem gambler losing their house"

A side note: This reaffirmed why I don't gamble. It's a lose-win situation.

"A lot of casino games operate around a return-to-player rate (RTP) whereby if the customer pays, say £100, the game would be set up to pay back an average of £90. Different games will have different RTPs, and there are a few schools of thought on whether certain rates have different impacts on somebody's likelihood of becoming addicted. Some believe that if you lose really quickly, you'll be out of funds very quickly and will leave, and that a higher RTP will keep people on site, but others disagree"
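
The arithmetic behind the RTP in that quote can be sketched in a few lines. This is only my own back-of-the-envelope illustration, not anything from the article: the function names are made up, and it looks purely at averages, ignoring the variance that makes real sessions much shorter or longer.

```python
def expected_loss(stake: float, rtp: float) -> float:
    """Average amount the player loses on a stake, given the return-to-player rate."""
    if not 0.0 <= rtp <= 1.0:
        raise ValueError("rtp must be between 0 and 1")
    return stake * (1.0 - rtp)


def expected_rounds_until_broke(bankroll: float, bet: float, rtp: float) -> float:
    """Average number of equal bets before the bankroll runs out.

    Simplifying assumption: every round loses exactly the average
    amount, so randomness (lucky and unlucky streaks) is ignored.
    """
    loss_per_round = expected_loss(bet, rtp)
    if loss_per_round == 0:
        raise ValueError("with an rtp of 1 the bankroll never runs out on average")
    return bankroll / loss_per_round


if __name__ == "__main__":
    # The article's example: pay in 100, get back 90 on average.
    print(f"Average loss on a 100 stake at 90% RTP: {expected_loss(100, 0.90):.2f}")
    # Same bankroll and bet size, two different RTPs.
    for rtp in (0.90, 0.95):
        rounds = expected_rounds_until_broke(100, 1, rtp)
        print(f"RTP {rtp:.0%}: about {rounds:.0f} one-unit bets from a bankroll of 100")
```

On these averages, moving the RTP from 90% to 95% roughly doubles how long the same bankroll lasts, which is the "higher RTP keeps people on site" school of thought in numbers.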

I highly recommend reading the full article on the DataBlog.

Thursday, August 1, 2013

The Ugly Side of Analytics - Casino Customer Loyalty

While I was listening to This American Life's episode "Blackjack", Act 2 had me saying out loud in the car, "oh no, they did not!" The "they" is the Caesars Entertainment Corporation (the casino), and yes, they have a customer loyalty programme that they use to "attract more customers", and claim it's no different than other such programmes in industries like supermarkets, hotels, airlines or dry cleaners.

Well...there is a wee bit of difference.

No one is addicted to dry cleaning.

I am saddened that analytics is used to help the casino loyalty programme and hurt the pathological gamblers. The show indicates that the programme identifies "high value customers" using loyalty cards, tracking all spend and results, and then offers them the "right" rewards to keep them coming back. Most addicted gamblers are "high value customers". The bigger the loser, the bigger the reward. Rewards range from drinks and meals, hotel suites and trips to casinos (if you don't live nearby) to gifts like handbags and diamonds.

Analytics and Operational Research are supposed to be the Science of Better.

I'd like to call on all professionals in the analytics field to reflect on the moral goodness, or lack thereof, in your work.

There is still hope though. If casinos can use analytics to identify problem gamblers, then others can too. Given that pathological gambling is a mental health issue, is it time for NGOs or governments to catch up with technology and get their hands on that loyalty card data?