
Sunday, February 27, 2011

Faking It On Your Wedding Day

Earlier this month we wrote about our love of podcasts and just last week I was listening to Japan: A Friend In Need from the BBC Documentaries Archive. Here I was in the month of love, listening to a podcast on the subject and I found math in an unexpected place.

The documentary is about an agency in Japan that supplies fake people, or actors I suppose. In particular, this agency will supply people to fill out your side of a wedding. In the given example, we met a young man whose parents were deceased and whose siblings were estranged, leaving him only two friends to attend his wedding. So as to keep up appearances, and unbeknownst to the bride, he hired parents, friends and relatives. All told, 30 people at his wedding were fake, costing him something like £3,000, equal to his recent redundancy compensation.

The agency claims never to have been caught, and they say that they "research their assignments assiduously", but it got me wondering just how long you could operate such a service without getting caught. How many weddings could you do before a repeat guest noticed that they had seen one of your actors at a wedding before?

The first wedding is simple, and guaranteed to go off without a hitch, but what about the second? Suppose every wedding has on average 30 guests from each family. At the second wedding, none of the 30 real guests can have been among the 30 at the previous wedding. Still pretty easy in a country of 127 million. But what about the 30th wedding, when there are 900 previous guests out there in the population? Things are still looking pretty good, but the probabilities are starting to pile up, much like the birthday paradox: in a group of just 23 people there's a 50% chance that two will share a birthday.
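That birthday figure checks out numerically; here's a quick sketch, assuming 365 equally likely birthdays and ignoring leap years:

```python
from math import prod

def birthday_collision(n: int) -> float:
    """Chance that at least two of n people share a birthday."""
    # Complement of the chance that all n birthdays are distinct.
    return 1 - prod((365 - k) / 365 for k in range(n))

print(f"{birthday_collision(23):.4f}")  # 0.5073, just past 50%
```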

So given a constant wedding size of 60, 30 real and 30 fake, what is the probability that this is the wedding that breaks us? This is the same as the probability that one or more of today's guests attended a previous wedding. This is the same as one minus the probability that none of today's guests attended a previous wedding. For wedding n and a population p:
Assuming 127 million people in Japan...
  • For wedding 1, it's a sure bet as nobody has attended a previous wedding.
  • For wedding 2, we face only a 0.0011% chance of getting caught.
  • Even for wedding 100 our risk is only a 0.11% chance. No problem!
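The formula itself didn't survive in the text, but a plausible reconstruction, in which wedding n breaks us when any of its 30 real guests was among the 30(n-1) real guests of earlier weddings, is:

```latex
q_n = 1 - \left( \frac{p - 30(n-1)}{p} \right)^{30}
```

One caveat: the percentages listed are reproduced almost exactly if p is taken as roughly 80 million (perhaps Japan's adult population) rather than the full 127 million, so treat the population value as an assumption.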
But wait, the above probabilities are conditional probabilities. Our chance of getting caught at wedding 100, given that we made it through wedding 99, is 0.11%. What is our chance of getting to wedding 99? This is the probability that we didn't get caught at any of the previous weddings, the probability of a perfect record. Mathematically, our chance of getting to and past wedding n is:
  • For wedding 1, it's a sure bet.
  • For wedding 2, it's 99.99%.
  • For wedding 100, it's 94.58%.
  • For wedding 500, it's 24.57%.
Even though by the time we get to wedding 500 only 15,000 people in Japan have been to weddings with our staff, we would be lucky to have made it that far.
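The perfect-record probability can be written as the product of surviving every wedding along the way; a plausible reconstruction, in the same notation of population p:

```latex
S_n = \prod_{k=1}^{n} (1 - q_k),
\qquad
q_k = 1 - \left( \frac{p - 30(k-1)}{p} \right)^{30}
```

where q_k is the chance of being caught at wedding k given a clean record up to that point.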

If we started this agency today, on average how long could we expect to go before getting caught? I'm not going to bother expressing that mathematically, but hacking at it numerically with Excel, I can tell you that the answer comes to roughly 374: under these assumptions, we would expect on average to pull off 374 weddings before being found out.
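The Excel hack can be reproduced in a few lines of Python. One caveat: the post's percentages (and the figure of 374) line up with an effective population of about 80 million, perhaps adults only, rather than the quoted 127 million, so that value is assumed here:

```python
POP = 80_000_000    # assumed effective population (see note above)
GUESTS = 30         # real guests per wedding who could recognise an actor

expected = 0.0      # running expected number of the wedding we're caught at
survival = 1.0      # P(perfect record so far)
n = 1
while survival > 1e-12:
    exposed = GUESTS * (n - 1)                       # people who have seen our actors
    caught = 1 - ((POP - exposed) / POP) ** GUESTS   # P(caught at wedding n | reached it)
    expected += n * survival * caught                # wedding n is our last, with this probability
    survival *= 1 - caught
    n += 1

print(round(expected))  # close to the post's "roughly 374"
```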

So I think the moral of the story is, if you're looking to hire fake people for your wedding, you're doing alright, but if you're looking to run a business doing it, you might want to reconsider. Then again, if we're looking for morals in this story, honesty might come first.

Tuesday, July 13, 2010

What qualifies as a Simulation Model?

A theme that has run through my career since my Master's project is the question of measuring complexity in modelling and simulation. When can one claim to have built a simulation model, and when is one merely glorifying simple analysis?

In the Operations Research ecosystem the tendency is certainly to inflate. Salesmen, curriculum vitae authors, recruiters and consultancies across the spectrum are all motivated to embellish the work that is done. Like any scientific individual, I seek to slice through the static, inform myself as to who is doing extraordinary work, and build myself a framework from which I can safely criticize the inflations of others.

I have been working on a set of rules for separating "models" into models, calculations and simulations. I feel like there is a gaping opportunity here for contribution from complexity, chaos, and other disciplines in Computer Science and Mathematics, but here's what I've put together thus far:

Simulations are models, but not all models are simulations. Calculations are not models.

Models:
  1. A model is a simplified representation of a system.
  2. All models are wrong, but some models are useful.
Calculations:
  1. The result of a calculation can be expressed in a single equation using relatively basic mathematical notation.
  2. Where calculations contain a time element, values at different times can be determined in any order, without referring to previous values.
Simulations:
  1. A simulation is a calculation in which one parameter is the simulation clock, which increments regularly or irregularly.
  2. The outcome of a simulation could not have been determined without the use of the clock.
  3. While an initial state is typically defined, an intermediate state at a given time should be difficult or impossible to determine without having run the simulation to that point.
  4. Almost any model that involves repeated samples of random numbers should be classified as a simulation.
Consider the following progression of "models" that output an expected total savings:
  1. Inputs: Expected total savings.
  2. Inputs: Annual savings by year, time-frame of analysis.
  3. Inputs: Annual savings per truck per year, number of trucks by year, time-frame of analysis.
  4. Inputs: Annual savings per truck per year, current number of customers, number of trucks per customer, annual increase in customers, time-frame of analysis.
  5. Inputs: Annual savings per truck per year, current number of customers by geographical location, annual increase in customers by geographical location, routing algorithm to determine necessary trucks, time-frame of analysis.
  6. Inputs: Annual savings per truck per year, current number of customers by geographical location, distribution of possible growth in customers by geographical location, routing algorithm to determine necessary trucks, time-frame of analysis.
As you can see, complexity builds and eventually passes a threshold where we would accept it as a model. "Model" 4 is still little more than a back-of-the-envelope calculation, but Model 5 takes a quantum leap in complexity with the introduction of the routing algorithm. Model 5, however, I would still not classify as a simulation, because any year could be calculated without having calculated the others. Finally, Model 6 introduces a stochastic variable (randomness) that compounds from one year to the next and brings us to a proper simulation.
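The jump from Model 5 to Model 6 can be sketched in a few lines. All names and numbers here are invented for illustration, and a constant stands in for the routing algorithm. The deterministic version is a pure function of the year, while compounding random growth forces each year to be computed from the one before, i.e. it needs a simulation clock:

```python
import random

SAVINGS_PER_TRUCK = 1_000.0   # hypothetical annual savings per truck
BASE_CUSTOMERS = 200          # hypothetical starting customer count
TRUCKS_PER_CUSTOMER = 0.1     # stand-in for the routing algorithm

def savings_model5(year: int) -> float:
    # Model 5: deterministic 5% growth. Any year can be computed
    # directly, without computing the years before it.
    customers = BASE_CUSTOMERS * 1.05 ** year
    return customers * TRUCKS_PER_CUSTOMER * SAVINGS_PER_TRUCK

def savings_model6(years: int, rng: random.Random) -> list[float]:
    # Model 6: growth is random and compounds, so year n's state
    # depends on year n-1. We must step a clock through every year.
    customers = float(BASE_CUSTOMERS)
    yearly = []
    for _ in range(years):
        customers *= rng.uniform(1.00, 1.10)   # stochastic annual growth
        yearly.append(customers * TRUCKS_PER_CUSTOMER * SAVINGS_PER_TRUCK)
    return yearly

print(savings_model5(10))                          # same answer every call
print(sum(savings_model6(10, random.Random(1))))   # changes with the seed
```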

I've seen calculations masquerading as simulation models at a Fortune 500 company, both internally and externally produced. Either way the result is the same: outcomes determined from data, with validity asserted by the author. I know that Operational Research practitioners reading this will appreciate my desire to classify; at the very least it will help us separate what the MBAs do with spreadsheets from our own work.

I welcome input from others on this topic, as I am only just developing my own theories.