
Wednesday, September 15, 2010

Restaurant Systems Dynamics - Influence Diagrams

Systems Dynamics is a discipline that floats about in the management science/management consulting ecosystem. It is genetically related to Systems Thinking, though Systems Thinking covers much more ground and includes no simulation component. The two most important aspects of Systems Dynamics are influence (causal) diagrams and continuous simulation. Today I would like to outline an example of using influence diagrams to study a simple system, gain strategic insight, and form the basis of a stock and flow continuous simulation.

I was in Paris the other weekend, looking for a restaurant for Sunday lunch. Finding a good restaurant as a tourist is always difficult because tourist restaurants just aren't very good. The restaurants in my neighbourhood in London rely heavily on repeat business and referrals from friends, engaging in a repeated interaction with their customers. The restaurants in touristy areas, on the other hand, get the majority of their business based on location. My local restaurant wants to deliver value for money so that I or my friends will come again. The restaurant in Venice never expects to see me again and is motivated to give me the lowest value for money to maximize profit. We have an example here of repeated and non-repeated games, but this is not an article about game theory.

As regular travellers, we have a strategy for finding the right place. There are a number of aspects to that strategy, but the one I want to highlight today is: find busy restaurants. We are by no means the only people employing this strategy; busyness is, after all, a reasonable indication of quality.

Where is this all going? I'm telling this story because I want to use an influence diagram to study restaurants in general and touristy restaurants in particular, and to gain strategic insight from doing so. Influence diagrams are used to study the interactions in a system, particularly those between key strategic resources. In the case of our restaurants these will be:
  • Customers occupying tables
  • Customers queuing for tables
  • Perceived restaurant quality
  • Available customers


Figure 1. Simple Tourist Restaurant Influence Diagram

The make-up of an influence diagram is relatively simple:
  • Strategic resources, flows or other system variables
  • Arrows indicating one influencing another
  • An indication of a positive influence or negative influence
  • Optionally, indications of reinforcing and balancing loops

Consider Figure 1 above, the influences shown are as follows:
  • As the number of "New Customers Arriving" increases, the number of "Customers Occupying Tables" increases
  • As the number of "Customers Occupying Tables" increases, the "Perceived Restaurant Quality" increases
  • As the "Perceived Restaurant Quality" increases, the "New Customers Arriving" increases
  • As the number of "Customers Occupying Tables" increases, the "Length of Queue for Seating" increases
  • As the "Length of Queue for Seating" increases people will be discouraged and it will reduce the number of "New Customers Arriving"
  • As the number of "New Customers Arriving" increases, the number of "Available Customers" decreases
  • As the number of "Available Customers" decreases, the number of "New Customers Arriving" decreases

Reinforcing loops can be exploited to achieve exponential growth and profit, but they can also cause exponential collapse and bankruptcy. Balancing loops are often tied to limited resources that cap what we can achieve, but they also serve to mitigate damage.

Loop B1 is a balancing loop: As more customers choose to enter our restaurant, the total number of potential customers is diminished, thus reducing the flow of new customers. This puts a natural limit on our business, the number of potential customers.

Loop B2 is a balancing loop: As more customers arrive, our tables experience a higher and higher occupancy and customers must wait in a queue either for other customers to leave or for dirty tables to be turned over. Here is another resource constraint on our system: capacity.

Loop R1 is a reinforcing loop: more customers lead to an increased perception of quality, which then leads to more customers. This is the key reinforcing loop and the one we should study further.
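To make the loop classification concrete, here is a minimal sketch in Python (my own shorthand names, not part of any standard tool) that records each influence from Figure 1 with its sign and classifies a loop by multiplying the signs around it: no negative links, or an even number of them, gives a reinforcing loop, while an odd number gives a balancing loop.

# Signed influences from Figure 1: (source, target) -> +1 or -1
influences = {
    ("new_customers", "occupied_tables"): +1,
    ("occupied_tables", "perceived_quality"): +1,
    ("perceived_quality", "new_customers"): +1,
    ("occupied_tables", "queue_length"): +1,
    ("queue_length", "new_customers"): -1,
    ("new_customers", "available_customers"): -1,
    ("available_customers", "new_customers"): +1,
}

def classify(loop):
    """Multiply the link signs around a loop: positive means reinforcing, negative means balancing."""
    sign = 1
    for src, dst in zip(loop, loop[1:] + loop[:1]):
        sign *= influences[(src, dst)]
    return "reinforcing" if sign > 0 else "balancing"

loops = {
    "B1": ["new_customers", "available_customers"],
    "B2": ["new_customers", "occupied_tables", "queue_length"],
    "R1": ["new_customers", "occupied_tables", "perceived_quality"],
}

for name, members in loops.items():
    print(name, classify(members))    # B1 balancing, B2 balancing, R1 reinforcing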

The key strategic conclusion that can be drawn from studying this influence diagram comes out of loop R1, the reinforcing loop. The consequence of this loop is that full restaurants tend to stay full and empty restaurants tend to stay empty. Given that each restaurant starts empty each day, the key challenge appears to be in first becoming not empty. Easier said than done.

Restaurants and bars have a number of ways of achieving this. The first, but least interesting, is simply good quality. A regular customer base or recommendations in guide books will provide the seed customers from which a full house can grow. Alternatively, we need some other means of getting people in the door. This makes me think of my time in Turkey on the Mediterranean coast. As I walked along the waterfront in a tourist town, a restaurant owner offered me a half-priced beer as long as I would sit along the front edge of his balcony. If this makes you think of happy hour, there's probably a good reason.

I will admit that the "strategic insights" discussed above with respect to the restaurant industry are not earth moving, profound, or even unexpected. However, this article provides a simple real-world example of a dynamic system, and demonstrates the concept nicely. Had we not already known that full restaurants stay full and empty restaurants stay empty, going through this exercise could have revealed that to us.

The next step would be to design a simulation based on the influence diagram, something that I will endeavour to do in a future article.
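As a rough preview of that next step, here is a deliberately crude stock and flow sketch in Python. The stocks mirror the resources in Figure 1, but every parameter value and functional form is an assumption of mine for illustration rather than a calibrated model.

# Parameters (all values and rates below are assumptions for illustration)
CAPACITY = 40            # seats in the restaurant
MEAL_TIME = 60.0         # average minutes a customer occupies a table
BASE_ARRIVALS = 1.0      # walk-ups per minute regardless of how busy we look
QUALITY_PULL = 3.0       # extra walk-ups per minute when the room looks completely full
QUEUE_TOLERANCE = 5.0    # queue length at which half of would-be customers balk

available = 500.0        # stock: potential customers nearby (the B1 limit)
occupied = 0.0           # stock: customers occupying tables
queue = 0.0              # stock: customers queuing for tables

for minute in range(8 * 60):                        # one service day in 1-minute Euler steps
    quality = occupied / CAPACITY                   # R1: a busy room raises perceived quality
    balk = 1.0 / (1.0 + queue / QUEUE_TOLERANCE)    # B2: a long queue discourages arrivals
    arrivals = min(available, (BASE_ARRIVALS + QUALITY_PULL * quality) * balk)
    departures = occupied / MEAL_TIME               # diners finishing and freeing tables
    seatings = min(queue + arrivals, CAPACITY - occupied + departures)

    available -= arrivals                           # B1: the pool of potential customers drains
    queue += arrivals - seatings
    occupied += seatings - departures

    if minute % 120 == 0:
        print(f"t={minute:3d} min  occupied={occupied:5.1f}  queue={queue:4.1f}  available={available:6.1f}")

Run over a simulated day, the reinforcing loop slowly fills the room while the queue and the limited pool of nearby customers push back, which is exactly the tension the diagram describes.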

Tuesday, July 13, 2010

What qualifies as a Simulation Model?

A theme that has run through my career since my Master's project is the question of measuring complexity in modelling and simulation. When can one claim to have built a simulation model, and when is one merely glorifying simple analysis?

In the Operations Research ecosystem the tendency is certainly to inflate. Salesmen, curriculum vitae authors, recruiters and consultancies across the spectrum are all motivated to embellish the work that they do and the work that is done. Like any scientifically minded individual, I seek to cut through the static, inform myself as to who is doing extraordinary work, and build myself a framework from which I can safely criticize the inflations of others.

I have been working on a set of rules for separating "models" into models, calculations and simulations. I feel like there is a gaping opportunity here for contribution from complexity, chaos, and other disciplines in Computer Science and Mathematics, but here's what I've put together thus far:

Simulations are models, but not all models are simulations. Calculations are not models.

Models
  1. A model is a simplified representation of a system.
  2. All models are wrong, but some models are useful.
Calculations
  1. The result of a calculation can be expressed in a single equation using relatively basic mathematical notation.
  2. Where calculations contain a time element, values at different times can be determined in any order without referring to previous values.
Simulations
  1. A simulation is a calculation in which one parameter is the simulation clock that increments regularly or irregularly.
  2. The outcome of a simulation could not have been determined without the use of the clock.
  3. While an initial state is typically defined, an intermediate state at a given time should be difficult or impossible to determine without having run the simulation to that point.
  4. Almost any model that involves repeated samples of random numbers should be classified as a simulation.
Consider the following progression of "models" that output an expected total savings:
  1. Inputs: Expected total savings.
  2. Inputs: Annual savings by year, time-frame of analysis.
  3. Inputs: Annual savings per truck per year, number of trucks by year, time-frame of analysis.
  4. Inputs: Annual savings per truck per year, current number of customers, number of trucks per customer, annual increase in customers, time-frame of analysis
  5. Inputs: Annual savings per truck per year, current number of customers by geographical location, annual increase in customers by geographical location, routing algorithm to determine necessary trucks, time-frame of analysis.
  6. Inputs: Annual savings per truck per year, current number of customers by geographical location, distribution of possible growth in customers by geographical location, routing algorithm to determine necessary trucks, time-frame of analysis.
As you can see, complexity builds and eventually passes a threshold where we would accept it as a model. "Model" 4 is still little more than a back-of-the-envelope calculation, but Model 5 takes a quantum leap in complexity with the introduction of the routing algorithm. I would still not classify Model 5 as a simulation, however, because any year could be calculated without having calculated the others. Finally, Model 6 introduces a stochastic variable (randomness) that compounds from one year to the next and brings us to a proper simulation.
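The jump from Model 5 to Model 6 is easiest to see in code. In the sketch below every number is invented and the routing algorithm is reduced to a stub; the point is structural. The Model 5 version can compute any year's savings directly from the year number, whereas in the Model 6 version the customer count in one year depends on the random draw made in the year before, so the clock must step through every year and many runs are needed to describe the outcome.

import random

SAVINGS_PER_TRUCK = 1000.0     # illustrative annual savings per truck
CUSTOMERS_NOW = 200            # illustrative current customer count
TRUCKS_PER_CUSTOMER = 0.05     # stand-in ratio for the routing algorithm
YEARS = 10

def trucks_needed(customers):
    """Stub for the routing algorithm of Models 5 and 6."""
    return customers * TRUCKS_PER_CUSTOMER

def savings_model5(year, growth=0.08):
    """Model 5 style: deterministic growth, so any year stands on its own."""
    customers = CUSTOMERS_NOW * (1 + growth) ** year
    return trucks_needed(customers) * SAVINGS_PER_TRUCK

def savings_model6(rng):
    """Model 6 style: random growth compounds, so the clock must step through every year."""
    customers, total = CUSTOMERS_NOW, 0.0
    for _ in range(YEARS):
        customers *= 1 + rng.gauss(0.08, 0.05)   # this year's growth builds on last year's state
        total += trucks_needed(customers) * SAVINGS_PER_TRUCK
    return total

rng = random.Random(42)
runs = [savings_model6(rng) for _ in range(1000)]
print("Model 5 total savings:", round(sum(savings_model5(y) for y in range(1, YEARS + 1))))
print("Model 6 mean savings :", round(sum(runs) / len(runs)))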

I've seen calculations masquerading as simulation models at a Fortune 500 company, both internally and externally. While the result is the same (outcomes determined from data, with validity asserted by the author), I know that Operational Research practitioners reading this will appreciate my desire to classify. At the very least it will help us separate what the MBAs do with spreadsheets from our own work.

I welcome input from others on this topic, as I am only just developing my own theories.

Thursday, May 13, 2010

Security Screening: Discrete Event Simulation with Arena

Simulation is a powerful tool in the hands of Operations Research practitioners. In this article I intend to demonstrate the use of a process-driven discrete event simulation, extending the bottleneck analysis I wrote about previously.

A few days ago I wrote an article demonstrating how you could use bottleneck analysis to compare two different configurations of the security screening process at London Gatwick Airport. Bottleneck analysis is a simple process analysis tool that sits in the toolbox of Operations Research practitioners. I showed that a resource-pooled, queue-merged process might screen as many as 20% more passengers per hour and that the poor as-is configuration was probably costing the system something like 10% of its potential capacity.

The previous article would be good to read before continuing, but to summarize briefly: security screening happens in two steps, a check of the passenger's boarding pass followed by the x-ray machines. Four people checking boarding passes and six teams working x-ray machines were organized into four sub-systems, each with one checker and either one or two x-ray teams. The imbalance within each sub-system forced a resource to be under-utilised, and Dawen quite rightly pointed out that by joining the system together as a whole, so that all six x-ray teams effectively serve a single queue fed by all four checkers, a more efficient result could be achieved. We will look at these two key scenarios, comparing the As-Is system with the What-If system.

The bottleneck analysis was able to quantify the capacity being lost to this inefficiency, but as I alluded to, that was not the entire story. Another big impact is on passenger experience, that is, the time spent waiting in queues in the system. In order to study queuing times, we turn to another Operations Research tool: simulation, specifically process-driven discrete event simulation. Note: there may be an opportunity to apply Queuing Theory, another Operations Research discipline, but we won't be doing that here today.

Discrete Event Simulation

Discrete Event Simulation (DES) is a computer simulation paradigm in which a model is made of the real-world process and the key focus is the entities (passengers) and resources (boarding pass checkers and x-ray teams) in the system. The focus is on discrete, indivisible things like people and machines. "Event" refers to the driving mechanism of the model: a list of events processed in chronological order, with each event typically spawning new events to be scheduled. An alternative driving mechanism is fixed timesteps, as in system dynamics and other continuous simulations. Using a DES model allows you to go beyond the simple mathematics of bottleneck analysis. By explicitly tracking individual passengers as they move through the process, important statistics can be collected, such as utilisation rates and waiting times.
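For readers who have never looked inside a DES engine, the sketch below shows that mechanism in plain Python, with no Arena involved and all numbers chosen for illustration rather than taken from the Gatwick model. A future-event list is processed in chronological order, each event schedules the next, and the model tracks a single pooled queue feeding the x-ray teams while recording waiting times.

import heapq, itertools, random

rng = random.Random(1)
seq = itertools.count()        # tie-breaker so the heap never compares event labels
N_TEAMS = 6                    # pooled x-ray teams (what-if layout; illustrative)
ARRIVAL_RATE = 5.4             # passengers per minute, set a little below x-ray capacity
SERVICE_RATE = 1.0             # passengers per minute per x-ray team
HORIZON = 16 * 60              # one 16-hour working day, in minutes

events = [(rng.expovariate(ARRIVAL_RATE), next(seq), "arrival")]   # the future-event list
free_teams, queue, waits = N_TEAMS, [], []

while events:
    time, _, kind = heapq.heappop(events)        # always process the earliest pending event
    if time > HORIZON:
        break
    if kind == "arrival":
        queue.append(time)
        # each arrival schedules the next one: events spawn new events
        heapq.heappush(events, (time + rng.expovariate(ARRIVAL_RATE), next(seq), "arrival"))
    else:                                        # "departure": an x-ray team frees up
        free_teams += 1
    while free_teams and queue:                  # start screening whoever is next in line
        waits.append(time - queue.pop(0))
        free_teams -= 1
        heapq.heappush(events, (time + rng.expovariate(SERVICE_RATE), next(seq), "departure"))

print(f"screened {len(waits)} passengers, mean queue time {sum(waits) / len(waits):.2f} minutes")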

During my master's degree, the simulation tool at the heart of our simulation courses was Arena from Rockwell Automation, so I tend to reach for it without even thinking. I have previously used Arena in my work for Vancouver Coastal Health, simulating ultrasound departments, and there are plenty of others associated with the Sauder School of Business using Arena. Arena is an excellent tool and I've used it here for this article. I hope to test other products on this same problem in the future and publish a comparison.

In the Arena GUI you put logical blocks together to build the simulation in much the same way that you might build a process map. At a high level an Arena simulation reads intuitively, like a process map, while in actuality the blocks are building SIMAN code that does the heavy lifting for you.

The Simulation

Here's a snapshot of the as-is model of the Gatwick screening process that I built for this article:


Passengers decide to go through screening on the left, select the boarding pass checker with the shortest queue, are checked, proceed to the dedicated x-ray team(s) and eventually all end up in the departures hall.

An x-ray team is assumed to take a minute on average to screen each passenger. This is very different from taking exactly a minute to screen each passenger. Stochastic (random) processing times are an important source of dynamic complexity in queuing systems, and without modelling that randomness you can draw totally wrong conclusions. For our purposes we have assumed an exponentially distributed processing time with a mean of 1 minute. In practice we would grab our stop-watches and collect the data, but as outsiders we would probably get arrested for doing so. Suffice it to say that this is a very reasonable assumption and that exponential distributions are often used to express service times.
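To give a feel for what an exponentially distributed, mean-one-minute screening time implies, the snippet below (my own illustration, separate from the Arena model) draws a large sample: the standard deviation is as large as the mean, a sizeable share of passengers clear in under 30 seconds, and a non-trivial share take more than two minutes.

import random, statistics

rng = random.Random(7)
samples = [rng.expovariate(1.0) for _ in range(100_000)]   # service times with a mean of 1 minute

print(f"mean service time : {statistics.mean(samples):.3f} minutes")
print(f"standard deviation: {statistics.stdev(samples):.3f} minutes")
print(f"share under 30 s  : {sum(s < 0.5 for s in samples) / len(samples):.1%}")
print(f"share over 2 min  : {sum(s > 2.0 for s in samples) / len(samples):.1%}")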

As in the previous article, we were uncertain as to the relationship between throughput of boarding pass checkers and throughput of x-ray teams. We will consider three possibilities where processing time for the boarding pass checker is exponentially distributed with an average of: 60 seconds (S-slow), 40 seconds (M-medium), 30 seconds (F-fast) (These are alpha = 1, 1.5 and 2 from the previous article). In the fast F scenario, our bottleneck analysis says there should be no increased throughput What-If vs. As-Is because all x-ray machines are fully utilised in the As-Is system. In the slow S scenario there would similarly be no throughput benefit because all boarding pass checkers would be fully utilised in the As-Is system. Thus the medium M scenario is our focus, but our analysis may reveal some interesting results for F and S.
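The reasoning behind those statements can be reproduced in a few lines. Assuming the as-is layout from the previous article, two sub-systems with one x-ray team and two with two, each as-is sub-system is limited by the slower of its checker and its x-ray teams, while the what-if system is limited by the slower of the pooled checkers and the pooled x-ray teams:

# Bottleneck capacities in passengers per minute, reproducing the previous article's logic.
# Assumed as-is layout: two sub-systems with one x-ray team and two with two.
XRAY_RATE = 1.0                                   # passengers per minute per x-ray team
CHECKER_RATES = {"S": 1.0, "M": 1.5, "F": 2.0}    # 60 s, 40 s and 30 s per boarding pass check
SUBSYSTEMS = [1, 1, 2, 2]                         # x-ray teams behind each checker in the as-is layout

for name, checker_rate in CHECKER_RATES.items():
    as_is = sum(min(checker_rate, teams * XRAY_RATE) for teams in SUBSYSTEMS)
    what_if = min(4 * checker_rate, 6 * XRAY_RATE)
    print(f"{name}: as-is {as_is:.1f}/min, what-if {what_if:.1f}/min, ratio {as_is / what_if:.1%}")

This gives a ratio of 100% for S and F and 83.3% for M, which is why the medium scenario is where the throughput benefit lives.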

We're focused here on system resources and configuration and how they determine throughput, but we can't forget about passenger arrivals. The number of passengers actually requiring screening is the most significant limitation on the throughput of the system. I fed the system with six passengers per minute, the capacity of the x-ray teams. This ensured both that the x-ray teams had the potential to be 100% utilised and that they were never overwhelmed, which keeps the x-ray queuing times comparable between scenarios.

I ran 28 replications of the simulation (four weeks' worth) and let each replication run for 16 hours (a working day). We need to run the simulation many times because of the stochastic element: since the events are random, a different set of random outcomes will lead to a different result, so we must run many replications to study the range of possible results.

Also note that I implemented a rule in the as-is system: if more than 10 passengers were waiting for an x-ray team, the boarding pass checker feeding it would stop processing passengers.

Results

Scenario M - Throughput Statistics


First let's look at throughput. On average, over 16 hours the what-if system screened 18.9% more passengers than the as-is system. The statistics in the table are important. Stochastic simulations don't give a single, simple answer, but rather a range of possibilities described statistically. The average over the 4 weeks is given in the table, but we can't be certain that this would be the average over an entire year. The half-width tells us our 90% confidence range: the true average probably lies between one half-width below the reported average and one half-width above it.
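For anyone who wants to reproduce that kind of half-width outside of Arena, the standard approach is to treat each replication's result as a single observation and build a confidence interval from the mean and standard deviation across replications. Here is a sketch using hypothetical per-replication throughputs rather than the actual model output:

import math, statistics

# Hypothetical per-replication throughputs, one value per simulated 16-hour day (28 replications)
throughputs = [4712, 4698, 4755, 4731, 4680, 4760, 4725, 4744,
               4701, 4770, 4733, 4718, 4690, 4749, 4722, 4708,
               4736, 4695, 4758, 4727, 4684, 4751, 4715, 4740,
               4703, 4765, 4729, 4711]

n = len(throughputs)
mean = statistics.mean(throughputs)
std = statistics.stdev(throughputs)     # sample standard deviation across replications
t_90 = 1.703                            # two-sided 90% t value with n - 1 = 27 degrees of freedom
half_width = t_90 * std / math.sqrt(n)
print(f"mean {mean:.1f} passengers, 90% half-width {half_width:.1f}")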

Note: I would like to point out that this is almost exactly the result predicted analytically with the bottleneck analysis. We predicted that in this case the as-is system was running at 83.3% of capacity, and here the As-Is throughput is 4728.43/5621.57 = 84.1% of the What-If throughput. The small discrepancy is probably due to random variation and the warm-up period at the start of the simulation.

But what has happened to waiting times?


The above graph is a cumulative frequency graph. It reads as follows: The what-if value for 2 minutes is 0.29. This means that 29% of passengers wait less than 2 minutes. The as-is value for 5 minutes is 0.65. This means that 65% of passengers wait less than 5 minutes.

Comparing the two lines we can see that, while we have achieved higher throughput, customers will now have a higher waiting time. Management would have to consider this when making the change. Note that the waiting time increased because the load on the system also increased. What happens if we hold the load on the system constant? I adjusted the supply of passengers so that the throughput in both scenarios is the same, and re-ran the simulation:


Now we can see a huge difference! Not only does the new configuration outperform the old in terms of throughput, it is significantly better for customer waiting times.

What about our slow and fast scenarios? We know from our bottleneck analysis that throughput will not increase, but what will happen to waiting times?


Above is a comparison between as-is and what-if for the fast scenario. The boarding pass checkers are fast compared to the x-ray machines, so in both cases the x-ray machines are nearly overwhelmed and waiting times are long. Why do the curves cross? The passengers fortunate enough to pick a checker with two x-ray teams behind them experience better waiting times thanks to the pooling, while the others experience worse.

This is a bit subtle, but an interesting result. In this scenario there is no throughput benefit from changing, there is no average waiting time benefit from changing, but waiting times are less variable.


Finally, we can take a quick glance at our slow S scenario. We know again from our bottleneck analysis that there is no benefit to be had in terms of throughput, but what about waiting times? Clearly there is a huge difference. The slow checkers are able to provide plenty of customers for the single x-ray teams, but are unable to keep the double teams busy. If you're unlucky you end up in a queue for a single x-ray team, but if you're lucky you are served immediately by one of the double teams.

Summary

To an Operations Research practitioner with experience doing discrete event simulation, this example will seem a bit Mickey Mouse. However, it's an excellent and easily accessible demonstration of the benefits one can realize with this tool. A manager whose bottleneck analysis has determined that no large throughput increase could be achieved with a reconfiguration might change their mind after seeing this analysis. The second order benefits, improved customer waiting times, are substantial.

In order to build the model for this article in a professional setting you would probably require Arena Basic Edition Plus, as I used the advanced output-to-file feature, which is not available in Basic. Arena Basic goes for $1,895 USD. You could easily accomplish what we have done today with much cheaper products, but it is not simple examples like this that demonstrate the power of products like Arena.



Related articles:
OR not at work: Gatwick Airport security screening (an observation and process map of the inefficiency)
Security Screening: Bottleneck Analysis (a mathematical quantification of the inefficiency)