Tuesday, October 28, 2008

Approaches to Reporting Access to Diagnostic Imaging in Health Care

In Canada's publicly funded and administered health care system, the absence of market forces makes access to services a chief concern. Thus reporting, synthesizing and acting on data regarding access is critical. In the context of diagnostic imaging, an area I have recently had experience with, access is typically discussed in terms of waiting times or waiting lists. The issue of waiting times in imaging is, like so much in health care, a complex one: multiple exam types requiring varying specialty resources are performed on patients with a kaleidoscope of urgency levels. Typically the data exists at a patient-by-patient level; the challenge is how to aggregate it so that waiting times can be reported for the benefit of both the decision maker and the public. The detail-oriented operations research practitioner is tempted to over-deliver on the level of analysis when presenting these metrics, and we must seek to trim it back while still including the critical information that impacts what are ultimately life-and-death decisions. Below I hope to combine a survey of the current state of public information on CT (Computed Tomography) and MRI (Magnetic Resonance Imaging) waiting times in Canada with a discussion of the nuances of the metrics chosen.

Beginning with the worst example I saw in my research, we look at the Nova Scotia Department of Health website. Waiting times are reported by authority and by facility, important data for individuals seeking to balance transportation with access. However, it's how the wait times are measured that worries me most. Waiting time is defined as the number of calendar days from the day the request arrives to the next available day with three open appointments. I have found that this is the traditional manner in which department supervisors like to define waiting lists, but at a management level it's embarrassingly simplistic. At the time of writing, the wait time at Dartmouth General Hospital for a CT scan is 86 days. I guarantee you that not every patient is waiting 86 days for an appointment. Not even close. Neither is the average 86 days, nor is the median 86 days. The question of urgency requires that we discuss our level of access for varying urgencies. Additionally, three available appointments 86 days from now say nothing about what day my schedule and the hospital's schedule will actually allow for an appointment. If there's that much wrong with this measurement method, then why do they use it? The simple fact is that it is very easy to implement. In healthcare, where good data can be oh so lacking, this system of measuring "waiting lists" is cheap: no patient data is required, one simply calls up the area supervisor or a booking clerk and asks for the figure. So hats off to Nova Scotia for doing something rather than nothing, which indeed is better than some of the provinces, but there's much work to be done.
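A toy example (all numbers invented) shows how far the "first day with three open appointments" figure can sit from what patients actually experience:

```python
import statistics

# Toy booking calendar: slots_free[d] = open appointments d days from today.
# Invented numbers: the calendar is nearly full until day 86.
slots_free = [1] * 86 + [5] * 30

# Nova-Scotia-style metric: first day with at least 3 open appointments.
reported_wait = next(d for d, s in enumerate(slots_free) if s >= 3)

# What patients actually experienced (invented): urgent cases squeezed into
# the single daily openings, routine cases pushed out to the open block.
actual_waits = [2] * 30 + [14] * 40 + [86] * 10   # days waited, per patient

print("reported:", reported_wait)                    # 86
print("median:  ", statistics.median(actual_waits))  # 14
print("mean:    ", statistics.mean(actual_waits))
```

The reported "wait time" is 86 days even though most patients in this sketch waited two weeks or less.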

Next, we'll look at the Manitoba Health Wait Time Information website. Again we have data reported by health authority and facility. Here we see the "Estimated Maximum Wait Time", measured in weeks. The site says, "Diagnostic wait times are reported as estimated maximum wait times rather than averages or medians. In most cases patients typically wait much less than the reported wait time; very few patients may wait longer." If this is true, and it is, then this is pretty useless information, isn't it? Indeed, I am reconsidering my accusation of Nova Scotia being the worst of the lot. If this information represents something like the 90th or 95th percentile, then I apologize, because, as I discuss later, that is a decent figure to report. However, it is not explicitly described as such.

Heading west to Alberta, we visit the Alberta Waitlist Registry. Here we can essentially see the waiting time distribution of most patients scanned in MRI or CT across the province in the last 90 days. The site reports the "median" (50th) and "majority" (90th) percentiles of waiting time, and then reports the percentage of patients served within a range of intervals, from under 3 weeks to over 18 months. Two key elements are lacking in this data. For one, both inpatients and outpatients are included. This means that the patient waiting for months to get an MRI on their knee and the patient waiting for hours to get one on their head are treated as equals. Admitted patients and outpatients experience waiting times on different orders of magnitude and should not be considered together; the percentage of patients seen in less than 3 weeks must therefore include many inpatients and thus overstates the true level of service. The other missing element is the notion of priority. Once again, for an individual in the population looking for information about how long they might wait, or for a manager or politician looking to quantify the level-of-care consequences of current access levels, this data isn't very useful because it lacks priority. If urgent patients are being served at the median waiting time, this signals significant problems in the system, but without data reported by urgency we can only guess that this is being handled well. As someone who has seen it from the inside, I would NOT be confident that it is.

Now I return to what westerners would rather not admit is the heart of Canada, Ontario, and the Ontario Ministry of Health and Long-Term Care website. This site measures wait times as the time between request and completion. It reports the 90th percentile wait times in days, by facility and provincially, and calls it the "point at which 9 out of 10 patients have had their exam." The data excludes inpatients and urgent outpatients scanned the same day, addressing a critical issue I had with the Alberta data. Priorities are lacking, but with a little digging you can find the province's targets by priority, so there is, perhaps, hope. Reporting the 90th percentile seems like good practice to me. With the funky distributions we see when measuring waiting times, averages are certainly of no use. The median isn't of great interest either, because it is not an indication of what any one individual's experience will be. This leaves the 90th percentile, which expresses what might be called a "reasonable worst-case scenario".
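On synthetic data (invented numbers below), it is easy to see why a pooled median or mean hides what stratified 90th percentiles reveal:

```python
import random
import statistics

random.seed(1)

# Invented synthetic waits (days), with the long right tail imaging waits tend
# to have: a small urgent stream and a large routine stream.
waits = {
    "urgent":  [random.expovariate(1 / 2) for _ in range(200)],   # mean ~2 days
    "routine": [random.expovariate(1 / 40) for _ in range(800)],  # mean ~40 days
}

def p90(xs):
    """Simple 90th-percentile estimate."""
    xs = sorted(xs)
    return xs[int(0.9 * len(xs))]

pooled = waits["urgent"] + waits["routine"]
print("pooled   median/90th:", round(statistics.median(pooled)), round(p90(pooled)))
for prio, xs in waits.items():
    print(f"{prio:8s} median/90th:", round(statistics.median(xs)), round(p90(xs)))
```

The pooled figures sit nowhere near either group's experience; the stratified 90th percentiles are the "reasonable worst case" for each kind of patient.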

Finally I turn to the organization whose explicit business is communicating complex issues to the public, the Canadian Broadcasting Corporation. Their CBC News interactive map from November 2006 assigned letter grades from A to F, converted from the percentage of the population treated within benchmark. Who knows whether this glosses over the lack of priority data or whether it includes the percentage meeting the benchmark for each priority, but it's a start. The letter grades given were: BC N/A, AB D, SK N/A, MB F, ON C, QC N/A, NB N/A, NS N/A, PEI F, NF N/A. With over half the provinces not reporting, there wasn't much they could do.

So what have we learned from this survey? Well, we have certainly learned that the writer has a love of detail and is dissatisfied with each province that omits any. This is, as discussed in the introduction, natural for an operations research practitioner. If I were advising someone on the development of diagnostic access-to-care metrics, I would tell them this: (1) Focus on the patient experience. Averages and medians don't tell me what my experience might be; 90th percentiles do a much better job. (2) Focus on the context. Waiting times for an inpatient are in a different universe than those for an outpatient and should be treated as such. Waiting times of urgent vs. routine cases bear different significance and should be similarly separated.

Monday, October 20, 2008

Healthcare and OR in Canada: selected talks at INFORMS 2008

The Ontario Ministry of Health in Canada would like to reduce the delay in transfer of care from ambulance to hospital emergency department (ED). The delay usually occurs while the ambulance is at the hospital site waiting to transfer patients to the emergency wards. The ministry would like to use alternative sites, UCCs (Urgent Care Centres), to accommodate ambulance patients who would typically be discharged the same day, so as to free up ED time needed to deal with this type of patient. The good news is that the ministry has the smarts to research the feasibility of this solution before doing anything. The bad news is that the two databases necessary for the study (EMS and hospital databases) do not share patient identifiers. What’s new, right? Healthcare and bad data almost always go hand in hand. The team, led by Ali Vahit Esensoy at the University of Toronto, therefore could not directly identify the same patient in both databases. However, using accurate GPS timestamps and various triage indicators, the team was able to come up with an algorithm to match over 90% of the patients in the two databases. Then, with the help of physicians and staff, the team devised a set of decision rules to filter out the patients that would be candidates for a UCC. The result of the study is that the proposed UCC solution is in fact not a good idea, because there are simply not enough such patients. This is a classic case illustrating the importance of quantitative analysis for informed decision making.
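The talk did not spell out the matching algorithm, but the general idea of linking records on timestamp proximity plus agreement of triage indicators can be sketched like this (field names and records invented):

```python
from datetime import datetime, timedelta

# Invented toy records; real EMS/hospital data would carry many more fields.
ems = [
    {"id": "e1", "arrive": datetime(2008, 10, 20, 9, 2),  "triage": 3},
    {"id": "e2", "arrive": datetime(2008, 10, 20, 9, 40), "triage": 2},
]
hospital = [
    {"id": "h1", "register": datetime(2008, 10, 20, 9, 10), "triage": 3},
    {"id": "h2", "register": datetime(2008, 10, 20, 11, 0), "triage": 2},
]

def match(ems_recs, hosp_recs, window=timedelta(minutes=30)):
    """Greedily pair each EMS record with an unused hospital record that
    agrees on triage level and was registered within `window` of arrival."""
    pairs, used = [], set()
    for e in ems_recs:
        for h in hosp_recs:
            if (h["id"] not in used and h["triage"] == e["triage"]
                    and abs(h["register"] - e["arrive"]) <= window):
                pairs.append((e["id"], h["id"]))
                used.add(h["id"])
                break
    return pairs

print(match(ems, hospital))  # [('e1', 'h1')] — e2/h2 are 80 minutes apart
```

The real algorithm must have been considerably more careful (ties, clock error, multiple candidates), but the principle is the same: probabilistic linkage when no shared key exists.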

On the west coast of Canada, two groups within the CIHR (Canadian Institutes of Health Research) Team in Operations Research to Improve Cancer Care are making an impact at the BC Cancer Agency. They call out to the OR community to join their efforts in establishing an online community to share resources among OR people working in cancer care.

The British Columbia Cancer Agency (BCCA) is the sole cancer treatment provider for the entire province. The problem to be resolved is a lack of space (examination rooms and space for physicians to dictate) at the ambulatory care unit (ACU). Again, however, process-flow data was not available. The BCCA OR team of Pablo Santibáñez and Vincent Chow mapped the patient flow process, then manually collected time-and-motion data to track the movement of patients and physicians. The data fed a simulation model used to evaluate what-if scenarios: different appointment scheduling methods and room allocation methods. As a result, the team was able to achieve up to a 70% reduction in patient appointment wait time with minimal impact on the clinical duration. They were also able to free up to 26% of the exam rooms for other physician duties.
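The team's simulation model was far more detailed than this, but the flavour of a what-if room-allocation experiment can be sketched with a minimal discrete-event simulation (all parameters invented):

```python
import heapq
import random

def simulate(rooms, n_patients=500, mean_gap=10.0, mean_exam=25.0, seed=7):
    """Toy ACU: patients join one queue and take the first free exam room.
    Returns the average wait in minutes. All numbers are invented."""
    rng = random.Random(seed)
    free_at = [0.0] * rooms          # time each room next becomes free
    heapq.heapify(free_at)
    t, total_wait = 0.0, 0.0
    for _ in range(n_patients):
        t += rng.expovariate(1 / mean_gap)        # next arrival
        start = max(t, heapq.heappop(free_at))    # earliest free room
        total_wait += start - t
        heapq.heappush(free_at, start + rng.expovariate(1 / mean_exam))
    return total_wait / n_patients

# What-if: how does the average wait respond to the number of rooms?
for r in (3, 4, 5):
    print(f"{r} rooms: avg wait {simulate(r):6.1f} min")
```

Because the same random seed drives every scenario, the comparison across room counts is a controlled experiment on one sample path, which is the standard trick for sharpening what-if comparisons.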

On the academic front, a Ph.D. student at the Sauder School of Business at the University of British Columbia, Antoine Sauré, has been helping the BCCA in another department: the radiation therapy treatment units. This research is motivated by the adverse effects of delays on patients’ health, such as physical distress and deterioration of quality of life, and by inefficiencies in the use of expensive resources. Rather than maximizing revenue, the main purpose of the work is to identify good scheduling policies for dynamically allocating available treatment capacity to incoming demand so as to achieve wait time targets in a cost-effective manner. Currently, a significant number of urgent treatments start after the recommended target time. The work involves the development of a Markov Decision Process model and simulation models for identifying and testing different types of policies. The research is still ongoing and no results are available yet. However, the team is ready to test algorithms for determining the optimal scheduling policy, based on an affine approximation of the value function and a column generation algorithm, to tackle the otherwise very large MDP.
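For readers unfamiliar with the machinery, here is a toy value-iteration sketch of a capacity-allocation MDP (entirely invented numbers, and not Sauré's model, which is far too large for brute-force value iteration and hence needs the affine approximation and column generation mentioned above):

```python
# Toy MDP: state = backlog of waiting patients; action = how many overtime
# slots to run today on top of regular capacity. Costs: waiting cost h per
# patient-day, overtime cost o per extra slot. Solved by value iteration.

MAX_BACKLOG = 20
CAPACITY = 2                           # regular treatment slots per day
MAX_OVERTIME = 2
ARRIVALS = {0: 0.3, 1: 0.4, 2: 0.3}    # invented daily arrival distribution
h, o, gamma = 1.0, 3.0, 0.95           # holding cost, overtime cost, discount

def value_iteration(tol=1e-6):
    V = [0.0] * (MAX_BACKLOG + 1)
    while True:
        V_new, policy = [], []
        for s in range(MAX_BACKLOG + 1):
            best = None
            for ot in range(MAX_OVERTIME + 1):
                served = min(s, CAPACITY + ot)
                cost = h * s + o * ot
                exp_next = sum(p * V[min(s - served + a, MAX_BACKLOG)]
                               for a, p in ARRIVALS.items())
                q = cost + gamma * exp_next
                if best is None or q < best[0]:
                    best = (q, ot)
            V_new.append(best[0])
            policy.append(best[1])
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new, policy
        V = V_new

V, policy = value_iteration()
print("overtime policy by backlog:", policy)
```

The resulting policy prescribes overtime as a function of backlog; the real radiation therapy problem replaces this single number with multi-priority, multi-machine state, which is where the state space explodes.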

The papers for the above two projects are available online at http://www.orincancercare.org/cihrteam/ if you wish to obtain more information.

Credits: These 3 talks were given at the INFORMS 2008 conference in Washington DC. The track sessions were TB21, TB34, and TB34. Speakers are Ali Vahit Esensoy, University of Toronto, Canada; Pablo Santibanez, Operations Research Scientist, British Columbia Cancer Agency, Canada; and Antoine Saure, PhD Student, Sauder School of Business, University of British Columbia, Canada. The talks were titled "Evaluation of Ambulance Transfers into UCCs to Improve Ambulance Availability & Reduce Offload Delay", "Reducing Wait Times & Improving Resource Utilization at the BC Cancer Agency’s Ambulatory Care Unit", and "Radiation Therapy Treatment Scheduling".

Friday, October 17, 2008

Doing Good with Good OR: Energy & The Environment

How could analysis and Operations Research help us foresee trends and make intelligent and informed political decisions? Philip Sharp, president of Resources for the Future and a former Congressman from Indiana, attempts to answer this question in the Doing Good with Good OR series of plenaries. The humble but frank former congressman speaks to the scientific crowd about the importance of rigorous analysis for important political issues. Furthermore, Sharp identifies institutional connections and the ability to communicate complex issues simply as the crucial factors in making analysis matter.

From the 70’s to now, we have been on a roller coaster ride with oil prices. In the 1970’s, with the belief of an energy shortage, the National Energy Act was signed to protect the US from the overconsumption of oil. Tax credits were handed out to encourage people to convert gasoline cars to burn natural gas. The go-ahead for ethanol came around the same time. However, in 1986 oil prices crashed, which caused policies and investors who relied on the conventional wisdom to withdraw. Then came 2004, when oil prices rose unexpectedly and the government had to update the fuel economy mandate to adjust. A similar rise and fall played out in the natural gas industry. The ups and downs of natural gas influenced the development of liquefied natural gas (LNG) terminals, the talks of an Alaska pipeline to serve the lower 48 states, the energy ties with the northern neighbor, Canada, and much more. In the US, the prices of oil and gas have an enormous impact on the country. When energy prices go through the roof, “all bets are off”, says Sharp, just like the stock market crash that the country is now deeply suffering from.

However, being open and frank, Sharp thinks analysis is powerful and important, but it is not the means to all ends. “It will help us organize and attack so many unknowns and uncertainties”, but persistence in getting the ideas across to politicians is crucial. Responding to an audience question, Sharp stressed the importance of identifying institutional connections. In the world of politics, the voice of an individual scientist may have some impact, but the collective agreement of a group of politicians will have far greater reach. “The most difficult thing in Congress is to get people to agree on something”, says Sharp. The scientist’s ability to communicate complex things simply, and the level of confidence and trust he or she can offer the politician, is golden. Politicians need to find out who and what to trust, especially when reports are being thrown around to counter other reports. It is too tempting to just pick the easiest path, or whatever supports one’s self-interest.

It is always a pity to see brilliant ideas sitting on the book shelves because of ineffective communication. Politicians are typically the most powerful figures in a country and can make the biggest impact. If Operations Researchers want to do good with good OR, then as Sharp suggests, identifying institutional connections will be the key.

Credits: The talk was given at the INFORMS 2008 conference in Washington DC as a plenary in the Doing Good with Good OR series.

Doing Good with Good OR: Health Policy, AHRQ

Dr. Steven Cohen, Director at the Center for Financing, Access & Cost Trends at the Agency for Healthcare Research & Quality (AHRQ) in the Department of Health & Human Services, USA, shared with the audience the various topics that the Center has been working on.

The Center employs statisticians, economists, social scientists and clinicians in pursuit of their motto, “advancing excellence in healthcare”. They also monitor and analyze current trends in healthcare, ranging from cost, to coverage, to access, and to quality of care. Some of the figures shared were:
  • Every $1 out of $6 of US GDP (or 16%) goes to healthcare – the largest component of federal and state budgets.
  • Western European nations, by comparison, spend less than 10% of GDP on healthcare.
  • In 2006, $2.1 trillion was spent on healthcare. That is $7,026 per capita, which is a 6.7% increase over 2005.
  • At this kind of growth, it is projected that $4.1 trillion will be spent in healthcare in 2016 (1/5 of GDP). 
  • 5% of the US population account for 50% of the healthcare expenditures.
  • Prescribed medication expenditures almost doubled, from 12% to 21% of the total, in roughly 10 years.
  • The largest expenditure is on inpatients (over 30%).
  • The second largest is on ambulatory care (over 20%). 
  • Chronic diseases (heart, cancer, trauma-related disorders, mental health, and pulmonary) account for a majority of the expenditures.
  • Medical errors accounted for 44,000 avoidable deaths in hospitals a year.
  • Americans are less healthy: 33% obesity rate and high rate of chronic diseases.

AHRQ aims to provide long term, system wide improvement of healthcare quality and safety. To provide policy makers with informed recommendations, surveys, simulations, forecasting models, and other OR tools are often employed to answer difficult questions. Through these methods, AHRQ is able to establish evidence and assess risks associated with alternative care choices.

AHRQ’s focus on efficient allocation of resources and structuring of care processes that meet patient care needs aids policy makers in establishing the necessary high-level strategies and policies. Especially in dire times like these, issues of rationing become the center of discussion. It is AHRQ’s responsibility to have the right information to help policy makers make the right trade-offs.

Credits: The talk was given at the INFORMS 2008 conference in Washington DC as a plenary in the Doing Good with Good OR series.

Tuesday, October 14, 2008

Social Media and Operations Research

Social media and Web 2.0 have been the buzzwords in the internet marketing world for a few years now. Of course, we can count on the Numerati (the new term for Operations Researchers, in reference to the title of the new book by Stephen Baker) to start scratching their heads and eventually come up with systematic ways of mining vast amounts of data (i.e. analytics), and then applying knowledge harvested from other disciplines, such as psychology, to study people’s behaviours (hence venturing beyond the traditional OR application field of mechanical processes). Claudia Perlich from IBM and Sanmay Das from Rensselaer Polytechnic Institute individually explained ways they have used OR to dissect the worlds of blogs and Wikipedia: to provide insight to marketers, and to demonstrate the convergence of Wikipedia’s highly edited pages to a stable and credible source.

Ever since the existence of blogs, marketers have been nervous about the reputation of their products. Lucky for the IBMers, when the marketers at IBM are nervous about the client response to their product (i.e. Lotus), help is within reach from the IBM Research team. Marketers want to know: 1. What are the relevant blogs? 2. Who are the influencers? 3. What are they saying about us? 4. What are the emerging trends in the broad-topic space? Perlich’s team tackled these four questions by starting with a small set of blogs identified by the marketers (the core “snowball” of blogs), then “rolling” the “snowball” over twice to discover other blogs related to the core set (i.e. at most 2 degrees of separation from the core). To find the authoritative sources in the snowball, Google’s PageRank algorithm came to the rescue. Using sentiment labeling, the team was able to say whether the overall message of a blog was positive or negative. Then, to let the users (i.e. marketers) usefully interact with and leverage the data, a visual representation was rendered to show the general trend in the blogosphere about the product in question (see image). At that stage, marketers can dig into each blog identified as positive or negative, and the problem seems much more manageable.
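This is not IBM's BANTER system, but the two graph steps described — a two-hop "snowball" expansion followed by PageRank — can be sketched in a few lines (link data invented):

```python
# Toy blog link graph (invented): blog -> blogs it links to.
links = {
    "seed1": ["a", "b"],
    "seed2": ["b"],
    "a": ["c", "seed1"],
    "b": ["c"],
    "c": ["b"],
}

def snowball(seeds, hops=2):
    """Expand the seed set by following out-links for `hops` rounds."""
    frontier, found = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {t for s in frontier for t in links.get(s, [])} - found
        found |= frontier
    return found

def pagerank(nodes, d=0.85, iters=50):
    """Plain PageRank power iteration restricted to `nodes`."""
    pr = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - d) / len(nodes) for n in nodes}
        for n in nodes:
            out = [t for t in links.get(n, []) if t in nodes]
            for t in out:
                nxt[t] += d * pr[n] / len(out)
            if not out:                       # dangling node: spread evenly
                for t in nodes:
                    nxt[t] += d * pr[n] / len(nodes)
        pr = nxt
    return pr

community = snowball({"seed1", "seed2"})
pr = pagerank(community)
print(sorted(pr, key=pr.get, reverse=True))   # most authoritative first
```

Sentiment labeling and visualization then operate on this ranked community, which is where the marketers pick up the thread.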

Das’ talk fits in rather nicely with Perlich’s, as he examines the trends of blog comment activity and Wikipedia edit histories to try to demonstrate the eventual convergence of these information sources to a stable state. The underlying assumptions are that the more edits a Wikipedia page has, the more reliable its information and hence the higher its quality; and the higher the quality of a page, the less likely there will be further talk/edits on it, because most of what needed to be said has already been said (aka the informational limit). Das obtained the data dump of all pages on Wikipedia in May 2008 and extracted the 15,000 pages (out of 13.5 million in total) that had over 1,000 edits. Using dynamic programming to model a stochastic process, Das was able to find a model for the edit rate of an aggregation of these highly edited Wikipedia pages. He then applied the same model to blog comment activity. In both cases the model fit extremely well to the data, and surprisingly the shape of the activity pattern over time looked very much alike between blog comments and Wikipedia page edits. An interesting inference made by Das was that people contribute less of their knowledge to Wikipedia pages than to blogs.

This is the beauty of Operations Research. It is a horizontal plane that can cut through and be applied to many sciences and industries. Aren’t you proud to be a dynamic Numerati?

Credits: The talk was given at the INFORMS 2008 conference in Washington DC. The track session was MA12 & MB12. Speakers are Claudia Perlich, T.J. Watson IBM Research, and Sanmay Das, Assistant Professor, Rensselaer Polytechnic Institute, Department of Computer Science. The talks were titled "BANTER: Anything Interesting Going On Out There in the Blogosphere?", and "Why Wikipedia Works: Modeling the Stability of Collective Information Update Processes".

Similarity between “A Single EU Sky” and “Ford Suppliers”? Optimization

How should EUROCONTROL select which improvement projects to invest in for a unified European sky for air traffic by the year 2020? Will Ford’s Automotive Components Holdings shut down its facilities and outsource everything? Optimization models come to the rescue, and they are big! Two of the six finalists for the Daniel H. Wagner Prize for Excellence in Operations Research presented their exciting projects at INFORMS on Monday.

Air traffic is getting more and more congested in Europe, and a solution is needed to unite the European sky under a single set of protocols for all EU countries. EUROCONTROL, the European Organisation for the Safety of Air Navigation, currently with 37 member states, is in a position to do so. As part of its modernization activities, EUROCONTROL needs to select a set of technological improvement projects from four major categories: network efficiency, airport operations, sector productivity, and safety nets. For example, considering only a subset of 5 projects out of a list of 20+, each with 2 to 5 implementation options, there would be some 300 possible combinations. The question, therefore, is which set of projects to select to satisfy the various objectives and constraints of multiple stakeholders: the airports, the airlines, society (the environment), and more. Grushka-Cockayne’s team from the London Business School was asked by EUROCONTROL to provide rigorous and transparent analysis of this problem involving all stakeholders. The team used an integrative and iterative framework, with a mixed integer programming model to find the optimal combination of projects. As a result of the team’s effort, the EU may now use the same language to talk about uniting the European sky, with a common understanding of the problem.
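The flavour of the selection problem the team solved with a MIP can be illustrated at toy scale by brute force: pick at most one implementation option per project to maximize benefit within a budget (all numbers invented; the real model's stakeholder objectives and constraints are far richer):

```python
from itertools import product

# Invented toy instance: each project offers implementation options as
# (cost, benefit) pairs; choosing None skips the project entirely.
projects = {
    "network_efficiency":  [(4, 7), (6, 9)],
    "airport_ops":         [(3, 4), (5, 8), (8, 11)],
    "sector_productivity": [(2, 3)],
    "safety_net":          [(5, 6), (7, 10)],
}
BUDGET = 15

names = list(projects)
best_value, best_plan = -1, None
for choice in product(*[[None] + opts for opts in projects.values()]):
    cost = sum(o[0] for o in choice if o)
    value = sum(o[1] for o in choice if o)
    if cost <= BUDGET and value > best_value:
        best_value, best_plan = value, dict(zip(names, choice))

print("best value:", best_value)   # 22
print("plan:", best_plan)
```

At real scale this enumeration explodes combinatorially, which is exactly why a MIP solver, rather than brute force, is the tool of choice.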

The Ford team had a very different but extremely urgent problem to solve. They needed to figure out whether 2 of their automotive interior supply production facilities should be closed down and the work outsourced to other suppliers, because of unprofitability and underutilization at the plants. The number one problem was time: 8 weeks was all they had. The optimization problem they faced, however, was extremely large. The mixed integer non-linear programming (MINLP) model had around 450,000 variables and 200,000 constraints; after removing the nonlinearity, the resulting mixed integer programming (MIP) model had 4.3 million variables and 1.8 million constraints. Just imagine the data gathering process and the model formulation! What a nightmare. Luckily, the team had the unconditional support of the CEO and was able to obtain a complete set of data: 150,000 data points in the model. After 8 weeks of 20-hour days, the team was able to deliver a model to test out what-if scenarios, thereby removing the subjective decision making that is so common in most enterprises. As a result, Ford will be able to maintain 1 facility and outsource only a certain percentage of the work, a saving of $50 million over five years compared to the alternative of outsourcing all the work.

The two projects were fascinating to listen to. They both showcased the importance of quantitative decision making in business. It will be a tough job to select a winner out of these two. Good luck to both teams!

Credits: The talks were given at the INFORMS 2008 conference in Washington DC in the Daniel H. Wagner Prize Competition; the track session was MC32. Speakers are: Yael Grushka-Cockayne, London Business School, Regent’s Park, London, United Kingdom, and Erica Klampfl, Technical Leader, Ford Research & Advanced Engineering, Systems Analytics & Env. Sciences Department. The talks were titled "Towards a Single European Sky" and "Using OR to Make Urgent Sourcing Decisions in a Distressed Supplier Environment".

Majority Judgment - the new & fair voting/ranking system

Could Al Gore have been President of the United States if the electoral voting system were different? Could we have had a different world champion in figure skating if the ranking system were different? “Maybe”, says Michel Balinski, a distinguished, multi-award-winning professor and now Directeur de Recherche de classe exceptionnelle at the École Polytechnique and CNRS, Paris. The majority judgment voting system proposed by Balinski and Rida Laraki (US patent pending) is claimed to reflect the true desires of the people (or a jury) more accurately, and its process is promised to be fairer than traditional voting/ranking systems.

Balinski describes the traditional voting/electoral system as trying to elect the Condorcet winner. Wikipedia says, "The Condorcet candidate or Condorcet winner of an election is the candidate who, when compared with every other candidate, is preferred by more voters. Informally, the Condorcet winner is the person who would win a two-candidate election against each of the other candidates". However, such a winner may not exist, because of Condorcet's paradox and Arrow's impossibility theorem. Besides, doesn't it strike people as odd that, in a world full of gray areas rather than just black and white, we are basically casting a yes/no vote for the most important person in the country? Balinski argues that we could do better with the majority judgment voting process.
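Condorcet's paradox is easy to demonstrate: with the classic three-voter cycle below (an invented example), no candidate beats every other head-to-head:

```python
# Classic Condorcet cycle (invented three-voter example): each inner list is
# one voter's ranking, best first.
ballots = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def prefers(ballot, x, y):
    return ballot.index(x) < ballot.index(y)

def condorcet_winner(ballots, candidates):
    """Return the candidate who beats every other head-to-head, or None."""
    for x in candidates:
        if all(sum(prefers(b, x, y) for b in ballots) > len(ballots) / 2
               for y in candidates if y != x):
            return x
    return None

print(condorcet_winner(ballots, ["A", "B", "C"]))  # None: A>B, B>C, C>A — a cycle
```

Two voters prefer A to B, two prefer B to C, and two prefer C to A, so pairwise majority rule chases its own tail.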

The process would list all electoral candidates on the ballot and ask voters to rate each candidate as one of the following, by providing a tick mark in a grid, for example: excellent, very good, good, acceptable, poor, to reject. This experiment was conducted during the 2007 presidential election in France, and these grades form a "common language" for all French voters, because the schools have always used them. Similar common languages include star ratings (1 to 5 stars) for movies and restaurants, and letter grades (A to F) in school. Summing or averaging the scores wouldn't make sense, because the scale is not an interval measure, according to Balinski. He therefore proposes that the set of grades each candidate receives be ordered from best to worst, and the median grade taken (the 3rd of a total of 5 grades, for example, or the 4th of a total of 6). If candidates are tied, the method is repeated until the tie is broken, each time with the median grade removed from each tied candidate's set of grades. The final grade becomes the candidate's majority grade.
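As I read the method, the grading and tie-breaking steps can be sketched as follows (my own toy implementation, not Balinski and Laraki's):

```python
# Sketch of the majority-grade computation described above. Candidates are
# compared by their successive medians, removing the median at each round.

RANK = {"to reject": 0, "poor": 1, "acceptable": 2,
        "good": 3, "very good": 4, "excellent": 5}

def majority_key(grades):
    """Successive median grades, for comparing candidates lexicographically.
    The first entry is the majority grade (the lower median: the 3rd of 5
    grades, or the 4th of 6, counting from the best)."""
    g = sorted(grades, key=RANK.get, reverse=True)   # best -> worst
    key = []
    while g:
        key.append(RANK[g.pop(len(g) // 2)])
    return key

a = ["excellent", "good", "good", "acceptable", "poor"]
b = ["very good", "very good", "good", "poor", "to reject"]
print(majority_key(a)[0], majority_key(b)[0])  # both 3 ("good"): a tie...
print(majority_key(a) > majority_key(b))       # ...broken in a's favour: True
```

Both candidates' majority grade is "good", but removing that median and re-taking the median gives a "acceptable" against b's "poor", so a wins the tie-break.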

It should be noted that for such a method a common language of grades is essential, or there would be no collective decision. However, Balinski claims this voting system produces a result much more reflective of the voters' desires. So much so that, in the experiment conducted in 3 of Orsay's voting bureaux in France, voters were able to tell who each candidate was simply by looking at the final majority grades of the top 3 presidential candidates.

When studies show that a third of voters do not have a single preferred presidential candidate, one questions whether it is right to force voters to vote for only one candidate on a ballot. And if the existing systems do not work well, shouldn't we be inclined to change, and to try out new methods? After all, we are an adaptive and ever-changing society (even though human nature is afraid of change). But hey, if it doesn't work, we can always chuck it away. What's there to lose?

Credits: The talk was given at the INFORMS 2008 conference in Washington DC as a keynote. The speaker was Michel Balinski, École Polytechnique and CNRS, Paris. The talk was titled "One-Vote, One-Value: The Majority Judgment".

Sunday, October 12, 2008

ER Crowding, physician appointment and nursing care waits

Does Michael Moore’s SiCKO do American healthcare any justice? Is US healthcare riddled with delays? With the increasing exposure of healthcare wait time problems on TV and other media, and with healthcare being one of the central topics of political debates and strategies, what and where are the problems, and how do we deal with them? Professor Linda Green of Columbia Business School talks about three critical types of waits in healthcare and their impact on one another: emergency room (ER) waits, physician appointment (or imaging test) waits, and nursing care waits. She sets out to show that there are not “too many” beds in the system, that physician appointments cannot simply be Just-In-Time, and that nurse staffing levels have no one-size-fits-all solution.

When physicians aren’t surprised to see patients suffering heart attacks die while waiting for a bed in the ER (2006 news from an Illinois emergency room), when patients leave the ER without being seen out of frustration with long waits, and when ambulances have to be diverted because the ER is full, one needs to ask why. Why are ERs always full when you need them? The number one reason is a lack of inpatient beds – the bottleneck of the healthcare system – which makes it impossible to move ER patients to an inpatient unit and thereby free up ER beds. So, why are there not enough beds? The answer is relatively simple. Healthcare is expensive, and when governments need to stay on budget, healthcare gets cut. Healthcare also happens to be largely funded by government subsidy, and such cuts have reduced the number of US hospitals from 7000 to 5000 (from 1980 to now), and the number of available beds from 435 to 269 per 100,000 persons. Yet politicians cry that there are “too many” beds. Why, you ask? Because important decision makers are looking at occupancy level (the bed utilization ratio), which for ER beds currently sits at around 66% (against a target of 85%). Utilization looks good, but it is the wrong measuring stick. Every OR person knows that variability means extra capacity (beyond the average) is needed to handle periods of heavy demand with a reasonable level of service. Healthcare is exactly such a process: demand varies greatly with the day of the week and the hour of the day, so aiming at and planning for averages is not going to work. One of Green’s studies showed that if New York state hospitals wanted less than a 10% chance of an ER patient waiting up to an hour for a bed, then 58% of the hospitals would be too small.
If they aimed for a 5% chance, 74% of hospitals would be too small; and at 1%, 90% would be too small. The good news, however, is that performance can be improved without increasing staffing levels, as Green has shown in her work.
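The point that utilization is the wrong measuring stick follows from basic queueing theory. As a rough illustration (my own sketch, not from Green's talk; the bed counts and arrival rates are hypothetical), the Erlang C formula for an M/M/c queue shows that two units running at the same 66% occupancy can expose patients to very different delay risks:

```python
from math import factorial

def erlang_c(servers, arrival_rate, service_rate):
    """P(an arriving patient must wait) in an M/M/c queue (Erlang C)."""
    load = arrival_rate / service_rate          # offered load, in "beds"
    rho = load / servers                        # utilization
    if rho >= 1:
        return 1.0
    idle_terms = sum(load**k / factorial(k) for k in range(servers))
    wait_term = load**servers / (factorial(servers) * (1 - rho))
    return wait_term / (idle_terms + wait_term)

# Two hypothetical units, both at 66% utilization:
small = erlang_c(10, 6.6, 1.0)    # 10-bed unit
large = erlang_c(100, 66.0, 1.0)  # 100-bed unit
print(f"small unit: {small:.1%} of patients wait")   # roughly 17%
print(f"large unit: {large:.1%} of patients wait")   # well under 1%
```

Same utilization, very different access: the larger unit pools variability, which is exactly why occupancy alone says nothing about how often a patient finds no bed available.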

Moving on to physician appointment and nursing care waits, these two factors both contribute to the overcrowding of ERs. When physicians are asked to treat patients in a Just-In-Time fashion, the major issue is that no one knows how to balance physician capacity with patient demand. When physicians have too many patients, not everybody can be seen in time, and patients cancel appointments if the wait is long. When patients cancel, they also ask for another appointment because they still need to be seen, so the cancellation feeds back and adds to the physician’s backlog. Studies show that the shorter the backlog, the higher the physician’s utilization. Green’s project of recommending the right number of patients per doctor has helped over 500 clinics and organizations find that balance of capacity and demand.
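The feedback loop just described (cancellations waste slots but do not remove demand) can be sketched in a few lines. This toy model and all of its numbers are my own hypothetical illustration, not Green's:

```python
def simulate_backlog(days, slots_per_day, requests_per_day, cancel_frac):
    """Late cancellations waste booked slots; the patient rebooks, so
    demand is undiminished while effective capacity shrinks."""
    backlog = 0.0
    for _ in range(days):
        backlog += requests_per_day
        wasted = slots_per_day * cancel_frac            # slots lost to cancellations
        served = min(backlog, slots_per_day - wasted)
        backlog -= served
    return backlog

# 20 slots/day against 18 requests/day: plenty of slack on paper...
print(simulate_backlog(250, 20, 18, 0.00))  # 0.0   -> no backlog ever forms
print(simulate_backlog(250, 20, 18, 0.15))  # 250.0 -> grows by 1 patient/day
```

A clinic that looks under-capacity on averages can still accumulate an unbounded backlog once cancellations eat into usable slots, which is why balancing panel size against effective (not nominal) capacity matters.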

It is widely acknowledged that there is a shortage of nurses, and that a lack of nursing care is directly correlated with more patient safety problems (mortality included) and lower patient satisfaction. California has passed legislation to ensure a 1:6 nurse-to-patient ratio in general medical-surgical wards. It is good to see action, but is this policy more disruptive than helpful? According to Green, varying unit sizes, the intensity of care required by different patient types, and the length of stay (and therefore patient turnover) at different hospitals can all mean different optimal nurse-to-patient ratios. In larger units, the California legislation could result in overstaffing at a cost of $300,000 per year per unit, says Green. That is expensive.
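Why a single mandated ratio can overstaff large units while shortchanging small ones comes down to pooling. The sketch below applies the classic square-root staffing rule; the assumption that one nurse can cover an average workload of 8 patients, and the safety factor, are hypothetical numbers of mine, not from the talk or the legislation:

```python
from math import ceil, sqrt

def required_nurses(patients, patients_per_nurse_avg=8, safety=1.0):
    """Square-root staffing: cover the average load plus a variability
    buffer that grows only with the square root of the load."""
    load = patients / patients_per_nurse_avg
    return ceil(load + safety * sqrt(load))

def fixed_ratio_nurses(patients, ratio=6):
    """California-style mandate: one nurse per `ratio` patients."""
    return ceil(patients / ratio)

for patients in (12, 48, 240):
    print(patients, fixed_ratio_nurses(patients), required_nurses(patients))
```

Under these made-up parameters the fixed ratio understaffs a 12-patient unit (2 nurses vs 3 required) yet overstaffs a 240-patient one (40 vs 36): pooled variability means the optimal ratio genuinely differs with unit size, which is Green's objection to a one-size-fits-all rule.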

Overall, healthcare certainly has a lot of problems: rising costs, quality problems, and access problems, to name a few. Operations Research is needed in this field; it can be a matter of life and death.

Credits: The talk was given at the INFORMS 2008 conference in Washington DC. The track session was SC37. Speaker is Linda Green, Armand G. Erpf Professor, Columbia Business School. The talk was titled "Tutorial: Using Operations Research to Reduce Delays for Health Care".

Business Intelligence: Data Text Mining & Its Challenges

In the world of business intelligence (BI), data and text mining is a rising star, but it faces plenty of challenges. Seth Grimes points out the importance of getting data into structured, relational form, and the need for statistical, linguistic and structural techniques to analyze the various dimensions of raw text. He also shares with the audience some useful open source tools in the field of text mining. John Elder, on the other hand, shares the top 5 lessons he has learned from mistakes in data mining, and reveals one of the biggest secret weapons of data miners.

Grimes took the audience on a journey through traditional BI work, where data miners take raw CSV (comma-separated values) files and turn them into relational databases, which then get displayed as fancy monitoring dashboards in analytics tools – all very pretty and organized. However, most of the data that BI deals with is “unstructured”: information hiding in pictures and graphs, or in documents stuffed with text. According to Grimes, 80% of enterprise information is in “unstructured” form. To process raw text, Grimes says the analysis needs three tiers: statistical/lexical, linguistic and structural. Statistical techniques help cluster and classify the text for ease of search (try ranks.nl). Syntactic analysis from linguistics works out sentence structure to provide relational information between clusters of words (try Machinese from connexor.eu to form a sentence tree). Finally, content analysis helps extract the right kind of data by tagging words and phrases for building relational databases and predictive models (try GATE from gate.ac.uk).
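As a tiny taste of the statistical tier (my own minimal sketch, not one of Grimes's tools), documents can be turned into term-frequency vectors and compared by cosine similarity, the building block behind clustering and classifying text for search:

```python
import re
from collections import Counter
from math import sqrt

def tf_vector(text):
    """Bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm = lambda c: sqrt(sum(n * n for n in c.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

docs = [
    "MRI wait times grow as imaging demand rises",
    "CT and MRI imaging wait times in hospitals",
    "box office revenue of summer blockbuster movies",
]
vecs = [tf_vector(d) for d in docs]
print(cosine(vecs[0], vecs[1]))  # high: both about imaging waits
print(cosine(vecs[0], vecs[2]))  # 0.0: no shared vocabulary
```

Real systems add TF-IDF weighting and the linguistic and structural tiers on top, but the vector-space core is this simple.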

Elder’s list of top 5 data mining mistakes includes:
1. Focus on the training data (judging models only by their fit to the data they were trained on)
2. Rely on one technique
3. Listen (only) to data (not applying common sense to processing data)
4. Accept leaks from the future
5. Believe your model is the best model (don’t be an artist and fall in love with your art)

In particular, Elder shares the biggest secret weapon of data mining: combining different techniques that each do well in only one or two categories gives much better overall results. His slides illustrate this with figures titled “5 Algorithms on 6 Datasets”, “Essentially every bundling method improves performance”, and “Median (and Mean) Error Reduced with each Stage of Combination”, the last showing the combinatorial power of methods on another example from his talk.
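The “bundling” effect is easy to reproduce on synthetic data. In this sketch (mine, not Elder's figures) three deliberately biased, noisy models are averaged, and the ensemble's squared error beats the average individual model's:

```python
import random

random.seed(42)
truth = [0.1 * i for i in range(200)]

def noisy_model(bias, noise=1.0):
    """A 'model' = the truth corrupted by its own bias and noise."""
    return [t + bias + random.gauss(0, noise) for t in truth]

def mse(pred):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

models = [noisy_model(b) for b in (-0.5, 0.2, 0.4)]
ensemble = [sum(ps) / len(ps) for ps in zip(*models)]

avg_single = sum(mse(m) for m in models) / len(models)
print(f"average single-model MSE: {avg_single:.2f}")
print(f"ensemble MSE:             {mse(ensemble):.2f}")
```

Averaging cancels the models' independent errors while leaving the shared signal intact, which is why essentially every bundling method in Elder's comparison improved performance.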

Data and text mining is still in its early stages, and the “miners” have a lot of challenges to overcome. However, given the richness of information floating around on the internet and hiding in thick bound books in libraries, data and text mining could revolutionize the business intelligence field.

Credits: The talk was given at the INFORMS 2008 conference in Washington DC. The track session was SB13. Speakers are: Seth Grimes, Intelligent Enterprise, B-eye network; and John Elder, Chief Scientist, Elder Research, Inc. The talk was titled "Panel Discussion: Challenges Facing Data & Text Miners in 2008 and Beyond".

Forecasting Hollywood movie box office revenue with HSX trading history

Want to know which movies are going to make it or flop at the box office? Is a movie going to be a hit, a fast decayer, or a sleeper – in terms of its box office revenue trend? Natasha Foutz, Wolfgang Jank and Gareth James have attempted to predict the revenue trend of Hollywood movies using 3 principal components (average/longevity, fast decay, and sleeper effect) in conjunction with Hollywood Stock Exchange (HSX) trading histories. HSX is a virtual stock market for music, TV shows, and movies. The authors claim a high degree of forecasting accuracy using functional shape analysis and regression on the 3 principal components and early HSX trading histories to forecast a movie’s weekly box office revenue over its first 10 weeks. If you are really good at this game, you may end up selling your billion-dollar HSX portfolio on eBay, who knows?

Hollywood movies have widely varying box office revenues, some much more profitable than others. It is therefore crucial to forecast movie demand decay patterns for financing, contracting, and general planning purposes. The forecast needs to be made long before the movie’s release, since planning happens much further in advance, sometimes years earlier. Most movies earn the majority of their revenue in the first 10 weeks after opening, so the model forecasts the demand decay pattern of a movie’s first 10 weeks. The use of HSX data has been shown to add information for revenue forecasting. Virtual stock markets (VSMs), a showcase of the wisdom of crowds, are no strangers to forecasting complicated outcomes, from election results and NBA championships to Al Gore’s 2007 Nobel Peace Prize. The results produced by VSMs can be impressively accurate; for example, a political VSM was said to be 75% more accurate than political polls.

Foutz, Jank and James identified 3 principal components to be used alongside the HSX trading history: average/longevity, fast decay, and sleeper. Longevity captures the average box office revenue over the lifetime of a movie whose trend is relatively smooth (a linearly decreasing trend in the log-transformed revenue figures), such as Batman Begins. Fast decay captures movies that have great openings but quickly die out, such as Anchorman. Sleeper describes movies that start slowly but, through word of mouth (for example), pick up momentum in the later weeks of their run, such as Monster or My Big Fat Greek Wedding.
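The “smooth” longevity shape is just a straight line in log space, so its decay rate can be recovered with ordinary least squares. The weekly grosses below are hypothetical numbers I made up to show the mechanics, not data from the paper:

```python
from math import log

def log_linear_slope(weekly_revenue):
    """Least-squares slope b of log(revenue) = a + b * week.
    A strongly negative b signals fast decay; b near 0, longevity."""
    xs = range(len(weekly_revenue))
    ys = [log(r) for r in weekly_revenue]
    n = len(ys)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

fast = [50, 22, 10, 4.5, 2.0, 0.9, 0.4, 0.18, 0.08, 0.04]   # dies out quickly
steady = [10, 8.5, 7.2, 6.1, 5.2, 4.4, 3.8, 3.2, 2.7, 2.3]  # gentle decline
print(log_linear_slope(fast))    # strongly negative
print(log_linear_slope(steady))  # mildly negative
```

A sleeper like My Big Fat Greek Wedding would show the opposite sign in its early weeks, which is roughly the pattern the third principal component captures.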

The authors tested 5 different models of weekly revenue regression over a period of 10 weeks, each using a combination (or subset) of the three principal components and the HSX trading histories. The results indicate that movies with higher levels of trading activity on HSX at a very early stage (weeks in advance) are more likely to have a higher opening-weekend box office revenue. How could this finding be used for purposes more meaningful than betting with your friends? Theatre owners could better allocate screens and profit sharing, while movie producers could design different contracts for slow burners than for fast decayers. If you are a movie buff, maybe it’s time to get on the HSX for some trading fun instead of crying over the financial stock markets.

Credits: The talk was given at the INFORMS (Institute For Operations Research and Management Science) 2008 conference in Washington DC, in session SA68, by Natasha Z Foutz, Assistant Professor of Marketing, from the McIntire School of Commerce, University of Virginia. The title of the talk was "Forecasting Movie Demand Decay Via Functional Models of Prediction Markets".