Monday, December 10, 2012

Understanding Variation ... and Global Warming

I’ve written before about the many different jobs I’ve had over the years, the many roles I’ve played in industry, and the kinds of different things I accomplished. My last job or role was as the Technical Quality Leader of a company with two billion dollars in annual revenue. That’s a lot of dough, and my job was to make sure our pies were the best.

I performed various tasks to accomplish that goal from recommending and deploying best practices for software development to keeping engineering and service apprised of the quality of our products in the field to setting targets for new products. Key to all of those activities was explaining and interpreting results for everyone from individual programmers and engineers to the company’s vice presidents and CEO.

As the lead statistician, I was often called upon to explain the numbers. Key to those explanations is understanding variation. You see, the numbers always varied. How to make a cohesive message out of results that were always changing was the challenge. How to predict trends and “averages” based on values that would be high one month and low the next called upon all my skills at presenting, and during my tenure I developed several key charts that would display different values and indices together so you could see the causes and effects more clearly over time.

I became known as “Mr. Charts” and “Dr. Numbers,” and I even had marketing use my data and data representations to sell customers on the quality of our products. That was very difficult because we wanted to be careful just how much information we gave to our customers for fear some facts would put us in a bad light.

One of the aids I used to teach the understanding of variation was a book by Donald J. Wheeler called, quite appropriately, “Understanding Variation: The Key to Managing Chaos.” I found the book so useful and so accessible that I actually bought a copy for everyone on the management committee with my own personal funds. Fortunately, the book cost a very reasonable $29.00, and I bought a dozen copies to distribute to the executives I worked with directly.

Let’s take a simple scenario. Suppose you have a goal to produce a product with a certain “mean time to failure.” “Mean” is a fancy mathematical word for “average,” at least a particular average called the “arithmetic mean.” And a failure is when something goes wrong with one of our big printers and we have to send out someone to repair it. They would replace a failed part or possibly make some adjustments or alignments and then the printer would be printing again.

On an engineering basis you would probably measure such failures against some unit of time. We operated on a monthly measurement cycle, and whenever a repairperson had to go fix a machine, that was called an “RA” or “Repair Action.”

We had a metric called RA per MM, or RA/MM. An “MM” is a machine-month: one machine running for one month. Every month we would take the total number of repairs done on a particular type of machine and divide it by the number of machines of that type in the field. That was the RA/MM. Notice this is actually an average: the average number of repairs per machine of a particular design or type per month.
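The arithmetic is simple enough to sketch in a few lines. The machine and repair counts below are made up purely for illustration:

```python
# Sketch of the RA/MM calculation; these counts are invented for illustration.
machines_in_field = 400   # machines of one type installed during the month
total_repairs = 940       # repair actions logged on that type in the same month

ra_per_mm = total_repairs / machines_in_field
print(f"RA/MM = {ra_per_mm:.2f}")  # 2.35 repairs per machine-month
```

With 400 machines and 940 repairs, the fleet averaged 2.35 repairs per machine that month.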

Now imagine something as large and complicated as an airliner. One of those big Boeing jobs that flies coast to coast. Now they are not perfect machines and they need to be repaired and maintained on a regular basis. They might have to be maintained after every three days of flying, for example. Well then that would be an average of about 10 repairs a month for, say, a 747.

Our big printers were also complicated machines that needed regular maintenance and repairs, some as often as two or three times a month. Notice how this is different from your personal automobile, which may only need repairs or an oil change every five months. Our smaller printers were like that; some only needed repair once every three or four years on average. But our largest printers were more like that airliner, and sometimes the IBM service person had a permanent office at the customer location, doing nothing but maintaining a dozen or more big printers at that facility.

It was these largest printers that we were very focused on because these down time and repair actions cost us a lot of money (we were operating with maintenance agreements … basically a full warranty that the customer pays for each year) and customers wanted to print, not have the printer down for repairs.

We tracked lots of things like how many hours were spent in repair and the cost of replacement parts, or even how often we had to return a second time to fix something that should have been fixed in the first visit, but the RA/MM was our most important measurement.

Suppose the expected number of repairs was two a month. Well then, one extra repair increases that metric by 50%. In statistics, these are what we call “small numbers.” Since statistical analysis works best with large numbers, it was a struggle to make sense of these variations.

Suppose the target is 2.35 RA/MM. Further suppose one month the actual number is 2.17. Hurrah … we met target … exceeded it, in fact. Unfortunately, the next month the average is 2.58 RA/MM. Uh-oh … over target. Now what? Do we panic? Do we ask engineering to develop an action plan to correct the target miss, and service to develop a mitigation plan to deal with higher-than-expected repairs? NO!! At least, that was what I was trying to get across to executives.

It is normal for these kinds of measurements to vary slightly from month to month. That is especially true of the kinds of small number measurements we were making. A few outliers … that’s a bad machine at a bad location that had some bad parts installed and ended up with 7 RAs in a month … can really skew the data. Plus, all the machines needed regular replacement of consumable parts like the photoconductor that wears out like the tires on your car … only a lot faster. So there were very few machines getting less than at least one RA in a month. That further skewed the data. A bad machine could get 10 RAs, almost 8 over target, while a great machine could only be 2.35 under target with a total of zero RAs in the month.

What to do? Well, to a statistician, the answer is simple. Keep a running average of the average RAs. I would use a moving twelve-month average as a better indicator of the quality of the machines over time, though even that isn’t perfect. But I think you can see the point. Things that vary can be measured and managed with some mathematical measure of central tendency … what laymen call “averages.” Further, you can do averages of averages, or “running averages.”

Mathematical warning … averages of averages can be very misleading unless you are very careful. I won’t get into the math, but it is a good idea to never do averages of averages unless you really know the underlying math and just what is being measured and how. For example, it would not be correct to add up the individual monthly values and divide by 12. No, no, no. That is not the correct method and, under certain circumstances, can be really wrong, wrong, wrong. No, instead you sum up the total RAs for the entire year and divide that by the sum of all the machine-months for the entire year.
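Here is a small demonstration of why the naive method goes wrong. The monthly figures are invented, with the fleet doubling mid-year so that the two methods visibly disagree:

```python
# Hypothetical monthly data: (total RAs, machine-months) for twelve months.
# The fleet doubles mid-year, which is exactly when averaging averages misleads.
months = [
    (200, 100), (210, 100), (190, 100), (240, 100), (260, 100),
    (600, 200), (620, 200), (580, 200), (590, 200),
    (610, 200), (630, 200), (640, 200),
]

# Wrong: average the twelve monthly averages.
naive = sum(ra / mm for ra, mm in months) / len(months)

# Right: pool the year's total RAs and total machine-months, then divide once.
pooled = sum(ra for ra, _ in months) / sum(mm for _, mm in months)

print(f"naive  = {naive:.3f}")   # weights the small-fleet months too heavily
print(f"pooled = {pooled:.3f}")  # the true yearly RA/MM
```

The naive method treats every month as equally important regardless of how many machines were running, so the small-fleet months drag the result down; pooling the totals weights each machine-month equally.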

Then, each month, you move the average “to the right” by using the previous twelve months. I did all this simple math in good old MS Excel. I then used the charting function to graph the twelve-month average line and the individual monthly values. The twelve-month running average was much smoother (meaning the variations from month to month are smaller). The graph was interesting, with the monthly results sometimes less than the running average (driving the running average downward) and sometimes greater (forcing it up). The running average was the trend line, and it actually held more information because it integrated, or averaged out, the variations.

You could apply rules such as: if the monthly value is above the running average for four months in a row, then that required action, because the trend was obviously upward, which we didn’t want. Good quality on this chart was a low number.

In my example, as long as the total number of machines in the field was relatively constant, old machines were replaced with new machines at a fairly constant rate, there was no seasonal variation, and … about another twelve considerations … then my rolling average was a good way to go. You do have to understand the context of all data.

The idea was that good months and bad months would average out and you would get a valid indication of the underlying quality of the machine. When you got nervous was when the trend line started moving up. That meant the bad months were outweighing the good months and the machines were actually getting worse … or the repair people were not as effective and fixed machines broke more often … or the fleet of machines was aging … or the amount of printing in an average month was increasing … or the paper or toner and ink quality was … OK. It is never easy, and you always used more than one chart to really measure a process or a product.

But the idea of trends of averages to measure variation is a solid concept.

Now let’s change gears. Suppose you want to measure “climate.” Now, in a mathematical sense, “climate” is just an average over time of “weather.” Take temperature for example. You can talk about an average high temperature in the summer in Chicago. Take one hundred years of Chicago summer temperatures and do the math. You will see they vary, as some years are warmer or colder than other years. That’s normal.

But if you track the trends and see the average summer temperature in Chicago is trending upward, you might think you’ve found some effect and look for a cause. If it were global temperature you’re interested in, then you would average Chicago and New York and Denver and L.A. … plus Honolulu, Tokyo, Moscow, Berlin, London, Cape Town, Melbourne, Buenos Aires … etc.

If you combined all the averages for summer and winter and fall, etc., temperatures of the cities all over the world, you would get a so-called “Average Global Temperature.” You could plot that temperature on a graph. Of course seasonal variations are typical when you are measuring weather. You might choose to track average summertime temperatures, or winter temperatures, or just average the whole year.
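In the same spirit as the printer charts, here is a toy sketch of that bookkeeping. Every temperature below is invented; the point is only the mechanics of comparing yearly averages against a long-term baseline:

```python
# Toy yearly "global average temperature" values in degrees C -- entirely
# made up, just to show the anomaly-versus-baseline arithmetic.
yearly_avg = {2005: 14.65, 2006: 14.55, 2007: 14.60,
              2008: 14.52, 2009: 14.63, 2010: 14.70}
baseline = 14.00  # hypothetical 100-year average

anomalies = {year: temp - baseline for year, temp in yearly_avg.items()}
years_above = sum(1 for a in anomalies.values() if a > 0)
print(f"{years_above} of {len(yearly_avg)} years above the long-term average")
```

Weather supplies the noisy monthly values; climate is the baseline and the trend of the anomalies, just as the monthly RA/MM values danced around the twelve-month running average.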

Now this is the point I want to get to. Scientists have set up a method to calculate the average global temperature and have the data for over 100 years. So what happens when you graph the current temperature averages against that long-term average? The trend is up.

In fact, the annual average has been above the 100-year average for every one of the last 25 years. If you are younger than 25, you have never lived through a “below average” temperature year.

According to an article in Scientific American (July 26, 2010), there have been exactly zero months, since February 1985, with average temperatures below those for the entire 20th century. (And those numbers are not as dramatic as they could be, because the last 15 years of the 20th century included in this period raised its average temperature, thereby lessening the century-long heat differential.) That streak -- 304 months and counting -- was certainly not broken in June 2010, according to the U.S. National Oceanic and Atmospheric Administration (NOAA). That month saw average global surface temperatures 0.68 degree Celsius warmer than the 20th-century average of 15.5 degrees C for June -- making it the warmest June at ground level since record-keeping began in 1880.

Not only that, June continued another streak, it was the fourth warmest month on record in a row globally, with average combined land and sea surface temperatures for the period at 16.2 degrees C. The high heat in much of Asia and Europe as well as North and South America more than counterbalanced some local cooling in southern China, Scandinavia and the northwestern U.S. -- putting 2010 on track to surpass 2005 as the warmest year on record. Even in the higher reaches of the atmosphere -- where cooling of the upper levels generally continues thanks to climate change below -- June was the second warmest month since satellite record-keeping began in 1978, trailing only 1998.

"Warmer than average global temperatures have become the new normal," says Jay Lawrimore, chief of climate analysis at NOAA's National Climatic Data Center, which tracks these numbers. "The global temperature has increased more than 1 degree Fahrenheit [0.7 degree C] since 1900 and the rate of warming since the late 1970s has been about three times greater than the century-scale trend."

So what does the near future hold in terms of heat waves and record-breaking highs? Depending on how quickly La Niña conditions strengthen in the Pacific Ocean (and a host of other factors), this year could surpass previous records or at least take its place as one of the warmer years on record.

Now the first principle for understanding data is that “no data have meaning apart from their context.” So how do we analyze this data? Again, from Wheeler, a rule: “The best analysis is the simplest analysis.”

I suppose you could break this down into two questions. The first is “Is the global climate warming up?” and the second would be “Why?” I think the data is very clear. Even with notoriously varying weather, charts that show the running average on the increase for over two decades should end the debate. Never mind secondary effects such as melting ice and glaciers, or tertiary effects like increases in storm intensity and other measured changes. The evidence should be very clear … the earth is warming … and that isn’t really a good thing for most of us who live here.

Why is it warming? Well, one obvious answer is the increase in greenhouse gases in the atmosphere. Certainly you could hypothesize many causes of global warming, from increased output of the sun to a change in the Earth’s orbit to internal heating or even a heat ray from the planet Mars. But it is clear from measurements that none of those things happened, while the amount of CO2 and other greenhouse gases has increased in the last century. I think we've found the cause. Levels of several important greenhouse gases have increased by about 25 percent since large-scale industrialization began around 150 years ago. One of the causes could be cow farts ... but you could also ask why there are so many cows. No, it appears that the increase in greenhouse gases is due to man-made causes, so it follows that global warming is caused by man. (That still leaves open the question of what to do about it, but the first step in problem solving is to identify the problem.)

Further, secondary effects from that increase in heat will likely further increase greenhouse gas emissions, such as methane released from thawing wetlands in the north. We may be in for a very rough time ahead.

It is simple cause and effect. It is a scientific fact that carbon dioxide increases the amount of heat trapped in the Earth’s atmosphere -- the greenhouse effect. It is a scientific fact that the amount of CO2 and other greenhouse gases in the atmosphere is increasing. And the effect of “global warming” (or, more accurately, “global climate change,” since the effect can actually lower temperatures in some regions due to changes in ocean or air currents) is measurable and real.

So what is there left to deny, other than science itself? Of course, some people deny science … yet they believe in the science of aerodynamics by flying in airplanes, or the rules of electrodynamics by using electric lights and telephones and TV and radio. Or they believe in chemistry by using glues and plastics and other man-made materials. Like I said, “What is there to deny?” Just do the statistics. It isn’t that hard to eliminate the variation and arrive at the long-term trend.

I have a co-worker who doesn’t believe in global warming. He says that science said margarine was better for you than butter, and then they reversed themselves and said butter is better. So he doesn’t believe in science any more. He also says that, not long ago, scientists thought the earth was cooling and we were headed toward an ice age. I suppose all that is true. Science isn’t exactly “an exact science,” you could say. Science is a lot of trial and error … and data that requires interpreting. But, at some point, the data becomes so clear that all but the most stubborn agree.

I’ve looked at the data and analyzed the trends. I’ve read the articles from several different journals and news magazines. I think the answer is clear, but you don’t have to believe me. Just do the math yourself.
