I’ve had so many different roles during my career. At IBM I
worked in different areas over the years. I started in copier
manufacturing as a tester. I then moved to diskette drive manufacturing
and later magnetic recording head manufacturing. From there I went to
Education and Training where I worked as an instructor, course
developer, and course manager for fourteen years. Finally, in 1998, I
went to work for Printing Systems Division where I worked as a
programmer, a tester, a project manager, and finally a very nebulous job
that involved a lot of statistical analysis among other things. My
title was Quality Technical Leader. (Or Technical Quality Leader; we never could decide just which term modified which.) I employed technology to manage product quality, using process engineering along with statistics and mathematical methods. In other words, I was the head bean counter.
The statistical work with Printing Systems was not my first stint as a statistician. While working in diskette drive manufacturing I had done some statistical analysis. Let’s start with that story. I worked on
eight-inch diskette drives as a manufacturing engineer. I was
responsible for testing all the drives manufactured in Boulder. IBM also
had a plant in Italy that manufactured drives. I designed, built, and
maintained a tool called ESTAR for Eight Station Test and Repair. This
was a console used to operate and test the diskette drives at the end of
their production line.
After IBM came out with the PC in 1981, we began work on five and one-quarter-inch drives for it.
Prior to that we had been developing a four-inch drive, but the IBM team
producing the PC in Boca Raton did not want to use such new and
untested technology, so they chose the then industry standard 5.25-inch
drives purchased outside IBM. When the PC became a large seller, we
responded to that business opportunity, and Boulder began designing and
building 5.25-inch drives.
One fact about these “floppy drives” was that the magnetic read/write head rode directly on the diskette. This could cause problems with wear. Although diskettes were designed to be as smooth as possible, their magnetic coating was essentially fine iron oxide particles, which would act like sandpaper, wearing down the heads. We had done tests on our drives and were satisfied that the heads would have sufficient life even with this wear.
However,
in the last couple of months before our drives would be ready for
delivery, we realized that diskettes from different manufacturers had
different characteristics affecting head wear. The different processes
used by manufacturers to make diskettes produced varying degrees of
roughness that the development lab had not expected. We had not tested
all the different diskettes, but had focused only on the IBM diskette.
That was a big mistake. Once we realized that there was so much
variability in the surface smoothness between diskette manufacturers, we
needed to perform a broader test using more types of diskettes.
Since I had just received a Master’s degree in mathematics and was responsible for drive testing on the current 8-inch manufacturing line, I was assigned to test the wear characteristics of the 5.25-inch drives with various types of diskettes. I gathered 35 PCs from around the plant, effectively borrowing them for two months. I set them up in a test lab and quickly wrote a program that would move the head to the middle track of the diskette and read data for fifteen seconds. The drive would then reset to track zero, access the middle track again, and read data for another fifteen seconds. This process was repeated over and over in a loop. The program would keep this up until it detected repeated read failures caused by the recording head wearing out from friction with the disks. The program kept track of the results and ran unattended.
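The logic was simple. Here is roughly what the loop looked like, sketched in modern Python for illustration rather than the language I actually used; the seek_to_track and read_track calls are hypothetical stand-ins for the real drive commands, and the error threshold is an assumption:

```python
import time

MIDDLE_TRACK = 20            # middle of a 40-track 5.25-inch diskette (assumed)
READ_SECONDS = 15            # read duration per pass, as in the real test
MAX_CONSECUTIVE_ERRORS = 5   # assumed threshold for declaring the head worn out

def run_wear_test(drive):
    """Exercise one drive until repeated read errors suggest head wear-out.

    `drive` is a hypothetical object exposing seek_to_track() and
    read_track(); the original program talked to the hardware directly.
    """
    consecutive_errors = 0
    start = time.time()
    while consecutive_errors < MAX_CONSECUTIVE_ERRORS:
        drive.seek_to_track(MIDDLE_TRACK)
        deadline = time.time() + READ_SECONDS
        while time.time() < deadline:
            if drive.read_track():          # True on a successful read
                consecutive_errors = 0
            else:
                consecutive_errors += 1
                if consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
                    break
        drive.seek_to_track(0)              # reset to track zero between passes
    return (time.time() - start) / 3600.0   # hours until failure
```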
I had designed both the program and the experiment to measure the wear characteristics of the head, and running the tests on 35 PCs allowed me to test several different brands of diskettes. There
were seven primary brands that made up about 90% of the diskettes sold
in the US. (The plant in Italy was responsible for testing with European
diskettes.) So I put five of each brand of diskette in the computers
and started the test.
I was using a statistical method called Weibull analysis to convert the results of the test into a “mean time to failure,” which would be a measure of the life of the head with that particular diskette. The Weibull distribution is widely used in reliability and life data analysis because of its versatility. Further analysis would combine the results for the different brands of diskettes and produce an overall estimate of life. I already had the extensive test results the development lab had produced with the one brand of diskette sold under the IBM name.
The fitted, parameterized distribution can be used to estimate important life characteristics of the product, such as the reliability or probability of failure at a specific time, the mean life, and the failure rate. However, to actually calculate results, the test must continue to its conclusion. That is, you have to run the test until diskette drives start to fail from wear. I didn’t need all 35 PC tests to fail, but I needed over half of them to reach the point of failure before I could produce meaningful estimates.
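Today that curve fitting takes only a few lines. As a sketch of the calculation, the Python below fits a two-parameter Weibull to a set of failure times and computes the mean life as scale times Gamma(1 + 1/shape); the failure times are invented for illustration, and for simplicity the fit ignores the censored drives that never failed, which a full analysis would account for:

```python
import math
import numpy as np
from scipy.stats import weibull_min

# Invented failure times in hours of continuous running (illustration only).
failure_hours = np.array([712.0, 745.0, 760.0, 781.0, 790.0, 802.0, 815.0,
                          826.0, 831.0, 840.0, 851.0, 859.0, 867.0, 880.0,
                          888.0, 895.0, 902.0, 911.0, 920.0, 933.0, 947.0,
                          961.0, 980.0, 994.0, 1010.0])

# Fit a two-parameter Weibull by fixing the location at zero.
shape, loc, scale = weibull_min.fit(failure_hours, floc=0)

# Mean time to failure for a Weibull: MTTF = scale * Gamma(1 + 1/shape).
mttf = scale * math.gamma(1.0 + 1.0 / shape)

print(f"shape (beta) = {shape:.2f}, scale (eta) = {scale:.1f} hours")
print(f"estimated mean head life = {mttf:.0f} hours")
```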
It was expected from the initial IBM diskette results that it would take over 30 days of continuous running to get to the point where drives were failing. Although we were still several months from the scheduled GA date (General Availability is the point where the product becomes available to customers), management was anxious to have the results as soon as possible.
Things went well for the first couple of weeks, as expected, and then the executives started to get antsy. The vice president of development was an acquaintance of mine, since I had taught him how to use the PC when it first came out. (I taught a series of classes on using the PC at the Boulder plant when it was first released by IBM. My students included several site executives.)
He started calling me to ask how the test was going. I explained that I would not have any results until drives started to fail; I would need a number of failure data points in order to plot the curve. He asked at what point in time, assuming no failures, the drives would meet the specifications we were aiming for. I replied that it was true that the longer the test ran without failures, the better. However, I could not accurately calculate life until I had a number of failure data points, because the spread of the failure times determined the parameters of the model. I needed to know when in the life of the head the peak failures occurred in order to determine the parameters to use in the mathematical model that would predict general head life, that is, the life of an average recording head.
He understood, since he was an engineer too, but that didn’t keep him from calling me several times a week during the latter phase of the testing. Finally the drives started to fail. Once the first drive failed, several others died within days. Over half of the drives reached the point of failure within a one-week period, which was good because it meant the wear mechanism was stable and consistent. Once about 70% of the drives had failed, I could plot the Weibull curve, and I calculated a value for head life. The good news was that the results would meet our requirements.
An interesting sidelight was
how worn out the diskettes themselves were. Not only did the diskette
wear the recording head down like sandpaper, but the diskettes
themselves lost magnetic coating due to friction with the head. There
was a noticeable “rut” in the diskette at the center track from head wear, yet the diskettes could still be read. We verified that by taking diskettes from drives that had failed the test and reading them in good drives. It was not our intent to measure diskette error, and the test was somewhat unrealistic, since the head stayed primarily in one place on the disk. In normal use the head would read and write on all the tracks, not just ride continuously in one place. An important point in
statistics and design of experiments is how the methodology used
reflects real life performance. This experiment was designed to test
head wear.
The failed heads were removed and examined
under a microscope to determine additional information about failures,
but we had seen worn out heads before and there were no surprises in
that regard. Still, my tests added to our process knowledge and our understanding of head performance. We didn’t want any surprises once thousands of customers started using our drives. The goal was for the diskette drives to function flawlessly for the life of the computer, and from this experimental data we created estimates of how many drives would fail and how often our customer service people would have to replace a diskette drive. IBM
was well known for manufacturing highly reliable products, and we were
able to verify these drives would not be a problem to our customers.
The
mathematics I used to calculate results came right out of a math
“cookbook.” Weibull analysis can be done with special statistical
programs, but I just used an HP programmable calculator and some
formulas from a math book. I was given the assignment because my manager knew I had just graduated with a math degree. What he didn’t realize was that my studies had included only one course on statistics, and that was during my undergraduate work. My math degree was based on the area of “analysis,” which is a fancy math term for Calculus. I had taken lots of Calculus, Differential Equations, Vector Algebra, and other advanced math classes, but I had not trained as a statistician.
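The cookbook method for fitting a Weibull by hand is what is now called median rank regression: sort the failure times, assign each one Benard’s approximation of its median rank, and fit a straight line to ln(-ln(1-F)) against ln(t). The slope is the shape parameter and the intercept gives the scale, exactly the sort of arithmetic an HP programmable calculator could handle. A sketch with invented numbers:

```python
import math

# Invented failure times in hours, sorted ascending (illustration only).
t = [712.0, 760.0, 790.0, 815.0, 826.0, 840.0, 851.0, 867.0, 888.0, 902.0]
n = len(t)

# Benard's approximation for the median rank of the i-th failure out of n.
F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

# Linearize: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
x = [math.log(ti) for ti in t]
y = [math.log(-math.log(1.0 - Fi)) for Fi in F]

# Ordinary least-squares slope and intercept, done with plain formulas.
xbar, ybar = sum(x) / n, sum(y) / n
beta = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        / sum((xi - xbar) ** 2 for xi in x))
eta = math.exp(xbar - ybar / beta)   # from intercept = -beta * ln(eta)

mttf = eta * math.gamma(1.0 + 1.0 / beta)
print(f"beta = {beta:.2f}, eta = {eta:.1f} hours, mean life = {mttf:.0f} hours")
```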
Still I
had the basic tools needed to perform statistical analysis. I just had
to read the right books and use the right formulas. I returned to
statistics at the end of my IBM career. This time I was calculating
product quality based on customer data and had the goal of continuous
improvement of product quality. I used statistical analysis to determine product quality over product life and to verify that new products IBM released were better than their predecessors. This involved a lot of metrics and product goal setting, and I was always the “numbers guy” the executives would come to when setting appropriate targets and measuring whether we were meeting them.
I focused on something called RA/MM, the number of repair actions performed per machine-month. A “machine month” is one machine of a particular printer model in the field for one month, so the total inventory in the field in a given month gives the machine-months for that month, and a “repair action” is any time a customer support representative had to go out and fix a problem on the machine. Depending on the complexity of the printer, we had a numerical goal for just how often the repairman had to be called. We had several other metrics as well: how easy it was to fix failures on the printers, how long it took on average to perform a repair, and how much it was costing in parts to maintain the printer. Together these measurements allowed us to determine the overall quality of the printers and the impact of their performance on customer satisfaction.
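In code the core metric is a one-line calculation; the function name and the counts below are invented for illustration:

```python
def ra_per_mm(repair_actions, machines_in_field, months=1):
    """RA/MM: repair actions divided by machine-months of field exposure."""
    return repair_actions / (machines_in_field * months)

# Invented example: 120 repair actions in one month across 4,000 installed printers.
ra_mm = ra_per_mm(repair_actions=120, machines_in_field=4000)
print(f"RA/MM = {ra_mm:.3f}")  # 0.030, about one repair per 33 machine-months
```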
My job was to work with product engineering
to set these targets and then work with field service to gather data and
verify that the printer was meeting product expectations. If it wasn’t meeting targets, then we would plan some sort of action to rectify the problem, perhaps a redesign of a part or a switch to a different part supplier for more reliable components. It was a continual process of setting goals
and monitoring performance against those quality targets.
Think
of how often you have to make repairs on your car or imagine the lonely
Maytag repairman who has nothing to do since the washing machine is so
reliable. That is what I’m talking about.
I’ll save that tale for another telling.
Sunday, September 16, 2012