Sunday, March 10, 2013

The Art of Computer Programming


This is a story about “programming.” Or, as I prefer to call it, “coding.” That’s creating something in that terse, somewhat mathematical syntax called “a programming language” that is then run on a machine to perform some magical result, whether it is calculating the orbit taken to fly to the moon or just simulate people living in a city … SimCity. It might be part of a function to turn thoughts into print … dots on paper … or turn an algorithmic representation in zeros and ones into one of Beethoven’s symphonies … or a good Beatles song.

It might be the code to encode a picture (jpg) or a movie (x264) or that Beatles tune (mp3 or AAC). A well-written piece of code is like a painting, you can recognize the artist from the brush strokes. It is really a work of art.

When I first started coding, I would just copy programs out of books. You would take a few lines of this and a few lines of that, put them together in a known form: sort of “I IV V” of a blues song, and you would get the program to do what you wanted.

One of the first, actual, professional bits of code I wrote was a small program to handle the inventory of our disk drive manufacturing process. Not anything fancy like ordering parts or maintaining stock levels. It was just a simple program to keep track of what parts were in the queuing area ready to build drives.

Most of the program went well, even though I was an electronics engineer and not a professional programmer. I had not had all the fancy training or study that computer scientists had; I just had a rough understanding of how to put SEQUENCE, SELECTION, and ITERATION together with allowed syntax to form a program.

As a hobbyist and owner of several home computers that ran BASIC, I was well aware of the syntax required to write in the Microsoft’s PC BASIC that came installed on all new PC’s when IBM announced its first offering in 1981. My manager obtained a copy of the BASIC Compiler, so I could produce programs that could be run quickly on the new PCs that were appearing everywhere in IBM offices.

(A “compiler” is a program that converts text called “source code” into a language a computer can process called “object code.” Programs on a disk are in “object code.” Certain programming languages can also be converted “on-the-fly.” That’s called “interpreted code.” BASIC was designed to be interpreted or converted “on-the-fly.” However, it was more efficient and faster to convert it ahead of time using a compiler. That way you converted it once, and then ran it thousands of times.

Many programming languages are designed only to be compiled (C or C++) while other languages are designed to only be interpreted (Java). Some can be done both ways. It’s a deep topic … more “computer science.”)

However, I soon encountered a problem I didn’t know how to solve. I had a database, a very simple database that was really just a file of items in the inventory. It was a simple, two-part list. The first part was a part number and the second part was the quantity in inventory stored as an integer or simple number. I had a second list that was pairs of part numbers and a short description of that part to make the data displayed more user-friendly.

What I needed to do in the program was to search the first list for a part number and then search the second list for a matching part number so I could display the part description along with the part number to the user. But I didn’t know how to search efficiently. All I could think of was to start at the top of the list and scan through all the part numbers looking for a match.

That would work, but since there were thousands of part numbers and I was searching the list stored on a floppy disk, which took considerable time to read, my program ran too slow. It worked, but it was inefficient and unusable. This was before IBM had hard disks in the PC, and besides, the problem really was my slow or inefficient search method – called an algorithm. A faster disk would have helped, but as the list grew, even a faster disk would not solve the issue of inefficient code.

I needed a faster searching algorithm. I asked one of my fellow engineers who had taken some computer classes as part of his degree (I had not) how to solve this problem. And he referred me to a set of books by Donald Knuth called “The Art of Computer Programming.”

Knuth is a computer science professor at Stanford University where I currently attend. Unfortunately, since I’m taking classes remotely via the Internet, I’m not likely to run into him on campus. Knuth is a brilliant mathematician and computer scientist and he is often called the “father of the analysis of algorithms.”

Back in the early sixties he started to write a book analyzing computer algorithms … that is, an effective method, written as a list or some other simple description of a specific technique to solve a particular problem. As I alluded to in a previous article on this blog, an algorithm is like a “recipe.”

Knuth’s work quickly evolved into a plan for a seven-volume set of books and he proceeded to write the first four:

Vol. 1 – Fundamental Algorithms
Vol. 2 – Seminumerical Algorithms
Vol. 3 – Sorting and Searching
Vol. 4 – Combinatorical Algorithms

And then he stopped. He had planned these additional volumes:

Vol. 5 – Syntactic Algorithms
Vol. 6 – The Theory of Context-Free Languages
Vol. 7 – Compiler Techniques

The reason he stopped when he did is that he was getting frustrated with the simple programming language he was using to typeset his books. Of particular concern was the mathematical language and formulas in the book and getting them to the publisher in the form he wished without introducing errors. As a programmer he was aware of the issue of errors or bugs in code and he didn’t want any in his writing. In fact, he offered a reward of “one hexadecimal dollar” or $2.56 for each error found, and he didn’t have to pay that fee too many times. But he was also interested in the beauty of the typesetting and book: Art + Science + Engineering.

Let me borrow these words from Wikipedia to explain where Knuth’s head was at upon publishing his work.

When the first volume of Donald Knuth's The Art of Computer Programming was published in 1969, it was typeset using hot metal type set by a Monotype Corporation typecaster with a hot metal typesetting machine from the 19th century which produced a "good classic style" appreciated by Knuth.

When the second edition of the second volume was published, in 1976, the whole book had to be typeset again because the Monotype technology had been largely replaced by photographic techniques, and the original fonts were no longer available. However, when Knuth received the galley proofs of the new book on 30 March 1977, he found them awful.

Around that time, Knuth saw for the first time the output of a high-quality digital typesetting system, and became interested in digital typography. The disappointing galley proofs gave him the final motivation to solve the problem at hand once and for all by designing his own typesetting system. On 13 May 1977, he wrote a memo to himself describing the basic features of TeX.

Even the name, which is pronounced “tech,” used special formatting. The lowercase e is written as a sort-of, subscript: TeX, but not quite like this. The “e” should extend down below the “baseline,” a term used in formatting text … it’s art you know.

(Due to the limits of HTML which produces this blog text, I can’t even show the correct format of the language name.) 

His new creation included an advanced font control system called Metafont and also included a special language for entering complex mathematical equations called LaTeX.

Although Knuth began the work developing this new typesetting language during a sabbatical in 1978, TeX became widely available in the early nineties. He was later quoted as saying that he had no idea that work would take so long ... over ten years. Still, text book writers the world over are glad he did finish the work of art. I don't suppose Michelangelo expected it would take so long to complete the ceiling of the Sistine Chapel either, but we're all glad he finished that work of art too.

Here’s an example of the quadratic formula written in LaTeX:

$-b \pm \sqrt{b^2 – 4ac} \over 2a$

The dollar signs enclose the LaTeX formula and the reverse slash is a markup indicating a specific symbol and format. You may have guessed that ^ indicates superscript of “exponent” and the "\pm" indicates the "plus-minus" symbol. These markups are called “metatext” or instructions or commands embedded in the text. HTML itself is a markup language.

When typeset, this results in :


As you can see, my HTML does include LaTex and I can format formulas in this blog.

But, getting back to his Art of Computer Programming, I quickly checked out Volume  3 from the IBM Library.

(Back then, IBM had a library at every development site and I was good friends with the manager of the Boulder library as I had been in there taking courses on programming and computers since I was first hired by IBM back in the seventies. I had a lot to learn and the library had lecture tapes in Beta format that I would watch on the consoles there.)

I quickly learned how to sort my lists in my program and then search them efficiently to find the part number match I needed. I also learned of the search for the best searching algorithm and the importance of sorting as part of searching. I even learned the mathematical technique for evaluating algorithms, something called the "Order of N" description.

The result of all of this was that I literally fell in love with anything that Donald Knuth wrote. I bought the three volumes that were available at that time and bought the fourth volume when it came out. Since then I’ve purchased the revised versions too. And this is no small task as these books are each priced near $100, like most good textbooks. But they are definitely worth every penny.

As an aside, Donald Knuth is a serious Christian. He wrote a very interesting and scientific analysis of the entire Bible called "3:16 Bible Texts Illuminated." This interesting book analyzes the Bible by studying verse 3:16 in each book. It is illustrated with many diverse fonts produced by Knuth's friends and colleagues in the publishing business. It is a good book for people of faith, as well as printers and scientists, and it presents a fascinating view of the Holy Scriptures. It is also strong evidence of the Christian Faith in a leading scientist for those that think Christianity is just about tradition and superstition.

Knuth taught me to write programs like classical music or fine poetry. The Art of Programming indeed. That is why I recommend that all children today … in this highly technological world … learn to program. A complete education must include reading and writing and arithmetic, and that includes reading and writing programs and calculating the result of the mathematical algorithms. That’s a well rounded education. It’s STEM plus Art and Design, and Programming is an art.

My coding art is not on the level of Knuth's or many co-workers that I met at IBM and elsewhere. But I can read and understand code well enough to recognize and appreciate the beauty of a well written algorithm. I wish everyone had that ability, especially some of the programmers I've met. Well written code is a symphony and poetry and fine art. It is not as obvious as other scientific endeavors such as architecture or bridge building, but the beauty is there for those that know the language.

Imagine that you are a fan of French poetry. You know that translations into English just don’t capture the art and beauty. Then there is no choice but to learn to read and write French.

Programs are automatically translated on the computer they run. But, as with the French poetry, it’s the same. You can’t translate the beauty of a program if you don’t know the language. This is the language of modern living and technology. It is sad that so many can’t speak it and will never know its beauty.

In case you want to learn, here are some links to some starting points. I’ll reference my favorite encyclopedia. Enjoy …








No comments:

Post a Comment