Astonishing things in informatics

October 18, 2011

Some things astonish me more the longer I know them. Some examples from informatics:

Reliable and complex from simple, brittle pieces

TCP guarantees the arrival of intact data, short of the entire network fragmenting. Some telecom systems claim nine nine's of up time. Large data centers expect disk failure (PDF).

No single person understands the whole of one of Intel's chips, or those from AMD, or any other desktop microprocessor today. This is Conway's law at work: "...organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations." Turn it around: a system beyond the grasp of an individual human can be built by organizing them along its lines, and designing the organization can be done by one human.

In groups we can build things beyond the grasp of the individual, that are more robust than any part we have available to put into it. If that doesn't give you hope for our species, what does?


The simplest mini-Prolog is almost trivial. It's a few functions, none more than a few lines of code. Seen in the host language, the functions are inscrutable, as opaque as looking up at the surface of the water.

On the far side, unification over Horn clauses (i.e., Prolog) is a consistent and self contained view of computing. You can peer down at the surface where Prolog begins and lambda calculus ends, but everything on the far side is hazy and distorted.

I always feel a sense of vertigo when I cross such a boundary, and even more when I erect one.

The infinite horizon of homoiconic languages

In Lisp, all is Lisp. In Smalltalk, all is Smalltalk. In Prolog, all is Prolog. These languages are easily defined in themselves. Their interpreters and compilers live as automata in the world which they define.

This imparts a feeling of space. In Haskell or C you can see the boundary clearly. Your semantics, your type system, the hard wall of the compiler, sit in plain view. In the homoiconic languages, the compiler is just another function, another object, another rule. The boundary that defines the edge of your semantics recedes into the distance and vanishes.

Practically, such languages are incredibly easy to bootstrap. If you can just breath life into some kind of interpreter, you open a door to wonderland, and may step through and forget whence you came.

Intrinsically hard problems

The implicit trust of mathematicians for millenia was that if we were only smart enough, if our reason were just powerful enough, we could calculate anything in the universe.

Complexity theory ended that. There are problems in our universe on which reason may break itself in vain, and they are named NP-complete. In a universe of subatomic particles colliding and appearing and disappearing, the seeming abstraction of how hard a problem is to solve is bound by universal laws.

Artificial intelligence isn't human intelligence

Computers bear no resemblance to human cognition bears. We did not make these machines in our image. Indeed, one of the successes of the machines was to make us realize that we are not computers. Our thought processes do not work like that.

We are things capable of taking decisions based on the world around us. We created other things likewise capable, but they do so in a manner entirely removed from our own. After living with our creations, I can only wonder at the naivete of those who could think a being that created us would do so in his own image.

Updating technical consensus is very, very hard

JavaScript is broken. Everyone knows and acknowledges this. Books are written about how to avoid its jagged edges.

The Internet was not designed for web applications or streaming media or commerce or identification of its users.

Unicode is one of the most remarkable consenses of mankind.

Technical consensus is necessary for complicated artifacts, but no one knows how to change them. There are many obvious things that could be fixed in JavaScript, things that no one would dispute, but socially we don't know how.

Public/private key cryptography and webs of trust

I can encrypt a message, send you a key to decrypt it, and be sure that no one else can eavesdrop. Public/private key cryptography, like building reliable systems from unreliable components, seems too good to be true. Of course, there's a catch. How do you know someone didn't replace the message and the key?

This is where webs of trust come in. I may not trust the key I got from you, but I may trust that a mutual friend says it's the right key. That may not scale very well - why should I trust the friend of a friend of a friend? - but it needn't scale very far. The small world theorem says that if the web is sufficiently comprehensive, I am unlikely to need very long chains of trust.

In practice, such cryptography hasn't become ubiquitous enough for the small world effect to really kick in, which is a shame.

Self destruction of communities

In any sufficiently large and general online community, cliques arise that try to destroy the community, along with lots of other odd behavior.

If the interactions are limited enough, then this kind of abuse tends not to arise because the technical limitations make it too much work. It is hard to imagine how to go on a rampage in Twitter. In a small community, the social pressure to behave is high, and the probability of someone resistant or hostile to such pressure being a part of the community is small.

When the interactions are general enough and the community large enough, groups arise who view the community around them as their intellectual prey.

And then a system of controls and enforcers is put in place if there wasn't already one. The history of IRC is one of networks splitting off and adding controls and enforcer powers. Usenet continually suffered from flamewars. Spam appeared in email, not to mention mail bombs.

We have built communities again and again from scratch on the Internet. It appears that a certain fraction of the human race is naturally destructive and must be held in check, and that some part of every community's resources must be devoted to that.

It's a strange thing to learn from a collection of wires and transistors.

Cryptographic hash functions

Imagine a function that has an inverse. The function is easy to compute, the inverse almost impossible. This is another of those pieces of magic that makes modern computing possible.

Add to that two more properties: small changes in input cause large changes in output, and no two inputs yield the same output. Now you have a tool to reduce anything to an identifying integer. At some level, computer security is controlling who has the data and who only has the identifying integer.

Accept liberally, produce conservatively, and go mad

RFC 791 says "Be liberal in what you accept, and conservative in what you send." Since RFC 791 defines IP, the foundation of the Internet, this is, for all practical purposes, the Intenet's motto. And the Internet grew insanely fast.

The World Wide Web did the same thing, resulting in broken HTML, distorted HTTP requests, and growth more usually associated with bacteria.

You must accept that many people will have to spent a lot of time cleaning the mess to the best of their ability at some point in the future. On the World Wide Web, at least 40% of HTML was malformed in 1996, and probably much, much more (it's hard to tell from the way the data are presented). In 2001, there were at least 140 invalid HTML documents on the web for every valid one (source).

And a lot of people have spent the last fifteen years trying to fix this. The situation has actually improved, but think how much time and acrimony has been spent.

On the other hand, without this path, the web would have had only a brief history as a medium for pictures of women as opposed to its multidecade run as the largest source of images of the female of the species ever created by man.

It's apparently better to let everyone in, let it go to hell, and try to clean up later. To the mathematical mind, this is a boggling principle.

Concurrent is different

Concurrency, or multiprogramming as it used to be called, has long been the bugbear waiting to snatch the unsuspecting programmer. Most programmers developed an implicit view of their craft based on some variation of a Turing machine. They work by setting up and running that machine in their head. This has crippled more programmers than I care to think about. (Dijkstra had many unkind things to say about such issues).

The human brain can keep track of one machine going back and forth on the tape. At each stage, there is only one possibility. The programmer develops an intuition based on how he would move chess pieces around on a board. But as soon as there is more than one machine on the tape, this fails, completely and utterly. There are now two chess players, and even when we only allow them to move their own pieces, that game has absorbed more of man's intellectual energies over the course of history than programming has.

Concurrent programming is more difficult than single threaded programming, but more importantly, it's just different. Once you have trashed the little Turing machine in your head, there is a lot known about multiprogramming. But what is astonishing in this? It's not concurrency, though there are gems to be found there. It's that we spend endless work trying to pander to programmers crippled by a curable mental afflication.

The creation of a network

The Internet went from 213 hosts in 1981 to 29 million in 1998 to 681 million last year (source). I remember when the only contact information on business cards were telephone numbers and mailing addresses, before FAX numbers were added. Today the telephone number is secondary to the email address or web site.

My mother was 17 when the first ARPAnet link was established. TCP/IP became the sole accepted protocol on the Internet in 1983, the year I was born.

The Internet is something new and different, though it has become passe to say so. Pundits and bloggers and talking heads pontificate about the transformative powers of this or that trend. They have no more idea of what's happening than the neurons in our brains understand our thoughts. (I should add that the Internet is not a brain, and that analogies between the Internet and brains are roughly as useful as analogies between the Internet and cottage cheese; see my astonishment, above, that we created machines not in the image of ourselves).

How outlandish does the phrase, "The Internet thinks your cute?" sound? We're not far from it seeming almost normal. Now go read A Miracle of Science, where Mars is a group mind and at some point says to someone, "Mars likes you."


Now that the list is made, why did I choose things from informatics? There are things in math and physics and music that astonish me more over the years, but they tend to feel eternal. They are truths the world may reach towards, but they will not meet it half way.

The things I listed in informatics are visceral, low-brow, as though we were playing in a mud puddle from which the Loch Ness monster emerged. And they are, in so many case, about people, about how people interact and how people fail and how people end up in horrible messes.

Did you enjoy that? Try one of my books:
Nonfiction Fiction
Into the Sciences Monologue: A Comedy of Telepathy