Saturday, June 26, 2004

Information Theory

I was planning to write an entry on Fahrenheit 9/11 and Control Room and Thomas Friedman and Nick Berg today, but I didn't end up seeing Fahrenheit 9/11 after all. Check back Thursday.

In the meantime, I'll post this link:

"If English were written like Chinese"

(I think I originally found that via Snarkout)

Of course, Chinese orthography is interesting in its own right, but if you've wondered a bit about information theory, it's even more interesting. Information theory was invented by Bell Labs engineer Claude Shannon, whose original paper on the subject I've more-or-less read. In fact, it's just about all I've read on the subject. In the paper, Shannon asks us to imagine a machine which generates letters. Perhaps it generates "A," "B," or "C" with equal probability. So you don't know which of the three you're going to get, and when you see the output, it's a little bit of a surprise; it's news to you, information. Now imagine it almost always generates A's... You already know that the next symbol is probably going to be A, so it's not very surprising when you see it. You don't get much new information -- you already had a reasonable guess what the answer was. But when you see "B," you're surprised. "B" carries more information.
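If you like code better than prose, the "surprisal" idea fits in one line: it's the negative log of the probability. Here's a toy Python sketch (my own illustration, with made-up probabilities, not anything from Shannon's paper):

```python
import math

def surprisal(p):
    """Surprisal, in bits, of seeing an outcome that has probability p."""
    return -math.log2(p)

# Three equally likely letters: each one is equally (and moderately) surprising.
print(surprisal(1 / 3))   # ~1.58 bits

# A machine that almost always prints "A":
print(surprisal(0.98))    # ~0.03 bits -- "A" is barely news at all
print(surprisal(0.01))    # ~6.64 bits -- a rare "B" carries much more
```

The logarithm is what makes surprisal behave sensibly: a certain outcome (probability 1) has zero surprisal, and rarer outcomes have more.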

Now you can use those probabilities again to figure out what the average "surprisal" of any letter of the alphabet would be, and that tells you the amount of "uncertainty" you have when confronted with a single unknown symbol from this alphabet, and thus its "information" value. This is useful because...
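That average surprisal is what Shannon calls the entropy of the source. Continuing the toy sketch from above (again my own illustration, not Shannon's notation):

```python
import math

def entropy(probs):
    """Average surprisal (Shannon entropy), in bits per symbol,
    of a source emitting symbols with the given probabilities."""
    return sum(p * -math.log2(p) for p in probs if p > 0)

# Equal probabilities: maximum uncertainty for a three-symbol alphabet.
print(entropy([1/3, 1/3, 1/3]))      # ~1.58 bits per symbol

# The machine that almost always prints "A": hardly any uncertainty left,
# because the occasional big surprise is weighted by how rarely it happens.
print(entropy([0.98, 0.01, 0.01]))   # ~0.16 bits per symbol
```

So a skewed source carries less information per symbol than a uniform one, even though its rare symbols are individually more surprising.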

Wait a second-

Hold on-

This page I'm using for reference, to write the blog entry? I wrote this page! I googled, and I turned up my old paper on information theory and didn't even recognize it... No wonder I kept finding myself agreeing with the descriptions, and thinking how pleasantly simple it made everything sound... Gosh, I hope it's right.

Okay, anyway, Shannon figures that if one or two or however many of the symbols can spontaneously change to other symbols, then you still have some uncertainty (depending on the probability of the switch) after reading the symbol, about what was actually transmitted. So in order to send the same amount of information, you have to send a string which is longer by an amount that you can calculate, using this probability stuff. If you're an engineer who has to deal with noisy telephone lines and sources of error, this is very handy to know. You can calculate how much redundancy you need to get your message through, with whatever degree of confidence you like. (Of course, this is actually more difficult if you're sending sounds instead of symbols, but if you encode everything digitally, so that you're sending zeros and ones, it gets simple again. Shannon has a whole section on analog sources, but I really only skimmed that, as a native of the digital age. When Shannon talked about discrete signals, he was thinking dots and dashes, as sent over telegraph wires. He did this work in the late 1940s.)
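The zeros-and-ones case has a tidy standard form: a "binary symmetric channel" that flips each bit with some probability. Its capacity -- the information that actually gets through per transmitted bit -- is one minus the entropy of the flip. A sketch of that calculation (the textbook formula, with my own example numbers):

```python
import math

def binary_entropy(p):
    """Entropy, in bits, of a coin that comes up heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(flip_prob):
    """Capacity of a binary symmetric channel that flips each bit
    with probability flip_prob: C = 1 - H(flip_prob)."""
    return 1.0 - binary_entropy(flip_prob)

# A line that flips 1% of bits still delivers about 0.92 information
# bits per transmitted bit...
c = bsc_capacity(0.01)
print(c)        # ~0.92

# ...so you need roughly 1/0.92, or about 9% extra length, to get the
# same message through -- that's the redundancy figure.
print(1 / c)    # ~1.09
```

A channel that flips bits half the time has capacity zero, which matches intuition: its output is pure noise, and no amount of redundancy helps.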

Now what does all of this have to do with Chinese orthography? Well, Chinese has a lot more symbols. So the surprisal of each symbol is much, much higher, and so the information content, the average surprisal, is as well... (That is, if you choose to describe the language on a symbol-by-symbol basis, and not a word-by-word or sentence-by-sentence or phoneme-by-phoneme basis.)
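You can put a rough ceiling on this: the most information a symbol can carry is the log of the alphabet size, reached only if every symbol were equally likely. (Real text is nowhere near uniform, so these are upper bounds, not measurements -- and the ~3,000-character figure for a working literate Chinese inventory is my own rough assumption, not a census.)

```python
import math

# Upper bound on bits per symbol: log2 of the alphabet size.
english_max = math.log2(26)    # ~4.7 bits per letter
chinese_max = math.log2(3000)  # ~11.6 bits per character, assuming a
                               # working inventory of ~3000 characters
print(english_max, chinese_max)
```

So even by this crude count, a Chinese character can carry a couple of times more information than an English letter, which is at least consistent with Chinese sentences being shorter.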

But a sentence in Chinese is also going to be shorter than the same sentence in English... Does it work out exactly? Do two sentences which mean the same thing have the same information content in both languages? Does Shannon's mathematical information actually relate to the everyday kind, to meaning? Can you quantify meaning at all?

To work it out, I would have to know a bit about Chinese orthography first... Hence the link.

(My friend Carol is studying this stuff more formally, and I sent her this link first. I owe her an e-mail, and my thanks, for correcting some misconceptions I'd got hold of.)
