Described by Billy Pilgrim in Slaughterhouse-Five (Kurt Vonnegut, 1969) as two-foot-high green toilet plungers with a hand on top and a single green eye in the palm, the Tralfamadorians are friendly alien creatures that can see in four dimensions. Billy recounts (based on his observations while caged in a Tralfamadorian zoo):
…they could see in four dimensions. They pitied Earthlings for being able to see only three. They had many wonderful things to teach Earthlings about time.
Because future and past are the same to them, they are not greatly concerned by events (including the fact that they are responsible for the eventual destruction of the universe!), and tend to simply respond with:
So it goes.
Now suppose the only aid the Tralfamadorians have for learning to communicate with the people of Earth is a copy of Great Expectations by Charles Dickens that somehow came floating through space and landed on their planet. Let's follow the progress of their mathematicians as they analysed this wonderful book in an effort to understand how we on Earth communicate.
The Tralfamadorian mathematicians quickly realised that this Earth language was written with an alphabet, so after identifying the letter symbols, they began constructing random sentences using these letters and a space, with every symbol equally likely.
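To make this first experiment concrete, here is a minimal sketch in Python. It assumes the alphabet is the 26 lower-case letters plus a space; the names `ALPHABET` and `random_sentence` are invented for this illustration and do not come from the original analysis.

```python
import random
import string

# Assumption: 26 lower-case letters plus the space, 27 symbols in all.
ALPHABET = string.ascii_lowercase + " "

def random_sentence(length, rng=None):
    """Return `length` symbols, each drawn uniformly at random from ALPHABET."""
    rng = rng if rng is not None else random.Random()
    return "".join(rng.choice(ALPHABET) for _ in range(length))

# Passing a seeded generator makes the output repeatable.
print(random_sentence(60, random.Random(0)))
```

Because every symbol is equally likely, a space turns up only about once in every 27 symbols, so the "words" in the output tend to be far longer than real English words.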
These results were not particularly encouraging: the sentences produced did not display any real similarity to the Dickens text, and would thus be unlikely to be of any use in communicating on Earth.
They continued their analysis, and recognised that the letter symbols appeared with different frequencies in the text. For example, an 'e' appeared 9.7% of the time (that's very nearly one in every ten letters), more often than the 2.4% for an 'm', and much more often than the tiny 0.1% for an 'x' (or once in every 1000 letters). The complete table of letter frequencies is as follows.
space 19.6% | a 6.6% | b 1.3% |
c 1.8% | d 3.9% | e 9.7% |
f 1.7% | g 1.7% | h 5.1% |
i 5.8% | j 0.18% | k 0.8% |
l 2.9% | m 2.4% | n 5.6% |
o 6.3% | p 1.4% | q 0.07% |
r 4.3% | s 4.8% | t 7.3% |
u 2.3% | v 0.71% | w 2.1% |
x 0.1% | y 1.7% | z 0.02% |
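The table above can drive a weighted random draw directly. The sketch below copies the percentages from the table and uses Python's `random.choices`, which accepts relative weights, so the rounded values do not need to sum to exactly 100. The names `LETTER_FREQ` and `weighted_sentence` are invented for this illustration.

```python
import random

# Percentages copied from the table above (rounded, so the total is only
# approximately 100; random.choices treats them as relative weights).
LETTER_FREQ = {
    " ": 19.6, "a": 6.6, "b": 1.3, "c": 1.8, "d": 3.9, "e": 9.7,
    "f": 1.7, "g": 1.7, "h": 5.1, "i": 5.8, "j": 0.18, "k": 0.8,
    "l": 2.9, "m": 2.4, "n": 5.6, "o": 6.3, "p": 1.4, "q": 0.07,
    "r": 4.3, "s": 4.8, "t": 7.3, "u": 2.3, "v": 0.71, "w": 2.1,
    "x": 0.1, "y": 1.7, "z": 0.02,
}

def weighted_sentence(length, rng=None):
    """Draw `length` symbols independently, weighted by LETTER_FREQ."""
    rng = rng if rng is not None else random.Random()
    symbols = list(LETTER_FREQ)
    weights = list(LETTER_FREQ.values())
    return "".join(rng.choices(symbols, weights=weights, k=length))

print(weighted_sentence(60, random.Random(0)))
```

Each symbol is still chosen without regard to its neighbours; only the overall proportions now follow the book.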
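Generating random text with these letter frequencies improved its appearance, but only very rarely did it produce words that matched the original. The mathematicians therefore moved on to pairs of adjacent letters, which also occur with very different frequencies, for example: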
th 2.2% | he 2.1% | in 1.5% |
an 1.4% | er 1.4% | nd 1.1% |
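One common way to put pair frequencies to work is to choose each new letter according to how often it follows the previous one. The sketch below assumes that approach; the text does not spell out the exact procedure. It uses a short stand-in string, `SAMPLE`, in place of the full novel, and the function names are invented for this illustration.

```python
import random
from collections import Counter, defaultdict

def pair_model(text):
    """Map each symbol to a Counter of the symbols that follow it in `text`."""
    followers = defaultdict(Counter)
    for first, second in zip(text, text[1:]):
        followers[first][second] += 1
    return followers

def generate_from_pairs(model, start, length, rng=None):
    """Build a string in which each symbol is drawn given the one before it."""
    rng = rng if rng is not None else random.Random()
    out = [start]
    while len(out) < length:
        counts = model.get(out[-1])
        if not counts:  # symbol never seen with a follower: stop early
            break
        out.append(rng.choices(list(counts), weights=list(counts.values()))[0])
    return "".join(out)

# Stand-in text (an assumption); the real analysis would use the whole book.
SAMPLE = "the rain in spain stays mainly in the plain "
model = pair_model(SAMPLE)
print(generate_from_pairs(model, "t", 40, random.Random(0)))
```

Every adjacent pair in the output is a pair that actually occurs in the source text, which is why the result starts to look more like the original.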
Working this way there are still very few genuine "words" generated, but the structure of the generated text and that of the original are definitely converging. Letter triples can be tabulated in the same way:
the 1.22% | and 0.86% | ing 0.64% |
her 0.42% | tha 0.36% | you 0.32% |
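The same idea extends to longer runs of letters: condition each new letter on the previous two (for triples), or on any fixed number of preceding symbols. The sketch below is one way to do this under that assumption, again with a short stand-in string instead of the novel; `context_model` and `generate` are names invented for this illustration.

```python
import random
from collections import Counter, defaultdict

def context_model(text, context_len):
    """Map each run of `context_len` symbols to a Counter of what follows it."""
    model = defaultdict(Counter)
    for i in range(len(text) - context_len):
        context = text[i:i + context_len]
        model[context][text[i + context_len]] += 1
    return model

def generate(model, seed, length, rng=None):
    """Extend the non-empty `seed`, conditioning on its last len(seed) symbols."""
    rng = rng if rng is not None else random.Random()
    context_len = len(seed)
    out = seed
    while len(out) < length:
        counts = model.get(out[-context_len:])
        if not counts:  # context never seen with a follower: stop early
            break
        out += rng.choices(list(counts), weights=list(counts.values()))[0]
    return out

# Stand-in text (an assumption); the real analysis would use the whole book.
SAMPLE = "the rain in spain stays mainly in the plain "
triples = context_model(SAMPLE, 2)  # two letters of context, i.e. letter triples
print(generate(triples, "th", 40, random.Random(0)))
```

Longer contexts copy more of the source's structure, at the cost of needing much more text before every context has been seen often enough to give varied output.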
Genuine words are now beginning to appear, and it seems that the Tralfamadorians are really onto something — something that can unlock the structure of this strange and alien language called "English".
By this stage the Tralfamadorian mathematicians could generate text that matched quite well with the original, and so completed their study of letter frequency. They next turned their analytical attention to the statistical properties of complete words.
Just like letter frequencies, different words occur with different frequencies. Of the almost 200 000 words in the Dickens text, 586 of them appear more than 100 times, whereas 4683 words appear only once. Most common is the word 'the', appearing more than 8000 times (or approximately 4% of all words). The following table shows the number of appearances for the 12 most common words.
the 8145 | and 7098 | to 5157 |
of 4438 | a 4049 | in 3028 |
that 2987 | was 2836 | it 2671 |
he 2208 | you 2185 | my 2070 |
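Counts like these are straightforward to reproduce. The sketch below assumes that a "word" is a run of letters and apostrophes with case ignored; the text does not say how the book was actually split into words, so exact totals may differ. The name `word_counts` and the sample sentence are invented for this illustration.

```python
import re
from collections import Counter

def word_counts(text):
    """Count lower-cased words, where a word is a run of letters or apostrophes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Stand-in text (an assumption); the real analysis would use the whole book.
SAMPLE = "The cat and the hat. The END!"
counts = word_counts(SAMPLE)
total = sum(counts.values())

# Most frequent words with their share of all words.
for word, n in counts.most_common(3):
    print(f"{word:>5} {n:3d} {100 * n / total:.1f}%")

# How many distinct words appear exactly once.
once = sum(1 for n in counts.values() if n == 1)
print("words appearing once:", once)
```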
With this in mind, the Tralfamadorian mathematicians turned their attention to how meaning emerges from the arrangement of words.
Just as generating words from letter frequencies alone was insufficient for reproducing the appearance of real text, a string of random words does not read at all like a sentence.
While this still results in very strange sentences, there is definitely some coherence, and it is often possible to assign meaning to reasonably long sections.
Surprisingly, this frequently produces sentences with a coherent meaning that extends over much more than 3 words.
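A word-level version of the earlier letter experiments can be sketched as follows, assuming each new word is chosen according to the words that followed the preceding one or two words in the source. That reading of the procedure is an assumption, as are the names `word_model` and `babble`. The stand-in text is a fragment of the opening of another Dickens novel, A Tale of Two Cities, not Great Expectations.

```python
import random
from collections import defaultdict

def word_model(words, context_len):
    """Map each tuple of `context_len` consecutive words to the words seen next."""
    model = defaultdict(list)
    for i in range(len(words) - context_len):
        model[tuple(words[i:i + context_len])].append(words[i + context_len])
    return model

def babble(model, seed, max_words, rng=None):
    """Extend the non-empty `seed` word by word, conditioning on its last len(seed) words."""
    rng = rng if rng is not None else random.Random()
    out = list(seed)
    while len(out) < max_words:
        followers = model.get(tuple(out[-len(seed):]))
        if not followers:  # context never seen with a follower: stop early
            break
        # Repeated followers appear repeatedly in the list, so a uniform
        # choice is automatically weighted by frequency.
        out.append(rng.choice(followers))
    return " ".join(out)

# Stand-in text (an assumption); the real analysis would use the whole book.
SAMPLE = ("it was the best of times it was the worst of times "
          "it was the age of wisdom").split()
model = word_model(SAMPLE, 2)
print(babble(model, ("it", "was"), 12, random.Random(0)))
```

With two words of context, every run of three consecutive words in the output is a run that occurs in the source, yet the overall sentence is usually new.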