The other day I was enjoying one of my favorite snacks, Alphaghetti, and I began to ask myself a few questions.
How many letters are in a can of Alphaghetti? Are all 26 letters included, or are there some that are missing? Is there a roughly equal number of each letter, or are some letters consistently more common than others? To answer these questions, I decided to do a scientifically rigorous analysis of the statistical properties of Alphaghetti.
The first step is to collect data. I grabbed a can of Alphaghetti and emptied it into a bowl. With the help of my roommate John, I began picking the letters out one by one, keeping a separate tally for each of the 26 letters.
It took about two hours to count all the letters. There were 853 letters in the can. The data is here. There were no letters missing. The most common letter was “i”, but it was not so far ahead that it couldn’t be a fluke.
I ran some chi-squared tests to get a better idea of how the letters are distributed. The first test firmly rejected the idea that the letters follow the frequencies found in a large sample of English text. A second chi-squared test showed that a uniform distribution is not totally implausible. If I had to guess based on this evidence, I would say that the letters in a can of Alphaghetti are uniformly distributed, although this guess is not statistically justified: failing to reject uniformity is not the same as demonstrating it.
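For anyone who wants to repeat the test on their own can, here is a minimal sketch of a chi-squared goodness-of-fit calculation in plain Python. It is not the exact script I used. The function names are made up for illustration, and the tallies at the bottom are placeholders, not the real counts from my can. With 853 letters, a uniform distribution would predict about 32.8 of each letter. The critical value 37.652 is the standard table entry for 25 degrees of freedom (26 letters minus 1) at the 5% level.

```python
import string

# Standard chi-squared table value for 25 degrees of freedom at the 5% level.
CRITICAL_VALUE_DF25_P05 = 37.652


def chi_squared_statistic(observed, expected_proportions):
    """Sum of (observed - expected)^2 / expected over all categories."""
    total = sum(observed)
    statistic = 0.0
    for count, proportion in zip(observed, expected_proportions):
        expected = total * proportion
        statistic += (count - expected) ** 2 / expected
    return statistic


def uniform_test(observed):
    """Chi-squared statistic against the hypothesis that every category is equally likely."""
    k = len(observed)
    return chi_squared_statistic(observed, [1.0 / k] * k)


# PLACEHOLDER tallies for illustration only. These are NOT the counts from the can.
placeholder_tally = {letter: 30 for letter in string.ascii_lowercase}
placeholder_tally["i"] = 45

statistic = uniform_test(list(placeholder_tally.values()))
if statistic > CRITICAL_VALUE_DF25_P05:
    print(f"chi-squared = {statistic:.2f}: reject uniformity at the 5% level")
else:
    print(f"chi-squared = {statistic:.2f}: cannot reject uniformity at the 5% level")
```

To test against English text instead, pass a list of 26 letter frequencies (summing to 1) as the second argument of chi_squared_statistic in place of the uniform proportions.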
One fact that surprised me was how few of the letters were defective. I had expected that letters like M, W, and Z would break often, but they didn’t. I only found 30 noodle fragments in the whole can that could not be clearly identified as letters.