Archive for January, 2010

Statistical Properties of Alphaghetti

Saturday, January 23rd, 2010

The other day I was enjoying one of my favorite snacks, Alphaghetti, and I began to ask myself a few questions.

How many letters are in a can of Alphaghetti?  Are all 26 letters included, or are there some that are missing?  Is there a roughly equal number of each letter, or are some letters consistently more common than others?  To answer these questions, I decided to do a scientifically rigorous analysis of the statistical properties of Alphaghetti.

The first step is to collect data.  I grabbed a can of Alphaghetti, and emptied it into a bowl.  I began picking the letters out one by one, keeping a tally of all 26 letters.  I enlisted the help of my roommate John.

Jeff and John tallying noodles from a can of Alphaghetti

It took about two hours to count all the letters.  There were 853 letters in the can.  The data is here.  There were no letters missing.  The most common letter was “i”, but it was not so far ahead that it couldn’t be a fluke.

Jeff tallying noodles

I ran some chi-squared tests to get a better idea of the frequency distribution amongst the letters.  The first test showed that the letters are almost certainly not distributed as they are in a large sample of English text.  A second chi-squared test revealed that it’s not totally implausible that the letters are uniformly distributed.  If I had to guess based on this evidence, I would say that the letters in a can of Alphaghetti are uniformly distributed, although this guess is not statistically justified, since failing to rule out a hypothesis is not the same as confirming it.
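For anyone who wants to try this kind of test at home, here is a minimal sketch of a chi-squared goodness-of-fit check against a uniform distribution.  It is not the exact analysis from this post.  The function names are made up for illustration, and the tallies are hypothetical placeholders rather than the real counts from the can.  The critical value is the standard table entry for 25 degrees of freedom at the 5% level.  With the real total of 853 letters, a uniform distribution would predict about 853 / 26 ≈ 32.8 of each letter.

```python
import string


def chi_squared_statistic(observed, expected):
    """Pearson's statistic: the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def uniform_expected(total, categories=26):
    """Expected count per category if every category is equally likely."""
    return [total / categories] * categories


# HYPOTHETICAL tallies for illustration only, NOT the real data from the can.
hypothetical_counts = {letter: 30 for letter in string.ascii_lowercase}
hypothetical_counts["i"] = 45  # pretend "i" came out ahead

observed = [hypothetical_counts[letter] for letter in string.ascii_lowercase]
expected = uniform_expected(sum(observed))
statistic = chi_squared_statistic(observed, expected)

# 26 categories give 25 degrees of freedom; 37.652 is the usual table
# value for a 5% significance level at 25 degrees of freedom.
CRITICAL_VALUE_DF25_5PCT = 37.652
reject_uniform = statistic > CRITICAL_VALUE_DF25_5PCT

print(f"chi-squared statistic: {statistic:.3f}")
print(f"reject uniform at 5%: {reject_uniform}")
```

To test against English text instead, replace `expected` with the English letter frequencies multiplied by the total number of letters counted.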

One thing that surprised me was how few defective letters there were.  I expected letters like M, W, and Z to break often, but they didn’t.  I found only 30 noodle fragments in the whole can that could not be clearly identified as letters.

The Contest Codebase

Saturday, January 23rd, 2010

There are currently six people working on the software that runs the Google AI Challenge.  As it happens, pieces of it are written in six different programming languages.  This software is quite an eclectic beast!  Here is the breakdown of programming languages in use, by number of lines of code.

  • Java: 3000 lines
  • C: 1500 lines
  • PHP: 1200 lines
  • Ruby: 1000 lines
  • Bash: 140 lines
  • Python: 130 lines

The grand total is about 7000 lines of code, and still growing.
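As a quick sanity check on the breakdown, here is a small sketch that adds up the figures listed above and works out each language's share.  The variable names are just for illustration, and the counts are the rounded numbers from the list, not a fresh measurement of the codebase.

```python
# Line counts copied from the list in this post (rounded figures).
lines_of_code = {
    "Java": 3000,
    "C": 1500,
    "PHP": 1200,
    "Ruby": 1000,
    "Bash": 140,
    "Python": 130,
}

total = sum(lines_of_code.values())

# Fraction of the codebase written in each language.
shares = {lang: count / total for lang, count in lines_of_code.items()}

# Print the languages from largest to smallest.
for lang, count in sorted(lines_of_code.items(), key=lambda item: item[1], reverse=True):
    print(f"{lang:<7}{count:>5} lines  {shares[lang]:6.1%}")

print(f"Total  {total:>5} lines")
```

The listed figures add up to 6970, which matches the "about 7000" quoted above.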